Machine Learning to Crack the Collatz Code

Predicting the Unpredictable: Machine Learning Approaches for the Collatz Conjecture

albertocillan · edit-2 8 months ago

Diggix@lemmy.dbzer0.com · 8 months ago

My ears…!

albertocillan · 8 months ago

sorry, because the letters appear so big, I don’t know how to fix it.
Alternative: https://www.reddit.com/r/Collatz/comments/1aohrci/only_program_rstudio_machine_learning_to_crack/

Johanno@feddit.de · 8 months ago

Why is everything caps?

noodlejetski · edit-2 8 months ago

probably because they started every line with the # symbol.

and also because it’s an important discovery, I guess

albertocillan · 8 months ago

sorry, because the letters appear so big, I don’t know how to fix it.
Alternative: https://www.reddit.com/r/Collatz/comments/1aohrci/only_program_rstudio_machine_learning_to_crack/

CrayonRosary@lemmy.world · edit-2 8 months ago

It’s because you copied and pasted all of the # characters from the original post. ~~I don’t know why they were there in the first place, but you need to remove them all.~~ Oh, this entire post is source code! smh

~~Copy your post text into a text editor and Find/Replace all the number signs, then paste the result back into your post.~~ Better yet, just delete this post.

How headings work in Markdown:

Starting a line of text with a number sign indicates a heading, with a single number sign being the largest heading.

One number sign

Two number signs

Three number signs

Four number signs

Five number signs

Six number signs

####### Seven number signs is too many and you just get a sixth level heading with a visible number sign.

Source for the above:

# One number sign

## Two number signs

### Three number signs

#### Four number signs

##### Five number signs

###### Six number signs

####### Seven number signs is too many and you just get a sixth level heading with a visible number sign.

albertocillan · 8 months ago

https://www.reddit.com/r/Collatz/comments/1aohrci/only_program_rstudio_machine_learning_to_crack/

albertocillan · 8 months ago

sorry, because the letters appear so big, I don’t know how to fix it.
Alternative: https://www.reddit.com/r/Collatz/comments/1aohrci/only_program_rstudio_machine_learning_to_crack/

Johanno@feddit.de · 8 months ago

Just place three ` around the whole thing.

Like this 
\```
\```

albertocillan · 8 months ago

How can I replace the article I have uploaded? thanks

Johanno@feddit.de · 8 months ago

You can use the edit button, which should be next to the share button and so on.

albertocillan · 8 months ago

https://www.reddit.com/r/Collatz/comments/1aohrci/only_program_rstudio_machine_learning_to_crack/

nyakojiru@lemmy.dbzer0.com · 8 months ago

Dude I spent 5 mins scrolling to be able to write this: what happens after the crack? Will something like a portal to another dimension open? Thanks

rhebucks-zh@incremental.social · 8 months ago

I bet that in 2200 mathematicians will still be trying to crack this.

CrayonRosary@lemmy.world · 8 months ago

deleted by creator

albertocillan · 8 months ago

You can find the programme with # for comments at the following web link https://www.reddit.com/r/Collatz/comments/1aohrci/only_program_rstudio_machine_learning_to_crack/

CrayonRosary@lemmy.world · edit-2 8 months ago

Oh, this whole thing is source code? Great. Why would you think everyone here would immediately want all the source code? This is a very specialized thing from a very specialized subreddit, and not appropriate for Technology. If you don’t know how to post properly formatted source code on Lemmy, you shouldn’t be posting it at all.

Scrap this entire post and replace it with just a summary of the findings and a link to the original reddit post for people who want the source code.

While you’re at it, learn what Markdown is and how it affects the appearance of text on both Lemmy and Reddit so you don’t make this giant font mistake again. If you had used a plugin like RES to view the source of the original reddit post, you could have copied that and it would look identical over here. But like I said, no one wants all the source code to this very narrow mathematical problem, and if they do, they can go to the original post.

albertocillan · 8 months ago

I’m sorry. I’m going to learn Markdown.

deadbeef@lemmy.world · edit-2 8 months ago

```R
<code>
```

Is what you’re looking for

Machine Learning to Crack the Collatz Code

Predicting the Unpredictable: Machine Learning Approaches for the Collatz Conjecture

Introduction

The Collatz conjecture, also known as the 3n+1 problem, is an unsolved problem in mathematics concerning the dynamics of certain number sequences.

The conjecture states that if you take any positive integer and repeatedly apply the following operation, the sequence will always reach 1:

- If the number is even, divide it by 2

- If the number is odd, multiply it by 3 and add 1
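The rule can be sketched in a few lines. This is a Python illustration, not the R program from the post; the function names `collatz_step` and `collatz_steps` are made up for this example.

```python
def collatz_step(n: int) -> int:
    """Apply one Collatz operation to a positive integer."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_steps(n: int) -> int:
    """Count how many operations it takes for n to reach 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = collatz_step(n)
        steps += 1
    return steps


print(collatz_steps(6))   # 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1, i.e. 8 steps
print(collatz_steps(27))  # 111 steps
```

The `while` loop only terminates if the conjecture holds for the starting value, which is the case for every number that has been checked so far.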

While easy to state, the Collatz conjecture has eluded efforts to prove it for over 80 years. Directly applying the iterative Collatz algorithm on large numbers requires many computational steps.

This work presents an alternative machine learning approach to predict the number of steps needed to reach 1 for a given Collatz sequence, without executing the full algorithm.

By training statistical and machine learning models on metrics from large samples of randomly generated Collatz sequences, the steps can be predicted. This avoids the need to iterate through large sequences just to determine the length.

The following chapters outline the generation of a Collatz dataset, feature engineering, model training, final predictions, and conclusions of this machine learning approach to predict Collatz steps.

In summary, this work demonstrates a way to estimate Collatz sequence lengths without direct computation, providing an innovative alternative to traditionally applying the iterative algorithm.

To create a dataset for training machine learning models, this program first loads several key R packages:

tidyverse - for data manipulation and wrangling

stats - for statistical modeling functions

randomForest - for random forest models

gbm - for gradient boosting models

ranger - an additional random forest package

relaimpo - for variable importance estimation

It initializes an empty tibble called datos to accumulate the generated data.

For reproducible results, the random number generator seed is set. Then a large random odd number called numerox is created to seed the Collatz sequences.

A sample size of z=1000 is defined for the number of Collatz sequences to generate. This provides a robust dataset for modeling.

A for loop iterates z times, each time generating a new random large odd number called number_ini based on numerox. This number_ini serves as the input for a Collatz sequence.

Within the loop, several variables are initialized to store metrics on each sequence like:

pares - number of even steps

total - total steps

impares - number of odd steps

A collatz function is defined to implement the iterative Collatz algorithm, taking in number n:

If n is even, divide by 2

If n is odd, multiply by 3 and add 1

This function is called with number_ini to generate the full sequence.

The metrics from each sequence are stored in a dataframe datos1, which is appended to the main datos dataframe after each iteration.

In this way, a large dataset of 1,000 randomly sampled Collatz sequences is assembled, ready for further feature engineering and modeling.
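A rough Python sketch of that loop follows; the original program is in R and is not reproduced here. The names `datos`, `number_ini`, `pares`, `impares`, `total` and `z` come from the description above. Everything else (`collatz_metrics`, `build_dataset`, the `bits` and `seed` parameters, and drawing each starting number independently rather than deriving it from numerox) is an assumption made for illustration.

```python
import random


def collatz_metrics(n: int) -> dict:
    """Run the Collatz iteration from n and count even and odd steps."""
    start, pares, impares = n, 0, 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
            pares += 1      # even step: halve
        else:
            n = 3 * n + 1
            impares += 1    # odd step: triple and add one
    return {"number_ini": start, "pares": pares,
            "impares": impares, "total": pares + impares}


def build_dataset(z: int = 1000, bits: int = 64, seed: int = 123) -> list:
    """Generate z rows of metrics for random large odd starting numbers."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    datos = []
    for _ in range(z):
        # Assumption: force the lowest bit (odd) and the highest bit (large).
        number_ini = rng.getrandbits(bits) | 1 | (1 << (bits - 1))
        datos.append(collatz_metrics(number_ini))
    return datos


datos = build_dataset()
print(len(datos), datos[0])
```

Note that `pares`, `impares` and `total` are only known after the sequence has been run to completion, so building the dataset still executes the full algorithm for every row.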

Chapter 3: Feature engineering

With the raw dataset of Collatz sequence metrics assembled, additional features can be engineered to better capture patterns useful for modeling.

The initial metrics like number of steps, evens, and odds provide a starting point. But mathematical transformations of these can reveal deeper relationships.

Some engineered features include:

- Log transforms of the initial number and counts of evens/odds

- Ratios between evens, odds, and steps

- Products and differences of log-transformed values

- Polynomials and exponents of key terms

Incorporating domain knowledge about Collatz sequence properties allows creating meaningful derived variables. The natural logs and ratios between steps, evens and odds are particularly useful.

Another technique used is generating interaction features between key terms. This lets models account for combinations of variables in making predictions.

The engineered features are added to the main datos dataframe, augmenting the initial sequence metrics. This expands the dataset providing a richer input representation for the machine learning models.
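To make the kinds of transformations listed above concrete, here is a small Python sketch. The post does not give the exact features of the R program, so the feature names and formulas below are hypothetical examples of each category (logs, ratios, products and differences of logs, a polynomial term), not the author's actual set.

```python
import math


def engineer_features(row: dict) -> dict:
    """Add log, ratio, and interaction features to one row of sequence metrics."""
    n, pares, impares, total = (row["number_ini"], row["pares"],
                                row["impares"], row["total"])
    log_n = math.log(n)
    # log1p(x) = log(1 + x) avoids log(0) when a count is zero.
    log_pares = math.log1p(pares)
    log_impares = math.log1p(impares)

    feats = dict(row)  # keep the raw metrics alongside the derived ones
    feats.update({
        "log_n": log_n,
        "log_pares": log_pares,
        "log_impares": log_impares,
        "ratio_impares_pares": impares / pares if pares else 0.0,
        "ratio_pares_total": pares / total if total else 0.0,
        "log_product": log_pares * log_impares,   # interaction term
        "log_diff": log_pares - log_impares,
        "log_n_squared": log_n ** 2,              # polynomial term
    })
    return feats


# Metrics for the starting number 27: 70 even steps, 41 odd steps, 111 in total.
row = {"number_ini": 27, "pares": 70, "impares": 41, "total": 111}
for name, value in engineer_features(row).items():
    print(name, value)
```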

With domain expertise guiding the creation of mathematical feature transformations, the model inputs are optimized to capture Collatz sequence characteristics.

The augmented dataset now has over 50 engineered features for each sequence, ready for training predictive models. Feature selection will further refine the set used in modeling.

Chapter 4: Model training

With the engineered dataset of Collatz sequences, various machine learning models can be trained to predict the number of steps.

The data is split into training and test sets for proper model evaluation. The training data is used to fit models, and test data is held back for independent assessment.

Several types of models are trained:

- Linear regression - A simple linear model predicting steps based on sequence features

- Random forest - An ensemble model averaging many decision trees fit on subsamples of data

- Gradient boosting machine - An ensemble approach that combines many weak tree models

Key hyperparameters are tuned for optimal performance including number of trees, tree depth, and learning rate.

Model performance is evaluated on the test set using metrics like R-squared and Root Mean Squared Error (RMSE). Test set metrics give an unbiased estimate of how well the models generalize.
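The split, fit and evaluation steps can be sketched with the standard library alone. This Python example is an illustration under stated assumptions, not the R code from the post: it uses a single feature, log2 of the starting number (which is known without running the sequence), an 80/20 split, and a one-variable least-squares line. All function names are made up for this sketch.

```python
import math
import random


def collatz_steps(n: int) -> int:
    """Number of Collatz operations needed for n to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


rng = random.Random(42)
numbers = [rng.randrange(3, 10**6, 2) for _ in range(1000)]  # random odd starts
split = int(0.8 * len(numbers))
train, test = numbers[:split], numbers[split:]               # 80% train, 20% test

a, b = fit_line([math.log2(n) for n in train], [collatz_steps(n) for n in train])
pred = [a + b * math.log2(n) for n in test]
actual = [collatz_steps(n) for n in test]
print(f"RMSE = {rmse(actual, pred):.2f}, R^2 = {r_squared(actual, pred):.3f}")
```

Both metrics are computed only on the held-out numbers, which is what makes them an estimate of performance on unseen inputs.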

Among the models, the gradient boosting machine (GBM) achieved the lowest RMSE. Because each new tree is fit to the errors left by the trees before it, the ensemble corrects mistakes that a single model would make.
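For readers unfamiliar with gradient boosting, the idea can be shown with a toy Python version. This is a minimal sketch for squared error using depth-1 trees (stumps) on a single feature; it is not the R gbm package and not the model from the post. The names `fit_stump`, `fit_gbm` and `predict_gbm`, and the default values of `n_trees` and `learning_rate`, are assumptions chosen for illustration.

```python
def fit_stump(x, residuals):
    """Find the threshold split on x that minimizes squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:  # all x identical: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return x[0], mean, mean
    return best[1:]  # (threshold, left value, right value)


def fit_gbm(x, y, n_trees=50, learning_rate=0.1):
    """Boosting for squared error: each stump is fit to the current residuals."""
    base = sum(y) / len(y)           # start from the mean of the target
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left, right = fit_stump(x, residuals)
        stumps.append((threshold, left, right))
        # Shrink each stump's contribution by the learning rate.
        pred = [pi + learning_rate * (left if xi <= threshold else right)
                for xi, pi in zip(x, pred)]
    return base, stumps, learning_rate


def predict_gbm(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (left if xi <= threshold else right)
                      for threshold, left, right in stumps)


# Toy data: a step-shaped target that a single straight line fits poorly.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 10, 10, 10, 30, 30, 30, 30]
model = fit_gbm(x, y)
print(predict_gbm(model, 2), predict_gbm(model, 7))  # close to 10 and 30
```

The number of trees and the learning rate trade off against each other: a smaller learning rate needs more trees to reach the same fit. Tree depth is fixed at 1 here to keep the sketch short.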

Feature importance analysis on the GBM model revealed insights into the main drivers of Collatz sequence length. As expected, counts of evens and odds were important, along with various interaction terms.

The tuned GBM model demonstrated excellent predictive performance on new data. This model will be used in the final chapter to generate predictions and estimate Collatz sequence lengths.

By leveraging machine learning techniques on a robust training dataset, an accurate model was developed to predict Collatz steps without executing the full algorithm.

/* The program loads several R packages including tidyverse for data manipulation, stats for statistical modeling, and multiple packages for machine learning like randomForest, gbm, ranger, and relaimpo.

It initializes an empty tibble dataframe called datos to store the generated data.

It sets some options like numeric precision.

It generates a seed number called numerox that is large, odd, and random. This will be used to seed the Collatz sequences.

It defines some key parameters like z=1000 which is the number of Collatz sequences that will be generated.