Predict next integer in sequence using ML.NET

Given a lengthy sequence of integers in the range 0-1, I would like to be able to predict the next likely integer based on the previous sequence.

Example dataset: 1 1 1 0 0 0 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0

A quick look at the above perhaps shows some obvious patterns which may be recognised by an ML model. After a lot of research it seems I'm after an LSTM solution, but I can't find any tutorials or examples that deal with a patterned sequence such as the above.

I do have other features available in the dataset, but I don't think they correlate with the integer result, so the prediction should be based purely on the statistical relevance of the supplied integer dataset.

I'm using ML.NET but would also welcome any general pointers on how to approach this using any other framework. I'm currently researching Keras in Python.

Topic automl lstm keras regression

Category Data Science


LSTM and AutoML might be unnecessarily complex for finding patterns in a sequence of integers.

More established options are outlined below.


Understanding sequence prediction

There are three types of sequence prediction problems: predicting a class label, predicting a sequence, or predicting the next value.

In your case, you are looking to predict the next value.
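Whatever model you choose, next-value prediction is usually set up as a supervised problem by sliding a fixed-length window over the sequence. A minimal sketch of that preprocessing step (the function name `make_windows` and the window length of 5 are illustrative choices, not part of any library):

```python
# Turn a 0/1 sequence into supervised (window -> next value) pairs.
# Each window of `window` consecutive values becomes an input, and the
# value that immediately follows it becomes the target.
def make_windows(seq, window=5):
    pairs = []
    for i in range(len(seq) - window):
        pairs.append((seq[i:i + window], seq[i + window]))
    return pairs

seq = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
pairs = make_windows(seq, window=5)
print(pairs[0])  # ([1, 1, 1, 0, 0], 0)
```

The window length is a hyperparameter: too short and the model cannot see the pattern, too long and you have few training examples.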

Tutorials

Here are a few tutorials I have found related to sequence prediction, with code examples.

  1. Simple Sequence Prediction With LSTM shows how to predict the next value
  2. A guide to sequence prediction using compact prediction tree python; CPT is a lossless model that ensures accurate sequence prediction
  3. 5 Examples of Simple Sequence Prediction Problems for LSTMs shows how to achieve sequence prediction using LSTM recurrent neural networks

Sequence prediction models

There are many different ways to perform sequence prediction, such as Markov models and directed graphs from the machine-learning domain, and RNNs/LSTMs from the deep-learning domain. However, RNNs and LSTMs have a few drawbacks:

  • Longer training time
  • Re-training is required for sequences not seen in a previous training iteration. This is a very costly process and is not feasible for scenarios where new items are encountered frequently

The most popular sequence prediction models from the research community are listed below.

  • Prediction by Partial Matching (PPM): based on the Markov property, it has inspired a multitude of other models, such as:

  • Dependency Graph (DG)

  • All-K-Order-Markov (AKOM)

  • Transition Directed Acyclic Graph (TDAG)

  • Probabilistic Suffix Tree (PST)

  • Context Tree Weighting (CTW)

  • Compact Prediction Tree (CPT): a more recently proposed prediction model that compresses training sequences without information loss by exploiting similarities between subsequences. It has been reported as more accurate than the state-of-the-art models PPM, DG, and AKOM on various real datasets.
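To make the Markov-style family above concrete, here is a minimal frequency-based predictor in that spirit: count which value most often followed each context of the last k symbols, then predict accordingly. The function names and the choice k=3 are illustrative only; real models like PPM or AKOM add back-off across multiple context lengths.

```python
from collections import Counter, defaultdict

def train_counts(seq, k=3):
    # Map each context of k consecutive symbols to a counter of the
    # values observed immediately after that context.
    counts = defaultdict(Counter)
    for i in range(len(seq) - k):
        context = tuple(seq[i:i + k])
        counts[context][seq[i + k]] += 1
    return counts

def predict_next(counts, recent, k=3):
    # Predict the most frequent successor of the last k symbols.
    context = tuple(recent[-k:])
    if context not in counts:
        return None  # unseen context; a real model would back off to k-1
    return counts[context].most_common(1)[0][0]

seq = [1, 0, 1, 0, 1, 0, 1, 0, 1]
counts = train_counts(seq, k=3)
print(predict_next(counts, seq, k=3))  # 0 (the value that always followed 1, 0, 1)
```

This trains in a single pass and needs no re-training strategy beyond updating the counters, which is exactly the advantage these models hold over RNNs/LSTMs for frequently changing data.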

Conclusion

From the research I found, Compact Prediction Tree seems to be the model of choice for researchers, considering how well it performs compared to other models. The tutorial mentioned in #2 above covers an implementation in Python; however, its original implementation is in Java.

References

If you are looking for a book, this one seems to cover most use cases: LSTM Networks With Python: Develop Sequence Prediction Models With Deep Learning.

I hope my post helps to solve your problem.


One solution is to think of the 0s and 1s as characters (or words, if you like); the problem then becomes predicting the next letter given the previous text. This is exactly what this Keras example is about. You can experiment with some parameter settings (e.g. the number of LSTM dimensions) to find the sweet spot.
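Before the model definition from that example, you need to shape your 0/1 sequence into the (samples, timesteps, features) arrays a Keras LSTM expects. A minimal sketch of that step (the function name `to_lstm_arrays` and the window length of 10 are illustrative assumptions):

```python
import numpy as np

def to_lstm_arrays(seq, window=10):
    # Build sliding windows and shape them as (samples, timesteps, 1),
    # the 3-D input format Keras recurrent layers expect.
    X, y = [], []
    for i in range(len(seq) - window):
        X.append(seq[i:i + window])
        y.append(seq[i + window])
    X = np.array(X, dtype="float32").reshape(-1, window, 1)
    y = np.array(y, dtype="float32")
    return X, y

seq = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
X, y = to_lstm_arrays(seq, window=10)
print(X.shape, y.shape)  # (6, 10, 1) (6,)
```

With only two symbols, a Sequential model with an LSTM layer and a Dense(1, activation="sigmoid") output trained with binary cross-entropy is enough, in place of the softmax-over-vocabulary setup used in the character-level example.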

I am not familiar with ML.NET or how it interfaces with Keras, but the idea is there.
