Steps to fit a machine learning model for predicting up and down market movement

I have around 5 years of daily data for an index, containing many features. I want to classify whether the index will move up or down on the next trading day (the direction is determined by the next day's open/close price). I am using an SVM classifier for this. What are the essential steps that need to be followed? Since I am using financial data, I suppose there will be some deviation from the traditional way of applying machine learning. I have the following steps in mind:

  1. Prepare the data, containing all features at hand and the direction variable.

  2. Feature engineering: creating new features by applying transformations such as log or % change to the available features. I have not done this step yet, but I will add it once I have a decent running model.

  3. Feature selection: how should I go about this? For now I am using features which I think have predictive power, but how do I do this systematically?

  4. Walk-forward modelling: after selecting features, I have the data I will use for my model. I train the model on the first 500 days and test it on day 501, then train on days 2 to 501 and test on day 502, and so on until I reach the present day. I scale the training data to the range 0-1 and apply the same scaler to the test data before running the model. For now I use the default SVM parameters in sklearn.

  5. I then check the performance of my model using various metrics obtained from the confusion matrix, such as accuracy, F1 score, etc.
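To make steps 4 and 5 concrete, here is a minimal sketch of the rolling 500-day walk-forward loop with per-window scaling, followed by the confusion-matrix metrics. The data is a synthetic placeholder (random features and labels), the function name `walk_forward_predict` is made up for illustration, and it assumes that row `t` of `y` holds the direction of the move that follows row `t` of `X`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def walk_forward_predict(X, y, window=500):
    """Train on a rolling window of `window` rows and predict the next row."""
    preds, actuals = [], []
    for end in range(window, len(X)):
        X_train, y_train = X[end - window:end], y[end - window:end]
        X_test = X[end:end + 1]

        # Fit the scaler on the training window only, then reuse it on the
        # test row, so no information from the test day leaks into training.
        scaler = MinMaxScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        model = SVC()  # sklearn defaults, as described in step 4
        model.fit(X_train_scaled, y_train)
        preds.append(model.predict(X_test_scaled)[0])
        actuals.append(y[end])
    return np.array(actuals), np.array(preds)


# Synthetic placeholder data (assumption): 560 days, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(560, 5))
y = rng.integers(0, 2, size=560)  # 1 = up, 0 = down

actuals, preds = walk_forward_predict(X, y, window=500)
print(confusion_matrix(actuals, preds, labels=[0, 1]))
print("accuracy:", accuracy_score(actuals, preds))
print("F1:", f1_score(actuals, preds, zero_division=0))
```

With random labels the scores should hover around chance, which is a useful sanity check before plugging in real data: anything far above chance on shuffled labels would suggest leakage.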

For applying SVM I am using this reference, which recommends using cross-validation to choose C and gamma. How can we use CV here, given that this is financial time series data? What else should I include in my steps?
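One possible way to cross-validate on time-ordered data is scikit-learn's `TimeSeriesSplit`, in which every fold trains on earlier rows and validates on later ones. The sketch below combines it with `GridSearchCV` over `C` and `gamma`. The grid values and the synthetic data are assumptions chosen only for illustration. The scaler sits inside the pipeline so it is refitted on each training fold.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic placeholder data (assumption): 600 days, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = rng.integers(0, 2, size=600)  # 1 = up, 0 = down

# Scaling inside the pipeline keeps each fold's scaler fitted on its own
# training rows only.
pipe = Pipeline([("scale", MinMaxScaler()), ("svm", SVC(kernel="rbf"))])

# Illustrative grid; sensible ranges depend on the data.
param_grid = {
    "svm__C": [0.1, 1.0, 10.0, 100.0],
    "svm__gamma": [0.01, 0.1, 1.0],
}

# Each split trains on an expanding window of past rows and validates on
# the block that immediately follows it.
tscv = TimeSeriesSplit(n_splits=5)
search = GridSearchCV(pipe, param_grid, cv=tscv, scoring="accuracy")
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

In a walk-forward setup, the search would be run on the training window only, so that the days used for the final test never influence the choice of `C` and `gamma`.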

Topic finance classification svm machine-learning

Category Data Science


Why do you want to use SVM + feature engineering in 2021? If you have a time series dataset, use an LSTM/GRU/BiLSTM for binary classification, with a fully connected last layer. Use binary cross-entropy as the loss function, and overall accuracy, F1, or precision-recall as the evaluation metric.
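As a rough sketch of that architecture: an LSTM over a window of past days, a fully connected last layer, and binary cross-entropy loss. PyTorch is an assumption here (no framework is named above), and the class name `DirectionLSTM`, the layer sizes, the window length, and the random data are all placeholders.

```python
import torch
from torch import nn


class DirectionLSTM(nn.Module):
    """LSTM encoder followed by a fully connected layer producing one logit."""

    def __init__(self, n_features, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x has shape (batch, seq_len, n_features).
        _, (h_n, _) = self.lstm(x)
        # Use the final hidden state of the last layer as the sequence summary.
        return self.fc(h_n[-1]).squeeze(-1)  # logits, shape (batch,)


# Synthetic placeholder data (assumption): 64 samples, 20-day windows, 5 features.
torch.manual_seed(0)
X = torch.randn(64, 20, 5)
y = torch.randint(0, 2, (64,)).float()  # 1 = up, 0 = down

model = DirectionLSTM(n_features=5)
loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy applied to raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    probs = torch.sigmoid(model(X))
    accuracy = ((probs > 0.5).float() == y).float().mean().item()
print(f"training accuracy: {accuracy:.2f}")
```

The evaluation here is on the training data only, to keep the sketch short. A real comparison with the SVM would need the same time-ordered train/test split for both models.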
