Workflow for stock prediction in machine learning

I'm trying to find the best workflow for a stock prediction problem.

My idea is as follows: I will use classification and regression at the same time.

  • Classification (-1 ; 0 ; 1)
  • Regression (float): at the end I will map the output to the same classes as the classification to make a decision (if the float is very close to 0, I treat it as 0)
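To make the regression-to-class mapping concrete, here is a minimal sketch of what I have in mind. The function name `to_signal` and the `threshold` value of 0.001 are placeholders I made up for illustration; the actual width of the "close to 0" band would have to be chosen for the data.

```python
def to_signal(pred, threshold=0.001):
    """Map one regression output to -1, 0 or 1.

    Values whose absolute size is below `threshold` fall into a dead band
    and are treated as 0 (no decision). The threshold is an assumed
    placeholder, not a recommended value.
    """
    if abs(pred) < threshold:
        return 0
    return 1 if pred > 0 else -1


# Hypothetical regression outputs, e.g. predicted next-period returns.
predictions = [0.02, -0.0004, -0.015, 0.0]
signals = [to_signal(p) for p in predictions]
print(signals)
```

With this mapping, the regression output can be compared directly with the classifier's -1 / 0 / 1 labels.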

The pipeline of the classification and regression will be the same:

  1. Feature engineering on X (extend X with moving averages, Z-scores, lags, etc. of the current X)
  2. Remove outliers from X
  3. Scale X
  4. Split (train, validate, predict), e.g. if the full dataset has 120 rows: train = 118, validate = 1, predict = 1
  5. Compute feature importance to reduce the dataset. Question 1: should feature importance be computed on train or validate? I think train.
  6. Remove collinearity from the selected features (using VIF)
  7. Tune hyperparameters. Question 2: should this be done on train or validate? Train, in my opinion.
  8. Question 3: where can I use the validation set in this case, if its length is 1?
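As a rough illustration of the split in step 4, the sketch below generates chronological (train, validate, predict) folds with an expanding training window, so the last fold matches the 118 / 1 / 1 example. It also shows scaling with statistics taken from the training indices only, which is one common way to keep later observations out of the fit; that differs from the order of steps listed above and is an assumption on my part, not something I have settled. The names `walk_forward_folds`, `scale_with_train_stats` and the `min_train` value of 100 are hypothetical.

```python
from statistics import mean, pstdev


def walk_forward_folds(n_samples, min_train, n_validate=1, n_predict=1):
    """Yield (train, validate, predict) index lists in chronological order.

    The training window grows by one step per fold, and every validate and
    predict index is strictly later than every train index.
    """
    start = min_train
    while start + n_validate + n_predict <= n_samples:
        train = list(range(0, start))
        validate = list(range(start, start + n_validate))
        predict = list(range(start + n_validate,
                             start + n_validate + n_predict))
        yield train, validate, predict
        start += 1


def scale_with_train_stats(values, train_idx, target_idx):
    """Z-score the target rows using mean and std of the train rows only."""
    train_values = [values[i] for i in train_idx]
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # avoid division by zero
    return [(values[i] - mu) / sigma for i in target_idx]


# 120 observations, as in the example; min_train=100 is an assumed choice.
folds = list(walk_forward_folds(n_samples=120, min_train=100))
train, validate, predict = folds[-1]
print(len(folds), len(train), validate, predict)

# Made-up series standing in for one feature column.
series = [float(i) for i in range(120)]
scaled_validate = scale_with_train_stats(series, train, validate)
```

Looping over all folds gives one validation point per fold, so the validation scores can be aggregated across folds even though each fold's validation length is 1.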

I think I'm missing something. Of course I will run a backtest on this framework, but I want to be sure that, in terms of bias, I'm not violating anything.

If you have any resources or other suggestions, please share.

