Workflow for stock prediction in machine learning
I'm trying to find the best workflow for a stock prediction problem.
My idea is as follows: I will use a classification and a regression at the same time.
- Classification (-1, 0, 1)
- Regression (float) = I will convert the output into the same classes at the end to make a decision (if the predicted float is very close to 0, I will treat it as 0)
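To make the decision rule for the regression concrete, here is a minimal sketch of the mapping I have in mind. The function name `to_signal` and the threshold `eps = 0.001` are placeholders I made up for illustration, not tuned values.

```python
def to_signal(prediction: float, eps: float = 0.001) -> int:
    """Map a predicted float to -1, 0 or 1.

    eps is a hypothetical dead-zone width: anything within
    [-eps, eps] is treated as "close to 0".
    """
    if abs(prediction) <= eps:
        return 0
    return 1 if prediction > 0 else -1


# Made-up regression outputs, only to show the mapping.
predictions = [0.0123, -0.0004, -0.0250, 0.0009]
signals = [to_signal(p) for p in predictions]
```

With this rule, the regression and the classification end up producing the same kind of output, so they can be compared directly.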
The pipeline of the classification and regression will be the same:
- Feature engineering on X (extend X with moving averages, z-scores, lags, etc. of the current X)
- Remove outliers from X
- Scale X
- Split (train, validate, predict), e.g. if the full data has 120 observations: train = 118, validate = 1, predict = 1
- Compute feature importance to reduce the dataset. Question 1: should feature importance be computed on train or on validate? I think train.
- Remove collinearity from the selected features (using VIF)
- Tune parameters. Question 2: should tuning be applied on train or on validate? Train, in my opinion.
- Question 3: where can I use the validation set in this case if its length is 1?
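For reference, the split step from the list can be sketched as plain index ranges kept in time order. The helper name `chronological_split` and its parameters are my own placeholders; the sizes are the ones from my example (120 observations).

```python
def chronological_split(n_obs, n_validate=1, n_predict=1):
    """Return (train, validate, predict) index ranges in time order.

    No shuffling: the oldest observations go to train, the most
    recent ones to validate and predict.
    """
    n_train = n_obs - n_validate - n_predict
    if n_train <= 0:
        raise ValueError("not enough observations for the requested split")
    train = range(0, n_train)
    validate = range(n_train, n_train + n_validate)
    predict = range(n_train + n_validate, n_obs)
    return train, validate, predict


train, validate, predict = chronological_split(120)
print(len(train), list(validate), list(predict))  # prints: 118 [118] [119]
```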
I think I'm missing something. Of course I will run a backtest on this framework, but I want to be sure that I'm not introducing any bias or violating anything else.
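For the backtest, what I have in mind is a walk-forward loop that repeats the split at each date, so that every step only sees data up to that point. A rough sketch, assuming an expanding training window; `walk_forward` and `min_train` are names I invented, and 100 is an arbitrary starting size.

```python
def walk_forward(n_obs, min_train):
    """Yield (train_indices, validate_index, predict_index) per step.

    The training window expands by one observation at each step;
    validate and predict are always the next two points in time.
    """
    for end in range(min_train, n_obs - 1):
        yield range(0, end), end, end + 1


steps = list(walk_forward(n_obs=120, min_train=100))

# Every step keeps the time order: train < validate < predict.
for train_idx, val_idx, pred_idx in steps:
    assert train_idx[-1] < val_idx < pred_idx
```

The last step of this loop reproduces the single split from my example (train = 118 points, validate = index 118, predict = index 119).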
If you have any resources or other suggestions, please share.
Topic data-science-model finance time-series python machine-learning
Category Data Science