How do I choose the right parameters for plain old simple standard deviation?
I am evaluating different models that do binary classification and essentially generate trade signals: each one predicts either buy or sell for the next day. I look at 10 different underlying assets, have 3 different variations of the data I train the models on, and evaluated 12 different types of models. That leaves me with 10 x 3 x 12 = 360 different models/predictions.
I backtested the trade signals these models generate: most of them do not really beat a buy-and-hold strategy, but some perform exceptionally well. My fear is that I have basically overfitted by trying too many variations (10 different assets, 3 variations of the data, and 12 different models). I suspect the models that perform exceptionally well might only look good because of randomness, simply because I tried so many different combinations.
So I thought: how many models could I expect to perform extremely well if they were all just random? I figured I could use plain old simple standard deviation to estimate the probability, and from that the expected number of models that would yield more than, say, 1% daily return on average purely by chance. But how do I choose the correct arithmetic mean and standard deviation for that? [EDIT: if I just take the arithmetic mean over the 360 backtests, it is 0.0019676 with a standard deviation of 0.0118, but that can't be right, can it? It would imply a large number of good models... so what should I use instead?]
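To make that concrete, here is roughly the calculation I had in mind as a small Python sketch. The normal assumption and the choice of mean/std are exactly what I am unsure about; the numbers below just plug in the pooled values from my edit:

```python
import numpy as np
from scipy.stats import norm

# Pooled values over all 360 backtests (from the EDIT above) --
# I am not sure these are the right mean/std to use, which is my question.
mean_daily = 0.0019676   # arithmetic mean of the average daily returns
std_daily = 0.0118       # standard deviation

n_models = 360
threshold = 0.01         # "more than 1% daily on average"

# If per-model average daily returns were ~ Normal(mean_daily, std_daily),
# the chance of a single random model clearing the threshold:
p_exceed = norm.sf(threshold, loc=mean_daily, scale=std_daily)

# Expected number of "exceptional" models among the 360 purely by chance:
expected = n_models * p_exceed
print(f"P(model > 1%/day) = {p_exceed:.3f}, expected count = {expected:.1f}")
```

With these inputs each random model clears the threshold with probability of roughly 0.25, i.e. on the order of 90 "exceptional" models expected by chance alone, which seems far too many and is why I doubt that this pooled standard deviation is the right one to plug in.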
Or do you have some other idea for how to confirm that a few of these models are genuinely good, rather than just lucky survivors of trying too many variations (i.e., coincidence and overfitting)?
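For context, the only check I could think of myself is a shuffle test: re-run a model's backtest with randomly permuted signals and see how often chance alone matches its average daily return. A rough sketch of what I mean (the arrays here are placeholders; in my setup they would come from the actual backtest):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data -- in my real setup these come from the backtest:
# daily_returns: the underlying asset's daily returns over the test period
# signals: the model's predictions, +1 for buy, -1 for sell
daily_returns = rng.normal(0.0005, 0.01, size=500)
signals = rng.choice([-1, 1], size=500)

# Average daily return of the strategy as actually backtested.
observed = np.mean(signals * daily_returns)

# Re-run the "backtest" many times with randomly shuffled signals to see
# how often pure chance matches or beats the observed average daily return.
n_sims = 10_000
random_means = np.array([
    np.mean(rng.permutation(signals) * daily_returns)
    for _ in range(n_sims)
])
p_value = np.mean(random_means >= observed)
print(f"fraction of random shufflings at least as good: {p_value:.4f}")
```

But I am not sure whether this is a sound way to do it, especially since I would be running it for 360 models and would presumably still have the same multiple-comparisons problem.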
Topic binary-classification overfitting statistics
Category Data Science