How can I prevent overfitting?
I hope this finds you well! I am trying to build a model to classify customers by propensity to buy, but I cannot get rid of overfitting. My approach is as follows. I built the training set with an unbalanced approach, so target = 1 now makes up 6% of a total of 6,755 rows and 252 columns. The test set, on the other hand, has 313,587 rows, and target = 1 is only 34 of the cases (a very low percentage). The test set was built to reflect reality: the universe to score is all customers from one month, so for the test set I also took all customers from one month, and this is a product with low uptake. I used the AutoML code from H2O in R and I am getting very different AUC values on train (0.9) and test (0.6). The code I used is below:
aml <- h2o.automl(x = predictors, y = response,
                  training_frame = train_h2o,
                  validation_frame = test_h2o,  # only used because nfolds = 0
                  nfolds = 0,                   # cross-validation disabled
                  max_models = 10,
                  seed = 3)
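To put the class balance described above in numbers, here is a small base-R sketch. It assumes that "34" is a count of positive rows in the test set rather than a percentage; the variable names are made up for illustration.

```r
# Class balance implied by the figures in the post.
# Assumption: "34" is a count of positive rows, not a percentage.
train_rows <- 6755
train_rate <- 0.06      # share of target = 1 in the training set
test_rows  <- 313587
test_pos   <- 34        # positives in the test set (assumed to be a count)

train_pos <- round(train_rows * train_rate)  # approximate positives in train
test_rate <- test_pos / test_rows            # share of target = 1 in test

cat(sprintf("train: ~%d positives (%.1f%%)\n", train_pos, 100 * train_rate))
cat(sprintf("test:  %d positives (%.4f%%)\n", test_pos, 100 * test_rate))
cat(sprintf("train/test prevalence ratio: ~%.0f\n", train_rate / test_rate))
```

Under that assumption, the test AUC rests on only a few dozen positive cases, so part of the train/test gap may be estimation noise rather than overfitting alone.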
Any advice on what I can do to prevent overfitting?
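One variant that may be worth comparing against, as a minimal sketch rather than a definitive fix: turn cross-validation back on and pass the test set as `leaderboard_frame`, which H2O documents as being used only to score and rank models, so the test data does not drive early stopping. The helper name `fit_automl_cv` and the choice of `nfolds = 5` are assumptions for illustration; the frames and the `predictors` and `response` objects are the ones from the call in the post, and the `h2o` package with a running cluster (`h2o.init()`) is assumed.

```r
# Hypothetical helper, not from the original post.
# Assumes the h2o package is installed and h2o.init() has been called.
fit_automl_cv <- function(train_h2o, test_h2o, predictors, response,
                          nfolds = 5, max_models = 10, seed = 3) {
  aml <- h2o::h2o.automl(
    x = predictors, y = response,
    training_frame = train_h2o,
    leaderboard_frame = test_h2o,  # used for ranking only, not for fitting
    nfolds = nfolds,               # k-fold cross-validation on the training set
    max_models = max_models,
    seed = seed
  )
  # Score the leader on the held-out month of customers.
  test_perf <- h2o::h2o.performance(aml@leader, newdata = test_h2o)
  list(aml = aml, test_perf = test_perf)
}
```

It could be called as `res <- fit_automl_cv(train_h2o, test_h2o, predictors, response)`, after which `h2o::h2o.auc(res$test_perf)` gives the test AUC. Because positives are so rare in the test set, `h2o::h2o.aucpr(res$test_perf)` may be a more informative number to look at alongside AUC.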
Topic h2o overfitting classification r machine-learning
Category Data Science