How can I prevent overfitting?

I hope this finds you well! I am trying to build a model that classifies customers by their propensity to buy, but I cannot get rid of overfitting. My approach is the following. I built the training set using an unbalanced sampling approach, so it now has a positive rate (target = 1) of 6%, with 6,755 rows and 252 columns. The test set, on the other hand, has 313,587 rows, and only 34 of them have target = 1 (a really low percentage). The test set was built to reflect reality: the population to score is all customers from one month, so for the test set I also took all customers from one month, and this product simply has very low uptake.

I used H2O's AutoML in R, and I am getting very different AUCs on train (0.9) and test (0.6). This is the code I used:

library(h2o)

aml <- h2o.automl(x = predictors, y = response,
                  training_frame = train_h2o,
                  validation_frame = test_h2o,  # with nfolds = 0, used for early stopping
                  nfolds = 0,                   # disables cross-validation
                  max_models = 10,
                  seed = 3)
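For scale, the class balance described above works out roughly as follows. This is simple arithmetic on the counts given; the training positive count is approximate because 6% is a rounded figure:

$$
\begin{aligned}
\text{train positives} &\approx 0.06 \times 6{,}755 \approx 405 \\
\text{test prevalence} &= \frac{34}{313{,}587} \approx 1.08 \times 10^{-4} \approx 0.011\% \\
\frac{\text{train prevalence}}{\text{test prevalence}} &\approx \frac{0.06}{1.08 \times 10^{-4}} \approx 550
\end{aligned}
$$

In other words, the training set is roughly 550 times richer in positives than the population being scored.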

Any advice on what I can do to prevent overfitting?


About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.