Imbalanced Classification: BOW vs Doc2Vec in XGBoost with sample weights

I am new to machine learning. I have an imbalanced dataset of report pages, where the three classes are the different page types:

class 1: 97%

class 2: 2.2%

class 3: 0.25%

I am mostly concerned with correctly predicting classes 2 and 3. I tried:

  1. Doc2Vec with XGBoost (with sample weights to correct for the class imbalance)
  2. BOW with XGBoost (with sample weights to correct for the class imbalance); a rough sketch of this setup is shown right after this list
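This is a minimal sketch of setup 2, not my exact code: `train_texts`, `test_texts`, and `y_train` are placeholder names for the raw page texts and labels.

```
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

# bag-of-words term counts from the raw page text
vectorizer = CountVectorizer(min_df=2)
X_train = vectorizer.fit_transform(train_texts)  # sparse document-term matrix
X_test = vectorizer.transform(test_texts)

# one weight per training page, inversely proportional to its class frequency
weights = compute_sample_weight(class_weight="balanced", y=y_train)

clf = XGBClassifier(objective="multi:softprob", random_state=0)
clf.fit(X_train, y_train, sample_weight=weights)
pred = clf.predict(X_test)
```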

Oddly, setup 2 outperformed setup 1. I thought Doc2Vec should do better, since it learns feature embeddings that capture semantic relationships between documents/pages rather than just word counts. So why is Doc2Vec faring worse than BOW? Thank you.

Here is my Doc2Vec training code:

```
from multiprocessing import cpu_count
from gensim.models.doc2vec import Doc2Vec
from gensim.test.test_doc2vec import ConcatenatedDoc2Vec  # location in gensim 3.x

cores = cpu_count()

# PV-DBOW model (dm=0)
model_dbow = Doc2Vec(dm=0, min_count=2, workers=cores, seed=0)
model_dbow.build_vocab(train_tagged.values)
model_dbow.train(train_tagged.values, total_examples=len(train_tagged.values), epochs=40)

# PV-DM model (dm=1), averaging the context word vectors
model_dmm = Doc2Vec(dm=1, dm_mean=1, min_count=1, workers=cores, seed=0)
model_dmm.build_vocab(train_tagged.values)
model_dmm.train(train_tagged.values, total_examples=len(train_tagged.values), epochs=40)

# concatenate the DBOW and DM document vectors into one feature set
new_model = ConcatenatedDoc2Vec([model_dbow, model_dmm])
```
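The document vectors then go into XGBoost roughly like this (again a simplified sketch, not my exact code: `test_tagged` and `y_train` mirror `train_tagged` and are not shown above; `ConcatenatedDoc2Vec.infer_vector` just concatenates the DBOW and DM vectors):

```
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

# infer a concatenated (DBOW + DM) vector for every page
X_train = np.vstack([new_model.infer_vector(doc.words) for doc in train_tagged.values])
X_test = np.vstack([new_model.infer_vector(doc.words) for doc in test_tagged.values])

# same class-frequency-based sample weights as in the BOW setup
weights = compute_sample_weight(class_weight="balanced", y=y_train)

clf = XGBClassifier(objective="multi:softprob", random_state=0)
clf.fit(X_train, y_train, sample_weight=weights)
pred = clf.predict(X_test)
```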

Topic: doc2vec, xgboost, class-imbalance

Category: Data Science
