Imbalanced Classification: BOW vs doc2Vec in XGBoost with sample weights
I am new to machine learning. I have an imbalanced dataset of pages of reports with the following class distribution:
- class 1: 97%
- class 2: 2.2%
- class 3: 0.25%

The classes correspond to the different types of pages.
I am mostly concerned with correctly predicting classes 2 and 3. I tried:
- doc2Vec with XGBoost (with sample weights to correct for the class imbalance; the weighting is sketched after this list)
- BOW with XGBoost (with the same sample weights)
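In both cases the weighting looks roughly like this (a minimal sketch; `X_train` and `y_train` stand in for my actual feature matrix and labels):

```
from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

# One weight per training row, inversely proportional to class frequency,
# so the rare classes 2 and 3 count for much more during boosting.
weights = compute_sample_weight(class_weight="balanced", y=y_train)

clf = XGBClassifier(objective="multi:softprob", random_state=0)
clf.fit(X_train, y_train, sample_weight=weights)
```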
Oddly, approach 2 (BOW) outperformed approach 1 (doc2Vec). I thought doc2Vec should do better, since it creates feature embeddings that capture the relationships between the words in a document/page. So why is Doc2Vec faring worse than BOW? Thank you.

For reference, here is my Doc2Vec setup:
```
import multiprocessing
from gensim.models.doc2vec import Doc2Vec
from gensim.test.test_doc2vec import ConcatenatedDoc2Vec

cores = multiprocessing.cpu_count()

# PV-DBOW model
model_dbow = Doc2Vec(dm=0, min_count=2, workers=cores, seed=0)
model_dbow.build_vocab(train_tagged.values)
model_dbow.train(train_tagged.values, total_examples=len(train_tagged.values), epochs=40)
# PV-DM model (dm_mean=1 averages the word vectors)
model_dmm = Doc2Vec(dm=1, dm_mean=1, min_count=1, workers=cores, seed=0)
model_dmm.build_vocab(train_tagged.values)
model_dmm.train(train_tagged.values, total_examples=len(train_tagged.values), epochs=40)
# Concatenated document vectors from both models
new_model = ConcatenatedDoc2Vec([model_dbow, model_dmm])
```
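The document vectors then become the features for XGBoost, roughly like this (again a sketch; `vectors_for_xgboost` is just an illustrative helper, and `train_tagged` is a pandas Series of gensim `TaggedDocument` objects where the tag is the page's class label):

```
import numpy as np

def vectors_for_xgboost(model, tagged_docs):
    # Infer a concatenated (DBOW + DM) vector for every page and pair it with its label.
    labels, vectors = zip(*[
        (doc.tags[0], model.infer_vector(doc.words)) for doc in tagged_docs.values
    ])
    return np.array(vectors), np.array(labels)

X_train, y_train = vectors_for_xgboost(new_model, train_tagged)
# X_train / y_train then go into the sample-weighted XGBClassifier shown above.
```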
Topic doc2vec xgboost class-imbalance
Category Data Science