How can I use Ensemble learning of two models with different features as an input?
I have a fake news detection problem: I predict a binary label (1/0) by vectorizing the 'tweet' column. I use three different models for detection, and I want to use an ensemble method to increase accuracy, but the models use different vectorizers.
I have three KNN models. The first and second vectorize the 'tweet' column using TF-IDF:
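from sklearn.feature_extraction.text import TfidfVectorizer
vector = TfidfVectorizer(max_features=5000, ngram_range=(1, 3))
# Fit the vocabulary on the training tweets only
X_train = vector.fit_transform(X_train['tweet']).toarray()
# Reuse the fitted vectorizer on the test tweets (transform, not fit_transform)
X_test = vector.transform(X_test['tweet']).toarray()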
For the third model, I used fastText for sentence vectorization:
%%time
sent_vec = []
for index, row in X_train.iterrows():
    sent_vec.append(avg_feature_vector(row['tweet']))
%%time
sent_vec1 = []
for index, row in X_test.iterrows():
    sent_vec1.append(avg_feature_vector(row['tweet']))
After scaling and... my third model fits the input like this:
scaler.fit(sent_vec)
scaled_X_train = scaler.transform(sent_vec)
scaled_X_test = scaler.transform(sent_vec1)
.
.
.
knn_model1.fit(scaled_X_train, y_train)
Now I want to combine the three models and have the ensemble return the majority vote, just like VotingClassifier, but I have no idea how to deal with the different inputs (TF-IDF and fastText). Is there another way to do that?
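For illustration, one possible way to get a VotingClassifier-style majority vote when each model needs its own features is to wrap each vectorizer together with its KNN in a scikit-learn Pipeline, so that every ensemble member receives the raw tweet text and builds its own representation internally. The sketch below rests on several assumptions: `avg_feature_vector` here is a toy stand-in for the fastText-based function (whose definition is not shown above), the `n_neighbors` values are arbitrary, and the tweets and labels are made up only so the example runs.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def avg_feature_vector(text, dim=16):
    """Toy placeholder for the fastText sentence embedding (assumption)."""
    words = text.lower().split()
    vec = np.zeros(dim)
    for word in words:
        # Deterministic pseudo-embedding per word, for illustration only.
        seed = sum(ord(ch) for ch in word)
        vec += np.random.default_rng(seed).normal(size=dim)
    return vec / max(len(words), 1)


def embed(texts):
    """Turn an iterable of raw tweets into a 2D array of sentence vectors."""
    return np.vstack([avg_feature_vector(t) for t in texts])


# Each pipeline takes raw text and applies its own vectorizer before KNN.
tfidf_knn_1 = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 3)),
    KNeighborsClassifier(n_neighbors=3),
)
tfidf_knn_2 = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 3)),
    KNeighborsClassifier(n_neighbors=5),
)
embed_knn = make_pipeline(
    FunctionTransformer(embed),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=3),
)

# voting="hard" returns the majority class across the three pipelines.
ensemble = VotingClassifier(
    estimators=[
        ("tfidf_knn_1", tfidf_knn_1),
        ("tfidf_knn_2", tfidf_knn_2),
        ("embed_knn", embed_knn),
    ],
    voting="hard",
)

# Made-up tweets and labels (1 = fake, 0 = real), purely illustrative.
train_tweets = [
    "aliens built the pyramids says secret report",
    "miracle cure hidden by doctors revealed",
    "celebrity secretly replaced by a clone",
    "moon landing was staged in a studio",
    "city council approves new budget for schools",
    "local team wins the regional championship",
    "weather service forecasts rain this weekend",
    "university opens new research library",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
new_tweets = [
    "secret report says doctors hide miracle cure",
    "council approves budget for new library",
]

ensemble.fit(train_tweets, train_labels)
preds = ensemble.predict(new_tweets)
print(preds)
```

With the data from the question, the raw column (for example `X_train['tweet']`, before it is overwritten by the TF-IDF array) would be passed to `fit` and `predict` instead of the toy lists. If the three models are already fitted separately, an alternative under the same assumptions is to collect their individual predictions and take the per-row majority with NumPy.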
Topic fasttext tfidf ensemble-modeling python machine-learning
Category Data Science