Tuning sklearn model parameters with GridSearchCV
Dataframe:
id  review                                         name    label
1   it is a great product for turning lights on.  Ashley  1
2   plays music and have a good sound.             Alex    1
3   I love it, lots of fun.                        Peter   0
The aim is to classify the text: if the review is about the functionality of the product (e.g. turning the light on, playing music), label=1, otherwise label=0.
I am running several sklearn models to see which one works best:
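from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
# Naive Bayes
text_clf_nb = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])
# Linear Support Vector Classifier
text_clf_lsvc = Pipeline([('tfidf', TfidfVectorizer()), ('clf', LinearSVC(loss='hinge', penalty='l2', max_iter=50))])
# SGDClassifier
text_clf_sgd = Pipeline([('tfidf', TfidfVectorizer()), ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, random_state=42, max_iter=50, tol=None))])
# Random Forest
text_clf_rf = Pipeline([('tfidf', TfidfVectorizer()), ('clf', RandomForestClassifier())])
# Neural network (MLPClassifier)
text_clf_mlp = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MLPClassifier())])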
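For context, one way to compare pipelines like these before any tuning is to score each with the same cross-validation split. The sketch below is only an illustration: the `reviews` and `labels` lists are made-up toy data, the dictionary keys are hypothetical names, only three of the five models are included, and the classifiers use default settings rather than the ones above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data (not the real dataset): 1 = about functionality, 0 = other.
reviews = [
    "it is a great product for turning lights on",
    "plays music and has a good sound",
    "sets timers and alarms reliably",
    "controls the thermostat from my phone",
    "I love it, lots of fun",
    "best gift ever",
    "arrived late and the box was damaged",
    "my kids adore it",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same two-step layout as in the post: a 'tfidf' step followed by a 'clf' step.
pipelines = {
    "nb": Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())]),
    "lsvc": Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())]),
    "sgd": Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", SGDClassifier(random_state=42))]),
}

# Mean ROC AUC over 2 stratified folds for each untuned pipeline.
scores = {}
for name, pipe in pipelines.items():
    fold_scores = cross_val_score(pipe, reviews, labels, cv=2, scoring="roc_auc")
    scores[name] = fold_scores.mean()

# Print the models from best to worst mean score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

On a dataset this small the scores mean very little; the point is only that every model is evaluated on identical folds with the same metric.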
Problem: how do I tune these models using GridSearchCV? What I have so far:
from sklearn.model_selection import GridSearchCV
parameters = {'vect__ngram_range': [(1, 1), (1, 2)],'tfidf__use_idf': (True, False),'clf__alpha': (1e-2, 1e-3) }
gs_clf = GridSearchCV(text_clf_nb, param_grid= parameters, cv=2, scoring='roc_auc', n_jobs=-1)
gs_clf = gs_clf.fit((X_train, y_train))
Running gs_clf = gs_clf.fit((X_train, y_train)) gives the following error:
ValueError: Invalid parameter C for estimator Pipeline(memory=None,
steps=[('tfidf',
TfidfVectorizer(analyzer='word', binary=False,
decode_error='strict',
dtype=<class 'numpy.float64'>,
encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None,
min_df=1, ngram_range=(1, 1), norm='l2',
preprocessor=None, smooth_idf=True,
stop_words=None, strip_accents=None,
sublinear_tf=False,
token_pattern='(?u)\\b\\w\\w+\\b',
tokenizer=None, use_idf=True,
vocabulary=None)),
('clf',
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
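Following the hint in the error message, the sketch below lists the parameter names the Naive Bayes pipeline actually accepts and builds a grid from them. It rests on a few assumptions: the toy `X_train` / `y_train` are invented stand-ins for the real data, and the grid values are the ones from the post. Two things stand out when comparing against the code above. Grid keys have the form `<step name>__<parameter>`, and the vectorizer step here is named `tfidf`, so a `vect__` key would not match any step. Also, `fit` takes `X` and `y` as two separate arguments rather than one tuple. The error shown mentions `C`, which is not in the grid above, so it may have come from a different grid; `MultinomialNB` has no `C` parameter either.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real X_train / y_train.
X_train = [
    "it is a great product for turning lights on",
    "plays music and has a good sound",
    "sets timers and alarms reliably",
    "controls the thermostat from my phone",
    "I love it, lots of fun",
    "best gift ever",
    "arrived late and the box was damaged",
    "my kids adore it",
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

text_clf_nb = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# All names GridSearchCV will accept for this pipeline.
valid_names = set(text_clf_nb.get_params().keys())

# Keys use the pipeline's own step names: 'tfidf' and 'clf'.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__alpha": [1e-2, 1e-3],
}

# Catch misspelled or mismatched keys before starting the search.
unknown = set(param_grid) - valid_names
if unknown:
    raise ValueError(f"Unknown grid keys: {sorted(unknown)}")

gs_clf = GridSearchCV(text_clf_nb, param_grid=param_grid, cv=2, scoring="roc_auc")
gs_clf.fit(X_train, y_train)  # X and y passed separately, not as a tuple

print(gs_clf.best_params_)
print(gs_clf.best_score_)
```

The same pattern should carry over to the other pipelines, with `clf__` keys chosen from each classifier's own parameters (for example `clf__C` for `LinearSVC`, `clf__alpha` for `SGDClassifier`).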
I would appreciate any suggestions. Thanks.