How would you approach a scenario where you have to quantify an abstract notion like "customer experience" without any labeled data? Basically, you have a bunch of variables and you know, more or less, how each one affects the experience, but you don't know the "weight" of importance of each one. For example, if it is the experience of a food delivery service, then I have the ETAs for the orders, ratings (not very reliable by the day), …
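A minimal sketch of one common first pass, assuming you can hand-assign provisional weights from domain knowledge and put every driver on a comparable scale; the feature names and weights below are made up for illustration, not taken from the question:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical drivers of "customer experience" for a food delivery service.
df = pd.DataFrame({
    "eta_minutes": [25, 40, 60, 30],      # lower is better
    "rating": [4.5, 3.0, 2.0, 5.0],       # higher is better
    "support_tickets": [0, 1, 3, 0],      # lower is better
})

# Scale every driver to [0, 1] so the weights are comparable.
scaled = MinMaxScaler().fit_transform(df)

# Flip the "lower is better" columns so 1 always means a good experience.
scaled[:, [0, 2]] = 1 - scaled[:, [0, 2]]

# Provisional weights chosen from domain knowledge (an assumption, not learned).
weights = np.array([0.5, 0.3, 0.2])
experience_score = scaled @ weights
print(experience_score)
```

The weights can later be revisited against any indirect signal you do have (churn, reorders, complaints) once the composite score is in place.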
I'm not an expert in AI, but for my underlying problem I need to find a function that rates data samples based on a specific value x. In other words, the output of the function should determine whether a data sample is a good one or not. The score (the y of the function) should be between 0 and 1. The rules I need to follow for the rating are the following: x should never be below …
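Since the actual rating rules are cut off, here is only a generic sketch of how such a function is often built: a logistic curve that maps x smoothly into (0, 1), where the midpoint and steepness are placeholder parameters you would tune to your rules:

```python
import numpy as np

def score(x, midpoint=50.0, steepness=0.1):
    """Map x to a score in (0, 1) with a logistic curve.

    `midpoint` is the x value that maps to 0.5 and `steepness` controls how
    sharply the score rises around it; both are placeholder values.
    """
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))

print(score(np.array([30.0, 50.0, 80.0])))  # roughly [0.12, 0.5, 0.95]
```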
I am working on a very imbalanced dataset. I used SMOTEENN (SMOTE + ENN) to rebalance it, and the following test was made using a Random Forest classifier.

My train and test scores before using SMOTEENN:

```python
print('Train Score: ', rf_clf.score(x_train, y_train))
print('Test Score: ', rf_clf.score(x_test, y_test))
```

```
Train Score: 0.92
Test Score: 0.91
```

After using SMOTEENN:

```python
print('Train Score: ', rf_clf.score(x_train, y_train))
print('Test Score: ', rf_clf.score(x_test, y_test))
```

```
Train Score: 0.49
Test Score: 0.85
```

Edit:

```python
x_train, x_test, y_train, y_test = train_test_split(feats, targ, test_size=0.3, random_state=47)
scaler = MinMaxScaler()
scaler_x_train = scaler.fit_transform(x_train)
scaler_x_test = scaler.transform(x_test)
X …
```
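One pattern worth comparing against, as a sketch with placeholder data rather than the question's feats/targ: resampling with SMOTEENN should be fitted only inside the training folds, and plain accuracy is a poor yardstick on imbalanced data, so a balanced metric is used here:

```python
from imblearn.combine import SMOTEENN
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder imbalanced data; substitute your own feats/targ.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=47)

# The sampler is fitted inside each training fold only, so every held-out
# fold keeps the original class distribution.
pipe = Pipeline([
    ("resample", SMOTEENN(random_state=47)),
    ("rf", RandomForestClassifier(random_state=47)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=47)
print(cross_val_score(pipe, X, y, cv=cv, scoring="balanced_accuracy").mean())
```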
Based on my model, if I decline someone due to their score, it should be able to provide some reasoning as to which variables mainly contributed to the decision to decline. Typically, in logistic regression models this is a simple exercise: you calculate (Beta * X) for each variable and pick the 1 or 2 variables that caused the biggest score drop. However, this isn't very straightforward for non-linear models. I would appreciate any ideas on handling something like this. …
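For tree-based non-linear models, per-prediction attributions such as SHAP values play the same role as the per-variable (Beta * X) terms. A sketch, where `model` and the DataFrame `X` are placeholders for a fitted tree model and the applicant features, neither taken from the original post:

```python
import numpy as np
import shap  # pip install shap

# Placeholders: a fitted tree-based model and a pandas DataFrame of features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Some shap versions return one array per class for classifiers; keep the
# contributions toward the class of interest in that case.
if isinstance(shap_values, list):
    shap_values = shap_values[1]

# For a single declined applicant, rank features by how strongly they pushed
# the score toward the decline.
row = shap_values[0]
top = np.argsort(row)[:2]
print([(X.columns[i], float(row[i])) for i in top])
```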
There are 6 class labels, encoded as 0, 1, 2, 3, 4, 5. When I print the classification report, it outputs accuracy, macro avg and weighted avg, but the micro average score is missing from the output. I'm not sure why the micro average score is not getting printed. What should I do to get the micro average score as well?

```python
print(classification_report(y_test, best_preds, labels=[0, 1, 2, 3, 4, 5]))
```

```
              precision    recall  f1-score   support

           0       0.65      0.76      0.70        46
           1       0.74      0.56      0.64        41
           2       0.60      0.68      0.64        41
           3       0.65      0.59      0.62        41
           4       0.75      …
```
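For what it's worth, recent scikit-learn versions print the micro average as the "accuracy" line whenever the labels listed cover every class present, because for single-label multiclass data micro-averaged precision, recall and F1 all equal accuracy. Using the y_test and best_preds from the question, the micro scores can also be computed directly:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Micro-averaged scores computed explicitly; with all six classes included,
# these coincide with the plain accuracy shown in the report.
print("micro precision:", precision_score(y_test, best_preds, average="micro"))
print("micro recall:   ", recall_score(y_test, best_preds, average="micro"))
print("micro F1:       ", f1_score(y_test, best_preds, average="micro"))
```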
I am using repeated k-folds (RepeatedKFold(n_splits=10, n_repeats=10, random_state=999) from sklearn) to get reliable scores for a linear regression on my dataset. The dataset has some outliers which should stay, and similar cases will appear in future observations. When a model trained on a fold tries to predict such observations, I get negative scores (at least, this is my interpretation). Question: the main question is what should I do with one (or a few) bad score(s) out of many? …
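A small sketch, with placeholder data, of one way to frame "a few bad scores out of many": look at the whole distribution of fold scores (median, percentiles, count of negative folds) rather than any single value:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Placeholder data; substitute your own X, y.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=999)

cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=999)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

# Summarise the distribution instead of reacting to one bad fold.
print("mean:", scores.mean(), "median:", np.median(scores))
print("5th/95th percentile:", np.percentile(scores, [5, 95]))
print("negative folds:", (scores < 0).sum(), "out of", len(scores))
```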
I have a lasso regression model with the following definition:

```python
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import scale
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

folds = KFold(n_splits=5, shuffle=True, random_state=100)

# specify range of hyperparameters
hyper_params = …
```
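The post is cut off before the hyperparameter grid, so for context only, a typical Lasso search over that KFold object looks roughly like the following; the alpha values and the synthetic data are placeholders, not the ones from the original post:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=100)

folds = KFold(n_splits=5, shuffle=True, random_state=100)
hyper_params = {"alpha": np.logspace(-4, 1, 20)}  # placeholder grid

# Grid search over alpha, scored by r2 on each validation fold.
search = GridSearchCV(Lasso(max_iter=10000), hyper_params,
                      cv=folds, scoring="r2", return_train_score=True)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```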
Can someone explain what each of these means, both in simple terms and in terms of TP, TN, FP, FN? Also, are there any other common metrics that I am missing?

- F-measure or F-score
- Recall
- Precision
- Accuracy
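For reference, the standard definitions in terms of TP, TN, FP and FN are:

$$
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
$$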
I tested my CatBoost model on part of the data and got a score of 0.92, but the Kaggle public score was 0.9. I found new hyperparameters via random search; the new model's score was 0.925, but the Kaggle score fell to 0.88. What should I do to validate the model correctly?
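A common remedy, sketched here with placeholder data and parameters: score every hyperparameter candidate with k-fold cross-validation rather than one split, and keep a final untouched holdout for a last check before submitting, so the selection is less likely to fit the noise of a single validation set:

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)  # placeholder data

model = CatBoostClassifier(iterations=200, verbose=0, random_seed=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Averaging over folds gives a more stable estimate than one split.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```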
I have around 15 classification models for different products, built in different ways (some are RF, some are gradient boosting, some were downsampled one way, others another way, some are built on 12 months of history, some on 24 months), and I have to compare their scores to choose which product to offer. All models have target 1 for "customer bought the product" and 0 for "customer didn't buy the product". I have read about this …
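If the intent is to compare raw scores across models trained and downsampled differently, one common prerequisite is to calibrate each model's output into a probability first, so the scores live on a comparable scale. A sketch with a single placeholder model and synthetic data:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for one product's training set.
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Wrap the product model in a calibrator so its scores behave like probabilities;
# repeated per product, this puts all 15 models on the same scale.
calibrated = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                                    method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
print(list(zip(mean_pred.round(2), frac_pos.round(2))))  # predicted vs observed rates
```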
I want to create a function which returns a value between (0,1) or (-1,1). The result of this function is then used for a boolean decision, e.g. if the value is closer to 0, decision D1 is made; if it is closer to 1, decision D2 (a threshold needs to be defined). The function is based on 4 indicators that are normalised (all values represent a percentage). My question now is: how can I combine the 4 indicators to create …
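The simplest combination is a weighted average, since indicators already in [0, 1] stay in [0, 1] under any weights that sum to 1. A sketch; the equal weights and the 0.5 threshold are placeholders to be tuned to whatever D1/D2 actually mean:

```python
import numpy as np

def combine(indicators, weights=(0.25, 0.25, 0.25, 0.25), threshold=0.5):
    """Combine 4 normalised indicators (each in [0, 1]) into one score in [0, 1]."""
    score = float(np.dot(indicators, weights))
    decision = "D2" if score >= threshold else "D1"
    return score, decision

print(combine([0.8, 0.6, 0.9, 0.4]))  # -> (0.675, 'D2')
```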
I know there is the f1_score metric to get all types of F1 scores (micro, macro, and weighted); however, I would like to be able to print the micro-averaged F1 score using classification_report from scikit-learn. By default it seems to return only the macro and weighted averaged F1, but I would like the micro-averaged F1 in the classification_report as well. How do I do that? Also, I know the difference in the formula between the weighted and micro averages, but what are the instances where one …
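One detail that may explain it, assuming a reasonably recent scikit-learn and using hypothetical y_test / y_pred arrays: when the labels listed cover every class, the report prints the micro average as the "accuracy" line, since the two coincide for single-label multiclass data; with a strict subset of labels the explicit "micro avg" rows come back:

```python
from sklearn.metrics import classification_report

# With a strict subset of the classes, classification_report prints
# "micro avg" rows instead of folding them into the "accuracy" line.
print(classification_report(y_test, y_pred, labels=[0, 1]))
```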
I selected features using ANOVA (because I have numerical data as input and categorical data as target):

```python
anova = SelectKBest(score_func=f_classif, k='all')
# y_train.values.argmax(1) because I already one-hot-encoded the target
anova.fit(X_train, y_train.values.argmax(1))
```

When I plot the scores, it shows me the figure in the image:

```python
plt.xlabel("Number of features selected")
plt.ylabel("Score (nb of correct classifications)")
plt.plot(range(len(anova.scores_)), anova.scores_)
plt.show()
```

What is the interpretation of this figure? And why are there some interruptions in the plot?
I have a very simple regression model and I am doing cross-validation. With cv=10, the highest score I got is 60.3 and the lowest is -9.7, which is useless; the average would be around 30. Number of rows in the dataset = 658.
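With only 658 rows, one common cause of a few wildly bad folds is an unshuffled split on data that has some ordering. A quick check, sketched with placeholder data, is to compare shuffled and unshuffled folds:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=658, n_features=8, noise=20.0, random_state=0)

for shuffle in (False, True):
    cv = KFold(n_splits=10, shuffle=shuffle, random_state=0 if shuffle else None)
    scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
    print(f"shuffle={shuffle}: min={scores.min():.2f}, "
          f"max={scores.max():.2f}, mean={scores.mean():.2f}")
```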
In the company I work for, there are 2 different evaluation metrics for a song:

- Yes / No (equivalent to like/dislike)
- a 1-5 scale

Customers can use both to rate songs they like. I would like to create a model that predicts the next songs you would probably like. Currently, I'm ignoring the binary data. I wonder if there's a good way of utilizing the binary data as tagged data [and not as a feature]. I've thought about two possible solutions: …
What constitutes a "good enough" score for a Decision Tree Regressor? The .score() function gives us a general score for the model. It can be 1 if the model predicts all the data perfectly, and it can be arbitrarily worse. If I understand correctly, a score of 0 means the model does no better than a constant prediction. But starting from what value can we say that the prediction is "usable" (I know this is ambiguous, but still)? Is a score of …
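For context, the .score() of a DecisionTreeRegressor (like all scikit-learn regressors) is the coefficient of determination

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
$$

so 1 is a perfect fit, 0 means no better than always predicting the mean $\bar{y}$, and the value can be arbitrarily negative.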
I am trying to compare the quality of different interpolation models and I'm looking for a graphical tool to do that. Application case: I'm not familiar with interpolation using neural networks, so I decided to test it on a dataset with 5 inputs and 1 output: https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise# To prospect it quickly, I used Orange Canvas, which integrates the sklearn multi-layer perceptron. I'm surprised to see that "the bigger the network, the better the result", and I would like to investigate this. Basic …
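One simple graphical tool for this kind of comparison is a predicted-vs-actual scatter plot per model. A sketch with placeholder data (standing in for the airfoil set) and two MLP sizes:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder data with 5 inputs and 1 output.
X, y = make_regression(n_samples=1500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for ax, hidden in zip(axes, [(10,), (100, 100)]):
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    ax.scatter(y_test, pred, s=8, alpha=0.5)
    ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "k--")  # ideal line
    ax.set_title(f"hidden={hidden}, R2={model.score(X_test, y_test):.2f}")
    ax.set_xlabel("actual")
axes[0].set_ylabel("predicted")
plt.tight_layout()
plt.show()
```

The closer the points hug the dashed identity line on held-out data, the better the interpolation; this also makes it visible when a bigger network only looks better on the training points.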
I would like to know how to interpret classification scores (I am not sure whether "score" or "probability" is the right word; please correct me). For example, in a binary classification, positive instances are labeled 1 and negative ones -1. Now, is it fair to say that an instance with a score of 10 is more likely to be correctly predicted than one with a score of 5, even though either prediction can still be wrong? Thanks.
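A sketch of the distinction with scikit-learn's linear SVM on placeholder data: decision_function returns a raw, unbounded score (the signed margin from the decision boundary), while a calibration wrapper maps it to a probability in [0, 1]:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = LinearSVC(random_state=0).fit(X_train, y_train)
raw_scores = svm.decision_function(X_test[:3])      # unbounded margins, e.g. 10 vs 5

calibrated = CalibratedClassifierCV(LinearSVC(random_state=0), cv=5).fit(X_train, y_train)
probs = calibrated.predict_proba(X_test[:3])[:, 1]  # calibrated probabilities in [0, 1]

print(raw_scores, probs)
```

A larger margin generally indicates more model confidence, but only a calibrated probability says how much more likely the prediction is to be correct.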
My cross-validation score is always smaller than my training score, and I am performing the cross-validation on the training data only. Is that a normal thing? K-fold = 5.