Different results from the same sklearn logistic regression model on the same dataset

Question

Hing

May 20, 2022, 16:31

I ran into strange behavior when deploying a logistic regression model trained with scikit-learn to production. I trained the model on my own machine and stored it as a .pickle file. I use the same dataset both locally and on the server (in Docker), and for each sample in this binary classification problem I generate four columns: probability_of_class_0, probability_of_class_1, y_true, y_hat. Here y_true and y_hat are the true label and the predicted label for that sample (row), and probability_of_class_0 and probability_of_class_1 are the predicted probabilities that the sample belongs to class 0 or class 1, respectively.
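For concreteness, the setup can be sketched roughly as follows. This is a minimal sketch and not the actual pipeline: the synthetic data from `make_classification`, the `max_iter` setting, and the variable names are assumptions made for illustration, and the pickle round trip happens in memory in a single process rather than across two machines.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real dataset (assumption, for illustration only).
X, y_true = make_classification(n_samples=200, n_features=5, random_state=0)

# Train and serialize the model, as done on the local machine.
model = LogisticRegression(max_iter=1000).fit(X, y_true)
blob = pickle.dumps(model)

# Deserialize the model, as done on the server.
loaded = pickle.loads(blob)

proba = loaded.predict_proba(X)  # shape (n_samples, 2), one column per class
y_hat = loaded.predict(X)        # predicted labels

# Columns: probability_of_class_0, probability_of_class_1, y_true, y_hat
table = np.column_stack([proba[:, 0], proba[:, 1], y_true, y_hat])
```

Within one environment the reloaded model is expected to reproduce the original probabilities; the sketch only fixes the shape of the output being compared and says nothing about what happens across machines.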

Let the suffix _o denote the results I get locally and _s those from the exact same model on the server. I checked that although y_hat_o and y_hat_s are exactly the same, probability_of_class_0_s and probability_of_class_1_s differ from probability_of_class_0_o and probability_of_class_1_o.
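One way to quantify such a difference is to measure the largest absolute gap between the two probability tables and check whether it stays within a small tolerance. The sketch below is illustrative only: `compare_probabilities`, the `atol` default, and the toy arrays are hypothetical, and the labels are derived with `argmax`, which is assumed to match how the predicted labels were produced.

```python
import numpy as np


def compare_probabilities(proba_o, proba_s, atol=1e-9):
    """Compare two (n_samples, 2) arrays of predicted class probabilities."""
    proba_o = np.asarray(proba_o, dtype=float)
    proba_s = np.asarray(proba_s, dtype=float)

    # Predicted label = class with the larger probability (assumption).
    y_hat_o = proba_o.argmax(axis=1)
    y_hat_s = proba_s.argmax(axis=1)

    return {
        "max_abs_diff": float(np.abs(proba_o - proba_s).max()),
        "labels_identical": bool(np.array_equal(y_hat_o, y_hat_s)),
        "within_tolerance": bool(
            np.allclose(proba_o, proba_s, rtol=0.0, atol=atol)
        ),
    }


# Toy values, not real results: a tiny perturbation and a larger one.
local = np.array([[0.9, 0.1], [0.3, 0.7]])
server_tiny = local + np.array([[-1e-12, 1e-12], [0.0, 0.0]])
server_large = np.array([[0.8, 0.2], [0.35, 0.65]])

tiny_report = compare_probabilities(local, server_tiny)
large_report = compare_probabilities(local, server_large)
```

In both toy cases the labels agree even though the probabilities do not match exactly, so identical predicted labels alone do not show how large the discrepancy in probabilities is.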

As far as I can tell from the documentation, there are no stochastic elements in my model building (i.e. in the logistic regression model itself). I am not sure whether the difference arises simply because the two machines run different versions of scikit-learn. What worries me more is that I would expect the results to be identical regardless of the version. Or, more worrying still, are there other problems I am not aware of that contribute to this?
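To check whether the two environments actually differ, the installed package versions can be recorded on each machine and compared. The following is a minimal sketch using only the standard library; `environment_report` and the list of packages it inspects are hypothetical choices, not something taken from the actual deployment.

```python
import platform
from importlib import metadata


def environment_report(packages=("scikit-learn", "numpy", "scipy", "joblib")):
    """Collect interpreter, architecture, and package versions for comparison."""
    report = {
        "python": platform.python_version(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = None
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this both locally and inside the Docker container and comparing the two outputs would show whether the library versions or the CPU architecture differ. scikit-learn also provides `sklearn.show_versions()`, which prints similar information.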

Topic logistic-regression

Category Data Science
