Different results in same logistic regression model from sklearn and same dataset
I ran into strange behavior when deploying a logistic regression model trained in scikit-learn to production. I trained the model on my own machine and stored it as a `.pickle` file. I use the same dataset both locally and on the server (in Docker), generating four columns for each sample in this binary classification problem: `probability_of_class_0`, `probability_of_class_1`, `y_true`, and `y_predict`. Here `y_true` and `y_predict` (written `y_hat` below) are the true label and the predicted label for that sample row/record, and `probability_of_class_0` and `probability_of_class_1` are the predicted probabilities of the sample belonging to class 0 and class 1 respectively.
Let the suffix `_o` denote the results I get locally and `_s` those from the exact same model on the server. I checked that although `y_hat_o` and `y_hat_s` are exactly the same, `probability_of_class_0_s` and `probability_of_class_1_s` differ from `probability_of_class_0_o` and `probability_of_class_1_o`.
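To make "different" concrete, it may help to measure how large the gap is. The following is a minimal sketch, assuming the two probability columns and the two label columns are available as plain sequences; the function name `compare_runs`, the tolerance, and the sample values are hypothetical and not taken from my actual data.

```python
def compare_runs(proba_o, proba_s, y_hat_o, y_hat_s, abs_tol=1e-9):
    """Compare local (_o) and server (_s) outputs for one probability column."""
    # Element-wise absolute difference between the two probability columns.
    diffs = [abs(a - b) for a, b in zip(proba_o, proba_s)]
    return {
        "labels_identical": list(y_hat_o) == list(y_hat_s),
        "max_abs_diff": max(diffs, default=0.0),
        "n_outside_tol": sum(d > abs_tol for d in diffs),
    }


# Placeholder values for illustration only, not results from the question.
report = compare_runs(
    proba_o=[0.20, 0.70],
    proba_s=[0.20, 0.70000001],
    y_hat_o=[0, 1],
    y_hat_s=[0, 1],
    abs_tol=1e-6,
)
print(report)
```

If the largest difference is close to floating-point precision, the gap is more likely numerical noise than a different model; a larger gap would point to a real discrepancy in the model or its inputs.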
I was not aware of any stochastic elements in my model building (i.e. in the logistic regression model itself, according to the documentation). I am not sure whether the difference is simply due to two different versions of scikit-learn being installed on the two machines. What worries me more is that, regardless of version, the results should always be the same. Or, more worrying still, are there other problems I am not aware of that contribute to this?
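One way to test the version hypothesis is to print the installed package versions on both machines and compare them. This is a minimal sketch using only the standard library; the function name `environment_report` and the choice of packages to inspect (scikit-learn, NumPy, SciPy) are my assumptions about what could influence the predictions.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("scikit-learn", "numpy", "scipy")):
    """Return installed versions of packages that may affect predictions."""
    report = {
        "python": sys.version.split()[0],
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


# Run this both locally and inside the Docker container, then diff the output.
print(environment_report())
```

Running the same snippet locally and on the server would show whether the two environments actually differ before looking for other causes.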
Topic logistic-regression
Category Data Science