SciPy curve_fit and method "dogbox"
I am trying to duplicate this paper's feature engineering for user activity. They take 14 days of accumulated user activity, fit a sigmoid to it, and keep the two fitted parameters as features. I would like to do the same, except with 7 days of activity. http://hanj.cs.illinois.edu/pdf/kdd18_cyang.pdf
They use the formula below and keep the parameters x0 and k as features.
from scipy.optimize import curve_fit
import numpy as np
def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y
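For reference, the function above is y = 1 / (1 + exp(-k(x - x0))). It equals exactly 0.5 at x = x0, so x0 is the midpoint of the curve, and k sets how steep the transition is. The short sketch below checks those properties on a 7-day window (the values x0 = 3 and k = 1.5 are arbitrary, chosen only for illustration). Note that mathematically the output always stays between 0 and 1, while the raw accumulated counts in the examples below go up to 91.

```python
import numpy as np

def sigmoid(x, x0, k):
    # Logistic curve: exactly 0.5 at x == x0, steepness controlled by k
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

days = np.arange(7)                 # 7 days of activity, indices 0..6
curve = sigmoid(days, x0=3.0, k=1.5)  # illustrative parameter values

print(sigmoid(3.0, 3.0, 1.5))       # 0.5 at the midpoint
print(curve.min() > 0, curve.max() < 1)  # values stay inside (0, 1)
```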
I used SciPy's curve_fit to find these parameters as follows:
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata)), ydata, maxfev=20000)
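Here is a self-contained version of that call, run on the second example below (ydata2). It is a sketch under two assumptions of mine, not something stated in the post or checked against the paper: each user's accumulated activity is divided by its final value so that it fits the sigmoid's 0 to 1 range, and the fit starts from a hypothetical initial guess p0 (midpoint in the middle of the window, k = 1) instead of curve_fit's default of all ones.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

ydata = np.array([0, 3, 5, 30, 34, 50, 91], dtype=float)  # ydata2 from the post
xdata = np.arange(len(ydata), dtype=float)

# Assumption: scale the cumulative curve so it ends at 1, matching the sigmoid's range
y_scaled = ydata / ydata[-1]

# Hypothetical starting guess: midpoint in the middle of the window, unit steepness
p0 = [xdata.mean(), 1.0]
popt, pcov = curve_fit(sigmoid, xdata, y_scaled, p0=p0, maxfev=20000)
x0_hat, k_hat = popt
print(x0_hat, k_hat)
```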
For a user with the values below, I got the following error:
ydata1 = [0,0,0,0,0,91,91]
RuntimeError: Optimal parameters not found: gtol=0.000000 is too small func(x) is orthogonal to the columns of the Jacobian to machine precision.
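Whatever method is used, curve_fit signals a failed fit by raising RuntimeError (as above), so a loop over many users can catch it and record a missing value instead of stopping. A minimal sketch; fit_or_nan is a hypothetical helper name, not part of SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def fit_or_nan(ydata, **kwargs):
    """Hypothetical helper: return (x0, k), or (nan, nan) if curve_fit gives up."""
    xdata = np.arange(len(ydata), dtype=float)
    try:
        popt, _ = curve_fit(sigmoid, xdata, np.asarray(ydata, dtype=float), **kwargs)
    except RuntimeError:
        # curve_fit raises RuntimeError when the optimizer does not converge
        return np.nan, np.nan
    return popt[0], popt[1]

ydata1 = [0, 0, 0, 0, 0, 91, 91]
print(fit_or_nan(ydata1, maxfev=20000))
```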
I noticed that if I add method='dogbox', I no longer get the error.
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata1)), ydata1, maxfev=20000, method='dogbox')
print(ppov[0], ppov[1])
5.189237217957538 11.509279446215949
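Some background on what method='dogbox' does, paraphrasing the SciPy documentation: the default 'lm' is MINPACK's Levenberg-Marquardt and does not support bounds, whereas for 'trf' and 'dogbox' curve_fit hands the problem to scipy.optimize.least_squares. 'dogbox' is a dogleg algorithm with rectangular trust regions: at each step it approximates the squared error with a local quadratic model, trusts that model only inside a box around the current parameters, and takes a step that combines the Gauss-Newton direction with the steepest-descent direction. The documentation describes small problems with bounds as its typical use case and does not recommend it when the Jacobian is rank-deficient. The sketch below calls least_squares directly on ydata1; the scaling to [0, 1] and the bounds (x0 within the 7-day window, 0 <= k <= 50) are my own illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

xdata = np.arange(7, dtype=float)
# ydata1 from the post, scaled by its final value (assumption, not from the post)
ydata = np.array([0, 0, 0, 0, 0, 91, 91], dtype=float) / 91.0

def residuals(params):
    """Model minus data; least_squares minimizes 0.5 * sum(residuals**2)."""
    x0, k = params
    return 1.0 / (1.0 + np.exp(-k * (xdata - x0))) - ydata

# Hypothetical box: midpoint inside the window, steepness non-negative and capped
lower, upper = [0.0, 0.0], [6.0, 50.0]
res = least_squares(residuals, x0=[3.0, 1.0], method='dogbox', bounds=(lower, upper))
print(res.x, res.cost, res.status)
```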
However, when I tried other inputs, I noticed that the two methods can return very different parameters.
For example, take these values:
ydata2=[0,3,5,30,34,50,91]
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata2)), ydata2, maxfev=20000)
print(ppov[0], ppov[1])
-24.681668846480264 118.77183210605865
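One way to see how much the data actually pins down these numbers is the covariance matrix that curve_fit returns alongside the parameters. Following the curve_fit documentation, the one-standard-deviation errors are the square roots of its diagonal; very large or infinite values mean the corresponding parameter is poorly determined. A sketch that repeats the call above and prints them:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

ydata2 = np.array([0, 3, 5, 30, 34, 50, 91], dtype=float)
xdata = np.arange(len(ydata2), dtype=float)

popt, pcov = curve_fit(sigmoid, xdata, ydata2, maxfev=20000)

# Standard errors of (x0, k); inf or huge values flag parameters the data cannot pin down
perr = np.sqrt(np.diag(pcov))
print("x0 =", popt[0], "+/-", perr[0])
print("k  =", popt[1], "+/-", perr[1])
```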
If I add method='dogbox', however, I get very different values for x0 and k:
ppov, pcov = curve_fit(sigmoid, np.arange(len(ydata2)), ydata2, maxfev=20000, method='dogbox')
print(ppov[0], ppov[1])
0.28468096463676695 8.154477352500013
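Since the two methods disagree, a direct check is to plug both parameter pairs back into the model and compare the sum of squared residuals on ydata2. The sketch below uses the values printed above rather than refitting:

```python
import numpy as np

def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

ydata2 = np.array([0, 3, 5, 30, 34, 50, 91], dtype=float)
xdata = np.arange(len(ydata2), dtype=float)

def ssr(params):
    """Sum of squared residuals for a parameter pair (x0, k)."""
    return float(np.sum((ydata2 - sigmoid(xdata, *params)) ** 2))

# Parameter values reported above for each method
lm_params = (-24.681668846480264, 118.77183210605865)
dogbox_params = (0.28468096463676695, 8.154477352500013)

for name, params in [("lm", lm_params), ("dogbox", dogbox_params)]:
    print(name, ssr(params))
```

Both sums come out at roughly 12,450 and differ by less than 1. With the raw counts, the best the bounded sigmoid can do is sit near 1 on almost every day, so quite different (x0, k) pairs fit about equally poorly, and each method can stop at a different one.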
Can anybody help me with two things?
1. I read the documentation for 'dogbox' and don't really understand it. Can anybody explain it more simply?
2. I call curve_fit in a loop over about 100,000 users, and I need to set its arguments so that it does not throw an error. Is the 'dogbox' method okay for my purposes, given that its parameter estimates can differ a lot from those of the default 'lm' method? Or are there other curve_fit arguments I could set instead to get past this error?
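Putting the pieces together, a per-user fit could combine scaling, a starting guess, bounds and exception handling. This is a sketch under assumptions that are mine, not the paper's: fit_user is a hypothetical helper, activity is scaled by its final value, and the bounds (0 <= x0 <= 6, 0 <= k <= 50) are illustrative. Because 'lm' does not accept bounds, the sketch uses 'trf', which is also what curve_fit selects by default when bounds are given.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def fit_user(activity, n_days=7):
    """Hypothetical per-user feature extractor returning (x0, k) or (nan, nan)."""
    y = np.asarray(activity, dtype=float)
    x = np.arange(n_days, dtype=float)
    if y[-1] <= 0:  # no accumulated activity: nothing to fit
        return np.nan, np.nan
    y = y / y[-1]  # assumption: scale cumulative activity so it ends at 1
    try:
        popt, _ = curve_fit(sigmoid, x, y, p0=[(n_days - 1) / 2, 1.0],
                            bounds=([0.0, 0.0], [n_days - 1, 50.0]),
                            method='trf', max_nfev=2000)
    except (RuntimeError, ValueError):
        # RuntimeError: no convergence; ValueError: e.g. non-finite input data
        return np.nan, np.nan
    return popt[0], popt[1]

users = {
    "user_a": [0, 0, 0, 0, 0, 91, 91],
    "user_b": [0, 3, 5, 30, 34, 50, 91],
    "user_c": [0, 0, 0, 0, 0, 0, 0],
}
features = {uid: fit_user(act) for uid, act in users.items()}
for uid, (x0, k) in features.items():
    print(uid, x0, k)
```

Whichever method you settle on, running the same method, scaling and bounds for every user keeps the resulting x0 and k features comparable across users.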
Topic scipy
Category Data Science