MinMaxScaler returned values greater than one

I was looking for a normalization function in sklearn to prepare my data for logistic regression.

Since I have negative values, I chose MinMaxScaler with feature_range=(0, 1) as a parameter.

x = MinMaxScaler(feature_range=(0, 1)).fit_transform(x)

Then, using the sm.Logit trainer, I got an error:

import statsmodels.api as sm
logit_model=sm.Logit(train_data_numeric_final,target)
result=logit_model.fit()
print(result.summary())

ValueError: endog must be in the unit interval.

I presume my values are outside the (0, 1) range, which is indeed the case:

np.unique(np.less_equal(train_data_numeric_final.values, 1))

array([False,  True])

How come? And how can I proceed?

Topic numerical normalization feature-scaling logistic-regression python

Category Data Science


I am not sure why your MinMaxScaler didn't work, but here is a function that should scale your data into the desired range:

def rescale(data, new_min=0, new_max=1):
    """Rescale the data to be within the range [new_min, new_max]"""
    return (data - data.min()) / (data.max() - data.min()) * (new_max - new_min) + new_min

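Looking at the documentation of MinMaxScaler, it seems my function above uses the same formula as their method. One difference: MinMaxScaler applies it to each feature (column) separately, whereas data.min() and data.max() on a NumPy array are taken over the whole array.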
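To apply the formula column by column on a 2-D NumPy array, something like the following should work. This is only a sketch: rescale_columns is a hypothetical name, and mapping constant columns to new_min is my own assumption to avoid dividing by zero.

```python
import numpy as np

def rescale_columns(data, new_min=0.0, new_max=1.0):
    """Rescale each column of a 2-D array to [new_min, new_max]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)                 # per-column minimum
    col_range = data.max(axis=0) - col_min     # per-column spread
    # Assumption: a constant column has zero spread; use 1 so it maps to new_min.
    col_range[col_range == 0] = 1.0
    return (data - col_min) / col_range * (new_max - new_min) + new_min

# Small example with negative values and columns on different scales.
x = np.array([[-5.0, 100.0],
              [ 0.0, 300.0],
              [ 5.0, 200.0]])
scaled = rescale_columns(x)
print(scaled)
```

Each column of scaled then has its own minimum at 0 and maximum at 1, regardless of the other columns.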

You could break your code down a little to explicitly compute each step on its own line. This might help find the origin of your problem. I tried it out and got the expected results:

In [1]: import numpy as np

In [2]: from sklearn.preprocessing import MinMaxScaler

In [3]: x = np.random.randint(0, 10, (10, 10)).astype(float)

In [4]: x                                    # generate random data in range [0, 9]
Out[4]: 
array([[ 1.,  4.,  5.,  4.,  6.,  1.,  8.,  1.,  8.,  9.],
       [ 3.,  1.,  4.,  4.,  6.,  2.,  5.,  1.,  0.,  8.],
       [ 2.,  0.,  6.,  1.,  5.,  2.,  5.,  8.,  8.,  4.],
       [ 8.,  9.,  2.,  8.,  5.,  6.,  0.,  5.,  0.,  5.],
       [ 1.,  3.,  2.,  2.,  3.,  2.,  4.,  1.,  7.,  5.],
       [ 7.,  0.,  8.,  8.,  3.,  6.,  6.,  6.,  4.,  3.],
       [ 4.,  3.,  4.,  4.,  7.,  6.,  4.,  5.,  6.,  7.],
       [ 9.,  0.,  8.,  9.,  7.,  1.,  2.,  2.,  4.,  6.],
       [ 7.,  4.,  2.,  8.,  6.,  5.,  2.,  9.,  9.,  9.],
       [ 7.,  6.,  9.,  2.,  9.,  0.,  1.,  5.,  7.,  3.]])


In [5]: scaler = MinMaxScaler()              # defaults to range [0, 1]

In [6]: scaler.fit(x)                        # compute the scaling factors
Out[6]: MinMaxScaler(copy=True, feature_range=(0, 1))

In [7]: scaled_data = scaler.transform(x)    # scale the data

In [8]: scaled_data.shape                    # still the same shape
Out[8]: (10, 10)

In [9]: scaled_data.min()                    # min and max are 0 and 1 as expected
Out[9]: 0.0

In [10]: scaled_data.max()
Out[10]: 1.0
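Two things might be worth checking in your case, though I can't tell from the question which applies. First, sm.Logit takes the response (endog) as its first argument and the features (exog) as its second, so the error refers to whatever array is passed first. Second, the question scales x but passes train_data_numeric_final to the model; if the min and max were learned on one array and applied to another, values outside [0, 1] are expected. The sketch below illustrates the second point with plain NumPy. The helper names (fit_min_max, apply_min_max, in_unit_interval) and the toy data are my own, not part of sklearn.

```python
import numpy as np

def fit_min_max(train):
    """Learn the per-column min and max from the training data only."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def apply_min_max(data, col_min, col_max):
    """Apply previously learned min/max; assumes col_max > col_min."""
    return (np.asarray(data, dtype=float) - col_min) / (col_max - col_min)

def in_unit_interval(values):
    """True only if every value lies in [0, 1]; NaN counts as outside."""
    values = np.asarray(values, dtype=float)
    return bool(np.all((values >= 0) & (values <= 1)))

train = np.array([[0.0], [5.0], [10.0]])   # data the scaling is learned from
other = np.array([[-2.0], [12.0]])         # different data, wider range

col_min, col_max = fit_min_max(train)
scaled_train = apply_min_max(train, col_min, col_max)
scaled_other = apply_min_max(other, col_min, col_max)

print(in_unit_interval(scaled_train))   # the array the scaling was learned on
print(scaled_other.ravel())             # values fall below 0 and above 1
print(in_unit_interval(scaled_other))
```

Note that a check like np.less_equal(values, 1) alone will not flag negative values, and it reports False for NaN entries, so testing both bounds as above gives a clearer picture.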
