Logistic regression based prediction model using flask(python) to predict if Student will pass or fail. Error

Question

shreya saxena

May 18, 2020, 14:37

I am trying to build a web application in Python with Flask that predicts whether a student is likely to pass or fail, using a Kaggle dataset. I modified the dataset slightly: I dropped the lunch column and labeled every student whose average mark, calculated as (math score + reading score + writing score) / 3, is below 45 as Fail and all others as Pass. I want to predict Pass or Fail with logistic regression, but I get an error when I run the following code:

model.py

import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

data = pd.read_csv('1. StudentsPerformance main.csv')

target_name = 'Average Score'
# Keep the first three columns as a DataFrame so they can be indexed by name
x = data.iloc[:, :3].copy()
y = data[target_name].values

# Convert each categorical column to integer codes
le = LabelEncoder()
x['Parental Education Level'] = le.fit_transform(x['Parental Education Level'])
x['Race/Ethnicity'] = le.fit_transform(x['Race/Ethnicity'])
x['Test Preparation Course'] = le.fit_transform(x['Test Preparation Course'])

# One-hot encode the codes: every category becomes its own 0/1 column
onehotencoder = OneHotEncoder()
x = onehotencoder.fit_transform(x).toarray()

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=40)
print('The number of samples in the train data is {}.'.format(x_train.shape[0]))
print('The number of samples in the test data is {}.'.format(x_test.shape[0]))

reg = LogisticRegression(n_jobs=-1, random_state=15, solver='lbfgs')
reg.fit(x_train, y_train)

# Save the trained model so app.py can load it
pickle.dump(reg, open('model.pkl', 'wb'))
#model = pickle.load(open('model.pkl','rb'))

app.py

import pickle

import numpy as np
from flask import Flask, jsonify, render_template, request

app = Flask(__name__)
model = pickle.load(open('model.pkl', 'rb'))

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict',methods=['POST'])
def predict():
    '''
    For rendering results on HTML GUI
    '''
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    prediction = model.predict(final_features)

    output = round(prediction[0], 2)


    #return render_template('index.html', prediction_text='The student is more likely to PASS')
    return render_template('index.html', prediction_text='The student is more likely to $ {}'.format(output))                       

@app.route('/predict_api',methods=['POST'])
def predict_api():
    '''
    For direct API calls through a request
    '''
    data = request.get_json(force=True)
    prediction = model.predict([np.array(list(data.values()))])

    output = prediction[0]
    return jsonify(output)

if __name__ == "__main__":
    app.run(debug=True)

index.html

<html>
<head>
<link rel="stylesheet" href="a.css">
<title>Prediction Model</title>
</head>
<body>
<center>
<h4>Prediction Model for StudentPerformance.csv</h4>
<form action="{{ url_for('predict')}}" method="post">
<label for="tpc">Test Preparation Course Status-</label><br>
    <select id="tpc" name="tpc">
      <option value="0">Complete</option>
      <option value="1">None</option>
    </select><br>
<label for="pel">Parental Education Level-</label><br>
    <select id="pel" name="pel">
      <option value="0">High School</option>
      <option value="1">Some College</option>
      <option value="2">Bachelor's Degree</option>
      <option value="3">Associate's Degree</option>
      <option value="4">Master's Degree</option>
    </select><br>
<label for="re">Race/Ethnicity-</label><br>
    <select id="re" name="re">
      <option value="0">Group A</option>
      <option value="1">Group B</option>
      <option value="2">Group C</option>
      <option value="3">Group D</option>
      <option value="4">Group E</option>
    </select><br>
<input type="Submit" value="Submit"><br></form>
<p>Output: </p> <br>
{{ prediction_text }}
</center>
</body>
</html>

request.py

import requests

url = 'http://localhost:5000/predict_api'
r = requests.post(url,json={'tpc':"completed", 'pel':"high school", 're':"group A"})

print(r.json())

Traceback:

Traceback (most recent call last):
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\admin\shreyaflask\app.py", line 25, in predict
    prediction = model.predict(final_features)
  File "C:\Users\admin\anaconda3\lib\site-packages\sklearn\linear_model\_base.py", line 293, in predict
    scores = self.decision_function(X)
  File "C:\Users\admin\anaconda3\lib\site-packages\sklearn\linear_model\_base.py", line 273, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 3 features per sample; expecting 12

Topic spyder logistic-regression scikit-learn python predictive-modeling

Category Data Science

maya-ami · Accepted Answer · May 18, 2020, 14:11

The error is self-explanatory: you give the model only 3 features, whereas it expects 12. In model.py you do select 3 columns from the dataset, but you then apply one-hot encoding (OHE), which creates new columns. Each new column represents a single category and holds 0 or 1, indicating whether that category is present in a sample. The number of new columns therefore equals the total number of categories you encode.

Example:

What you have before OHE:

PEL
-----------
High School
High School
Bachelor's
...

After applying OHE:

PEL High School   PEL Bachelor's   ...
1                 0
1                 0
0                 1
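
The same expansion can be reproduced in a few lines of plain Python. This is only an illustrative sketch, not the scikit-learn implementation; the helper name one_hot_column is made up for this example. It sorts the categories alphabetically, as scikit-learn's encoders do, so the column order differs from the table above.

```python
# Illustrative one-hot encoding of a single categorical column (pure Python).
# one_hot_column is a hypothetical helper, not part of scikit-learn.
def one_hot_column(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


pel = ["High School", "High School", "Bachelor's"]
categories, rows = one_hot_column(pel)

print(categories)  # one output column per distinct category
for row in rows:
    print(row)     # exactly one 1 per row, in the column of its category
```

One input column with two distinct values becomes two output columns, so the feature count grows with the number of categories, not with the number of original columns.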

In your case, there are 5 categories for Parental Education Level, 5 for Race/Ethnicity and 2 for Test Preparation Course, so one-hot encoding creates 5 + 5 + 2 = 12 features from your 3-column dataset. A piece of advice: check x.shape before modeling. That way you will always know how many features your model needs for prediction.
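
On the Flask side, the three integer codes submitted by the form would need to be expanded into the same 12 columns before calling model.predict. The sketch below shows one way to do that by hand. The names CATEGORY_COUNTS and encode_form are hypothetical, the category counts are the ones quoted above, and the column order (pel, re, tpc) is an assumption that must be checked against the column order of the training data. The integer codes also have to match the ones LabelEncoder assigned, which follow the alphabetical order of the category names rather than the order of the options in index.html.

```python
# Assumed category counts from the answer: 5 + 5 + 2 = 12 features.
# Assumed column order; it must match the order used when training the model.
CATEGORY_COUNTS = {"pel": 5, "re": 5, "tpc": 2}


def encode_form(codes):
    """Expand integer category codes into one flat one-hot feature row."""
    row = []
    for name, size in CATEGORY_COUNTS.items():
        code = codes[name]
        if not 0 <= code < size:
            raise ValueError(f"{name} code {code} is outside 0..{size - 1}")
        # Append `size` columns with a single 1 at position `code`.
        row.extend(1 if i == code else 0 for i in range(size))
    return row


features = encode_form({"tpc": 0, "pel": 2, "re": 4})
print(len(features))  # 12
print(features)
```

The result could then be passed as model.predict([features]). A less error-prone alternative is to save the fitted OneHotEncoder with pickle next to the model and call its transform method on incoming data, so the encoding used at prediction time cannot drift from the one used in training.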
