Logistic regression prediction model using Flask (Python) to predict whether a student will pass or fail: ValueError

I am trying to create a web application in Python using Flask that predicts whether a student is likely to pass or fail, based on a Kaggle dataset. I modified the dataset slightly: students whose average mark, calculated as (math score + reading score + writing score) / 3, is below 45 are labelled as fail and all others as pass, and I want to predict this label with logistic regression. I also dropped the lunch column. I get an error when I run the following code:

model.py

import pandas as pd
import pickle
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

data = pd.read_csv('1. StudentsPerformance main.csv')

target_name = 'Average Score'
x = data.iloc[:, :3].copy()  # keep a DataFrame so the columns below can be indexed by name
y = data[target_name].values

le = LabelEncoder()  
x['Parental Education Level']= le.fit_transform(x['Parental Education Level']) 
x['Race/Ethnicity']= le.fit_transform(x['Race/Ethnicity'])
x['Test Preparation Course']= le.fit_transform(x['Test Preparation Course'])

onehotencoder = OneHotEncoder() 
x = onehotencoder.fit_transform(x).toarray()

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,random_state = 40)
print('The number of samples into the train data is {}.'.format(x_train.shape[0]))
print('The number of samples into the test data is {}.'.format(x_test.shape[0]))

from sklearn.linear_model import LogisticRegression
reg=LogisticRegression(n_jobs=-1, random_state=15, solver='lbfgs')
reg.fit(x_train,y_train)


with open('model.pkl', 'wb') as f:
    pickle.dump(reg, f)
#model = pickle.load(open('model.pkl','rb'))

app.py

import numpy as np
import pickle
from flask import Flask, request, jsonify, render_template

app = Flask(__name__)
model = pickle.load(open('model.pkl', 'rb'))

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict',methods=['POST'])
def predict():
    '''
    For rendering results on HTML GUI
    '''
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    prediction = model.predict(final_features)

    output = round(prediction[0], 2)


    #return render_template('index.html', prediction_text='The student is more likely to PASS')
    return render_template('index.html', prediction_text='The student is more likely to $ {}'.format(output))                       

@app.route('/predict_api',methods=['POST'])
def predict_api():
    '''
    For direct API calls through request
    '''
    data = request.get_json(force=True)
    prediction = model.predict([np.array(list(data.values()))])

    output = prediction[0]
    return jsonify(output)

if __name__ == "__main__":
    app.run(debug=True)

index.html

<html>
<head>
<link rel="stylesheet" href="a.css">
<title>Prediction Model</title>
</head>
<body>
<center>
<h4>Prediction Model for StudentPerformance.csv</h4>
<form action="{{ url_for('predict')}}" method="post">
<label for="tpc">Test Preparation Course Status-</label><br>
    <select id="tpc" name="tpc">
      <option value="0">Complete</option>
      <option value="1">None</option>
    </select><br>
<label for="pel">Parental Education Level-</label><br>
    <select id="pel" name="pel">
      <option value="0">High School</option>
      <option value="1">Some College</option>
      <option value="2">Bachelor's Degree</option>
      <option value="3">Associate's Degree</option>
      <option value="4">Master's Degree</option>
    </select><br>
<label for="re">Race/Ethnicity-</label><br>
    <select id="re" name="re">
      <option value="0">Group A</option>
      <option value="1">Group B</option>
      <option value="2">Group C</option>
      <option value="3">Group D</option>
      <option value="4">Group E</option>
    </select><br>
<input type="Submit" value="Submit"><br></form>
<p>Output: </p> <br>
{{ prediction_text }}
</center>
</body>
</html>

request.py

import requests

url = 'http://localhost:5000/predict_api'
r = requests.post(url,json={'tpc':"completed", 'pel':"high school", 're':"group A"})

print(r.json())

Traceback---

Traceback (most recent call last):
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\admin\shreyaflask\app.py", line 25, in predict
    prediction = model.predict(final_features)
  File "C:\Users\admin\anaconda3\lib\site-packages\sklearn\linear_model\_base.py", line 293, in predict
    scores = self.decision_function(X)
  File "C:\Users\admin\anaconda3\lib\site-packages\sklearn\linear_model\_base.py", line 273, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 3 features per sample; expecting 12

Topic spyder logistic-regression scikit-learn python predictive-modeling

Category Data Science


The error message says it all: you give the model only 3 features, whereas it expects 12. In model.py you do select 3 columns from the dataset, but you then apply one-hot encoding (OHE), which creates new columns. Each new column represents a single category and contains only 0s and 1s, indicating whether that category is present in a sample. The number of new columns therefore depends on the number of categories being encoded.

Example:

What you have before OHE (one label per row):

PEL
High School
High School
Bachelor's
...

After applying OHE (one 0/1 column per category):

PEL High School   PEL Bachelor's   ...
1                 0
1                 0
0                 1
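The small table above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the helper name one_hot is made up for this example, and it orders the columns by first appearance so that they match the table, whereas a library encoder may order them differently.

```python
def one_hot(values):
    """One-hot encode a list of category labels.

    Hypothetical helper for illustration only. Returns the column names
    (categories in order of first appearance) and one 0/1 row per input value.
    """
    categories = list(dict.fromkeys(values))  # unique labels, first-seen order
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# The three PEL values from the example above.
pel = ["High School", "High School", "Bachelor's"]
columns, encoded = one_hot(pel)

print(columns)
for row in encoded:
    print(row)
```

One input column with two distinct labels becomes two output columns, which is the same mechanism that widens the full feature matrix.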

In your case, there are 5 categories for Parental Education Level, 5 for Race/Ethnicity and 2 for Test Preparation Course. In total, one-hot encoding creates 5 + 5 + 2 = 12 features from your 3-column dataset, so the input passed to model.predict in app.py has to be encoded the same way before prediction. A piece of advice: check x.shape before fitting the model, so you always know how many features it expects at prediction time.
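A minimal sketch of one possible fix, assuming scikit-learn and NumPy are installed: fit the encoder once on the training data and reuse that same fitted encoder on each incoming sample. The category strings and labels below are synthetic placeholders modelled on the form options, not the real dataset, and the variable names are chosen for this example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Assumed category values, mirroring the options in the HTML form.
pel = ["high school", "some college", "bachelor's degree",
       "associate's degree", "master's degree"]
race = ["group A", "group B", "group C", "group D", "group E"]
tpc = ["completed", "none"]

# Synthetic training set: every combination of the three categorical columns.
rows = [[p, r, t] for p in pel for r in race for t in tpc]
X_raw = np.array(rows, dtype=object)
# Made-up labels purely for illustration (1 = pass, 0 = fail).
y = np.array([1 if t == "completed" else 0 for _, _, t in rows])

# Fit the encoder on the raw training columns; 3 columns become 5 + 5 + 2 = 12.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(X_raw).toarray()
print(X_raw.shape, "->", X.shape)

model = LogisticRegression().fit(X, y)

# At prediction time, encode the raw input with the SAME fitted encoder,
# so the model receives 12 features instead of 3.
sample = np.array([["high school", "group A", "completed"]], dtype=object)
sample_encoded = encoder.transform(sample).toarray()
prediction = model.predict(sample_encoded)
print(sample_encoded.shape, prediction)
```

In the Flask app this would mean saving the fitted encoder together with the model (for example, pickling both) and calling encoder.transform on the form values inside predict() before calling model.predict.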
