Predicting high-school test scores after a disciplinary action
I'm somewhat new to machine learning and have learned to apply many of the basic regression and classification methods using Python and various packages. However, this problem has me stumped. To illustrate it, I created a fictitious scenario in which a guidance counselor wants to predict a student's test scores after a disciplinary action. Suppose the counselor has data like the mock-up below:
Column definition:
Student - Student Identification #
Gender - Male/Female
Age - Current Age
Athlete - Sport the student plays (0-None, 1-Basketball, 2-Football, 3-Soccer)
Online - Student takes classes online (0-No, 1-Yes)
Before_Disciplinary_Action_Scores - Sequence of their last n test scores prior to discipline (in date order)
Disciplinary_Action - Action the counselor took (0-None, 1-Assigned Tutor, 2-Guardian Meeting, 3-Study Program, 4-Ineligible for game)
After_Disciplinary_Action_Scores - Sequence of their next n test scores after discipline (in date order)
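To make the layout concrete, here is a minimal sketch of one record as a Python data structure. The class names `Athlete`, `DisciplinaryAction`, and `StudentRecord` and the example values are hypothetical, invented for illustration; only the fields and codes come from the column definitions above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Athlete(IntEnum):
    """Athlete codes from the column definitions."""
    NONE = 0
    BASKETBALL = 1
    FOOTBALL = 2
    SOCCER = 3


class DisciplinaryAction(IntEnum):
    """Disciplinary_Action codes from the column definitions."""
    NONE = 0
    ASSIGNED_TUTOR = 1
    GUARDIAN_MEETING = 2
    STUDY_PROGRAM = 3
    INELIGIBLE_FOR_GAME = 4


@dataclass
class StudentRecord:
    """Hypothetical container for one row of the mock-up data."""
    student: int                # student identification number
    gender: str                 # "Male" or "Female"
    age: int                    # current age
    athlete: Athlete            # sport the student plays
    online: bool                # True if the student takes classes online
    before_scores: List[float]  # scores before the action, in date order
    action: DisciplinaryAction  # action the counselor took
    after_scores: List[float]   # scores after the action, in date order


# Made-up example row (fictitious values, not real data).
example = StudentRecord(
    student=1001,
    gender="Female",
    age=16,
    athlete=Athlete.SOCCER,
    online=False,
    before_scores=[70.0, 72.0, 68.0, 75.0, 71.0],
    action=DisciplinaryAction.GUARDIAN_MEETING,
    after_scores=[78.0, 80.0, 79.0, 82.0, 81.0, 83.0, 84.0],
)
```

Using `IntEnum` keeps the integer codes usable as model features while giving them readable names.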
Assume the fictitious school system is quite large, with about 80k records in total. Every student has 12 test scores in total across the before and after periods, but how those 12 are split between before and after depends on when the disciplinary action was given. For simplicity, you can assume the scores are given weekly for a single course.
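Because the total is fixed at 12 and only the split point moves, each student can be viewed as one fixed-length weekly series plus the index of the week the action took effect. The sketch below shows that view; the function name `to_fixed_timeline` is hypothetical, and it assumes the before and after lists are already in date order.

```python
from typing import List, Tuple

TOTAL_WEEKS = 12  # every student has 12 weekly scores in total (from the question)


def to_fixed_timeline(before: List[float], after: List[float],
                      total: int = TOTAL_WEEKS) -> Tuple[List[float], List[int], int]:
    """Join before/after scores into one fixed-length weekly series.

    Returns the scores, a 0/1 flag per week marking post-discipline weeks,
    and the index of the first post-discipline week.
    """
    if len(before) + len(after) != total:
        raise ValueError(f"expected {total} scores, got {len(before) + len(after)}")
    scores = list(before) + list(after)
    split = len(before)
    post_flag = [0] * split + [1] * len(after)
    return scores, post_flag, split


scores, post_flag, split = to_fixed_timeline(
    [70.0, 72.0, 68.0, 75.0, 71.0],
    [78.0, 80.0, 79.0, 82.0, 81.0, 83.0, 84.0],
)
```

With this representation every student has the same sequence length; only `split` (and the flags) differ.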
I can compute the mean before and after scores and build a fairly good classification model, but I'd like to go a step further and predict the post-discipline scores themselves.
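The per-student summary mentioned here can be sketched as follows. The helper name `summarize_scores` is hypothetical, and the question does not say what the classification target was, so the `improved` flag is only an assumed example of a label derived from the means.

```python
from statistics import mean
from typing import Dict, List


def summarize_scores(before: List[float], after: List[float]) -> Dict[str, float]:
    """Mean score before and after the action, and the change between them."""
    mean_before = mean(before)
    mean_after = mean(after)
    return {
        "mean_before": mean_before,
        "mean_after": mean_after,
        "change": mean_after - mean_before,
        # Assumed example label: did the average go up after the action?
        "improved": float(mean_after > mean_before),
    }


summary = summarize_scores(
    [70.0, 72.0, 68.0, 75.0, 71.0],
    [78.0, 80.0, 79.0, 82.0, 81.0, 83.0, 84.0],
)
```

A summary like this collapses each sequence to one number, which is why it suits classification but discards the week-by-week shape needed to predict individual after scores.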
I've tried Prophet and LSTM models to predict the after scores as a time-series problem, but the results were poor, and the varying number of scores per student makes this approach difficult. I can see that the Athlete, Transfer, and Online features are important, and I tried adding them as regressors, but that also failed. I'd appreciate any guidance you can give.
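For reference, one common way to attach static features as regressors without needing equal-length sequences is a "long" table with one row per post-discipline week. The sketch below is only an illustration of that framing under assumptions: the function name `long_format_rows` and the derived features (`n_before`, `mean_before`, `last_before`, `weeks_since_action`) are hypothetical choices, it assumes at least one pre-discipline score, and `Transfer` is omitted because it does not appear in the column definitions.

```python
from typing import Dict, List


def long_format_rows(record: Dict) -> List[Dict]:
    """One row per post-discipline week, with static features repeated as regressors.

    Assumes record["before_scores"] is non-empty and both score lists are in date order.
    """
    before = record["before_scores"]
    rows = []
    for k, score in enumerate(record["after_scores"], start=1):
        rows.append({
            "student": record["student"],
            "athlete": record["athlete"],
            "online": record["online"],
            "action": record["action"],
            "n_before": len(before),                 # varies by student
            "mean_before": sum(before) / len(before),
            "last_before": before[-1],               # most recent pre-action score
            "weeks_since_action": k,                 # 1 = first week after the action
            "target_score": score,                   # value to predict
        })
    return rows


rows = long_format_rows({
    "student": 1001, "athlete": 3, "online": 0, "action": 2,
    "before_scores": [70.0, 72.0, 68.0, 75.0, 71.0],
    "after_scores": [78.0, 80.0, 79.0, 82.0, 81.0, 83.0, 84.0],
})
```

Each row has the same columns regardless of how many scores a student has, so any standard tabular regressor could be fit on it.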