Unable to make accurate predictions?

I have a dataset of diabetes patients and I am trying to predict the next blood glucose level. I have attached an image below; the CSV file has about 1600 records covering 10 patients. Each patient is uniquely identified by the Id column. Glucose_t-1 is the glucose value the patient had before the current reading (Glucose_t), and the same applies to Glucose_t-2 and Glucose_t-3, as well as to Insulin_t-1 and Insulin_t-2. The event column is the glycemic event category that the current blood glucose reading falls into. For example:

  • blood glucose value <= 70 then 0,
  • 70 < blood glucose value <= 180 then 1,
  • blood glucose value > 180 then 2.

I have applied different regression algorithms such as Logistic Regression, Random Forest Regression and so on, but I was unable to predict the Glucose_t value accurately. The accuracy comes to 0.008..., which is very depressing :(. Any help to improve the accuracy will be greatly appreciated. Thanks.



It's quite a complex problem, and there might be better options for the design (I'm thinking of something more specific to time series).

However, before that there's a more obvious problem to solve: it seems that you are calculating accuracy on the "Glucose_t" numeric value, right? If so, this is incorrect, and it would explain your terrible results:

  • Accuracy is an evaluation measure for classification tasks, not for regression tasks: accuracy simply checks for each instance whether the predicted value (which is supposed to be a categorical value) is the same as the gold true value, and then divides the number of correct cases by the total. Naturally it's very hard to predict the exact true value in the case of numbers: if the true value is 183 but the algorithm predicts 184, then accuracy would count that as incorrect even though it's very close. That would explain why your accuracy is very low.
  • Typical regression evaluation measures are mean absolute error and mean squared error. These measures (and their variants) are designed to calculate how far the predicted numerical value is from the true value. If you use these, keep in mind that they are error scores, so the lower the value the better.
  • If you want to use accuracy, it would make sense to use it on your "event" value (which is derived from your "glucose_t" prediction I assume?): in this case you have 3 classes (categorical values), and I bet the accuracy will be much better. Note that accuracy can be biased by class imbalance, so don't forget to check the confusion matrix.
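To make the difference concrete, here is a minimal sketch in plain Python. The glucose values are made up for illustration (they are not from your dataset), the helper names are hypothetical, and the event thresholds are my assumption of what you meant (at most 70 gives 0, above 70 and up to 180 gives 1, above 180 gives 2). It compares exact-match accuracy on the numeric predictions with error measures, then computes accuracy and a confusion matrix on the derived event classes.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared distance between prediction and truth."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that are exactly equal to the truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def glucose_to_event(glucose):
    """Map a glucose value to an event class (thresholds are assumed)."""
    if glucose <= 70:
        return 0
    if glucose <= 180:
        return 1
    return 2


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Made-up example values, only to illustrate the metrics.
true_glucose = [183, 65, 120, 210, 95]
pred_glucose = [184, 72, 118, 190, 95]

# Exact-match accuracy punishes near misses such as 183 vs 184.
exact_accuracy = accuracy(true_glucose, pred_glucose)

# Error measures reward predictions that are close (lower is better).
mae = mean_absolute_error(true_glucose, pred_glucose)
mse = mean_squared_error(true_glucose, pred_glucose)

# Accuracy becomes meaningful once the values are turned into classes.
true_events = [glucose_to_event(g) for g in true_glucose]
pred_events = [glucose_to_event(g) for g in pred_glucose]
event_accuracy = accuracy(true_events, pred_events)
event_confusion = confusion_matrix(true_events, pred_events)

print("exact-match accuracy on glucose:", exact_accuracy)
print("MAE:", mae, "MSE:", mse)
print("accuracy on event classes:", event_accuracy)
for row in event_confusion:
    print(row)
```

If you are using scikit-learn, `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, `accuracy_score` and `confusion_matrix`, so you would not need to write these helpers yourself.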
