Classification with features not available at the time of model creation
My problem is to predict the probability that a task gets solved, based on several features, e.g. when the task was created, the time needed to work on it, etc. Please find a dummy snippet below:
task_id  date_time_open  time_needed  day_created  time_created  status
aa       12/09/2019      20 hrs       Tuesday      3 pm          done
cc       17/10/2019      4 hrs        Friday       10 pm         not_done
I know I can run a classification model to identify the class. However, things get complicated when I add a time dimension, because the data set then gains an additional feature that strongly affects the status.
Suppose each task is scanned at 7 pm and its status at that time is added as a new feature, status_7pm:
task_id  date_time_tsk_open  time_needed  day_created  time_created  status_7pm  status
aa       12/09/2019          20 hrs       Tuesday      3 pm          done        done
cc       17/10/2019          4 hrs        Friday       10 pm         done        not_done
dd       19/10/2019          6 hrs        Friday       2 pm          done        done
ff       19/10/2019          9 hrs        Monday       4 pm          not_done    not_done
The tasks are then scanned again at a fixed interval of 1 hr, and each scan adds a new feature to the data:
task_id  date_time_tsk_open  time_needed  day_created  time_created  status_8pm  status
aa       12/09/2019          20 hrs       Tuesday      3 pm          done        done
cc       17/10/2019          4 hrs        Friday       10 pm         not_done    not_done
dd       19/10/2019          6 hrs        Friday       2 pm          done        done
ff       19/10/2019          9 hrs        Monday       4 pm          not_done    not_done
In my understanding, the final prediction of status (resolved / unresolved, i.e. done / not_done) should be based on features that include status_7pm and status_8pm.
What should the data structure for training such a classification model look like in order to generate a prediction at 9 pm, for example for the sample task ff?
task_id  date_time_tsk_open  time_needed  day_created  time_created  status_7pm  status_8pm  status
ff       19/10/2019          9 hrs        Monday       4 pm          not_done    not_done    not_done
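To make the layout above concrete, here is a minimal sketch of how the hourly scans could be joined into one wide row per task. It assumes each scan is available as a mapping from task_id to the status seen at that scan; the names `static_features`, `scans`, and `build_wide_rows` are hypothetical, and the values are the dummy data from the tables.

```python
# Hypothetical static features per task (values from the dummy data above).
static_features = {
    "aa": {"time_needed_hrs": 20, "day_created": "Tuesday", "time_created": "3 pm"},
    "cc": {"time_needed_hrs": 4, "day_created": "Friday", "time_created": "10 pm"},
    "dd": {"time_needed_hrs": 6, "day_created": "Friday", "time_created": "2 pm"},
    "ff": {"time_needed_hrs": 9, "day_created": "Monday", "time_created": "4 pm"},
}

# Hypothetical hourly scans: column name -> {task_id: status at that scan}.
scans = {
    "status_7pm": {"aa": "done", "cc": "done", "dd": "done", "ff": "not_done"},
    "status_8pm": {"aa": "done", "cc": "not_done", "dd": "done", "ff": "not_done"},
}


def build_wide_rows(static_features, scans):
    """Return one dict per task: static features plus one column per scan."""
    rows = []
    for task_id, features in static_features.items():
        row = {"task_id": task_id, **features}
        for scan_name, statuses in scans.items():
            # None marks a task that was not seen in this scan.
            row[scan_name] = statuses.get(task_id)
        rows.append(row)
    return rows


wide_rows = build_wide_rows(static_features, scans)
for row in wide_rows:
    print(row)
```

Each new hourly scan only adds one more entry to `scans`, so the wide rows can be rebuilt without touching the static features.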
I assume the classification model should be trained on all the status columns (status_1, status_2, ..., status_8pm) to classify status. Or would the model have to be retrained in memory every time a new hourly status column is added?
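The second option could be sketched roughly as follows: for a prediction made right after a given scan, only the status columns observed up to that scan are used as features. This is an illustration under assumptions, not a recommended design; `SCAN_COLUMNS`, `STATIC_COLUMNS`, `feature_columns_up_to`, and `training_matrix` are hypothetical names, and the categorical values would still need to be encoded before fitting any classifier.

```python
# Scan columns in chronological order (hypothetical, following the
# status_7pm / status_8pm naming used above).
SCAN_COLUMNS = ["status_7pm", "status_8pm"]
STATIC_COLUMNS = ["time_needed_hrs", "day_created", "time_created"]

# Dummy rows in the wide layout, taken from the tables above.
rows = [
    {"task_id": "aa", "time_needed_hrs": 20, "day_created": "Tuesday",
     "time_created": "3 pm", "status_7pm": "done", "status_8pm": "done",
     "status": "done"},
    {"task_id": "cc", "time_needed_hrs": 4, "day_created": "Friday",
     "time_created": "10 pm", "status_7pm": "done", "status_8pm": "not_done",
     "status": "not_done"},
    {"task_id": "dd", "time_needed_hrs": 6, "day_created": "Friday",
     "time_created": "2 pm", "status_7pm": "done", "status_8pm": "done",
     "status": "done"},
    {"task_id": "ff", "time_needed_hrs": 9, "day_created": "Monday",
     "time_created": "4 pm", "status_7pm": "not_done", "status_8pm": "not_done",
     "status": "not_done"},
]


def feature_columns_up_to(last_scan):
    """Columns available to a model that predicts right after `last_scan`."""
    idx = SCAN_COLUMNS.index(last_scan)
    return STATIC_COLUMNS + SCAN_COLUMNS[: idx + 1]


def training_matrix(rows, last_scan, target="status"):
    """Build (column names, feature rows, labels) for one prediction time."""
    cols = feature_columns_up_to(last_scan)
    X = [[row[c] for c in cols] for row in rows]
    y = [row[target] for row in rows]
    return cols, X, y


# Features for a prediction at 9 pm: everything seen up to the 8 pm scan.
cols, X, y = training_matrix(rows, "status_8pm")
print(cols)
print(X[-1], y[-1])  # feature row and label for task ff
```

With this layout, a prediction at 8 pm would call `training_matrix(rows, "status_7pm")` instead, so each prediction hour gets its own feature set and, under this assumption, its own fitted model.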
Topic time prediction classification
Category Data Science