Predicting the status of upcoming project milestones from intermediate activity statuses
I have data for 100+ projects. Each project has about 175 sequential activities from start to end, and among those 175 activities there are approximately 7 key milestones whose status we want to predict. The data is entirely categorical: every activity status is one of R, A, G, B or GR. We want to predict the status (R, A or G) of those 7 milestones, for example after every 25 activities.
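Because the statuses are purely categorical, one common way to feed them to a model such as a random forest is one-hot encoding. A minimal sketch, assuming each project is a Python list of status strings in activity order; `encode_project` is a hypothetical helper, not part of any library, and no meaning is assumed for the status codes:

```python
# Status labels as listed in the question; their meanings are not assumed here.
STATUSES = ["R", "A", "G", "B", "GR"]


def one_hot(status: str) -> list[int]:
    """Encode one activity status as a 5-element 0/1 vector."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return [1 if s == status else 0 for s in STATUSES]


def encode_project(statuses: list[str]) -> list[int]:
    """Concatenate the one-hot vectors of a project's activities, in order."""
    features = []
    for status in statuses:
        features.extend(one_hot(status))
    return features


# Hypothetical toy project with 3 activities instead of ~175.
print(encode_project(["G", "A", "GR"]))
```

With about 175 activities this gives roughly 875 binary features per project.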
The projects are civil works projects whose sequential activities include requirements gathering, review, approvals, high-level design, review, low-level design, risk identification, build, delivery, etc. The milestones are End of Requirements, End of Design, End of Build, End of Deploy, etc. We have past data from 500 such projects, spanning 24 months, to train on. Since we are new to machine learning, we tried a random forest for each milestone. However, that model needs the data for all activities to be available, so it seems to be the wrong model for predicting a milestone in advance.
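One way to avoid needing every activity up front is to train each model only on what is known at a given point: the prefix of activities completed so far. A minimal sketch, assuming each historical project is stored as a pair of (activity statuses, milestone statuses) with milestones kept separately; the data layout and the helpers `prefix_features` and `build_training_set` are hypothetical:

```python
STATUSES = ["R", "A", "G", "B", "GR"]


def prefix_features(statuses, cutoff):
    """One-hot encode only the first `cutoff` activities, i.e. what is known at that point."""
    row = []
    for status in statuses[:cutoff]:
        row.extend(1 if s == status else 0 for s in STATUSES)
    return row


def build_training_set(projects, cutoff, milestone_index):
    """Features = observed prefix of each past project; label = status of a later milestone.

    `projects` is a list of (activity_statuses, milestone_statuses) pairs,
    where milestone_statuses[k] is the final status of milestone k + 1.
    """
    X, y = [], []
    for activities, milestones in projects:
        X.append(prefix_features(activities, cutoff))
        y.append(milestones[milestone_index])
    return X, y


# Hypothetical toy history: 2 projects, 4 activities and 2 milestones each.
history = [
    (["G", "G", "A", "R"], ["G", "A"]),
    (["A", "R", "R", "R"], ["A", "R"]),
]
X, y = build_training_set(history, cutoff=2, milestone_index=1)
print(len(X[0]), y)  # 10 ['A', 'R']
```

The resulting `X`, `y` could then be passed to any classifier, for instance scikit-learn's `RandomForestClassifier().fit(X, y)`, with one model per (cutoff, milestone) pair, so a prediction is available as soon as the prefix up to that cutoff has been observed.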
Project start
activities 1-24, then MS1 (predict)
activities 26-49, then MS2 (predict)
activities 51-74, then MS3 (predict)
... MS4 ... MS6
activities 151-174, then MS7 (predict)
Project end
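The layout can be turned into index arithmetic that defines where each prediction cutoff falls. A minimal sketch, assuming the milestones sit at exactly every 25th position (25, 50, ..., 175) as the layout suggests; the constants and function names are hypothetical:

```python
SEGMENT = 25       # assumed: one milestone after every 24 activities
N_MILESTONES = 7


def milestone_position(k: int) -> int:
    """1-based position of milestone k (1..7) in the 175-step sequence."""
    return SEGMENT * k


def activities_before(k: int) -> range:
    """1-based positions of the activities between milestone k-1 and milestone k."""
    return range(SEGMENT * (k - 1) + 1, SEGMENT * k)


def next_milestone(steps_done: int):
    """Milestone still ahead once `steps_done` positions have completed, or None at the end."""
    k = steps_done // SEGMENT + 1
    return k if k <= N_MILESTONES else None


for k in range(1, N_MILESTONES + 1):
    acts = activities_before(k)
    print(f"MS{k} at {milestone_position(k)}: activities {acts[0]}..{acts[-1]}")
```

For example, `next_milestone(50)` returns 3: once MS2 is reached, MS3 is the milestone to predict.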
We also need to be able to predict at least the next milestone's status (e.g. MS3) from the current milestone's status (e.g. MS2), so that we can take action on the activities in between (e.g. 51 to 74). How should this problem be approached?
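A simple baseline for "next milestone from current milestone" is a transition table estimated from past projects, i.e. a first-order Markov-style model of P(MS3 | MS2). It ignores the intermediate activities, so it is only a reference point to compare richer models against. A minimal sketch, assuming each past project's milestone statuses are stored as a list (index 0 is MS1); the helper names and the toy data are hypothetical:

```python
from collections import Counter, defaultdict


def fit_transitions(milestone_histories, k):
    """Estimate P(status of MS k+1 | status of MS k) from past projects.

    Returns {status_k: {status_k_plus_1: probability}}.
    """
    counts = defaultdict(Counter)
    for ms in milestone_histories:
        counts[ms[k - 1]][ms[k]] += 1
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def predict_next(transitions, current_status):
    """Most frequent next status given the current one, or None if never seen."""
    dist = transitions.get(current_status)
    if not dist:
        return None
    return max(dist, key=dist.get)


# Hypothetical toy history of MS1..MS3 statuses for four past projects.
history = [["G", "G", "A"], ["G", "A", "R"], ["A", "A", "R"], ["G", "A", "A"]]
t = fit_transitions(history, k=2)  # MS2 -> MS3
print(predict_next(t, "A"))  # R
```

The probabilities in `t` can also be reported directly, which may be more useful for deciding on interventions than a single predicted label.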