Predict status of upcoming project milestones with intermediate activities

I have data for 100+ projects. Each project has about 175 sequential activities from start to end. Among those 175 activities there are approximately 7 key milestones that we want to predict. The data is completely categorical (every activity status is one of R, A, G, B, GR). We want to predict the status of those 7 milestones (R, A, G), say after every 25 activities.

The projects are civil work projects in which the sequential activities are requirements gathering, review, approvals, high-level design, review, low-level design, risk identification, build, deliver, etc. Milestones are End of Requirements, End of Design, End of Build, End of Deploy, etc. We have past data from 500 such projects, spanning 24 months, for training. Since we are new to machine learning, we tried a random forest for each milestone. However, that requires the data for all activities to be available, so it seems to be the wrong model for predicting a milestone in advance.

Proj Start -- activity 1, 2, 3, ..., 24, Milestone1 (Predict this), 26, 27, 28, ..., 49, MS2 (Predict), 51, 52, ..., 74, MS3 (Predict), ..., MS4, ..., MS6, 151, ..., 174, MS7 (Predict) -- ProjEnd

We should also be in a position to predict the status of at least the next milestone (e.g. MS3) based on the current milestone status (e.g. MS2), and take action accordingly on the activities in between (e.g. 51 to 74). How should this problem be approached?

Topic sequence time-series machine-learning

Category Data Science


With only a few hundred projects and about 175 categorical activities in each, the amount of data is small relative to its complexity, so machine learning might not be useful.

That said, here are several machine learning options, roughly in order of increasing complexity:

  • If you ignore the sequence information (i.e., make a bag-of-words style assumption), you can fit any traditional multi-class classification algorithm.
  • If you make the Markov assumption, you only need to look at the last element in the sequence. The problem can then be modeled as a simple probabilistic graphical model (PGM).
  • You can then start relaxing the Markov assumption by looking back at progressively more time steps.
  • It is possible to frame it as reinforcement learning (RL), but that would require far more data.
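As a rough illustration of the Markov option, the sketch below estimates the probability of the next milestone status given only the current one, by counting transitions in past projects. The function names (`fit_transitions`, `predict_next`), the smoothing constant `alpha`, and the toy `history` data are all assumptions made up for this example; they are not taken from the question's data set.

```python
from collections import Counter, defaultdict

STATUSES = ["R", "A", "G"]  # milestone statuses named in the question


def fit_transitions(projects, alpha=1.0):
    """Estimate P(next milestone status | current milestone status).

    projects: list of milestone-status sequences, one per past project.
    alpha: Laplace smoothing constant (an assumption), so that transitions
           never seen in the data still get a small probability.
    """
    counts = defaultdict(Counter)
    for seq in projects:
        # Count each consecutive pair (MS1 -> MS2, MS2 -> MS3, ...).
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1

    probs = {}
    for cur in STATUSES:
        total = sum(counts[cur].values()) + alpha * len(STATUSES)
        probs[cur] = {
            nxt: (counts[cur][nxt] + alpha) / total for nxt in STATUSES
        }
    return probs


def predict_next(probs, current):
    """Return the most likely next status and the full distribution."""
    dist = probs[current]
    return max(dist, key=dist.get), dist


# Toy, invented milestone histories (7 milestones per project).
history = [
    ["G", "G", "A", "A", "R", "R", "R"],
    ["G", "G", "G", "G", "G", "A", "A"],
    ["A", "A", "R", "R", "R", "R", "R"],
    ["G", "A", "A", "G", "G", "G", "G"],
]

probs = fit_transitions(history)
label, dist = predict_next(probs, "A")
print(label, dist)
```

Relaxing the Markov assumption, as in the third bullet, would amount to keying the counts on the last two or three statuses instead of one. The same counting idea could also be conditioned on summary features of the activities seen so far, at the cost of needing more data per combination.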
