What are some good methods for forecasting future revenue from categorical and value-based data?
I have three years of monthly snapshots of all our contract data. Each snapshot includes the following information:
- Contract status [Categorical]: Proposed, tracked, submitted, won, lost, etc.
- Contract stages [Categorical]: Prospecting, engaged, tracking, submitted, etc.
- Duration of contract [Date/Time]: months and years
- Bid start date [Date/Time]: date (this changes when a contract is delayed)
- Contract value [Numerical]: value of the contract in local currency
- Future revenue projection [Numerical]: currency value breakdown of revenue for the next 5 years (this value is available for every contract, whether it was won or lost)
I also have other information about the contracts like id, name, description, etc.
Answers I am trying to get:
- Total value of contracts that are changing status from month to month
- Total value of contracts that are changing stages from month to month
- Average delay of the start date of the contracts
- Future revenue projection (5 years) based on change of status and average delay
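To make the first three questions concrete, here is a minimal sketch of how I imagine computing them from the snapshots. The rows, the tuple layout, and the function names are made up for illustration, and "delay" is assumed to mean the shift between the first and the last recorded bid start date of a contract. The same logic would apply to stages by swapping the status column for the stage column.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical snapshot rows: (snapshot_month, contract_id, status, value, bid_start)
snapshots = [
    ("2023-01", "C1", "proposed",  100.0, date(2023, 3, 1)),
    ("2023-01", "C2", "tracked",   250.0, date(2023, 4, 1)),
    ("2023-02", "C1", "submitted", 100.0, date(2023, 3, 1)),
    ("2023-02", "C2", "tracked",   250.0, date(2023, 6, 1)),
    ("2023-03", "C1", "won",       100.0, date(2023, 3, 1)),
    ("2023-03", "C2", "submitted", 250.0, date(2023, 6, 1)),
]

def by_month(rows):
    """Group rows as {month: {contract_id: (status, value, bid_start)}}."""
    out = defaultdict(dict)
    for month, cid, status, value, start in rows:
        out[month][cid] = (status, value, start)
    return out

def status_change_value(rows):
    """Total value, per month, of contracts whose status differs from the previous snapshot."""
    grouped = by_month(rows)
    months = sorted(grouped)  # "YYYY-MM" strings sort chronologically
    totals = {}
    for prev, cur in zip(months, months[1:]):
        totals[cur] = sum(
            value
            for cid, (status, value, _) in grouped[cur].items()
            if cid in grouped[prev] and grouped[prev][cid][0] != status
        )
    return totals

def average_delay_days(rows):
    """Mean shift in days between the first and last recorded bid start date, per contract."""
    first, last = {}, {}
    for month, cid, _, _, start in sorted(rows, key=lambda r: r[0]):
        first.setdefault(cid, start)  # keep the earliest snapshot's start date
        last[cid] = start             # overwrite until the latest snapshot
    return mean((last[cid] - first[cid]).days for cid in first)

print(status_change_value(snapshots))
print(average_delay_days(snapshots))
```

What I cannot figure out is how to get from these descriptive numbers to the fourth item, the 5-year projection.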
Problems I am having with this data:
It's not time series data but a set of monthly snapshots. I could turn it into a monthly time series by aggregating, for each status and stage, either the total revenue or the number of contracts.
Do I aggregate the contract data or leave it as individual contracts? In the latter case, how do I feed it to a model? It would no longer be time series data.
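The two structures I am choosing between look roughly like this. It is only a sketch under my own assumptions: the rows are invented, and `aggregate_series` and `transition_rows` are hypothetical helper names, not functions from any library.

```python
from collections import defaultdict

# Hypothetical snapshot rows: (snapshot_month, contract_id, status, value)
rows = [
    ("2023-01", "C1", "proposed",  100.0),
    ("2023-01", "C2", "tracked",   250.0),
    ("2023-02", "C1", "submitted", 100.0),
    ("2023-02", "C2", "tracked",   250.0),
    ("2023-03", "C1", "won",       100.0),
    ("2023-03", "C2", "submitted", 250.0),
]

def aggregate_series(rows):
    """Option A: one monthly total-value series per status."""
    series = defaultdict(lambda: defaultdict(float))
    for month, _, status, value in rows:
        series[status][month] += value
    return {status: dict(sorted(months.items())) for status, months in series.items()}

def transition_rows(rows):
    """Option B: one row per contract per pair of consecutive snapshots."""
    history = defaultdict(list)
    for month, cid, status, value in sorted(rows):  # sorted by month, then contract id
        history[cid].append((month, status, value))
    out = []
    for cid, snaps in history.items():
        for (_, s0, _), (m1, s1, v1) in zip(snaps, snaps[1:]):
            out.append({
                "contract_id": cid,
                "month": m1,
                "from_status": s0,
                "to_status": s1,
                "value": v1,
            })
    return out

print(aggregate_series(rows)["tracked"])
print(transition_rows(rows)[0])
```

Option A gives one univariate series per status that a time series model could take, but it loses the identity of each contract. Option B keeps every contract but is no longer a time series.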
Main problem with finding the right approach:
I am not sure which approaches to use to answer such different questions. Some values are categorical and some are numerical. I am not sure whether this is a forecasting problem, a 'change in event' prediction problem, or a mix of both.
How do I combine these very different categorical variables with the numerical revenue values in a single model?
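The only way I currently know to put the two kinds of variables side by side is one-hot (dummy) encoding of the categories next to the numerical columns. A minimal sketch, with invented rows and a hypothetical helper `one_hot`; I am not sure this is the right representation for my problem:

```python
# Hypothetical contract rows with two categorical columns and one numerical column
contracts = [
    {"status": "won",  "stage": "submitted", "value": 100.0},
    {"status": "lost", "stage": "engaged",   "value": 250.0},
    {"status": "won",  "stage": "engaged",   "value": 75.0},
]

def one_hot(rows, categorical, numerical):
    """Return (column_names, matrix) with one 0/1 column per category level."""
    # Collect the sorted set of levels observed for each categorical column
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    names = [f"{c}={lv}" for c in categorical for lv in levels[c]] + list(numerical)
    matrix = [
        [1.0 if r[c] == lv else 0.0 for c in categorical for lv in levels[c]]
        + [float(r[n]) for n in numerical]
        for r in rows
    ]
    return names, matrix

names, matrix = one_hot(contracts, ["status", "stage"], ["value"])
print(names)
for row in matrix:
    print(row)
```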
Methods I looked into:
I have read about forecasting models like ARIMA (mostly applied to sales data). ARIMA takes a time series and forecasts revenue from historical values. I am not sure it is valid here, because my contracts change status over time and I don't know how to represent that in an ARIMA model, or whether it is even necessary to do so. I am also not sure whether there is any seasonality in the data. Winning or losing a contract is not a seasonal event.
I also looked into Simple Exponential Smoothing (SES) and Holt-Winters Exponential Smoothing (HWES) examples and ran into the same issue when trying to calculate average delays or forecast future revenue: my data is not univariate.
I looked at the following questions and answers:
- https://stats.stackexchange.com/questions/246151/difference-between-time-series-prediction-vs-point-process-prediction: this made me think that maybe my problem is not a time series prediction problem.
- "Best Approach to Forecasting Numerical Value Based on time series and categorical data?": this made me think that maybe I should look into RNNs and LSTMs.
- "How do you predict a continuous variable when all your independent variables are categorical": or maybe my problem is similar to this one.
I am sorry for the long post. I am trying to make the problem as clear as possible. I have no idea what the best approach to this problem would be or what data to feed to the model. I am also unsure how to structure the data to make the best use of all the variables.
I would be grateful if you could suggest any good methods or reading resources so that I can answer these questions. Thank you for your time!