How can one generate future forecasts from probabilistic events?

I have an event "whether an item sold will be returned or not" which I can predict with a certain probability based on information gathered at the time that the purchase occurs (product features, customer information, time and place, etc). So:

P(Return | transaction information) = x% for a specific unit sold

I also have a historical time series of total units sold for that item, and a future forecast of sales of that item over the next few weeks. Assuming I gave the relevant transaction data for each historical sale that occurred, is there away to generate a future forecast from the return probability, so that I can state with some confidence that I will get 15% total returns on the item next week, 10% the week after etc?

Topic forecast time-series

Category Data Science


One potential idea might be to approach this as a classification problem: manually label returns and train a Logistic Regression with a probability output. Now you can give a probabilistic answer.


You may wish to do some data aggregation.
for each week and for each item unit, aggregate all of its records, such that you could define how many are returned, also you need to expose transactions,in such a way it could infer those numbers of returns, following that formulate the problem as time series, you have time steps defined as weeks, preprocess the data by computing the relative value of return units instead of absolute ones, apply z-score normalization, then auto-encode the data by feeding it to RNN like models to predict the value at timestep t+1 given previous t timesteps.


If you have historical data on percentage of return for few weeks after the sales, I believe you can do it. It will be a multiple-output regression problem e.g. you will have to formulate the target values (i.e. output matrix) in a way so that %-return in first week will be in column one, %-return in second week will be in columns two and so on. After this, plug in all your input variables and run a multivariate multiple output regression algorithm. Such an algorithm on scikit-learn can be found (sklearn.multioutput.MultiOutputRegressor: here)

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.