How to predict/forecast a street's traffic based on previous values?

I have a dataset which has the following 5 columns:

date, hour, day_of_week, street_id, counts

My dataset has information about the number of cars that each street (same city) has in a given hour of a certain date, and I want to predict the traffic count that a certain street has in a given hour of a certain date.

I think I could use certain variables depending on the day and hour that I want to predict. For example, if I want to predict the traffic count for a working Wednesday:

  1. Results of other working days

  2. Results of other Wednesdays

  3. ...
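The two candidate predictors listed above can be sketched in plain Scala before moving anything to Spark. This is only an illustration under stated assumptions: the `Obs` case class, the `HistoryFeatures` object and the convention that `day_of_week` runs from 1 (Monday) to 7 (Sunday) are hypothetical and do not come from the dataset description.

```scala
// Hypothetical record type mirroring the five columns of the dataset.
case class Obs(date: String, hour: Int, dayOfWeek: Int, streetId: Int, counts: Double)

object HistoryFeatures {
  // Assumption: dayOfWeek uses 1 = Monday ... 7 = Sunday, so 1 to 5 are working days.
  def isWorkingDay(dayOfWeek: Int): Boolean = dayOfWeek >= 1 && dayOfWeek <= 5

  // Mean of a sequence, or None when there is no history to average.
  private def mean(xs: Seq[Double]): Option[Double] =
    if (xs.isEmpty) None else Some(xs.sum / xs.size)

  // For one street and hour, returns two averages of past counts:
  // (mean over all working days, mean over days with the same dayOfWeek).
  def features(history: Seq[Obs], streetId: Int, hour: Int, dayOfWeek: Int): (Option[Double], Option[Double]) = {
    val sameSlot = history.filter(o => o.streetId == streetId && o.hour == hour)
    val workingMean = mean(sameSlot.filter(o => isWorkingDay(o.dayOfWeek)).map(_.counts))
    val weekdayMean = mean(sameSlot.filter(_.dayOfWeek == dayOfWeek).map(_.counts))
    (workingMean, weekdayMean)
  }
}
```

Calling `HistoryFeatures.features(history, streetId, hour, dayOfWeek)` gives two numbers that could be fed to a regression model as input columns. An empty history yields `None` rather than a misleading zero.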

I want to use Spark MLlib to perform the prediction because I have experience with Spark and I have large datasets.

How do you deal with this kind of problem?

Any ideas?



This looks like a time series problem: based on a variable's past values, you try to predict its future values.
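To make "predicting from past values" concrete, here is a minimal sketch of a seasonal naive forecast for a single street: each future hour is predicted with the count observed in the same hour one week earlier. This is a simple baseline to compare other models against, not ARIMA; the `SeasonalNaive` object and the weekly period of 168 hours are assumptions made for illustration.

```scala
object SeasonalNaive {
  // Hours in one week: the assumed seasonal period for hourly traffic counts.
  val Period: Int = 24 * 7

  // Forecasts the next `horizon` points by repeating the values of the last full period.
  // `series` must be ordered by time, have no gaps, and hold at least `period` points.
  def forecast(series: Vector[Double], horizon: Int, period: Int = Period): Vector[Double] = {
    require(series.length >= period, s"need at least $period observations")
    val n = series.length
    Vector.tabulate(horizon) { h =>
      // Same seasonal position as step h, taken from the last observed period.
      series(n - period + (h % period))
    }
  }
}
```

With one ordered vector of hourly counts per street, `SeasonalNaive.forecast(counts, 24)` would return the next day's 24 hourly predictions. If a more elaborate model cannot beat this baseline on held-out data, the extra complexity is probably not paying off.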

Time series forecasting is usually an "unheard of" problem with Spark, but you are in luck: the spark-ts library seems to do what you need, so you don't need to code your own using MLlib. I recommend you try it out and then circle back to MLlib if things don't work.

The library introduces a TimeSeriesRDD; once you encode your data in this data structure (note that it still behaves like a normal RDD), you can play around with the available models. For example, fitting an ARIMA model would be as simple as:

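import com.cloudera.sparkts.models.ARIMA
val arimaModel = ARIMA.fitModel(1, 0, 1, ts) // orders p = 1, d = 0, q = 1; ts holds the observations of one series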

Hope that helps!
