Time Series Generation - Multi Dimensional Time Series Data

Disclaimer: Mathematicians please don't be mad at me for the use of some of the terminologies in this post. I am an Engineer. :-)

Background: I am currently working on a problem where I have to generate a time series of a process in which n actors move in a 2D space, with the process being learned by some machine learning model M. I don't know if this is even possible.

BTW, I have never worked with time series data, but I have good experience training models on images and signals that have no sequence component, so I have been reading up on it as I go.

So, to start off with something very simple, I took a football player position dataset from: Here. I am trying to model it as a supervised learning problem where I predict the positions of n players at timestamp T, given their positions at timestamp T-1. But I very quickly realised that it wouldn't work, because the positions of the players also depend on the position of the ball and on the positions of the opposing team's players.

Anyway, my questions are as follows:

  1. How do I model the dataset? Will it just be an (N x 2 x No. of timestamps) 3D tensor (N for the players, 2 for the x-position and y-position, and the number of timestamps as the last dimension)?

  2. Is my way of modelling the time series generation problem as a supervised learning problem correct?

  3. What preprocessing steps should I be looking at? Also, how do I handle missing values?

  4. The reason I dropped the idea of using the soccer dataset (Here again) is that it only includes the positions of one team. The other team didn't wear sensors :-( . While reading about the ARIMA model, I came across the idea that exogenous variables also affect the process.

  5. If all this is possible, and I hope it is (because impossible is nothing!), what models should I be looking at? I ultimately have to work on this problem on a very different dataset... I have past experience training neural network models like CNNs and ANNs, feel very comfortable working with neural networks, and would ideally love to use them here. So far my research has pointed me towards LSTMs, RNNs and the ARIMA model.

Please guide me on this, as I'm very new to time series analysis.

Topic generative-models supervised-learning deep-learning time-series machine-learning

Category Data Science


Time series data must contain all your observations recorded against a consistent time index (a bit obvious, I know). If I wanted to test a theory on some model and needed a dataset, its shape would be (no. of timestamps) x (no. of features), where the features include each player's position relative to some origin. A 2D tensor would suffice. My reason for not making a 3D tensor is that it leads to a more complex scenario in which I would have to correlate the third dimension (the number of players, N) with the first two in order to predict. It is better to flatten the positions so that a single row holds multiple labels, since every position matters to the model.
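As a rough illustration of that flattening, here is a minimal sketch that assumes the raw data is already a NumPy array of shape (N, 2, T) as described in the question. The function name `flatten_positions` and the random stand-in data are made up for the example.

```python
import numpy as np

def flatten_positions(positions):
    """Turn an (N, 2, T) array into a (T, 2*N) table.

    Row t holds [x_0, y_0, x_1, y_1, ...] for all players at timestamp t.
    """
    n_players, n_coords, n_steps = positions.shape
    # Move time to the front, then merge the player and coordinate axes.
    return positions.transpose(2, 0, 1).reshape(n_steps, n_players * n_coords)

# Hypothetical stand-in data: 11 players, 500 timestamps.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 100.0, size=(11, 2, 500))
table = flatten_positions(positions)
print(table.shape)
```

With this layout, column 2*i is the x-position of player i and column 2*i + 1 is its y-position, so every player's position sits in the same row for a given timestamp.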

Time series generation generally falls under continuous prediction, where each previously predicted value is taken as an observation for the next step. I would rather treat it as a reinforcement learning problem. Yes, you can work with supervised learning in mind, but try an RL approach as well.
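The supervised framing and the "feed the prediction back in" idea can be sketched roughly as follows. This is only an illustration: the helper names `make_supervised` and `rollout` are invented here, and a plain least-squares linear map stands in for whatever model M you actually train.

```python
import numpy as np

def make_supervised(series, lag=1):
    """Pair the rows at t-lag .. t-1 (flattened) with the row at t."""
    X = np.stack([series[i:i + lag].ravel() for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def rollout(predict, seed, n_steps):
    """Generate n_steps rows, feeding each prediction back as an input.

    seed is a (lag, n_features) array holding the last known observations.
    """
    window = seed.copy()
    generated = []
    for _ in range(n_steps):
        nxt = predict(window.ravel())
        generated.append(nxt)
        # Drop the oldest row and append the prediction as if it were observed.
        window = np.vstack([window[1:], nxt])
    return np.array(generated)

# Toy series: one actor moving on a circle, shape (T, 2).
t = np.arange(200)
series = np.column_stack([np.sin(0.1 * t), np.cos(0.1 * t)])

X, y = make_supervised(series, lag=1)
W, *_ = np.linalg.lstsq(X, y, rcond=None)  # stand-in for model M
generated = rollout(lambda x: x @ W, series[-1:], n_steps=10)
print(generated.shape)
```

Note that errors compound in this kind of rollout, because each step is predicted from earlier predictions rather than from real observations.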

As for the missing values, I would say remove them if they make up no more than 10-15% of the data. There is no fixed rule for that percentage. If there are more than that, fill them in with interpolation or a rolling average (both have worked for me). The rest of the preprocessing depends on the type of data: normalise the data, remove outliers, etc.
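A minimal sketch of that rule of thumb, assuming the data is a (timestamps x features) NumPy array with NaN marking missing values. The function name `fill_missing` and the 15% default are illustrative choices, not a standard.

```python
import numpy as np

def fill_missing(table, max_drop_fraction=0.15):
    """Drop rows with NaNs if they are rare, otherwise interpolate per column."""
    table = np.asarray(table, dtype=float)
    missing_rows = np.isnan(table).any(axis=1)
    if missing_rows.mean() <= max_drop_fraction:
        return table[~missing_rows]
    filled = table.copy()
    steps = np.arange(len(table))
    for col in range(table.shape[1]):
        bad = np.isnan(table[:, col])
        # A column with no valid values at all is left untouched.
        if bad.any() and not bad.all():
            filled[bad, col] = np.interp(steps[bad], steps[~bad], table[~bad, col])
    return filled

# 2 of 5 rows are missing (40%), so the gaps are interpolated.
gappy = np.array([[0.0], [np.nan], [2.0], [np.nan], [4.0]])
filled = fill_missing(gappy)
print(filled.ravel())
```

One caveat with dropping rows: it leaves gaps in the time index, so for models that assume evenly spaced timestamps, interpolation may be the safer choice even when few values are missing.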

Yes, it will affect the data, but for testing you can generate data from sine waves with different fluctuations, or use other functions to generate a signal (scipy preferred).
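For instance, a synthetic multi-actor dataset for testing could be generated along these lines. This is a sketch using plain NumPy; the function name `synthetic_tracks`, the frequency range and the noise level are arbitrary assumptions for illustration.

```python
import numpy as np

def synthetic_tracks(n_actors=3, n_steps=200, noise=0.05, seed=0):
    """Return an (N, 2, T) array of noisy circular tracks, one per actor."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, n_steps)
    # Each actor gets its own frequency and phase, so the tracks differ.
    freqs = rng.uniform(0.5, 2.0, size=n_actors)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_actors)
    angle = freqs[:, None] * t + phases[:, None]
    tracks = np.stack([np.sin(angle), np.cos(angle)], axis=1)
    return tracks + rng.normal(0.0, noise, size=tracks.shape)

tracks = synthetic_tracks()
print(tracks.shape)
```

Because you control how the signal is produced, you know the true process and can check whether a model recovers it before moving on to real data.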

Currently I am testing a bidirectional LSTM-CNN combination for my time series. Yes, ARIMA is good, but a little experimenting won't hurt. I would say go for a combination of a CNN with any kind of RNN for time series.

Hope this helps.
