Summarize events per ID

Data: Each row corresponds to an event (for example, a person's visit to the hospital). Each event has a set of associated attributes (duration of visit, motive, etc.).

Objective: Summarize the above information in a per-person data set, meaning that the new data set should have only one row per person and capture as much information about their history as possible.

My initial solutions: 1 - The most obvious, and potentially useful, is to create relevant variables by hand. For instance, if the objective is to predict the time of the next visit, the average time of past visits is relevant. However, this is very problem-specific, and I feel there should be other options to use alongside it (not as a replacement).

2 - Recurrent neural networks. As visits have a time sequence, it seems logical to me to apply a recurrent autoencoder in order to summarize this data. (If this is not correct, can you point out why?)

3 - What if I don't have a time sequence?
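
To make options 1 and 2 concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the event tuples, the field names (`person_id`, `day`, `duration_min`, `motive`), and the two helper functions `summarize_by_person` and `to_padded_sequences`. The first helper builds hand-made, order-free aggregates (so it would still apply in case 3). The second only prepares time-ordered, padded sequences of the kind a recurrent autoencoder would take as input; it does not implement the autoencoder itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event log: one tuple per visit
# (person_id, day, duration_min, motive). Values are made up.
events = [
    ("p1", 3, 30.0, "checkup"),
    ("p2", 1, 45.0, "injury"),
    ("p1", 1, 50.0, "injury"),
    ("p1", 7, 10.0, "checkup"),
]


def summarize_by_person(rows):
    """Option 1: hand-made, order-free aggregates, one dict per person."""
    grouped = defaultdict(list)
    for person_id, _day, duration, motive in rows:
        grouped[person_id].append((duration, motive))

    summary = {}
    for person_id, visits in grouped.items():
        durations = [d for d, _ in visits]
        motives = [m for _, m in visits]
        summary[person_id] = {
            "n_visits": len(visits),
            "mean_duration": mean(durations),
            "max_duration": max(durations),
            "n_distinct_motives": len(set(motives)),
        }
    return summary


def to_padded_sequences(rows, pad_value=0.0):
    """Input preparation for option 2: durations sorted by day for each
    person, right-padded to a common length, plus a 1/0 mask marking
    real versus padded steps."""
    grouped = defaultdict(list)
    for person_id, day, duration, _motive in rows:
        grouped[person_id].append((day, duration))

    max_len = max(len(v) for v in grouped.values())
    sequences, masks = {}, {}
    for person_id, visits in grouped.items():
        ordered = [duration for _day, duration in sorted(visits)]
        pad = max_len - len(ordered)
        sequences[person_id] = ordered + [pad_value] * pad
        masks[person_id] = [1] * len(ordered) + [0] * pad
    return sequences, masks


summary = summarize_by_person(events)
sequences, masks = to_padded_sequences(events)
```

Here `summary` has one entry per person, which is the shape described in the objective, while `sequences` keeps the full ordered history for a sequence model to compress.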

