Detect time pattern in sequence of events

I have a time series with a timestamp and an associated event:

Time Event
1 A
2 B
3 C
T A

I was wondering if there is a technique/method to figure out which events most often precede others in a given time window. For example: in the 10 seconds before A happens, B and C occur with a 90% chance. Ideally the output should be the sequences that occur most often.

So far I have thought of using some sort of clustering, though a cluster would not suffice as I need to get the time direction right.

Does anyone have an idea?

Thanks

Topic sequential-pattern-mining unsupervised-learning time-series clustering

Category Data Science


This is a very cool question! Clearly clustering won't work because of the time-dependent relationships, as you said. I came up with the following algorithm:

  1. Keep your time series data as a list
  2. Create a sliding window, i.e. another list/array that covers your time frame. E.g. for 10 seconds this would be an array of length 10, which holds the 5 points before the current point, the current point itself, and the 4 points after it (this assumes one event per second)
  3. Count which events occur in the window, and hold them in a variable if they occur before/after your event of interest

Let me write a prototypical Python script:

### You can expand these variables depending on the types of events you have
number_of_events_b = 0
number_of_events_c = 0
### I assume your data is given as a list of event labels in the variable time_series_data
for i in range(5, len(time_series_data)):
    current = time_series_data[i]
    ### Window of up to 10 points: 5 before, the current one, and 4 after
    time_frame = time_series_data[i-5:i+5]
    ### If your event is A, count B and C
    if current == "A":
        for j in time_frame:
            if j == "B":
                number_of_events_b += 1
            elif j == "C":
                number_of_events_c += 1

### If you want the relative frequency of each, you can simply compute it from
### the counts, so I won't overcomplicate the answer
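
The relative frequencies mentioned in that last comment could be computed roughly as follows. This is only a sketch under assumptions: the function name `preceding_event_rates`, the sample list, and the choice to look only at the positions before the target (and to count each event at most once per window) are mine, not part of the answer above.

```python
from collections import Counter

def preceding_event_rates(events, target="A", window=5):
    """Fraction of `target` occurrences that had each event type
    at least once in the `window` positions before them."""
    hits = Counter()
    n_targets = 0
    for i, event in enumerate(events):
        if event != target:
            continue
        n_targets += 1
        # Use a set so an event counts only once per window
        before = set(events[max(0, i - window):i])
        hits.update(before)
    if n_targets == 0:
        return {}
    return {e: count / n_targets for e, count in hits.items()}

# Hypothetical sample data, one event per time step
sample = ["B", "C", "A", "D", "B", "A", "C", "B", "A", "D"]
rates = preceding_event_rates(sample, target="A", window=2)
print(rates)
```

With this sample, B shows up in the two positions before every A, so its rate is 1.0, which is the kind of "B occurs before A with X% chance" statement the question asks for.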

This code counts which events occur around the event of interest, i.e. what precedes it and what follows it. If you want to know which sequences occur most often, that is easy too: you can just store the event windows as a matrix, one row per window. Like:

### Store the sequences as a matrix (one window per row)
sequences = []
### Again, I assume your data is given in the variable time_series_data
for i in range(5, len(time_series_data)):
    current = time_series_data[i]
    time_frame = time_series_data[i-5:i+5]
    ### If your event is A, store the window around it
    if current == "A":
        ### Store as a tuple: lists are not hashable and cannot be dict keys
        sequences.append(tuple(time_frame))

### Now count how often each sequence occurs
sequence_frequencies = {}  ### Store as a dict for later reference
for sequence in sequences:
    sequence_count = 0
    for other_seq in sequences:
        if sequence == other_seq:
            sequence_count += 1
    sequence_frequencies[sequence] = sequence_count

I believe you can then find the most frequent sequences using dict methods in Python. I am a bit rusty on my coding and my process is often iterative, so if any syntax errors occur or the algorithm falls short, note that this is only a rough guideline.
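
For picking out the most frequent sequences, `collections.Counter` from the standard library is one option. The sketch below is a variation, not the exact method above: it assumes the data is a list of `(timestamp, event)` pairs sorted by time, and it uses real timestamps for the window instead of list positions. The name `most_common_preceding` and the sample records are made up for illustration.

```python
from collections import Counter

def most_common_preceding(records, target="A", window=10, top=3):
    """records: list of (timestamp, event) pairs sorted by timestamp.
    Returns the `top` most frequent event sequences seen in the
    `window` time units before each occurrence of `target`."""
    counts = Counter()
    for idx, (t, event) in enumerate(records):
        if event != target:
            continue
        # Events with t - window <= time < t, kept in time order.
        # Scanning records[:idx] is simple but quadratic in the worst case.
        seq = tuple(e for (s, e) in records[:idx] if t - window <= s < t)
        counts[seq] += 1
    return counts.most_common(top)

# Hypothetical sample records
records = [(1, "B"), (2, "C"), (3, "A"),
           (20, "B"), (21, "C"), (22, "A"),
           (40, "C"), (41, "A")]
top_sequences = most_common_preceding(records, target="A", window=10)
print(top_sequences)
```

Because the sequences are stored as tuples, order is preserved, so ("B", "C") and ("C", "B") are counted separately. That keeps the time direction the question cares about.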
