What is the formal category of problem described by identifying consecutive occurrences of attributes in records?

Question

Recalcitrant Caprine

May 5, 2022 09:01

Apologies for the garbled title; I'd really need to know the answer to the question before I could phrase it properly.

Let's imagine I've got a data set of football (soccer, if you prefer) match results.

Let's further imagine that each result has the following attributes:

Date
Venue
Team
Opponent
Home Team Goals
Away Team Goals
Result
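A minimal sketch of one such record in Python. It assumes `result` is stored as 'W', 'D' or 'L' from the point of view of `team` (the attribute list doesn't say, so this is an assumption). Because every match involves two teams, a helper that reads the result from either side is useful later when computing streaks for both the team and the opponent. `MatchResult` and `result_for` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record type mirroring the attributes listed above.
# Assumption: `result` is 'W', 'D' or 'L' from the point of view of `team`.
@dataclass(frozen=True)
class MatchResult:
    match_date: date
    venue: str
    team: str
    opponent: str
    home_goals: int
    away_goals: int
    result: str

_FLIP = {"W": "L", "D": "D", "L": "W"}  # same match seen from the other side

def result_for(match: MatchResult, side: str) -> str:
    """Return the result from `side`'s point of view (team or opponent)."""
    if side == match.team:
        return match.result
    if side == match.opponent:
        return _FLIP[match.result]
    raise ValueError(f"{side!r} did not play in this match")

m = MatchResult(date(2022, 4, 30), "X", "Y", "Z", 2, 1, "W")
print(result_for(m, "Y"), result_for(m, "Z"))  # W L
```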

Then let's consider a future match, for which we know some attributes but not all (obviously, because it hasn't happened yet):

Date - W
Venue - X
Team - Y
Opponent - Z

Given the future match and the set of results, I want to produce interesting pieces of information that are relevant to that match. Judging what is interesting will probably remain something of a manual step, so the automated part is really finding ALL sequences so that the interesting ones can then be picked out.

For example:

Team Y have won their last 3 games
Team Z have lost their last 3 games
Team Y have won their last 2 games against Team Z
Team Y have won their last 6 games against Team Z at Venue X

These examples are trivial, but the trick I am looking for is to compose the qualifying criteria algorithmically, e.g. "Team Y" or "Team Y against Team Z".
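One way to compose the criteria algorithmically is to treat every fixture attribute other than the subject team (here the opponent and the venue) as an optional filter, enumerate every subset of those filters, and compute the current run of identical results under each one. The sketch below assumes each past record is a dict with hypothetical keys `date`, `venue`, `team`, `opponent` and `result` (W/D/L from `team`'s point of view); the data is invented for illustration. With k optional attributes this gives 2^k filters per team, which stays small for a handful of attributes.

```python
from itertools import combinations

FLIP = {"W": "L", "D": "D", "L": "W"}  # result seen from the other side

def current_streak(results):
    """Return (value, length) of the run at the start of `results` (most recent first)."""
    if not results:
        return None, 0
    length = 0
    for r in results:
        if r != results[0]:
            break
        length += 1
    return results[0], length

def all_streaks(history, fixture):
    """Current streak for each subject team under every subset of optional criteria."""
    newest_first = sorted(history, key=lambda m: m["date"], reverse=True)
    out = []
    for subject, other in ((fixture["team"], fixture["opponent"]),
                           (fixture["opponent"], fixture["team"])):
        for k in range(3):  # use 0, 1 or 2 of the optional criteria
            for keys in combinations(("opponent", "venue"), k):
                results = []
                for m in newest_first:
                    if m["team"] == subject:
                        opp, res = m["opponent"], m["result"]
                    elif m["opponent"] == subject:
                        opp, res = m["team"], FLIP[m["result"]]
                    else:
                        continue  # subject did not play in this match
                    if "opponent" in keys and opp != other:
                        continue
                    if "venue" in keys and m["venue"] != fixture["venue"]:
                        continue
                    results.append(res)
                value, length = current_streak(results)
                if length:
                    out.append((subject, keys, value, length))
    return out

# Hypothetical data; ISO date strings sort chronologically as plain strings.
history = [
    {"date": "2022-03-01", "venue": "X", "team": "Y", "opponent": "Z", "result": "W"},
    {"date": "2022-03-15", "venue": "V", "team": "Z", "opponent": "Y", "result": "L"},
    {"date": "2022-04-01", "venue": "X", "team": "Y", "opponent": "A", "result": "W"},
    {"date": "2022-04-10", "venue": "B", "team": "Z", "opponent": "A", "result": "L"},
]
fixture = {"date": "2022-05-07", "venue": "X", "team": "Y", "opponent": "Z"}

for subject, keys, value, length in all_streaks(history, fixture):
    print(subject, keys or "(any)", value, length)
```

With this sample data the output includes Y on a 3-game winning run overall and Z on a 3-game losing run overall, the same shape as the first two example statements in the question.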

I don't think it's relevant to the question, but three heuristics for semi-automating the selection of 'interesting' sequences from the set of all sequences will be:

Preferring sequences that have occurred the fewest times previously (so "Team Y has won 3 games in a row for the first time" supersedes "Team Y has won 3 games in a row for the third distinct time")
Preferring the most general sequence of the same length (so "Team Y has won 3 games in a row" supersedes "Team Y has won 3 games in a row against Team Z")
Preferring sequences of greater length
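The three heuristics can be folded into a single sort key. This is a sketch under assumptions: the question doesn't say how the heuristics rank against each other, so the key below applies them as rarity first, then length, then generality (generality only decides between equal lengths, as the second heuristic states). A "distinct time" is interpreted as a separate, earlier run that reached at least the same length. `Candidate`, `count_prior_runs` and `rank` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    description: str
    criteria: tuple   # optional filters used, e.g. ("opponent",); fewer = more general
    length: int       # length of the current streak
    prior_runs: int   # earlier, separate runs that reached at least this length

def count_prior_runs(results, value, length):
    """Count completed runs of `value` with at least `length` items in
    `results` (oldest first); the run still in progress at the end is ignored."""
    runs, count = 0, 0
    for r in results:
        if r == value:
            count += 1
        else:
            if count >= length:
                runs += 1
            count = 0
    return runs

def rank(candidates):
    """Assumed precedence: fewest prior runs, then longest, then most general."""
    return sorted(candidates,
                  key=lambda c: (c.prior_runs, -c.length, len(c.criteria)))

history_y = ["W", "W", "W", "L", "W", "W", "W"]  # hypothetical, oldest first
print(count_prior_runs(history_y, "W", 3))  # 1, so the current run is the second

candidates = [
    Candidate("Team Y have won their last 3 games", (), 3, 1),
    Candidate("Team Y have won their last 3 games against Team Z", ("opponent",), 3, 0),
    Candidate("Team Y have won their last 2 games at Venue X", ("venue",), 2, 0),
    Candidate("Team Y have won their last 3 games against Team Z at Venue X",
              ("opponent", "venue"), 3, 0),
]
for c in rank(candidates):
    print(c.prior_runs, c.length, c.description)
```

Because Python's tuple comparison is lexicographic, reordering the heuristics is just a matter of reordering the fields in the key.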

I feel absolutely certain this must be a common category of problem with common algorithms and tools, but when I try to google it I'm not getting any useful results, presumably because I'm using the wrong terminology. Whenever I look for anything related to sequence detection, I get information about sequence databases, and that's not really what I have: I've got something more akin to a transaction database of itemsets.

Can anyone give me some guidance on:

Terminology for this type of problem (so that I can use this information to identify...)
Common algorithms used to tackle it
Common tools used to tackle it

Topics: sequence, data-mining

Category: Data Science

lpounng · Accepted Answer · May 4, 2022 03:27

What you describe is feature engineering, or more specifically hand-crafting features. Sequence detection is indeed the term for crafting features like "team X lost their last 5 games". If you think about it, ordering transaction data by time gives you sequential data.
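To illustrate the point about ordering: once the records are sorted by date, a streak feature for every match can be computed in a single pass, using only matches played before it, so the same feature is also available for a future fixture. This is a stdlib-only sketch that assumes a hypothetical "long" layout with one (team, date, result) row per team per match and ISO date strings; `losing_streak_before` is an illustrative name.

```python
from collections import defaultdict

def losing_streak_before(rows):
    """Return (team, date, n) tuples where n is the number of consecutive
    losses the team had going into that match ('lost their last n games')."""
    losses = defaultdict(int)  # team -> current losing run
    features = []
    for team, day, result in sorted(rows, key=lambda r: r[1]):  # time order
        features.append((team, day, losses[team]))  # feature uses past matches only
        losses[team] = losses[team] + 1 if result == "L" else 0
    return features

# Hypothetical rows, deliberately out of order.
rows = [
    ("A", "2022-04-09", "L"),
    ("A", "2022-04-02", "L"),
    ("A", "2022-04-16", "W"),
    ("A", "2022-04-23", "L"),
    ("B", "2022-04-02", "W"),
]
for feature in losing_streak_before(rows):
    print(feature)
```

Recording the feature before updating the counter is the important design choice: it avoids leaking a match's own result into the feature used to describe or predict it.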

Market basket analysis and association rule mining are classical methods for this kind of problem. In modern deep learning, you would look at the class of sequential models, e.g. RNNs and LSTMs.
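For the association-rule view, the two basic measures are support and confidence. A minimal sketch, assuming each match is encoded as a transaction of hypothetical "attribute=value" items from one team's point of view (the data is invented). Note that plain association rules treat each transaction as an unordered set, so on their own they capture statements like "Y usually beats Z" rather than "Y won its last N games". One option is to add order-derived items such as a previous-result field.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A | C) / support(A): estimated P(consequent | antecedent)."""
    a = support(transactions, antecedent)
    if not a:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / a

transactions = [frozenset(t) for t in (
    {"team=Y", "opponent=Z", "venue=X", "result=W"},
    {"team=Y", "opponent=Z", "venue=V", "result=W"},
    {"team=Y", "opponent=A", "venue=X", "result=L"},
    {"team=Z", "opponent=A", "venue=B", "result=L"},
)]

print(support(transactions, {"team=Y", "opponent=Z"}))                     # 0.5
print(confidence(transactions, {"team=Y", "opponent=Z"}, {"result=W"}))   # 1.0
```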
