Good "frequent sequence mining" packages in Python?

Question

Edamame

April 8, 2021, 16:36

Has anyone used (and liked) any good "frequent sequence mining" packages in Python other than the FPM in MLlib? I am looking for a stable package, preferably one that is still maintained. Thank you!

Topic sequential-pattern-mining python

Category Data Science

skadio · Accepted Answer · April 8, 2021, 16:36

To complement some of the great answers/libraries:

Seq2Pat: Sequence-to-Pattern Generation Library might be relevant to your case.

The library is written in Cython to take advantage of a fast C++ backend with a high-level Python interface. It supports constraint-based frequent sequential pattern mining.

Here is an example that shows how to mine a sequence database while constraining the average price of the patterns found.

# Example to show how to find frequent sequential patterns
# from a given sequence database subject to constraints
from sequential.seq2pat import Seq2Pat, Attribute

# Seq2Pat over 3 sequences
seq2pat = Seq2Pat(sequences=[["A", "A", "B", "A", "D"],
                             ["C", "B", "A"],
                             ["C", "A", "C", "D"]])

# Price attribute corresponding to each item
price = Attribute(values=[[5, 5, 3, 8, 2],
                          [1, 3, 3],
                          [4, 5, 2, 1]])

# Average price constraint
seq2pat.add_constraint(3 <= price.average() <= 4)

# Patterns that occur at least twice (A-D)
patterns = seq2pat.get_patterns(min_frequency=2)

Notice that sequences can be of different lengths, and you can add or drop other Attributes and Constraints. The items in a sequence can be strings, as in the example, or integers.
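To make the average constraint concrete, here is a brute-force sketch in plain Python that checks the A-D result by hand. It does not use Seq2Pat and is not how the library works internally. The helper names `supports` and `constrained_frequency` are hypothetical, and the sketch assumes that a sequence counts toward a pattern's frequency when at least one occurrence of the pattern in it satisfies the constraint.

```python
from itertools import combinations

sequences = [["A", "A", "B", "A", "D"],
             ["C", "B", "A"],
             ["C", "A", "C", "D"]]
prices = [[5, 5, 3, 8, 2],
          [1, 3, 3],
          [4, 5, 2, 1]]


def supports(sequence, values, pattern, low, high):
    """Return True if some occurrence of `pattern` as a (possibly
    non-contiguous) subsequence has an average value in [low, high]."""
    for positions in combinations(range(len(sequence)), len(pattern)):
        if all(sequence[i] == item for i, item in zip(positions, pattern)):
            average = sum(values[i] for i in positions) / len(pattern)
            if low <= average <= high:
                return True
    return False


def constrained_frequency(pattern, low, high):
    """Count the sequences with at least one qualifying occurrence."""
    return sum(supports(s, v, pattern, low, high)
               for s, v in zip(sequences, prices))


# A-D qualifies in the first sequence (prices 5 and 2, average 3.5)
# and in the third (prices 5 and 1, average 3.0).
print(constrained_frequency(["A", "D"], 3, 4))
```

Enumerating every index combination is exponential in the pattern length, so this is only useful for sanity-checking small examples like the one above.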

The underlying algorithm uses Multi-valued Decision Diagrams, in particular the state-of-the-art algorithm from AAAI 2019.

Hope this helps!

Disclaimer: I am a member of the research collaboration between Fidelity & CMU on the Seq2Pat Library.

HonzaB · Accepted Answer · August 4, 2020, 17:00

Have you considered writing it yourself? There is probably no up-to-date, maintained library right now.

Check this out for the basics. PrefixSpan and closed/maximal pattern mining are actually not that hard to implement.
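As a rough illustration of that point, here is a minimal PrefixSpan sketch in plain Python. It is not taken from the resource linked above. It assumes each sequence is a list of single items (no itemsets), counts support per sequence rather than per occurrence, and makes no attempt at closed or maximal patterns. The function name `prefixspan` is just a label for this sketch.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # `projected` holds one (sequence index, start position) pair per
        # sequence containing `prefix`; start is just past the earliest match.
        first_occurrence = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                # Keep only the earliest position of each item per sequence.
                first_occurrence[seq[pos]].setdefault(seq_id, pos + 1)
        for item, hits in sorted(first_occurrence.items()):
            if len(hits) >= min_support:
                pattern = prefix + [item]
                results.append((pattern, len(hits)))
                # Recurse on the database projected onto the longer prefix.
                mine(pattern, list(hits.items()))

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


db = [["A", "A", "B", "A", "D"],
      ["C", "B", "A"],
      ["C", "A", "C", "D"]]

for pattern, support in prefixspan(db, min_support=2):
    print(pattern, support)
```

The recursion grows one item at a time and only scans the suffixes that follow the current prefix, which is the core idea of PrefixSpan. A production implementation would also need to handle itemset elements, large inputs, and pruning for closed patterns.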

Chuancong Gao · Accepted Answer · August 4, 2020, 17:00

I am actively maintaining an efficient implementation of both PrefixSpan and BIDE in Python 3, supporting mining both frequent and top-k (closed) sequential patterns.

Samaneh Saadat · Accepted Answer · August 4, 2020, 16:59

SPMF sounds like a useful library for pattern mining.

yossico · Accepted Answer · August 4, 2020, 13:23

The only Python package I've found is on GitHub.

It includes an implementation of BIDE, but the code is not maintained.

Lorenz Leitner · Accepted Answer · April 22, 2020, 18:09

Since none of the existing solutions was satisfactory for me, I created my own Python wrapper for SPMF (the Java library mentioned in other answers here).

Jed · Accepted Answer · September 11, 2017, 18:11

I've used fim's fpgrowth function in the past and it worked well. It's kind of a pain to install on Windows machines, however. It seems to be hosted on an academic website, so I'm not sure whether the code gets many updates over time.
