Good "frequent sequence mining" packages in Python?

Has anyone used (and liked) any good "frequent sequence mining" packages in Python other than the FPM in MLlib? I am looking for a stable package, preferably one that is still maintained. Thank you!

Topic sequential-pattern-mining python

Category Data Science


To complement some of the great answers/libraries:

Seq2Pat: Sequence-to-Pattern Generation Library might be relevant to your case.

The library is written in Cython to take advantage of a fast C++ backend with a high-level Python interface. It supports constraint-based frequent sequential pattern mining.

Here is an example that shows how to mine a sequence database while respecting an average constraint for the prices of the patterns found.

```python
# Example to show how to find frequent sequential patterns
# from a given sequence database subject to constraints
from sequential.seq2pat import Seq2Pat, Attribute

# Seq2Pat over 3 sequences
seq2pat = Seq2Pat(sequences=[["A", "A", "B", "A", "D"],
                             ["C", "B", "A"],
                             ["C", "A", "C", "D"]])

# Price attribute corresponding to each item
price = Attribute(values=[[5, 5, 3, 8, 2],
                          [1, 3, 3],
                          [4, 5, 2, 1]])

# Average price constraint
seq2pat.add_constraint(3 <= price.average() <= 4)

# Patterns that occur at least twice (A-D)
patterns = seq2pat.get_patterns(min_frequency=2)
```

Notice that sequences can be of different lengths, and you can add or drop other Attributes and Constraints. The items can be strings, as in the example, or integers.
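To make the semantics of the example concrete, here is a brute-force, pure-Python sketch that computes the same kind of result on tiny inputs. It is not Seq2Pat and does not use its decision-diagram algorithm. It assumes (1) a sequence supports a pattern if at least one occurrence of the pattern meets the average constraint, and (2) only patterns with at least two items are reported, which is what the "(A-D)" comment in the example suggests. The function name `mine_with_average` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_with_average(sequences, prices, lo, hi, min_frequency, min_length=2):
    """Brute-force constrained sequential pattern mining (illustration only).

    Assumption: a sequence supports a pattern if at least one embedding of
    the pattern has an average price in [lo, hi]. Exponential runtime.
    """
    support = Counter()
    for seq, price in zip(sequences, prices):
        found = set()  # each pattern counts at most once per sequence
        for k in range(min_length, len(seq) + 1):
            # Every choice of k ordered positions is one embedding of a pattern
            for idx in combinations(range(len(seq)), k):
                avg = sum(price[i] for i in idx) / k
                if lo <= avg <= hi:
                    found.add(tuple(seq[i] for i in idx))
        support.update(found)
    # Keep patterns meeting the frequency threshold, sorted for stable output
    return sorted((p, n) for p, n in support.items() if n >= min_frequency)


sequences = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
prices = [[5, 5, 3, 8, 2], [1, 3, 3], [4, 5, 2, 1]]
print(mine_with_average(sequences, prices, lo=3, hi=4, min_frequency=2))
# [(('A', 'D'), 2)]
```

Enumerating every index combination blows up quickly with sequence length, which is exactly why dedicated algorithms such as the one behind Seq2Pat are worth using.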

The underlying algorithm uses Multi-valued Decision Diagrams (MDDs), in particular the state-of-the-art algorithm from AAAI 2019.

Hope this helps!

Disclaimer: I am a member of the research collaboration between Fidelity & CMU on the Seq2Pat Library.


Have you considered writing it yourself? There is probably no up-to-date, maintained library right now.

Check this out; it covers the basics. PrefixSpan and closed/maximal patterns are actually not that hard to implement.
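As a rough illustration of how small a basic PrefixSpan can be, here is a minimal sketch for the simple case where every event holds a single item (no itemsets, no gap or length constraints). The function name `prefixspan` and the `{pattern: support}` return shape are my own choices, not any library's API.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Mine all frequent sequential patterns from a list of item sequences.

    Returns {pattern_tuple: support}, where support counts distinct sequences.
    Each event is a single item; itemsets are not handled in this sketch.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each sequence that remains after matching the current prefix
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):  # count once per sequence
                counts[item] += 1
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Project on the extended prefix: keep the suffix after the
            # first occurrence of `item` in each projected suffix
            new_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


db = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
for pattern, support in sorted(prefixspan(db, min_support=2).items()):
    print(pattern, support)
```

Storing (index, offset) pairs instead of copying suffixes is the usual "pseudo-projection" trick, and growing patterns only from items that actually occur in the projected suffixes is what lets PrefixSpan avoid generating candidates that never appear in the data.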


I am actively maintaining an efficient implementation of both PrefixSpan and BIDE in Python 3, supporting mining both frequent and top-k (closed) sequential patterns.
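For anyone weighing closed versus plain frequent patterns, here is a naive sketch of what "closed", "maximal" and "top-k" mean, written as a post-filter over a `{pattern: support}` dictionary. BIDE mines closed patterns directly without this kind of post-processing, so this is only a definition check, not BIDE; all function names are hypothetical. The input dictionary holds the frequent patterns (minimum support 2, no constraint) of the three sequences from the Seq2Pat example, written out by hand.

```python
def is_subsequence(short, long):
    """True if `short` occurs in `long` in order, gaps allowed."""
    it = iter(long)
    return all(item in it for item in short)  # `in` advances the iterator


def closed_patterns(patterns):
    """Keep patterns with no proper super-sequence of equal support."""
    return {
        p: s for p, s in patterns.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in patterns.items())
    }


def maximal_patterns(patterns):
    """Keep patterns with no frequent proper super-sequence at all."""
    return {
        p: s for p, s in patterns.items()
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in patterns)
    }


def top_k(patterns, k):
    """The k highest-support patterns; ties broken by pattern order."""
    return sorted(patterns.items(), key=lambda ps: (-ps[1], ps[0]))[:k]


# Frequent patterns of AABAD, CBA and CACD at minimum support 2
frequent = {("A",): 3, ("B",): 2, ("C",): 2, ("D",): 2,
            ("A", "D"): 2, ("B", "A"): 2, ("C", "A"): 2}
closed = closed_patterns(frequent)
print(closed)
print(maximal_patterns(frequent))
print(top_k(closed, 2))
```

Checking closedness only against the frequent set is enough, because any super-sequence with the same support as a frequent pattern is itself frequent.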


SPMF sounds like a useful library for pattern mining.


The only Python package I've found is on GitHub.

It includes an implementation of BIDE, but the code is not maintained.


Since none of the existing solutions were satisfactory for me, I created my own Python Wrapper for SPMF (the Java library mentioned in other answers here).
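A wrapper like this mostly has to translate data into SPMF's text format, call the Java program, and parse the results. The sketch below shows those three steps; it is not the API of the wrapper mentioned above. It assumes SPMF's sequence-database format (positive integer items, `-1` closing each itemset, `-2` closing each sequence), its `#SUP:` annotation on result lines, and its `java -jar spmf.jar run <Algorithm> <input> <output> <params>` command line; please check these against the SPMF documentation for the algorithm you use. All function names are hypothetical, and each event is assumed to hold a single item.

```python
import subprocess


def encode_spmf(sequences):
    """Render sequences in SPMF's sequence-database text format.

    Items are mapped to positive integers; each event holds a single item
    (assumption), so every item is followed by -1 and every line ends in -2.
    """
    item_ids = {}
    lines = []
    for seq in sequences:
        tokens = []
        for item in seq:
            item_id = item_ids.setdefault(item, len(item_ids) + 1)
            tokens += [str(item_id), "-1"]
        tokens.append("-2")
        lines.append(" ".join(tokens))
    return "\n".join(lines) + "\n", item_ids


def decode_spmf(output_text, item_ids):
    """Parse result lines such as '1 -1 3 -1 #SUP: 2' into (pattern, support)."""
    names = {num: item for item, num in item_ids.items()}
    results = []
    for line in output_text.splitlines():
        if "#SUP:" not in line:
            continue
        body, rest = line.split("#SUP:", 1)
        # Negative tokens are separators (-1, -2); positive ones are items
        pattern = tuple(names[int(tok)] for tok in body.split() if int(tok) > 0)
        results.append((pattern, int(rest.split()[0])))
    return results


def run_spmf(jar_path, algorithm, input_path, output_path, *params):
    """Invoke SPMF from the command line (requires Java and spmf.jar)."""
    subprocess.run(
        ["java", "-jar", jar_path, "run", algorithm, input_path, output_path, *params],
        check=True,
    )
```

Typical use would be to write the encoded text to a file, call something like `run_spmf("spmf.jar", "PrefixSpan", "in.txt", "out.txt", "50%")`, and pass the contents of the output file to `decode_spmf`.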


I've used fim's fpgrowth function in the past and it worked well. It's somewhat painful to install on Windows machines, however. It appears to come from an academic website, so I'm not sure how often the code is updated.
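One thing worth keeping in mind: FP-growth mines frequent itemsets, which ignore the order of items, whereas the question asks about sequences. The toy functions below (hypothetical names, plain Python rather than fim's API) show how the two notions of support can differ on the same data.

```python
def itemset_support(db, itemset):
    """Order-insensitive: count rows containing every item of `itemset`."""
    return sum(set(itemset) <= set(row) for row in db)


def sequence_support(db, pattern):
    """Order-sensitive: count rows containing `pattern` in order, gaps allowed."""
    def contains(row):
        it = iter(row)
        return all(item in it for item in pattern)  # `in` advances the iterator
    return sum(contains(row) for row in db)


db = [["A", "A", "B", "A", "D"], ["C", "B", "A"], ["C", "A", "C", "D"]]
print(itemset_support(db, ["A", "B"]))   # 2: rows 1 and 2 both contain A and B
print(sequence_support(db, ["A", "B"]))  # 1: only row 1 has A before B
print(sequence_support(db, ["B", "A"]))  # 2
```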

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.