Code or Package to cluster sequences (or time series) of different lengths based on HMM?

mflowww

October 10, 2018, 17:01

Is there any existing code or package in Python, R, Java, MATLAB, or Scala that implements the sequence clustering algorithm from either of the following two papers?

1) 'Clustering Sequences with Hidden Markov Models' by Padhraic Smyth (1997): https://papers.nips.cc/paper/1217-clustering-sequences-with-hidden-markov-models.pdf

The paper gives a probabilistic model-based approach to clustering sequences (or time series), using hidden Markov models (HMM).

2) 'Visual Cluster Exploration of Web Clickstream Data' by Jishang Wei, Zeqian Shen, Neel Sundaresan, Kwan-Liu Ma (2012): http://www.cs.tufts.edu/comp/250VIS/papers/VAST2012-ClickStream.pdf

The paper is closely related to 1). It maps each high-dimensional sequence (sequences may have different lengths, or in other words different dimensions) onto a 2D self-organizing map (SOM). Unlike the conventional Kohonen SOM, the distance metric is not Euclidean distance but the log-likelihood of how well each sequence fits a candidate hidden Markov model (HMM). K-means is then used to cluster the nodes of the 2D map.
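The likelihood-based "distance" in both papers comes down to scoring a sequence under an HMM, which the forward algorithm does for any sequence length. Below is a minimal pure-Python sketch, assuming discrete observation symbols and hand-picked (hypothetical) HMM parameters rather than trained ones. `best_matching_unit` is the SOM step described for paper 2): pick the node whose HMM gives the sequence the highest log-likelihood. `symmetrized_loglik` is my reading of the pairwise matrix used to initialize the clustering in paper 1), assuming one HMM has already been fitted per sequence. All function names are hypothetical; training the HMMs, the SOM neighborhood update, and the final K-means over nodes are not shown.

```python
import math


def log_likelihood(seq, pi, A, B):
    """log p(seq | HMM) via the scaled forward algorithm.

    seq: list of integer symbols 0..M-1 (any length >= 1)
    pi:  initial state probabilities, length N
    A:   N x N transitions, A[r][s] = p(state s at t | state r at t-1)
    B:   N x M emissions,   B[s][o] = p(symbol o | state s)
    """
    n = len(pi)
    alpha = [pi[s] * B[s][seq[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[r] * A[r][s] for r in range(n)) * B[s][seq[t]]
                for s in range(n)
            ]
        scale = sum(alpha)  # = p(o_t | o_1, ..., o_{t-1})
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
    return log_p


def best_matching_unit(seq, node_models):
    """Index of the map node whose HMM gives seq the highest log-likelihood."""
    scores = [log_likelihood(seq, *m) for m in node_models]
    return max(range(len(scores)), key=lambda i: scores[i])


def symmetrized_loglik(seqs, models):
    """(L[i][j] + L[j][i]) / 2 with L[i][j] = log p(seqs[j] | models[i]).

    models[i] is assumed to be an HMM already fitted to seqs[i].
    """
    n = len(seqs)
    L = [[log_likelihood(seqs[j], *models[i]) for j in range(n)] for i in range(n)]
    return [[(L[i][j] + L[j][i]) / 2 for j in range(n)] for i in range(n)]


# Hypothetical 2-node "map": node 0 mostly emits symbol 0, node 1 mostly emits 1.
nodes = [
    ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
]
print(best_matching_unit([0, 0, 0, 1, 0], nodes))        # → 0
print(best_matching_unit([1, 1, 0, 1, 1, 1, 1], nodes))  # → 1
```

Because the forward recursion runs once per time step, sequences of length 5 and 7 are scored on the same footing, which is what lets both approaches sidestep the fixed-length requirement of Euclidean distance.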

I haven't found an existing package or code that implements either of these clustering algorithms. There's the hmmlearn Python package (https://github.com/hmmlearn/hmmlearn) for fitting HMMs to sequences, and there are existing Python packages for SOMs (self-organizing maps), such as https://github.com/stephantul/somber. But is there existing code that clusters sequential data by using the HMM likelihood (or negative log-likelihood, etc.) as the distance metric? It can be in Python, R, Java, MATLAB, Scala, or any other language.
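If no ready-made package turns up, the clustering loop itself is short. The sketch below is a k-means-style hard-assignment loop (sometimes called classification EM) that uses log-likelihood in place of distance: assign each sequence to the model under which it is most likely, refit each model on its members, and repeat until assignments stop changing. To stay dependency-free and runnable, it swaps the HMM for a fully observed first-order Markov chain, whose refit is just transition counting; replacing `fit_markov_chain` with a Baum-Welch fit (for example hmmlearn's `fit` with its `lengths` argument) and the scorer with an HMM log-likelihood would give the HMM version. All names are hypothetical, and this is a hard-assignment simplification, not the soft EM over a mixture of HMMs described in paper 1).

```python
import math
from collections import Counter


def fit_markov_chain(seqs, n_symbols, alpha=1.0):
    """Start and transition probabilities from counts, with add-alpha
    smoothing so unseen transitions never get probability 0."""
    start = Counter(s[0] for s in seqs)
    trans = Counter((a, b) for s in seqs for a, b in zip(s, s[1:]))
    pi = [(start[i] + alpha) / (len(seqs) + alpha * n_symbols) for i in range(n_symbols)]
    A = []
    for i in range(n_symbols):
        row_total = sum(trans[(i, j)] for j in range(n_symbols))
        A.append([(trans[(i, j)] + alpha) / (row_total + alpha * n_symbols)
                  for j in range(n_symbols)])
    return pi, A


def chain_log_likelihood(seq, model):
    """log p(seq | Markov chain); works for any sequence length >= 1."""
    pi, A = model
    return math.log(pi[seq[0]]) + sum(math.log(A[a][b]) for a, b in zip(seq, seq[1:]))


def cluster_sequences(seqs, k, n_symbols, max_iter=20):
    """Hard-assignment clustering with log-likelihood as the score."""
    # Seed one model per cluster from the first k sequences.
    models = [fit_markov_chain([seqs[c]], n_symbols) for c in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: most likely model = smallest negative log-likelihood.
        new_labels = [max(range(k), key=lambda c: chain_log_likelihood(s, models[c]))
                      for s in seqs]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Refit step: re-estimate each model from its members;
        # an empty cluster keeps its previous model.
        clusters = [[s for s, lab in zip(seqs, labels) if lab == c] for c in range(k)]
        models = [fit_markov_chain(members, n_symbols) if members else models[c]
                  for c, members in enumerate(clusters)]
    return labels, models


seqs = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # sticky
    [0, 1, 0, 1, 0, 1],        # alternating
    [1, 1, 1, 1, 0, 0, 0],     # sticky
    [1, 0, 1, 0, 1],           # alternating
    [0, 0, 0, 0, 0, 0],        # sticky
    [0, 1, 0, 1, 0, 1, 0, 1],  # alternating
]
labels, models = cluster_sequences(seqs, k=2, n_symbols=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

On this toy data the loop separates the "sticky" sequences from the alternating ones regardless of their lengths. Seeding from the first k sequences is the crudest option; as I understand it, paper 1) instead initializes from a hierarchical clustering of the pairwise likelihood matrix.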

Thanks!

Tags: markov-hidden-model, expectation-maximization, sequence, clustering, open-source

Category: Data Science
