Is it meaningful to use word2vec for non-string inputs like time series analysis?

I am working on a project that detects anomalies in a time series. I wonder if I can use word2vec for anomaly detection for non-string inputs like exchange rates?

Topic word2vec anomaly-detection deep-learning time-series

Category Data Science


No - the word2vec algorithm assumes the data is a series of discrete symbols. Exchange rates are continuous.


The short answer is yes. In general, NLP and time series analysis are very similar in that both deal with sequential data. The main difference is that text is discrete, whereas the values of a signal are continuous. Thus, by discretizing a time series (that is, the values it can take), we obtain a sequence of discrete symbols.

There are already many algorithms based on discretizing a time series, and some of them actually convert a time series into words. Some of the most popular time series representations are PAA (Piecewise Aggregate Approximation), SAX (Symbolic Aggregate approXimation), BOSS (Bag-of-SFA-Symbols), COTE (Collective of Transformation-based Ensembles) and, most recently, Signal2Vec.
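To make the "time series to words" idea concrete, here is a minimal sketch of PAA followed by SAX using only the Python standard library. The function names (`znorm`, `paa`, `sax`), the four-letter alphabet and the sample exchange rates are my own illustrative choices, not taken from any particular library, and it assumes the series has at least `n_segments` points.

```python
from statistics import NormalDist, mean, pstdev

def znorm(series):
    """Z-normalize a series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each segment."""
    n = len(series)
    return [mean(series[i * n // n_segments:(i + 1) * n // n_segments])
            for i in range(n_segments)]

def sax(series, n_segments, alphabet="abcd"):
    """Turn a series into a SAX word using equiprobable Gaussian breakpoints."""
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    word = []
    for value in paa(znorm(series), n_segments):
        # The symbol index is the number of breakpoints the value exceeds.
        word.append(alphabet[sum(value > b for b in breakpoints)])
    return "".join(word)

# Made-up exchange rates with a short jump in the third quarter.
rates = [1.10, 1.11, 1.12, 1.11, 1.13, 1.14,
         1.30, 1.31, 1.12, 1.11, 1.10, 1.09]
print(sax(rates, n_segments=4))
```

Each letter summarizes one segment of the series, so the segment containing the jump gets a different symbol from its neighbours.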

Signal2Vec (I am one of the authors) consists of two steps. The first is to discretize the time series, using a clustering algorithm or any other discretization method. The second is to apply the Word2vec model, either to individual symbols or to words built from the symbols of the discretized time series.
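The first step can be sketched roughly as follows. This is not the published Signal2Vec code, just a standard-library illustration under my own assumptions: a plain 1-D k-means for the clustering, letters as symbols (so at most 26 clusters), and non-overlapping fixed-length words. The names `kmeans_1d`, `to_symbols` and `to_words` are hypothetical.

```python
import random

def kmeans_1d(values, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on scalars; returns the sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(values)), k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)

def to_symbols(values, centroids):
    """Replace each value with a letter naming its nearest centroid."""
    k = len(centroids)
    return [chr(ord("a") + min(range(k), key=lambda j: abs(v - centroids[j])))
            for v in values]

def to_words(symbols, word_len):
    """Join non-overlapping runs of word_len symbols into words."""
    return ["".join(symbols[i:i + word_len])
            for i in range(0, len(symbols) - word_len + 1, word_len)]

# Made-up series with three rough value levels.
series = [1.10, 1.11, 1.12, 1.30, 1.31, 1.29,
          1.10, 1.12, 1.50, 1.11, 1.10, 1.12]
centroids = kmeans_1d(series, k=3)
symbols = to_symbols(series, centroids)
words = to_words(symbols, word_len=3)
print(symbols)
print(words)
```

The resulting list of tokens (either `symbols` or `words`) plays the role of a sentence, which is the kind of input a Word2vec implementation expects for the second step.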

As far as anomaly detection is concerned, you can use any of the above-mentioned time series representations. There are also very good surveys comparing anomaly detection methods for time series, and I would strongly recommend reading at least the most recent ones to get an idea of the state-of-the-art methods. I would also recommend the Matrix Profile, which is very simple to implement and very robust.
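To show how little is needed for a basic Matrix Profile, here is a brute-force sketch in plain Python. It is quadratic in the series length, so it is only meant to illustrate the idea; the exclusion zone of `m // 2` and the toy periodic series are assumptions of mine. The subsequence with the largest profile value (the "discord") is the anomaly candidate.

```python
from math import inf, sqrt
from statistics import mean, pstdev

def znorm(window):
    """Z-normalize one subsequence."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]

def matrix_profile(series, m):
    """For every length-m subsequence, the z-normalized Euclidean distance
    to its nearest neighbour outside a small exclusion zone."""
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # skip trivial matches with overlapping neighbours
    profile = []
    for i in range(n_sub):
        best = inf
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, d)
        profile.append(best)
    return profile

# Toy periodic signal with one corrupted point at index 21.
series = [0.0, 1.0, 0.0, -1.0] * 10
series[21] = 5.0

mp = matrix_profile(series, m=4)
discord = max(range(len(mp)), key=mp.__getitem__)
print(discord, mp[discord])
```

Subsequences that repeat elsewhere in the series get a profile value near zero, while the ones overlapping the corrupted point have no close match and stand out.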


The goal of Word2vec is to map each element of a sequence into an "embedded space", i.e. a lower-dimensional space where "similar" elements are located closer to each other.

I think this is a bit off topic for time series analysis. If you want to detect outliers, you might try looking at the normalized distance of each data point from the trend and setting a threshold on it.
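A minimal sketch of that idea, assuming a centered moving average as the trend estimate and a z-score on the residuals as the "normalized distance". The window size, the threshold of 3 and the synthetic rates are illustrative choices, and `moving_average` and `trend_outliers` are made-up names.

```python
from statistics import mean, pstdev

def moving_average(series, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    return [mean(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]

def trend_outliers(series, window=5, threshold=3.0):
    """Indices whose residual from the trend is more than `threshold`
    standard deviations away from the mean residual."""
    trend = moving_average(series, window)
    residuals = [x - t for x, t in zip(series, trend)]
    mu, sigma = mean(residuals), pstdev(residuals)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(residuals)
            if abs(r - mu) / sigma > threshold]

# Slowly rising synthetic rate with one spike at index 15.
rates = [1.10 + 0.001 * i for i in range(30)]
rates[15] += 0.05

print(trend_outliers(rates))
```

On this toy series only the spike at index 15 is flagged. Note that a large outlier inflates the standard deviation it is measured against, so on real data a robust scale estimate such as the median absolute deviation may work better.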

Alternatively, if you really want to stick with neural networks, you can use autoencoders. They can be applied to outlier detection, but they are not the simplest models to implement.
