Is it meaningful to use word2vec for non-string inputs like time series analysis?

Question

irmgnr

April 14, 2022 16:04

I am working on a project that detects anomalies in a time series. Can word2vec be used for anomaly detection on non-string inputs such as exchange rates?

Topic word2vec anomaly-detection deep-learning time-series

Category Data Science

Brian Spiering · Accepted Answer · November 12, 2021 13:03

No. The word2vec algorithm assumes the data is a sequence of discrete symbols, whereas exchange rates are continuous.
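To see why raw continuous values make a poor vocabulary, here is a minimal Python sketch on a synthetic random walk (hypothetical data, not real exchange rates). Treating each float as a token yields a vocabulary almost as large as the series itself, with nearly every token seen exactly once. gensim's `Word2Vec`, for example, ignores tokens seen fewer than `min_count` times (default 5), so nothing would be left to train on.

```python
import random
from collections import Counter

# Hypothetical exchange-rate series: a Gaussian random walk (synthetic, illustrative only).
random.seed(0)
rates = [1.10]
for _ in range(999):
    rates.append(rates[-1] + random.gauss(0.0, 0.002))

# Treat each raw value as a "word", the way word2vec would see it.
tokens = [repr(r) for r in rates]
counts = Counter(tokens)

vocab_size = len(counts)
# Tokens frequent enough to survive a min_count of 5 (gensim's default).
usable = [t for t, c in counts.items() if c >= 5]
print(vocab_size, len(usable))
```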

ChristoferNal · Accepted Answer · October 19, 2019 09:30

The short answer is yes. In general, the domains of NLP and time series are very similar in the sense that both deal with sequential data. The main difference is that text is discrete, whereas the values of a signal lie in a continuous space. Thus, by discretizing a time series (with respect to the values it can take), we obtain a sequence in the discrete space.
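A minimal sketch of such a discretization, assuming equal-frequency (quantile) binning; this is only one possible choice and not something the answer prescribes. Each value is replaced by a letter according to the quantile bin it falls in, turning a numeric series into a symbol string that word-based methods can consume.

```python
import statistics

def discretize(series, n_symbols=4, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each real value to a letter using equal-frequency (quantile) bins."""
    # statistics.quantiles returns n_symbols - 1 cut points.
    cuts = statistics.quantiles(series, n=n_symbols)
    symbols = []
    for x in series:
        # The number of cut points x exceeds selects its letter.
        idx = sum(x > c for c in cuts)
        symbols.append(alphabet[idx])
    return "".join(symbols)

series = [1.10, 1.11, 1.13, 1.12, 1.09, 1.08, 1.10, 1.14]
print(discretize(series))
```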

There are already many algorithms that are based on the discretization of a time series and some of them actually convert a time series to words. Some of the most popular time series representations are PAA, SAX, BOSS, COTE and most recently Signal2Vec.
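For reference, here is a minimal standard-library sketch of the two simplest representations in that list: PAA (the mean of each equal-length segment) and SAX (PAA of the z-normalized series, mapped to letters using breakpoints that split a standard normal distribution into equiprobable regions). Requiring the series length to be divisible by the number of segments is a simplification for brevity, not part of SAX itself.

```python
from statistics import NormalDist, mean, pstdev

def znorm(series):
    """Z-normalize a series (SAX assumes zero mean and unit variance)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length frame.
    Assumes len(series) is divisible by n_segments."""
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]

def sax(series, n_segments, alphabet_size):
    """SAX word: PAA of the z-normalized series, then letters from N(0, 1) breakpoints."""
    nd = NormalDist()
    # Breakpoints divide the standard normal into alphabet_size equiprobable regions.
    breakpoints = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    word = []
    for value in paa(znorm(series), n_segments):
        idx = sum(value > b for b in breakpoints)
        word.append(letters[idx])
    return "".join(word)

series = [1.0, 1.2, 1.1, 1.3, 2.0, 2.2, 2.1, 2.3, 1.5, 1.6, 1.4, 1.5]
print(sax(series, n_segments=3, alphabet_size=3))
```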

Signal2Vec (I am one of the authors) consists of two steps. The first is to discretize the time series, which can be done with a clustering algorithm or any other discretization method. The second step applies a word2vec model, either to each symbol or to words constructed from the symbols of the discretized time series.
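A rough plain-Python sketch of that two-step idea. It is not the Signal2Vec implementation: the tiny 1-D k-means, the non-overlapping word construction in `to_words`, and all parameter values are hypothetical choices for illustration. Rather than training embeddings, it stops at the skip-gram (center, context) pairs a word2vec model learns from; in practice the word list could be handed to a library such as gensim, e.g. `Word2Vec(sentences=[words], vector_size=16, window=2, min_count=1, sg=1)`.

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Tiny 1-D k-means (step 1: discretize by clustering). Returns one integer
    symbol per value, renumbered so 0 is the cluster with the lowest center."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)  # needs at least k distinct values

    def assign(cs):
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    labels = assign(centers)
    order = sorted(range(k), key=lambda c: centers[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]

def to_words(symbols, word_len):
    """Hypothetical word construction: non-overlapping chunks of word_len symbols."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    text = "".join(letters[s] for s in symbols)
    return [text[i:i + word_len] for i in range(0, len(text) - word_len + 1, word_len)]

def skipgram_pairs(tokens, window):
    """(center, context) pairs: the training examples of a skip-gram word2vec model."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

series = [1.00, 1.01, 1.02, 1.50, 1.52, 1.51, 2.00, 2.01, 1.49, 1.01, 1.00, 1.51]
symbols = kmeans_1d(series, k=3)
words = to_words(symbols, word_len=2)  # step 2 would feed these words to word2vec
pairs = skipgram_pairs(words, window=1)
print(words)
print(pairs[:4])
```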

As far as anomaly detection is concerned, you can use any of the above-mentioned time series representations. There are also very good surveys comparing anomaly detection methods for time series, and I would strongly recommend reading at least the most recent ones to get an idea of the state-of-the-art methods. I would also recommend the Matrix Profile, which is very simple to implement and very robust.
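The Matrix Profile stores, for every length-m subsequence, the z-normalized Euclidean distance to its nearest non-trivial match; the subsequence with the largest value (a "discord") is the anomaly candidate. Below is a brute-force O(n²·m) NumPy sketch for illustration only. The window length m, the exclusion-zone width (m/2 here; libraries often use something around m/4), and the synthetic sine data are assumptions. For real series, a library such as STUMPY (`stumpy.stump(T, m)`) computes the same profile far more efficiently.

```python
import numpy as np

def matrix_profile_naive(ts, m):
    """Brute-force matrix profile: for each length-m subsequence, the z-normalized
    Euclidean distance to its nearest neighbor, excluding trivial matches."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([ts[i:i + m] for i in range(n)])
    std = subs.std(axis=1)
    std[std == 0] = 1.0  # avoid division by zero on flat subsequences
    z = (subs - subs.mean(axis=1, keepdims=True)) / std[:, None]
    excl = max(1, m // 2)  # exclusion zone: ignore overlapping (trivial) matches
    profile = np.full(n, np.inf)
    index = np.zeros(n, dtype=int)
    for i in range(n):
        d = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index

# Synthetic example: a repeating sine pattern with one injected bump.
t = np.arange(400)
ts = np.sin(2 * np.pi * t / 50)
ts[200:210] += 1.5
profile, _ = matrix_profile_naive(ts, m=25)
discord = int(np.argmax(profile))  # start index of the most anomalous subsequence
print(discord)
```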

Leevo · Accepted Answer · August 3, 2019 15:55

The goal of word2vec is to map each element of a sequence into an "embedded space", i.e. a lower-dimensional space where "similar" elements are located close to each other.

I think this is somewhat off-topic for time series analysis. If you want to detect outliers, you might try computing the normalized distance of each data point from the trend and setting a threshold.
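A minimal NumPy sketch of that suggestion. Using a centered rolling median as the trend, normalizing residuals by the median absolute deviation (MAD), a window of 11 points, and a threshold of 3 are all assumptions for illustration; a moving average or a fitted model could serve as the trend instead.

```python
import numpy as np

def trend_outliers(x, window=11, threshold=3.0):
    """Flag points whose residual from a centered rolling-median trend exceeds
    `threshold` robust standard deviations. `window` is assumed to be odd."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")  # repeat end values so the trend exists at the edges
    trend = np.array([np.median(padded[i:i + window]) for i in range(len(x))])
    resid = x - trend
    center = np.median(resid)
    mad = np.median(np.abs(resid - center))
    # 1.4826 * MAD estimates the standard deviation for Gaussian residuals.
    scale = 1.4826 * mad if mad > 0 else (resid.std() or 1.0)
    score = np.abs(resid - center) / scale
    return np.flatnonzero(score > threshold), score

# Synthetic exchange-rate-like random walk with one injected spike.
rng = np.random.default_rng(0)
x = 1.10 + np.cumsum(rng.normal(0.0, 0.001, size=300))
x[150] += 0.05
flagged, score = trend_outliers(x)
print(flagged)
```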

Alternatively, if you really want to stick with neural networks, you can use autoencoders. They can be applied to outlier detection, but they are not the simplest models to implement.
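The autoencoder idea can be tried without a deep-learning framework by swapping in a linear autoencoder: with a k-unit bottleneck and squared-error loss, its optimal reconstruction is the projection onto the top-k principal subspace of the training data, so the sketch below fits it in closed form with an SVD instead of gradient descent. This is a simplification of the (typically nonlinear) neural autoencoders the answer has in mind; the window length `w`, the bottleneck size `k`, and the synthetic sine data are assumptions. Windows with high reconstruction error are the anomaly candidates.

```python
import numpy as np

def make_windows(x, w):
    """Stack all overlapping length-w windows of x as rows."""
    return np.array([x[i:i + w] for i in range(len(x) - w + 1)])

def fit_linear_autoencoder(X, k):
    """Closed-form optimum of a linear autoencoder with a k-unit bottleneck:
    the top-k principal directions of the centered training windows."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

def reconstruction_error(X, mu, W):
    code = (X - mu) @ W.T   # encode each window to k numbers
    recon = code @ W + mu   # decode back to window space
    return ((X - recon) ** 2).mean(axis=1)

rng = np.random.default_rng(0)
t = np.arange(600)
x = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.01, size=t.size)
x_test = x.copy()
x_test[450:460] += 1.0  # injected anomaly (synthetic)

w = 25
mu, W = fit_linear_autoencoder(make_windows(x[:300], w), k=2)  # train on clean data only
errors = reconstruction_error(make_windows(x_test, w), mu, W)
worst = int(np.argmax(errors))  # start index of the worst-reconstructed window
print(worst)
```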
