Which algorithm can be used to reduce the dimensionality of multiple time series?

In my dataset, a data point is essentially a time series of 6 features sampled monthly over a year, so in all it results in 6*12 = 72 features. I need to find class outliers, so I perform dimensionality reduction, hoping the differences in the data are preserved, and then apply k-means clustering and compute distances.

For dimensionality reduction I have tried PCA and a simple autoencoder to reduce the dimension from 72 to 6, but the results are unsatisfactory.
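For reference, the PCA baseline looks roughly like this. This is a minimal sketch with placeholder random data; the cluster count of 5 is an arbitrary illustrative choice:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Placeholder data: one row per data point, 6 features * 12 months = 72 columns
X = np.random.randn(500, 72)

# Reduce 72 -> 6 dimensions
X_low = PCA(n_components=6).fit_transform(X)

# Cluster, then score each point by its distance to the nearest centroid
km = KMeans(n_clusters=5, n_init=10).fit(X_low)
dist_to_centroid = km.transform(X_low).min(axis=1)  # larger = more outlying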

Can anyone please suggest another way to reduce the dimension of this type of data?

Topic: pytorch, pca, autoencoder, python, dimensionality-reduction

Category: Data Science


You can apply the wrapper-based Sequential Feature Selection (SFS) algorithm. SFS is a family of greedy search algorithms used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k < d. The idea is to wrap any classification algorithm (passed as a parameter) and, for a given k_features (parameter), select the subset of features that gives the highest accuracy with that classifier. A sketch using the implementation from mlxtend:

from sklearn import svm
from mlxtend.feature_selection import SequentialFeatureSelector as sfs

clf = svm.SVC()

# Build step forward feature selection
sfs1 = sfs(clf,
           k_features=15,
           forward=True,
           floating=False,
           verbose=2,
           scoring='accuracy',
           cv=5)

# Perform SFS on labelled training data
sfs1 = sfs1.fit(X_Train, Y_Train)

# Indices of the selected features
feat_cols = list(sfs1.k_feature_idx_)
print(feat_cols)

One downside is that you have to fix the desired number of features in advance; the algorithm then tells you which features fit best in terms of accuracy. In your scenario you have 72 features. Say you want to use only 15 of them but don't know which 15 among the 72: SFS will find the best selection. In the code above, feat_cols is the list of those 15 features. Note that SFS is a supervised wrapper method, so it needs class labels to score accuracy.
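Once fitted, you can keep only the selected columns. A minimal sketch, assuming X_Train is a NumPy array:

# Subset the data to the selected columns
X_train_reduced = X_Train[:, feat_cols]

# Equivalently, mlxtend provides a transform() helper
X_train_reduced = sfs1.transform(X_Train)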


So, to be clear, you observe $N$ samples with $P$ measurements at $T$ times (where $P = 6$ and $T = 12$).

I would first determine whether or not you expect the joint distribution of your $P$ features to change over time. For example, if these features are $X$, does $\mathcal{P}(X) = \mathcal{P}(X \mid T)$ for all values of $T$? If so, you could stack your data over time (so you'd have $NT$ samples of $P$ features) and perform PCA, then use the top $k$ loadings to rotate the measurements at individual timepoints, leaving you with $N$ $k$-dimensional time series of length $T$. You could choose $k = 1$ if necessary, and use it to visually identify outliers or set a simple cutoff.

If not, you'd want to be more careful with this technique, especially if the covariance changes. If it's the variance that's changing while the correlation stays the same, you can adapt the procedure (e.g. perform PCA on the correlation matrix, i.e. on standardized data) and this technique would still work.
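A minimal sketch of the stacking idea, assuming the data lives in an $(N, T, P)$ NumPy array (the random X below is just a placeholder):

import numpy as np
from sklearn.decomposition import PCA

N, T, P = 1000, 12, 6
X = np.random.randn(N, T, P)  # placeholder for your (N, 12, 6) data

# Stack over time: N*T samples of P features
X_stacked = X.reshape(N * T, P)

# Fit PCA on the stacked data and keep the top k components
k = 1
pca = PCA(n_components=k)
pca.fit(X_stacked)

# Rotate each timepoint's measurements with the top-k loadings,
# giving N k-dimensional time series of length T
X_reduced = pca.transform(X_stacked).reshape(N, T, k)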

What form do you expect your outliers to take? If it's a temporary deviation from the rest of the sample (e.g. at only one or two measurement times), your technique of PCA on the concatenated data will not be very sensitive. The technique I just described would be more sensitive.
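One simple way to exploit that per-timepoint sensitivity, continuing the sketch above with $k = 1$ (the 3-standard-deviation cutoff is an arbitrary illustrative choice):

series = X_reduced[:, :, 0]  # shape (N, T)

# Standardize each timepoint across the sample
z = (series - series.mean(axis=0)) / series.std(axis=0)

# Flag samples that deviate strongly at any single timepoint
outliers = np.where(np.abs(z).max(axis=1) > 3)[0]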
