t-SNE on extremely high-dimensional spaces

I successfully applied t-SNE to the handwritten digits dataset: n=3823 data points (i.e. handwritten digits) in a D=64 dimensional space (i.e. 8x8 pixels). It worked great.

Now I would like to cluster n≈60 data points in a D≈3000 dimensional space. Even after many iterations, t-SNE fares far worse than, say, PCA.

Is there an upper bound on the number of dimensions (relative to the number of data points) above which applying t-SNE is not advised?



t-SNE is mainly used for visualising high-dimensional data. It is not advisable to use t-SNE for clustering, as it preserves neither density nor distance. It only tries to ensure that points that are neighbours in the high-dimensional space remain close in the low-dimensional one. If you apply a density-based or distance-based clustering algorithm to the output, it will not give you good results; this has been illustrated on many different datasets. **Avoid clustering on t-SNE output.**

Regarding dimensionality, this paper clearly shows that t-SNE works better than other algorithms such as Isomap, even on the Olivetti faces data set, whose images have 92×112 = 10,304 pixels each.


There is no theoretical upper bound on the number of dimensions for t-SNE. In practice, however, it becomes increasingly expensive computationally as the input dimension grows. This is because t-SNE constructs a probability distribution over pairs of high-dimensional objects, and the distance between each pair takes longer to compute the more dimensions there are. In your problem, each pair among the 60 data points in a 3,000-dimensional space costs more to compare than a pair among the 3,823 data points in 64 dimensions.
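To make the "probability distribution over pairs" concrete, here is a rough NumPy sketch of the Gaussian affinities that t-SNE builds in the input space. It is a simplification, not the full algorithm: real t-SNE tunes a separate bandwidth per point from the perplexity, whereas this sketch assumes one shared `sigma`. The function name `pairwise_affinities`, the random data and the value `sigma=50.0` are all illustrative assumptions.

```python
import numpy as np

def pairwise_affinities(X, sigma=1.0):
    """Row-normalised Gaussian affinities p(j|i), using one shared bandwidth.

    Assumption: a single sigma for all points. Actual t-SNE searches for a
    per-point sigma that matches a target perplexity.
    """
    # Squared Euclidean distances between all pairs of rows, shape (n, n).
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off

    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # each row sums to 1
    return P

# Random stand-in data with the shape from the question: n=60, D=3000.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3000))
P = pairwise_affinities(X, sigma=50.0)
print(P.shape)
```

The resulting matrix has one row and one column per data point; the number of dimensions D enters only when the pairwise distances are computed.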

Additionally, t-SNE is a dimensionality reduction technique, not a clustering technique. You can cluster directly in the high-dimensional space.
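As an illustration of clustering directly in the original space, here is a minimal sketch that runs plain k-means on all D coordinates without any embedding step. Everything in it is an assumption made for the example: the choice of k-means itself, the helper names `farthest_point_init` and `kmeans`, the value `k=3`, and the synthetic data (three well-separated groups of 20 points in 3,000 dimensions). In practice you would more likely reach for a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick k starting centres: row 0, then repeatedly the row farthest
    from the centres chosen so far (a deterministic heuristic)."""
    chosen = [0]
    min_dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen].copy()

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations on the original D-dimensional rows of X."""
    centres = farthest_point_init(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every centre, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if it has none).
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Synthetic stand-in for the question's data: 60 points, 3000 dimensions,
# built as three groups of 20 points around well-separated centres.
rng = np.random.default_rng(0)
true_centres = rng.normal(scale=5.0, size=(3, 3000))
X = np.repeat(true_centres, 20, axis=0) + rng.normal(size=(60, 3000))

labels, _ = kmeans(X, k=3)
print(labels)
```

With real data the groups will rarely be this cleanly separated, so the number of clusters and the distance measure would need to be chosen with the data in mind.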
