Does t-SNE have to result in clear clusters / structures?

I have a data set which, no matter how I tune t-SNE, won't produce clearly separated clusters or even recognizable patterns and structures. Ultimately, it results in arbitrarily distributed data points all over the plot, with a few more points of one class here and a few of another class there. Is it down to t-SNE, to me, and/or to the data? I'm using Rtsne(df_tsne, perplexity = 25, max_iter = 1000000, eta = 10, check_duplicates = FALSE)
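For context: t-SNE does not have to produce clusters; on unstructured data it yields exactly this kind of diffuse cloud. One common diagnostic is to sweep perplexity and compare the resulting layouts and final KL divergences. The question uses Rtsne (R), so the following is only an illustrative scikit-learn sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # stand-in for df_tsne

# Sweep perplexity; compare layouts visually and KL divergences numerically
embeddings = {}
kl = {}
for perplexity in (5, 25, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)
    kl[perplexity] = tsne.kl_divergence_
```

If no perplexity in a wide range reveals structure, the likely explanation is the data rather than the tuning.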
Topic: tsne r clustering
Category: Data Science

How do I calculate a similarity matrix with a Student-t kernel?

As the title says, how do I calculate a similarity matrix with an un-normalized Student-t kernel? I'm attempting to calculate the Kullback-Leibler divergence for different t-SNE runs, but I need a Q-matrix for that. A few steps before the Q-matrix, I need the similarity matrices built with the un-normalized Student-t kernel. I'm using R; not sure if that's relevant to an answer.
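The computation is language-agnostic (the question mentions R; the sketch below uses NumPy). In t-SNE the un-normalized low-dimensional similarity is the Student-t (Cauchy) kernel $(1 + \lVert y_i - y_j \rVert^2)^{-1}$, and the Q-matrix normalizes it over all pairs:

```python
import numpy as np

def q_matrix(Y):
    """Low-dimensional affinity matrix Q from t-SNE's Student-t kernel (1 d.o.f.)."""
    # Pairwise squared Euclidean distances between embedding points
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    # Un-normalized Student-t kernel: (1 + ||y_i - y_j||^2)^-1
    num = 1.0 / (1.0 + sq)
    np.fill_diagonal(num, 0.0)  # q_ii is defined as 0
    # Normalize over all pairs to obtain the Q-matrix
    return num / num.sum()

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))   # hypothetical 2-D embedding
Q = q_matrix(Y)
```

`num` before normalization is the un-normalized similarity matrix the question asks for; `Q` is what enters the KL divergence.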
Category: Data Science

What Clustering Method Should I Use?

My data is a group of 10 thousand points (each with a node location (x, y)) spread across a plane. They are also colored according to their weight. I need to settle on a Bayesian nonparametric clustering method that groups points mainly by weight, but also by distance: that is, clusters by definition have some relevance to distance, but there are clear topological distinctions between the first quarter and the last quarter of the data (I say quarter as …
Category: Data Science

Is it always possible to get well-defined clusters from the data?

I have TV watching data and I have been trying to cluster it to obtain different sets of watchers. My dataset consists of 64 features (such as total watching time, percentage of ads skipped, movies vs. shows, etc.). All the variables are either numerical or binary. But no matter how I treat them (normalize them, standardize them, leave them as is, take a subset of features, etc.), I always end up with pictures similar to this: This particular picture was constructed …
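In short: no, not all data contains well-defined clusters. One hedged sanity check is to compare the best silhouette score achievable on the real data against the same statistic on a structureless reference drawn uniformly over the same range; if they are similar, apparent "clusters" may be noise. A sketch with synthetic stand-in data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))                       # stand-in for the 64-feature data
X_ref = rng.uniform(X.min(), X.max(), size=X.shape)  # structureless null reference

def best_silhouette(data, ks=range(2, 8)):
    # Best silhouette over a small range of cluster counts
    return max(
        silhouette_score(
            data,
            KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data),
        )
        for k in ks
    )

s_data = best_silhouette(X)
s_ref = best_silhouette(X_ref)
# If s_data is close to s_ref, the data has no strong cluster structure.
```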
Category: Data Science

t-SNE on extremely high-dimensional spaces

I successfully applied t-SNE to the handwritten digits dataset: n = 3823 data points (i.e. handwritten digits) in a D = 64-dimensional space (i.e. 8x8 pixels). It worked great. Now I would like to cluster n ≈ 60 data points in a D ≈ 3000-dimensional space. Even after many iterations, t-SNE fares far worse than, say, PCA. Is there an upper bound on the number of dimensions (relative to the number of data points) above which applying t-SNE is not advised?
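A common practice in this regime (and what the original t-SNE paper itself does) is to run PCA first to strip noise dimensions, then apply t-SNE to the reduced data; with n ≈ 60 points, at most 60 principal components carry any variance anyway. A sketch on stand-in data of the stated shape:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3000))   # stand-in for the n≈60, D≈3000 data

# Reduce dimensionality first; with n=60, at most 60 PCs are meaningful
X_pca = PCA(n_components=30, random_state=0).fit_transform(X)

# Perplexity must be well below n; around n/5 is a common starting point
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X_pca)
```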
Topic: tsne pca
Category: Data Science

Grouping already clustered data (with a pre-defined x and y)

I have an already clustered data set (I want to keep my x and y) in which there's clearly a small group of elements in the middle that don't follow the expected patterns. I can select them manually, but I wonder if there's a way of automating the selection of these elements efficiently, something like using just the grouping part of a clustering algorithm. I've been trying it with a threshold, but it doesn't produce good results in cases that won't …
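Since the x and y are fixed, a density-based method such as DBSCAN can do exactly the "grouping part": it labels spatially dense groups without re-embedding the data, so a distinct middle group comes out as its own label. A sketch on synthetic stand-in data (the `eps` value is an assumption to be tuned to the data's scale):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Stand-in: a ring of "expected" points plus a small dense blob in the middle
theta = rng.uniform(0, 2 * np.pi, 200)
ring = np.column_stack([10 * np.cos(theta), 10 * np.sin(theta)])
blob = rng.normal(0, 0.5, size=(20, 2))
X = np.vstack([ring, blob])

# eps is the neighborhood radius; tune it to the spacing of your points
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

# Select the group containing a known middle point (here, the last one)
center_label = labels[-1]
selected = X[labels == center_label]
```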
Category: Data Science

Visualizing outliers using t-SNE

I'm trying to visualize outliers in my data using t-SNE, and it seems like the outliers appear as three different clusters. The original data has 7 columns, but I chose to plot the outliers on a two-dimensional graph. I expected the outliers to be clustered into one single group, but I have three different clusters (red dots) on my graph. Is it normal to see different groups of outliers? For example, the red cluster on the far left …
Category: Data Science

How to reduce position changes after dimensionality reduction?

Disclaimer: I'm a machine learning beginner. I'm working on visualizing high-dimensional data (text as tf-idf vectors) in 2D space. My goal is to label/modify those data points, recompute their positions after the modification, and update the 2D plot. The logic already works, but each iterative visualization is very different from the previous one, even though only 1 out of 28,000 features in 1 data point changed. Some details about the project: ~1000 text documents/data points, ~28,000 tf-idf vector features …
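One way to reduce this jumping (assuming scikit-learn's TSNE, which accepts an array for `init`) is to warm-start each re-run from the previous embedding instead of a fresh random or PCA initialization, so the optimizer only drifts from the old layout. A sketch on stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))   # stand-in for the tf-idf matrix

# First embedding
emb1 = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)

# Modify one feature of one data point
X2 = X.copy()
X2[0, 0] += 0.1

# Re-embed, warm-starting from the previous layout to keep positions stable
emb2 = TSNE(n_components=2, perplexity=10, init=emb1,
            random_state=0).fit_transform(X2)
```

Note that t-SNE's loss is non-convex, so even with a warm start some movement remains; parametric methods (e.g. UMAP's `transform`, or a learned parametric embedding) give stronger stability guarantees.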
Category: Data Science

What information can we obtain from t-SNE?

I see that t-SNE can help us reduce dimensions and visualize the data. But what information are we gaining from this visualization? As we know, the new axes don't have a meaning in our context. Moreover, if we have class-labeled data, then what information can we gain from the visualization? We already know that there are some 'n' classes and that we have to classify new examples into one of these classes. Or am I wrong to …
Category: Data Science

Can t-SNE be applied to visualize time series datasets?

I have multiple time-series datasets containing 9 IMU sensor features. Suppose I use the sliding-window method to split all these data into samples with a sequence length of 100, i.e. the dimension of my dataset would be (number of samples, 100, 9). Now I want to visualize those split samples to find the patterns inside. Can I treat it as tabular data and first transform the original dimension to (number of samples, 900), then apply the t-SNE method directly on that …
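Mechanically, yes: flattening each window to a 900-dimensional vector and running t-SNE on the result works, with the caveat that it treats every (timestep, sensor) cell as an independent feature and ignores temporal ordering. A sketch on synthetic stand-in windows:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
windows = rng.normal(size=(150, 100, 9))   # (samples, seq_len, sensors)

# Flatten each window to a single 900-dimensional feature vector
flat = windows.reshape(len(windows), -1)   # (150, 900)

emb = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(flat)
```

If temporal structure matters, alternatives include summary features per window (means, spectra) or a time-series-aware distance (e.g. DTW) fed to t-SNE via `metric="precomputed"`.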
Category: Data Science

PCA vs t-SNE in asset pricing

So I am trying dimensionality reduction techniques on the S&P 500 FY2020 data. I understand the CAPM model and the fact that doing a PCA gives my market variability factor (the first PCA component). What I am wondering is: what intuitions (if any) does t-SNE give on the same data? Using scikit-learn I have embeddings for the first component, but does the embedding relate to CAPM in any way? Or, for that matter, to any other asset pricing model? What I have …
Category: Data Science

t-SNE - how variance is set and how it affects dense vs sparse clusters in HD space

When learning about t-SNE, I found a resource saying that the "width of the normal curve (a Gaussian centered at $x_i$) depends on the density of data near the point of interest", which is why we normalize by $\sum_{k\neq i} e^{-||x_i - x_k||^2/2\sigma_i^2}$ in $p_{j|i}= \frac { e^{-||x_i - x_j||^2/2\sigma_i^2}} {\sum_{k\neq i} e^{-||x_i - x_k||^2/2\sigma_i^2}}$. I know that the Gaussian's width depends on the variance, $\sigma_i^2$. However, there was no mention of how to calculate the variance, and I read that the variance …
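For reference, t-SNE does not set $\sigma_i$ directly: for each point it binary-searches the $\sigma_i$ whose conditional distribution $p_{j|i}$ has a user-chosen perplexity ($2^{H(P_i)}$), so dense neighborhoods get small $\sigma_i$ and sparse ones get large $\sigma_i$. A minimal NumPy sketch of that search for a single point:

```python
import numpy as np

def sigma_for_perplexity(dists_sq, target_perplexity, tol=1e-5):
    """Binary-search sigma_i so that p_{j|i} has the requested perplexity."""
    lo, hi = 1e-10, 1e10
    for _ in range(200):
        sigma = (lo + hi) / 2.0
        p = np.exp(-dists_sq / (2.0 * sigma ** 2))
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if abs(2.0 ** entropy - target_perplexity) < tol:
            break
        if 2.0 ** entropy > target_perplexity:
            hi = sigma   # too wide -> too uniform -> shrink sigma
        else:
            lo = sigma   # too narrow -> too peaked -> grow sigma
    return sigma

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 5))
i = 0
d2 = np.sum((x - x[i]) ** 2, axis=1)
d2 = np.delete(d2, i)                       # exclude k = i from the sum
sigma_i = sigma_for_perplexity(d2, target_perplexity=10.0)
```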
Topic: gaussian tsne
Category: Data Science

Good classification, poor separation with t-SNE/UMAP

I have been working on a classification problem for which I have been able to achieve good results across various classification metrics. I have been careful to ensure that I am not leaking information at any stage in my pipeline, but it's always possible that I've missed something. Recently, I ran the data through t-SNE and UMAP and found that my classes do not separate well at all. This is surprising given the success of my models. Is this discrepancy …
Category: Data Science

t-SNE interpretation and separability

I have a binary classification problem where I train a neural network on training and validation data sets, but I am not satisfied with the performance of my trained classifier (the NN above). The loss function (binary cross-entropy) did not get lower than 0.1280 on the validation set, and on the test set it is about 0.1340. I tried to debug my data with t-SNE, to visualize how "separable" my training data is. My question …
Category: Data Science

t-SNE parameters

I am trying to tune the parameters of sklearn.manifold.TSNE(n_components=2, *, perplexity=30.0, early_exaggeration=12.0, learning_rate=200.0, n_iter=1000, n_iter_without_progress=300, min_grad_norm=1e-07, metric='euclidean', init='random', verbose=0, random_state=None, method='barnes_hut', angle=0.5, n_jobs=None, square_distances='legacy'). Even though I tried a number of combinations, the visualization does not show a clear separation between the two classes. Is there a way to tune t-SNE automatically or manually to find the best parameters?
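There is no fully automatic tuner built into scikit-learn, but a small manual grid over perplexity and learning rate, scored by the final `kl_divergence_`, is a common approach. One caveat: KL values are only loosely comparable across perplexities (the P distribution changes), so inspect the layouts too. A sketch on stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 20))   # stand-in for your feature matrix

results = []
for perplexity in (5, 30):
    for lr in (50.0, 200.0):
        t = TSNE(n_components=2, perplexity=perplexity,
                 learning_rate=lr, random_state=0)
        emb = t.fit_transform(X)
        results.append((t.kl_divergence_, perplexity, lr, emb))

# Pick the run with the lowest final KL divergence
best_kl, best_perp, best_lr, best_emb = min(results, key=lambda r: r[0])
```

Bear in mind that if the classes genuinely overlap in feature space, no parameter setting will separate them.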
Category: Data Science

Why is it recommended to use t-SNE to reduce to 2-3 dimensions and not higher?

According to the wiki, it is recommended to use t-SNE to map to 2-3 dimensions. I can understand this if we want to visualize the data. But if we want to reduce the number of features (e.g. from 30 features to 5 dimensions), is it recommended to do this with t-SNE, or should we use another dimensionality reduction algorithm?
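Technically it is possible: in scikit-learn, `n_components > 3` just requires `method="exact"`, since the Barnes-Hut approximation only supports up to 3 output dimensions. A sketch on stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))   # stand-in for the 30-feature data

# Barnes-Hut supports n_components <= 3 only; use the exact (O(n^2)) method
emb5 = TSNE(n_components=5, method="exact", perplexity=10,
            random_state=0).fit_transform(X)
```

That said, t-SNE is generally a poor choice for feature reduction: it preserves local neighborhoods rather than global distances, and it has no transform for unseen data. PCA, or UMAP (which does provide a `transform`), are more common choices for producing features for downstream models.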
Category: Data Science
