I have a problem similar to this one posted here MDS scikit-learn example. I have a set of similarities between 2D points that I want to place in a map/plane while preserving these similarities as well as I can. The difference is that I want to keep some of these points fixed while optimizing for the others. So, following the MDS example provided, say I want to hold 2 specific points from the similarities variable constant. How should I …
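scikit-learn's MDS has no built-in support for fixed anchors, but one workable approach is to minimize the raw stress yourself over only the free points' coordinates. A minimal sketch, assuming a precomputed dissimilarity matrix and hypothetical anchor indices/coordinates:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def mds_with_anchors(dissimilarities, fixed_idx, fixed_coords,
                     n_components=2, seed=0):
    """Metric MDS by direct stress minimization, holding anchors fixed."""
    n = dissimilarities.shape[0]
    free_idx = np.setdiff1d(np.arange(n), fixed_idx)
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=(free_idx.size, n_components))

    def stress(flat):
        coords = np.empty((n, n_components))
        coords[fixed_idx] = fixed_coords            # anchors never move
        coords[free_idx] = flat.reshape(-1, n_components)
        d = squareform(pdist(coords))
        return np.sum((d - dissimilarities) ** 2) / 2

    res = minimize(stress, x0.ravel(), method="L-BFGS-B")
    coords = np.empty((n, n_components))
    coords[fixed_idx] = fixed_coords
    coords[free_idx] = res.x.reshape(-1, n_components)
    return coords
```

Only the free coordinates are exposed to the optimizer, so the anchors are constant by construction rather than by constraint.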
I'm trying to build a deep-learning predictor that takes as input a set of word vectors (in Euclidean space) and outputs Poincaré embeddings. So far I am not having much luck, because the model predicts arbitrary points in n-dimensional real space, not in the hyperbolic space. This causes the distance, and thus the loss function, to be undefined. Therefore I need to restrict the output of the model somehow. I have tried several things. First was defining the loss …
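One common fix is to end the network with a projection that maps any vector into the open unit ball, so the Poincaré distance is always defined. A hedged sketch in PyTorch; the tanh rescaling and epsilon are design choices, not the one canonical solution:

```python
import torch
import torch.nn as nn

class PoincareHead(nn.Module):
    """Maps arbitrary R^n vectors into the open unit ball."""
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        norm = x.norm(dim=-1, keepdim=True).clamp_min(self.eps)
        # tanh(norm) < 1, so the output norm stays strictly below 1
        return torch.tanh(norm) * (1 - self.eps) * x / norm

def poincare_distance(u, v, eps=1e-5):
    """Distance in the Poincaré ball model of hyperbolic space."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    uu = torch.clamp(1 - torch.sum(u * u, dim=-1), min=eps)
    vv = torch.clamp(1 - torch.sum(v * v, dim=-1), min=eps)
    return torch.acosh(1 + 2 * sq / (uu * vv))
```

With the head attached after the last linear layer, `poincare_distance` can be used directly in the loss without hitting undefined values.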
I use manifold learning to analyze images and want to do image segmentation. My idea is to first use manifold learning to reduce the dimensionality to 2D. In 2D I can cluster and label, but how can I map these labels back onto the raw image data?
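The key is that the flattened pixel order carries the pixel-to-label correspondence, so the cluster labels can simply be reshaped back to the image grid. A minimal sketch, assuming a single-channel image and one feature vector per pixel:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import SpectralEmbedding

image = np.random.rand(32, 32)             # stand-in for your raw image
features = image.reshape(-1, 1)            # one feature vector per pixel

embedded = SpectralEmbedding(n_components=2).fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(embedded)

segmentation = labels.reshape(image.shape)  # same grid as the raw image
```

The same pattern works with richer per-pixel features (patches, color channels); only the `features` construction changes.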
It is a common assumption that high-dimensional objects lie on low-dimensional manifolds, and this constitutes a foundation for manifold learning and dimensionality-reduction techniques (and a way to beat the curse of dimensionality). My question is: assuming this is valid, how can one utilize this assumption in doing something such as manifold learning? I think the general goal is to find a nonlinear representation of these high-dimensional objects using a small number of degrees of freedom. However, we know neither …
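One concrete way the assumption gets used (a sketch, not the only way): a Swiss roll sits in R^3 but has only two intrinsic degrees of freedom, and Isomap exploits the manifold assumption by preserving geodesic rather than ambient distances to recover a 2D parametrization:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, random_state=0)  # 3D ambient data
X_2d = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)  # (1000, 3) -> (1000, 2)
```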
I am using the Isomap algorithm to perform dimensionality reduction on a distance matrix $M_{dist}$. For a given choice of the number of nearest neighbors k used to compute the geodesic distance, I use the following method to determine the dimensionality of the corresponding manifold: I compute a distance matrix of D-dimensional Gaussian vectors with the same average squared distance as $M_{dist}$, which I call $M_{rand}$. I then use a loop to compute the reconstruction error $e_{dist}$ of the Isomap fitted on $M_{dist}$ …
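A sketch of that procedure, with a stand-in for $M_{dist}$ (scikit-learn's Isomap accepts `metric="precomputed"` and exposes `reconstruction_error()`):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import Isomap

def isomap_errors(M, k, dims):
    return [Isomap(n_neighbors=k, n_components=d, metric="precomputed")
            .fit(M).reconstruction_error() for d in dims]

n, D, k = 200, 5, 10
rng = np.random.default_rng(0)
M_dist = squareform(pdist(rng.normal(size=(n, 8))))  # stand-in for your data

# Gaussian baseline, rescaled to the same mean squared distance as M_dist
M_rand = squareform(pdist(rng.normal(size=(n, D))))
M_rand *= np.sqrt((M_dist ** 2).mean() / (M_rand ** 2).mean())

dims = range(1, 8)
e_dist = isomap_errors(M_dist, k, dims)
e_rand = isomap_errors(M_rand, k, dims)
```

Comparing `e_dist` against `e_rand` across `dims` then gives the baseline curve the question describes.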
Can anyone explain the meaning of this line: "Deep networks have been shown to learn representations in which interpolations between embedding pairs tend to be near the data manifold". Reference: Section 4.3 of the paper Generative Adversarial Text to Image Synthesis
Suppose that I have data points in the form of vectors with binary entries. We create a metric space, and from it a Vietoris–Rips complex, using the Hamming distance between the data points. I would like to imagine that my data points naturally sit on some manifold, which I want to somehow reconstruct (or at least recover its dimension). What are the tools/techniques/references/keywords for such data analysis? Thank you!
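The usual keyword is topological data analysis (persistent homology of the Vietoris–Rips filtration). A sketch assuming the `ripser` package is installed; the persistence diagrams give one handle on the shape of the underlying space:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from ripser import ripser

X = np.random.randint(0, 2, size=(100, 20))  # binary data points
D = squareform(pdist(X, metric="hamming"))   # pairwise Hamming distances

result = ripser(D, distance_matrix=True, maxdim=2)
diagrams = result["dgms"]                    # persistence diagrams H0, H1, H2
```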
Some methods related to manifold learning, such as t-SNE and self-organizing maps (SOM), are commonly described as good for visualization. I understand that "visualization" here means that the nonlinear dimensionality reduction can provide good insight into the data in its low-dimensional projection, but that this low-dimensional projection usually cannot be fed into machine-learning algorithms, since some of the information about the high-dimensional structure is lost (roughly speaking). However, and here is the question: if "clusters" are being observed in the visualization, is it …
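For concreteness, the scenario in question looks like the sketch below: clustering directly on the 2D t-SNE coordinates. Whether those clusters are trustworthy is exactly what is being asked; this only shows how one would probe it (the DBSCAN parameters are arbitrary choices):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

# cluster on the visualization coordinates, then compare against y
clusters = DBSCAN(eps=3.0, min_samples=10).fit_predict(X_2d)
```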
I just wonder: since manifold learning in scikit-learn has graph-based components (e.g., the shortest-path graph search in Isomap), can I transform my feature data set (i.e., measurements of chemical/physical attributes used to decide toxicity) into something that captures these graph-based measurements?
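The graph-based step inside Isomap can be reproduced on its own. A minimal sketch, with random data standing in for the chemical/physical measurements: build a k-nearest-neighbor graph and take shortest-path (geodesic) distances as the graph-based representation:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(100, 6)  # stand-in for the toxicity measurements
knn = kneighbors_graph(X, n_neighbors=5, mode="distance")

# geodesic distances along the neighborhood graph (Dijkstra)
geodesic = shortest_path(knn, method="D", directed=False)
```

The resulting `geodesic` matrix can then be fed to any method that accepts precomputed distances.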
From the sklearn docs: "Note that the purpose of MDS is to find a low-dimensional representation of the data (here 2D) in which the distances respect well the distances in the original high-dimensional space; unlike other manifold-learning algorithms, it does not seek an isotropic representation of the data in the low-dimensional space." Can someone elaborate, in layman's terms, what the distinction is?
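A layman-level illustration of the "distances respect distances" part (a sketch, not the docs' own example): correlate the original pairwise distances with the embedded ones; for MDS the correlation should be high, since that is precisely what it optimizes:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.manifold import MDS

X = np.random.rand(100, 10)
X_2d = MDS(n_components=2, random_state=0).fit_transform(X)

corr = np.corrcoef(pdist(X), pdist(X_2d))[0, 1]
print(f"distance correlation: {corr:.2f}")  # close to 1 = distances respected
```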
I understand from the t-SNE paper (van der Maaten and Hinton) that t-SNE does a good job of keeping local similarities and a decent job of preserving global structure (cluster structure). However, I'm not clear whether points appearing closer in a 2D t-SNE visualization can be assumed to be "more similar" data points. I'm using data with 25 features. As an example, observing the image below, can I assume that the blue data points are more similar to the green ones, specifically to the biggest cluster of green points? Or, asking differently, is it ok …
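One quantitative check (a sketch with random data standing in for the 25 features): scikit-learn's `trustworthiness` score measures how well small neighborhoods in the 2D map reflect neighborhoods in the original space, which is the local part of the question:

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

X = np.random.rand(500, 25)  # stand-in for the 25-feature data
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

score = trustworthiness(X, X_2d, n_neighbors=5)  # 1.0 = local structure kept
```

Note that a high score vouches only for small neighborhoods; inter-cluster distances in t-SNE maps are generally not meaningful.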