Project/scale a set of 2D points to satisfy a set of similarity constraints

I have a problem similar to the one posted here: MDS scikit-learn example. I have a set of similarities between 2D points that I want to place on a map/plane while preserving these similarities as well as I can. The difference is that I want to keep some of these points fixed while optimizing for the others. So, following the MDS example provided, say I want to hold 2 specific points from the similarities variable constant. How should I …
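One possible direction (a minimal NumPy sketch, not part of sklearn's MDS API; the name `mds_with_anchors` and the parameters `fixed_idx`, `lr`, `n_iter` are illustrative): minimize the raw stress yourself by gradient descent and simply zero the gradient of the points you want to hold constant.

```python
# Hypothetical sketch: MDS stress minimization with anchor points held fixed.
import numpy as np

def mds_with_anchors(dissimilarities, init, fixed_idx, lr=0.01, n_iter=500):
    """dissimilarities: (n, n) target distances; init: (n, 2) starting layout;
    fixed_idx: indices of the points that must not move."""
    X = init.astype(float).copy()
    fixed = np.zeros(len(X), dtype=bool)
    fixed[fixed_idx] = True
    for _ in range(n_iter):
        diff = X[:, None, :] - X[None, :, :]       # pairwise difference vectors
        dist = np.linalg.norm(diff, axis=-1)       # current pairwise distances
        np.fill_diagonal(dist, 1.0)                # avoid division by zero
        resid = dist - dissimilarities             # stress residuals
        np.fill_diagonal(resid, 0.0)
        grad = ((resid / dist)[:, :, None] * diff).sum(axis=1)
        grad[fixed] = 0.0                          # anchors never move
        X -= lr * grad
    return X
```

As in the linked example, the similarities would first be converted to dissimilarities (e.g. `delta = similarities.max() - similarities`) before being passed in.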
Category: Data Science

Hyperbolic coordinates (Poincaré embeddings) as the output of a neural network

I'm trying to build a deep learning predictor that takes a set of word vectors (in Euclidean space) as input and outputs Poincaré embeddings. So far I am not having much luck, because the model predicts arbitrary points in n-dimensional real space, not in the hyperbolic space. This causes the distance, and thus the loss function, to be undefined. Therefore I need to restrict the output of the model somehow. I have tried several things. The first was defining the loss …
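One approach that addresses exactly this failure mode (a hedged PyTorch sketch; the module name and the `eps` margin are my own, not a standard API): squash the final layer's output into the open unit ball, so every prediction is a valid point of the Poincaré model and the distance below is always finite.

```python
import torch
import torch.nn as nn

class PoincareProjection(nn.Module):
    """Map arbitrary R^n vectors strictly inside the unit ball (radius 1 - eps)."""
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        norm = x.norm(dim=-1, keepdim=True).clamp_min(self.eps)
        # tanh(norm) < 1, so the output always lies strictly inside the ball
        return torch.tanh(norm) * (1 - self.eps) * x / norm

def poincare_distance(u, v, eps=1e-5):
    """Hyperbolic distance between points of the Poincaré ball."""
    sq = (u - v).pow(2).sum(dim=-1)
    denom = (1 - u.pow(2).sum(dim=-1)) * (1 - v.pow(2).sum(dim=-1))
    return torch.acosh(1 + 2 * sq / denom.clamp_min(eps))
```

Appending such a projection as the last layer leaves the rest of the architecture unchanged; an alternative route is Riemannian optimization (e.g. the `geoopt` library), but the squashing trick is the smaller change.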
Category: Data Science

Low-dimensional manifold in a high-dimensional space and geodesic distance

It is a common assumption that high-dimensional objects lie on low-dimensional manifolds, and this constitutes a foundation for manifold learning and dimensionality-reduction techniques (and a way to beat the curse of dimensionality). My question is: assuming this is valid, how can one utilize the assumption to do something such as manifold learning? I think the general goal is to find a nonlinear representation of the high-dimensional object using a small number of degrees of freedom. However, we know neither …
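A toy illustration of how the assumption is actually used (a sketch with scikit-learn): the swiss roll is a 2-D manifold embedded in R^3, and Isomap exploits the manifold assumption by approximating geodesic distances with shortest paths through a neighborhood graph, then finding a low-dimensional layout that preserves them.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-D points that actually have only 2 degrees of freedom
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)
# Isomap: neighborhood graph -> shortest-path (geodesic) distances -> 2-D layout
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
```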
Category: Data Science

Can an Isomap embedding have a higher dimension than the corresponding MDS embedding?

I am using the Isomap algorithm to perform a dimension reduction on a distance matrix $M_{dist}$. For a given choice of the number of nearest neighbors k used to compute the geodesic distance, I use the following method to determine the dimensionality of the corresponding manifold: I compute a distance matrix of D-dimensional Gaussian vectors with the same average square distance as $M_{dist}$, which I call $M_{rand}$. I then use a loop to compute the reconstruction error $e_{dist}$ of the Isomap fitted on $M_{dist}$ …
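For reference, a sketch of the procedure just described, assuming a scikit-learn version whose `Isomap` accepts `metric="precomputed"`; the helper name `isomap_errors` is illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import Isomap

def isomap_errors(M_dist, D, k=10, random_state=0):
    """Return [e_dist, e_rand] for candidate dimension D and k neighbors."""
    n = M_dist.shape[0]
    rng = np.random.default_rng(random_state)
    M_rand = squareform(pdist(rng.standard_normal((n, D))))
    # rescale so both matrices have the same average square distance
    M_rand *= np.sqrt((M_dist ** 2).mean() / (M_rand ** 2).mean())
    errors = []
    for M in (M_dist, M_rand):
        iso = Isomap(n_neighbors=k, n_components=D, metric="precomputed").fit(M)
        errors.append(iso.reconstruction_error())
    return errors
```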
Category: Data Science

Dimension of the manifold on which my data sits

Suppose that I have data points in the form of vectors with binary entries. We create a metric space, or a Vietoris–Rips complex, using the Hamming distance between the data points. I would like to imagine that my data points naturally sit on some manifold, which I want to somehow reconstruct (at least its dimension). What are the tools/techniques/references/keywords for such data analysis? Thank you!
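Useful keywords: intrinsic dimension estimation and persistent homology (Ripser or GUDHI can build Vietoris–Rips complexes from a precomputed Hamming distance matrix). As a concrete starting point, here is a minimal sketch of the TwoNN estimator (Facco et al., 2017); note that binary vectors produce many tied distances, which this estimator handles poorly, hence the filtering step.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def twonn_dimension(X):
    """TwoNN intrinsic-dimension estimate from Hamming distances."""
    D = squareform(pdist(X, metric="hamming"))
    np.fill_diagonal(D, np.inf)                 # ignore self-distances
    r = np.sort(D, axis=1)[:, :2]               # 1st and 2nd nearest-neighbor distances
    with np.errstate(divide="ignore", invalid="ignore"):
        mu = r[:, 1] / r[:, 0]
    mu = mu[np.isfinite(mu) & (mu > 1)]         # drop ties and duplicate points
    return len(mu) / np.log(mu).sum()           # maximum-likelihood estimate of d
```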
Category: Data Science

Can I apply Clustering algorithms to the result of Manifold Visualization Methods?

Some methods related to manifold learning, such as t-SNE and self-organizing maps (SOM), are commonly described as good for visualization. I understand that "visualization" here means that the non-linear dimensionality reduction can provide good insight into the data through its low-dimensional projection, but that this projection usually cannot be fed into machine learning algorithms, since some of the information about the high-dimensional structure is lost (roughly speaking). However, and here is the question: if "clusters" are observed in the visualization, is it …
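In practice this is often done; a minimal sketch (parameter values are placeholders) that clusters the 2-D embedding and then sanity-checks the labels back in the original feature space:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

def cluster_tsne(X, perplexity=30, eps=2.0, min_samples=10, random_state=0):
    X = np.asarray(X)
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=random_state).fit_transform(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(emb)
    # validate in the ORIGINAL space: do the 2-D clusters hold up there?
    mask = labels != -1                          # drop DBSCAN noise points
    score = (silhouette_score(X[mask], labels[mask])
             if len(set(labels[mask])) > 1 else float("nan"))
    return emb, labels, score
```

The clusters found this way describe the projection, not necessarily the original space, which is why the silhouette check above is computed on `X` rather than on the embedding.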
Category: Data Science

Can I use manifold learning to transform the feature set as a substitute for the graph kernel of SVC

I am wondering whether, since manifold learning in scikit-learn includes graph-based transformations (e.g. the shortest-path graph search in Isomap), I can transform my feature data set (i.e. measurements of a chemical's physical attributes, used to decide its toxicity) into something that captures these graph-based measurements.
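One way to try the idea (a sketch with placeholder hyperparameters; this is not equivalent to a true graph kernel, but it does feed geodesic, graph-based structure into the classifier): put Isomap in front of an ordinary SVC in a scikit-learn pipeline.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import Isomap
from sklearn.svm import SVC

# Isomap's shortest-path-graph transformation first, a kernel SVC second
clf = make_pipeline(StandardScaler(),
                    Isomap(n_neighbors=10, n_components=5),
                    SVC(kernel="rbf"))
# clf.fit(X_train, y_train); clf.predict(X_test)
# X rows: physico-chemical measurements; y: toxicity labels
```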
Category: Data Science

Difference between MDS and other manifold learning algorithms

From the sklearn docs: note that the purpose of MDS is to find a low-dimensional representation of the data (here 2D) in which the distances respect well the distances in the original high-dimensional space; unlike other manifold-learning algorithms, it does not seek an isotropic representation of the data in the low-dimensional space. Can someone elaborate, in layman's terms, on what the distinction is?
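One way to make the distinction concrete (a small, self-contained experiment; the dataset choice is arbitrary): MDS is judged on how well the embedded pairwise distances match the original ones, so correlating the two sets of distances should favor it over a neighborhood-focused method like t-SNE, which only tries to keep near neighbors near.

```python
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.datasets import load_digits
from sklearn.manifold import MDS, TSNE

X = load_digits().data[:300]
d_orig = pdist(X)                       # all pairwise distances in 64-D
for name, model in [("MDS", MDS(n_components=2, random_state=0)),
                    ("t-SNE", TSNE(n_components=2, random_state=0))]:
    d_emb = pdist(model.fit_transform(X))
    print(name, spearmanr(d_orig, d_emb).correlation)
```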
Category: Data Science

Can closer points be considered more similar in t-SNE visualization?

I understand from Hinton's paper that t-SNE does a good job of keeping local similarities and a decent job of preserving global structure (cluster formation). However, I'm not clear on whether points that appear closer in a 2D t-SNE visualization can be assumed to be "more similar" data points. I'm using data with 25 features. As an example, observing the image below, can I assume that the blue data points are more similar to the green ones, specifically to the biggest cluster of green points? Or, asking differently, is it ok …
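Rather than assuming it, this can be checked directly on the data; a minimal sketch, where `X` (the 25-feature matrix) and `labels` (the color groups in the plot) are assumed to exist:

```python
import numpy as np
from scipy.spatial.distance import cdist

def mean_cluster_distance(X, labels, a, b):
    """Average original-space distance between the points colored a and b."""
    X, labels = np.asarray(X), np.asarray(labels)
    return cdist(X[labels == a], X[labels == b]).mean()

# If mean_cluster_distance(X, labels, "blue", "green") is smaller than the
# blue-to-other-color averages, the visual impression is supported; distances
# between well-separated t-SNE clusters are otherwise not reliable.
```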
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.