Clustering Algorithm + Euclidean Distance to find similarities
Goal: Create a tool that recommends similar players based on their statistical profile
Process:
(1) Standardize the data.
(2) Use UMAP to reduce dimensionality (the data has c. 50 features).
(3) First-stage clustering: fit a GMM to create macro clusters of players.
(4) Second-stage clustering: within each macro cluster, fit a GMM to create micro clusters, using a position-specific subset of features (e.g. only the 10 of the 50 features that are relevant).
(5) Calculate Euclidean distances in a PCA projection (distances in the UMAP embedding led to odd results).
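A minimal NumPy-only sketch of steps (1) and (5): z-score the stats, project onto the top principal components via SVD, then rank players by Euclidean distance, optionally restricted to the query player's cluster. It assumes the cluster labels have already been produced by the GMM stages (3) and (4). The labels, the data and the helper names `standardize`, `pca_project` and `most_similar` are hypothetical placeholders, not the poster's code, and the UMAP/GMM steps themselves are not reproduced here.

```python
import numpy as np


def standardize(X):
    """Z-score each column (step 1); constant columns are divided by 1 instead of 0."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - mean) / std


def pca_project(Z, n_components):
    """Project standardized data onto its top principal components using SVD."""
    Zc = Z - Z.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T


def most_similar(P, query_idx, labels=None, k=5):
    """Return indices of the k players closest (Euclidean) to P[query_idx].

    If cluster labels are given, only players sharing the query's label are candidates.
    """
    dists = np.linalg.norm(P - P[query_idx], axis=1)
    dists[query_idx] = np.inf  # never recommend a player to themselves
    if labels is not None:
        dists[labels != labels[query_idx]] = np.inf  # exclude other clusters
    order = np.argsort(dists)
    order = order[np.isfinite(dists[order])]  # drop excluded candidates
    return order[:k]


# Hypothetical usage: 200 players x 50 stats, random stand-in cluster labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
labels = rng.integers(0, 4, size=200)  # placeholder for GMM cluster assignments
P = pca_project(standardize(X), n_components=10)
print(most_similar(P, query_idx=0, labels=labels, k=5))
```

For the position-specific features in step (4), one option would be to slice the relevant columns (e.g. `X[:, cols]`) before standardizing, so the distances for a given position use only those stats.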
Question: How good/reasonable is this approach on a scale from 1 to 10 (10 = best)? Are there any downsides to my approach that I'm not considering?
Topic preprocessing python dimensionality-reduction machine-learning
Category Data Science