Clustering Algorithm + Euclidean Distance to find similarities

Goal: Create a tool that recommends similar players based on their statistical profile

Process:
(1) Standardize the data.
(2) Apply UMAP to reduce dimensionality (c. 50 features).
(3) First-stage clustering: fit a GMM to create macro clusters of players.
(4) Second-stage clustering: fit a GMM within each macro cluster to create micro clusters, using a position-specific subset of features (e.g. only the 10 of 50 that are relevant).
(5) Calculate Euclidean distance between players on PCA components (UMAP led to weird results).
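
For concreteness, here is a minimal sketch of steps (1), (3), (4) and (5) using scikit-learn. It is not my exact code: the data is synthetic, the UMAP step (2) is left out so the example only depends on NumPy and scikit-learn, and the component counts, the `micro_features` column list and the `similar_players` helper are placeholder assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the player table: 300 players x 8 features
# (the real data has roughly 50 features).
rng = np.random.default_rng(0)
centers = rng.normal(scale=6.0, size=(3, 8))
X_raw = np.vstack([c + rng.normal(size=(100, 8)) for c in centers])

# (1) Standardize every feature to zero mean and unit variance.
X = StandardScaler().fit_transform(X_raw)

# (3) First-stage GMM: macro clusters (3 components is an assumption).
macro = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# (4) Second-stage GMM inside each macro cluster, on a feature subset.
micro_features = [0, 1, 2]  # hypothetical "position-relevant" columns
micro = np.zeros(len(X), dtype=int)
for m in np.unique(macro):
    idx = np.flatnonzero(macro == m)
    if len(idx) >= 10:  # skip the split for very small macro clusters
        gmm = GaussianMixture(n_components=2, random_state=0)
        micro[idx] = gmm.fit_predict(X[idx][:, micro_features])

# (5) Project onto PCA components and compare players there.
Z = PCA(n_components=5, random_state=0).fit_transform(X)

def similar_players(i, k=5):
    """Return the k nearest players to player i within its micro cluster."""
    same = np.flatnonzero((macro == macro[i]) & (micro == micro[i]))
    same = same[same != i]  # exclude the query player itself
    dist = np.linalg.norm(Z[same] - Z[i], axis=1)
    order = np.argsort(dist)[:k]
    return same[order], dist[order]

neighbors, distances = similar_players(0)
```

In this sketch the recommendation for a player is simply the `k` closest rows in PCA space that share both its macro and micro cluster label.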

Question: How good/reasonable is this approach on a scale from 1-10 (10 = best)? Are there any downsides to my approach that I'm not considering?

Topic preprocessing python dimensionality-reduction machine-learning

Category Data Science
