What is the most effective unsupervised ML algorithm to use when outliers are present in a data set?

I am analyzing a portfolio of about 225 stocks and have collected three attributes for each of them: "Price/Earnings ratio", "Return on Assets", and "Earnings per share growth". I would like to cluster these stocks into 3 or 4 groups based on these attributes. However, there are substantial outliers in the data set. Instead of removing them altogether, I would like to keep them in. What ML algorithm would be best suited for this? I have been told that k-means would not work well, since the outliers would skew the centroids of the clusters they are assigned to. Any and all thoughts welcome!

Topic unsupervised-learning outlier algorithms machine-learning

Category Data Science


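DBSCAN is a density-based clustering method that is designed for data with noise. The user controls a neighborhood radius (usually called eps) and the minimum number of points that must fall within that radius for a region to count as dense (usually called minPts), both of which can hopefully be informed by the problem. Points that do not belong to any dense region are labeled as noise instead of being forced into a cluster, so the outliers stay in the data set without distorting the clusters. Note that DBSCAN determines the number of clusters itself, so you cannot ask for exactly 3 or 4 groups; you would have to tune the two parameters until the result is useful.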
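To make the idea concrete, here is a minimal pure-Python sketch of DBSCAN, not a production implementation. The function name `dbscan`, the parameter names `eps` and `min_pts`, and the nine data points are all made up for illustration; the points stand in for stocks whose three attributes have already been scaled to comparable ranges (scaling matters because DBSCAN relies on distances). In practice you would more likely use a library version such as `sklearn.cluster.DBSCAN`, which takes `eps` and `min_samples`.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels

# Hypothetical, already-scaled values for (P/E, return on assets, earnings growth).
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0), (0.1, 0.1, 0.1),
    (3.0, 3.0, 3.0), (3.1, 3.0, 3.1), (3.0, 3.1, 3.0), (3.1, 3.1, 3.1),
    (10.0, -8.0, 9.0),  # an extreme stock
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

On this toy data the two tight groups come out as clusters 0 and 1, and the extreme point gets the label -1, so it is kept in the output but does not pull either cluster toward it.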


You could try a hierarchical clustering approach. For example, first find K clusters over all the data points. Then, within each of those K clusters, run the clustering again on just that cluster's points to split it into as many sub-clusters as seem appropriate. That way a cluster that has been distorted by outliers in the first pass can still be refined in the second.
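A rough sketch of this two-level idea is below. The answer does not say which algorithm finds the clusters at each level, so this example assumes plain k-means with a deterministic farthest-first initialization purely as a stand-in; the names `assign`, `kmeans`, `two_level`, `k_top`, and `k_sub`, as well as the data points, are hypothetical.

```python
import math

def assign(points, centers):
    # Index of the nearest center for every point.
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=20):
    """Plain k-means; the first center is points[0], the rest are farthest-first."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign(points, centers)

def two_level(points, k_top, k_sub):
    """Cluster into k_top groups, then split each group into k_sub sub-groups."""
    top = kmeans(points, k_top)
    result = [None] * len(points)
    for c in range(k_top):
        idx = [i for i, lab in enumerate(top) if lab == c]
        if len(idx) < k_sub:
            # Too few points to split further: keep them as one sub-group.
            for i in idx:
                result[i] = (c, 0)
            continue
        sub = kmeans([points[i] for i in idx], k_sub)
        for i, s in zip(idx, sub):
            result[i] = (c, s)
    return result

# Hypothetical, already-scaled values for (P/E, return on assets, earnings growth).
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0), (0.1, 0.1, 0.1),
    (3.0, 3.0, 3.0), (3.1, 3.0, 3.1), (3.0, 3.1, 3.0), (3.1, 3.1, 3.1),
    (10.0, -8.0, 9.0),  # an extreme stock
]
groups = two_level(points, k_top=2, k_sub=2)
print(groups)
```

With this particular toy data the first pass puts the extreme point in a top-level group of its own and lumps the two tight groups together, and the second pass then separates them. Real data will not necessarily behave this neatly, and the choice of `k_top` and `k_sub` is something you would have to experiment with.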
