Best classification technique for the following kind of data set
I have a large table in which each row represents a single salesperson. There are 50 columns (dimensions), one for each of 50 products that any given salesperson might sell, plus one final column giving the salesperson's total compensation as a percentile among their peers.
The values in each product column range from 0 to 100 and reflect the salesperson's percentile performance in sales of that product; the final column holds their percentile in total compensation.
e.g.
prodA | prodB | prodC | prodD | ... | prodAW | prodAX | comp
0 | 93 | 0 | 87 | ... | 73 | 0 | 88
100 | 0 | 44 | 0 | ... | 99 | 63 | 67
88 | 91 | 0 | 89 | ... | 85 | 88 | 24
...
Assume there are at least 50K records (i.e. 50K salespeople).
If the value in any of the 50 product columns equals 0, it means the salesperson didn't sell any of that product. Take the last row in the example table above: although that salesperson is in the 88th percentile for sales of product A and the 91st percentile for sales of product B, they are only in the 24th percentile for total salesperson compensation (see the last column).
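To make the encoding concrete, here is a small sketch of how I read a row. It uses only the columns visible in the example table (the elided ones are left out), and the helper name `sold_products` is just something I made up for illustration.

```python
# Only the product columns visible in the example table; the elided
# columns ("...") are deliberately left out rather than guessed.
columns = ["prodA", "prodB", "prodC", "prodD", "prodAW", "prodAX"]

# Each entry: (product percentiles, compensation percentile).
rows = [
    ([0, 93, 0, 87, 73, 0], 88),
    ([100, 0, 44, 0, 99, 63], 67),
    ([88, 91, 0, 89, 85, 88], 24),
]

def sold_products(percentiles):
    """Hypothetical helper: names of products with a nonzero value,
    i.e. products this salesperson actually sold."""
    return [name for name, p in zip(columns, percentiles) if p != 0]

for percentiles, comp in rows:
    print(sold_products(percentiles), "comp:", comp)
```

So a 0 is really a "did not sell" flag rather than a very low percentile, which may matter for whatever distance measure a clustering method uses.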
What approach would be best to categorize and classify salespeople into natural groupings? Would I use something like k-means clustering?
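For reference, this is roughly what I have in mind by k-means: a minimal standard-library sketch of Lloyd's algorithm, not a claim that it is the right method. The data here is made up (6 products and two invented seller profiles instead of my real 50 columns), the `comp` column is left out, and `kmeans` and `fake_row` are hypothetical names. Note that plain Euclidean distance treats a 0 ("did not sell") as just a very low percentile, which is one reason I am unsure this is appropriate.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fake_row(sells_first_half, rng):
    """Made-up salesperson: strong in three products, 0 (not sold) in the rest."""
    high = [rng.randint(60, 100) for _ in range(3)]
    none = [0, 0, 0]
    return high + none if sells_first_half else none + high

rng = random.Random(42)
# Even-numbered rows sell the first three products, odd-numbered rows the last three.
data = [fake_row(i % 2 == 0, rng) for i in range(200)]

centroids, labels = kmeans(data, k=2)
for c, centroid in enumerate(centroids):
    print(c, [round(v, 1) for v in centroid], "members:", labels.count(c))
```

On toy data this cleanly separated, the two profiles come back as two clusters; my real table is 50-dimensional with many zeros, and I would also have to pick `k` somehow.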
TL;DR: What is the best approach for classifying/clustering/categorizing salespeople into appropriate groups?
Topic pca hierarchical-data-format classification k-means clustering
Category Data Science