Computing the centroid of high-dimensional data

Question

user297850

April 23, 2022 04:44

In scikit-learn or other Python libraries, are there any existing implementations for computing the centroid of a high-dimensional data set?

Topic multivariate-distribution anomaly-detection scikit-learn clustering machine-learning

Category Data Science

NewsGuy · Accepted Answer · April 23, 2022 04:44

The centroid is just the coordinate-wise mean of the points, so you could use np.mean along the axis that indexes the samples. Let's say you have 100 vectors of 1200 dimensions each, stored as a (100, 1200) array, and you want a centroid vector of dimension 1200. Then the following code would work:

>>> import numpy as np
>>> data = np.random.rand(100, 1200)
>>> centroid = np.mean(data, axis=0)
>>> centroid.shape
(1200,)

See the NumPy documentation for np.mean for details on the axis argument.
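If you also need one centroid per group (for example, per cluster label), the same idea extends naturally. The following is only a minimal sketch: the helper names `centroid` and `centroids_by_label` and the random `labels` array are made up for illustration and are not part of NumPy or scikit-learn.

```python
import numpy as np


def centroid(points):
    """Return the coordinate-wise mean of an (n_samples, n_features) array."""
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[0] == 0:
        raise ValueError("expected a non-empty array of shape (n_samples, n_features)")
    return points.mean(axis=0)


def centroids_by_label(points, labels):
    """Return a dict mapping each distinct label to the centroid of its points."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Boolean masking selects the rows that belong to each label.
    return {
        label: centroid(points[labels == label])
        for label in np.unique(labels).tolist()
    }


# Illustrative data: 100 vectors of 1200 dimensions, with hypothetical labels 0..2.
rng = np.random.default_rng(0)
data = rng.random((100, 1200))
labels = rng.integers(0, 3, size=100)

overall = centroid(data)                      # one vector of length 1200
per_group = centroids_by_label(data, labels)  # one vector of length 1200 per label
```

Here `overall` matches `np.mean(data, axis=0)` from the answer above, and each value in `per_group` has the same length as a single data vector.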
