Reduce number of vectors in dataset to achieve the "same average dimensions result"?

Sanxofon

January 26, 2021 06:37

Edit for reopening the question: I'll try to answer the questions asked by @user2974951:

I have large user-preference statistics for trichotomous data sets. You can visualize each data trio as a 3D vector with X, Y and Z values. All vectors satisfy X + Y + Z = 1 because of the trichotomous shape of the data I'm using, so they can also be visualized as points in an equilateral triangle.
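Because every vector satisfies X + Y + Z = 1, the triangle picture is just a linear map of (X, Y, Z) onto three corner points. A minimal NumPy sketch, assuming an arbitrary corner placement (the `CORNERS` array and the `to_triangle` helper are hypothetical, not part of the question). Since the map is linear, the average of the 3D vectors lands exactly on the centroid of their triangle points, so averaging in either space gives the same result.

```python
import numpy as np

# Assumed (arbitrary) corner placement of the triangle:
# pure X at (0, 0), pure Y at (1, 0), pure Z at (0.5, sqrt(3)/2).
CORNERS = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.5, np.sqrt(3) / 2]])

def to_triangle(xyz):
    """Map (..., 3) X, Y, Z values (summing to 1) to (..., 2) points in the triangle."""
    return np.asarray(xyz, dtype=float) @ CORNERS

points = np.array([[0.5, 0.0, 0.5],
                   [1.0, 0.0, 0.0],
                   [0.5, 0.5, 0.0]])
xy = to_triangle(points)

# Linear map: averaging the vectors and then mapping gives the same point
# as mapping first and averaging the 2D points.
print(np.allclose(to_triangle(points.mean(axis=0)), xy.mean(axis=0)))
```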

I have many tests, each with a large set of 3D vectors (features).

Simply averaging each component over all n vectors (features) of a test:

X = (X1 + X2 + X3 + ... + Xn)/n
Y = (Y1 + Y2 + Y3 + ... + Yn)/n
Z = (Z1 + Z2 + Z3 + ... + Zn)/n
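In NumPy the three formulas collapse into a single mean over the feature axis, assuming the data is stored as an array of shape (n_tests, n_features, 3). The values below are made up for illustration. Since each vector sums to 1, each averaged triple also sums to 1.

```python
import numpy as np

# Made-up example: 2 tests with 3 features each; every feature is an
# (X, Y, Z) triple with X + Y + Z = 1.
data = np.array([
    [[0.5, 0.0, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],   # test 1
    [[0.2, 0.3, 0.5], [0.4, 0.4, 0.2], [0.0, 0.5, 0.5]],   # test 2
])  # shape (n_tests, n_features, 3)

# X, Y and Z averaged over the features of each test: shape (n_tests, 3).
test_means = data.mean(axis=1)
print(test_means)
```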

This gives me an averaged (X, Y, Z) for each test:

           feature1     feature2     feature3               Average
test1-   X11,Y11,Z11  X12,Y12,Z12  X13,Y13,Z13  ...  -    X1,Y1,Z1
test2-   X21,Y21,Z21  X22,Y22,Z22  X23,Y23,Z23  ...  -    X2,Y2,Z2
test3-   X31,Y31,Z31  X32,Y32,Z32  X33,Y33,Z33  ...  -    X3,Y3,Z3

What I want now is to reduce the number of features and still get a similar result, to some defined extent. For example, I want to cut the features by half, picking only those features whose average is most similar to the average of all features.
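One possible approach, assuming the same subset of features (sensors) must be kept for every test, is greedy forward selection: start with no features and repeatedly add the one that brings the subset averages closest to the full averages. This is a heuristic, not a guaranteed optimum, since picking the best k out of n exactly is a combinatorial search. The function name `greedy_select` and the summed-squared-error objective are assumptions of this sketch; the random array only stands in for the real tests.

```python
import numpy as np

def greedy_select(data, k):
    """Greedy forward selection of k features (the same ones for every test).

    data has shape (n_tests, n_features, 3). At each step, add the feature
    whose inclusion brings the subset averages closest (summed squared error
    over all tests and X/Y/Z) to the averages of all features.
    """
    n_tests, n_features, _ = data.shape
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")
    full_mean = data.mean(axis=1)                      # (n_tests, 3)
    running_sum = np.zeros((n_tests, 3))               # sum of selected vectors
    available = np.ones(n_features, dtype=bool)
    selected = []
    for m in range(1, k + 1):
        # Subset average if candidate j were added: (n_tests, n_features, 3)
        cand_mean = (running_sum[:, None, :] + data) / m
        # Summed squared error per candidate: (n_features,)
        err = ((cand_mean - full_mean[:, None, :]) ** 2).sum(axis=(0, 2))
        err[~available] = np.inf                       # never pick a feature twice
        best = int(np.argmin(err))
        selected.append(best)
        available[best] = False
        running_sum += data[:, best, :]
    final_err = float(((running_sum / k - full_mean) ** 2).sum())
    return selected, final_err

# Random stand-in data (not the real dataset): 200 tests, 100 features.
rng = np.random.default_rng(0)
raw = rng.random((200, 100, 3))
data = raw / raw.sum(axis=2, keepdims=True)            # enforce X + Y + Z = 1
selected, err = greedy_select(data, k=50)              # keep half of the features
print(len(selected), err)
```

Keeping a running sum of the selected vectors means each step needs only one vectorized pass over all candidates, which should stay manageable for about 2000 tests with 100 features each.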

(Would it be too much to ask the same if I also wanted to keep the standard deviation as similar as possible, or a combination of both?)
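Matching the standard deviation as well is not much harder if both goals are folded into one objective. A sketch, assuming a weighted sum of the two squared errors: the weight `std_weight` and the helpers `subset_score` and `greedy_select_mean_std` are hypothetical choices, and `std_weight=0` falls back to matching the averages only. It uses a plain greedy search that recomputes the score from scratch for every candidate, so it is simpler but slower than an incremental version.

```python
import numpy as np

def subset_score(data, idx, full_mean, full_std, std_weight=1.0):
    """Hypothetical combined objective: squared error of the subset averages
    plus std_weight times squared error of the subset standard deviations
    (population std, ddof=0), summed over tests and X/Y/Z."""
    sub = data[:, idx, :]
    mean_err = ((sub.mean(axis=1) - full_mean) ** 2).sum()
    std_err = ((sub.std(axis=1) - full_std) ** 2).sum()
    return float(mean_err + std_weight * std_err)

def greedy_select_mean_std(data, k, std_weight=1.0):
    """Greedy forward selection of k features that minimizes subset_score."""
    full_mean = data.mean(axis=1)    # reference averages, computed once
    full_std = data.std(axis=1)      # reference standard deviations
    selected, remaining = [], list(range(data.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: subset_score(
            data, selected + [j], full_mean, full_std, std_weight))
        selected.append(best)
        remaining.remove(best)
    return selected

# Random stand-in data: 50 tests, 20 features.
rng = np.random.default_rng(1)
raw = rng.random((50, 20, 3))
data = raw / raw.sum(axis=2, keepdims=True)   # enforce X + Y + Z = 1
selected = greedy_select_mean_std(data, k=10, std_weight=0.5)
print(selected)
```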

So, how could I select the best features?

A Python example would be amazing.

Sorry if I'm using the wrong terms; I'm new to machine learning.

UPDATE: Some sample data

      V0                   V1                   V2                   V3                   V4                   [...]
T1    0.5,0,0.5            1,0,0                0.5,0.5,0            0.16,0,0.84          0,0,1                [...]
T2    0.57,0.11,0.32       0.53,0.15,0.32       0.24,0.51,0.24       0.18,0.15,0.67       0.54,0.15,0.31       [...]
T3    0,0.17,0.83          0.57,0.03,0.4        0.31,0.4,0.29        0.04,0.3,0.66        0.07,0.05,0.87       [...]
T4    0.1,0.43,0.47        0.81,0,0.19          0.25,0,0.75          0,0.21,0.79          0.43,0.19,0.38       [...]
T5    0,1,0                0.99,0.01,0          0.21,0.58,0.21       0,0.61,0.39          0.5,0,0.5            [...]
T7    0.29,0.37,0.34       0.53,0.36,0.11       0.27,0.48,0.25       0.13,0.47,0.4        0.28,0.56,0.16       [...]
T8    0.82,0.15,0.03       0.43,0.38,0.19       0.47,0.31,0.22       0.2,0.22,0.58        0.35,0.33,0.33       [...]
T9    0.29,0.22,0.49       0.35,0.32,0.33       0.3,0.4,0.3          0.28,0.36,0.36       0.33,0.34,0.34       [...]

For each Tn, the average is taken over all Vn, separately for each dimension X, Y and Z.
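To load the sample table into the (tests, sensors, 3) layout, each `X,Y,Z` cell can be split on commas. This sketch uses only the visible V0 to V4 cells of T1 and T2, so its averages cover those five sensors, not the full rows hidden behind `[...]`. Some rounded cells (for example 0.24,0.51,0.24) do not sum exactly to 1, so the resulting averages may be slightly off as well. The `parse_row` helper is hypothetical.

```python
import numpy as np

# Visible cells V0..V4 of rows T1 and T2 from the sample table.
rows = {
    "T1": "0.5,0,0.5  1,0,0  0.5,0.5,0  0.16,0,0.84  0,0,1",
    "T2": "0.57,0.11,0.32  0.53,0.15,0.32  0.24,0.51,0.24  0.18,0.15,0.67  0.54,0.15,0.31",
}

def parse_row(text):
    """Hypothetical helper: whitespace-separated 'X,Y,Z' cells -> (n_sensors, 3) array."""
    return np.array([[float(v) for v in cell.split(",")] for cell in text.split()])

data = np.stack([parse_row(text) for text in rows.values()])  # (n_tests, n_sensors, 3)
means = data.mean(axis=1)  # per-test average over the five visible sensors
for name, avg in zip(rows, means):
    print(name, avg.round(3))
```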

I want to reduce the number of Vn used to calculate the average, selecting the most relevant Vn according to my data, so that each T gets a similar average. Each vector V comes from a sensor, and I want to reduce the number of sensors so that the average stays the same up to some approximation, or limit the number of sensors and calculate the difference. I have more than 2000 tests T, each with more than 100 vectors V.
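Both variants (the smallest number of sensors for a given tolerance, or the difference for a fixed number of sensors) can be read off a single greedy ordering. The sketch below assumes the same sensors are kept for every test and chooses each next sensor by summed squared error; after each step it records the worst-case absolute difference between the reduced and full averages over all tests and components. The helpers `greedy_deviation_curve` and `smallest_k` and the 0.01 tolerance are hypothetical, and the random array stands in for the real data.

```python
import numpy as np

def greedy_deviation_curve(data):
    """Greedy forward ordering of all sensors.

    Returns the order in which sensors are added and, for every m, the largest
    absolute difference (over all tests and X/Y/Z) between the average of the
    first m sensors and the average of all sensors.
    """
    n_tests, n_sensors, _ = data.shape
    full_mean = data.mean(axis=1)
    running_sum = np.zeros((n_tests, 3))
    available = np.ones(n_sensors, dtype=bool)
    order, max_dev = [], []
    for m in range(1, n_sensors + 1):
        # Squared error of the subset average for each candidate sensor.
        cand_mean = (running_sum[:, None, :] + data) / m
        err = ((cand_mean - full_mean[:, None, :]) ** 2).sum(axis=(0, 2))
        err[~available] = np.inf
        best = int(np.argmin(err))
        order.append(best)
        available[best] = False
        running_sum += data[:, best, :]
        max_dev.append(float(np.abs(running_sum / m - full_mean).max()))
    return order, np.array(max_dev)

def smallest_k(max_dev, tol):
    """First m (1-based) in the greedy order whose worst-case deviation <= tol."""
    hits = np.flatnonzero(max_dev <= tol)
    return int(hits[0]) + 1 if hits.size else None

# Random stand-in data: 300 tests, 100 sensors.
rng = np.random.default_rng(2)
raw = rng.random((300, 100, 3))
data = raw / raw.sum(axis=2, keepdims=True)   # enforce X + Y + Z = 1
order, max_dev = greedy_deviation_curve(data)
print(smallest_k(max_dev, tol=0.01))          # sensors needed for a 0.01 tolerance
print(max_dev[29])                            # worst-case difference with 30 sensors
```

Because the greedy ordering is nested, `max_dev[k - 1]` is the difference for the first k sensors. The curve is not guaranteed to decrease monotonically, and a different search (exhaustive for small n, or swap-based refinement) could find better subsets.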

So the target of the feature selection is the average of all features.

Topic feature-reduction ranking feature-selection python machine-learning

Category Data Science
