Reduce number of vectors in dataset to achieve the "same average dimensions result"?
Edit to get the question re-opened: I'll try to answer the questions asked by @user2974951.
I have a large set of user-preference statistics in the form of trichotomous data. Each data trio can be visualized as a 3D vector with X, Y and Z values. All vectors satisfy X + Y + Z = 1 because of the trichotomous nature of the data, so each one can also be visualized as a point in an equilateral triangle.
I have many tests, each with a large set of 3D vectors (features).
For each test, I simply average each component over all n vectors:
X = (X1 + X2 + X3 + ... + Xn)/n
Y = (Y1 + Y2 + Y3 + ... + Yn)/n
Z = (Z1 + Z2 + Z3 + ... + Zn)/n
This gives me one averaged X,Y,Z per test:
        feature1      feature2      feature3      ...   Average
test1   X11,Y11,Z11   X12,Y12,Z12   X13,Y13,Z13   ...   X1,Y1,Z1
test2   X21,Y21,Z21   X22,Y22,Z22   X23,Y23,Z23   ...   X2,Y2,Z2
test3   X31,Y31,Z31   X32,Y32,Z32   X33,Y33,Z33   ...   X3,Y3,Z3
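To make the averaging concrete, here is a minimal NumPy sketch. The array layout (tests, features, 3) and the random Dirichlet stand-in data are assumptions for illustration only, not the real measurements.

```python
import numpy as np

# Hypothetical stand-in data: 4 tests, 6 features, 3 components (X, Y, Z).
# A Dirichlet draw guarantees that every vector satisfies X + Y + Z = 1.
rng = np.random.default_rng(0)
data = rng.dirichlet(np.ones(3), size=(4, 6))   # shape (tests, features, 3)

# Average over the feature axis: one (X, Y, Z) triple per test.
averages = data.mean(axis=1)                     # shape (tests, 3)

print(averages.shape)
print(averages.sum(axis=1))  # each averaged triple still sums to 1, up to rounding
```

Because every vector sums to 1, the averaged triple for each test also sums to 1, so it is again a point in the same triangle.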
What I want now is to reduce the number of features while still getting a similar result, to within some defined tolerance. For example, I want to cut the features by half, keeping only the subset whose average is most similar to the average over all features.
(Would it be too much to also ask for the standard deviation to stay as similar as possible, or for a combination of both?)
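One possible baseline, sketched here only to make the goal concrete, is greedy forward selection: repeatedly add the feature that brings the subset average closest to the full average across all tests. This is a heuristic and is not guaranteed to find the best subset. The function name `greedy_select`, the `std_weight` knob and the random data are all hypothetical.

```python
import numpy as np

def greedy_select(data, k, std_weight=0.0):
    """Greedily pick k feature indices whose per-test average tracks the full average.

    data: array of shape (n_tests, n_features, 3).
    std_weight: hypothetical knob; if > 0, mismatch in per-test standard
                deviation is penalised as well.
    """
    n_features = data.shape[1]
    target_mean = data.mean(axis=1)
    target_std = data.std(axis=1)

    def cost(idx):
        sub = data[:, idx, :]                      # (n_tests, len(idx), 3)
        err = np.mean((sub.mean(axis=1) - target_mean) ** 2)
        if std_weight > 0:
            err += std_weight * np.mean((sub.std(axis=1) - target_std) ** 2)
        return err

    selected = []
    remaining = list(range(n_features))
    while len(selected) < k:
        # Try each remaining feature and keep the one with the lowest cost.
        best = min(remaining, key=lambda j: cost(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected, cost(selected)

# Stand-in data: 50 tests, 20 features, each vector summing to 1.
rng = np.random.default_rng(0)
data = rng.dirichlet(np.ones(3), size=(50, 20))

chosen, err = greedy_select(data, k=10)
print(sorted(chosen), err)
```

Setting `std_weight` above zero trades off matching the mean against matching the standard deviation; how to weight the two is an open choice.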
So, how can I select the best features?
A Python example would be amazing.
Sorry if I'm using the wrong words; I'm new to machine learning.
UPDATE: Some sample data
V0 V1 V2 V3 V4 [...]
T1 0.5,0,0.5 1,0,0 0.5,0.5,0 0.16,0,0.84 0,0,1 [...]
T2 0.57,0.11,0.32 0.53,0.15,0.32 0.24,0.51,0.24 0.18,0.15,0.67 0.54,0.15,0.31 [...]
T3 0,0.17,0.83 0.57,0.03,0.4 0.31,0.4,0.29 0.04,0.3,0.66 0.07,0.05,0.87 [...]
T4 0.1,0.43,0.47 0.81,0,0.19 0.25,0,0.75 0,0.21,0.79 0.43,0.19,0.38 [...]
T5 0,1,0 0.99,0.01,0 0.21,0.58,0.21 0,0.61,0.39 0.5,0,0.5 [...]
T7 0.29,0.37,0.34 0.53,0.36,0.11 0.27,0.48,0.25 0.13,0.47,0.4 0.28,0.56,0.16 [...]
T8 0.82,0.15,0.03 0.43,0.38,0.19 0.47,0.31,0.22 0.2,0.22,0.58 0.35,0.33,0.33 [...]
T9 0.29,0.22,0.49 0.35,0.32,0.33 0.3,0.4,0.3 0.28,0.36,0.36 0.33,0.34,0.34 [...]
For each test Tn, the average is taken over all of its vectors Vn, separately for each dimension X, Y and Z.
I want to reduce the number of Vn used to calculate the average, selecting the most relevant Vn according to my data so that I get a similar average for each T. Each vector V comes from a sensor. I want to reduce the number of sensors, either until the average matches to within some approximation or down to some fixed number of sensors, and then calculate the difference. I have more than 2000 tests T, each with more than 100 vectors V.
So the target of the feature selection is the average of all features.
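The "calculate the difference" step could be sketched as follows, using only the visible vectors V0..V4 of tests T1 to T3 from the sample above (the elided columns are left out). The helper name `subset_error`, the Euclidean distance as the measure, and the chosen sensor indices are assumptions for illustration.

```python
import numpy as np

# First five vectors (V0..V4) of tests T1, T2 and T3 from the sample data;
# the elided "[...]" columns are not included.
sample = np.array([
    [[0.5, 0, 0.5], [1, 0, 0], [0.5, 0.5, 0], [0.16, 0, 0.84], [0, 0, 1]],
    [[0.57, 0.11, 0.32], [0.53, 0.15, 0.32], [0.24, 0.51, 0.24],
     [0.18, 0.15, 0.67], [0.54, 0.15, 0.31]],
    [[0, 0.17, 0.83], [0.57, 0.03, 0.4], [0.31, 0.4, 0.29],
     [0.04, 0.3, 0.66], [0.07, 0.05, 0.87]],
])

def subset_error(data, sensors):
    """Per-test Euclidean distance between the subset average and the full average."""
    full = data.mean(axis=1)                  # average over all sensors
    part = data[:, sensors, :].mean(axis=1)   # average over the kept sensors only
    return np.linalg.norm(part - full, axis=1)

errors = subset_error(sample, [0, 2, 3])      # arbitrary illustrative choice of sensors
print(errors)                                 # one distance per test
```

Summarising these per-test distances, for instance with their mean or maximum, gives a single number for comparing candidate sensor subsets.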
Topic feature-reduction ranking feature-selection python machine-learning
Category Data Science