Using PCA features in production

I'm struggling to figure out how to take PCA into production so that I can test my models on unknown samples. I'm using both a one-hot encoding and a TF-IDF encoding to classify my elements with various models, mainly KNN. I know I can use the pretrained one-hot encoder and the TF-IDF encoder to encode new elements so that they match the final feature vector. Since these feature vectors become very large, I use a PCA in …
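A minimal sketch of one way to do this, with hypothetical columns color and text and TruncatedSVD standing in for PCA (it accepts the sparse TF-IDF matrix directly): fit every transformer on the training data only, persist the fitted objects, and reuse them unchanged to transform unseen samples.

```python
import joblib
import pandas as pd
from scipy.sparse import hstack
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD  # PCA-like, works on sparse TF-IDF
from sklearn.neighbors import KNeighborsClassifier

# toy training data with hypothetical columns
train = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "text":  ["small gadget", "large widget", "small widget"],
    "label": [0, 1, 0],
})

ohe = OneHotEncoder(handle_unknown="ignore")
tfidf = TfidfVectorizer()
X_train = hstack([ohe.fit_transform(train[["color"]]),
                  tfidf.fit_transform(train["text"])])

svd = TruncatedSVD(n_components=2, random_state=0)   # dimensionality reduction
Z_train = svd.fit_transform(X_train)

knn = KNeighborsClassifier(n_neighbors=1).fit(Z_train, train["label"])
joblib.dump((ohe, tfidf, svd, knn), "model.joblib")   # ship the fitted objects

# at inference time: transform (never fit) the new sample with the same objects
ohe, tfidf, svd, knn = joblib.load("model.joblib")
new = pd.DataFrame({"color": ["blue"], "text": ["large gadget"]})
X_new = hstack([ohe.transform(new[["color"]]), tfidf.transform(new["text"])])
print(knn.predict(svd.transform(X_new)))
```

The key point is that `fit_transform` is only ever called during training; at inference time the stored encoders, the reducer and the KNN model only `transform` and `predict`.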
Category: Data Science

Heuristics and methods to speed up searches over subsets of a big set (probably combinatorially NP-hard)

I have a reasonably sized set of size N (say 10,000 objects) in which I am searching for groups of compatible elements. That is, I have a function y = f(x_1, x_2, x_3, ..., x_n) returning a boolean 0/1 answer for whether the n elements are compatible. We are interested in executing this search over every subset of N with fewer than 8 elements, which is obviously NP-hard or close to it. Even for a pairwise search over an n-element set we have …
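One common heuristic, sketched below under the assumption that compatibility is hereditary (a group can only be compatible if every smaller sub-group is compatible): grow candidate groups level by level, Apriori-style, so incompatible sub-groups prune the vast majority of larger subsets before f is ever evaluated on them. The compatibility rule and object list here are toy stand-ins.

```python
def is_compatible(group):                      # placeholder for f(x_1, ..., x_n)
    return sum(group) % 3 != 0                 # arbitrary toy rule

objects = list(range(30))                      # stand-in for the 10,000 objects
MAX_SIZE = 7                                   # subsets "smaller than 8 elements"

levels = {1: {frozenset([x]) for x in objects if is_compatible((x,))}}
for k in range(2, MAX_SIZE + 1):
    prev = levels[k - 1]
    candidates = set()
    for group in prev:
        for x in objects:
            if x not in group:
                cand = group | {x}
                # prune: every (k-1)-subset must already be known compatible
                if all(cand - {y} in prev for y in cand):
                    candidates.add(frozenset(cand))
    levels[k] = {c for c in candidates if is_compatible(tuple(c))}
    if not levels[k]:
        break

print({k: len(v) for k, v in levels.items()})
```

If compatibility is not hereditary this pruning is not exact, but it can still act as a cheap filter before an exhaustive check of the remaining candidates.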
Category: Data Science

Reduce the number of vectors in a dataset while achieving the "same average dimensions result"?

Edit for re-opening the question; I'll try to answer the questions asked by @user2974951: I have large user-preference statistics for trichotomous data sets. You can visualize each data trio as a 3D vector with X, Y and Z values. All vectors satisfy X + Y + Z = 1 because of the trichotomous shape of the data I'm using. The data can also be visualized as points in an equilateral triangle. I have many tests, each with a …
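If the goal is a smaller set of representative vectors whose weighted mean matches the original data, one option (a sketch under that assumption, not necessarily what the question is ultimately after) is to cluster the simplex points and keep the centroids together with their cluster sizes as weights; the weighted average of k-means centroids equals the original mean by construction.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
raw = rng.random((5000, 3))
points = raw / raw.sum(axis=1, keepdims=True)      # project onto X + Y + Z = 1

k = 20                                             # reduced number of vectors
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
weights = np.bincount(km.labels_, minlength=k)     # how many points each centroid replaces
centroids = km.cluster_centers_                    # still lie on the simplex

original_mean = points.mean(axis=0)
reduced_mean = np.average(centroids, axis=0, weights=weights)
print(np.allclose(original_mean, reduced_mean))    # True: the average is preserved
```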
Category: Data Science

Feature reduction by removing certain columns in dataframe

I am working on an emotion recognition model with the IEMOCAP dataset. For feature extraction, I take the mel-spectrogram, convert it into a NumPy array, and then turn the array into a dataframe of spectrogram features. The generated dataframe has a shape of 2380 rows × 11761 columns; for example, row 262 contains values such as 0.036491, 0.037793, 0.041035, 0.044644, 0.047210, 0.048467, 0.049556, 0.052137, ..., with 0.0 in the trailing columns …
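A minimal sketch of one simple reduction step, using a small stand-in array instead of the real 2380 × 11761 dataframe: drop the (near-)constant columns, such as the all-zero trailing columns, with scikit-learn's VarianceThreshold.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 50)))       # stand-in for the spectrogram features
df[49] = 0.0                                   # an all-zero column, like the trailing 0.0 columns

selector = VarianceThreshold(threshold=1e-6)   # remove (near-)constant columns
reduced = selector.fit_transform(df)
kept_columns = df.columns[selector.get_support()]

print(df.shape, "->", reduced.shape)
print("dropped:", sorted(set(df.columns) - set(kept_columns)))
```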
Category: Data Science

How should I encode 'dynamic' features (with multiple instances) along with 'static' features (single instances)?

Suppose I have to predict whether a certain product from an assembly line in a factory will be scrap. This product has, let's say, 'static' data such as a certain shape, a certain vendor, etc. It can also have 'dynamic' data, meaning it can have, for example, one or more sets of measurements (pressures, temperatures, etc.) from the production processes. How should I treat these 'dynamic' features? Somehow it doesn't seem right to repeat the 'static' data for all 'dynamic' events. …
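One common pattern, sketched with hypothetical column names (product_id, pressure, temperature): aggregate the variable number of 'dynamic' measurement events per product into fixed-size statistics and join them onto the single 'static' row, instead of repeating the static data for every event.

```python
import pandas as pd

static = pd.DataFrame({
    "product_id": [1, 2],
    "shape": ["round", "square"],
    "vendor": ["A", "B"],
})

dynamic = pd.DataFrame({                            # one row per measurement event
    "product_id":  [1, 1, 1, 2, 2],
    "pressure":    [2.1, 2.3, 2.2, 1.8, 1.9],
    "temperature": [80, 82, 81, 75, 77],
})

# collapse the variable-length event history into fixed-size per-product statistics
agg = dynamic.groupby("product_id").agg(["mean", "std", "min", "max", "count"])
agg.columns = ["_".join(col) for col in agg.columns]   # flatten MultiIndex names
agg = agg.reset_index()

features = static.merge(agg, on="product_id", how="left")
print(features)
```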
Category: Data Science

Information compression for variable input size

Is there a way to compress information from inputs of variable size? An autoencoder requires standardized input sizes. Although I can add masks to the cost function and add dummy features to standardize the input/output size, I am hesitant because of the potential drawbacks. The input structures I am interested in are graphs and images. If input sizes and shapes vary too much, padding, resizing and rescaling do not work.
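For images, one workaround (a PyTorch sketch with arbitrary layer choices, not a full autoencoder) is an encoder whose adaptive pooling layer collapses any spatial size to a fixed grid, so inputs of different shapes map to a code of the same length without padding or resizing; for graphs, global pooling over node embeddings in a graph neural network plays the same role.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),              # any HxW -> fixed 4x4 feature map
    nn.Flatten(),
    nn.Linear(32 * 4 * 4, 64),                 # fixed-length 64-dim code
)

for h, w in [(37, 53), (128, 96), (224, 224)]: # variable input sizes
    x = torch.randn(1, 3, h, w)
    print(x.shape, "->", encoder(x).shape)     # always torch.Size([1, 64])
```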
Category: Data Science

Correlation Matrix for non-numeric features

Currently, I have a dataset with numeric as well as non-numeric attributes. I am trying to remove the redundant features in the dataset using the R programming language. Note: the non-numeric attributes cannot be turned into binary. The caret R package provides findCorrelation, which analyzes a correlation matrix of your data's attributes and reports the attributes that can be removed. However, it only works on numeric values of 'x'. I have been unable to find a package which does this for non-numeric attributes. Is …
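The question asks for R, but as a sketch of the underlying idea (shown here in Python): build an association matrix for categorical columns with pairwise Cramér's V and drop one column of any highly associated pair, analogous to what caret::findCorrelation does for numeric correlations. The column names and data are made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V association between two categorical series."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

df = pd.DataFrame({
    "color":  ["red", "red", "blue", "blue", "green", "green"] * 10,
    "size":   ["S", "S", "M", "M", "L", "L"] * 10,      # redundant with color
    "region": ["N", "S", "E", "W", "N", "S"] * 10,
})

cols = df.columns
assoc = pd.DataFrame([[cramers_v(df[a], df[b]) for b in cols] for a in cols],
                     index=cols, columns=cols)
print(assoc.round(2))    # 'size' vs 'color' is ~1.0, so one of them can be removed
```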
Category: Data Science

Does it make sense to randomly select features as a baseline?

In my paper, I report that the classification accuracy is $x\%$ when using the top N features. My supervisor thinks that we should also report the classification accuracy when using N randomly selected features, to show that the initial feature selection technique makes an actual difference. Does this make sense? I've argued that no one cares about randomly selected features, so this addition doesn't make sense. It's quite obvious that randomly selecting features will provide a worse classification accuracy …
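A minimal sketch of the comparison the supervisor is asking for, on toy data with my own choice of selector and model: measure cross-validated accuracy with the top N features and with N randomly chosen features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)
N = 10
model = LogisticRegression(max_iter=1000)

top = SelectKBest(f_classif, k=N).fit(X, y).get_support(indices=True)
rng = np.random.default_rng(0)
rand = rng.choice(X.shape[1], size=N, replace=False)   # the random baseline

for name, idx in [("top-N", top), ("random-N", rand)]:
    acc = cross_val_score(model, X[:, idx], y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```

In a real experiment the selection step would go inside the cross-validation loop to avoid leakage; this only shows the shape of the baseline comparison.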
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.