features

Create features for each row or only for a specific value

Test

May 30, 2022 08:43

I have a problem. When an order comes in, I want to predict in how many days the customer will place another order. I have already created my target variable next_day_in_days, which specifies in how many days the customer will order again, and this is what I would like to predict. Since I have too few features, I want to do feature engineering. I would like to specify how many orders the customer has placed in the last 90 …

Topic: features feature-engineering regression machine-learning

Category: Data Science
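
The per-row count described in the question above could be sketched roughly as follows. This is only an illustration, not the asker's code: the `(customer_id, order_date)` row layout, the helper name `orders_in_window`, and the 90-day unit are all assumptions (the excerpt is cut off before the unit is stated).

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date, timedelta


def orders_in_window(orders, window_days=90):
    """For each (customer_id, order_date) row, count the same customer's
    earlier orders within the previous `window_days` days.

    Orders on the same date as the current one are not counted;
    an order exactly `window_days` days earlier is counted.
    """
    # Collect and sort each customer's order dates so they can be binary-searched.
    dates_by_customer = defaultdict(list)
    for customer_id, order_date in orders:
        dates_by_customer[customer_id].append(order_date)
    for dates in dates_by_customer.values():
        dates.sort()

    counts = []
    for customer_id, order_date in orders:
        dates = dates_by_customer[customer_id]
        window_start = order_date - timedelta(days=window_days)
        first_in_window = bisect_left(dates, window_start)
        first_not_earlier = bisect_left(dates, order_date)
        counts.append(first_not_earlier - first_in_window)
    return counts


# Hypothetical toy data (assumed layout: customer id, order date).
orders = [
    ("A", date(2022, 1, 1)),
    ("A", date(2022, 2, 1)),
    ("A", date(2022, 5, 30)),
    ("B", date(2022, 5, 1)),
]
counts = orders_in_window(orders)
```

Because the count only looks backwards from each order, the feature uses no information from after the order it describes.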

Reverse engineer PII sensitive data from Inceptionv3 pre-trained model generated features

GM1313

May 23, 2022 15:55

I'm using the pre-trained Inceptionv3 to build out features from proprietary documents. Some of these documents contain sensitive PII data. I use the 2K output from the second-to-last layer as the feature vector. My question is: if a set (say 2000) of these 2K generated features is available to someone, can it be used to reverse engineer sensitive data like SSN, date of birth, etc.? My thinking is since the Inceptionv3 was never trained with these proprietary documents, …

Topic: features machine-learning

Category: Data Science

How to handle a feature vector that could be variable length?

Crazy9

May 16, 2022 07:04

I would like to train a machine learning model with several features X[] as input and one output Y. For example, every sample has a data frame like this: X[0], X[1], X[2], X[3], X[4], Y. Let's say that for one sample each of the following holds only one value: X[0], X[1], X[2], X[4], Y. This is a normal machine learning problem. But now I would like X[3] to hold multiple values. For example, the data for sample 1 is: X[0] | X[1] …

Topic: features feature-engineering feature-construction

Category: Data Science

Hard time finding literature on feature clustering using Principal Component Analysis

aryan

May 11, 2022 12:23

I'm new to StackExchange, so I am sorry if this is not the right way to ask a question on StackExchange. For my thesis I wish to propose a method for future research on using PCA to cluster features (feature clustering) and then apply per-cluster PCA. I got the idea from this paper. But I have a hard time finding literature about PCA being used to cluster variables (not reduce variables). I could imagine that it is not …

Topic: features pca clustering

Category: Data Science

LSTM for binary classification using multiple attributes

bill

May 5, 2022 14:10

I haven't used neural networks for many years, so excuse my ignorance. I was wondering what is the most appropriate way to train an LSTM model based on my dataset. I have 3 attributes as follows:

Attribute 1: small int, e.g., [123, 321, ...]
Attribute 2: text sequence ['cgtaatta', 'ggcctaaat', ... ]
Attribute 3: text sequence ['ttga', 'gattcgtt', ... ]
Class label: binary [0, 1, ...]

The length of each sample's attributes (2 or 3) is arbitrary; therefore I do …

Topic: features lstm deep-learning

Category: Data Science

How to build multiple variable regression having a mix of numerical & categorical features?

Артём Ощепков

April 26, 2022 22:02

There is a need to estimate Annual Average Daily Traffic Volume (AADT). We have a bunch of data about vehicles' speeds over several years. We noticed that AADT depends on the average number of such samples during some time, so a regression model $Y = f(x_1)$ could help estimate the AADT. The problem is that there are other features affecting the dependency, which are both numerical $(x_2, \dots, x_k)$ and categorical $(c_1 = data\ provider, c_2 = road\ class, \dots, c_m)$. …

Topic: multivariate-distribution features regression categorical-data

Category: Data Science

Relation between Features & Polynomial Equations in Machine Learning

Apoorva

April 26, 2022 18:36

In Machine Learning, if the data we are working on has, say, 6 features/variables, does that mean the prediction line/curve of our ML model is represented by a Hexic polynomial equation whose degree is 6? In short, is the degree of our prediction line/curve the same as the number of features in our data?

Topic: features machine-learning

Category: Data Science

Feature engineering before splitting

Hing

April 12, 2022 12:49

This is a sister post to the original closed post (here). Since the data transformation is done after data splitting, on the TRAINING data only, I wonder: wouldn't such a transformation depend on how we subsample our data? We can get different transformation results when we pick a different portion of the training data. But I personally find it hard to convince myself of that: shouldn't data transformation be as invariant and generalizable as possible across different subsamplings of the dataset? Also, …

Topic: transformation features feature-engineering feature-selection

Category: Data Science

Is there a multi-modal population based metaheuristic that is non-GA?

pauper

March 27, 2022 03:58

I have a feature set from which I want to select various combinations and permutations of the features. The length of a solution feature vector can range between, say, 5 and 20 features, and the ordering of the features is important, meaning that feature vector ABC is different from BCA, i.e. they are sequential and depend on each other's output. The goal is to find many near optimal solutions around optimal solutions and the solution space is …

Topic: metaheuristics features genetic-algorithms optimization

Category: Data Science

Is there a way to combine multiple ML models where each use datasets with different features?

Bruce

March 26, 2022 17:55

I have a dataset where some features (c, d) apply only when a feature (a) has a specific value. For example:

a, b, c, d
T, 60, 0x018, 3252002711
U, 167, ,
U, 67, ,
T, 66, 0x018, 15556

So I'm planning to split the dataset so that there are no missing values:

a, b, c, d
T, 60, 0x018, 3252002711
T, 66, 0x018, 15556

a, b
U, 167
U, 67

and then put these into individual models which combine …

Topic: features ensemble-modeling dataset machine-learning

Category: Data Science
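
The split described in the question above might look something like the sketch below, which uses only the standard library and the four example rows from the excerpt. The names `COLUMNS_BY_GROUP` and `split_by_group` are hypothetical, and the mapping from each value of `a` to the columns it keeps is an assumption read off the example. How the resulting models are combined is not shown, since the excerpt is cut off at that point.

```python
import csv
import io

# The example rows from the question, written as CSV text.
RAW = """a,b,c,d
T,60,0x018,3252002711
U,167,,
U,67,,
T,66,0x018,15556
"""

# Assumed mapping: columns c and d only apply when a == "T".
COLUMNS_BY_GROUP = {"T": ["a", "b", "c", "d"], "U": ["a", "b"]}


def split_by_group(rows, key="a"):
    """Group rows by the value of `key`, keeping only that group's columns."""
    subsets = {}
    for row in rows:
        keep = COLUMNS_BY_GROUP[row[key]]
        subsets.setdefault(row[key], []).append({col: row[col] for col in keep})
    return subsets


rows = list(csv.DictReader(io.StringIO(RAW)))
subsets = split_by_group(rows)
# subsets["T"] holds the rows with all four columns,
# subsets["U"] holds the rows with only a and b.
```

Each subset then has no empty cells, so it can be passed to its own model without imputation.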

train-test split on forecasting a time series using external features

tsjm

February 15, 2022 07:48

I have a question regarding the train-test split when forecasting a time series using features instead of the time series itself. I know that I should use a time-based train-test split if I use lagged values of the time series to predict, but I am wondering if that is also the case if I use an external feature. Suppose I try to forecast the watermelon consumption using only the temperature (X feature) instead of using the time series regarding the watermelon. Leaving …

Topic: features time-series

Category: Data Science

Finding attributes that make up dense clusters of fraudulent transactions

what-a-snarky-puppy

January 8, 2022 20:45

I have data about purchases customers made on my website. Some users later decline the purchase, a scenario I'd like to avoid. I have lots of data about the purchases made on my website, so I'd like to find clusters of users who share similar attributes and are dense in "decliner"-type users. I have labelled data about those users (as we know who later declined the payment). The problem is, how do I cluster them in a meaningful …

Topic: features clustering

Category: Data Science

Feature Map setup for Faster RCNN with resnet50 backbone

einsteinxx

December 17, 2021 09:02

I'm trying to get an activation map using a Faster RCNN Resnet50 backbone, but am having issues getting the proper hook setup for output information. Most of the libraries, like gradcam, don't seem to have built-in support for faster rcnn setups. I think the flow for Faster RCNN requires something extra, but am unable to figure out what I need to hook into the model. Layer 4 is what I've concentrated on, as it's called out in numerous tutorials (which …

Topic: features faster-rcnn pytorch

Category: Data Science

vertical or horizontal storage of timesteps in feature store

seb2704

November 29, 2021 15:55

I'd like to use a feature store to store some time series, and I asked myself what the best way to store the timesteps is. Is it better to store each timestep horizontally and then do the windowing after collecting it from the feature store to create the feature vector? Or is it better to store all timesteps additionally in a column and do the windowing before storing it to the feature store? Personally I think the better way is to do …

Topic: features feature-engineering feature-selection

Category: Data Science

How can I assess feature importance when determining whether missing data is MCAR or not?

embedded_dev

October 31, 2021 17:30

I was reading some lecture notes on missing data and the author suggests the following approach to determine whether some variable is missing completely at random (MCAR) or not: Supervised Learning method: Code ‘missing’ as a new category. Run a supervised analysis (to predict a separate target variable) and check if ‘missing’ has an effect on the prediction of the response in the learned model. If the category ‘missing’ has an effect, this is evidence that data is not MCAR. …

Topic: features missing-data supervised-learning

Category: Data Science
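
The first step quoted in the question above (coding ‘missing’ as its own category) can be sketched as below. Note the substitution: instead of fitting a full supervised model, this sketch only compares the mean of the target per category, which is a much cruder check than the one in the lecture notes. The helper names and the toy data are hypothetical and made up purely for illustration.

```python
from statistics import mean


def code_missing(values, label="missing"):
    """Replace None entries with an explicit category label."""
    return [label if v is None else v for v in values]


def target_mean_by_category(categories, targets):
    """Average the target within each category, 'missing' included."""
    groups = {}
    for category, target in zip(categories, targets):
        groups.setdefault(category, []).append(target)
    return {category: mean(values) for category, values in groups.items()}


# Made-up toy data: a categorical feature with gaps and a separate numeric target.
colour = ["red", None, "blue", None, "red", "blue"]
target = [1.0, 5.0, 2.0, 7.0, 1.0, 2.0]

coded = code_missing(colour)
means = target_mean_by_category(coded, target)
# A 'missing' group whose target mean differs clearly from the other
# categories hints that missingness carries information about the target.
```

A fitted model with an importance or coefficient estimate for the ‘missing’ level, as the notes describe, would be the fuller version of this check.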

Query regarding the 'Data type' of features in Machine Learning

Apoorva

October 22, 2021 21:02

Should all the features in a dataset be converted to the same data type? For instance, if all the features have numerical values, some int & some float, should they all be converted to float? What difference would this conversion make?

Topic: features preprocessing dataset machine-learning

Category: Data Science

How to insert two features in a model when a feature only applies to a certain group in the model

Caldass_

August 8, 2021 14:09

I'm building a machine learning model in Python to predict soccer player values. Consider the following feature columns of the dataframe:

[features]
position | goals | goals_conceded
---------|-------|---------------
Forward  | 23    | NaN
Defender | 2     | NaN
Defender | 4     | NaN
Keeper   | NaN   | 20
Keeper   | NaN   | 43

Since keepers don't usually score goals, they'll almost always have null values in the "goals" column, but they still can have this statistic, so it …

Topic: features machine-learning-model prediction machine-learning

Category: Data Science

If a categorical feature only occurs a few times in a data set, should I drop it?

dawndance

August 6, 2021 09:33

I have a data set of mostly categorical variables. When I one-hot encoded them, some of the features occur less than 3% of the time. For instance, the Tech-support feature only occurs 928 times in a data set with 32561 samples, i.e. it only occurs 2.9% of the time. Is there a general cutoff point for when I should scrap these variables? I'm cleaning up this data set for binary logistic regression and an SVM. Thank you!

Topic: features one-hot-encoding logistic-regression svm

Category: Data Science

Training & Test feature shape is different from number of columns in dataset

Victor Melvin

June 24, 2021 20:18

I am making a Sequential Neural Network for regression with 3 dense layers, which will be trained on a simple dataset. But before I even get to the part of the code that executes the model, I am getting a different shape for my features than the number of columns in the dataset. The columns of the dataset include: 1) one categorical "Name" column, which is one-hot encoded; 2) the other 20 columns, which are integers/floats. I have 21 features in my dataset. ValueError is telling me it …

Topic: features keras feature-engineering dataset data-cleaning

Category: Data Science

combine two features into one

DOT

June 14, 2021 12:59

In an epidemic disease dataset covering 3 months, I have a feature (var dt_died) with the death dates of patients (800 people died out of all 12k unique subjects in this dataset, so obviously only dead subjects have data for this feature). I also have a feature (var dt_test_positive) that indicates the date of testing positive for the disease (with no missing values). I would like to combine these two features into one (var difference). If I just make the …

Topic: features machine-learning

Category: Data Science
