I am working with Orange for my thesis, using logs and core data; however, since I am a beginner, I am a little stuck with the feature construction widget. Ultimately, I would like to combine different features so I can compare them. What kind of information should I put in the "Values" field for a categorical feature? Any examples would be really appreciated (the ones from Orange did not help me).
I'm working on a propensity model, predicting whether customers will buy or not. While doing exploratory data analysis, I found that customers have a buying pattern: most customers repeat their purchase at a fixed interval. For example, some customers repeat purchases every four quarters, some every 8 or 12 quarters, etc. I have the purchase dates for these customers. What is the most useful feature I can create to capture this pattern in the data? I'm predicting whether in the next quarter …
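One candidate feature for a pattern like this (a sketch, not the only option): each customer's typical gap between consecutive purchases, plus the time elapsed since their last purchase; their ratio tells the model how "due" a customer is. A minimal pure-Python sketch with a hypothetical purchase history:

```python
from datetime import date
from statistics import median

# Hypothetical purchase history: customer_id -> sorted purchase dates.
purchases = {
    "c1": [date(2020, 1, 1), date(2021, 1, 1), date(2022, 1, 2)],  # ~yearly buyer
}

def interval_features(dates, as_of):
    """Median gap between consecutive purchases (days), recency, and 'due-ness'."""
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    typical = median(gaps) if gaps else None
    recency = (as_of - dates[-1]).days
    # How far into the typical cycle the customer currently is.
    due = recency / typical if typical else None
    return typical, recency, due

typical, recency, due = interval_features(purchases["c1"], date(2022, 7, 1))
```

A `due` value approaching 1.0 would mean the customer is near the end of their usual cycle, which is exactly the "repeats every N quarters" signal described above.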
I would like to train a machine learning model with several features as input, X[], and one output, Y. For example, every sample is a row like this: X[0], X[1], X[2], X[3], X[4], Y. When each feature of a sample holds only one value, this is a normal machine learning problem. But now, what if I would like X[3] to hold multiple values? For example, sample 1's data is: X[0] | X[1] …
I'm currently trying to create a few features to improve the performance of a model. One feature I would like to create is the difference in days between a customer's purchase and their previous one. Creating this feature is not a problem. However, I don't know which value to set when this is the customer's first purchase. Which value should I set and, more generally, how should I treat these cases? customer_id date_purchase …
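One common way to handle the first-purchase case (a sketch of one option among several): keep the gap as missing and add an explicit indicator column, rather than inventing a sentinel number the model could misread as a real gap. A dependency-free sketch with made-up data:

```python
from datetime import date

# Hypothetical rows: (customer_id, purchase_date), assumed sorted by date.
rows = [
    (1, date(2023, 1, 10)),
    (1, date(2023, 2, 1)),
    (2, date(2023, 3, 5)),
]

last_seen = {}
features = []
for cust, d in rows:
    prev = last_seen.get(cust)
    gap = (d - prev).days if prev is not None else None  # missing, not 0
    features.append({"customer_id": cust,
                     "days_since_prev": gap,
                     "is_first_purchase": prev is None})
    last_seen[cust] = d
```

The indicator column lets the model treat "no previous purchase" as its own state; models that cannot handle missing values would then need an imputation step on top of this.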
Specifically, what I am looking for are tools with functionality specific to feature engineering. I would like to be able to easily smooth, visualize, fill gaps, etc.: something similar to MS Excel, but with R as the underlying language instead of VB.
I have been trying to find some good algorithms for feature selection. Using correlation or other non-causal methods is not the right way to do feature selection here. I am searching for algorithms or libraries in Python that use causal effects for feature selection. The ones I have found so far only handle binary outcomes; I am working on a regression problem, so the outcome must be continuous. "Causality-Guided Feature Selection"
I have a problem where I am trying to classify the outcome of customer complaint cases. I already have several features, such as the type of item bought, the reason for the complaint, etc. I am trying to add a feature that represents how long a case is 'open' (meaning waiting for resolution), the logic being that a case that is 'open' for a long time is unlikely to have a positive outcome. The issue is, I am training my model on 'closed' cases, hence have …
We're training a binary classifier in AutoML, and one of the features consists of browser versions. Currently these versions are provided "normalized" to the model, according to the percentile of the browser that the current observation falls into. For example, if the percentiles of some specific browser's versions are:

percentile  version
p25         34
p50         45
p75         53
p99         70

then an observation with said browser and version=54 would be represented as:

p25  p50  p75  p99
1    1    1    0

My question …
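For reference, the encoding described in the question can be reproduced with a simple threshold comparison. A sketch of that scheme, with the percentile cut-offs hard-coded from the example above:

```python
# Percentile cut-offs for one hypothetical browser, taken from the example.
cutoffs = {"p25": 34, "p50": 45, "p75": 53, "p99": 70}

def encode_version(version):
    """Emit 1 per percentile the version exceeds, 0 otherwise."""
    return [int(version > v) for v in cutoffs.values()]

encoded = encode_version(54)  # above p25/p50/p75, below p99
```

This is a monotone binning: the number of 1s is effectively an ordinal rank of the version within that browser's distribution.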
Suppose there are 2000 movies and a company wants to recommend some movies (for example, at most 5) to each visitor. The objective is to learn to predict which movie will be selected if a specific set of movies is recommended.

   option-1  option-2  option-3  option-4   option-5   Selected-Movie
1. movie1    movie3    movie4                          movie4
2. movie3    movie4    movie100  movie1000  movie1001  movie1001
3. movie4    movie5    movie34                         movie34

Based on this data set, I want to learn when sample 1 is suggested …
Is there any resource with a list of feature engineering techniques? A mapping of data type, model, and feature engineering technique would be a gold mine.
I have extracted features from two types of signals. Prior to merging them into one feature vector, I computed an importance score for every feature within each signal type. I would like to weight the features according to those scores. Would the best way be to multiply every feature by its score and then concatenate the features of both signals, and should the data be normalized again after the multiplication? Or is there a different …
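The multiply-then-concatenate idea from the question can be sketched in a few lines (pure Python, with made-up feature values and scores). Whether to re-normalize afterwards depends on the downstream model: distance-based learners are scale-sensitive, tree-based ones largely are not.

```python
from math import sqrt

signal_a = [0.5, 1.2, -0.3]   # features from signal A (hypothetical)
signal_b = [2.0, 0.1]         # features from signal B (hypothetical)
scores_a = [0.9, 0.4, 0.7]    # importance scores, same order as features
scores_b = [0.6, 0.8]

# Weight each feature by its importance, then concatenate the two signals.
weighted = [f * s for f, s in zip(signal_a, scores_a)] + \
           [f * s for f, s in zip(signal_b, scores_b)]

# Optional: rescale the combined vector to unit length, so the weighting
# changes the relative emphasis of features, not the overall magnitude.
norm = sqrt(sum(x * x for x in weighted))
unit = [x / norm for x in weighted]
```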
I have a few thousand grayscale images, and I would like to generate a universal representation of the patterns within: a semantic/ordered composition of all features, so to speak. For instance, take 10,000 images of a dog and draw the archetypal dog. Does this task have a technical name, and is there a method out there specifically for such purposes? I guess this is similar to what happens during the training of a neural network; I just don't necessarily need …
I have the task of representing a user feature matrix. I have features like gender, age, etc., but I also have a multi-valued feature called "movies watched", which is essentially another table of movie names watched by that user, each with a numeric duration; the order of the movies does not matter here. Also, "movies watched" can range from 20 to 300 movies. What is the best way of representing "movies watched" as a feature vector?
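One straightforward, order-independent representation (a sketch, using a hypothetical fixed movie vocabulary): a fixed-length vector indexed by movie, holding the watch duration and zero for unwatched titles. Every user gets the same vector length, whether they watched 20 or 300 movies:

```python
# Hypothetical global movie vocabulary; its order fixes the vector layout.
vocab = ["movie_a", "movie_b", "movie_c", "movie_d"]
index = {m: i for i, m in enumerate(vocab)}

def movies_vector(watched):
    """watched: {movie_name: duration} -> fixed-length duration vector."""
    vec = [0.0] * len(vocab)
    for movie, duration in watched.items():
        if movie in index:          # silently drop out-of-vocabulary titles
            vec[index[movie]] = duration
    return vec

v = movies_vector({"movie_b": 95.0, "movie_d": 120.0})
```

With thousands of movies this vector becomes large and sparse; sparse storage, dimensionality reduction, or learned embeddings are common next steps.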
I have been playing with two-dimensional machine learning using pandas (trying to do something like this), and I would like to combine Lat/Long into a single numerical feature, ideally in a linear fashion. Is there a "best practice" for doing this?
Suppose we are asked to predict something given a set of features. How do we know whether that target is actually predictable? That is, how do we know whether there is some relation between the dependent variable and the independent features, or some patterns in the data that a machine learning algorithm could exploit? What if the target outcomes are just random? How do we test for this relationship before we start building ML/DL models?
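One common sanity check (a sketch, not a complete answer): compare a statistic computed on the real targets against the same statistic on shuffled targets. If the real value is not clearly outside the shuffled distribution, the apparent signal may just be noise. Here is a pure-Python permutation test on a simple correlation statistic:

```python
import random
from statistics import mean, stdev

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

def permutation_pvalue(xs, ys, n_perm=500, seed=0):
    """Fraction of shuffled targets whose |correlation| >= the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    hits = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_perm

x = list(range(30))
y_signal = [2 * v + 1 for v in x]      # perfectly predictable target
p = permutation_pvalue(x, y_signal)
```

The same idea extends to full models: refit on shuffled targets and compare cross-validated scores, at proportionally higher compute cost.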
Is it possible to reduce non-correlated multi-dimensional data over features to 1D data? A working option is pooling (mean/min/max) over an embedding vector (n samples of embeddings of m dimensions), e.g. converting many embeddings (n × m) to a list of means (1 × m). However, these all lose a lot of information (especially the relationships between features within a single embedding). This doesn't have to be a reduction (i.e. the resulting 1D vector can be larger than m). If it's …
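The pooling baseline mentioned above, as a minimal pure-Python sketch (n embeddings of dimension m). Concatenating several pool types is one cheap way to keep more information than a single mean, and the result is a 1D vector larger than m, which the question explicitly allows:

```python
def pool(embeddings):
    """n x m -> flat vector of length 3m: column-wise mean, min, max."""
    cols = list(zip(*embeddings))          # transpose to m columns
    means = [sum(c) / len(c) for c in cols]
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return means + mins + maxs

pooled = pool([[1.0, 4.0],
               [3.0, 0.0]])
```

This still discards within-embedding relationships; learned alternatives (e.g. attention-weighted pooling) address that at the cost of training a pooling module.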
I have a 3D graph like below: Ref: google images. It has 2 angles as X and Y, and the Z axis is the amplitude value (each 3D graph represents a pixel). I want to model this into some useful data structure, like a graph or a vector, based on parameters extracted from the 3D graph, so that I'll be able to feed it into a classification algorithm. But I'm unable to extract all the local minima/maxima or slopes. …
As is always the way, I stumbled across Tsallis entropy on SO while looking for something completely different. This soon led me to reading all sorts of interesting but terse academic papers. I am unfortunately a mere layman, and I still have one big unsolved question. The key input to Tsallis entropy is a probability array. What I don't understand is: how do you get it out of a time series? Allow me to give you a completely hypothetical example: I …
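One common route from a time series to a probability array (a sketch, and only one of several possible discretizations): bin the values into a histogram and normalize the counts, then feed the probabilities into the Tsallis formula S_q = (1 - Σ p_i^q) / (q - 1):

```python
def probabilities(series, n_bins=4):
    """Histogram the series into equal-width bins and normalize the counts."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0          # guard: all values identical
    counts = [0] * n_bins
    for v in series:
        i = min(int((v - lo) / width), n_bins - 1)   # clamp the max value
        counts[i] += 1
    return [c / len(series) for c in counts]

def tsallis_entropy(p, q=2.0):
    """S_q = (1 - sum(p_i^q)) / (q - 1); recovers Shannon entropy as q -> 1."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

p = probabilities([0.0, 1.0, 2.0, 3.0], n_bins=4)   # uniform: each bin 0.25
s = tsallis_entropy(p, q=2.0)
```

The bin count and bin scheme (equal-width vs. equal-frequency, or symbolic encodings like ordinal patterns) are modeling choices that strongly affect the resulting entropy.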
I have trained a model (Random Forest) and now I would like to use it to predict certain data on a particular day. I have a categorical column with some values (say a, b, c, d, e) over a period. On a particular day, only some of those values are present (say b, d). When converting them to one-hot encoding, I am using LabelEncoder and then the one-hot encoder. But if I give that column for label encoding, it labels only …
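A common fix for this (a sketch, not the only way): fix the full category list at training time, so prediction-day data with only b and d present still produces the same columns in the same order. In scikit-learn, `OneHotEncoder` with the `categories` and `handle_unknown` parameters does this; here is a dependency-free sketch of the idea:

```python
# Full category list, fixed at training time (hypothetical values a..e).
categories = ["a", "b", "c", "d", "e"]

def one_hot(value):
    """Always emits len(categories) columns, even if only b and d occur today."""
    return [int(value == c) for c in categories]

rows_today = ["b", "d"]                  # only two categories on this day
encoded = [one_hot(v) for v in rows_today]
```

Fitting the encoder per-day is the bug: the training-time vocabulary, not the day's data, must define the columns the Random Forest sees.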
I'm relatively new to ML/Statistical Analysis, and I'm facing a dataset structured like this:

person_id, pay, task, hours
1, 560, A, 3
1, 560, B, 5
2, 650, A, 7
3, 520, C, 6
3, 520, A, 2
...

meaning person 1 is cumulatively paid 560 to perform task A for 3 hrs and task B for 5 hrs; person 2 is paid 650 for task A for 7 hrs; person 3 is paid 520 for task C for 6 hrs and A for 2 hrs, etc. …
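One way to flatten data like this into one row per person (a sketch; whether hours-per-task columns are the right aggregation depends on the modeling goal): pivot `task` into columns with `hours` as the values, keeping `pay` as the per-person column. In pandas this would be a `pivot_table`; a dependency-free version of the same reshape:

```python
from collections import defaultdict

# Rows from the example: (person_id, pay, task, hours)
rows = [(1, 560, "A", 3), (1, 560, "B", 5), (2, 650, "A", 7),
        (3, 520, "C", 6), (3, 520, "A", 2)]

tasks = sorted({t for _, _, t, _ in rows})                      # ["A", "B", "C"]
people = defaultdict(lambda: {"pay": None, "hours": defaultdict(float)})
for pid, pay, task, hours in rows:
    people[pid]["pay"] = pay            # pay repeats per person; keep one copy
    people[pid]["hours"][task] += hours

# One row per person: pay plus hours for each task (0 if not performed).
table = {pid: [rec["pay"]] + [rec["hours"][t] for t in tasks]
         for pid, rec in people.items()}
```

The result has columns [pay, hours_A, hours_B, hours_C], which a standard tabular model can consume directly.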