Feature construction widget on Orange 3.13

I am working with Orange for my thesis, using logs and core data; however, since I am a beginner, I am a little stuck with the feature construction widget. Ultimately, I would like to combine different features to compare them. What kind of information should I put in the "Values" field for a categorical feature? If you have any examples of this, they would be really appreciated (the ones from Orange did not help me).
Category: Data Science

Feature creation ideas for propensity models?

I'm working on a propensity model, predicting whether customers will buy or not. While doing exploratory data analysis, I found that customers have a buying pattern. Most customers repeat the purchase at a specific time interval. For example, some customers repeat purchases every four quarters, some every 8, 12, etc. I have the purchase dates for these customers. What is the most useful feature I can create to capture this pattern in the data? I'm predicting whether in the next quarter …
Category: Data Science
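
One possible way to capture a repeat-purchase rhythm is to estimate each customer's typical gap between purchases and express the time since the last purchase as a fraction of that gap. The sketch below is only an illustration under assumptions: the helper `purchase_cycle_features`, the feature names, and the use of the median gap are all hypothetical choices, not something stated in the question.

```python
from datetime import date
from statistics import median


def purchase_cycle_features(purchase_dates, as_of):
    """Cadence features for one customer (hypothetical helper).

    purchase_dates: non-empty list of datetime.date objects.
    as_of: the date at which the prediction is made.
    """
    dates = sorted(purchase_dates)
    # Gaps in days between consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    days_since_last = (as_of - dates[-1]).days
    typical_gap = median(gaps) if gaps else None
    # With fewer than two purchases (or only same-day purchases)
    # no cycle can be estimated, so the ratio is left undefined.
    if not typical_gap:
        cycle_position = None
    else:
        cycle_position = days_since_last / typical_gap
    return {
        "median_gap_days": typical_gap,
        "days_since_last": days_since_last,
        "cycle_position": cycle_position,
    }


history = [date(2020, 1, 15), date(2021, 1, 15), date(2022, 1, 15)]
features = purchase_cycle_features(history, as_of=date(2022, 10, 1))
```

A `cycle_position` close to 1 would suggest the customer is near their usual repurchase point; whether that helps a given model is something to validate on held-out data.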

How to handle a feature vector that could be variable length?

I would like to train a machine learning model with several features X[] as input and one output Y. For example, every sample has a data frame like this: X[0], X[1], X[2], X[3], X[4], Y. Let's say that in one sample each of the following holds only one value: X[0], X[1], X[2], X[4], Y. This is a normal machine learning training problem. But now, suppose I would like X[3] to hold multiple values; for example, the data for sample 1 is: X[0] | X[1] …
Category: Data Science

How to treat the undefined values which make sense?

I'm currently trying to create a few features to improve the performance of a model. One of the features I would like to create is the difference in days between a customer's purchase and their previous one. Creating this feature is not a problem. However, I don't know which value to set if this is the customer's first purchase. Which value should I set and, more generally, how should these cases be treated? customer_id date_purchase …
Category: Data Science
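
A common pattern for this kind of "undefined but meaningful" value is to leave the gap missing and add an explicit indicator, so the model can tell a first purchase apart from any real number of days. The following is a minimal sketch of that idea, not a definitive answer: the helper `add_recency_features` and the output names `days_since_prev` and `is_first_purchase` are assumptions; only `customer_id` and `date_purchase` come from the question.

```python
from datetime import date


def add_recency_features(purchases):
    """purchases: list of (customer_id, date_purchase) tuples (hypothetical input shape)."""
    last_seen = {}  # customer_id -> date of that customer's previous purchase
    rows = []
    # Process each customer's purchases in chronological order.
    for customer_id, date_purchase in sorted(purchases):
        previous = last_seen.get(customer_id)
        rows.append({
            "customer_id": customer_id,
            "date_purchase": date_purchase,
            # None marks "no previous purchase" instead of a made-up number.
            "days_since_prev": None if previous is None else (date_purchase - previous).days,
            "is_first_purchase": previous is None,
        })
        last_seen[customer_id] = date_purchase
    return rows


purchases = [(1, date(2021, 3, 1)), (2, date(2021, 3, 5)), (1, date(2021, 3, 11))]
rows = add_recency_features(purchases)
```

Models that cannot handle missing values would still need the `None` entries filled (for example with a constant), in which case the indicator column is what keeps the first-purchase case distinguishable.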

Algorithms for causal feature selection for continuous Y

Currently I have been trying to find some good algorithms for feature selection. Using correlation or another non-causal method is not the right way to do feature selection. I am searching for algorithms or libraries in Python that use causal effects for feature selection. Currently there are only methods for binary outcomes; I have a regression problem, so the outcome must be continuous. "Causality-Guided Feature Selection"
Category: Data Science

How to represent a time duration feature for cases where time is still counting

I have a problem where I am trying to classify the outcome of customer complaint cases. I already have several features, such as type of item bought, reason for complaint, etc. I am trying to add a feature that represents how long a case has been 'open' (meaning waiting for resolution). The logic is that a case that has been 'open' for a long time is unlikely to have a positive outcome. The issue is that I am training my model on 'closed' cases, hence have …
Category: Data Science

Best way to represent a version feature based on percentiles

We're training a binary classifier in AutoML, and one of the features consists of browser versions. Currently these versions are provided "normalized" to the model, according to the percentile of the browser the current observation falls into. For example, if the percentiles of some specific browser versions are:

percentile  version
p25         34
p50         45
p75         53
p99         70

then an observation with said browser and version=54 would be represented as:

p25  p50  p75  p99
1    1    1    0

My question …
Category: Data Science
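
The representation described in the question (a 1 for every percentile threshold the version has reached) can be sketched as below. The function name `thermometer_encode` is hypothetical, and treating the comparison as "greater than or equal" is an assumption; the question's example does not say how ties are handled.

```python
def thermometer_encode(version, thresholds):
    """Return one 0/1 flag per percentile threshold, in the order given.

    Assumption: a flag is 1 when version >= threshold.
    """
    return [1 if version >= threshold else 0 for threshold in thresholds]


# Thresholds for p25, p50, p75, p99 taken from the question's example.
thresholds = [34, 45, 53, 70]
encoded = thermometer_encode(54, thresholds)
```

With the question's numbers, version 54 clears the p25, p50 and p75 thresholds but not p99, which reproduces the `1 1 1 0` row from the example.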

How to model a supervised recommender system with varying data

Suppose there are 2000 movies and a company wants to recommend some movies (for example, at most 5 movies) to each visitor. The objective is to learn how to predict which movie will be selected if a specific set of movies is recommended.

    option-1  option-2  option-3  option-4   option-5   Selected-Movie
1.  movie1    movie3    movie4                          movie4
2.  movie3    movie4    movie100  movie1000  movie1001  movie1001
3.  movie4    movie5    movie34                         movie34

Based on this data set, I want to learn when sample 1 is suggested …
Category: Data Science

How to add more weight to certain features?

I have extracted features from two types of signals. Prior to merging them into one feature vector, I computed an importance score for every feature within each type of signal. I would like to weight the features according to those scores. Would the best way to do this be to multiply every feature by its score and then concatenate the features of both signals, and should the data be normalized again after multiplication? Or, is there a different …
Category: Data Science

Deep learning / computer vision technique: aggregating many input images to a single representation of the features within

I have a few thousand grayscale images, and I would like to generate a universal representation of the patterns within - a semantic/ordered composition of all features, so to speak. For instance, take 10000 images of a dog and draw the archetypical dog. Does this task have a technical name, and is there a method out there specifically for such purposes? I guess this is similar to what happens during the training of a neural network. I just don't necessarily need …
Category: Data Science

Representing user information

I have the task of representing a user feature matrix. I have features like gender, age, etc., but I also have a multi-value feature called "movies watched", which is essentially another table of the movie names watched by that user, each with a numeric duration; the order of the movies does not matter here. Also, the number of movies watched can range from 20 to 300. So what is the best way of representing "movies watched" as a feature vector?
Category: Data Science
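
Because the order of the movies does not matter, one straightforward option is a fixed-length "bag of movies" vector: one slot per known movie, holding the watched duration (0 if not watched). This is only a sketch of that one option under assumptions; the helper `movies_to_vector`, the example titles, and the choice to ignore unknown titles are all hypothetical.

```python
def movies_to_vector(watched, vocabulary):
    """Map a variable-length set of watched movies to a fixed-length vector.

    watched: dict of movie title -> watched duration for one user.
    vocabulary: ordered list of every movie title the model knows about.
    """
    index = {title: position for position, title in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for title, duration in watched.items():
        if title in index:  # titles outside the vocabulary are ignored in this sketch
            vector[index[title]] += duration
    return vector


vocabulary = ["Alien", "Brazil", "Casablanca", "Dune"]
user_watched = {"Dune": 120.0, "Alien": 45.0}
user_vector = movies_to_vector(user_watched, vocabulary)
```

Every user ends up with a vector of the same length regardless of whether they watched 20 or 300 movies. With a large catalogue the vector becomes sparse, so a dimensionality reduction or learned embedding may be preferable; that choice is outside this sketch.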

Finding if an outcome is predictable

Suppose we are asked to predict something given a set of features. How do we know if that target is actually predictable? That is, how do we know whether there is actually some relation between the dependent and independent variables, or whether there are patterns in the data that could be exploited by a machine learning algorithm? What if the target outcomes are just random? How do we test for this relationship before we start building ML/DL models?
Category: Data Science
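
One simple sanity check along these lines is a permutation test: shuffle the target many times and see how often a shuffled target looks as strongly related to a feature as the real one does. The sketch below is an assumption-laden illustration, not a complete answer: it uses Pearson correlation with a single feature, so it can only detect a linear relationship, it assumes neither input is constant, and the function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (neither constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=1000, seed=0):
    """Share of shuffled targets whose |correlation| matches or beats the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an exact zero.
    return (hits + 1) / (n_permutations + 1)


feature = list(range(20))
target = [2 * x + 1 for x in feature]
p_value = permutation_p_value(feature, target)
```

A small p-value suggests the observed relationship is unlikely to be chance; a large one does not prove the target is random, only that this particular measure found nothing.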

How do you aggregate features of lists (pooling alternatives)?

Is it possible to reduce non-correlated multi-dimensional data over features to 1D data? A working option is pooling (mean/min/max) over an embedding vector (n samples of embeddings of m dimensions). E.g., this converts many embeddings (n × m) to a list of means (1 × m). However, these all lose a lot of information (especially the relationships between features in single embeddings). This doesn't have to be a reduction (i.e. the resulting 1D vector can be larger than m). If it's …
Category: Data Science
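
Since the question allows the result to be larger than m, one simple variation on the pooling it describes is to concatenate several pooled statistics rather than keep only one. The sketch below shows that variation only; the helper name `pool_embeddings` is hypothetical, and this approach still ignores relationships between features within a single embedding, which is the limitation the question raises.

```python
def pool_embeddings(embeddings):
    """Pool n embeddings of dimension m into one vector of length 3 * m.

    The result is [means..., mins..., maxs...], each computed per dimension.
    """
    columns = list(zip(*embeddings))  # one tuple of n values per dimension
    means = [sum(column) / len(column) for column in columns]
    mins = [min(column) for column in columns]
    maxs = [max(column) for column in columns]
    return means + mins + maxs


embeddings = [[1.0, 4.0], [3.0, 0.0]]  # n = 2 samples, m = 2 dimensions
pooled = pool_embeddings(embeddings)
```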

How to model a 3D graph into a vector so that I can feed it into a classification algorithm?

I have a 3D graph (image from Google Images, not shown here). It has 2 angles as the X and Y axes, and the Z axis is the amplitude value (each 3D graph represents a pixel). I want to model this as some useful data structure, like a graph or a vector, using some parameters extracted from the 3D graph, so that I'll be able to feed it into a classification algorithm. But I'm unable to extract all the local minima/maxima, or slopes. …
Category: Data Science

Tsallis entropy - advice needed regarding obtaining probability distribution

As is always the way, I stumbled across Tsallis entropy on SO whilst looking for something completely different. This soon led me to reading all sorts of interesting but terse academic papers. I am unfortunately a mere layman and I still have one big unsolved question. The key input to Tsallis entropy is a probability array. What I don't understand is how you get it out of a time series. Allow me to give you a completely hypothetical example: I …
Category: Data Science

Label Encode with predefined classes

I have trained a model (Random Forest) and now I would like to use it to predict certain data on a particular day. I have a categorical column that takes some values (say a, b, c, d, e) over a period. Now, on a particular day, only some of those values are present (say b, d). To convert them to one-hot encoding, I am using LabelEncoder and the one-hot encoder. But if I give that column for label encoding, it is labelling only …
Category: Data Science
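
The usual way around this kind of mismatch is to fix the category list once (from the training period) and reuse it at prediction time, instead of re-deriving it from whatever values happen to appear that day. The sketch below illustrates the idea in plain Python; the helper `one_hot` and the all-zeros handling of unknown values are assumptions, not the asker's code. With scikit-learn, the analogous step is to fit the encoder on the training data and reuse that same fitted encoder for prediction.

```python
def one_hot(values, categories):
    """One-hot encode values against a fixed, predefined category list.

    Values not present in `categories` become an all-zero row (an assumption
    of this sketch; raising an error is an equally reasonable choice).
    """
    index = {category: position for position, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


# Category list fixed from the training period, as in the question's example.
CATEGORIES = ["a", "b", "c", "d", "e"]
today = one_hot(["b", "d"], CATEGORIES)
```

Because the column positions come from `CATEGORIES` rather than from the day's data, `b` and `d` land in the same columns the model saw during training.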

Regression with a feature which has its own depth

I'm relatively new to ML/Statistical Analysis, and I'm facing a dataset structured like this:

person_id, pay, task, hours
1, 560, A, 3
1, 560, B, 5
2, 650, A, 7
3, 520, C, 6
3, 520, A, 2
...

meaning person 1 is cumulatively paid 560 to perform task A for 3 hrs and task B for 5 hrs; person 2 is paid 650 for task A for 7 hrs; person 3 is paid 520 for task C for 6 hrs and task A for 2 hrs, etc. …
Category: Data Science
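
One common way to prepare data of this shape for a regression is to pivot it so that each person becomes a single row with one hours column per task. The sketch below assumes that the goal is to model `pay` from hours per task, which the truncated question does not confirm; the helper `pivot_hours` and the `hours_<task>` column names are hypothetical.

```python
def pivot_hours(rows, tasks):
    """Turn (person_id, pay, task, hours) rows into one record per person.

    Each record holds the person's pay plus one hours_<task> entry per task,
    with 0 for tasks the person did not perform.
    """
    people = {}
    for person_id, pay, task, hours in rows:
        if person_id not in people:
            record = {"pay": pay}
            record.update({f"hours_{name}": 0 for name in tasks})
            people[person_id] = record
        people[person_id][f"hours_{task}"] += hours
    return people


# Rows taken from the question's example.
rows = [
    (1, 560, "A", 3),
    (1, 560, "B", 5),
    (2, 650, "A", 7),
    (3, 520, "C", 6),
    (3, 520, "A", 2),
]
wide = pivot_hours(rows, tasks=["A", "B", "C"])
```

After the pivot, each person contributes exactly one observation, so the repeated `pay` values no longer count as separate samples.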

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.