Suppose I want to predict a certain numerical value, but the data set containing the correct labels is very small. However, I am also given a large data set with a label that is correlated with the one I want to predict. I read that transfer learning can be used to exploit this larger data set when predicting the desired label from the smaller one. Could someone elaborate a bit on this?
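One minimal sketch of the idea, assuming made-up data and a simple linear model: train on the large, proxy-labelled set first, then reuse that model's output as an extra feature when fitting on the small, true-labelled set (a simple stacking-style form of transfer; with neural networks one would instead pretrain and fine-tune the weights).

```python
# Hedged sketch: all data and coefficients below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Large dataset with a correlated proxy label.
X_large = rng.normal(size=(1000, 5))
y_proxy = X_large @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + rng.normal(scale=0.1, size=1000)

# Small dataset with the label we actually care about (correlated with the proxy).
X_small = rng.normal(size=(50, 5))
y_true = X_small @ np.array([1.1, 0.4, 0.1, -0.6, 1.9]) + rng.normal(scale=0.1, size=50)

# Step 1: learn the proxy task on the large dataset.
proxy_model = LinearRegression().fit(X_large, y_proxy)

# Step 2: append the proxy model's prediction as an extra feature.
X_small_aug = np.hstack([X_small, proxy_model.predict(X_small)[:, None]])

# Step 3: fit the final model on the small dataset.
final_model = LinearRegression().fit(X_small_aug, y_true)
print(X_small_aug.shape)  # (50, 6)
```

Whether this helps depends entirely on how strongly the proxy label correlates with the target.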
Let's say I am predicting house selling prices (continuous) and therefore have multiple independent variables (numerical and categorical). Is it common practice to balance the dataset when the categorical independent variables are imbalanced? The ratio is not higher than 1:100. Or do I only balance the data when the dependent variable is imbalanced? Thanks
I have a dataset of employee data with around 9,500 rows, and have to predict whether the target is 0 or 1. Some of my features are the department of an employee, gender, salary, review_score (numerical), average_number_of_hours per month, bonus (1 or 0), number of projects an employee is involved in, and tenure. My question is whether number of projects (3,4,5,6) and tenure (2,3,4,5,6,7,8,9,10,11,12) should be treated as 'categories' rather than numerical values. I can make them ordinal. However, I am not …
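For the ordinal option, a minimal pandas sketch (the column name is taken from the question; the values are made up) keeps the order of the levels without implying that the gaps between them are equal:

```python
# Hedged sketch: store a small-integer feature as an ordered categorical.
import pandas as pd

df = pd.DataFrame({"number_of_projects": [3, 5, 4, 6, 3]})
df["number_of_projects"] = pd.Categorical(
    df["number_of_projects"], categories=[3, 4, 5, 6], ordered=True
)

# The ordinal codes respect the declared category order.
print(df["number_of_projects"].cat.codes.tolist())  # [0, 2, 1, 3, 0]
```

Tree-based models are usually insensitive to this choice; for linear models it matters more.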
As input I have a CSV with 5,000 lines (and growing) and 20 fixed columns, each containing a number from 1-80. A row may look like this. Is it possible, using Orange3, to analyze each row and find out which pairs, triples, quads, quints, etc. occur most often in a row? The output I am looking for is "these 2 numbers occur the most often on a row", "These 3 numbers occur the most often on a row", "These …
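Outside of Orange3, the counting itself is a few lines of plain Python (the rows below are made up): count co-occurring pairs with `itertools.combinations` and a `Counter`; triples, quads, etc. work the same way by changing `r`.

```python
# Hedged sketch: frequent pair counting over rows of numbers.
from collections import Counter
from itertools import combinations

rows = [
    [3, 17, 42, 80],
    [3, 17, 55, 61],
    [3, 17, 42, 9],
]

pair_counts = Counter()
for row in rows:
    # Sort so (3, 17) and (17, 3) count as the same pair.
    pair_counts.update(combinations(sorted(row), 2))

print(pair_counts.most_common(1))  # [((3, 17), 3)]
```

Note the combinatorics: with 20 columns there are 190 pairs per row but 15,504 quints, so higher-order counts grow quickly.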
I know how to separate numerical and categorical data as follows: num_data = [cname for cname in df.columns if df[cname].dtypes in ['int64', 'float64']] cat_data = [cname for cname in df.columns if df[cname].dtypes == 'object'] Now I want to separate my numerical variables into discrete and continuous. How do I do that?
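There is no dtype that distinguishes discrete from continuous, so any split is a heuristic. A minimal sketch, assuming made-up columns, is to treat integer columns as discrete (some people instead use a threshold on `nunique()`, which is equally a judgment call):

```python
# Hedged sketch: heuristic discrete/continuous split of numeric columns.
import pandas as pd

df = pd.DataFrame({
    "rooms": [2, 3, 3, 4],          # integer-valued: treated as discrete
    "price": [1.5, 2.7, 3.1, 4.8],  # float-valued: treated as continuous
})

num_cols = df.select_dtypes(include="number").columns
discrete = [c for c in num_cols if pd.api.types.is_integer_dtype(df[c])]
continuous = [c for c in num_cols if c not in discrete]
print(discrete, continuous)  # ['rooms'] ['price']
```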
I have a categorical variable with 4 levels ('8 c', '6 c', 'NAN', 'Others') and I want to convert it to numerical form. An obvious way is to simply remove the 'c' part of the first two categories and replace 'NAN' with 0. However, I was wondering about the 'Others' level: what would be the best way to transform it? Please note that the variable represents the number of cylinders of a given car.
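One option, sketched below, is an explicit mapping where 'Others' gets a value between the known cylinder counts; the choice of 7 here is an assumption, not the answer (the median of known counts, or a separate indicator column, are alternatives):

```python
# Hedged sketch: hand-written mapping for the four levels in the question.
mapping = {"8 c": 8, "6 c": 6, "NAN": 0, "Others": 7}  # 7 = midpoint guess

values = ["6 c", "Others", "8 c", "NAN"]
encoded = [mapping[v] for v in values]
print(encoded)  # [6, 7, 8, 0]
```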
To analyze a banking dataset I have both numerical and categorical values, which I transform so that I can analyze them with k-prototypes. The original dataset: The modified dataset: E.g. Job (coded from 1 to 12, because there are 12 levels). Should I scale the dataset before running k-prototypes? How could I determine the optimal "k" (in code)? I thought of executing: library(clustMixType); lbd <- lambdaest(BPor); kpres <- kproto(BPor, 5, lambda = lbd) # change '5' for every possible value of k; print(kpres) …
I have a dataset (42000, 10) which contains 7 categorical features and 3 numerical ones. I would like to separate the numerical and categorical features into 2 different data frames, i.e. one containing only numerical data (42000, 3) and the other only categorical data (42000, 7), perform some pre-processing on both of them, and lastly concatenate them back into one data frame. So, my question is how do I separate my initial dataframe into …
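A minimal sketch with `select_dtypes`, assuming the categorical columns are stored as object/category dtypes and the rest as numbers (the toy frame below stands in for the real one):

```python
# Hedged sketch: split a frame by dtype, process, then recombine.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 41],
    "salary": [50.0, 64.5, 70.2],
    "dept": ["hr", "it", "it"],
})

num_df = df.select_dtypes(include="number")
cat_df = df.select_dtypes(exclude="number")

# ...pre-process each frame here...

combined = pd.concat([num_df, cat_df], axis=1)
print(num_df.shape, cat_df.shape, combined.shape)  # (3, 2) (3, 1) (3, 3)
```

`axis=1` concatenation aligns on the index, so avoid resetting the index of only one of the two frames in between.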
I am trying to convert continuous data points to categorical ones by binning. I know two techniques: (i) equal-width bins, (ii) bins with an equal number of elements. My question is: which type of binning is appropriate for which kind of problem? I use pandas for my data analysis tasks and it has the pd.cut method for arbitrary binning, which I use for equal-width bins, and the pd.qcut method for bins with an equal number of elements. The second function always produces …
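The difference between the two is easiest to see on skewed data (the series below is made up): with an outlier, equal-width bins from `pd.cut` pile almost everything into one bin, while `pd.qcut`'s quantile bins keep the counts even.

```python
# Hedged sketch: pd.cut (equal width) vs pd.qcut (equal frequency).
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

width_bins = pd.cut(s, bins=3)  # widths of ~33 each; dominated by the outlier
freq_bins = pd.qcut(s, q=3)     # three elements per bin

print(width_bins.value_counts(sort=False).tolist())  # [8, 0, 1]
print(freq_bins.value_counts(sort=False).tolist())   # [3, 3, 3]
```

Roughly: equal-width bins preserve the scale (good when bin boundaries have real-world meaning), equal-frequency bins preserve rank information and are robust to skew.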
I have a linear numerical array source and I want to find/match a test array as a pattern: source = [39,36,23,21,28,36,30,22,34,37] test = [36,23,21,28] We can use brute force or a similar method to find the exact match, checking the test array from index 0 to len(source)-len(test), but in our problem we can accept this pattern too (order is important): test = [36,24,21,28] // changed 23 to 24 Since we have many different ways of solving this problem (maybe …
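A brute-force sliding window is easy to relax into a fuzzy match: accept a window when every element differs from the pattern by at most some tolerance (the tolerance of 1 below is an assumption; a total-distance budget over the window is a common alternative).

```python
# Hedged sketch: sliding-window match with a per-element tolerance.
source = [39, 36, 23, 21, 28, 36, 30, 22, 34, 37]
test = [36, 24, 21, 28]  # 23 changed to 24

def fuzzy_find(source, pattern, tol=1):
    n, m = len(source), len(pattern)
    matches = []
    for i in range(n - m + 1):
        window = source[i:i + m]
        if all(abs(a - b) <= tol for a, b in zip(window, pattern)):
            matches.append(i)
    return matches

print(fuzzy_find(source, test))  # [1]
```

With tol=0 this degenerates to the exact brute-force search described above.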
Nothing to do with number recognition in the classical 'hand-written' sense (disclaimer to avoid this being counted as a duplicate). I have a selection of 96 serial numbers and a separate selection of >220 serial numbers. The smaller set typically resides within the larger set (though not always), which also contains ~120 incorrect numbers. See below for an example; for the record, I have matched things up as best I can... the correct number is first, the …
Basically, I was looking for a normalization function in sklearn, which is useful later for logistic regression. Since I have negative values, I chose MinMaxScaler with feature_range=(0, 1) as a parameter: x = MinMaxScaler(feature_range=(0, 1)).fit_transform(x) Then, using the sm.Logit trainer, I got an error: import statsmodels.api as sm logit_model=sm.Logit(train_data_numeric_final,target) result=logit_model.fit() print(result.summary()) ValueError: endog must be in the unit interval. I presume my values are out of the (0,1) range, which is the case: np.unique(np.less_equal(train_data_numeric_final.values, 1)) array([False, True]) How come? then how …
I was reading through a notebook tutorial working with the Titanic dataset, linked here, and noticed that it strongly favored ordinal over continuous data. For example, it converted both the Age and Fare features into ordinal data bins. I understand that categorizing data like this is helpful when doing data analytics manually, as fewer categories make the data easier to understand from a human perspective. But intuitively, I would think that doing this causes our data to lose precision, …
Is it better to encode features like month and hour as factors or numerics in a machine learning model? On the one hand, I feel numeric encoding might be reasonable, because time is a forward-progressing process (the fifth month is followed by the sixth month); but on the other hand, I think categorical encoding might be more reasonable because of the cyclic nature of years and days (the 12th month is followed by the first one). Is there …
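A third option that directly addresses the cyclic concern is a sin/cos encoding: map the month (or hour) onto a circle so that December and January end up adjacent. A minimal sketch:

```python
# Hedged sketch: cyclic encoding of a periodic feature.
import math

def cyclic_encode(value, period):
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

dec = cyclic_encode(12 % 12, 12)  # month 12 wraps to 0
jan = cyclic_encode(1, 12)

# December and January are close in the encoded space:
dist = math.dist(dec, jan)
print(round(dist, 3))  # ~0.518, vs. a naive numeric gap of 11
```

This preserves both ordering (within the cycle) and adjacency across the wrap-around, at the cost of two columns per feature.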
I need to do a cluster analysis with the following variables: a trick-question answer (good/wrong), a count variable (range 0-9), time in minutes, and another count variable. Number of observations: 3300. Since I am new to cluster algorithms, I'm struggling to choose the best one. I have read about the following methods: k-prototypes, k-means with Gower's distance, and the PAM algorithm. For the cluster analysis I need to use R. Can someone advise which method suits the data best? Since …
Conceptually, I understand how a numerical method like Monte Carlo is used to solve a definite integral: because the integral of a function is the area bounded by the curve, the ratio of random points that land under the curve to the total number of points estimates the value of the integral. Can someone explain, for a non-math person, how we can conceptually solve a PDE/ODE using Monte Carlo?
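For reference, the hit-or-miss integration described above fits in a few lines; the sketch below estimates the integral of x² on [0, 1] (true value 1/3), assuming nothing beyond the standard library:

```python
# Hedged sketch: hit-or-miss Monte Carlo integration of f(x) = x^2 on [0, 1].
import random

random.seed(42)
N = 100_000

hits = 0
for _ in range(N):
    x, y = random.random(), random.random()
    if y <= x * x:  # point lands under the curve
        hits += 1

# The enclosing unit square has area 1, so the hit fraction is the integral.
estimate = hits / N
print(estimate)  # close to 1/3
```

The PDE/ODE case builds on the same idea of averaging over random samples, but the randomness is over paths (e.g. random walks) rather than points under a curve.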
I want to replace the values in a data set (sample in the picture) with numbers instead of words, e.g., 1 instead of D, -1 instead of R, and 0 for all other values. How can I do it with a loop? I know it can be done like this instead (suppose d is the data frame name): d[d$Response == "R",]$Response = -1 d[d$Response == "D",]$Response = 1 ... (code every other value and assign) = 0
I am trying to develop an algorithm with sklearn and TensorFlow to predict which car can be offered to each customer. To do that, I have a database with the answers of 1,000 customers to a survey. Examples of questions/[answers] are: Color/[Green,Red,Blue] NumberOfPax/[2,4,5,6,7] HorsePower/[Integer] InsuranceIncluded/[yes/no/Don't know] As you can see, all answers are pre-coded, and where the answer could be open I validate the value to be an integer or a radio button. The purpose of …
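Before either sklearn or TensorFlow can consume answers like these, the categorical ones need a numeric encoding. A minimal sketch with `pd.get_dummies`, using the question's column names and made-up rows:

```python
# Hedged sketch: one-hot encode the categorical survey answers.
import pandas as pd

survey = pd.DataFrame({
    "Color": ["Green", "Red", "Blue"],
    "NumberOfPax": [2, 4, 7],
    "HorsePower": [90, 120, 150],
    "InsuranceIncluded": ["yes", "no", "Don't know"],
})

encoded = pd.get_dummies(survey, columns=["Color", "InsuranceIncluded"])
print(encoded.shape)  # (3, 8): 2 numeric columns + 3 + 3 dummy columns
```

For a pipeline that must also encode unseen data at prediction time, sklearn's `OneHotEncoder` (which remembers the categories it was fitted on) is the more robust choice.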
For the sake of learning the finer details of a deep learning neural network, I have coded my own library with everything (optimizer, layers, activations, cost function) homemade. It seems to work fine when benchmarking it on the MNIST dataset using only sigmoid activation functions. Unfortunately, I seem to get issues when replacing these with ReLUs. This is what my learning curve looks like for 50 epochs on a training dataset of ~500 examples: Everything is fine for the …
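Without seeing the library, one common cause when swapping sigmoid for ReLU is initialization/learning-rate scale: a unit whose pre-activation goes negative for every example outputs 0 and receives zero gradient ("dying ReLU"). He initialization is the usual remedy; a minimal numpy sketch (layer sizes are assumptions):

```python
# Hedged sketch: He initialization and the ReLU gradient's dead zone.
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # Variance 2/fan_in keeps activation variance stable through ReLU layers.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Gradient is 0 wherever z <= 0: a unit stuck there never recovers.
    return (z > 0).astype(z.dtype)

W = he_init(784, 128)
x = rng.normal(size=(32, 784))
a = relu(x @ W)
print(a.shape, float((a == 0).mean()))  # roughly half the activations are zero
```

If the curve collapses after a few good epochs, also try lowering the learning rate, since ReLU nets tolerate smaller steps than sigmoid nets at the same scale.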