I am calculating the volatility (standard deviation) of returns of a portfolio of assets using the variance-covariance approach. Correlation coefficients and asset volatilities have been estimated from historical returns. Now what I'd like to do is compute the average correlation coefficient, that is, the single correlation coefficient common to all asset pairs that gives me the same overall portfolio volatility. I could of course take an iterative approach, but was wondering if there was something simpler / out of the box …
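For concreteness, here is a minimal sketch of what I mean (the weights, volatilities, and correlation matrix below are made up), including the algebraic rearrangement I suspect avoids iteration, since portfolio variance is linear in the off-diagonal correlations:

    import numpy as np

    # Hypothetical example inputs: weights, asset volatilities, correlation matrix
    w = np.array([0.4, 0.35, 0.25])          # portfolio weights
    vol = np.array([0.20, 0.15, 0.10])       # asset volatilities
    corr = np.array([[1.0, 0.3, 0.1],
                     [0.3, 1.0, 0.5],
                     [0.1, 0.5, 1.0]])

    # Portfolio variance from the full variance-covariance matrix
    cov = np.outer(vol, vol) * corr
    port_var = w @ cov @ w

    # Implied "average" correlation: solve
    #   port_var = sum_i w_i^2 s_i^2 + rho_avg * sum_{i != j} w_i w_j s_i s_j
    # for rho_avg, so no iteration is needed.
    own_var = np.sum(w**2 * vol**2)
    cross = np.sum(np.outer(w * vol, w * vol)) - own_var   # sum over i != j
    rho_avg = (port_var - own_var) / cross
    print(rho_avg)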
I am working on a regression problem, trying to predict a target variable with seven predictor variables. I have a tabular dataset of 1400 rows. Before delving into the machine learning to build a predictor, I did an EDA (exploratory data analysis) and obtained the correlation coefficients (Pearson r) below for my data. Note that I have included both the numerical predictor variables and the target variable. I am wondering about the following questions: We see that pv3 is …
At my office, I am stuck in a weird situation. I am asked to perform a regression algorithm on the data, in which the target variable is continuous, with values ranging between 0.6 and 0.9 at 8 digits of precision after the decimal. Although I know and have applied many linear and non-linear regression algorithms in the past, this case is somewhat different. There is one variable which, according to my BU, should have a positive and linear correlation …
I have four features x1, x2, x3, x4, and their correlations with y are all similar within each measure: roughly +0.15 in Pearson and roughly -0.6 in Spearman. Does this make sense? How can I interpret this result? All four features and the target are definitely related. From a common-sense perspective, the sign of the Spearman coefficient also seems the more accurate one.
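For what it's worth, here is a toy sketch (made-up numbers, not my actual features) showing how a single high-leverage point can make Pearson positive while Spearman stays clearly negative, which is the kind of pattern I suspect:

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    # Toy data: a mostly decreasing relationship plus one extreme,
    # high-leverage point in the upper-right corner.
    x = np.arange(1, 21, dtype=float)
    y = 21.0 - x                      # monotone decreasing -> Spearman close to -1
    x = np.append(x, 1000.0)          # single extreme observation
    y = np.append(y, 1000.0)

    r_pearson, _ = pearsonr(x, y)
    r_spearman, _ = spearmanr(x, y)
    print(f"Pearson:  {r_pearson:+.2f}")   # positive, driven by the one leverage point
    print(f"Spearman: {r_spearman:+.2f}")  # still clearly negative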
I am new to Data Science. I have a Python data frame with NumberofDays, CountofJobs, and AmountEarned. What statistical method should I use to find the correlation between Days and AmountEarned?

NumberofDays  CountofJobs  AmountEarned
20            3            50000
22            18           10000
35            10           80000
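If it helps, here is a minimal sketch (using only the three rows shown above) of how the correlation could be computed with pandas/SciPy; whether Pearson or Spearman is the better choice is exactly what I'm unsure about:

    import pandas as pd
    from scipy.stats import pearsonr, spearmanr

    # The three-column frame from the question (tiny illustrative sample)
    df = pd.DataFrame({
        "NumberofDays": [20, 22, 35],
        "CountofJobs":  [3, 18, 10],
        "AmountEarned": [50000, 10000, 80000],
    })

    # Pearson measures linear association, Spearman monotonic association
    r, p = pearsonr(df["NumberofDays"], df["AmountEarned"])
    rho, p_s = spearmanr(df["NumberofDays"], df["AmountEarned"])
    print(f"Pearson r = {r:.2f} (p = {p:.2f}), Spearman rho = {rho:.2f} (p = {p_s:.2f})")

    # Or the full correlation matrix in one call
    print(df.corr(method="pearson"))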
Context: I'm currently building and comparing machine learning models to predict housing data. I have around 32000 data points and 42 features, and I'm predicting housing price. I'm comparing a Random Forest Regressor, a Decision Tree Regressor, and Linear Regression. I can tell there is some overfitting going on, as my initial values vs. cross-validated values are as follows:
RF: 10-fold R squared = 0.758, neg RMSE = -540.2 vs. unvalidated R squared of 0.877, RMSE of 505.6
DT: 10-fold …
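This is roughly how I'm producing the numbers above, sketched with a synthetic stand-in dataset rather than the real housing data:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in (my real set has ~32000 rows and 42 features)
    X, y = make_regression(n_samples=2000, n_features=42, noise=10.0, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0)

    # 10-fold cross-validated scores vs. a single fit on the full data
    cv_r2 = cross_val_score(model, X, y, cv=10, scoring="r2")
    cv_neg_rmse = cross_val_score(model, X, y, cv=10, scoring="neg_root_mean_squared_error")
    print("10-fold R^2:", cv_r2.mean(), " 10-fold neg RMSE:", cv_neg_rmse.mean())

    model.fit(X, y)
    print("unvalidated (train) R^2:", model.score(X, y))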
I have a dataset with 22 independent variables, of which 15 are categorical and have already been label encoded, i.e. the dtype is int64 and the contents are in a range of 0 to n (where n is the number of distinct classes). I got the data in this form and did not have to encode it myself. Since the data has already been encoded, I can directly use Python's Pearson correlation to get the correlation matrix for all combinations (encoded-encoded, …
Given the following dataframe

   age  job       salary
0  1    Doctor    100
1  2    Engineer  200
2  3    Lawyer    300
...

with age as numeric and job as categorical, I want to test the correlation with salary, for the purpose of selecting the features (age and/or job) for predicting the salary (regression problem). Can I use the following APIs from sklearn (or another API) to test it: sklearn.feature_selection.f_regression, sklearn.feature_selection.mutual_info_regression? If yes, what's the right method and syntax to test the correlation? Following …
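To make the question concrete, this is the kind of call I had in mind, extending the toy frame above; the one-hot encoding of job is my own assumption and part of what I want checked:

    import pandas as pd
    from sklearn.feature_selection import f_regression, mutual_info_regression

    df = pd.DataFrame({
        "age":    [1, 2, 3, 4, 5, 6],
        "job":    ["Doctor", "Engineer", "Lawyer", "Doctor", "Engineer", "Lawyer"],
        "salary": [100, 200, 300, 110, 210, 310],
    })

    # f_regression expects numeric inputs, so the categorical column has to be
    # encoded first (one-hot here); whether that is the right treatment for "job"
    # is part of my question.
    X = pd.get_dummies(df[["age", "job"]], columns=["job"], dtype=float)
    y = df["salary"]

    F, pval = f_regression(X, y)                        # univariate linear F-test per column
    mi = mutual_info_regression(X, y, random_state=0)   # mutual information per column

    for col, f_stat, p, m in zip(X.columns, F, pval, mi):
        print(f"{col:15s}  F={f_stat:8.2f}  p={p:.3f}  MI={m:.3f}")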
I have a Pandas DataFrame with multiple columns (3000 or more) containing time series (dates as the index).

            | id1  id2  id3
-------------------------------
2021-01-06  |  27   29    5
2021-01-07  |  24   20    9
...
2021-01-08  |  21   13   14
2021-01-09  |  10    6   24
...

And I need to do rolling-window computations of the Pearson correlation on each pair of columns. I'm using multiprocessing and the regular pandas.DataFrame.corr() function, and it takes days to complete the calculation. Is it possible to …
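For context, here is a stripped-down sketch of the kind of computation involved, written (as one idea) with a single np.corrcoef call per window position instead of per-pair corr() calls; the sizes below are toy sizes, my real frame is much wider:

    import numpy as np
    import pandas as pd

    # Small stand-in frame (the real one has 3000+ columns and a date index)
    rng = np.random.default_rng(0)
    dates = pd.date_range("2021-01-06", periods=250, freq="D")
    df = pd.DataFrame(rng.normal(size=(250, 50)),
                      index=dates,
                      columns=[f"id{i}" for i in range(50)])

    window = 30
    values = df.to_numpy(dtype=float)

    # One np.corrcoef call per window position computes ALL pairwise correlations
    # at once via a single matrix product, rather than looping over column pairs.
    corr_by_date = {}
    for end in range(window, len(values) + 1):
        block = values[end - window:end]                 # shape (window, n_cols)
        corr_by_date[df.index[end - 1]] = np.corrcoef(block, rowvar=False)

    # e.g. rolling correlation between id0 and id1 at the last date
    print(corr_by_date[df.index[-1]][0, 1])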
I'm trying to calculate a correlation coefficient (or another suitable measure of association) between State and Action in a 'delayed action effect' environment. In this environment, the agent observes states, then it returns an action and a reward. But actions only take effect 'T' time steps after the states. So it is really hard to see how to approach this environment (because this is the first trial). Are there any good approaches in this situation?
We perform data analysis and build models. Say, for example, I built a regression model that has more than one predictor (multiple regression). We then check many things: normality, multicollinearity, etc. Specifically, to check for multicollinearity among numeric/continuous variables, we use VIF (Variance Inflation Factors) and the like. If we find that there is multicollinearity, we drop one of the highly correlated features. My question is: what can be done with categorical variables? I mean, if two categorical variables are correlated/associated, does …
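For concreteness, this is the continuous-variable check I'm referring to (made-up data; VIF via statsmodels). My question is what the analogue of this step is for categorical variables:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    # Made-up numeric predictors, two of them deliberately collinear
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)   # highly correlated with x1
    x3 = rng.normal(size=200)
    X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

    # VIF for each predictor (skip the constant); large values flag multicollinearity
    for i, name in enumerate(X.columns):
        if name == "const":
            continue
        print(name, variance_inflation_factor(X.values, i))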
I have a dataset of process data for different equipment with many sensors. I would like to check the correlations between the different sensors, to see whether any of them are strongly correlated and to potentially reduce the size of my dataset. Within this process data there are many different processes of varying lengths and different equipment. For now I am asserting that the different equipment shouldn't make a difference and therefore I do not want to include this …
Does having a positive or negative correlation between features being clustered affect the agglomerative clustering result? I have three columns in my dataset, and I'm trying to figure out if I should cluster on all three features or use only a subset. The Pearson correlation coefficients are:
X & Z --> -0.07, p=0.14
X & Y --> -0.08, p=0.08
Z & Y --> 0.68, p<0.001
The Variance Inflation Factors are:
variables    VIF
Y            2.816716
X            3.552227
Z            6.232414
Should I …
What are the characteristics of the three correlation coefficients, how do they compare with one another, and what are their assumptions? Can somebody kindly walk me through the concepts?
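In case it helps to anchor an answer, here is a tiny sketch, assuming the three coefficients in question are Pearson, Spearman, and Kendall, on made-up data with a monotonic but non-linear relationship:

    import numpy as np
    from scipy.stats import pearsonr, spearmanr, kendalltau

    # Toy data: y grows monotonically but non-linearly with x
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 5, size=100)
    y = np.exp(x) + rng.normal(scale=5.0, size=100)

    r_p, _ = pearsonr(x, y)      # linear association
    r_s, _ = spearmanr(x, y)     # monotonic association, rank-based
    r_k, _ = kendalltau(x, y)    # concordant vs. discordant pairs
    print(f"Pearson {r_p:.2f}, Spearman {r_s:.2f}, Kendall {r_k:.2f}")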
I am using the fourth-corner method in one of my papers (for those who need the name). The method was developed to test associations between variables in two datasets. In my case, the datasets contain traits of species (e.g. the trait Size with modalities 'small', 'medium', 'large'). The method recognizes the data type and then applies the appropriate statistic. The correct cases: if two variables are quantitative, fourthcorner calculates Pearson correlations; if two variables are qualitative (factorial), the method calculates a …
The independent variables in the dataset include categorical variables such as Gender (2 levels), Mode of Shipment (3 levels), and Product Importance (4 levels), and numerical variables such as Customer care calls, Discount Offered, and Package weight. How do I find the correlations between these variables? By converting the categorical variables into dummy variables and then using Pearson correlation? What if the dummy-variable categories also show correlations among themselves, such as between the Mode of Shipment categories Flight, Ship, and Road? …
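This is what I mean by the dummy-variable approach, in code (toy rows, not the real data); the Mode of Shipment dummies correlating with each other is exactly the behaviour I'm asking about:

    import pandas as pd

    # Made-up rows standing in for the shipping dataset
    df = pd.DataFrame({
        "Gender":              ["M", "F", "F", "M", "F", "M"],
        "Mode_of_Shipment":    ["Flight", "Ship", "Road", "Ship", "Flight", "Road"],
        "Product_Importance":  ["low", "medium", "high", "low", "high", "medium"],
        "Customer_care_calls": [3, 5, 2, 4, 6, 1],
        "Discount_Offered":    [10, 5, 20, 8, 15, 3],
        "Package_weight":      [1.2, 3.4, 2.2, 5.0, 0.8, 4.1],
    })

    # Dummy-encode the categoricals, then one Pearson correlation matrix over
    # everything, including dummy-vs-dummy pairs (the part I'm unsure about).
    encoded = pd.get_dummies(
        df,
        columns=["Gender", "Mode_of_Shipment", "Product_Importance"],
        dtype=float,
    )
    print(encoded.corr(method="pearson").round(2))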
I calculated the Pearson correlation coefficient between two signals that describe the state of a unit. During normal operation of the unit, both signals were fairly stable and fluctuated very little. At some point, a defect began to develop in the unit; in connection with this, the oscillations of the signals increased, and a growth trend in their absolute values also began to be observed. [Figure: signals describing the normal operation of the unit. Figure: signals describing emergency operation of …]
I am struggling to find a suitable way to calculate a correlation coefficient for categorical variables. Pearson's coefficient is not supported for categorical features. I want to find the features with the highest influence on the target variable. My objectives are: correlation between categorical and categorical variables, e.g. for a binary target (like the Titanic dataset), I want to find out what the influence of a category is on the target (like the influence of gender on survival (0/1)); and to capture some non …
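One candidate I've come across for the categorical-vs-categorical part is Cramér's V (chi-square based); here is a sketch with made-up Titanic-style rows, in case that is the right direction:

    import numpy as np
    import pandas as pd
    from scipy.stats import chi2_contingency

    def cramers_v(x: pd.Series, y: pd.Series) -> float:
        """Basic Cramér's V between two categorical series (0 = no association, 1 = perfect)."""
        table = pd.crosstab(x, y)
        chi2 = chi2_contingency(table)[0]
        n = table.to_numpy().sum()
        r, k = table.shape
        return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

    # Titanic-style toy example: association between gender and survival
    df = pd.DataFrame({
        "gender":   ["male", "female", "female", "male", "male", "female", "male", "female"],
        "survived": [0, 1, 1, 0, 0, 1, 1, 1],
    })
    print(cramers_v(df["gender"], df["survived"]))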
I would like to understand how to find an association between users, spam, and email age. My dataset looks as follows:

User         Spam  Age (yr)
porn_23      1     1
Mary_g       0     6
cricket_s54  0     4
rewuoiou     1     0
pure75       1     2
giogio35     0     10
viv3roe      1     1

I am looking at the correlation using Pearson. Is that right? I would like to determine the correlation between age and user: spam emails should be likely to come from users having recent email addresses …
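For concreteness, this is the kind of call I tried with the table above (point-biserial, i.e. Pearson with a 0/1 variable, which may or may not be the right choice here):

    import pandas as pd
    from scipy.stats import pointbiserialr

    # The rows from the question
    df = pd.DataFrame({
        "User": ["porn_23", "Mary_g", "cricket_s54", "rewuoiou", "pure75", "giogio35", "viv3roe"],
        "Spam": [1, 0, 0, 1, 1, 0, 1],
        "Age":  [1, 6, 4, 0, 2, 10, 1],
    })

    # Spam is binary and Age is numeric, so the point-biserial correlation
    # (equivalent to Pearson on a 0/1 variable) seemed like the natural fit.
    r, p = pointbiserialr(df["Spam"], df["Age"])
    print(f"point-biserial r = {r:.2f}, p = {p:.3f}")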
I have two datasets with which I want to do a Pearson correlation analysis. I have carried out the analysis, and the results make sense; however, I want to be sure it is valid given that the two datasets have values on different scales. The features in both datasets are exactly the same (the actual samples are of course different). The ranges of values are as follows: dataset1 = 3-20, dataset2 = 10-30. Now my understanding is that the Pearson correlation coefficient is not …
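To check my own understanding, here is a small sketch (made-up data on the two ranges) of the property I'm relying on, namely that Pearson is unchanged by a positive linear rescaling of either variable:

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)

    # Made-up feature roughly on the dataset1 scale (3-20), plus a correlated partner
    a = rng.uniform(3, 20, size=500)
    b = 2.0 * a + rng.normal(scale=3.0, size=500)

    # Rescale a to roughly the dataset2 range (10-30): a positive linear transformation
    a_rescaled = 10 + (a - 3) * (30 - 10) / (20 - 3)

    r1, _ = pearsonr(a, b)            # correlation on the original scale
    r2, _ = pearsonr(a_rescaled, b)   # identical: Pearson is invariant to linear rescaling
    print(r1, r2)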