Correlation analysis yields conflicting results: positive Pearson and negative Spearman

I have four features, x1, x2, x3, and x4. Their correlations with y are similar to one another under Pearson, and similar to one another under Spearman. However, they are all roughly +0.15 under Pearson and roughly -0.6 under Spearman. Does this make sense? How should I interpret this result? All four features are definitely related to the target, and from a common-sense perspective the sign of the Spearman coefficient is the more plausible one.
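One pattern that can produce opposite signs is a mostly decreasing relationship combined with a few extreme points. The sketch below uses invented data (not the asker's) to show the effect: Pearson is dominated by a single large point, while Spearman only sees ranks. Whether this is what happens in the question's data is an assumption that would need a scatter plot to check.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: 19 points on a strictly decreasing line, plus one
# extreme point that is far larger in both x and y.
base = np.arange(1, 20, dtype=float)
x = np.append(base, 100.0)
y = np.append(20.0 - base, 100.0)

# Pearson works on raw values, so the extreme point dominates.
pearson_r, _ = pearsonr(x, y)

# Spearman works on ranks, so the extreme point counts as just one rank.
spearman_rho, _ = spearmanr(x, y)

print(f"Pearson r    = {pearson_r:+.3f}")   # positive
print(f"Spearman rho = {spearman_rho:+.3f}")  # negative
```

If a pattern like this is present, the rank-based coefficient is usually the better summary of the overall trend, which would be consistent with the intuition stated in the question.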
Category: Data Science

What statistical method should I use to find the correlation between number of days and AmountEarned?

I am new to Data Science and I have a Python data frame with the columns NumberofDays, CountofJobs, and AmountEarned. What statistical method should I use to find the correlation between NumberofDays and AmountEarned?

NumberofDays  CountofJobs  AmountEarned
20            3            50000
22            18           10000
35            10           80000
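As a minimal sketch, assuming the data frame is a pandas DataFrame (the question only says "Python data frame"), both the Pearson and Spearman coefficients can be computed with `Series.corr`. The three rows are the ones quoted in the question; a sample this small is far too short to draw conclusions from, so the numbers only illustrate the calls.

```python
import pandas as pd

# The three rows quoted in the question.
df = pd.DataFrame({
    "NumberofDays": [20, 22, 35],
    "CountofJobs": [3, 18, 10],
    "AmountEarned": [50000, 10000, 80000],
})

# Pearson measures linear association on the raw values.
pearson_r = df["NumberofDays"].corr(df["AmountEarned"], method="pearson")

# Spearman measures monotonic association on the ranks.
spearman_rho = df["NumberofDays"].corr(df["AmountEarned"], method="spearman")

print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")  # 0.500 for these three rows
```

Pearson is the usual choice when the relationship is expected to be roughly linear; Spearman is less sensitive to outliers and to monotonic but non-linear relationships.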
Category: Data Science

Correlation with target variable for regression problem

Given the following dataframe

   age  job       salary
0  1    Doctor    100
1  2    Engineer  200
2  3    Lawyer    300
...

with age as numeric and job as categorical, I want to test each feature's correlation with salary in order to select the features (age and/or job) for predicting salary (a regression problem). Can I use the following APIs from sklearn (or another library)

sklearn.feature_selection.f_regression
sklearn.feature_selection.mutual_info_regression

to test it? If yes, what is the right method and syntax to test the correlation? Following …
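The sketch below shows one way both functions could be called on a mixed numeric/categorical frame. The rows are invented for illustration (the question shows only three rows), and one-hot encoding `job` with `pd.get_dummies` is an assumption about preprocessing, not something the question specifies.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression, mutual_info_regression

# Hypothetical rows, invented for illustration only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 56, 35, 41, 60, 27],
    "job": ["Doctor", "Engineer", "Lawyer"] * 4,
    "salary": [150, 110, 180, 210, 120, 160, 200, 170, 130, 190, 140, 155],
})

# Both scorers need a numeric matrix, so the categorical column is one-hot encoded.
X = pd.get_dummies(df[["age", "job"]], columns=["job"]).astype(float)
y = df["salary"]

# f_regression: univariate F-test for a linear relationship, one score per column.
f_scores, p_values = f_regression(X, y)

# mutual_info_regression: mark the dummy columns as discrete features.
is_discrete = np.array([col.startswith("job_") for col in X.columns])
mi_scores = mutual_info_regression(
    X, y, discrete_features=is_discrete, random_state=0
)

summary = pd.DataFrame(
    {"F": f_scores, "p_value": p_values, "MI": mi_scores}, index=X.columns
)
print(summary)
```

Note that `f_regression` only captures linear effects per column, whereas mutual information can pick up non-linear dependence, so the two rankings need not agree.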
Category: Data Science

How to assess whether neural network performance is associated with a nuisance variable

Problem: I have a convolutional neural network model that takes a video as input and outputs a continuous variable. I want to assess whether the model's performance is associated with another continuous variable (age, which is not included in the model).

Solution attempt: If this were a linear regression model, I think I could do a Spearman rank correlation test: basically, plot the absolute value of the residuals (true value - predicted value) against the nuisance variable (age), then calculate the Spearman …
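The test described in the solution attempt does not depend on the model being linear, since it only uses the true values, the predictions, and the nuisance variable. A minimal sketch, using simulated stand-ins for all three arrays (the names `y_true`, `y_pred`, and `age` and the simulated error pattern are assumptions for illustration):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 200

# Simulated stand-ins for real model outputs.
age = rng.uniform(20, 80, size=n)
y_true = rng.normal(50, 10, size=n)
# Simulated predictions whose error spread grows with age.
y_pred = y_true + rng.normal(0, 1, size=n) * (age / 20.0)

# Absolute residual as a per-sample measure of model error.
abs_residual = np.abs(y_true - y_pred)

# Rank correlation between error magnitude and the nuisance variable.
rho, p_value = spearmanr(abs_residual, age)
print(f"Spearman rho = {rho:.3f}, p-value = {p_value:.3g}")
```

With real data, `y_pred` would come from the trained network on a held-out set, so that the residuals reflect generalization error rather than training fit.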
Category: Data Science

Analysing process data with sub groupings and checking for correlation

I have a dataset of process data for different equipment with many sensors. I would like to check the correlation of the different sensors to see if there is any strong correlation between some sensors and potentially reduce the size of my dataset. Within this process data there are many different processes of varying lengths and different equipment. For now I am asserting that the different equipment shouldn't make a difference and therefore I do not want to include this …
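As a rough sketch of the first step described here, the pairwise correlation matrix can be scanned for sensor pairs above a chosen threshold. The sensor names, the synthetic signals, and the 0.95 cutoff are all assumptions for illustration; the sub-grouping by process is not handled here.

```python
import numpy as np
import pandas as pd

# Hypothetical sensors: sensor_b is a linear function of sensor_a,
# sensor_c is an unrelated oscillating signal.
t = np.arange(50, dtype=float)
df = pd.DataFrame({
    "sensor_a": t,
    "sensor_b": 2.0 * t + 1.0,
    "sensor_c": np.sin(t),
})

# Absolute Pearson correlation between every pair of sensors.
corr = df.corr(method="pearson").abs()

# Keep only the strict upper triangle so each pair appears once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pairs above the (assumed) redundancy threshold.
threshold = 0.95
redundant = pairs[pairs > threshold]
print(redundant)
```

One sensor from each flagged pair is a candidate for removal, though that decision should also take process knowledge into account.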
Category: Data Science

Correlation/distance between sparse vectors

I am looking for a metric for comparing gene count tables. These are long columns of data (a few million genes by a few dozen samples), with all non-negative entries, about 90% of which are zeros. The goal is to compare the performance of several tools/algorithms that these tables originate from, by comparing the resulting tables among themselves or with the expected counts (in the case of simulated data). In principle, one compares on a sample-by-sample basis, but comparing different …
Category: Data Science

When should mutual information be used for feature selection over other feature selection methods like correlation, ANOVA, etc.?

I have a data set with categorical and continuous/ordinal explanatory variables and a continuous target variable. I tried to filter features using one-way ANOVA for the categorical variables and Spearman's correlation coefficient for the continuous/ordinal variables, using the p-value as the filter. I then also used mutual information regression to select features. The results from the two techniques do not match. Can someone please explain the discrepancy and which method should be used when?
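One situation in which the two approaches are expected to disagree is a non-monotonic relationship. The sketch below uses an invented symmetric parabola, where Spearman's coefficient is essentially zero even though the target is fully determined by the feature, while the mutual information estimate is clearly positive. This is one possible source of the discrepancy, not a diagnosis of the asker's data.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.feature_selection import mutual_info_regression

# Hypothetical feature, symmetric around zero, and a target that depends
# on it in a non-monotonic way.
x = np.arange(-100, 101, dtype=float) * 0.03
y = x ** 2

# Spearman only detects monotonic trends, so the two halves cancel out.
rho, p_value = spearmanr(x, y)

# Mutual information detects any kind of statistical dependence.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Spearman rho       = {rho:.3f}")
print(f"Mutual information = {mi:.3f}")
```

A feature like this would be dropped by a Spearman p-value filter but kept by a mutual information filter, which suits models that can exploit non-linear structure.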
Category: Data Science

Slightly different results between scipy.stats.spearmanr and manual calculation

I have the following dataset. When I calculate the Spearman correlation coefficient with scipy.stats.spearmanr, it returns 0.718182.

    import pandas as pd
    import numpy as np
    from scipy.stats import spearmanr

    df = pd.DataFrame(
        [
            [7,3],
            [6,5],
            [5,4],
            [3,2],
            [6,4],
            [8,9],
            [9,7]
        ],
        columns=['Set of A','Set of B'])
    correlation, pval = spearmanr(df)
    print(f'correlation={correlation:.6f}, p-value={pval:.6f}')

It returns this:

    correlation=0.718182, p-value=0.069096

However, when I tried to calculate it manually:

    df_rank = pd.DataFrame(
        [ [5,2], [3.5,4], [2,4], [1,1], [3.5,4], [6,7], [7,6] ],
        columns=['Rank of A','Rank …
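A common source of small differences on data with ties is the shortcut formula 1 - 6Σd²/(n(n² - 1)), which is exact only when there are no tied ranks. scipy.stats.spearmanr instead computes the Pearson correlation of the average ranks. The sketch below compares the two on the dataset from the question, with ranks computed by scipy.stats.rankdata rather than taken from the truncated manual table; whether this is the exact cause in the question is an assumption, since the excerpt is cut off.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# The two columns from the question.
a = np.array([7, 6, 5, 3, 6, 8, 9])
b = np.array([3, 5, 4, 2, 4, 9, 7])

# Reference value from scipy.
rho_scipy, _ = spearmanr(a, b)

# Pearson correlation of average ranks (ties share the mean rank).
rank_a = rankdata(a)  # default method="average"
rank_b = rankdata(b)
rho_ranks, _ = pearsonr(rank_a, rank_b)

# Shortcut formula, which assumes there are no ties.
n = len(a)
d = rank_a - rank_b
rho_shortcut = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

print(f"spearmanr        = {rho_scipy:.6f}")     # 0.718182
print(f"Pearson on ranks = {rho_ranks:.6f}")     # 0.718182
print(f"Shortcut formula = {rho_shortcut:.6f}")  # 0.723214
```

Both columns contain a tied pair (two 6s in A, two 4s in B), so the shortcut formula drifts slightly from the value scipy reports.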
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.