VIF Vs Mutual Info

I was searching for the best ways to do feature selection in a regression problem and came across a post suggesting mutual information for regression, so I tried it on the Boston dataset:

    # feature selection
    f_selector = SelectKBest(score_func=mutual_info_regression, k='all')
    # learn the relationship from the training data
    f_selector.fit(X_train, y_train)
    # transform the train input data
    X_train_fs = f_selector.transform(X_train)
    # transform the test input data
    X_test_fs = f_selector.transform(X_test)

The scores were as follows:

        Features    Scores
    12  LSTAT       0.651934
    5   RM          …
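
For the VIF side of the comparison, a minimal sketch (assuming X_train is a numeric pandas DataFrame and that statsmodels is available) could look like:

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # VIF needs an intercept column to measure collinearity against
    X_vif = X_train.copy()
    X_vif["const"] = 1.0

    # one VIF per column: how well that feature is explained by all the others
    vif = pd.Series(
        [variance_inflation_factor(X_vif.values, i) for i in range(X_vif.shape[1])],
        index=X_vif.columns,
    )
    print(vif.drop("const").sort_values(ascending=False))

Keep in mind the two measure different things: VIF flags redundancy among the features themselves, while mutual information scores each feature's relevance to the target, so their rankings need not agree.
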
Category: Data Science

Feature selection with information gain (KL divergence) and mutual information yields different results

I'm comparing different techniques for feature selection / feature ranking. Two of the techniques under scrutiny are mutual information (MI) and the information gain (IG) used in decision trees, i.e. the Kullback-Leibler divergence. My data (class and features) is all binary. All sources I could find state that MI and IG are basically "two sides of the same coin", i.e. that one can be transformed into the other via mathematical manipulation. (For example [source 1, source 2]) Yet, …
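
For binary data the two should agree numerically, since $IG = H(Y) - H(Y \mid X) = I(X;Y)$; a quick check (a sketch with synthetic binary data):

    import numpy as np
    from scipy.stats import entropy
    from sklearn.metrics import mutual_info_score

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=1000)
    y = (x ^ (rng.random(1000) < 0.2)).astype(int)   # noisy binary copy of x

    # information gain: H(y) - H(y | x), both in nats
    h_y = entropy(np.bincount(y))
    h_y_given_x = sum(
        (x == v).mean() * entropy(np.bincount(y[x == v])) for v in np.unique(x)
    )
    ig = h_y - h_y_given_x

    mi = mutual_info_score(x, y)   # also in nats
    print(ig, mi)                  # identical up to floating-point error
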
Category: Data Science

How to fix my CSV files? (ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required)

I have imported two CSV files into df1 and df2 and concatenated them to make df3. When I call mutual_info_regression on them I get a value error: ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required. I have checked the dimensions of X, y, and discrete_features, and they all seem okay. Since the code works with other CSV files (I have tested), I think the problem is with my csv …
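
That particular ValueError means sklearn was handed an array with zero rows, so a few shape checks before the call usually locate the problem. A sketch (the "target" column name is a placeholder):

    import pandas as pd
    from sklearn.feature_selection import mutual_info_regression

    # "target" is a placeholder for the actual label column in df3
    X = df3.drop(columns=["target"]).select_dtypes("number")
    y = df3["target"]

    # the error means sklearn received 0 rows, so inspect shapes first
    print(df3.shape, X.shape, y.shape)
    print(X.isna().sum())      # all-NaN columns often show up after a bad concat
    X = X.dropna()             # dropping NaNs can silently empty the frame...
    y = y.loc[X.index]         # ...so check again before fitting
    assert len(X) > 0, "no rows left after cleaning"

    scores = mutual_info_regression(X, y)
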
Category: Data Science

Pipelines with categorical and nan values

I am building a regression model on a dataset that has categorical and numerical variables along with NaN values. I want to use Pipelines for imputation and encoding. I have a few conditions that must be satisfied in building the model, which are as follows: 1.) Use of Pipelines is a must for the imputation and encoding (one-hot encoding) steps. 2.) Imputation should be done AFTER the train test split. 3.) For feature selection (should be done AFTER train …
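
A minimal sketch of such a pipeline (num_cols and cat_cols are hypothetical lists of the numeric and categorical column names; because the whole pipeline is fit on X_train only, the imputation statistics are learned after the split):

    from sklearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.linear_model import LinearRegression

    numeric = Pipeline([("impute", SimpleImputer(strategy="median"))])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])

    pre = ColumnTransformer([
        ("num", numeric, num_cols),   # num_cols / cat_cols: placeholder column lists
        ("cat", categorical, cat_cols),
    ])

    model = Pipeline([("pre", pre), ("reg", LinearRegression())])
    model.fit(X_train, y_train)       # imputation statistics come from X_train only
    print(model.score(X_test, y_test))
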
Category: Data Science

A measure of redundancy in mutual information

Mutual information quantifies to what degree $X$ decreases the uncertainty about $Y$. However, to my understanding, it does not quantify "in how many ways" $X$ decreases the uncertainty. E.g., consider the case where $X$ is a 3D vector, and consider $X_1=[Y,0,0]$ vs. $X_2 = [Y,Y^2, 3.5Y]$. Intuitively, $X_2$ contains "more information" about $Y$, or is more redundant with respect to $Y$, than $X_1$; but if I understand correctly, both have the same mutual information. Is there an alternative information-theoretic measure …
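
Indeed, under the usual discrete-entropy reading, since $Y$ is recoverable from the first component of both vectors, the identity

$$ I(X_i; Y) = H(Y) - H(Y \mid X_i) = H(Y) - 0 = H(Y), \qquad i \in \{1, 2\}, $$

gives the same value in both cases, so mutual information alone cannot tell the redundant encoding $X_2$ apart from $X_1$.
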
Category: Data Science

Visualizing mutual information of each convolution layer for image classification problem

I recently came across this paper, where the authors propose a compression-based theory of understanding the layers of a DNN. In order to visualize what is going on, the authors show Figure 2 of the paper, which is also available as a video here. For my image classification problem I want to visualize the mutual information in exactly this format. Can someone kindly explain how to calculate this numerically for images passing through conv layers in …
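
One common recipe for producing such information-plane plots is a binning estimator: discretize the activations of a layer into a manageable number of states and compute $I(X;T)$ and $I(T;Y)$ from joint counts. A rough sketch (acts, x_ids and y are hypothetical placeholders for one layer's activations, a per-image id, and the labels):

    import numpy as np

    def discrete_mi(a, b):
        """I(A;B) in nats from two 1-D arrays of non-negative integer labels."""
        joint = np.zeros((a.max() + 1, b.max() + 1))
        np.add.at(joint, (a, b), 1)
        p = joint / joint.sum()
        pa, pb = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
        nz = p > 0
        return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

    # acts: (n_samples, n_units) activations of one layer, x_ids: per-image id,
    # y: integer class labels -- all hypothetical placeholders
    edges = np.linspace(acts.min(), acts.max(), 30)
    t = np.digitize(acts, edges)                     # discretize activations
    t_ids = np.unique(t, axis=0, return_inverse=True)[1].ravel()  # one state per sample

    mi_xt = discrete_mi(x_ids, t_ids)   # I(X;T): x-axis of the information plane
    mi_ty = discrete_mi(t_ids, y)       # I(T;Y): y-axis of the information plane

Repeating this for every layer at several points during training gives the trajectories shown in that kind of figure.
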
Category: Data Science

Does Sample Size affect Mutual Information for Feature Selection?

There is a dataset with n rows (samples) and p columns (variables/features), and the objective is to predict a certain target variable (y). Should n (the sample size) matter to the results of pairwise mutual information tests between every feature and y? Meaning, if n is too small or too large, can the results not be trusted? My intuition says no, but I'm not fully confident. And is there a good reason, besides domain knowledge, not to exclude a variable that …
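
In practice, sample size does matter for the estimate itself: the kNN-based estimator in sklearn has bias and variance that shrink with n, so at small n you can see clearly non-zero scores even for independent variables. A quick experiment sketching this (all names are placeholders):

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    for n in (30, 300, 3000):
        scores = []
        for _ in range(20):
            x = rng.normal(size=(n, 1))
            y = rng.normal(size=n)     # independent of x, so the true MI is 0
            scores.append(mutual_info_regression(x, y, random_state=0)[0])
        print(n, round(np.mean(scores), 4), round(np.std(scores), 4))
    # bias and spread of the estimate shrink as n grows
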
Category: Data Science

Understanding math notation in infoGAN paper

I'm reading this paper about mutual information in InfoGAN (infoGAN_paper_link) and already have the code to run it. I found code for it, which is fine and dandy except that I don't quite understand some of the code in the cost function. So I looked at the paper to dissect it for better understanding and came across some math notation that I don't understand (pic below). The notation I'm trying to figure …
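
For reference (quoted from memory of the commonly cited form, so it is worth double-checking against the paper itself), the expression at the heart of the InfoGAN cost function is a variational lower bound on the mutual information between the latent code $c$ and the generated sample $G(z,c)$:

$$ L_I(G, Q) = \mathbb{E}_{c \sim P(c),\ x \sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c) \;\le\; I\big(c;\, G(z, c)\big), $$

and this term is added to the standard GAN objective with a weighting hyperparameter $\lambda$, which is what the auxiliary network $Q$ in the code is optimizing.
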
Category: Data Science

Mutual Information in sklearn

I expected sklearn's mutual_info_classif to give a value of 1 for the mutual information of a series of values with itself, but instead I'm seeing results ranging between about 1.0 and 1.5. What am I doing wrong? This video on mutual information (from 4:56 to 6:53) says that when one variable perfectly predicts another, the mutual information score should be log_2(2) = 1. However I do not get that result:

    import pandas as pd
    from sklearn.metrics import confusion_matrix
    y …
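
Part of the discrepancy is units: sklearn's mutual information estimators return values in nats (natural log), not bits, and if the series has many distinct values and discrete_features is left at its default, a kNN estimator is used, which adds estimation noise. A small sketch of the exact, discrete case (placeholder data):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    y = np.array([0, 1] * 50)                        # binary series
    mi = mutual_info_classif(y.reshape(-1, 1), y, discrete_features=True)[0]
    print(mi)              # ~ln(2) = 0.693 nats
    print(mi / np.log(2))  # ~1.0 bit, the value the video describes

    y3 = np.array([0, 1, 2] * 40)                    # three equally likely classes
    mi3 = mutual_info_classif(y3.reshape(-1, 1), y3, discrete_features=True)[0]
    print(mi3, np.log(3))  # I(Y;Y) = H(Y) = ln(3), comfortably above 1 nat
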
Category: Data Science

How does Mutual Information handle background overlap

I have been reading about mutual information in image registration. The literature states that MI handles cases with a large background, where anatomical structures are not aligned, better than entropy does. Can someone provide an intuitive explanation of how MI handles such cases? Thanks in advance
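
For experimenting with that intuition, mutual information between two (aligned) images can be computed from their joint grey-level histogram; a minimal sketch assuming img_a and img_b are 2-D numpy arrays of the same shape:

    import numpy as np

    def image_mi(img_a, img_b, bins=64):
        """Mutual information (nats) from the joint grey-level histogram."""
        joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
        p = joint / joint.sum()
        pa = p.sum(axis=1, keepdims=True)   # marginal of image A
        pb = p.sum(axis=0, keepdims=True)   # marginal of image B
        nz = p > 0                          # skip empty histogram cells
        return float((p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum())

One common intuition is that, unlike joint entropy alone, MI also includes the marginal entropies of the overlapping region ($I = H(A) + H(B) - H(A,B)$), and those marginals drop when only uniform background overlaps, so background-on-background alignments are not rewarded as strongly.
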
Category: Data Science

When should mutual information be used for feature selection over other feature selection methods like correlation, ANOVA, etc.?

I have a data set with categorical and continuous/ordinal explanatory variables and a continuous target variable. I tried to filter features using one-way ANOVA for the categorical variables and Spearman's correlation coefficient for the continuous/ordinal variables, using the p-value to filter. I then also used mutual information regression to select features. The results from the two techniques do not match. Can someone please explain the discrepancy and which should be used when?
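
A discrepancy is expected, because the filters measure different kinds of association: the F-test and Spearman's correlation only detect linear or monotonic relationships, while mutual information also picks up non-monotonic dependence (and returns a score rather than a p-value, so the rankings are not directly comparable). A small sketch of a relationship that correlation-based filters miss (placeholder data):

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.feature_selection import f_regression, mutual_info_regression

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=(1000, 1))
    y = x[:, 0] ** 2 + rng.normal(scale=0.1, size=1000)   # strong but non-monotonic

    print(f_regression(x, y)[1])       # linear F-test p-value: typically not significant
    rho, _ = spearmanr(x[:, 0], y)
    print(rho)                         # near 0: a monotonic measure misses it too
    print(mutual_info_regression(x, y, random_state=0))   # clearly positive
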
Category: Data Science

Difference between Information Gain and Mutual Information for feature selection

What is the difference between information gain and mutual information? At this point, I understand that information gain is calculated between a random variable and the target class for classification, while mutual information is calculated between two random variables. Does mutual information become the same as information gain when it is calculated between a random variable and the target class?
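
For reference, the information gain used when splitting on a feature $X$ with respect to the class $Y$ is exactly the mutual information between them:

$$ IG(Y, X) = H(Y) - H(Y \mid X) = \sum_{x, y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)} = I(X; Y). $$

The practical difference is only in how they are applied: a decision tree evaluates this quantity on the samples reaching a node, for a candidate split of $X$, rather than on the raw variable over the whole dataset.
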
Category: Data Science

Several independent variables based on the same underlying data

I have data containing, among others, two feature variables that are derived from the same underlying data (i.e. they share mutual information), but they convey different information/messages. How should such cases be handled? Since, logically, they will be highly correlated, it would make sense to use only one of them, preferably the one that conveys more information. But: is this the correct approach, or do we actually lose valuable information by not including it? If including it is the correct …
Category: Data Science

Upper bound on 'relatedness'?

We have ~100 answers to a questionnaire with five questions (Q5). Independently from that, we have about 50, somewhat overlapping, features describing the people who answer the questions (F50). After having thrown an impressive number of 'black box' regression models at trying to predict any of the 5 answers from the 50 features, we are approaching the conclusion that the features are simply orthogonal to the topic of the questionnaire. This is interesting, and a little surprising, and it …
Category: Data Science

Conditional Entropy and Mutual Information - Clustering evaluation

First of all, I am doing clustering and I have the true labels for my data. For evaluation, I am using the weighted average of the entropy values for each predicted cluster. While going over the alternatives, I also came across Mutual Information as a similar approach. On my data, they seem to give similar results. However, there is one issue that puzzles me. Given the predicted cluster set $U$ and true clusters $V$, mutual information was defined as: …
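
The relationship between the two can be checked directly: the weighted average of per-cluster entropies is the conditional entropy $H(V \mid U)$, and mutual information is $I(U;V) = H(V) - H(V \mid U)$. A small sketch with sklearn (labels are placeholders):

    import numpy as np
    from scipy.stats import entropy
    from sklearn.metrics import mutual_info_score

    true = np.array([0, 0, 1, 1, 2, 2, 2, 1])    # V: ground-truth labels
    pred = np.array([0, 0, 0, 1, 1, 1, 1, 1])    # U: predicted clusters

    # weighted average of the entropy of true labels within each predicted cluster
    h_v_given_u = sum(
        (pred == c).mean() * entropy(np.bincount(true[pred == c]))
        for c in np.unique(pred)
    )
    h_v = entropy(np.bincount(true))

    print(h_v - h_v_given_u)              # H(V) - H(V|U)
    print(mutual_info_score(true, pred))  # the same quantity, in nats
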
Category: Data Science

Concept of Mutual Information

I want to compute mutual information on the iris dataset to select the best features, but I am confused about mutual information. What is the concept of mutual information for selecting features? Can anyone explain it in a simple way? "You do not really understand something unless you can explain it to your grandmother." - Albert Einstein
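
In feature-selection terms, the mutual information of a feature with the class measures how much knowing that feature reduces your uncertainty about the class (0 means the feature tells you nothing; higher means it tells you more). A minimal sketch of what that looks like on iris with sklearn:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import mutual_info_classif

    data = load_iris()
    scores = mutual_info_classif(data.data, data.target, random_state=0)

    for name, score in zip(data.feature_names, scores):
        print(f"{name}: {score:.3f}")
    # a higher score means knowing that feature removes more uncertainty
    # about the species; 0 would mean the feature tells you nothing
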
Category: Data Science

PMI between lemma and surface form

I was wondering whether it's possible to compute some sort of pointwise mutual information between a lemma and its surface form. First, if we assume

    p('to go') = count('to go') / sum(all lemmas)
    p('went')  = count('went')  / sum(all words)

Breakpoint here: since every word comes with its respective lemma, we have the condition that sum(all lemmas) == sum(all words). The joint probability is also a little hard to normalize:

    # count of "went" being lemmatized to "to go"
    p('went', 'to …
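
Under those definitions, the PMI can be computed straight from co-occurrence counts of (surface form, lemma) pairs; a sketch with a hypothetical toy token stream:

    import math
    from collections import Counter

    # hypothetical token stream: each running word paired with its lemma
    pairs = [("went", "to go"), ("goes", "to go"), ("went", "to go"), ("ran", "to run")]

    n = len(pairs)                       # sum(all words) == sum(all lemmas)
    surface_counts = Counter(w for w, _ in pairs)
    lemma_counts = Counter(l for _, l in pairs)
    joint_counts = Counter(pairs)

    def pmi(surface, lemma):
        p_joint = joint_counts[(surface, lemma)] / n
        p_s = surface_counts[surface] / n
        p_l = lemma_counts[lemma] / n
        return math.log2(p_joint / (p_s * p_l))

    print(pmi("went", "to go"))          # log2((2/4) / ((2/4) * (3/4))) ≈ 0.415
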
Category: Data Science
