Upper bound on 'relatedness'?

We have ~100 responses to a questionnaire with five questions (Q5). Independently of that, we have about 50 somewhat overlapping features describing the people who answered the questions (F50). After throwing a large number of 'black box' regression models at predicting any of the 5 answers from the 50 features, we are approaching the conclusion that the features are simply orthogonal to the topic of the questionnaire.

This is interesting, a little surprising, and it could be fun to try to 'prove'. Does anyone know of a measure or method X for which we could argue that if

'X finds no predictive value for Q5 when applied to F50'

then

'the causal relationship between F50 and Q5 is weaker than some bound C'

Could some flavour of multivariate mutual information be a way forward?
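For concreteness, something along these lines is roughly what I have in mind, using scikit-learn's k-nearest-neighbour MI estimator; `F50` and `q` here are just random stand-ins for our real data:

```python
# A rough sketch, not our actual pipeline: estimate mutual information
# between each of the 50 features and one questionnaire answer.
# `F50` and `q` are hypothetical stand-ins for the real data.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
F50 = rng.normal(size=(100, 50))   # ~100 respondents x 50 features
q = rng.normal(size=100)           # one of the five answers (Q5)

# k-nearest-neighbour MI estimate of each feature against the answer
mi = mutual_info_regression(F50, q, n_neighbors=3, random_state=0)
print("largest per-feature MI estimate:", mi.max())
```

Though with only ~100 samples, I imagine any such estimate is noisy enough that it would have to be calibrated against some permuted baseline to mean anything.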

I hope the question makes sense. It seems like it would be generally interesting.

Tags: mutual-information, regression, correlation



Very interesting question; it's always hard to prove a negative. I have a vague idea, but I really don't know if it's worth anything or even applicable to this problem, so please take it with a grain of salt!

The idea is to use randomness and repeated sampling to compare the result of predicting from random noise against the result of predicting from the actual data: if the score on the real data is not significantly better than the score on random noise, you have good evidence that the data carries no predictive signal for that model. Of course this relies on the assumption that the model used to predict is sensible enough. A sketch of what I mean follows below.
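Something in this spirit, assuming a scikit-learn setup (the model choice and the names `F50` and `q` are just placeholders, as above): `permutation_test_score` shuffles the labels many times, which preserves the marginal distributions while breaking any real association, and compares the real cross-validated score against the shuffled ones.

```python
# A minimal sketch of a permutation test for predictive signal,
# assuming a scikit-learn setup. `F50` and `q` are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(0)
F50 = rng.normal(size=(100, 50))   # stand-in feature matrix
q = rng.normal(size=100)           # stand-in questionnaire answer

model = RandomForestRegressor(n_estimators=200, random_state=0)
score, perm_scores, p_value = permutation_test_score(
    model, F50, q,
    scoring="r2",        # cross-validated R^2 on real vs. shuffled labels
    n_permutations=200,
    cv=5,
    random_state=0,
)
print(f"real score: {score:.3f}, permutation p-value: {p_value:.3f}")
# A large p-value means the model does no better on the real labels
# than on shuffled ones, i.e. no detectable signal for this model.
```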

I have seen methods built on this idea, but unfortunately I don't remember the details, so I can only point in this general direction. Sorry for the lack of specifics; I hope this helps.
