Upper bound on 'relatedness'?
We have ~100 answers to a questionnaire with five questions (Q5). Independently from that, we have about 50, somewhat overlapping, features describing the people who answers the questions (F50). After having thrown an impressive amount of 'black box' regression models at trying to predict any of the 5 answers from the 50 features, we are approaching the conclusion that the features are just completely orthogonal to the topic of the questionnaire.
This is interesting, and a little surprising, and it could be fun to try to 'prove'. Does anyone know of a measure or method for which we could argue that if
'X does not produce any predictive value in Q5 when applied to F50'
then
'the causal relationship between F50 and Q5 is weaker than C'
Could some flavour of multivariate mutual information be a way forward?
I hope the question makes sense. It seems like it would be generally interesting.
Topic mutual-information regression correlation
Category Data Science