Upper bound on 'relatedness'?

We have ~100 responses to a questionnaire with five questions (Q5). Independently of that, we have about 50 somewhat overlapping features describing the people who answered the questions (F50). After throwing a large number of 'black box' regression models at predicting any of the 5 answers from the 50 features, we are approaching the conclusion that the features are simply orthogonal to the topic of the questionnaire.

This is interesting, a little surprising, and it could be fun to try to 'prove'. Does anyone know of a measure or method X for which we could argue that if

'X finds no predictive value for Q5 when applied to F50'

then

'the causal relationship between F50 and Q5 is weaker than some bound C'

Could some flavour of multivariate mutual information be a way forward?
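For concreteness, something along these lines is roughly what I have in mind, using scikit-learn's k-nearest-neighbour MI estimator; `F50` and `q` here are just random stand-ins for our real data:

```python
# A rough sketch, not our actual pipeline: estimate mutual information
# between each of the 50 features and one questionnaire answer.
# `F50` and `q` are hypothetical stand-ins for the real data.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
F50 = rng.normal(size=(100, 50))   # ~100 respondents x 50 features
q = rng.normal(size=100)           # one of the five answers (Q5)

# k-nearest-neighbour MI estimate of each feature against the answer
mi = mutual_info_regression(F50, q, n_neighbors=3, random_state=0)
print("largest per-feature MI estimate:", mi.max())
```

Though with only ~100 samples, I imagine any such estimate is noisy enough that it would have to be calibrated against some permuted baseline to mean anything.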

I hope the question makes sense. It seems like it would be generally interesting.

Tags: mutual-information, regression, correlation



Very interesting question; it's always hard to prove a negative. I have a vague idea, but I really don't know if it's worth anything or even applicable to this problem, so please take it with a grain of salt!

The idea is to use randomness and repeated sampling to compare the result of predicting from random noise against the result of predicting from the actual data: if the score on the real data is not significantly better than the score on random noise, you have good evidence that the data carries no predictive signal for that model. Of course this relies on the assumption that the model used to predict is sensible enough. A sketch of what I mean follows below.
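Something in this spirit, assuming a scikit-learn setup (the model choice and the names `F50` and `q` are just placeholders, as above): `permutation_test_score` shuffles the labels many times, which preserves the marginal distributions while breaking any real association, and compares the real cross-validated score against the shuffled ones.

```python
# A minimal sketch of a permutation test for predictive signal,
# assuming a scikit-learn setup. `F50` and `q` are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(0)
F50 = rng.normal(size=(100, 50))   # stand-in feature matrix
q = rng.normal(size=100)           # stand-in questionnaire answer

model = RandomForestRegressor(n_estimators=200, random_state=0)
score, perm_scores, p_value = permutation_test_score(
    model, F50, q,
    scoring="r2",        # cross-validated R^2 on real vs. shuffled labels
    n_permutations=200,
    cv=5,
    random_state=0,
)
print(f"real score: {score:.3f}, permutation p-value: {p_value:.3f}")
# A large p-value means the model does no better on the real labels
# than on shuffled ones, i.e. no detectable signal for this model.
```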

I have seen methods built on this idea, but unfortunately I don't remember the details, so I can only point in this general direction. Sorry for the lack of specifics; I hope this helps.
