How close is feature importance information from an ML model to a causal diagram?
The title pretty much covers my question, but to elaborate: suppose we have data for a binary classification problem (binary for simplicity, and to give a 'feel' of treatment and control groups), and assume, also for simplicity, that the data are a good enough representation of the underlying distribution. When we employ a machine learning model such as a random forest, we eventually obtain feature importances from the trained model. Suppose the training has handled class imbalance (via up- or down-sampling or some other method) and has used proper sampling, such as stratified splits, during training and validation, to mimic a randomized controlled trial. Let's also assume that all confounders are included in the feature list, i.e., there are no unmeasured confounders left.

I know that an ML model can only hope to learn correlations among features, certainly not causality. How close would the feature importance plot be to the actual causal structure? Sure, there won't be any causal arrows in a feature importance plot. But would a first guess that draws causal arrows from the most important feature toward the least important ones be too far from reality?

I am genuinely trying to understand this issue rather than offer an opinion here. If there is also some reference that discusses this, that would be helpful too.
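For concreteness, here is a minimal sketch of the kind of pipeline I have in mind, assuming scikit-learn; the synthetic data, the `class_weight="balanced"` choice, and the stratified cross-validation are placeholders for my actual setup, not the point of the question:

```python
# Minimal sketch: train a random forest on an imbalanced binary problem
# and rank features by impurity-based importance. All names/choices here
# are illustrative assumptions, not my real data or exact pipeline.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced binary-classification data standing in for the real problem.
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])

# class_weight="balanced" is one way to handle class imbalance
# (instead of explicit up-/down-sampling).
model = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                               random_state=0)

# Stratified splits during validation, as described above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("CV accuracy:", cross_val_score(model, X, y, cv=cv).mean())

# Fit on the full data and inspect the feature importance ranking --
# this ranking is what I am asking about relative to the causal structure.
model.fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```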
Topic causalimpact random-forest
Category Data Science