Mediators, confounders, colliders – a crash course in causal inference

Although one would think that the basic concepts of statistics should be the same across all sciences, there is an amazing heterogeneity between fields in how statistics is taught and practiced.

One example of this, I find, is the set of validity concepts taught in the social sciences and economics (see Wikipedia). In short, these concepts categorize “failure modes” of inference (e.g. construct validity, internal validity, external validity). Ecologists are certainly aware of these problems as well, but in ecology they are not typically taught as a concise list or framework in the standard curriculum, and I have found such a framework immensely helpful for students.

Another example is causal inference, and specifically the concepts of mediators, confounders and colliders. These go back at least to Pearl 2000 (see also Pearl 2009a,b), and with the popularity of SEMs in ecology, I’m sure that people have at least heard about causal inference in general. However, when we read the excellent (and highly recommended) paper by Lederer et al., 2019, “Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals”, in our group seminar, I got the distinct feeling that the practical interpretation of these ideas differs quite strongly between medical and ecological fields.

Lederer et al. first nicely establish an operational concept of causality that I would broadly endorse for ecology as well: assume we look at the effect of a target variable (something that could be manipulated, i.e. the predictor) on another variable (the outcome, i.e. the response) in the presence of other (non-target) variables. The goal of a causal analysis is to control for these other variables in such a way that we estimate the same effect size that we would obtain if only the target predictor were manipulated (as in a randomized controlled trial, RCT).
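
To make this operational definition concrete, here is a minimal simulation sketch (Python with numpy is my choice here; the helper `ols`, the variable names and all effect sizes are made up for illustration). A confounder Z affects both the predictor X and the response Y. If the definition above works, the regression on observational data that includes Z should recover the same effect as an "experiment" in which X is set independently of Z, while the naive regression without Z should not.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least squares with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(1)
n = 100_000
true_effect = 1.0  # hypothetical causal effect of X on Y

# Observational data: the confounder Z influences both X and Y
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = true_effect * x + 1.5 * z + rng.normal(size=n)

naive = ols(y, x)[0]        # Y ~ X: back-door path X <- Z -> Y left open
adjusted = ols(y, x, z)[0]  # Y ~ X + Z: back-door path closed

# "RCT": X is set by the experimenter, independently of Z
x_rct = rng.normal(size=n)
y_rct = true_effect * x_rct + 1.5 * z + rng.normal(size=n)
experimental = ols(y_rct, x_rct)[0]

print(f"naive {naive:.2f}, adjusted {adjusted:.2f}, experimental {experimental:.2f}")
```

With these settings, the adjusted and the experimental estimates should both land close to 1, whereas the naive estimate is inflated (to roughly 1.7) by the open back-door path.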

You probably learned in your intro stats class that, to do so, we have to control for confounders. I am less sure, however, that everyone is clear about what a confounder is. In particular, confounding is more specific than having a variable that correlates with both predictor and response: the direction of the causal arrows is crucial for identifying true confounders. For example, Fig. 1C of the Lederer paper shows a collider, i.e. a variable that is influenced by both predictor and response. Although it correlates with both, adjusting for it (i.e. including it in a multiple regression) creates collider bias in the causal link we are interested in (corollary: including all variables is not always a good thing). The bottom line of this discussion (and the essence of Pearl 2000, 2009) is that, to establish causality for a specific link, we have to close the so-called back-door paths for this link, without opening new ones, by

  • Controlling for confounders (back-doors, blue paths in the figure)
  • Not controlling for colliders, M-bias structures, and other similar relationships (red paths)
  • Controlling for mediators (yellow paths) or not, depending on the question, i.e. on whether we are after the direct or the total effect
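
The collider and mediator rules in the list above can be checked with the same kind of simulation. In the sketch below (again Python/numpy with hypothetical effect sizes), adjusting for a collider C biases the estimate of the X → Y effect, here even flipping its sign, while adjusting for a mediator M changes what is estimated from the total effect of X to its direct effect.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least squares with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(2)
n = 100_000

# Collider: X -> C <- Y, with a hypothetical causal effect of X on Y of 0.5
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
c = x + y + rng.normal(size=n)
collider_ignored = ols(y, x)[0]      # about 0.5: correct
collider_adjusted = ols(y, x, c)[0]  # about -0.25: collider bias

# Mediator: X -> M -> Y plus a direct path X -> Y
x2 = rng.normal(size=n)
m = 0.8 * x2 + rng.normal(size=n)
y2 = 0.3 * x2 + 0.5 * m + rng.normal(size=n)
total_effect = ols(y2, x2)[0]        # about 0.3 + 0.8 * 0.5 = 0.7
direct_effect = ols(y2, x2, m)[0]    # about 0.3

print(f"collider: ignored {collider_ignored:.2f}, adjusted {collider_adjusted:.2f}")
print(f"mediator: total {total_effect:.2f}, direct {direct_effect:.2f}")
```

Neither mediator estimate is "wrong"; they answer different questions, which is why the mediator rule depends on the question.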

My impression is that these types of arguments are well established in the medical and economic literature (in the sense that people regularly use them to justify the inclusion or exclusion of variables in a regression), but that they are rarely invoked in the ecological literature.

Fig. 1: DAGs visualising the most important concepts. Red paths should not be accounted for. From Lederer et al., 2019.

Moreover, what I really liked about the Lederer paper is their discussion of the Table 2 fallacy. The paper recommends that variables included as confounders should NOT be discussed, and should not even be presented in the regression table (typically Table 2 in a paper, hence the name), because their own effects are usually not corrected for confounding (and they need not be, or sometimes even should not be, see Pearl 2000 and the discussion above). This is sensible advice, but I think it runs contrary to common practice in ecology, both in standard regression and in SEM reporting.
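
A small simulation sketch shows why the coefficient of an adjustment variable should not be read causally. The structure and numbers are hypothetical: Z confounds the target link X → Y and is therefore included, but Z is itself confounded by an unmeasured variable U. The regression Y ~ X + Z recovers the effect of X, yet the "Table 2" coefficient of Z is neither its direct effect (0.5) nor its total effect (0.5 + 0.8 × 1.0 = 1.3).

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least squares with an intercept; returns the slopes only."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(3)
n = 100_000

# Hypothetical DAG: U -> Z, U -> Y (U unmeasured), Z -> X, Z -> Y, X -> Y
u = rng.normal(size=n)
z = u + rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * x + 0.5 * z + 1.0 * u + rng.normal(size=n)

b_x, b_z = ols(y, x, z)  # the "Table 2" of the model Y ~ X + Z

direct_z = 0.5              # true direct effect of Z on Y
total_z = 0.5 + 0.8 * 1.0   # true total effect of Z on Y (direct + via X)

print(f"X: estimate {b_x:.2f} (true 1.00)")
print(f"Z: estimate {b_z:.2f} (direct {direct_z:.2f}, total {total_z:.2f})")
```

The estimate for X is fine because adjusting for Z blocks the back-door path X ← Z ← U → Y. The estimate for Z, however, still absorbs part of the effect of U (here it comes out near 1.0), so interpreting it as "the effect of Z" would be misleading.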

A cynical (but possibly accurate) explanation for why the Table 2 fallacy is the norm in ecology is that we rarely have a clear target variable or hypothesis, and thus feel that all variables used have to be discussed. A side effect is that this makes for the most boring results and discussion sections, where the effect of one variable after another is discussed and interpreted. More importantly, however, each variable that is discussed as a causal effect must be controlled for confounding; otherwise, we should clearly distinguish between the variables that are controlled and those that aren’t. As I said, Lederer et al. recommend not mentioning uncontrolled variables at all. I’m not sure that is practical for ecology (as analyses are often semi-exploratory), but I have recently been considering separating reasonably controlled from possibly confounded variables with a bar or an extra section in the regression table.
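
A table with such a bar could look something like the following sketch. The function name `format_regression_table`, its input format and the wording of the separator line are my own hypothetical choices, not a convention from Lederer et al. or any reporting package.

```python
def format_regression_table(estimates, controlled):
    """Render regression estimates as a plain-text table.

    estimates:  dict mapping variable name -> (estimate, lower 95% CI, upper 95% CI)
    controlled: names of variables whose effects the analysis was designed to
                deconfound; all other variables are listed below a bar.
    """
    header = f"{'variable':<14}{'estimate':>10}{'95% CI':>20}"
    bar = "-" * len(header)

    def row(name):
        est, lo, hi = estimates[name]
        ci = f"[{lo:.2f}, {hi:.2f}]"
        return f"{name:<14}{est:>10.2f}{ci:>20}"

    top = [row(name) for name in estimates if name in controlled]
    bottom = [row(name) for name in estimates if name not in controlled]
    return "\n".join(
        [header, bar, *top, bar,
         "Adjustment variables (possibly confounded, not interpreted causally):",
         *bottom]
    )


# Hypothetical example: one target predictor and two adjustment variables
table = format_regression_table(
    {"x_target": (0.42, 0.21, 0.63),
     "z_adjust1": (-0.10, -0.30, 0.10),
     "z_adjust2": (0.05, -0.02, 0.12)},
    controlled={"x_target"},
)
print(table)
```

The point of the design is simply that a reader sees at a glance which rows the authors are prepared to defend as causal estimates.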

My only small quibble with the otherwise excellent Lederer paper concerns their comments about significance. I strongly support their call to concentrate on parameter estimates and CIs instead of p-values. However, I find their recommendation to avoid the phrase “not significant” in favor of vaguer wording such as “the estimate is imprecise” a bad one (incidentally, this echoes other recent papers, e.g. Dushoff et al., 2019; Amrhein et al., 2019, which would make a nice topic for another post). The idea behind the recommendation is that researchers tend to misinterpret n.s. as “no effect”, but it seems to me that the response should be to better educate researchers about what n.s. means, not to muddy the waters by hiding the fact that a test was done.
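
To illustrate why n.s. on its own does not mean “no effect”, the sketch below computes normal-approximation (Wald) 95% CIs and two-sided p-values for two made-up results (the helper `wald_summary` and the numbers are hypothetical). Both are n.s. at α = 0.05, but only the second one is precise enough to suggest that the effect is small.

```python
import math


def wald_summary(estimate, se, z_crit=1.96):
    """Normal-approximation (Wald) 95% CI and two-sided p-value for H0: effect = 0."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return estimate - z_crit * se, estimate + z_crit * se, p


# Two made-up results that are both "n.s." at alpha = 0.05
results = {"imprecise": (0.80, 0.50), "precise": (0.02, 0.05)}
for label, (est, se) in results.items():
    lo, hi, p = wald_summary(est, se)
    verdict = "n.s." if p >= 0.05 else "significant"
    print(f"{label:>9}: estimate {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], "
          f"p = {p:.3f} ({verdict})")
```

Reporting the test result together with the CI keeps the information that a test was done, while the CI shows that the first result is compatible with anything from a slightly negative to a large effect.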

References

Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., … & Stewart, P. W. (2019) Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society 16(1), 22-28.

Pearl, J. (2009) Causal inference in statistics: An overview. Statistics Surveys 3, 96-146.

Pearl, J. (2000 / 2009) Causality. Cambridge University Press, 1st / 2nd ed.

Dushoff, J., Kain, M. P. & Bolker, B. M. (2019) I can see clearly now: reinterpreting statistical significance. Methods in Ecology and Evolution.

Amrhein, V., Greenland, S. & McShane, B. (2019) Scientists rise up against statistical significance.
