Although one would think that the basic concepts of statistics should be the same across all sciences, there is an amazing heterogeneity between fields in how statistics is taught and practiced.
One example is the set of validity concepts (e.g. construct validity, internal validity, external validity) that are heavily taught in the social sciences and economics (see Wikipedia). In short, these concepts categorize “failure modes” of inference. I would guess that ecologists are aware of these problems on an abstract level, but the concepts are not taught and used as a framework in the community. I find this regrettable, as my experience is that the “validity checklist” is immensely helpful for students who want to examine the strength of the evidence provided by a study.
Another example is causal inference, and specifically the concept of mediators, confounders and colliders. This goes back at least to Pearl 2000 (see also Pearl 2009a,b), and with the popularity of SEMs in ecology, I’m sure that people have at least heard about causal inference in general. However, when reading the really excellent and highly recommended paper Lederer et al., 2019 “Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals.” in our group seminar, we also discussed that the practical implementation of these ideas has probably not progressed very far in ecology (and seems to be further along in medicine, given that this is a joint paper by the editors of these journals).
Lederer et al. first nicely establish an operational concept of causality that I would broadly agree with also for ecology: assume we look at the effect of a target variable (something that could be manipulated = predictor) on another variable (the outcome = response) in the presence of other (non-target) variables. The goal of a causal analysis is to control for these other variables in such a way that we obtain the same effect size for the target variable that we would obtain if the target predictor were manipulated in a controlled intervention (= experiment).
You probably learned in your intro stats class that, to infer causal effects, we have to control for confounders. I am less sure, however, whether everyone is clear about what a confounder is. In particular, confounding is more specific than having a variable that correlates with predictor and response. The direction of the causal arrows is crucial to identify true confounders: a confounder influences both predictor and response. For example, Fig. 1 C from the Lederer paper shows a collider, i.e. a variable that is influenced by predictor and response. Although it correlates with predictor and response, correcting for it (or including it) in a multiple regression will create a collider bias on the causal link we are interested in (corollary: including all variables is not always a good thing). The bottom line of this discussion (and the essence of Pearl 2000, 2009) is that to establish causality for a specific link, we have to close the so-called back-door paths for this link, by
- Controlling for confounders (back-doors, blue paths in the figure)
- Not controlling for colliders, M-Bias, and other similar relationships (red paths)
- Controlling for mediators or not, depending on the question: controlling for a mediator isolates the direct effect, not controlling for it gives the total effect (yellow paths)
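To make the three cases in the list above concrete, here is a minimal simulation sketch in Python (NumPy only). Everything in it is an illustrative assumption on my side and not taken from Lederer et al.: the linear data-generating processes, the effect sizes, and the hypothetical helper `ols_slope`. It simply compares the regression slope of the target predictor with and without the third variable in the model.

```python
import numpy as np

def ols_slope(y, *predictors):
    """Hypothetical helper: OLS with intercept, returns the slope of the first predictor."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(42)
n = 100_000  # large sample, so estimates sit close to their expected values

# Confounder: Z -> X and Z -> Y, true effect of X on Y is 0.5 (assumed)
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 0.5 * x + z + rng.normal(size=n)
confounder_naive = ols_slope(y, x)        # biased upwards, close to 1.0
confounder_adjusted = ols_slope(y, x, z)  # close to the true 0.5

# Collider: X -> C <- Y, true effect of X on Y is again 0.5 (assumed)
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
c = x + y + rng.normal(size=n)
collider_naive = ols_slope(y, x)          # close to the true 0.5
collider_adjusted = ols_slope(y, x, c)    # collider bias, close to -0.25

# Mediator: X -> M -> Y plus a direct path X -> Y
# direct effect 0.2, indirect effect 0.8 * 0.5 = 0.4, total effect 0.6 (assumed)
x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)
y = 0.2 * x + 0.5 * m + rng.normal(size=n)
mediator_total = ols_slope(y, x)          # total effect, close to 0.6
mediator_direct = ols_slope(y, x, m)      # direct effect, close to 0.2

for name, value in [
    ("confounder, not adjusted", confounder_naive),
    ("confounder, adjusted", confounder_adjusted),
    ("collider, not adjusted", collider_naive),
    ("collider, adjusted", collider_adjusted),
    ("mediator, not adjusted (total)", mediator_total),
    ("mediator, adjusted (direct)", mediator_direct),
]:
    print(f"{name}: {value:.2f}")
```

In this toy setup, adjusting helps in the confounder case, hurts in the collider case (the estimate even changes sign), and answers a different question in the mediator case. Note that all three third variables correlate with both predictor and response, so correlation alone cannot tell us which case we are in.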
My impression is that these types of arguments are well established in the medical and economic literature (in the sense that people regularly use them to defend the inclusion / exclusion of variables in a regression), but that they are rarely invoked in the ecological literature, where predictors are selected by an eclectic mix of methods, ranging from AIC selection to “ecological plausibility”, but rarely based on formal causality arguments.
Moreover, what I really liked about the Lederer paper is their discussion of the Table 2 fallacy. The paper recommends that variables included as confounders should NOT be discussed and not be presented in the regression table at all (this is typically Table 2 in a paper, thus the name), because they are themselves usually not corrected for confounding (and they shouldn’t or at least don’t have to be corrected for, see Pearl 2000 / discussion above). Sensible advice, but I think contrary to common practice in standard and SEM regression reporting in ecology.
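A small simulation sketch may help to see why the coefficient of a confounder in the regression table should not be read as a causal effect. The data-generating process, the effect sizes and the hypothetical helper `ols` below are assumptions chosen purely for illustration. The confounder Z affects the response Y both directly and via the target predictor X, so in the adjusted model X acts as a mediator for Z.

```python
import numpy as np

def ols(y, *predictors):
    """Hypothetical helper: OLS with intercept, returns the slopes of all predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(1)
n = 100_000

# Assumed process: Z -> X (1.0), X -> Y (0.5), Z -> Y (1.0)
z = rng.normal(size=n)
x = 1.0 * z + rng.normal(size=n)
y = 0.5 * x + 1.0 * z + rng.normal(size=n)

# The model we would report: target predictor X, adjusted for confounder Z
coef_x, coef_z = ols(y, x, z)

# Total effect of Z on Y, which includes the path through X: 1.0 + 1.0 * 0.5 = 1.5
(total_z,) = ols(y, z)

print(f"X in adjusted model: {coef_x:.2f}")  # close to 0.5, the causal effect of X
print(f"Z in adjusted model: {coef_z:.2f}")  # close to 1.0, only the direct effect of Z
print(f"Z on its own:        {total_z:.2f}")  # close to 1.5, the total effect of Z
```

The same table row thus mixes two different kinds of estimates: a total effect for X and, at best, a direct effect for Z. In real data Z may in addition have confounders of its own that were never considered.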
A cynical (but possibly accurate) explanation for why the Table 2 fallacy is the norm in ecology is that we rarely have a clear target variable / hypothesis, and thus we feel that all variables that were used have to be discussed. A side effect is that this makes for the most boring results / discussion sections, where the effect of one variable after the other has to be discussed and interpreted. More importantly, however, each variable that is discussed as a causal effect must be controlled for confounding, or else we should make a clear distinction between the variables that are controlled and those that aren’t. As I said, Lederer et al. recommend not mentioning uncontrolled variables at all. I’m not sure if that is practical for ecology (as analyses are often semi-explorative), but I have recently been wondering about the option to separate reasonably controlled from possibly confounded variables by a bar or an extra section in the regression table.
My only small quibble with the otherwise excellent Lederer paper relates to their comments about significance. First, I strongly support their call for concentrating on parameters and CIs instead of p-values. However, I find their recommendation to avoid the term “not significant” in favor of a vague term such as “the estimate is imprecise” a bad one (this is, by the way, similar to some other recent papers, e.g. Dushoff et al., 2019, Amrhein et al., 2019, which would make a nice topic for another post). The idea behind this recommendation is that researchers tend to misinterpret n.s. as “no effect”, but it seems to me the response should be to better educate researchers about what n.s. means, not to muddy the waters by hiding the fact that a test was done.
Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., … & Stewart, P. W. (2019) Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society, 16(1), 22-28.
Pearl, J. (2009) Causal inference in statistics: An overview. Statistics surveys 3, 96-146.
Pearl, J. (2000 / 2009) Causality. Cambridge University Press, 1st / 2nd ed.
Dushoff, J., Kain, M. P., & Bolker, B. M. (2019) I can see clearly now: reinterpreting statistical significance. Methods in Ecology and Evolution.
Amrhein, V., Greenland, S., & McShane, B. (2019) Scientists rise up against statistical significance.
2 thoughts on “Mediators, confounders, colliders – a crash course in causal inference”
Funny, just as I retweet this, I see a discussion about collider bias on twitter https://twitter.com/statsepi/status/1117351180244529152. The collision is hard to see because it is caused by selection (see e.g. https://academic.oup.com/ije/article/47/1/226/4259077)