I have been musing about the use of the word “error” in statistical analysis – it is common to speak about “observation error” or “process error”, but what we really mean are unobserved ecological or physical processes that affect our system or our observations, and that we choose to subsume in a probabilistic model. One could argue that this is all semantics, but I wonder whether the negative connotation that we probably all associate with the word “error” explains why “error models” are seldom discussed or interpreted as a matter of ecological concern. People discuss whether the residuals follow the error model assumed (see e.g. here), but apart from that, the “error model” is usually viewed as an arbitrarily flexible thing that, like play dough, can be adjusted at will until it wraps nicely around the residuals.

Part of this mindset is certainly connected to historical limits in methodology – the “error” in regression models is inevitably a mixture of different error sources and therefore difficult to interpret ecologically. But this is changing, for example with hierarchical Bayesian models that allow separating “errors” (I would rather say stochastic processes!) at different levels, or approximate Bayesian methods that allow one to generate approximate likelihoods from stochastic processes that have a clear mechanistic interpretation. I wonder whether it would be better to avoid the “error” terminology altogether in publications and teaching, in favor of simply referring to “observation stochasticity”, “process stochasticity” and so on …
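To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical parameter values) of the kind of hierarchical separation meant above: a state-space logistic population model in which “process stochasticity” (environmental variation acting on the true abundance) and “observation stochasticity” (imperfect counts of that abundance) are kept as two separate, ecologically interpretable model components rather than one lumped “error”.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n_years=50, r=0.5, K=100.0, sigma_proc=0.1, sigma_obs=0.2):
    """Simulate a logistic population with separate process and
    observation stochasticity (all parameter values are illustrative)."""
    true_n = np.empty(n_years)
    true_n[0] = 10.0
    for t in range(1, n_years):
        # process stochasticity: lognormal environmental variation
        # acting on the deterministic Ricker/logistic growth step
        expected = true_n[t - 1] * np.exp(r * (1 - true_n[t - 1] / K))
        true_n[t] = expected * rng.lognormal(0.0, sigma_proc)
    # observation stochasticity: imperfect (overdispersed Poisson)
    # counts of the true, latent abundance
    observed = rng.poisson(true_n * rng.lognormal(0.0, sigma_obs, n_years))
    return true_n, observed

true_n, observed = simulate()
```

The point of writing the model this way is that each stochastic term now has an ecological reading – environmental variability versus sampling imperfection – and each could, in principle, be replaced by a more mechanistic submodel instead of being treated as a nuisance “error”.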

I agree, and I have raised the point a few times; in my case the culprit was the term “noise”, a concept I find pretty disturbing, since it is typical of physicists, mathematicians, or wannabes involved in ecological analysis.

Yes, noise would be another candidate.

Is Jim Clark’s position interesting here? ‘…processes perceived as stochastic at one level of abstraction have explanations at another.’

http://www.uwyo.edu/vegecology/pdfs/readings/clark_tree2009.pdf

I think it is. In fact, I guess all of his work on hierarchical Bayesian models of the last 10 years is quite interesting in this context – maybe his 2011 EL paper (http://dx.doi.org/10.1111/j.1461-0248.2011.01685.x) has set out his arguments most clearly though, it’s more concerned with the modeling questions and less with battling neutral theory than some of the others.

To add some thoughts: the way I understand Jim’s position, he is in general less concerned with having a “fundamentally mechanistic” description of the generating stochastic effect in the system he’s looking at, but rather with having a model description that is at the right level of detail. He argues that, when modeling data at too coarse a scale, things look stochastic, while they may reveal a lot more structure when examined more closely with appropriate statistical models; hence the whole fuss about higher-dimensional trade-offs and so on. Maybe that’s just another way of saying that there are mechanisms underlying the stochasticity – I have the feeling it is a slightly different angle though. The conclusion, however, is the same: think well about what you model with your “error” model.

Pingback: Do we need to derive mechanistic error distributions for deterministic models? | Just Simple Enough: The Art of Mathematical Modelling

Pingback: Probabilistic models in statistical analysis – mechanism or phenomenology? « theoretical ecology