22 thoughts on “DHARMa – an R package for residual diagnostics of GLMMs

  1. This is a great contribution! I’m looking forward to trying it, and I imagine the actual simulation process should be pretty quick for generating new residuals.

    A couple things I was wondering about:
    I was curious why you went with uniform residuals, rather than passing the uniform resids through the normal quantile function to get normally distributed residuals as in Dunn and Smyth. I’m not sure which is easier to visually detect deviations from the actual distribution, so both approaches seem fine, but my inner statistician always leans towards the normal distribution. 🙂

    I was also wondering why you went with simulating new data, rather than just using the quantile function for the error distribution of the glmm that a user chose. I would think that, as long as you’re simulating new data from the same fixed effects and random effects as fitted by the model, the simulated distribution should just be the error distribution conditional on the mean value anyway. Does the simulation take model uncertainty into account, by drawing new fixed effects from the var/covar matrix?

    Thanks again for putting this together! This does seem really useful, and hopefully helps people start to adopt a more error-statistical, model-testing approach with glmms.


    • Hi Eric,

      thanks for the questions, great points!

      About the uniformity: my statistical home is built more in Bayesian territory, and there you transform to uniformity (Bayesian p-values), so uniform seems more natural to me. I also think it’s a better choice because our eye is good at detecting inhomogeneities, but not deviations from normality.

      That being said, I did consider the transformation to normality, because statistical tests that a user might want to apply on the residuals may require normality. As an example, I was concerned about the testTemporalAutocorrelation() function in DHARMa, which employs the Durbin-Watson test. I therefore implemented the normal transformation, you can access it via $scaledResidualsNormal in the simulated residual object.

      I realized though that the transformation is not without problems – the main issue is that in many situations, it is quite likely to have all simulations lower or higher than the observed value, which leads to 0 or 1 in the current residuals, mapping to -Inf / Inf with qnorm. This could probably be fudged away, e.g. by adding some kernel density to the ecdf estimation, but who knows what that would do to the test properties, so in the end I decided that it’s safer to perform all tests on the uniform residuals (and the DB test seemed to perform fine on the uniform residuals).

      Why not using the analytical quantiles / cumulative densities? Reason 1 is integer-valued distributions: imagine you expect 30% zeros, and you observe a 0 – on which quantile are you? Adding some noise essentially solved this, which is the point of the Dunn and Smyth paper. Reason 2 are the random effects – if you would use the CDF of your outer distribution, you residuals would be conditional on the fitted random effects. In DHARMa, the default is to re-simulate the random effects as well, which I think provides a stronger test of the model assumptions.


      • Hi Florian,

        Sorry for the delayed reply; I’ve been in a big meeting for the past couple days, and wasn’t able to reply, but I appreciate the response!

        Glad to hear you implemented both uniform and normal residuals. Do you know of any psychology literature on what kind of distributions it’s easiest to see a deviation from, and what kind of deviations people can detect visually? I tried Googling around, but couldn’t find anything. My instinct is that it should be easier to see changes in variance and maybe non-linearity with uniform residuals, but easier to detect outliers with normal residuals, as in the uniform case, large outliers would get contracted toward the center.

        For the case of re-simulating the random effects: I wonder about the effect of this. It seems like this would always increase the variance of the simulated values, and therefore reduce how extreme a given residual is relative to its simulated distribution. In not sure I see when that would be useful for diagnosing modelling issues, as it would seem to obscure most patterns that would occur in the residuals. Is there a case you were thinking of where this would make a pattern of errors clearer or easier to detect?

        I do generally think we should treat random effects as a higher-level residual as well; we often don’t test to see if the random effects follow the assumed distribution, or if there are relationships between our linear group-level predictors and the remaining random effects. However, I think that’s often best done by plotting the fitted random effects separately.


        • Interesting idea with the behavioral studies – I don’t know of any. I find it very plausible that general violations of the distributional assumptions are easier to detect in the uniform option, but yes, outliers may be less prominent to the eye – one would have to do an experiment.

          About the random effects – my thinking is this: imagine you include an individual-level RE in a model with a missing quadratic predictor. The RE will fudge away the model misfit. However, now the individual random effect estimates will systematically depend on the missing predictor. This problem is pervasive in more complicated hierarchical models, e.g. state-space models always look great if you only look at the top level. If you re-simulate, you will see this. Of course, there would be other options to check this, most importantly to look at the RE estimates, but with re-simulating you get everything in one go. I therefore think it’s the better default option, even if you loose a bit power.


          • Hhhmm, good point about the individual-level random effects. Those are definitely important to simulate, since the random effect is essentially there as part of the error model anyway. For random effects where you have multiple observations per group, though, I still think you might be better off just treating the random effect as a higher-order residual, and look for patterns in the random effects.

            For the behavioural studies: I feel like I have seen something about this, but I can’t find it for the life of me. Given how frequently we use residuals to assess model fit, it feels like such an obvious study to do, to actually test how well people can detect problems with a model using different types of residuals. I may poll the Rstats people on twitter to see if anyone knows about any study like this.


          • About the RE: hmm … I think I will do a few simulations to see how much power you loose if you re-simulate the RE. I wonder if people have given this much thought in general, also for Bayesian p-values.

            If you do find out about the uniform / normal question, please do let me know.


  2. There was a real need for a function like this, and this is a very nice solution, which I’ve been recommending to people who come to me with weird looking residuals plots. I’ve been using a simple visual version of the same approach (sim.residplot from https://github.com/pcdjohnson/GLMMmisc), which involves comparing the residuals-v-fitted plot to a simulated version, repeating a few times, and deciding informally whether the two look sufficiently similar. It also corrects the residuals from Poisson-lognormal GLMMs, which look nasty with the default resid and fitted functions.

    Have you seen this paper?

    Click to access Paper%2041%202001%20Elston%20Tick%20aggregation%20Parasitology.pdf

    There are some similarities, e.g. using noise to get around the discrete response problem (Fig 2). It also looks at the distribution of the REs — I guess there’s less of a need for simulation here as the assumed distribution is always normal (in lme4, at least).

    I’ll look forward to further developments!

    Liked by 1 person

    • Hi Paul,

      thanks for the encouraging words, and the reference as well, which I hadn’t known before. I guess there have to be even more applied papers that use this method, it’s a known concept after all, especially in Bayesian Analysis.

      How to deal with the random effects is still an issue for me, see also the discussion above. The random effect model assumes, but importantly also ENFORCES normality of the REs, in the sense that even if the underlying distribution of the REs is not normal, the estimates may look normal after the fit. Or at least more normal than they really are. So, performing a normality test on the RE will not be optimal in terms of power, and simulation could solve that issue – whether that matters in practice is yet another question.

      Besides this more theoretical point, however, there is also a practical issue. Re-simulating the REs tests all assumptions of the model in one go. If you don’t do that, you would not only have to test your REs for normality, but also for correlation with predictors, residuals, space and time, because all this can indicate that the REs mask an issue in your model. Imagine, for example, that you have a spatial effect that is not accounted for by the model. An individual-level RE added to account for overdispersion may absorb this spatial effect so that nothing appears in the normal residuals – you would only notice if you plot the RE estimates against space, but who does that? So, I’m still thinking, but I do lean towards keeping the current default of DHARMa to re-simulate the REs as a default.


  3. Hi Florian,

    I’ve been trying out your package, and it seems to be working quite well, except when I try to use it on glmms with a gamma distribution. The simulationOutput function gives the message

    “Warning message:
    In rgamma(nsim * length(ftd), shape = shape, rate = shape/ftd) :
    NAs produced”

    and plotSimulatedResiduals produces very bizarre looking plots.

    I tried to simulate some data to understand why, but I can see that the createData function doesn’t support a gamma distribution. Is this a feature you could potentially add?


  4. Pingback: Bayesian model checking via posterior predictive simulations (Bayesian p-values) with the DHARMa package | theoretical ecology

  5. This seems like a great package. I’m familiar with these models in general, but I’m new to simulation to handle residuals. However, one thing I’m curious about is how researchers might go back to their original data set and transform or exclude cases based on the results of the simulation. Perhaps as I dig more around it will become obvious, but a description of that would make your package even more useful. Thank you so much for creating this.


  6. Hi Florian!

    I’m trying the DHARMa package for a GLMM. To note in advance, I’m not a professional in statistics.
    So, I’m running into following problem:
    I use the simulateResiduals to do the residual analysis of my GLMM. Then the following is returned:
    “Unable to calculate quantile regression for quantile 0.25. Possibly to few (unique) data points / predictions. Will be ommited in plots and significance calculations.
    Unable to calculate quantile regression for quantile 0.5. Possibly to few (unique) data points / predictions. Will be ommited in plots and significance calculations.
    Unable to calculate quantile regression for quantile 0.75. Possibly to few (unique) data points / predictions. Will be ommited in plots and significance calculations.”

    Is this a problem and if yes how to solve it?
    Many thanks in advance,



    • Hi Saskia, the error message seems clear to me – what exactly is your question / doubt? Also, if you have further questions about DHARMa, the best place is to create a issue on the DHARMa GitHub repo.


      • Hi Florian!
        Thank you for your reply. Unfortunately, the error message is not clear to me.
        I thought it is because of the limited observations the GLMM is based on (~140 oberservations) and the maybe too many variables (about 5 or even more) of the model. However, if reducing the variables in the model to only one fixed and one random effect, I still run into the same error message. So I probably don’t understand why I run into it.


  7. Maybe I got it now. Sorry for the second posting. The issue of not having enough data points for doing quantile regression is not dependent on the number of variables within a model?
    However, the points within the plot residuals vs. predicted should be as randomly distributed as possible, right? So this is similar to the build-in LM diagnostic plots in base r, particularly the first one with residuals vs fitted? This is how I understood it after reading about GLMM, DHARMa, etc.


    • Hi Saskia, sorry for the late reply – I don’t check this blog very frequently, for faster responses to questions regarding DHARMa, feel free to use the issue tracker at https://github.com/florianhartig/DHARMa/issues

      Anyway, regarding your questions

      a) whether the quantile regression works depends indeed mainly on the number of data points, but also on their distribution (and not on the predictors). I would have to see your residuals to see why the regression lines fail in this case. Maybe a large number of your values has the same x or y value, which would make it harder to fit a quantile line.

      b) yes, the DHARMA residuals are similar to the LM, just that the expected distribution if the model is correct is uniform, not normal. See the package’s vignette for a more detailed discussion.


      • Hi Florian!

        It could also be related to using a quite old version of R. I updated it and now it shows the regression lines for quantile regression.

        Thanks a lot!



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s