67 thoughts on “A simple Metropolis-Hastings MCMC in R

  1. Pingback: Setting up OpenBugs and R « theoretical ecology

  2. Pingback: MCMC chain analysis and convergence diagnostics with coda in R « theoretical ecology

  3. Thank you very much for an excellent blog post about MCMC. I have noticed, however, a bit strange thing in my opinion: in the “prior” function you return a sum, not a vector, of obtained value (return(aprior+bprior+sdprior)). Should not that be a vector? otherwise how else would you call again this function when you do sampling?
    Anna
    PhD student in evolutionary biology

    • Hi Anna,

      glad you like the post.

      Regarding your question: in principle, I could also return a vector with the prior probability for each parameter separaetely, but later those would be summed up anyways.

      The posterior (which is what we are working on with the MCMC) is given by

      P(Parameters|Data) = L(Data|Parameters) * Prior(Parameters)

      where Prior(Parameters) is the prior probability that a certain combination of parameters will occurr together. As we assume here that the prior probabilities for each parameter are independent, Prior(Parameters) = Prior(Parameter1) * Prior(Parameter1) * … , or if you work with the log of the probabilities, it’s the sum – this is why I return the sum here, it’s basically already the joint prior for a certain combination of parameters to occurr together based on the individual priors we assumed.

      Cheers,
      Florian

  4. Wonderful post, it helped me a lot. But I have a question. In the section “Defining the prior”, you said “To make it easy, I used uniform distributions for all three parameters, but not that for the standard deviation”. For paramter “b”, you define bprior = dnorm(b, sd = 5, log = T) in the prior function. I’m confused.

  5. Pingback: A simple Approximate Bayesian Computation MCMC (ABC-MCMC) in R « theoretical ecology

  6. great post. It helped me a lot, but i have a question. How can we use mcmc when we want to estimate the parameters of a multiple regression with 4 variables for example?

  7. Hi Florian,

    Thank you very much for a very clear introduction on MCMC. However, after playing around with your code, I have noticed a problem:

    The performance of the algorithm seems to get progressively worse as the sample size increases. E.g. sampleSize <- 301 and sampleSize <- 3001. The acceptance rate becomes extremely low in the latter case. As the log-likelihoods become large negative numbers, so will their differences (i.e., posterior(proposal) – posterior(chain[i,]) is a very large negative number for most proposed steps).

    When sample size is large, it seems to be difficult to get the chains to mix well again by reducing the variances of the proposal distributions.

    Am I missing something, or is this a common problem? Is there a way to get around this problem, or is it a matter of find tuning the proposal variances?

    Best regards,
    Wilson

    • Hi Wilson,

      your observation is correct. The reason is that if you have more data, the posterior gets more narrow. As a result of that, the proposal function that I suggested is too wide and produces lots of rejections – adjust it and it will get better (professional samplers like JAGS do this automatically for you). I explain this in detail in a follow-up post https://theoreticalecology.wordpress.com/2011/12/09/mcmc-chain-analysis-and-convergence-diagnostics-with-coda-in-r/ . Also, note that with so much data, you initial guess might be much worse (in terms of relative goodness of fit) than before, so you might need to use a longer burn-in time. As a third comment, using that amount of data, it becomes very clear now that a flat prior is not uninformative, i.e. it carries a preference towards larger values for the standard deviation. If you want a more uninformative prior, a 1/sd prior scaling for the standard deviation would probably correct that and lead asymptotically to a better retrieval of the original parameters.

  8. Pingback: Olhando o MCMC mais de perto… « Recologia

  9. Hello Florian, the explanation is superb and very helpful!!! I have one question though, i was trying to use the same code to solve another(similar) simple regression question. I run into trouble especially in the proposal function. I suspect it is in the choice of the std deviations. In your example above, how did you choose the sd values sd= c(0.1,0.5,0.3), in the proposal function? Thank you in advance.

    • Hi Ngesa, first have a look at the last section of this post https://theoreticalecology.wordpress.com/2011/12/09/mcmc-chain-analysis-and-convergence-diagnostics-with-coda-in-r/ which explains the basic problem.

      Basically, you want your proposal to fit to the shape of the posterior you sample from – very brief, one could say that for unimodal posterior distributions, a proposal with a similar shape as the posterior (including correlations between parameters), but a bit shrunk as compared to the posterior usually works well.

      How much you should shrink depends on the number of parameters you vary in the proposal, the shape of the posterior and the how well your proposal mimics that shape – typically you will have to adjust the proposal after running your sampler for the first time (that is all right as long as you discard all samples during adjustment). As an initial guess (don’t quote me for that) I would start for a typical regression problem with 3-8 parameters and all parameters varied at once with 1/10th of the posterior uncertainty that I expect, but again, this depends a lot of how much parameters you have and how much correlation I would expect.

      For more details see

      Gelman, Andrew, et al. Bayesian data analysis. Chapman & Hall/CRC, 2004.

      Andrieu, Christophe, and Johannes Thoms. “A tutorial on adaptive MCMC.” Statistics and Computing 18.4 (2008): 343-373.

  10. I was wondering what if instead of log =T, I substitute log = F. How does that affect the over all results?
    If you can please explain then I would greatly appreciate it.
    Thank you

    • In principle it’s the same of course, it’s simply that computers have problems storing small numbers, as explained in the paragraph “why we work with likelihoods”

      Try it out, if you create sufficiently large data, you will get rounding or underflow problems when working with Log = F (of course you have to update the formulas, e.g. instead of probab = exp(posterior(proposal) – posterior(chain[i,])) you sould have probab = posterior(proposal) / posterior(chain[i,]). We sometimes do this as an exercise in our courses.

  11. this may be a basic question, but in the above you call a the intercept. isn’t it the slope of the regression line? forgive me if there is something obvious i am missing.

  12. Pingback: MCMC for maximum likelihood in R | Trees 'r' Stats

  13. Pingback: Running Bayesian models | Sam Clifford

  14. Super helpful post !
    I have been reading books, chapters, presentations, and papers about MCMC in the last 2 months. This post is just in time to clear up my cloudy neural network in my brain for MCMC.

  15. Pingback: MCMC | Letian's Blog

    • Looks roughly the same to me, except that you have a different size of your plot window, this may depend on your OS / editor / output device.

      I would say everything is fine. Small differences of the results are expected of course, because, as you say, we haven’t fixed the random seed, so each time we are running this we have a slightly different dataset.

  16. Pingback: A simple explanation of rejection sampling in R | theoretical ecology

  17. Pingback: Making the Most of MCMC | Sweet Tea, Science

  18. Thanks a lot for this very helpful post. I have just a doubt. Is it normal than the prior distributions are not updated through the iterations of the chain?
    Thanks again
    Fabio

  19. Amazing, great post, is it possible to re-calculate it by excel so it would make easy to understand for those who unfamiliar with R.
    thaks
    Haolia

    • Hi Haolia, glad you liked the post! About Excel – I guess it would be possible, but I have to admit I don’t see the point. Probably I would have to use Visual Basic to do this in Excel, and if you understand VBA, you should also be able to read R code. In general, if you want to do serious stats, the best thing to do is to stop using Excel for anything else than data entering, and get used to R.

  20. Great post. Really cleared things up for me. Question, however. If a non-symmetric proposal function was used a “correction factor” would be required, correct? I’m familiar with how to do it for a single parameter but, if required, how would it be implemented in the multi-parameter case you’ve described? Thanks.

    • Yes, if you have a non-symmetric proposal distribution g(x’|x) for going from x to x’, you will have to correct the Metropolis ratio with g(x’|x) / g(x|x’). This works for any number of dimensions.

      In practice, this is rarely used though. Most situations where a non-symmetric proposal would be useful can better be dealt with by applying a transformation on the parameter space.

  21. Hello, that’s great tutorial and sample. Thank you! But in the likelihood function, you are using the truth data x as the prediction. What should I use if x is unknown? I only have data y, and know that y is generated from an unkown x. Thanks.

    • Hi Jerome – in a regression, the assumption is that you know both the observed response (y) and the predictor (x), and what you want to estimate is the unknown effect of x on y.

      The problem you have seems different, but I’m afraid you will have to be a bit more specific about what you want to achieve to get sensible help.

      If you want to do something that is completely different from a regression, I would suggest that a forum such as stats.stackexchange.com/ is a more suitable place to get help than this post.

  22. Very helpful post! I read several articles and book chapters on MCMC, but this post makes it more concrete for me. Can you help me with a question though? Why didn’t you update each parameter separately? Thank you!

    • Glad this was useful to you Guoguo.

      About your question – sure, one can update one parameter at a time. This algorithm is called Metropolis-within-Gibbs. It is one of the many extensions of the standard Metropolis that aim at improving the speed of convergence for higher-dimensional more complex target distributions. For low-dimensional normal targets it will converge slightly slower though.

  23. Oh my GOD!! This is an extremely well executed post! Thank you very much to make it so clear! This is the exact example of how to be a teacher.

  24. Hi,

    thanks a lot for very helpful code. I just have one question. Which parts of the code except number of parameters and size of the matrix should be changed in order to make it suitable for estimating just one parameter?

  25. I’m not sure this is a Metropolis-Hastings implementation rather than Metropolis only. The former is for asymmetric distributions and includes the marginal probabilities in the accept/reject criteria ratio. In this case, because you are using symmetric distributions (normal and uniform) they are equivalent (I think anyway), but if you were to include asymmetric distributions (e.g. beta, gamma, lognormal) then this would not be a Metropolis-Hastings example.

    Your line of code below does not include the marginal probabilities.
    probab = exp(posterior(proposal) – posterior(chain[i,]))

    See Hobbs and Hooten 2015 Ch. 7. Please correct me if I’m wrong.

    • I guess that’s true – I didn’t code the correction term because I used a symmetric proposal (in this case, the correction is always 1), but I guess it might be more pedagogical to add it, or change the name.

      Note, however, that the correction works with the proposal distribution – you seem to refer to the prior distribution, the prior is accounted for here.

      I didn’t get what you mean by the “Your line of code below does not include the marginal probabilities. …”

  26. Excellent post with nice responses! Many of my questions were answered in the discussion below!

    However, I still don’t understand how you chose sd = c(0.1,0.5,0.3) in the proposal. I understand that you will adapt this based on your expectations on the uncertainty you expect:

    “As an initial guess (don’t quote me for that) [sorry I quoted you here…] I would start for a typical regression problem with 3-8 parameters and all parameters varied at once with 1/10th of the posterior uncertainty that I expect”

    How should I “guess” those values and how do you get a posterior uncertainty?

    Also, why are you comparing the probability of acceptance with a random uniform number (between 0 and 1)? Is it just a convenience in the coding? I bet that in JAGS or STAN it’s way more complex than that!

    if (runif(1) < probab){ # If the probability we get is higher than a random probability
    chain[i+1,] = proposal
    } else {
    chain[i+1,] = chain[i,]
    }

    Thanks again!

    • About the guessing – I often have a rough idea, but if you don’t know, would start with 1/20 of the prior or so.

      About the runif(1) < probab – yes, this is one option to code a draw from a Bernoulli trial, another option would be using rbinom(). As for any random number generation, there are many algorithms to do the job, and optimizing this is quite an art. I'm simply using this because this is a standard way to code this, and I wouldn't be surprised to see this code also in other software. I doubt this is a runtime-critical part of the code for any sampler.

  27. Hi. You use a different seed for every iteration of the random uniform. I’m wondering why you didn’t use the same seed for each random variate. Thanks, Brent

  28. Thank you for this concrete example of the code underlying MCMC. It was extremely helpful. It took me a while to “unpack” and understand your likelihood and prior functions, since I am not fully conversant with functions in R, but it was definitely worth the trouble.

  29. Thank you for this example, extremely useful for a newbie to R and MCMC. Could I ask how you would use this type of model practically.

    In my mind we could use MCMC to see reasonable acceptable results for y given the same x.

    To that end, how could we use our chain results to produce a sequence of y? Or would we need to edit the functions to produce a chain of y values?

    Please correct me if I have got this wrong!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s