This is a somewhat belated introduction of a package that we published on CRAN at the beginning of the year already, but I hadn’t found the time to blog about this earlier.
In the R environment and beyond, a large number of packages exist that estimate posterior distributions via MCMC sampling, either for specific statistical models (e.g. MCMCglmm, INLA), or via general model specifications languages (such as JAGS, STAN).
Most of these packages are not designed for sampling from an arbitrary target density provided as an R function. For good reasons, as statistical modelers will seldom have the need for such a tool – it’s hard to come up with a likelihood that cannot be specified in one of the existing packages, and if so, the problem is usually so complicated that it requires especially adopted MCMC strategies.
It’s another story, however, for people that work with process-based models (simulation models, differential equation models, …). These models are usually implemented in standalone code that cannot easily be integrated into JAGS or STAN (yes, I know STAN has an ODE solver, but generally speaking … ). What we can do, however, is to call any process model from R, calculate model predictions and then calculate a likelihood based on some distributional assumptions, e.g. as in the following pseudocode.
likelihood = function(param){ predicted = processmodel(param) ll = sum(dnorm(observed, mean = predicted, sd =param[x], log = T)) return(ll) }
This leaves us with a function likelihood(par) or posterior(par) that we want to sample from.
BayesianTools is a toolbox for this problem. It provides general-purpose MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. The samplers implemented in the package are optimized for problematic target functions (strong correlations, multimodal targets) that are more commonly found in process-based than in statistical models. Available samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.
Here an example how to run an MCMC on a likelihood function in BayesianTools
library(BayesianTools) setup = createBayesianSetup(likelihood, lower = c(-100,-100,0), upper = c(100,100,100)) out = runMCMC(setup, sampler = "DEzs", settings = NULL)
Obviously, you should read the help for details. The package supports most common summaries and diagnostics, including convergence checks, plots, and model selection indices, such as DIC, WAIC, or marginal likelihood (to calculate the Bayes factor).
gelmanDiagnostics(out) summary(out) plot(out, start = 1000) correlationPlot(out, start = 1000) marginalPlot(out, start = 1000) MAP(out) DIC(out) WAIC(out) # requires special definition of the likelihood, see help marginalLikelihood(out)
Below some example outputs obtained from calibrating the VSEM model, a simple ecosystem model that is provided as a test model in the package. The code to produce these plots is available when you type ?VSEM in R.

Summary of a Bayesian Tools MCMC output

One of the plot functions for time series data – the top panel shows observed data (crosses), the posterior predictive credible interval of the model (red) and the predictive interval (via posterior predictive simulations, yellow). The posterior predictive simulations can be used to formulate Bayesian residuals (Bayesian p-values) as model diagnostics, displayed in the bottom panels.
References
Hartig, F, Minunno, F, Paul, S (2017) BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics. R package version 0.1.3 [CRAN] [GitHub]
What is the processmodel function and the observed data?
LikeLike
It’s the processmodel you want to fit, and your observed data that you want to use to calibrate this model. As written in the text, this is pseudocode, it’s not supposed to run, you have to adjust this to your specific problem.
LikeLike
Dear Florian Hartig,
Thank you for your informations and package. I have been using this BayesianTools for estimating Posterior distribution of non-communicable disease model. I have a question “How can I turn these parameter chains into outcome chains”? Do you have an example code in your package if this is possible directly from the chains or if we have to rerun the model with parameters from the chains.
Best regards,
Wiriya
LikeLike
See https://github.com/florianhartig/BayesianTools/issues/70
LikeLike
Is the lower plot using plotTimeSeriesResults()? I had trouble getting this to work because I do not have an observation at every time step, and it didn’t seem to like NA. Does the error function have to return a vector of absolute errors corresponding to the “mean”?
LikeLike
Yes. I had thought NAs should work, but maybe it doesn’t – I have filed this as a question at https://github.com/florianhartig/BayesianTools/issues/102
Thanks for the hint with the error function, it’s really not clear in the help. I updated the help to
@param error function with signature f(mean, par) that generates observations with error (error = stochasticity according to what is assumed in the likelihood) from mean model predictions. Par is a vector from the matrix with the parameter samples (full length). f needs to know which of these parameters are parameters of the error function. See example in \code{\link{VSEM}}
LikeLike
Thanks for this fantastic tool!
I’m trying to calibrate my ecosystem model and in order to speed things up, I am calculating a cost function within the model. I wrote a wrapper to execute the model from within R (model is in Fortran) and am passing a single parameter (which is to be calibrated) through a system call from R to execute the model (parameter value is read from standard input). A single cost value is returned and read into R from the standard output. My aim was that this replaces having to calculate the log-likelihood as is described here. However, it doesn’t seem to work and I don’t understand why.
The general question is: Can one use any function that is to be maximised within runMCMC()?
I did:
negcost_function <- function( par ){
## this executes the model and reads the cost that is returned through the standard output (very last line)
cost = system( paste0("echo fcover ", sprintf( "%f", par ), " | ./runcmodel_simsuite | tail -n 1"), intern = TRUE)
## return the negative of the cost!
return(-(as.numeric(cost)))
}
setup <- createBayesianSetup( negcost_function, lower = 1, upper = 20 )
settings <- list( iterations = 1000, message = FALSE )
out <- runMCMC( bayesianSetup = setup, sampler = "DEzs", settings = settings )
LikeLiked by 1 person
yes, in principle this should work for any function that is to be maximized … what’s the error? Do you really have only 1 parameter, as your lower/upper suggests? if so, why use MCMC? For more technical questions, possibly better to use issues ob the BT github repo. best f
LikeLiked by 1 person
Dear Florian,
Thank you very much for this fantastic package. I love it. Here, I have a quick question about the proposal function used by runMCMC (sampler = “Metropolis”). I am wondering whether I can specify the proposal function by myself. If so, how? For a parameter P, for example, I want to use this proposal function: P_new = P_old + r * (P_max – P_min) / D. How can I incorporate this function into “runMCMC”? How about different proposal functions for different parameters?
I had briefly checked the code of runMCMC as well as Metropolis, but cannot work out how the proposal is generated. I appreciate any details about this.
Regards,
Zhongkui
LikeLike
Hi Zhongkui,
you can modify your proposal function, see help of Metropolis, https://rdrr.io/cran/BayesianTools/man/Metropolis.html
The proposal function should be multivariate, so you can do whatever you want with different parameters.
That being said – usually the DEzs will sample more efficiently than any naive guess proposal function, so I would recommend benchmarking your idea against the DEzs.
I have copied the question to https://github.com/florianhartig/BayesianTools/issues/170 – I’m trying to collect all questions there, so feel free to follow up there, or ask new questions via https://github.com/florianhartig/BayesianTools/issues
LikeLike
Dear Florian,
I just want to say thank you for this fantastic Bayesian tool. I’ve been using it for estimating posterior distributions of parameters for complex civil engineering models.
LikeLike
Thanks William!
LikeLike
Dear Florian
Thank you for this good job!
I want to use BayesianTool package to make sensitivity analysis of a Bayesian Model.
Have you some research or example to do such analysis.
LikeLike
Hi Fati, you mean sensitivity analysis of the likelihood? There’s an example here https://github.com/florianhartig/BayesianTools/blob/master/Examples/VSEM-Extended.R
LikeLike
Hi, Thanks gor such great tutorial. I have been trying to use this “Bayesiantools” package to estimate water column depth and benthic albedo from water leaving radiance as an input. I have a forward model which is used to calculate the estimated observations of water leaving radiance in order to generate the likelihood function. I also have created a set of prior for the parameters but when I try to run the sampling including the priors, The runMCMC function returns an error. Can you shed some light on this please?
LikeLike
Hello Sohan, glad you find the package useful! For technical questions, please use https://github.com/florianhartig/BayesianTools/issues, ideally with a reproducible example.
LikeLike
Hi Florian,
Thank you for a fantastic package and a great tutorial. I have been using BayesianTools to fit a highly non-linear ODE model to longitudinal data. Some of the model parameters have very different magnitudes (~ 10^-8 vs ~ 10^4).
To make the fitting easier, I was wondering if BayesianTools allows parameter transformations to log10 or log scale when defining the prior densities? Would you just add the Jacobian adjustment into the definition of the prior density itself?
LikeLike
Hello Tuulia,
sure, just transform you parameters in whatever way you like in the likelihood / prior. So, e.g. in the likelihood write
likelihood(par){
parT = log(par)
model(parT)
}
and the same for the prior. How to exactly set the prior depends on your problem, so I can’t say what’s sensible from the outside.
Best,
F
LikeLike
Thank you so much Florian for your quick response! I was getting confused because of how parameter transformations are implemented in STAN, but that works perfectly.
LikeLike