This is a somewhat belated introduction of a package that we published on CRAN at the beginning of the year already, but I hadn’t found the time to blog about this earlier.

In the R environment and beyond, a large number of packages exist that estimate posterior distributions via MCMC sampling, either for specific statistical models (e.g. MCMCglmm, INLA), or via general model specifications languages (such as JAGS, STAN).

Most of these packages are not designed for sampling from an arbitrary target density provided as an R function. For good reasons, as statistical modelers will seldom have the need for such a tool – it’s hard to come up with a likelihood that cannot be specified in one of the existing packages, and if so, the problem is usually so complicated that it requires especially adopted MCMC strategies.

It’s another story, however, for people that work with process-based models (simulation models, differential equation models, …). These models are usually implemented in standalone code that cannot easily be integrated into JAGS or STAN (yes, I know STAN has an ODE solver, but generally speaking … ). What we can do, however, is to call any process model from R, calculate model predictions and then calculate a likelihood based on some distributional assumptions, e.g. as in the following pseudocode.

likelihood = function(param){ predicted = processmodel(param) ll = sum(dnorm(observed, mean = predicted, sd =param[x], log = T)) return(ll) }

This leaves us with a function likelihood(par) or posterior(par) that we want to sample from.

BayesianTools is a toolbox for this problem. It provides general-purpose MCMC and SMC samplers, as well as plot and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. The samplers implemented in the package are optimized for problematic target functions (strong correlations, multimodal targets) that are more commonly found in process-based than in statistical models. Available samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.

Here an example how to run an MCMC on a likelihood function in BayesianTools

library(BayesianTools) setup = createBayesianSetup(likelihood, lower = c(-100,-100,0), upper = c(100,100,100)) out = runMCMC(setup, sampler = "DEzs", settings = NULL)

Obviously, you should read the help for details. The package supports most common summaries and diagnostics, including convergence checks, plots, and model selection indices, such as DIC, WAIC, or marginal likelihood (to calculate the Bayes factor).

gelmanDiagnostics(out) summary(out) plot(out, start = 1000) correlationPlot(out, start = 1000) marginalPlot(out, start = 1000) MAP(out) DIC(out) WAIC(out) # requires special definition of the likelihood, see help marginalLikelihood(out)

Below some example outputs obtained from calibrating the VSEM model, a simple ecosystem model that is provided as a test model in the package. The code to produce these plots is available when you type ?VSEM in R.

**References**

Hartig, F, Minunno, F, Paul, S (2017) BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics. R package version 0.1.3 [CRAN] [GitHub]

What is the processmodel function and the observed data?

LikeLike

It’s the processmodel you want to fit, and your observed data that you want to use to calibrate this model. As written in the text, this is pseudocode, it’s not supposed to run, you have to adjust this to your specific problem.

LikeLike

Dear Florian Hartig,

Thank you for your informations and package. I have been using this BayesianTools for estimating Posterior distribution of non-communicable disease model. I have a question “How can I turn these parameter chains into outcome chains”? Do you have an example code in your package if this is possible directly from the chains or if we have to rerun the model with parameters from the chains.

Best regards,

Wiriya

LikeLike

See https://github.com/florianhartig/BayesianTools/issues/70

LikeLike

Is the lower plot using plotTimeSeriesResults()? I had trouble getting this to work because I do not have an observation at every time step, and it didn’t seem to like NA. Does the error function have to return a vector of absolute errors corresponding to the “mean”?

LikeLike

Yes. I had thought NAs should work, but maybe it doesn’t – I have filed this as a question at https://github.com/florianhartig/BayesianTools/issues/102

Thanks for the hint with the error function, it’s really not clear in the help. I updated the help to

@param error function with signature f(mean, par) that generates observations with error (error = stochasticity according to what is assumed in the likelihood) from mean model predictions. Par is a vector from the matrix with the parameter samples (full length). f needs to know which of these parameters are parameters of the error function. See example in \code{\link{VSEM}}

LikeLike

Thanks for this fantastic tool!

I’m trying to calibrate my ecosystem model and in order to speed things up, I am calculating a cost function within the model. I wrote a wrapper to execute the model from within R (model is in Fortran) and am passing a single parameter (which is to be calibrated) through a system call from R to execute the model (parameter value is read from standard input). A single cost value is returned and read into R from the standard output. My aim was that this replaces having to calculate the log-likelihood as is described here. However, it doesn’t seem to work and I don’t understand why.

The general question is: Can one use any function that is to be maximised within runMCMC()?

I did:

negcost_function <- function( par ){

## this executes the model and reads the cost that is returned through the standard output (very last line)

cost = system( paste0("echo fcover ", sprintf( "%f", par ), " | ./runcmodel_simsuite | tail -n 1"), intern = TRUE)

## return the negative of the cost!

return(-(as.numeric(cost)))

}

setup <- createBayesianSetup( negcost_function, lower = 1, upper = 20 )

settings <- list( iterations = 1000, message = FALSE )

out <- runMCMC( bayesianSetup = setup, sampler = "DEzs", settings = settings )

LikeLiked by 1 person

yes, in principle this should work for any function that is to be maximized … what’s the error? Do you really have only 1 parameter, as your lower/upper suggests? if so, why use MCMC? For more technical questions, possibly better to use issues ob the BT github repo. best f

LikeLiked by 1 person

Dear Florian,

Thank you very much for this fantastic package. I love it. Here, I have a quick question about the proposal function used by runMCMC (sampler = “Metropolis”). I am wondering whether I can specify the proposal function by myself. If so, how? For a parameter P, for example, I want to use this proposal function: P_new = P_old + r * (P_max – P_min) / D. How can I incorporate this function into “runMCMC”? How about different proposal functions for different parameters?

I had briefly checked the code of runMCMC as well as Metropolis, but cannot work out how the proposal is generated. I appreciate any details about this.

Regards,

Zhongkui

LikeLike

Hi Zhongkui,

you can modify your proposal function, see help of Metropolis, https://rdrr.io/cran/BayesianTools/man/Metropolis.html

The proposal function should be multivariate, so you can do whatever you want with different parameters.

That being said – usually the DEzs will sample more efficiently than any naive guess proposal function, so I would recommend benchmarking your idea against the DEzs.

I have copied the question to https://github.com/florianhartig/BayesianTools/issues/170 – I’m trying to collect all questions there, so feel free to follow up there, or ask new questions via https://github.com/florianhartig/BayesianTools/issues

LikeLike