Introduction to mcunit

Increasingly sophisticated MCMC and Monte Carlo algorithms raise the scope for errors, either in derivations of mathematical quantities needed for sampling, or implementation errors. Testing for such errors should be an integral and routine part of the workflow of any Bayesian analysis.

mcunit makes it easy to do this, by allowing for unit testing MCMC and other Monte Carlo implementations within the framework of the testthat package (Wickham 2011). The methods correspond to those proposed in Gandy and Scott (2020). They are based on statistical hypothesis testing, and are designed to achieve an arbitrarily low false rejection probability, so that the user can be confident that a correctly implemented algorithm will almost certainly not be rejected. By default, this false rejection rate is set to 10⁻⁵.

MCMC samplers can be testes using expect_mcmc or expect_mcmc_reversible. These functions use different statistical hypothesis tests described in Section 2.1 and 2.2 of (Gandy and Scott 2020) respectively. expect_mcmc_reversible requires that the sampler to be tested is (or at least, is expected to be) reversible.

This vignette demonstrates how to use these tests, using the following simple example. Consider the model y ∼ θ₁ + θ₂ + ϵ, where θ := (θ₁, θ₂) is apriori independent, zero mean normal with standard deviation σ = 10. The white noise term ϵ is independent from θ and also zero mean normal but with variance σ_ϵ² = 0.1. While inference is easy here, we consider drawing samples from the posterior π(θ ∣ y) using a Gibbs sampler. The posterior conditional distributions for θ₁ and θ₂ are normal with expectations $$\mathbb{E}(\theta_i | y, \theta_j) = \frac{\sigma^2}{\sigma^2_{\epsilon} + \sigma^2}(y - \theta_j),$$ and variances $$\mathbb{V}(\theta_i | y, \theta_j) = \frac{1}{\frac{1}{\sigma^2_{\epsilon}} + \frac{1}{\sigma^2}}.$$

Correct Sampler

Start with a correctly implemented random scan Gibbs sampler. First load the package and set the seed.

require(mcunit)

## Loading required package: mcunit

set.seed(10)

The following function updates one element of θ given the other and given the observed data y.

gibbsUpdate <- function(y, theta_j) {
  # Samples theta_i given y and theta_j
  mean <- 100 * (y - theta_j) / (100 + 0.1)
  var <- 1. / (1. / 100 + 1. / 0.1)
  rnorm(1, mean=mean, sd=sqrt(var))
}

The argument y is the observed data, and theta_j is the component to condition on. From this, we define a correctly implemented random scan sampler.

randomScan <- function(theta, y, thinning) {
  # Random Scan Gibbs
  for(i in 1:thinning) {
    # select index to update
    i <- sample.int(2,1)
    theta[i] <- gibbsUpdate(y, theta[i %% 2 + 1])
  }
  theta
}

This function takes the current state of the chain theta, the data y and a thinning parameter which determines the number of individual updates to make before returning the new state.

Our task is to test the correctness of randomScan. We will do this using both expect_mcmc and expect_mcmc_reversible. This latter method can be used because the sampler should, if correct, be reversible.

Both methods require a list object which describes the MCMC sampler to be tested. The list must contain the following elements:

object$genprior: A function with no arguments that generates a sample from the prior distribution. No default value.
object$gendata: A function that takes as input the parameter value (of the type generated by genprior) and returns the observed data as an arbitrary R object. No default value.
object$stepMCMC: A function that takes three arguments:
- theta: the current position of the chain (of the same type as produced by the prior),
- dat: the observed data (of the same type as produced by gendat)
- thinning: the number of steps the chain should take. 1 corresponds to one step.
object$test: Function that takes either one or two arguments, and returns a vector with components which will be used for checking the MCMC sampler. The first argument is interpreted as a parameter value, and if a second argument exists, it is interpreted as a data value. An example is the identity function: f(θ) = θ. Alternatively, if you have access to the model’s likelihood function, you could use p(y ∣ θ).

Please see the documentation for expect_mcmc and expect_mcmc_reversible for further details on this.

We begin constructing this list by defining the prior sampler and the data sampler.

obj <- list()
obj$genprior <- function() rnorm(n=2, mean=0, sd=10)
obj$gendata <- function(theta) sum(theta) + rnorm(n=1, mean=0, sd=sqrt(0.1))

As test functions, we use the components θ₁, theta₂, the prior density π(θ) and likelihood function p(y ∣ θ). The density and likelihood appear to be a good default choice because they are one-dimensional regardless of the dimension of the parameter vector and data.

priorDensity <- function(theta) prod(dnorm(theta, mean=0, sd=10))
likelihood <- function(theta, y) dnorm(y, mean=sum(theta), sd=sqrt(0.1))
testVec <- function(theta, y) c(theta[1], theta[2], priorDensity(theta), likelihood(theta, y)) 
obj$test <- testVec

Finally, we are left to define a single MCMC step.

obj$stepMCMC <- function(theta, dat, thinning) randomScan(theta, dat, thinning)
print(obj)

## $genprior
## function () 
## rnorm(n = 2, mean = 0, sd = 10)
## 
## $gendata
## function (theta) 
## sum(theta) + rnorm(n = 1, mean = 0, sd = sqrt(0.1))
## 
## $test
## function (theta, y) 
## c(theta[1], theta[2], priorDensity(theta), likelihood(theta, 
##     y))
## 
## $stepMCMC
## function (theta, dat, thinning) 
## randomScan(theta, dat, thinning)

First consider expect_mcmc. We use 100 thinning steps between samples to give the test some power to detect errors.

expect_mcmc(obj, thinning = 100)

No errors were detected, because there are none. Recall that the false rejection rate is set to 10⁻⁵ by default, and so repeated application of this test should very rarely flag randomScan.

This sampler is also reversible, and so we try expect_mcmc_reversible.

expect_mcmc_reversible(obj, thinning = 10, nsteps = 10)

Again, no error is detected (as expected).

A Correct Non-Reversible Sampler

Consider systematic scan of the vector θ rather than random scan.

systematicScan <- function(theta, y, thinning) {
  # Systematic Scan Gibbs
  for(i in 1:thinning) {
    theta[1] <- gibbsUpdate(y, theta[2])
    theta[2] <- gibbsUpdate(y, theta[1])
  }
  theta
}

This sampler is correct and expect_mcmc is unlikely to falsely detect errors.

obj$stepMCMC <- function(theta, dat, thinning) systematicScan(theta, dat, thinning)
expect_mcmc(obj, thinning = 100)

Good. What happens using expect_mcmc_reversible with thinning of 1?

expect_mcmc_reversible(obj, thinning = 1, nsteps = 10)

## Error: Test failed for components 1 with p-values=1.431056e-14 in iteration 2
## 
## Iteration 2 (4000 samples)
## Rejection if min(p) <= 2.44262e-06; No rejection if min(p) > 0.03655569.

The test failed. This illustrates why expect_mcmc_reversible should not be used for a non-reversible sampler - we have no guarantee over the false rejection rate. Even though this sampler is correctly implemented, we are erroneously detecting an error.

An Incorrect Sampler

Now we make a genuine mistake in the sampler, and investigate whether the methods detect it. We replace the variance terms σ² and σ_ϵ² in 𝕍(θ_i|y, θ_j) with their corresponding standard deviations (i.e. σ and σ_ϵ).

gibbsUpdate <- function(y, theta_j) {
  # Samples theta_i given y and theta_j
  mean <- 100 * (y - theta_j) / (100 + 0.1)
  var <- 1. / (1. / 10 + 1. / sqrt(0.1))
  rnorm(1, mean=mean, sd=sqrt(var))
}

Also make sure to switch back to random scan…

obj$stepMCMC <- function(theta, dat, thinning) randomScan(theta, dat, thinning)

Rerunning the tests…

expect_mcmc(obj, thinning = 100)

## Error: Test failed for components 4 with p-values=1.067384e-36 in iteration 1
## 
## Iteration 1 (1000 samples)
## Rejection if min(p) <= 3.571429e-07; No rejection if min(p) > 0.03655361.

expect_mcmc_reversible(obj, thinning = 10, nsteps = 10)

## Error: Test failed for components 4 with p-values=4.940024e-63 in iteration 1
## 
## Iteration 1 (1000 samples)
## Rejection if min(p) <= 3.571429e-07; No rejection if min(p) > 0.03655361.

Great! Both tests detect a problem. Even better, you can be pretty much sure there is a problem because the chance of a false rejection is upper bounded by 10⁻⁵!

References

Gandy, A., and Scott, J. (2020), “Unit testing for MCMC and other Monte Carlo methods.”

Wickham, H. (2011), “Testthat: Get started with testing,” The R Journal, 3, 5–10.