Background
Bayesian inference
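In the Bayesian context, one is often interested in approximating the posterior distribution \(\mathcal{P}(\theta)\equiv p(\theta\vert d,\mathcal{M})\), that is, the probability distribution of the parameters \(\theta\) given the data \(d\) and the model \(\mathcal{M}\). This is given by Bayes’ theorem:
\[\mathcal{P}(\theta)=\frac{\mathcal{L}(\theta)\,\pi(\theta)}{\mathcal{Z}},\]
where
\[\mathcal{L}(\theta)\equiv p(d\vert\theta,\mathcal{M})\]
is the likelihood function,
\[\pi(\theta)\equiv p(\theta\vert\mathcal{M})\]
is the prior probability density, and
\[\mathcal{Z}\equiv p(d\vert\mathcal{M})=\int\mathcal{L}(\theta)\,\pi(\theta)\,d\theta\]
is the so-called model evidence or marginal likelihood.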
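As a small illustration of Bayes' theorem, the sketch below evaluates the likelihood, prior, evidence, and posterior on a grid for a one-parameter toy problem. The data values, the standard-normal prior, and the unit-variance Gaussian likelihood are assumptions chosen for the example, not part of the method described here.

```python
import numpy as np

# Hypothetical toy problem: infer the mean theta of a unit-variance Gaussian.
data = np.array([0.8, 1.1, 1.4, 0.9, 1.3])

theta = np.linspace(-5.0, 5.0, 2001)  # parameter grid
dtheta = theta[1] - theta[0]

# Prior pi(theta): standard normal density (an assumption for this example).
prior = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)

# Likelihood L(theta) = prod_k N(d_k | theta, 1), evaluated on the grid.
resid = data[:, None] - theta[None, :]
log_like = -0.5 * np.sum(resid**2, axis=0) - 0.5 * data.size * np.log(2.0 * np.pi)
like = np.exp(log_like)

# Evidence Z: integral of L * pi over theta, here a simple Riemann sum.
evidence = np.sum(like * prior) * dtheta

# Posterior P(theta) = L * pi / Z, normalized on the grid by construction.
posterior = like * prior / evidence

posterior_mean = np.sum(theta * posterior) * dtheta
```

For this conjugate setup the posterior mean is known in closed form, `data.sum() / (data.size + 1)`, which gives a quick check on the grid result. Grid evaluation only works in very low dimension, which is why sampling methods are needed in general.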
Parameter estimation
The task of parameter estimation consists of finding the probability distribution of the parameters \(\theta\) of a model \(\mathcal{M}\) given some data \(d\). In practice, this is achieved by approximating the posterior distribution by a collection of samples. The distribution of these samples can then be used to approximate various expectation values (e.g. mean, median, standard deviation, credible intervals, 1-D and 2-D marginal posteriors, etc.)
\[\mathbb{E}_{\mathcal{P}}\left[f(\theta)\right]\equiv\int f(\theta)\,\mathcal{P}(\theta)\,d\theta\approx\frac{1}{n}\sum_{i=1}^{n}f(\theta_{i})\]
as sums over the samples drawn from the posterior
\[\theta_{i}\sim\mathcal{P}(\theta),\quad i=1,\dots,n.\]
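The following sketch shows how such summaries might be computed from an array of equally weighted posterior samples. The samples here are drawn from a made-up two-parameter Gaussian purely as a stand-in for sampler output; the array layout (one row per sample) is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for posterior samples (hypothetical 2-parameter Gaussian posterior).
samples = rng.multivariate_normal(
    mean=[1.0, -2.0], cov=[[1.0, 0.3], [0.3, 0.5]], size=20000
)

# Expectation values become averages over the samples.
mean = samples.mean(axis=0)
std = samples.std(axis=0, ddof=1)
median = np.median(samples, axis=0)

# Central 68% credible interval for each parameter.
lo, hi = np.percentile(samples, [16, 84], axis=0)

# 1-D marginal posterior of the first parameter as a normalized histogram.
density, edges = np.histogram(samples[:, 0], bins=50, density=True)
```

If the sampler returns weighted samples instead, the plain averages above would need to be replaced by weighted ones.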
Model comparison
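For the task of Bayesian model comparison, one is interested in the ratio of posterior probabilities of models \(\mathcal{M}_{i}\) and \(\mathcal{M}_{j}\), given by
\[\frac{p(\mathcal{M}_{i}\vert d)}{p(\mathcal{M}_{j}\vert d)}=\frac{p(d\vert\mathcal{M}_{i})}{p(d\vert\mathcal{M}_{j})}\times\frac{p(\mathcal{M}_{i})}{p(\mathcal{M}_{j})},\]
where the first term on the right-hand side is the so-called Bayes factor and the second term is the ratio of prior probabilities of the two models. The latter is often set to 1 (i.e. no model is preferred a priori). The Bayes factor, on the other hand, is simply the ratio of the model evidences of the two models:
\[\mathrm{BF}_{ij}\equiv\frac{p(d\vert\mathcal{M}_{i})}{p(d\vert\mathcal{M}_{j})}=\frac{\mathcal{Z}_{i}}{\mathcal{Z}_{j}}.\]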
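Since samplers usually report the logarithm of the evidence, the Bayes factor is most safely formed as a difference of log-evidences. A minimal sketch follows; the function name `posterior_odds` and the two log-evidence values are hypothetical.

```python
import math

def posterior_odds(log_z_i, log_z_j, prior_odds=1.0):
    """Return (posterior odds, log Bayes factor) for model i versus model j."""
    log_bf = log_z_i - log_z_j  # log of Z_i / Z_j
    return math.exp(log_bf) * prior_odds, log_bf

# Hypothetical log-evidence estimates for two models.
odds, log_bf = posterior_odds(-10.2, -12.5)

# With equal prior odds, the posterior probability of model i follows directly.
prob_i = odds / (1.0 + odds)
```

For very large log Bayes factors the exponential can overflow, in which case it is usually enough to report `log_bf` itself.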
Preconditioned Monte Carlo
The Preconditioned Monte Carlo (PMC) algorithm is a variant of the Persistent Sampling (PS) framework, which is a generalization of the Sequential Monte Carlo (SMC) algorithm. The PMC algorithm is designed to sample from a sequence of probability distributions \(\mathcal{P}_{t}(\theta)\), where each target distribution in the sequence is defined by
\[\mathcal{P}_{t}(\theta)\propto\mathcal{L}(\theta)^{\beta_{t}}\,\pi(\theta),\]
where \(\mathcal{L}(\theta)\) is the likelihood function and \(\pi(\theta)\) is the prior probability density. The effective inverse temperature parameter \(\beta_{t}\) is initialized to 0 and is gradually increased to 1. When \(\beta_{t}=0\), the target distribution is the prior distribution, and when \(\beta_{t}=1\), the target distribution is the posterior distribution. The inverse temperature is increased in each iteration by a small step \(\Delta\beta\) until it reaches 1. The step size \(\Delta\beta\) is computed adaptively in each iteration so that PMC maintains a constant effective number of particles (effective sample size). In each iteration, the PMC algorithm samples from the target distribution \(\mathcal{P}_{t}(\theta)\) using a set of particles by applying a sequence of three steps:
Reweighting: The particles are reweighted to target the distribution \(\mathcal{P}_{t}(\theta)\).
Resampling: The particles are resampled according to their weights to ensure that the effective number of particles is constant.
Mutation: The particles are mutated by applying a number of MCMC steps.
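The three steps, together with the adaptive choice of \(\Delta\beta\), can be sketched on a one-dimensional toy problem. This is a plain tempered SMC loop and not the PMC implementation: it uses multinomial resampling and random-walk Metropolis for the mutation step, and it omits persistent sampling and flow preconditioning. The toy densities, the helper names (`log_prior`, `log_like`, `ess`, `next_beta`), and the 50% effective-sample-size target are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_prior(theta):
    # Assumed prior: N(0, 3^2).
    return -0.5 * (theta / 3.0) ** 2 - np.log(3.0 * np.sqrt(2.0 * np.pi))

def log_like(theta):
    # Assumed likelihood: one datum d = 2 with noise sigma = 0.5.
    return -0.5 * ((theta - 2.0) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2.0 * np.pi))

def ess(log_w):
    # Effective sample size of unnormalized log-weights.
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / np.sum(w**2)

def next_beta(beta, ll, target_ess):
    # Bisection for the step whose incremental weights have ESS near the target.
    if ess((1.0 - beta) * ll) >= target_ess:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess((mid - beta) * ll) >= target_ess:
            lo = mid
        else:
            hi = mid
    return hi  # upper end of the bracket, so beta always increases

n = 2000
theta = rng.normal(0.0, 3.0, size=n)  # beta = 0: particles drawn from the prior
beta, log_z = 0.0, 0.0

while beta < 1.0:
    ll = log_like(theta)
    new_beta = next_beta(beta, ll, 0.5 * n)

    # 1. Reweighting: incremental weights L(theta)^(new_beta - beta).
    log_w = (new_beta - beta) * ll
    shift = log_w.max()
    w = np.exp(log_w - shift)
    log_z += shift + np.log(w.mean())  # accumulates the log-evidence estimate
    w /= w.sum()

    # 2. Resampling: multinomial resampling back to equal weights.
    theta = theta[rng.choice(n, size=n, p=w)]
    beta = new_beta

    # 3. Mutation: a few random-walk Metropolis steps targeting P_t.
    scale = theta.std()
    for _ in range(5):
        prop = theta + scale * rng.normal(size=n)
        log_a = (beta * log_like(prop) + log_prior(prop)
                 - beta * log_like(theta) - log_prior(theta))
        accept = np.log(rng.uniform(size=n)) < log_a
        theta = np.where(accept, prop, theta)
```

For this Gaussian toy problem the exact answers are available (posterior mean of about 1.946 and \(\log\mathcal{Z}\approx-2.247\)), so `theta.mean()` and `log_z` can be compared against them.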
The PMC algorithm terminates when the inverse temperature parameter reaches 1. The samples obtained from the PMC algorithm can be used to approximate the posterior distribution of the parameters \(\theta\) given the data \(d\) and the model \(\mathcal{M}\). The PMC algorithm is particularly useful for sampling from high-dimensional and multimodal posterior distributions. Furthermore, the PMC algorithm provides an estimate of the logarithm of the model evidence, \(\log\mathcal{Z}\), which can be used for Bayesian model comparison.
The high sampling efficiency and robustness of the PMC algorithm derive from three key features:
Persistent Sampling: The PMC algorithm maintains a set of particles throughout the entire run of the algorithm. This allows the PMC algorithm to reuse the particles from previous iterations to sample from the target distribution in the current iteration. This is particularly useful when the target distribution changes smoothly from one iteration to the next.
Normalizing Flow Preconditioning: The PMC algorithm uses a normalizing flow to precondition each target distribution \(\mathcal{P}_{t}(\theta)\). The normalizing flow is a sequence of invertible transformations that maps a simple distribution to the target distribution. The normalizing flow is trained to approximate the target distribution using a set of particles. Sampling in the target distribution is then performed by sampling in the space of the simple distribution and mapping the result to the target space through the flow. The normalizing flow preconditioning allows the PMC algorithm to sample from complex and multimodal target distributions.
t-preconditioned Crank-Nicolson: The PMC algorithm uses a t-preconditioned Crank-Nicolson integrator to evolve the particles in the target distribution. The t-preconditioned Crank-Nicolson algorithm is an MCMC method that scales well with the dimensionality of the target distribution. For targets that are close to Gaussian, the t-preconditioned Crank-Nicolson algorithm is particularly efficient and can scale to very high dimensions. For non-Gaussian targets (e.g., multimodal distributions), the t-preconditioned Crank-Nicolson algorithm can be combined with the normalizing flow preconditioning to sample from the target distribution efficiently even in high dimensions.
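To give a feel for why Crank-Nicolson-type proposals scale with dimension, the sketch below implements the standard preconditioned Crank-Nicolson (pCN) sampler with a Gaussian reference \(\mathcal{N}(\mu, C)\). It is a swapped-in, simpler relative of the t-preconditioned variant named above, which is not reproduced here. The function name `pcn_chain`, the step parameter `rho`, and the 50-dimensional Gaussian toy target are assumptions for illustration.

```python
import numpy as np

def pcn_chain(log_target, mu, chol, rho, n_steps, rng):
    """Standard pCN with Gaussian reference N(mu, chol @ chol.T)."""
    d = mu.size

    def log_ref(x):
        # Log-density of the Gaussian reference, up to an additive constant.
        z = np.linalg.solve(chol, x - mu)
        return -0.5 * z @ z

    x = mu + chol @ rng.normal(size=d)       # start from a reference draw
    log_ratio = log_target(x) - log_ref(x)   # log of target / reference
    chain = np.empty((n_steps, d))
    n_accept = 0
    for i in range(n_steps):
        # The proposal leaves the reference invariant: shrink towards mu, add noise.
        noise = chol @ rng.normal(size=d)
        prop = mu + np.sqrt(1.0 - rho**2) * (x - mu) + rho * noise
        log_ratio_prop = log_target(prop) - log_ref(prop)
        # Only the mismatch between target and reference enters the acceptance.
        if np.log(rng.uniform()) < log_ratio_prop - log_ratio:
            x, log_ratio = prop, log_ratio_prop
            n_accept += 1
        chain[i] = x
    return chain, n_accept / n_steps

rng = np.random.default_rng(2)
d = 50
mu = np.ones(d)
chol = 1.1 * np.eye(d)  # reference deliberately 10% wider than the target

def log_target(x):
    # Toy target: unit-variance Gaussian centred at 1 in every dimension.
    return -0.5 * np.sum((x - 1.0) ** 2)

chain, acc_rate = pcn_chain(log_target, mu, chol, rho=0.5, n_steps=5000, rng=rng)
```

Because the acceptance probability depends only on how far the target departs from the Gaussian reference, the acceptance rate stays usable here even in 50 dimensions. The closer the reference matches the target, for example after a preconditioning transformation, the larger the step `rho` can be.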
Unlike traditional samplers that rely on Random-walk Metropolis, Slice Sampling, Rejection Sampling, Importance Sampling, or Independence Metropolis, PMC can scale to high dimensions without devolving into random-walk behavior.