Likelihood¶

The first step in a Bayesian analysis is to define the likelihood function. If you are unfamiliar with this term, I suggest you have a look at the Background section.

Standard Case¶

For practical purposes it is more convenient to work with the log-likelihood function. This is defined as \(\log \mathcal{L}(\theta)=p(D\vert\theta)\), where \(D\) are the data and \(\theta\) are the parameters of the model that we are trying to fit to the data. The (log-)likelihood function is not probability density and its specific form depends on the problem. For the vast majority of cases, the likelihood is assumed to be Gaussian.

Suppose that we want our likelihood function to be a Gaussian density with 10 parameters or in 10-D, we would do something like:

import numpy as np

# Define the dimensionality of our problem.
n_dim = 10

# Define our 3-D correlated multivariate normal log-likelihood.
C = np.identity(n_dim)
C[C==0] = 0.95
Cinv = np.linalg.inv(C)
lnorm = -0.5 * (np.log(2 * np.pi) * n_dim + np.log(np.linalg.det(C)))

def log_like(x):
    return -0.5 * np.dot(x, np.dot(Cinv, x)) + lnorm

The inclusion of the normalisation factor lnorm is not strictly necessary as it does not depend on x and thus does vary. This is a highly artificial scenario with no data or model. For an example of how to fit a model to some data using a Gaussian likelihood visit the Tutorials section.

Additional Arguments¶

If the log-likelihood function relies on additional arguments in the sequence log_likelihood(x, *args, **kwargs), then one can use the likelihood_args and likelihood_kwargs when initializing the sampler to provide them.

def log_likelihood(x, data, sigma, **kwargs):
    sigma_prime = kwargs.get("scale") * sigma
    return -0.5 * np.sum((x - data)**2 / sigma_prime**2)

import pocomc as pc
from scipy.stats import norm

n_dim = 5
prior = pc.Prior(n_dim * [norm(loc=0, scale=10)])

# Random data and sigma values
data = np.random.randn(n_dim)
sigma = np.ones(n_dim)

sampler = pc.Sampler(prior, log_likelihood, likelihood_args=(data, sigma), likelihood_kwargs={"scale": 1.0})

Vectorization¶

Sometimes it is possible to define a vectorized log-likelihood function that instead of accepting as input a single set of parameters (i.e., array of shape (n_dim,)) and returning the corresponding scalar value of the natural logarithm of likelihood, it takes as input an array of shape (N, n_dim) and returns an array of shape (N,). pocoMC can take advantage of the vectorization by setting vectorize=True. A simple example for a Gaussian likelihood of zero mean and unit variance is given below:

def log_likelihood(x):
    return -0.5 * np.sum(x**2, axis=1)

import pocomc as pc
from scipy.stats import norm

prior = pc.Prior(5 * [norm(loc=0, scale=10)])

sampler = pc.Sampler(prior, log_likelihood, vectorize=True)