Sampler

class pocomc.Sampler(prior: callable, likelihood: callable, n_dim: int = None, n_effective: int = 512, n_active: int = 256, likelihood_args: list = None, likelihood_kwargs: dict = None, vectorize: bool = False, blobs_dtype: str = None, periodic: list = None, reflective: list = None, transform: str = 'probit', pool=None, pytorch_threads=1, flow='nsf6', train_config: dict = None, train_frequency: int = None, precondition: bool = True, dynamic: bool = True, metric: str = 'ess', n_prior: int = None, sample: str = 'tpcn', n_steps: int = None, n_max_steps: int = None, resample: str = 'mult', output_dir: str = None, output_label: str = None, random_state: int = None, n_ess: int = None)

Preconditioned Monte Carlo class.

Parameters:
  • prior (callable) – Class implementing the prior distribution.

  • likelihood (callable) – Function returning the log likelihood of a set of parameters.

  • n_dim (int) – The total number of parameters/dimensions (Optional as it can be infered from the prior class).

  • n_effective (int) – The number of effective particles (default is n_effective=512). Higher values lead to more accurate results but also increase the computational cost. This should be set to a value that is large enough to ensure that the target distribution is well represented by the particles. The number of effective particles should be greater than the number of active particles. If n_effective=None, the default value is n_effective=2*n_active.

  • n_active (int) – The number of active particles (default is n_active=256). It must be smaller than n_effective. For best results, the number of active particles should be no more than half the number of effective particles. This is the number of particles that are evolved using MCMC at each iteration. If a pool is provided, the number of active particles should be a multiple of the number of processes in the pool to ensure efficient parallelisation. If n_active=None, the default value is n_active=n_effective//2.

  • likelihood_args (list) – Extra arguments to be passed to likelihood (default is likelihood_args=None). Example: likelihood_args=[data].

  • likelihood_kwargs (dict) – Extra arguments to be passed to likelihood (default is likelihood_kwargs=None). Example: likelihood_kwargs={"data": data}.

  • vectorize (bool) – If True, vectorize likelihood calculation (default is vectorize=False). If False, the likelihood is calculated for each particle individually. If vectorize=True, the likelihood is calculated for all particles simultaneously. This can lead to a significant speed-up if the likelihood function is computationally expensive. However, it requires that the likelihood function can handle arrays of shape (n_active, n_dim) as input and return an array of shape (n_active,) as output.

  • blobs_dtype (list) – Data type of the blobs returned by the likelihood function (default is blobs_dtype=None). If blobs_dtype is not provided, the data type is inferred from the blobs returned by the likelihood function. If the blobs are not of the same data type, they are converted to an object array. If the blobs are strings, the data type is set to object. If the blobs dtype is known in advance, it can be provided as a list of data types (e.g., blobs_dtype=[("blob_1", float), ("blob_2", int)]). Blobs can be used to store additional data returned by the likelihood function (e.g., chi-squared values, residuals, etc.). Blobs are stored as a structured array with named fields when the data type is provided. Currently, the blobs feature is not compatible with vectorized likelihood calculations.

  • periodic (list or None) – List of parameter indeces that should be wrapped around the domain (default is periodic=None). This can be useful for phase parameters that might be periodic e.g. on a range [0,2*np.pi]. For example, periodic=[0,1] will wrap around the first and second parameters.

  • reflective (list or None) – List of parameter indeces that should be reflected around the domain (default is reflective=None). This can arise in cases where parameters are ratios where a/b and b/a are equivalent. For example, reflective=[0,1] will reflect the first and second parameters.

  • transform (str) – Type of transformation to apply to bounded parameters (default is transform='probit'). Available options are 'probit' and 'logit'. See Reparameterize for more details.

  • pool (pool or int) – Number of processes to use for parallelisation (default is pool=None). If pool is an integer greater than 1, a multiprocessing pool is created with the specified number of processes (e.g., pool=8). If pool is an instance of mpi4py.futures.MPIPoolExecutor, the code runs in parallel using MPI. If a pool is provided, the number of active particles should be a multiple of the number of processes in the pool to ensure efficient parallelisation. If pool=None, the code runs in serial mode. When a pool is provided, please ensure that the likelihood function is picklable.

  • pytorch_threads (int) – Maximum number of threads to use for torch. If None torch uses all available threads while training the normalizing flow (default is pytorch_threads=1).

  • flow (zuko.flow.Flow or str) – Normalizing flow to use for preconditioning (default is flow='nsf6'). Available options are 'nsf3', 'nsf6', 'nsf12', 'maf3', 'maf6', and 'maf12'. ‘nsf’ stands for Neural Spline Flows and ‘maf’ stands for Masked Autoregressive Flows. The number indicates the number of transformations in the flow. More transformations lead to more flexibility but also increase the computational cost. If a zuko.flow.Flow instance is provided, the normalizing flow is used as is. If a string is provided, a new instance of the normalizing flow is created with the specified architecture. The normalizing flow is used to precondition the MCMC sampler and improve the efficiency of the sampling.

  • train_config (dict or None) – Configuration for training the normalizing flow (default is train_config=None). Options include a dictionary with the following keys: "validation_split", "epochs", "batch_size", "patience", "learning_rate", "annealing", "gaussian_scale", "laplace_scale", "noise", "shuffle", "clip_grad_norm".

  • train_frequency (int or None) – Frequency of training the normalizing flow (default is train_frequency=None). If train_frequency=None, the normalizing flow is trained every n_effective//n_active iterations. If train_frequency=1, the normalizing flow is trained at every iteration. If train_frequency>1, the normalizing flow is trained every train_frequency iterations.

  • precondition (bool) – If True, use preconditioned MCMC (default is precondition=True). If False, use standard MCMC without normalizing flow. The use of preconditioned MCMC is recommended as it is more efficient and scales better with the number of parameters. However, it requires the use of a normalizing flow and the training of the flow can be computationally expensive. If precondition=False, the normalizing flow is not used and the sampler runs in standard mode. This works well for targets that are not multimodal or have strong non-linear correlations between parameters.

  • dynamic (bool) – If True, dynamically adjust the effective sample size (ESS) threshold based on the number of unique particles (default is dynamic=True). This can be useful for targets with a large number of modes or strong non-linear correlations between parameters.

  • metric (str) – Metric used for determining the next temperature (beta) level (default is metric="ess"). Options are "ess" (Effective Sample Size) or "uss" (Unique Sample Size). The metric is used to determine the next temperature level based on the ESS or USS of the importance weights. If the ESS or USS of the importance weights is below the target threshold, the temperature is increased. If the ESS or USS is above the target threshold, the temperature is decreased. The target threshold is set by the n_effective parameter.

  • n_prior (int) – Number of prior samples to draw (default is n_prior=2*(n_effective//n_active)*n_active). This is used to initialise the particles at the beginning of the run. The prior samples are used to warm-up the sampler and ensure that the particles are well distributed across the prior volume.

  • sample (str) – Type of MCMC sampler to use (default is sample="tpcn"). Options are "pcn" (t-preconditioned Crank-Nicolson) or "rwm" (Random-walk Metropolis). t-preconditioned Crank-Nicolson is the default and recommended sampler for PMC as it is more efficient and scales better with the number of parameters.

  • n_steps (int) – Number of MCMC steps after logP plateau (default is n_steps=n_dim). This is used for early stopping of MCMC. Higher values can lead to better exploration but also increase the computational cost. If n_steps=None, the default value is n_steps=n_dim.

  • n_max_steps (int) – Maximum number of MCMC steps (default is n_max_steps=10*n_dim).

  • resample (str) – Resampling scheme to use (default is resample="mult"). Options are "syst" (systematic resampling) or "mult" (multinomial resampling).

  • output_dir (str or None) – Output directory for storing the state files of the sampler. Default is None which creates a states directory. Output files can be used to resume a run.

  • output_label (str or None) – Label used in state files. Defaullt is None which corresponds to "pmc". The saved states are named as "{output_dir}/{output_label}_{i}.state" where i is the iteration index. Output files can be used to resume a run.

  • random_state (int or None) – Initial random seed.

evidence()

Return the log evidence estimate and error.

load_state(path: str | Path)

Load state of sampler from file.

Parameters:

path (Union[str, Path]) – Path from which to load state.

posterior(resample=False, return_blobs=False, trim_importance_weights=True, return_logw=False, ess_trim=0.99, bins_trim=1000)

Return posterior samples.

Parameters:
  • resample (bool) – If True, resample particles (default is resample=False).

  • trim_importance_weights (bool) – If True, trim importance weights (default is trim_importance_weights=True).

  • return_logw (bool) – If True, return log importance weights (default is return_logw=False).

  • ess_trim (float) – Effective sample size threshold for trimming (default is ess_trim=0.99).

  • bins_trim (int) – Number of bins for trimming (default is bins_trim=1_000).

Returns:

  • samples (np.ndarray) – Samples from the posterior.

  • weights (np.ndarray) – Importance weights.

  • logl (np.ndarray) – Log likelihoods.

  • logp (np.ndarray) – Log priors.

property results

Return results.

Returns:

results – Dictionary containing the results.

Return type:

dict

run(n_total: int = 4096, n_evidence: int = 4096, progress: bool = True, resume_state_path: str | Path = None, save_every: int = None)

Run Preconditioned Monte Carlo.

Parameters:
  • n_total (int) – The total number of effectively independent samples to be collected (default is n_total=2048).

  • n_evidence (int) – The number of importance samples used to estimate the evidence (default is n_evidence=4096). If n_evidence=0, the evidence is not estimated using importance sampling and the SMC estimate is used instead. If preconditioned=False, the evidence is estimated using SMC and n_evidence is ignored.

  • progress (bool) – If True, print progress bar (default is progress=True).

  • resume_state_path (Union[str, Path]) – Path of state file used to resume a run. Default is None in which case the sampler does not load any previously saved states. An example of using this option to resume or continue a run is e.g. resume_state_path = "states/pmc_1.state".

  • save_every (int or None) – Argument which determines how often (i.e. every how many iterations) pocoMC saves state files to the output_dir directory. Default is None in which case no state files are stored during the run.

save_state(path: str | Path)

Save current state of sampler to file.

Parameters:

path (Union[str, Path]) – Path to save state.