Normalizing Flow¶

Normalizing Flow Preconditioning¶

The source of the high sampling of efficiency and flexibility of pocoMC is its advanced preconditioning strategy. Preconditioning is Preconditioning is a technique used to make hard problems easier to solve. The main idea is to transform the original problem into a new one that is easier to solve. When the problem is a sampling problem (e.g., sampling from a probability distribution), preconditioning can be used to transform the original distribution into a new one that is easier to sample from (e.g., a distribution that is closer to the normal distribution).

To transform an arbitrary, often complex, probability distribution into a simple one, we need to define a flexible invertible transformation that can be applied to the complex distribution. This transformation is called the normalizing flow. The normalizing flow is a sequence of invertible transformations that map a simple distribution to a complex distribution. The normalizing flow is a powerful tool for generative modeling, density estimation, and variational inference.

pocoMC supports a plethora of normalizing flows implemented through the zuko package. The user can choose either a predefined flow or define their own flow.

Predefined Flows¶

The predefined flows are of two types:

Masked Autoregressive Flows (MAF): Masked Autoregressive Flow (MAF) is a type of normalizing flow that utilizes autoregressive models to parameterize the transformation from a simple base distribution to a more complex target distribution. It achieves this by applying a series of invertible transformations, each conditioned on previous variables in an autoregressive manner. The main advantage of MAF is its ability to efficiently compute the log-likelihood of the transformed data due to its autoregressive structure, which allows for exact likelihood evaluation. This makes MAF particularly useful for density estimation and generative modeling tasks where likelihood-based training is crucial.
Neural Spline Flows (NSF): Neural Spline Flow (NSF) extends the concept of normalizing flows by using neural networks to parameterize piecewise monotonic rational quadratic splines as the invertible transformations. These splines provide a flexible way to model complex distributions while ensuring smooth and differentiable transformations. NSF combines the expressive power of neural networks with the efficiency of spline-based transformations, allowing for efficient sampling and exact likelihood computation. This makes NSF particularly effective for modeling high-dimensional data with complex, multimodal distributions, enhancing the flexibility and accuracy of normalizing flow-based generative models.

The predefined MAF and NSF flows are 'maf3', 'maf6', 'maf12', 'nsf3', 'nsf6', and 'nsf12'. By default, pocoMC uses the 'nsf6' flow, meaning a Neural Spline Flow with 6 transformations. This balances flexibility and computational cost. The user can change the flow by setting the flow parameter in the pocoMC Sampler class as follows:

sampler = pc.Sampler(prior, likelihood, flow='maf12')

Custom Flows¶

The user can also define their own normalizing flow. This is done by creating a flow using the zuko package and passing it to the Sampler class. For example, the following code creates a MAF flow with 10 transformations, 3-layered neural networks, 128 hidden units per layer, and residual connections between layers:

import zuko

flow = zuko.flows.MAF(n_dim, # Number of dimensions of the posterior
                      transforms=10, 
                      hidden_features=[128] * 3,
                      residual=True,)

sampler = pc.Sampler(prior, likelihood, flow=flow)

The advantage of defining a custom flow is that the user can tailor the flow to their specific problem. The disadvantage is that the user must have a good understanding of the normalizing flow architecture and how it affects the sampling process. The predefined flows are designed to be flexible and easy to use, so the user should only define a custom flow if they have a good reason to do so.

Training¶

Training Configuration¶

The flow is trained in each iteration of the sampler automatically. The training process is quite quick due to the fact that the flow is not trained from scratch in each iteration, but rather the training is continued from the previous iteration. The user can control the training configuration by passing a dictionary with the desired configuration to the train_config argument of the Sampler class. The default configuration is:

train_config = dict(validation_split=0.5, # Fraction of the data to use for validation
                    epochs=5000, # Maximum number of epochs to train for
                    batch_size=np.minimum(n_effective//2, 512), # Batch size
                    patience=n_dim, # Number of epochs to wait before early stopping
                    learning_rate=1e-3, # Learning rate
                    annealing=False, # Whether to use a learning rate schedule
                    gaussian_scale=None, # Standard deviation of the Gaussian prior on the weights used for regularization
                    laplace_scale=None, # Scale of the Laplace prior on the weights used for regularization
                    noise=None, # Standard deviation of the Gaussian noise added to the input data
                    shuffle=True, # Whether to shuffle the data before training
                    clip_grad_norm=1.0, # Maximum norm of the gradient
                    verbose=0, # Verbosity level
                    )

We do not recommend changing the default training configuration unless you are familiar with the training process and the impact of the hyperparameters on the results. The default configuration is designed to work well for a wide range of problems. If you do want to change the configuration, we recommend starting with the default values and only changing one hyperparameter at a time.

Training Frequency¶

The normalizing flow is not always trained in every iteration. Instead, this is controlled by the train_frequency parameter. By default, the value of this parameter is None and the training frequency is determined by the number of effective and active particles respectively as follows:

\[ f = \max\left( \frac{n_{\text{effective}}}{2\times n_{\text{active}}} , 1\right) \]

This means that for the default values n_effective=512 and n_active=256, we train the flow every iteration. However, for larger number of effective particles, or equivalently smaller number of active particles, the normalizing flow is trained more sparsely.

The user can also enter an integer value to train_frequency to specify exactly how often training occurs. The only exemption is when beta=1.0, when training occurs in every iteration.