Langevin_dynamics¶

Contents¶

Classes¶

LangevinDynamics - Langevin Dynamics sampler implementing discretized gradient-based MCMC.

API Reference¶

torchebm.samplers.langevin_dynamics ¶

Langevin Dynamics Sampler Module.

This module implements Langevin Dynamics, a gradient-based Markov Chain Monte Carlo (MCMC) method. It discretizes the Langevin stochastic differential equation to draw samples from a target distribution that is known only up to a normalizing constant, which makes it a lightweight option for Bayesian inference and generative modeling.

Key Features

Gradient-based sampling with stochastic updates.
Customizable step sizes and noise scales for flexible tuning.
Optional diagnostics and trajectory tracking for analysis.

Module Components¶

Classes:

Name	Description
`LangevinDynamics`	Core class implementing the Langevin Dynamics sampler.

Usage Example¶

Sampling from a Custom Energy Function

# Import path follows the module name documented on this page
from torchebm.samplers.langevin_dynamics import LangevinDynamics
from torchebm.energy_functions.energy_function import GaussianEnergy
import torch

# Define a 2D Gaussian energy function
energy_fn = GaussianEnergy(mean=torch.zeros(2), cov=torch.eye(2))

# Initialize Langevin sampler
sampler = LangevinDynamics(energy_fn, step_size=0.01, noise_scale=0.1)

# Starting points for 5 chains
initial_state = torch.randn(5, 2)

# Run sampling
samples, diagnostics = sampler.sample_chain(
    x=initial_state, k_steps=100, n_samples=5, return_diagnostics=True
)
print(f"Samples shape: {samples.shape}")
# Assumes diagnostics is returned as a tensor; adjust if your version returns another type
print(f"Diagnostics shape: {diagnostics.shape}")

Mathematical Foundations¶

Langevin Dynamics Overview

Langevin Dynamics simulates a stochastic process governed by the Langevin equation. For a state \( x_t \), the discretized update rule is:

\[ x_{t+1} = x_t - \eta \nabla U(x_t) + \sqrt{2\eta} \epsilon_t \]

\( U(x) \): Potential energy, where \( U(x) = -\log p(x) \) up to an additive constant and \( p(x) \) is the target distribution.
\( \eta \): Step size, which scales both the gradient step and the injected noise.
\( \epsilon_t \sim \mathcal{N}(0, I) \): Gaussian noise introducing stochasticity.

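With a sufficiently small step size and enough iterations, the samples approximately follow the Boltzmann distribution (the discretization introduces a bias that shrinks as \( \eta \to 0 \)):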

\[ p(x) \propto e^{-U(x)} \]
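
The update rule above can be sketched in a few lines of plain Python. This is an illustration only, not the torchebm implementation: the names `grad_U`, `langevin_step`, and `run_chain` are hypothetical, and the target is assumed to be a 1D standard normal, so \( U(x) = x^2/2 \) and \( \nabla U(x) = x \). For this particular target the discretized chain settles at a variance of \( 1/(1 - \eta/2) \) rather than exactly 1, which makes the step-size bias easy to see.

```python
import math
import random


def grad_U(x):
    """Gradient of the assumed energy U(x) = x**2 / 2 (standard normal target)."""
    return x


def langevin_step(x, eta, rng):
    """One discretized update: x - eta * grad U(x) + sqrt(2 * eta) * noise."""
    return x - eta * grad_U(x) + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)


def run_chain(x0, eta, n_steps, seed=0):
    """Run a single chain and return every visited state."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = langevin_step(x, eta, rng)
        samples.append(x)
    return samples


eta = 0.05
samples = run_chain(x0=3.0, eta=eta, n_steps=50_000)
kept = samples[1_000:]  # discard the first states as burn-in

mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"sample mean:     {mean:.3f}")
print(f"sample variance: {var:.3f}")
print(f"variance predicted for this eta: {1.0 / (1.0 - eta / 2.0):.3f}")
```

Shrinking `eta` moves the predicted variance toward 1, at the cost of more steps to reach stationarity.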

Why Use Langevin Dynamics?

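Simplicity: Each update needs one gradient evaluation and, as written above, no accept/reject step, so an iteration is typically cheaper than an HMC iteration, which integrates a multi-step trajectory.
Exploration: The noise term helps the sampler escape local minima of the energy, although well-separated modes can still trap it.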
Flexibility: Applicable to a wide range of energy-based models and score-based generative tasks.

Practical Considerations¶

Parameter Tuning Guide

Step Size (\(\eta\)):
- Too large: Instability and divergence
- Too small: Slow convergence
- Rule of thumb: Start with \(\eta\) between \(10^{-5}\) and \(10^{-3}\)
Noise Scale (\(\beta^{-1/2}\), where \(\beta\) is the inverse temperature):
- Controls exploration-exploitation tradeoff
- Higher values help escape local minima
Decay Rate (future implementation):
- Momentum-like term for accelerated convergence
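
The step-size trade-off can be seen on a toy problem. The sketch below is plain Python and does not use torchebm; `final_state` is a hypothetical helper, and the energy is assumed to be \( U(x) = x^2/2 \), for which each update multiplies the current state by \( (1 - \eta) \) before adding noise. Three step sizes are compared from the same far-away starting point.

```python
import math
import random


def final_state(eta, n_steps=200, x0=10.0, seed=0):
    """Run the discretized update on U(x) = x**2 / 2 and return the last state."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        # grad U(x) = x for this assumed quadratic energy
        x = x - eta * x + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
    return x


# Small, moderate, and overly large step sizes
results = {eta: final_state(eta) for eta in (1e-3, 1e-1, 2.5)}
for eta, x in results.items():
    print(f"eta={eta:g}: |x| after 200 steps = {abs(x):.3g}")
```

With the smallest step the chain is still close to its starting point after 200 steps, the moderate step has reached the bulk of the target, and the largest step diverges because \( |1 - \eta| > 1 \). The threshold \( \eta < 2 \) is specific to this unit-curvature quadratic; steeper energies need proportionally smaller steps.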

Diagnostics Interpretation

Use return_diagnostics=True to monitor:
- Mean/Variance: Track distribution stationarity
- Energy Gradients: Check for vanishing/exploding gradients
- Autocorrelation: Assess mixing efficiency
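
The same quantities can also be computed by hand from a stored trajectory. The sketch below is plain Python and makes no assumption about the layout of the diagnostics returned by the sampler; `make_chain`, `mean_and_variance`, and `autocorrelation` are hypothetical helpers, and the chain is generated on an assumed standard normal target so the example is self-contained.

```python
import math
import random


def make_chain(eta=0.1, n_steps=20_000, seed=0):
    """Langevin chain on the assumed energy U(x) = x**2 / 2."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n_steps):
        x = x - eta * x + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def mean_and_variance(values):
    """Sample mean and (biased) sample variance of a sequence."""
    m = sum(values) / len(values)
    v = sum((s - m) ** 2 for s in values) / len(values)
    return m, v


def autocorrelation(chain, lag):
    """Normalized autocovariance of the chain at the given lag."""
    n = len(chain)
    m, v = mean_and_variance(chain)
    cov = sum((chain[i] - m) * (chain[i + lag] - m) for i in range(n - lag)) / n
    return cov / v


chain = make_chain()
half = len(chain) // 2

# Stationarity check: both halves should give similar statistics
first_half = mean_and_variance(chain[:half])
second_half = mean_and_variance(chain[half:])
print(f"first half  mean/var: {first_half[0]:.3f} / {first_half[1]:.3f}")
print(f"second half mean/var: {second_half[0]:.3f} / {second_half[1]:.3f}")

# Mixing check: autocorrelation should decay as the lag grows
acf = {lag: autocorrelation(chain, lag) for lag in (1, 10, 50)}
for lag, value in acf.items():
    print(f"autocorrelation at lag {lag}: {value:.3f}")
```

Similar statistics in both halves suggest the chain has reached stationarity, and an autocorrelation that stays high at large lags suggests slow mixing, in which case thinning or a larger step size may help.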

When to Choose Langevin Over HMC?

Criterion	Langevin	HMC
Computational Cost	Lower	Higher
Tuning Complexity	Simpler	More involved
High Dimensions	Efficient	More efficient
Multimodal Targets	May need annealing	Better exploration

How to Diagnose Sampling?

Check diagnostics for:
- Sample mean and variance convergence.
- Gradient magnitudes (should stabilize).
- Energy trends over iterations.
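
As a rough illustration of the energy and gradient checks, the sketch below records both quantities along a chain and compares an early window with a late one. It is plain Python rather than torchebm code; `trace_chain` and `window_mean` are hypothetical helpers, and the energy is assumed to be \( U(x) = x^2/2 \) with the chain deliberately started far from the mode.

```python
import math
import random


def U(x):
    """Assumed energy: U(x) = x**2 / 2."""
    return 0.5 * x * x


def grad_U(x):
    """Gradient of the assumed energy."""
    return x


def trace_chain(x0, eta, n_steps, seed=0):
    """Run a chain and record the energy and gradient magnitude at each step."""
    rng = random.Random(seed)
    x = x0
    energies, grad_norms = [], []
    for _ in range(n_steps):
        x = x - eta * grad_U(x) + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        energies.append(U(x))
        grad_norms.append(abs(grad_U(x)))
    return energies, grad_norms


def window_mean(values, start, size=100):
    """Average of `size` consecutive entries beginning at `start`."""
    window = values[start:start + size]
    return sum(window) / len(window)


energies, grad_norms = trace_chain(x0=10.0, eta=0.05, n_steps=2_000)
early_energy = window_mean(energies, 0)
late_energy = window_mean(energies, len(energies) - 100)
late_grad = window_mean(grad_norms, len(grad_norms) - 100)

print(f"mean energy, first 100 steps: {early_energy:.3f}")
print(f"mean energy, last 100 steps:  {late_energy:.3f}")
print(f"first gradient magnitude:     {grad_norms[0]:.3f}")
print(f"mean gradient, last 100:      {late_grad:.3f}")
```

A chain that has settled shows energy and gradient magnitude fluctuating around a stable level; a steady upward drift in either is a sign that the step size is too large.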