Benchmarks¶

Automated performance benchmarks for every TorchEBM module. Results are generated by the benchmark suite and rendered here automatically at documentation build time.

Quick Start¶

First Run (Baseline)After OptimizationQuick TestCI Mode

1	`bash benchmarks/run.sh`

Captures baseline performance numbers.

1	`bash benchmarks/run.sh`

Automatically compares against baseline and generates report.

1	`bash benchmarks/run.sh --quick`

Small-scale only (~1 min).

1	`bash benchmarks/run.sh --ci`

Exits non-zero on regression > 5%.

Coverage¶

Losses

Exact SM, Approx SM, Sliced SM, Contrastive Divergence, Equilibrium Matching
Samplers

Langevin Dynamics, HMC, Flow ODE
Integrators

Leapfrog, Euler-Maruyama, RK4, DOPRI5 (adaptive)
Models

Transformer forward, Transformer forward+backward
Interpolants

Linear, Cosine, Variance-Preserving

Scale Configurations¶

Scale	Batch Size	Dimensions	Steps
`small`	64	8	50
`medium`	256	32	100
`large`	1024	128	200

Measurement Methodology¶

Aspect	GPU	CPU
Timer	`torch.cuda.Event`	`time.perf_counter`
Warm-up	10 iterations	3 iterations
Measured	50 iterations	20 iterations
Memory	`cuda.max_memory_allocated`	N/A
Statistics	median, mean, std, IQR, min, max	same
Comparison	geometric mean speedup	same