Bases: BaseScheduler
Scheduler with cosine annealing.
This scheduler implements cosine annealing, which provides a smooth transition
from the start value to the end value following a cosine curve. Cosine annealing
is popular in deep learning because the value changes slowly at the start and
end of the schedule and fastest in the middle, which can help with convergence.
Mathematical Formula
\[v(t) = \begin{cases}
v_{end} + (v_0 - v_{end}) \times \frac{1 + \cos(\pi t/T)}{2}, & \text{if } t < T \\
v_{end}, & \text{if } t \geq T
\end{cases}\]
where:
- \(v_0\) is the start_value
- \(v_{end}\) is the end_value
- \(T\) is n_steps
- \(t\) is the current step count
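As a quick sanity check of the formula, here is a standalone sketch that re-evaluates the expression directly with `math.cos` (this reimplements the equation above; it does not use the library code):

```python
import math

v0, v_end, T = 0.1, 0.001, 100

def v(t: int) -> float:
    # Piecewise cosine schedule from the formula above
    if t >= T:
        return v_end
    return v_end + (v0 - v_end) * (1 + math.cos(math.pi * t / T)) / 2

print(v(0))    # ~0.1: starts at v0, since cos(0) = 1
print(v(50))   # 0.0505: the midpoint is the average of v0 and v_end
print(v(100))  # 0.001: clamped at v_end for t >= T
```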
Cosine Curve Properties
The cosine function traces a smooth S-shaped curve: decay is slow near the
start, fastest at the midpoint, and slows again as the value approaches the
end value, so there are no abrupt jumps at either boundary.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `start_value` | `float` | Starting parameter value. | *required* |
| `end_value` | `float` | Target parameter value. | *required* |
| `n_steps` | `int` | Number of steps to reach the final value. | *required* |
Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `n_steps` is not positive. |
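Constructing the scheduler with a non-positive step count fails immediately. A minimal sketch (the import path is an assumption based on the source file location; adjust it to match your install):

```python
from torchebm.core.base_scheduler import CosineScheduler  # assumed import path

try:
    CosineScheduler(start_value=0.1, end_value=0.001, n_steps=0)
except ValueError as err:
    print(err)  # n_steps must be a positive integer, got 0
```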
Step Size Annealing

```python
scheduler = CosineScheduler(start_value=0.1, end_value=0.001, n_steps=100)
values = []
for i in range(10):
    value = scheduler.step()
    values.append(value)
    if i < 3:  # Show first few values
        print(f"Step {i+1}: {value:.6f}")
# Shows smooth decay: 0.099951, 0.099606, 0.098866, ...
```
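To see the full S-curve rather than just the first few steps, the values can be collected and plotted (a sketch assuming matplotlib is available):

```python
import matplotlib.pyplot as plt

scheduler = CosineScheduler(start_value=0.1, end_value=0.001, n_steps=100)
values = [scheduler.step() for _ in range(100)]

plt.plot(values)
plt.xlabel("step")
plt.ylabel("scheduled value")
plt.title("Cosine annealing: slow ends, fast middle")
plt.show()
```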
Learning Rate Scheduling

```python
lr_scheduler = CosineScheduler(
    start_value=0.01, end_value=0.0001, n_steps=1000
)

# In training loop
for epoch in range(1000):
    lr = lr_scheduler.step()
    # Update optimizer learning rate
```
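One way to wire the loop above into a PyTorch optimizer is to overwrite each param group's learning rate before the update step. A sketch (the model and the training step are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lr_scheduler = CosineScheduler(start_value=0.01, end_value=0.0001, n_steps=1000)

for epoch in range(1000):
    lr = lr_scheduler.step()
    for group in optimizer.param_groups:
        group["lr"] = lr  # apply the annealed learning rate
    # ... forward pass, loss.backward(), optimizer.step() ...
```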
Noise Scale Annealing

```python
noise_scheduler = CosineScheduler(
    start_value=1.0, end_value=0.01, n_steps=500
)
sampler = LangevinDynamics(
    energy_function=energy_fn,
    step_size=0.01,
    noise_scale=noise_scheduler
)
```
Source code in torchebm/core/base_scheduler.py
````python
class CosineScheduler(BaseScheduler):
    r"""
    Scheduler with cosine annealing.

    This scheduler implements cosine annealing, which provides a smooth transition
    from the start value to the end value following a cosine curve. Cosine annealing
    is popular in deep learning because the value changes slowly at the start and
    end of the schedule and fastest in the middle, which can help with convergence.

    !!! info "Mathematical Formula"
        $$v(t) = \begin{cases}
        v_{end} + (v_0 - v_{end}) \times \frac{1 + \cos(\pi t/T)}{2}, & \text{if } t < T \\
        v_{end}, & \text{if } t \geq T
        \end{cases}$$

        where:

        - \(v_0\) is the start_value
        - \(v_{end}\) is the end_value
        - \(T\) is n_steps
        - \(t\) is the current step count

    !!! note "Cosine Curve Properties"
        The cosine function traces a smooth S-shaped curve: decay is slow near the
        start, fastest at the midpoint, and slows again as the value approaches the
        end value.

    Args:
        start_value (float): Starting parameter value.
        end_value (float): Target parameter value.
        n_steps (int): Number of steps to reach the final value.

    Raises:
        ValueError: If n_steps is not positive.

    !!! example "Step Size Annealing"
        ```python
        scheduler = CosineScheduler(start_value=0.1, end_value=0.001, n_steps=100)
        values = []
        for i in range(10):
            value = scheduler.step()
            values.append(value)
            if i < 3:  # Show first few values
                print(f"Step {i+1}: {value:.6f}")
        # Shows smooth decay: 0.099951, 0.099606, 0.098866, ...
        ```

    !!! tip "Learning Rate Scheduling"
        ```python
        lr_scheduler = CosineScheduler(
            start_value=0.01, end_value=0.0001, n_steps=1000
        )

        # In training loop
        for epoch in range(1000):
            lr = lr_scheduler.step()
            # Update optimizer learning rate
        ```

    !!! example "Noise Scale Annealing"
        ```python
        noise_scheduler = CosineScheduler(
            start_value=1.0, end_value=0.01, n_steps=500
        )
        sampler = LangevinDynamics(
            energy_function=energy_fn,
            step_size=0.01,
            noise_scale=noise_scheduler
        )
        ```
    """

    def __init__(self, start_value: float, end_value: float, n_steps: int):
        r"""
        Initialize the cosine scheduler.

        Args:
            start_value (float): Starting parameter value.
            end_value (float): Target parameter value.
            n_steps (int): Number of steps to reach the final value.

        Raises:
            ValueError: If n_steps is not positive.
        """
        super().__init__(start_value)
        if n_steps <= 0:
            raise ValueError(f"n_steps must be a positive integer, got {n_steps}")
        self.end_value = end_value
        self.n_steps = n_steps

    def _compute_value(self) -> float:
        r"""
        Compute the cosine annealed value.

        Returns:
            float: The annealed value following the cosine schedule.
        """
        if self.step_count >= self.n_steps:
            return self.end_value
        else:
            # Cosine schedule from start_value to end_value
            progress = self.step_count / self.n_steps
            cosine_factor = 0.5 * (1 + math.cos(math.pi * progress))
            return self.end_value + (self.start_value - self.end_value) * cosine_factor
````
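The class is a thin template over `BaseScheduler`: `__init__` validates and stores the endpoints, and `_compute_value` is the hook that produces each new value. A custom schedule can follow the same pattern; the `ExponentialDecayScheduler` below is a hypothetical illustration, not part of torchebm:

```python
import math

class ExponentialDecayScheduler(BaseScheduler):
    """Hypothetical scheduler: v(t) = v_end + (v_0 - v_end) * exp(-rate * t)."""

    def __init__(self, start_value: float, end_value: float, rate: float):
        super().__init__(start_value)
        if rate <= 0:
            raise ValueError(f"rate must be positive, got {rate}")
        self.end_value = end_value
        self.rate = rate

    def _compute_value(self) -> float:
        # Exponential approach toward end_value as step_count grows
        decay = math.exp(-self.rate * self.step_count)
        return self.end_value + (self.start_value - self.end_value) * decay
```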
Attributes:

- `end_value` (instance attribute): target parameter value.
- `n_steps` (instance attribute): number of steps to reach the final value.
_compute_value

```python
_compute_value() -> float
```

Compute the cosine annealed value.

Returns:

| Type | Description |
| --- | --- |
| `float` | The annealed value following the cosine schedule. |
Source code in torchebm/core/base_scheduler.py
```python
def _compute_value(self) -> float:
    r"""
    Compute the cosine annealed value.

    Returns:
        float: The annealed value following the cosine schedule.
    """
    if self.step_count >= self.n_steps:
        return self.end_value
    else:
        # Cosine schedule from start_value to end_value
        progress = self.step_count / self.n_steps
        cosine_factor = 0.5 * (1 + math.cos(math.pi * progress))
        return self.end_value + (self.start_value - self.end_value) * cosine_factor
```
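Note the clamp in the first branch: once `step_count` reaches `n_steps`, every further call returns `end_value` exactly, so stepping past the horizon is safe:

```python
scheduler = CosineScheduler(start_value=0.1, end_value=0.001, n_steps=5)
for _ in range(8):  # deliberately step past the 5-step horizon
    value = scheduler.step()
print(value)  # 0.001: pinned at end_value once step_count >= n_steps
```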