Solution: Adam on a Noisy Quadratic Loss
Exercise: Adam on a Noisy Quadratic Loss
Solution
The first moment smooths into a persistent positive direction, so the update does not whipsaw as much as the raw gradient. The second moment records that this coordinate has relatively large gradient scale, so Adam divides by a larger and moderates the step.
Plain gradient descent would take steps proportional to each observed gradient, alternating large and smaller moves.
Takeaways
- Momentum stabilises direction.
- Second-moment scaling moderates high-variance coordinates.
- Adam is useful when stochastic gradients are noisy but directionally informative.