Exercise: Adam on a Noisy Quadratic Loss
Prerequisites: Adam Optimizer
Problem
In a noisy quadratic loss, the gradients fluctuate from step to step around the same positive direction. Explain qualitatively how Adam's first-moment and second-moment estimates shape the resulting parameter updates, compared with plain gradient descent.
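Before you write your explanation, you can experiment with the setting using the minimal Python sketch below. It rests on assumptions that are not part of the exercise. The loss is taken to be the 1-D quadratic 0.5 * a * theta**2 with a = 1. The noise is modelled as a term that deterministically alternates between +sigma and -sigma, with sigma = 0.8, so the gradient stays positive but its size swings every step. Adam is the standard bias-corrected update with common default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The helper names `noisy_grad`, `run_gd`, `run_adam` and `relative_jitter` are hypothetical and introduced only for illustration. The sketch compares how much consecutive updates jump around, and how each optimizer reacts when every gradient is multiplied by 100.

```python
import math


def noisy_grad(theta, t, a=1.0, sigma=0.8, scale=1.0):
    """Gradient of the assumed loss 0.5 * a * theta**2 plus noise alternating +sigma / -sigma.

    While a * theta > sigma the gradient stays positive, but its size swings every step.
    `scale` multiplies the whole gradient (used to probe sensitivity to gradient scale).
    """
    return scale * (a * theta + sigma * (-1) ** t)


def run_gd(theta, lr=0.1, steps=10, scale=1.0):
    """Plain gradient descent; returns the list of per-step parameter updates."""
    updates = []
    for t in range(1, steps + 1):
        step = -lr * noisy_grad(theta, t, scale=scale)  # update is proportional to the raw gradient
        theta += step
        updates.append(step)
    return updates


def run_adam(theta, lr=0.1, steps=10, beta1=0.9, beta2=0.999, eps=1e-8, scale=1.0):
    """Adam with bias correction; returns the list of per-step parameter updates."""
    m = v = 0.0
    updates = []
    for t in range(1, steps + 1):
        g = noisy_grad(theta, t, scale=scale)
        m = beta1 * m + (1 - beta1) * g      # first moment: EMA of g, averages out the alternation
        v = beta2 * v + (1 - beta2) * g * g  # second moment: EMA of g**2, tracks the gradient scale
        m_hat = m / (1 - beta1 ** t)         # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        step = -lr * m_hat / (math.sqrt(v_hat) + eps)
        theta += step
        updates.append(step)
    return updates


def relative_jitter(updates):
    """Mean |u_t - u_{t-1}| divided by mean |u_t|; lower means smoother updates."""
    diffs = [abs(b - a) for a, b in zip(updates, updates[1:])]
    return (sum(diffs) / len(diffs)) / (sum(abs(u) for u in updates) / len(updates))


gd = run_gd(5.0)
adam = run_adam(5.0)
print(f"GD   relative jitter: {relative_jitter(gd):.3f}")
print(f"Adam relative jitter: {relative_jitter(adam):.3f}")
# Multiply every gradient by 100 and compare the first update of each optimizer.
print(f"GD   first step, scale 1 vs 100: {gd[0]:.4f} vs {run_gd(5.0, scale=100.0)[0]:.4f}")
print(f"Adam first step, scale 1 vs 100: {adam[0]:.4f} vs {run_adam(5.0, scale=100.0)[0]:.4f}")
```

In this toy setup, both optimizers always step in the same (negative) direction. However, gradient descent's step size tracks each noisy gradient, whereas Adam's averaged first moment makes consecutive steps much more uniform. Dividing by the square root of the second moment also keeps Adam's step near `lr`, regardless of how large the gradients are. Treat this as a probe of the question, not as the reference solution.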