CONTENTS

Solution: Adam on a Noisy Quadratic Loss

Solution

The first moment smooths 6,2,6,2,6,2,6,2,\ldots into a persistent positive direction, so the update does not whipsaw as much as the raw gradient. The second moment records that this coordinate has relatively large gradient scale, so Adam divides by a larger v^t\sqrt{\hat v_t} and moderates the step.

Plain gradient descent would take steps proportional to each observed gradient, alternating large and smaller moves.

Takeaways

  • Momentum stabilises direction.
  • Second-moment scaling moderates high-variance coordinates.
  • Adam is useful when stochastic gradients are noisy but directionally informative.
Solution - Adam on a Noisy Quadratic Loss | q4quant.studio