Exercise: Adam on a Noisy Quadratic Loss

Prerequisites: Adam Optimizer

Problem

On a noisy quadratic loss, successive gradients alternate between 66 and 22, so the sign stays positive while the magnitude fluctuates. Explain qualitatively how Adam's first- and second-moment estimates shape the resulting parameter updates compared with plain gradient descent.
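To build intuition before answering, the alternating gradient sequence can be fed through both update rules and the step sizes compared. The following is only a minimal sketch under stated assumptions: the helper names `adam_steps` and `gd_steps` are hypothetical, the problem is treated as one-dimensional, and the hyperparameters (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are commonly used defaults that the exercise itself does not specify.

```python
import math


def adam_steps(grads, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the size of each Adam step for a scalar gradient sequence.

    Hyperparameter values are assumed defaults, not given by the exercise.
    """
    m = 0.0  # first moment: running mean of gradients
    v = 0.0  # second moment: running mean of squared gradients
    steps = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        steps.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return steps


def gd_steps(grads, lr=0.001):
    """Return the size of each plain gradient descent step."""
    return [lr * g for g in grads]


# Gradients alternate 66, 22, 66, 22, ... as in the problem statement.
grads = [66.0 if t % 2 == 0 else 22.0 for t in range(200)]

adam = adam_steps(grads)
gd = gd_steps(grads)

# Compare how much the step size swings late in the run for each method.
adam_ratio = max(adam[-20:]) / min(adam[-20:])
gd_ratio = max(gd[-20:]) / min(gd[-20:])
print(f"Adam: largest/smallest late step = {adam_ratio:.3f}")
print(f"GD:   largest/smallest late step = {gd_ratio:.3f}")
```

Comparing the two printed ratios shows how much each method's step size swings between the large-gradient and small-gradient iterations, which is the behaviour the question asks you to explain.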

Hint

Jump to the solution when you're ready.