Exercise: Why Bias Correction Changes Early Steps

Prerequisites: Adam Optimizer

Problem

Assume a constant scalar gradient $g_t = 2$. Explain why the uncorrected first moment $m_t$ underestimates the true gradient early in training, and compute $m_1$ and $m_2$ for $\beta_1 = 0.9$.

Hint

Jump to the solution when you're ready.
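
Once you have worked the numbers by hand, a short script can check them. The following is a minimal sketch, not a reference implementation: it assumes the usual Adam initialization $m_0 = 0$ and the standard update $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$, with bias correction $\hat{m}_t = m_t / (1 - \beta_1^t)$. The function name `first_moment` is a hypothetical helper introduced only for this illustration.

```python
def first_moment(grads, beta1=0.9):
    """Return (uncorrected, bias-corrected) first-moment estimates per step.

    Assumes m_0 = 0, as in the standard Adam initialization.
    """
    m = 0.0  # assumed initialization m_0 = 0
    raw, corrected = [], []
    for t, g in enumerate(grads, start=1):
        # Exponential moving average of the gradient (uncorrected).
        m = beta1 * m + (1.0 - beta1) * g
        raw.append(m)
        # Divide by (1 - beta1^t) to undo the pull toward the zero init.
        corrected.append(m / (1.0 - beta1 ** t))
    return raw, corrected


# Constant gradient g_t = 2 for the first two steps, as in the problem.
raw, corrected = first_moment([2.0, 2.0])
for t, (m, m_hat) in enumerate(zip(raw, corrected), start=1):
    print(f"t={t}: m_t={m:.4f}, corrected={m_hat:.4f}")
```

Compare the printed `m_t` values with your hand-computed $m_1$ and $m_2$, and note how the corrected column relates to the true gradient.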