Exercise: Why Bias Correction Changes Early Steps
Prerequisites: Adam Optimizer
Problem
Assume a constant scalar gradient . Explain why the uncorrected first moment underestimates the true gradient early in training and compute and for .
Hint
Jump to the solution when you're ready.