Moment Generating Functions

Motivation: why this matters in quant finance

The moment generating function packages a distribution's moments into one expectation:

M_X(s)=\mathbb{E}[e^{sX}].

In Black-Scholes calculations, this is not decorative. The identity

\mathbb{E}[e^{\sigma W_T}]=e^{\sigma^2T/2}

is an MGF evaluation for a normal random variable. It is the exponential-moment calculation behind lognormal prices, convexity corrections, and the $-\sigma^2/2$ drift adjustment in geometric Brownian motion.

Bertsekas and Tsitsiklis present transforms as alternative representations of probability laws: not especially intuitive at first, but powerful for computing moments, identifying distributions, and handling sums of independent variables. This lesson follows that route. The MGF is useful precisely when exponential moments exist; when they do not, the nearby characteristic functions lesson takes over.

The informal idea

Expand the exponential:

e^{sX}=1+sX+\frac{s^2X^2}{2!}+\frac{s^3X^3}{3!}+\cdots.

Taking expectations gives

M_X(s)=1+s\mathbb{E}[X]+\frac{s^2}{2!}\mathbb{E}[X^2]+\frac{s^3}{3!}\mathbb{E}[X^3]+\cdots.

So the MGF is a generating function for moments. Differentiating at zero reads the moments back out. The catch is existence: $\mathbb{E}[e^{sX}]$ may be infinite away from zero.
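The existence caveat can be seen numerically. The sketch below assumes an Exponential(1) variable, an illustrative choice that is not used elsewhere in this lesson. Its moments are $\mathbb{E}[X^k]=k!$, so the moment series collapses to $\sum_k s^k$, which converges to $1/(1-s)$ only for $|s|<1$. The helper name `moment_series` is hypothetical.

```python
import math

def moment_series(s, moment, n_terms):
    """Partial sum of sum_k s^k * E[X^k] / k! using the supplied moment function."""
    return sum(s**k * moment(k) / math.factorial(k) for k in range(n_terms))

# Assumption: X ~ Exponential(1), for which E[X^k] = k!.
exp_moment = math.factorial

# Inside the radius of convergence the series matches the closed form 1 / (1 - s).
inside = moment_series(0.5, exp_moment, 60)
closed_form = 1.0 / (1.0 - 0.5)

# Outside it, partial sums keep growing as more terms are added.
outside_20 = moment_series(1.5, exp_moment, 20)
outside_40 = moment_series(1.5, exp_moment, 40)

print(inside, closed_form)
print(outside_20, outside_40)
```

At $s=0.5$ the partial sums settle on the closed form, while at $s=1.5$ they grow without bound.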

Formal definition

The moment generating function of a random variable $X$ is

M_X(s)=\mathbb{E}[e^{sX}],

defined for every real $s$ at which the expectation is finite. Its domain is

D_X=\{s\in\mathbb{R}:M_X(s)<\infty\}.

If $D_X$ contains an open interval around $0$, the MGF is especially useful: it determines the distribution, and its derivatives at zero give the moments.

For a discrete random variable,

M_X(s)=\sum_x e^{sx}p_X(x).

For a continuous random variable with density,

M_X(s)=\int_{-\infty}^{\infty} e^{sx}f_X(x)\,dx.

Key properties

Moment reading

If the MGF exists in a neighbourhood of zero, then

M_X^{(k)}(0)=\mathbb{E}[X^k].

In particular,

\mathbb{E}[X]=M_X'(0), \qquad \text{Var}(X)=M_X''(0)-\left(M_X'(0)\right)^2.

Affine transformations

If $Y=aX+b$, then

M_Y(s)=e^{sb}M_X(as).

Bertsekas and Tsitsiklis use this to derive the MGF of a general normal from that of the standard normal.
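As a short worked sketch, the affine rule follows directly from the definition, for any $s$ with $as\in D_X$:

M_Y(s)=\mathbb{E}[e^{s(aX+b)}]=e^{sb}\,\mathbb{E}[e^{(as)X}]=e^{sb}M_X(as).

Writing $Z$ for a standard normal variable (a symbol introduced here for illustration) with $M_Z(s)=e^{s^2/2}$, and setting $X=\mu+\sigma Z$, the rule gives

M_X(s)=e^{s\mu}M_Z(\sigma s)=\exp\left(\mu s+\frac12\sigma^2s^2\right).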

Independent sums

If $X$ and $Y$ are independent, then

M_{X+Y}(s)=M_X(s)M_Y(s).

This is why MGFs make sums of independent Poisson, binomial, and normal variables easy.
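The product rule can be checked exactly for small discrete distributions. The sketch below assumes a fair die and a fair coin as the two independent variables. It builds the distribution of their sum by convolution, then compares the MGF of the sum with the product of the individual MGFs. The helper names `mgf` and `convolve` are hypothetical.

```python
import math
from collections import defaultdict

def mgf(pmf, s):
    """MGF of a discrete distribution given as {value: probability}."""
    return sum(p * math.exp(s * x) for x, p in pmf.items())

def convolve(pmf_x, pmf_y):
    """Distribution of X + Y for independent X and Y."""
    out = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)

# Assumed example distributions: a fair die and a fair coin.
die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}

total = convolve(die, coin)
s = 0.3
lhs = mgf(total, s)               # M_{X+Y}(s)
rhs = mgf(die, s) * mgf(coin, s)  # M_X(s) * M_Y(s)
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding.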

Uniqueness

If $M_X(s)=M_Y(s)$ on an open interval around zero, then $X$ and $Y$ have the same distribution.

Cumulants

The log-MGF

K_X(s)=\log M_X(s)

is the cumulant generating function. Independent sums add cumulants because MGFs multiply and logarithms turn products into sums.
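Spelled out as a short derivation, for independent $X$ and $Y$ and any $s$ in both domains:

K_{X+Y}(s)=\log\left(M_X(s)M_Y(s)\right)=K_X(s)+K_Y(s).

Differentiating at zero and using $M_X(0)=1$ recovers the first two cumulants, the mean and the variance:

K_X'(0)=\frac{M_X'(0)}{M_X(0)}=\mathbb{E}[X], \qquad K_X''(0)=M_X''(0)-\left(M_X'(0)\right)^2=\text{Var}(X).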

Worked examples

Example 1: normal MGF

For $X\sim\mathcal{N}(\mu,\sigma^2)$,

M_X(s)=\exp\left(\mu s+\frac12\sigma^2s^2\right).

Differentiating gives $M_X'(0)=\mu$ and $M_X''(0)=\mu^2+\sigma^2$, so $\text{Var}(X)=(\mu^2+\sigma^2)-\mu^2=\sigma^2$.
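These derivatives can be sanity-checked with central finite differences. The sketch below uses assumed illustrative values $\mu=0.05$, $\sigma=0.2$ and step size $h=10^{-3}$. The function name `normal_mgf` is hypothetical.

```python
import math

def normal_mgf(s, mu, sigma):
    """Closed-form MGF of N(mu, sigma^2)."""
    return math.exp(mu * s + 0.5 * sigma**2 * s**2)

# Assumed illustrative parameters and finite-difference step.
mu, sigma, h = 0.05, 0.2, 1e-3

m_plus = normal_mgf(h, mu, sigma)
m_zero = normal_mgf(0.0, mu, sigma)
m_minus = normal_mgf(-h, mu, sigma)

first = (m_plus - m_minus) / (2 * h)              # approximates M'(0) = E[X]
second = (m_plus - 2 * m_zero + m_minus) / h**2   # approximates M''(0) = E[X^2]
variance = second - first**2

print(first, second, variance)
```

The estimates should land close to $\mu$, $\mu^2+\sigma^2$ and $\sigma^2$ respectively, with small discretisation error.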

Example 2: Black-Scholes exponential moment

If $W_T\sim\mathcal{N}(0,T)$, then

\mathbb{E}[e^{\sigma W_T}]=M_{W_T}(\sigma)=e^{\sigma^2T/2}.

Thus

\mathbb{E}\left[S_0e^{(\mu-\sigma^2/2)T+\sigma W_T}\right] =S_0e^{(\mu-\sigma^2/2)T}e^{\sigma^2T/2}=S_0e^{\mu T}.

The $-\sigma^2/2$ drift correction is not a convention; it exactly offsets the factor $e^{\sigma^2T/2}$ produced by the MGF, which keeps the expected stock price at $S_0e^{\mu T}$.
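A Monte Carlo sketch of this identity follows, using assumed illustrative parameters ($\sigma=0.2$, $T=1$, $\mu=0.05$, $S_0=100$) and a fixed seed. The function name `mc_exp_moment` is hypothetical, and the estimate carries sampling error, so it matches the closed form only approximately.

```python
import math
import random

def mc_exp_moment(sigma, T, n_paths, seed=0):
    """Monte Carlo estimate of E[exp(sigma * W_T)] with W_T ~ N(0, T)."""
    rng = random.Random(seed)
    sd = math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        total += math.exp(sigma * rng.gauss(0.0, sd))
    return total / n_paths

# Assumed illustrative parameters.
sigma, T, mu, S0 = 0.2, 1.0, 0.05, 100.0

estimate = mc_exp_moment(sigma, T, 200_000)
exact = math.exp(0.5 * sigma**2 * T)

# Expected terminal price: the drift correction cancels the exponential moment.
mean_price = S0 * math.exp((mu - 0.5 * sigma**2) * T) * estimate
target_price = S0 * math.exp(mu * T)

print(estimate, exact)
print(mean_price, target_price)
```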

Example 3: sum of independent Poissons

For $X\sim\text{Poisson}(\lambda)$,

M_X(s)=\exp(\lambda(e^s-1)).

If $X$ and $Y$ are independent Poisson variables with means $\lambda$ and $\mu$, then

M_{X+Y}(s)=\exp((\lambda+\mu)(e^s-1)),

so $X+Y\sim\text{Poisson}(\lambda+\mu)$ by uniqueness.
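The conclusion can be cross-checked without transforms by convolving the two probability mass functions directly. The sketch below assumes $\lambda=1.5$ and $\mu=2.5$ and compares the first 30 probabilities. The helper names `poisson_pmf` and `sum_pmf` are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def sum_pmf(n, lam, mu):
    """P(X + Y = n) by direct convolution, for independent Poisson X and Y."""
    return sum(poisson_pmf(k, lam) * poisson_pmf(n - k, mu) for k in range(n + 1))

# Assumed illustrative means.
lam, mu = 1.5, 2.5

# Largest discrepancy between the convolution and the Poisson(lam + mu) pmf.
max_gap = max(
    abs(sum_pmf(n, lam, mu) - poisson_pmf(n, lam + mu)) for n in range(30)
)
print(max_gap)
```

The gap is at the level of floating-point rounding, which agrees with the MGF argument.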

Example 4: Chernoff bound logic

For $s>0$, Markov's inequality applied to the nonnegative variable $e^{sX}$ gives

\mathbb{P}(X\ge a)=\mathbb{P}(e^{sX}\ge e^{sa})\le e^{-sa}M_X(s).

Optimising over $s$ gives exponential tail bounds. This is why MGFs appear in concentration inequalities, large deviations, and stress testing.
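As a sketch of the optimisation step, take a standard normal $X$, for which $M_X(s)=e^{s^2/2}$ (the normal MGF above with $\mu=0$, $\sigma=1$). The bound $e^{-sa+s^2/2}$ is minimised at $s=a$, giving $e^{-a^2/2}$. The code below assumes $a=3$, finds the minimiser by a simple grid search, and compares the result with the exact tail computed from `math.erfc`. The function names are hypothetical.

```python
import math

def chernoff_bound(a, s):
    """e^{-s a} * M_X(s) for a standard normal X, where M_X(s) = exp(s^2 / 2)."""
    return math.exp(-s * a + 0.5 * s * s)

def normal_tail(a):
    """Exact P(X >= a) for a standard normal X."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

a = 3.0  # assumed illustrative threshold
grid = [0.01 * i for i in range(1, 1001)]  # candidate s values in (0, 10]

best_s = min(grid, key=lambda s: chernoff_bound(a, s))
best_bound = chernoff_bound(a, best_s)

print(best_s, best_bound, math.exp(-0.5 * a * a), normal_tail(a))
```

The optimised bound is valid but not tight: it captures the exponential decay rate of the tail rather than its exact value.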

Common confusions and pitfalls

"The MGF always exists." No. Heavy-tailed distributions may have

M_X(s)=\infty

for every $s>0$. A lognormal variable, for example, has $M_X(s)=\infty$ for every $s>0$ even though all of its positive integer moments are finite.
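The lognormal claim can be illustrated numerically. The sketch below assumes $X=e^Z$ with $Z$ standard normal. Then $\mathbb{E}[e^{sX}]$ is the integral of $e^{se^z}e^{-z^2/2}$ up to a constant, and the logarithm of that integrand, $se^z-z^2/2$, grows without bound as $z$ increases for any $s>0$, so the integral diverges. The moments $\mathbb{E}[X^k]=e^{k^2/2}$ are nevertheless all finite. The function name `log_integrand` is hypothetical.

```python
import math

def log_integrand(z, s):
    """Log of exp(s * e^z) * exp(-z^2 / 2), the unnormalised integrand of E[exp(s X)]."""
    return s * math.exp(z) - 0.5 * z * z

s = 0.01  # even a small positive s is enough for divergence

# The log-integrand eventually increases without bound in z.
values = [log_integrand(z, s) for z in (5.0, 10.0, 15.0, 20.0)]

# Integer moments of the standard lognormal are finite: E[X^k] = exp(k^2 / 2).
moments = [math.exp(0.5 * k * k) for k in range(1, 5)]

print(values)
print(moments)
```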

"The value at $s=0$ is informative."

M_X(0)=1

for every random variable, so the value at zero alone carries no information. The information is in the behaviour of $M_X$ near zero.

"MGFs add under independent sums." They multiply. Log-MGFs, or cumulant generating functions, add.

"Matching moments always matches distributions." Not in full generality. MGF uniqueness is stronger because it requires the transform on an interval, not only a sequence of moments.

"MGF and characteristic function are the same tool." They share transform algebra, but the MGF is an exponential-moment object and can fail. The characteristic function is Fourier-based and always exists.

Where this goes next

Characteristic Functions: The always-defined Fourier analogue.
Normal Distribution: Its MGF drives lognormal and Black-Scholes calculations.
Central Limit Theorem: Can be proved using MGFs under exponential-moment conditions.
Law of Large Numbers: Explains why sample averages converge to expectations.
The Derivation of the Black-Scholes Formula: Uses exponential-normal expectations in the closed-form pricing integral.

References

Bertsekas, D. P., & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. Ch. 4 §4.4 (Transforms).