Expectation and Variance

Motivation: why this matters in quant finance

Once a payoff is modelled as a random variable, the next question is not "what values can it take?" but "what number should stand in for it today?" For derivative pricing, that number is a discounted expectation:
$$V_0=e^{-rT}\mathbb{E}^{\mathbb{Q}}[H].$$

For risk and portfolio construction, the next question is how far outcomes spread around their average. Variance and covariance supply that second-order information: volatility, tracking error, hedge error, and Markowitz portfolio risk all come from the same calculation.

Bertsekas motivates expectation as a long-run average payoff and variance as the mean squared deviation from that average. In quant finance, the same interpretation survives but the stakes are sharper. An expectation prices a payoff only after the measure has been chosen; a variance describes dispersion only after the random variable and its distribution are specified.

The informal idea

Expectation is a probability-weighted centre of mass. If a payoff pays $10$ in one state and $0$ in another, the expectation is not the most likely payoff; it is the balancing point after probability weights are attached.

Variance measures how far outcomes tend to sit from that balancing point. It squares deviations, so large misses dominate. That is why volatility is sensitive to tail events and why portfolio variance can fall when positions offset each other.

Expectation and variance answer different questions:

| Quantity | Question answered | Finance reading |
| --- | --- | --- |
| $\mathbb{E}[X]$ | Where is the probability-weighted centre? | Price, drift, expected P&L |
| $\text{Var}(X)$ | How dispersed are outcomes around the centre? | Volatility, risk, hedge error |
| $\text{Cov}(X,Y)$ | Do two quantities move together linearly? | Diversification, factor exposure |

Formal definitions

Discrete expectation

If $X$ takes values $x_i$ with PMF $p_X(x_i)$, then

$$\mathbb{E}[X]=\sum_i x_i p_X(x_i),$$

provided the absolute sum $\sum_i |x_i|p_X(x_i)$ is finite. The absolute convergence condition matters: some symmetric-looking heavy-tailed variables do not have a well-defined mean.
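
As a minimal sketch of the discrete definition, the Python below evaluates the probability-weighted sum for a small PMF and then tracks the absolute sum for a symmetric heavy-tailed distribution. The three-state P&L, the function names, and the distribution $\mathbb{P}(X=2^k)=\mathbb{P}(X=-2^k)=2^{-(k+1)}$, $k\ge 1$, are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def expectation(pmf):
    """Probability-weighted sum for a finite PMF given as {value: probability}."""
    assert sum(pmf.values()) == 1, "probabilities must sum to one"
    return sum(x * p for x, p in pmf.items())

# Hypothetical three-state P&L with exact rational probabilities.
pnl = {-2: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)}
mean_pnl = expectation(pnl)  # -2/4 + 0 + 3/2 = 1

def abs_partial_sum(n):
    """Sum of |x| p(x) over the states +2^k and -2^k for k = 1..n,
    where each of the two states has probability 2^-(k+1)."""
    return sum(2 * (2**k) * Fraction(1, 2**(k + 1)) for k in range(1, n + 1))

print(mean_pnl)              # 1
print(abs_partial_sum(10))   # 10
print(abs_partial_sum(50))   # 50
```

Each pair of states $\pm 2^k$ contributes exactly $1$ to the absolute sum, so the partial sums grow without bound. The symmetry suggests a mean of zero, but the expectation is not defined.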

Continuous expectation

If $X$ has density $f_X$, then

$$\mathbb{E}[X]=\int_{-\infty}^{\infty} x f_X(x)\,dx,$$

again provided $\int |x|f_X(x)\,dx<\infty$.

General expectation

On a probability space, expectation is integration with respect to the probability measure:
$$\mathbb{E}[X]=\int_{\Omega}X(\omega)\,d\mathbb{P}(\omega).$$

Variance, covariance, and correlation

For $\mu_X=\mathbb{E}[X]$,

$$\text{Var}(X)=\mathbb{E}\left[(X-\mu_X)^2\right]=\mathbb{E}[X^2]-\left(\mathbb{E}[X]\right)^2.$$

For two square-integrable random variables,

$$\text{Cov}(X,Y)=\mathbb{E}\left[(X-\mu_X)(Y-\mu_Y)\right]=\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y].$$

Correlation normalises covariance:

$$\rho_{XY}=\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)}\sqrt{\text{Var}(Y)}}.$$
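
The definitions above can be checked on a small joint PMF. The sketch below computes variance and covariance both from the centred form and from the shortcut form, then normalises to a correlation. The three-scenario joint distribution and the helper `expect` are hypothetical, chosen only so the arithmetic is exact.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint PMF: {(x, y): probability}.
joint = {
    (1, 2): Fraction(1, 2),
    (-1, -2): Fraction(1, 4),
    (-1, 2): Fraction(1, 4),
}

def expect(g):
    """E[g(X, Y)] under the joint PMF above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)   # 0
mu_y = expect(lambda x, y: y)   # 1

# Centred definitions.
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Shortcut forms: E[X^2] - (E[X])^2 and E[XY] - E[X]E[Y].
var_x_short = expect(lambda x, y: x * x) - mu_x**2
cov_xy_short = expect(lambda x, y: x * y) - mu_x * mu_y

rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))

print(var_x, var_y, cov_xy)   # 1 3 1
print(round(rho, 4))          # 0.5774
```

Using `Fraction` keeps the two forms exactly equal; with floating-point inputs they agree only up to rounding, and the shortcut form can lose precision when the mean is large relative to the spread.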

Key properties

Linearity of expectation

For constants $a,b$,

$$\mathbb{E}[aX+bY]=a\mathbb{E}[X]+b\mathbb{E}[Y].$$

No independence is required. This is why the value of a portfolio is the sum of the values of its components under a linear pricing rule.
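
To see that dependence does not matter, the sketch below uses a call and a put on the same underlying, which are as dependent as two payoffs can be: they are never both positive. The three-state PMF, the strike, and the position sizes 2 and 3 are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical three-state distribution for S_T and a strike of 100.
pmf = {90: Fraction(1, 4), 100: Fraction(1, 2), 110: Fraction(1, 4)}
K = 100

def expect(g):
    """E[g(S_T)] under the PMF above."""
    return sum(g(s) * p for s, p in pmf.items())

def call(s):
    return max(s - K, 0)

def put(s):
    return max(K - s, 0)

# Portfolio: 2 calls plus 3 puts.
lhs = expect(lambda s: 2 * call(s) + 3 * put(s))   # E[aX + bY]
rhs = 2 * expect(call) + 3 * expect(put)           # a E[X] + b E[Y]

# The payoffs are dependent: E[XY] differs from E[X] E[Y].
e_product = expect(lambda s: call(s) * put(s))
product_of_e = expect(call) * expect(put)

print(lhs, rhs)                  # 25/2 25/2
print(e_product, product_of_e)   # 0 25/4
```

Linearity holds exactly even though the product rule for independent variables fails.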

Expected value rule

For a function $g$,

$$\mathbb{E}[g(X)]=\sum_x g(x)p_X(x)$$

in the discrete case, and

$$\mathbb{E}[g(X)]=\int g(x)f_X(x)\,dx$$

in the continuous case. Bertsekas treats this as the clean way to avoid first deriving the distribution of $g(X)$. Option pricing uses exactly this move when integrating $g(S_T)=(S_T-K)^+$ against the density of $S_T$.
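
A short sketch of the discrete rule: the expectation of a call payoff is computed once by weighting $g(x)$ with the PMF of $X$, and once the long way round by first building the PMF of $Y=g(X)$. The five-state PMF `p_S` and the strike are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical five-state PMF for S_T and strike K.
p_S = {
    80: Fraction(1, 10),
    90: Fraction(2, 10),
    100: Fraction(4, 10),
    110: Fraction(2, 10),
    120: Fraction(1, 10),
}
K = 100

def g(s):
    """Call payoff (s - K)^+."""
    return max(s - K, 0)

# Expected value rule: weight g(x) by the PMF of X directly.
direct = sum(g(s) * p for s, p in p_S.items())

# Long way round: collect the PMF of Y = g(S_T), then average y.
p_Y = defaultdict(Fraction)
for s, p in p_S.items():
    p_Y[g(s)] += p
via_pmf_of_y = sum(y * p for y, p in p_Y.items())

print(direct, via_pmf_of_y)   # 4 4
print(p_Y[0])                 # 7/10
```

Three states of $S_T$ collapse onto the payoff $0$; the rule gives the same answer without ever tracking that collapse.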

Affine transformations

If $Y=aX+b$, then

$$\mathbb{E}[Y]=a\mathbb{E}[X]+b, \qquad \text{Var}(Y)=a^2\text{Var}(X).$$

Adding cash shifts a payoff's mean but does not change its variance. Scaling a position by $a$ scales volatility by $|a|$ and variance by $a^2$.
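
Both identities follow in two lines from linearity. The derivation below is a sketch using only the definitions already given, with $\mu_X=\mathbb{E}[X]$:

$$
\begin{aligned}
\mathbb{E}[Y] &= \mathbb{E}[aX+b] = a\mu_X+b,\\
\text{Var}(Y) &= \mathbb{E}\left[(aX+b-a\mu_X-b)^2\right] = \mathbb{E}\left[a^2(X-\mu_X)^2\right] = a^2\text{Var}(X).
\end{aligned}
$$

The constant $b$ cancels inside the deviation, and $a$ leaves the square as $a^2$, so the standard deviation picks up $|a|$.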

Variance of sums

$$\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)+2\text{Cov}(X,Y).$$

If $X$ and $Y$ are independent, the covariance term is zero. Portfolio risk lives in this cross term: diversification is not magic; it is covariance arithmetic.
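
The identity is the expansion of a square. As a sketch, with $\mu_X=\mathbb{E}[X]$ and $\mu_Y=\mathbb{E}[Y]$:

$$
\begin{aligned}
\text{Var}(X+Y) &= \mathbb{E}\left[\big((X-\mu_X)+(Y-\mu_Y)\big)^2\right]\\
&= \mathbb{E}\left[(X-\mu_X)^2\right]+\mathbb{E}\left[(Y-\mu_Y)^2\right]+2\,\mathbb{E}\left[(X-\mu_X)(Y-\mu_Y)\right]\\
&= \text{Var}(X)+\text{Var}(Y)+2\text{Cov}(X,Y).
\end{aligned}
$$

In the special case where both variables have the same standard deviation $\sigma$ and correlation $\rho$, this reads $\text{Var}(X+Y)=2\sigma^2(1+\rho)$, which runs from $0$ at $\rho=-1$ to $4\sigma^2$ at $\rho=1$.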

Nonlinear functions cannot be averaged by substitution

Usually

$$\mathbb{E}[g(X)]\ne g(\mathbb{E}[X]).$$

This is not a technicality. Convex payoffs, exponentials of normal variables, and reciprocal quantities all punish the shortcut.
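
A heuristic way to see the size of the gap is a second-order Taylor expansion around $\mu=\mathbb{E}[X]$. This assumes $g$ is twice differentiable near $\mu$ and that higher-order terms are negligible; both are assumptions of this sketch, not claims from the text.

$$
g(X)\approx g(\mu)+g'(\mu)(X-\mu)+\tfrac12 g''(\mu)(X-\mu)^2
\quad\Longrightarrow\quad
\mathbb{E}[g(X)]\approx g(\mu)+\tfrac12 g''(\mu)\,\text{Var}(X).
$$

The first-order term vanishes because $\mathbb{E}[X-\mu]=0$, so the leading correction is curvature times variance: positive for convex $g$, negative for concave $g$. For kinked payoffs such as $(S_T-K)^+$ the expansion does not apply at the kink, but the direction of the gap for convex $g$ still follows from Jensen's inequality.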

Worked examples

Example 1: a two-state call payoff

In the one-period model $S_T\in\{110,90\}$ with risk-neutral probability $\mathbb{Q}(S_T=110)=1/2$, a call with strike $100$ has payoff $H\in\{10,0\}$.

$$\mathbb{E}^{\mathbb{Q}}[H]=10\cdot\frac12+0\cdot\frac12=5.$$

With risk-free discounting, the price is $5e^{-rT}$. The arithmetic is elementary; the modelling content is the choice of measure.
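
The example can be sketched as a small function of the up-state probability, which makes the dependence on the measure explicit. The function name `call_price`, the inputs $r=5\%$ and $T=1$, and the alternative probability $0.6$ are hypothetical; the example itself leaves $r$ and $T$ unspecified.

```python
from math import exp

def call_price(q_up, r, T, s_up=110.0, s_down=90.0, strike=100.0):
    """Discounted expectation of the call payoff in a two-state model.

    q_up is the probability of the up state under the chosen measure.
    """
    payoff_up = max(s_up - strike, 0.0)
    payoff_down = max(s_down - strike, 0.0)
    expected_payoff = q_up * payoff_up + (1.0 - q_up) * payoff_down
    return exp(-r * T) * expected_payoff

# Assumed inputs: r = 5% continuously compounded, T = 1 year.
price_q = call_price(q_up=0.5, r=0.05, T=1.0)       # 5 * exp(-0.05), about 4.756
price_other = call_price(q_up=0.6, r=0.05, T=1.0)   # same payoff, different measure

print(round(price_q, 4), round(price_other, 4))
```

The payoff table is identical in both calls; only the probability attached to the up state changes, and the number reported as the price changes with it.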

Example 2: variance of an equally weighted portfolio

Let two asset returns have volatilities $20\%$ and $30\%$ and correlation $\rho=0.5$. For equal weights,

$$
\begin{aligned}
\text{Var}(R_p) &=0.5^2(0.20)^2+0.5^2(0.30)^2+2(0.5)(0.5)(0.5)(0.20)(0.30)\\
&=0.01+0.0225+0.015=0.0475.
\end{aligned}
$$

So $\sigma_p=\sqrt{0.0475}\approx 21.8\%$. The volatility is below the simple average $25\%$ because correlation is below one.
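
The same calculation can be written as the quadratic form $w^\top\Sigma w$ with $\Sigma_{ij}=\rho_{ij}\sigma_i\sigma_j$. The sketch below reproduces the numbers above and repeats the calculation at $\rho=1$ for comparison; the function name `portfolio_variance` is an illustrative choice.

```python
from math import sqrt

def portfolio_variance(weights, vols, corr):
    """w' Sigma w with Sigma[i][j] = corr[i][j] * vols[i] * vols[j]."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * corr[i][j] * vols[i] * vols[j]
        for i in range(n)
        for j in range(n)
    )

weights = [0.5, 0.5]
vols = [0.20, 0.30]

corr_half = [[1.0, 0.5], [0.5, 1.0]]
var_p = portfolio_variance(weights, vols, corr_half)   # about 0.0475
vol_p = sqrt(var_p)                                    # about 0.218

# Comparison case: perfect correlation removes the diversification benefit.
corr_one = [[1.0, 1.0], [1.0, 1.0]]
vol_p_one = sqrt(portfolio_variance(weights, vols, corr_one))   # about 0.25

print(round(var_p, 4), round(vol_p, 4), round(vol_p_one, 4))
```

At $\rho=1$ the portfolio volatility equals the weighted average of the two volatilities; any lower correlation pulls it below that average.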

Example 3: average speed is not average time

Bertsekas uses a simple pitfall: if speed $V$ is random and travel time is $T=2/V$, then $\mathbb{E}[T]\ne 2/\mathbb{E}[V]$. The finance analogue is discounting or convex payoffs. If the discount factor $D=e^{-rT}$ depends on a random rate $r$ over a fixed horizon $T$, then $\mathbb{E}[D]$ is not $e^{-\mathbb{E}[r]T}$ unless the rate is deterministic: the exponential is convex, so Jensen's inequality gives $\mathbb{E}[D]\ge e^{-\mathbb{E}[r]T}$.
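
Both halves of the example can be checked numerically. The sketch below uses illustrative two-point distributions: a speed of 5 or 30 with probabilities 0.6 and 0.4 over a distance of 2, and a rate of 1% or 5% with equal probability over a 10-year horizon. All of these numbers are assumptions made for the illustration.

```python
from math import exp

def mean(values, probs):
    """Probability-weighted average of a finite list of outcomes."""
    return sum(v * p for v, p in zip(values, probs))

# Travel time T = 2 / V with a random speed V.
speed_probs = [0.6, 0.4]
speeds = [5.0, 30.0]
mean_time = mean([2.0 / v for v in speeds], speed_probs)   # E[T] = 4/15
naive_time = 2.0 / mean(speeds, speed_probs)               # 2 / E[V] = 2/15

# Discount factor D = exp(-r * maturity) with a random rate r.
rate_probs = [0.5, 0.5]
rates = [0.01, 0.05]
maturity = 10.0
mean_discount = mean([exp(-r * maturity) for r in rates], rate_probs)
naive_discount = exp(-mean(rates, rate_probs) * maturity)

print(round(mean_time, 4), round(naive_time, 4))
print(round(mean_discount, 4), round(naive_discount, 4))
```

In both cases the true expectation sits above the value obtained by substituting the mean, and the gap widens as the spread of the random input grows.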

Example 4: the exponential moment behind Black-Scholes

If $X\sim\mathcal{N}(\mu,\sigma^2)$, then

$$\mathbb{E}[e^X]=e^{\mu+\sigma^2/2}.$$

The $\sigma^2/2$ term is the convexity correction. It is the same second-order effect that appears in geometric Brownian motion when the log drift is adjusted by $-\sigma^2/2$.
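
The closed form can be checked by integrating $e^x$ against the normal density, which is the continuous expected value rule applied to $g(x)=e^x$. The sketch below uses a plain midpoint rule; the parameters $\mu=0.03$, $\sigma=0.20$, the grid size, and the truncation at 12 standard deviations are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))

def expected_exp(mu, sigma, n=20_000, width=12.0):
    """Midpoint-rule estimate of E[e^X] for X ~ N(mu, sigma^2),
    integrating over [mu - width*sigma, mu + width*sigma]."""
    lo = mu - width * sigma
    step = 2.0 * width * sigma / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        total += exp(x) * normal_pdf(x, mu, sigma)
    return total * step

mu, sigma = 0.03, 0.20                     # assumed parameters
numeric = expected_exp(mu, sigma)
closed_form = exp(mu + 0.5 * sigma**2)     # exp(0.05)
naive = exp(mu)                            # ignores the convexity correction

print(round(numeric, 6), round(closed_form, 6), round(naive, 6))
```

The numerical integral agrees with $e^{\mu+\sigma^2/2}$, while $e^{\mu}$ understates it by the factor $e^{\sigma^2/2}$.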

Common confusions and pitfalls

"The expected value is the most likely value." Not generally. A random variable need not ever take its expected value (the two-state payoff of $10$ or $0$ in Example 1 has mean $5$), and a skewed distribution can place the mean in a low-density region.
"A symmetric heavy-tailed variable has mean zero." Symmetry is not enough. The expectation must be absolutely integrable; otherwise the apparent cancellation depends on the order of summation or integration.
"Variance is downside risk." Variance penalises upside and downside deviations equally. It is central to volatility and quadratic risk models, but it is not a tail-loss measure like VaR or Expected Shortfall.
"Uncorrelated means independent." Zero covariance removes only linear dependence. Nonlinear dependence can remain strong.
"A nonlinear payoff can be priced by applying the payoff to the expected price." That shortcut destroys convexity. Options are valuable precisely because $\mathbb{E}[(S_T-K)^+]$ is not $(\mathbb{E}[S_T]-K)^+$ in general.

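The "uncorrelated means independent" pitfall above can be checked exactly on a standard counterexample: $X$ uniform on $\{-1,0,1\}$ and $Y=X^2$. The variable names in this sketch are illustrative.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X.
xs = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(x * p for x in xs)            # E[X] = 0
e_y = sum(x**2 * p for x in xs)         # E[Y] = 2/3
e_xy = sum(x * x**2 * p for x in xs)    # E[XY] = E[X^3] = 0
cov = e_xy - e_x * e_y                  # 0: uncorrelated

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = p          # Y = 0 exactly when X = 0, so the joint probability is 1/3
p_product = p * p    # P(X=0) * P(Y=0) = 1/3 * 1/3 = 1/9

print(cov)                   # 0
print(p_joint, p_product)    # 1/3 1/9
```

The covariance is exactly zero even though $Y$ is fully determined by $X$, so zero correlation cannot be read as absence of dependence.
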
References

  • Bertsekas, D. P., & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. Ch. 2 §2.4 (Expectation, Mean, and Variance), §2.5 (Joint PMFs of Multiple Random Variables).
Expectation and Variance | q4quant.studio