Expectation and Variance

Motivation: why this matters in quant finance

Once a payoff is modelled as a random variable, the next question is not "what values can it take?" but "what number should stand in for it today?" For derivative pricing, that number is a discounted expectation:

V_0=e^{-rT}\mathbb{E}^{\mathbb{Q}}[H].

For risk and portfolio construction, the next question is how far outcomes spread around their average. Variance and covariance supply that second-order information: volatility, tracking error, hedge error, and Markowitz portfolio risk all come from the same calculation.

Bertsekas motivates expectation as a long-run average payoff and variance as the mean squared deviation from that average. In quant finance, the same interpretation survives but the stakes are sharper. An expectation prices a payoff only after the measure has been chosen; a variance describes dispersion only after the random variable and its distribution are specified.

The informal idea

Expectation is a probability-weighted centre of mass. If a payoff pays $10$ in one state and $0$ in another, the expectation is not the most likely payoff; it is the balancing point after probability weights are attached.

Variance measures how far outcomes tend to sit from that balancing point. It squares deviations, so large misses dominate. That is why volatility is sensitive to tail events and why portfolio variance can fall when positions offset each other.

Expectation and variance answer different questions:

Quantity	Question answered	Finance reading
$\mathbb{E}[X]$	Where is the probability-weighted centre?	Price, drift, expected P&L
$\text{Var}(X)$	How dispersed are outcomes around the centre?	Volatility, risk, hedge error
$\text{Cov}(X,Y)$	Do two quantities move together linearly?	Diversification, factor exposure

Formal definitions

Discrete expectation

If $X$ takes values $x_i$ with PMF $p_X(x_i)$, then

\mathbb{E}[X]=\sum_i x_i p_X(x_i),

provided the absolute sum $\sum_i |x_i|p_X(x_i)$ is finite. The absolute convergence condition matters: some symmetric-looking heavy-tailed variables do not have a well-defined mean.
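
As a minimal sketch of the definition, the Python below computes the mean of a finite PMF and then shows what failure of absolute convergence looks like. The three-state payoff and the variable taking values $\pm2^i$ are illustrative assumptions, not examples from the source; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] for a finite PMF given as a {value: probability} mapping."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Hypothetical three-state payoff (values and probabilities are assumptions).
payoff_pmf = {-5: Fraction(1, 4), 0: Fraction(1, 2), 20: Fraction(1, 4)}
mean_payoff = expectation(payoff_pmf)   # Fraction(15, 4), i.e. 3.75

def abs_partial_sum(n):
    """Sum of |x_i| p(x_i) over i = 1..n when X = +2**i or -2**i,
    each sign carrying probability 2**-(i+1)."""
    return sum(2 * (2**i) * Fraction(1, 2**(i + 1)) for i in range(1, n + 1))

print(mean_payoff)
print([abs_partial_sum(n) for n in (10, 100, 1000)])
```

For a finite PMF the absolute sum is automatically finite; the condition only bites with infinitely many values. In the second function each term contributes exactly 1, so the partial sums grow without bound and the symmetric variable has no well-defined mean.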

Continuous expectation

If $X$ has density $f_X$, then

\mathbb{E}[X]=\int_{-\infty}^{\infty} x f_X(x)\,dx,

again provided $\int |x|f_X(x)\,dx<\infty$.

General expectation

On a probability space, expectation is integration with respect to the probability measure:

\mathbb{E}[X]=\int_{\Omega}X(\omega)\,d\mathbb{P}(\omega).

Variance, covariance, and correlation

For a square-integrable $X$ with mean $\mu_X=\mathbb{E}[X]$,

\text{Var}(X)=\mathbb{E}\left[(X-\mu_X)^2\right]=\mathbb{E}[X^2]-\left(\mathbb{E}[X]\right)^2.

For two square-integrable random variables,

\text{Cov}(X,Y)=\mathbb{E}\left[(X-\mu_X)(Y-\mu_Y)\right] =\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y].

Correlation normalises covariance and is defined when both variances are strictly positive:

\rho_{XY}=\frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)}\sqrt{\text{Var}(Y)}}.
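
The definitions can be checked numerically on a small joint distribution. The sketch below assumes a hypothetical joint PMF for two one-period returns (the numbers are invented for illustration) and computes variance both from the definition and from the shortcut $\mathbb{E}[X^2]-(\mathbb{E}[X])^2$.

```python
import math

# Hypothetical joint PMF of two one-period returns (x, y); probabilities sum to 1.
joint_pmf = {
    (0.10, 0.20): 0.4,
    (0.10, -0.10): 0.1,
    (-0.10, 0.20): 0.1,
    (-0.10, -0.10): 0.4,
}

def expect(g, pmf):
    """E[g(X, Y)] under a finite joint PMF."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

mu_x = expect(lambda x, y: x, joint_pmf)
mu_y = expect(lambda x, y: y, joint_pmf)

# Definition: mean squared deviation from the mean.
var_x = expect(lambda x, y: (x - mu_x) ** 2, joint_pmf)
var_y = expect(lambda x, y: (y - mu_y) ** 2, joint_pmf)
# Shortcut: E[X^2] - (E[X])^2 should agree with the definition.
var_x_shortcut = expect(lambda x, y: x * x, joint_pmf) - mu_x ** 2

cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint_pmf)
corr_xy = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

print(var_x, var_y, cov_xy, corr_xy)
```

With these assumed probabilities the two returns move together more often than not, so the covariance is positive and the correlation comes out near 0.6.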

Key properties

Linearity of expectation

For constants $a,b$,

\mathbb{E}[aX+bY]=a\mathbb{E}[X]+b\mathbb{E}[Y].

No independence is required. This is why the value of a portfolio is the sum of the values of its components under a linear pricing rule.
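
To see that dependence really does not matter, the sketch below takes $Y=X^2$, which is completely determined by $X$, and compares both sides of the identity exactly. The distribution of $X$ and the constants are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical X, uniform on three values; Y = X**2 is a deterministic
# function of X, so the two variables are strongly dependent.
x_pmf = {-1: Fraction(1, 3), 0: Fraction(1, 3), 2: Fraction(1, 3)}
a, b = 2, 3

def expect(g):
    """E[g(X)] under x_pmf."""
    return sum(g(x) * p for x, p in x_pmf.items())

lhs = expect(lambda x: a * x + b * x**2)                      # E[aX + bY]
rhs = a * expect(lambda x: x) + b * expect(lambda x: x**2)    # aE[X] + bE[Y]

print(lhs, rhs)
```

Both sides equal $17/3$ here, even though $X$ and $Y$ are far from independent.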

Expected value rule

For a function $g$,

\mathbb{E}[g(X)]=\sum_x g(x)p_X(x)

in the discrete case, and

\mathbb{E}[g(X)]=\int g(x)f_X(x)\,dx

in the continuous case. Bertsekas treats this as the clean way to avoid first deriving the distribution of $g(X)$. Option pricing uses exactly this move when integrating $g(S_T)=(S_T-K)^+$ against the density of $S_T$.
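
A rough numerical sketch of that move, assuming $\ln S_T$ is normal with hypothetical parameters `m` and `v` (none of these numbers come from the source): the call payoff is integrated directly against the lognormal density with a midpoint rule, and the result is cross-checked against the known closed form for a lognormal variable. The distribution of the payoff itself is never derived.

```python
import math
from statistics import NormalDist

# Hypothetical parameters: ln(S_T) ~ N(m, v^2), chosen so that E[S_T] = 100.
v = 0.20
m = math.log(100.0) - 0.5 * v**2
strike = 100.0

def lognormal_density(s):
    """Density of S_T when ln(S_T) ~ N(m, v^2)."""
    z = (math.log(s) - m) / v
    return math.exp(-0.5 * z * z) / (s * v * math.sqrt(2.0 * math.pi))

def expected_call_payoff(n_steps=100_000):
    """Midpoint-rule approximation of the integral of (s - K)^+ f(s) ds."""
    upper = math.exp(m + 10.0 * v)       # truncate 10 standard deviations out in log space
    width = (upper - strike) / n_steps   # the payoff is zero below the strike
    total = 0.0
    for i in range(n_steps):
        s = strike + (i + 0.5) * width
        total += (s - strike) * lognormal_density(s)
    return total * width

# Closed form for the same undiscounted expectation, used only as a cross-check.
d2 = (m - math.log(strike)) / v
d1 = d2 + v
closed_form = (math.exp(m + 0.5 * v**2) * NormalDist().cdf(d1)
               - strike * NormalDist().cdf(d2))

numeric = expected_call_payoff()
print(numeric, closed_form)
```

The two numbers should agree to several decimal places (both are close to 7.97 for these inputs). The truncation point and step count are pragmatic choices, not tuned values.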

Affine transformations

If $Y=aX+b$, then

\mathbb{E}[Y]=a\mathbb{E}[X]+b, \qquad \text{Var}(Y)=a^2\text{Var}(X).

Adding cash shifts a payoff's mean but does not change its variance. Scaling a position by $a$ scales volatility by $|a|$ and variance by $a^2$.
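
A small check of both statements, using a made-up list of equally likely P&L outcomes. The choice $a=-2$ (a short position of twice the size) is deliberate: it shows that volatility scales by $|a|$, not by $a$.

```python
import math
from statistics import mean, pvariance

pnl = [1.0, -2.0, 0.5, 3.0, -1.5]   # hypothetical P&L outcomes, equally likely
a, b = -2.0, 10.0                   # short twice the position, add 10 of cash

transformed = [a * x + b for x in pnl]

mean_ok = math.isclose(mean(transformed), a * mean(pnl) + b)
var_ok = math.isclose(pvariance(transformed), a**2 * pvariance(pnl))
vol_ratio = math.sqrt(pvariance(transformed) / pvariance(pnl))   # |a|, here 2

print(mean_ok, var_ok, vol_ratio)
```

`pvariance` is the population variance, which matches the definition of variance for a distribution that puts equal mass on each listed outcome.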

Variance of sums

\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)+2\text{Cov}(X,Y).

If $X$ and $Y$ are independent, the covariance term is zero. Portfolio risk lives in this cross term: diversification is not magic; it is covariance arithmetic.
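
The identity follows from expanding the square in the definition of variance. A short derivation in the notation used above, with $\mu_X=\mathbb{E}[X]$ and $\mu_Y=\mathbb{E}[Y]$:

\begin{aligned} \text{Var}(X+Y) &=\mathbb{E}\left[\left((X-\mu_X)+(Y-\mu_Y)\right)^2\right]\\ &=\mathbb{E}\left[(X-\mu_X)^2\right]+\mathbb{E}\left[(Y-\mu_Y)^2\right]+2\mathbb{E}\left[(X-\mu_X)(Y-\mu_Y)\right]\\ &=\text{Var}(X)+\text{Var}(Y)+2\text{Cov}(X,Y). \end{aligned}

The second line uses only linearity of expectation. Combining this with the affine scaling rule gives the weighted version used for two-asset portfolios:

\text{Var}(w_1X+w_2Y)=w_1^2\text{Var}(X)+w_2^2\text{Var}(Y)+2w_1w_2\text{Cov}(X,Y).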

Nonlinear functions cannot be averaged by substitution

Usually

\mathbb{E}[g(X)]\ne g(\mathbb{E}[X]).

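This is not a technicality. For convex $g$, Jensen's inequality gives $\mathbb{E}[g(X)]\ge g(\mathbb{E}[X])$, with the reverse inequality for concave $g$. Convex payoffs, exponentials of normal variables, and reciprocal quantities all punish the shortcut.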

Worked examples

Example 1: a two-state call payoff

In the one-period model $S_T\in\{110,90\}$ with risk-neutral probability $\mathbb{Q}(S_T=110)=1/2$, a call with strike $100$ has payoff $H\in\{10,0\}$.

\mathbb{E}^{\mathbb{Q}}[H]=10\cdot\frac12+0\cdot\frac12=5.

With risk-free discounting, the price is $5e^{-rT}$. The arithmetic is elementary; the modelling content is the choice of measure.
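
The same calculation as a short sketch. The rate and maturity are hypothetical, since the example leaves $r$ and $T$ unspecified. The last line illustrates what the choice of measure entails: under $\mathbb{Q}$ the discounted expected stock price must equal today's spot, so these probabilities are only consistent with one spot price for a given $r$ and $T$.

```python
import math

q_probs = {110.0: 0.5, 90.0: 0.5}   # risk-neutral probabilities from the example
strike = 100.0
r, T = 0.05, 1.0                    # hypothetical rate and maturity (assumptions)

expected_payoff = sum(max(s - strike, 0.0) * q for s, q in q_probs.items())
price = math.exp(-r * T) * expected_payoff

# Spot price implied by these Q-probabilities: S_0 = exp(-rT) * E^Q[S_T].
implied_spot = math.exp(-r * T) * sum(s * q for s, q in q_probs.items())

print(expected_payoff, price, implied_spot)
```

With the assumed inputs the undiscounted expectation is 5, the price is a little under 4.76, and the implied spot is a little above 95.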

Example 2: variance of an equally weighted portfolio

Let two asset returns have volatilities $20\%$ and $30\%$ and correlation $\rho=0.5$. For equal weights,

\begin{aligned} \text{Var}(R_p) &=0.5^2(0.20)^2+0.5^2(0.30)^2+2(0.5)(0.5)(0.5)(0.20)(0.30)\\ &=0.01+0.0225+0.015=0.0475. \end{aligned}

So $\sigma_p=\sqrt{0.0475}\approx21.8\%$. The volatility is below the weighted-average volatility of $25\%$ because correlation is below one.
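
The calculation generalises to the quadratic form $w^\top\Sigma w$. The sketch below rebuilds the example from its covariance matrix in plain Python and then varies the correlation; the function name `portfolio_vol` is an illustrative choice.

```python
import math

vols = [0.20, 0.30]
weights = [0.5, 0.5]

def portfolio_vol(rho):
    """Volatility of the two-asset portfolio for a given correlation."""
    n = len(vols)
    # Covariance matrix: sigma_i * sigma_j * rho off the diagonal, sigma_i^2 on it.
    cov = [[vols[i] * vols[j] * (1.0 if i == j else rho) for j in range(n)]
           for i in range(n)]
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

for rho in (-1.0, 0.0, 0.5, 1.0):
    print(rho, round(portfolio_vol(rho), 4))
```

At $\rho=1$ the portfolio volatility equals the weighted average of $25\%$; at $\rho=0.5$ it is about $21.8\%$, and at $\rho=-1$ it falls to $5\%$.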

Example 3: average speed is not average time

Bertsekas uses a simple pitfall: if speed $V$ is random and travel time is $T=2/V$, then $\mathbb{E}[T]\ne2/\mathbb{E}[V]$. The finance analogue is discounting or convex payoffs. For a discount factor $D=e^{-r\tau}$ over a horizon $\tau>0$ with random rate $r$, the map $r\mapsto e^{-r\tau}$ is strictly convex, so $\mathbb{E}[D]>e^{-\mathbb{E}[r]\tau}$ unless $r$ is almost surely constant.
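
Both versions of the pitfall can be made concrete with two-point distributions. All numbers below are illustrative assumptions rather than figures taken from the source: a 2-mile trip at either 5 or 30 mph, and a rate of either 1% or 5% over a 10-year horizon.

```python
import math
from fractions import Fraction

# Hypothetical speeds: 5 mph with probability 3/5, 30 mph with probability 2/5.
speed_pmf = {5: Fraction(3, 5), 30: Fraction(2, 5)}

mean_speed = sum(v * p for v, p in speed_pmf.items())                # 15 mph
mean_time = sum(Fraction(2, v) * p for v, p in speed_pmf.items())    # 4/15 hours
time_at_mean_speed = Fraction(2) / mean_speed                        # 2/15 hours

# Finance analogue: hypothetical random rate, equally likely 1% or 5%.
rate_pmf = {0.01: 0.5, 0.05: 0.5}
horizon = 10.0
mean_discount = sum(math.exp(-r * horizon) * p for r, p in rate_pmf.items())
mean_rate = sum(r * p for r, p in rate_pmf.items())
discount_at_mean_rate = math.exp(-mean_rate * horizon)

print(mean_time, time_at_mean_speed)
print(mean_discount, discount_at_mean_rate)
```

The average travel time is twice the time implied by the average speed, and the expected discount factor sits above the discount factor at the expected rate, as convexity predicts.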

Example 4: the exponential moment behind Black-Scholes

If $X\sim\mathcal{N}(\mu,\sigma^2)$, then

\mathbb{E}[e^X]=e^{\mu+\sigma^2/2}.

The $\sigma^2/2$ term is the convexity correction. It is the same second-order effect that appears in geometric Brownian motion when the log drift is adjusted by $-\sigma^2/2$ .
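
A quick Monte Carlo sanity check of the formula, using hypothetical parameters and a fixed seed. It is a sketch for building intuition, not a pricing routine: the simulated mean of $e^X$ should land near $e^{\mu+\sigma^2/2}$ and visibly above the naive $e^{\mu}$.

```python
import math
import random

mu, sigma = 0.05, 0.20      # hypothetical parameters of X ~ N(mu, sigma^2)
n_draws = 200_000

rng = random.Random(12345)  # fixed seed so the run is reproducible
mc_estimate = sum(math.exp(rng.gauss(mu, sigma)) for _ in range(n_draws)) / n_draws

exact = math.exp(mu + 0.5 * sigma**2)   # formula with the convexity correction
naive = math.exp(mu)                    # exp applied to the mean, no correction

print(mc_estimate, exact, naive)
```

The gap between `exact` and `naive` is about 0.02 for these inputs, many times larger than the sampling error at this number of draws.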

Common confusions and pitfalls

"The expected value is the most likely value." Not generally. The mean need not even be a possible outcome (a fair die has mean $3.5$), a continuous random variable equals its expectation with probability zero, and a skewed distribution can place the mean in a low-density region.

"A symmetric heavy-tailed variable has mean zero." Symmetry is not enough. The variable must be absolutely integrable, $\mathbb{E}|X|<\infty$; otherwise the apparent cancellation depends on the order of summation or integration. The Cauchy distribution is the standard example: it is symmetric about zero, yet its mean is undefined.

"Variance is downside risk." Variance penalises upside and downside deviations equally. It is central to volatility and quadratic risk models, but it is not a tail-loss measure like VaR or Expected Shortfall.

"Uncorrelated means independent." Zero covariance removes only linear dependence. Nonlinear dependence can remain strong.

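A standard counterexample to "uncorrelated means independent", sketched with exact arithmetic: take $X$ uniform on $\{-1,0,1\}$ and $Y=X^2$. The specific distribution is an illustrative assumption. The covariance vanishes, yet $Y$ is a deterministic function of $X$.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1}; Y = X**2 is determined by X.
x_pmf = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

e_x = sum(x * p for x, p in x_pmf.items())
e_y = sum(x**2 * p for x, p in x_pmf.items())
e_xy = sum(x * x**2 * p for x, p in x_pmf.items())
cov_xy = e_xy - e_x * e_y                       # 0: no linear dependence

# Independence would require P(X=1, Y=1) = P(X=1) * P(Y=1); here it fails.
p_joint = x_pmf[1]                              # Y = 1 whenever X = 1
p_product = x_pmf[1] * (x_pmf[-1] + x_pmf[1])   # P(X=1) * P(Y=1)

print(cov_xy, p_joint, p_product)
```

The joint probability is $1/3$ while the product of the marginals is $2/9$, so the variables are dependent despite zero covariance.
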
"A nonlinear payoff can be priced by applying the payoff to the expected price." That shortcut destroys convexity. Options are valuable precisely because

\mathbb{E}[(S_T-K)^+]

is not

(\mathbb{E}[S_T]-K)^+

in general.
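
The gap is easy to see in the two-state setting of Example 1. The sketch below compares the two quantities for that model and for a hypothetical wider pair of states with the same mean; the function names are illustrative.

```python
def mean_of_payoff(states, strike):
    """E[(S_T - K)^+] for a finite {price: probability} distribution."""
    return sum(max(s - strike, 0.0) * p for s, p in states.items())

def payoff_of_mean(states, strike):
    """(E[S_T] - K)^+, the shortcut that ignores dispersion."""
    mean_price = sum(s * p for s, p in states.items())
    return max(mean_price - strike, 0.0)

strike = 100.0
narrow = {110.0: 0.5, 90.0: 0.5}   # the two-state model from Example 1
wide = {120.0: 0.5, 80.0: 0.5}     # hypothetical wider spread, same mean of 100

for states in (narrow, wide):
    print(mean_of_payoff(states, strike), payoff_of_mean(states, strike))
```

The shortcut returns zero in both cases because the mean price equals the strike, while the true expected payoff is 5 for the narrow states and 10 for the wide ones. The option's value comes entirely from dispersion, which the shortcut discards.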

Where this goes next

Independence and Conditioning: Explains when products factor and when conditioning changes averages.
Conditional Expectation: Turns expectation into a forecast given information.
Moment Generating Functions: Packages all moments into $\mathbb{E}[e^{sX}]$ when that quantity exists.
Normal Distribution: The mean-variance benchmark used throughout return modelling.
Mean-Variance Optimisation: Uses expected returns, variances, and covariances to choose portfolios.

References

Bertsekas, D. P., & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. Ch. 2 §2.4 (Expectation, Mean, and Variance), §2.5 (Joint PMFs of Multiple Random Variables).