Conditional Expectation

Motivation: why this matters in quant finance

At time $0$, a derivative price is an expectation. At time $t$, after some market history has been observed, the price is a conditional expectation:

V_t=e^{-r(T-t)}\mathbb{E}^{\mathbb{Q}}[H\mid\mathcal{F}_t].

The conditioning object is not a single event like "the first coin toss was heads". It is the entire information available at time $t$ : prices, realised volatility, rates, defaults, and anything else the model allows the trader to know. Conditional expectation is the operation that converts a future random payoff into the best current forecast using that information.

Bertsekas and Tsitsiklis build conditional expectation from a simpler idea: first condition on an event, then on the value of another random variable, then view $\mathbb{E}[X\mid Y]$ itself as a random variable determined by $Y$. The sigma-algebra version used in stochastic finance is the same idea with "the value of $Y$" replaced by "the information set $\mathcal{G}$."

The informal idea

Conditional expectation averages only over distinctions the conditioning information cannot see. If $\mathcal{G}$ tells you which cell of a partition occurred, then $\mathbb{E}[X\mid\mathcal{G}]$ is constant on each cell and equals the average value of $X$ inside that cell.

This is why $\mathbb{E}[X\mid\mathcal{G}]$ is a random variable. Before the outcome is realised, you do not know which cell you will be in. After the information in $\mathcal{G}$ is revealed, the forecast takes the value attached to that cell.

In finance: before observing $S_t$ , the time- $t$ option value is random. Once $S_t$ and the rest of $\mathcal{F}_t$ are observed, the value is known.

Formal definitions

Conditioning on an event

For $\mathbb{P}(B)>0$ ,

\mathbb{E}[X\mid B]=\frac{\mathbb{E}[X\mathbf{1}_B]}{\mathbb{P}(B)}.

This is a number: the average of $X$ under the probability law restricted to $B$ .
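As a small numerical check of this formula, the sketch below uses a fair six-sided die, with $X$ the face value and $B$ the event that the roll is even. The die, the event, and the helper name `cond_exp_event` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

# Illustrative finite probability space: a fair six-sided die (an assumption).
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}


def cond_exp_event(X, B):
    """Return E[X | B] = E[X 1_B] / P(B) for an event B with P(B) > 0."""
    p_B = sum(prob[w] for w in omega if w in B)
    if p_B == 0:
        raise ValueError("E[X | B] is undefined when P(B) = 0")
    e_X_on_B = sum(X(w) * prob[w] for w in omega if w in B)  # E[X 1_B]
    return e_X_on_B / p_B


X = lambda w: w      # X is the face value
B = {2, 4, 6}        # the roll is even

print(cond_exp_event(X, B))  # 4
```

Exact fractions are used so that the result is the true ratio rather than a floating-point approximation.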

Conditioning on a random variable

For discrete $X$ and $Y$, and any $y$ with $\mathbb{P}(Y=y)>0$,

\mathbb{E}[X\mid Y=y]=\sum_x x\,p_{X\mid Y}(x\mid y).

The object $\mathbb{E}[X\mid Y]$ is the random variable obtained by substituting $Y$ into the function $y\mapsto\mathbb{E}[X\mid Y=y]$ .
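The two-step construction (compute a number for each $y$, then substitute $Y$) can be sketched on a small joint pmf. The table `joint` and the helper `cond_exp_given_value` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf p_{X,Y}(x, y), chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (3, 1): Fraction(3, 8),
}


def cond_exp_given_value(y):
    """Return E[X | Y = y] = sum_x x * p_{X|Y}(x | y), assuming p_Y(y) > 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y


# g is the function y -> E[X | Y = y]; E[X | Y] is the random variable g(Y).
g = {y: cond_exp_given_value(y) for y in {yy for (_, yy) in joint}}
print(g[0], g[1])  # 1/2 5/2
```

Here `g` is an ordinary function of $y$. The randomness of $\mathbb{E}[X\mid Y]$ comes entirely from evaluating it at the random value $Y$.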

Conditioning on a sigma-algebra

Let $\mathcal{G}\subseteq\mathcal{F}$ be a sub-sigma-algebra and let $X$ be integrable. The conditional expectation $Z=\mathbb{E}[X\mid\mathcal{G}]$ is the almost-surely unique random variable satisfying:

$Z$ is $\mathcal{G}$ -measurable.
For every $A\in\mathcal{G}$ ,

\int_A Z\,d\mathbb{P}=\int_A X\,d\mathbb{P}.

The first condition says the forecast uses only the information in $\mathcal{G}$ . The second says it preserves the correct average on every event that $\mathcal{G}$ can distinguish.

Key properties

Law of iterated expectations

Bertsekas states the basic form as

\mathbb{E}[\mathbb{E}[X\mid Y]]=\mathbb{E}[X].

In sigma-algebra form, if $\mathcal{H}\subseteq\mathcal{G}$ ,

\mathbb{E}\left[\mathbb{E}[X\mid\mathcal{G}]\mid\mathcal{H}\right]=\mathbb{E}[X\mid\mathcal{H}].

This is the tower property. It says that forecasting with more information and then coarsening back to less information gives the same result as forecasting directly with less information.
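The tower property can be verified exactly on a finite space. The sketch below assumes three fair coin tosses, with $\mathcal{G}$ generated by the first two tosses and $\mathcal{H}$ by the first toss only. The helper `cond_exp` and the observation maps `fine` and `coarse` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Illustrative space: three fair coin tosses (an assumption, not from the text).
omega = ["".join(t) for t in product("HT", repeat=3)]
prob = {w: Fraction(1, 8) for w in omega}


def cond_exp(X, info):
    """E[X | sigma(info)] on a finite space, returned as a dict outcome -> value.

    `info` maps each outcome to what the observer sees; outcomes with the same
    `info` value form one cell of the partition generating the sigma-algebra.
    """
    result = {}
    for w in omega:
        cell = [v for v in omega if info(v) == info(w)]
        p_cell = sum(prob[v] for v in cell)
        result[w] = sum(X[v] * prob[v] for v in cell) / p_cell
    return result


X = {w: Fraction(w.count("H")) for w in omega}   # number of heads
fine = lambda w: w[:2]      # G: first two tosses observed
coarse = lambda w: w[:1]    # H: first toss only, so H is contained in G

lhs = cond_exp(cond_exp(X, fine), coarse)       # E[ E[X | G] | H ]
rhs = cond_exp(X, coarse)                       # E[X | H]
reverse = cond_exp(cond_exp(X, coarse), fine)   # E[ E[X | H] | G ]

print(lhs == rhs)      # True
print(reverse == rhs)  # True
```

The second check shows that conditioning on the finer information after the coarser one also returns $\mathbb{E}[X\mid\mathcal{H}]$: detail that has been averaged away is not recovered.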

Pulling out known quantities

If $Y$ is $\mathcal{G}$-measurable and $YX$ is integrable, then

\mathbb{E}[YX\mid\mathcal{G}]=Y\mathbb{E}[X\mid\mathcal{G}].

What is already known can be treated as a constant inside the conditional expectation.

Independence removes information value

If $X$ is independent of $\mathcal{G}$ , then

\mathbb{E}[X\mid\mathcal{G}]=\mathbb{E}[X].

For Brownian motion, the increment $W_t-W_s$ is independent of $\mathcal{F}_s$, which is why future increments have conditional mean zero given the past.
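The hypothesis here is independence, which is stronger than zero covariance. A minimal sketch of the gap, assuming $Y$ uniform on $\{-1,0,1\}$ and $X=Y^2$ (a standard illustrative counterexample, not taken from the text):

```python
from fractions import Fraction

# Illustrative counterexample: Y uniform on {-1, 0, 1} and X = Y**2.
p_Y = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
X_of = lambda y: y * y

E_X = sum(X_of(y) * p for y, p in p_Y.items())
E_Y = sum(y * p for y, p in p_Y.items())
E_XY = sum(X_of(y) * y * p for y, p in p_Y.items())

# X is a function of Y, so E[X | Y = y] is simply X_of(y).
cond = {y: Fraction(X_of(y)) for y in p_Y}

print(E_XY == E_X * E_Y)                      # True: zero covariance
print(all(c == E_X for c in cond.values()))   # False: E[X | Y] is not constant
```

Zero covariance holds, yet observing $Y$ determines $X$ completely, so the conditional forecast is far from the unconditional mean.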

Full and trivial information

\mathbb{E}[X\mid\mathcal{F}]=X, \qquad \mathbb{E}[X\mid\{\emptyset,\Omega\}]=\mathbb{E}[X].

Full information leaves no uncertainty about $X$ ; no information leaves only the unconditional mean.
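The last three properties (pulling out known quantities, independence, and the full and trivial cases) can be checked exactly on two fair coin tosses. This is a sketch under that assumed setup. The helper `cond_exp` and the variables `Y` and `second` are hypothetical names introduced for the check.

```python
from fractions import Fraction

# Illustrative space: two fair coin tosses (an assumption for this sketch).
omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}


def cond_exp(X, info):
    """E[X | sigma(info)] as a dict; `info(w)` is what the observer sees at w."""
    out = {}
    for w in omega:
        cell = [v for v in omega if info(v) == info(w)]
        out[w] = sum(X[v] * prob[v] for v in cell) / sum(prob[v] for v in cell)
    return out


first = lambda w: w[0]                                         # G: first toss only
X = {w: Fraction(w.count("H")) for w in omega}                 # number of heads
Y = {w: Fraction(3 if w[0] == "H" else -1) for w in omega}     # G-measurable
second = {w: Fraction(1 if w[1] == "H" else 0) for w in omega}  # independent of G
mean = lambda V: sum(V[w] * prob[w] for w in omega)

# Pulling out known quantities: E[YX | G] = Y E[X | G].
YX = {w: Y[w] * X[w] for w in omega}
EX_G = cond_exp(X, first)
print(cond_exp(YX, first) == {w: Y[w] * EX_G[w] for w in omega})   # True

# Independence: E[second | G] is the constant E[second].
print(set(cond_exp(second, first).values()) == {mean(second)})     # True

# Full information returns X; trivial information returns E[X].
print(cond_exp(X, lambda w: w) == X)                               # True
print(set(cond_exp(X, lambda w: None).values()) == {mean(X)})      # True
```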

Conditional variance decomposition

Bertsekas derives the law of total variance:

\text{Var}(X)=\mathbb{E}[\text{Var}(X\mid Y)]+\text{Var}(\mathbb{E}[X\mid Y]).

The first term is the average uncertainty left in $X$ after observing $Y$; the second is the variability of the conditional forecast $\mathbb{E}[X\mid Y]$ itself.
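The decomposition can be confirmed exactly on a small discrete example. The joint pmf below is hypothetical, and the names `within` and `between` are labels chosen here for the two terms.

```python
from fractions import Fraction

# Hypothetical joint pmf p_{X,Y}(x, y), chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (5, 1): Fraction(3, 8),
}
ys = sorted({y for (_, y) in joint})
p_Y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ys}


def cond_moment(y, k):
    """E[X**k | Y = y] computed from the joint pmf."""
    return sum(x ** k * p for (x, yy), p in joint.items() if yy == y) / p_Y[y]


cond_mean = {y: cond_moment(y, 1) for y in ys}                      # E[X | Y=y]
cond_var = {y: cond_moment(y, 2) - cond_mean[y] ** 2 for y in ys}   # Var(X | Y=y)

E_X = sum(x * p for (x, _), p in joint.items())
var_X = sum(x ** 2 * p for (x, _), p in joint.items()) - E_X ** 2

within = sum(cond_var[y] * p_Y[y] for y in ys)                  # E[Var(X | Y)]
between = sum((cond_mean[y] - E_X) ** 2 * p_Y[y] for y in ys)   # Var(E[X | Y])

print(var_X, within, between)  # 17/4 2 9/4
```

In this example most of the variance of $X$ comes from how much the forecast moves with $Y$ (9/4), with the remainder (2) left unexplained after $Y$ is observed.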

Worked examples

Example 1: conditional expectation on a finite partition

Let $\Omega=\{HH,HT,TH,TT\}$ for two fair coin tosses, and let $X$ be the number of heads. Suppose $\mathcal{G}$ reveals only the first toss.

If the first toss is $H$ , the possible outcomes are $HH,HT$ , so the average of $X$ is $(2+1)/2=1.5$ . If the first toss is $T$ , the possible outcomes are $TH,TT$ , so the average is $(1+0)/2=0.5$ .

Thus $\mathbb{E}[X\mid\mathcal{G}]$ is the random variable equal to $1.5$ on $\{HH,HT\}$ and $0.5$ on $\{TH,TT\}$ .
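The same computation can be written out directly, together with a check of the defining partial-averaging property on every event in $\mathcal{G}$. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}          # number of heads

# G reveals only the first toss, so its cells are {HH, HT} and {TH, TT}.
cells = [{"HH", "HT"}, {"TH", "TT"}]

Z = {}                                        # Z = E[X | G], outcome -> value
for cell in cells:
    p_cell = sum(prob[w] for w in cell)
    avg = sum(X[w] * prob[w] for w in cell) / p_cell
    for w in cell:
        Z[w] = avg

print(Z["HH"], Z["HT"], Z["TH"], Z["TT"])  # 3/2 3/2 1/2 1/2

# Partial-averaging check: E[Z 1_A] = E[X 1_A] for every A in G.
events = [set(), cells[0], cells[1], set(omega)]
print(all(sum(Z[w] * prob[w] for w in A) == sum(X[w] * prob[w] for w in A)
          for A in events))  # True
```

The four events listed are all of $\mathcal{G}$ here: the empty set, the two cells, and the whole space.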

Example 2: Brownian motion is a martingale

For $s\le t$ ,

\begin{aligned} \mathbb{E}[W_t\mid\mathcal{F}_s] &=\mathbb{E}[W_s+(W_t-W_s)\mid\mathcal{F}_s]\\ &=W_s+\mathbb{E}[W_t-W_s\mid\mathcal{F}_s]\\ &=W_s. \end{aligned}

The first term is known at time $s$ ; the second is an independent future increment with mean zero.
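A Monte Carlo sketch can illustrate what the martingale property means in terms of the defining integrals: for any function $h(W_s)$ known at time $s$, the average of $(W_t-W_s)\,h(W_s)$ should be close to zero. The times, sample size, and test functions below are hypothetical choices. The simulation builds in independent increments by construction, so it illustrates the statement and does not prove it.

```python
import math
import random

random.seed(0)                     # fixed seed for reproducibility
s, t, n_paths = 1.0, 2.0, 50_000   # hypothetical times and sample size

W_s, incr = [], []
for _ in range(n_paths):
    W_s.append(math.sqrt(s) * random.gauss(0.0, 1.0))        # W_s ~ N(0, s)
    incr.append(math.sqrt(t - s) * random.gauss(0.0, 1.0))   # W_t - W_s ~ N(0, t - s)


def sample_mean(values):
    return sum(values) / len(values)


# Each h(W_s) is known at time s; the increment should average to zero against it.
checks = {
    "h = 1": sample_mean(incr),
    "h = W_s": sample_mean([a * b for a, b in zip(W_s, incr)]),
    "h = 1{W_s > 0}": sample_mean([b if a > 0 else 0.0 for a, b in zip(W_s, incr)]),
}
for name, estimate in checks.items():
    print(f"{name}: {estimate:+.4f}")   # each estimate should be close to 0
```

With 50,000 paths the sampling error of each estimate is of order $1/\sqrt{50000}\approx 0.0045$, so values within a few multiples of that are consistent with zero.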

Example 3: risk-neutral pricing over time

A European payoff $H$ paid at $T$ has time- $t$ value

V_t=e^{-r(T-t)}\mathbb{E}^{\mathbb{Q}}[H\mid\mathcal{F}_t].

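At $t=0$, this is the usual pricing expectation. At later times, the conditioning information changes the distribution of the remaining uncertainty. By the tower property, for $s\le t$ the discounted price satisfies $e^{-rs}V_s=\mathbb{E}^{\mathbb{Q}}[e^{-rt}V_t\mid\mathcal{F}_s]$, so it is a $\mathbb{Q}$-martingale. This is what makes prices at different dates dynamically consistent.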
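To make the dynamic consistency concrete, the sketch below assumes a three-step binomial model, which is not specified in the text. All parameters are hypothetical, and `payoff` and `value` are illustrative helper names. The information $\mathcal{F}_t$ is represented by the history of up and down moves observed so far.

```python
import math
from itertools import product

# Hypothetical parameters for a three-step binomial model (illustration only).
S0, K, r, T, n = 100.0, 100.0, 0.02, 1.0, 3
u, d = 1.1, 0.9
dt = T / n
q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-probability


def payoff(path):
    """European call payoff H for a path given as a string of 'u'/'d' moves."""
    S_T = S0 * u ** path.count("u") * d ** path.count("d")
    return max(S_T - K, 0.0)


def value(history):
    """V_t = e^{-r(T-t)} E^Q[H | F_t], where F_t is the observed move history."""
    remaining = n - len(history)
    total = 0.0
    for future in product("ud", repeat=remaining):
        ups = future.count("u")
        weight = q ** ups * (1 - q) ** (remaining - ups)
        total += weight * payoff(history + "".join(future))
    return math.exp(-r * dt * remaining) * total


# Tower property in action: the time-0 value equals the discounted
# Q-expectation of the values one step later.
one_step = math.exp(-r * dt) * (q * value("u") + (1 - q) * value("d"))
print(abs(value("") - one_step) < 1e-12)  # True
```

Pricing by brute-force enumeration of paths is deliberately naive. It keeps the conditional expectation visible, where a production pricer would typically use backward induction on a recombining tree.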

Example 4: forecast revision has zero prior mean

Bertsekas notes that if $\mathbb{E}[X\mid Y]$ is a revised forecast after observing $Y$ , then

\mathbb{E}[\mathbb{E}[X\mid Y]-\mathbb{E}[X]]=0.

Before seeing the information, the expected revision is zero. If it were systematically positive, the original forecast would have been too low.
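A small Bayesian sketch shows the revisions cancelling on average. The setup is hypothetical: $X$ is an unknown coin bias with a uniform prior over three candidate values, and $Y$ is the outcome of a single toss.

```python
from fractions import Fraction

# Hypothetical setup: X is an unknown coin bias with a uniform prior over three
# candidate values, and Y is the outcome of a single toss (1 = heads).
prior = {Fraction(1, 5): Fraction(1, 3),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(4, 5): Fraction(1, 3)}


def likelihood(y, x):
    """P(Y = y | X = x) for one toss of a coin with heads-probability x."""
    return x if y == 1 else 1 - x


prior_mean = sum(x * p for x, p in prior.items())             # E[X]
p_Y = {y: sum(likelihood(y, x) * p for x, p in prior.items()) for y in (0, 1)}
forecast = {y: sum(x * likelihood(y, x) * p for x, p in prior.items()) / p_Y[y]
            for y in (0, 1)}                                  # E[X | Y = y]

revision = {y: forecast[y] - prior_mean for y in (0, 1)}
expected_revision = sum(revision[y] * p_Y[y] for y in (0, 1))
print(revision[1], revision[0], expected_revision)  # 3/25 -3/25 0
```

Seeing heads raises the forecast and seeing tails lowers it by the same probability-weighted amount, so the revision has zero mean before the toss.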

Common confusions and pitfalls

" $\mathbb{E}[X\mid\mathcal{G}]$ is a number." It is a random variable unless
$\mathcal{G}$ carries no information or the conditional forecast is constant.

"The tower property works in either direction." The inclusion direction matters. Conditioning down from finer to coarser information loses resolution; conditioning up does not recover information that was averaged away.

"Conditional expectation means plugging in a conditional probability." For indicators,
$\mathbb{P}(A\mid\mathcal{G})=\mathbb{E}[\mathbf{1}_A\mid\mathcal{G}]$. For general $X$, conditional expectation is an averaging operation, not a single probability.

"If $\mathbb{E}[XY]=\mathbb{E}[X]\mathbb{E}[Y]$ , then $\mathbb{E}[X\mid Y]=\mathbb{E}[X]$ ." Not necessarily. Zero covariance is weaker than independence.

"The conditional expectation is defined pointwise everywhere." In measure-theoretic probability it is unique only up to almost-sure equality. Changing it on a null event does not change the defining integrals.

Where this goes next

Filtrations and Information: Supplies the time-indexed sigma-algebras used in $\mathbb{E}[X\mid\mathcal{F}_t]$ .
Martingales Discrete Time: Defines fair-game processes through conditional expectation.
Optional Stopping Theorem: Studies conditional-expectation behaviour under random stopping times.
Risk-Neutral Valuation: Interprets derivative prices as conditional expectations under $\mathbb{Q}$ .
Radon-Nikodym Theorem: Provides the existence machinery behind the general conditional expectation definition.

References

Bertsekas, D. P., & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. Ch. 2 §2.6 (Conditioning), Ch. 4 §4.3 (Conditional Expectation and Variance Revisited). The sigma-algebra formulation extends the textbook's random-variable conditioning treatment.