Conditional Expectation

Motivation: why this matters in quant finance

At time $0$, a derivative price is an expectation. At time $t$, after some market history has been observed, the price is a conditional expectation:
$$V_t=e^{-r(T-t)}\mathbb{E}^{\mathbb{Q}}[H\mid\mathcal{F}_t].$$

The conditioning object is not a single event like "the first coin toss was heads". It is the entire information available at time $t$: prices, realised volatility, rates, defaults, and anything else the model allows the trader to know. Conditional expectation is the operation that converts a future random payoff into the best current forecast using that information.

Bertsekas builds conditional expectation from a simpler idea: first condition on an event, then on the value of another random variable, then view $\mathbb{E}[X\mid Y]$ itself as a random variable determined by $Y$. The sigma-algebra version used in stochastic finance is the same idea with "the value of $Y$" replaced by "the information set $\mathcal{G}$."

The informal idea

Conditional expectation averages only over distinctions the conditioning information cannot see. If $\mathcal{G}$ tells you which cell of a partition occurred, then $\mathbb{E}[X\mid\mathcal{G}]$ is constant on each cell and equals the average value of $X$ inside that cell.

This is why $\mathbb{E}[X\mid\mathcal{G}]$ is a random variable. Before the outcome is realised, you do not know which cell you will be in. After the information in $\mathcal{G}$ is revealed, the forecast takes the value attached to that cell.

In finance: before observing $S_t$, the time-$t$ option value is random. Once $S_t$ and the rest of $\mathcal{F}_t$ are observed, the value is known.

Formal definitions

Conditioning on an event

For $\mathbb{P}(B)>0$,

$$\mathbb{E}[X\mid B]=\frac{\mathbb{E}[X\mathbf{1}_B]}{\mathbb{P}(B)}.$$

This is a number: the average of $X$ under the probability law restricted to $B$.
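
As a small numerical check of this formula, the sketch below computes the conditional expectation given an event for a fair six-sided die, with the face value as the random variable and "the face is even" as the event. The die, the event, and the helper name `cond_exp_given_event` are illustrative assumptions, not part of the source.

```python
from fractions import Fraction

def cond_exp_given_event(pmf, X, B):
    """E[X | B] = E[X 1_B] / P(B) on a finite sample space.

    pmf: dict mapping outcome -> probability
    X:   function outcome -> value
    B:   function outcome -> bool (indicator of the event)
    """
    p_B = sum(p for w, p in pmf.items() if B(w))
    if p_B == 0:
        raise ValueError("P(B) must be positive")
    e_X_on_B = sum(X(w) * p for w, p in pmf.items() if B(w))
    return e_X_on_B / p_B

# Illustrative assumption: a fair six-sided die, X = face value, B = {face is even}.
die = {face: Fraction(1, 6) for face in range(1, 7)}
result = cond_exp_given_event(die, X=lambda w: w, B=lambda w: w % 2 == 0)
print(result)  # 4, the average of the faces 2, 4, 6
```

Exact rational arithmetic is used so that the result can be compared with a hand calculation without rounding noise.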

Conditioning on a random variable

For discrete $Y$,

$$\mathbb{E}[X\mid Y=y]=\sum_x x\,p_{X\mid Y}(x\mid y).$$

The object $\mathbb{E}[X\mid Y]$ is the random variable obtained by substituting $Y$ into the function $y\mapsto\mathbb{E}[X\mid Y=y]$.
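
The two-step construction (first the function of $y$, then substitution of $Y$) can be sketched for a finite joint pmf. The joint distribution below and the helper name `cond_exp_given_rv` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict
from fractions import Fraction

def cond_exp_given_rv(joint):
    """Return the function y -> E[X | Y = y] as a dict.

    joint: dict mapping (x, y) -> probability.
    """
    p_y = defaultdict(Fraction)    # marginal pmf of Y
    sum_x = defaultdict(Fraction)  # sum over x of x * p(x, y)
    for (x, y), p in joint.items():
        p_y[y] += p
        sum_x[y] += x * p
    # Dividing by p_Y(y) turns the joint pmf into the conditional pmf.
    return {y: sum_x[y] / p_y[y] for y in p_y if p_y[y] > 0}

# Hypothetical joint pmf of (X, Y).
joint = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 8), (3, 1): Fraction(3, 8)}

g = cond_exp_given_rv(joint)  # the function y -> E[X | Y = y]

def cond_exp_as_rv(y_observed):
    """E[X | Y] evaluated at an outcome: substitute the observed Y into g."""
    return g[y_observed]

print(g[0], g[1])  # 1/2 5/2
```

The dictionary `g` is a deterministic function; randomness enters only through which value of $Y$ is observed.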

Conditioning on a sigma-algebra

Let $\mathcal{G}\subseteq\mathcal{F}$ be a sub-sigma-algebra and let $X$ be integrable. The conditional expectation $Z=\mathbb{E}[X\mid\mathcal{G}]$ is the almost-surely unique random variable satisfying:

  1. $Z$ is $\mathcal{G}$-measurable.
  2. For every $A\in\mathcal{G}$,
$$\int_A Z\,d\mathbb{P}=\int_A X\,d\mathbb{P}.$$

The first condition says the forecast uses only the information in $\mathcal{G}$. The second says it preserves the correct average on every event that $\mathcal{G}$ can distinguish.

Key properties

Law of iterated expectations

Bertsekas states the basic form as

$$\mathbb{E}[\mathbb{E}[X\mid Y]]=\mathbb{E}[X].$$

In sigma-algebra form, if $\mathcal{H}\subseteq\mathcal{G}$,

$$\mathbb{E}\left[\mathbb{E}[X\mid\mathcal{G}]\mid\mathcal{H}\right]=\mathbb{E}[X\mid\mathcal{H}].$$

This is the tower property. It says that forecasting with more information and then coarsening back to less information gives the same result as forecasting directly with less information.
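
The tower property can be checked exactly on a small finite space. The sketch below assumes three fair coin tosses, with the finer information revealing the first two tosses and the coarser information revealing only the first; the helper `cond_exp` and this setup are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def cond_exp(pmf, X, info):
    """E[X | sigma(info)] on a finite sample space, as a dict outcome -> value.

    info maps each outcome to what the information set reveals about it;
    outcomes with the same info value lie in the same partition cell.
    """
    mass, total = {}, {}
    for w, p in pmf.items():
        k = info(w)
        mass[k] = mass.get(k, 0) + p
        total[k] = total.get(k, 0) + X(w) * p
    return {w: total[info(w)] / mass[info(w)] for w in pmf}

outcomes = ["".join(t) for t in product("HT", repeat=3)]
pmf = {w: Fraction(1, 8) for w in outcomes}
X = lambda w: w.count("H")  # number of heads in three tosses

fine = cond_exp(pmf, X, info=lambda w: w[:2])    # first two tosses known
coarse = cond_exp(pmf, X, info=lambda w: w[:1])  # only the first toss known

# Forecast with more information, then coarsen back to less information.
tower = cond_exp(pmf, lambda w: fine[w], info=lambda w: w[:1])
assert tower == coarse

# Conditioning the coarse forecast on finer information changes nothing:
# it is already known given the first two tosses, and it does not become `fine`.
up = cond_exp(pmf, lambda w: coarse[w], info=lambda w: w[:2])
assert up == coarse and up != fine
```

Both orders of nesting land on the coarser forecast, which is why the inclusion direction decides which conditional expectation survives.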

Pulling out known quantities

If $Y$ is $\mathcal{G}$-measurable, then

$$\mathbb{E}[YX\mid\mathcal{G}]=Y\,\mathbb{E}[X\mid\mathcal{G}].$$

What is already known can be treated as a constant inside the conditional expectation.

Independence removes information value

If $X$ is independent of $\mathcal{G}$, then

$$\mathbb{E}[X\mid\mathcal{G}]=\mathbb{E}[X].$$

For Brownian motion, this is the reason future increments have conditional mean zero given the past.

Full and trivial information

$$\mathbb{E}[X\mid\mathcal{F}]=X, \qquad \mathbb{E}[X\mid\{\emptyset,\Omega\}]=\mathbb{E}[X].$$

Full information leaves no uncertainty about $X$; no information leaves only the unconditional mean.
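
The last three properties (pulling out known quantities, independence, and the full and trivial information cases) can be verified exactly on two fair coin tosses. This is a sketch under assumed inputs: the helper `cond_exp`, the known quantity `Y`, and the choice of information set (the first toss) are all illustrative.

```python
from fractions import Fraction

def cond_exp(pmf, X, info):
    """E[X | sigma(info)] on a finite sample space, as a dict outcome -> value."""
    mass, total = {}, {}
    for w, p in pmf.items():
        k = info(w)
        mass[k] = mass.get(k, 0) + p
        total[k] = total.get(k, 0) + X(w) * p
    return {w: total[info(w)] / mass[info(w)] for w in pmf}

pmf = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
first_toss = lambda w: w[0]  # the information set reveals the first toss
X = lambda w: w.count("H")   # number of heads

# Pulling out known quantities: Y depends only on the first toss.
Y = lambda w: 3 if w[0] == "H" else -1
e_X_G = cond_exp(pmf, X, first_toss)
pulled_out = cond_exp(pmf, lambda w: Y(w) * X(w), first_toss)
assert pulled_out == {w: Y(w) * e_X_G[w] for w in pmf}

# Independence: the second toss is independent of the first,
# so conditioning on the first toss leaves its mean at 1/2.
second_head = lambda w: 1 if w[1] == "H" else 0
e_second_G = cond_exp(pmf, second_head, first_toss)
assert all(v == Fraction(1, 2) for v in e_second_G.values())

# Full information returns X itself; trivial information returns E[X].
full = cond_exp(pmf, X, info=lambda w: w)
trivial = cond_exp(pmf, X, info=lambda w: None)
assert full == {w: X(w) for w in pmf}
assert all(v == 1 for v in trivial.values())
```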

Conditional variance decomposition

Bertsekas derives the law of total variance:

$$\text{Var}(X)=\mathbb{E}[\text{Var}(X\mid Y)]+\text{Var}(\mathbb{E}[X\mid Y]).$$

The same idea separates average residual uncertainty from uncertainty in the conditional forecast.
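
To see the two terms separately, the following sketch evaluates each piece of the decomposition with exact fractions. The joint pmf is hypothetical and chosen only so the numbers are easy to verify by hand.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), chosen only for illustration.
joint = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
         (1, 1): Fraction(1, 8), (3, 1): Fraction(3, 8)}

def mean(dist):
    """Mean of a pmf given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

def variance(dist):
    """Variance of a pmf given as {value: probability}."""
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

def marginal(joint, index):
    """Marginal pmf of coordinate `index` (0 for X, 1 for Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0) + p
    return out

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

# Conditional pmf of X given Y = y, for each y.
cond = {y: {x: p / p_y[y] for (x, yy), p in joint.items() if yy == y}
        for y in p_y}

# Average residual uncertainty: E[Var(X | Y)].
expected_cond_var = sum(p_y[y] * variance(cond[y]) for y in p_y)

# Uncertainty in the forecast: Var(E[X | Y]).
cond_mean_dist = {}
for y in p_y:
    m = mean(cond[y])
    cond_mean_dist[m] = cond_mean_dist.get(m, 0) + p_y[y]
var_cond_mean = variance(cond_mean_dist)

total = variance(p_x)
print(total, expected_cond_var, var_cond_mean)  # 3/2 1/2 1
assert total == expected_cond_var + var_cond_mean
```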

Worked examples

Example 1: conditional expectation on a finite partition

Let $\Omega=\{HH,HT,TH,TT\}$ for two fair coin tosses, and let $X$ be the number of heads. Suppose $\mathcal{G}$ reveals only the first toss.

If the first toss is $H$, the possible outcomes are $HH,HT$, so the average of $X$ is $(2+1)/2=1.5$. If the first toss is $T$, the possible outcomes are $TH,TT$, so the average is $(1+0)/2=0.5$.

Thus $\mathbb{E}[X\mid\mathcal{G}]$ is the random variable equal to $1.5$ on $\{HH,HT\}$ and $0.5$ on $\{TH,TT\}$.
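
The same calculation can be written as a short script that also checks the partial-averaging condition from the sigma-algebra definition on every event the first toss can distinguish. Variable names are illustrative.

```python
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}  # number of heads

# Cells of the partition generated by the first toss.
cells = [{"HH", "HT"}, {"TH", "TT"}]

# The conditional expectation is constant on each cell
# and equals the probability-weighted average of X there.
Z = {}
for cell in cells:
    cell_mass = sum(prob[w] for w in cell)
    cell_avg = sum(X[w] * prob[w] for w in cell) / cell_mass
    for w in cell:
        Z[w] = cell_avg

# Every event in the sigma-algebra is a union of cells:
# check that Z and X have the same integral over each one.
events = [set(), cells[0], cells[1], cells[0] | cells[1]]
for A in events:
    assert sum(Z[w] * prob[w] for w in A) == sum(X[w] * prob[w] for w in A)

print(Z["HH"], Z["TH"])  # 3/2 1/2
```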

Example 2: Brownian motion is a martingale

For $s\le t$,

$$\begin{aligned}
\mathbb{E}[W_t\mid\mathcal{F}_s] &=\mathbb{E}[W_s+(W_t-W_s)\mid\mathcal{F}_s]\\
&=W_s+\mathbb{E}[W_t-W_s\mid\mathcal{F}_s]\\
&=W_s.
\end{aligned}$$

The first term is known at time $s$; the second is an independent future increment with mean zero.
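
A Monte Carlo sanity check of this identity is sketched below. The martingale property is equivalent to $\mathbb{E}[(W_t-W_s)\,G]=0$ for every bounded $\mathcal{F}_s$-measurable $G$, so the script estimates that expectation for a few test functions of the path up to time $s$. The times, sample size, seed, and test functions are arbitrary assumptions, and a simulation like this is only a partial check, not a proof.

```python
import math
import random

def simulate(n_paths, s=1.0, t=2.0, seed=0):
    """Sample (W_{s/2}, W_s, W_t) using independent Gaussian increments."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_paths):
        w_half = rng.gauss(0.0, math.sqrt(s / 2))
        w_s = w_half + rng.gauss(0.0, math.sqrt(s / 2))
        w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
        rows.append((w_half, w_s, w_t))
    return rows

def orthogonality(paths, g):
    """Sample mean of (W_t - W_s) * g(W_{s/2}, W_s)."""
    return sum((w_t - w_s) * g(w_half, w_s)
               for w_half, w_s, w_t in paths) / len(paths)

paths = simulate(100_000)

# Bounded test functions of the information available at time s.
checks = {
    "constant": orthogonality(paths, lambda a, b: 1.0),
    "sign of W_s": orthogonality(paths, lambda a, b: 1.0 if b > 0 else -1.0),
    "tanh of W_{s/2}": orthogonality(paths, lambda a, b: math.tanh(a)),
}
for name, value in checks.items():
    print(f"{name}: {value:+.4f}")  # each should be close to zero
```

With 100,000 paths and a unit-variance increment, the standard error of each estimate is at most about 0.003, so values within a few hundredths of zero are consistent with the martingale property.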

Example 3: risk-neutral pricing over time

A European payoff $H$ paid at $T$ has time-$t$ value

$$V_t=e^{-r(T-t)}\mathbb{E}^{\mathbb{Q}}[H\mid\mathcal{F}_t].$$

At $t=0$, this is the usual pricing expectation. At later times, the conditioning information changes the distribution of the remaining uncertainty. The tower property is what makes pricing dynamically consistent: for $s\le t$, $e^{-rs}V_s=\mathbb{E}^{\mathbb{Q}}[e^{-rt}V_t\mid\mathcal{F}_s]$, so the discounted price process is a $\mathbb{Q}$-martingale.
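
Dynamic consistency can be illustrated in a two-step binomial tree, which is an assumption made here for concreteness and is not a model used in the text. The sketch prices a call once directly from the terminal payoff and once by stepping back through the time-1 values. All parameters are hypothetical.

```python
import math

# Hypothetical parameters, chosen only for illustration.
S0, K, r, dt = 100.0, 100.0, 0.05, 0.5  # two steps, so T = 2 * dt
u, d = 1.2, 0.9                         # up and down factors per step
q = (math.exp(r * dt) - d) / (u - d)    # risk-neutral up-probability
disc = math.exp(-r * dt)                # one-step discount factor

def payoff(s):
    """Call payoff at maturity."""
    return max(s - K, 0.0)

def terminal_price(path):
    """Stock price after a path of moves such as "ud"."""
    s = S0
    for move in path:
        s *= u if move == "u" else d
    return s

def path_prob(path):
    """Risk-neutral probability of a path."""
    return math.prod(q if m == "u" else 1 - q for m in path)

# Direct: time-0 value as the discounted expectation of the payoff.
v0_direct = disc ** 2 * sum(path_prob(p) * payoff(terminal_price(p))
                            for p in ("uu", "ud", "du", "dd"))

# Step by step: condition on the first move to get the time-1 values,
# then discount their risk-neutral expectation back to time 0.
v1 = {first: disc * (q * payoff(terminal_price(first + "u"))
                     + (1 - q) * payoff(terminal_price(first + "d")))
      for first in "ud"}
v0_stepwise = disc * (q * v1["u"] + (1 - q) * v1["d"])

assert math.isclose(v0_direct, v0_stepwise, rel_tol=1e-9)
```

The agreement of the two values is the tower property at work: averaging the time-1 conditional values reproduces the time-0 expectation.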

Example 4: forecast revision has zero prior mean

Bertsekas notes that if $\mathbb{E}[X\mid Y]$ is a revised forecast after observing $Y$, then

$$\mathbb{E}[\mathbb{E}[X\mid Y]-\mathbb{E}[X]]=0.$$

Before seeing the information, the expected revision is zero. If it were systematically positive, the original forecast was too low.

Common confusions and pitfalls

"$\mathbb{E}[X\mid\mathcal{G}]$ is a number." It is a random variable; it reduces to a constant only when $\mathcal{G}$ carries no information or the conditional forecast happens to be the same on every cell.
"The tower property works in either direction." The inclusion direction matters. Conditioning down from finer to coarser information loses resolution; conditioning up does not recover information that was averaged away.
"Conditional expectation means plugging in a conditional probability." For indicators, $\mathbb{P}(A\mid\mathcal{G})=\mathbb{E}[\mathbf{1}_A\mid\mathcal{G}]$. For general $X$, conditional expectation is an averaging operation, not a single probability.
"If $\mathbb{E}[XY]=\mathbb{E}[X]\mathbb{E}[Y]$, then $\mathbb{E}[X\mid Y]=\mathbb{E}[X]$." Not necessarily. Zero covariance is weaker than independence. For example, if $Y$ is standard normal and $X=Y^2$, then $\mathbb{E}[XY]=\mathbb{E}[Y^3]=0=\mathbb{E}[X]\mathbb{E}[Y]$, yet $\mathbb{E}[X\mid Y]=Y^2$, which is not the constant $\mathbb{E}[X]=1$.
"The conditional expectation is defined pointwise everywhere." In measure-theoretic probability it is unique only up to almost-sure equality. Changing it on a null event does not change the defining integrals.

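The zero-covariance pitfall can be made concrete with a three-point distribution. In this sketch, $Y$ is assumed uniform on $\{-1,0,1\}$ and $X=Y^2$; the example is an illustration chosen for exact arithmetic, not taken from the source.

```python
from fractions import Fraction

# Assumed setup: Y uniform on {-1, 0, 1} and X = Y^2.
p_y = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
X = lambda y: y * y

e_x = sum(X(y) * p for y, p in p_y.items())       # E[X]
e_y = sum(y * p for y, p in p_y.items())          # E[Y]
e_xy = sum(X(y) * y * p for y, p in p_y.items())  # E[XY] = E[Y^3]

# Zero covariance: E[XY] equals E[X] E[Y].
assert e_xy == e_x * e_y

# X is a function of Y, so the conditional mean given Y = y is just X(y).
cond_mean = {y: Fraction(X(y)) for y in p_y}

# The conditional forecast moves with Y, so it is not the constant E[X].
assert any(v != e_x for v in cond_mean.values())
print(e_x, cond_mean)
```
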
Where this goes next

References

  • Bertsekas, D. P., & Tsitsiklis, J. N. (2008). Introduction to Probability (2nd ed.). Athena Scientific. Ch. 2 §2.6 (Conditioning), Ch. 4 §4.3 (Conditional Expectation and Variance Revisited). The sigma-algebra formulation extends the textbook's random-variable conditioning treatment.