Control Variates

Motivation: why this matters in quant finance

Control variates is the most powerful of the elementary variance-reduction methods. The idea: if you're pricing some quantity $\mathbb{E}[X]$, and you know another quantity $\mathbb{E}[Y]$ exactly (e.g. analytically), and $X$ is correlated with $Y$, then

$$X - \beta(Y - \mathbb{E}[Y])$$

has the same mean as $X$ but lower variance for an appropriately chosen $\beta$.

In options pricing, this routinely gives 10x to 1000x variance reduction. The standard recipe for an Asian option: use the geometric Asian (closed-form, Black-Scholes-like) as a control for the arithmetic Asian (no closed form). For a basket option: use an option on the average asset, which is often easier to price, as a control. Once a good control is identified, control variates is dramatically more effective than antithetic variates.

The informal idea

Suppose you want $\theta = \mathbb{E}[X]$ and you have draws $X_1, \dots, X_N$. Plain MC estimator: $\bar X$, variance $\sigma_X^2/N$.

Now suppose you also have draws $Y_1, \dots, Y_N$ with known mean $\mu_Y = \mathbb{E}[Y]$, paired with the $X_i$ (i.e., $Y_i$ comes from the same simulation as $X_i$). Define

$$\hat\theta_{CV} = \bar X - \beta(\bar Y - \mu_Y).$$

This estimator is unbiased for $\theta$ (since $\mathbb{E}[\bar Y] = \mu_Y$) and its variance is

$$\text{Var}(\hat\theta_{CV}) = \frac{1}{N}(\sigma_X^2 - 2\beta\sigma_{XY} + \beta^2\sigma_Y^2).$$

Minimising in $\beta$: $\beta^* = \sigma_{XY}/\sigma_Y^2$, giving optimal variance

$$\text{Var}(\hat\theta_{CV}^*) = \frac{\sigma_X^2}{N}(1 - \rho_{XY}^2).$$
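The minimisation step can be spelled out as follows. This is only the standard calculus argument applied to the variance formula, using $\rho_{XY} = \sigma_{XY}/(\sigma_X\sigma_Y)$; no new assumptions are introduced.

```latex
\begin{aligned}
\frac{d}{d\beta}\,\text{Var}(\hat\theta_{CV})
  &= \frac{1}{N}\left(-2\sigma_{XY} + 2\beta\sigma_Y^2\right) = 0
  \;\Longrightarrow\;
  \beta^* = \frac{\sigma_{XY}}{\sigma_Y^2}, \\[4pt]
\text{Var}(\hat\theta_{CV}^*)
  &= \frac{1}{N}\left(\sigma_X^2
     - 2\,\frac{\sigma_{XY}^2}{\sigma_Y^2}
     + \frac{\sigma_{XY}^2}{\sigma_Y^2}\right)
   = \frac{1}{N}\left(\sigma_X^2 - \frac{\sigma_{XY}^2}{\sigma_Y^2}\right)
   = \frac{\sigma_X^2}{N}\left(1 - \rho_{XY}^2\right).
\end{aligned}
```

The second derivative is $2\sigma_Y^2/N > 0$, so the stationary point is a minimum.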

So the variance is multiplied by the factor $1 - \rho_{XY}^2$. If $|\rho_{XY}| = 0.95$, the variance falls to $1 - 0.9025 = 9.75\%$ of its original value, i.e., a $\sim 10\times$ reduction. If $|\rho_{XY}| = 0.99$, it falls to $1 - 0.9801 = 1.99\%$, a $\sim 50\times$ reduction.
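To get a feel for how steeply the benefit grows with correlation, the factor can be tabulated directly. This is a minimal sketch; the helper name `variance_reduction_factor` is illustrative and not part of any library.

```python
def variance_reduction_factor(rho: float) -> float:
    """Return 1 / (1 - rho^2): how many times smaller the variance becomes
    when the optimal beta is used with a control of correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho**2)


# Tabulate the remaining variance fraction and the implied reduction.
for rho in (0.5, 0.8, 0.95, 0.99):
    remaining = 1.0 - rho**2
    print(f"rho = {rho:.2f}: variance x {remaining:.4f}, "
          f"reduction ~{variance_reduction_factor(rho):.1f}x")
```

Most of the gain arrives only once $|\rho_{XY}|$ is above roughly 0.9, which is why the choice of control dominates everything else.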

Formal statement

Let $X$ be the target with $\theta = \mathbb{E}[X]$ unknown. Let $Y$ be a control with $\mathbb{E}[Y] = \mu_Y$ known. Let $(X_1, Y_1), \dots, (X_N, Y_N)$ be i.i.d. paired samples.
Estimator: $\hat\theta_{CV} = \bar X - \beta(\bar Y - \mu_Y)$ for any $\beta$.
Properties.
  • Unbiased for any $\beta$: $\mathbb{E}[\hat\theta_{CV}] = \theta$.
  • Variance $= (\sigma_X^2 - 2\beta\sigma_{XY} + \beta^2\sigma_Y^2)/N$.
  • Optimal $\beta^* = \sigma_{XY}/\sigma_Y^2 = \rho_{XY} \sigma_X / \sigma_Y$ (the OLS slope of $X$ on $Y$).
  • Minimum variance $= \sigma_X^2(1 - \rho_{XY}^2)/N$.

In practice, $\beta^*$ is unknown; estimate it from the same sample as $\hat\beta = S_{XY}/S_Y^2$, the ratio of the sample covariance to the sample variance of $Y$. Using the estimated $\hat\beta$ introduces a small bias of order $1/N$, which is negligible next to the $O(1/\sqrt{N})$ standard error.
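Before moving to options, the estimator can be checked on a toy problem where everything is known. The sketch below assumes the target $X = e^U$ with $U \sim \text{Uniform}(0,1)$, so that $\mathbb{E}[X] = e - 1$, and uses $Y = U$ with exact mean $1/2$ as the control. The function name `cv_estimate` and the toy problem are illustrative choices, not part of the lesson's pricing code.

```python
import numpy as np


def cv_estimate(x, y, mu_y):
    """Control-variate estimate of E[X], with beta estimated in-sample."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample analogue of beta* = sigma_XY / sigma_Y^2 (OLS slope of X on Y).
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)
    adjusted = x - beta * (y - mu_y)
    # The standard error treats beta as fixed, which is accurate for large N.
    se = adjusted.std(ddof=1) / np.sqrt(len(x))
    return adjusted.mean(), se, beta


rng = np.random.default_rng(0)
u = rng.uniform(size=50_000)
x = np.exp(u)   # target samples, E[X] = e - 1
y = u           # control samples, E[Y] = 1/2 exactly

est, se, beta = cv_estimate(x, y, mu_y=0.5)
plain_se = x.std(ddof=1) / np.sqrt(len(x))

print(f"exact: {np.e - 1:.5f}")
print(f"plain: {x.mean():.5f} +/- {1.96 * plain_se:.5f}")
print(f"CV:    {est:.5f} +/- {1.96 * se:.5f}  (beta = {beta:.3f})")
```

For this pair the theoretical values are $\beta^* \approx 1.69$ and $\rho_{XY}^2 \approx 0.98$, so the confidence interval should shrink by a factor of roughly 8.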

Algorithm: geometric Asian as control for arithmetic Asian

Setup. Arithmetic Asian call: payoff $X = \big(\bar S_{\text{arith}} - K\big)^+$ where $\bar S_{\text{arith}} = \frac{1}{M}\sum_k S_{t_k}$. No closed form.
Control. Geometric Asian call: $Y = \big(\bar S_{\text{geo}} - K\big)^+$ where $\bar S_{\text{geo}} = (\prod_k S_{t_k})^{1/M}$. Has a closed-form Black-Scholes-like price (the geometric average of log-normals is log-normal).

The geometric and arithmetic averages are nearly equal for moderate volatility, so $\rho_{XY} \approx 0.99$ and control variates gives massive variance reduction.

```python
import numpy as np
from scipy.stats import norm


def mc_arithmetic_asian_with_cv(S0, K, T, r, sigma, M, N, seed=42):
    rng = np.random.default_rng(seed)
    dt = T / M

    # Simulate GBM on the monitoring dates t_k = k*dt, k = 1..M
    # (S0 itself is not part of the average).
    Z = rng.standard_normal((N, M))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
    log_paths = np.log(S0) + np.cumsum(increments, axis=1)
    paths = np.exp(log_paths)

    # Target X: arithmetic Asian
    arith_avg = paths.mean(axis=1)
    X = np.exp(-r * T) * np.maximum(arith_avg - K, 0)

    # Control Y: geometric Asian
    geo_avg = np.exp(log_paths.mean(axis=1))
    Y = np.exp(-r * T) * np.maximum(geo_avg - K, 0)

    # Closed form for the geometric Asian over the same M dates t_1..t_M.
    # log(geo_avg) is normal with variance sigma_geo^2 * T and
    # E[geo_avg] = S0 * exp(mu_geo * T); the formula must match the simulated
    # averaging dates exactly, otherwise Y_mean is not E[Y].
    sigma_geo = sigma * np.sqrt((M + 1) * (2 * M + 1) / (6 * M**2))
    mu_geo = 0.5 * (r - 0.5 * sigma**2) * (M + 1) / M + 0.5 * sigma_geo**2
    d1 = (np.log(S0 / K) + (mu_geo + 0.5 * sigma_geo**2) * T) / (sigma_geo * np.sqrt(T))
    d2 = d1 - sigma_geo * np.sqrt(T)
    Y_mean = np.exp(-r * T) * (S0 * np.exp(mu_geo * T) * norm.cdf(d1) - K * norm.cdf(d2))

    # CV: estimate beta and apply
    beta = np.cov(X, Y, ddof=1)[0, 1] / np.var(Y, ddof=1)
    X_cv = X - beta * (Y - Y_mean)

    return (X_cv.mean(), X_cv.std(ddof=1) / np.sqrt(N),
            X.mean(), X.std(ddof=1) / np.sqrt(N),
            np.corrcoef(X, Y)[0, 1])


p_cv, se_cv, p_plain, se_plain, rho = mc_arithmetic_asian_with_cv(
    100, 100, 1, 0.05, 0.2, 50, 100_000)
print(f"Plain MC: {p_plain:.4f} ± {1.96*se_plain:.4f}")
print(f"CV MC:    {p_cv:.4f} ± {1.96*se_cv:.4f}")
print(f"rho = {rho:.4f}, var reduction = {1 - rho**2:.4f}")
print(f"SE ratio: {se_cv/se_plain:.4f}")

# Reported output (illustrative; exact figures depend on the seed and on the
# averaging convention used):
# Plain MC: 5.7720 ± 0.0467
# CV MC:    5.7689 ± 0.0030
# rho = 0.9979, var reduction = 0.0042
# SE ratio: 0.0641
```

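With the reported SE ratio of about 0.064, that is roughly a 15-fold standard-error reduction. Because standard error scales as $1/\sqrt{N}$, this is equivalent to running plain MC with $\sim 240\times$ more samples.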

Key properties

  • Choice of control matters. Higher $|\rho_{XY}|$ means more variance reduction. A weakly correlated control gives little benefit; a perfectly correlated control ($|\rho_{XY}| = 1$) would remove the variance entirely.
  • Multiple controls. Generalises to a vector of controls $Y \in \mathbb{R}^k$ with multivariate regression. Optimal $\beta = \Sigma_Y^{-1}\sigma_{XY}$.
  • Estimated $\beta$. Standard practice. Adds an $O(1/N)$ bias but the variance reduction is preserved.
  • Self-control variates. When no analytic control exists, the discounted terminal value of any traded asset can serve, because its risk-neutral mean is its value at $t = 0$. E.g., the discounted stock $e^{-rT}S_T$ has known mean $S_0$. For a call, this gives modest correlation but is always available.
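  • Stratification as a control. Stratified sampling, in its post-stratified form, is asymptotically equivalent to control variates with the stratum indicators as controls. Conditional MC is closely related in spirit: it also replaces part of the randomness with a known expectation.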
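The multiple-controls and self-control points can be combined in one small sketch. It assumes a European call under GBM as the target and uses two controls with known risk-neutral means: the discounted stock $e^{-rT}S_T$ (mean $S_0$) and the discounted squared stock $e^{-rT}S_T^2$ (mean $S_0^2 e^{(r+\sigma^2)T}$). The function name `cv_multi`, the second control, and the parameter values are illustrative choices, not something the lesson prescribes.

```python
import numpy as np


def cv_multi(x, Y, mu_Y):
    """CV estimate with k controls. Y has shape (N, k), mu_Y has shape (k,)."""
    n = len(x)
    Yc = Y - Y.mean(axis=0)
    xc = x - x.mean()
    Sigma_Y = Yc.T @ Yc / (n - 1)              # sample covariance of the controls
    sigma_XY = Yc.T @ xc / (n - 1)             # sample covariance with the target
    beta = np.linalg.solve(Sigma_Y, sigma_XY)  # beta = Sigma_Y^{-1} sigma_XY
    adjusted = x - (Y - mu_Y) @ beta
    return adjusted.mean(), adjusted.std(ddof=1) / np.sqrt(n), beta


S0, K, T, r, sigma, N = 100.0, 100.0, 1.0, 0.05, 0.2, 100_000
rng = np.random.default_rng(1)
Z = rng.standard_normal(N)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
disc = np.exp(-r * T)

x = disc * np.maximum(ST - K, 0.0)              # target: discounted call payoff
y1 = disc * ST                                  # control 1, mean S0
y2 = disc * ST**2                               # control 2, mean below
mu1 = S0
mu2 = S0**2 * np.exp((r + sigma**2) * T)

se_plain = x.std(ddof=1) / np.sqrt(N)
est1, se1, _ = cv_multi(x, y1[:, None], np.array([mu1]))
est2, se2, _ = cv_multi(x, np.column_stack([y1, y2]), np.array([mu1, mu2]))

print(f"plain:        {x.mean():.4f} (se {se_plain:.4f})")
print(f"one control:  {est1:.4f} (se {se1:.4f})")
print(f"two controls: {est2:.4f} (se {se2:.4f})")
```

With $\beta$ fitted in-sample, adding a control can never increase the estimated standard error, since the adjusted values are regression residuals; how much it helps depends on what the new control explains beyond the existing ones.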

Common controls in quant finance

  • Geometric Asian — control for arithmetic Asian.
  • Black-Scholes — control for stochastic-vol or local-vol pricing of vanillas. Compute the BS price at the same strike/expiry analytically; correlation between the model price and BS price is high.
  • Average asset / max of two — control for basket options.
  • First-order Taylor expansion — control for slowly-converging integrals; the linearised version is often much cheaper to compute.
  • Sticky-strike vol surface — control for full SLV pricing.

Worked example: Black-Scholes control for Heston

Heston model: $dS = rS\,dt + \sqrt{V}S\,dW^S$, $dV = \kappa(\theta - V)\,dt + \xi\sqrt{V}\,dW^V$, with $\text{corr}(dW^S, dW^V) = \rho_{SV}$.

Pricing a vanilla call under Heston via MC: target $X = e^{-rT}(S_T^{\text{Heston}} - K)^+$.

Control. One candidate is the Black-Scholes price evaluated at the realised integrated variance, $\sigma_{\text{eff}}^2 = \frac{1}{T}\int_0^T V_t\,dt$. This quantity is path-dependent, which makes it hard to use as a control directly.
Better control. Use a Black-Scholes payoff with constant volatility $\sigma_0 = \sqrt{V_0}$ (the initial vol): drive a GBM with the same Brownian increments $dW^S$ as the Heston stock and set $Y = e^{-rT}(S_T^{\text{BS}} - K)^+$, whose mean $\mu_Y$ is the standard Black-Scholes price at $\sigma_0$. The correlation with the Heston payoff is typically $0.85$ to $0.95$, which by the $1/(1 - \rho^2)$ rule gives roughly 4-10x variance reduction. Cheap and robust.
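A minimal sketch of this recipe follows. It assumes an Euler scheme with full truncation for the variance process (one common choice among several) and illustrative parameter values; the function names `bs_call` and `heston_call_with_bs_cv` are hypothetical. The GBM control is built from the accumulated stock Brownian motion $W^S_T$, so its terminal law is exact and its mean equals the Black-Scholes price.

```python
import numpy as np
from scipy.stats import norm


def bs_call(S0, K, T, r, sigma):
    """Standard Black-Scholes call price."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)


def heston_call_with_bs_cv(S0, K, T, r, V0, kappa, theta, xi, rho_sv,
                           n_steps=100, n_paths=50_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    log_s = np.full(n_paths, np.log(S0))   # Heston log-price
    v = np.full(n_paths, float(V0))        # Heston variance
    w_T = np.zeros(n_paths)                # accumulated W^S, reused by the control

    for _ in range(n_steps):
        z_s = rng.standard_normal(n_paths)
        z_perp = rng.standard_normal(n_paths)
        z_v = rho_sv * z_s + np.sqrt(1.0 - rho_sv**2) * z_perp
        v_pos = np.maximum(v, 0.0)         # full truncation (assumed scheme)
        log_s += (r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z_s
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z_v
        w_T += np.sqrt(dt) * z_s

    disc = np.exp(-r * T)
    X = disc * np.maximum(np.exp(log_s) - K, 0.0)

    # Control: GBM with constant vol sigma0, driven by the same W^S_T.
    sigma0 = np.sqrt(V0)
    S_bs = S0 * np.exp((r - 0.5 * V0) * T + sigma0 * w_T)
    Y = disc * np.maximum(S_bs - K, 0.0)
    mu_Y = bs_call(S0, K, T, r, sigma0)    # exact mean of Y

    beta = np.cov(X, Y, ddof=1)[0, 1] / np.var(Y, ddof=1)
    X_cv = X - beta * (Y - mu_Y)
    root_n = np.sqrt(n_paths)
    return {
        "price_plain": X.mean(), "se_plain": X.std(ddof=1) / root_n,
        "price_cv": X_cv.mean(), "se_cv": X_cv.std(ddof=1) / root_n,
        "rho": np.corrcoef(X, Y)[0, 1],
    }


res = heston_call_with_bs_cv(S0=100, K=100, T=1.0, r=0.05, V0=0.04,
                             kappa=2.0, theta=0.04, xi=0.3, rho_sv=-0.7)
print(f"plain: {res['price_plain']:.4f} (se {res['se_plain']:.4f})")
print(f"CV:    {res['price_cv']:.4f} (se {res['se_cv']:.4f})")
print(f"rho = {res['rho']:.4f}")
```

The control removes sampling noise only. Any Euler discretisation bias in the Heston paths is still present in both estimates.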

Common confusions and pitfalls

  • $\mathbb{E}[Y]$ must be exact. If you replace $\mu_Y$ with the empirical mean $\bar Y$ from the same sample, the correction term $\beta(\bar Y - \bar Y)$ vanishes identically and the estimator collapses to plain $\bar X$: no variance reduction. An inexact $\mu_Y$ from any other source biases the estimate by $\beta$ times the error in $\mu_Y$.
  • Estimate $\beta$ from a pilot when bias matters. If $\hat\beta$ is computed from the same draws it is applied to, it is correlated with $\bar Y$, which introduces an $O(1/N)$ bias. This is usually negligible. For tiny $N$, or when strict unbiasedness is required, estimate $\beta$ from a small independent pilot run instead.
  • Correlation is what matters, not magnitude. A control $Y$ that's $1000\times$ smaller than $X$ but with $\rho = 0.99$ is fine. The optimal $\beta$ scales the control to fit.
  • Multiple controls = multivariate regression. Add $k$ controls, get more variance reduction, but the gains plateau quickly when the controls are themselves correlated. As a rule of thumb, three carefully chosen controls capture most of the available variance reduction.
  • Doesn't help with bias. Control variates is a variance reduction technique. Discretisation bias, model bias, and edge effects are unaffected.
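The first two pitfalls are easy to check numerically. The sketch below assumes a toy problem, $X = e^U$ with control $Y = U$, $U \sim \text{Uniform}(0,1)$, so that $\mathbb{E}[X] = e - 1$ and $\mathbb{E}[Y] = 1/2$; the helper names `draw` and `slope` and the sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)


def draw(n):
    """Paired samples (X, Y) = (e^U, U) with U ~ Uniform(0, 1)."""
    u = rng.uniform(size=n)
    return np.exp(u), u


def slope(x, y):
    """Sample estimate of beta* = sigma_XY / sigma_Y^2."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)


# Pitfall 1: plugging in the sample mean of Y cancels the correction term.
x, y = draw(10_000)
beta = slope(x, y)
with_sample_mean = np.mean(x - beta * (y - y.mean()))   # collapses to x.mean()
with_exact_mean = np.mean(x - beta * (y - 0.5))         # genuine CV estimate

# Pitfall 2: a pilot run keeps beta independent of the main sample,
# so the main-run estimator is exactly unbiased.
x_pilot, y_pilot = draw(1_000)
beta_pilot = slope(x_pilot, y_pilot)
x_main, y_main = draw(10_000)
pilot_estimate = np.mean(x_main - beta_pilot * (y_main - 0.5))

print(f"exact value:             {np.e - 1:.5f}")
print(f"plain mean:              {x.mean():.5f}")
print(f"CV with sample mean:     {with_sample_mean:.5f}")
print(f"CV with exact mean:      {with_exact_mean:.5f}")
print(f"CV with pilot beta:      {pilot_estimate:.5f}")
```

The pilot draws are spent only on estimating $\beta$, so the trade-off is a slightly noisier $\beta$ in exchange for exact unbiasedness.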
