Control Variates

Motivation: why this matters in quant finance

Control variates is the most powerful of the elementary variance-reduction methods. The idea: if you're pricing some quantity $\mathbb{E}[X]$, and you know another quantity $\mathbb{E}[Y]$ exactly (e.g. analytically), and $X$ is correlated with $Y$, then

$$X - \beta\,(Y - \mathbb{E}[Y])$$

has the same mean as $X$ but lower variance for an appropriately chosen $\beta$.

In options pricing, this routinely gives 10x to 1000x variance reduction. The standard recipe for an Asian option: use the geometric Asian (closed-form Black-Scholes-like) as a control for the arithmetic Asian (no closed form). For a basket option: use the average asset (often easier) as a control. Once a good control is identified, control variates is dramatically more effective than antithetic variates.

The informal idea

Suppose you want $\theta = \mathbb{E}[X]$ and you have draws $X_1, \dots, X_N$. Plain MC estimator: $\bar X$, variance $\sigma_X^2/N$.

Now suppose you also have draws $Y_1, \dots, Y_N$ with known mean $\mu_Y = \mathbb{E}[Y]$, paired with the $X_i$ (i.e., $Y_i$ comes from the same simulation as $X_i$). Define

$$\hat\theta_{CV} = \bar X - \beta(\bar Y - \mu_Y).$$

This estimator is unbiased for $\theta$ (since $\mathbb{E}[\bar Y] = \mu_Y$) and its variance is

$$\text{Var}(\hat\theta_{CV}) = \frac{1}{N}(\sigma_X^2 - 2\beta\sigma_{XY} + \beta^2\sigma_Y^2).$$

Minimising in $\beta$: $\beta^* = \sigma_{XY}/\sigma_Y^2$, giving optimal variance

$$\text{Var}(\hat\theta_{CV}^*) = \frac{\sigma_X^2}{N}(1 - \rho_{XY}^2).$$

So the variance is multiplied by the factor $1 - \rho_{XY}^2$. If $|\rho_{XY}| = 0.95$, the variance falls to $1 - 0.9025 = 9.75\%$ of its plain-MC value, i.e., a $\sim 10\times$ reduction. If $|\rho_{XY}| = 0.99$, it falls to $1 - 0.9801 = 1.99\%$, a $\sim 50\times$ reduction.
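For completeness, the minimisation step can be written out. The short derivation below uses only the symbols already defined in this section, with $\rho_{XY} = \sigma_{XY}/(\sigma_X\sigma_Y)$.

```latex
\begin{align*}
\frac{\partial}{\partial\beta}\Big(\sigma_X^2 - 2\beta\sigma_{XY} + \beta^2\sigma_Y^2\Big)
  &= -2\sigma_{XY} + 2\beta\sigma_Y^2 = 0
  \;\Longrightarrow\; \beta^* = \frac{\sigma_{XY}}{\sigma_Y^2}, \\
\sigma_X^2 - 2\beta^*\sigma_{XY} + (\beta^*)^2\sigma_Y^2
  &= \sigma_X^2 - \frac{\sigma_{XY}^2}{\sigma_Y^2}
   = \sigma_X^2\,(1 - \rho_{XY}^2).
\end{align*}
```

The second derivative is $2\sigma_Y^2 > 0$, so $\beta^*$ is a minimum rather than a maximum.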

Formal statement

Let $X$ be the target with $\theta = \mathbb{E}[X]$ unknown. Let $Y$ be a control with $\mathbb{E}[Y] = \mu_Y$ known. Let $(X_1, Y_1), \dots, (X_N, Y_N)$ be i.i.d. paired samples.

Estimator:

$$\hat\theta_{CV} = \bar X - \beta(\bar Y - \mu_Y)$$

for any $\beta$.

Properties.

Unbiased for any $\beta$: $\mathbb{E}[\hat\theta_{CV}] = \theta$.
Variance $= (\sigma_X^2 - 2\beta\sigma_{XY} + \beta^2\sigma_Y^2)/N$.
Optimal $\beta^* = \sigma_{XY}/\sigma_Y^2 = \rho_{XY} \sigma_X / \sigma_Y$ (the OLS slope of $X$ on $Y$).
Minimum variance $= \sigma_X^2(1 - \rho_{XY}^2)/N$.

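In practice, $\beta^*$ is unknown; estimate it from the same sample as $\hat\beta = S_{XY}/S_Y^2$, the sample covariance of $X$ and $Y$ divided by the sample variance of $Y$. Using the estimated $\hat\beta$ introduces a small bias of order $1/N$, which is negligible next to the standard error of order $1/\sqrt{N}$.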
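The estimator with a sample-estimated slope fits in a few lines. The sketch below is illustrative only: the function name `control_variate_estimate` and the toy problem ($\theta = \mathbb{E}[e^U]$ with $U \sim \text{Uniform}(0,1)$, control $U$ with known mean $1/2$) are assumptions chosen because the exact answer $e - 1$ is available for comparison.

```python
import numpy as np

def control_variate_estimate(x, y, mu_y):
    """Return (estimate, standard_error, beta_hat) for E[X], using control Y with known mean mu_y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sample OLS slope of X on Y: beta_hat = S_XY / S_Y^2
    beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)
    adjusted = x - beta_hat * (y - mu_y)
    return adjusted.mean(), adjusted.std(ddof=1) / np.sqrt(len(x)), beta_hat

# Toy problem (assumed for illustration): theta = E[exp(U)], U ~ Uniform(0, 1).
rng = np.random.default_rng(0)
u = rng.random(50_000)
x = np.exp(u)

est, se, beta_hat = control_variate_estimate(x, u, 0.5)
plain_se = x.std(ddof=1) / np.sqrt(len(x))
print(f"CV estimate {est:.5f} (exact {np.e - 1:.5f})")
print(f"CV SE {se:.5f} vs plain SE {plain_se:.5f}, beta_hat {beta_hat:.3f}")
```

The reported standard error treats $\hat\beta$ as fixed, which is the usual large-$N$ approximation.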

Algorithm: geometric Asian as control for arithmetic Asian

Setup. Arithmetic Asian call: payoff $X = \big(\bar S_{\text{arith}} - K\big)^+$ where $\bar S_{\text{arith}} = \frac{1}{M}\sum_{k=1}^{M} S_{t_k}$. No closed form.

Control. Geometric Asian call: $Y = \big(\bar S_{\text{geo}} - K\big)^+$ where $\bar S_{\text{geo}} = \big(\prod_{k=1}^{M} S_{t_k}\big)^{1/M}$. Has a closed-form Black-Scholes-like price (the geometric average of log-normals is log-normal).

The geometric and arithmetic averages are nearly equal for moderate volatility, so $\rho_{XY} \approx 0.99$ — control variates gives massive variance reduction.

```python
import numpy as np
from scipy.stats import norm

def mc_arithmetic_asian_with_cv(S0, K, T, r, sigma, M, N, seed=42):
    """Price an arithmetic Asian call by MC, with the geometric Asian as control.

    Monitoring dates are t_k = k*T/M for k = 1..M (S0 is not part of the average).
    Returns (cv_price, cv_se, plain_price, plain_se, rho).
    """
    rng = np.random.default_rng(seed)
    dt = T / M
    Z = rng.standard_normal((N, M))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
    log_paths = np.log(S0) + np.cumsum(increments, axis=1)
    paths = np.exp(log_paths)
    disc = np.exp(-r * T)

    # Target X: discounted arithmetic Asian payoff
    arith_avg = paths.mean(axis=1)
    X = disc * np.maximum(arith_avg - K, 0)

    # Control Y: discounted geometric Asian payoff on the same paths
    geo_avg = np.exp(log_paths.mean(axis=1))
    Y = disc * np.maximum(geo_avg - K, 0)

    # Closed form for E[Y]: log(geo_avg) is normal with the mean and variance
    # below, which match the M monitoring dates t_1..t_M used in the simulation.
    mean_log = np.log(S0) + (r - 0.5 * sigma**2) * T * (M + 1) / (2 * M)
    var_log = sigma**2 * T * (M + 1) * (2 * M + 1) / (6 * M**2)
    std_log = np.sqrt(var_log)
    d1 = (mean_log - np.log(K) + var_log) / std_log
    d2 = d1 - std_log
    Y_mean = disc * (np.exp(mean_log + 0.5 * var_log) * norm.cdf(d1) - K * norm.cdf(d2))

    # CV: estimate beta (OLS slope of X on Y) and apply the correction
    beta = np.cov(X, Y, ddof=1)[0, 1] / np.var(Y, ddof=1)
    X_cv = X - beta * (Y - Y_mean)

    se = lambda a: a.std(ddof=1) / np.sqrt(N)
    return X_cv.mean(), se(X_cv), X.mean(), se(X), np.corrcoef(X, Y)[0, 1]

p_cv, se_cv, p_plain, se_plain, rho = mc_arithmetic_asian_with_cv(100, 100, 1, 0.05, 0.2, 50, 100_000)
print(f"Plain MC: {p_plain:.4f} ± {1.96*se_plain:.4f}")
print(f"CV MC:    {p_cv:.4f} ± {1.96*se_cv:.4f}")
print(f"rho = {rho:.4f}, var reduction = {1 - rho**2:.4f}")
print(f"SE ratio: {se_cv/se_plain:.4f}")
# Indicative output; exact figures vary with the seed, NumPy version,
# and monitoring convention:
# Plain MC: 5.7720 ± 0.0467
# CV MC:    5.7689 ± 0.0030
# rho = 0.9979, var reduction = 0.0042
# SE ratio: 0.0641
```

15-fold standard-error reduction; equivalent to $\sim 240\times$ more samples.
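Because any error in $\mu_Y$ passes straight into the CV estimate, it is worth checking that the simulated control mean agrees with the closed form before relying on it. The sketch below does that for the geometric Asian call. It assumes monitoring dates $t_k = kT/M$ for $k = 1, \dots, M$ (with $S_0$ excluded from the average); the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def geometric_asian_call_closed_form(S0, K, T, r, sigma, M):
    """Discounted price of a discretely monitored geometric Asian call.

    Assumes monitoring dates t_k = k*T/M for k = 1..M (S0 itself is not averaged).
    """
    mean_log = np.log(S0) + (r - 0.5 * sigma**2) * T * (M + 1) / (2 * M)
    var_log = sigma**2 * T * (M + 1) * (2 * M + 1) / (6 * M**2)
    std_log = np.sqrt(var_log)
    d1 = (mean_log - np.log(K) + var_log) / std_log
    d2 = d1 - std_log
    undiscounted = np.exp(mean_log + 0.5 * var_log) * norm.cdf(d1) - K * norm.cdf(d2)
    return np.exp(-r * T) * undiscounted

def simulated_geometric_asian_call(S0, K, T, r, sigma, M, N, seed=0):
    """Plain MC mean and standard error of the same discounted payoff."""
    rng = np.random.default_rng(seed)
    dt = T / M
    z = rng.standard_normal((N, M))
    increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.log(S0) + np.cumsum(increments, axis=1)
    payoff = np.exp(-r * T) * np.maximum(np.exp(log_paths.mean(axis=1)) - K, 0.0)
    return payoff.mean(), payoff.std(ddof=1) / np.sqrt(N)

exact = geometric_asian_call_closed_form(100, 100, 1.0, 0.05, 0.2, 50)
mc_mean, mc_se = simulated_geometric_asian_call(100, 100, 1.0, 0.05, 0.2, 50, 100_000)
z_score = (mc_mean - exact) / mc_se
print(f"closed form {exact:.4f}, MC {mc_mean:.4f} +/- {1.96 * mc_se:.4f}, z = {z_score:.2f}")
```

A $|z|$ well above 3 usually points to a mismatch between the simulated averaging dates and the ones assumed by the formula. With $M = 1$ the formula reduces to the ordinary Black-Scholes call price, which gives a second quick check.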

Key properties

Choice of control matters. Higher $|\rho_{XY}|$ means more variance reduction. A weakly correlated control gives little benefit; a perfectly correlated control ($|\rho_{XY}| = 1$) drives the variance to zero, so the estimate is as precise as a closed form.
Multiple controls. Generalises to a vector of controls $Y \in \mathbb{R}^k$ with known mean vector $\mu_Y$: the estimator is $\bar X - \beta^\top(\bar Y - \mu_Y)$ and the optimal coefficient is $\beta^* = \Sigma_Y^{-1}\sigma_{XY}$, the multivariate regression coefficient of $X$ on $Y$.
Estimated $\beta$. Standard practice. Adds an $O(1/N)$ bias but the variance reduction is preserved asymptotically.
Self-control variates. When no analytic control exists, a traded asset can serve, because under the risk-neutral measure its discounted value has known mean equal to its price at $t = 0$. E.g., for a non-dividend-paying stock, $e^{-rT}S_T$ has known mean $S_0$. For a call, this gives modest correlation but is always available.
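Stratification as a control. Post-stratified sampling can be written as control variates with stratum indicators as controls (the two agree asymptotically). Conditional MC is related in that it also removes a mean-zero component, $X - \mathbb{E}[X \mid Z]$, from the payoff.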
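The multiple-control and self-control points can be tried together on a European call under Black-Scholes, where the exact price is known. In the sketch below the first control is the discounted stock $e^{-rT}S_T$ (mean $S_0$). The second control, $e^{-rT}S_T^2$ with mean $S_0^2 e^{(r+\sigma^2)T}$, is an extra assumption added purely for illustration, and the function name `multi_control_estimate` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def multi_control_estimate(x, controls, control_means):
    """CV estimate of E[X] with k controls; `controls` has shape (N, k)."""
    centred = controls - control_means                      # columns have known mean zero
    cov_yy = np.atleast_2d(np.cov(centred, rowvar=False, ddof=1))   # Sigma_Y, shape (k, k)
    cov_xy = np.array([np.cov(x, centred[:, j], ddof=1)[0, 1]
                       for j in range(centred.shape[1])])   # sigma_XY, shape (k,)
    beta = np.linalg.solve(cov_yy, cov_xy)                  # beta = Sigma_Y^{-1} sigma_XY
    adjusted = x - centred @ beta
    return adjusted.mean(), adjusted.std(ddof=1) / np.sqrt(len(x)), beta

S0, K, T, r, sigma, N = 100.0, 100.0, 1.0, 0.05, 0.2, 100_000
rng = np.random.default_rng(1)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * rng.standard_normal(N))
disc = np.exp(-r * T)

x = disc * np.maximum(ST - K, 0.0)                          # discounted call payoff
controls = np.column_stack([disc * ST, disc * ST**2])
control_means = np.array([S0, S0**2 * np.exp((r + sigma**2) * T)])

plain_se = x.std(ddof=1) / np.sqrt(N)
est1, se1, _ = multi_control_estimate(x, controls[:, :1], control_means[:1])
est2, se2, _ = multi_control_estimate(x, controls, control_means)

# Exact Black-Scholes price, for comparison only
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
bs = S0 * norm.cdf(d1) - K * disc * norm.cdf(d1 - sigma * np.sqrt(T))

print(f"exact {bs:.4f}")
print(f"plain        {x.mean():.4f}  SE {plain_se:.4f}")
print(f"one control  {est1:.4f}  SE {se1:.4f}")
print(f"two controls {est2:.4f}  SE {se2:.4f}")
```

Solving the linear system with `np.linalg.solve` avoids forming $\Sigma_Y^{-1}$ explicitly, which matters when the controls are strongly correlated with each other, as they are here.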

Common controls in quant finance

Geometric Asian: control for arithmetic Asian.
Black-Scholes: control for stochastic-vol or local-vol pricing of vanillas. Simulate the constant-vol (Black-Scholes) payoff at the same strike and expiry on the same random numbers; its mean is the analytic BS price, and its correlation with the model payoff is high.
Average asset / max of two: control for basket options.
First-order Taylor expansion: control for slowly converging integrals; the linearised version is often much cheaper to compute.
Sticky-strike vol surface: control for full SLV pricing.

Worked example: Black-Scholes control for Heston

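Heston model: $dS_t = rS_t\,dt + \sqrt{V_t}\,S_t\,dW_t^S$, $dV_t = \kappa(\theta - V_t)\,dt + \xi\sqrt{V_t}\,dW_t^V$, with $d\langle W^S, W^V\rangle_t = \rho_{SV}\,dt$. (In this example $\theta$ is the long-run variance, not the target mean.)

Pricing a vanilla call under Heston via MC: target $X = e^{-rT}(S_T^{\text{Heston}} - K)^+$.

Control. One candidate is the Black-Scholes price evaluated at the realised integrated variance

$$\sigma_{\text{eff}}^2 = \frac{1}{T}\int_0^T V_t\,dt,$$

but $\sigma_{\text{eff}}$ is path-dependent and the expectation of the resulting quantity is not available analytically, so it is hard to use as a control directly.

Better control. Use the constant volatility $\sigma_0 = \sqrt{V_0}$ (initial vol). Alongside each Heston path, simulate a Black-Scholes path with volatility $\sigma_0$ driven by the same $W^S$, and take $Y = e^{-rT}(S_T^{\text{BS}} - K)^+$; its mean $\mu_Y$ is the standard Black-Scholes price with $\sigma_0$. The correlation with the Heston payoff is typically $0.85$ to $0.95$, which by the factor $1/(1 - \rho_{XY}^2)$ corresponds to roughly $4\times$ to $10\times$ variance reduction. Cheap and robust.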
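A minimal sketch of this construction follows. Several choices in it are assumptions made for illustration and are not taken from the text: the Euler scheme on the log-price with full truncation of the variance, the parameter values, and the function names. The constant-vol path uses the same stock-price shocks as the Heston path, and its log-increments are exact, so the control mean is exactly the Black-Scholes price.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S0, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d1 - sigma * np.sqrt(T))

def heston_call_with_bs_control(S0, K, T, r, V0, kappa, theta, xi, rho_sv, steps, N, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / steps
    sigma0 = np.sqrt(V0)
    log_s = np.full(N, np.log(S0))     # Heston log-price
    log_g = np.full(N, np.log(S0))     # constant-vol (Black-Scholes) log-price
    v = np.full(N, float(V0))
    for _ in range(steps):
        z_s = rng.standard_normal(N)
        z_v = rho_sv * z_s + np.sqrt(1 - rho_sv**2) * rng.standard_normal(N)
        v_pos = np.maximum(v, 0.0)     # full truncation keeps the square root real
        log_s += (r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z_s
        log_g += (r - 0.5 * sigma0**2) * dt + sigma0 * np.sqrt(dt) * z_s   # same shocks
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z_v
    disc = np.exp(-r * T)
    x = disc * np.maximum(np.exp(log_s) - K, 0.0)    # target payoff
    y = disc * np.maximum(np.exp(log_g) - K, 0.0)    # control payoff
    mu_y = bs_call(S0, K, T, r, sigma0)              # known control mean
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)
    x_cv = x - beta * (y - mu_y)
    se = lambda a: a.std(ddof=1) / np.sqrt(N)
    return x_cv.mean(), se(x_cv), x.mean(), se(x), np.corrcoef(x, y)[0, 1]

# Hypothetical parameter values, chosen only to exercise the code
p_cv, se_cv, p_plain, se_plain, rho = heston_call_with_bs_control(
    S0=100.0, K=100.0, T=1.0, r=0.05, V0=0.04, kappa=2.0, theta=0.04,
    xi=0.3, rho_sv=-0.7, steps=100, N=50_000)
print(f"plain {p_plain:.4f} +/- {1.96 * se_plain:.4f}")
print(f"CV    {p_cv:.4f} +/- {1.96 * se_cv:.4f}")
print(f"rho = {rho:.3f}, SE ratio = {se_cv / se_plain:.3f}")
```

Both estimates carry the same Euler discretisation bias in the Heston leg; the control only reduces the sampling variance around that biased value.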

Common confusions and pitfalls

$\mathbb{E}[Y]$ must be exact. If you estimate $\mu_Y$ from the same sample (e.g., using the empirical mean $\bar Y$), the correction term $\beta(\bar Y - \mu_Y)$ is identically zero and the estimator collapses to plain $\bar X$: no variance reduction. An inexact $\mu_Y$ from any other source shifts the estimate by $\beta$ times the error, a bias that does not shrink with $N$.
Estimating $\beta$ from the same sample. If $\beta$ is estimated from the full sample, $\hat\beta$ depends on the same draws as $\bar Y$, introducing $O(1/N)$ bias. Usually negligible. But for tiny $N$, or when strict unbiasedness is required, estimate $\beta$ from a small independent pilot run.
Correlation is what matters, not magnitude. A control $Y$ that's $1000\times$ smaller than $X$ but with $\rho = 0.99$ is fine. The optimal $\beta$ scales the control to fit.
Multiple controls = multivariate regression. Adding $k$ controls gives more variance reduction, but the gains plateau quickly when the controls are themselves correlated. A few carefully chosen controls usually capture most of the available variance reduction.
Doesn't help with bias. Control variates is a variance reduction technique. Discretisation bias, model bias, and edge effects are unaffected.
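The first two pitfalls are easy to see numerically. The sketch below fits $\beta$ on a small pilot sample and applies it to an independent main sample, then checks that substituting the sample mean of $Y$ for $\mu_Y$ gives back the plain estimator. The toy problem ($\mathbb{E}[e^U]$ with $U \sim \text{Uniform}(0,1)$, exact value $e - 1$) and all function names are assumptions for illustration.

```python
import numpy as np

def cv_with_pilot_beta(sample_xy, mu_y, n_pilot, n_main, seed=0):
    """sample_xy(rng, n) returns paired draws (x, y); beta is fitted on the pilot only."""
    rng = np.random.default_rng(seed)
    x0, y0 = sample_xy(rng, n_pilot)
    beta = np.cov(x0, y0, ddof=1)[0, 1] / np.var(y0, ddof=1)
    x, y = sample_xy(rng, n_main)          # fresh draws, independent of the pilot
    adjusted = x - beta * (y - mu_y)       # beta is fixed here, so the mean is unbiased
    return adjusted.mean(), adjusted.std(ddof=1) / np.sqrt(n_main), beta

def exp_uniform(rng, n):
    """Toy target X = exp(U) with control Y = U, whose mean is 1/2."""
    u = rng.random(n)
    return np.exp(u), u

est, se, beta = cv_with_pilot_beta(exp_uniform, 0.5, n_pilot=2_000, n_main=50_000)
print(f"pilot-beta CV estimate {est:.5f} +/- {1.96 * se:.5f} (exact {np.e - 1:.5f})")

# Pitfall check: plugging in the sample mean of Y makes the correction vanish.
x_chk, y_chk = exp_uniform(np.random.default_rng(1), 1_000)
no_gain = (x_chk - beta * (y_chk - y_chk.mean())).mean()
print(f"with empirical mean of Y: {no_gain:.6f}, plain mean: {x_chk.mean():.6f}")
```

The pilot draws are discarded rather than reused, which costs a little efficiency but keeps $\hat\beta$ independent of the main sample.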

Where this goes next

Importance sampling: for tail-event payoffs.
Quasi-Monte Carlo: replaces pseudo-random draws with low-discrepancy sequences; stacks with control variates.
Antithetic variates: cheap and complementary.