Importance Sampling
Motivation: why this matters in quant finance
Antithetic and control variates fail when the payoff is concentrated in rare events — deeply OTM options, default probabilities, extreme tail risks. For these problems, plain Monte Carlo wastes most of its samples in regions where the payoff is zero.
The informal idea
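Suppose we want $\theta = \mathbb{E}_f[h(X)] = \int h(x)\, f(x)\, dx$, where $f$ is the true density.
Choose another density $g$ such that $g(x) > 0$ wherever $h(x) f(x) \neq 0$. Then
$$\theta = \int h(x)\,\frac{f(x)}{g(x)}\,g(x)\,dx = \mathbb{E}_g\!\left[h(X)\,\frac{f(X)}{g(X)}\right], \qquad \hat\theta_{IS} = \frac{1}{N}\sum_{i=1}^{N} h(X_i)\,\frac{f(X_i)}{g(X_i)}, \quad X_i \sim g.$$
This is unbiased. Its variance is
$$\mathrm{Var}_g(\hat\theta_{IS}) = \frac{1}{N}\left(\int \frac{h(x)^2 f(x)^2}{g(x)}\,dx - \theta^2\right).$$
Optimal $g$ (the zero-variance limit)
$$g^*(x) = \frac{|h(x)|\,f(x)}{\int |h(y)|\,f(y)\,dy},$$
which for $h \ge 0$ equals $h f / \theta$. It gives zero variance but requires the unknown $\theta$ itself.
In practice, choose $g$ in a parametric family, ideally with a closed-form likelihood ratio, and tune parameters to approximate the optimal shape.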
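The zero-variance claim can be checked in two lines. The sketch below assumes a non-negative payoff $h$ and writes $f$ for the true density, $g^*$ for the optimal sampling density and $\theta$ for the target expectation (notation chosen here for illustration).

```latex
% Assume h >= 0 and g^*(x) = h(x) f(x) / \theta.
\begin{align*}
h(x)\,\frac{f(x)}{g^*(x)}
  &= h(x)\,f(x)\,\frac{\theta}{h(x)\,f(x)} = \theta
  \quad \text{wherever } h(x) f(x) > 0, \\
\int \frac{h(x)^2 f(x)^2}{g^*(x)}\,dx
  &= \theta \int h(x)\,f(x)\,dx = \theta^2
  \;\Longrightarrow\;
  \mathrm{Var}_{g^*}\!\left(\hat\theta_{IS}\right)
  = \frac{1}{N}\left(\theta^2 - \theta^2\right) = 0 .
\end{align*}
```

Every weighted sample equals $\theta$ exactly, which is why the optimum is unattainable in practice: normalising $h f$ needs the answer. It still tells us what shape to aim for, namely a density proportional to payoff times true density.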
Algorithm: Esscher transform for OTM call
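Standard MC: draw $Z \sim N(0,1)$, transform to $S_T = S_0 \exp\!\big((r - \tfrac{1}{2}\sigma^2)T + \sigma\sqrt{T}\,Z\big)$. The OTM call payoff is non-zero only when $Z > z^* = \dfrac{\ln(K/S_0) - (r - \tfrac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}$, which is a far right-tail event.
If $f = N(0,1)$ and $g = N(\mu, 1)$, the likelihood ratio is
$$\frac{f(z)}{g(z)} = \frac{e^{-z^2/2}}{e^{-(z-\mu)^2/2}} = \exp\!\big(-\mu z + \tfrac{1}{2}\mu^2\big).$$
So the IS estimator is
$$\hat C_{IS} = e^{-rT}\,\frac{1}{N}\sum_{i=1}^{N} \big(S_T(\tilde Z_i) - K\big)^+ \exp\!\big(-\mu \tilde Z_i + \tfrac{1}{2}\mu^2\big), \qquad \tilde Z_i \sim N(\mu, 1).$$
Choosing $\mu$ to centre $\tilde Z$ around the payoff region ($\mu \approx z^*$) puts about half of the samples in the money, and the variance plummets.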
import numpy as np
from scipy.stats import norm

S0, K, T, r, sigma, N = 100, 200, 1, 0.05, 0.2, 100_000
rng = np.random.default_rng(42)

# Black-Scholes closed form, used only as a reference value (roughly 0.0048)
d1 = (np.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
d2 = d1 - sigma*np.sqrt(T)
bs = S0*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d2)
print(f"Black-Scholes: {bs:.5f}")

# Plain MC
Z = rng.standard_normal(N)
ST = S0 * np.exp((r - 0.5*sigma**2)*T + sigma*np.sqrt(T)*Z)
X_plain = np.exp(-r*T) * np.maximum(ST - K, 0)
print(f"Plain: {X_plain.mean():.5f} ± {1.96*X_plain.std(ddof=1)/np.sqrt(N):.5f}")
# Very noisy: P(Z > z*) is about 0.05%, so only a few dozen of the N draws pay off

# IS with mu = z*
z_star = (np.log(K/S0) - (r - 0.5*sigma**2)*T)/(sigma*np.sqrt(T))
mu = z_star  # centre the sampling distribution on the strike (about half the draws finish ITM)
print(f"z* = {z_star:.3f}, mu = {mu:.3f}")
# z* = 3.316, mu = 3.316
Z_tilde = mu + rng.standard_normal(N)
ST_is = S0 * np.exp((r - 0.5*sigma**2)*T + sigma*np.sqrt(T)*Z_tilde)
weights = np.exp(-mu*Z_tilde + 0.5*mu**2)  # likelihood ratio f/g
X_is = np.exp(-r*T) * np.maximum(ST_is - K, 0) * weights
print(f"IS: {X_is.mean():.5f} ± {1.96*X_is.std(ddof=1)/np.sqrt(N):.5f}")
The standard error falls sharply relative to plain MC. Because Monte Carlo error scales as $1/\sqrt{N}$, a $k$-fold standard error reduction is equivalent to using $k^2$ times as many plain samples. For deeper OTM strikes, the gain grows further.
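To see how sensitive the result is to the shift, one can sweep $\mu$ over a few multiples of $z^*$. The sketch below reuses the parameters of the example above; the helper name `is_call_price`, the seed and the grid of shifts are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.stats import norm

S0, K, T, r, sigma, N = 100, 200, 1, 0.05, 0.2, 100_000
z_star = (np.log(K / S0) - (r - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))

# Black-Scholes value as a reference for the estimates below
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
bs = S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d1 - sigma * np.sqrt(T))

def is_call_price(mu, n, rng):
    """IS estimate and standard error of the call, sampling Z ~ N(mu, 1)."""
    z = mu + rng.standard_normal(n)
    st = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    w = np.exp(-mu * z + 0.5 * mu**2)          # likelihood ratio f/g
    x = np.exp(-r * T) * np.maximum(st - K, 0.0) * w
    return x.mean(), x.std(ddof=1) / np.sqrt(n)

rng = np.random.default_rng(7)
rows = []
for mult in (0.0, 0.5, 1.0, 1.5, 2.5):         # mult = 0 is plain MC
    mu = mult * z_star
    est, se = is_call_price(mu, N, rng)
    rows.append((mult, est, se))
    print(f"mu = {mult:.1f} z*: price = {est:.5f}, s.e. = {se:.2e}  (BS {bs:.5f})")
```

One would expect the standard error to be smallest for shifts near $z^*$. For a heavily over-shifted $\mu$ the reported standard error can look deceptively small while the estimate sits far from the reference, because the few samples that would carry the large weights are almost never drawn.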
Key properties
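- Unbiased. For any $g$ with appropriate support, $\hat\theta_{IS}$ has $\mathbb{E}_g[\hat\theta_{IS}] = \theta$.
- Variance can be much smaller, or much larger. A bad $g$ (e.g., shift in the wrong direction) can make IS far worse than plain MC. A simple check: if some samples have weights in the thousands, you've shifted too far.
- Likelihood-ratio explosion. When $g$ has lighter tails than $f$, the likelihood ratio $f/g$ has unbounded variance, and the IS estimator may have infinite variance even with finite $\theta$. Diagnostic: check that $f/g$ has bounded variance via $\mathbb{E}_g[(f/g)^2] = \int f(x)^2/g(x)\,dx$ being finite.
- Effective sample size (ESS). $\mathrm{ESS} = \big(\sum_i w_i\big)^2 / \sum_i w_i^2$. ESS $\approx N$ means good IS; ESS $\ll N$ means weights are concentrated on a few samples, a bad sign. For rare-event targets, compute it on the payoff-weighted terms $h(X_i)\,w_i$, because large weights on samples with zero payoff do not affect the estimate.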
- Optimal vs heuristic. Esscher (exponential tilting) is optimal for log-concave payoffs. For more complex payoffs, parametric families with adaptive tuning (cross-entropy method) are used.
- Stacks with other techniques. IS + control variates is common in default modelling. IS + QMC requires careful re-randomisation.
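The ESS diagnostic from the list above can be sketched as follows, assuming a toy target $P(Z > b)$ with $f = N(0,1)$ and $g = N(\mu, 1)$. The function name `ess`, the barrier `b = 3` and the grid of shifts are illustrative assumptions. The sketch reports ESS both on the raw weights $w_i$ and on the payoff-weighted terms $h(X_i)\,w_i$, since the two can disagree for rare-event targets.

```python
import numpy as np

def ess(values):
    """Kish effective sample size (sum v)^2 / sum v^2 for non-negative v."""
    v = np.asarray(values, dtype=float)
    s2 = np.sum(v**2)
    return float(np.sum(v)**2 / s2) if s2 > 0 else 0.0

rng = np.random.default_rng(1)
n, b = 100_000, 3.0                       # b: illustrative barrier
ess_raw, ess_payoff = {}, {}
for mu in (0.0, b, 3 * b):                # plain MC, centred shift, over-shift
    z = mu + rng.standard_normal(n)
    w = np.exp(-mu * z + 0.5 * mu**2)     # f/g for f = N(0,1), g = N(mu,1)
    h = (z > b).astype(float)             # indicator payoff
    ess_raw[mu] = ess(w)
    ess_payoff[mu] = ess(h * w)
    print(f"mu = {mu:3.1f}: ESS(w) = {ess_raw[mu]:10.1f}, ESS(h*w) = {ess_payoff[mu]:10.1f}")
```

With $\mu = 0$ the raw-weight ESS equals $N$ but the payoff-weighted ESS is just the handful of samples beyond the barrier. Centring on the barrier should raise the payoff-weighted ESS substantially even though the raw-weight ESS drops, which is the reason to look at both.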
Choosing $g$: rules of thumb
- Concentrate near the high-payoff region. For an OTM call, shift the normal mean toward $z^*$.
- Don't over-shift. Setting $\mu \gg z^*$ pushes too far; almost all samples are deep ITM with tiny weights, ESS collapses.
- Match tails. If $g$ has lighter tails than $f$, weights diverge. Common safe choice: $g$ in the same family as $f$, just with different parameters.
- Adaptive tuning. Run a small pilot, estimate the optimal parameters via cross-entropy minimisation, then run the main pass.
- For multi-dimensional problems: shift each dimension separately. The product of likelihood ratios across dimensions becomes the joint LR.
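The adaptive-tuning rule can be sketched for the OTM call with a unit-variance normal proposal. In that family the cross-entropy update for the mean reduces to the payoff-weighted average of the pilot draws, $\mu \leftarrow \sum_i h_i w_i z_i / \sum_i h_i w_i$. The starting point $\mu = z^*$, the pilot size, the three iterations and the function names are assumptions made for illustration.

```python
import numpy as np

S0, K, T, r, sigma = 100.0, 200.0, 1.0, 0.05, 0.2
z_star = (np.log(K / S0) - (r - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))

def discounted_payoff(z):
    st = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(st - K, 0.0)

def ce_mean_update(mu, n, rng):
    """One cross-entropy step for the mean of a N(mu, 1) proposal."""
    z = mu + rng.standard_normal(n)
    hw = discounted_payoff(z) * np.exp(-mu * z + 0.5 * mu**2)
    total = hw.sum()
    if total == 0.0:                      # no pilot sample paid off: keep mu
        return mu
    return float(np.sum(hw * z) / total)  # payoff-weighted mean of the draws

rng = np.random.default_rng(5)
mu_ce = z_star                            # heuristic starting shift
for _ in range(3):                        # small pilot passes
    mu_ce = ce_mean_update(mu_ce, 20_000, rng)

# Main pass with the tuned shift
z = mu_ce + rng.standard_normal(100_000)
x = discounted_payoff(z) * np.exp(-mu_ce * z + 0.5 * mu_ce**2)
price, se = x.mean(), x.std(ddof=1) / np.sqrt(x.size)
print(f"z* = {z_star:.3f}, tuned mu = {mu_ce:.3f}, price = {price:.5f} ± {1.96 * se:.5f}")
```

The tuned shift should land somewhat above $z^*$, since the payoff-weighted density puts its mass beyond the strike. Starting the pilot from $\mu = 0$ would leave almost no paying samples, so a rough heuristic shift is used to seed the iteration.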
Worked example: default probability
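A loss (default) occurs when $X < -b$ for some standardised asset return $X \sim N(0,1)$ and a large barrier $b > 0$. True probability: $p = P(X < -b) = \Phi(-b)$.
For large $b$ this is tiny (for instance, $\Phi(-4) \approx 3.2 \times 10^{-5}$). Plain MC needs about $(1-p)/(p\,\varepsilon^2)$ samples for relative precision $\varepsilon$.
IS with shift: sample $X \sim N(-b, 1)$ (centre on the barrier). Weight: $w(X) = \exp\!\big(bX + \tfrac{1}{2}b^2\big)$. Now half the samples are below the barrier; the ESS of the payoff-weighted terms $\mathbf{1}\{X_i < -b\}\,w_i$ is large; variance is reduced by several orders of magnitude for large $b$.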
Used in credit risk (basket default models), regulatory ES estimation, and rare-event simulation generally.
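A minimal numerical version of this worked example, assuming a standard normal driver and an illustrative barrier $b = 4$ (the barrier value, seed and sample size are assumptions). It compares plain MC with the shifted sampler and evaluates the per-sample variances in closed form, using $\mathbb{E}_g[(\mathbf{1}\{X<-b\}\,w)^2] = e^{b^2}\,\Phi(-2b)$ for the $N(-b, 1)$ proposal.

```python
import numpy as np
from scipy.stats import norm

b, n, eps = 4.0, 100_000, 0.10            # barrier, sample size, target relative precision
p_exact = norm.cdf(-b)
n_plain_needed = (1 - p_exact) / (p_exact * eps**2)

rng = np.random.default_rng(3)

# Plain MC: count how often X ~ N(0,1) falls below -b
x_plain = rng.standard_normal(n)
p_plain = np.mean(x_plain < -b)

# IS: sample X ~ N(-b, 1), weight by f/g = exp(b*x + b^2/2)
x_is = -b + rng.standard_normal(n)
terms = (x_is < -b) * np.exp(b * x_is + 0.5 * b**2)
p_is = terms.mean()
se_is = terms.std(ddof=1) / np.sqrt(n)

# Per-sample variances in closed form
var_plain = p_exact * (1 - p_exact)
var_is = np.exp(b**2) * norm.cdf(-2 * b) - p_exact**2

print(f"exact p       = {p_exact:.3e}")
print(f"plain MC      = {p_plain:.3e}  (needs ~{n_plain_needed:.1e} samples for {eps:.0%})")
print(f"IS            = {p_is:.3e} ± {1.96 * se_is:.1e}")
print(f"variance ratio plain/IS = {var_plain / var_is:.0f}")
```

With only a handful of plain samples expected below the barrier at this $n$, the plain estimate is little more than a count of a few hits, while the IS estimate uses roughly half of its draws.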
Common confusions and pitfalls
- Forgetting the weights. Easy mistake: sample from $g$ but compute the plain mean of $h(X)$. Wrong: must multiply by $f(X)/g(X)$.
- Numerical instability of weights. Compute $\log w_i = \log f(X_i) - \log g(X_i)$ and stabilise: subtract $\max_j \log w_j$ before exponentiating (no exponentiation until the final weighted sum). For multi-dim, log-sum-exp tricks.
- Too-aggressive tilting. Pushes most weight onto a few samples, reduces ESS, increases variance.
- Wrong direction. Shifting away from the payoff region makes things worse than plain MC. Always sanity-check on a small pilot.
- Path-dependent IS is harder. For an Asian or barrier option, the natural likelihood ratio is over the entire path, with an exponential of an Itô integral — the Girsanov theorem is the analogue of the Esscher transform here. Computationally, this is a path-wise weighting that's a product over time steps; each step needs to be log-summed correctly.
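The path-wise weighting can be sketched for a discretely monitored arithmetic Asian call under GBM. Each time step's normal draw is shifted by the same constant `theta`, and the per-step log likelihood ratios are summed before a single exponentiation. The strike, step count, seed and the constant shift `theta = 0.5` are illustrative assumptions; a constant shift is a simple choice, not an optimal one.

```python
import numpy as np

S0, K, T, r, sigma = 100.0, 150.0, 1.0, 0.05, 0.2
n_steps, n_paths = 50, 50_000
dt = T / n_steps

def asian_terms(theta, rng):
    """Weighted discounted payoffs, with every step drawn as Z_k ~ N(theta, 1)."""
    z = theta + rng.standard_normal((n_paths, n_steps))
    # Path log-weight: sum over steps of log f(z_k)/g(z_k) = -theta*z_k + theta^2/2
    log_w = np.sum(-theta * z + 0.5 * theta**2, axis=1)
    log_s = np.log(S0) + np.cumsum(
        (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1
    )
    avg = np.exp(log_s).mean(axis=1)               # arithmetic average over monitoring dates
    payoff = np.exp(-r * T) * np.maximum(avg - K, 0.0)
    return payoff * np.exp(log_w)                  # exponentiate once per path

rng = np.random.default_rng(11)
plain = asian_terms(0.0, rng)                      # theta = 0 recovers plain MC
shifted = asian_terms(0.5, rng)

price_plain, se_plain = plain.mean(), plain.std(ddof=1) / np.sqrt(n_paths)
price_is, se_is = shifted.mean(), shifted.std(ddof=1) / np.sqrt(n_paths)
print(f"plain: {price_plain:.5f} ± {1.96 * se_plain:.5f}")
print(f"IS:    {price_is:.5f} ± {1.96 * se_is:.5f}")
```

Summing in the log domain avoids multiplying many small per-step ratios together. When the log-weights span a very wide range, subtracting their maximum before exponentiating (the log-sum-exp trick) is the usual next safeguard.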
Where this goes next
- Quasi-Monte Carlo — orthogonal variance-reduction strategy.
- Change of measure — the underlying mechanism (Girsanov) for path-IS.
- Cross-entropy method, adaptive IS — extensions for hard-to-tune problems.