Covariance Matrices

Motivation: why this matters in quant finance

A covariance matrix $\Sigma$ is the full $n \times n$ bookkeeping of pairwise covariances for a random vector $X = (X_1, \ldots, X_n)$. It is arguably the central object in multi-asset quantitative finance. Most canonical portfolio models are built on $\Sigma$:
  • Markowitz portfolio optimisation. The unconstrained optimal weights are $w^* \propto \Sigma^{-1}\mu$. Without $\Sigma$ there is no mean-variance portfolio.
  • Value-at-Risk (VaR). For zero-mean returns, parametric VaR is $\alpha\sqrt{w^\top \Sigma w}$, where $\alpha$ is the normal quantile for the chosen confidence level: a single quadratic form in $\Sigma$.
  • Principal Component Analysis (PCA). The eigen-decomposition of $\Sigma$ exposes the dominant factors in a return cross-section.
  • Factor models. $\Sigma = B\Omega B^\top + D$, where $B$ holds the factor loadings, $\Omega$ is the factor-covariance matrix, and $D$ is the diagonal matrix of idiosyncratic variances. Estimating this decomposition is the bread and butter of risk management.
  • Kalman filtering. State-space models carry a covariance matrix of the filtered state and update it at every time step via recursive matrix algebra.
  • Copula models. A Gaussian copula is fully parameterised by a correlation matrix, which is a normalised $\Sigma$; a $t$-copula adds a degrees-of-freedom parameter.
  • Stress testing. "What happens to portfolio loss if we shock the covariance matrix by X%?" is a routine question in stress tests, including those run under Basel rules.
Beyond the applications, $\Sigma$ has deep structure: it is symmetric positive semi-definite, its eigenvalues are non-negative, and its matrix square roots $\Sigma^{1/2}$ generate correlated Gaussian samples in Monte Carlo. Understanding its algebra and geometry is foundational for any multi-asset work.

Formal definition

For a random vector $X \in \mathbb{R}^n$ with mean vector $\mu = \mathbb{E}[X] \in \mathbb{R}^n$, the covariance matrix $\Sigma$ is the $n \times n$ matrix with entries:
$$\Sigma_{ij} := \text{Cov}(X_i, X_j) = \mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)].$$

Equivalently, $\Sigma = \mathbb{E}[(X - \mu)(X - \mu)^\top]$.

Diagonal entries are variances: $\Sigma_{ii} = \text{Var}(X_i)$. Off-diagonal entries are pairwise covariances.

Sample version. Given $m$ observations $x_1, \ldots, x_m \in \mathbb{R}^n$ with sample mean $\bar x = \tfrac{1}{m}\sum_i x_i$:
$$\hat\Sigma = \frac{1}{m - 1}\sum_{i=1}^m (x_i - \bar x)(x_i - \bar x)^\top.$$

The $(m - 1)$ denominator is Bessel's correction for unbiasedness.
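The sample formula can be sketched in a few lines of NumPy. The function name `sample_covariance`, the seed, and the simulated data are illustrative assumptions, not part of the lesson; the result is cross-checked against `np.cov`, which uses the same $(m - 1)$ denominator by default.

```python
import numpy as np

def sample_covariance(x: np.ndarray) -> np.ndarray:
    """Unbiased sample covariance of an (m, n) array: m observations of n variables."""
    m = x.shape[0]
    centered = x - x.mean(axis=0)            # subtract the sample mean from every row
    return centered.T @ centered / (m - 1)   # Bessel's correction in the denominator

rng = np.random.default_rng(0)               # arbitrary seed, illustrative data only
x = rng.normal(size=(250, 4))                # hypothetical: 250 observations, 4 variables
sigma_hat = sample_covariance(x)

# np.cov treats rows as variables unless rowvar=False is passed
matches_numpy = np.allclose(sigma_hat, np.cov(x, rowvar=False))
```

Each row of `x` is one observation here, which is why `rowvar=False` is needed in the cross-check.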

Properties

Property 1 — Symmetric and positive semi-definite (PSD)

$\Sigma$ is symmetric: $\Sigma_{ij} = \text{Cov}(X_i, X_j) = \text{Cov}(X_j, X_i) = \Sigma_{ji}$.

$\Sigma$ is positive semi-definite: for any $a \in \mathbb{R}^n$,
$$a^\top \Sigma a = \text{Var}(a^\top X) \ge 0.$$
Hence the eigenvalues are non-negative ($\lambda_i \ge 0$), and $\Sigma$ is positive definite ($\lambda_i > 0$ for all $i$) iff no non-trivial linear combination of the $X_i$ is a.s. constant.

In finance, $\Sigma$ fails to be strictly PD precisely when assets are linearly redundant (e.g. two tracking funds with identical composition). Numerically, a rank-deficient or nearly rank-deficient $\Sigma$ causes trouble with matrix inversion; the standard cures are regularisation, shrinkage, or factor decomposition.
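As a quick numerical check of the identity $a^\top \Sigma a = \text{Var}(a^\top X)$ and of the redundancy remark, the sketch below builds three hypothetical series in which the third is an exact linear combination of the first two. The data, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)                     # arbitrary seed
base = rng.normal(size=(500, 2))                   # two independent hypothetical assets
redundant = 0.5 * base[:, 0] + 0.5 * base[:, 1]    # exact linear combination of the two
x = np.column_stack([base, redundant])
sigma_hat = np.cov(x, rowvar=False)

# Quadratic form versus the variance of the combined series
a = np.array([0.3, -1.2, 2.0])
quad_form = a @ sigma_hat @ a
var_of_combo = np.var(x @ a, ddof=1)               # same (m - 1) denominator as np.cov

# The redundancy shows up as a (numerically) zero eigenvalue
eigenvalues = np.linalg.eigvalsh(sigma_hat)        # returned in ascending order
null_direction = np.array([0.5, 0.5, -1.0])        # long the combination, short the copy
variance_along_null = null_direction @ sigma_hat @ null_direction
```

The direction `null_direction` is a portfolio with zero variance, which is exactly what a zero eigenvalue means.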

Property 2 — Affine transformation

If $Y = AX + b$ for a matrix $A$ and vector $b$, then:

$$\text{Cov}(Y) = A\,\Sigma\,A^\top.$$

Special cases:

  • Portfolio variance: $Y = w^\top X$ (scalar portfolio return), $\text{Var}(Y) = w^\top\Sigma w$.
  • Variance of a sum: $\sum_i X_i = \mathbf{1}^\top X$, $\text{Var}(\sum_i X_i) = \mathbf{1}^\top\Sigma\mathbf{1}$.
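The affine rule and its two special cases can be verified on simulated data, since the sample covariance obeys the same identity. The matrix `A`, the shift `b`, the weights `w`, and the data are arbitrary values chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)                 # arbitrary seed
x = rng.normal(size=(1000, 3))                 # hypothetical observations of X (one per row)
sigma_hat = np.cov(x, rowvar=False)

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])               # arbitrary 2 x 3 matrix
b = np.array([10.0, -5.0])                     # the shift b does not affect covariance

y = x @ A.T + b                                # Y = A X + b applied to each observation
cov_y = np.cov(y, rowvar=False)
cov_y_formula = A @ sigma_hat @ A.T

# Special case 1: portfolio variance w' Sigma w
w = np.array([0.5, 0.3, 0.2])
portfolio_var = w @ sigma_hat @ w

# Special case 2: variance of a sum is the sum of all entries of Sigma
ones = np.ones(3)
var_of_sum = ones @ sigma_hat @ ones
```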

Property 3 — Spectral decomposition

A symmetric PSD matrix $\Sigma$ admits the spectral decomposition:

$$\Sigma = Q\Lambda Q^\top, \qquad \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n), \quad \lambda_1 \ge \cdots \ge \lambda_n \ge 0,$$
with $Q$ orthogonal ($Q Q^\top = I$). The columns of $Q$ are the principal-component directions; the eigenvalues $\lambda_i$ are the variances of the corresponding principal components. PCA is this decomposition applied to an estimated covariance matrix.
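A minimal sketch of the decomposition using `np.linalg.eigh`, which is designed for symmetric matrices. The $3 \times 3$ matrix is a made-up example; note that `eigh` returns eigenvalues in ascending order, so they are reversed to match the convention $\lambda_1 \ge \cdots \ge \lambda_n$.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix (symmetric, positive definite)
sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 1.0, 0.3],
                  [0.8, 0.3, 2.25]])

eigenvalues, Q = np.linalg.eigh(sigma)          # ascending eigenvalues, orthonormal columns
order = np.argsort(eigenvalues)[::-1]           # reorder to descending
eigenvalues, Q = eigenvalues[order], Q[:, order]

reconstructed = Q @ np.diag(eigenvalues) @ Q.T  # should reproduce sigma
explained = eigenvalues / eigenvalues.sum()     # share of total variance per component
```

The sum of the eigenvalues equals the trace of $\Sigma$, so `explained` is the usual "fraction of variance explained" reported in PCA.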

Property 4 — Cholesky factorisation

If $\Sigma$ is positive definite, it has a unique lower-triangular Cholesky factor $L$ with positive diagonal entries such that
$$\Sigma = L L^\top.$$

Cholesky is the cheapest way to generate correlated Gaussian samples: if $Z \sim \mathcal{N}(0, I)$, then $X = LZ \sim \mathcal{N}(0, \Sigma)$ because $\text{Cov}(X) = L\cdot I\cdot L^\top = \Sigma$. In a Monte Carlo risk simulation with 1000 correlated assets, this is the main workhorse.
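The sampling recipe $X = LZ$ can be sketched as follows. The covariance matrix, seed, and sample count are illustrative assumptions; `np.linalg.cholesky` returns the lower-triangular factor.

```python
import numpy as np

# Hypothetical positive-definite covariance matrix
sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 1.0, 0.3],
                  [0.8, 0.3, 2.25]])

L = np.linalg.cholesky(sigma)                  # lower-triangular, L @ L.T == sigma

rng = np.random.default_rng(3)                 # arbitrary seed
n_samples = 200_000
z = rng.standard_normal(size=(3, n_samples))   # independent N(0, 1) draws, one column per sample
x = L @ z                                      # correlated draws with covariance sigma

sample_cov = np.cov(x)                         # rows are variables here, so the default rowvar=True fits
```

With this many draws, `sample_cov` should agree with `sigma` up to Monte Carlo noise. The factorisation is computed once and reused for every batch of samples.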

Property 5 — Correlation matrix

Normalising $\Sigma$ by volatilities gives the correlation matrix $\rho$ with entries $\rho_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$ (defined when every variance is strictly positive). Equivalently:
$$\rho = D^{-1}\Sigma D^{-1}, \qquad D := \text{diag}(\sqrt{\Sigma_{11}}, \ldots, \sqrt{\Sigma_{nn}}).$$

$\rho$ is symmetric and PSD, has all diagonal entries equal to $1$, and has entries in $[-1, 1]$. It carries no volatility information; $\Sigma = D\rho D$ reconstructs the full covariance.
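A short sketch of the normalisation and its inverse, using a made-up covariance matrix. Dividing by the outer product of the volatilities is the elementwise form of $D^{-1}\Sigma D^{-1}$.

```python
import numpy as np

# Hypothetical covariance matrix with volatilities 2.0, 1.0 and 1.5
sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 1.0, 0.3],
                  [0.8, 0.3, 2.25]])

vols = np.sqrt(np.diag(sigma))                  # diagonal of D
corr = sigma / np.outer(vols, vols)             # rho_ij = Sigma_ij / (vol_i * vol_j)

# Sigma = D rho D puts the volatility information back
rebuilt = np.diag(vols) @ corr @ np.diag(vols)
```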

Canonical examples

Example 1 — Two-asset portfolio covariance

Two assets with variances $\sigma_1^2, \sigma_2^2$ and correlation $\rho$:

$$\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.$$

Portfolio with weights $w = (w_1, w_2)$: $\text{Var}(\text{portfolio}) = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2\rho\sigma_1\sigma_2$.

Minimum-variance portfolio (full investment, $w_1 + w_2 = 1$): $w_1^* = (\sigma_2^2 - \rho\sigma_1\sigma_2)/(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$.
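The closed-form weight can be checked against the general fully invested minimum-variance solution $w \propto \Sigma^{-1}\mathbf{1}$, a standard result not derived in this lesson. The function names and the inputs (volatilities of 20% and 10%, correlation 0.3) are assumptions for illustration; the formula breaks down when the denominator is zero, i.e. $\sigma_1 = \sigma_2$ with $\rho = 1$.

```python
import numpy as np

def two_asset_cov(s1: float, s2: float, rho: float) -> np.ndarray:
    """2 x 2 covariance matrix from two volatilities and one correlation."""
    return np.array([[s1**2, rho * s1 * s2],
                     [rho * s1 * s2, s2**2]])

def min_variance_weights(s1: float, s2: float, rho: float) -> np.ndarray:
    """Closed-form fully invested minimum-variance weights for two assets."""
    denom = s1**2 + s2**2 - 2 * rho * s1 * s2
    w1 = (s2**2 - rho * s1 * s2) / denom
    return np.array([w1, 1.0 - w1])

sigma = two_asset_cov(0.2, 0.1, 0.3)
w_star = min_variance_weights(0.2, 0.1, 0.3)

# Cross-check: solve Sigma w = 1 and rescale so the weights sum to one
w_general = np.linalg.solve(sigma, np.ones(2))
w_general /= w_general.sum()

min_var = w_star @ sigma @ w_star               # variance of the minimum-variance portfolio
```

Most of the weight lands on the lower-volatility asset, and `min_var` is below both individual variances.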

Example 2 — Equicorrelated covariance

$\Sigma_{ij} = \rho\sigma^2$ for $i \ne j$ and $\Sigma_{ii} = \sigma^2$. Equivalently $\Sigma = \sigma^2[(1 - \rho)I + \rho\mathbf{1}\mathbf{1}^\top]$.

Eigenvalues: $\sigma^2[1 + (n - 1)\rho]$ (eigenvector $\mathbf{1}$, the "market factor") and $\sigma^2(1 - \rho)$ with multiplicity $n - 1$ (idiosyncratic directions). PSD requires $\rho \ge -1/(n - 1)$; for large $n$ this is essentially $\rho \ge 0$. (A large set of assets cannot all be strongly negatively correlated with one another.)

This is the simplest factor model: all assets share one common factor and have equal idiosyncratic variance.
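The eigenvalue formulas and the PSD bound for the equicorrelated case are easy to confirm numerically. The helper name `equicorrelated_cov` and the parameter values ($n = 5$, volatility 0.2) are hypothetical; for $n = 5$ the bound is $\rho \ge -0.25$, so $\rho = -0.3$ should produce a negative eigenvalue.

```python
import numpy as np

def equicorrelated_cov(n: int, vol: float, rho: float) -> np.ndarray:
    """sigma^2 * [(1 - rho) I + rho 1 1'] for n assets with common volatility vol."""
    return vol**2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

n, vol, rho = 5, 0.2, 0.3
cov = equicorrelated_cov(n, vol, rho)
eigenvalues = np.linalg.eigvalsh(cov)             # ascending order

market_eig = vol**2 * (1.0 + (n - 1) * rho)       # eigenvector: the vector of ones
idio_eig = vol**2 * (1.0 - rho)                   # repeated n - 1 times

# Below the bound rho >= -1 / (n - 1) the matrix stops being PSD
bad = equicorrelated_cov(n, vol, -0.3)
min_eig_bad = np.linalg.eigvalsh(bad)[0]
```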

Example 3 — Factor model covariance

$X = B F + \epsilon$ with $F \in \mathbb{R}^k$ factors, $B$ the $n \times k$ loadings, and $\epsilon$ an idiosyncratic term with diagonal covariance $D$. Assuming $F$ and $\epsilon$ are independent:

$$\Sigma = B\,\Omega\,B^\top + D,$$
where $\Omega = \text{Cov}(F)$. When $k \ll n$, this is a low-rank plus diagonal decomposition of $\Sigma$. Commercial equity risk models such as Barra (MSCI) and Axioma represent the covariance matrix for thousands of assets this way: it needs on the order of $nk$ numbers rather than $n^2$, and it avoids storing and inverting the dense $n \times n$ matrix directly.
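One way to see why the low-rank plus diagonal form makes inversion cheap is the Woodbury identity, $\Sigma^{-1} = D^{-1} - D^{-1}B(\Omega^{-1} + B^\top D^{-1}B)^{-1}B^\top D^{-1}$, which only ever inverts a $k \times k$ matrix. The lesson does not mention Woodbury; it is brought in here as an illustration, and the loadings, factor covariance, and idiosyncratic variances below are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)                  # arbitrary seed
n, k = 200, 3                                   # hypothetical: 200 assets, 3 factors
B = rng.normal(size=(n, k))                     # factor loadings
omega = np.diag([0.04, 0.02, 0.01])             # factor covariance (diagonal for simplicity)
d = rng.uniform(0.01, 0.05, size=n)             # idiosyncratic variances, the diagonal of D

sigma = B @ omega @ B.T + np.diag(d)            # dense matrix, built only for the cross-check

def solve_factor_cov(B, omega, d, v):
    """Return Sigma^{-1} v via the Woodbury identity without forming Sigma."""
    dinv_v = v / d                               # D^{-1} v
    dinv_B = B / d[:, None]                      # D^{-1} B
    small = np.linalg.inv(omega) + B.T @ dinv_B  # k x k system
    return dinv_v - dinv_B @ np.linalg.solve(small, B.T @ dinv_v)

v = rng.normal(size=n)
x_fast = solve_factor_cov(B, omega, d, v)
x_direct = np.linalg.solve(sigma, v)            # dense solve for comparison
```

The fast path touches only `B`, `omega`, and `d`, so its cost grows linearly in $n$ for fixed $k$.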

Example 4 — Portfolio VaR under Gaussian returns

With $R \sim \mathcal{N}(\mu, \Sigma)$ and portfolio $w$:

$$w^\top R \sim \mathcal{N}(w^\top\mu, w^\top\Sigma w).$$

95% 1-day VaR $= -w^\top\mu + 1.645\sqrt{w^\top\Sigma w}$. The entire calculation is a matrix operation in $\Sigma$.
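The VaR formula translates directly into code. The sketch below assumes daily returns expressed as fractions of portfolio value; the weights, means, and covariance are invented numbers, and `parametric_var` is a hypothetical helper name. The quantile comes from the standard library's `statistics.NormalDist`, which gives approximately 1.645 at the 95% level.

```python
import numpy as np
from statistics import NormalDist

def parametric_var(w, mu, sigma, level=0.95):
    """Gaussian VaR as a positive loss number: -w'mu + z * sqrt(w' Sigma w)."""
    z = NormalDist().inv_cdf(level)              # standard normal quantile
    return -w @ mu + z * np.sqrt(w @ sigma @ w)

# Hypothetical two-asset portfolio with daily vols of 2% and 1%, correlation 0.6
w = np.array([0.6, 0.4])
mu = np.array([0.0005, 0.0003])
sigma = np.array([[0.0004, 0.00012],
                  [0.00012, 0.0001]])

z_95 = NormalDist().inv_cdf(0.95)
portfolio_vol = np.sqrt(w @ sigma @ w)
var_95 = parametric_var(w, mu, sigma, level=0.95)
```

Multiplying `var_95` by the portfolio value converts it from a return into a currency loss.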

Common pitfalls

"$\Sigma$ is invertible." No: $\Sigma$ is only guaranteed PSD, which allows zero eigenvalues. Singularity is common in finance when assets are linearly redundant (e.g. an exactly hedged pair, or two funds with identical composition). Use a pseudo-inverse, regularisation, or factor decomposition.
"Sample covariance $\hat\Sigma$ is a good estimate." For $n$ assets and $T$ observations, $\hat\Sigma$ has rank at most $\min(n, T - 1)$, because one degree of freedom is lost to the sample mean. When $T \le n$, $\hat\Sigma$ is automatically singular, and even when $T$ is moderately larger than $n$ its small eigenvalues are badly estimated. Shrinkage estimators (Ledoit-Wolf) or factor-model regularisation are essential in high-dimensional settings.
"Correlation matrix is interchangeable with covariance matrix." No: correlations hide scale information. Two portfolios with identical $\rho$ can have vastly different risk if volatilities differ by orders of magnitude.
"Any estimated correlation matrix is PSD." A true correlation matrix must be PSD, but matrices assembled in practice often are not: pairwise estimation over different samples, cleaning, truncation, or manual edits frequently produce non-PSD correlation matrices. The cure: "nearest correlation matrix" algorithms (Higham's method), or projection onto the PSD cone followed by rescaling the diagonal back to 1.
"Eigenvalues give risk directly." The interpretation "top eigenvector = market factor" works for equity cross-sections but not universally. For multi-asset portfolios (bonds, FX, commodities), eigenvalues reflect the specific units and scaling; always standardise before interpretation.
"Cholesky always works." Only for strictly positive-definite $\Sigma$. A rank-deficient (singular) $\Sigma$ requires an LDL factorisation or an eigen-decomposition with square roots of the non-negative eigenvalues.

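Two of the pitfalls above can be seen in one small experiment: a sample covariance from fewer observations than assets is rank-deficient, and sampling from it needs an eigen-decomposition root rather than Cholesky. The helper `psd_root`, the dimensions, and the simulated returns are assumptions for this sketch; clipping negative eigenvalues at zero is a simple projection, not Higham's method.

```python
import numpy as np

def psd_root(sigma: np.ndarray) -> np.ndarray:
    """Return A with A @ A.T ~= sigma, clipping any negative eigenvalues to zero."""
    eigenvalues, Q = np.linalg.eigh((sigma + sigma.T) / 2)   # symmetrise against round-off
    clipped = np.clip(eigenvalues, 0.0, None)                # simple projection onto the PSD cone
    return Q * np.sqrt(clipped)                              # scales column i of Q by sqrt(lambda_i)

rng = np.random.default_rng(5)                  # arbitrary seed
T, n = 20, 50                                   # hypothetical: fewer observations than assets
returns = rng.normal(size=(T, n))
sigma_hat = np.cov(returns, rowvar=False)

rank = np.linalg.matrix_rank(sigma_hat)         # cannot exceed T - 1

# Cholesky is not applicable to a singular matrix, but the eigen root is
A = psd_root(sigma_hat)
z = rng.standard_normal(size=(n, 1000))
samples = A @ z                                 # draws whose covariance is sigma_hat
```

The samples live in a subspace of dimension `rank`, which is the geometric meaning of a singular covariance matrix.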
Where this goes next

  • Correlation and Dependence: Background on pairwise correlations; this lesson extends to the full matrix.
  • Moments and Summary Statistics: Variance, the diagonal of $\Sigma$.
  • Linear Regression Derivation: $\hat\beta = (X^\top X)^{-1}X^\top y$ requires inversion of a covariance-like matrix.
  • Markowitz Portfolio Theory: Applied optimisation with $\Sigma$ (see Portfolio Optimization).
  • Principal Component Analysis: Spectral decomposition of $\Sigma$ applied to a data matrix (future lesson).

Exercises

Test your understanding with 3 exercises for this lesson.