Covariance Matrices

Motivation: why this matters in quant finance

A covariance matrix $\Sigma$ is the full $n \times n$ bookkeeping of pairwise covariances for a random vector $X = (X_1, \ldots, X_n)$. It is the single most important object in multi-asset quantitative finance. Every canonical model of a portfolio lives inside $\Sigma$:

Markowitz portfolio optimisation. The unconstrained mean-variance optimal weights are $w^* \propto \Sigma^{-1}\mu$. Without $\Sigma$ there is no mean-variance portfolio.
Value-at-Risk (VaR). For zero-mean returns, parametric VaR at confidence level $\alpha$ is $z_\alpha\sqrt{w^\top \Sigma w}$, where $z_\alpha$ is the standard normal quantile: a single quadratic form in $\Sigma$.
Principal Component Analysis (PCA). The eigen-decomposition of $\Sigma$ exposes the dominant factors in a return cross-section.
Factor models. $\Sigma = B\Omega B^\top + D$ where $B$ holds the factor loadings, $\Omega$ is the factor-covariance matrix, and $D$ is the diagonal matrix of idiosyncratic variances. Estimating this decomposition is the bread and butter of risk management.
Kalman filtering. State-space models carry a covariance matrix of the filtered state and update it at every time step via recursive matrix algebra.
Copula models. A Gaussian copula is fully parameterised by a correlation matrix, which is a normalised $\Sigma$; a $t$-copula adds a degrees-of-freedom parameter.
Stress testing. "What happens to portfolio loss if we shock the covariance matrix by X%?" is a concrete requirement under Basel rules.

Beyond the applications, $\Sigma$ has deep structure: it is symmetric positive semi-definite, its eigenvalues are non-negative, and its "roots" $\Sigma^{1/2}$ generate correlated Gaussian samples in Monte Carlo. Understanding its algebra and geometry is foundational for any multi-asset work.

Formal definition

For a random vector $X \in \mathbb{R}^n$ with mean vector $\mu = \mathbb{E}[X] \in \mathbb{R}^n$, the covariance matrix $\Sigma$ is the $n \times n$ matrix with entries:

$$\Sigma_{ij} := \text{Cov}(X_i, X_j) = \mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)].$$

Equivalently, $\Sigma = \mathbb{E}[(X - \mu)(X - \mu)^\top]$ .

Diagonal entries are variances: $\Sigma_{ii} = \text{Var}(X_i)$ . Off-diagonal entries are pairwise covariances.

Sample version. Given $m$ observations $x_1, \ldots, x_m \in \mathbb{R}^n$ with sample mean $\bar x = \tfrac{1}{m}\sum_i x_i$, the sample covariance matrix is

$$\hat\Sigma = \frac{1}{m - 1}\sum_{i=1}^m (x_i - \bar x)(x_i - \bar x)^\top.$$

The $(m - 1)$ denominator is Bessel's correction for unbiasedness.
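
The sample formula can be sketched in NumPy as follows. The helper name `sample_covariance` and the synthetic data are illustrative assumptions; the result is compared against `np.cov`, which uses the same $(m - 1)$ denominator by default.

```python
import numpy as np

def sample_covariance(x: np.ndarray) -> np.ndarray:
    """Unbiased sample covariance of an (m, n) array whose rows are observations."""
    m = x.shape[0]
    centered = x - x.mean(axis=0)            # subtract the sample mean from every row
    return centered.T @ centered / (m - 1)   # Bessel-corrected sum of outer products

rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 4))          # synthetic data: 250 observations, 4 variables
sigma_hat = sample_covariance(returns)

# np.cov treats rows as variables by default, so pass rowvar=False
matches_numpy = np.allclose(sigma_hat, np.cov(returns, rowvar=False))
print(matches_numpy)
```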

Properties

Property 1 — Symmetric and positive semi-definite (PSD)

$\Sigma$ is symmetric: $\Sigma_{ij} = \text{Cov}(X_i, X_j) = \text{Cov}(X_j, X_i) = \Sigma_{ji}$ .

$\Sigma$ is positive semi-definite: for any $a \in \mathbb{R}^n$,

$$a^\top \Sigma a = \text{Var}(a^\top X) \ge 0.$$

Hence the eigenvalues are non-negative ($\lambda_i \ge 0$), and $\Sigma$ is positive definite ($\lambda_i > 0$ for all $i$) iff no non-trivial linear combination of the $X_i$ is a.s. constant.

In finance, $\Sigma$ fails to be strictly PD precisely when assets are linearly redundant (e.g. two tracking funds with identical composition). Numerically, rank-deficient or nearly-rank-deficient $\Sigma$ causes trouble with matrix inversion — standard cures: regularisation, shrinkage, or factor decomposition.
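
A small numerical check of Property 1, as a sketch on synthetic data: the quadratic form $a^\top\hat\Sigma a$ equals the sample variance of $a^\top X$, and duplicating a column (a stand-in for two linearly redundant assets) produces a zero eigenvalue. The vector `a` and the data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))
# Append a fourth column that duplicates the first: a linearly redundant "asset"
x = np.column_stack([x, x[:, 0]])
sigma = np.cov(x, rowvar=False)

a = np.array([0.5, -1.0, 2.0, 0.3])
quad_form = a @ sigma @ a
var_of_combination = np.var(x @ a, ddof=1)   # sample variance of a^T X

eigenvalues = np.linalg.eigvalsh(sigma)      # ascending order, for symmetric matrices
print(np.isclose(quad_form, var_of_combination))
print(eigenvalues)                           # the smallest one is zero up to rounding
```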

Property 2 — Affine transformation

If $Y = AX + b$ for a matrix $A$ and vector $b$ , then:

$$\text{Cov}(Y) = A\,\Sigma\,A^\top.$$

Special cases:

Portfolio variance: $Y = w^\top X$ (scalar portfolio return), $\text{Var}(Y) = w^\top\Sigma w$ .
Variance of a sum: $\sum_i X_i = \mathbf{1}^\top X$ , $\text{Var}(\sum X_i) = \mathbf{1}^\top\Sigma\mathbf{1}$ .
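
The affine rule and its two special cases can be illustrated with a few lines of NumPy. The $3 \times 3$ matrix, the matrix `A`, and the weights are made-up numbers chosen only for illustration.

```python
import numpy as np

# Illustrative covariance matrix for three assets (made-up numbers)
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

# General affine map Y = A X + b: the shift b does not affect the covariance
A = np.array([[1.0, 1.0, 0.0],     # Y_1 = X_1 + X_2
              [0.0, 1.0, -1.0]])   # Y_2 = X_2 - X_3
cov_y = A @ sigma @ A.T

# Special case 1: portfolio variance w^T Sigma w
w = np.array([0.5, 0.3, 0.2])
portfolio_variance = w @ sigma @ w

# Special case 2: variance of the sum, 1^T Sigma 1 (the sum of all entries)
ones = np.ones(3)
variance_of_sum = ones @ sigma @ ones

print(cov_y)
print(portfolio_variance, variance_of_sum)
```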

Property 3 — Spectral decomposition

Being symmetric PSD, $\Sigma$ admits the spectral decomposition:

$$\Sigma = Q\Lambda Q^\top, \qquad \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n), \quad \lambda_1 \ge \cdots \ge \lambda_n \ge 0,$$

with $Q$ orthogonal ($Q Q^\top = I$). The columns of $Q$ are principal-component directions; the eigenvalues $\lambda_i$ are the variances of the corresponding principal components. PCA is the data-analysis procedure that exposes this decomposition.
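
As a sketch, the decomposition can be computed with `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order; they are reordered here to match the convention $\lambda_1 \ge \cdots \ge \lambda_n$. The matrix is an illustrative assumption.

```python
import numpy as np

sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

eigenvalues, Q = np.linalg.eigh(sigma)       # ascending eigenvalues, orthonormal columns
order = np.argsort(eigenvalues)[::-1]        # reorder to descending
eigenvalues, Q = eigenvalues[order], Q[:, order]

reconstructed = Q @ np.diag(eigenvalues) @ Q.T
# Variance of each principal component q_i^T X is q_i^T Sigma q_i
pc_variances = np.array([q @ sigma @ q for q in Q.T])
explained_share = eigenvalues / eigenvalues.sum()

print(eigenvalues)
print(explained_share)
```

The share of total variance carried by each component, `explained_share`, is what a PCA report usually displays.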

Property 4 — Cholesky factorisation

If $\Sigma$ is positive definite, it has a unique lower-triangular Cholesky factor $L$ with positive diagonal entries such that

$$\Sigma = L L^\top.$$

Cholesky is the standard low-cost way to generate correlated Gaussian samples: if $Z \sim \mathcal{N}(0, I)$, then $X = LZ \sim \mathcal{N}(0, \Sigma)$ because $\text{Cov}(X) = L\cdot I\cdot L^\top = \Sigma$. In a Monte Carlo risk simulation with 1000 correlated assets, this is the main workhorse.
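
A minimal sketch of the sampling recipe, assuming a small positive-definite matrix with made-up entries: factor once, multiply independent standard normals by $L$, and confirm that the sample covariance is close to $\Sigma$.

```python
import numpy as np

sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

L = np.linalg.cholesky(sigma)               # lower-triangular factor, sigma = L @ L.T

rng = np.random.default_rng(42)
Z = rng.standard_normal(size=(3, 200_000))  # independent N(0, 1) draws, one column per scenario
X = L @ Z                                   # correlated draws with covariance sigma

sample_cov = np.cov(X)                      # rows are variables here, the np.cov default
print(np.round(sample_cov, 4))
```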

Property 5 — Correlation matrix

Normalising $\Sigma$ by volatilities gives the correlation matrix $\rho$ with entries $\rho_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$. Equivalently:

$$\rho = D^{-1}\Sigma D^{-1}, \qquad D := \text{diag}(\sqrt{\Sigma_{11}}, \ldots, \sqrt{\Sigma_{nn}}).$$

$\rho$ is symmetric, has all diagonal entries $1$ , and has entries in $[-1, 1]$ . It carries no volatility information; $\Sigma = D\rho D$ reconstructs the full covariance.
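
The two-way conversion between $\Sigma$ and $\rho$ can be sketched as below; the covariance matrix is an illustrative assumption.

```python
import numpy as np

sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

vol = np.sqrt(np.diag(sigma))            # volatilities, the diagonal of D
D_inv = np.diag(1.0 / vol)
rho = D_inv @ sigma @ D_inv              # correlation matrix D^{-1} Sigma D^{-1}

D = np.diag(vol)
sigma_rebuilt = D @ rho @ D              # Sigma = D rho D recovers the covariance

print(np.round(rho, 4))
```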

Canonical examples

Example 1 — Two-asset portfolio covariance

Two assets with variances $\sigma_1^2, \sigma_2^2$ and correlation $\rho$ :

$$\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.$$

Portfolio with weights $w = (w_1, w_2)$ : $\text{Var}(\text{portfolio}) = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2\rho\sigma_1\sigma_2$ .

Minimum-variance portfolio (full investment, $w_1 + w_2 = 1$ ): $w_1^* = (\sigma_2^2 - \rho\sigma_1\sigma_2)/(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$ .
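
A worked instance of the closed form, with illustrative inputs $\sigma_1 = 0.2$, $\sigma_2 = 0.3$, $\rho = 0.25$. As a cross-check that is not part of the text above, the result is compared with the general fully invested minimum-variance solution $w \propto \Sigma^{-1}\mathbf{1}$, normalised to sum to one.

```python
import numpy as np

def min_variance_weights(sigma1: float, sigma2: float, rho: float) -> np.ndarray:
    """Two-asset minimum-variance weights under the constraint w1 + w2 = 1."""
    cov12 = rho * sigma1 * sigma2
    w1 = (sigma2**2 - cov12) / (sigma1**2 + sigma2**2 - 2.0 * cov12)
    return np.array([w1, 1.0 - w1])

sigma1, sigma2, rho = 0.2, 0.3, 0.25          # illustrative inputs
cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])

w_closed = min_variance_weights(sigma1, sigma2, rho)

# Cross-check: solve Sigma w = 1, then normalise so the weights sum to one
w_general = np.linalg.solve(cov, np.ones(2))
w_general = w_general / w_general.sum()

min_variance = w_closed @ cov @ w_closed
print(w_closed, min_variance)
```

Here the less volatile asset receives the larger weight, and the resulting variance is below that of either asset held alone.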

Example 2 — Equicorrelated covariance

$\Sigma_{ij} = \rho\sigma^2$ for $i \ne j$ and $\Sigma_{ii} = \sigma^2$ . Equivalently $\Sigma = \sigma^2[(1 - \rho)I + \rho\mathbf{1}\mathbf{1}^\top]$ .

Eigenvalues: $1 + (n - 1)\rho$ (eigenvector $\mathbf{1}$ , the "market factor") and $1 - \rho$ with multiplicity $n - 1$ (idiosyncratic directions), both times $\sigma^2$ . PSD requires $\rho \ge -1/(n - 1)$ — with $n$ large, this is essentially $\rho \ge 0$ . (A fully-anti-correlated large portfolio is impossible.)

This is the simplest factor model: all assets share one common factor and have equal idiosyncratic variance.
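
The eigenvalue formulas and the PSD bound can be verified numerically. The helper `equicorrelation_cov` and the parameter values are illustrative assumptions.

```python
import numpy as np

def equicorrelation_cov(n: int, vol: float, rho: float) -> np.ndarray:
    """sigma^2 * [(1 - rho) I + rho 1 1^T] for n assets with common volatility vol."""
    return vol**2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

n, vol, rho = 5, 0.2, 0.3
eigenvalues = np.linalg.eigvalsh(equicorrelation_cov(n, vol, rho))  # ascending

expected_top = vol**2 * (1.0 + (n - 1) * rho)   # "market factor" eigenvalue
expected_rest = vol**2 * (1.0 - rho)            # repeated n - 1 times

# At rho = -1/(n - 1) the smallest eigenvalue reaches zero; below it, PSD fails
min_eig_boundary = np.linalg.eigvalsh(equicorrelation_cov(n, vol, -1.0 / (n - 1))).min()
min_eig_below = np.linalg.eigvalsh(equicorrelation_cov(n, vol, -0.3)).min()

print(eigenvalues, expected_top, expected_rest)
print(min_eig_boundary, min_eig_below)
```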

Example 3 — Factor model covariance

$X = B F + \epsilon$ with $F \in \mathbb{R}^k$ factors, $B$ the $n \times k$ loadings, and $\epsilon$ an idiosyncratic term with diagonal covariance $D$ . Assuming $F$ and $\epsilon$ are independent:

$$\Sigma = B\,\Omega\,B^\top + D,$$

where $\Omega = \text{Cov}(F)$. When $k \ll n$, this is a low-rank plus diagonal decomposition of $\Sigma$. This is how commercial equity risk models such as MSCI Barra and Axioma represent the covariance matrix for thousands of assets: direct $n \times n$ storage and inversion are expensive, whereas the factor form needs only $B$, $\Omega$, and the diagonal of $D$.
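
To illustrate why the factor form is cheaper, the sketch below applies $\Sigma^{-1}$ to a vector using the Woodbury identity, which requires only a $k \times k$ solve. The identity is a standard linear-algebra result brought in here for illustration, not something stated above; the function name `solve_factor_cov` and all inputs are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 3
B = rng.normal(size=(n, k))                    # synthetic factor loadings
G = rng.normal(size=(k, k))
omega = G @ G.T + 0.1 * np.eye(k)              # synthetic positive-definite factor covariance
d = rng.uniform(0.05, 0.2, size=n)             # idiosyncratic variances, the diagonal of D

sigma = B @ omega @ B.T + np.diag(d)           # dense n x n matrix, built only for comparison

def solve_factor_cov(B, omega, d, v):
    """Compute Sigma^{-1} v via Woodbury, using only diagonal scaling and a k x k solve."""
    Dinv_v = v / d
    Dinv_B = B / d[:, None]
    small = np.linalg.inv(omega) + B.T @ Dinv_B            # k x k system matrix
    return Dinv_v - Dinv_B @ np.linalg.solve(small, B.T @ Dinv_v)

v = rng.normal(size=n)
x_fast = solve_factor_cov(B, omega, d, v)
x_direct = np.linalg.solve(sigma, v)

params_factor = n * k + k * (k + 1) // 2 + n   # entries needed in factor form
params_dense = n * (n + 1) // 2                # free entries of a dense symmetric matrix
print(np.allclose(x_fast, x_direct, atol=1e-6), params_factor, params_dense)
```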

Example 4 — Portfolio VaR under Gaussian returns

With $R \sim \mathcal{N}(\mu, \Sigma)$ and portfolio $w$ :

$$w^\top R \sim \mathcal{N}(w^\top\mu, w^\top\Sigma w).$$

95% 1-day VaR $= -w^\top\mu + 1.645\sqrt{w^\top\Sigma w}$ . The entire calculation is a matrix operation in $\Sigma$ .
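
The VaR formula translates directly into code. In this sketch the function name `parametric_var`, the daily means, volatilities, correlations, and weights are all illustrative assumptions; the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist

def parametric_var(w, mu, sigma, level=0.95):
    """Gaussian VaR as a positive loss number: -w^T mu + z_level * sqrt(w^T Sigma w)."""
    z = NormalDist().inv_cdf(level)            # about 1.645 for level = 0.95
    return -(w @ mu) + z * np.sqrt(w @ sigma @ w)

# Illustrative daily inputs (assumed, not estimated from data)
vols = np.array([0.010, 0.015, 0.020])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
sigma = np.outer(vols, vols) * corr            # Sigma = D rho D
mu = np.array([0.0004, 0.0003, 0.0005])
w = np.array([0.5, 0.3, 0.2])

var_95 = parametric_var(w, mu, sigma, level=0.95)
var_99 = parametric_var(w, mu, sigma, level=0.99)
print(var_95, var_99)
```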

Common pitfalls

" $\Sigma$ is invertible." No: $\Sigma$ is only guaranteed PSD, which allows zero eigenvalues. Singularity is common in finance when assets are linearly redundant (e.g. hedged pairs). Use pseudo-inverse, regularisation, or factor decomposition.

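"Sample covariance $\hat\Sigma$ is a good estimate." For $n$ assets and $T$ observations ($T$ plays the role of $m$ in the sample formula), $\hat\Sigma$ has rank $\le \min(n, T - 1)$, because one degree of freedom is used by the sample mean. When $T \le n$, $\hat\Sigma$ is automatically rank-deficient, hence singular and unusable wherever $\Sigma^{-1}$ is needed. Shrinkage estimators (Ledoit-Wolf) or factor-model regularisation are essential in high-dimensional settings.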
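
The rank bound is easy to see numerically. The sketch below uses synthetic returns with more assets than observations, then applies a simple linear shrinkage toward a scaled identity. The intensity `delta` is hand-picked for illustration; a Ledoit-Wolf estimator would choose it from the data instead, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_assets, n_obs = 50, 20                        # fewer observations than assets
returns = rng.normal(scale=0.01, size=(n_obs, n_assets))

sigma_hat = np.cov(returns, rowvar=False)
rank = np.linalg.matrix_rank(sigma_hat)         # bounded by the number of observations
min_eig_raw = np.linalg.eigvalsh(sigma_hat).min()

# Linear shrinkage toward a scaled identity with an assumed, fixed intensity
delta = 0.2
target = np.trace(sigma_hat) / n_assets * np.eye(n_assets)
sigma_shrunk = (1.0 - delta) * sigma_hat + delta * target
min_eig_shrunk = np.linalg.eigvalsh(sigma_shrunk).min()

print(rank, min_eig_raw, min_eig_shrunk)
```

Any positive `delta` lifts every eigenvalue by at least `delta` times the average variance, so the shrunk matrix is invertible.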

"Correlation matrix is interchangeable with covariance matrix." No: correlations hide scale information. Two portfolios with identical $\rho$ can have vastly different risk if volatilities differ by orders of magnitude.

"The correlation matrix must be PSD." It must be, but matrices used in practice often are not. A correlation matrix computed from one complete data set is always PSD (possibly singular); non-PSD matrices typically arise when entries are estimated pairwise from missing or asynchronous data, or after cleaning, truncation, or manual edits. The cure: "nearest PSD matrix" algorithms (Higham's method), or reprojection onto the PSD cone.
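
A simple repair can be sketched with eigenvalue clipping followed by rescaling to a unit diagonal. This is a cruder substitute for Higham's method, not an implementation of it, and it does not return the nearest correlation matrix in any norm. The function name, the `floor` parameter, and the example matrix are assumptions.

```python
import numpy as np

def clip_to_psd_correlation(c: np.ndarray, floor: float = 1e-8) -> np.ndarray:
    """Eigenvalue clipping plus rescaling to unit diagonal (not Higham's method)."""
    c = (c + c.T) / 2.0                            # enforce exact symmetry
    eigenvalues, q = np.linalg.eigh(c)
    clipped = np.clip(eigenvalues, floor, None)    # raise negative eigenvalues to the floor
    a = (q * clipped) @ q.T                        # rebuild Q diag(clipped) Q^T
    scale = 1.0 / np.sqrt(np.diag(a))
    return a * np.outer(scale, scale)              # congruence keeps PSD, restores unit diagonal

# A manually edited "correlation matrix" that is mutually inconsistent
bad = np.array([[ 1.0, 0.9, -0.9],
                [ 0.9, 1.0,  0.9],
                [-0.9, 0.9,  1.0]])

fixed = clip_to_psd_correlation(bad)
print(np.linalg.eigvalsh(bad))      # contains a negative eigenvalue
print(np.linalg.eigvalsh(fixed))    # all non-negative up to rounding
```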

"Eigenvalues give risk directly." The interpretation "top eigenvector = market factor" works for equity cross-sections but not universally. For multi-asset portfolios (bonds, FX, commodities), eigenvalues reflect the specific units and scaling; always standardise before interpretation.

"Cholesky always works." Only for strictly positive-definite $\Sigma$. A rank-deficient (singular) $\Sigma$ requires an LDL factorisation or an eigen-decomposition, taking square roots of the non-negative eigenvalues.
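
The eigen-decomposition route can be sketched as follows for a singular $\Sigma$ in which the third asset is the sum of the first two. The helper `psd_sqrt` and the example matrix are illustrative assumptions; the Cholesky call is wrapped in `try`/`except` because it is expected, though not guaranteed in floating point, to fail on such a matrix.

```python
import numpy as np

def psd_sqrt(sigma: np.ndarray) -> np.ndarray:
    """Return A with A @ A.T == sigma, valid for singular PSD matrices."""
    eigenvalues, q = np.linalg.eigh(sigma)
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # remove tiny negative rounding errors
    return q * np.sqrt(eigenvalues)                 # Q diag(sqrt(lambda))

# Singular covariance: X_3 = X_1 + X_2 with X_1, X_2 uncorrelated, unit variance
sigma = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0]])

try:
    np.linalg.cholesky(sigma)
    cholesky_succeeded = True
except np.linalg.LinAlgError:
    cholesky_succeeded = False                      # the usual outcome for a singular matrix

A = psd_sqrt(sigma)
rng = np.random.default_rng(5)
samples = A @ rng.standard_normal(size=(3, 10_000)) # correlated draws with covariance sigma

print(cholesky_succeeded, np.allclose(A @ A.T, sigma))
```

The simulated draws respect the exact linear dependence: the third row of `samples` equals the sum of the first two up to rounding.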

Where this goes next

Correlation and Dependence: Background on pairwise correlations; this lesson extends to the full matrix.
Moments and Summary Statistics: Variance, the diagonal of $\Sigma$ .
Linear Regression Derivation: $\hat\beta = (X^\top X)^{-1}X^\top y$ requires inversion of a covariance-like matrix.
Markowitz Portfolio Theory: Applied optimisation with $\Sigma$ (see Portfolio Optimization).
Principal Component Analysis: Spectral decomposition of $\Sigma$ applied to a data matrix (future lesson).