Portfolio Optimization
Why Portfolio Optimization?
Portfolio optimization is the process of selecting asset weights to maximize expected return for a given level of risk, or minimize risk for a given expected return. This mathematical framework, pioneered by Harry Markowitz in 1952, revolutionized investment management by formalizing the risk-return tradeoff and the benefits of diversification.
Modern portfolio theory provides the foundation for asset allocation, risk budgeting, and performance evaluation across all areas of finance — from individual retirement planning to institutional asset management and hedge fund strategies.
Mean-Variance Optimization
Markowitz Framework
Objective: Minimize portfolio variance subject to an expected-return constraint.
Setup:
Asset returns: $r_i$ with expected return $\mu_i$; the returns have covariance matrix $\Sigma$
Portfolio weights: $w = (w_1, \ldots, w_n)^T$ with $\sum_{i=1}^n w_i = 1$
Portfolio return: $r_p = w^T r$
Portfolio expected return: $\mu_p = w^T \mu$
Portfolio variance: $\sigma_p^2 = w^T \Sigma w$
Optimization Problem
Minimum Variance Portfolio:
$$\min_{w} \; w^T \Sigma w \quad \text{subject to} \quad \mathbf{1}^T w = 1$$
Solution:
$$w^{MV} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^T \Sigma^{-1} \mathbf{1}}$$
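The closed-form minimum variance solution can be checked numerically. The following is a minimal sketch in Python with NumPy; the three-asset covariance matrix `cov` holds made-up values for illustration, and `min_variance_weights` is a hypothetical helper name.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form minimum variance weights: Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # Sigma^{-1} 1 without forming the inverse
    return x / (ones @ x)

# Illustrative covariance matrix (assumed values, not estimated from data)
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])

w_mv = min_variance_weights(cov)
var_mv = w_mv @ cov @ w_mv
print("weights:", w_mv.round(4), "variance:", round(float(var_mv), 6))
```

At the optimum every asset has the same covariance with the portfolio, so `cov @ w_mv` is a constant vector equal to the portfolio variance. This makes a convenient sanity check.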
Target Return Portfolio:
$$\min_{w} \; w^T \Sigma w \quad \text{subject to} \quad \mathbf{1}^T w = 1, \quad \mu^T w = \mu_p$$
Solution:
$$w = g + h\mu_p$$
where:
$$g = \frac{\Sigma^{-1}(C\mathbf{1} - B\mu)}{AC - B^2}, \quad h = \frac{\Sigma^{-1}(A\mu - B\mathbf{1})}{AC - B^2}$$
with:
$$A = \mathbf{1}^T \Sigma^{-1} \mathbf{1}, \quad B = \mu^T \Sigma^{-1} \mathbf{1}, \quad C = \mu^T \Sigma^{-1} \mu$$
These expressions follow from the Lagrangian conditions. They satisfy $\mathbf{1}^T g = 1$, $\mu^T g = 0$, $\mathbf{1}^T h = 0$, and $\mu^T h = 1$, so both constraints hold for every target $\mu_p$.
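As an illustration of the target-return solution, the sketch below computes $w = g + h\mu_p$ using $g = \Sigma^{-1}(C\mathbf{1} - B\mu)/(AC - B^2)$ and $h = \Sigma^{-1}(A\mu - B\mathbf{1})/(AC - B^2)$, which is the form obtained by solving the two Lagrangian conditions. The inputs `mu` and `cov` are made-up values, and `frontier_weights` is a hypothetical helper name.

```python
import numpy as np

def frontier_weights(mu, cov, target):
    """Minimum variance weights with 1^T w = 1 and mu^T w = target (shorting allowed)."""
    ones = np.ones(len(mu))
    inv_ones = np.linalg.solve(cov, ones)  # Sigma^{-1} 1
    inv_mu = np.linalg.solve(cov, mu)      # Sigma^{-1} mu
    A = ones @ inv_ones
    B = mu @ inv_ones
    C = mu @ inv_mu
    D = A * C - B**2
    g = (C * inv_ones - B * inv_mu) / D
    h = (A * inv_mu - B * inv_ones) / D
    return g + h * target

# Illustrative inputs (assumed values)
mu = np.array([0.08, 0.10, 0.12])
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])

w = frontier_weights(mu, cov, target=0.10)
print("weights:", w.round(4), "expected return:", round(float(mu @ w), 4))
```

Because the weights are linear in the target, the whole frontier can be traced by varying `target` without re-solving the optimization.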
Efficient Frontier
The minimum-variance frontier gives the smallest attainable variance for each target return $\mu_p$. The efficient frontier is its upper branch, the portfolios with $\mu_p \geq B/A$:
$$\sigma_p^2(\mu_p) = \frac{C - 2B\mu_p + A\mu_p^2}{AC - B^2}$$
Key Properties:
Shape: A parabola in mean-variance space and a hyperbola in mean-standard deviation space
Two-fund separation: Any frontier portfolio is a combination of any two distinct frontier portfolios
Minimum variance portfolio lies at the vertex, where $\mu_p = B/A$ and $\sigma_p^2 = 1/A$
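The frontier formula can be evaluated directly from the constants $A$, $B$, and $C$. The sketch below uses made-up inputs and hypothetical helper names (`frontier_constants`, `frontier_variance`) to tabulate the frontier variance at a few target returns, including the vertex at $\mu_p = B/A$.

```python
import numpy as np

def frontier_constants(mu, cov):
    """Return A = 1'S^-1 1, B = mu'S^-1 1, C = mu'S^-1 mu."""
    ones = np.ones(len(mu))
    inv_ones = np.linalg.solve(cov, ones)
    inv_mu = np.linalg.solve(cov, mu)
    return ones @ inv_ones, mu @ inv_ones, mu @ inv_mu

def frontier_variance(mu, cov, target):
    """Smallest attainable variance for a given target expected return."""
    A, B, C = frontier_constants(mu, cov)
    return (C - 2.0 * B * target + A * target**2) / (A * C - B**2)

# Illustrative inputs (assumed values)
mu = np.array([0.08, 0.10, 0.12])
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])

A, B, C = frontier_constants(mu, cov)
for target in (B / A, 0.10, 0.12):
    vol = np.sqrt(frontier_variance(mu, cov, target))
    print(f"target return {target:.4f} -> volatility {vol:.4f}")
```

The variance is quadratic in the target return and symmetric about the vertex, which is why only the upper half of the curve is efficient.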
Capital Asset Pricing Model (CAPM)
When a risk-free asset with return $r_f$ is available:
Capital Allocation Line:
$$\mu_p = r_f + \frac{\mu_T - r_f}{\sigma_T} \sigma_p$$
where $\mu_T$ and $\sigma_T$ are the expected return and volatility of the tangent portfolio, whose weights are (the subscript $T$ denotes the tangent portfolio, not a transpose):
$$w_T = \frac{\Sigma^{-1}(\mu - r_f \mathbf{1})}{\mathbf{1}^T \Sigma^{-1}(\mu - r_f \mathbf{1})}$$
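The tangent portfolio weights can be computed in a few lines. This is a sketch under stated assumptions: the inputs `mu`, `cov`, and `rf` are made-up values, `tangency_weights` is a hypothetical helper, and the normalizing sum $\mathbf{1}^T \Sigma^{-1}(\mu - r_f \mathbf{1})$ is assumed to be positive.

```python
import numpy as np

def tangency_weights(mu, cov, rf):
    """Fully invested weights proportional to Sigma^{-1} (mu - rf 1)."""
    x = np.linalg.solve(cov, mu - rf)
    return x / x.sum()  # assumes x.sum() > 0

def sharpe(w, mu, cov, rf):
    """Ex-ante Sharpe ratio of a weight vector."""
    return (w @ mu - rf) / np.sqrt(w @ cov @ w)

# Illustrative inputs (assumed values)
mu = np.array([0.08, 0.10, 0.12])
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])
rf = 0.02

w_tan = tangency_weights(mu, cov, rf)
print("tangent weights:", w_tan.round(4))
print("slope of the capital allocation line:", round(float(sharpe(w_tan, mu, cov, rf)), 4))
```

The slope of the capital allocation line is the Sharpe ratio of the tangent portfolio, which is the largest Sharpe ratio attainable from the risky assets.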
Black-Litterman Model
Motivation
Traditional mean-variance optimization suffers from:
Estimation error: Small changes in inputs cause large weight changes
Extreme positions: Optimizers concentrate in few assets
Counterintuitive results: Negative weights in "good" assets
Framework
Prior: Market capitalization weights $w_m$ with equilibrium returns:
$$\Pi = \delta \Sigma w_m$$
where $\delta$ is the risk aversion coefficient.
Views: Investor's views on specific returns:
$$P\mu = Q + \varepsilon$$
where:
$P$: Picking matrix (which assets the views concern)
$Q$: Vector of view returns
$\varepsilon \sim \mathcal{N}(0, \Omega)$: View uncertainty
Bayesian Update:
$$\bar{\mu} = [(\tau\Sigma)^{-1} + P^T\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi + P^T\Omega^{-1}Q]$$
$$\bar{\Sigma} = [(\tau\Sigma)^{-1} + P^T\Omega^{-1}P]^{-1}$$
Here $\tau$ is a scalar that scales the uncertainty of the prior mean, and $\bar{\Sigma}$ is the posterior covariance of the mean estimate $\bar{\mu}$, not of the returns themselves.
Optimal Weights: With the posterior return covariance $\Sigma_{BL} = \Sigma + \bar{\Sigma}$, the fully invested weights are:
$$w = \frac{\Sigma_{BL}^{-1}\bar{\mu}}{\mathbf{1}^T\Sigma_{BL}^{-1}\bar{\mu}}$$
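The Bayesian update can be sketched numerically as follows. Everything specific here is an assumption made for illustration: the market weights, the single relative view (asset 3 outperforms asset 1 by 2%), the view variance in `omega`, and the defaults `delta=2.5` and `tau=0.05`. The function name `black_litterman` is hypothetical.

```python
import numpy as np

def black_litterman(cov, w_mkt, P, Q, omega, delta=2.5, tau=0.05):
    """Return (equilibrium returns, posterior mean, posterior covariance of the mean)."""
    pi = delta * cov @ w_mkt                         # implied equilibrium returns
    prior_precision = np.linalg.inv(tau * cov)       # (tau Sigma)^{-1}
    view_precision = P.T @ np.linalg.solve(omega, P) # P' Omega^{-1} P
    post_cov = np.linalg.inv(prior_precision + view_precision)
    post_mean = post_cov @ (prior_precision @ pi + P.T @ np.linalg.solve(omega, Q))
    return pi, post_mean, post_cov

# Illustrative inputs (assumed values)
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])
w_mkt = np.array([0.5, 0.3, 0.2])
P = np.array([[-1.0, 0.0, 1.0]])   # one view: asset 3 minus asset 1
Q = np.array([0.02])               # view return
omega = np.array([[0.0004]])       # view variance

pi, post_mean, post_cov = black_litterman(cov, w_mkt, P, Q, omega)
print("equilibrium returns:", pi.round(4))
print("posterior mean:     ", post_mean.round(4))
```

Two limiting cases help build intuition: as the view variance grows, the posterior mean falls back to the equilibrium returns, and as it shrinks toward zero, the posterior mean satisfies the view almost exactly.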
Risk Parity
Equal Risk Contribution
Objective: Each asset contributes equally to portfolio risk.
Risk Contribution:
$$RC_i = w_i \frac{\partial \sigma_p}{\partial w_i} = w_i \frac{(\Sigma w)_i}{\sigma_p}$$
Equal Risk Constraint: The contributions sum to $\sigma_p$ (since $\sum_i w_i (\Sigma w)_i = \sigma_p^2$), so equal contributions require:
$$RC_i = \frac{\sigma_p}{n} \quad \forall i$$
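Equal risk contribution weights have no closed form in general, so they are found numerically. The sketch below is one possible approach, not the only one: it minimizes the convex function $\tfrac{1}{2} y^T \Sigma y - \tfrac{1}{n}\sum_i \ln y_i$ by cyclical coordinate descent and then rescales $y$ to sum to one. With contributions defined as $w_i (\Sigma w)_i / \sigma_p$, they sum to $\sigma_p$, so each should equal $\sigma_p / n$. The covariance values and function names are assumptions for illustration.

```python
import numpy as np

def risk_contributions(w, cov):
    """RC_i = w_i * (Sigma w)_i / sigma_p; the contributions sum to sigma_p."""
    sigma_p = np.sqrt(w @ cov @ w)
    return w * (cov @ w) / sigma_p

def equal_risk_weights(cov, sweeps=1000, tol=1e-12):
    """Long-only equal risk contribution weights via cyclical coordinate descent."""
    n = cov.shape[0]
    b = 1.0 / n
    y = 1.0 / np.sqrt(np.diag(cov))  # start from inverse-volatility weights
    for _ in range(sweeps):
        y_old = y.copy()
        for i in range(n):
            # Solve cov_ii * y_i^2 + a * y_i - b = 0 for the positive root
            a = cov[i] @ y - cov[i, i] * y[i]
            y[i] = (-a + np.sqrt(a * a + 4.0 * cov[i, i] * b)) / (2.0 * cov[i, i])
        if np.max(np.abs(y - y_old)) < tol:
            break
    return y / y.sum()

# Illustrative covariance matrix (assumed values)
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])

w_erc = equal_risk_weights(cov)
rc = risk_contributions(w_erc, cov)
print("weights:", w_erc.round(4))
print("risk contributions:", rc.round(5))
```

Each coordinate update sets $y_i (\Sigma y)_i = 1/n$ exactly, so at convergence all contributions are equal, and rescaling the weights preserves that equality.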
Maximum Diversification
Objective: Maximize the ratio of weighted average volatility to portfolio volatility:
$$MD = \frac{w^T \sigma}{\sqrt{w^T \Sigma w}}$$
where $\sigma = (\sigma_1, \ldots, \sigma_n)^T$ are individual asset volatilities.
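The diversification ratio is straightforward to compute for any weight vector. The sketch below also uses the fact that, without a long-only constraint, maximizing the ratio has the same structure as a maximum Sharpe problem with volatilities in place of excess returns, giving weights proportional to $\Sigma^{-1}\sigma$. The covariance values and helper names are assumptions for illustration, and the normalizing sum is assumed to be positive.

```python
import numpy as np

def diversification_ratio(w, cov):
    """Weighted average volatility divided by portfolio volatility."""
    vols = np.sqrt(np.diag(cov))
    return (w @ vols) / np.sqrt(w @ cov @ w)

def max_diversification_weights(cov):
    """Unconstrained maximizer: weights proportional to Sigma^{-1} sigma."""
    vols = np.sqrt(np.diag(cov))
    x = np.linalg.solve(cov, vols)
    return x / x.sum()  # assumes x.sum() > 0

# Illustrative covariance matrix (assumed values)
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])

w_md = max_diversification_weights(cov)
w_eq = np.ones(3) / 3
print("max diversification weights:", w_md.round(4))
print("ratio (max div):", round(float(diversification_ratio(w_md, cov)), 4))
print("ratio (equal weight):", round(float(diversification_ratio(w_eq, cov)), 4))
```

A single-asset portfolio has a ratio of exactly 1, so values above 1 measure how much volatility is removed by combining imperfectly correlated assets.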
Minimum Variance
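Risk parity can approximate the minimum variance portfolio in some settings, but the two generally differ: for long-only portfolios, the volatility of the equal-risk-contribution portfolio lies between those of the minimum variance and equally weighted portfolios.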
Factor Models in Optimization
Single-Factor Model
$$r_i = \alpha_i + \beta_i f + \varepsilon_i$$
Covariance Matrix:
$$\Sigma = \beta\beta^T \sigma_f^2 + D$$
where $D = \text{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)$.
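The single-factor covariance matrix is built from $n$ betas, $n$ residual variances, and one factor variance. A minimal sketch with made-up parameter values (the name `single_factor_cov` is hypothetical):

```python
import numpy as np

def single_factor_cov(beta, factor_var, resid_var):
    """Sigma = beta beta^T * sigma_f^2 + diag(residual variances)."""
    return factor_var * np.outer(beta, beta) + np.diag(resid_var)

# Illustrative parameters (assumed values)
beta = np.array([0.8, 1.0, 1.3])
factor_var = 0.04
resid_var = np.array([0.01, 0.02, 0.03])

cov = single_factor_cov(beta, factor_var, resid_var)
print(cov.round(4))

n = len(beta)
print("factor-model parameters:", 2 * n + 1)
print("free entries in a full covariance matrix:", n * (n + 1) // 2)
```

For three assets the parameter saving is nil (7 versus 6), but the factor model grows linearly in $n$ while the full matrix grows quadratically, which is where the estimation benefit comes from.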
Multi-Factor Model
$$r_i = \alpha_i + \sum_{k=1}^K \beta_{i,k} f_k + \varepsilon_i$$
Benefits:
Dimension reduction: $K \ll n$
Structural interpretation: Economic factors
Estimation efficiency: Fewer parameters
Fama-French Factors
Three-Factor Model:
$$r_{i,t} - r_{f,t} = \alpha_i + \beta_{i,M}(r_{M,t} - r_{f,t}) + \beta_{i,SMB}\,SMB_t + \beta_{i,HML}\,HML_t + \varepsilon_{i,t}$$
Factors:
Market: Excess market return
SMB: Small minus big (size factor)
HML: High minus low (value factor)
Robust Optimization
Uncertainty Sets
Box Uncertainty:
$$\mathcal{U} = \{\mu : \mu_i^L \leq \mu_i \leq \mu_i^U\}$$
Ellipsoidal Uncertainty:
$$\mathcal{U} = \{\mu : (\mu - \hat{\mu})^T \Sigma_\mu^{-1} (\mu - \hat{\mu}) \leq \kappa^2\}$$
Robust Formulation
Max-Min Problem:
$$\max_{w} \min_{\mu \in \mathcal{U}} \; \mu^T w - \frac{\gamma}{2} w^T \Sigma w$$
Solution (ellipsoidal uncertainty): The inner minimum is attained at $\mu = \hat{\mu} - \kappa \Sigma_\mu w / \sqrt{w^T \Sigma_\mu w}$, so the problem reduces to
$$\max_{w} \; \hat{\mu}^T w - \kappa \sqrt{w^T \Sigma_\mu w} - \frac{\gamma}{2} w^T \Sigma w$$
whose first-order condition characterizes $w^*$ implicitly:
$$w^* = \frac{1}{\gamma} \Sigma^{-1}\left(\hat{\mu} - \kappa \frac{\Sigma_\mu w^*}{\sqrt{w^{*T} \Sigma_\mu w^*}}\right)$$
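For the ellipsoidal set, the inner minimization over $\mu$ has a closed form: the worst-case expected return of a fixed portfolio $w$ is $\hat{\mu}^T w - \kappa\sqrt{w^T \Sigma_\mu w}$. The sketch below computes that value and the minimizing $\mu$. All numbers (the estimate `mu_hat`, the estimation-error covariance `cov_mu`, and `kappa`) are assumptions for illustration, and `worst_case_mean` is a hypothetical helper.

```python
import numpy as np

def worst_case_mean(w, mu_hat, cov_mu, kappa):
    """Minimize mu^T w over the ellipsoid; return (minimizing mu, minimum value)."""
    scale = np.sqrt(w @ cov_mu @ w)
    mu_star = mu_hat - kappa * (cov_mu @ w) / scale
    return mu_star, mu_hat @ w - kappa * scale

# Illustrative inputs (assumed values)
mu_hat = np.array([0.08, 0.10, 0.12])
cov_mu = np.diag([0.0004, 0.0009, 0.0016])  # uncertainty of the mean estimates
kappa = 1.5
w = np.array([0.5, 0.3, 0.2])

mu_star, worst = worst_case_mean(w, mu_hat, cov_mu, kappa)
print("nominal expected return:   ", round(float(mu_hat @ w), 5))
print("worst-case expected return:", round(float(worst), 5))
print("worst-case mean vector:    ", mu_star.round(5))
```

The penalty $\kappa\sqrt{w^T \Sigma_\mu w}$ grows with positions in assets whose expected returns are poorly estimated, which is how the robust formulation discourages concentrated bets on noisy inputs.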
Dynamic Portfolio Optimization
Merton's Problem
Continuous-time setup:
$$\max_{c_t, w_t} \mathbb{E}\left[\int_0^T U(c_t)\, dt + B(X_T)\right]$$
subject to:
$$dX_t = \left[(r + w_t^T(\mu - r\mathbf{1}))X_t - c_t\right] dt + X_t\, w_t^T \sigma\, dW_t$$
where $X_t$ is wealth, $c_t$ is the consumption rate, $B$ is the bequest function, and $\sigma$ is the volatility matrix with $\sigma\sigma^T = \Sigma$.
Solution (power utility): With constant relative risk aversion $\gamma$, the optimal risky-asset weights are constant over time:
$$w_t = \frac{1}{\gamma} \Sigma^{-1}(\mu - r\mathbf{1})$$
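The power-utility solution is a single linear solve. The sketch below evaluates it for made-up inputs; `merton_weights` is a hypothetical helper, and whatever is not allocated to the risky assets is assumed to sit in the risk-free asset.

```python
import numpy as np

def merton_weights(mu, cov, r, gamma):
    """Risky-asset weights (1 / gamma) * Sigma^{-1} (mu - r 1)."""
    return np.linalg.solve(cov, mu - r) / gamma

# Illustrative inputs (assumed values)
mu = np.array([0.08, 0.10, 0.12])
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])
r = 0.02

for gamma in (2.0, 4.0, 8.0):
    w = merton_weights(mu, cov, r, gamma)
    print(f"gamma={gamma}: risky weights {w.round(3)}, risk-free weight {1 - w.sum():.3f}")
```

In the one-asset case this reduces to $(\mu - r)/(\gamma\sigma^2)$, so doubling risk aversion halves the risky allocation.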
Multi-Period Discrete Model
Dynamic Programming:
$$V_t(x) = \max_{w_t} \mathbb{E}[V_{t+1}(x_{t+1}) \mid x_t = x]$$
Challenges:
Curse of dimensionality: State space grows exponentially
Parameter uncertainty: Must update beliefs
Transaction costs: Rebalancing costs
Alternative Risk Measures
Value at Risk (VaR) Optimization
Objective: Minimize portfolio VaR:
$$\min_{w} \text{VaR}_\alpha(w^T r) \quad \text{subject to} \quad \mu^T w \geq \mu_{\min}$$
Parametric form (normal returns): Writing the loss as $L = -w^T r$ and taking $\alpha$ as the confidence level (for example 0.95), the VaR is the $\alpha$-quantile of $L$:
$$\text{VaR}_\alpha = -\mu^T w + \Phi^{-1}(\alpha) \sqrt{w^T \Sigma w}$$
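Under the normal assumption, VaR is a simple function of the portfolio mean and volatility. The sketch below treats VaR as a positive loss number at confidence level $\alpha$, that is, $-\mu^T w + \Phi^{-1}(\alpha)\sqrt{w^T \Sigma w}$. The inputs are made-up values and `parametric_var` is a hypothetical helper.

```python
from statistics import NormalDist

import numpy as np

def parametric_var(w, mu, cov, alpha=0.95):
    """Normal VaR of the loss -w^T r at confidence level alpha."""
    z = NormalDist().inv_cdf(alpha)  # standard normal quantile
    return -(w @ mu) + z * np.sqrt(w @ cov @ w)

# Illustrative inputs (assumed values, same horizon for mu and cov)
mu = np.array([0.08, 0.10, 0.12])
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])
w = np.array([0.5, 0.3, 0.2])

for alpha in (0.90, 0.95, 0.99):
    print(f"VaR at {alpha:.0%}: {parametric_var(w, mu, cov, alpha):.4f}")
```

Because the quantile multiplier is fixed for a given $\alpha$, minimizing normal VaR trades off expected return against volatility in much the same way as mean-variance optimization.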
Conditional Value at Risk (CVaR)
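$$\text{CVaR}_\alpha = \mathbb{E}[L \mid L \geq \text{VaR}_\alpha]$$
where $L = -w^T r$ is the portfolio loss.
Advantages:
Coherent risk measure: Satisfies monotonicity, subadditivity, positive homogeneity, and translation invariance
Convex optimization: CVaR is convex in the weights, and with scenario data the problem becomes a linear program
Tail risk focus: Averages the losses beyond VaR, so it captures extreme scenarios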
Optimization with CVaR
$$\min_{w,\zeta} \; \zeta + \frac{1}{1-\alpha} \mathbb{E}[\max(0, -w^T r - \zeta)]$$
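To see how this objective behaves, the sketch below fixes the weights and minimizes over $\zeta$ only, replacing the expectation with an average over scenarios. The objective is piecewise linear and convex in $\zeta$, so its minimum is attained at one of the scenario losses and a search over those values suffices. The simulated scenarios, the equal weights, and the function name `cvar_rockafellar_uryasev` are assumptions for illustration; a full optimization over $w$ would typically use a linear programming solver instead.

```python
import numpy as np

def cvar_rockafellar_uryasev(losses, alpha):
    """Minimize zeta + mean(max(0, L - zeta)) / (1 - alpha) over zeta.

    Returns (minimizing zeta, minimum value); the minimum value is the
    scenario-based CVaR and the minimizer is a VaR estimate.
    """
    candidates = np.sort(losses)
    values = [
        zeta + np.mean(np.maximum(0.0, losses - zeta)) / (1.0 - alpha)
        for zeta in candidates
    ]
    k = int(np.argmin(values))
    return float(candidates[k]), float(values[k])

# Simulated return scenarios (assumed distribution, for illustration only)
rng = np.random.default_rng(0)
scenarios = rng.normal(loc=0.005, scale=0.04, size=(1000, 3))
w = np.ones(3) / 3
losses = -(scenarios @ w)

zeta, cvar = cvar_rockafellar_uryasev(losses, alpha=0.95)
print("VaR estimate (zeta):", round(zeta, 4))
print("CVaR estimate:      ", round(cvar, 4))
```

The minimizing $\zeta$ is a by-product of the optimization, which is why the formulation never needs VaR to be computed separately.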
Constraints and Practical Considerations
Common Constraints
Long-only:
$$w_i \geq 0 \quad \forall i$$
Sector limits:
$$\sum_{i \in \text{sector } s} w_i \leq u_s$$
Turnover constraints:
$$\sum_{i=1}^n |w_i - w_{i,\text{prev}}| \leq T$$
Tracking error:
$$\sqrt{(w - w_b)^T \Sigma (w - w_b)} \leq TE$$
Transaction Costs
Linear costs:
$$\text{Cost} = \sum_{i=1}^n c_i |w_i - w_{i,\text{prev}}|$$
Market impact:
$$\text{Cost} = \sum_{i=1}^n \alpha_i (w_i - w_{i,\text{prev}})^2$$
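The cost formulas and the turnover measure are easy to evaluate for a proposed rebalance. A small sketch follows; the weights and cost coefficients are made-up values, and the function names are hypothetical.

```python
import numpy as np

def turnover(w_new, w_prev):
    """Sum of absolute weight changes."""
    return np.abs(w_new - w_prev).sum()

def linear_cost(w_new, w_prev, c):
    """Proportional costs: sum_i c_i * |trade_i|."""
    return (c * np.abs(w_new - w_prev)).sum()

def impact_cost(w_new, w_prev, a):
    """Quadratic market impact: sum_i a_i * trade_i^2."""
    return (a * (w_new - w_prev) ** 2).sum()

# Illustrative rebalance (assumed values)
w_prev = np.array([0.5, 0.3, 0.2])
w_new = np.array([0.4, 0.4, 0.2])
c = np.full(3, 0.001)  # 10 basis points per unit traded
a = np.full(3, 0.1)    # impact coefficients

print("turnover:   ", round(float(turnover(w_new, w_prev)), 4))
print("linear cost:", round(float(linear_cost(w_new, w_prev, c)), 6))
print("impact cost:", round(float(impact_cost(w_new, w_prev, a)), 6))
```

Linear costs penalize all trades at the same rate, whereas the quadratic term penalizes large trades disproportionately and so favors spreading a rebalance across assets or over time.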
Machine Learning in Portfolio Optimization
Feature Engineering
Technical indicators: Moving averages, momentum, volatility
Fundamental ratios: P/E, P/B, ROE, debt-to-equity
Macroeconomic variables: GDP growth, inflation, term structure
Alternative data: Sentiment, satellite data, credit card spending
Regularization
Ridge regression:
$$\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$
Lasso regression:
$$\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$$
Elastic net: Combines ridge and lasso penalties
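Ridge regression has the closed-form solution $\hat{\beta} = (X^T X + \lambda I)^{-1} X^T y$, which makes the shrinkage effect easy to see; lasso and elastic net have no closed form and need an iterative solver. The sketch below uses simulated, noise-free data purely for illustration, and `ridge_coefficients` is a hypothetical helper.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate (X'X + lam * I)^{-1} X'y, with no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Simulated data (assumed, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([0.5, -0.2, 0.1])
y = X @ beta_true  # noise-free so that lam = 0 recovers beta_true

for lam in (0.0, 10.0, 100.0):
    beta = ridge_coefficients(X, y, lam)
    print(f"lambda={lam:6.1f}  beta={beta.round(4)}  norm={np.linalg.norm(beta):.4f}")
```

Larger values of $\lambda$ pull the coefficients toward zero, which stabilizes estimates when predictors are noisy or highly correlated.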
Neural Networks
Deep learning for:
Return prediction: Non-linear factor models
Risk modeling: Time-varying covariance
Regime detection: Hidden market states
Performance Evaluation
Risk-Adjusted Returns
Sharpe Ratio:
$$SR = \frac{\mu_p - r_f}{\sigma_p}$$
Information Ratio:
$$IR = \frac{\mu_p - \mu_b}{TE}$$
Sortino Ratio:
$$\text{Sortino} = \frac{\mu_p - r_f}{\text{Downside deviation}}$$
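These ratios can be estimated from a series of periodic returns. The sketch below makes several choices that are assumptions rather than the only conventions: it uses the sample standard deviation (`ddof=1`), measures downside deviation as the root mean square of negative excess returns over all periods, and does not annualize. The return series are made-up values and the function names are hypothetical.

```python
import numpy as np

def sharpe_ratio(returns, rf=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = np.asarray(returns) - rf
    return excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns, rf=0.0):
    """Mean excess return divided by downside deviation (target = rf)."""
    excess = np.asarray(returns) - rf
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return excess.mean() / downside

def information_ratio(returns, benchmark):
    """Mean active return divided by tracking error."""
    active = np.asarray(returns) - np.asarray(benchmark)
    return active.mean() / active.std(ddof=1)

# Illustrative per-period returns (assumed values)
portfolio = [0.02, -0.01, 0.03, 0.00]
benchmark = [0.01, 0.00, 0.01, 0.00]

print("Sharpe: ", round(float(sharpe_ratio(portfolio)), 4))
print("Sortino:", round(float(sortino_ratio(portfolio)), 4))
print("IR:     ", round(float(information_ratio(portfolio, benchmark)), 4))
```

The Sortino ratio is higher than the Sharpe ratio here because only the single negative period counts toward the denominator.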
Alpha Decomposition
Jensen's Alpha:
$$\alpha = \mu_p - r_f - \beta(\mu_m - r_f)$$
Multi-factor alpha:
$$\alpha = \mu_p - r_f - \sum_{k=1}^K \beta_k (\mu_{f_k} - r_f)$$
ESG and Sustainable Investing
ESG Integration
ESG scores as additional constraints or tilts:
$$\min_{w} \; w^T \Sigma w \quad \text{subject to} \quad w^T s_{ESG} \geq s_{\min}$$
Exclusionary screening: Remove assets below ESG threshold
Best-in-class: Select top ESG performers within sectors
Impact Measurement
Carbon footprint: Portfolio-weighted carbon intensity
UN SDGs: Alignment with Sustainable Development Goals
Engagement metrics: Proxy voting, shareholder resolutions
Algorithmic Implementation
Quadratic Programming
Standard mean-variance problems reduce to the following form, where $Q$, $c$, $A$, $b$, $G$, and $h$ are generic problem data unrelated to the symbols of the same name used earlier:
$$\min_w \frac{1}{2} w^T Q w + c^T w \quad \text{s.t.} \quad Aw = b, \quad Gw \leq h$$
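When only equality constraints are present, the quadratic program reduces to one linear system, the KKT conditions $Qw + c + A^T\lambda = 0$ and $Aw = b$. The sketch below solves that system and maps the minimum variance problem onto it with $Q = 2\Sigma$, $c = 0$, $A = \mathbf{1}^T$, $b = 1$. The covariance values and the function name `solve_equality_qp` are assumptions for illustration; inequality constraints such as long-only bounds need a dedicated QP solver.

```python
import numpy as np

def solve_equality_qp(Q, c, A, b):
    """Solve min 0.5 w'Qw + c'w subject to Aw = b via the KKT linear system."""
    n, m = Q.shape[0], A.shape[0]
    kkt = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n:]  # primal weights, Lagrange multipliers

# Illustrative covariance matrix (assumed values)
cov = np.array([
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
])

# Minimum variance portfolio as a QP with a single budget constraint
Q = 2.0 * cov
c = np.zeros(3)
A = np.ones((1, 3))
b = np.array([1.0])

w, multipliers = solve_equality_qp(Q, c, A, b)
print("weights:", w.round(4))
```

Adding a target-return constraint only requires appending a second row to `A` and a second entry to `b`.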
Interior Point Methods
Efficient for large-scale problems with many constraints.
Heuristic Approaches
Genetic algorithms: Global optimization for complex objectives
Simulated annealing: Escape local minima
Particle swarm: Population-based optimization
Connection to Other Topics
Portfolio optimization integrates many quantitative concepts:
Built on probability theory and random variables
Uses normal distribution assumptions extensively
Connects to linear regression for factor models
Applies optimization algorithms for solution
Foundation for risk management and asset allocation
Links to option pricing via risk-neutral measures
Enables sophisticated quantitative strategies