Cross-Validation
Motivation: why this matters in quant finance
Cross-validation keeps model selection separate from final evaluation. In quant work, this is not bookkeeping. Hyperparameters, feature choices, thresholds, transformations, and data-cleaning rules can all overfit if selected by looking at the test set.
The basic idea is simple: repeatedly hold out part of the training data, fit on the rest, and measure performance on the holdout fold. The finance complication is time. Random folds are often wrong for forecasting because they can train on future regimes and validate on past ones.
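To see the time problem concretely, the following minimal sketch counts how much of each training set lies after the earliest validation point when folds are drawn at random. It assumes that index order is chronological order. `random_folds` and `future_training_share` are hypothetical helper names introduced for illustration, not library functions.

```python
import random


def random_folds(n_samples, n_folds, seed=0):
    """Shuffle indices and deal them into n_folds validation folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[k::n_folds]) for k in range(n_folds)]


def future_training_share(n_samples, val_idx):
    """Fraction of training indices that come after the earliest validation index."""
    val = set(val_idx)
    train = [i for i in range(n_samples) if i not in val]
    first_val = min(val_idx)
    return sum(i > first_val for i in train) / len(train)


n = 100  # indices are assumed to be in chronological order
folds = random_folds(n, n_folds=5)
shares = [future_training_share(n, fold) for fold in folds]

# Chronological holdout: validate on the last 20 points, train on the first 80.
chrono_share = future_training_share(n, list(range(80, 100)))

print([round(s, 2) for s in shares], chrono_share)
```

The fold that happens to contain index 0 trains entirely on later observations, while the chronological holdout trains on none. Whether that look-ahead actually inflates a score depends on how strongly the data depend on time.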
The informal idea
Use the training set to create several miniature train/validation experiments. Each observation gets a turn as validation data. Average the scores to choose the model or hyperparameter. Only then evaluate once on the untouched test set.
For chronological data, preserve time ordering with rolling, expanding, or blocked splits.
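As a rough illustration of expanding and rolling windows, the sketch below generates walk-forward index pairs in plain Python. The function `walk_forward_splits` and its parameters are assumptions made for this example. The names `max_train_size` and `gap` are chosen to echo the same-named options on scikit-learn's `TimeSeriesSplit`, but this is not that class.

```python
def walk_forward_splits(n_samples, n_splits, val_size, max_train_size=None, gap=0):
    """Yield (train_idx, val_idx) pairs in chronological order.

    max_train_size=None gives an expanding window; an integer gives a rolling window.
    gap leaves that many observations unused between training and validation.
    """
    first_val_start = n_samples - n_splits * val_size
    if first_val_start - gap <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        val_start = first_val_start + k * val_size
        train_end = val_start - gap
        if max_train_size is None:
            train_start = 0  # expanding: always start from the first observation
        else:
            train_start = max(0, train_end - max_train_size)  # rolling: fixed lookback
        yield list(range(train_start, train_end)), list(range(val_start, val_start + val_size))


expanding = list(walk_forward_splits(12, n_splits=3, val_size=2))
rolling = list(walk_forward_splits(12, n_splits=3, val_size=2, max_train_size=4))
gapped = list(walk_forward_splits(12, n_splits=3, val_size=2, gap=1))

for train_idx, val_idx in expanding:
    print("train", train_idx, "validate", val_idx)
```

In every variant the training indices end before the validation indices begin. An expanding window uses all history, a rolling window forgets old regimes, and a gap is one way to keep a buffer when labels or features overlap in time.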
Formal statement
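In $K$-fold cross-validation, split the training indices into folds $I_1, \dots, I_K$. For each fold $k$, fit on all indices except $I_k$ and evaluate on $I_k$:

$$\mathrm{CV}(\lambda) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|I_k|} \sum_{i \in I_k} L\big(y_i, \hat{f}_{\lambda}^{(-k)}(x_i)\big)$$

Here $\lambda$ is the hyperparameter, $\hat{f}_{\lambda}^{(-k)}$ is the model fit without fold $k$, and $L$ is the loss.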
The selected hyperparameter is the one with the best average validation score. The test set remains unused until the selection rule is fixed.
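The selection rule can be sketched as an explicit loop: compute the average validation loss for each candidate, pick the best, and only then score the test set once. This is a minimal sketch on synthetic data assumed to be in chronological order. The helper `cv_score`, the 240/60 train/test split, and the alpha grid are illustrative choices. Because it uses `TimeSeriesSplit`, each fold trains only on earlier indices, which deviates from the plain "all indices except the fold" rule.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed to be in chronological order (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 0.6 * X[:, 0] - 0.4 * X[:, 1] + 0.25 * rng.normal(size=300)

# Chronological split: the last 60 rows are the test set and stay untouched.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]


def cv_score(alpha, X, y, splitter):
    """Average validation MSE across folds for one candidate alpha."""
    fold_losses = []
    for train_idx, val_idx in splitter.split(X):
        model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
        model.fit(X[train_idx], y[train_idx])
        fold_losses.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
    return float(np.mean(fold_losses))


alphas = [0.1, 1.0, 10.0]
splitter = TimeSeriesSplit(n_splits=5)
cv_scores = {alpha: cv_score(alpha, X_train, y_train, splitter) for alpha in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)  # lowest average validation MSE

# The selection rule is now fixed: refit on all training data, score the test set once.
final_model = make_pipeline(StandardScaler(), Ridge(alpha=best_alpha))
final_model.fit(X_train, y_train)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print(best_alpha, round(test_mse, 4))
```

If the test score disappoints and the grid is changed in response, the test set has become validation data and a fresh holdout is needed.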
Implementation
```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and target; rows are treated as chronologically ordered.
rng = np.random.default_rng(43)
X = rng.normal(size=(240, 4))
y = 0.6 * X[:, 0] - 0.4 * X[:, 1] + 0.25 * rng.normal(size=240)

# Forward-chaining splits: each fold trains on earlier rows, validates on later ones.
cv = TimeSeriesSplit(n_splits=5)
for alpha in [0.1, 1.0, 10.0]:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    # Scores are negated MSE, so flip the sign to report mean validation MSE.
    print(alpha, round(-scores.mean(), 4))
# 0.1 0.0711
# 1.0 0.0709
# 10.0 0.0701
```

The scaler sits inside the pipeline, so each fold learns scaling parameters only from its training slice. Scaling before splitting is leakage.
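A small numerical sketch of why scaling before splitting leaks information: below, a hypothetical feature shifts upward in its final 40 observations, and the scaling statistics are computed either on all rows or on the training slice only. The data, the 160/40 split, and the size of the shift are assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical feature whose level shifts upward in the final 40 observations.
x = np.concatenate([rng.normal(0.0, 1.0, size=160), rng.normal(3.0, 1.0, size=40)])

train, val = x[:160], x[160:]

# Leaky: statistics computed on all rows, including the validation slice.
leaky_mean, leaky_std = x.mean(), x.std()

# Clean: statistics computed on the training slice only, then applied to validation.
clean_mean, clean_std = train.mean(), train.std()

val_leaky = (val - leaky_mean) / leaky_std
val_clean = (val - clean_mean) / clean_std
print(round(val_leaky.mean(), 2), round(val_clean.mean(), 2))
```

With leaky statistics the validation slice looks closer to the training distribution than it would in live use, because the scaler has already seen the shift. Fitting the scaler inside each fold, as the pipeline does, avoids this.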
Key properties and trade-offs
| Property | Meaning | Finance consequence |
|---|---|---|
| Repeated validation | Each fold gives a noisy out-of-sample estimate. | Average scores are more stable than one lucky split. |
| Pipeline discipline | Preprocessing is fit inside each fold. | Prevents feature leakage. |
| Time order matters | Random folds can train on the future. | Use time-series splits for forecasting and strategy research. |
| Test set is final | It is touched only once, after model selection is finished. | Reusing it turns it into validation data. |
Worked example: choosing ridge alpha
Common confusions and pitfalls
Where this goes next
- Ridge Regression: uses CV to select penalty strength.
- Support Vector Machine (SVM): needs CV for $C$, $\gamma$, and kernel choices.
- Random Forest: uses validation to tune tree depth, leaf size, and feature subsampling.
- Regularisation: L1 vs L2: explains what the selected penalty is doing geometrically.
References
- Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 2 (train/test discipline, GridSearchCV) and Ch. 3 (cross-validation for classification).
- Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Ch. 9.3 (Model Selection via Cross Validation).
- Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (2020). Mathematics for Machine Learning. Cambridge University Press. Ch. 8 (Model Selection).