Decision Tree
Motivation: why this matters in quant finance
Decision trees turn nonlinear rules into a model that can still be inspected. A tree can learn that a signal matters only when volatility is high, liquidity is thin, or a spread crosses a threshold. That interaction is awkward for a plain linear model unless it is engineered by hand.
The informal idea
A decision tree asks a sequence of questions. Is realised volatility below this threshold? If yes, go left. Is momentum positive? If no, go right. At a leaf, predict the average target or majority class among observations that reached that leaf.
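The sequence of questions above can be written out as nested conditionals. The following is a minimal sketch of a hand-written two-level tree, not a fitted model: the function name `predict_return`, the 1.4 volatility threshold, and the leaf values are hypothetical numbers chosen only for illustration.

```python
def predict_return(realised_vol: float, momentum: float) -> float:
    """Walk a hand-written two-level tree and return the leaf's prediction.

    All thresholds and leaf values are illustrative assumptions.
    """
    # Root question: is realised volatility below the threshold?
    if realised_vol < 1.4:
        # Left branch: low volatility, so ask about momentum next.
        if momentum > 0:
            return 0.010   # leaf: low vol, positive momentum
        return 0.000       # leaf: low vol, non-positive momentum
    # Right branch: high volatility, momentum is never consulted.
    return -0.020          # leaf: high vol


print(predict_return(realised_vol=0.9, momentum=0.3))   # 0.01
print(predict_return(realised_vol=2.1, momentum=0.3))   # -0.02
```

In a fitted tree each leaf value would be the average target (or majority class) of the training observations that reach that leaf, rather than a number typed in by hand.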
Formal statement
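For classification, Gini impurity is

$$G(t) = 1 - \sum_{k=1}^{K} p_{t,k}^{2},$$

where $p_{t,k}$ is the class-$k$ fraction in node $t$. CART searches for a feature $j$ and threshold $s$ that minimise weighted child impurity:

$$\min_{j,\,s} \; \frac{n_L}{n}\, G(t_L) + \frac{n_R}{n}\, G(t_R),$$

where $t_L$ and $t_R$ are the left and right children produced by the split, $n_L$ and $n_R$ are their observation counts, and $n = n_L + n_R$.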
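As a small numerical check of the criterion, the sketch below computes the Gini impurity of a node (one minus the sum of squared class fractions) and the size-weighted impurity of the two children of a candidate split. The helper names `gini` and `weighted_child_gini` and the eight-observation toy data are assumptions made for this example.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_k p_k^2 of the labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def weighted_child_gini(x: np.ndarray, labels: np.ndarray, threshold: float) -> float:
    """Size-weighted Gini of the two children produced by x <= threshold."""
    left = x <= threshold
    n, n_left = len(labels), int(left.sum())
    n_right = n - n_left
    return (n_left / n) * gini(labels[left]) + (n_right / n) * gini(labels[~left])


# Toy node: class 1 appears only at low x, class 0 only at high x.
x = np.array([0.6, 0.9, 1.1, 1.3, 1.6, 1.9, 2.2, 2.4])
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])

print(gini(labels))                          # parent impurity: 0.46875
print(weighted_child_gini(x, labels, 1.1))   # pure children: 0.0
print(weighted_child_gini(x, labels, 1.6))   # a worse split, still below the parent
```

The split at 1.1 separates the classes perfectly, so both children are pure and the weighted impurity drops to zero. A CART-style search would evaluate every candidate threshold this way and keep the smallest value.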
For regression trees, the analogous criterion is reduction in squared error.
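One common way to write the regression criterion is sketched below. The notation is an assumption made for this note: $x_{ij}$ is feature $j$ of observation $i$, $s$ is the threshold, and $\bar{y}_L$, $\bar{y}_R$ are the mean targets of the left and right children.

```latex
\min_{j,\,s} \;
\sum_{i:\, x_{ij} \le s} \bigl(y_i - \bar{y}_L\bigr)^2
\;+\;
\sum_{i:\, x_{ij} > s} \bigl(y_i - \bar{y}_R\bigr)^2
```

Minimising this sum is the same as maximising the reduction in squared error relative to the unsplit node, because the parent's squared error does not depend on $j$ or $s$.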
Implementation
import numpy as np


class DecisionStump:
    """One-split regression tree for teaching the split criterion."""

    def fit(self, X: np.ndarray, y: np.ndarray):
        # best holds (loss, feature index, threshold) of the best split so far.
        best = (np.inf, None, None)
        for j in range(X.shape[1]):
            # Candidate thresholds are the observed values of feature j.
            for threshold in np.unique(X[:, j]):
                left = X[:, j] <= threshold
                # Skip splits that leave one child empty.
                if left.all() or (~left).all():
                    continue
                # Total squared error around each child's mean.
                loss = ((y[left] - y[left].mean()) ** 2).sum()
                loss += ((y[~left] - y[~left].mean()) ** 2).sum()
                if loss < best[0]:
                    best = (loss, j, threshold)
        self.feature_, self.threshold_ = best[1], best[2]
        return self


# Synthetic data: the mean return turns negative once volatility exceeds 1.4.
rng = np.random.default_rng(23)
vol = rng.uniform(0.5, 2.5, size=80)
y = np.where(vol > 1.4, -0.02, 0.01) + 0.004 * rng.normal(size=80)
stump = DecisionStump().fit(vol.reshape(-1, 1), y)
print(stump.feature_, round(stump.threshold_, 2))
# 0 1.4

Key properties and trade-offs
| Property | Meaning | Finance consequence |
|---|---|---|
| Axis-aligned splits | Each split thresholds one feature. | Captures simple regimes but needs many splits for diagonal boundaries. |
| Little scaling need | Trees do not require standardisation. | Convenient for mixed tabular features. |
| High variance | Small data changes can change the tree. | A single tree is often unstable across windows. |
| Interpretability | Paths can be read as rules. | Useful for research review and governance. |
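The "little scaling need" row can be checked directly: a threshold split depends only on the ordering of a feature's values, so any strictly increasing transform leaves the chosen partition unchanged. The sketch below assumes a hypothetical helper `best_split` and synthetic data; it is an illustration of the property, not a library routine.

```python
import numpy as np


def best_split(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return the boolean left-child mask of the best squared-error split on x."""
    best_loss, best_mask = np.inf, None
    for threshold in np.unique(x):
        left = x <= threshold
        if left.all():
            continue  # skip the split that leaves the right child empty
        loss = ((y[left] - y[left].mean()) ** 2).sum()
        loss += ((y[~left] - y[~left].mean()) ** 2).sum()
        if loss < best_loss:
            best_loss, best_mask = loss, left
    return best_mask


# Synthetic data (an assumption for this example): a regime break at x = 1.4.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.5, size=200)
y = np.where(x > 1.4, -0.02, 0.01) + 0.004 * rng.normal(size=200)

mask_raw = best_split(x, y)
mask_scaled = best_split(1000.0 * x, y)   # change of units
mask_logged = best_split(np.log(x), y)    # another strictly increasing transform

# Both comparisons should report identical partitions.
print(np.array_equal(mask_raw, mask_scaled), np.array_equal(mask_raw, mask_logged))
```

The threshold value itself changes with the units, but the observations sent left and right do not, which is why standardisation is rarely needed before fitting a tree.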
Worked example: volatility threshold
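A tree might learn that a momentum signal works only when realised volatility is below 1.4%. The path is readable: if volatility is high, predict a weak next-period return; if volatility is low, inspect momentum. The trade-off is threshold instability: the estimated cut-off can move when the tree is refit on a different sample window, so 1.4% is an estimate rather than a fixed regime boundary.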
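One way to see threshold instability is to refit a one-split tree on resampled data and look at the spread of the learned cut-offs. The sketch below makes several assumptions: the helper `best_threshold` is hypothetical, the data are synthetic with a weak regime effect at 1.4, and bootstrap resampling stands in for refitting on different estimation windows.

```python
import numpy as np


def best_threshold(x: np.ndarray, y: np.ndarray) -> float:
    """Squared-error-optimal threshold for a one-feature, one-split tree."""
    best_loss, best_thr = np.inf, float("nan")
    # Dropping the largest value keeps both children non-empty.
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        loss = ((y[left] - y[left].mean()) ** 2).sum()
        loss += ((y[~left] - y[~left].mean()) ** 2).sum()
        if loss < best_loss:
            best_loss, best_thr = loss, float(threshold)
    return best_thr


rng = np.random.default_rng(7)
n = 120
vol = rng.uniform(0.5, 2.5, size=n)
# Weak regime effect buried in noise, so the break is hard to pin down.
y = np.where(vol > 1.4, -0.005, 0.005) + 0.02 * rng.normal(size=n)

thresholds = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)  # bootstrap resample (stand-in for a new window)
    thresholds.append(best_threshold(vol[idx], y[idx]))

thresholds = np.array(thresholds)
# 5th, 50th and 95th percentiles of the learned threshold across resamples.
print(np.percentile(thresholds, [5, 50, 95]).round(2))
```

A wide gap between the 5th and 95th percentiles indicates that the rule "volatility below some cut-off" is supported by the data only loosely, even when each individual tree reads as a crisp rule.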
Common confusions and pitfalls
Where this goes next
- Random Forest: averages many trees to reduce variance.
- K-Nearest Neighbors (KNN): defines locality by distance between observations rather than by axis-aligned partitions.
- Cross-Validation: chooses depth and leaf-size controls.
- Logistic Regression: provides a smoother baseline that is linear in the log-odds.
References
- Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 6 (Decision Trees, CART, Gini impurity, limitations).
- Avrim Blum, John Hopcroft, and Ravindran Kannan (2020). Foundations of Data Science. Ch. 5.6.3 (Application: Learning Decision Trees) and Ch. 5.7 (Regularization).