Decision Tree

Motivation: why this matters in quant finance

Decision trees turn nonlinear rules into a model that can still be inspected. A tree can learn that a signal matters only when volatility is high, liquidity is thin, or a spread crosses a threshold. That interaction is awkward for a plain linear model unless it is engineered by hand.

A single tree is also the base learner behind Random Forest. Understanding one tree makes ensembles less mysterious: a forest reduces variance by averaging many individually unstable trees.

The informal idea

A decision tree asks a sequence of questions. Is realised volatility below this threshold? If yes, go left. Is momentum positive? If no, go right. At a leaf, predict the average target or majority class among observations that reached that leaf.
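
The question sequence above can be written out as plain nested conditionals. This is a minimal sketch rather than a fitted model: the function name `predict_next_return`, the 1.4 volatility threshold, and the leaf values are hypothetical placeholders chosen for illustration.

```python
def predict_next_return(realised_vol: float, momentum: float) -> float:
    """Hand-written two-level tree; every number here is an illustrative assumption."""
    if realised_vol <= 1.4:      # root question: is volatility below the threshold?
        if momentum > 0:         # second question, asked only in the low-volatility branch
            return 0.01          # leaf: stands in for the mean target of rows that land here
        return 0.0               # leaf: low volatility, non-positive momentum
    return -0.02                 # leaf: high volatility, momentum is never inspected


print(predict_next_return(1.0, 0.5))    # 0.01
print(predict_next_return(1.0, -0.5))   # 0.0
print(predict_next_return(2.0, 0.5))    # -0.02
```

Each observation follows exactly one path, so the prediction for any input can be traced back to a short list of threshold comparisons.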

Each split should make child nodes more homogeneous than the parent. This is local modelling through rectangles in feature space, not through neighbour distances as in K-Nearest Neighbors (KNN).

Formal statement

For classification, Gini impurity is

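$$G_m = 1 - \sum_{k=1}^{K} p_{m,k}^2,$$

where $p_{m,k}$ is the fraction of class-$k$ observations in node $m$. CART searches for a feature $j$ and threshold $t$ that minimise the weighted child impurity:

$$J(j,t) = \frac{n_L}{n_m} G_L + \frac{n_R}{n_m} G_R.$$

Here $n_L$ and $n_R$ count the observations sent to the left and right child, $n_m = n_L + n_R$, and $G_L$, $G_R$ are the children's Gini impurities.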
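
The two formulas above can be checked by hand on a tiny example. The sketch below assumes a single feature column and made-up labels; the helper names `gini` and `weighted_child_impurity` are introduced here for illustration only.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity G_m = 1 - sum_k p_{m,k}^2 for the labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def weighted_child_impurity(x: np.ndarray, labels: np.ndarray, t: float) -> float:
    """J(j, t) for one feature column x split at threshold t."""
    left = x <= t
    n_l, n_r = int(left.sum()), int((~left).sum())
    if n_l == 0 or n_r == 0:
        return float("inf")  # an empty child means this is not a real split
    n_m = n_l + n_r
    return n_l / n_m * gini(labels[left]) + n_r / n_m * gini(labels[~left])


# Toy data (made up): six observations of one feature and a binary label.
x = np.array([0.6, 0.9, 1.2, 1.5, 1.9, 2.3])
labels = np.array([1, 1, 1, 0, 0, 1])

print(round(gini(labels), 4))                               # 0.4444  (parent: 4/9)
print(round(weighted_child_impurity(x, labels, 1.2), 4))    # 0.2222  (pure left child)
print(round(weighted_child_impurity(x, labels, 0.9), 4))    # 0.3333
```

The split at 1.2 wins because it leaves the left child perfectly pure, which pulls the weighted impurity furthest below the parent's value.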

For regression trees, the analogous criterion is reduction in squared error.
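
For reference, one common way to write that regression criterion in the same notation is sketched below. The index sets $S_L$, $S_R$ and the child means $\bar{y}_L$, $\bar{y}_R$ are notation introduced here for illustration.

```latex
J_{\text{reg}}(j,t) = \sum_{i \in S_L} (y_i - \bar{y}_L)^2 + \sum_{i \in S_R} (y_i - \bar{y}_R)^2,
\qquad
\bar{y}_L = \frac{1}{n_L} \sum_{i \in S_L} y_i, \quad
\bar{y}_R = \frac{1}{n_R} \sum_{i \in S_R} y_i .
```

The parent's squared error does not depend on $(j,t)$, so minimising $J_{\text{reg}}$ is the same as maximising the reduction in squared error. Dividing by $n_m$ gives $\frac{n_L}{n_m}\mathrm{MSE}_L + \frac{n_R}{n_m}\mathrm{MSE}_R$, which has the same weighted form as the Gini criterion.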

Implementation

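```python
import numpy as np


class DecisionStump:
    """One-split regression tree for teaching the split criterion."""

    def fit(self, X: np.ndarray, y: np.ndarray):
        # best = (lowest loss so far, feature index, threshold)
        best = (np.inf, None, None)
        for j in range(X.shape[1]):
            # Candidate thresholds are the observed values of feature j.
            for threshold in np.unique(X[:, j]):
                left = X[:, j] <= threshold
                # Skip degenerate splits that leave one child empty.
                if left.all() or (~left).all():
                    continue
                # Total squared error of each child around its own mean.
                loss = ((y[left] - y[left].mean()) ** 2).sum()
                loss += ((y[~left] - y[~left].mean()) ** 2).sum()
                if loss < best[0]:
                    best = (loss, j, threshold)
        self.feature_, self.threshold_ = best[1], best[2]
        return self


# Synthetic data: returns are lower on average when volatility exceeds 1.4.
rng = np.random.default_rng(23)
vol = rng.uniform(0.5, 2.5, size=80)
y = np.where(vol > 1.4, -0.02, 0.01) + 0.004 * rng.normal(size=80)

stump = DecisionStump().fit(vol.reshape(-1, 1), y)
# Expect feature 0 and a threshold just below 1.4: the chosen cut is an
# observed value, so it should be the largest sampled vol not exceeding 1.4.
print(stump.feature_, round(stump.threshold_, 2))
```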

Key properties and trade-offs

| Property | Meaning | Finance consequence |
| --- | --- | --- |
| Axis-aligned splits | Each split thresholds one feature. | Captures simple regimes but needs many splits for diagonal boundaries. |
| Little scaling need | Trees do not require standardisation. | Convenient for mixed tabular features. |
| High variance | Small data changes can change the tree. | A single tree is often unstable across windows. |
| Interpretability | Paths can be read as rules. | Useful for research review and governance. |
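
The "little scaling need" row can be checked directly: a split depends only on the ordering of a feature, so a strictly increasing rescaling should leave the partition unchanged. The sketch below assumes synthetic data and a hypothetical helper `best_split_mask` that returns which observations go left under the squared-error-optimal split.

```python
import numpy as np


def best_split_mask(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Boolean 'goes left' mask of the squared-error-optimal split on one feature."""
    best_loss, best_mask = np.inf, None
    for t in np.unique(x)[:-1]:  # the largest value would leave the right child empty
        left = x <= t
        loss = ((y[left] - y[left].mean()) ** 2).sum()
        loss += ((y[~left] - y[~left].mean()) ** 2).sum()
        if loss < best_loss:
            best_loss, best_mask = loss, left
    return best_mask


rng = np.random.default_rng(0)
vol = rng.uniform(0.5, 2.5, size=200)  # synthetic volatility, in percent
y = np.where(vol > 1.4, -0.02, 0.01) + 0.004 * rng.normal(size=200)

mask_raw = best_split_mask(vol, y)
mask_std = best_split_mask((vol - vol.mean()) / vol.std(), y)  # standardised copy
mask_bps = best_split_mask(vol * 100.0, y)                     # same feature in basis points

print(np.array_equal(mask_raw, mask_std), np.array_equal(mask_raw, mask_bps))  # True True
```

The threshold value itself changes with the units, but the set of observations on each side does not, which is why standardisation is optional for trees.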

Worked example: volatility threshold

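A tree might learn that a momentum signal works only when realised volatility is below 1.4%. The path is readable: if volatility is high, predict a weak next-period return; if volatility is low, inspect momentum. The trade-off is threshold instability: the fitted cut-off can move when the training window changes, so a rule that reads cleanly at 1.4% may sit at a different level after a refit.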
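
Threshold instability can be illustrated by refitting a one-split tree on resampled data. The sketch below is an assumption-laden toy: the data are synthetic, the regime effect is deliberately weak relative to the noise, bootstrap resamples stand in for different training windows, and `best_threshold` is a hypothetical helper.

```python
import numpy as np


def best_threshold(x: np.ndarray, y: np.ndarray) -> float:
    """Squared-error-optimal split threshold on a single feature."""
    best_loss, best_t = np.inf, np.nan
    for t in np.unique(x)[:-1]:  # the largest value would leave the right child empty
        left = x <= t
        loss = ((y[left] - y[left].mean()) ** 2).sum()
        loss += ((y[~left] - y[~left].mean()) ** 2).sum()
        if loss < best_loss:
            best_loss, best_t = loss, t
    return float(best_t)


rng = np.random.default_rng(7)
n = 120
vol = rng.uniform(0.5, 2.5, size=n)
# Assumed toy target: a weak regime effect at 1.4 buried in much larger noise.
y = np.where(vol > 1.4, -0.002, 0.001) + 0.01 * rng.normal(size=n)

thresholds = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)  # bootstrap resample as a stand-in for a new window
    thresholds.append(best_threshold(vol[idx], y[idx]))
thresholds = np.array(thresholds)

print(f"min={thresholds.min():.2f}  median={np.median(thresholds):.2f}  "
      f"max={thresholds.max():.2f}")
```

The spread between the minimum and maximum fitted threshold gives a rough sense of how far a single tree's rule can drift when only the sample changes.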

Common confusions and pitfalls

"Trees cannot overfit because they are simple rules." Deep trees can memorise noise with very specific paths.
"Feature importance is causal importance." It reflects split usefulness inside this fitted tree.
"No scaling means no preprocessing." Leakage-safe feature construction and validation still matter.

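The first pitfall, deep trees memorising noise, can be seen in a small experiment. This is a minimal sketch on synthetic data, using a hand-rolled one-feature regression tree (`fit_tree`, `predict_one`, and `mse` are hypothetical helpers, not a library API). A depth cap of 200 is at least the training sample size, so that tree grows until every leaf holds one observation.

```python
import numpy as np


def fit_tree(x: np.ndarray, y: np.ndarray, depth: int, max_depth: int) -> dict:
    """Recursively grow a one-feature regression tree; returns a nested dict."""
    node = {"value": float(y.mean())}
    if depth >= max_depth or np.unique(x).size < 2:
        return node  # leaf: depth cap reached or nothing left to split on
    best_loss, best_t = np.inf, None
    for t in np.unique(x)[:-1]:
        left = x <= t
        loss = ((y[left] - y[left].mean()) ** 2).sum()
        loss += ((y[~left] - y[~left].mean()) ** 2).sum()
        if loss < best_loss:
            best_loss, best_t = loss, t
    left = x <= best_t
    node["threshold"] = float(best_t)
    node["left"] = fit_tree(x[left], y[left], depth + 1, max_depth)
    node["right"] = fit_tree(x[~left], y[~left], depth + 1, max_depth)
    return node


def predict_one(node: dict, xi: float) -> float:
    """Follow the path for one observation down to its leaf value."""
    while "threshold" in node:
        node = node["left"] if xi <= node["threshold"] else node["right"]
    return node["value"]


def mse(node: dict, x: np.ndarray, y: np.ndarray) -> float:
    pred = np.array([predict_one(node, xi) for xi in x])
    return float(np.mean((pred - y) ** 2))


rng = np.random.default_rng(1)


def sample(n: int):
    """Synthetic regime data: one true break at vol = 1.4 plus noise."""
    vol = rng.uniform(0.5, 2.5, size=n)
    return vol, np.where(vol > 1.4, -0.02, 0.01) + 0.02 * rng.normal(size=n)


x_train, y_train = sample(200)
x_test, y_test = sample(1000)

results = {}
for max_depth in (1, 200):
    tree = fit_tree(x_train, y_train, 0, max_depth)
    results[max_depth] = (mse(tree, x_train, y_train), mse(tree, x_test, y_test))
    print(max_depth, results[max_depth])
```

The fully grown tree drives training error to zero by giving every observation its own leaf, while its error on fresh data should come out worse than the single split that matches the true regime break.
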
Where this goes next

References

  • Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 6 (Decision Trees, CART, Gini impurity, limitations).
  • Avrim Blum, John Hopcroft, and Ravindran Kannan (2020). Foundations of Data Science. Ch. 5.6.3 (Application: Learning Decision Trees) and Ch. 5.7 (Regularization).
Decision Tree | q4quant.studio