Neural Networks from Scratch

Motivation: why this matters in quant finance

Neural networks are differentiable function approximators. In quant finance they appear in volatility-surface smoothing, nonlinear factor models, surrogate pricing functions, execution models, and regime classifiers. The library call is easy; the useful understanding is how affine maps, activations, losses, gradients, and updates fit together.

This lesson is intentionally from scratch. It does not replace Keras or PyTorch. It gives the NumPy-level mechanics so later library code feels inspectable rather than magical.

The informal idea

A dense neural network alternates linear transformations with nonlinear activations. The first layer builds hidden features. The output layer turns hidden features into predictions. Training repeats four steps: forward pass, loss computation, backward pass, parameter update.

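Without nonlinear activations, stacked layers collapse to a single affine map, because the composition of affine maps is itself affine. The network is then no more expressive than a linear regression model.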
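The collapse can be checked numerically. The sketch below is illustrative only: the shapes, the seed, and the names `two_layer`, `one_layer`, and `collapsed` are assumptions, not part of the lesson's code. It builds two affine layers with no activation between them and compares the result with the single affine layer obtained by multiplying the weights through.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 samples, 3 features (arbitrary)

# Two stacked affine layers with NO activation in between.
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
two_layer = (X @ W1 + b1) @ W2 + b2

# The same map written as one affine layer:
# (X W1 + b1) W2 + b2 = X (W1 W2) + (b1 W2 + b2).
W = W1 @ W2                          # shape (3, 1)
b = b1 @ W2 + b2                     # shape (1,)
one_layer = X @ W + b

collapsed = np.allclose(two_layer, one_layer)
print(collapsed)  # True
```

Putting `np.tanh` around the first layer breaks this identity, which is what lets the hidden layer represent nonlinear features.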

Formal statement

For a one-hidden-layer regression network,

$$
\begin{aligned}
\mathbf{H} &= \tanh(\mathbf{X}\mathbf{W}_1+\mathbf{b}_1),\\
\hat{\mathbf{y}} &= \mathbf{H}\mathbf{W}_2 + b_2.
\end{aligned}
$$

With mean squared error,

$$
L(\theta)=\frac{1}{n}\sum_{i=1}^n(y_i-\hat{y}_i)^2.
$$

Backpropagation applies the chain rule from the loss back through the output layer, activation, and first affine map. Gradient descent updates each parameter in the negative-gradient direction.
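Written out for the network above, the chain rule gives the gradients below. This is a sketch under assumed notation that does not appear in the lesson: $\mathbf{Z}_1 = \mathbf{X}\mathbf{W}_1 + \mathbf{b}_1$ is the hidden pre-activation, $\odot$ is the elementwise product, $\mathbf{1}$ is a column of ones, and $\eta$ is the learning rate.

```latex
\begin{aligned}
\frac{\partial L}{\partial \hat{\mathbf{y}}} &= \frac{2}{n}\,(\hat{\mathbf{y}} - \mathbf{y}), \\
\frac{\partial L}{\partial \mathbf{W}_2} &= \mathbf{H}^\top \frac{\partial L}{\partial \hat{\mathbf{y}}}, \qquad
\frac{\partial L}{\partial b_2} = \mathbf{1}^\top \frac{\partial L}{\partial \hat{\mathbf{y}}}, \\
\frac{\partial L}{\partial \mathbf{H}} &= \frac{\partial L}{\partial \hat{\mathbf{y}}}\,\mathbf{W}_2^\top, \\
\frac{\partial L}{\partial \mathbf{Z}_1} &= \frac{\partial L}{\partial \mathbf{H}} \odot \left(1 - \mathbf{H}\odot\mathbf{H}\right), \\
\frac{\partial L}{\partial \mathbf{W}_1} &= \mathbf{X}^\top \frac{\partial L}{\partial \mathbf{Z}_1}, \qquad
\frac{\partial L}{\partial \mathbf{b}_1} = \mathbf{1}^\top \frac{\partial L}{\partial \mathbf{Z}_1}, \\
\theta &\leftarrow \theta - \eta\,\nabla_\theta L .
\end{aligned}
```

The factor $1 - \mathbf{H}\odot\mathbf{H}$ is the derivative of $\tanh$, since $\tanh'(z) = 1 - \tanh^2(z)$.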

Implementation

```python
import numpy as np


class TinyMLP:
    """One-hidden-layer neural network for regression."""

    def __init__(self, n_features: int, n_hidden: int = 8,
                 learning_rate: float = 0.05, seed: int = 37):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.3, size=(n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.3, size=(n_hidden, 1))
        self.b2 = np.zeros(1)
        self.learning_rate = learning_rate

    def fit(self, X: np.ndarray, y: np.ndarray, n_iter: int = 2_000):
        y = y.reshape(-1, 1)  # column vector, to match y_hat
        for _ in range(n_iter):
            # Forward pass.
            H = np.tanh(X @ self.W1 + self.b1)
            y_hat = H @ self.W2 + self.b2

            # Backward pass: chain rule from the MSE loss to each parameter.
            d_y_hat = 2 * (y_hat - y) / len(y)
            d_W2 = H.T @ d_y_hat
            d_b2 = d_y_hat.sum(axis=0)
            d_H = d_y_hat @ self.W2.T
            d_Z1 = d_H * (1 - H**2)  # tanh'(z) = 1 - tanh(z)^2

            # Gradient descent update.
            self.W2 -= self.learning_rate * d_W2
            self.b2 -= self.learning_rate * d_b2
            self.W1 -= self.learning_rate * (X.T @ d_Z1)
            self.b1 -= self.learning_rate * d_Z1.sum(axis=0)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        H = np.tanh(X @ self.W1 + self.b1)
        return (H @ self.W2 + self.b2).ravel()


rng = np.random.default_rng(37)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2

model = TinyMLP(n_features=2).fit(X, y)
print(np.round(model.predict(X[:3]), 3))  # [ 0.85 -0.125 -0.752]
```

Real projects use automatic differentiation, batching, regularisation, and better optimisers such as Adam. The point here is to see what those tools automate.
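A finite-difference check is one common way to gain confidence in hand-derived gradients before relying on them. The sketch below is not part of the lesson's class: the helper names `loss`, `analytic_grad_W1`, and `numeric_grad_W1`, the step size `eps`, and the random test data are all assumptions chosen for illustration. It compares the backpropagated gradient for `W1` with a central-difference estimate on the same one-hidden-layer tanh network.

```python
import numpy as np

def loss(W1, b1, W2, b2, X, y):
    """Mean squared error of the one-hidden-layer tanh network."""
    H = np.tanh(X @ W1 + b1)
    y_hat = H @ W2 + b2
    return np.mean((y - y_hat) ** 2)

def analytic_grad_W1(W1, b1, W2, b2, X, y):
    """Backpropagated gradient of the loss with respect to W1."""
    H = np.tanh(X @ W1 + b1)
    y_hat = H @ W2 + b2
    d_y_hat = 2 * (y_hat - y) / len(y)
    d_Z1 = (d_y_hat @ W2.T) * (1 - H**2)
    return X.T @ d_Z1

def numeric_grad_W1(W1, b1, W2, b2, X, y, eps=1e-6):
    """Central finite differences, perturbing one entry of W1 at a time."""
    grad = np.zeros_like(W1)
    for idx in np.ndindex(*W1.shape):
        step = np.zeros_like(W1)
        step[idx] = eps
        up = loss(W1 + step, b1, W2, b2, X, y)
        down = loss(W1 - step, b1, W2, b2, X, y)
        grad[idx] = (up - down) / (2 * eps)
    return grad

# Small random problem; y is a column vector so it matches y_hat.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = rng.normal(size=(20, 1))
W1, b1 = rng.normal(scale=0.3, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.3, size=(4, 1)), np.zeros(1)

max_gap = np.max(np.abs(analytic_grad_W1(W1, b1, W2, b2, X, y)
                        - numeric_grad_W1(W1, b1, W2, b2, X, y)))
print(max_gap < 1e-6)
```

A large gap would point to a mistake in the derivation or in the array shapes. Automatic differentiation makes this kind of manual check unnecessary for the gradient itself.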

Key properties and trade-offs

| Property | Meaning | Finance consequence |
| --- | --- | --- |
| Compositional | Layers build nonlinear features. | Useful for nonlinear surfaces and interaction-heavy signals. |
| Differentiable | Training depends on gradients through the graph. | Smooth losses and activations make optimisation feasible. |
| Data-hungry | Flexibility can overfit small datasets. | Chronological validation is essential. |
| Harder to interpret | Parameters are not factor betas. | Benchmark against simpler models. |
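Chronological validation means fitting on earlier observations and evaluating on later ones, without shuffling across time. A minimal sketch follows, assuming the rows are already sorted by time. The helper `chronological_split` and its `train_fraction` parameter are hypothetical names introduced here, not part of any library.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.8):
    """Split time-ordered rows into an earlier training block and a later validation block.

    Assumes the rows of X and y are already sorted by time (hypothetical helper).
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]

# Toy data whose row index plays the role of time.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()

X_train, y_train, X_val, y_val = chronological_split(X, y)
print(len(X_train), len(X_val))        # 8 2
print(X_train.max() < X_val.min())     # True
```

A random shuffle would let the model train on observations that come after some of its validation rows, which tends to flatter a flexible model.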

Common confusions and pitfalls

  • "A neural network is always better." On tabular financial data, a random forest or a regularised linear model may win.
  • "Backpropagation is mysterious." It is the chain rule applied efficiently to a computation graph.
  • "Training loss is model quality." A network can learn noise while training loss falls, so quality has to be judged on held-out data.

References

  • Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly. Ch. 10 (Introduction to Artificial Neural Networks with Keras).
  • Andrew Ng and Tengyu Ma (2023). CS229 Lecture Notes. Ch. 7 (Deep Learning and Backpropagation).
  • François Chollet (2021). Deep Learning with Python (2nd ed.). Manning. Ch. 2 (The Mathematical Building Blocks of Neural Networks).
