
Brownian Motion

Motivation: why this matters in quant finance

Brownian motion (also called the Wiener process) is the continuous-time random process at the heart of nearly every pricing, hedging, and risk model in quantitative finance. When Black and Scholes wrote down the assumption that a stock price follows a geometric Brownian motion, they were choosing Brownian motion as the source of randomness — the engine that drives price uncertainty forward in time.

The reason is not that markets literally move according to a Wiener process. They don't. The reason is threefold:

  1. Brownian motion is the scaling limit of a random walk. The discrete tick-by-tick movements of a price, when aggregated over many small time steps, converge (via Donsker's theorem) to Brownian motion. This gives the model a solid statistical foundation.
  2. The mathematics works. Up to a choice of drift and variance scale, Brownian motion is the unique process that is continuous, has independent and stationary Gaussian increments, and starts at zero; standard Brownian motion fixes the increments to have mean zero and variance equal to the elapsed time. That combination makes it tractable enough to support an entire calculus, Itô calculus, which in turn makes it possible to derive closed-form or semi-closed-form solutions for derivative prices.
  3. It separates drift from noise. In the stochastic differential equation framework, Brownian motion supplies the unpredictable component ($dW_t$), while a deterministic drift ($\mu\,dt$) supplies the trend. This clean separation is exactly what you need to construct hedging arguments, change probability measures, and arrive at risk-neutral pricing.
In short, Brownian motion is not just a mathematical curiosity. It is the modelling primitive from which geometric Brownian motion, the Black-Scholes PDE, Girsanov's theorem, and nearly the entire classical derivatives framework are built.

The informal idea

Lawler describes Brownian motion as random continuous motion. There are two equivalent mental pictures. At each fixed time $t$, $W_t$ is a random variable. Across all times, $t \mapsto W_t(\omega)$ is a random function: one possible continuous path of accumulated shocks.

The process is the continuous limit of a random walk when time steps shrink like $\Delta t$ and space steps shrink like $\sqrt{\Delta t}$. That scaling keeps the variance at time $t$ finite and nonzero (equal to $t$), and by the central limit theorem the limiting increments are Gaussian. The paths become continuous but not smooth; this roughness is what gives Brownian motion nonzero quadratic variation and, later, forces the extra term in Itô's lemma.
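The scaling can be checked numerically. The sketch below is plain Python with only the standard library, and the function names are illustrative rather than taken from the source. It builds a walk with time step $\Delta t = T/n$ and space step $\pm\sqrt{\Delta t}$, then estimates the mean and variance of the endpoint. With this scaling the endpoint variance is exactly $n\,\Delta t = T$ for every $n$; with a space step of $\Delta t$ instead, it would be $T\,\Delta t$ and collapse to zero.

```python
import math
import random


def scaled_walk_endpoint(T: float, n_steps: int, rng: random.Random) -> float:
    """Position at time T of a random walk with time step dt = T / n_steps
    and space step sqrt(dt): each step is +sqrt(dt) or -sqrt(dt)."""
    dt = T / n_steps
    step = math.sqrt(dt)
    position = 0.0
    for _ in range(n_steps):
        position += step if rng.random() < 0.5 else -step
    return position


def endpoint_moments(T: float, n_steps: int, n_paths: int, seed: int = 0):
    """Monte Carlo mean and (unbiased) variance of the walk's endpoint."""
    rng = random.Random(seed)
    samples = [scaled_walk_endpoint(T, n_steps, rng) for _ in range(n_paths)]
    mean = sum(samples) / n_paths
    var = sum((x - mean) ** 2 for x in samples) / (n_paths - 1)
    return mean, var
```

For example, `endpoint_moments(2.0, 200, 4000)` should return a mean near 0 and a variance near 2, up to Monte Carlo noise.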

Formal definitions

Before defining Brownian motion, we need the probability infrastructure on which it lives. This section is deliberately concise; the details are in Probability Space.
We work on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$, where:
  • $\Omega$ is the sample space: the set of all possible paths the world can take.
  • $\mathcal{F}$ is the $\sigma$-algebra of all events we can assign probabilities to.
  • $\{\mathcal{F}_t\}_{t \ge 0}$ is a filtration: a growing family of $\sigma$-algebras with $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s \le t$. Intuitively, $\mathcal{F}_t$ represents everything that is known (observable) up to time $t$. As time passes, information only accumulates; it never disappears.
  • $\mathbb{P}$ is the probability measure.
A stochastic process $(X_t)_{t \ge 0}$ is adapted to the filtration if, for every $t$, the random variable $X_t$ is $\mathcal{F}_t$-measurable. In plain terms: you can determine the value of $X_t$ using only information available at time $t$, without peeking into the future. Brownian motion is always adapted to its own natural filtration (the smallest filtration generated by its own history). When a larger filtration is used, we assume that $W$ is adapted to it and that each increment $W_t - W_s$ is independent of $\mathcal{F}_s$ for $s \le t$; the calculations below rely on this.
We also impose the usual conditions: the filtration is right-continuous ($\mathcal{F}_t = \mathcal{F}_{t^+}$) and $\mathcal{F}_0$ contains all $\mathbb{P}$-null sets. These are technical but standard; they ensure that stopping times, martingale theorems, and Itô calculus work cleanly.

Standard Brownian motion

A stochastic process $(W_t)_{t \ge 0}$ defined on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$ is a standard Brownian motion (or Wiener process) if the following four axioms hold:
(BM1) Initial condition:
$$W_0 = 0 \quad \text{almost surely}$$
(BM2) Independent increments:
For any $0 \le t_0 < t_1 < \cdots < t_n$, the increments
$$W_{t_1} - W_{t_0}, \quad W_{t_2} - W_{t_1}, \quad \dots, \quad W_{t_n} - W_{t_{n-1}}$$
are mutually independent. Knowing where the path has been tells you nothing about the direction of its next move.
(BM3) Gaussian increments with variance equal to elapsed time:
For any $0 \le s < t$,
$$W_t - W_s \sim \mathcal{N}(0, \; t - s)$$
The increment is normally distributed with mean zero and variance $t - s$. In particular, $W_t \sim \mathcal{N}(0, t)$.
(BM4) Continuous paths:
The map $t \mapsto W_t(\omega)$ is continuous for almost every $\omega \in \Omega$.
These four properties uniquely determine the law of the process (up to modification on null sets). Everything else — the Markov property, the martingale property, quadratic variation, nowhere differentiability — follows as a consequence.
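The axioms translate directly into a simulation recipe on a time grid. The sketch below assumes a uniform grid and uses Python's standard `random` module; `brownian_path` is an illustrative name. It starts at zero (BM1) and adds independent $\mathcal{N}(0, \Delta t)$ draws (BM2, BM3). Continuity (BM4) can only be represented by joining the grid values, so this is a discretisation, not an exact continuous path.

```python
import math
import random


def brownian_path(T: float, n_steps: int, rng: random.Random) -> list[float]:
    """Sample W at the grid times t_k = k * T / n_steps, k = 0..n_steps.

    (BM1) the path starts at 0; (BM2) + (BM3) each increment is an
    independent N(0, dt) draw. Joining grid values by straight lines gives
    a continuous path that stands in for (BM4) between grid points.
    """
    dt = T / n_steps
    sd = math.sqrt(dt)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path
```

For instance, `brownian_path(1.0, 252, random.Random(42))` returns 253 values, the first of which is exactly 0.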

Immediate consequences of the axioms

From the definition alone, several useful facts are immediate.

Moments:
$$\mathbb{E}[W_t] = 0, \qquad \text{Var}(W_t) = t, \qquad \mathbb{E}[W_t^2] = t$$
Covariance:
For $s \le t$, write $W_t = W_s + (W_t - W_s)$. Since $W_s$ is $\mathcal{F}_s$-measurable and $W_t - W_s$ is independent of $\mathcal{F}_s$ with mean zero:
$$\text{Cov}(W_s, W_t) = \mathbb{E}[W_s W_t] = \mathbb{E}[W_s(W_s + (W_t - W_s))] = \mathbb{E}[W_s^2] + \mathbb{E}[W_s]\,\mathbb{E}[W_t - W_s] = s$$

More compactly:

$$\text{Cov}(W_s, W_t) = \min(s, t)$$

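Together with zero mean, this covariance function completely determines the finite-dimensional distributions of a Gaussian process. So, among Gaussian processes with continuous paths, "mean zero and covariance $\min(s, t)$" is an equivalent way to specify Brownian motion.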
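The covariance formula can be checked by Monte Carlo. This sketch (standard-library Python; names are illustrative) builds $W_t$ from $W_s$ plus an independent $\mathcal{N}(0, t - s)$ increment, exactly as in the derivation above, and estimates $\text{Cov}(W_s, W_t)$. It assumes $s \le t$.

```python
import math
import random


def sample_pair(s: float, t: float, rng: random.Random) -> tuple[float, float]:
    """Draw (W_s, W_t) for 0 <= s <= t: W_s ~ N(0, s), then add an
    independent N(0, t - s) increment to get W_t."""
    w_s = rng.gauss(0.0, math.sqrt(s))
    w_t = w_s + rng.gauss(0.0, math.sqrt(t - s))
    return w_s, w_t


def empirical_cov(s: float, t: float, n: int, seed: int = 0) -> float:
    """Unbiased sample covariance of (W_s, W_t) from n independent pairs."""
    rng = random.Random(seed)
    pairs = [sample_pair(s, t, rng) for _ in range(n)]
    mean_s = sum(a for a, _ in pairs) / n
    mean_t = sum(b for _, b in pairs) / n
    return sum((a - mean_s) * (b - mean_t) for a, b in pairs) / (n - 1)
```

With $s = 0.3$ and $t = 1.2$ the estimate should sit near $\min(s, t) = 0.3$, not near $t$ or $\sqrt{st}$.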

Core properties

Stationary increments

The distribution of $W_t - W_s$ depends only on the length of the interval $t - s$, not on when the interval starts. This is already implicit in (BM3), but it is worth naming explicitly: increments are stationary. In finance, this means the "noise structure" of the model is the same whether you look at the first hour of trading or the last.

The $\sqrt{t}$ scaling (self-similarity)

Standard Brownian motion has a powerful scaling property. For any constant $c > 0$, define:

$$\tilde{W}_t = \frac{1}{\sqrt{c}} W_{ct}$$

Then $(\tilde{W}_t)_{t \ge 0}$ is again a standard Brownian motion. You can verify this by checking the four axioms: the process starts at zero and has continuous paths, and its increments $\tilde{W}_t - \tilde{W}_s = c^{-1/2}(W_{ct} - W_{cs})$ are independent and Gaussian with variance $\frac{1}{c}(ct - cs) = t - s$.

This self-similarity means that Brownian motion looks statistically identical at every time scale: zooming into a small segment of a Brownian path, and rescaling space by the square root of the time zoom factor, produces a picture that is indistinguishable (in distribution) from the whole path. In finance, this is the origin of the familiar $\sigma\sqrt{\Delta t}$ rule: the standard deviation of the increment $W_{t + \Delta t} - W_t$ is $\sqrt{\Delta t}$, so once you multiply by the volatility $\sigma$, log-returns over a horizon $\Delta t$ have standard deviation $\sigma\sqrt{\Delta t}$.
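A small numerical check of the scaling, written in standard-library Python with illustrative names. The first function only verifies the one-time marginal, $\text{sd}(\tilde{W}_t) = \sqrt{t}$ for any $c$, not the full process-level statement. The second applies the $\sigma\sqrt{\Delta t}$ rule; the 252-trading-day year used in the example is a common market convention assumed here, not something fixed by the text.

```python
import math
import random


def rescaled_terminal_std(c: float, t: float, n_paths: int, seed: int = 0) -> float:
    """Sample std of W~_t = W_{ct} / sqrt(c), drawing W_{ct} ~ N(0, c t).
    Self-similarity says this is close to sqrt(t) for every c > 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, math.sqrt(c * t)) / math.sqrt(c) for _ in range(n_paths)]
    m = sum(xs) / n_paths
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n_paths - 1))


def horizon_std(sigma: float, dt: float) -> float:
    """Std of sigma * (W_{t+dt} - W_t): the sigma * sqrt(dt) rule."""
    return sigma * math.sqrt(dt)
```

With these assumptions, a 20% annual volatility corresponds to a daily standard deviation of `horizon_std(0.20, 1 / 252)`, roughly 0.0126 (about 1.26%).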

Markov property

Brownian motion is a Markov process: given the present value $W_t$, the future $(W_u)_{u \ge t}$ is conditionally independent of the past $(W_s)_{s \le t}$. Formally, for every Borel set $A$:
$$\mathbb{P}(W_u \in A \mid \mathcal{F}_t) = \mathbb{P}(W_u \in A \mid W_t) \quad \text{for all } u \ge t$$
This follows directly from the independent increments property. The entire future evolution is determined (in law) by where the process is now — not by how it got there. In finance, this is related to the efficient market hypothesis: if the market is efficient, then the current price already incorporates all past information, and only the present price matters for forecasting the distribution of future prices.

Martingale property

$(W_t)_{t \ge 0}$ is a martingale with respect to its natural filtration. For $s \le t$:
$$\mathbb{E}[W_t \mid \mathcal{F}_s] = \mathbb{E}[W_s + (W_t - W_s) \mid \mathcal{F}_s] = W_s + \mathbb{E}[W_t - W_s] = W_s$$

The second equality uses the facts that $W_s$ is $\mathcal{F}_s$-measurable and that $W_t - W_s$ is independent of $\mathcal{F}_s$; the third uses $\mathbb{E}[W_t - W_s] = 0$.

This is the mathematical expression of "fairness": given everything you know up to time $s$, your best forecast of $W_t$ is simply $W_s$. No drift, no predictable direction. In the risk-neutral pricing framework, it is precisely this martingale property (applied to discounted prices) that encodes the no-arbitrage condition. See Martingale I for the full story.

A useful related result: the process $W_t^2 - t$ is also a martingale. This can be shown by direct computation, expanding the square and again using measurability of $W_s$ and independence of $W_t - W_s$:

$$\mathbb{E}[W_t^2 - t \mid \mathcal{F}_s] = \mathbb{E}[(W_s + (W_t - W_s))^2 \mid \mathcal{F}_s] - t = W_s^2 + 2W_s\,\mathbb{E}[W_t - W_s] + \mathbb{E}[(W_t - W_s)^2] - t = W_s^2 + (t - s) - t = W_s^2 - s$$

This result is more than a curiosity — it is closely connected to quadratic variation and is used in proving properties of Itô integrals.
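Both martingale statements can be tested by conditioning on a fixed value of $W_s$ and simulating only the future. The sketch below (standard-library Python, illustrative names) relies on independent increments: given $W_s = w_s$, the path before $s$ is irrelevant, so only $W_t - W_s \sim \mathcal{N}(0, t - s)$ is drawn. The predictions are $\mathbb{E}[W_t \mid W_s = w_s] = w_s$ and $\mathbb{E}[W_t^2 - t \mid W_s = w_s] = w_s^2 - s$.

```python
import math
import random


def conditional_means(w_s: float, s: float, t: float, n: int,
                      seed: int = 0) -> tuple[float, float]:
    """Estimate E[W_t | W_s = w_s] and E[W_t^2 - t | W_s = w_s].

    Only the future increment W_t - W_s ~ N(0, t - s) is simulated; by
    independent increments it does not depend on the path before s.
    """
    rng = random.Random(seed)
    sd = math.sqrt(t - s)
    sum_w = 0.0
    sum_w2_minus_t = 0.0
    for _ in range(n):
        w_t = w_s + rng.gauss(0.0, sd)
        sum_w += w_t
        sum_w2_minus_t += w_t * w_t - t
    return sum_w / n, sum_w2_minus_t / n
```

With $w_s = 0.8$, $s = 0.5$, $t = 2$, the two estimates should be close to $0.8$ and $0.64 - 0.5 = 0.14$.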

Gaussianity

Brownian motion is a Gaussian process: every finite collection $(W_{t_1}, W_{t_2}, \dots, W_{t_n})$ is jointly normally distributed. The joint distribution is fully specified by the mean vector (all zeros) and the covariance matrix with entries $\text{Cov}(W_{t_i}, W_{t_j}) = \min(t_i, t_j)$. This is an extremely strong property. It means that all marginal distributions, all conditional distributions, and all finite-dimensional projections of Brownian motion are Gaussian.
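Because the finite-dimensional law is Gaussian with covariance $\min(t_i, t_j)$, one can sample $(W_{t_1}, \dots, W_{t_n})$ directly from that matrix. The sketch below uses a hand-written Cholesky factorisation in standard-library Python (a library routine would normally be used; all names are illustrative) and assumes strictly increasing, positive times so the matrix is positive definite. For this matrix the factor has entries $L_{ik} = \sqrt{t_k - t_{k-1}}$ for $k \le i$ (with $t_0 = 0$), so $L z$ is just a cumulative sum of independent increments, the same construction as the axioms.

```python
import math
import random


def bm_covariance(times: list[float]) -> list[list[float]]:
    """Covariance matrix C[i][j] = min(t_i, t_j) of (W_{t_1}, ..., W_{t_n})."""
    return [[min(a, b) for b in times] for a in times]


def cholesky(c: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L L^T = C (C must be positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - acc)
            else:
                low[i][j] = (c[i][j] - acc) / low[j][j]
    return low


def sample_gaussian_vector(low: list[list[float]], rng: random.Random) -> list[float]:
    """Draw one vector L z with z i.i.d. N(0, 1); its covariance is L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(low))]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]
```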

Path properties

The paths of Brownian motion (the actual functions $t \mapsto W_t(\omega)$ for a given outcome $\omega$) have surprising and important properties that distinguish stochastic calculus from ordinary calculus.

Continuity

By axiom (BM4), almost every sample path of Brownian motion is a continuous function of time. There are no jumps. This is what makes the continuous hedging argument in the Black-Scholes PDE possible: the stock price (modelled as a function of $W_t$) moves continuously, so you can continuously adjust your hedge without being caught off guard by a sudden discontinuity.

In reality, of course, prices do jump (earnings announcements, flash crashes). This is one of the known limitations of Brownian-based models and motivates extensions such as jump-diffusion processes.

Nowhere differentiability

Despite being continuous everywhere, Brownian motion is differentiable nowhere (almost surely). Informally, the path is infinitely "jagged" — no matter how far you zoom in, it never smooths out into something with a well-defined slope.

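The intuition is rooted in the scaling of increments. Over a small time interval $\Delta t$, the typical size of the increment is:

$$|W_{t + \Delta t} - W_t| \sim \sqrt{\Delta t}$$

More precisely, since $W_{t + \Delta t} - W_t \sim \mathcal{N}(0, \Delta t)$, its expected absolute size is $\mathbb{E}|W_{t + \Delta t} - W_t| = \sqrt{2\Delta t/\pi}$.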

If you try to form a "derivative" by computing the difference quotient:

$$\frac{W_{t + \Delta t} - W_t}{\Delta t} \sim \frac{\sqrt{\Delta t}}{\Delta t} = \frac{1}{\sqrt{\Delta t}} \to \infty \quad \text{as } \Delta t \to 0$$
The ratio blows up. The path wiggles too violently for a derivative to exist. This is not a technicality — it is the fundamental reason why ordinary calculus (the chain rule, the product rule) cannot be applied to functions of Brownian motion, and why Itô's Lemma is necessary.
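The blow-up of the difference quotient is easy to see numerically. This standard-library Python sketch (illustrative names) estimates $\mathbb{E}|W_{t+\Delta t} - W_t| / \Delta t$ by sampling $\Delta W \sim \mathcal{N}(0, \Delta t)$ and compares it with the exact value $\sqrt{2/(\pi\,\Delta t)}$.

```python
import math
import random


def mean_abs_difference_quotient(dt: float, n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E|W_{t+dt} - W_t| / dt using dW ~ N(0, dt)."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    return sum(abs(rng.gauss(0.0, sd)) for _ in range(n)) / n / dt


def exact_mean_abs_difference_quotient(dt: float) -> float:
    """E|N(0, dt)| / dt = sqrt(2 dt / pi) / dt = sqrt(2 / (pi dt))."""
    return math.sqrt(2.0 / (math.pi * dt))
```

Each hundredfold reduction in `dt` multiplies the expected size of the quotient by ten, so there is no finite limit to call a derivative.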

Infinite total variation

The total variation of a function $f$ on $[0, T]$ is the supremum over all partitions $0 = t_0 < t_1 < \cdots < t_n = T$ of:
$$\text{TV}(f; [0, T]) = \sup_{\{t_i\}} \sum_i |f(t_{i+1}) - f(t_i)|$$
For smooth (continuously differentiable) functions, total variation is finite and equals $\int_0^T |f'(t)|\,dt$. For Brownian motion, total variation is infinite almost surely on every interval, no matter how small. Intuitively: the path oscillates so relentlessly that summing up the absolute sizes of its moves produces an infinite total.
This has a direct consequence: for general random integrands $H_t$ (such as $H_t = W_t$ itself), you cannot define $\int_0^T H_t\,dW_t$ as a pathwise Riemann-Stieltjes integral in the classical sense. The integrator $W_t$ has too much variation. This is why stochastic integration requires its own theory (Itô integration), built on $L^2$ limits rather than pathwise approximation.
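To see the variation sums grow under refinement of a single path, the sketch below (standard-library Python, illustrative names) simulates one path on a fine grid and then evaluates $\sum |\Delta W|$ on coarser partitions by merging every $k$ fine increments. By the triangle inequality the sum can only grow as the partition is refined; for a uniform $n$-step partition of $[0, T]$ its expected value is $n\sqrt{2T/(\pi n)} = \sqrt{2nT/\pi}$, which diverges like $\sqrt{n}$.

```python
import math
import random


def fine_increments(T: float, n: int, seed: int = 0) -> list[float]:
    """Independent N(0, T / n) increments of one Brownian path on [0, T]."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def first_variation(increments: list[float], k: int) -> float:
    """Sum of |coarse increment| after merging every k fine increments,
    i.e. the variation sum over a coarser partition of the same path."""
    total = 0.0
    for i in range(0, len(increments), k):
        total += abs(sum(increments[i:i + k]))
    return total
```

With $2^{16}$ fine steps on $[0, 1]$ the finest sum should come out near $\sqrt{2 \cdot 2^{16}/\pi} \approx 204$, even though the path itself rarely strays more than a couple of units from zero.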

Quadratic variation

While total (first-order) variation is infinite, the quadratic variation of Brownian motion is finite and deterministic. For a partition $0 = t_0 < t_1 < \cdots < t_n = T$ with mesh $\max_i(t_{i+1} - t_i) \to 0$:
$$[W]_T = \lim_{n \to \infty} \sum_{i=0}^{n-1} (W_{t_{i+1}} - W_{t_i})^2 = T$$

The convergence is in $L^2$ and also in probability. This is often written in differential notation as:

$$[W]_t = t \qquad \text{or equivalently} \qquad (dW_t)^2 = dt$$
Quadratic variation is the single most important concept bridging Brownian motion to Itô calculus. In ordinary calculus, $(dx)^2$ is negligible because smooth functions have increments of order $\Delta t$, so $(\Delta x)^2 = O(\Delta t^2)$ and the sum of $(\Delta x)^2$ over the $T/\Delta t$ subintervals is $O(\Delta t) \to 0$. But for Brownian motion, $\Delta W = O(\sqrt{\Delta t})$, so $(\Delta W)^2 = O(\Delta t)$ and the sum does not vanish. This non-vanishing second-order term is precisely why the Taylor expansion of $f(W_t)$ retains the $\frac{1}{2}f''$ term, and that extra term is the hallmark of Itô's Lemma.
Proof sketch (convergence in $L^2$):
Define $Q_n = \sum_{i=0}^{n-1}(W_{t_{i+1}} - W_{t_i})^2$. Each squared increment $(W_{t_{i+1}} - W_{t_i})^2$ has mean $\Delta t_i = t_{i+1} - t_i$ and variance $2(\Delta t_i)^2$ (since if $Z \sim \mathcal{N}(0, \sigma^2)$, then $\text{Var}(Z^2) = 2\sigma^4$). By independence of increments:
$$\mathbb{E}[Q_n] = \sum_i \Delta t_i = T$$
$$\text{Var}(Q_n) = \sum_i 2(\Delta t_i)^2 \le 2 \max_i(\Delta t_i) \cdot \sum_i \Delta t_i = 2T \max_i(\Delta t_i) \to 0$$

So $\mathbb{E}[(Q_n - T)^2] = \text{Var}(Q_n) \to 0$, that is, $Q_n \to T$ in $L^2$, and hence in probability. The sum of squared increments converges to the elapsed time.
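The proof sketch makes two quantitative predictions for a uniform $n$-step partition: $\mathbb{E}[Q_n] = T$ and $\text{Var}(Q_n) = 2T^2/n$. The standard-library Python sketch below (illustrative names) estimates both across many simulated paths, so you can watch the variance shrink in proportion to $1/n$.

```python
import math
import random


def qv_mean_and_variance(T: float, n: int, n_paths: int,
                         seed: int = 0) -> tuple[float, float]:
    """Monte Carlo mean and variance of Q_n = sum of squared increments over
    a uniform n-step partition of [0, T]. The proof sketch predicts
    E[Q_n] = T and Var(Q_n) = 2 T^2 / n."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    qs = []
    for _ in range(n_paths):
        q = 0.0
        for _ in range(n):
            dw = rng.gauss(0.0, sd)
            q += dw * dw
        qs.append(q)
    mean = sum(qs) / n_paths
    var = sum((q - mean) ** 2 for q in qs) / (n_paths - 1)
    return mean, var
```

Going from $n = 10$ to $n = 100$ on $[0, 1]$ should move the variance from about 0.2 to about 0.02, while the mean stays near 1.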

Everything above converges on a single conclusion: classical calculus is not equipped to handle Brownian motion.

The core issue is always the same. If $f$ is a smooth function and $X_t$ is a smooth function of time, then the chain rule gives $df(X_t) = f'(X_t)\,dX_t$, and you can drop all higher-order terms in a Taylor expansion. But if $X_t$ involves Brownian motion, the increment $dX_t$ has a component of order $\sqrt{dt}$, and so $(dX_t)^2$ has a component of order $dt$ that survives in the limit.

Concretely, for a general Itô process:

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t$$
the correct chain rule for $f(t, X_t)$ is Itô's Lemma:
$$df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial x}\,dX_t + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,b^2(t, X_t)\,dt$$

The extra $\frac{1}{2}f_{xx}\,b^2\,dt$ term, absent in ordinary calculus, is entirely due to the non-zero quadratic variation of $W_t$. This is the "Itô correction", and it has far-reaching consequences: it is what produces the $-\frac{1}{2}\sigma^2$ drift correction in the log-return distribution, it is what makes the Black-Scholes PDE differ from a simple transport equation, and it is why hedging requires continuous rebalancing.
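The simplest case makes the correction visible. Taking $X_t = W_t$ (so $a = 0$, $b = 1$) and $f(t, x) = x^2$, Itô's Lemma gives $d(W_t^2) = 2W_t\,dW_t + dt$, whereas the ordinary chain rule would give $d(W_t^2) = 2W_t\,dW_t$ with no $dt$ term. The standard-library Python sketch below (illustrative names) approximates the Itô integral by a left-endpoint sum on one simulated path and compares it with $W_T^2$.

```python
import math
import random


def ito_check_w_squared(T: float, n: int, seed: int = 0) -> tuple[float, float]:
    """Simulate one path and return (W_T^2, left-point sum of 2 W dW).

    Ito's lemma predicts W_T^2 close to (Ito sum + T); the ordinary chain
    rule would wrongly predict W_T^2 equal to the Ito sum alone.
    """
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    w = 0.0
    ito_sum = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        ito_sum += 2.0 * w * dw  # integrand evaluated at the LEFT endpoint
        w += dw
    return w * w, ito_sum
```

The gap $W_T^2 - \sum 2W\,\Delta W$ equals $\sum (\Delta W)^2$ exactly, which is the quadratic variation sum and converges to $T$; that gap is the Itô correction.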

For the full derivation and worked examples, see Itô's Lemma.

Quant-finance applications

From Brownian motion to geometric Brownian motion

The standard model for a stock price assumes:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$$
where $\mu$ is the drift (expected return) and $\sigma$ is the volatility. This is the geometric Brownian motion (GBM) SDE. The key feature is that both drift and diffusion are proportional to $S_t$, which ensures $S_t > 0$ (given $S_0 > 0$) and means that percentage changes (not absolute changes) are driven by Brownian motion.

Why this particular SDE? The reasoning is:

  • Multiplicative noise ($\sigma S_t\,dW_t$) means a stock priced at 100 dollars and one priced at 10 dollars experience the same proportional randomness, which roughly matches how returns behave empirically.
  • The random walk of log-prices in discrete time (the multiplicative random walk) converges in the scaling limit to exactly this SDE.
  • The model is analytically solvable via Itô's Lemma, which is essential for deriving the Black-Scholes PDE.

Deriving the distribution of log-returns

This is a canonical application of Itô's Lemma and illustrates exactly why the Itô correction matters.
Goal: Given $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, find the distribution of $\ln S_T$.
Step 1: Let $f(S) = \ln S$. Then $f'(S) = 1/S$ and $f''(S) = -1/S^2$. Apply Itô's Lemma to $f(S_t)$, with $\partial f/\partial t = 0$ and $b(t, S_t) = \sigma S_t$:
$$
\begin{aligned}
d(\ln S_t) &= \frac{1}{S_t}\,dS_t + \frac{1}{2}\left(-\frac{1}{S_t^2}\right)(\sigma S_t)^2\,dt \\
&= \frac{1}{S_t}(\mu S_t\,dt + \sigma S_t\,dW_t) - \frac{1}{2}\sigma^2\,dt \\
&= \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW_t
\end{aligned}
$$
Step 2: This is an arithmetic Brownian motion with constant drift $\mu - \frac{1}{2}\sigma^2$ and constant diffusion coefficient $\sigma$. Integrate from $0$ to $T$:
$$\ln S_T - \ln S_0 = \left(\mu - \frac{1}{2}\sigma^2\right)T + \sigma W_T$$

Since $W_T \sim \mathcal{N}(0, T)$, the log-return is:

$$\ln\frac{S_T}{S_0} \sim \mathcal{N}\left(\left(\mu - \frac{1}{2}\sigma^2\right)T, \; \sigma^2 T\right)$$

And the stock price itself is:

$$S_T = S_0 \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)T + \sigma W_T\right)$$
The $-\frac{1}{2}\sigma^2$ correction is purely a consequence of Itô calculus. If you naively applied the ordinary chain rule (ignoring the second-order term), you would get $d(\ln S_t) = \mu\,dt + \sigma\,dW_t$ and conclude that log-returns have drift $\mu$. But that is wrong: the correct drift is $\mu - \frac{1}{2}\sigma^2$. This difference matters enormously. It sets the long-run (median) growth rate of wealth, which is $\mu - \frac{1}{2}\sigma^2$ even though $\mathbb{E}[S_T] = S_0 e^{\mu T}$; it affects the calibration of the risk-neutral drift; and it determines the correct form of the Black-Scholes PDE.
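The gap between the two drifts shows up immediately in simulation. The standard-library Python sketch below (illustrative names) draws $S_T$ exactly from the closed-form solution. The parameters $\mu = 0.08$, $\sigma = 0.4$, $T = 1$ are arbitrary choices made so that $\mu - \frac{1}{2}\sigma^2 = 0$: the average log-return should then be near 0, not near $\mu T = 0.08$, while the average price ratio is still near $e^{\mu T}$.

```python
import math
import random


def gbm_terminal_samples(s0: float, mu: float, sigma: float, T: float,
                         n: int, seed: int = 0) -> list[float]:
    """Exact draws of S_T = S0 * exp((mu - sigma^2 / 2) T + sigma W_T),
    with W_T ~ N(0, T)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * T
    return [s0 * math.exp(drift + sigma * rng.gauss(0.0, math.sqrt(T)))
            for _ in range(n)]
```

Averaging `math.log(x / s0)` over the samples estimates the log-return drift, and averaging `x / s0` estimates $\mathbb{E}[S_T]/S_0$.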

Risk-neutral measure intuition

Under the real-world (physical) measure $\mathbb{P}$, the stock has drift $\mu$:

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t^{\mathbb{P}}$$
But for pricing derivatives, we need the discounted price $e^{-rt}S_t$ to be a martingale. Under $\mathbb{P}$, the product rule gives $d(e^{-rt}S_t) = (\mu - r)e^{-rt}S_t\,dt + \sigma e^{-rt}S_t\,dW_t^{\mathbb{P}}$, so the discounted price has relative drift $\mu - r$, which is generally nonzero (investors demand a risk premium). So discounted prices are not martingales under $\mathbb{P}$.
The solution is to change the probability measure from $\mathbb{P}$ to a risk-neutral measure $\mathbb{Q}$. Under $\mathbb{Q}$, the stock dynamics become:
$$dS_t = r S_t\,dt + \sigma S_t\,dW_t^{\mathbb{Q}}$$
where $W_t^{\mathbb{Q}}$ is a Brownian motion under $\mathbb{Q}$. The drift $\mu$ has been replaced by the risk-free rate $r$, and now $e^{-rt}S_t$ is a $\mathbb{Q}$-martingale. This is the content of the Fundamental Theorem of Asset Pricing: under suitable technical conditions, the absence of arbitrage is equivalent to the existence of such an equivalent martingale measure.
The technical tool that makes the measure change rigorous is Girsanov's theorem. Applied here, it gives the relationship between the two Brownian motions:
$$W_t^{\mathbb{Q}} = W_t^{\mathbb{P}} + \frac{\mu - r}{\sigma}\,t$$
The quantity $\theta = \frac{\mu - r}{\sigma}$ is called the market price of risk (the Sharpe ratio of the stock). Girsanov's theorem guarantees that $W_t^{\mathbb{Q}}$ is indeed a standard Brownian motion under $\mathbb{Q}$, provided an integrability condition such as Novikov's condition holds; with a constant $\theta$, as here, it holds automatically.
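A measure change can be carried out numerically by reweighting samples drawn under $\mathbb{P}$. The sketch below assumes the standard Girsanov density for constant $\theta$, $Z_T = \frac{d\mathbb{Q}}{d\mathbb{P}} = \exp(-\theta W_T^{\mathbb{P}} - \frac{1}{2}\theta^2 T)$, which is not written out in the text above; names and parameter values are illustrative. Under $\mathbb{P}$ the average discounted price should be near $S_0 e^{(\mu - r)T}$, while the $Z_T$-weighted average, which estimates $\mathbb{E}^{\mathbb{Q}}$, should return to $S_0$.

```python
import math
import random


def discounted_price_means(s0: float, mu: float, r: float, sigma: float,
                           T: float, n: int, seed: int = 0) -> tuple[float, float]:
    """Return (E^P[e^{-rT} S_T], E^Q[e^{-rT} S_T]) estimated from P-samples.

    E^Q is computed by weighting each P-sample with the (assumed) density
    Z_T = exp(-theta W_T - theta^2 T / 2), theta = (mu - r) / sigma.
    """
    rng = random.Random(seed)
    theta = (mu - r) / sigma
    sum_p = sum_q = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(T))  # W_T under P
        disc_s = math.exp(-r * T) * s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
        z = math.exp(-theta * w - 0.5 * theta ** 2 * T)
        sum_p += disc_s
        sum_q += z * disc_s
    return sum_p / n, sum_q / n
```

With $S_0 = 100$, $\mu = 0.10$, $r = 0.03$, $\sigma = 0.2$, $T = 1$, the first estimate should be near $100e^{0.07} \approx 107.25$ and the second near 100.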

The upshot for pricing is clean: the fair price of a derivative with payoff HH at maturity TT is:

$$V_0 = e^{-rT}\,\mathbb{E}^{\mathbb{Q}}[H]$$
No need to estimate $\mu$. No need to model risk preferences. Just take the expectation under $\mathbb{Q}$, where the stock is still driven by a Brownian motion but now drifts at the risk-free rate $r$. This is the logic underpinning the Black-Scholes formula and all of its extensions. For more on measure changes, see Girsanov's theorem.

Common confusions and pitfalls

"Why can't I differentiate $W_t$?"

This is perhaps the most common confusion for people coming from classical calculus. The short answer: the path is too rough. Over a time interval $\Delta t$, the increment $\Delta W \sim \sqrt{\Delta t}$, so the ratio $\Delta W / \Delta t \sim 1/\sqrt{\Delta t} \to \infty$. The path oscillates so violently at every scale that no tangent line exists, anywhere, ever (with probability one).

This does not mean we "cannot do calculus." It means we must use a different calculus, Itô calculus, in which the basic object is not $dW_t/dt$ (which doesn't exist) but $dW_t$ itself (an infinitesimal increment). Stochastic differential equations like $dS = \mu S\,dt + \sigma S\,dW$ are not equations about derivatives; they are shorthand for integral equations:
$$S_T = S_0 + \int_0^T \mu S_t\,dt + \int_0^T \sigma S_t\,dW_t$$

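where the second integral is an Itô integral, defined as an $L^2$ limit of sums $\sum_i \sigma S_{t_i}(W_{t_{i+1}} - W_{t_i})$ that evaluate the integrand at the left endpoint of each subinterval, rather than as a pathwise Riemann-Stieltjes integral.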
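The integral form suggests the most common way to simulate an SDE: replace both integrals by left-endpoint sums on a grid. This is the Euler-Maruyama scheme, a standard technique named here explicitly because the text does not prescribe a numerical method. The standard-library Python sketch below (illustrative names) applies it to GBM. For this scheme $\mathbb{E}[S_T] = S_0(1 + \mu\,\Delta t)^n$, which is close to $S_0 e^{\mu T}$ for small steps.

```python
import math
import random


def euler_maruyama_gbm(s0: float, mu: float, sigma: float, T: float,
                       n_steps: int, rng: random.Random) -> float:
    """Approximate S_T = S_0 + int mu S dt + int sigma S dW with left-endpoint
    sums: S_{k+1} = S_k + mu S_k dt + sigma S_k dW_k, dW_k ~ N(0, dt)."""
    dt = T / n_steps
    sd = math.sqrt(dt)
    s = s0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        s = s + mu * s * dt + sigma * s * dw  # integrands use S at the LEFT endpoint
    return s
```

Unlike the exact solution, the discrete update can in principle go negative when a single increment is extremely large relative to the step, so small steps matter; for GBM the exact formula $S_T = S_0 \exp((\mu - \frac{1}{2}\sigma^2)T + \sigma W_T)$ is usually preferable when only $S_T$ is needed.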

"What does $dW_t$ actually mean?"

$dW_t$ is not a well-defined mathematical object on its own. It is a notational shorthand. When we write $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, what we really mean is the integral form above. The "differential notation" is a compact and intuitive way to express the integral equation, but it should always be understood as shorthand.

In particular, $dW_t$ does not have a "value": you cannot evaluate it at a point. It is meaningful only inside an integral. Think of it as analogous to $dx$ in $\int f(x)\,dx$: the symbol $dx$ is not a number, but it makes the integral notation work.

"Why does $(dW_t)^2 = dt$?"

This is one of the most frequently misunderstood statements in stochastic calculus. It is not an algebraic identity — you are not squaring a number and getting another number. It is a statement about quadratic variation: when you sum up squared increments of Brownian motion over a partition and take the limit, you get the elapsed time.

More precisely, $(dW_t)^2 = dt$ is shorthand for:

$$\sum_{i=0}^{n-1} (W_{t_{i+1}} - W_{t_i})^2 \xrightarrow{L^2} T \quad \text{as } \max_i(t_{i+1} - t_i) \to 0$$

The "multiplication rules" of stochastic calculus, $(dW_t)^2 = dt$, $dt \cdot dW_t = 0$ and $(dt)^2 = 0$, are not algebra. They are limit statements about how different types of infinitesimal quantities behave when summed over many small intervals. The first rule holds because Brownian increments are of order $\sqrt{dt}$, so their squares are of order $dt$ and add up to the elapsed time. The second holds because $\sqrt{dt} \cdot dt = dt^{3/2}$ vanishes faster than $dt$, so the sum over the $T/dt$ subintervals is of order $\sqrt{dt} \to 0$. The third holds by the same argument with $dt^2$ in place of $dt^{3/2}$.
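All three rules can be read off a single simulated path. The standard-library Python sketch below (illustrative names) accumulates $\sum (\Delta W)^2$, $\sum \Delta t\,\Delta W$ and $\sum (\Delta t)^2$ over a uniform partition of $[0, T]$. The rules predict $T$, $0$ and $0$ in the limit; at finite $n$ the last sum is exactly $T^2/n$ and the middle one has standard deviation $T^{3/2}/n$.

```python
import math
import random


def multiplication_rule_sums(T: float, n: int,
                             seed: int = 0) -> tuple[float, float, float]:
    """For one path on a uniform n-step partition of [0, T], return
    (sum dW^2, sum dt*dW, sum dt^2); the rules predict (T, 0, 0) as n grows."""
    rng = random.Random(seed)
    dt = T / n
    sd = math.sqrt(dt)
    sq = cross = dt_sq = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        sq += dw * dw
        cross += dt * dw
        dt_sq += dt * dt
    return sq, cross, dt_sq
```

With $T = 1$ and $n = 10{,}000$, the first sum should be close to 1 while the other two are of order $10^{-4}$.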

These rules are what you mechanically apply when using Itô's Lemma, but it is important to remember that the justification is always convergence in probability (or $L^2$), not algebraic manipulation.

"Brownian motion has drift zero, so it can't model a stock with positive expected return"

This confuses the role of Brownian motion with the role of the SDE. Standard Brownian motion $W_t$ has zero drift, but the stock price model $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ can have any drift $\mu$ you like. Brownian motion supplies the noise; the drift is added separately as the term $\mu S_t\,dt$, which is random through $S_t$ but contains no Brownian increment. And under the risk-neutral measure $\mathbb{Q}$, the drift is $r$ (the risk-free rate), not $\mu$, so even the meaning of "drift" changes with the measure.

"If $(dW_t)^2 = dt$ is deterministic, doesn't that mean we can predict $W_t$?"

No. The quadratic variation $[W]_t = t$ is deterministic, but quadratic variation tells you about the accumulated squared fluctuation: it says how much total "energy" the path has spent by time $t$. It does not tell you the direction of any individual move. Knowing that $\sum (\Delta W)^2 \approx T$ is like knowing the length of a tangled rope without knowing its shape. The signed increments $\Delta W_i$ are still independent normals with mean zero; the information about direction is lost when you square.

Where this goes next

  • Geometric Brownian Motion: The exponential of Brownian motion with drift. Gives a positive, log-normal process that is the canonical stock-price model in the Black-Scholes framework.
  • Itô's Lemma: The stochastic chain rule. Every calculation involving a function of Brownian motion — log-returns, option Greeks, the Black-Scholes PDE — routes through this result.
  • Stochastic Differential Equations: The integral-equation framework $X_t = X_0 + \int a\,ds + \int b\,dW_s$ in which SDEs like $dS = \mu S\,dt + \sigma S\,dW$ are properly defined.
  • Martingales: Brownian motion is the canonical continuous martingale; the martingale representation theorem says that every square-integrable martingale with respect to the (augmented) Brownian filtration is a constant plus a stochastic integral against $W$.
  • Change of Measure (Girsanov's theorem): Under a new measure $\mathbb{Q}$, $W_t$ plus a drift is a new Brownian motion. This is the machine that makes risk-neutral pricing rigorous.
  • Black-Scholes PDE: The capstone application. Brownian motion supplies the noise, Itô's lemma supplies the calculus, and Girsanov supplies the drift change.

References

  • Lawler, G. F. (2023). Stochastic Calculus: An Introduction with Applications. Ch. 2 §2.3 (Limits of random walks), §2.4 (Brownian motion), §2.6 (Understanding Brownian motion), §2.7 (Computations for Brownian motion), §2.8 (Quadratic variation).
  • Albin, P., Hamza, K., & Klebaner, F. C. (2025). Problems and Solutions in Stochastic Calculus with Applications. World Scientific. Ch. 4 (Brownian Motion Calculus) — supporting exercise checks.
