From a24731fd8d5122209c54cc6e05112b8432e68b3b Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Fri, 6 Feb 2026 22:40:10 +1100 Subject: [PATCH 01/37] update --- lectures/_static/quant-econ.bib | 24 + lectures/_toc.yml | 1 + lectures/chow_business_cycles.md | 1162 ++++++++++++++++++++++++++++++ 3 files changed, 1187 insertions(+) create mode 100644 lectures/chow_business_cycles.md diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 218573589..55b678f94 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -2733,3 +2733,27 @@ @article{Meghir2004 year={2004}, publisher={Wiley Online Library} } + +@article{Chow1968, + title={The Acceleration Principle and the Nature of Business Cycles}, + author={Chow, Gregory C.}, + journal={The Quarterly Journal of Economics}, + volume={82}, + number={3}, + pages={403--418}, + year={1968}, + month={aug}, + publisher={Oxford University Press} +} + +@article{ChowLevitan1969, + title={Nature of Business Cycles Implicit in a Linear Economic Model}, + author={Chow, Gregory C. and Levitan, Richard E.}, + journal={The Quarterly Journal of Economics}, + volume={83}, + number={3}, + pages={504--517}, + year={1969}, + month={aug}, + publisher={Oxford University Press} +} diff --git a/lectures/_toc.yml b/lectures/_toc.yml index 8d63e5906..aeaab36b5 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -56,6 +56,7 @@ parts: - file: inventory_dynamics - file: linear_models - file: samuelson + - file: chow_business_cycles - file: kesten_processes - file: wealth_dynamics - file: kalman diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md new file mode 100644 index 000000000..9e99b0478 --- /dev/null +++ b/lectures/chow_business_cycles.md @@ -0,0 +1,1162 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.2 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +(chow_business_cycles)= + +```{raw} jupyter +
+ + QuantEcon + +
+``` + +# The Acceleration Principle and the Nature of Business Cycles + +```{contents} Contents +:depth: 2 +``` + +## Overview + +This lecture studies two classic papers by Gregory Chow on business cycles in linear dynamic models: + +- {cite}`Chow1968`: why acceleration-type investment behavior matters for oscillations, and how to read stochastic dynamics through autocovariances and spectral densities +- {cite}`ChowLevitan1969`: how those tools look when applied to a calibrated macroeconometric model of the U.S. economy + +These papers sit right at the intersection of three themes in this lecture series: + +- The multiplier–accelerator mechanism in {doc}`samuelson` +- Linear stochastic difference equations and autocovariances in {doc}`linear_models` +- Eigenmodes of multivariate dynamics in {doc}`var_dmd` +- Fourier ideas in {doc}`eig_circulant` (and, for empirical estimation, the advanced lecture [Estimation of Spectra](https://python-advanced.quantecon.org/estspec.html#)) + +We will keep coming back to three ideas: + +- In deterministic models, oscillations correspond to complex eigenvalues of a transition matrix. +- In stochastic models, a "cycle" shows up as a local peak in a (univariate) spectral density. +- Spectral peaks depend on eigenvalues, but also on how shocks enter (the covariance matrix $V$) and on how observables load on eigenmodes. + +## A linear system with shocks + +Both papers analyze (or reduce to) a first-order linear stochastic system + +```{math} +:label: chow_var1 + +y_t = A y_{t-1} + u_t, +\qquad +\mathbb E[u_t] = 0, +\qquad +\mathbb E[u_t u_t^\top] = V, +\qquad +\mathbb E[u_t u_{t-k}^\top] = 0 \ (k \neq 0). +``` + +When the eigenvalues of $A$ are strictly inside the unit circle, the process is (covariance) stationary and its autocovariances exist. + +In the notation of {doc}`linear_models`, this is the same stability condition that guarantees a unique solution to a discrete Lyapunov equation. + +Define the lag-$k$ autocovariance matrices + +```{math} +:label: chow_autocov_def + +\Gamma_k := \mathbb E[y_t y_{t-k}^\top] . +``` + +Standard calculations (also derived in {cite}`Chow1968`) give the recursion + +```{math} +:label: chow_autocov_rec + +\Gamma_k = A \Gamma_{k-1}, \quad k \ge 1, +\qquad\text{and}\qquad +\Gamma_0 = A \Gamma_0 A^\top + V. +``` + +The second equation is the discrete Lyapunov equation for $\Gamma_0$. + +## From autocovariances to spectra + +Chow’s key step is to translate the autocovariance sequence $\{\Gamma_k\}$ into a frequency-domain object. + +The **spectral density matrix** is the Fourier transform of $\Gamma_k$: + +```{math} +:label: chow_spectral_def + +F(\omega) := \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \Gamma_k e^{-i \omega k}, +\qquad \omega \in [0, \pi]. +``` + +For the VAR(1) system {eq}`chow_var1`, this sum has a closed form + +```{math} +:label: chow_spectral_closed + +F(\omega) += \frac{1}{2\pi} +\left(I - A e^{-i\omega}\right)^{-1} +V +\left(I - A^\top e^{i\omega}\right)^{-1}. +``` + +Intuitively, $F(\omega)$ tells us how much variation in $y_t$ is associated with cycles of (angular) frequency $\omega$. + +The corresponding cycle length is + +```{math} +:label: chow_period + +T(\omega) = \frac{2\pi}{\omega}. +``` + +The advanced lecture {doc}`advanced:estspec` explains how to estimate $F(\omega)$ from data. + +Here we focus on the model-implied spectrum. + +We will use the following imports and helper functions throughout the lecture. 
+ +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt + +def spectral_density_var1(A, V, ω_grid): + """Spectral density matrix for VAR(1): y_t = A y_{t-1} + u_t.""" + A, V = np.asarray(A), np.asarray(V) + n = A.shape[0] + I = np.eye(n) + F = np.empty((len(ω_grid), n, n), dtype=complex) + for k, ω in enumerate(ω_grid): + H = np.linalg.inv(I - np.exp(-1j * ω) * A) + F[k] = (H @ V @ H.conj().T) / (2 * np.pi) + return F + +def spectrum_of_linear_combination(F, b): + """Spectrum of x_t = b'y_t given the spectral matrix F(ω).""" + b = np.asarray(b).reshape(-1, 1) + return np.array([np.real((b.T @ F[k] @ b).item()) for k in range(F.shape[0])]) + +def simulate_var1(A, V, T, burn=200, seed=1234): + """Simulate y_t = A y_{t-1} + u_t with u_t ~ N(0, V).""" + rng = np.random.default_rng(seed) + A, V = np.asarray(A), np.asarray(V) + n = A.shape[0] + chol = np.linalg.cholesky(V) + y = np.zeros((T + burn, n)) + for t in range(1, T + burn): + y[t] = A @ y[t - 1] + chol @ rng.standard_normal(n) + return y[burn:] + +def sample_autocorrelation(x, max_lag): + """Sample autocorrelation of a 1d array from lag 0 to max_lag.""" + x = np.asarray(x) + x = x - x.mean() + denom = np.dot(x, x) + acf = np.empty(max_lag + 1) + for k in range(max_lag + 1): + acf[k] = np.dot(x[:-k] if k else x, x[k:]) / denom + return acf +``` + +## Deterministic propagation and acceleration + +Chow {cite}`Chow1968` begins with a clean deterministic question: + +> If you build a macro model using only standard demand equations with simple distributed lags, can the system generate sustained oscillations without acceleration? + +He shows that, under natural sign restrictions, the answer is no. + +### A demand system without acceleration + +Consider a system where each component $y_{it}$ responds to aggregate output $Y_t$ and its own lag: + +```{math} +:label: chow_simple_demand + +y_{it} = a_i Y_t + b_i y_{i,t-1}, +\qquad +Y_t = \sum_i y_{it}, +\qquad +a_i > 0,\; b_i > 0. +``` + +Chow shows that the implied transition matrix has real characteristic roots, and that if $\sum_i a_i < 1$ these roots are also positive. + +In that case, solutions are linear combinations of decaying exponentials without persistent sign-switching components, so there are no “business-cycle-like” oscillations driven purely by internal propagation. + +### What acceleration changes + +For investment (and some durables), Chow argues that a more relevant starting point is a *stock adjustment* equation (demand for a stock), e.g. + +```{math} +:label: chow_stock_adj + +s_{it} = \alpha_i Y_t + \beta_i s_{i,t-1}. +``` + +If flow investment is proportional to the change in the desired stock, differencing introduces terms in $\Delta Y_t$. + +That "acceleration" structure creates negative coefficients (in lagged levels), which makes complex roots possible. + +This connects directly to {doc}`samuelson`, where acceleration is the key ingredient that can generate damped or persistent oscillations in a deterministic second-order difference equation. + +To see the mechanism with minimal algebra, take the multiplier–accelerator law of motion + +```{math} +Y_t = c Y_{t-1} + v (Y_{t-1} - Y_{t-2}), +``` + +and rewrite it as a first-order system in $(Y_t, Y_{t-1})$. 
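Stacking $Y_t$ and $Y_{t-1}$ into a state vector gives

```{math}
\begin{bmatrix} Y_t \\ Y_{t-1} \end{bmatrix}
=
\begin{bmatrix} c + v & -v \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} Y_{t-1} \\ Y_{t-2} \end{bmatrix},
```

which is exactly the transition matrix constructed by `samuelson_transition` in the code below.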
+ +```{code-cell} ipython3 +def samuelson_transition(c, v): + return np.array([[c + v, -v], [1.0, 0.0]]) + +c = 0.6 +v_values = (0.0, 0.8) +A_list = [samuelson_transition(c, v) for v in v_values] + +for v, A in zip(v_values, A_list): + eig = np.linalg.eigvals(A) + print(f"v={v:.1f}, eigenvalues={eig}") + +# impulse responses from a one-time unit shock in Y +T = 40 +s0 = np.array([1.0, 0.0]) +irfs = [] +for A in A_list: + s = s0.copy() + path = np.empty(T + 1) + for t in range(T + 1): + path[t] = s[0] + s = A @ s + irfs.append(path) + +# model-implied spectra for the stochastic version with shocks in the Y equation +freq = np.linspace(1e-4, 0.5, 2500) # cycles/period +ω_grid = 2 * np.pi * freq +V = np.array([[1.0, 0.0], [0.0, 0.0]]) + +spectra = [] +for A in A_list: + F = spectral_density_var1(A, V, ω_grid) + f11 = np.real(F[:, 0, 0]) + spectra.append(f11 / np.trapz(f11, freq)) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +axes[0].plot(range(T + 1), irfs[0], lw=1.8, label="no acceleration") +axes[0].plot(range(T + 1), irfs[1], lw=1.8, label="with acceleration") +axes[0].axhline(0.0, lw=0.8) +axes[0].set_xlabel("time") +axes[0].set_ylabel(r"$Y_t$") +axes[0].legend(frameon=False) + +axes[1].plot(freq, spectra[0], lw=1.8, label="no acceleration") +axes[1].plot(freq, spectra[1], lw=1.8, label="with acceleration") +axes[1].set_xlabel(r"frequency $\omega/2\pi$") +axes[1].set_ylabel("normalized spectrum") +axes[1].set_xlim([0.0, 0.5]) +axes[1].legend(frameon=False) + +plt.tight_layout() +plt.show() +``` + +The left panel shows that acceleration creates oscillatory impulse responses. + +The right panel shows the corresponding spectral signature: a peak at interior frequencies. + +### How the accelerator shifts the spectral peak + +As we increase the accelerator $v$, the complex eigenvalues rotate further from the real axis, shifting the spectral peak to higher frequencies. + +```{code-cell} ipython3 +v_grid = np.linspace(0.2, 1.2, 6) +c = 0.6 +freq_fine = np.linspace(1e-4, 0.5, 2000) +ω_fine = 2 * np.pi * freq_fine +V_acc = np.array([[1.0, 0.0], [0.0, 0.0]]) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +for v in v_grid: + A = samuelson_transition(c, v) + eig = np.linalg.eigvals(A) + F = spectral_density_var1(A, V_acc, ω_fine) + f11 = np.real(F[:, 0, 0]) + f11_norm = f11 / np.trapz(f11, freq_fine) + + # plot eigenvalues + axes[0].scatter(eig.real, eig.imag, s=40, label=f'$v={v:.1f}$') + + # plot spectrum + axes[1].plot(freq_fine, f11_norm, lw=1.5, label=f'$v={v:.1f}$') + +# unit circle +θ_circle = np.linspace(0, 2*np.pi, 100) +axes[0].plot(np.cos(θ_circle), np.sin(θ_circle), 'k--', lw=0.8) +axes[0].set_xlabel('real part') +axes[0].set_ylabel('imaginary part') +axes[0].set_aspect('equal') +axes[0].legend(frameon=False, fontsize=8) + +axes[1].set_xlabel(r'frequency $\omega/2\pi$') +axes[1].set_ylabel('normalized spectrum') +axes[1].set_xlim([0, 0.5]) +axes[1].legend(frameon=False, fontsize=8) + +plt.tight_layout() +plt.show() +``` + +Larger $v$ pushes the eigenvalues further off the real axis, shifting the spectral peak to higher frequencies. + +When $v$ is large enough that eigenvalues leave the unit circle, the system becomes explosive. + +## Spectral peaks are not just eigenvalues + +With shocks, the deterministic question ("does the system oscillate?") becomes: at which cycle lengths does the variance of $y_t$ concentrate? + +In this lecture, a "cycle" means a local peak in a univariate spectrum $f_{ii}(\omega)$. 
+ +Chow's point in {cite}`Chow1968` is that eigenvalues help interpret spectra, but they do not determine peaks by themselves. + +Two extra ingredients matter: + +- how shocks load on the eigenmodes (the covariance matrix $V$), +- how the variable of interest mixes those modes. + +The next simulations isolate these effects. + +### Complex roots: a peak and an oscillating autocorrelation + +Take a stable “rotation–contraction” matrix + +```{math} +:label: chow_rot + +A = r +\begin{bmatrix} +\cos \theta & -\sin \theta \\ +\sin \theta & \cos \theta +\end{bmatrix}, +\qquad 0 < r < 1, +``` + +whose eigenvalues are $r e^{\pm i\theta}$. + +When $r$ is close to 1, the spectrum shows a pronounced peak near $\omega \approx \theta$. + +```{code-cell} ipython3 +def rotation_contraction(r, θ): + c, s = np.cos(θ), np.sin(θ) + return r * np.array([[c, -s], [s, c]]) + +θ = np.pi / 3 +r_values = (0.95, 0.4) +ω_grid = np.linspace(1e-3, np.pi - 1e-3, 800) +V = np.eye(2) + +acfs = [] +spectra = [] +for r in r_values: + A = rotation_contraction(r, θ) + + y = simulate_var1(A, V, T=5000, burn=500, seed=1234) + acfs.append(sample_autocorrelation(y[:, 0], 40)) + + F = spectral_density_var1(A, V, ω_grid) + spectra.append(np.real(F[:, 0, 0])) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +for r, acf in zip(r_values, acfs): + axes[0].plot(range(len(acf)), acf, lw=1.8, label=fr"$r={r}$") +axes[0].axhline(0.0, lw=0.8) +axes[0].set_xlabel("lag") +axes[0].set_ylabel("autocorrelation") +axes[0].legend(frameon=False) + +for r, f11 in zip(r_values, spectra): + axes[1].plot(ω_grid / np.pi, f11, lw=1.8, label=fr"$r={r}$") +axes[1].axvline(θ / np.pi, ls="--", lw=1.0, label=r"$\theta/\pi$") +axes[1].set_xlabel(r"frequency $\omega/\pi$") +axes[1].set_ylabel(r"$f_{11}(\omega)$") +axes[1].legend(frameon=False) + +plt.tight_layout() +plt.show() +``` + +When $r$ is close to 1, the autocorrelation oscillates slowly and the spectrum has a sharp peak near $\theta$. + +When $r$ is smaller, oscillations die out quickly and the spectrum is flatter. + +### How shock structure shapes the spectrum + +Even with the same transition matrix, different shock covariance structures produce different spectral shapes. + +Here we fix $r = 0.9$ and vary the correlation between the two shocks. + +```{code-cell} ipython3 +r_fixed = 0.9 +A_fixed = rotation_contraction(r_fixed, θ) +corr_values = [-0.9, 0.0, 0.9] + +fig, ax = plt.subplots(figsize=(9, 4)) +for corr in corr_values: + V_corr = np.array([[1.0, corr], [corr, 1.0]]) + F = spectral_density_var1(A_fixed, V_corr, ω_grid) + f11 = np.real(F[:, 0, 0]) + f11_norm = f11 / np.trapz(f11, ω_grid / np.pi) + ax.plot(ω_grid / np.pi, f11_norm, lw=1.8, label=fr'$\rho = {corr}$') + +ax.axvline(θ / np.pi, ls='--', lw=1.0, color='gray') +ax.set_xlabel(r'frequency $\omega/\pi$') +ax.set_ylabel('normalized spectrum') +ax.legend(frameon=False) +plt.show() +``` + +The peak location is unchanged, but the peak height depends on the shock correlation. + +This illustrates that eigenvalues alone do not determine the full spectral shape. + +### Complex roots: an oscillatory mode can be hidden + +Complex roots are not sufficient for a visible peak in the spectrum of every observed series. + +Even if the state vector contains an oscillatory mode, a variable can be dominated by a non-oscillatory component. + +The next example combines a rotation–contraction block with a very persistent real root, and then looks at a mixture that is dominated by the persistent component. 
+ +```{code-cell} ipython3 +A_osc = rotation_contraction(0.95, θ) +A = np.block([ + [A_osc, np.zeros((2, 1))], + [np.zeros((1, 2)), np.array([[0.99]])] +]) + +# shocks hit the persistent component much more strongly +V = np.diag([1.0, 1.0, 50.0]) + +ω_grid_big = np.linspace(1e-3, np.pi - 1e-3, 1200) +F = spectral_density_var1(A, V, ω_grid_big) + +x_grid = ω_grid_big / np.pi +f_y1 = np.real(F[:, 0, 0]) + +b = np.array([0.05, 0.0, 1.0]) +f_mix = spectrum_of_linear_combination(F, b) + +f_y1_norm = f_y1 / np.trapz(f_y1, x_grid) +f_mix_norm = f_mix / np.trapz(f_mix, x_grid) + +fig, ax = plt.subplots(figsize=(9, 4)) +ax.plot(x_grid, f_y1_norm, lw=1.8, label=r"$y_1$") +ax.plot(x_grid, f_mix_norm, lw=1.8, label=r"$x = 0.05\,y_1 + y_3$") +ax.set_xlabel(r"frequency $\omega/\pi$") +ax.set_ylabel("normalized spectrum") +ax.legend(frameon=False) +plt.show() +``` + +Here the oscillatory mode is still present (the $y_1$ spectrum peaks away from zero), but the mixture $x$ is dominated by the near-unit root and hence by very low frequencies. + +### Real roots: a peak from mixing shocks + +Chow also constructs examples where all roots are real and positive yet a linear combination displays a local spectral peak. + +The mechanism is that cross-correlation in shocks can generate cyclical-looking behavior. + +Here is a close analog of Chow’s two-root illustration. + +```{code-cell} ipython3 +A = np.diag([0.1, 0.9]) +V = np.array([[1.0, 0.8], [0.8, 1.0]]) +b = np.array([1.0, -0.01]) + +F = spectral_density_var1(A, V, ω_grid) +f_x = spectrum_of_linear_combination(F, b) +imax = np.argmax(f_x) +ω_star = ω_grid[imax] +period_star = 2 * np.pi / ω_star + +fig, ax = plt.subplots(figsize=(9, 4)) +ax.plot(ω_grid / np.pi, f_x) +ax.scatter([ω_star / np.pi], [f_x[imax]], zorder=3) +ax.set_xlabel(r"frequency $\omega/\pi$") +ax.set_ylabel(r"$f_x(\omega)$") +plt.show() +print(f"peak period ≈ {period_star:.1f}") +``` + +The lesson is the same as Chow’s: in multivariate stochastic systems, “cycle-like” spectra are shaped not only by eigenvalues, but also by how shocks enter ($V$) and how variables combine (the analogue of Chow’s eigenvector matrix). + +## A calibrated model in the frequency domain + +Chow and Levitan {cite}`ChowLevitan1969` use the frequency-domain objects from {cite}`Chow1968` to study a calibrated annual macroeconometric model. + +They work with five annual aggregates + +- $y_1 = C$ (consumption), +- $y_2 = I_1$ (equipment plus inventories), +- $y_3 = I_2$ (construction), +- $y_4 = R_a$ (long rate), +- $y_5 = Y_1 = C + I_1 + I_2$ (private-domestic gnp), + +and add $y_6 = y_{1,t-1}$ to rewrite the original system in first-order form. + +Throughout this section, frequency is measured in cycles per year, $f = \omega/2\pi \in [0, 1/2]$. + +Following the paper, we normalize each spectrum to have area 1 over $[0, 1/2]$ so plots compare shape rather than scale. + +Our goal is to reconstruct the transition matrix $A$ and then compute and interpret the model-implied spectra, gains/coherences, and phase differences. + +### The cycle subsystem + +The paper starts from a reduced form with exogenous inputs, + +```{math} +:label: chow_reduced_full + +y_t = A y_{t-1} + C x_t + u_t. +``` + +To study cycles, they remove the deterministic component attributable to $x_t$ and focus on the zero-mean subsystem + +```{math} +:label: chow_cycle_system + +y_t = A y_{t-1} + u_t. +``` + +For second moments, the only additional ingredient is the covariance matrix $V = \mathbb E[u_t u_t^\top]$. 
+ +Chow and Levitan compute it from structural parameters via + +```{math} +:label: chow_v_from_structural + +V = M^{-1} \Sigma (M^{-1})^\top +``` + +where $\Sigma$ is the covariance of structural residuals and $M$ is the matrix of contemporaneous structural coefficients. + +Here we take $A$ and $V$ as given and ask what they imply for spectra and cross-spectra. + +### Reported shock covariance + +Chow and Levitan report the $6 \times 6$ reduced-form shock covariance matrix $V$ (scaled by $10^{-7}$): + +```{math} +:label: chow_V_matrix + +V = \begin{bmatrix} +8.250 & 7.290 & 2.137 & 2.277 & 17.68 & 0 \\ +7.290 & 7.135 & 1.992 & 2.165 & 16.42 & 0 \\ +2.137 & 1.992 & 0.618 & 0.451 & 4.746 & 0 \\ +2.277 & 2.165 & 0.451 & 1.511 & 4.895 & 0 \\ +17.68 & 16.42 & 4.746 & 4.895 & 38.84 & 0 \\ +0 & 0 & 0 & 0 & 0 & 0 +\end{bmatrix}. +``` + +The sixth row and column are zeros because $y_6$ is an identity (lagged $y_1$). + +### Reported eigenvalues + +The transition matrix $A$ has six characteristic roots: + +```{math} +:label: chow_eigenvalues + +\begin{aligned} +\lambda_1 &= 0.9999725, \quad \lambda_2 = 0.9999064, \quad \lambda_3 = 0.4838, \\ +\lambda_4 &= 0.0761 + 0.1125i, \quad \lambda_5 = 0.0761 - 0.1125i, \quad \lambda_6 = -0.00004142. +\end{aligned} +``` + +Two roots are near unity because two structural equations are in first differences. + +One root ($\lambda_6$) is theoretically zero because of the identity $y_5 = y_1 + y_2 + y_3$. + +The complex conjugate pair $\lambda_{4,5}$ has modulus $|\lambda_4| = \sqrt{0.0761^2 + 0.1125^2} \approx 0.136$. + +### Reported eigenvectors + +The right eigenvector matrix $B$ (columns are eigenvectors corresponding to $\lambda_1, \ldots, \lambda_6$): + +```{math} +:label: chow_B_matrix + +B = \begin{bmatrix} +-0.008 & 1.143 & 0.320 & 0.283+0.581i & 0.283-0.581i & 0.000 \\ +-0.000 & 0.013 & -0.586 & -2.151+0.742i & -2.151-0.742i & 2.241 \\ +-0.001 & 0.078 & 0.889 & -0.215+0.135i & -0.215-0.135i & 0.270 \\ +1.024 & 0.271 & 0.069 & -0.231+0.163i & -0.231-0.163i & 0.307 \\ +-0.009 & 1.235 & 0.623 & -2.082+1.468i & -2.082-1.468i & 2.766 \\ +-0.008 & 1.143 & 0.662 & 4.772+0.714i & 4.772-0.714i & -4.399 +\end{bmatrix}. +``` + +Together, $V$, $\{\lambda_i\}$, and $B$ are sufficient to compute all spectral and cross-spectral densities. + +### Reconstructing $A$ and computing $F(\omega)$ + +The paper reports $(\lambda, B, V)$, which is enough to reconstruct +$A = B \, \mathrm{diag}(\lambda_1,\dots,\lambda_6)\, B^{-1}$ and then compute the model-implied spectral objects. 
+ +```{code-cell} ipython3 +λ = np.array([ + 0.9999725, 0.9999064, 0.4838, + 0.0761 + 0.1125j, 0.0761 - 0.1125j, -0.00004142 +], dtype=complex) + +B = np.array([ + [-0.008, 1.143, 0.320, 0.283+0.581j, 0.283-0.581j, 0.000], + [-0.000, 0.013, -0.586, -2.151+0.742j, -2.151-0.742j, 2.241], + [-0.001, 0.078, 0.889, -0.215+0.135j, -0.215-0.135j, 0.270], + [1.024, 0.271, 0.069, -0.231+0.163j, -0.231-0.163j, 0.307], + [-0.009, 1.235, 0.623, -2.082+1.468j, -2.082-1.468j, 2.766], + [-0.008, 1.143, 0.662, 4.772+0.714j, 4.772-0.714j, -4.399] +], dtype=complex) + +V = np.array([ + [8.250, 7.290, 2.137, 2.277, 17.68, 0], + [7.290, 7.135, 1.992, 2.165, 16.42, 0], + [2.137, 1.992, 0.618, 0.451, 4.746, 0], + [2.277, 2.165, 0.451, 1.511, 4.895, 0], + [17.68, 16.42, 4.746, 4.895, 38.84, 0], + [0, 0, 0, 0, 0, 0] +]) * 1e-7 + +D_λ = np.diag(λ) +A_chow = B @ D_λ @ np.linalg.inv(B) +A_chow = np.real(A_chow) # drop tiny imaginary parts from reported rounding +print("eigenvalues of reconstructed A:") +print(np.linalg.eigvals(A_chow).round(6)) +``` + +### Canonical coordinates + +Chow's canonical transformation uses $z_t = B^{-1} y_t$, giving dynamics $z_t = D_\lambda z_{t-1} + e_t$. + +An algebraic detail: the closed form for $F(\omega)$ uses $A^\top$ (real transpose) rather than a conjugate transpose. + +Accordingly, the canonical shock covariance is + +```{math} +W = B^{-1} V (B^{-1})^\top. +``` + +```{code-cell} ipython3 +B_inv = np.linalg.inv(B) +W = B_inv @ V @ B_inv.T +print("diagonal of W:") +print(np.diag(W).round(10)) +``` + +### Spectral density via eigendecomposition + +Chow's closed-form formula for the spectral density matrix is + +```{math} +:label: chow_spectral_eigen + +F(\omega) += B \left[ \frac{w_{ij}}{(1 - \lambda_i e^{-i\omega})(1 - \lambda_j e^{i\omega})} \right] B^\top, +``` + +where $w_{ij}$ are elements of the canonical shock covariance $W$. + +```{code-cell} ipython3 +def spectral_density_chow(λ, B, W, ω_grid): + """Spectral density via Chow's eigendecomposition formula.""" + p = len(λ) + F = np.zeros((len(ω_grid), p, p), dtype=complex) + for k, ω in enumerate(ω_grid): + F_star = np.zeros((p, p), dtype=complex) + for i in range(p): + for j in range(p): + denom = (1 - λ[i] * np.exp(-1j * ω)) * (1 - λ[j] * np.exp(1j * ω)) + F_star[i, j] = W[i, j] / denom + F[k] = B @ F_star @ B.T + return F / (2 * np.pi) + +freq = np.linspace(1e-4, 0.5, 5000) # cycles/year in [0, 1/2] +ω_grid = 2 * np.pi * freq # radians in [0, π] +F_chow = spectral_density_chow(λ, B, W, ω_grid) +``` + +### Where is variance concentrated? + +Normalizing each spectrum to have unit area over $[0, 1/2]$ lets us compare shapes rather than scales. 
+ +```{code-cell} ipython3 +variable_names = ['$C$', '$I_1$', '$I_2$', '$R_a$', '$Y_1$'] +freq_ticks = [1/18, 1/9, 1/6, 1/4, 1/3, 1/2] +freq_labels = [r'$\frac{1}{18}$', r'$\frac{1}{9}$', r'$\frac{1}{6}$', + r'$\frac{1}{4}$', r'$\frac{1}{3}$', r'$\frac{1}{2}$'] + +def paper_frequency_axis(ax): + ax.set_xlim([0.0, 0.5]) + ax.set_xticks(freq_ticks) + ax.set_xticklabels(freq_labels) + ax.set_xlabel(r'frequency $\omega/2\pi$') + +# Normalized spectra (areas set to 1) +S = np.real(np.diagonal(F_chow, axis1=1, axis2=2))[:, :5] # y1..y5 +areas = np.trapz(S, freq, axis=0) +S_norm = S / areas +mask = freq >= 0.0 + +fig, axes = plt.subplots(1, 2, figsize=(10, 6)) + +# Figure I.1: consumption (log scale) +axes[0].plot(freq[mask], S_norm[mask, 0], lw=1.8) +axes[0].set_yscale('log') +paper_frequency_axis(axes[0]) +axes[0].set_ylabel(r'normalized $f_{11}(\omega)$') + +# Figure I.2: equipment + inventories (log scale) +axes[1].plot(freq[mask], S_norm[mask, 1], lw=1.8) +axes[1].set_yscale('log') +paper_frequency_axis(axes[1]) +axes[1].set_ylabel(r'normalized $f_{22}(\omega)$') + +plt.tight_layout() +plt.show() + +i_peak = np.argmax(S_norm[mask, 1]) +f_peak = freq[mask][i_peak] +print(f"Peak within [1/18, 1/2]: frequency ≈ {f_peak:.3f} cycles/year, period ≈ {1/f_peak:.2f} years.") +``` + +Both spectra are dominated by very low frequencies, reflecting the near-unit eigenvalues. + +This is the "typical spectral shape" of macroeconomic time series. + +(These patterns match Figures I.1–I.2 of {cite}`ChowLevitan1969`.) + +### How variables move together across frequencies + +Beyond univariate spectra, we can ask how pairs of variables covary at each frequency. + +The **cross-spectrum** $f_{ij}(\omega) = c_{ij}(\omega) - i \cdot q_{ij}(\omega)$ decomposes into the cospectrum $c_{ij}$ and the quadrature spectrum $q_{ij}$. + +The **cross-amplitude** is $g_{ij}(\omega) = |f_{ij}(\omega)| = \sqrt{c_{ij}^2 + q_{ij}^2}$. + +The **squared coherence** measures linear association at frequency $\omega$: + +```{math} +:label: chow_coherence + +R^2_{ij}(\omega) = \frac{|f_{ij}(\omega)|^2}{f_{ii}(\omega) f_{jj}(\omega)} \in [0, 1]. +``` + +The **gain** is the frequency-response coefficient when regressing $y_i$ on $y_j$: + +```{math} +:label: chow_gain + +G_{ij}(\omega) = \frac{|f_{ij}(\omega)|}{f_{jj}(\omega)}. +``` + +The **phase** captures lead-lag relationships (in radians): + +```{math} +:label: chow_phase + +\Delta_{ij}(\omega) = \tan^{-1}\left( \frac{q_{ij}(\omega)}{c_{ij}(\omega)} \right). +``` + +```{code-cell} ipython3 +def cross_spectral_measures(F, i, j): + """Compute coherence, gain (y_i on y_j), and phase between variables i and j.""" + f_ij = F[:, i, j] + f_ii, f_jj = np.real(F[:, i, i]), np.real(F[:, j, j]) + g_ij = np.abs(f_ij) + coherence = (g_ij**2) / (f_ii * f_jj) + gain = g_ij / f_jj + phase = np.arctan2(-np.imag(f_ij), np.real(f_ij)) + return coherence, gain, phase +``` + +We now plot gain and coherence as in Figures II.1-II.3 of {cite}`ChowLevitan1969`. 
+ +```{code-cell} ipython3 +gnp_idx = 4 + +fig, axes = plt.subplots(1, 3, figsize=(14, 6)) + +for idx, var_idx in enumerate([0, 1, 2]): + coherence, gain, phase = cross_spectral_measures(F_chow, var_idx, gnp_idx) + ax = axes[idx] + + ax.plot(freq[mask], coherence[mask], + lw=1.8, label=rf'$R^2_{{{var_idx+1}5}}(\omega)$') + ax.plot(freq[mask], gain[mask], + lw=1.8, label=rf'$G_{{{var_idx+1}5}}(\omega)$') + + paper_frequency_axis(ax) + ax.set_ylim([0, 1.0]) + ax.set_ylabel('gain, coherence') + ax.legend(frameon=False, loc='best') + +plt.tight_layout() +plt.show() +``` + +Coherence is high at low frequencies for all three components, meaning long-run movements track output closely. + +Gains differ: consumption smooths (gain below 1), while investment responds more strongly at higher frequencies. + +(These patterns match Figures II.1-II.3 of {cite}`ChowLevitan1969`.) + +### Lead-lag relationships + +The phase tells us which variable leads at each frequency. + +Positive phase means output leads the component; negative phase means the component leads output. + +```{code-cell} ipython3 +fig, ax = plt.subplots(figsize=(8, 6)) + +labels = [r'$\psi_{15}(\omega)/2\pi$', r'$\psi_{25}(\omega)/2\pi$', + r'$\psi_{35}(\omega)/2\pi$', r'$\psi_{45}(\omega)/2\pi$'] + +for var_idx in range(4): + coherence, gain, phase = cross_spectral_measures(F_chow, var_idx, gnp_idx) + phase_cycles = phase / (2 * np.pi) + ax.plot(freq[mask], phase_cycles[mask], lw=1.8, label=labels[var_idx]) + +ax.axhline(0, lw=0.8) +paper_frequency_axis(ax) +ax.set_ylabel('phase difference in cycles') +ax.set_ylim([-0.25, 0.25]) +ax.set_yticks([-0.25, -0.20, -0.15, -0.10, -0.05, 0, 0.05, 0.10, 0.15, 0.20, 0.25]) +ax.legend(frameon=False, fontsize=9) +plt.tight_layout() +plt.show() +``` + +At business-cycle frequencies, consumption tends to lag output while equipment and inventories tend to lead. + +The interest rate is roughly coincident. + +(This matches Figure III of {cite}`ChowLevitan1969`.) + +### Building blocks of spectral shape + +Each eigenvalue contributes a characteristic spectral shape through the **scalar kernel** + +```{math} +:label: chow_scalar_kernel + +g_i(\omega) = \frac{1 - |\lambda_i|^2}{|1 - \lambda_i e^{-i\omega}|^2} = \frac{1 - |\lambda_i|^2}{1 + |\lambda_i|^2 - 2 \text{Re}(\lambda_i) \cos\omega + 2 \text{Im}(\lambda_i) \sin\omega}. +``` + +For real $\lambda_i$, this simplifies to + +```{math} +g_i(\omega) = \frac{1 - \lambda_i^2}{1 + \lambda_i^2 - 2\lambda_i \cos\omega}. +``` + +Each observable spectral density is a linear combination of these kernels (plus cross-terms). + +```{code-cell} ipython3 +def scalar_kernel(λ_i, ω_grid): + """Chow's scalar spectral kernel g_i(ω).""" + λ_i = complex(λ_i) + mod_sq = np.abs(λ_i)**2 + return np.array([(1 - mod_sq) / np.abs(1 - λ_i * np.exp(-1j * ω))**2 for ω in ω_grid]) + +fig, ax = plt.subplots(figsize=(10, 5)) +for i, λ_i in enumerate(λ[:4]): + if np.abs(λ_i) > 0.01: + g_i = scalar_kernel(λ_i, ω_grid) + label = f'$\\lambda_{i+1}$ = {λ_i:.4f}' if np.isreal(λ_i) else f'$\\lambda_{i+1}$ = {λ_i:.3f}' + ax.semilogy(freq, g_i, label=label, lw=1.5) +ax.set_xlabel(r'frequency $\omega/2\pi$') +ax.set_ylabel('$g_i(\\omega)$') +ax.set_xlim([1/18, 0.5]) +ax.set_xticks(freq_ticks) +ax.set_xticklabels(freq_labels) +ax.legend(frameon=False) +plt.show() +``` + +Near-unit eigenvalues produce kernels sharply peaked at low frequencies. + +Smaller eigenvalues produce flatter kernels. + +The complex pair ($\lambda_{4,5}$) has such small modulus that its kernel is nearly flat. 
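To quantify this, we can compare the largest and smallest values of each kernel over the frequency grid (a simple summary of our own, not a statistic reported in the paper):

```{code-cell} ipython3
for i, λ_i in enumerate(λ):
    if np.abs(λ_i) > 0.01:   # skip the root that is numerically zero
        g_i = scalar_kernel(λ_i, ω_grid)
        print(f"|λ_{i+1}| = {np.abs(λ_i):.3f}: "
              f"max g_i(ω) / min g_i(ω) ≈ {g_i.max() / g_i.min():,.1f}")
```

The near-unit roots vary by many orders of magnitude across frequencies, while the kernel of the complex pair is close to flat.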
+ +### Why the spectra look the way they do + +The two near-unit eigenvalues generate strong low-frequency power. + +The moderate eigenvalue ($\lambda_3 \approx 0.48$) contributes a flatter component. + +The complex pair has small modulus ($|\lambda_{4,5}| \approx 0.136$), so it cannot generate a pronounced interior peak. + +The near-zero eigenvalue reflects the accounting identity $Y_1 = C + I_1 + I_2$. + +This illustrates Chow's message: eigenvalues guide intuition, but observed spectra also depend on how shocks excite the modes and how observables combine them. + +### Summary + +The calibrated model reveals three patterns: (1) most variance sits at very low frequencies due to near-unit eigenvalues; (2) consumption smooths while investment amplifies high-frequency movements; (3) consumption lags output at business-cycle frequencies while investment leads. + +## Wrap-up + +Chow {cite}`Chow1968` emphasizes two complementary diagnostics for linear macro models: how eigenvalues shape deterministic propagation, and how spectra summarize stochastic dynamics. + +Chow and Levitan {cite}`ChowLevitan1969` then show what these objects look like in a calibrated system: strong low-frequency power, frequency-dependent gains/coherences, and lead–lag relations that vary with the cycle length. + +To connect this to data, pair the model-implied objects here with the advanced lecture [Estimation of Spectra](https://python-advanced.quantecon.org/estspec.html#). + +## A structural view of acceleration + +Chow {cite}`Chow1968` provides a structural interpretation of how acceleration enters the model. + +The starting point is a stock-adjustment demand for capital: + +```{math} +:label: chow_stock_adj_struct + +s_{it} = a_i Y_t + b_i s_{i,t-1} +``` + +where $s_{it}$ is the desired stock of capital type $i$, $Y_t$ is aggregate output, and $(a_i, b_i)$ are parameters. + +Net investment is the stock change: + +```{math} +:label: chow_net_inv + +y^n_{it} = \Delta s_{it} = a_i \Delta Y_t + b_i y^n_{i,t-1}. +``` + +For gross investment with depreciation rate $\delta_i$: + +```{math} +:label: chow_gross_inv + +y_{it} = a_i [Y_t - (1-\delta_i) Y_{t-1}] + b_i y_{i,t-1}. +``` + +The parameters $(a_i, b_i, \delta_i)$ are the key "acceleration equation" parameters. + +The term $a_i \Delta Y_t$ is the acceleration effect: investment responds to *changes* in output, not just levels. + +This creates negative coefficients on lagged output levels, which in turn makes complex roots (and hence oscillatory components) possible in the characteristic equation. + +## Exercises + +```{exercise} +:label: chow_cycles_ex1 + +In the rotation-contraction example, fix $\theta$ and vary $r$ in a grid between $0.2$ and $0.99$. + +1. For each $r$, compute the frequency $\omega^*(r)$ that maximizes $f_{11}(\omega)$. +2. Plot $\omega^*(r)$ and the implied peak period $2\pi/\omega^*(r)$ as functions of $r$. + +How does the peak location behave as $r \uparrow 1$? 
+``` + +```{solution-start} chow_cycles_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +r_grid = np.linspace(0.2, 0.99, 50) +θ = np.pi / 3 +ω_grid_ex = np.linspace(1e-3, np.pi - 1e-3, 1000) +V_ex = np.eye(2) + +ω_star = np.zeros(len(r_grid)) +period_star = np.zeros(len(r_grid)) +for idx, r in enumerate(r_grid): + A_ex = rotation_contraction(r, θ) + F_ex = spectral_density_var1(A_ex, V_ex, ω_grid_ex) + f11 = np.real(F_ex[:, 0, 0]) + i_max = np.argmax(f11) + ω_star[idx] = ω_grid_ex[i_max] + period_star[idx] = 2 * np.pi / ω_star[idx] + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) +axes[0].plot(r_grid, ω_star / np.pi, lw=1.8) +axes[0].axhline(θ / np.pi, ls='--', lw=1.0, label=r'$\theta/\pi$') +axes[0].set_xlabel('$r$') +axes[0].set_ylabel(r'$\omega^*/\pi$') +axes[0].legend(frameon=False) + +axes[1].plot(r_grid, period_star, lw=1.8) +axes[1].axhline(2 * np.pi / θ, ls='--', lw=1.0, label=r'$2\pi/\theta$') +axes[1].set_xlabel('$r$') +axes[1].set_ylabel('peak period') +axes[1].legend(frameon=False) +plt.tight_layout() +plt.show() +``` + +As $r \uparrow 1$, the peak frequency converges to $\theta$ (the argument of the complex eigenvalue). + +This confirms Chow's insight: when the modulus is close to 1, the spectral peak aligns with the eigenvalue frequency. + +```{solution-end} +``` + +```{exercise} +:label: chow_cycles_ex2 + +In the "real roots but a peak" example, hold $A$ fixed and vary the shock correlation (the off-diagonal entry of $V$) between $0$ and $0.99$. + +When does the interior-frequency peak appear, and how does its location change? +``` + +```{solution-start} chow_cycles_ex2 +:class: dropdown +``` + +```{code-cell} ipython3 +A_ex2 = np.diag([0.1, 0.9]) +b_ex2 = np.array([1.0, -0.01]) +corr_grid = np.linspace(0, 0.99, 50) +peak_periods = [] +for corr in corr_grid: + V_ex2 = np.array([[1.0, corr], [corr, 1.0]]) + F_ex2 = spectral_density_var1(A_ex2, V_ex2, ω_grid_ex) + f_x = spectrum_of_linear_combination(F_ex2, b_ex2) + i_max = np.argmax(f_x) + if 5 < i_max < len(ω_grid_ex) - 5: + peak_periods.append(2 * np.pi / ω_grid_ex[i_max]) + else: + peak_periods.append(np.nan) + +fig, ax = plt.subplots(figsize=(8, 4)) +ax.plot(corr_grid, peak_periods, marker='o', lw=1.8, markersize=4) +ax.set_xlabel('shock correlation') +ax.set_ylabel('peak period') +plt.show() + +threshold_idx = np.where(~np.isnan(peak_periods))[0] +if len(threshold_idx) > 0: + print(f"interior peak appears when correlation ≥ {corr_grid[threshold_idx[0]]:.2f}") +``` + +The interior peak appears only when the shock correlation exceeds a threshold. + +This illustrates Chow's point that spectral peaks depend on the full system structure, not just eigenvalues. + +```{solution-end} +``` + +```{exercise} +:label: chow_cycles_ex3 + +Using the calibrated Chow-Levitan (1969) parameters, compute the autocovariance matrices $\Gamma_0, \Gamma_1, \ldots, \Gamma_{10}$ using: + +1. The recursion $\Gamma_k = A \Gamma_{k-1}$ with $\Gamma_0$ from the Lyapunov equation. +2. Chow's eigendecomposition formula $\Gamma_k = B D_\lambda^k \Gamma_0^* B^\top$ where $\Gamma_0^*$ is the canonical covariance. + +Verify that both methods give the same result. 
+``` + +```{solution-start} chow_cycles_ex3 +:class: dropdown +``` + +```{code-cell} ipython3 +from scipy.linalg import solve_discrete_lyapunov + +Γ_0_lyap = solve_discrete_lyapunov(A_chow, V) +Γ_recursion = [Γ_0_lyap] +for k in range(1, 11): + Γ_recursion.append(A_chow @ Γ_recursion[-1]) + +p = len(λ) +Γ_0_star = np.zeros((p, p), dtype=complex) +for i in range(p): + for j in range(p): + Γ_0_star[i, j] = W[i, j] / (1 - λ[i] * λ[j]) + +Γ_eigen = [] +for k in range(11): + D_k = np.diag(λ**k) + Γ_eigen.append(np.real(B @ D_k @ Γ_0_star @ B.T)) + +print("Comparison of Γ_5 (first 3x3 block):") +print("\nRecursion method:") +print(np.real(Γ_recursion[5][:3, :3]).round(10)) +print("\nEigendecomposition method:") +print(Γ_eigen[5][:3, :3].round(10)) +print("\nMax absolute difference:", np.max(np.abs(np.real(Γ_recursion[5]) - Γ_eigen[5]))) +``` + +Both methods produce essentially identical results, up to numerical precision. + +```{solution-end} +``` + +```{exercise} +:label: chow_cycles_ex4 + +Modify the Chow-Levitan model by changing $\lambda_3$ from $0.4838$ to $0.95$. + +1. Recompute the spectral densities. +2. How does this change affect the spectral shape for each variable? +3. What economic interpretation might correspond to this parameter change? +``` + +```{solution-start} chow_cycles_ex4 +:class: dropdown +``` + +```{code-cell} ipython3 +λ_modified = λ.copy() +λ_modified[2] = 0.95 +F_mod = spectral_density_chow(λ_modified, B, W, ω_grid) + +fig, axes = plt.subplots(2, 3, figsize=(14, 8)) +axes = axes.flatten() +var_labels = ["consumption", "equipment + inventories", "construction", "long rate", "output"] +for i in range(5): + f_orig = np.real(F_chow[:, i, i]) + f_mod = np.real(F_mod[:, i, i]) + f_orig_norm = f_orig / np.trapz(f_orig, freq) + f_mod_norm = f_mod / np.trapz(f_mod, freq) + axes[i].semilogy(freq, f_orig_norm, lw=1.5, label=r"original ($\lambda_3=0.48$)") + axes[i].semilogy(freq, f_mod_norm, lw=1.5, ls="--", label=r"modified ($\lambda_3=0.95$)") + paper_frequency_axis(axes[i]) + axes[i].set_ylabel(rf"normalized $f_{{{i+1}{i+1}}}(\omega)$") + axes[i].text(0.03, 0.08, var_labels[i], transform=axes[i].transAxes) + axes[i].legend(frameon=False, fontsize=8) +axes[5].axis('off') +plt.tight_layout() +plt.show() +``` + +Increasing $\lambda_3$ from 0.48 to 0.95 adds more persistence to the system. + +The spectral densities show increased power at low frequencies. + +Economically, this could correspond to stronger persistence in the propagation of shocks—perhaps due to slower adjustment speeds in investment or consumption behavior. 
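As a rough quantitative complement (our own addition, using an arbitrary low-frequency cutoff), we can compare how much of each normalized spectrum lies below $1/9$ cycles per year in the two cases:

```{code-cell} ipython3
cutoff = 1/9              # an arbitrary cutoff chosen for illustration
low = freq <= cutoff
for i in range(5):
    f_orig = np.real(F_chow[:, i, i])
    f_mod = np.real(F_mod[:, i, i])
    share_orig = np.trapz(f_orig[low], freq[low]) / np.trapz(f_orig, freq)
    share_mod = np.trapz(f_mod[low], freq[low]) / np.trapz(f_mod, freq)
    print(f"{var_labels[i]:<25s} share below 1/9 cycles/year: "
          f"{share_orig:.2f} (original) vs {share_mod:.2f} (modified)")
```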
+ +```{solution-end} +``` From 24063c178c9ae59868f7dc1c49eee27e755f10c9 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Sat, 7 Feb 2026 13:06:22 +1100 Subject: [PATCH 02/37] updates --- lectures/chow_business_cycles.md | 907 ++++++++++++++++++++++--------- 1 file changed, 646 insertions(+), 261 deletions(-) diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 9e99b0478..393119345 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -4,7 +4,7 @@ jupytext: extension: .md format_name: myst format_version: 0.13 - jupytext_version: 1.17.2 + jupytext_version: 1.17.1 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -31,15 +31,19 @@ kernelspec: This lecture studies two classic papers by Gregory Chow on business cycles in linear dynamic models: -- {cite}`Chow1968`: why acceleration-type investment behavior matters for oscillations, and how to read stochastic dynamics through autocovariances and spectral densities -- {cite}`ChowLevitan1969`: how those tools look when applied to a calibrated macroeconometric model of the U.S. economy +- {cite}`Chow1968`: empirical evidence for the acceleration principle, why acceleration enables oscillations, and when spectral peaks arise in stochastic systems +- {cite}`ChowLevitan1969`: spectral analysis of a calibrated U.S. macroeconometric model, showing gains, coherences, and lead-lag patterns -These papers sit right at the intersection of three themes in this lecture series: +These papers connect ideas in the following lectures: - The multiplier–accelerator mechanism in {doc}`samuelson` - Linear stochastic difference equations and autocovariances in {doc}`linear_models` - Eigenmodes of multivariate dynamics in {doc}`var_dmd` -- Fourier ideas in {doc}`eig_circulant` (and, for empirical estimation, the advanced lecture [Estimation of Spectra](https://python-advanced.quantecon.org/estspec.html#)) +- Fourier ideas in {doc}`eig_circulant` (and, for empirical estimation, the advanced lecture {doc}`advanced:estspec`) + +{cite:t}`Chow1968` builds on earlier empirical work testing the acceleration principle on U.S. investment data. + +We begin with that empirical foundation before developing the theoretical framework. We will keep coming back to three ideas: @@ -47,8 +51,90 @@ We will keep coming back to three ideas: - In stochastic models, a "cycle" shows up as a local peak in a (univariate) spectral density. - Spectral peaks depend on eigenvalues, but also on how shocks enter (the covariance matrix $V$) and on how observables load on eigenmodes. +In this lecture, we start with Chow's empirical evidence for the acceleration principle, then introduce the VAR(1) framework and spectral analysis tools. + +Next, we show why acceleration creates complex roots that enable oscillations, and derive Chow's conditions for spectral peaks in the Hansen-Samuelson model. + +We then present Chow's striking counterexample: real roots *can* produce spectral peaks in general multivariate systems. + +Finally, we apply these tools to the calibrated Chow-Levitan model to see what model-implied spectra look like in practice. + +Let's start with some standard imports + +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +``` + +(empirical_section)= +## Empirical foundation for the acceleration principle + +{cite:t}`Chow1968` opens by reviewing empirical evidence for the acceleration principle from earlier macroeconometric work. 
+ +Using annual observations for 1931--40 and 1948--63, Chow tested the acceleration equation on three investment categories: + +- new construction +- gross private domestic investment in producers' durable equipment combined with change in business inventories +- the last two variables separately + +In each case, when the regression included both $Y_t$ and $Y_{t-1}$ (where $Y$ is gross national product minus taxes net of transfers), the coefficient on $Y_{t-1}$ was of *opposite sign* and slightly smaller in absolute value than the coefficient on $Y_t$. + +Equivalently, when expressed in terms of $\Delta Y_t$ and $Y_{t-1}$, the coefficient on $Y_{t-1}$ was a small fraction of the coefficient on $\Delta Y_t$. + +### An example: Automobile demand + +Chow presents a clean illustration using data on net investment in automobiles from his earlier work on automobile demand. + +Using annual data for 1922--41 and 1948--57, he estimates by least squares: + +```{math} +:label: chow_auto_eq5 + +y_t^n = \underset{(0.0022)}{0.0155} Y_t \underset{(0.0020)}{- 0.0144} Y_{t-1} \underset{(0.0056)}{- 0.0239} p_t \underset{(0.0040)}{+ 0.0199} p_{t-1} + \underset{(0.101)}{0.351} y_{t-1}^n + \text{const.} +``` + +where: +- $Y_t$ is real disposable personal income per capita +- $p_t$ is a relative price index for automobiles +- $y_t^n$ is per capita net investment in passenger automobiles +- standard errors appear in parentheses + +The key observation: the coefficients on $Y_{t-1}$ and $p_{t-1}$ are *the negatives* of the coefficients on $Y_t$ and $p_t$. + +This pattern is exactly what the acceleration principle predicts. + +### From stock adjustment to acceleration + +The empirical support for acceleration should not be surprising once we accept a stock-adjustment demand equation for capital: + +```{math} +:label: chow_stock_adj_emp + +s_{it} = a_i Y_t + b_i s_{i,t-1} +``` + +where $s_{it}$ is the stock of capital good $i$. + +The acceleration equation {eq}`chow_auto_eq5` is essentially the *first difference* of {eq}`chow_stock_adj_emp`. + +Net investment is the change in stock, $y_{it}^n = \Delta s_{it}$, and differencing {eq}`chow_stock_adj_emp` gives: + +```{math} +:label: chow_acc_from_stock + +y_{it}^n = a_i \Delta Y_t + b_i y_{i,t-1}^n +``` + +The coefficients on $Y_t$ and $Y_{t-1}$ in the level form are $a_i$ and $-a_i(1-b_i)$ respectively. + +They are opposite in sign and similar in magnitude when $b_i$ is not too far from unity. + +This connection between stock adjustment and acceleration is central to Chow's argument about why acceleration matters for business cycles. + ## A linear system with shocks +To study business cycles formally, we need a framework that combines the deterministic dynamics (captured by the transition matrix $A$) with random shocks. + Both papers analyze (or reduce to) a first-order linear stochastic system ```{math} @@ -63,7 +149,7 @@ y_t = A y_{t-1} + u_t, \mathbb E[u_t u_{t-k}^\top] = 0 \ (k \neq 0). ``` -When the eigenvalues of $A$ are strictly inside the unit circle, the process is (covariance) stationary and its autocovariances exist. +When the eigenvalues of $A$ are strictly inside the unit circle, the process is covariance stationary and its autocovariances exist. In the notation of {doc}`linear_models`, this is the same stability condition that guarantees a unique solution to a discrete Lyapunov equation. @@ -87,6 +173,108 @@ Standard calculations (also derived in {cite}`Chow1968`) give the recursion The second equation is the discrete Lyapunov equation for $\Gamma_0$. 
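To see {eq}`chow_autocov_rec` in action, here is a minimal sketch with an arbitrary stable $2 \times 2$ example of our own (not taken from either paper): iterating the map $\Gamma \mapsto A \Gamma A^\top + V$ converges to $\Gamma_0$, and the recursion then delivers the higher-order autocovariances.

```{code-cell} ipython3
# an illustrative stable A and V (chosen only for this sketch)
A_toy = np.array([[0.5, 0.2],
                  [-0.3, 0.8]])
V_toy = np.array([[1.0, 0.2],
                  [0.2, 0.5]])

# iterate Γ <- A Γ A' + V; the iteration converges because the
# eigenvalues of A_toy lie strictly inside the unit circle
Γ0_toy = np.zeros_like(V_toy)
for _ in range(500):
    Γ0_toy = A_toy @ Γ0_toy @ A_toy.T + V_toy

# a higher-order autocovariance from Γ_k = A Γ_{k-1}
Γ1_toy = A_toy @ Γ0_toy

print("Γ_0 =\n", Γ0_toy.round(4))
print("Γ_1 =\n", Γ1_toy.round(4))
print("Lyapunov residual:",
      np.max(np.abs(A_toy @ Γ0_toy @ A_toy.T + V_toy - Γ0_toy)))
```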
+### Why stochastic dynamics matter + +{cite:t}`Chow1968` motivates the stochastic analysis with a quote from Ragnar Frisch: + +> The examples we have discussed ... show that when an [deterministic] economic system gives rise to oscillations, these will most frequently be damped. But in reality the cycles ... are generally not damped. How can the maintenance of the swings be explained? ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ... +> +> Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data. +> +> — Ragnar Frisch (1933) + +Chow's main insight is that oscillations in the deterministic system are *neither necessary nor sufficient* for producing "cycles" in the stochastic system. + +We have to bring the stochastic element into the picture. + +We will show that even when eigenvalues are real (no deterministic oscillations), the stochastic system can exhibit cyclical patterns in its autocovariances and spectral densities. + +### Autocovariances in terms of eigenvalues + +Let $\lambda_1, \ldots, \lambda_p$ be the (possibly complex) eigenvalues of $A$, assumed distinct, and let $B$ be the matrix whose columns are the corresponding right eigenvectors: + +```{math} +:label: chow_eigen_decomp + +A B = B D_\lambda, \quad \text{or equivalently} \quad A = B D_\lambda B^{-1} +``` + +where $D_\lambda = \text{diag}(\lambda_1, \ldots, \lambda_p)$. + +Define canonical variables $z_t = B^{-1} y_t$. +These satisfy the decoupled dynamics + +```{math} +:label: chow_canonical_dynamics + +z_t = D_\lambda z_{t-1} + \varepsilon_t +``` + +where $\varepsilon_t = B^{-1} u_t$ has covariance matrix $W = B^{-1} V (B^{-1})^\top$. + +The autocovariance matrix of the canonical variables, denoted $\Gamma_k^*$, satisfies + +```{math} +:label: chow_canonical_autocov + +\Gamma_k^* = D_\lambda^k \Gamma_0^*, \quad k = 1, 2, 3, \ldots +``` + +and + +```{math} +:label: chow_gamma0_star + +\Gamma_0^* = \left( \frac{w_{ij}}{1 - \lambda_i \lambda_j} \right) +``` + +where $w_{ij}$ are elements of $W$. + +The autocovariance matrices of the original variables are then + +```{math} +:label: chow_autocov_eigen + +\Gamma_k = B \Gamma_k^* B^\top = B D_\lambda^k \Gamma_0^* B^\top, \quad k = 0, 1, 2, \ldots +``` + +The scalar autocovariance $\gamma_{ij,k} = \mathbb{E}[y_{it} y_{j,t-k}]$ is a *linear combination* of powers of the eigenvalues: + +```{math} +:label: chow_scalar_autocov + +\gamma_{ij,k} = \sum_m \sum_n b_{im} b_{jn} \gamma^*_{mn,0} \lambda_m^k = \sum_m d_{ij,m} \lambda_m^k +``` + +Compare this to the deterministic time path from initial condition $y_0$: + +```{math} +:label: chow_det_path + +y_{it} = \sum_j b_{ij} z_{j0} \lambda_j^t +``` + +Both the autocovariance function {eq}`chow_scalar_autocov` and the deterministic path {eq}`chow_det_path` are linear combinations of $\lambda_m^k$ (or $\lambda_j^t$). + +This formal resemblance is important: the coefficients differ (depending on initial conditions vs. shock covariances), but the role of eigenvalues is analogous. 
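As a numerical check of {eq}`chow_scalar_autocov` (again with an illustrative $2 \times 2$ example rather than anything from the paper), the following sketch computes the weights $d_{11,m}$ and confirms that $\sum_m d_{11,m} \lambda_m^k$ reproduces the autocovariance of the first variable obtained from the Lyapunov equation and the recursion $\Gamma_k = A \Gamma_{k-1}$.

```{code-cell} ipython3
from scipy.linalg import solve_discrete_lyapunov

# an illustrative stable A and V, chosen only for this check
A_toy = np.array([[0.5, 0.2],
                  [-0.3, 0.8]])
V_toy = np.array([[1.0, 0.2],
                  [0.2, 0.5]])

λ_toy, B_toy = np.linalg.eig(A_toy)
B_inv_toy = np.linalg.inv(B_toy)

W_toy = B_inv_toy @ V_toy @ B_inv_toy.T            # canonical shock covariance
Γ0_star = W_toy / (1 - np.outer(λ_toy, λ_toy))     # entries w_ij / (1 - λ_i λ_j)

# weights d_{11,m} = b_{1m} Σ_n γ*_{mn,0} b_{1n}
d_11 = B_toy[0, :] * (Γ0_star @ B_toy[0, :])

# ground truth from the Lyapunov equation and the recursion
Γ = solve_discrete_lyapunov(A_toy, V_toy)
for k in range(6):
    eigen_sum = np.real(np.sum(d_11 * λ_toy**k))
    print(f"k={k}:  recursion {Γ[0, 0]: .5f}   eigenvalue sum {eigen_sum: .5f}")
    Γ = A_toy @ Γ
```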
+ +### Complex roots and damped oscillations + +When eigenvalues come in complex conjugate pairs $\lambda = r e^{\pm i\theta}$ with $r < 1$, their contribution to the autocovariance function is a **damped cosine**: + +```{math} +:label: chow_damped_cosine + +2 s r^k \cos(\theta k + \phi) +``` + +for appropriate amplitude $s$ and phase $\phi$ determined by the eigenvector loadings. + +In the deterministic model, such complex roots generate damped oscillatory time paths. +In the stochastic model, they generate damped oscillatory autocovariance functions. + +It is in this sense that deterministic oscillations could be "maintained" in the stochastic model—but as we will see, the connection between eigenvalues and spectral peaks is more subtle than this suggests. + ## From autocovariances to spectra Chow’s key step is to translate the autocovariance sequence $\{\Gamma_k\}$ into a frequency-domain object. @@ -126,12 +314,9 @@ The advanced lecture {doc}`advanced:estspec` explains how to estimate $F(\omega) Here we focus on the model-implied spectrum. -We will use the following imports and helper functions throughout the lecture. +We will use the following helper functions throughout the lecture. ```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt - def spectral_density_var1(A, V, ω_grid): """Spectral density matrix for VAR(1): y_t = A y_{t-1} + u_t.""" A, V = np.asarray(A), np.asarray(V) @@ -149,7 +334,7 @@ def spectrum_of_linear_combination(F, b): return np.array([np.real((b.T @ F[k] @ b).item()) for k in range(F.shape[0])]) def simulate_var1(A, V, T, burn=200, seed=1234): - """Simulate y_t = A y_{t-1} + u_t with u_t ~ N(0, V).""" + r"""Simulate y_t = A y_{t-1} + u_t with u_t \sim N(0, V).""" rng = np.random.default_rng(seed) A, V = np.asarray(A), np.asarray(V) n = A.shape[0] @@ -172,47 +357,25 @@ def sample_autocorrelation(x, max_lag): ## Deterministic propagation and acceleration -Chow {cite}`Chow1968` begins with a clean deterministic question: - -> If you build a macro model using only standard demand equations with simple distributed lags, can the system generate sustained oscillations without acceleration? - -He shows that, under natural sign restrictions, the answer is no. +Now we have the tools and the motivation to analyze spectral peaks in linear stochastic systems. -### A demand system without acceleration +We first go back to the deterministic system to understand why acceleration matters for generating oscillations in the first place. -Consider a system where each component $y_{it}$ responds to aggregate output $Y_t$ and its own lag: +Before analyzing spectral peaks, we need to understand why acceleration matters for generating oscillations in the first place. -```{math} -:label: chow_simple_demand +{cite:t}`Chow1968` asks a question in the deterministic setup: if we build a macro model using only standard demand equations with simple distributed lags, can the system generate sustained oscillations? -y_{it} = a_i Y_t + b_i y_{i,t-1}, -\qquad -Y_t = \sum_i y_{it}, -\qquad -a_i > 0,\; b_i > 0. -``` - -Chow shows that the implied transition matrix has real characteristic roots, and that if $\sum_i a_i < 1$ these roots are also positive. - -In that case, solutions are linear combinations of decaying exponentials without persistent sign-switching components, so there are no “business-cycle-like” oscillations driven purely by internal propagation. 
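The algebra behind this is short: for a second-order law of motion $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2}$, the characteristic equation is

```{math}
\lambda^2 - a_1 \lambda - a_2 = 0 ,
```

whose roots are complex exactly when $a_1^2 + 4 a_2 < 0$, which cannot happen when both coefficients are positive. With acceleration we have $a_1 = c + v$ and $a_2 = -v$, so the roots are complex whenever $(c + v)^2 < 4v$, and the product of the complex pair equals $v$, giving modulus $|\lambda| = \sqrt{v}$.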
- -### What acceleration changes - -For investment (and some durables), Chow argues that a more relevant starting point is a *stock adjustment* equation (demand for a stock), e.g. - -```{math} -:label: chow_stock_adj +He shows that, under natural sign restrictions, the answer is no. -s_{it} = \alpha_i Y_t + \beta_i s_{i,t-1}. -``` +As we saw in the {ref}`empirical foundation `, stock-adjustment demand for durable goods leads to investment equations where the coefficient on $Y_{t-1}$ is negative, i.e., the **acceleration effect**. -If flow investment is proportional to the change in the desired stock, differencing introduces terms in $\Delta Y_t$. +This negative coefficient is what makes complex roots possible in the characteristic equation. -That "acceleration" structure creates negative coefficients (in lagged levels), which makes complex roots possible. +Without it, Chow proves that demand systems with only positive coefficients have real positive roots, and hence no oscillatory dynamics. -This connects directly to {doc}`samuelson`, where acceleration is the key ingredient that can generate damped or persistent oscillations in a deterministic second-order difference equation. +The {doc}`samuelson` lecture explores this mechanism in detail through the Hansen-Samuelson multiplier-accelerator model. -To see the mechanism with minimal algebra, take the multiplier–accelerator law of motion +Here we briefly illustrate the effect. Take the multiplier–accelerator law of motion ```{math} Y_t = c Y_{t-1} + v (Y_{t-1} - Y_{t-2}), @@ -224,13 +387,16 @@ and rewrite it as a first-order system in $(Y_t, Y_{t-1})$. def samuelson_transition(c, v): return np.array([[c + v, -v], [1.0, 0.0]]) -c = 0.6 -v_values = (0.0, 0.8) -A_list = [samuelson_transition(c, v) for v in v_values] +# Compare weak vs strong acceleration +# Weak: c=0.8, v=0.1 gives real roots (discriminant > 0) +# Strong: c=0.6, v=0.8 gives complex roots (discriminant < 0) +cases = [("weak acceleration", 0.8, 0.1), ("strong acceleration", 0.6, 0.8)] +A_list = [samuelson_transition(c, v) for _, c, v in cases] -for v, A in zip(v_values, A_list): +for (label, c, v), A in zip(cases, A_list): eig = np.linalg.eigvals(A) - print(f"v={v:.1f}, eigenvalues={eig}") + disc = (c + v)**2 - 4*v + print(f"{label}: c={c}, v={v}, discriminant={disc:.2f}, eigenvalues={eig}") # impulse responses from a one-time unit shock in Y T = 40 @@ -253,19 +419,19 @@ spectra = [] for A in A_list: F = spectral_density_var1(A, V, ω_grid) f11 = np.real(F[:, 0, 0]) - spectra.append(f11 / np.trapz(f11, freq)) + spectra.append(f11 / np.trapezoid(f11, freq)) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -axes[0].plot(range(T + 1), irfs[0], lw=1.8, label="no acceleration") -axes[0].plot(range(T + 1), irfs[1], lw=1.8, label="with acceleration") +axes[0].plot(range(T + 1), irfs[0], lw=2, label="weak acceleration (real roots)") +axes[0].plot(range(T + 1), irfs[1], lw=2, label="strong acceleration (complex roots)") axes[0].axhline(0.0, lw=0.8) axes[0].set_xlabel("time") axes[0].set_ylabel(r"$Y_t$") axes[0].legend(frameon=False) -axes[1].plot(freq, spectra[0], lw=1.8, label="no acceleration") -axes[1].plot(freq, spectra[1], lw=1.8, label="with acceleration") +axes[1].plot(freq, spectra[0], lw=2, label="weak acceleration (real roots)") +axes[1].plot(freq, spectra[1], lw=2, label="strong acceleration (complex roots)") axes[1].set_xlabel(r"frequency $\omega/2\pi$") axes[1].set_ylabel("normalized spectrum") axes[1].set_xlim([0.0, 0.5]) @@ -275,240 +441,448 @@ plt.tight_layout() plt.show() 
``` -The left panel shows that acceleration creates oscillatory impulse responses. +The left panel shows the contrast between weak and strong acceleration: with weak acceleration ($v=0.1$) the roots are real and the impulse response decays monotonically; with strong acceleration ($v=0.8$) the roots are complex and the impulse response oscillates. + +The right panel shows the corresponding spectral signatures. -The right panel shows the corresponding spectral signature: a peak at interior frequencies. +Complex roots produce a pronounced peak at interior frequencies—the spectral signature of business cycles. -### How the accelerator shifts the spectral peak +### How acceleration strength affects the spectrum -As we increase the accelerator $v$, the complex eigenvalues rotate further from the real axis, shifting the spectral peak to higher frequencies. +As we increase the accelerator $v$, the eigenvalues move further from the origin. + +For this model, the eigenvalue modulus is $|\lambda| = \sqrt{v}$, so the stability boundary is $v = 1$. ```{code-cell} ipython3 -v_grid = np.linspace(0.2, 1.2, 6) +v_grid = [0.2, 0.4, 0.6, 0.8, 0.95] # stable cases only c = 0.6 freq_fine = np.linspace(1e-4, 0.5, 2000) ω_fine = 2 * np.pi * freq_fine V_acc = np.array([[1.0, 0.0], [0.0, 0.0]]) +T_irf = 40 # periods for impulse response -fig, axes = plt.subplots(1, 2, figsize=(12, 4)) +fig = plt.figure(figsize=(12, 8)) +ax_eig = fig.add_subplot(2, 2, 1) +ax_spec = fig.add_subplot(2, 2, 2) +ax_irf = fig.add_subplot(2, 1, 2) # spans entire bottom row for v in v_grid: A = samuelson_transition(c, v) eig = np.linalg.eigvals(A) - F = spectral_density_var1(A, V_acc, ω_fine) - f11 = np.real(F[:, 0, 0]) - f11_norm = f11 / np.trapz(f11, freq_fine) - # plot eigenvalues - axes[0].scatter(eig.real, eig.imag, s=40, label=f'$v={v:.1f}$') + # eigenvalues (top left) + ax_eig.scatter(eig.real, eig.imag, s=40, label=f'$v={v}$') - # plot spectrum - axes[1].plot(freq_fine, f11_norm, lw=1.5, label=f'$v={v:.1f}$') + # spectrum (top right) + F = spectral_density_var1(A, V_acc, ω_fine) + f11 = np.real(F[:, 0, 0]) + f11_norm = f11 / np.trapezoid(f11, freq_fine) + ax_spec.plot(freq_fine, f11_norm, lw=2, label=f'$v={v}$') + + # impulse response (bottom row) + s = np.array([1.0, 0.0]) + irf = np.empty(T_irf + 1) + for t in range(T_irf + 1): + irf[t] = s[0] + s = A @ s + ax_irf.plot(range(T_irf + 1), irf, lw=2, label=f'$v={v}$') -# unit circle +# eigenvalue panel with unit circle θ_circle = np.linspace(0, 2*np.pi, 100) -axes[0].plot(np.cos(θ_circle), np.sin(θ_circle), 'k--', lw=0.8) -axes[0].set_xlabel('real part') -axes[0].set_ylabel('imaginary part') -axes[0].set_aspect('equal') -axes[0].legend(frameon=False, fontsize=8) - -axes[1].set_xlabel(r'frequency $\omega/2\pi$') -axes[1].set_ylabel('normalized spectrum') -axes[1].set_xlim([0, 0.5]) -axes[1].legend(frameon=False, fontsize=8) +ax_eig.plot(np.cos(θ_circle), np.sin(θ_circle), 'k--', lw=0.8, label='unit circle') +ax_eig.set_xlabel('real part') +ax_eig.set_ylabel('imaginary part') +ax_eig.set_aspect('equal') +ax_eig.legend(frameon=False, fontsize=8) + +# spectrum panel +ax_spec.set_xlabel(r'frequency $\omega/2\pi$') +ax_spec.set_ylabel('normalized spectrum') +ax_spec.set_xlim([0, 0.5]) +ax_spec.set_yscale('log') +ax_spec.legend(frameon=False, fontsize=8) + +# impulse response panel +ax_irf.axhline(0, lw=0.8, color='gray') +ax_irf.set_xlabel('time') +ax_irf.set_ylabel(r'$Y_t$') +ax_irf.legend(frameon=False, fontsize=8) plt.tight_layout() plt.show() ``` -Larger $v$ pushes the eigenvalues 
further off the real axis, shifting the spectral peak to higher frequencies. +As $v$ increases, eigenvalues approach the unit circle and the spectral peak becomes sharper. + +This illustrates Chow's main point: acceleration creates complex eigenvalues, which are necessary for oscillatory dynamics. -When $v$ is large enough that eigenvalues leave the unit circle, the system becomes explosive. +Without acceleration, the eigenvalues would be real and the impulse response would decay monotonically without oscillation. -## Spectral peaks are not just eigenvalues +With stronger acceleration (larger $v$), eigenvalues move closer to the unit circle, producing more persistent oscillations and a sharper spectral peak. -With shocks, the deterministic question ("does the system oscillate?") becomes: at which cycle lengths does the variance of $y_t$ concentrate? +The above examples show that complex roots *can* produce spectral peaks. -In this lecture, a "cycle" means a local peak in a univariate spectrum $f_{ii}(\omega)$. +But when exactly does this happen, and are complex roots *necessary*? -Chow's point in {cite}`Chow1968` is that eigenvalues help interpret spectra, but they do not determine peaks by themselves. +Chow answers these questions for the Hansen-Samuelson model. -Two extra ingredients matter: +## Spectral peaks in the Hansen-Samuelson model -- how shocks load on the eigenmodes (the covariance matrix $V$), -- how the variable of interest mixes those modes. +{cite:t}`Chow1968` provides a detailed spectral analysis of the Hansen-Samuelson multiplier-accelerator model. -The next simulations isolate these effects. +This analysis reveals exactly when complex roots produce spectral peaks, and establishes that in this specific model, complex roots are *necessary* for a peak. -### Complex roots: a peak and an oscillating autocorrelation +### The model as a first-order system -Take a stable “rotation–contraction” matrix +The second-order Hansen-Samuelson equation can be written as a first-order system: ```{math} -:label: chow_rot +:label: chow_hs_system -A = r -\begin{bmatrix} -\cos \theta & -\sin \theta \\ -\sin \theta & \cos \theta -\end{bmatrix}, -\qquad 0 < r < 1, +\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = +\begin{bmatrix} a_{11} & a_{12} \\ 1 & 0 \end{bmatrix} +\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + +\begin{bmatrix} u_{1t} \\ 0 \end{bmatrix} ``` -whose eigenvalues are $r e^{\pm i\theta}$. +where $y_{2t} = y_{1,t-1}$ is simply the lagged value of $y_{1t}$. -When $r$ is close to 1, the spectrum shows a pronounced peak near $\omega \approx \theta$. 
+This structure implies a special relationship among the autocovariances: -```{code-cell} ipython3 -def rotation_contraction(r, θ): - c, s = np.cos(θ), np.sin(θ) - return r * np.array([[c, -s], [s, c]]) +```{math} +:label: chow_hs_autocov_relation -θ = np.pi / 3 -r_values = (0.95, 0.4) -ω_grid = np.linspace(1e-3, np.pi - 1e-3, 800) -V = np.eye(2) +\gamma_{11,k} = \gamma_{22,k} = \gamma_{12,k-1} = \gamma_{21,k+1} +``` -acfs = [] -spectra = [] -for r in r_values: - A = rotation_contraction(r, θ) +Using the autocovariance recursion, Chow shows that this leads to the condition - y = simulate_var1(A, V, T=5000, burn=500, seed=1234) - acfs.append(sample_autocorrelation(y[:, 0], 40)) +```{math} +:label: chow_hs_condition53 - F = spectral_density_var1(A, V, ω_grid) - spectra.append(np.real(F[:, 0, 0])) +\gamma_{11,-1} = d_{11,1} \lambda_1^{-1} + d_{11,2} \lambda_2^{-1} = \gamma_{11,1} = d_{11,1} \lambda_1 + d_{11,2} \lambda_2 +``` + +which constrains the spectral density in a useful way. + +### The spectral density formula + +From equations {eq}`chow_scalar_autocov` and the scalar kernel $g_i(\omega) = (1 - \lambda_i^2)/(1 + \lambda_i^2 - 2\lambda_i \cos\omega)$, the spectral density of $y_{1t}$ is: + +```{math} +:label: chow_hs_spectral + +f_{11}(\omega) = d_{11,1} g_1(\omega) + d_{11,2} g_2(\omega) +``` + +which can be written in the combined form: + +```{math} +:label: chow_hs_spectral_combined + +f_{11}(\omega) = \frac{d_{11,1}(1 - \lambda_1^2)(1 + \lambda_2^2) + d_{11,2}(1 - \lambda_2^2)(1 + \lambda_1^2) - 2[d_{11,1}(1-\lambda_1^2)\lambda_2 + d_{11,2}(1-\lambda_2^2)\lambda_1]\cos\omega}{(1 + \lambda_1^2 - 2\lambda_1 \cos\omega)(1 + \lambda_2^2 - 2\lambda_2 \cos\omega)} +``` + +A key observation: due to condition {eq}`chow_hs_condition53`, the *numerator is not a function of $\cos\omega$*. + +Therefore, to find a maximum of $f_{11}(\omega)$, we need only find a minimum of the denominator. + +### Conditions for a spectral peak + +The first derivative of the denominator with respect to $\omega$ is: + +```{math} +:label: chow_hs_derivative + +2[(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1] \sin\omega - 8\lambda_1 \lambda_2 \cos\omega \sin\omega +``` + +For $0 < \omega < \pi$, we have $\sin\omega > 0$, so the derivative equals zero if and only if: + +```{math} +:label: chow_hs_foc + +(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1 = 4\lambda_1 \lambda_2 \cos\omega +``` + +For *complex conjugate roots* $\lambda_1 = r e^{i\theta}$, $\lambda_2 = r e^{-i\theta}$, substitution into {eq}`chow_hs_foc` gives: + +```{math} +:label: chow_hs_peak_condition + +\cos\omega = \frac{1 + r^2}{2r} \cos\theta +``` + +The second derivative confirms this is a maximum when $\omega < \frac{3\pi}{4}$. + +The necessary condition for a valid solution is: + +```{math} +:label: chow_hs_necessary + +-1 < \frac{1 + r^2}{2r} \cos\theta < 1 +``` + +We can interpret it as: +- When $r \approx 1$, the factor $(1+r^2)/2r \approx 1$, so $\omega \approx \theta$ +- When $r$ is small (e.g., 0.3 or 0.4), condition {eq}`chow_hs_necessary` can only be satisfied if $\cos\theta \approx 0$, meaning $\theta \approx \pi/2$ (cycles of approximately 4 periods) + +If $\theta = 54 \degree$ (corresponding to cycles of 6.67 periods) and $r = 0.4$, then $(1+r^2)/2r = 1.45$, giving $\cos\omega = 1.45 \times 0.588 = 0.85$, or $\omega = 31.5 \degree$, corresponding to cycles of 11.4 periods, which is much longer than the deterministic cycle. 
+ +```{code-cell} ipython3 +def peak_condition_factor(r): + """Compute (1 + r^2) / (2r)""" + return (1 + r**2) / (2 * r) + +# Verify Chow's analysis: peak frequency as function of r for fixed θ +θ_deg = 54 +θ = np.deg2rad(θ_deg) +r_grid = np.linspace(0.3, 0.99, 100) + +# For each r, compute the implied peak frequency (if it exists) +ω_peak = [] +for r in r_grid: + factor = peak_condition_factor(r) + cos_omega = factor * np.cos(θ) + if -1 < cos_omega < 1: + ω_peak.append(np.arccos(cos_omega)) + else: + ω_peak.append(np.nan) + +ω_peak = np.array(ω_peak) +period_peak = 2 * np.pi / ω_peak fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -for r, acf in zip(r_values, acfs): - axes[0].plot(range(len(acf)), acf, lw=1.8, label=fr"$r={r}$") -axes[0].axhline(0.0, lw=0.8) -axes[0].set_xlabel("lag") -axes[0].set_ylabel("autocorrelation") +axes[0].plot(r_grid, np.rad2deg(ω_peak), lw=2) +axes[0].axhline(θ_deg, ls='--', lw=1.0, color='gray', label=rf'$\theta = {θ_deg}°$') +axes[0].set_xlabel('eigenvalue modulus $r$') +axes[0].set_ylabel('peak frequency $\omega$ (degrees)') axes[0].legend(frameon=False) -for r, f11 in zip(r_values, spectra): - axes[1].plot(ω_grid / np.pi, f11, lw=1.8, label=fr"$r={r}$") -axes[1].axvline(θ / np.pi, ls="--", lw=1.0, label=r"$\theta/\pi$") -axes[1].set_xlabel(r"frequency $\omega/\pi$") -axes[1].set_ylabel(r"$f_{11}(\omega)$") +axes[1].plot(r_grid, period_peak, lw=2) +axes[1].axhline(360/θ_deg, ls='--', lw=1.0, color='gray', label=rf'deterministic period = {360/θ_deg:.1f}') +axes[1].set_xlabel('eigenvalue modulus $r$') +axes[1].set_ylabel('peak period') axes[1].legend(frameon=False) plt.tight_layout() plt.show() + +# Verify Chow's specific example +r_example = 0.4 +factor = peak_condition_factor(r_example) +cos_omega = factor * np.cos(θ) +omega_example = np.arccos(cos_omega) +print(f"Chow's example: r = {r_example}, θ = {θ_deg}°") +print(f" Factor (1+r²)/2r = {factor:.3f}") +print(f" cos(ω) = {cos_omega:.3f}") +print(f" ω = {np.rad2deg(omega_example):.1f}°") +print(f" Peak period = {360/np.rad2deg(omega_example):.1f} (vs deterministic period = {360/θ_deg:.1f})") ``` -When $r$ is close to 1, the autocorrelation oscillates slowly and the spectrum has a sharp peak near $\theta$. +As $r \to 1$, the peak frequency converges to $\theta$. +For smaller $r$, the peak frequency can differ substantially from the deterministic oscillation frequency. -When $r$ is smaller, oscillations die out quickly and the spectrum is flatter. +### Real positive roots cannot produce peaks -### How shock structure shapes the spectrum +For *real and positive roots* $\lambda_1, \lambda_2 > 0$, the first-order condition {eq}`chow_hs_foc` cannot be satisfied. -Even with the same transition matrix, different shock covariance structures produce different spectral shapes. +To see why, note that we would need: -Here we fix $r = 0.9$ and vary the correlation between the two shocks. 
+```{math} +:label: chow_hs_real_impossible -```{code-cell} ipython3 -r_fixed = 0.9 -A_fixed = rotation_contraction(r_fixed, θ) -corr_values = [-0.9, 0.0, 0.9] +\cos\omega = \frac{(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1}{4\lambda_1 \lambda_2} > 1 +``` -fig, ax = plt.subplots(figsize=(9, 4)) -for corr in corr_values: - V_corr = np.array([[1.0, corr], [corr, 1.0]]) - F = spectral_density_var1(A_fixed, V_corr, ω_grid) - f11 = np.real(F[:, 0, 0]) - f11_norm = f11 / np.trapz(f11, ω_grid / np.pi) - ax.plot(ω_grid / np.pi, f11_norm, lw=1.8, label=fr'$\rho = {corr}$') +The inequality follows because: + +```{math} +:label: chow_hs_real_proof + +(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1 - 4\lambda_1\lambda_2 = \lambda_1(1-\lambda_2)^2 + \lambda_2(1-\lambda_1)^2 > 0 +``` + +which is strictly positive for any $\lambda_1, \lambda_2 > 0$. -ax.axvline(θ / np.pi, ls='--', lw=1.0, color='gray') +This is a key result: In the Hansen-Samuelson model, *complex roots are necessary* for a spectral peak at interior frequencies. + +```{code-cell} ipython3 +# Demonstrate: compare spectra with complex vs real roots +# Both cases use valid Hansen-Samuelson parameterizations +ω_grid = np.linspace(1e-3, np.pi - 1e-3, 800) +V_hs = np.array([[1.0, 0.0], [0.0, 0.0]]) # shock only in first equation + +# Case 1: Complex roots (c=0.6, v=0.8) +# Discriminant = (c+v)² - 4v = 1.96 - 3.2 < 0 → complex roots +c_complex, v_complex = 0.6, 0.8 +A_complex = samuelson_transition(c_complex, v_complex) +eig_complex = np.linalg.eigvals(A_complex) + +# Case 2: Real roots (c=0.8, v=0.1) +# Discriminant = (c+v)² - 4v = 0.81 - 0.4 > 0 → real roots +# Both roots positive and < 1 (stable) +c_real, v_real = 0.8, 0.1 +A_real = samuelson_transition(c_real, v_real) +eig_real = np.linalg.eigvals(A_real) + +print(f"Complex case (c={c_complex}, v={v_complex}): eigenvalues = {eig_complex}") +print(f"Real case (c={c_real}, v={v_real}): eigenvalues = {eig_real}") + +F_complex = spectral_density_var1(A_complex, V_hs, ω_grid) +F_real = spectral_density_var1(A_real, V_hs, ω_grid) + +f11_complex = np.real(F_complex[:, 0, 0]) +f11_real = np.real(F_real[:, 0, 0]) + +fig, ax = plt.subplots() +ax.plot(ω_grid / np.pi, f11_complex / np.max(f11_complex), lw=2, + label=fr'complex roots ($c={c_complex}, v={v_complex}$)') +ax.plot(ω_grid / np.pi, f11_real / np.max(f11_real), lw=2, + label=fr'real roots ($c={c_real}, v={v_real}$)') ax.set_xlabel(r'frequency $\omega/\pi$') ax.set_ylabel('normalized spectrum') ax.legend(frameon=False) plt.show() ``` -The peak location is unchanged, but the peak height depends on the shock correlation. +With complex roots, the spectrum has a clear interior peak. -This illustrates that eigenvalues alone do not determine the full spectral shape. +With real roots, the spectrum is monotonically decreasing and no interior peak is possible. -### Complex roots: an oscillatory mode can be hidden +## Real roots can produce peaks in general models -Complex roots are not sufficient for a visible peak in the spectrum of every observed series. +While real positive roots cannot produce spectral peaks in the Hansen-Samuelson model, {cite:t}`Chow1968` emphasizes that this is *not true in general*. -Even if the state vector contains an oscillatory mode, a variable can be dominated by a non-oscillatory component. +In multivariate systems, the spectral density of a linear combination of variables can have interior peaks even when all eigenvalues are real and positive. 
-The next example combines a rotation–contraction block with a very persistent real root, and then looks at a mixture that is dominated by the persistent component. +### Chow's example -```{code-cell} ipython3 -A_osc = rotation_contraction(0.95, θ) -A = np.block([ - [A_osc, np.zeros((2, 1))], - [np.zeros((1, 2)), np.array([[0.99]])] -]) +Chow constructs the following explicit example with two real positive eigenvalues: -# shocks hit the persistent component much more strongly -V = np.diag([1.0, 1.0, 50.0]) +```{math} +:label: chow_real_roots_example -ω_grid_big = np.linspace(1e-3, np.pi - 1e-3, 1200) -F = spectral_density_var1(A, V, ω_grid_big) +\lambda_1 = 0.1, \quad \lambda_2 = 0.9 +``` -x_grid = ω_grid_big / np.pi -f_y1 = np.real(F[:, 0, 0]) +```{math} +:label: chow_real_roots_W -b = np.array([0.05, 0.0, 1.0]) -f_mix = spectrum_of_linear_combination(F, b) +w_{11} = w_{22} = 1, \quad w_{12} = 0.8 +``` -f_y1_norm = f_y1 / np.trapz(f_y1, x_grid) -f_mix_norm = f_mix / np.trapz(f_mix, x_grid) +```{math} +:label: chow_real_roots_b -fig, ax = plt.subplots(figsize=(9, 4)) -ax.plot(x_grid, f_y1_norm, lw=1.8, label=r"$y_1$") -ax.plot(x_grid, f_mix_norm, lw=1.8, label=r"$x = 0.05\,y_1 + y_3$") -ax.set_xlabel(r"frequency $\omega/\pi$") -ax.set_ylabel("normalized spectrum") -ax.legend(frameon=False) -plt.show() +b_{m1} = 1, \quad b_{m2} = -0.01 ``` -Here the oscillatory mode is still present (the $y_1$ spectrum peaks away from zero), but the mixture $x$ is dominated by the near-unit root and hence by very low frequencies. +The spectral density of the linear combination $x_t = b_m^\top y_t$ is: -### Real roots: a peak from mixing shocks +```{math} +:label: chow_real_roots_spectrum + +f_{mm}(\omega) = \frac{0.9913}{1.01 - 0.2\cos\omega} - \frac{0.001570}{1.81 - 1.8\cos\omega} +``` -Chow also constructs examples where all roots are real and positive yet a linear combination displays a local spectral peak. +Chow tabulates the values: -The mechanism is that cross-correlation in shocks can generate cyclical-looking behavior. +| $\omega$ | $0$ | $\pi/8$ | $2\pi/8$ | $3\pi/8$ | $4\pi/8$ | $5\pi/8$ | $6\pi/8$ | $7\pi/8$ | $\pi$ | +|----------|-----|---------|----------|----------|----------|----------|----------|----------|-------| +| $f_{mm}(\omega)$ | 1.067 | 1.183 | 1.191 | 1.138 | 1.061 | 0.981 | 0.912 | 0.860 | 0.829 | -Here is a close analog of Chow’s two-root illustration. +The peak at $\omega$ slightly below $\pi/8$ (corresponding to periods around 11) is "quite pronounced." -```{code-cell} ipython3 -A = np.diag([0.1, 0.9]) -V = np.array([[1.0, 0.8], [0.8, 1.0]]) -b = np.array([1.0, -0.01]) +In the following figure, we reproduce this table, but with Python, we can plot a finer grid to find the peak more accurately. 
-F = spectral_density_var1(A, V, ω_grid) -f_x = spectrum_of_linear_combination(F, b) -imax = np.argmax(f_x) -ω_star = ω_grid[imax] -period_star = 2 * np.pi / ω_star +```{code-cell} ipython3 +# Reproduce Chow's exact example +λ1, λ2 = 0.1, 0.9 +w11, w22, w12 = 1.0, 1.0, 0.8 +bm1, bm2 = 1.0, -0.01 + +# Construct the system +A_chow_ex = np.diag([λ1, λ2]) +# W is the canonical shock covariance; we need V = B W B^T +# For diagonal A with distinct eigenvalues, B = I, so V = W +V_chow_ex = np.array([[w11, w12], [w12, w22]]) +b_chow_ex = np.array([bm1, bm2]) + +# Chow's formula (equation 67) +def chow_spectrum_formula(ω): + term1 = 0.9913 / (1.01 - 0.2 * np.cos(ω)) + term2 = 0.001570 / (1.81 - 1.8 * np.cos(ω)) + return term1 - term2 + +# Compute via formula and via our general method +ω_table = np.array([0, np.pi/8, 2*np.pi/8, 3*np.pi/8, 4*np.pi/8, + 5*np.pi/8, 6*np.pi/8, 7*np.pi/8, np.pi]) +f_formula = np.array([chow_spectrum_formula(ω) for ω in ω_table]) + +# General method +ω_grid_fine = np.linspace(1e-4, np.pi, 1000) +F_chow_ex = spectral_density_var1(A_chow_ex, V_chow_ex, ω_grid_fine) +f_general = spectrum_of_linear_combination(F_chow_ex, b_chow_ex) + +# Normalize to match Chow's table scale +scale = f_formula[0] / spectrum_of_linear_combination( + spectral_density_var1(A_chow_ex, V_chow_ex, np.array([0.0])), b_chow_ex)[0] + +print("Chow's Table (equation 67):") +print("ω/π: ", " ".join([f"{ω/np.pi:.3f}" for ω in ω_table])) +print("f_mm(ω): ", " ".join([f"{f:.3f}" for f in f_formula])) fig, ax = plt.subplots(figsize=(9, 4)) -ax.plot(ω_grid / np.pi, f_x) -ax.scatter([ω_star / np.pi], [f_x[imax]], zorder=3) -ax.set_xlabel(r"frequency $\omega/\pi$") -ax.set_ylabel(r"$f_x(\omega)$") +ax.plot(ω_grid_fine / np.pi, f_general * scale, lw=2, label='spectrum') +ax.scatter(ω_table / np.pi, f_formula, s=50, zorder=3, label="Chow's table values") + +# Mark the peak +i_peak = np.argmax(f_general) +ω_peak = ω_grid_fine[i_peak] +ax.axvline(ω_peak / np.pi, ls='--', lw=1.0, color='gray', alpha=0.7) +ax.set_xlabel(r'frequency $\omega/\pi$') +ax.set_ylabel(r'$f_{mm}(\omega)$') +ax.legend(frameon=False) plt.show() -print(f"peak period ≈ {period_star:.1f}") + +print(f"\nPeak at ω/π ≈ {ω_peak/np.pi:.3f}, period ≈ {2*np.pi/ω_peak:.1f}") +``` + +### The Slutsky connection + +Chow connects this result to Slutsky's well-known finding that taking moving averages of a random series can generate cycles. + +The VAR(1) model can be written as an infinite moving average: + +```{math} +:label: chow_ma_rep + +y_t = u_t + A u_{t-1} + A^2 u_{t-2} + \cdots ``` -The lesson is the same as Chow’s: in multivariate stochastic systems, “cycle-like” spectra are shaped not only by eigenvalues, but also by how shocks enter ($V$) and how variables combine (the analogue of Chow’s eigenvector matrix). +This amounts to taking an infinite moving average of the random vectors $u_t$ with "geometrically declining" weights $A^0, A^1, A^2, \ldots$ + +For a scalar process with $0 < \lambda < 1$, no distinct cycles can emerge. +But for a matrix $A$ with real roots between 0 and 1, cycles **can** emerge in linear combinations of the variables. + +As Chow puts it: "When neither of two (canonical) variables has distinct cycles... a linear combination can have a peak in its spectral density." + +### The general lesson + +The examples above illustrate Chow's central point: + +1. In the *Hansen-Samuelson model specifically*, complex roots are necessary for a spectral peak +2. 
But in *general multivariate systems*, complex roots are neither necessary nor sufficient +3. The full spectral shape depends on: + - The eigenvalues of $A$ + - The shock covariance structure $V$ + - How the observable of interest loads on the eigenmodes (the vector $b$) ## A calibrated model in the frequency domain -Chow and Levitan {cite}`ChowLevitan1969` use the frequency-domain objects from {cite}`Chow1968` to study a calibrated annual macroeconometric model. +{cite:t}`ChowLevitan1969` use the frequency-domain objects from {cite:t}`Chow1968` to study a calibrated annual macroeconometric model. They work with five annual aggregates @@ -720,20 +1094,20 @@ def paper_frequency_axis(ax): # Normalized spectra (areas set to 1) S = np.real(np.diagonal(F_chow, axis1=1, axis2=2))[:, :5] # y1..y5 -areas = np.trapz(S, freq, axis=0) +areas = np.trapezoid(S, freq, axis=0) S_norm = S / areas mask = freq >= 0.0 fig, axes = plt.subplots(1, 2, figsize=(10, 6)) # Figure I.1: consumption (log scale) -axes[0].plot(freq[mask], S_norm[mask, 0], lw=1.8) +axes[0].plot(freq[mask], S_norm[mask, 0], lw=2) axes[0].set_yscale('log') paper_frequency_axis(axes[0]) axes[0].set_ylabel(r'normalized $f_{11}(\omega)$') # Figure I.2: equipment + inventories (log scale) -axes[1].plot(freq[mask], S_norm[mask, 1], lw=1.8) +axes[1].plot(freq[mask], S_norm[mask, 1], lw=2) axes[1].set_yscale('log') paper_frequency_axis(axes[1]) axes[1].set_ylabel(r'normalized $f_{22}(\omega)$') @@ -808,9 +1182,9 @@ for idx, var_idx in enumerate([0, 1, 2]): ax = axes[idx] ax.plot(freq[mask], coherence[mask], - lw=1.8, label=rf'$R^2_{{{var_idx+1}5}}(\omega)$') + lw=2, label=rf'$R^2_{{{var_idx+1}5}}(\omega)$') ax.plot(freq[mask], gain[mask], - lw=1.8, label=rf'$G_{{{var_idx+1}5}}(\omega)$') + lw=2, label=rf'$G_{{{var_idx+1}5}}(\omega)$') paper_frequency_axis(ax) ax.set_ylim([0, 1.0]) @@ -842,7 +1216,7 @@ labels = [r'$\psi_{15}(\omega)/2\pi$', r'$\psi_{25}(\omega)/2\pi$', for var_idx in range(4): coherence, gain, phase = cross_spectral_measures(F_chow, var_idx, gnp_idx) phase_cycles = phase / (2 * np.pi) - ax.plot(freq[mask], phase_cycles[mask], lw=1.8, label=labels[var_idx]) + ax.plot(freq[mask], phase_cycles[mask], lw=2, label=labels[var_idx]) ax.axhline(0, lw=0.8) paper_frequency_axis(ax) @@ -890,7 +1264,7 @@ for i, λ_i in enumerate(λ[:4]): if np.abs(λ_i) > 0.01: g_i = scalar_kernel(λ_i, ω_grid) label = f'$\\lambda_{i+1}$ = {λ_i:.4f}' if np.isreal(λ_i) else f'$\\lambda_{i+1}$ = {λ_i:.3f}' - ax.semilogy(freq, g_i, label=label, lw=1.5) + ax.semilogy(freq, g_i, label=label, lw=2) ax.set_xlabel(r'frequency $\omega/2\pi$') ax.set_ylabel('$g_i(\\omega)$') ax.set_xlim([1/18, 0.5]) @@ -924,59 +1298,38 @@ The calibrated model reveals three patterns: (1) most variance sits at very low ## Wrap-up -Chow {cite}`Chow1968` emphasizes two complementary diagnostics for linear macro models: how eigenvalues shape deterministic propagation, and how spectra summarize stochastic dynamics. - -Chow and Levitan {cite}`ChowLevitan1969` then show what these objects look like in a calibrated system: strong low-frequency power, frequency-dependent gains/coherences, and lead–lag relations that vary with the cycle length. - -To connect this to data, pair the model-implied objects here with the advanced lecture [Estimation of Spectra](https://python-advanced.quantecon.org/estspec.html#). 
- -## A structural view of acceleration +{cite:t}`Chow1968` draws several conclusions that remain relevant for understanding business cycles: -Chow {cite}`Chow1968` provides a structural interpretation of how acceleration enters the model. +1. **Empirical support for acceleration**: The acceleration principle, as formulated through stock-adjustment equations, receives strong empirical support from investment data. The negative coefficient on lagged output levels is a robust empirical finding. -The starting point is a stock-adjustment demand for capital: +2. **Acceleration is necessary for deterministic oscillations**: In a model consisting only of demand equations with simple distributed lags, the transition matrix has real positive roots (under natural sign restrictions), and hence no prolonged oscillations can occur. Acceleration introduces the possibility of complex roots. -```{math} -:label: chow_stock_adj_struct - -s_{it} = a_i Y_t + b_i s_{i,t-1} -``` +3. **Complex roots are neither necessary nor sufficient for stochastic cycles**: While complex roots in the deterministic model guarantee oscillatory autocovariances, they are neither necessary nor sufficient for a pronounced spectral peak. In the Hansen-Samuelson model specifically, complex roots *are* necessary for a spectral peak. But in general multivariate systems, real roots can produce peaks through the interaction of shocks and eigenvector loadings. -where $s_{it}$ is the desired stock of capital type $i$, $Y_t$ is aggregate output, and $(a_i, b_i)$ are parameters. +4. **An integrated view is essential**: As Chow concludes, "an obvious moral is that the nature of business cycles can be understood only by an integrated view of the deterministic as well as the random elements." -Net investment is the stock change: +{cite:t}`ChowLevitan1969` then show what these objects look like in a calibrated system: strong low-frequency power (reflecting near-unit eigenvalues), frequency-dependent gains/coherences, and lead–lag relations that vary with the cycle length. -```{math} -:label: chow_net_inv +On the empirical side, Granger has noted a "typical spectral shape" for economic time series—a monotonically decreasing function of frequency. -y^n_{it} = \Delta s_{it} = a_i \Delta Y_t + b_i y^n_{i,t-1}. -``` +The Chow-Levitan calibration is consistent with this shape, driven by the near-unit eigenvalues. -For gross investment with depreciation rate $\delta_i$: +But as Chow emphasizes, understanding whether this shape reflects the true data-generating process requires analyzing the spectral densities implied by structural econometric models. -```{math} -:label: chow_gross_inv - -y_{it} = a_i [Y_t - (1-\delta_i) Y_{t-1}] + b_i y_{i,t-1}. -``` - -The parameters $(a_i, b_i, \delta_i)$ are the key "acceleration equation" parameters. - -The term $a_i \Delta Y_t$ is the acceleration effect: investment responds to *changes* in output, not just levels. - -This creates negative coefficients on lagged output levels, which in turn makes complex roots (and hence oscillatory components) possible in the characteristic equation. +To connect this to data, pair the model-implied objects here with the advanced lecture {doc}`advanced:estspec`. ## Exercises ```{exercise} :label: chow_cycles_ex1 -In the rotation-contraction example, fix $\theta$ and vary $r$ in a grid between $0.2$ and $0.99$. +Verify Chow's spectral peak condition {eq}`chow_hs_peak_condition` numerically for the Hansen-Samuelson model. -1. 
For each $r$, compute the frequency $\omega^*(r)$ that maximizes $f_{11}(\omega)$. -2. Plot $\omega^*(r)$ and the implied peak period $2\pi/\omega^*(r)$ as functions of $r$. - -How does the peak location behave as $r \uparrow 1$? +1. For a range of eigenvalue moduli $r \in [0.3, 0.99]$ with fixed $\theta = 60°$, compute: + - The theoretical peak frequency from Chow's formula: $\cos\omega = \frac{1+r^2}{2r}\cos\theta$ + - The actual peak frequency by numerically maximizing the spectral density +2. Plot both on the same graph and verify they match. +3. Identify the range of $r$ for which no valid peak exists (when the condition {eq}`chow_hs_necessary` is violated). ``` ```{solution-start} chow_cycles_ex1 @@ -984,40 +1337,72 @@ How does the peak location behave as $r \uparrow 1$? ``` ```{code-cell} ipython3 -r_grid = np.linspace(0.2, 0.99, 50) -θ = np.pi / 3 +θ_ex = np.pi / 3 # 60 degrees +r_grid = np.linspace(0.3, 0.99, 50) ω_grid_ex = np.linspace(1e-3, np.pi - 1e-3, 1000) -V_ex = np.eye(2) +V_hs_ex = np.array([[1.0, 0.0], [0.0, 0.0]]) + +ω_theory = [] +ω_numerical = [] -ω_star = np.zeros(len(r_grid)) -period_star = np.zeros(len(r_grid)) -for idx, r in enumerate(r_grid): - A_ex = rotation_contraction(r, θ) - F_ex = spectral_density_var1(A_ex, V_ex, ω_grid_ex) +for r in r_grid: + # Theoretical peak from Chow's formula + factor = (1 + r**2) / (2 * r) + cos_omega = factor * np.cos(θ_ex) + if -1 < cos_omega < 1: + ω_theory.append(np.arccos(cos_omega)) + else: + ω_theory.append(np.nan) + + # Numerical peak from spectral density + # Construct Hansen-Samuelson with eigenvalues r*exp(±iθ) + # This corresponds to c + v = 2r*cos(θ), v = r² + v = r**2 + c = 2 * r * np.cos(θ_ex) - v + A_ex = samuelson_transition(c, v) + F_ex = spectral_density_var1(A_ex, V_hs_ex, ω_grid_ex) f11 = np.real(F_ex[:, 0, 0]) i_max = np.argmax(f11) - ω_star[idx] = ω_grid_ex[i_max] - period_star[idx] = 2 * np.pi / ω_star[idx] + # Only count as a peak if it's not at the boundary + if 5 < i_max < len(ω_grid_ex) - 5: + ω_numerical.append(ω_grid_ex[i_max]) + else: + ω_numerical.append(np.nan) + +ω_theory = np.array(ω_theory) +ω_numerical = np.array(ω_numerical) fig, axes = plt.subplots(1, 2, figsize=(12, 4)) -axes[0].plot(r_grid, ω_star / np.pi, lw=1.8) -axes[0].axhline(θ / np.pi, ls='--', lw=1.0, label=r'$\theta/\pi$') -axes[0].set_xlabel('$r$') -axes[0].set_ylabel(r'$\omega^*/\pi$') + +# Plot peak frequencies +axes[0].plot(r_grid, ω_theory / np.pi, lw=2, label="Chow's formula") +axes[0].plot(r_grid, ω_numerical / np.pi, 'o', markersize=4, label='numerical') +axes[0].axhline(θ_ex / np.pi, ls='--', lw=1.0, color='gray', label=r'$\theta/\pi$') +axes[0].set_xlabel('eigenvalue modulus $r$') +axes[0].set_ylabel(r'peak frequency $\omega^*/\pi$') axes[0].legend(frameon=False) -axes[1].plot(r_grid, period_star, lw=1.8) -axes[1].axhline(2 * np.pi / θ, ls='--', lw=1.0, label=r'$2\pi/\theta$') -axes[1].set_xlabel('$r$') -axes[1].set_ylabel('peak period') +# Plot the factor (1+r²)/2r to show when peaks are valid +axes[1].plot(r_grid, (1 + r_grid**2) / (2 * r_grid), lw=2) +axes[1].axhline(1 / np.cos(θ_ex), ls='--', lw=1.0, color='red', + label=f'threshold = 1/cos({np.rad2deg(θ_ex):.0f}°) = {1/np.cos(θ_ex):.2f}') +axes[1].set_xlabel('eigenvalue modulus $r$') +axes[1].set_ylabel(r'$(1+r^2)/2r$') axes[1].legend(frameon=False) + plt.tight_layout() plt.show() -``` -As $r \uparrow 1$, the peak frequency converges to $\theta$ (the argument of the complex eigenvalue). 
+# Find threshold r below which no peak exists +valid_mask = ~np.isnan(ω_theory) +if valid_mask.any(): + r_threshold = r_grid[valid_mask][0] + print(f"Peak exists for r ≥ {r_threshold:.2f}") +``` -This confirms Chow's insight: when the modulus is close to 1, the spectral peak aligns with the eigenvalue frequency. +The theoretical and numerical peak frequencies match closely. +As $r \to 1$, the peak frequency converges to $\theta$. +For smaller $r$, the factor $(1+r^2)/2r$ exceeds the threshold, and no valid peak exists. ```{solution-end} ``` @@ -1050,7 +1435,7 @@ for corr in corr_grid: peak_periods.append(np.nan) fig, ax = plt.subplots(figsize=(8, 4)) -ax.plot(corr_grid, peak_periods, marker='o', lw=1.8, markersize=4) +ax.plot(corr_grid, peak_periods, marker='o', lw=2, markersize=4) ax.set_xlabel('shock correlation') ax.set_ylabel('peak period') plt.show() @@ -1139,10 +1524,10 @@ var_labels = ["consumption", "equipment + inventories", "construction", "long ra for i in range(5): f_orig = np.real(F_chow[:, i, i]) f_mod = np.real(F_mod[:, i, i]) - f_orig_norm = f_orig / np.trapz(f_orig, freq) - f_mod_norm = f_mod / np.trapz(f_mod, freq) - axes[i].semilogy(freq, f_orig_norm, lw=1.5, label=r"original ($\lambda_3=0.48$)") - axes[i].semilogy(freq, f_mod_norm, lw=1.5, ls="--", label=r"modified ($\lambda_3=0.95$)") + f_orig_norm = f_orig / np.trapezoid(f_orig, freq) + f_mod_norm = f_mod / np.trapezoid(f_mod, freq) + axes[i].semilogy(freq, f_orig_norm, lw=2, label=r"original ($\lambda_3=0.48$)") + axes[i].semilogy(freq, f_mod_norm, lw=2, ls="--", label=r"modified ($\lambda_3=0.95$)") paper_frequency_axis(axes[i]) axes[i].set_ylabel(rf"normalized $f_{{{i+1}{i+1}}}(\omega)$") axes[i].text(0.03, 0.08, var_labels[i], transform=axes[i].transAxes) From 4726b27a7661c6f026cd0c778915297186e18e64 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Sat, 7 Feb 2026 15:55:00 +1100 Subject: [PATCH 03/37] updates --- lectures/chow_business_cycles.md | 772 +++++++++++++++++-------------- 1 file changed, 422 insertions(+), 350 deletions(-) diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 393119345..396a4de55 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -32,7 +32,7 @@ kernelspec: This lecture studies two classic papers by Gregory Chow on business cycles in linear dynamic models: - {cite}`Chow1968`: empirical evidence for the acceleration principle, why acceleration enables oscillations, and when spectral peaks arise in stochastic systems -- {cite}`ChowLevitan1969`: spectral analysis of a calibrated U.S. macroeconometric model, showing gains, coherences, and lead-lag patterns +- {cite}`ChowLevitan1969`: spectral analysis of a calibrated US macroeconometric model, showing gains, coherences, and lead–lag patterns These papers connect ideas in the following lectures: @@ -41,7 +41,7 @@ These papers connect ideas in the following lectures: - Eigenmodes of multivariate dynamics in {doc}`var_dmd` - Fourier ideas in {doc}`eig_circulant` (and, for empirical estimation, the advanced lecture {doc}`advanced:estspec`) -{cite:t}`Chow1968` builds on earlier empirical work testing the acceleration principle on U.S. investment data. +{cite:t}`Chow1968` builds on earlier empirical work testing the acceleration principle on US investment data. We begin with that empirical foundation before developing the theoretical framework. 
@@ -51,19 +51,52 @@ We will keep coming back to three ideas: - In stochastic models, a "cycle" shows up as a local peak in a (univariate) spectral density. - Spectral peaks depend on eigenvalues, but also on how shocks enter (the covariance matrix $V$) and on how observables load on eigenmodes. -In this lecture, we start with Chow's empirical evidence for the acceleration principle, then introduce the VAR(1) framework and spectral analysis tools. +Let's start with some standard imports: -Next, we show why acceleration creates complex roots that enable oscillations, and derive Chow's conditions for spectral peaks in the Hansen-Samuelson model. +```{code-cell} ipython3 +import numpy as np +import matplotlib.pyplot as plt +``` -We then present Chow's striking counterexample: real roots *can* produce spectral peaks in general multivariate systems. +We will use the following helper functions throughout the lecture: -Finally, we apply these tools to the calibrated Chow-Levitan model to see what model-implied spectra look like in practice. +```{code-cell} ipython3 +def spectral_density_var1(A, V, ω_grid): + """Spectral density matrix for VAR(1): y_t = A y_{t-1} + u_t.""" + A, V = np.asarray(A), np.asarray(V) + n = A.shape[0] + I = np.eye(n) + F = np.empty((len(ω_grid), n, n), dtype=complex) + for k, ω in enumerate(ω_grid): + H = np.linalg.inv(I - np.exp(-1j * ω) * A) + F[k] = (H @ V @ H.conj().T) / (2 * np.pi) + return F -Let's start with some standard imports +def spectrum_of_linear_combination(F, b): + """Spectrum of x_t = b'y_t given the spectral matrix F(ω).""" + b = np.asarray(b).reshape(-1, 1) + return np.array([np.real((b.T @ F[k] @ b).item()) for k in range(F.shape[0])]) -```{code-cell} ipython3 -import numpy as np -import matplotlib.pyplot as plt +def simulate_var1(A, V, T, burn=200, seed=1234): + r"""Simulate y_t = A y_{t-1} + u_t with u_t \sim N(0, V).""" + rng = np.random.default_rng(seed) + A, V = np.asarray(A), np.asarray(V) + n = A.shape[0] + chol = np.linalg.cholesky(V) + y = np.zeros((T + burn, n)) + for t in range(1, T + burn): + y[t] = A @ y[t - 1] + chol @ rng.standard_normal(n) + return y[burn:] + +def sample_autocorrelation(x, max_lag): + """Sample autocorrelation of a 1d array from lag 0 to max_lag.""" + x = np.asarray(x) + x = x - x.mean() + denom = np.dot(x, x) + acf = np.empty(max_lag + 1) + for k in range(max_lag + 1): + acf[k] = np.dot(x[:-k] if k else x, x[k:]) / denom + return acf ``` (empirical_section)= @@ -81,7 +114,7 @@ In each case, when the regression included both $Y_t$ and $Y_{t-1}$ (where $Y$ i Equivalently, when expressed in terms of $\Delta Y_t$ and $Y_{t-1}$, the coefficient on $Y_{t-1}$ was a small fraction of the coefficient on $\Delta Y_t$. -### An example: Automobile demand +### An example: automobile demand Chow presents a clean illustration using data on net investment in automobiles from his earlier work on automobile demand. @@ -125,17 +158,149 @@ Net investment is the change in stock, $y_{it}^n = \Delta s_{it}$, and differenc y_{it}^n = a_i \Delta Y_t + b_i y_{i,t-1}^n ``` -The coefficients on $Y_t$ and $Y_{t-1}$ in the level form are $a_i$ and $-a_i(1-b_i)$ respectively. +The coefficients on $Y_t$ and $Y_{t-1}$ in the level form are $a_i$ and $-a_i(1-b_i)$ respectively. They are opposite in sign and similar in magnitude when $b_i$ is not too far from unity. This connection between stock adjustment and acceleration is central to Chow's argument about why acceleration matters for business cycles. 
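To see this pattern in numbers, here is a minimal simulation sketch; the values $a_i = 0.3$ and $b_i = 0.6$ and the output process are illustrative choices, not Chow's estimates.

We build the stock from the stock-adjustment rule, form net investment as the change in the stock, and then regress net investment on $Y_t$, $Y_{t-1}$, and lagged net investment.

```{code-cell} ipython3
# Illustrative stock-adjustment simulation (parameter values are not Chow's)
rng = np.random.default_rng(0)
a_i, b_i = 0.3, 0.6
n_periods = 400

# output follows a random walk around a high level, purely for illustration
Y_path = np.cumsum(rng.standard_normal(n_periods)) + 100.0

# stock adjustment: s_t = a_i Y_t + b_i s_{t-1}
s_path = np.zeros(n_periods)
for t in range(1, n_periods):
    s_path[t] = a_i * Y_path[t] + b_i * s_path[t - 1]

# net investment is the change in the stock
y_net = np.diff(s_path)

# regress net investment on Y_t, Y_{t-1} and lagged net investment
yn = y_net[1:]
X = np.column_stack([Y_path[2:], Y_path[1:-1], y_net[:-1], np.ones(len(yn))])
coef, *_ = np.linalg.lstsq(X, yn, rcond=None)
print(f"coefficient on Y_t:      {coef[0]: .3f}")
print(f"coefficient on Y_(t-1):  {coef[1]: .3f}")
print(f"coefficient on lagged net investment: {coef[2]: .3f}")
```

With the lagged-investment term included, the fitted coefficients on $Y_t$ and $Y_{t-1}$ come out equal and opposite, which is the sign pattern behind the acceleration effect.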
-## A linear system with shocks +## Acceleration enables oscillations + +Having established the empirical evidence for acceleration, we now examine why it matters theoretically for generating oscillations. + +{cite:t}`Chow1968` asks a fundamental question: if we build a macro model using only standard demand equations with simple distributed lags, can the system generate sustained oscillations? + +He shows that, under natural sign restrictions, the answer is no. -To study business cycles formally, we need a framework that combines the deterministic dynamics (captured by the transition matrix $A$) with random shocks. +Stock-adjustment demand for durable goods leads to investment equations where the coefficient on $Y_{t-1}$ is negative—the **acceleration effect**. -Both papers analyze (or reduce to) a first-order linear stochastic system +This negative coefficient is what makes complex roots possible in the characteristic equation. + +Without it, Chow proves that demand systems with only positive coefficients have real positive roots, and hence no oscillatory dynamics. + +The {doc}`samuelson` lecture explores this mechanism in detail through the Hansen-Samuelson multiplier-accelerator model. + +Here we briefly illustrate the effect. + +Take the multiplier–accelerator law of motion: + +```{math} +Y_t = c Y_{t-1} + v (Y_{t-1} - Y_{t-2}), +``` + +and rewrite it as a first-order system in $(Y_t, Y_{t-1})$. + +```{code-cell} ipython3 +def samuelson_transition(c, v): + return np.array([[c + v, -v], [1.0, 0.0]]) + +# Compare weak vs strong acceleration +# Weak: c=0.8, v=0.1 gives real roots (discriminant > 0) +# Strong: c=0.6, v=0.8 gives complex roots (discriminant < 0) +cases = [("weak acceleration", 0.8, 0.1), + ("strong acceleration", 0.6, 0.8)] +A_list = [samuelson_transition(c, v) for _, c, v in cases] + +for (label, c, v), A in zip(cases, A_list): + eig = np.linalg.eigvals(A) + disc = (c + v)**2 - 4*v + print(f"{label}: c={c}, v={v}, discriminant={disc:.2f}, eigenvalues={eig}") +``` + +With weak acceleration ($v=0.1$), the discriminant is positive and the roots are real. + +With strong acceleration ($v=0.8$), the discriminant is negative and the roots are complex conjugates, enabling oscillatory dynamics. + +```{code-cell} ipython3 +# impulse responses from a one-time unit shock in Y +T = 40 +s0 = np.array([1.0, 0.0]) +irfs = [] +for A in A_list: + s = s0.copy() + path = np.empty(T + 1) + for t in range(T + 1): + path[t] = s[0] + s = A @ s + irfs.append(path) + +fig, ax = plt.subplots(figsize=(10, 4)) +ax.plot(range(T + 1), irfs[0], lw=2, + label="weak acceleration (real roots)") +ax.plot(range(T + 1), irfs[1], lw=2, + label="strong acceleration (complex roots)") +ax.axhline(0.0, lw=0.8, color='gray') +ax.set_xlabel("time") +ax.set_ylabel(r"$Y_t$") +ax.legend(frameon=False) +plt.tight_layout() +plt.show() +``` + +With weak acceleration, the impulse response decays monotonically. + +With strong acceleration, it oscillates. + +We can ask how the eigenvalues change as we increase the accelerator $v$. + +As we increase the accelerator $v$, the eigenvalues move further from the origin. + +For this model, the eigenvalue modulus is $|\lambda| = \sqrt{v}$, so the stability boundary is $v = 1$. 
+ +```{code-cell} ipython3 +v_grid = [0.2, 0.4, 0.6, 0.8, 0.95] +c = 0.6 +T_irf = 40 # periods for impulse response + +fig, axes = plt.subplots(1, 2, figsize=(12, 5)) + +for v in v_grid: + A = samuelson_transition(c, v) + eig = np.linalg.eigvals(A) + + # Eigenvalues (left panel) + axes[0].scatter(eig.real, eig.imag, s=40, label=f'$v={v}$') + + # Impulse response (right panel) + s = np.array([1.0, 0.0]) + irf = np.empty(T_irf + 1) + for t in range(T_irf + 1): + irf[t] = s[0] + s = A @ s + axes[1].plot(range(T_irf + 1), irf, lw=2, label=f'$v={v}$') + +# Eigenvalue panel with unit circle +θ_circle = np.linspace(0, 2*np.pi, 100) +axes[0].plot(np.cos(θ_circle), np.sin(θ_circle), + 'k--', lw=0.8, label='unit circle') +axes[0].set_xlabel('real part') +axes[0].set_ylabel('imaginary part') +axes[0].set_aspect('equal') +axes[0].legend(frameon=False) + +# impulse response panel +axes[1].axhline(0, lw=0.8, color='gray') +axes[1].set_xlabel('time') +axes[1].set_ylabel(r'$Y_t$') +axes[1].legend(frameon=False) + +plt.tight_layout() +plt.show() +``` + +As $v$ increases, eigenvalues approach the unit circle and oscillations become more persistent. + +This illustrates that acceleration creates complex eigenvalues, which are necessary for oscillatory dynamics in deterministic systems. + +But what happens when we add random shocks? + +Frisch's insight was that even damped oscillations can be "maintained" when the system is continuously perturbed by random disturbances. + +To study this formally, we need to introduce the stochastic framework. + +## A linear system with shocks + +We analyze (or reduce to) a first-order linear stochastic system ```{math} :label: chow_var1 @@ -173,11 +338,15 @@ Standard calculations (also derived in {cite}`Chow1968`) give the recursion The second equation is the discrete Lyapunov equation for $\Gamma_0$. -### Why stochastic dynamics matter - {cite:t}`Chow1968` motivates the stochastic analysis with a quote from Ragnar Frisch: -> The examples we have discussed ... show that when an [deterministic] economic system gives rise to oscillations, these will most frequently be damped. But in reality the cycles ... are generally not damped. How can the maintenance of the swings be explained? ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ... +> The examples we have discussed ... show that when a [deterministic] economic system gives rise to oscillations, these will most frequently be damped. +> +> But in reality the cycles ... are generally not damped. +> +> How can the maintenance of the swings be explained? +> +> ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ... > > Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data. > @@ -202,7 +371,8 @@ A B = B D_\lambda, \quad \text{or equivalently} \quad A = B D_\lambda B^{-1} where $D_\lambda = \text{diag}(\lambda_1, \ldots, \lambda_p)$. Define canonical variables $z_t = B^{-1} y_t$. 
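Here is a small numerical illustration of this decomposition; the matrix below is an illustrative example, not one taken from Chow.

```{code-cell} ipython3
# Illustrative diagonalization: columns of B are the right eigenvectors of A
A_demo = np.array([[0.9, 0.2],
                   [0.1, 0.5]])

λ_demo, B_demo = np.linalg.eig(A_demo)
print("eigenvalues:", λ_demo.round(4))
print("B^{-1} A B =\n", (np.linalg.inv(B_demo) @ A_demo @ B_demo).round(6))
```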
-These satisfy the decoupled dynamics + +These satisfy the decoupled dynamics: ```{math} :label: chow_canonical_dynamics @@ -256,8 +426,6 @@ y_{it} = \sum_j b_{ij} z_{j0} \lambda_j^t Both the autocovariance function {eq}`chow_scalar_autocov` and the deterministic path {eq}`chow_det_path` are linear combinations of $\lambda_m^k$ (or $\lambda_j^t$). -This formal resemblance is important: the coefficients differ (depending on initial conditions vs. shock covariances), but the role of eigenvalues is analogous. - ### Complex roots and damped oscillations When eigenvalues come in complex conjugate pairs $\lambda = r e^{\pm i\theta}$ with $r < 1$, their contribution to the autocovariance function is a **damped cosine**: @@ -271,9 +439,10 @@ When eigenvalues come in complex conjugate pairs $\lambda = r e^{\pm i\theta}$ w for appropriate amplitude $s$ and phase $\phi$ determined by the eigenvector loadings. In the deterministic model, such complex roots generate damped oscillatory time paths. + In the stochastic model, they generate damped oscillatory autocovariance functions. -It is in this sense that deterministic oscillations could be "maintained" in the stochastic model—but as we will see, the connection between eigenvalues and spectral peaks is more subtle than this suggests. +It is in this sense that deterministic oscillations could be "maintained" in the stochastic model, but as we will see, the connection between eigenvalues and spectral peaks is more subtle than this suggests. ## From autocovariances to spectra @@ -314,223 +483,19 @@ The advanced lecture {doc}`advanced:estspec` explains how to estimate $F(\omega) Here we focus on the model-implied spectrum. -We will use the following helper functions throughout the lecture. - -```{code-cell} ipython3 -def spectral_density_var1(A, V, ω_grid): - """Spectral density matrix for VAR(1): y_t = A y_{t-1} + u_t.""" - A, V = np.asarray(A), np.asarray(V) - n = A.shape[0] - I = np.eye(n) - F = np.empty((len(ω_grid), n, n), dtype=complex) - for k, ω in enumerate(ω_grid): - H = np.linalg.inv(I - np.exp(-1j * ω) * A) - F[k] = (H @ V @ H.conj().T) / (2 * np.pi) - return F - -def spectrum_of_linear_combination(F, b): - """Spectrum of x_t = b'y_t given the spectral matrix F(ω).""" - b = np.asarray(b).reshape(-1, 1) - return np.array([np.real((b.T @ F[k] @ b).item()) for k in range(F.shape[0])]) - -def simulate_var1(A, V, T, burn=200, seed=1234): - r"""Simulate y_t = A y_{t-1} + u_t with u_t \sim N(0, V).""" - rng = np.random.default_rng(seed) - A, V = np.asarray(A), np.asarray(V) - n = A.shape[0] - chol = np.linalg.cholesky(V) - y = np.zeros((T + burn, n)) - for t in range(1, T + burn): - y[t] = A @ y[t - 1] + chol @ rng.standard_normal(n) - return y[burn:] - -def sample_autocorrelation(x, max_lag): - """Sample autocorrelation of a 1d array from lag 0 to max_lag.""" - x = np.asarray(x) - x = x - x.mean() - denom = np.dot(x, x) - acf = np.empty(max_lag + 1) - for k in range(max_lag + 1): - acf[k] = np.dot(x[:-k] if k else x, x[k:]) / denom - return acf -``` - -## Deterministic propagation and acceleration - -Now we have the tools and the motivation to analyze spectral peaks in linear stochastic systems. - -We first go back to the deterministic system to understand why acceleration matters for generating oscillations in the first place. - -Before analyzing spectral peaks, we need to understand why acceleration matters for generating oscillations in the first place. 
- -{cite:t}`Chow1968` asks a question in the deterministic setup: if we build a macro model using only standard demand equations with simple distributed lags, can the system generate sustained oscillations? +We saw earlier that acceleration creates complex eigenvalues, which enable oscillatory impulse responses. -He shows that, under natural sign restrictions, the answer is no. - -As we saw in the {ref}`empirical foundation `, stock-adjustment demand for durable goods leads to investment equations where the coefficient on $Y_{t-1}$ is negative, i.e., the **acceleration effect**. - -This negative coefficient is what makes complex roots possible in the characteristic equation. - -Without it, Chow proves that demand systems with only positive coefficients have real positive roots, and hence no oscillatory dynamics. - -The {doc}`samuelson` lecture explores this mechanism in detail through the Hansen-Samuelson multiplier-accelerator model. - -Here we briefly illustrate the effect. Take the multiplier–accelerator law of motion - -```{math} -Y_t = c Y_{t-1} + v (Y_{t-1} - Y_{t-2}), -``` - -and rewrite it as a first-order system in $(Y_t, Y_{t-1})$. - -```{code-cell} ipython3 -def samuelson_transition(c, v): - return np.array([[c + v, -v], [1.0, 0.0]]) - -# Compare weak vs strong acceleration -# Weak: c=0.8, v=0.1 gives real roots (discriminant > 0) -# Strong: c=0.6, v=0.8 gives complex roots (discriminant < 0) -cases = [("weak acceleration", 0.8, 0.1), ("strong acceleration", 0.6, 0.8)] -A_list = [samuelson_transition(c, v) for _, c, v in cases] - -for (label, c, v), A in zip(cases, A_list): - eig = np.linalg.eigvals(A) - disc = (c + v)**2 - 4*v - print(f"{label}: c={c}, v={v}, discriminant={disc:.2f}, eigenvalues={eig}") - -# impulse responses from a one-time unit shock in Y -T = 40 -s0 = np.array([1.0, 0.0]) -irfs = [] -for A in A_list: - s = s0.copy() - path = np.empty(T + 1) - for t in range(T + 1): - path[t] = s[0] - s = A @ s - irfs.append(path) - -# model-implied spectra for the stochastic version with shocks in the Y equation -freq = np.linspace(1e-4, 0.5, 2500) # cycles/period -ω_grid = 2 * np.pi * freq -V = np.array([[1.0, 0.0], [0.0, 0.0]]) - -spectra = [] -for A in A_list: - F = spectral_density_var1(A, V, ω_grid) - f11 = np.real(F[:, 0, 0]) - spectra.append(f11 / np.trapezoid(f11, freq)) - -fig, axes = plt.subplots(1, 2, figsize=(12, 4)) - -axes[0].plot(range(T + 1), irfs[0], lw=2, label="weak acceleration (real roots)") -axes[0].plot(range(T + 1), irfs[1], lw=2, label="strong acceleration (complex roots)") -axes[0].axhline(0.0, lw=0.8) -axes[0].set_xlabel("time") -axes[0].set_ylabel(r"$Y_t$") -axes[0].legend(frameon=False) - -axes[1].plot(freq, spectra[0], lw=2, label="weak acceleration (real roots)") -axes[1].plot(freq, spectra[1], lw=2, label="strong acceleration (complex roots)") -axes[1].set_xlabel(r"frequency $\omega/2\pi$") -axes[1].set_ylabel("normalized spectrum") -axes[1].set_xlim([0.0, 0.5]) -axes[1].legend(frameon=False) - -plt.tight_layout() -plt.show() -``` - -The left panel shows the contrast between weak and strong acceleration: with weak acceleration ($v=0.1$) the roots are real and the impulse response decays monotonically; with strong acceleration ($v=0.8$) the roots are complex and the impulse response oscillates. - -The right panel shows the corresponding spectral signatures. - -Complex roots produce a pronounced peak at interior frequencies—the spectral signature of business cycles. 
- -### How acceleration strength affects the spectrum - -As we increase the accelerator $v$, the eigenvalues move further from the origin. - -For this model, the eigenvalue modulus is $|\lambda| = \sqrt{v}$, so the stability boundary is $v = 1$. - -```{code-cell} ipython3 -v_grid = [0.2, 0.4, 0.6, 0.8, 0.95] # stable cases only -c = 0.6 -freq_fine = np.linspace(1e-4, 0.5, 2000) -ω_fine = 2 * np.pi * freq_fine -V_acc = np.array([[1.0, 0.0], [0.0, 0.0]]) -T_irf = 40 # periods for impulse response - -fig = plt.figure(figsize=(12, 8)) -ax_eig = fig.add_subplot(2, 2, 1) -ax_spec = fig.add_subplot(2, 2, 2) -ax_irf = fig.add_subplot(2, 1, 2) # spans entire bottom row - -for v in v_grid: - A = samuelson_transition(c, v) - eig = np.linalg.eigvals(A) - - # eigenvalues (top left) - ax_eig.scatter(eig.real, eig.imag, s=40, label=f'$v={v}$') - - # spectrum (top right) - F = spectral_density_var1(A, V_acc, ω_fine) - f11 = np.real(F[:, 0, 0]) - f11_norm = f11 / np.trapezoid(f11, freq_fine) - ax_spec.plot(freq_fine, f11_norm, lw=2, label=f'$v={v}$') - - # impulse response (bottom row) - s = np.array([1.0, 0.0]) - irf = np.empty(T_irf + 1) - for t in range(T_irf + 1): - irf[t] = s[0] - s = A @ s - ax_irf.plot(range(T_irf + 1), irf, lw=2, label=f'$v={v}$') - -# eigenvalue panel with unit circle -θ_circle = np.linspace(0, 2*np.pi, 100) -ax_eig.plot(np.cos(θ_circle), np.sin(θ_circle), 'k--', lw=0.8, label='unit circle') -ax_eig.set_xlabel('real part') -ax_eig.set_ylabel('imaginary part') -ax_eig.set_aspect('equal') -ax_eig.legend(frameon=False, fontsize=8) - -# spectrum panel -ax_spec.set_xlabel(r'frequency $\omega/2\pi$') -ax_spec.set_ylabel('normalized spectrum') -ax_spec.set_xlim([0, 0.5]) -ax_spec.set_yscale('log') -ax_spec.legend(frameon=False, fontsize=8) - -# impulse response panel -ax_irf.axhline(0, lw=0.8, color='gray') -ax_irf.set_xlabel('time') -ax_irf.set_ylabel(r'$Y_t$') -ax_irf.legend(frameon=False, fontsize=8) - -plt.tight_layout() -plt.show() -``` - -As $v$ increases, eigenvalues approach the unit circle and the spectral peak becomes sharper. - -This illustrates Chow's main point: acceleration creates complex eigenvalues, which are necessary for oscillatory dynamics. - -Without acceleration, the eigenvalues would be real and the impulse response would decay monotonically without oscillation. - -With stronger acceleration (larger $v$), eigenvalues move closer to the unit circle, producing more persistent oscillations and a sharper spectral peak. +But do complex roots guarantee a spectral peak? -The above examples show that complex roots *can* produce spectral peaks. +Are they necessary for one? -But when exactly does this happen, and are complex roots *necessary*? - -Chow answers these questions for the Hansen-Samuelson model. +Chow provides precise answers for the Hansen-Samuelson model. ## Spectral peaks in the Hansen-Samuelson model -{cite:t}`Chow1968` provides a detailed spectral analysis of the Hansen-Samuelson multiplier-accelerator model. +{cite:t}`Chow1968` provides a detailed spectral analysis of the Hansen-Samuelson multiplier-accelerator model, deriving exact conditions for when spectral peaks occur. -This analysis reveals exactly when complex roots produce spectral peaks, and establishes that in this specific model, complex roots are *necessary* for a peak. +The analysis reveals that in this specific model, complex roots are *necessary* for a peak, but as we will see later, this is not true in general. 
### The model as a first-order system @@ -624,17 +589,16 @@ The necessary condition for a valid solution is: ``` We can interpret it as: -- When $r \approx 1$, the factor $(1+r^2)/2r \approx 1$, so $\omega \approx \theta$ +- When $r \approx 1$, the factor $(1+r^2)/2r \approx 1$, so $\omega \approx \theta$ - When $r$ is small (e.g., 0.3 or 0.4), condition {eq}`chow_hs_necessary` can only be satisfied if $\cos\theta \approx 0$, meaning $\theta \approx \pi/2$ (cycles of approximately 4 periods) -If $\theta = 54 \degree$ (corresponding to cycles of 6.67 periods) and $r = 0.4$, then $(1+r^2)/2r = 1.45$, giving $\cos\omega = 1.45 \times 0.588 = 0.85$, or $\omega = 31.5 \degree$, corresponding to cycles of 11.4 periods, which is much longer than the deterministic cycle. +If $\theta = 54^\circ$ (corresponding to cycles of 6.67 periods) and $r = 0.4$, then $(1+r^2)/2r = 1.45$, giving $\cos\omega = 1.45 \times 0.588 = 0.85$, or $\omega = 31.5^\circ$, corresponding to cycles of 11.4 periods, which is much longer than the deterministic cycle. ```{code-cell} ipython3 def peak_condition_factor(r): """Compute (1 + r^2) / (2r)""" return (1 + r**2) / (2 * r) -# Verify Chow's analysis: peak frequency as function of r for fixed θ θ_deg = 54 θ = np.deg2rad(θ_deg) r_grid = np.linspace(0.3, 0.99, 100) @@ -643,9 +607,9 @@ r_grid = np.linspace(0.3, 0.99, 100) ω_peak = [] for r in r_grid: factor = peak_condition_factor(r) - cos_omega = factor * np.cos(θ) - if -1 < cos_omega < 1: - ω_peak.append(np.arccos(cos_omega)) + cos_ω = factor * np.cos(θ) + if -1 < cos_ω < 1: + ω_peak.append(np.arccos(cos_ω)) else: ω_peak.append(np.nan) @@ -657,7 +621,7 @@ fig, axes = plt.subplots(1, 2, figsize=(12, 4)) axes[0].plot(r_grid, np.rad2deg(ω_peak), lw=2) axes[0].axhline(θ_deg, ls='--', lw=1.0, color='gray', label=rf'$\theta = {θ_deg}°$') axes[0].set_xlabel('eigenvalue modulus $r$') -axes[0].set_ylabel('peak frequency $\omega$ (degrees)') +axes[0].set_ylabel(r'peak frequency $\omega$ (degrees)') axes[0].legend(frameon=False) axes[1].plot(r_grid, period_peak, lw=2) @@ -669,19 +633,19 @@ axes[1].legend(frameon=False) plt.tight_layout() plt.show() -# Verify Chow's specific example r_example = 0.4 factor = peak_condition_factor(r_example) -cos_omega = factor * np.cos(θ) -omega_example = np.arccos(cos_omega) +cos_ω = factor * np.cos(θ) +ω_example = np.arccos(cos_ω) print(f"Chow's example: r = {r_example}, θ = {θ_deg}°") print(f" Factor (1+r²)/2r = {factor:.3f}") -print(f" cos(ω) = {cos_omega:.3f}") -print(f" ω = {np.rad2deg(omega_example):.1f}°") -print(f" Peak period = {360/np.rad2deg(omega_example):.1f} (vs deterministic period = {360/θ_deg:.1f})") +print(f" cos(ω) = {cos_ω:.3f}") +print(f" ω = {np.rad2deg(ω_example):.1f}°") +print(f" Peak period = {360/np.rad2deg(ω_example):.1f} (vs deterministic period = {360/θ_deg:.1f})") ``` As $r \to 1$, the peak frequency converges to $\theta$. + For smaller $r$, the peak frequency can differ substantially from the deterministic oscillation frequency. 
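As a time-domain check, here is a short simulation sketch; the values $r = 0.9$ and $\theta = 54^\circ$ are illustrative choices.

We back out $c$ and $v$ from $(r, \theta)$, simulate $Y_t = (c + v) Y_{t-1} - v Y_{t-2} + u_t$, and look at the sample autocorrelations of the simulated series.

```{code-cell} ipython3
# Simulate the stochastic Hansen-Samuelson model with complex roots r e^{±iθ}
r_sim, θ_sim = 0.9, np.deg2rad(54)
v_sim = r_sim**2
c_sim = 2 * r_sim * np.cos(θ_sim) - v_sim

# peak frequency implied by Chow's condition
ω_star = np.arccos((1 + r_sim**2) / (2 * r_sim) * np.cos(θ_sim))
print(f"predicted peak period ≈ {2 * np.pi / ω_star:.2f} periods")

rng = np.random.default_rng(0)
T_sim = 10_000
Y_sim = np.zeros(T_sim)
for t in range(2, T_sim):
    Y_sim[t] = (c_sim + v_sim) * Y_sim[t-1] - v_sim * Y_sim[t-2] + rng.standard_normal()

acf = sample_autocorrelation(Y_sim[500:], 20)

fig, ax = plt.subplots(figsize=(8, 3.5))
ax.stem(range(len(acf)), acf)
ax.axhline(0, lw=0.8, color='gray')
ax.set_xlabel("lag")
ax.set_ylabel("sample autocorrelation")
plt.show()
```

The sample autocorrelations oscillate with a period close to the peak period implied by condition {eq}`chow_hs_peak_condition`.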
### Real positive roots cannot produce peaks @@ -715,14 +679,11 @@ This is a key result: In the Hansen-Samuelson model, *complex roots are necessar V_hs = np.array([[1.0, 0.0], [0.0, 0.0]]) # shock only in first equation # Case 1: Complex roots (c=0.6, v=0.8) -# Discriminant = (c+v)² - 4v = 1.96 - 3.2 < 0 → complex roots c_complex, v_complex = 0.6, 0.8 A_complex = samuelson_transition(c_complex, v_complex) eig_complex = np.linalg.eigvals(A_complex) # Case 2: Real roots (c=0.8, v=0.1) -# Discriminant = (c+v)² - 4v = 0.81 - 0.4 > 0 → real roots -# Both roots positive and < 1 (stable) c_real, v_real = 0.8, 0.1 A_real = samuelson_transition(c_real, v_real) eig_real = np.linalg.eigvals(A_real) @@ -757,7 +718,7 @@ While real positive roots cannot produce spectral peaks in the Hansen-Samuelson In multivariate systems, the spectral density of a linear combination of variables can have interior peaks even when all eigenvalues are real and positive. -### Chow's example +### Example Chow constructs the following explicit example with two real positive eigenvalues: @@ -828,15 +789,18 @@ f_general = spectrum_of_linear_combination(F_chow_ex, b_chow_ex) # Normalize to match Chow's table scale scale = f_formula[0] / spectrum_of_linear_combination( - spectral_density_var1(A_chow_ex, V_chow_ex, np.array([0.0])), b_chow_ex)[0] + spectral_density_var1( + A_chow_ex, V_chow_ex, np.array([0.0])), b_chow_ex)[0] print("Chow's Table (equation 67):") print("ω/π: ", " ".join([f"{ω/np.pi:.3f}" for ω in ω_table])) print("f_mm(ω): ", " ".join([f"{f:.3f}" for f in f_formula])) fig, ax = plt.subplots(figsize=(9, 4)) -ax.plot(ω_grid_fine / np.pi, f_general * scale, lw=2, label='spectrum') -ax.scatter(ω_table / np.pi, f_formula, s=50, zorder=3, label="Chow's table values") +ax.plot(ω_grid_fine / np.pi, f_general * scale, lw=2, + label='spectrum') +ax.scatter(ω_table / np.pi, f_formula, s=50, zorder=3, + label="Chow's table values") # Mark the peak i_peak = np.argmax(f_general) @@ -850,6 +814,8 @@ plt.show() print(f"\nPeak at ω/π ≈ {ω_peak/np.pi:.3f}, period ≈ {2*np.pi/ω_peak:.1f}") ``` +The peak appears at $\omega/\pi \approx 0.10$, which corresponds to a cycle length of approximately 20 periods, again much longer than the deterministic cycles implied by the eigenvalues. + ### The Slutsky connection Chow connects this result to Slutsky's well-known finding that taking moving averages of a random series can generate cycles. @@ -865,13 +831,14 @@ y_t = u_t + A u_{t-1} + A^2 u_{t-2} + \cdots This amounts to taking an infinite moving average of the random vectors $u_t$ with "geometrically declining" weights $A^0, A^1, A^2, \ldots$ For a scalar process with $0 < \lambda < 1$, no distinct cycles can emerge. -But for a matrix $A$ with real roots between 0 and 1, cycles **can** emerge in linear combinations of the variables. + +But for a matrix $A$ with real roots between 0 and 1, cycles *can* emerge in linear combinations of the variables. As Chow puts it: "When neither of two (canonical) variables has distinct cycles... a linear combination can have a peak in its spectral density." ### The general lesson -The examples above illustrate Chow's central point: +The examples above illustrate the following central points: 1. In the *Hansen-Samuelson model specifically*, complex roots are necessary for a spectral peak 2. 
But in *general multivariate systems*, complex roots are neither necessary nor sufficient @@ -884,13 +851,13 @@ The examples above illustrate Chow's central point: {cite:t}`ChowLevitan1969` use the frequency-domain objects from {cite:t}`Chow1968` to study a calibrated annual macroeconometric model. -They work with five annual aggregates +They work with five annual aggregates: - $y_1 = C$ (consumption), - $y_2 = I_1$ (equipment plus inventories), - $y_3 = I_2$ (construction), - $y_4 = R_a$ (long rate), -- $y_5 = Y_1 = C + I_1 + I_2$ (private-domestic gnp), +- $y_5 = Y_1 = C + I_1 + I_2$ (private-domestic GNP), and add $y_6 = y_{1,t-1}$ to rewrite the original system in first-order form. @@ -934,7 +901,7 @@ Here we take $A$ and $V$ as given and ask what they imply for spectra and cross- ### Reported shock covariance -Chow and Levitan report the $6 \times 6$ reduced-form shock covariance matrix $V$ (scaled by $10^{-7}$): +The $6 \times 6$ reduced-form shock covariance matrix $V$ (scaled by $10^{-7}$) is: ```{math} :label: chow_V_matrix @@ -1029,8 +996,6 @@ print(np.linalg.eigvals(A_chow).round(6)) Chow's canonical transformation uses $z_t = B^{-1} y_t$, giving dynamics $z_t = D_\lambda z_{t-1} + e_t$. -An algebraic detail: the closed form for $F(\omega)$ uses $A^\top$ (real transpose) rather than a conjugate transpose. - Accordingly, the canonical shock covariance is ```{math} @@ -1044,9 +1009,7 @@ print("diagonal of W:") print(np.diag(W).round(10)) ``` -### Spectral density via eigendecomposition - -Chow's closed-form formula for the spectral density matrix is +Chow derives the following closed-form formula for the spectral density matrix: ```{math} :label: chow_spectral_eigen @@ -1076,9 +1039,7 @@ freq = np.linspace(1e-4, 0.5, 5000) # cycles/year in [0, 1/2] F_chow = spectral_density_chow(λ, B, W, ω_grid) ``` -### Where is variance concentrated? - -Normalizing each spectrum to have unit area over $[0, 1/2]$ lets us compare shapes rather than scales. +Let's plot the univariate spectra of consumption ($y_1$) and equipment plus inventories ($y_2$): ```{code-cell} ipython3 variable_names = ['$C$', '$I_1$', '$I_2$', '$R_a$', '$Y_1$'] @@ -1117,14 +1078,27 @@ plt.show() i_peak = np.argmax(S_norm[mask, 1]) f_peak = freq[mask][i_peak] -print(f"Peak within [1/18, 1/2]: frequency ≈ {f_peak:.3f} cycles/year, period ≈ {1/f_peak:.2f} years.") ``` -Both spectra are dominated by very low frequencies, reflecting the near-unit eigenvalues. +We reproduce only Figures I.1 and I.2 here. + +Figure I.1 corresponds to consumption and declines monotonically with frequency. + +Figure I.1 illustrates Granger's "typical spectral shape" for macroeconomic time series. + +Figure I.2 corresponds to equipment plus inventories and shows the clearest (but still very flat) interior-frequency bump. + +Chow and Levitan associate the dominance of very low frequencies in both plots with strong persistence and long-run movements. + +They note that very large low-frequency power can arise from eigenvalues extremely close to one, which can occur mechanically when some equations are written in first differences. -This is the "typical spectral shape" of macroeconomic time series. +They stress that local peaks are not automatic, because complex roots may have small modulus and multivariate interactions can generate peaks even when all roots are real. -(These patterns match Figures I.1–I.2 of {cite}`ChowLevitan1969`.) 
+They note that the interior bump in Figure I.2 corresponds to cycles of roughly three years and that the spectrum is nearly flat over cycles between about two and four years. + +Their other spectra in Figures I.3–I.5 (construction, the long rate, and private-domestic GNP) decline monotonically with frequency in the same calibration. + +(This discussion follows Section II of {cite}`ChowLevitan1969`.) ### How variables move together across frequencies @@ -1142,6 +1116,10 @@ The **squared coherence** measures linear association at frequency $\omega$: R^2_{ij}(\omega) = \frac{|f_{ij}(\omega)|^2}{f_{ii}(\omega) f_{jj}(\omega)} \in [0, 1]. ``` +Think of coherence as the frequency-domain analogue of $R^2$: it measures how much of the variance of $y_i$ at frequency $\omega$ can be "explained" by $y_j$ at the same frequency. + +High coherence means the two series move together tightly at that frequency. + The **gain** is the frequency-response coefficient when regressing $y_i$ on $y_j$: ```{math} @@ -1150,6 +1128,10 @@ The **gain** is the frequency-response coefficient when regressing $y_i$ on $y_j G_{ij}(\omega) = \frac{|f_{ij}(\omega)|}{f_{jj}(\omega)}. ``` +Think of gain as the frequency-domain analogue of a regression coefficient: it measures how much $y_i$ responds to a unit change in $y_j$ at frequency $\omega$. + +A gain of 0.9 at low frequencies means long-cycle movements in $y_j$ translate almost one-for-one to $y_i$; a gain of 0.3 at high frequencies means short-cycle movements are dampened. + The **phase** captures lead-lag relationships (in radians): ```{math} @@ -1170,14 +1152,14 @@ def cross_spectral_measures(F, i, j): return coherence, gain, phase ``` -We now plot gain and coherence as in Figures II.1-II.3 of {cite}`ChowLevitan1969`. +We now plot gain and coherence as in Figures II.1–II.4 of {cite}`ChowLevitan1969`. ```{code-cell} ipython3 gnp_idx = 4 -fig, axes = plt.subplots(1, 3, figsize=(14, 6)) +fig, axes = plt.subplots(1, 2, figsize=(8, 6)) -for idx, var_idx in enumerate([0, 1, 2]): +for idx, var_idx in enumerate([0, 1]): coherence, gain, phase = cross_spectral_measures(F_chow, var_idx, gnp_idx) ax = axes[idx] @@ -1185,7 +1167,6 @@ for idx, var_idx in enumerate([0, 1, 2]): lw=2, label=rf'$R^2_{{{var_idx+1}5}}(\omega)$') ax.plot(freq[mask], gain[mask], lw=2, label=rf'$G_{{{var_idx+1}5}}(\omega)$') - paper_frequency_axis(ax) ax.set_ylim([0, 1.0]) ax.set_ylabel('gain, coherence') @@ -1195,11 +1176,43 @@ plt.tight_layout() plt.show() ``` -Coherence is high at low frequencies for all three components, meaning long-run movements track output closely. +The gain and coherence patterns differ across components (Figures II.1–II.2 of {cite}`ChowLevitan1969`): + +- Consumption vs private-domestic GNP (left panel): + - Gain is about 0.9 at very low frequencies but falls below 0.4 for cycles shorter than four years. + - This is evidence that short-cycle income movements translate less into consumption than long-cycle movements, consistent with permanent-income interpretations. + - Coherence remains high throughout. +- For Equipment plus inventories vs private-domestic GNP (right panel): + - Gain *rises* with frequency, exceeding 0.5 for short cycles. + - This is the frequency-domain signature of acceleration and volatile short-run inventory movements. -Gains differ: consumption smooths (gain below 1), while investment responds more strongly at higher frequencies. 
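+
+As a rough check on the numbers quoted above, we can read the consumption gain directly off the computed arrays, at a very low frequency and at four-year cycles:
+
+```{code-cell} ipython3
+# Read the consumption gain off the computed cross-spectra
+_, gain_C, _ = cross_spectral_measures(F_chow, 0, gnp_idx)
+i_4yr = np.argmin(np.abs(freq - 0.25))    # 0.25 cycles/year <-> 4-year cycles
+print(f"gain of C on Y_1: {gain_C[0]:.2f} at the lowest grid frequency, "
+      f"{gain_C[i_4yr]:.2f} at 4-year cycles")
+```
+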
+```{code-cell} ipython3
+fig, axes = plt.subplots(1, 2, figsize=(8, 6))
+
+for idx, var_idx in enumerate([2, 3]):
+    coherence, gain, phase = cross_spectral_measures(F_chow, var_idx, gnp_idx)
+    ax = axes[idx]
+
+    ax.plot(freq[mask], coherence[mask],
+            lw=2, label=rf'$R^2_{{{var_idx+1}5}}(\omega)$')
+    ax.plot(freq[mask], gain[mask],
+            lw=2, label=rf'$G_{{{var_idx+1}5}}(\omega)$')
+    paper_frequency_axis(ax)
+    ax.set_ylim([0, 1.0])
+    ax.set_ylabel('gain, coherence')
+    ax.legend(frameon=False, loc='best')
+
+plt.tight_layout()
+plt.show()
+```
+
+- New construction vs private-domestic GNP (left panel):
+    - Gain peaks at medium cycle lengths and is only around 0.1 for short cycles.
+    - Coherence for both investment series stays fairly high across frequencies.
+- Long-bond yield vs private-domestic GNP (right panel):
+    - Gain varies less across frequencies than the real activity series.
+    - Coherence with output is comparatively low at business-cycle frequencies, making it hard to explain interest-rate movements by inverting a money-demand equation.
 
-(These patterns match Figures II.1-II.3 of {cite}`ChowLevitan1969`.)
 
 ### Lead-lag relationships
 
@@ -1208,7 +1221,7 @@ The phase tells us which variable leads at each frequency.
 
 Positive phase means output leads the component; negative phase means the component leads output.
 
 ```{code-cell} ipython3
-fig, ax = plt.subplots(figsize=(8, 6))
+fig, ax = plt.subplots()
 
 labels = [r'$\psi_{15}(\omega)/2\pi$', r'$\psi_{25}(\omega)/2\pi$',
           r'$\psi_{35}(\omega)/2\pi$', r'$\psi_{45}(\omega)/2\pi$']
@@ -1223,16 +1236,19 @@ paper_frequency_axis(ax)
 ax.set_ylabel('phase difference in cycles')
 ax.set_ylim([-0.25, 0.25])
 ax.set_yticks([-0.25, -0.20, -0.15, -0.10, -0.05, 0, 0.05, 0.10, 0.15, 0.20, 0.25])
-ax.legend(frameon=False, fontsize=9)
+ax.legend(frameon=False)
 
 plt.tight_layout()
 plt.show()
 ```
 
-At business-cycle frequencies, consumption tends to lag output while equipment and inventories tend to lead.
+The phase relationships reveal that:
 
-The interest rate is roughly coincident.
+- Output leads consumption by a small fraction of a cycle (about 0.06 cycles at a 6-year period, 0.04 cycles at a 3-year period).
+- Equipment plus inventories tends to lead output (by about 0.07 cycles at a 6-year period, 0.03 cycles at a 3-year period).
+- New construction leads at low frequencies and is close to coincident at higher frequencies.
+- The bond yield lags output slightly, remaining close to coincident in timing.
 
-(This matches Figure III of {cite}`ChowLevitan1969`.)
+These implied leads and lags are broadly consistent with turning-point timing summaries reported elsewhere, and simulations of the same model deliver similar lead–lag ordering at turning points (Figure III of {cite}`ChowLevitan1969`).
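+
+The same arrays let us read off the implied phase shift at any chosen cycle length, here for consumption relative to output at six-year and three-year cycles:
+
+```{code-cell} ipython3
+# Phase of consumption relative to output, in fractions of a cycle
+_, _, phase_C = cross_spectral_measures(F_chow, 0, gnp_idx)
+for T_cycle in (6, 3):
+    i_f = np.argmin(np.abs(freq - 1 / T_cycle))
+    print(f"{T_cycle}-year cycles: phase = {phase_C[i_f] / (2 * np.pi):.3f} cycles")
+```
+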
### Building blocks of spectral shape @@ -1254,13 +1270,13 @@ Each observable spectral density is a linear combination of these kernels (plus ```{code-cell} ipython3 def scalar_kernel(λ_i, ω_grid): - """Chow's scalar spectral kernel g_i(ω).""" + """scalar spectral kernel g_i(ω).""" λ_i = complex(λ_i) mod_sq = np.abs(λ_i)**2 return np.array([(1 - mod_sq) / np.abs(1 - λ_i * np.exp(-1j * ω))**2 for ω in ω_grid]) fig, ax = plt.subplots(figsize=(10, 5)) -for i, λ_i in enumerate(λ[:4]): +for i, λ_i in enumerate(λ): if np.abs(λ_i) > 0.01: g_i = scalar_kernel(λ_i, ω_grid) label = f'$\\lambda_{i+1}$ = {λ_i:.4f}' if np.isreal(λ_i) else f'$\\lambda_{i+1}$ = {λ_i:.3f}' @@ -1274,65 +1290,120 @@ ax.legend(frameon=False) plt.show() ``` -Near-unit eigenvalues produce kernels sharply peaked at low frequencies. +The figure reveals how eigenvalue magnitude shapes spectral contributions: -Smaller eigenvalues produce flatter kernels. +- *Near-unit eigenvalues* ($\lambda_1, \lambda_2 \approx 1$) produce kernels sharply peaked at low frequencies—these drive the strong low-frequency power seen in the spectra above. +- *The moderate eigenvalue* ($\lambda_3 \approx 0.48$) contributes a flatter component that spreads power more evenly across frequencies. +- *The complex pair* ($\lambda_{4,5}$) has such small modulus ($|\lambda_{4,5}| \approx 0.136$) that its kernel is nearly flat, which is too weak to generate a pronounced interior peak. -The complex pair ($\lambda_{4,5}$) has such small modulus that its kernel is nearly flat. +This decomposition explains why the spectra look the way they do: the near-unit eigenvalues dominate, concentrating variance at very low frequencies. -### Why the spectra look the way they do +The complex pair, despite enabling oscillatory dynamics in principle, has insufficient modulus to produce a visible spectral peak. -The two near-unit eigenvalues generate strong low-frequency power. +## Summary -The moderate eigenvalue ($\lambda_3 \approx 0.48$) contributes a flatter component. +{cite:t}`Chow1968` draws several conclusions that remain relevant for understanding business cycles. -The complex pair has small modulus ($|\lambda_{4,5}| \approx 0.136$), so it cannot generate a pronounced interior peak. +The acceleration principle receives strong empirical support: the negative coefficient on lagged output in investment equations is a robust finding across datasets. -The near-zero eigenvalue reflects the accounting identity $Y_1 = C + I_1 + I_2$. +- This matters because, in a model consisting only of demand equations with simple distributed lags, the transition matrix has real positive roots under natural sign restrictions—ruling out prolonged oscillations. -This illustrates Chow's message: eigenvalues guide intuition, but observed spectra also depend on how shocks excite the modes and how observables combine them. +- Acceleration introduces the possibility of complex roots, which are necessary for oscillatory dynamics in deterministic systems. -### Summary +The relationship between eigenvalues and spectral peaks is more subtle than it first appears: -The calibrated model reveals three patterns: (1) most variance sits at very low frequencies due to near-unit eigenvalues; (2) consumption smooths while investment amplifies high-frequency movements; (3) consumption lags output at business-cycle frequencies while investment leads. +- Complex roots guarantee oscillatory autocovariances, but they are neither necessary nor sufficient for a pronounced spectral peak. 
-## Wrap-up +- In the Hansen–Samuelson model specifically, complex roots *are* necessary for a peak. -{cite:t}`Chow1968` draws several conclusions that remain relevant for understanding business cycles: +- But in general multivariate systems, even real roots can produce peaks through the interaction of shocks and eigenvector loadings. -1. **Empirical support for acceleration**: The acceleration principle, as formulated through stock-adjustment equations, receives strong empirical support from investment data. The negative coefficient on lagged output levels is a robust empirical finding. +Chow argues that understanding business cycles requires an integrated view of deterministic dynamics and random shocks. -2. **Acceleration is necessary for deterministic oscillations**: In a model consisting only of demand equations with simple distributed lags, the transition matrix has real positive roots (under natural sign restrictions), and hence no prolonged oscillations can occur. Acceleration introduces the possibility of complex roots. +{cite:t}`ChowLevitan1969` demonstrate what these objects look like in a calibrated system: strong low-frequency power from near-unit eigenvalues, frequency-dependent gains and coherences, and lead–lag relations that vary with cycle length. -3. **Complex roots are neither necessary nor sufficient for stochastic cycles**: While complex roots in the deterministic model guarantee oscillatory autocovariances, they are neither necessary nor sufficient for a pronounced spectral peak. In the Hansen-Samuelson model specifically, complex roots *are* necessary for a spectral peak. But in general multivariate systems, real roots can produce peaks through the interaction of shocks and eigenvector loadings. +Their results are consistent with Granger's "typical spectral shape" for economic time series. -4. **An integrated view is essential**: As Chow concludes, "an obvious moral is that the nature of business cycles can be understood only by an integrated view of the deterministic as well as the random elements." +That is a monotonically decreasing function of frequency, driven by the near-unit eigenvalues that arise when some equations are specified in first differences. -{cite:t}`ChowLevitan1969` then show what these objects look like in a calibrated system: strong low-frequency power (reflecting near-unit eigenvalues), frequency-dependent gains/coherences, and lead–lag relations that vary with the cycle length. +Understanding whether this shape reflects the true data-generating process requires analyzing the spectral densities implied by structural econometric models. -On the empirical side, Granger has noted a "typical spectral shape" for economic time series—a monotonically decreasing function of frequency. +## Exercises -The Chow-Levitan calibration is consistent with this shape, driven by the near-unit eigenvalues. +```{exercise} +:label: chow_cycles_ex1 -But as Chow emphasizes, understanding whether this shape reflects the true data-generating process requires analyzing the spectral densities implied by structural econometric models. +Plot impulse responses and spectra side-by-side for several values of the accelerator $v$ in the Hansen-Samuelson model, showing how acceleration strength affects both the time-domain and frequency-domain signatures. -To connect this to data, pair the model-implied objects here with the advanced lecture {doc}`advanced:estspec`. +Use the same $v$ values as in the main text: $v \in \{0.2, 0.4, 0.6, 0.8, 0.95\}$ with $c = 0.6$. 
+``` -## Exercises +```{solution-start} chow_cycles_ex1 +:class: dropdown +``` + +```{code-cell} ipython3 +v_grid_ex1 = [0.2, 0.4, 0.6, 0.8, 0.95] +c_ex1 = 0.6 +freq_ex1 = np.linspace(1e-4, 0.5, 2000) +ω_grid_ex1 = 2 * np.pi * freq_ex1 +V_ex1 = np.array([[1.0, 0.0], [0.0, 0.0]]) +T_irf_ex1 = 40 + +fig, axes = plt.subplots(1, 2, figsize=(12, 5)) + +for v in v_grid_ex1: + A = samuelson_transition(c_ex1, v) + + # impulse response (left panel) + s = np.array([1.0, 0.0]) + irf = np.empty(T_irf_ex1 + 1) + for t in range(T_irf_ex1 + 1): + irf[t] = s[0] + s = A @ s + axes[0].plot(range(T_irf_ex1 + 1), irf, lw=2, label=f'$v={v}$') + + # spectrum (right panel) + F = spectral_density_var1(A, V_ex1, ω_grid_ex1) + f11 = np.real(F[:, 0, 0]) + f11_norm = f11 / np.trapezoid(f11, freq_ex1) + axes[1].plot(freq_ex1, f11_norm, lw=2, label=f'$v={v}$') + +axes[0].axhline(0, lw=0.8, color='gray') +axes[0].set_xlabel('time') +axes[0].set_ylabel(r'$Y_t$') +axes[0].legend(frameon=False) + +axes[1].set_xlabel(r'frequency $\omega/2\pi$') +axes[1].set_ylabel('normalized spectrum') +axes[1].set_xlim([0, 0.5]) +axes[1].set_yscale('log') +axes[1].legend(frameon=False) + +plt.tight_layout() +plt.show() +``` + +As $v$ increases, eigenvalues approach the unit circle: oscillations become more persistent in the time domain (left), and the spectral peak becomes sharper in the frequency domain (right). + +Complex roots produce a pronounced peak at interior frequencies—the spectral signature of business cycles. + +```{solution-end} +``` ```{exercise} -:label: chow_cycles_ex1 +:label: chow_cycles_ex2 -Verify Chow's spectral peak condition {eq}`chow_hs_peak_condition` numerically for the Hansen-Samuelson model. +Verify spectral peak condition {eq}`chow_hs_peak_condition` numerically for the Hansen-Samuelson model. 1. For a range of eigenvalue moduli $r \in [0.3, 0.99]$ with fixed $\theta = 60°$, compute: - - The theoretical peak frequency from Chow's formula: $\cos\omega = \frac{1+r^2}{2r}\cos\theta$ + - The theoretical peak frequency from formula: $\cos\omega = \frac{1+r^2}{2r}\cos\theta$ - The actual peak frequency by numerically maximizing the spectral density 2. Plot both on the same graph and verify they match. 3. Identify the range of $r$ for which no valid peak exists (when the condition {eq}`chow_hs_necessary` is violated). ``` -```{solution-start} chow_cycles_ex1 +```{solution-start} chow_cycles_ex2 :class: dropdown ``` @@ -1346,17 +1417,17 @@ V_hs_ex = np.array([[1.0, 0.0], [0.0, 0.0]]) ω_numerical = [] for r in r_grid: - # Theoretical peak from Chow's formula + # Theoretical peak factor = (1 + r**2) / (2 * r) - cos_omega = factor * np.cos(θ_ex) - if -1 < cos_omega < 1: - ω_theory.append(np.arccos(cos_omega)) + cos_ω = factor * np.cos(θ_ex) + if -1 < cos_ω < 1: + ω_theory.append(np.arccos(cos_ω)) else: ω_theory.append(np.nan) # Numerical peak from spectral density - # Construct Hansen-Samuelson with eigenvalues r*exp(±iθ) - # This corresponds to c + v = 2r*cos(θ), v = r² + # Construct Hansen-Samuelson with eigenvalues r*exp(+-iθ) + # This corresponds to c + v = 2r*cos(θ), v = r^2 v = r**2 c = 2 * r * np.cos(θ_ex) - v A_ex = samuelson_transition(c, v) @@ -1401,33 +1472,35 @@ if valid_mask.any(): ``` The theoretical and numerical peak frequencies match closely. + As $r \to 1$, the peak frequency converges to $\theta$. + For smaller $r$, the factor $(1+r^2)/2r$ exceeds the threshold, and no valid peak exists. 
```{solution-end} ``` ```{exercise} -:label: chow_cycles_ex2 +:label: chow_cycles_ex3 In the "real roots but a peak" example, hold $A$ fixed and vary the shock correlation (the off-diagonal entry of $V$) between $0$ and $0.99$. When does the interior-frequency peak appear, and how does its location change? ``` -```{solution-start} chow_cycles_ex2 +```{solution-start} chow_cycles_ex3 :class: dropdown ``` ```{code-cell} ipython3 -A_ex2 = np.diag([0.1, 0.9]) -b_ex2 = np.array([1.0, -0.01]) +A_ex3 = np.diag([0.1, 0.9]) +b_ex3 = np.array([1.0, -0.01]) corr_grid = np.linspace(0, 0.99, 50) peak_periods = [] for corr in corr_grid: - V_ex2 = np.array([[1.0, corr], [corr, 1.0]]) - F_ex2 = spectral_density_var1(A_ex2, V_ex2, ω_grid_ex) - f_x = spectrum_of_linear_combination(F_ex2, b_ex2) + V_ex3 = np.array([[1.0, corr], [corr, 1.0]]) + F_ex3 = spectral_density_var1(A_ex3, V_ex3, ω_grid_ex) + f_x = spectrum_of_linear_combination(F_ex3, b_ex3) i_max = np.argmax(f_x) if 5 < i_max < len(ω_grid_ex) - 5: peak_periods.append(2 * np.pi / ω_grid_ex[i_max]) @@ -1447,15 +1520,15 @@ if len(threshold_idx) > 0: The interior peak appears only when the shock correlation exceeds a threshold. -This illustrates Chow's point that spectral peaks depend on the full system structure, not just eigenvalues. +This illustrates that spectral peaks depend on the full system structure, not just eigenvalues. ```{solution-end} ``` ```{exercise} -:label: chow_cycles_ex3 +:label: chow_cycles_ex4 -Using the calibrated Chow-Levitan (1969) parameters, compute the autocovariance matrices $\Gamma_0, \Gamma_1, \ldots, \Gamma_{10}$ using: +Using the calibrated Chow-Levitan parameters, compute the autocovariance matrices $\Gamma_0, \Gamma_1, \ldots, \Gamma_{10}$ using: 1. The recursion $\Gamma_k = A \Gamma_{k-1}$ with $\Gamma_0$ from the Lyapunov equation. 2. Chow's eigendecomposition formula $\Gamma_k = B D_\lambda^k \Gamma_0^* B^\top$ where $\Gamma_0^*$ is the canonical covariance. @@ -1463,7 +1536,7 @@ Using the calibrated Chow-Levitan (1969) parameters, compute the autocovariance Verify that both methods give the same result. ``` -```{solution-start} chow_cycles_ex3 +```{solution-start} chow_cycles_ex4 :class: dropdown ``` @@ -1500,7 +1573,7 @@ Both methods produce essentially identical results, up to numerical precision. ``` ```{exercise} -:label: chow_cycles_ex4 +:label: chow_cycles_ex5 Modify the Chow-Levitan model by changing $\lambda_3$ from $0.4838$ to $0.95$. @@ -1509,39 +1582,38 @@ Modify the Chow-Levitan model by changing $\lambda_3$ from $0.4838$ to $0.95$. 3. What economic interpretation might correspond to this parameter change? 
``` -```{solution-start} chow_cycles_ex4 +```{solution-start} chow_cycles_ex5 :class: dropdown ``` ```{code-cell} ipython3 +# Modify λ_3 and reconstruct the transition matrix λ_modified = λ.copy() λ_modified[2] = 0.95 -F_mod = spectral_density_chow(λ_modified, B, W, ω_grid) - -fig, axes = plt.subplots(2, 3, figsize=(14, 8)) -axes = axes.flatten() -var_labels = ["consumption", "equipment + inventories", "construction", "long rate", "output"] -for i in range(5): - f_orig = np.real(F_chow[:, i, i]) - f_mod = np.real(F_mod[:, i, i]) - f_orig_norm = f_orig / np.trapezoid(f_orig, freq) - f_mod_norm = f_mod / np.trapezoid(f_mod, freq) - axes[i].semilogy(freq, f_orig_norm, lw=2, label=r"original ($\lambda_3=0.48$)") - axes[i].semilogy(freq, f_mod_norm, lw=2, ls="--", label=r"modified ($\lambda_3=0.95$)") - paper_frequency_axis(axes[i]) - axes[i].set_ylabel(rf"normalized $f_{{{i+1}{i+1}}}(\omega)$") - axes[i].text(0.03, 0.08, var_labels[i], transform=axes[i].transAxes) - axes[i].legend(frameon=False, fontsize=8) -axes[5].axis('off') -plt.tight_layout() +D_λ_mod = np.diag(λ_modified) +A_mod = np.real(B @ D_λ_mod @ np.linalg.inv(B)) + +# Compute spectra using the VAR(1) formula with original V +F_mod = spectral_density_var1(A_mod, V, ω_grid) +F_orig = spectral_density_var1(A_chow, V, ω_grid) + +# Plot ratio of spectra for output (Y_1) +f_orig = np.real(F_orig[:, 4, 4]) +f_mod = np.real(F_mod[:, 4, 4]) + +fig, ax = plt.subplots() +ax.plot(freq, f_mod / f_orig, lw=2) +ax.axhline(1.0, ls='--', lw=1, color='gray') +paper_frequency_axis(ax) +ax.set_ylabel(r"ratio: modified / original spectrum for $Y_1$") plt.show() ``` -Increasing $\lambda_3$ from 0.48 to 0.95 adds more persistence to the system. +The near-unit eigenvalues ($\lambda_1, \lambda_2 \approx 0.9999$) dominate the output spectrum so heavily that changing $\lambda_3$ from 0.48 to 0.95 produces only a small relative effect. -The spectral densities show increased power at low frequencies. +The ratio plot reveals the change: the modified spectrum has slightly more power at low-to-medium frequencies and slightly less at high frequencies. -Economically, this could correspond to stronger persistence in the propagation of shocks—perhaps due to slower adjustment speeds in investment or consumption behavior. +Economically, increasing $\lambda_3$ adds persistence to the mode it governs. 
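+
+A rough way to quantify the added persistence is to compare the half-lives implied by the two values of $\lambda_3$:
+
+```{code-cell} ipython3
+# Half-life of a scalar mode z_t = λ z_{t-1}: the horizon k with λ**k = 0.5
+for lam in (0.4838, 0.95):
+    print(f"λ3 = {lam}: half-life = {np.log(0.5) / np.log(lam):.1f} years")
+```
+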
```{solution-end} ``` From 8c917acdf6d6377d450d92a6ba801b0909c147ae Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Sat, 7 Feb 2026 16:06:31 +1100 Subject: [PATCH 04/37] updates --- lectures/chow_business_cycles.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 396a4de55..196059ec0 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -31,8 +31,8 @@ kernelspec: This lecture studies two classic papers by Gregory Chow on business cycles in linear dynamic models: -- {cite}`Chow1968`: empirical evidence for the acceleration principle, why acceleration enables oscillations, and when spectral peaks arise in stochastic systems -- {cite}`ChowLevitan1969`: spectral analysis of a calibrated US macroeconometric model, showing gains, coherences, and lead–lag patterns +- {cite:t}`Chow1968`: empirical evidence for the acceleration principle, why acceleration enables oscillations, and when spectral peaks arise in stochastic systems +- {cite:t}`ChowLevitan1969`: spectral analysis of a calibrated US macroeconometric model, showing gains, coherences, and lead–lag patterns These papers connect ideas in the following lectures: From 9606ef6c370ed737014265f0b0eeeacf1a963d14 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Sat, 7 Feb 2026 16:20:17 +1100 Subject: [PATCH 05/37] updates --- lectures/chow_business_cycles.md | 35 +++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 196059ec0..f03f2f3cc 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -75,7 +75,8 @@ def spectral_density_var1(A, V, ω_grid): def spectrum_of_linear_combination(F, b): """Spectrum of x_t = b'y_t given the spectral matrix F(ω).""" b = np.asarray(b).reshape(-1, 1) - return np.array([np.real((b.T @ F[k] @ b).item()) for k in range(F.shape[0])]) + return np.array([np.real((b.T @ F[k] @ b).item()) + for k in range(F.shape[0])]) def simulate_var1(A, V, T, burn=200, seed=1234): r"""Simulate y_t = A y_{t-1} + u_t with u_t \sim N(0, V).""" @@ -84,8 +85,10 @@ def simulate_var1(A, V, T, burn=200, seed=1234): n = A.shape[0] chol = np.linalg.cholesky(V) y = np.zeros((T + burn, n)) + for t in range(1, T + burn): y[t] = A @ y[t - 1] + chol @ rng.standard_normal(n) + return y[burn:] def sample_autocorrelation(x, max_lag): @@ -172,9 +175,11 @@ Having established the empirical evidence for acceleration, we now examine why i He shows that, under natural sign restrictions, the answer is no. -Stock-adjustment demand for durable goods leads to investment equations where the coefficient on $Y_{t-1}$ is negative—the **acceleration effect**. +Stock-adjustment demand for durable goods leads to investment equations where the coefficient on $Y_{t-1}$ is negative. -This negative coefficient is what makes complex roots possible in the characteristic equation. +This negative coefficient captures the **acceleration effect**: investment responds not just to the level of income, but to its rate of change. + +This negative coefficient is also what makes complex roots possible in the characteristic equation. Without it, Chow proves that demand systems with only positive coefficients have real positive roots, and hence no oscillatory dynamics. @@ -469,9 +474,13 @@ V \left(I - A^\top e^{i\omega}\right)^{-1}. 
``` -Intuitively, $F(\omega)$ tells us how much variation in $y_t$ is associated with cycles of (angular) frequency $\omega$. +$F(\omega)$ tells us how much variation in $y_t$ is associated with cycles of (angular) frequency $\omega$. + +Higher frequencies correspond to rapid oscillations, meaning short cycles where the series completes many up-and-down movements per unit of time. -The corresponding cycle length is +Lower frequencies correspond to slower oscillations, meaning long cycles that unfold over extended periods. + +The corresponding cycle length (or period) is ```{math} :label: chow_period @@ -479,6 +488,10 @@ The corresponding cycle length is T(\omega) = \frac{2\pi}{\omega}. ``` +Thus, a frequency of $\omega = \pi$ corresponds to the shortest possible cycle of $T = 2$ periods, while frequencies near zero correspond to very long cycles. + +When the spectral density $F(\omega)$ is concentrated at particular frequencies, it indicates that the time series exhibits pronounced cyclical behavior at those frequencies. + The advanced lecture {doc}`advanced:estspec` explains how to estimate $F(\omega)$ from data. Here we focus on the model-implied spectrum. @@ -638,7 +651,6 @@ factor = peak_condition_factor(r_example) cos_ω = factor * np.cos(θ) ω_example = np.arccos(cos_ω) print(f"Chow's example: r = {r_example}, θ = {θ_deg}°") -print(f" Factor (1+r²)/2r = {factor:.3f}") print(f" cos(ω) = {cos_ω:.3f}") print(f" ω = {np.rad2deg(ω_example):.1f}°") print(f" Peak period = {360/np.rad2deg(ω_example):.1f} (vs deterministic period = {360/θ_deg:.1f})") @@ -1128,9 +1140,9 @@ The **gain** is the frequency-response coefficient when regressing $y_i$ on $y_j G_{ij}(\omega) = \frac{|f_{ij}(\omega)|}{f_{jj}(\omega)}. ``` -Think of gain as the frequency-domain analogue of a regression coefficient: it measures how much $y_i$ responds to a unit change in $y_j$ at frequency $\omega$. +It measures how much $y_i$ responds to a unit change in $y_j$ at frequency $\omega$. -A gain of 0.9 at low frequencies means long-cycle movements in $y_j$ translate almost one-for-one to $y_i$; a gain of 0.3 at high frequencies means short-cycle movements are dampened. +For instance, a gain of 0.9 at low frequencies means long-cycle movements in $y_j$ translate almost one-for-one to $y_i$, and a gain of 0.3 at high frequencies means short-cycle movements are dampened. 
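+
+Before turning to the phase, a small synthetic example (not drawn from the Chow-Levitan calibration) helps fix ideas: if the second variable is simply the first variable lagged one period, the two series differ only by a pure time shift, so both the squared coherence and the gain of $y_2$ on $y_1$ should equal one at every frequency.
+
+```{code-cell} ipython3
+# Synthetic example: y_1 is an AR(1) and y_2,t = y_1,t-1 exactly
+ρ_demo = 0.7                      # arbitrary AR(1) coefficient
+A_demo = np.array([[ρ_demo, 0.0],
+                   [1.0,    0.0]])
+V_demo = np.array([[1.0, 0.0],
+                   [0.0, 0.0]])   # shock hits y_1 only
+ω_demo = np.linspace(1e-3, np.pi - 1e-3, 400)
+F_demo = spectral_density_var1(A_demo, V_demo, ω_demo)
+
+f11_demo = np.real(F_demo[:, 0, 0])
+f22_demo = np.real(F_demo[:, 1, 1])
+f21_demo = F_demo[:, 1, 0]
+
+coh_demo = np.abs(f21_demo)**2 / (f11_demo * f22_demo)
+gain_demo = np.abs(f21_demo) / f11_demo
+print("coherence min/max:", coh_demo.min().round(6), coh_demo.max().round(6))
+print("gain min/max:     ", gain_demo.min().round(6), gain_demo.max().round(6))
+```
+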
The **phase** captures lead-lag relationships (in radians): @@ -1273,13 +1285,16 @@ def scalar_kernel(λ_i, ω_grid): """scalar spectral kernel g_i(ω).""" λ_i = complex(λ_i) mod_sq = np.abs(λ_i)**2 - return np.array([(1 - mod_sq) / np.abs(1 - λ_i * np.exp(-1j * ω))**2 for ω in ω_grid]) + return np.array( + [(1 - mod_sq) / np.abs(1 - λ_i * np.exp(-1j * ω))**2 + for ω in ω_grid]) fig, ax = plt.subplots(figsize=(10, 5)) for i, λ_i in enumerate(λ): if np.abs(λ_i) > 0.01: g_i = scalar_kernel(λ_i, ω_grid) - label = f'$\\lambda_{i+1}$ = {λ_i:.4f}' if np.isreal(λ_i) else f'$\\lambda_{i+1}$ = {λ_i:.3f}' + label = f'$\\lambda_{i+1}$ = {λ_i:.4f}' \ + if np.isreal(λ_i) else f'$\\lambda_{i+1}$ = {λ_i:.3f}' ax.semilogy(freq, g_i, label=label, lw=2) ax.set_xlabel(r'frequency $\omega/2\pi$') ax.set_ylabel('$g_i(\\omega)$') From 8fa879a903179b86f92602068e2e2d1b0c4aa351 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Sat, 7 Feb 2026 17:55:09 +1100 Subject: [PATCH 06/37] updates --- lectures/chow_business_cycles.md | 128 +++++++++++++++++-------------- 1 file changed, 69 insertions(+), 59 deletions(-) diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index f03f2f3cc..664f6d206 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -49,7 +49,7 @@ We will keep coming back to three ideas: - In deterministic models, oscillations correspond to complex eigenvalues of a transition matrix. - In stochastic models, a "cycle" shows up as a local peak in a (univariate) spectral density. -- Spectral peaks depend on eigenvalues, but also on how shocks enter (the covariance matrix $V$) and on how observables load on eigenmodes. +- Spectral peaks depend on eigenvalues, but also on how shocks enter and on how observables load on eigenmodes. Let's start with some standard imports: @@ -214,7 +214,7 @@ for (label, c, v), A in zip(cases, A_list): With weak acceleration ($v=0.1$), the discriminant is positive and the roots are real. -With strong acceleration ($v=0.8$), the discriminant is negative and the roots are complex conjugates, enabling oscillatory dynamics. +With strong acceleration ($v=0.8$), the discriminant is negative and the roots are complex conjugates that enable oscillatory dynamics. ```{code-cell} ipython3 # impulse responses from a one-time unit shock in Y @@ -305,7 +305,7 @@ To study this formally, we need to introduce the stochastic framework. ## A linear system with shocks -We analyze (or reduce to) a first-order linear stochastic system +We analyze a first-order linear stochastic system ```{math} :label: chow_var1 @@ -316,7 +316,7 @@ y_t = A y_{t-1} + u_t, \qquad \mathbb E[u_t u_t^\top] = V, \qquad -\mathbb E[u_t u_{t-k}^\top] = 0 \ (k \neq 0). +\mathbb E[u_t u_{t-k}^\top] = 0, \quad k \neq 0. ``` When the eigenvalues of $A$ are strictly inside the unit circle, the process is covariance stationary and its autocovariances exist. @@ -346,13 +346,9 @@ The second equation is the discrete Lyapunov equation for $\Gamma_0$. {cite:t}`Chow1968` motivates the stochastic analysis with a quote from Ragnar Frisch: > The examples we have discussed ... show that when a [deterministic] economic system gives rise to oscillations, these will most frequently be damped. -> > But in reality the cycles ... are generally not damped. -> > How can the maintenance of the swings be explained? -> > ... 
One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ... -> > Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data. > > — Ragnar Frisch (1933) @@ -365,7 +361,7 @@ We will show that even when eigenvalues are real (no deterministic oscillations) ### Autocovariances in terms of eigenvalues -Let $\lambda_1, \ldots, \lambda_p$ be the (possibly complex) eigenvalues of $A$, assumed distinct, and let $B$ be the matrix whose columns are the corresponding right eigenvectors: +Let $\lambda_1, \ldots, \lambda_p$ be the distinct, possibly complex, eigenvalues of $A$, and let $B$ be the matrix whose columns are the corresponding right eigenvectors: ```{math} :label: chow_eigen_decomp @@ -616,7 +612,7 @@ def peak_condition_factor(r): θ = np.deg2rad(θ_deg) r_grid = np.linspace(0.3, 0.99, 100) -# For each r, compute the implied peak frequency (if it exists) +# For each r, compute the implied peak frequency ω_peak = [] for r in r_grid: factor = peak_condition_factor(r) @@ -632,13 +628,15 @@ period_peak = 2 * np.pi / ω_peak fig, axes = plt.subplots(1, 2, figsize=(12, 4)) axes[0].plot(r_grid, np.rad2deg(ω_peak), lw=2) -axes[0].axhline(θ_deg, ls='--', lw=1.0, color='gray', label=rf'$\theta = {θ_deg}°$') +axes[0].axhline(θ_deg, ls='--', lw=1.0, color='gray', + label=rf'$\theta = {θ_deg}°$') axes[0].set_xlabel('eigenvalue modulus $r$') axes[0].set_ylabel(r'peak frequency $\omega$ (degrees)') axes[0].legend(frameon=False) axes[1].plot(r_grid, period_peak, lw=2) -axes[1].axhline(360/θ_deg, ls='--', lw=1.0, color='gray', label=rf'deterministic period = {360/θ_deg:.1f}') +axes[1].axhline(360/θ_deg, ls='--', lw=1.0, color='gray', + label=rf'deterministic period = {360/θ_deg:.1f}') axes[1].set_xlabel('eigenvalue modulus $r$') axes[1].set_ylabel('peak period') axes[1].legend(frameon=False) @@ -664,29 +662,41 @@ For smaller $r$, the peak frequency can differ substantially from the determinis For *real and positive roots* $\lambda_1, \lambda_2 > 0$, the first-order condition {eq}`chow_hs_foc` cannot be satisfied. -To see why, note that we would need: +To see why, recall that a spectral peak at an interior frequency $\omega \in (0, \pi)$ requires ```{math} -:label: chow_hs_real_impossible - -\cos\omega = \frac{(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1}{4\lambda_1 \lambda_2} > 1 +\cos\omega = \frac{(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1}{4\lambda_1 \lambda_2}. ``` -The inequality follows because: +For this to have a solution, we need the right-hand side to lie in $[-1, 1]$. + +But for positive $\lambda_1, \lambda_2$, the numerator exceeds $4\lambda_1\lambda_2$: ```{math} :label: chow_hs_real_proof -(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1 - 4\lambda_1\lambda_2 = \lambda_1(1-\lambda_2)^2 + \lambda_2(1-\lambda_1)^2 > 0 +(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1 - 4\lambda_1\lambda_2 = \lambda_1(1-\lambda_2)^2 + \lambda_2(1-\lambda_1)^2. +``` + +The right-hand side is a sum of two non-negative terms (each is a positive number times a square). 
+ +It equals zero only if both $\lambda_1 = 1$ and $\lambda_2 = 1$, which violates the stability condition $|\lambda_i| < 1$. + +For any stable system with real positive roots, this expression is strictly positive, so + +```{math} +:label: chow_hs_real_impossible + +\cos\omega = \frac{(1 + \lambda_1^2)\lambda_2 + (1 + \lambda_2^2)\lambda_1}{4\lambda_1 \lambda_2} > 1, ``` -which is strictly positive for any $\lambda_1, \lambda_2 > 0$. +which is impossible. + +This is a key result: in the Hansen-Samuelson model, *complex roots are necessary* for a spectral peak at interior frequencies. -This is a key result: In the Hansen-Samuelson model, *complex roots are necessary* for a spectral peak at interior frequencies. +The following figure illustrates the difference in spectra between a case with complex roots and a case with real roots ```{code-cell} ipython3 -# Demonstrate: compare spectra with complex vs real roots -# Both cases use valid Hansen-Samuelson parameterizations ω_grid = np.linspace(1e-3, np.pi - 1e-3, 800) V_hs = np.array([[1.0, 0.0], [0.0, 0.0]]) # shock only in first equation @@ -768,22 +778,22 @@ Chow tabulates the values: The peak at $\omega$ slightly below $\pi/8$ (corresponding to periods around 11) is "quite pronounced." -In the following figure, we reproduce this table, but with Python, we can plot a finer grid to find the peak more accurately. +In the following figure, we reproduce this table, but with Python, we can plot a finer grid to find the peak more accurately ```{code-cell} ipython3 -# Reproduce Chow's exact example λ1, λ2 = 0.1, 0.9 w11, w22, w12 = 1.0, 1.0, 0.8 bm1, bm2 = 1.0, -0.01 # Construct the system A_chow_ex = np.diag([λ1, λ2]) + # W is the canonical shock covariance; we need V = B W B^T # For diagonal A with distinct eigenvalues, B = I, so V = W V_chow_ex = np.array([[w11, w12], [w12, w22]]) b_chow_ex = np.array([bm1, bm2]) -# Chow's formula (equation 67) +# Chow's formula def chow_spectrum_formula(ω): term1 = 0.9913 / (1.01 - 0.2 * np.cos(ω)) term2 = 0.001570 / (1.81 - 1.8 * np.cos(ω)) @@ -911,9 +921,7 @@ where $\Sigma$ is the covariance of structural residuals and $M$ is the matrix o Here we take $A$ and $V$ as given and ask what they imply for spectra and cross-spectra. -### Reported shock covariance - -The $6 \times 6$ reduced-form shock covariance matrix $V$ (scaled by $10^{-7}$) is: +The $6 \times 6$ reduced-form shock covariance matrix $V$ (scaled by $10^{-7}$) reported by Chow and Levitan is: ```{math} :label: chow_V_matrix @@ -930,8 +938,6 @@ V = \begin{bmatrix} The sixth row and column are zeros because $y_6$ is an identity (lagged $y_1$). -### Reported eigenvalues - The transition matrix $A$ has six characteristic roots: ```{math} @@ -949,8 +955,6 @@ One root ($\lambda_6$) is theoretically zero because of the identity $y_5 = y_1 The complex conjugate pair $\lambda_{4,5}$ has modulus $|\lambda_4| = \sqrt{0.0761^2 + 0.1125^2} \approx 0.136$. -### Reported eigenvectors - The right eigenvector matrix $B$ (columns are eigenvectors corresponding to $\lambda_1, \ldots, \lambda_6$): ```{math} @@ -999,14 +1003,14 @@ V = np.array([ D_λ = np.diag(λ) A_chow = B @ D_λ @ np.linalg.inv(B) -A_chow = np.real(A_chow) # drop tiny imaginary parts from reported rounding +A_chow = np.real(A_chow) print("eigenvalues of reconstructed A:") print(np.linalg.eigvals(A_chow).round(6)) ``` ### Canonical coordinates -Chow's canonical transformation uses $z_t = B^{-1} y_t$, giving dynamics $z_t = D_\lambda z_{t-1} + e_t$. 
+Chow and Levitan's canonical transformation uses $z_t = B^{-1} y_t$, giving dynamics $z_t = D_\lambda z_{t-1} + e_t$. Accordingly, the canonical shock covariance is @@ -1021,7 +1025,7 @@ print("diagonal of W:") print(np.diag(W).round(10)) ``` -Chow derives the following closed-form formula for the spectral density matrix: +Chow and Levitan derive the following closed-form formula for the spectral density matrix: ```{math} :label: chow_spectral_eigen @@ -1051,7 +1055,7 @@ freq = np.linspace(1e-4, 0.5, 5000) # cycles/year in [0, 1/2] F_chow = spectral_density_chow(λ, B, W, ω_grid) ``` -Let's plot the univariate spectra of consumption ($y_1$) and equipment plus inventories ($y_2$): +Let's plot the univariate spectra of consumption ($y_1$) and equipment plus inventories ($y_2$) ```{code-cell} ipython3 variable_names = ['$C$', '$I_1$', '$I_2$', '$R_a$', '$Y_1$'] @@ -1092,23 +1096,19 @@ i_peak = np.argmax(S_norm[mask, 1]) f_peak = freq[mask][i_peak] ``` -We reproduce only Figures I.1 and I.2 here. - -Figure I.1 corresponds to consumption and declines monotonically with frequency. +The left panel corresponds to consumption and declines monotonically with frequency. -Figure I.1 illustrates Granger's "typical spectral shape" for macroeconomic time series. +It illustrates Granger's "typical spectral shape" for macroeconomic time series. -Figure I.2 corresponds to equipment plus inventories and shows the clearest (but still very flat) interior-frequency bump. +The right panel corresponds to equipment plus inventories and shows the clearest (but still very flat) interior-frequency bump. Chow and Levitan associate the dominance of very low frequencies in both plots with strong persistence and long-run movements. -They note that very large low-frequency power can arise from eigenvalues extremely close to one, which can occur mechanically when some equations are written in first differences. +Very large low-frequency power can arise from eigenvalues extremely close to one, which occurs mechanically when some equations are written in first differences. -They stress that local peaks are not automatic, because complex roots may have small modulus and multivariate interactions can generate peaks even when all roots are real. +Local peaks are not automatic: complex roots may have small modulus, and multivariate interactions can generate peaks even when all roots are real. -They note that the interior bump in Figure I.2 corresponds to cycles of roughly three years and that the spectrum is nearly flat over cycles between about two and four years. - -Their other spectra in Figures I.3–I.5 (construction, the long rate, and private-domestic GNP) decline monotonically with frequency in the same calibration. +The interior bump in the right panel corresponds to cycles of roughly three years, with the spectrum nearly flat over cycles between about two and four years. (This discussion follows Section II of {cite}`ChowLevitan1969`.) @@ -1128,7 +1128,7 @@ The **squared coherence** measures linear association at frequency $\omega$: R^2_{ij}(\omega) = \frac{|f_{ij}(\omega)|^2}{f_{ii}(\omega) f_{jj}(\omega)} \in [0, 1]. ``` -Think of coherence as the frequency-domain analogue of $R^2$: it measures how much of the variance of $y_i$ at frequency $\omega$ can be "explained" by $y_j$ at the same frequency. +Coherence measures how much of the variance of $y_i$ at frequency $\omega$ can be "explained" by $y_j$ at the same frequency. High coherence means the two series move together tightly at that frequency. 
@@ -1194,7 +1194,7 @@ The gain and coherence patterns differ across components (Figures II.1–II.2 of - Gain is about 0.9 at very low frequencies but falls below 0.4 for cycles shorter than four years. - This is evidence that short-cycle income movements translate less into consumption than long-cycle movements, consistent with permanent-income interpretations. - Coherence remains high throughout. -- For Equipment plus inventories vs private-domestic GNP (right panel): +- Equipment plus inventories vs private-domestic GNP (right panel): - Gain *rises* with frequency, exceeding 0.5 for short cycles. - This is the frequency-domain signature of acceleration and volatile short-run inventory movements. @@ -1247,7 +1247,7 @@ ax.axhline(0, lw=0.8) paper_frequency_axis(ax) ax.set_ylabel('phase difference in cycles') ax.set_ylim([-0.25, 0.25]) -ax.set_yticks([-0.25, -0.20, -0.15, -0.10, -0.05, 0, 0.05, 0.10, 0.15, 0.20, 0.25]) +ax.set_yticks(np.arange(-0.25, 0.3, 0.05), minor=True) ax.legend(frameon=False) plt.tight_layout() plt.show() @@ -1280,6 +1280,8 @@ g_i(\omega) = \frac{1 - \lambda_i^2}{1 + \lambda_i^2 - 2\lambda_i \cos\omega}. Each observable spectral density is a linear combination of these kernels (plus cross-terms). +Below, we plot the scalar kernels for each eigenvalue to see how they shape the overall spectra + ```{code-cell} ipython3 def scalar_kernel(λ_i, ω_grid): """scalar spectral kernel g_i(ω).""" @@ -1307,7 +1309,7 @@ plt.show() The figure reveals how eigenvalue magnitude shapes spectral contributions: -- *Near-unit eigenvalues* ($\lambda_1, \lambda_2 \approx 1$) produce kernels sharply peaked at low frequencies—these drive the strong low-frequency power seen in the spectra above. +- *Near-unit eigenvalues* ($\lambda_1, \lambda_2 \approx 1$) produce kernels sharply peaked at low frequencies as these drive the strong low-frequency power seen in the spectra above. - *The moderate eigenvalue* ($\lambda_3 \approx 0.48$) contributes a flatter component that spreads power more evenly across frequencies. - *The complex pair* ($\lambda_{4,5}$) has such small modulus ($|\lambda_{4,5}| \approx 0.136$) that its kernel is nearly flat, which is too weak to generate a pronounced interior peak. @@ -1321,10 +1323,6 @@ The complex pair, despite enabling oscillatory dynamics in principle, has insuff The acceleration principle receives strong empirical support: the negative coefficient on lagged output in investment equations is a robust finding across datasets. -- This matters because, in a model consisting only of demand equations with simple distributed lags, the transition matrix has real positive roots under natural sign restrictions—ruling out prolonged oscillations. - -- Acceleration introduces the possibility of complex roots, which are necessary for oscillatory dynamics in deterministic systems. - The relationship between eigenvalues and spectral peaks is more subtle than it first appears: - Complex roots guarantee oscillatory autocovariances, but they are neither necessary nor sufficient for a pronounced spectral peak. @@ -1333,8 +1331,6 @@ The relationship between eigenvalues and spectral peaks is more subtle than it f - But in general multivariate systems, even real roots can produce peaks through the interaction of shocks and eigenvector loadings. -Chow argues that understanding business cycles requires an integrated view of deterministic dynamics and random shocks. 
- {cite:t}`ChowLevitan1969` demonstrate what these objects look like in a calibrated system: strong low-frequency power from near-unit eigenvalues, frequency-dependent gains and coherences, and lead–lag relations that vary with cycle length. Their results are consistent with Granger's "typical spectral shape" for economic time series. @@ -1357,6 +1353,9 @@ Use the same $v$ values as in the main text: $v \in \{0.2, 0.4, 0.6, 0.8, 0.95\} :class: dropdown ``` +Here is one solution: + + ```{code-cell} ipython3 v_grid_ex1 = [0.2, 0.4, 0.6, 0.8, 0.95] c_ex1 = 0.6 @@ -1422,6 +1421,8 @@ Verify spectral peak condition {eq}`chow_hs_peak_condition` numerically for the :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 θ_ex = np.pi / 3 # 60 degrees r_grid = np.linspace(0.3, 0.99, 50) @@ -1449,6 +1450,7 @@ for r in r_grid: F_ex = spectral_density_var1(A_ex, V_hs_ex, ω_grid_ex) f11 = np.real(F_ex[:, 0, 0]) i_max = np.argmax(f11) + # Only count as a peak if it's not at the boundary if 5 < i_max < len(ω_grid_ex) - 5: ω_numerical.append(ω_grid_ex[i_max]) @@ -1468,7 +1470,7 @@ axes[0].set_xlabel('eigenvalue modulus $r$') axes[0].set_ylabel(r'peak frequency $\omega^*/\pi$') axes[0].legend(frameon=False) -# Plot the factor (1+r²)/2r to show when peaks are valid +# Plot the factor (1+r^2)/2r to show when peaks are valid axes[1].plot(r_grid, (1 + r_grid**2) / (2 * r_grid), lw=2) axes[1].axhline(1 / np.cos(θ_ex), ls='--', lw=1.0, color='red', label=f'threshold = 1/cos({np.rad2deg(θ_ex):.0f}°) = {1/np.cos(θ_ex):.2f}') @@ -1483,7 +1485,7 @@ plt.show() valid_mask = ~np.isnan(ω_theory) if valid_mask.any(): r_threshold = r_grid[valid_mask][0] - print(f"Peak exists for r ≥ {r_threshold:.2f}") + print(f"Peak exists for r >= {r_threshold:.2f}") ``` The theoretical and numerical peak frequencies match closely. @@ -1507,6 +1509,8 @@ When does the interior-frequency peak appear, and how does its location change? :class: dropdown ``` +Here is one solution: + ```{code-cell} ipython3 A_ex3 = np.diag([0.1, 0.9]) b_ex3 = np.array([1.0, -0.01]) @@ -1530,7 +1534,7 @@ plt.show() threshold_idx = np.where(~np.isnan(peak_periods))[0] if len(threshold_idx) > 0: - print(f"interior peak appears when correlation ≥ {corr_grid[threshold_idx[0]]:.2f}") + print(f"interior peak appears when correlation >= {corr_grid[threshold_idx[0]]:.2f}") ``` The interior peak appears only when the shock correlation exceeds a threshold. @@ -1555,6 +1559,9 @@ Verify that both methods give the same result. :class: dropdown ``` +Here is one solution: + + ```{code-cell} ipython3 from scipy.linalg import solve_discrete_lyapunov @@ -1601,6 +1608,9 @@ Modify the Chow-Levitan model by changing $\lambda_3$ from $0.4838$ to $0.95$. 
:class: dropdown ``` +Here is one solution: + + ```{code-cell} ipython3 # Modify λ_3 and reconstruct the transition matrix λ_modified = λ.copy() From 11fa74a02f69214cc6afdb551e43235146cd2a2f Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Sun, 8 Feb 2026 11:52:43 +0800 Subject: [PATCH 07/37] Tom's Feb 8 edits of the Chow lecture --- lectures/_static/quant-econ.bib | 23 +++++++++++++++++++++++ lectures/chow_business_cycles.md | 20 ++++++++++---------- 2 files changed, 33 insertions(+), 10 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 55b678f94..21e6feacf 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -4,6 +4,29 @@ ### +@incollection{slutsky:1927, + address = {Moscow}, + author = {Slutsky, Eugen}, + booktitle = {Problems of Economic Conditions}, + date-added = {2021-02-16 14:44:03 -0600}, + date-modified = {2021-02-16 14:44:03 -0600}, + publisher = {The Conjuncture Institute}, + title = {The Summation of Random Causes as the Source of Cyclic Processes}, + volume = {3}, + year = {1927} +} + +@incollection{frisch33, + author = {Ragar Frisch}, + booktitle = {Economic Essays in Honour of Gustav Cassel}, + date-added = {2015-01-09 21:08:15 +0000}, + date-modified = {2015-01-09 21:08:15 +0000}, + pages = {171-205}, + publisher = {Allen and Unwin}, + title = {Propagation Problems and Impulse Problems in Dynamic Economics}, + year = {1933} +} + @article{harsanyi1968games, title={Games with Incomplete Information Played by ``{B}ayesian'' Players, {I}--{III} Part {II}. {B}ayesian Equilibrium Points}, author={Harsanyi, John C.}, diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 664f6d206..5f16fca14 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -29,12 +29,12 @@ kernelspec: ## Overview -This lecture studies two classic papers by Gregory Chow on business cycles in linear dynamic models: +This lecture studies two classic papers by Gregory Chow: -- {cite:t}`Chow1968`: empirical evidence for the acceleration principle, why acceleration enables oscillations, and when spectral peaks arise in stochastic systems -- {cite:t}`ChowLevitan1969`: spectral analysis of a calibrated US macroeconometric model, showing gains, coherences, and lead–lag patterns +- {cite:t}`Chow1968` presents empirical evidence for the acceleration principle, describes how acceleration promotes oscillations, and analyzes conditions for the emergence of spectral peaks in linear difference equation subjected to random shocks +- {cite:t}`ChowLevitan1969` presents a spectral analysis of a calibrated US macroeconometric model and teaches about spectral gains, coherences, and lead–lag patterns -These papers connect ideas in the following lectures: +These papers are related to ideas in the following lectures: - The multiplier–accelerator mechanism in {doc}`samuelson` - Linear stochastic difference equations and autocovariances in {doc}`linear_models` @@ -43,11 +43,11 @@ These papers connect ideas in the following lectures: {cite:t}`Chow1968` builds on earlier empirical work testing the acceleration principle on US investment data. -We begin with that empirical foundation before developing the theoretical framework. +We start with that empirical evidence before developing the theoretical framework. -We will keep coming back to three ideas: +We will keep returning to three ideas: -- In deterministic models, oscillations correspond to complex eigenvalues of a transition matrix. 
+- In deterministic models, oscillations indicate complex eigenvalues of a transition matrix. - In stochastic models, a "cycle" shows up as a local peak in a (univariate) spectral density. - Spectral peaks depend on eigenvalues, but also on how shocks enter and on how observables load on eigenmodes. @@ -299,7 +299,7 @@ This illustrates that acceleration creates complex eigenvalues, which are necess But what happens when we add random shocks? -Frisch's insight was that even damped oscillations can be "maintained" when the system is continuously perturbed by random disturbances. +An insight of Ragnar Frisch {cite}`frisch33` was that damped oscillations can be "maintained" when the system is continuously perturbed by random disturbances. To study this formally, we need to introduce the stochastic framework. @@ -351,7 +351,7 @@ The second equation is the discrete Lyapunov equation for $\Gamma_0$. > ... One way which I believe is particularly fruitful and promising is to study what would become of the solution of a determinate dynamic system if it were exposed to a stream of erratic shocks ... > Thus, by connecting the two ideas: (1) the continuous solution of a determinate dynamic system and (2) the discontinuous shocks intervening and supplying the energy that may maintain the swings—we get a theoretical setup which seems to furnish a rational interpretation of those movements which we have been accustomed to see in our statistical time data. > -> — Ragnar Frisch (1933) +> — Ragnar Frisch (1933) {cite}`frisch33` Chow's main insight is that oscillations in the deterministic system are *neither necessary nor sufficient* for producing "cycles" in the stochastic system. @@ -840,7 +840,7 @@ The peak appears at $\omega/\pi \approx 0.10$, which corresponds to a cycle leng ### The Slutsky connection -Chow connects this result to Slutsky's well-known finding that taking moving averages of a random series can generate cycles. +Chow connects this result to Slutsky's {cite}`slutsky:1927` finding that moving averages of a random series have recurrent cycles. 
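As a quick, informal illustration of Slutsky's point (this sketch is ours, not from either paper), a simple moving average of white noise already produces smooth, wave-like swings even though the input is serially uncorrelated:

```python
# Slutsky-style illustration (hedged sketch): smoothing pure white noise
# with a 12-period moving average yields persistent, cycle-like swings.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
e = rng.standard_normal(300)                          # white noise input
ma = np.convolve(e, np.ones(12) / 12, mode='valid')   # 12-period moving average

fig, ax = plt.subplots()
ax.plot(ma, lw=2)
ax.set_xlabel('time')
ax.set_ylabel('smoothed series')
plt.show()
```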
The VAR(1) model can be written as an infinite moving average: From c78fe1890641c27421d1426a086c321fb0826fbf Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Sun, 8 Feb 2026 16:19:17 +1100 Subject: [PATCH 08/37] update --- lectures/chow_business_cycles.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lectures/chow_business_cycles.md b/lectures/chow_business_cycles.md index 5f16fca14..d8a424875 100644 --- a/lectures/chow_business_cycles.md +++ b/lectures/chow_business_cycles.md @@ -1070,8 +1070,9 @@ def paper_frequency_axis(ax): ax.set_xlabel(r'frequency $\omega/2\pi$') # Normalized spectra (areas set to 1) -S = np.real(np.diagonal(F_chow, axis1=1, axis2=2))[:, :5] # y1..y5 -areas = np.trapezoid(S, freq, axis=0) +S = np.real(np.diagonal(F_chow, axis1=1, axis2=2))[:, :5] +df = np.diff(freq) +areas = np.sum(0.5 * (S[1:] + S[:-1]) * df[:, None], axis=0) S_norm = S / areas mask = freq >= 0.0 @@ -1355,7 +1356,6 @@ Use the same $v$ values as in the main text: $v \in \{0.2, 0.4, 0.6, 0.8, 0.95\} Here is one solution: - ```{code-cell} ipython3 v_grid_ex1 = [0.2, 0.4, 0.6, 0.8, 0.95] c_ex1 = 0.6 @@ -1380,7 +1380,9 @@ for v in v_grid_ex1: # spectrum (right panel) F = spectral_density_var1(A, V_ex1, ω_grid_ex1) f11 = np.real(F[:, 0, 0]) - f11_norm = f11 / np.trapezoid(f11, freq_ex1) + df = np.diff(freq_ex1) + area = np.sum(0.5 * (f11[1:] + f11[:-1]) * df) + f11_norm = f11 / area axes[1].plot(freq_ex1, f11_norm, lw=2, label=f'$v={v}$') axes[0].axhline(0, lw=0.8, color='gray') @@ -1561,7 +1563,6 @@ Verify that both methods give the same result. Here is one solution: - ```{code-cell} ipython3 from scipy.linalg import solve_discrete_lyapunov @@ -1610,7 +1611,6 @@ Modify the Chow-Levitan model by changing $\lambda_3$ from $0.4838$ to $0.95$. 
Here is one solution: - ```{code-cell} ipython3 # Modify λ_3 and reconstruct the transition matrix λ_modified = λ.copy() From ddd82516eaa447c3dfe89b66f839331406c6571c Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Mon, 9 Feb 2026 16:28:32 +1100 Subject: [PATCH 09/37] minor updates --- lectures/_static/quant-econ.bib | 10 + lectures/_toc.yml | 1 + lectures/measurement_models.md | 810 ++++++++++++++++++++++++++++++++ 3 files changed, 821 insertions(+) create mode 100644 lectures/measurement_models.md diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 21e6feacf..bd35b4809 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -2207,6 +2207,16 @@ @book{Sargent1987 year = {1987} } +@article{Sargent1989, + author = {Sargent, Thomas J}, + title = {Two Models of Measurements and the Investment Accelerator}, + journal = {Journal of Political Economy}, + volume = {97}, + number = {2}, + pages = {251--287}, + year = {1989} +} + @article{SchechtmanEscudero1977, author = {Schechtman, Jack and Escudero, Vera L S}, journal = {Journal of Economic Theory}, diff --git a/lectures/_toc.yml b/lectures/_toc.yml index aeaab36b5..098deaaa7 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -61,6 +61,7 @@ parts: - file: wealth_dynamics - file: kalman - file: kalman_2 + - file: measurement_models - caption: Search numbered: true chapters: diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md new file mode 100644 index 000000000..6b640aece --- /dev/null +++ b/lectures/measurement_models.md @@ -0,0 +1,810 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +(sargent_measurement_models)= +```{raw} jupyter + +``` + +# Two Models of Measurements and the Investment Accelerator + +```{contents} Contents +:depth: 2 +``` + +## Overview + +{cite:t}`Sargent1989` studies what happens to an econometrician's +inferences about economic dynamics when observed data are contaminated +by measurement error. + +The setting is a {doc}`permanent income ` economy in which the +investment accelerator, the mechanism studied in {doc}`samuelson` and +{doc}`chow_business_cycles`, drives business cycle fluctuations. + +Sargent specifies a {doc}`linear state space model ` for the +true economy and then considers two ways of extracting information from +noisy measurements: + +- Model 1 applies a {doc}`Kalman filter ` directly to + raw (noisy) observations. +- Model 2 first filters the data to remove measurement error, + then computes dynamics from the filtered series. + +The two models produce different Wold representations and +forecast-error-variance decompositions, even though they describe +the same underlying economy. + +In this lecture we reproduce all numbered tables and figures from +{cite}`Sargent1989` while studying the underlying mechanisms in the paper. 
+ +We use the following imports and precision settings for tables: + +```{code-cell} ipython3 +import numpy as np +import pandas as pd +import matplotlib.pyplot as plt +from scipy import linalg + +np.set_printoptions(precision=4, suppress=True) +``` + +## Model Setup + +The true economy is a version of the permanent income model +(see {doc}`perm_income`) in which a representative consumer +chooses consumption $c_t$ and capital accumulation $\Delta k_t$ +to maximize expected discounted utility subject to a budget +constraint. + +Assume that the discount factor satisfies $\beta f = 1$ and that the +productivity shock $\theta_t$ is white noise. + +The optimal decision rules reduce the true system to + +```{math} +\begin{aligned} +k_{t+1} &= k_t + f^{-1}\theta_t, \\ +y_{n,t} &= (f-1)k_t + \theta_t, \\ +c_t &= (f-1)k_t + (1-f^{-1})\theta_t, \\ +\Delta k_t &= f^{-1}\theta_t. +\end{aligned} +``` + +with $f = 1.05$ and $\theta_t \sim \mathcal{N}(0, 1)$. + +Here $k_t$ is capital, $y_{n,t}$ is national income, $c_t$ is consumption, +and $\Delta k_t$ is net investment. + +Notice the investment accelerator at work: because $\Delta k_t = f^{-1}\theta_t$, +investment responds only to the innovation $\theta_t$, not to the level of +capital. + +This is the same mechanism that {cite:t}`Chow1968` documented +empirically (see {doc}`chow_business_cycles`). + +We can cast this as a {doc}`linear state space model ` by +defining state and observable vectors + +```{math} +x_t = \begin{bmatrix} k_t \\ \theta_t \end{bmatrix}, +\qquad +z_t = \begin{bmatrix} y_{n,t} \\ c_t \\ \Delta k_t \end{bmatrix}, +``` + +and matrices + +```{math} +A = \begin{bmatrix} +1 & f^{-1} \\ +0 & 0 +\end{bmatrix}, +\qquad +C = \begin{bmatrix} +f-1 & 1 \\ +f-1 & 1-f^{-1} \\ +0 & f^{-1} +\end{bmatrix}. +``` + +The econometrician does not observe $z_t$ directly but instead +sees $\bar z_t = z_t + v_t$, where $v_t$ is a vector of measurement +errors. + +Measurement errors are AR(1): + +```{math} +v_t = D v_{t-1} + \eta_t, +``` + +with diagonal + +```{math} +D = \operatorname{diag}(0.6, 0.7, 0.3), +``` + +and innovation standard deviations $(0.05, 0.035, 0.65)$. + +```{code-cell} ipython3 +f = 1.05 +β = 1 / f + +A = np.array([ + [1.0, 1.0 / f], + [0.0, 0.0] +]) + +C = np.array([ + [f - 1.0, 1.0], + [f - 1.0, 1.0 - 1.0 / f], + [0.0, 1.0 / f] +]) + +Q = np.array([ + [0.0, 0.0], + [0.0, 1.0] +]) + +ρ = np.array([0.6, 0.7, 0.3]) +D = np.diag(ρ) + +# Innovation std. devs shown in Table 1 +σ_η = np.array([0.05, 0.035, 0.65]) +Σ_η = np.diag(σ_η**2) + +# Unconditional covariance of measurement errors v_t +R = np.diag((σ_η / np.sqrt(1.0 - ρ**2))**2) + +print(f"f = {f}, β = 1/f = {β:.6f}") +print("\nA ="); display(pd.DataFrame(A)) +print("C ="); display(pd.DataFrame(C)) +print("D ="); display(pd.DataFrame(D)) +``` + +## Kalman Filter + +Both models require a steady-state {doc}`Kalman filter `. + +The function below iterates on the Riccati equation until convergence, +returning the Kalman gain $K$, the state covariance $S$, and the +innovation covariance $V$. + +```{code-cell} ipython3 +def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): + """ + Solve steady-state Kalman equations for + x_{t+1} = A x_t + w_{t+1} + y_t = C_obs x_t + v_t + with cov(w)=Q, cov(v)=R, cov(w,v)=W. 
+ """ + n = A.shape[0] + m = C_obs.shape[0] + if W is None: + W = np.zeros((n, m)) + + S = Q.copy() + for _ in range(max_iter): + V = C_obs @ S @ C_obs.T + R + K = (A @ S @ C_obs.T + W) @ np.linalg.inv(V) + S_new = Q + A @ S @ A.T - K @ V @ K.T + + if np.max(np.abs(S_new - S)) < tol: + S = S_new + break + S = S_new + + V = C_obs @ S @ C_obs.T + R + K = (A @ S @ C_obs.T + W) @ np.linalg.inv(V) + return K, S, V +``` + +## Table 2: True Impulse Responses + +Before introducing measurement error, we verify the impulse response of +the true system to a unit shock $\theta_0 = 1$. + +The response shows the investment accelerator clearly: the full impact on +net income $y_n$ occurs at lag 0, while consumption adjusts by only +$1 - f^{-1} \approx 0.048$ and investment absorbs the remainder. + +From lag 1 onward the economy is in its new steady state. + +```{code-cell} ipython3 +def table2_irf(A, C, n_lags=6): + x = np.array([0.0, 1.0]) # k_0 = 0, theta_0 = 1 + rows = [] + for j in range(n_lags): + y_n, c, d_k = C @ x + rows.append([j, y_n, c, d_k]) + x = A @ x + return np.array(rows) + +rep_table2 = table2_irf(A, C, n_lags=6) + +pd.DataFrame( + np.round(rep_table2[:, 1:], 4), + columns=[r'$y_n$', r'$c$', r'$\Delta k$'], + index=pd.Index(range(6), name='Lag') +) +``` + +## Model 1 (Raw Measurements): Tables 3 and 4 + +Model 1 treats the raw measured series $\bar z_t$ as the observables and +applies a Kalman filter to extract the state. + +Because the measurement errors $v_t$ are serially correlated, Sargent +quasi-differences the observation equation to obtain an innovation form +with serially uncorrelated errors. + +The transformed observation equation is + +```{math} +\bar z_t - D \bar z_{t-1} = (CA - DC)x_{t-1} + C w_t + \eta_t. +``` + +Hence + +```{math} +\bar C = CA - DC, \quad R_1 = CQC^\top + R, \quad W_1 = QC^\top. +``` + +```{code-cell} ipython3 +C_bar = C @ A - D @ C +R1 = C @ Q @ C.T + R +W1 = Q @ C.T + +K1, S1, V1 = steady_state_kalman(A, C_bar, Q, R1, W1) +``` + +With the Kalman gain in hand, we can derive the Wold moving-average +representation for the measured data. + +This representation tells us how measured $y_n$, $c$, and $\Delta k$ +respond over time to the orthogonalized innovations in the +innovation covariance matrix $V_1$. + +To recover the Wold representation, define the augmented state + +```{math} +r_t = \begin{bmatrix} \hat x_{t-1} \\ z_{t-1} \end{bmatrix}, +``` + +with dynamics + +```{math} +r_{t+1} = F_1 r_t + G_1 u_t, +\qquad +z_t = H_1 r_t + u_t, +``` + +where + +```{math} +F_1 = +\begin{bmatrix} +A & 0 \\ +\bar C & D +\end{bmatrix}, +\quad +G_1 = +\begin{bmatrix} +K_1 \\ +I +\end{bmatrix}, +\quad +H_1 = [\bar C \;\; D]. +``` + +```{code-cell} ipython3 +F1 = np.block([ + [A, np.zeros((2, 3))], + [C_bar, D] +]) +G1 = np.vstack([K1, np.eye(3)]) +H1 = np.hstack([C_bar, D]) + + +def measured_wold_coeffs(F, G, H, n_terms=25): + psi = [np.eye(3)] + Fpow = np.eye(F.shape[0]) + for _ in range(1, n_terms): + psi.append(H @ Fpow @ G) + Fpow = Fpow @ F + return psi + + +def fev_contributions(psi, V, n_horizons=20): + """ + Returns contrib[var, shock, h-1] = contribution at horizon h. 
+ """ + P = linalg.cholesky(V, lower=True) + out = np.zeros((3, 3, n_horizons)) + for h in range(1, n_horizons + 1): + acc = np.zeros((3, 3)) + for j in range(h): + T = psi[j] @ P + acc += T**2 + out[:, :, h - 1] = acc + return out + + +psi1 = measured_wold_coeffs(F1, G1, H1, n_terms=40) +resp1 = np.array([psi1[j] @ linalg.cholesky(V1, lower=True) for j in range(14)]) +decomp1 = fev_contributions(psi1, V1, n_horizons=20) +``` + +Table 3 reports the forecast-error-variance decomposition for Model 1. + +Each panel shows the cumulative contribution of one orthogonalized +innovation to the forecast-error variance of $y_n$, $c$, and $\Delta k$ +at horizons 1 through 20. + +```{code-cell} ipython3 +horizons = np.arange(1, 21) +cols = [r'$y_n$', r'$c$', r'$\Delta k$'] + +def fev_table(decomp, shock_idx, horizons): + return pd.DataFrame( + np.round(decomp[:, shock_idx, :].T, 4), + columns=cols, + index=pd.Index(horizons, name='Horizon') + ) + +print("Table 3A: Contribution of innovation 1") +display(fev_table(decomp1, 0, horizons)) + +print("Table 3B: Contribution of innovation 2") +display(fev_table(decomp1, 1, horizons)) + +print("Table 3C: Contribution of innovation 3") +display(fev_table(decomp1, 2, horizons)) +``` + +The innovation covariance matrix $V_1$ is: + +```{code-cell} ipython3 +labels = [r'$y_n$', r'$c$', r'$\Delta k$'] +pd.DataFrame(np.round(V1, 4), index=labels, columns=labels) +``` + +Table 4 reports the orthogonalized Wold impulse responses for Model 1 +at lags 0 through 13. + +```{code-cell} ipython3 +lags = np.arange(14) + +def wold_response_table(resp, shock_idx, lags): + return pd.DataFrame( + np.round(resp[:, :, shock_idx], 4), + columns=cols, + index=pd.Index(lags, name='Lag') + ) + +print("Table 4A: Response to innovation in y_n") +display(wold_response_table(resp1, 0, lags)) + +print("Table 4B: Response to innovation in c") +display(wold_response_table(resp1, 1, lags)) + +print("Table 4C: Response to innovation in Δk") +display(wold_response_table(resp1, 2, lags)) +``` + +## Model 2 (Filtered Measurements): Tables 5 and 6 + +Model 2 takes a different approach: instead of working with the raw data, +the econometrician first applies the Kalman filter from Model 1 to +strip out measurement error and then treats the filtered estimates +$\hat z_t = C \hat x_t$ as if they were the true observations. + +A second Kalman filter is then applied to the filtered series. + +The state noise covariance for this second filter is + +```{math} +Q_2 = K_1 V_1 K_1^\top, +``` + +We solve a second Kalman system with tiny measurement noise to regularize the +near-singular covariance matrix. + +```{code-cell} ipython3 +Q2 = K1 @ V1 @ K1.T +ε = 1e-7 + +K2, S2, V2 = steady_state_kalman(A, C, Q2, ε * np.eye(3)) + + +def filtered_wold_coeffs(A, C, K, n_terms=25): + psi = [np.eye(3)] + Apow = np.eye(2) + for _ in range(1, n_terms): + psi.append(C @ Apow @ K) + Apow = Apow @ A + return psi + + +psi2 = filtered_wold_coeffs(A, C, K2, n_terms=40) +resp2 = np.array([psi2[j] @ linalg.cholesky(V2, lower=True) for j in range(14)]) +decomp2 = fev_contributions(psi2, V2, n_horizons=20) +``` + +Table 5 is the analogue of Table 3 for Model 2. + +Because the filtered data are nearly noiseless, the second and third +innovations contribute very little to forecast-error variance. 
+ +```{code-cell} ipython3 +print("Table 5A: Contribution of innovation 1") +display(fev_table(decomp2, 0, horizons)) + +print("Table 5B: Contribution of innovation 2 (×10³)") +display(pd.DataFrame( + np.round(decomp2[:, 1, :].T * 1e3, 4), + columns=cols, + index=pd.Index(horizons, name='Horizon') +)) + +print("Table 5C: Contribution of innovation 3 (×10⁶)") +display(pd.DataFrame( + np.round(decomp2[:, 2, :].T * 1e6, 4), + columns=cols, + index=pd.Index(horizons, name='Horizon') +)) +``` + +The innovation covariance matrix $V_2$ for Model 2 is: + +```{code-cell} ipython3 +pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) +``` + +Table 6 reports the orthogonalized Wold impulse responses for Model 2. + +```{code-cell} ipython3 +print("Table 6A: Response to innovation in y_n") +display(wold_response_table(resp2, 0, lags)) + +print("Table 6B: Response to innovation in c") +display(wold_response_table(resp2, 1, lags)) + +print("Table 6C: Response to innovation in Δk (×10³)") +display(pd.DataFrame( + np.round(resp2[:, :, 2] * 1e3, 4), + columns=cols, + index=pd.Index(lags, name='Lag') +)) +``` + +## Simulation: Figures 1 through 9 and Table 7 + +The tables above characterize population moments of the two models. + +To see how the models perform on a finite sample, Sargent simulates +80 periods of true, measured, and filtered data and reports +covariance and correlation matrices (Table 7) together with +time-series plots (Figures 1 through 9). + +We replicate these objects below. + +```{code-cell} ipython3 +def simulate_series(seed=7909, T=80, k0=10.0): + """ + Simulate true, measured, and filtered series for Figures 1--9. + """ + rng = np.random.default_rng(seed) + + # True state/observables + θ = rng.normal(0.0, 1.0, size=T) + k = np.empty(T + 1) + k[0] = k0 + + y = np.empty(T) + c = np.empty(T) + dk = np.empty(T) + + for t in range(T): + x_t = np.array([k[t], θ[t]]) + y[t], c[t], dk[t] = C @ x_t + k[t + 1] = k[t] + (1.0 / f) * θ[t] + + # Measured data with AR(1) errors + v_prev = np.zeros(3) + v = np.empty((T, 3)) + for t in range(T): + η_t = rng.multivariate_normal(np.zeros(3), Σ_η) + v_prev = D @ v_prev + η_t + v[t] = v_prev + + z_meas = np.column_stack([y, c, dk]) + v + + # Filtered data via Model 1 transformed filter + xhat_prev = np.array([k0, 0.0]) + z_prev = np.zeros(3) + z_filt = np.empty((T, 3)) + k_filt = np.empty(T) + + for t in range(T): + z_bar_t = z_meas[t] - D @ z_prev + u_t = z_bar_t - C_bar @ xhat_prev + xhat_t = A @ xhat_prev + K1 @ u_t + + z_filt[t] = C @ xhat_t + k_filt[t] = xhat_t[0] + + xhat_prev = xhat_t + z_prev = z_meas[t] + + out = { + "y_true": y, "c_true": c, "dk_true": dk, "k_true": k[:-1], + "y_meas": z_meas[:, 0], "c_meas": z_meas[:, 1], "dk_meas": z_meas[:, 2], + "y_filt": z_filt[:, 0], "c_filt": z_filt[:, 1], "dk_filt": z_filt[:, 2], "k_filt": k_filt + } + return out + + +sim = simulate_series(seed=7909, T=80, k0=10.0) +``` + +```{code-cell} ipython3 +def plot_true_vs_other(t, true_series, other_series, other_label, ylabel=""): + fig, ax = plt.subplots(figsize=(8, 3.6)) + ax.plot(t, true_series, lw=2, color="black", label="true") + ax.plot(t, other_series, lw=2, ls="--", color="#1f77b4", label=other_label) + ax.set_xlabel("time", fontsize=11) + ax.set_ylabel(ylabel, fontsize=11) + ax.legend(loc="best") + ax.grid(alpha=0.3) + plt.tight_layout() + plt.show() + + +t = np.arange(1, 81) +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and measured consumption + name: fig-true-measured-consumption + image: + alt: True and measured 
consumption plotted over 80 time periods +--- +plot_true_vs_other(t, sim["c_true"], sim["c_meas"], "measured", ylabel="consumption") +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and measured investment + name: fig-true-measured-investment + image: + alt: True and measured investment plotted over 80 time periods +--- +plot_true_vs_other(t, sim["dk_true"], sim["dk_meas"], "measured", ylabel="investment") +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and measured income + name: fig-true-measured-income + image: + alt: True and measured income plotted over 80 time periods +--- +plot_true_vs_other(t, sim["y_true"], sim["y_meas"], "measured", ylabel="income") +``` + +Figures 1 through 3 show how measurement error distorts each series. + +Investment (Figure 2) is hit hardest because its measurement error +has the largest innovation variance ($\sigma_\eta = 0.65$). + +Figures 4 through 7 compare the true series with the Kalman-filtered +estimates from Model 1. + +The filter removes much of the measurement +noise, recovering series that track the truth closely. + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and filtered consumption + name: fig-true-filtered-consumption + image: + alt: True and filtered consumption plotted over 80 time periods +--- +plot_true_vs_other(t, sim["c_true"], sim["c_filt"], "filtered", ylabel="consumption") +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and filtered investment + name: fig-true-filtered-investment + image: + alt: True and filtered investment plotted over 80 time periods +--- +plot_true_vs_other(t, sim["dk_true"], sim["dk_filt"], "filtered", ylabel="investment") +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and filtered income + name: fig-true-filtered-income + image: + alt: True and filtered income plotted over 80 time periods +--- +plot_true_vs_other(t, sim["y_true"], sim["y_filt"], "filtered", ylabel="income") +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: True and filtered capital stock + name: fig-true-filtered-capital + image: + alt: True and filtered capital stock plotted over 80 time periods +--- +plot_true_vs_other(t, sim["k_true"], sim["k_filt"], "filtered", ylabel="capital stock") +``` + +Figures 8 and 9 plot the national income identity residual +$c_t + \Delta k_t - y_{n,t}$. + +In the true model this identity holds exactly. + +For measured data (Figure 8) the residual is non-zero because +independent measurement errors break the accounting identity. + +For filtered data (Figure 9) the Kalman filter approximately +restores the identity. 
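As a quick check (our addition, using the `sim` dictionary produced above), the identity holds to machine precision for the simulated true series:

```python
# The true series satisfy c + Δk = y_n exactly by construction.
import numpy as np

resid_true = sim["c_true"] + sim["dk_true"] - sim["y_true"]
print("max |c + Δk - y_n| for true data:", np.max(np.abs(resid_true)))
```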
+ +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Measured consumption plus investment minus income + name: fig-measured-identity-residual + image: + alt: National income identity residual for measured data over 80 time periods +--- +fig, ax = plt.subplots(figsize=(8, 3.6)) +ax.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], color="#d62728", lw=2) +ax.set_xlabel("time", fontsize=11) +ax.set_ylabel("residual", fontsize=11) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Filtered consumption plus investment minus income + name: fig-filtered-identity-residual + image: + alt: National income identity residual for filtered data over 80 time periods +--- +fig, ax = plt.subplots(figsize=(8, 3.6)) +ax.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], color="#2ca02c", lw=2) +ax.set_xlabel("time", fontsize=11) +ax.set_ylabel("residual", fontsize=11) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() +``` + +Table 7 reports covariance and correlation matrices among the true, +measured, and filtered versions of each variable. + +High correlations between true and filtered series confirm that the +Kalman filter does a good job of removing measurement noise. + +Lower correlations between true and measured series quantify how much +information is lost by using raw data. + +```{code-cell} ipython3 +def cov_corr_three(a, b, c): + X = np.vstack([a, b, c]) + return np.cov(X), np.corrcoef(X) + +def matrix_df(mat, labels): + return pd.DataFrame(np.round(mat, 4), index=labels, columns=labels) + +cov_c, corr_c = cov_corr_three(sim["c_true"], sim["c_meas"], sim["c_filt"]) +cov_i, corr_i = cov_corr_three(sim["dk_true"], sim["dk_meas"], sim["dk_filt"]) +cov_y, corr_y = cov_corr_three(sim["y_true"], sim["y_meas"], sim["y_filt"]) +cov_k = np.cov(np.vstack([sim["k_true"], sim["k_filt"]])) +corr_k = np.corrcoef(np.vstack([sim["k_true"], sim["k_filt"]])) + +tmf_labels = ['true', 'measured', 'filtered'] +tf_labels = ['true', 'filtered'] + +print("Table 7A: Covariance matrix of consumption") +display(matrix_df(cov_c, tmf_labels)) + +print("Table 7B: Correlation matrix of consumption") +display(matrix_df(corr_c, tmf_labels)) + +print("Table 7C: Covariance matrix of investment") +display(matrix_df(cov_i, tmf_labels)) + +print("Table 7D: Correlation matrix of investment") +display(matrix_df(corr_i, tmf_labels)) + +print("Table 7E: Covariance matrix of income") +display(matrix_df(cov_y, tmf_labels)) + +print("Table 7F: Correlation matrix of income") +display(matrix_df(corr_y, tmf_labels)) + +print("Table 7G: Covariance matrix of capital") +display(matrix_df(cov_k, tf_labels)) + +print("Table 7H: Correlation matrix of capital") +display(matrix_df(corr_k, tf_labels)) +``` + +## Summary + +This lecture reproduced the tables and figures in {cite}`Sargent1989`, +which studies how measurement error alters an econometrician's view +of a permanent income economy driven by the investment accelerator. + +Several lessons emerge: + +* The Wold representations and variance decompositions of Model 1 (raw + measurements) and Model 2 (filtered measurements) are quite different, + even though the underlying economy is the same. + +* Measurement error is not a second-order issue: it can + reshape inferences about which shocks drive which variables. + +* The {doc}`Kalman filter ` effectively strips measurement noise + from the data. 
+ +* The filtered series track the truth closely + (Figures 4 through 7), and the near-zero residual in Figure 9 shows that + the filter approximately restores the national income accounting + identity that raw measurement error breaks (Figure 8). + +* The forecast-error-variance decompositions (Tables 3 and 5) reveal + that Model 1 attributes substantial variance to measurement noise + innovations, while Model 2, working with cleaned data, attributes + nearly all variance to the single structural shock $\theta_t$. + +These results connect to broader themes in this lecture series: +the role of {doc}`linear state space models ` in +representing economic dynamics, the power of {doc}`Kalman filtering ` +for signal extraction, and the importance of the investment accelerator +for understanding business cycles ({doc}`samuelson`, +{doc}`chow_business_cycles`). + +## References + +* {cite}`Sargent1989` From bc862f69015ed3917daff7954637041ad53d1765 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Mon, 9 Feb 2026 18:07:38 +1100 Subject: [PATCH 10/37] updates --- lectures/measurement_models.md | 640 ++++++++++++++++++++++++--------- 1 file changed, 463 insertions(+), 177 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 6b640aece..0f73e221a 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -49,8 +49,8 @@ The two models produce different Wold representations and forecast-error-variance decompositions, even though they describe the same underlying economy. -In this lecture we reproduce all numbered tables and figures from -{cite}`Sargent1989` while studying the underlying mechanisms in the paper. +In this lecture we reproduce the analysis from {cite}`Sargent1989` +while studying the underlying mechanisms in the paper. 
We use the following imports and precision settings for tables: @@ -59,8 +59,48 @@ import numpy as np import pandas as pd import matplotlib.pyplot as plt from scipy import linalg +from IPython.display import Latex np.set_printoptions(precision=4, suppress=True) + +def df_to_latex_matrix(df, label=''): + """Convert DataFrame to LaTeX matrix (for math matrices).""" + lines = [r'\begin{bmatrix}'] + + for idx, row in df.iterrows(): + row_str = ' & '.join([f'{v:.4f}' if isinstance(v, (int, float)) else str(v) for v in row]) + r' \\' + lines.append(row_str) + + lines.append(r'\end{bmatrix}') + + if label: + return '$' + label + ' = ' + '\n'.join(lines) + '$' + else: + return '$' + '\n'.join(lines) + '$' + +def df_to_latex_array(df): + """Convert DataFrame to LaTeX array (for tables with headers).""" + n_rows, n_cols = df.shape + + # Build column format (centered columns) + col_format = 'c' * (n_cols + 1) # +1 for index + + # Start array + lines = [r'\begin{array}{' + col_format + '}'] + + # Header row + header = ' & '.join([''] + [str(c) for c in df.columns]) + r' \\' + lines.append(header) + lines.append(r'\hline') + + # Data rows + for idx, row in df.iterrows(): + row_str = str(idx) + ' & ' + ' & '.join([f'{v:.4f}' if isinstance(v, (int, float)) else str(v) for v in row]) + r' \\' + lines.append(row_str) + + lines.append(r'\end{array}') + + return '$' + '\n'.join(lines) + '$' ``` ## Model Setup @@ -106,7 +146,15 @@ x_t = \begin{bmatrix} k_t \\ \theta_t \end{bmatrix}, z_t = \begin{bmatrix} y_{n,t} \\ c_t \\ \Delta k_t \end{bmatrix}, ``` -and matrices +so that the true economy follows the state-space system + +```{math} +:label: true_ss +x_{t+1} = A x_t + \varepsilon_t, \qquad z_t = C x_t, +``` + +where $\varepsilon_t = \begin{bmatrix} 0 \\ \theta_t \end{bmatrix}$ has +covariance $E \varepsilon_t \varepsilon_t^\top = Q$ and the matrices are ```{math} A = \begin{bmatrix} @@ -118,26 +166,53 @@ C = \begin{bmatrix} f-1 & 1 \\ f-1 & 1-f^{-1} \\ 0 & f^{-1} +\end{bmatrix}, +\qquad +Q = \begin{bmatrix} +0 & 0 \\ +0 & 1 \end{bmatrix}. ``` +Note that $Q$ is singular because only the second component of $x_t$ +(the productivity shock $\theta_t$) receives an innovation; the +capital stock $k_t$ evolves deterministically given $\theta_t$. + The econometrician does not observe $z_t$ directly but instead sees $\bar z_t = z_t + v_t$, where $v_t$ is a vector of measurement errors. -Measurement errors are AR(1): +Measurement errors follow an AR(1) process: ```{math} +:label: meas_error_ar1 v_t = D v_{t-1} + \eta_t, ``` -with diagonal +where $\eta_t$ is a vector white noise with +$E \eta_t \eta_t^\top = \Sigma_\eta$ and +$E \varepsilon_t v_s^\top = 0$ for all $t, s$ +(measurement errors are orthogonal to the true state innovations). + +The autoregressive matrix and innovation standard deviations are ```{math} D = \operatorname{diag}(0.6, 0.7, 0.3), +\qquad +\sigma_\eta = (0.05, 0.035, 0.65), +``` + +so the unconditional covariance of $v_t$ is + +```{math} +R = \operatorname{diag}\!\left(\frac{\sigma_{\eta,i}^2}{1 - \rho_i^2}\right). ``` -and innovation standard deviations $(0.05, 0.035, 0.65)$. +The measurement errors are ordered from smallest to largest innovation +variance: income is measured most accurately ($\sigma_\eta = 0.05$), +consumption next ($\sigma_\eta = 0.035$), and investment least +accurately ($\sigma_\eta = 0.65$). +This ordering is central to the results below. ```{code-cell} ipython3 f = 1.05 @@ -162,7 +237,7 @@ Q = np.array([ ρ = np.array([0.6, 0.7, 0.3]) D = np.diag(ρ) -# Innovation std. 
devs shown in Table 1 +# Innovation std. devs σ_η = np.array([0.05, 0.035, 0.65]) Σ_η = np.diag(σ_η**2) @@ -170,9 +245,10 @@ D = np.diag(ρ) R = np.diag((σ_η / np.sqrt(1.0 - ρ**2))**2) print(f"f = {f}, β = 1/f = {β:.6f}") -print("\nA ="); display(pd.DataFrame(A)) -print("C ="); display(pd.DataFrame(C)) -print("D ="); display(pd.DataFrame(D)) +print() +display(Latex(df_to_latex_matrix(pd.DataFrame(A), 'A'))) +display(Latex(df_to_latex_matrix(pd.DataFrame(C), 'C'))) +display(Latex(df_to_latex_matrix(pd.DataFrame(D), 'D'))) ``` ## Kalman Filter @@ -212,7 +288,8 @@ def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): return K, S, V ``` -## Table 2: True Impulse Responses +(true-impulse-responses)= +## True Impulse Responses Before introducing measurement error, we verify the impulse response of the true system to a unit shock $\theta_0 = 1$. @@ -235,53 +312,101 @@ def table2_irf(A, C, n_lags=6): rep_table2 = table2_irf(A, C, n_lags=6) -pd.DataFrame( - np.round(rep_table2[:, 1:], 4), - columns=[r'$y_n$', r'$c$', r'$\Delta k$'], - index=pd.Index(range(6), name='Lag') -) +fig, ax = plt.subplots(figsize=(8, 4.5)) +ax.plot(rep_table2[:, 0], rep_table2[:, 1], 'o-', label=r'$y_n$', lw=2.5, markersize=7) +ax.plot(rep_table2[:, 0], rep_table2[:, 2], 's-', label=r'$c$', lw=2.5, markersize=7) +ax.plot(rep_table2[:, 0], rep_table2[:, 3], '^-', label=r'$\Delta k$', lw=2.5, markersize=7) +ax.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) +ax.set_xlabel('Lag', fontsize=12) +ax.set_ylabel('Response', fontsize=12) +ax.set_title(r'True impulse response to unit shock $\theta_0 = 1$', fontsize=13) +ax.legend(loc='best', fontsize=11, frameon=True, shadow=True) +ax.grid(alpha=0.3) +plt.tight_layout() +plt.show() ``` -## Model 1 (Raw Measurements): Tables 3 and 4 +## Model 1 (Raw Measurements) + +Model 1 is a classical errors-in-variables model: the data collecting +agency simply reports the error-corrupted data $\bar z_t = z_t + v_t$ +that it collects, making no attempt to adjust for measurement errors. -Model 1 treats the raw measured series $\bar z_t$ as the observables and -applies a Kalman filter to extract the state. +Because the measurement errors $v_t$ are serially correlated (AR(1)), +we cannot directly apply the Kalman filter to +$\bar z_t = C x_t + v_t$. +Following {cite:t}`Sargent1989` (Section III.B), we quasi-difference the +observation equation. + +Substituting $\bar z_t = C x_t + v_t$, $x_{t+1} = A x_t + \varepsilon_t$, +and $v_{t+1} = D v_t + \eta_t$ into $\bar z_{t+1} - D \bar z_t$ gives + +```{math} +:label: model1_obs +\bar z_{t+1} - D \bar z_t = \bar C\, x_t + C \varepsilon_t + \eta_t, +``` -Because the measurement errors $v_t$ are serially correlated, Sargent -quasi-differences the observation equation to obtain an innovation form -with serially uncorrelated errors. +where $\bar C = CA - DC$. -The transformed observation equation is +The composite observation noise in {eq}`model1_obs` is +$\bar\nu_t = C\varepsilon_t + \eta_t$, which is serially uncorrelated. +Its covariance, and the cross-covariance between the state noise +$\varepsilon_t$ and $\bar\nu_t$, are ```{math} -\bar z_t - D \bar z_{t-1} = (CA - DC)x_{t-1} + C w_t + \eta_t. +:label: model1_covs +R_1 = C Q C^\top + \Sigma_\eta, \qquad W_1 = Q C^\top. 
``` -Hence +The system $\{x_{t+1} = A x_t + \varepsilon_t,\; +\bar z_{t+1} - D\bar z_t = \bar C x_t + \bar\nu_t\}$ +with $\text{cov}(\varepsilon_t)=Q$, $\text{cov}(\bar\nu_t)=R_1$, and +$\text{cov}(\varepsilon_t, \bar\nu_t)=W_1$ now has serially uncorrelated +errors, so the standard {doc}`Kalman filter ` applies. + +The steady-state Kalman filter yields the **innovations representation** ```{math} -\bar C = CA - DC, \quad R_1 = CQC^\top + R, \quad W_1 = QC^\top. +:label: model1_innov +\hat x_{t+1} = A \hat x_t + K_1 u_t, \qquad +\bar z_{t+1} - D\bar z_t = \bar C \hat x_t + u_t, ``` +where $u_t = (\bar z_{t+1} - D\bar z_t) - +E[\bar z_{t+1} - D\bar z_t \mid \bar z_t, \bar z_{t-1}, \ldots]$ +is the innovation process, $K_1$ is the Kalman gain, and +$V_1 = \bar C S_1 \bar C^\top + R_1$ is the innovation covariance matrix +(with $S_1 = E[(x_t - \hat x_t)(x_t - \hat x_t)^\top]$ the steady-state +state estimation error covariance). + ```{code-cell} ipython3 C_bar = C @ A - D @ C -R1 = C @ Q @ C.T + R +R1 = C @ Q @ C.T + Σ_η W1 = Q @ C.T K1, S1, V1 = steady_state_kalman(A, C_bar, Q, R1, W1) ``` -With the Kalman gain in hand, we can derive the Wold moving-average -representation for the measured data. +### Wold representation for measured data + +With the innovations representation {eq}`model1_innov` in hand, we can +derive a Wold moving-average representation for the measured data +$\bar z_t$. + +From {eq}`model1_innov` and the quasi-differencing definition, the +measured data satisfy (see eq. 19 of {cite:t}`Sargent1989`) + +```{math} +:label: model1_wold +\bar z_{t+1} = (I - DL)^{-1}\bigl[\bar C(I - AL)^{-1}K_1 L + I\bigr] u_t, +``` -This representation tells us how measured $y_n$, $c$, and $\Delta k$ -respond over time to the orthogonalized innovations in the -innovation covariance matrix $V_1$. +where $L$ is the lag operator. -To recover the Wold representation, define the augmented state +To compute the Wold coefficients numerically, define the augmented state ```{math} -r_t = \begin{bmatrix} \hat x_{t-1} \\ z_{t-1} \end{bmatrix}, +r_t = \begin{bmatrix} \hat x_{t-1} \\ \bar z_{t-1} \end{bmatrix}, ``` with dynamics @@ -289,7 +414,7 @@ with dynamics ```{math} r_{t+1} = F_1 r_t + G_1 u_t, \qquad -z_t = H_1 r_t + u_t, +\bar z_t = H_1 r_t + u_t, ``` where @@ -310,6 +435,9 @@ I H_1 = [\bar C \;\; D]. ``` +The Wold coefficients are then $\psi_0 = I$ and +$\psi_j = H_1 F_1^{j-1} G_1$ for $j \geq 1$. + ```{code-cell} ipython3 F1 = np.block([ [A, np.zeros((2, 3))], @@ -348,15 +476,25 @@ resp1 = np.array([psi1[j] @ linalg.cholesky(V1, lower=True) for j in range(14)]) decomp1 = fev_contributions(psi1, V1, n_horizons=20) ``` -Table 3 reports the forecast-error-variance decomposition for Model 1. +### Forecast-error-variance decomposition + +To measure the relative importance of each innovation, we decompose +the $j$-step-ahead forecast-error variance of each measured variable. -Each panel shows the cumulative contribution of one orthogonalized +Write $\bar z_{t+j} - E_t \bar z_{t+j} = \sum_{i=0}^{j-1} \psi_i u_{t+j-i}$. +Let $P$ be the lower-triangular Cholesky factor of $V_1$ so that the +orthogonalized innovations are $e_t = P^{-1} u_t$. +Then the contribution of orthogonalized innovation $k$ to the +$j$-step-ahead variance of variable $m$ is +$\sum_{i=0}^{j-1} (\psi_i P)_{mk}^2$. + +Each panel below shows the cumulative contribution of one orthogonalized innovation to the forecast-error variance of $y_n$, $c$, and $\Delta k$ at horizons 1 through 20. 
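As a sanity check on this decomposition (our addition; it assumes `psi1`, `V1`, and `decomp1` from the cells above), the contributions summed across the three orthogonalized innovations should reproduce the total forecast-error variance $\sum_{i=0}^{j-1} \operatorname{diag}(\psi_i V_1 \psi_i^\top)$:

```python
# Verify that the orthogonalized contributions add up to the total
# j-step-ahead forecast-error variance (here j = 20).
import numpy as np

j = 20
total_direct = sum(np.diag(psi1[i] @ V1 @ psi1[i].T) for i in range(j))
total_decomp = decomp1[:, :, j - 1].sum(axis=1)
print(np.allclose(total_direct, total_decomp))   # expect True
```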
```{code-cell} ipython3 horizons = np.arange(1, 21) -cols = [r'$y_n$', r'$c$', r'$\Delta k$'] +cols = [r'y_n', r'c', r'\Delta k'] def fev_table(decomp, shock_idx, horizons): return pd.DataFrame( @@ -364,26 +502,49 @@ def fev_table(decomp, shock_idx, horizons): columns=cols, index=pd.Index(horizons, name='Horizon') ) +``` -print("Table 3A: Contribution of innovation 1") -display(fev_table(decomp1, 0, horizons)) - -print("Table 3B: Contribution of innovation 2") -display(fev_table(decomp1, 1, horizons)) +```{code-cell} ipython3 +fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) + +for i, (shock_name, ax) in enumerate(zip([r'Innovation 1 ($y_n$)', r'Innovation 2 ($c$)', r'Innovation 3 ($\Delta k$)'], axes)): + fev_data = decomp1[:, i, :] + ax.plot(horizons, fev_data[0, :], label=r'$y_n$', lw=2.5) + ax.plot(horizons, fev_data[1, :], label=r'$c$', lw=2.5) + ax.plot(horizons, fev_data[2, :], label=r'$\Delta k$', lw=2.5) + ax.set_xlabel('Horizon', fontsize=12) + ax.set_ylabel('Contribution to FEV', fontsize=12) + ax.set_title(shock_name, fontsize=13) + ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) + ax.grid(alpha=0.3) -print("Table 3C: Contribution of innovation 3") -display(fev_table(decomp1, 2, horizons)) +plt.tight_layout() +plt.show() ``` +These plots replicate Table 3 of {cite:t}`Sargent1989`. +The income innovation accounts for substantial proportions of +forecast-error variance in all three variables, while the consumption and +investment innovations contribute mainly to their own variances. +This is a **Granger causality** pattern: income appears to +Granger-cause consumption and investment, but not vice versa. +The pattern arises because income is the best-measured variable +($\sigma_\eta = 0.05$), so its innovation carries the most +information about the underlying structural shock $\theta_t$. + The innovation covariance matrix $V_1$ is: ```{code-cell} ipython3 -labels = [r'$y_n$', r'$c$', r'$\Delta k$'] -pd.DataFrame(np.round(V1, 4), index=labels, columns=labels) +labels = [r'y_n', r'c', r'\Delta k'] +df_v1 = pd.DataFrame(np.round(V1, 4), index=labels, columns=labels) +display(Latex(df_to_latex_matrix(df_v1))) ``` -Table 4 reports the orthogonalized Wold impulse responses for Model 1 -at lags 0 through 13. +### Wold impulse responses + +The orthogonalized Wold impulse responses $\psi_j P$ show how the +measured variables respond at lag $j$ to a one-standard-deviation +orthogonalized innovation. We plot lags 0 through 13. 
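Note that at lag 0 the orthogonalized responses are simply the columns of the lower Cholesky factor of $V_1$, since $\psi_0 = I$; the triangular pattern is what lets the first-ordered (income) innovation move all three variables on impact. A minimal check (ours, assuming `V1` and `resp1` from the cells above):

```python
# Lag-0 orthogonalized responses equal the lower Cholesky factor of V1.
import numpy as np
from scipy import linalg

P1 = linalg.cholesky(V1, lower=True)
print(np.allclose(resp1[0], P1))   # expect True
print(np.round(P1, 4))
```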
```{code-cell} ipython3 lags = np.arange(14) @@ -394,34 +555,122 @@ def wold_response_table(resp, shock_idx, lags): columns=cols, index=pd.Index(lags, name='Lag') ) +``` -print("Table 4A: Response to innovation in y_n") -display(wold_response_table(resp1, 0, lags)) +```{code-cell} ipython3 +fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) + +for i, (shock_name, ax) in enumerate(zip([r'Innovation in $y_n$', r'Innovation in $c$', r'Innovation in $\Delta k$'], axes)): + ax.plot(lags, resp1[:, 0, i], label=r'$y_n$', lw=2.5) + ax.plot(lags, resp1[:, 1, i], label=r'$c$', lw=2.5) + ax.plot(lags, resp1[:, 2, i], label=r'$\Delta k$', lw=2.5) + ax.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) + ax.set_xlabel('Lag', fontsize=12) + ax.set_ylabel('Response', fontsize=12) + ax.set_title(shock_name, fontsize=13) + ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) + ax.grid(alpha=0.3) -print("Table 4B: Response to innovation in c") -display(wold_response_table(resp1, 1, lags)) +plt.tight_layout() +plt.show() +``` + +These plots replicate Table 4 of {cite:t}`Sargent1989`. +An income innovation generates persistent responses in all variables +because, being the best-measured series, its innovation is dominated +by the true permanent shock $\theta_t$, which permanently raises the +capital stock and hence steady-state consumption and income. +A consumption innovation produces smaller, decaying responses +that reflect the AR(1) structure of its measurement error ($\rho = 0.7$). +An investment innovation has a large initial impact on investment itself, +consistent with the high measurement error variance ($\sigma_\eta = 0.65$), +but the effect dies out quickly. + +## Model 2 (Filtered Measurements) + +Model 2 corresponds to a data collecting agency that, instead of +reporting raw error-corrupted data, applies an optimal filter +to construct least-squares estimates of the true variables. -print("Table 4C: Response to innovation in Δk") -display(wold_response_table(resp1, 2, lags)) +Specifically, the agency uses the Kalman filter from Model 1 to form +$\hat x_t = E[x_t \mid \bar z_{t-1}, \bar z_{t-2}, \ldots]$ and reports +filtered estimates + +```{math} +\tilde z_t = G \hat x_t, ``` -## Model 2 (Filtered Measurements): Tables 5 and 6 +where $G = C$ is a selection matrix +(see eq. 23 of {cite:t}`Sargent1989`). -Model 2 takes a different approach: instead of working with the raw data, -the econometrician first applies the Kalman filter from Model 1 to -strip out measurement error and then treats the filtered estimates -$\hat z_t = C \hat x_t$ as if they were the true observations. +### State-space for filtered data -A second Kalman filter is then applied to the filtered series. +From the innovations representation {eq}`model1_innov`, the state +$\hat x_t$ evolves as -The state noise covariance for this second filter is +```{math} +:label: model2_state +\hat x_{t+1} = A \hat x_t + K_1 u_t. +``` + +The reported filtered data are then + +```{math} +:label: model2_obs +\tilde z_t = C \hat x_t + \eta_t, +``` + +where $\eta_t$ is a type 2 white-noise measurement error process +("typos") with presumably very small covariance matrix $R_2$. + +The state noise in {eq}`model2_state` is $K_1 u_t$, which has covariance + +```{math} +:label: model2_Q +Q_2 = K_1 V_1 K_1^\top. +``` + +The covariance matrix of the joint noise is +(see eq. 
25 of {cite:t}`Sargent1989`) ```{math} -Q_2 = K_1 V_1 K_1^\top, +E \begin{bmatrix} K_1 u_t \\ \eta_t \end{bmatrix} + \begin{bmatrix} K_1 u_t \\ \eta_t \end{bmatrix}^\top += \begin{bmatrix} Q_2 & 0 \\ 0 & R_2 \end{bmatrix}. ``` -We solve a second Kalman system with tiny measurement noise to regularize the -near-singular covariance matrix. +Since $R_2$ is close to or equal to zero (the filtered data have +negligible additional noise), we approximate it with a small +regularization term $R_2 = \epsilon I$ to keep the Kalman filter +numerically well-conditioned. + +A second Kalman filter applied to {eq}`model2_state`--{eq}`model2_obs` +yields a second innovations representation + +```{math} +:label: model2_innov +\hat{\hat x}_{t+1} = A \hat{\hat x}_t + K_2 a_t, +\qquad +\tilde z_t = C \hat{\hat x}_t + a_t, +``` + +where $a_t$ is the innovation process for the filtered data with +covariance $V_2 = C S_2 C^\top + R_2$. + +### Wold representation for filtered data + +The Wold moving-average representation for $\tilde z_t$ is +(see eq. 29 of {cite:t}`Sargent1989`) + +```{math} +:label: model2_wold +\tilde z_t = \bigl[C(I - AL)^{-1} K_2 L + I\bigr] a_t, +``` + +with coefficients $\psi_0 = I$ and $\psi_j = C A^{j-1} K_2$ for +$j \geq 1$. Note that this is simpler than the Model 1 Wold +representation {eq}`model1_wold` because there is no quasi-differencing +to undo. ```{code-cell} ipython3 Q2 = K1 @ V1 @ K1.T @@ -444,61 +693,95 @@ resp2 = np.array([psi2[j] @ linalg.cholesky(V2, lower=True) for j in range(14)]) decomp2 = fev_contributions(psi2, V2, n_horizons=20) ``` -Table 5 is the analogue of Table 3 for Model 2. +### Forecast-error-variance decomposition -Because the filtered data are nearly noiseless, the second and third -innovations contribute very little to forecast-error variance. +Because the filtered data are nearly noiseless, the innovation +covariance $V_2$ is close to singular with one dominant eigenvalue. +This means the filtered economy is driven by essentially one shock, +just like the true economy. ```{code-cell} ipython3 -print("Table 5A: Contribution of innovation 1") -display(fev_table(decomp2, 0, horizons)) - -print("Table 5B: Contribution of innovation 2 (×10³)") -display(pd.DataFrame( - np.round(decomp2[:, 1, :].T * 1e3, 4), - columns=cols, - index=pd.Index(horizons, name='Horizon') -)) +fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) + +for i, (shock_name, ax) in enumerate(zip([r'Innovation 1 ($y_n$)', r'Innovation 2 ($c$) $\times 10^3$', r'Innovation 3 ($\Delta k$) $\times 10^6$'], axes)): + scale = 1 if i == 0 else (1e3 if i == 1 else 1e6) + fev_data = decomp2[:, i, :] * scale + ax.plot(horizons, fev_data[0, :], label=r'$y_n$', lw=2.5) + ax.plot(horizons, fev_data[1, :], label=r'$c$', lw=2.5) + ax.plot(horizons, fev_data[2, :], label=r'$\Delta k$', lw=2.5) + ax.set_xlabel('Horizon', fontsize=12) + ax.set_ylabel('Contribution to FEV', fontsize=12) + ax.set_title(shock_name, fontsize=13) + ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) + ax.grid(alpha=0.3) -print("Table 5C: Contribution of innovation 3 (×10⁶)") -display(pd.DataFrame( - np.round(decomp2[:, 2, :].T * 1e6, 4), - columns=cols, - index=pd.Index(horizons, name='Horizon') -)) +plt.tight_layout() +plt.show() ``` +These plots replicate Table 5 of {cite:t}`Sargent1989`. +In Model 2, the first innovation accounts for virtually all forecast-error +variance, just as in the true economy where the single structural shock +$\theta_t$ drives everything. 
+The second and third innovations contribute negligibly (note the scaling +factors of $10^3$ and $10^6$ required to make them visible). +This confirms that filtering strips away the measurement noise that created +the appearance of multiple independent sources of variation in Model 1. + The innovation covariance matrix $V_2$ for Model 2 is: ```{code-cell} ipython3 -pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) +df_v2 = pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) +display(Latex(df_to_latex_matrix(df_v2))) ``` -Table 6 reports the orthogonalized Wold impulse responses for Model 2. +### Wold impulse responses -```{code-cell} ipython3 -print("Table 6A: Response to innovation in y_n") -display(wold_response_table(resp2, 0, lags)) +The following plots show the orthogonalized Wold impulse responses for Model 2. -print("Table 6B: Response to innovation in c") -display(wold_response_table(resp2, 1, lags)) +```{code-cell} ipython3 +fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) + +for i, (shock_name, scale) in enumerate(zip([r'Innovation in $y_n$', r'Innovation in $c$ $\times 10^3$', r'Innovation in $\Delta k$ $\times 10^3$'], + [1, 1e3, 1e3])): + ax = axes[i] + ax.plot(lags, resp2[:, 0, i] * scale, label=r'$y_n$', lw=2.5) + ax.plot(lags, resp2[:, 1, i] * scale, label=r'$c$', lw=2.5) + ax.plot(lags, resp2[:, 2, i] * scale, label=r'$\Delta k$', lw=2.5) + ax.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) + ax.set_xlabel('Lag', fontsize=12) + ax.set_ylabel('Response', fontsize=12) + ax.set_title(shock_name, fontsize=13) + ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) + ax.grid(alpha=0.3) -print("Table 6C: Response to innovation in Δk (×10³)") -display(pd.DataFrame( - np.round(resp2[:, :, 2] * 1e3, 4), - columns=cols, - index=pd.Index(lags, name='Lag') -)) +plt.tight_layout() +plt.show() ``` -## Simulation: Figures 1 through 9 and Table 7 +These plots replicate Table 6 of {cite:t}`Sargent1989`. +The income innovation in Model 2 produces responses that closely +approximate the true impulse response function from the structural +shock $\theta_t$ (compare with the figure in the +{ref}`true-impulse-responses` section above). +The consumption and investment innovations produce responses +that are orders of magnitude smaller (note the $10^3$ scaling), +confirming that the filtered data are driven by essentially one shock. + +A key implication: unlike Model 1, the filtered data from Model 2 +**cannot** reproduce the apparent Granger causality pattern that the +accelerator literature has documented empirically. +As {cite:t}`Sargent1989` emphasizes, the two models of measurement +produce quite different inferences about the economy's dynamics despite +sharing identical deep parameters. + +## Simulation The tables above characterize population moments of the two models. To see how the models perform on a finite sample, Sargent simulates 80 periods of true, measured, and filtered data and reports -covariance and correlation matrices (Table 7) together with -time-series plots (Figures 1 through 9). +covariance and correlation matrices together with time-series plots. We replicate these objects below. 
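Before turning to the 80-period simulation, a rough check of ours (assuming `D`, `Σ_η`, and `R` from the setup cells) confirms that simulated AR(1) measurement errors have unconditional variances close to the diagonal of $R = \operatorname{diag}\bigl(\sigma_{\eta,i}^2/(1-\rho_i^2)\bigr)$:

```python
# Long-sample variance of the AR(1) measurement errors vs. diag(R).
import numpy as np

rng = np.random.default_rng(0)
T_long = 50_000
v = np.zeros(3)
acc = np.zeros(3)
for _ in range(T_long):
    v = D @ v + rng.multivariate_normal(np.zeros(3), Σ_η)
    acc += v**2
print("sample variances:", acc / T_long)
print("diag(R):         ", np.diag(R))
```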
@@ -563,12 +846,12 @@ sim = simulate_series(seed=7909, T=80, k0=10.0) ```{code-cell} ipython3 def plot_true_vs_other(t, true_series, other_series, other_label, ylabel=""): - fig, ax = plt.subplots(figsize=(8, 3.6)) - ax.plot(t, true_series, lw=2, color="black", label="true") - ax.plot(t, other_series, lw=2, ls="--", color="#1f77b4", label=other_label) - ax.set_xlabel("time", fontsize=11) - ax.set_ylabel(ylabel, fontsize=11) - ax.legend(loc="best") + fig, ax = plt.subplots(figsize=(8, 4)) + ax.plot(t, true_series, lw=2.5, color="black", label="true") + ax.plot(t, other_series, lw=2.5, ls="--", color="#1f77b4", label=other_label) + ax.set_xlabel("Time", fontsize=12) + ax.set_ylabel(ylabel.capitalize(), fontsize=12) + ax.legend(loc="best", fontsize=11, frameon=True, shadow=True) ax.grid(alpha=0.3) plt.tight_layout() plt.show() @@ -613,16 +896,15 @@ mystnb: plot_true_vs_other(t, sim["y_true"], sim["y_meas"], "measured", ylabel="income") ``` -Figures 1 through 3 show how measurement error distorts each series. +The first three figures replicate Figures 1--3 of {cite:t}`Sargent1989`. +Investment is distorted the most because its measurement error +has the largest innovation variance ($\sigma_\eta = 0.65$), +while income is distorted the least ($\sigma_\eta = 0.05$). -Investment (Figure 2) is hit hardest because its measurement error -has the largest innovation variance ($\sigma_\eta = 0.65$). - -Figures 4 through 7 compare the true series with the Kalman-filtered -estimates from Model 1. - -The filter removes much of the measurement -noise, recovering series that track the truth closely. +The next four figures (Figures 4--7 in the paper) compare +true series with the Kalman-filtered estimates from Model 1. +The filter removes much of the measurement noise, recovering +series that track the truth closely. ```{code-cell} ipython3 --- @@ -672,59 +954,51 @@ mystnb: plot_true_vs_other(t, sim["k_true"], sim["k_filt"], "filtered", ylabel="capital stock") ``` -Figures 8 and 9 plot the national income identity residual -$c_t + \Delta k_t - y_{n,t}$. +The following figure plots the national income identity residual +$c_t + \Delta k_t - y_{n,t}$ for both measured and filtered data +(Figures 8--9 of {cite:t}`Sargent1989`). In the true model this identity holds exactly. - -For measured data (Figure 8) the residual is non-zero because +For measured data the residual is non-zero because independent measurement errors break the accounting identity. - -For filtered data (Figure 9) the Kalman filter approximately -restores the identity. +For filtered data the Kalman filter approximately restores the identity. ```{code-cell} ipython3 --- mystnb: figure: - caption: Measured consumption plus investment minus income - name: fig-measured-identity-residual + caption: "National income identity residual: measured (left) vs. 
filtered (right)" + name: fig-identity-residual image: - alt: National income identity residual for measured data over 80 time periods + alt: National income identity residual for measured and filtered data side by side --- -fig, ax = plt.subplots(figsize=(8, 3.6)) -ax.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], color="#d62728", lw=2) -ax.set_xlabel("time", fontsize=11) -ax.set_ylabel("residual", fontsize=11) -ax.grid(alpha=0.3) -plt.tight_layout() -plt.show() -``` +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4)) + +ax1.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], color="#d62728", lw=2.5) +ax1.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) +ax1.set_xlabel("Time", fontsize=12) +ax1.set_ylabel("Residual", fontsize=12) +ax1.set_title(r'Measured: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) +ax1.grid(alpha=0.3) + +ax2.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], color="#2ca02c", lw=2.5) +ax2.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) +ax2.set_xlabel("Time", fontsize=12) +ax2.set_ylabel("Residual", fontsize=12) +ax2.set_title(r'Filtered: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) +ax2.grid(alpha=0.3) -```{code-cell} ipython3 ---- -mystnb: - figure: - caption: Filtered consumption plus investment minus income - name: fig-filtered-identity-residual - image: - alt: National income identity residual for filtered data over 80 time periods ---- -fig, ax = plt.subplots(figsize=(8, 3.6)) -ax.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], color="#2ca02c", lw=2) -ax.set_xlabel("time", fontsize=11) -ax.set_ylabel("residual", fontsize=11) -ax.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -Table 7 reports covariance and correlation matrices among the true, -measured, and filtered versions of each variable. +The following covariance and correlation matrices replicate Table 7 +of {cite:t}`Sargent1989`. +For each variable we report the $3 \times 3$ covariance and correlation +matrices among the true, measured, and filtered versions. High correlations between true and filtered series confirm that the -Kalman filter does a good job of removing measurement noise. - +Kalman filter removes most measurement noise. Lower correlations between true and measured series quantify how much information is lost by using raw data. @@ -744,35 +1018,46 @@ corr_k = np.corrcoef(np.vstack([sim["k_true"], sim["k_filt"]])) tmf_labels = ['true', 'measured', 'filtered'] tf_labels = ['true', 'filtered'] +``` -print("Table 7A: Covariance matrix of consumption") -display(matrix_df(cov_c, tmf_labels)) +**Consumption** -- Measurement error inflates variance, but the filtered +series recovers a variance close to the truth. +The true-filtered correlation exceeds 0.99. -print("Table 7B: Correlation matrix of consumption") -display(matrix_df(corr_c, tmf_labels)) +```{code-cell} ipython3 +display(Latex(df_to_latex_matrix(matrix_df(cov_c, tmf_labels)))) +display(Latex(df_to_latex_matrix(matrix_df(corr_c, tmf_labels)))) +``` -print("Table 7C: Covariance matrix of investment") -display(matrix_df(cov_i, tmf_labels)) +**Investment** -- Because $\sigma_\eta = 0.65$ is large, measurement error +creates the most variance inflation here. +Despite this, the true-filtered correlation remains high, +demonstrating the filter's effectiveness even with severe noise. 
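To put a number on this (a back-of-the-envelope check of ours, using `f` and `R` from the setup cells): the true variance of $\Delta k_t$ is $f^{-2} \approx 0.91$, while the unconditional measurement-error variance is $R_{33} = \sigma_\eta^2/(1-\rho^2) \approx 0.46$, so the measured series has roughly 1.5 times the variance of the true series.

```python
# Population variance inflation for measured investment.
var_dk_true = (1 / f)**2      # var(Δk_t) = var(θ_t) / f², with var(θ_t) = 1
var_dk_noise = R[2, 2]        # unconditional measurement-error variance
print("true var:", round(var_dk_true, 3),
      "| noise var:", round(var_dk_noise, 3),
      "| measured/true:", round((var_dk_true + var_dk_noise) / var_dk_true, 2))
```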
-print("Table 7D: Correlation matrix of investment") -display(matrix_df(corr_i, tmf_labels)) +```{code-cell} ipython3 +display(Latex(df_to_latex_matrix(matrix_df(cov_i, tmf_labels)))) +display(Latex(df_to_latex_matrix(matrix_df(corr_i, tmf_labels)))) +``` -print("Table 7E: Covariance matrix of income") -display(matrix_df(cov_y, tmf_labels)) +**Income** -- Income has the smallest measurement error, so measured +and true variances are close. True-filtered correlations are very high. -print("Table 7F: Correlation matrix of income") -display(matrix_df(corr_y, tmf_labels)) +```{code-cell} ipython3 +display(Latex(df_to_latex_matrix(matrix_df(cov_y, tmf_labels)))) +display(Latex(df_to_latex_matrix(matrix_df(corr_y, tmf_labels)))) +``` -print("Table 7G: Covariance matrix of capital") -display(matrix_df(cov_k, tf_labels)) +**Capital stock** -- The capital stock is never directly observed, yet +the filter recovers it with very high accuracy. -print("Table 7H: Correlation matrix of capital") -display(matrix_df(corr_k, tf_labels)) +```{code-cell} ipython3 +display(Latex(df_to_latex_matrix(matrix_df(cov_k, tf_labels)))) +display(Latex(df_to_latex_matrix(matrix_df(corr_k, tf_labels)))) ``` ## Summary -This lecture reproduced the tables and figures in {cite}`Sargent1989`, +This lecture reproduced the analysis in {cite}`Sargent1989`, which studies how measurement error alters an econometrician's view of a permanent income economy driven by the investment accelerator. @@ -785,18 +1070,19 @@ Several lessons emerge: * Measurement error is not a second-order issue: it can reshape inferences about which shocks drive which variables. -* The {doc}`Kalman filter ` effectively strips measurement noise - from the data. +* Model 1 reproduces the **Granger causality** pattern documented in the + empirical accelerator literature -- income appears to Granger-cause + consumption and investment -- but this pattern is an artifact of + measurement error ordering, not of the structural model. -* The filtered series track the truth closely - (Figures 4 through 7), and the near-zero residual in Figure 9 shows that - the filter approximately restores the national income accounting - identity that raw measurement error breaks (Figure 8). +* Model 2, working with filtered data, attributes nearly all variance to + the single structural shock $\theta_t$ and **cannot** reproduce the + Granger causality pattern. -* The forecast-error-variance decompositions (Tables 3 and 5) reveal - that Model 1 attributes substantial variance to measurement noise - innovations, while Model 2, working with cleaned data, attributes - nearly all variance to the single structural shock $\theta_t$. +* The {doc}`Kalman filter ` effectively strips measurement noise + from the data: the filtered series track the truth closely, and the + near-zero residual shows that the filter approximately restores the + national income accounting identity that raw measurement error breaks. 
These results connect to broader themes in this lecture series: the role of {doc}`linear state space models ` in From 23d89876aae4b1a52d5c29eb1113adefc4d416ad Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Mon, 9 Feb 2026 23:00:03 +1100 Subject: [PATCH 11/37] updates --- lectures/measurement_models.md | 722 ++++++++++++++++++++++----------- 1 file changed, 478 insertions(+), 244 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 0f73e221a..637a039b4 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -36,23 +36,24 @@ The setting is a {doc}`permanent income ` economy in which the investment accelerator, the mechanism studied in {doc}`samuelson` and {doc}`chow_business_cycles`, drives business cycle fluctuations. -Sargent specifies a {doc}`linear state space model ` for the -true economy and then considers two ways of extracting information from +We specify a {doc}`linear state space model ` for the +true economy and then consider two ways of extracting information from noisy measurements: -- Model 1 applies a {doc}`Kalman filter ` directly to +- In Model 1, the data collecting agency simply reports raw (noisy) observations. -- Model 2 first filters the data to remove measurement error, - then computes dynamics from the filtered series. +- In Model 2, the agency applies an optimal + {doc}`Kalman filter ` to the noisy data and + reports least-squares estimates of the true variables. The two models produce different Wold representations and forecast-error-variance decompositions, even though they describe the same underlying economy. -In this lecture we reproduce the analysis from {cite}`Sargent1989` +In this lecture we reproduce the analysis from {cite:t}`Sargent1989` while studying the underlying mechanisms in the paper. 
-We use the following imports and precision settings for tables: +We use the following imports and functions for matrices and tables ```{code-cell} ipython3 import numpy as np @@ -61,14 +62,16 @@ import matplotlib.pyplot as plt from scipy import linalg from IPython.display import Latex -np.set_printoptions(precision=4, suppress=True) +np.set_printoptions(precision=3, suppress=True) def df_to_latex_matrix(df, label=''): - """Convert DataFrame to LaTeX matrix (for math matrices).""" + """Convert DataFrame to LaTeX matrix.""" lines = [r'\begin{bmatrix}'] for idx, row in df.iterrows(): - row_str = ' & '.join([f'{v:.4f}' if isinstance(v, (int, float)) else str(v) for v in row]) + r' \\' + row_str = ' & '.join( + [f'{v:.4f}' if isinstance(v, (int, float)) + else str(v) for v in row]) + r' \\' lines.append(row_str) lines.append(r'\end{bmatrix}') @@ -79,7 +82,7 @@ def df_to_latex_matrix(df, label=''): return '$' + '\n'.join(lines) + '$' def df_to_latex_array(df): - """Convert DataFrame to LaTeX array (for tables with headers).""" + """Convert DataFrame to LaTeX array.""" n_rows, n_cols = df.shape # Build column format (centered columns) @@ -95,7 +98,9 @@ def df_to_latex_array(df): # Data rows for idx, row in df.iterrows(): - row_str = str(idx) + ' & ' + ' & '.join([f'{v:.4f}' if isinstance(v, (int, float)) else str(v) for v in row]) + r' \\' + row_str = str(idx) + ' & ' + ' & '.join( + [f'{v:.3f}' if isinstance(v, (int, float)) else str(v) + for v in row]) + r' \\' lines.append(row_str) lines.append(r'\end{array}') @@ -103,47 +108,205 @@ def df_to_latex_array(df): return '$' + '\n'.join(lines) + '$' ``` -## Model Setup +## The economic model -The true economy is a version of the permanent income model -(see {doc}`perm_income`) in which a representative consumer -chooses consumption $c_t$ and capital accumulation $\Delta k_t$ -to maximize expected discounted utility subject to a budget -constraint. +The true economy is a linear-quadratic version of a stochastic +optimal growth model (see also {doc}`perm_income`). -Assume that the discount factor satisfies $\beta f = 1$ and that the -productivity shock $\theta_t$ is white noise. +A social planner maximizes -The optimal decision rules reduce the true system to +```{math} +:label: planner_obj +E \sum_{t=0}^{\infty} \beta^t \left( u_0 + u_1 c_t - \frac{u_2}{2} c_t^2 \right) +``` + +subject to the technology + +```{math} +:label: tech_constraint +c_t + k_{t+1} = f k_t + \theta_t, \qquad \beta f^2 > 1, +``` + +where $c_t$ is consumption, $k_t$ is the capital stock, +$f$ is the gross rate of return on capital, +and $\theta_t$ is an endowment or technology shock following + +```{math} +:label: shock_process +a(L)\,\theta_t = \varepsilon_t, +``` + +with $a(L) = 1 - a_1 L - a_2 L^2 - \cdots - a_r L^r$ having all roots +outside the unit circle. + +### Optimal decision rule + +The solution can be represented by the optimal decision rule +for $c_t$: ```{math} -\begin{aligned} -k_{t+1} &= k_t + f^{-1}\theta_t, \\ -y_{n,t} &= (f-1)k_t + \theta_t, \\ -c_t &= (f-1)k_t + (1-f^{-1})\theta_t, \\ -\Delta k_t &= f^{-1}\theta_t. -\end{aligned} +:label: opt_decision +c_t = \frac{-\alpha}{f-1} + + \left(1 - \frac{1}{\beta f^2}\right) + \frac{L - f^{-1} a(f^{-1})^{-1} a(L)}{L - f^{-1}}\,\theta_t + + f k_t, +\qquad +k_{t+1} = f k_t + \theta_t - c_t, ``` -with $f = 1.05$ and $\theta_t \sim \mathcal{N}(0, 1)$. +where $\alpha = u_1[1-(\beta f)^{-1}]/u_2$. 
+ +Equations {eq}`shock_process` and {eq}`opt_decision` exhibit the +cross-equation restrictions characteristic of rational expectations +models. + +### Net income and the accelerator + +Define net output or national income as + +```{math} +:label: net_income +y_{nt} = (f-1)k_t + \theta_t. +``` + +Note that {eq}`tech_constraint` and {eq}`net_income` imply +$(k_{t+1} - k_t) + c_t = y_{nt}$. + +To obtain both a version of {cite:t}`Friedman1956`'s geometric +distributed lag consumption function and a distributed lag +accelerator, we impose two assumptions: -Here $k_t$ is capital, $y_{n,t}$ is national income, $c_t$ is consumption, -and $\Delta k_t$ is net investment. +1. $a(L) = 1$, so that $\theta_t$ is white noise. +2. $\beta f = 1$, so the rate of return on capital equals the rate + of time preference. -Notice the investment accelerator at work: because $\Delta k_t = f^{-1}\theta_t$, -investment responds only to the innovation $\theta_t$, not to the level of -capital. +Assumption 1 is crucial for the strict form of the accelerator. + +Relaxing it to allow serially correlated $\theta_t$ preserves an +accelerator in a broad sense but loses the sharp geometric-lag +form of {eq}`accelerator`. + +Adding a second shock breaks the one-index structure entirely +and can generate nontrivial Granger causality even without +measurement error. + +The accelerator projection is also not invariant under +interventions that alter predictable components of income. + +Assumption 2 is less important, affecting only various constants. + +Under both assumptions, {eq}`opt_decision` simplifies to + +```{math} +:label: simple_crule +c_t = (1-f^{-1})\,\theta_t + (f-1)\,k_t. +``` + +When {eq}`simple_crule`, {eq}`net_income`, and +{eq}`tech_constraint` are combined, the optimal plan satisfies + +```{math} +:label: friedman_consumption +c_t = \left(\frac{1-\beta}{1-\beta L}\right) y_{nt}, +``` + +```{math} +:label: accelerator +k_{t+1} - k_t = f^{-1} \left(\frac{1-L}{1-\beta L}\right) y_{nt}, +``` + +```{math} +:label: income_process +y_{nt} = \theta_t + (1-\beta)(\theta_{t-1} + \theta_{t-2} + \cdots). +``` + +Equation {eq}`friedman_consumption` is Friedman's consumption +model: consumption is a geometric distributed lag of income, +with the decay coefficient $\beta$ equal to the discount factor. + +Equation {eq}`accelerator` is the distributed lag accelerator: +investment is a geometric distributed lag of the first difference +of income. This is the same mechanism that {cite:t}`Chow1968` documented empirically (see {doc}`chow_business_cycles`). -We can cast this as a {doc}`linear state space model ` by -defining state and observable vectors +Equation {eq}`income_process` says that $y_{nt}$ is an IMA(1,1) +process with innovation $\theta_t$. + +As {cite:t}`Muth1960` showed, such a process is optimally forecast +via a geometric distributed lag or "adaptive expectations" scheme. + +### The accelerator puzzle + +When all variables are measured accurately and are driven by +the single shock $\theta_t$, the spectral density of +$(c_t,\, k_{t+1}-k_t,\, y_{nt})$ has rank one at all frequencies. + +Each variable is an invertible one-sided distributed lag of the +same white noise, so no variable Granger-causes any other. + +Empirically, however, measures of output Granger-cause investment +but not vice versa. + +{cite:t}`Sargent1989` shows that measurement error can resolve +this puzzle. 
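
A quick numerical check of the rank-one property is sketched below.
It anticipates the value $f = 1.05$ and the state-space matrices
$A$, $C$, $Q$ introduced later in the lecture, evaluates the
model-implied spectral density of the true $(y_{nt}, c_t, \Delta k_t)$
vector at a few frequencies with the standard linear state-space
formula, and skips $\omega = 0$, where the unit root in capital makes
the density unbounded.

```{code-cell} ipython3
import numpy as np

# State-space matrices of the true economy (these anticipate the
# values assigned later in this lecture; only θ_t is stochastic)
f = 1.05
A = np.array([[1.0, 1/f],
              [0.0, 0.0]])
C = np.array([[f - 1, 1.0],
              [f - 1, 1 - 1/f],
              [0.0,   1/f]])
Q = np.array([[0.0, 0.0],
              [0.0, 1.0]])

I = np.eye(2)
for ω in (0.5, 1.5, 3.0):
    # spectral density of z_t = C x_t at frequency ω (ω = 0 excluded)
    H = np.linalg.inv(I - A * np.exp(-1j * ω))
    S_z = C @ H @ Q @ H.conj().T @ C.T / (2 * np.pi)
    eig = np.sort(np.linalg.eigvalsh(S_z))[::-1]
    print(f"ω = {ω}: spectral density eigenvalues =", eig.round(6))
```

At every frequency shown, only one eigenvalue is appreciably different
from zero, the frequency-domain counterpart of a single common shock
driving all three true variables.
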
+ +To illustrate, suppose first that output $y_{nt}$ is measured +perfectly while consumption and capital are each polluted by +serially correlated measurement errors $v_{ct}$ and $v_{kt}$ +orthogonal to $\theta_t$. + +Let $\bar c_t$ and $\bar k_{t+1} - \bar k_t$ denote the measured +series. Then + +```{math} +:label: meas_consumption +\bar c_t = \left(\frac{1-\beta}{1-\beta L}\right) y_{nt} + v_{ct}, +``` + +```{math} +:label: meas_investment +\bar k_{t+1} - \bar k_t + = \beta\left(\frac{1-L}{1-\beta L}\right) y_{nt} + + (v_{k,t+1} - v_{kt}), +``` + +```{math} +:label: income_process_ma +y_{nt} = \theta_t + (1-\beta)(\theta_{t-1} + \theta_{t-2} + \cdots). +``` + +In this case income Granger-causes consumption and investment +but is not Granger-caused by them. + +In the numerical example below, $y_{nt}$ is also measured +with error: the agency reports $\bar y_{nt} = y_{nt} + v_{yt}$, +where $v_{yt}$ follows an AR(1) process orthogonal to $\theta_t$. + +When every series is corrupted by measurement error, every measured +variable Granger-causes every other. + +The strength of Granger causality depends on the relative +signal-to-noise ratios. + +In a one-common-index model like this one ($\theta_t$ is the +common index), the best-measured variable extends the most +Granger causality to the others. + +## State-space formulation + +We now map the economic model and the measurement process into +a recursive state-space framework. + +Set $f = 1.05$ and $\theta_t \sim \mathcal{N}(0, 1)$. + +Define the state and observable vectors ```{math} x_t = \begin{bmatrix} k_t \\ \theta_t \end{bmatrix}, \qquad -z_t = \begin{bmatrix} y_{n,t} \\ c_t \\ \Delta k_t \end{bmatrix}, +z_t = \begin{bmatrix} y_{nt} \\ c_t \\ \Delta k_t \end{bmatrix}, ``` so that the true economy follows the state-space system @@ -174,27 +337,28 @@ Q = \begin{bmatrix} \end{bmatrix}. ``` -Note that $Q$ is singular because only the second component of $x_t$ -(the productivity shock $\theta_t$) receives an innovation; the -capital stock $k_t$ evolves deterministically given $\theta_t$. +$Q$ is singular because there is only one source of randomness +$\theta_t$; the capital stock $k_t$ evolves deterministically +given $\theta_t$. + +### Measurement errors The econometrician does not observe $z_t$ directly but instead sees $\bar z_t = z_t + v_t$, where $v_t$ is a vector of measurement errors. -Measurement errors follow an AR(1) process: +Measurement errors follow an AR(1) process ```{math} :label: meas_error_ar1 -v_t = D v_{t-1} + \eta_t, +v_{t+1} = D v_t + \eta_t, ``` where $\eta_t$ is a vector white noise with $E \eta_t \eta_t^\top = \Sigma_\eta$ and -$E \varepsilon_t v_s^\top = 0$ for all $t, s$ -(measurement errors are orthogonal to the true state innovations). +$E \varepsilon_t v_s^\top = 0$ for all $t, s$. -The autoregressive matrix and innovation standard deviations are +The parameters are ```{math} D = \operatorname{diag}(0.6, 0.7, 0.3), @@ -208,11 +372,19 @@ so the unconditional covariance of $v_t$ is R = \operatorname{diag}\!\left(\frac{\sigma_{\eta,i}^2}{1 - \rho_i^2}\right). ``` -The measurement errors are ordered from smallest to largest innovation -variance: income is measured most accurately ($\sigma_\eta = 0.05$), -consumption next ($\sigma_\eta = 0.035$), and investment least -accurately ($\sigma_\eta = 0.65$). -This ordering is central to the results below. 
+Consumption has the smallest measurement error innovation variance +($\sigma_\eta = 0.035$), income is next ($\sigma_\eta = 0.05$), +and investment has the largest ($\sigma_\eta = 0.65$). + +However, the ordering that matters for the results below is the +signal-to-noise ratio. + +Income carries a coefficient of $1$ on $\theta_t$, +whereas consumption carries only $1 - f^{-1} \approx 0.048$. + +The income innovation is therefore by far the most informative +about $\theta_t$, even though its measurement error innovation +is slightly larger than consumption's. ```{code-cell} ipython3 f = 1.05 @@ -237,7 +409,7 @@ Q = np.array([ ρ = np.array([0.6, 0.7, 0.3]) D = np.diag(ρ) -# Innovation std. devs +# Innovation std. devs of η_t σ_η = np.array([0.05, 0.035, 0.65]) Σ_η = np.diag(σ_η**2) @@ -251,13 +423,13 @@ display(Latex(df_to_latex_matrix(pd.DataFrame(C), 'C'))) display(Latex(df_to_latex_matrix(pd.DataFrame(D), 'D'))) ``` -## Kalman Filter +## Kalman filter Both models require a steady-state {doc}`Kalman filter `. The function below iterates on the Riccati equation until convergence, returning the Kalman gain $K$, the state covariance $S$, and the -innovation covariance $V$. +innovation covariance $V$ ```{code-cell} ipython3 def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): @@ -289,7 +461,7 @@ def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): ``` (true-impulse-responses)= -## True Impulse Responses +## True impulse responses Before introducing measurement error, we verify the impulse response of the true system to a unit shock $\theta_0 = 1$. @@ -298,7 +470,7 @@ The response shows the investment accelerator clearly: the full impact on net income $y_n$ occurs at lag 0, while consumption adjusts by only $1 - f^{-1} \approx 0.048$ and investment absorbs the remainder. -From lag 1 onward the economy is in its new steady state. +From lag 1 onward the economy is in its new steady state ```{code-cell} ipython3 def table2_irf(A, C, n_lags=6): @@ -306,37 +478,31 @@ def table2_irf(A, C, n_lags=6): rows = [] for j in range(n_lags): y_n, c, d_k = C @ x - rows.append([j, y_n, c, d_k]) + rows.append([y_n, c, d_k]) x = A @ x - return np.array(rows) - -rep_table2 = table2_irf(A, C, n_lags=6) - -fig, ax = plt.subplots(figsize=(8, 4.5)) -ax.plot(rep_table2[:, 0], rep_table2[:, 1], 'o-', label=r'$y_n$', lw=2.5, markersize=7) -ax.plot(rep_table2[:, 0], rep_table2[:, 2], 's-', label=r'$c$', lw=2.5, markersize=7) -ax.plot(rep_table2[:, 0], rep_table2[:, 3], '^-', label=r'$\Delta k$', lw=2.5, markersize=7) -ax.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) -ax.set_xlabel('Lag', fontsize=12) -ax.set_ylabel('Response', fontsize=12) -ax.set_title(r'True impulse response to unit shock $\theta_0 = 1$', fontsize=13) -ax.legend(loc='best', fontsize=11, frameon=True, shadow=True) -ax.grid(alpha=0.3) -plt.tight_layout() -plt.show() + return pd.DataFrame(rows, columns=[r'y_n', r'c', r'\Delta k'], + index=pd.Index(range(n_lags), name='lag')) + +table2 = table2_irf(A, C, n_lags=6) +display(Latex(df_to_latex_array(table2))) ``` -## Model 1 (Raw Measurements) +## Model 1 (raw measurements) Model 1 is a classical errors-in-variables model: the data collecting agency simply reports the error-corrupted data $\bar z_t = z_t + v_t$ that it collects, making no attempt to adjust for measurement errors. -Because the measurement errors $v_t$ are serially correlated (AR(1)), -we cannot directly apply the Kalman filter to -$\bar z_t = C x_t + v_t$. 
-Following {cite:t}`Sargent1989` (Section III.B), we quasi-difference the -observation equation. +Because the measurement errors $v_t$ are serially correlated, +the standard Kalman filter with white-noise measurement error +cannot be applied directly to $\bar z_t = C x_t + v_t$. + +An alternative is to augment the state vector with the +measurement-error AR components (see Appendix B of +{cite:t}`Sargent1989`). + +Here we take the quasi-differencing route, which reduces the +system to one with serially uncorrelated observation noise. Substituting $\bar z_t = C x_t + v_t$, $x_{t+1} = A x_t + \varepsilon_t$, and $v_{t+1} = D v_t + \eta_t$ into $\bar z_{t+1} - D \bar z_t$ gives @@ -350,12 +516,13 @@ where $\bar C = CA - DC$. The composite observation noise in {eq}`model1_obs` is $\bar\nu_t = C\varepsilon_t + \eta_t$, which is serially uncorrelated. + Its covariance, and the cross-covariance between the state noise $\varepsilon_t$ and $\bar\nu_t$, are ```{math} :label: model1_covs -R_1 = C Q C^\top + \Sigma_\eta, \qquad W_1 = Q C^\top. +R_1 = C Q C^\top + R, \qquad W_1 = Q C^\top. ``` The system $\{x_{t+1} = A x_t + \varepsilon_t,\; @@ -379,9 +546,32 @@ $V_1 = \bar C S_1 \bar C^\top + R_1$ is the innovation covariance matrix (with $S_1 = E[(x_t - \hat x_t)(x_t - \hat x_t)^\top]$ the steady-state state estimation error covariance). +To compute the innovations $\{u_t\}$ recursively from the data +$\{\bar z_t\}$, it is useful to represent {eq}`model1_innov` as + +```{math} +:label: model1_recursion +\hat x_{t+1} = (A - K_1 \bar C)\,\hat x_t + K_1 \bar z_t, +\qquad +u_t = -\bar C\,\hat x_t + \bar z_t, +``` + +where $\bar z_t := \bar z_{t+1} - D\bar z_t$ is the quasi-differenced +observation. + +Given an initial $\hat x_0$, equation {eq}`model1_recursion` generates +the innovation sequence, from which the Gaussian log-likelihood +of a sample $\{\bar z_t,\, t=0,\ldots,T\}$ is + +```{math} +:label: model1_loglik +\mathcal{L}^* = -T\ln 2\pi - \tfrac{1}{2}T\ln|V_1| + - \tfrac{1}{2}\sum_{t=0}^{T-1} u_t' V_1^{-1} u_t. +``` + ```{code-cell} ipython3 C_bar = C @ A - D @ C -R1 = C @ Q @ C.T + Σ_η +R1 = C @ Q @ C.T + R W1 = Q @ C.T K1, S1, V1 = steady_state_kalman(A, C_bar, Q, R1, W1) @@ -394,7 +584,7 @@ derive a Wold moving-average representation for the measured data $\bar z_t$. From {eq}`model1_innov` and the quasi-differencing definition, the -measured data satisfy (see eq. 19 of {cite:t}`Sargent1989`) +measured data satisfy ```{math} :label: model1_wold @@ -472,7 +662,9 @@ def fev_contributions(psi, V, n_horizons=20): psi1 = measured_wold_coeffs(F1, G1, H1, n_terms=40) -resp1 = np.array([psi1[j] @ linalg.cholesky(V1, lower=True) for j in range(14)]) +# Non-orthogonalized: scale each column by its own innovation std dev +std_u1 = np.sqrt(np.diag(V1)) +resp1 = np.array([psi1[j] * std_u1 for j in range(14)]) decomp1 = fev_contributions(psi1, V1, n_horizons=20) ``` @@ -482,69 +674,79 @@ To measure the relative importance of each innovation, we decompose the $j$-step-ahead forecast-error variance of each measured variable. Write $\bar z_{t+j} - E_t \bar z_{t+j} = \sum_{i=0}^{j-1} \psi_i u_{t+j-i}$. + Let $P$ be the lower-triangular Cholesky factor of $V_1$ so that the orthogonalized innovations are $e_t = P^{-1} u_t$. + Then the contribution of orthogonalized innovation $k$ to the $j$-step-ahead variance of variable $m$ is $\sum_{i=0}^{j-1} (\psi_i P)_{mk}^2$. 
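
As a sanity check on this bookkeeping, the short self-contained sketch
below uses made-up MA(1) coefficients (they are not the $\psi_j$ or
$V_1$ computed above) and verifies that summing the squared
orthogonalized responses across innovations reproduces each variable's
total forecast-error variance.

```{code-cell} ipython3
import numpy as np
from scipy import linalg

# Toy bivariate Wold representation: z_t = u_t + ψ_1 u_{t-1}, E u_t u_t' = V
# (illustrative numbers only, not taken from the model above)
psi_toy = [np.eye(2), np.array([[0.5, 0.2],
                                [0.1, 0.8]])]
V_toy = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
P_toy = linalg.cholesky(V_toy, lower=True)   # orthogonalizing factor

j = 2   # forecast horizon
# total j-step-ahead forecast-error covariance matrix
total = sum(psi_toy[i] @ V_toy @ psi_toy[i].T for i in range(j))
# contribution of orthogonalized innovation k to variable m
contrib = sum((psi_toy[i] @ P_toy)**2 for i in range(j))

print("row sums of contributions:", contrib.sum(axis=1).round(4))
print("diagonal of total FEV:    ", np.diag(total).round(4))
```
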
-Each panel below shows the cumulative contribution of one orthogonalized +The table below shows the cumulative contribution of each orthogonalized innovation to the forecast-error variance of $y_n$, $c$, and $\Delta k$ at horizons 1 through 20. ```{code-cell} ipython3 horizons = np.arange(1, 21) -cols = [r'y_n', r'c', r'\Delta k'] +labels = [r'y_n', r'c', r'\Delta k'] def fev_table(decomp, shock_idx, horizons): return pd.DataFrame( np.round(decomp[:, shock_idx, :].T, 4), - columns=cols, - index=pd.Index(horizons, name='Horizon') + columns=labels, + index=pd.Index(horizons, name='j') ) ``` ```{code-cell} ipython3 -fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) - -for i, (shock_name, ax) in enumerate(zip([r'Innovation 1 ($y_n$)', r'Innovation 2 ($c$)', r'Innovation 3 ($\Delta k$)'], axes)): - fev_data = decomp1[:, i, :] - ax.plot(horizons, fev_data[0, :], label=r'$y_n$', lw=2.5) - ax.plot(horizons, fev_data[1, :], label=r'$c$', lw=2.5) - ax.plot(horizons, fev_data[2, :], label=r'$\Delta k$', lw=2.5) - ax.set_xlabel('Horizon', fontsize=12) - ax.set_ylabel('Contribution to FEV', fontsize=12) - ax.set_title(shock_name, fontsize=13) - ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) - ax.grid(alpha=0.3) +shock_titles = [r'\text{A. Innovation in } y_n', + r'\text{B. Innovation in } c', + r'\text{C. Innovation in } \Delta k'] -plt.tight_layout() -plt.show() +parts = [] +for i, title in enumerate(shock_titles): + arr = df_to_latex_array(fev_table(decomp1, i, horizons)).strip('$') + parts.append(r'\begin{array}{c} ' + title + r' \\ ' + arr + r' \end{array}') + +display(Latex('$' + r' \quad '.join(parts) + '$')) ``` -These plots replicate Table 3 of {cite:t}`Sargent1989`. The income innovation accounts for substantial proportions of forecast-error variance in all three variables, while the consumption and investment innovations contribute mainly to their own variances. + This is a **Granger causality** pattern: income appears to Granger-cause consumption and investment, but not vice versa. -The pattern arises because income is the best-measured variable -($\sigma_\eta = 0.05$), so its innovation carries the most -information about the underlying structural shock $\theta_t$. -The innovation covariance matrix $V_1$ is: +The pattern arises because income has the highest signal-to-noise +ratio: its coefficient on $\theta_t$ is $1$, so its innovation carries +the most information about the underlying structural shock + +The covariance matrix of the innovations is not diagonal, but the eigenvalues are well-separated, with the first eigenvalue much larger +than the others, consistent with the presence of a dominant common shock $\theta_t$ ```{code-cell} ipython3 -labels = [r'y_n', r'c', r'\Delta k'] +print('Covariance matrix of innovations:') df_v1 = pd.DataFrame(np.round(V1, 4), index=labels, columns=labels) display(Latex(df_to_latex_matrix(df_v1))) ``` +```{code-cell} ipython3 +print('Eigenvalues of covariance matrix:') +print(np.sort(np.linalg.eigvalsh(V1))[::-1].round(4)) +``` + ### Wold impulse responses -The orthogonalized Wold impulse responses $\psi_j P$ show how the -measured variables respond at lag $j$ to a one-standard-deviation -orthogonalized innovation. We plot lags 0 through 13. +The Wold impulse responses $\psi_j$ scaled by the standard +deviation of each innovation show how the measured variables +respond at lag $j$ to a one-standard-deviation shock. 
+ +Because $\psi_0 = I$, each innovation moves only its own +variable at impact (lag 0), with cross-variable effects +appearing from lag 1 onward. + +We report lags 0 through 13 ```{code-cell} ipython3 lags = np.arange(14) @@ -552,56 +754,57 @@ lags = np.arange(14) def wold_response_table(resp, shock_idx, lags): return pd.DataFrame( np.round(resp[:, :, shock_idx], 4), - columns=cols, - index=pd.Index(lags, name='Lag') + columns=labels, + index=pd.Index(lags, name='j') ) ``` ```{code-cell} ipython3 -fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) - -for i, (shock_name, ax) in enumerate(zip([r'Innovation in $y_n$', r'Innovation in $c$', r'Innovation in $\Delta k$'], axes)): - ax.plot(lags, resp1[:, 0, i], label=r'$y_n$', lw=2.5) - ax.plot(lags, resp1[:, 1, i], label=r'$c$', lw=2.5) - ax.plot(lags, resp1[:, 2, i], label=r'$\Delta k$', lw=2.5) - ax.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) - ax.set_xlabel('Lag', fontsize=12) - ax.set_ylabel('Response', fontsize=12) - ax.set_title(shock_name, fontsize=13) - ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) - ax.grid(alpha=0.3) +wold_titles = [r'\text{A. Response to } y_n \text{ innovation}', + r'\text{B. Response to } c \text{ innovation}', + r'\text{C. Response to } \Delta k \text{ innovation}'] -plt.tight_layout() -plt.show() +parts = [] +for i, title in enumerate(wold_titles): + arr = df_to_latex_array(wold_response_table(resp1, i, lags)).strip('$') + parts.append(r'\begin{array}{c} ' + title + r' \\ ' + arr + r' \end{array}') + +display(Latex('$' + r' \quad '.join(parts) + '$')) ``` -These plots replicate Table 4 of {cite:t}`Sargent1989`. -An income innovation generates persistent responses in all variables -because, being the best-measured series, its innovation is dominated -by the true permanent shock $\theta_t$, which permanently raises the -capital stock and hence steady-state consumption and income. -A consumption innovation produces smaller, decaying responses -that reflect the AR(1) structure of its measurement error ($\rho = 0.7$). -An investment innovation has a large initial impact on investment itself, -consistent with the high measurement error variance ($\sigma_\eta = 0.65$), -but the effect dies out quickly. +At impact each innovation moves only its own variable. + +At subsequent lags the income innovation generates persistent +responses in all three variables because, being the best-measured +series, its innovation is dominated by the true permanent shock +$\theta_t$. -## Model 2 (Filtered Measurements) +The consumption and investment innovations produce responses that +decay according to the AR(1) structure of their respective +measurement errors ($\rho_c = 0.7$, $\rho_{\Delta k} = 0.3$), +with little spillover to other variables. + +## Model 2 (filtered measurements) Model 2 corresponds to a data collecting agency that, instead of reporting raw error-corrupted data, applies an optimal filter to construct least-squares estimates of the true variables. +This is a natural model for agencies that seasonally adjust +data (one-sided filtering of current and past observations) or +publish preliminary, revised, and final estimates of the same +variable (successive conditional expectations as more data +accumulate). 
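
The sketch below illustrates this kind of one-sided filtering in the
simplest possible setting, a scalar local-level model that is not part
of {cite:t}`Sargent1989`: the steady-state Kalman filter puts
geometrically declining weights on current and past observations, in
the spirit of the one-sided smoothers that such agencies apply.

```{code-cell} ipython3
import numpy as np

# Toy local-level model (illustrative only):
#   x_{t+1} = x_t + w_t,   z_t = x_t + v_t
σ_w, σ_v = 1.0, 2.0

# iterate the scalar Riccati recursion to the steady-state filter
Σ = 1.0
for _ in range(500):
    Σ_pred = Σ + σ_w**2                  # one-step-ahead state variance
    K = Σ_pred / (Σ_pred + σ_v**2)       # gain on the current observation
    Σ = (1 - K) * Σ_pred                 # filtered state variance

# the filtered estimate obeys x_hat_t = (1 - K) x_hat_{t-1} + K z_t,
# a one-sided geometric distributed lag of current and past observations
weights = K * (1 - K)**np.arange(6)
print("steady-state gain K:", round(K, 3))
print("weights on z_t, z_{t-1}, ...:", weights.round(3))
```
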
+ Specifically, the agency uses the Kalman filter from Model 1 to form -$\hat x_t = E[x_t \mid \bar z_{t-1}, \bar z_{t-2}, \ldots]$ and reports +$\hat x_t = E[x_t \mid \bar z_t, \bar z_{t-1}, \ldots]$ and reports filtered estimates ```{math} \tilde z_t = G \hat x_t, ``` -where $G = C$ is a selection matrix -(see eq. 23 of {cite:t}`Sargent1989`). +where $G = C$ is a selection matrix. ### State-space for filtered data @@ -631,7 +834,6 @@ Q_2 = K_1 V_1 K_1^\top. ``` The covariance matrix of the joint noise is -(see eq. 25 of {cite:t}`Sargent1989`) ```{math} E \begin{bmatrix} K_1 u_t \\ \eta_t \end{bmatrix} @@ -649,18 +851,43 @@ yields a second innovations representation ```{math} :label: model2_innov -\hat{\hat x}_{t+1} = A \hat{\hat x}_t + K_2 a_t, +\check{x}_{t+1} = A \check{x}_t + K_2 a_t, \qquad -\tilde z_t = C \hat{\hat x}_t + a_t, +\tilde z_t = C \check{x}_t + a_t, ``` where $a_t$ is the innovation process for the filtered data with covariance $V_2 = C S_2 C^\top + R_2$. +To compute the innovations $\{a_t\}$ from observations on +$\tilde z_t$, use + +```{math} +:label: model2_recursion +\check{x}_{t+1} = (A - K_2 C)\,\check{x}_t + K_2 \tilde z_t, +\qquad +a_t = -C\,\check{x}_t + \tilde z_t. +``` + +The Gaussian log-likelihood for a sample of $T$ observations +$\{\tilde z_t\}$ is then + +```{math} +:label: model2_loglik +\mathcal{L}^{**} = -T\ln 2\pi - \tfrac{1}{2}T\ln|V_2| + - \tfrac{1}{2}\sum_{t=0}^{T-1} a_t' V_2^{-1} a_t. +``` + +Computing {eq}`model2_loglik` requires both the first Kalman filter +(to form $\hat x_t$ and $u_t$) and the second Kalman filter +(to form $\check{x}_t$ and $a_t$). + +In effect, the econometrician must retrace the steps that the agency +used to synthesize the filtered data. + ### Wold representation for filtered data The Wold moving-average representation for $\tilde z_t$ is -(see eq. 29 of {cite:t}`Sargent1989`) ```{math} :label: model2_wold @@ -668,13 +895,15 @@ The Wold moving-average representation for $\tilde z_t$ is ``` with coefficients $\psi_0 = I$ and $\psi_j = C A^{j-1} K_2$ for -$j \geq 1$. Note that this is simpler than the Model 1 Wold +$j \geq 1$. + +Note that this is simpler than the Model 1 Wold representation {eq}`model1_wold` because there is no quasi-differencing to undo. ```{code-cell} ipython3 Q2 = K1 @ V1 @ K1.T -ε = 1e-7 +ε = 1e-6 K2, S2, V2 = steady_state_kalman(A, C, Q2, ε * np.eye(3)) @@ -697,98 +926,90 @@ decomp2 = fev_contributions(psi2, V2, n_horizons=20) Because the filtered data are nearly noiseless, the innovation covariance $V_2$ is close to singular with one dominant eigenvalue. + This means the filtered economy is driven by essentially one shock, just like the true economy. 
```{code-cell} ipython3 -fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) - -for i, (shock_name, ax) in enumerate(zip([r'Innovation 1 ($y_n$)', r'Innovation 2 ($c$) $\times 10^3$', r'Innovation 3 ($\Delta k$) $\times 10^6$'], axes)): - scale = 1 if i == 0 else (1e3 if i == 1 else 1e6) - fev_data = decomp2[:, i, :] * scale - ax.plot(horizons, fev_data[0, :], label=r'$y_n$', lw=2.5) - ax.plot(horizons, fev_data[1, :], label=r'$c$', lw=2.5) - ax.plot(horizons, fev_data[2, :], label=r'$\Delta k$', lw=2.5) - ax.set_xlabel('Horizon', fontsize=12) - ax.set_ylabel('Contribution to FEV', fontsize=12) - ax.set_title(shock_name, fontsize=13) - ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) - ax.grid(alpha=0.3) +parts = [] +for i, title in enumerate(shock_titles): + arr = df_to_latex_array(fev_table(decomp2, i, horizons)).strip('$') + parts.append(r'\begin{array}{c} ' + title + r' \\ ' + arr + r' \end{array}') -plt.tight_layout() -plt.show() +display(Latex('$' + r' \quad '.join(parts) + '$')) ``` -These plots replicate Table 5 of {cite:t}`Sargent1989`. In Model 2, the first innovation accounts for virtually all forecast-error variance, just as in the true economy where the single structural shock $\theta_t$ drives everything. -The second and third innovations contribute negligibly (note the scaling -factors of $10^3$ and $10^6$ required to make them visible). + +The second and third innovations contribute negligibly. + This confirms that filtering strips away the measurement noise that created the appearance of multiple independent sources of variation in Model 1. -The innovation covariance matrix $V_2$ for Model 2 is: - -```{code-cell} ipython3 -df_v2 = pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) -display(Latex(df_to_latex_matrix(df_v2))) -``` ### Wold impulse responses -The following plots show the orthogonalized Wold impulse responses for Model 2. +Unlike Model 1, whose impulse responses use non-orthogonalized +innovations, the Model 2 Wold representation is orthogonalized +via a Cholesky decomposition of $V_2$ with the ordering +$y_n$, $c$, $\Delta k$. ```{code-cell} ipython3 -fig, axes = plt.subplots(1, 3, figsize=(15, 4.5)) - -for i, (shock_name, scale) in enumerate(zip([r'Innovation in $y_n$', r'Innovation in $c$ $\times 10^3$', r'Innovation in $\Delta k$ $\times 10^3$'], - [1, 1e3, 1e3])): - ax = axes[i] - ax.plot(lags, resp2[:, 0, i] * scale, label=r'$y_n$', lw=2.5) - ax.plot(lags, resp2[:, 1, i] * scale, label=r'$c$', lw=2.5) - ax.plot(lags, resp2[:, 2, i] * scale, label=r'$\Delta k$', lw=2.5) - ax.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) - ax.set_xlabel('Lag', fontsize=12) - ax.set_ylabel('Response', fontsize=12) - ax.set_title(shock_name, fontsize=13) - ax.legend(loc='best', fontsize=10, frameon=True, shadow=True) - ax.grid(alpha=0.3) +parts = [] +for i, title in enumerate(wold_titles): + arr = df_to_latex_array(wold_response_table(resp2, i, lags)).strip('$') + parts.append(r'\begin{array}{c} ' + title + r' \\ ' + arr + r' \end{array}') -plt.tight_layout() -plt.show() +display(Latex('$' + r' \quad '.join(parts) + '$')) ``` -These plots replicate Table 6 of {cite:t}`Sargent1989`. The income innovation in Model 2 produces responses that closely approximate the true impulse response function from the structural -shock $\theta_t$ (compare with the figure in the +shock $\theta_t$ (compare with the table in the {ref}`true-impulse-responses` section above). 
+ The consumption and investment innovations produce responses -that are orders of magnitude smaller (note the $10^3$ scaling), -confirming that the filtered data are driven by essentially one shock. +that are orders of magnitude smaller, confirming that the filtered +data are driven by essentially one shock. -A key implication: unlike Model 1, the filtered data from Model 2 -**cannot** reproduce the apparent Granger causality pattern that the +Unlike Model 1, the filtered data from Model 2 +*cannot* reproduce the apparent Granger causality pattern that the accelerator literature has documented empirically. + +We also report the covariance matrix and eigenvalues of the innovations for Model 2 + +```{code-cell} ipython3 +print('Covariance matrix of innovations:') +df_v2 = pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) +display(Latex(df_to_latex_matrix(df_v2))) +``` + +```{code-cell} ipython3 +print('Eigenvalues of covariance matrix:') +print(np.sort(np.linalg.eigvalsh(V2))[::-1].round(4)) +``` + + As {cite:t}`Sargent1989` emphasizes, the two models of measurement produce quite different inferences about the economy's dynamics despite -sharing identical deep parameters. +sharing identical underlying parameters. ## Simulation The tables above characterize population moments of the two models. -To see how the models perform on a finite sample, Sargent simulates -80 periods of true, measured, and filtered data and reports +To see how the models perform on a finite sample, we simulate +80 periods of true, measured, and filtered data and report covariance and correlation matrices together with time-series plots. -We replicate these objects below. +We replicate these objects below ```{code-cell} ipython3 -def simulate_series(seed=7909, T=80, k0=10.0): +def simulate_series(seed=0, T=80, k0=10.0): """ - Simulate true, measured, and filtered series for Figures 1--9. + Simulate true, measured, and filtered series. """ rng = np.random.default_rng(seed) @@ -835,24 +1056,25 @@ def simulate_series(seed=7909, T=80, k0=10.0): out = { "y_true": y, "c_true": c, "dk_true": dk, "k_true": k[:-1], - "y_meas": z_meas[:, 0], "c_meas": z_meas[:, 1], "dk_meas": z_meas[:, 2], - "y_filt": z_filt[:, 0], "c_filt": z_filt[:, 1], "dk_filt": z_filt[:, 2], "k_filt": k_filt + "y_meas": z_meas[:, 0], "c_meas": z_meas[:, 1], + "dk_meas": z_meas[:, 2], + "y_filt": z_filt[:, 0], "c_filt": z_filt[:, 1], + "dk_filt": z_filt[:, 2], "k_filt": k_filt } return out -sim = simulate_series(seed=7909, T=80, k0=10.0) +sim = simulate_series(seed=0, T=80, k0=10.0) ``` ```{code-cell} ipython3 def plot_true_vs_other(t, true_series, other_series, other_label, ylabel=""): fig, ax = plt.subplots(figsize=(8, 4)) - ax.plot(t, true_series, lw=2.5, color="black", label="true") - ax.plot(t, other_series, lw=2.5, ls="--", color="#1f77b4", label=other_label) - ax.set_xlabel("Time", fontsize=12) - ax.set_ylabel(ylabel.capitalize(), fontsize=12) + ax.plot(t, true_series, lw=2, color="black", label="true") + ax.plot(t, other_series, lw=2, ls="--", color="#1f77b4", label=other_label) + ax.set_xlabel("time", fontsize=12) + ax.set_ylabel(ylabel, fontsize=12) ax.legend(loc="best", fontsize=11, frameon=True, shadow=True) - ax.grid(alpha=0.3) plt.tight_layout() plt.show() @@ -896,13 +1118,13 @@ mystnb: plot_true_vs_other(t, sim["y_true"], sim["y_meas"], "measured", ylabel="income") ``` -The first three figures replicate Figures 1--3 of {cite:t}`Sargent1989`. 
Investment is distorted the most because its measurement error has the largest innovation variance ($\sigma_\eta = 0.65$), while income is distorted the least ($\sigma_\eta = 0.05$). -The next four figures (Figures 4--7 in the paper) compare -true series with the Kalman-filtered estimates from Model 1. +The next four figures compare true series with the +Kalman-filtered estimates from Model 1. + The filter removes much of the measurement noise, recovering series that track the truth closely. @@ -955,12 +1177,13 @@ plot_true_vs_other(t, sim["k_true"], sim["k_filt"], "filtered", ylabel="capital ``` The following figure plots the national income identity residual -$c_t + \Delta k_t - y_{n,t}$ for both measured and filtered data -(Figures 8--9 of {cite:t}`Sargent1989`). +$c_t + \Delta k_t - y_{n,t}$ for both measured and filtered data. In the true model this identity holds exactly. + For measured data the residual is non-zero because independent measurement errors break the accounting identity. + For filtered data the Kalman filter approximately restores the identity. ```{code-cell} ipython3 @@ -974,31 +1197,31 @@ mystnb: --- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4)) -ax1.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], color="#d62728", lw=2.5) +ax1.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], color="#d62728", lw=2) ax1.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) -ax1.set_xlabel("Time", fontsize=12) -ax1.set_ylabel("Residual", fontsize=12) +ax1.set_xlabel("time", fontsize=12) +ax1.set_ylabel("residual", fontsize=12) ax1.set_title(r'Measured: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) -ax1.grid(alpha=0.3) -ax2.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], color="#2ca02c", lw=2.5) +ax2.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], color="#2ca02c", lw=2) ax2.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) -ax2.set_xlabel("Time", fontsize=12) -ax2.set_ylabel("Residual", fontsize=12) +ax2.set_xlabel("time", fontsize=12) +ax2.set_ylabel("residual", fontsize=12) ax2.set_title(r'Filtered: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) -ax2.grid(alpha=0.3) plt.tight_layout() plt.show() ``` -The following covariance and correlation matrices replicate Table 7 -of {cite:t}`Sargent1989`. +The following covariance and correlation matrices compare the true, +measured, and filtered versions of each variable. + For each variable we report the $3 \times 3$ covariance and correlation matrices among the true, measured, and filtered versions. High correlations between true and filtered series confirm that the Kalman filter removes most measurement noise. + Lower correlations between true and measured series quantify how much information is lost by using raw data. @@ -1020,63 +1243,84 @@ tmf_labels = ['true', 'measured', 'filtered'] tf_labels = ['true', 'filtered'] ``` -**Consumption** -- Measurement error inflates variance, but the filtered -series recovers a variance close to the truth. -The true-filtered correlation exceeds 0.99. +**Consumption** -- Measurement error inflates the variance of measured +consumption relative to the truth, as the diagonal of the covariance +matrix shows. ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_c, tmf_labels)))) +``` + +The correlation matrix confirms that the filtered series recovers the +true series almost perfectly (true-filtered correlation exceeds 0.99). 
+ +```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_c, tmf_labels)))) ``` **Investment** -- Because $\sigma_\eta = 0.65$ is large, measurement error creates the most variance inflation here. + +```{code-cell} ipython3 +display(Latex(df_to_latex_matrix(matrix_df(cov_i, tmf_labels)))) +``` + Despite this, the true-filtered correlation remains high, demonstrating the filter's effectiveness even with severe noise. ```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(cov_i, tmf_labels)))) display(Latex(df_to_latex_matrix(matrix_df(corr_i, tmf_labels)))) ``` -**Income** -- Income has the smallest measurement error, so measured -and true variances are close. True-filtered correlations are very high. +**Income** -- Income has the smallest measurement error ($\sigma_\eta = 0.05$), +so measured and true covariances are nearly identical. ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_y, tmf_labels)))) +``` + +The correlation matrix shows that both measured and filtered series +track the truth very closely. + +```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_y, tmf_labels)))) ``` **Capital stock** -- The capital stock is never directly observed, yet -the filter recovers it with very high accuracy. +the covariance matrix shows that the filter recovers it with very +high accuracy. ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_k, tf_labels)))) +``` + +The near-unity correlation confirms this. + +```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_k, tf_labels)))) ``` ## Summary -This lecture reproduced the analysis in {cite}`Sargent1989`, -which studies how measurement error alters an econometrician's view +{cite}`Sargent1989` studies how measurement error alters an econometrician's view of a permanent income economy driven by the investment accelerator. -Several lessons emerge: +We had the following findings: * The Wold representations and variance decompositions of Model 1 (raw measurements) and Model 2 (filtered measurements) are quite different, even though the underlying economy is the same. -* Measurement error is not a second-order issue: it can +* Measurement error can reshape inferences about which shocks drive which variables. * Model 1 reproduces the **Granger causality** pattern documented in the - empirical accelerator literature -- income appears to Granger-cause - consumption and investment -- but this pattern is an artifact of + empirical accelerator literature: income appears to Granger-cause + consumption and investment, but this pattern is an artifact of measurement error ordering, not of the structural model. * Model 2, working with filtered data, attributes nearly all variance to - the single structural shock $\theta_t$ and **cannot** reproduce the + the single structural shock $\theta_t$ and *cannot* reproduce the Granger causality pattern. * The {doc}`Kalman filter ` effectively strips measurement noise @@ -1084,13 +1328,3 @@ Several lessons emerge: near-zero residual shows that the filter approximately restores the national income accounting identity that raw measurement error breaks. -These results connect to broader themes in this lecture series: -the role of {doc}`linear state space models ` in -representing economic dynamics, the power of {doc}`Kalman filtering ` -for signal extraction, and the importance of the investment accelerator -for understanding business cycles ({doc}`samuelson`, -{doc}`chow_business_cycles`). 
- -## References - -* {cite}`Sargent1989` From 7857fe4b151faeab51d1656c7465f5e1c33bdd20 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Mon, 9 Feb 2026 23:58:28 +1100 Subject: [PATCH 12/37] updates --- lectures/measurement_models.md | 284 ++++++++++++++++++++------------- 1 file changed, 176 insertions(+), 108 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 637a039b4..6e5170226 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -50,10 +50,10 @@ The two models produce different Wold representations and forecast-error-variance decompositions, even though they describe the same underlying economy. -In this lecture we reproduce the analysis from {cite:t}`Sargent1989` -while studying the underlying mechanisms in the paper. +In this lecture we follow {cite:t}`Sargent1989` and study how +alternative measurement schemes change empirical implications. -We use the following imports and functions for matrices and tables +We start with imports and helper functions used throughout. ```{code-cell} ipython3 import numpy as np @@ -108,6 +108,89 @@ def df_to_latex_array(df): return '$' + '\n'.join(lines) + '$' ``` +## Classical formulation + +Before moving to state-space methods, {cite:t}`Sargent1989` formulates +both measurement models in classical Wold form. + +This setup separates: + +- The law of motion for true economic variables. +- The law of motion for measurement errors. +- The map from these two objects to observables used by an econometrician. + +Let the true data be + +```{math} +:label: classical_true_wold +Z_t = c_Z(L)\,\varepsilon_t^Z, \qquad +E\varepsilon_t^Z {\varepsilon_t^Z}' = I. +``` + +In Model 1 (raw reports), the agency observes and reports + +```{math} +:label: classical_model1_meas +z_t = Z_t + v_t, \qquad +v_t = c_v(L)\,\varepsilon_t^v, \qquad +E(Z_t v_s') = 0\ \forall t,s. +``` + +Then measured data have Wold representation + +```{math} +:label: classical_model1_wold +z_t = c_z(L)\,\varepsilon_t, +``` + +with spectral factorization + +```{math} +:label: classical_model1_factor +c_z(s)c_z(s^{-1})' = c_Z(s)c_Z(s^{-1})' + c_v(s)c_v(s^{-1})'. +``` + +In Model 2 (filtered reports), the agency reports + +```{math} +:label: classical_model2_report +\tilde z_t = E[Z_t \mid z_t, z_{t-1}, \ldots] = h(L) z_t, +``` + +where + +```{math} +:label: classical_model2_filter +h(L) += \Big[ + c_Z(L)c_Z(L^{-1})' + \big(c_z(L^{-1})'\big)^{-1} + \Big]_+ c_z(L)^{-1}, +``` + +and $[\cdot]_+$ keeps only nonnegative powers of $L$. + +Filtered reports satisfy + +```{math} +:label: classical_model2_wold +\tilde z_t = c_{\tilde z}(L)\,a_t, +``` + +with + +```{math} +:label: classical_model2_factor +c_{\tilde z}(s)c_{\tilde z}(s^{-1})' += h(s)c_z(s)c_z(s^{-1})'h(s^{-1})'. +``` + +These two data-generation schemes imply different Gaussian likelihood +functions. + +In the rest of the lecture, we switch to a recursive state-space +representation because it makes these objects easy to compute. + ## The economic model The true economy is a linear-quadratic version of a stochastic @@ -294,6 +377,8 @@ In a one-common-index model like this one ($\theta_t$ is the common index), the best-measured variable extends the most Granger causality to the others. +This mechanism drives the numerical results below. + ## State-space formulation We now map the economic model and the measurement process into @@ -372,19 +457,14 @@ so the unconditional covariance of $v_t$ is R = \operatorname{diag}\!\left(\frac{\sigma_{\eta,i}^2}{1 - \rho_i^2}\right). 
``` -Consumption has the smallest measurement error innovation variance -($\sigma_\eta = 0.035$), income is next ($\sigma_\eta = 0.05$), -and investment has the largest ($\sigma_\eta = 0.65$). - -However, the ordering that matters for the results below is the -signal-to-noise ratio. +The innovation variances are smallest for consumption +($\sigma_\eta = 0.035$), next for income ($\sigma_\eta = 0.05$), +and largest for investment ($\sigma_\eta = 0.65$). -Income carries a coefficient of $1$ on $\theta_t$, -whereas consumption carries only $1 - f^{-1} \approx 0.048$. - -The income innovation is therefore by far the most informative -about $\theta_t$, even though its measurement error innovation -is slightly larger than consumption's. +As in {cite:t}`Sargent1989`, what matters for Granger-causality +asymmetries is the overall measurement quality in the full system: +output is relatively well measured while investment is relatively +poorly measured. ```{code-cell} ipython3 f = 1.05 @@ -463,7 +543,7 @@ def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): (true-impulse-responses)= ## True impulse responses -Before introducing measurement error, we verify the impulse response of +Before introducing measurement error, we compute the impulse response of the true system to a unit shock $\theta_0 = 1$. The response shows the investment accelerator clearly: the full impact on @@ -662,9 +742,7 @@ def fev_contributions(psi, V, n_horizons=20): psi1 = measured_wold_coeffs(F1, G1, H1, n_terms=40) -# Non-orthogonalized: scale each column by its own innovation std dev -std_u1 = np.sqrt(np.diag(V1)) -resp1 = np.array([psi1[j] * std_u1 for j in range(14)]) +resp1 = np.array([psi1[j] @ linalg.cholesky(V1, lower=True) for j in range(14)]) decomp1 = fev_contributions(psi1, V1, n_horizons=20) ``` @@ -686,6 +764,11 @@ The table below shows the cumulative contribution of each orthogonalized innovation to the forecast-error variance of $y_n$, $c$, and $\Delta k$ at horizons 1 through 20. +Each panel fixes one orthogonalized innovation and reports its +cumulative contribution to each variable's forecast-error variance. + +Rows are forecast horizons and columns are forecasted variables. + ```{code-cell} ipython3 horizons = np.arange(1, 21) labels = [r'y_n', r'c', r'\Delta k'] @@ -718,12 +801,14 @@ investment innovations contribute mainly to their own variances. This is a **Granger causality** pattern: income appears to Granger-cause consumption and investment, but not vice versa. -The pattern arises because income has the highest signal-to-noise -ratio: its coefficient on $\theta_t$ is $1$, so its innovation carries -the most information about the underlying structural shock +This matches the paper's message that, in a one-common-index model, +the relatively best measured series has the strongest predictive content. + +The covariance matrix of the innovations is not diagonal, but the +eigenvalues are well separated. -The covariance matrix of the innovations is not diagonal, but the eigenvalues are well-separated, with the first eigenvalue much larger -than the others, consistent with the presence of a dominant common shock $\theta_t$ +The first eigenvalue is much larger than the others, consistent with +the presence of a dominant common shock $\theta_t$. 
```{code-cell} ipython3 print('Covariance matrix of innovations:') @@ -738,15 +823,12 @@ print(np.sort(np.linalg.eigvalsh(V1))[::-1].round(4)) ### Wold impulse responses -The Wold impulse responses $\psi_j$ scaled by the standard -deviation of each innovation show how the measured variables -respond at lag $j$ to a one-standard-deviation shock. - -Because $\psi_0 = I$, each innovation moves only its own -variable at impact (lag 0), with cross-variable effects -appearing from lag 1 onward. +The Wold impulse responses are reported using orthogonalized +innovations (Cholesky factorization of $V_1$ with ordering +$y_n$, $c$, $\Delta k$). -We report lags 0 through 13 +Under this identification, lag-0 responses reflect both +contemporaneous covariance and the Cholesky ordering. ```{code-cell} ipython3 lags = np.arange(14) @@ -772,7 +854,8 @@ for i, title in enumerate(wold_titles): display(Latex('$' + r' \quad '.join(parts) + '$')) ``` -At impact each innovation moves only its own variable. +At impact, the first orthogonalized innovation (ordered as output) +loads on all three measured variables, matching the paper's Table 4. At subsequent lags the income innovation generates persistent responses in all three variables because, being the best-measured @@ -948,13 +1031,29 @@ The second and third innovations contribute negligibly. This confirms that filtering strips away the measurement noise that created the appearance of multiple independent sources of variation in Model 1. +The covariance matrix and eigenvalues of the Model 2 innovations are + +```{code-cell} ipython3 +print('Covariance matrix of innovations:') +df_v2 = pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) +display(Latex(df_to_latex_matrix(df_v2))) +``` + +```{code-cell} ipython3 +print('Eigenvalues of covariance matrix:') +print(np.sort(np.linalg.eigvalsh(V2))[::-1].round(4)) +``` + +As {cite:t}`Sargent1989` emphasizes, the two models of measurement +produce quite different inferences about the economy's dynamics despite +sharing identical underlying parameters. + + ### Wold impulse responses -Unlike Model 1, whose impulse responses use non-orthogonalized -innovations, the Model 2 Wold representation is orthogonalized -via a Cholesky decomposition of $V_2$ with the ordering -$y_n$, $c$, $\Delta k$. +We again use orthogonalized Wold responses with a Cholesky +decomposition of $V_2$ ordered as $y_n$, $c$, $\Delta k$. ```{code-cell} ipython3 parts = [] @@ -978,36 +1077,16 @@ Unlike Model 1, the filtered data from Model 2 *cannot* reproduce the apparent Granger causality pattern that the accelerator literature has documented empirically. -We also report the covariance matrix and eigenvalues of the innovations for Model 2 - -```{code-cell} ipython3 -print('Covariance matrix of innovations:') -df_v2 = pd.DataFrame(np.round(V2, 4), index=labels, columns=labels) -display(Latex(df_to_latex_matrix(df_v2))) -``` - -```{code-cell} ipython3 -print('Eigenvalues of covariance matrix:') -print(np.sort(np.linalg.eigvalsh(V2))[::-1].round(4)) -``` - - -As {cite:t}`Sargent1989` emphasizes, the two models of measurement -produce quite different inferences about the economy's dynamics despite -sharing identical underlying parameters. ## Simulation The tables above characterize population moments of the two models. -To see how the models perform on a finite sample, we simulate -80 periods of true, measured, and filtered data and report -covariance and correlation matrices together with time-series plots. 
- -We replicate these objects below +We now simulate 80 periods of true, measured, and filtered data +to compare population implications with finite-sample behavior. ```{code-cell} ipython3 -def simulate_series(seed=0, T=80, k0=10.0): +def simulate_series(seed=7909, T=80, k0=10.0): """ Simulate true, measured, and filtered series. """ @@ -1064,7 +1143,7 @@ def simulate_series(seed=0, T=80, k0=10.0): return out -sim = simulate_series(seed=0, T=80, k0=10.0) +sim = simulate_series(seed=7909, T=80, k0=10.0) ``` ```{code-cell} ipython3 @@ -1122,11 +1201,8 @@ Investment is distorted the most because its measurement error has the largest innovation variance ($\sigma_\eta = 0.65$), while income is distorted the least ($\sigma_\eta = 0.05$). -The next four figures compare true series with the -Kalman-filtered estimates from Model 1. - -The filter removes much of the measurement noise, recovering -series that track the truth closely. +The Kalman-filtered estimates from Model 1 remove much of the +measurement noise and track the truth closely. ```{code-cell} ipython3 --- @@ -1176,15 +1252,13 @@ mystnb: plot_true_vs_other(t, sim["k_true"], sim["k_filt"], "filtered", ylabel="capital stock") ``` -The following figure plots the national income identity residual -$c_t + \Delta k_t - y_{n,t}$ for both measured and filtered data. +In the true model the national income identity +$c_t + \Delta k_t = y_{n,t}$ holds exactly. -In the true model this identity holds exactly. +Independent measurement errors break this accounting identity +in the measured data. -For measured data the residual is non-zero because -independent measurement errors break the accounting identity. - -For filtered data the Kalman filter approximately restores the identity. +The Kalman filter approximately restores it. ```{code-cell} ipython3 --- @@ -1213,17 +1287,13 @@ plt.tight_layout() plt.show() ``` -The following covariance and correlation matrices compare the true, -measured, and filtered versions of each variable. - -For each variable we report the $3 \times 3$ covariance and correlation -matrices among the true, measured, and filtered versions. +We can also compare the true, measured, and filtered versions of +each variable through their covariance and correlation matrices. High correlations between true and filtered series confirm that the -Kalman filter removes most measurement noise. - -Lower correlations between true and measured series quantify how much -information is lost by using raw data. +Kalman filter removes most measurement noise, while lower correlations +between true and measured series quantify how much information raw +data lose. ```{code-cell} ipython3 def cov_corr_three(a, b, c): @@ -1243,7 +1313,7 @@ tmf_labels = ['true', 'measured', 'filtered'] tf_labels = ['true', 'filtered'] ``` -**Consumption** -- Measurement error inflates the variance of measured +For consumption, measurement error inflates the variance of measured consumption relative to the truth, as the diagonal of the covariance matrix shows. @@ -1258,8 +1328,7 @@ true series almost perfectly (true-filtered correlation exceeds 0.99). display(Latex(df_to_latex_matrix(matrix_df(corr_c, tmf_labels)))) ``` -**Investment** -- Because $\sigma_\eta = 0.65$ is large, measurement error -creates the most variance inflation here. +For investment, measurement error creates the most variance inflation here. 
```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_i, tmf_labels)))) @@ -1272,7 +1341,7 @@ demonstrating the filter's effectiveness even with severe noise. display(Latex(df_to_latex_matrix(matrix_df(corr_i, tmf_labels)))) ``` -**Income** -- Income has the smallest measurement error ($\sigma_\eta = 0.05$), +Income has the smallest measurement error ($\sigma_\eta = 0.05$), so measured and true covariances are nearly identical. ```{code-cell} ipython3 @@ -1286,7 +1355,7 @@ track the truth very closely. display(Latex(df_to_latex_matrix(matrix_df(corr_y, tmf_labels)))) ``` -**Capital stock** -- The capital stock is never directly observed, yet +The capital stock is never directly observed, yet the covariance matrix shows that the filter recovers it with very high accuracy. @@ -1302,29 +1371,28 @@ display(Latex(df_to_latex_matrix(matrix_df(corr_k, tf_labels)))) ## Summary -{cite}`Sargent1989` studies how measurement error alters an econometrician's view -of a permanent income economy driven by the investment accelerator. - -We had the following findings: - -* The Wold representations and variance decompositions of Model 1 (raw - measurements) and Model 2 (filtered measurements) are quite different, - even though the underlying economy is the same. +{cite}`Sargent1989` shows how measurement error alters an +econometrician's view of a permanent income economy driven by +the investment accelerator. -* Measurement error can - reshape inferences about which shocks drive which variables. +The Wold representations and variance decompositions of Model 1 +(raw measurements) and Model 2 (filtered measurements) differ +substantially, even though the underlying economy is the same. -* Model 1 reproduces the **Granger causality** pattern documented in the - empirical accelerator literature: income appears to Granger-cause - consumption and investment, but this pattern is an artifact of - measurement error ordering, not of the structural model. +Measurement error can reshape inferences about which shocks +drive which variables. -* Model 2, working with filtered data, attributes nearly all variance to - the single structural shock $\theta_t$ and *cannot* reproduce the - Granger causality pattern. +Model 1 reproduces the **Granger causality** pattern documented in +the empirical accelerator literature: income appears to Granger-cause +consumption and investment, a result {cite:t}`Sargent1989` attributes +to measurement error and signal extraction in raw reported data. -* The {doc}`Kalman filter ` effectively strips measurement noise - from the data: the filtered series track the truth closely, and the - near-zero residual shows that the filter approximately restores the - national income accounting identity that raw measurement error breaks. +Model 2, working with filtered data, attributes nearly all variance +to the single structural shock $\theta_t$ and *cannot* reproduce +the Granger causality pattern. +The {doc}`Kalman filter ` effectively strips measurement +noise from the data: the filtered series track the truth closely, +and the near-zero residual shows that the filter approximately +restores the national income accounting identity that raw +measurement error breaks. 
From 15e0c9d9a77653a3e8ff3c27d71e27553b38d127 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 10 Feb 2026 08:47:42 +1100 Subject: [PATCH 13/37] updates --- lectures/measurement_models.md | 144 +++++++++++++++++++++------------ 1 file changed, 91 insertions(+), 53 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 6e5170226..4dd700750 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -398,7 +398,10 @@ so that the true economy follows the state-space system ```{math} :label: true_ss -x_{t+1} = A x_t + \varepsilon_t, \qquad z_t = C x_t, +\begin{aligned} +x_{t+1} &= A x_t + \varepsilon_t, \\ +z_t &= C x_t. +\end{aligned} ``` where $\varepsilon_t = \begin{bmatrix} 0 \\ \theta_t \end{bmatrix}$ has @@ -615,8 +618,10 @@ The steady-state Kalman filter yields the **innovations representation** ```{math} :label: model1_innov -\hat x_{t+1} = A \hat x_t + K_1 u_t, \qquad -\bar z_{t+1} - D\bar z_t = \bar C \hat x_t + u_t, +\begin{aligned} +\hat x_{t+1} &= A \hat x_t + K_1 u_t, \\ +\bar z_{t+1} - D\bar z_t &= \bar C \hat x_t + u_t. +\end{aligned} ``` where $u_t = (\bar z_{t+1} - D\bar z_t) - @@ -631,9 +636,10 @@ $\{\bar z_t\}$, it is useful to represent {eq}`model1_innov` as ```{math} :label: model1_recursion -\hat x_{t+1} = (A - K_1 \bar C)\,\hat x_t + K_1 \bar z_t, -\qquad -u_t = -\bar C\,\hat x_t + \bar z_t, +\begin{aligned} +\hat x_{t+1} &= (A - K_1 \bar C)\,\hat x_t + K_1 \bar z_t, \\ +u_t &= -\bar C\,\hat x_t + \bar z_t. +\end{aligned} ``` where $\bar z_t := \bar z_{t+1} - D\bar z_t$ is the quasi-differenced @@ -805,10 +811,8 @@ This matches the paper's message that, in a one-common-index model, the relatively best measured series has the strongest predictive content. The covariance matrix of the innovations is not diagonal, but the -eigenvalues are well separated. +eigenvalues are well separated -The first eigenvalue is much larger than the others, consistent with -the presence of a dominant common shock $\theta_t$. ```{code-cell} ipython3 print('Covariance matrix of innovations:') @@ -816,6 +820,9 @@ df_v1 = pd.DataFrame(np.round(V1, 4), index=labels, columns=labels) display(Latex(df_to_latex_matrix(df_v1))) ``` +The first eigenvalue is much larger than the others, consistent with +the presence of a dominant common shock $\theta_t$ + ```{code-cell} ipython3 print('Eigenvalues of covariance matrix:') print(np.sort(np.linalg.eigvalsh(V1))[::-1].round(4)) @@ -854,8 +861,8 @@ for i, title in enumerate(wold_titles): display(Latex('$' + r' \quad '.join(parts) + '$')) ``` -At impact, the first orthogonalized innovation (ordered as output) -loads on all three measured variables, matching the paper's Table 4. +At impact, the first orthogonalized innovation +loads on all three measured variables. At subsequent lags the income innovation generates persistent responses in all three variables because, being the best-measured @@ -934,9 +941,10 @@ yields a second innovations representation ```{math} :label: model2_innov -\check{x}_{t+1} = A \check{x}_t + K_2 a_t, -\qquad -\tilde z_t = C \check{x}_t + a_t, +\begin{aligned} +\check{x}_{t+1} &= A \check{x}_t + K_2 a_t, \\ +\tilde z_t &= C \check{x}_t + a_t. +\end{aligned} ``` where $a_t$ is the innovation process for the filtered data with @@ -947,9 +955,10 @@ $\tilde z_t$, use ```{math} :label: model2_recursion -\check{x}_{t+1} = (A - K_2 C)\,\check{x}_t + K_2 \tilde z_t, -\qquad -a_t = -C\,\check{x}_t + \tilde z_t. 
+\begin{aligned} +\check{x}_{t+1} &= (A - K_2 C)\,\check{x}_t + K_2 \tilde z_t, \\ +a_t &= -C\,\check{x}_t + \tilde z_t. +\end{aligned} ``` The Gaussian log-likelihood for a sample of $T$ observations @@ -982,7 +991,7 @@ $j \geq 1$. Note that this is simpler than the Model 1 Wold representation {eq}`model1_wold` because there is no quasi-differencing -to undo. +to undo ```{code-cell} ipython3 Q2 = K1 @ V1 @ K1.T @@ -1011,7 +1020,7 @@ Because the filtered data are nearly noiseless, the innovation covariance $V_2$ is close to singular with one dominant eigenvalue. This means the filtered economy is driven by essentially one shock, -just like the true economy. +just like the true economy ```{code-cell} ipython3 parts = [] @@ -1031,6 +1040,10 @@ The second and third innovations contribute negligibly. This confirms that filtering strips away the measurement noise that created the appearance of multiple independent sources of variation in Model 1. +We invite readers to compare this table to the one for the true impulse responses in the {ref}`true-impulse-responses` section above. + +The numbers are essentially the same. + The covariance matrix and eigenvalues of the Model 2 innovations are ```{code-cell} ipython3 @@ -1058,8 +1071,10 @@ decomposition of $V_2$ ordered as $y_n$, $c$, $\Delta k$. ```{code-cell} ipython3 parts = [] for i, title in enumerate(wold_titles): - arr = df_to_latex_array(wold_response_table(resp2, i, lags)).strip('$') - parts.append(r'\begin{array}{c} ' + title + r' \\ ' + arr + r' \end{array}') + arr = df_to_latex_array( + wold_response_table(resp2, i, lags)).strip('$') + parts.append( + r'\begin{array}{c} ' + title + r' \\ ' + arr + r' \end{array}') display(Latex('$' + r' \quad '.join(parts) + '$')) ``` @@ -1147,10 +1162,12 @@ sim = simulate_series(seed=7909, T=80, k0=10.0) ``` ```{code-cell} ipython3 -def plot_true_vs_other(t, true_series, other_series, other_label, ylabel=""): +def plot_true_vs_other(t, true_series, other_series, + other_label, ylabel=""): fig, ax = plt.subplots(figsize=(8, 4)) ax.plot(t, true_series, lw=2, color="black", label="true") - ax.plot(t, other_series, lw=2, ls="--", color="#1f77b4", label=other_label) + ax.plot(t, other_series, lw=2, ls="--", + color="#1f77b4", label=other_label) ax.set_xlabel("time", fontsize=12) ax.set_ylabel(ylabel, fontsize=12) ax.legend(loc="best", fontsize=11, frameon=True, shadow=True) @@ -1170,7 +1187,8 @@ mystnb: image: alt: True and measured consumption plotted over 80 time periods --- -plot_true_vs_other(t, sim["c_true"], sim["c_meas"], "measured", ylabel="consumption") +plot_true_vs_other(t, sim["c_true"], sim["c_meas"], + "measured", ylabel="consumption") ``` ```{code-cell} ipython3 @@ -1182,7 +1200,8 @@ mystnb: image: alt: True and measured investment plotted over 80 time periods --- -plot_true_vs_other(t, sim["dk_true"], sim["dk_meas"], "measured", ylabel="investment") +plot_true_vs_other(t, sim["dk_true"], sim["dk_meas"], + "measured", ylabel="investment") ``` ```{code-cell} ipython3 @@ -1194,7 +1213,8 @@ mystnb: image: alt: True and measured income plotted over 80 time periods --- -plot_true_vs_other(t, sim["y_true"], sim["y_meas"], "measured", ylabel="income") +plot_true_vs_other(t, sim["y_true"], sim["y_meas"], + "measured", ylabel="income") ``` Investment is distorted the most because its measurement error @@ -1213,7 +1233,8 @@ mystnb: image: alt: True and filtered consumption plotted over 80 time periods --- -plot_true_vs_other(t, sim["c_true"], sim["c_filt"], "filtered", ylabel="consumption") 
+plot_true_vs_other(t, sim["c_true"], sim["c_filt"], + "filtered", ylabel="consumption") ``` ```{code-cell} ipython3 @@ -1225,7 +1246,8 @@ mystnb: image: alt: True and filtered investment plotted over 80 time periods --- -plot_true_vs_other(t, sim["dk_true"], sim["dk_filt"], "filtered", ylabel="investment") +plot_true_vs_other(t, sim["dk_true"], sim["dk_filt"], + "filtered", ylabel="investment") ``` ```{code-cell} ipython3 @@ -1237,7 +1259,8 @@ mystnb: image: alt: True and filtered income plotted over 80 time periods --- -plot_true_vs_other(t, sim["y_true"], sim["y_filt"], "filtered", ylabel="income") +plot_true_vs_other(t, sim["y_true"], sim["y_filt"], + "filtered", ylabel="income") ``` ```{code-cell} ipython3 @@ -1249,7 +1272,8 @@ mystnb: image: alt: True and filtered capital stock plotted over 80 time periods --- -plot_true_vs_other(t, sim["k_true"], sim["k_filt"], "filtered", ylabel="capital stock") +plot_true_vs_other(t, sim["k_true"], sim["k_filt"], + "filtered", ylabel="capital stock") ``` In the true model the national income identity @@ -1271,13 +1295,13 @@ mystnb: --- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4)) -ax1.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], color="#d62728", lw=2) +ax1.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], lw=2) ax1.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) ax1.set_xlabel("time", fontsize=12) ax1.set_ylabel("residual", fontsize=12) ax1.set_title(r'Measured: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) -ax2.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], color="#2ca02c", lw=2) +ax2.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], lw=2) ax2.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) ax2.set_xlabel("time", fontsize=12) ax2.set_ylabel("residual", fontsize=12) @@ -1287,13 +1311,23 @@ plt.tight_layout() plt.show() ``` -We can also compare the true, measured, and filtered versions of -each variable through their covariance and correlation matrices. +For each variable $w \in \{c, \Delta k, y_n\}$ we compute the +covariance and correlation matrices among its true, measured, and +filtered versions. + +Each matrix has the structure + +```{math} +\begin{bmatrix} +\text{var}(w^{\text{true}}) & \text{cov}(w^{\text{true}}, w^{\text{meas}}) & \text{cov}(w^{\text{true}}, w^{\text{filt}}) \\ +\cdot & \text{var}(w^{\text{meas}}) & \text{cov}(w^{\text{meas}}, w^{\text{filt}}) \\ +\cdot & \cdot & \text{var}(w^{\text{filt}}) +\end{bmatrix}. +``` -High correlations between true and filtered series confirm that the -Kalman filter removes most measurement noise, while lower correlations -between true and measured series quantify how much information raw -data lose. +The key entries are the off-diagonal terms linking true to measured +(distortion from noise) and true to filtered (recovery by the Kalman +filter). 
```{code-cell} ipython3 def cov_corr_three(a, b, c): @@ -1303,9 +1337,12 @@ def cov_corr_three(a, b, c): def matrix_df(mat, labels): return pd.DataFrame(np.round(mat, 4), index=labels, columns=labels) -cov_c, corr_c = cov_corr_three(sim["c_true"], sim["c_meas"], sim["c_filt"]) -cov_i, corr_i = cov_corr_three(sim["dk_true"], sim["dk_meas"], sim["dk_filt"]) -cov_y, corr_y = cov_corr_three(sim["y_true"], sim["y_meas"], sim["y_filt"]) +cov_c, corr_c = cov_corr_three( + sim["c_true"], sim["c_meas"], sim["c_filt"]) +cov_i, corr_i = cov_corr_three( + sim["dk_true"], sim["dk_meas"], sim["dk_filt"]) +cov_y, corr_y = cov_corr_three( + sim["y_true"], sim["y_meas"], sim["y_filt"]) cov_k = np.cov(np.vstack([sim["k_true"], sim["k_filt"]])) corr_k = np.corrcoef(np.vstack([sim["k_true"], sim["k_filt"]])) @@ -1315,41 +1352,41 @@ tf_labels = ['true', 'filtered'] For consumption, measurement error inflates the variance of measured consumption relative to the truth, as the diagonal of the covariance -matrix shows. +matrix shows ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_c, tmf_labels)))) ``` The correlation matrix confirms that the filtered series recovers the -true series almost perfectly (true-filtered correlation exceeds 0.99). +true series almost perfectly ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_c, tmf_labels)))) ``` -For investment, measurement error creates the most variance inflation here. +For investment, measurement error creates the most variance inflation here ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_i, tmf_labels)))) ``` Despite this, the true-filtered correlation remains high, -demonstrating the filter's effectiveness even with severe noise. +demonstrating the filter's effectiveness even with severe noise ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_i, tmf_labels)))) ``` Income has the smallest measurement error ($\sigma_\eta = 0.05$), -so measured and true covariances are nearly identical. +so measured and true covariances are nearly identical ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_y, tmf_labels)))) ``` The correlation matrix shows that both measured and filtered series -track the truth very closely. +track the truth very closely ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_y, tmf_labels)))) @@ -1357,13 +1394,13 @@ display(Latex(df_to_latex_matrix(matrix_df(corr_y, tmf_labels)))) The capital stock is never directly observed, yet the covariance matrix shows that the filter recovers it with very -high accuracy. +high accuracy ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(cov_k, tf_labels)))) ``` -The near-unity correlation confirms this. +The near-unity correlation confirms this ```{code-cell} ipython3 display(Latex(df_to_latex_matrix(matrix_df(corr_k, tf_labels)))) @@ -1392,7 +1429,8 @@ to the single structural shock $\theta_t$ and *cannot* reproduce the Granger causality pattern. The {doc}`Kalman filter ` effectively strips measurement -noise from the data: the filtered series track the truth closely, -and the near-zero residual shows that the filter approximately -restores the national income accounting identity that raw -measurement error breaks. +noise from the data, so the filtered series track the truth closely. + +Raw measurement error breaks the national income accounting identity, +but the near-zero residual shows that the filter approximately +restores it. 
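
Why the filter restores the identity deserves a brief remark: the filtered series are all linear combinations $C \hat x_t$ of the same state estimate, and the consumption and investment rows of $C$ sum to the income row, so any series of that form satisfies $c_t + \Delta k_t = y_{n,t}$ by construction.

A minimal check, reusing the $C$ matrix already defined in the lecture's code:

```{code-cell} ipython3
# Rows of C are ordered (y_n, c, Δk).
# The c and Δk rows sum to the y_n row, so C @ x̂ obeys the national
# income identity for any state estimate x̂ (differences below are zero
# up to floating point).
print(C[1] + C[2] - C[0])
print(np.allclose(C[1] + C[2], C[0]))
```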
From 6d41d5140fa0c6035b6f3c7007c94ab921ef22ea Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 10 Feb 2026 11:23:01 +1100 Subject: [PATCH 14/37] updates --- lectures/measurement_models.md | 812 +++++++++++++++++---------------- 1 file changed, 422 insertions(+), 390 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 4dd700750..ebddc3fe2 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -28,32 +28,49 @@ kernelspec: ## Overview -{cite:t}`Sargent1989` studies what happens to an econometrician's -inferences about economic dynamics when observed data are contaminated -by measurement error. +"Rational expectations econometrics" aims to interpret economic time +series in terms of objects that are meaningful to economists, namely, +parameters describing preferences, technologies, information sets, +endowments, and equilibrium concepts. -The setting is a {doc}`permanent income ` economy in which the +When fully worked out, rational expectations models typically deliver +a well-defined mapping from these economically interpretable parameters +to the moments of the time series determined by the model. + +If accurate observations on these time series are available, one can +use that mapping to implement parameter estimation methods based +either on the likelihood function or on the method of moments. + +However, if only error-ridden data exist for the variables of interest, +then more steps are needed to extract parameter estimates. + +In effect, we require a model of the data reporting agency, one that +is workable enough that we can determine the mapping induced jointly +by the dynamic economic model and the measurement process to the +probability law for the measured data. + +The model chosen for the data collection agency is an aspect of an +econometric specification that can make big differences in inferences +about the economic structure. + +{cite:t}`Sargent1989` describes two alternative models of data generation +in a {doc}`permanent income ` economy in which the investment accelerator, the mechanism studied in {doc}`samuelson` and {doc}`chow_business_cycles`, drives business cycle fluctuations. -We specify a {doc}`linear state space model ` for the -true economy and then consider two ways of extracting information from -noisy measurements: - -- In Model 1, the data collecting agency simply reports - raw (noisy) observations. -- In Model 2, the agency applies an optimal - {doc}`Kalman filter ` to the noisy data and - reports least-squares estimates of the true variables. +- In Model 1, the data collecting agency simply reports the + error-ridden data that it collects. +- In Model 2, although it collects error-ridden data that satisfy + a classical errors-in-variables model, the data collecting agency + filters the data and reports the best estimates that it possibly can. -The two models produce different Wold representations and -forecast-error-variance decompositions, even though they describe -the same underlying economy. +Although the two models have the same "deep parameters," they produce +quite different sets of restrictions on the data. In this lecture we follow {cite:t}`Sargent1989` and study how alternative measurement schemes change empirical implications. -We start with imports and helper functions used throughout. 
+We start with imports and helper functions used throughout ```{code-cell} ipython3 import numpy as np @@ -108,89 +125,6 @@ def df_to_latex_array(df): return '$' + '\n'.join(lines) + '$' ``` -## Classical formulation - -Before moving to state-space methods, {cite:t}`Sargent1989` formulates -both measurement models in classical Wold form. - -This setup separates: - -- The law of motion for true economic variables. -- The law of motion for measurement errors. -- The map from these two objects to observables used by an econometrician. - -Let the true data be - -```{math} -:label: classical_true_wold -Z_t = c_Z(L)\,\varepsilon_t^Z, \qquad -E\varepsilon_t^Z {\varepsilon_t^Z}' = I. -``` - -In Model 1 (raw reports), the agency observes and reports - -```{math} -:label: classical_model1_meas -z_t = Z_t + v_t, \qquad -v_t = c_v(L)\,\varepsilon_t^v, \qquad -E(Z_t v_s') = 0\ \forall t,s. -``` - -Then measured data have Wold representation - -```{math} -:label: classical_model1_wold -z_t = c_z(L)\,\varepsilon_t, -``` - -with spectral factorization - -```{math} -:label: classical_model1_factor -c_z(s)c_z(s^{-1})' = c_Z(s)c_Z(s^{-1})' + c_v(s)c_v(s^{-1})'. -``` - -In Model 2 (filtered reports), the agency reports - -```{math} -:label: classical_model2_report -\tilde z_t = E[Z_t \mid z_t, z_{t-1}, \ldots] = h(L) z_t, -``` - -where - -```{math} -:label: classical_model2_filter -h(L) -= \Big[ - c_Z(L)c_Z(L^{-1})' - \big(c_z(L^{-1})'\big)^{-1} - \Big]_+ c_z(L)^{-1}, -``` - -and $[\cdot]_+$ keeps only nonnegative powers of $L$. - -Filtered reports satisfy - -```{math} -:label: classical_model2_wold -\tilde z_t = c_{\tilde z}(L)\,a_t, -``` - -with - -```{math} -:label: classical_model2_factor -c_{\tilde z}(s)c_{\tilde z}(s^{-1})' -= h(s)c_z(s)c_z(s^{-1})'h(s^{-1})'. -``` - -These two data-generation schemes imply different Gaussian likelihood -functions. - -In the rest of the lecture, we switch to a recursive state-space -representation because it makes these objects easy to compute. - ## The economic model The true economy is a linear-quadratic version of a stochastic @@ -273,9 +207,6 @@ Adding a second shock breaks the one-index structure entirely and can generate nontrivial Granger causality even without measurement error. -The accelerator projection is also not invariant under -interventions that alter predictable components of income. - Assumption 2 is less important, affecting only various constants. Under both assumptions, {eq}`opt_decision` simplifies to @@ -314,8 +245,8 @@ of income. This is the same mechanism that {cite:t}`Chow1968` documented empirically (see {doc}`chow_business_cycles`). -Equation {eq}`income_process` says that $y_{nt}$ is an IMA(1,1) -process with innovation $\theta_t$. +Equation {eq}`income_process` states that the first difference of disposable income is a +first-order moving average process with innovation equal to the innovation of the endowment shock $\theta_t$. As {cite:t}`Muth1960` showed, such a process is optimally forecast via a geometric distributed lag or "adaptive expectations" scheme. @@ -363,25 +294,33 @@ y_{nt} = \theta_t + (1-\beta)(\theta_{t-1} + \theta_{t-2} + \cdots). In this case income Granger-causes consumption and investment but is not Granger-caused by them. -In the numerical example below, $y_{nt}$ is also measured -with error: the agency reports $\bar y_{nt} = y_{nt} + v_{yt}$, -where $v_{yt}$ follows an AR(1) process orthogonal to $\theta_t$. 
+When each measured series is corrupted by measurement error, every +measured variable will in general Granger-cause every other. -When every series is corrupted by measurement error, every measured -variable Granger-causes every other. +The strength of this Granger causality, as measured by decompositions +of $j$-step-ahead prediction error variances, depends on the relative +variances of the measurement errors. -The strength of Granger causality depends on the relative -signal-to-noise ratios. +In this case, each observed series mixes the common signal $\theta_t$ +with idiosyncratic measurement noise. -In a one-common-index model like this one ($\theta_t$ is the -common index), the best-measured variable extends the most -Granger causality to the others. +A series with lower measurement +error variance tracks $\theta_t$ more closely, so its innovations +contain more information about future values of the other series. -This mechanism drives the numerical results below. +Accordingly, in a forecast-error-variance decomposition, shocks to +better-measured series account for a larger share of other variables' +$j$-step-ahead prediction errors. -## State-space formulation +In a one-common-index model like this one ($\theta_t$ is the common +index), better-measured variables extend more Granger causality to +less well measured series than vice versa. -We now map the economic model and the measurement process into +This asymmetry drives the numerical results we observe soon. + +### State-space formulation + +Let's map the economic model and the measurement process into a recursive state-space framework. Set $f = 1.05$ and $\theta_t \sim \mathcal{N}(0, 1)$. @@ -429,7 +368,61 @@ $Q$ is singular because there is only one source of randomness $\theta_t$; the capital stock $k_t$ evolves deterministically given $\theta_t$. -### Measurement errors +```{code-cell} ipython3 +# Baseline structural matrices for the true economy +f = 1.05 +β = 1 / f + +A = np.array([ + [1.0, 1.0 / f], + [0.0, 0.0] +]) + +C = np.array([ + [f - 1.0, 1.0], + [f - 1.0, 1.0 - 1.0 / f], + [0.0, 1.0 / f] +]) + +Q = np.array([ + [0.0, 0.0], + [0.0, 1.0] +]) +``` + +(true-impulse-responses)= +### True impulse responses + +Before introducing measurement error, we compute the impulse response of +the true system to a unit shock $\theta_0 = 1$. + +This benchmark clarifies what changes when we later switch from +true variables to reported variables. + +The response shows the investment accelerator clearly: the full impact on +net income $y_n$ occurs at lag 0, while consumption adjusts by only +$1 - f^{-1} \approx 0.048$ and investment absorbs the remainder. + +From lag 1 onward the economy is in its new steady state + +```{code-cell} ipython3 +def table2_irf(A, C, n_lags=6): + x = np.array([0.0, 1.0]) # k_0 = 0, theta_0 = 1 + rows = [] + for j in range(n_lags): + y_n, c, d_k = C @ x + rows.append([y_n, c, d_k]) + x = A @ x + return pd.DataFrame(rows, columns=[r'y_n', r'c', r'\Delta k'], + index=pd.Index(range(n_lags), name='lag')) + +table2 = table2_irf(A, C, n_lags=6) +display(Latex(df_to_latex_array(table2))) +``` + +## Measurement errors + +Let's add the measurement layer that generates reported data. The econometrician does not observe $z_t$ directly but instead sees $\bar z_t = z_t + v_t$, where $v_t$ is a vector of measurement @@ -464,31 +457,12 @@ The innovation variances are smallest for consumption ($\sigma_\eta = 0.035$), next for income ($\sigma_\eta = 0.05$), and largest for investment ($\sigma_\eta = 0.65$). 
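
A back-of-the-envelope calculation makes the notion of measurement quality concrete. The sketch below computes the stationary standard deviation of each AR(1) measurement error, $\sigma_\eta / \sqrt{1 - \rho^2}$, and compares it with the impact response of the corresponding true series to a unit $\theta$ shock taken from the impulse response table above. The $\rho$ values (0.6, 0.7, 0.3) anticipate the next code cell, and the impact-to-noise ratio is only a rough gauge of our own, not a statistic from {cite:t}`Sargent1989`.

```{code-cell} ipython3
# Measurement error parameters quoted in the text, ordered (y_n, c, Δk);
# the ρ values restate those set in the next code cell.
ρ_v = np.array([0.6, 0.7, 0.3])         # AR(1) coefficients
σ_η_v = np.array([0.05, 0.035, 0.65])   # innovation standard deviations
σ_v = σ_η_v / np.sqrt(1 - ρ_v**2)       # stationary std of each error

# Impact response of each true series to a unit θ shock
# (lag 0 of the true impulse response table): (1, 1 - 1/f, 1/f)
signal = np.array([1.0, 1 - 1/f, 1/f])

for name, s, n in zip(['y_n', 'c', 'Δk'], signal, σ_v):
    print(f'{name}: noise std = {n:.3f}, impact/noise ≈ {s/n:.1f}')
```

Income's ratio is an order of magnitude larger than the other two, which is the sense in which output is relatively well measured.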
-As in {cite:t}`Sargent1989`, what matters for Granger-causality +As in {cite:t}`Sargent1989` and our discussion above, what matters for Granger-causality asymmetries is the overall measurement quality in the full system: output is relatively well measured while investment is relatively poorly measured. ```{code-cell} ipython3 -f = 1.05 -β = 1 / f - -A = np.array([ - [1.0, 1.0 / f], - [0.0, 0.0] -]) - -C = np.array([ - [f - 1.0, 1.0], - [f - 1.0, 1.0 - 1.0 / f], - [0.0, 1.0 / f] -]) - -Q = np.array([ - [0.0, 0.0], - [0.0, 1.0] -]) - ρ = np.array([0.6, 0.7, 0.3]) D = np.diag(ρ) @@ -506,9 +480,7 @@ display(Latex(df_to_latex_matrix(pd.DataFrame(C), 'C'))) display(Latex(df_to_latex_matrix(pd.DataFrame(D), 'D'))) ``` -## Kalman filter - -Both models require a steady-state {doc}`Kalman filter `. +We will analyze the two reporting schemes separately, but first we need a solver for the steady-state Kalman gain and error covariances. The function below iterates on the Riccati equation until convergence, returning the Kalman gain $K$, the state covariance $S$, and the @@ -543,38 +515,47 @@ def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): return K, S, V ``` -(true-impulse-responses)= -## True impulse responses +With structural matrices and tools we need in place, we now follow +{cite:t}`Sargent1989`'s two reporting schemes in sequence. -Before introducing measurement error, we compute the impulse response of -the true system to a unit shock $\theta_0 = 1$. +## A Classical Model of Measurements Initially Collected by an Agency -The response shows the investment accelerator clearly: the full impact on -net income $y_n$ occurs at lag 0, while consumption adjusts by only -$1 - f^{-1} \approx 0.048$ and investment absorbs the remainder. +A data collecting agency observes a noise-corrupted version of $z_t$, namely -From lag 1 onward the economy is in its new steady state +```{math} +:label: model1_obs +\bar z_t = C x_t + v_t. +``` -```{code-cell} ipython3 -def table2_irf(A, C, n_lags=6): - x = np.array([0.0, 1.0]) # k_0 = 0, theta_0 = 1 - rows = [] - for j in range(n_lags): - y_n, c, d_k = C @ x - rows.append([y_n, c, d_k]) - x = A @ x - return pd.DataFrame(rows, columns=[r'y_n', r'c', r'\Delta k'], - index=pd.Index(range(n_lags), name='lag')) +We refer to this as *Model 1*: the agency collects noisy +data and reports them without filtering. -table2 = table2_irf(A, C, n_lags=6) -display(Latex(df_to_latex_array(table2))) -``` +To represent the second moments of the $\bar z_t$ process, it is +convenient to obtain its population vector autoregression. + +The error vector in the vector autoregression is the +innovation to $\bar z_t$ and can be taken to be the white noise in a Wold +moving average representation, which can be obtained by "inverting" +the autoregressive representation. + +The population vector autoregression, and how it depends on the +parameters of the state-space system and the measurement error process, +carries insights about how to interpret estimated vector +autoregressions for $\bar z_t$. -## Model 1 (raw measurements) +Constructing the vector autoregression is also useful as an +intermediate step in computing the likelihood of a sample of +$\bar z_t$'s as a function of the free parameters +$\{A, C, D, Q, R\}$. -Model 1 is a classical errors-in-variables model: the data collecting -agency simply reports the error-corrupted data $\bar z_t = z_t + v_t$ -that it collects, making no attempt to adjust for measurement errors. 
+The particular method that will be used to construct the vector +autoregressive representation also proves useful as an intermediate +step in constructing a model of an optimal reporting agency. + +We use recursive (Kalman filtering) methods to obtain the +vector autoregression for $\bar z_t$. + +### Quasi-differencing Because the measurement errors $v_t$ are serially correlated, the standard Kalman filter with white-noise measurement error @@ -584,77 +565,104 @@ An alternative is to augment the state vector with the measurement-error AR components (see Appendix B of {cite:t}`Sargent1989`). -Here we take the quasi-differencing route, which reduces the +Here we take the quasi-differencing route described in +{cite:t}`Sargent1989`, which reduces the system to one with serially uncorrelated observation noise. -Substituting $\bar z_t = C x_t + v_t$, $x_{t+1} = A x_t + \varepsilon_t$, -and $v_{t+1} = D v_t + \eta_t$ into $\bar z_{t+1} - D \bar z_t$ gives +Define ```{math} -:label: model1_obs -\bar z_{t+1} - D \bar z_t = \bar C\, x_t + C \varepsilon_t + \eta_t, +:label: model1_qd +\tilde z_t = \bar z_{t+1} - D \bar z_t, \qquad +\bar\nu_t = C \varepsilon_t + \eta_t, \qquad +\bar C = CA - DC. ``` -where $\bar C = CA - DC$. +Then the state-space system {eq}`true_ss`, the measurement error +process {eq}`meas_error_ar1`, and the observation equation {eq}`model1_obs` +imply the state-space system -The composite observation noise in {eq}`model1_obs` is -$\bar\nu_t = C\varepsilon_t + \eta_t$, which is serially uncorrelated. +```{math} +:label: model1_transformed +\begin{aligned} +x_{t+1} &= A x_t + \varepsilon_t, \\ +\tilde z_t &= \bar C\, x_t + \bar\nu_t, +\end{aligned} +``` -Its covariance, and the cross-covariance between the state noise -$\varepsilon_t$ and $\bar\nu_t$, are +where $(\varepsilon_t, \bar\nu_t)$ is a white noise process with ```{math} :label: model1_covs -R_1 = C Q C^\top + R, \qquad W_1 = Q C^\top. +E \begin{bmatrix} \varepsilon_t \end{bmatrix} +\begin{bmatrix} \varepsilon_t' & \bar\nu_t' \end{bmatrix} += \begin{bmatrix} Q & W_1 \\ W_1' & R_1 \end{bmatrix}, +\qquad +R_1 = C Q C^\top + R, \quad W_1 = Q C^\top. ``` -The system $\{x_{t+1} = A x_t + \varepsilon_t,\; -\bar z_{t+1} - D\bar z_t = \bar C x_t + \bar\nu_t\}$ -with $\text{cov}(\varepsilon_t)=Q$, $\text{cov}(\bar\nu_t)=R_1$, and -$\text{cov}(\varepsilon_t, \bar\nu_t)=W_1$ now has serially uncorrelated -errors, so the standard {doc}`Kalman filter ` applies. +System {eq}`model1_transformed` with covariances {eq}`model1_covs` is +characterized by the five matrices +$[A, \bar C, Q, R_1, W_1]$. + +### Innovations representation -The steady-state Kalman filter yields the **innovations representation** +Associated with {eq}`model1_transformed` and {eq}`model1_covs` is the +**innovations representation** for $\tilde z_t$, ```{math} :label: model1_innov \begin{aligned} \hat x_{t+1} &= A \hat x_t + K_1 u_t, \\ -\bar z_{t+1} - D\bar z_t &= \bar C \hat x_t + u_t. +\tilde z_t &= \bar C \hat x_t + u_t, \end{aligned} ``` -where $u_t = (\bar z_{t+1} - D\bar z_t) - -E[\bar z_{t+1} - D\bar z_t \mid \bar z_t, \bar z_{t-1}, \ldots]$ -is the innovation process, $K_1$ is the Kalman gain, and -$V_1 = \bar C S_1 \bar C^\top + R_1$ is the innovation covariance matrix -(with $S_1 = E[(x_t - \hat x_t)(x_t - \hat x_t)^\top]$ the steady-state -state estimation error covariance). 
- -To compute the innovations $\{u_t\}$ recursively from the data -$\{\bar z_t\}$, it is useful to represent {eq}`model1_innov` as +where ```{math} -:label: model1_recursion +:label: model1_innov_defs \begin{aligned} -\hat x_{t+1} &= (A - K_1 \bar C)\,\hat x_t + K_1 \bar z_t, \\ -u_t &= -\bar C\,\hat x_t + \bar z_t. +\hat x_t &= E[x_t \mid \tilde z_{t-1}, \tilde z_{t-2}, \ldots, \hat x_0] + = E[x_t \mid \bar z_t, \bar z_{t-1}, \ldots], \\ +u_t &= \tilde z_t - E[\tilde z_t \mid \tilde z_{t-1}, \tilde z_{t-2}, \ldots] + = \bar z_{t+1} - E[\bar z_{t+1} \mid \bar z_t, \bar z_{t-1}, \ldots], \end{aligned} ``` -where $\bar z_t := \bar z_{t+1} - D\bar z_t$ is the quasi-differenced -observation. +$[K_1, S_1]$ are computed from the steady-state Kalman filter applied to +$[A, \bar C, Q, R_1, W_1]$, and + +```{math} +:label: model1_S1 +S_1 = E[(x_t - \hat x_t)(x_t - \hat x_t)^\top]. +``` + +From {eq}`model1_innov_defs`, $u_t$ is the innovation process for the +$\bar z_t$ process. -Given an initial $\hat x_0$, equation {eq}`model1_recursion` generates -the innovation sequence, from which the Gaussian log-likelihood -of a sample $\{\bar z_t,\, t=0,\ldots,T\}$ is +### Wold representation + +System {eq}`model1_innov` and definition {eq}`model1_qd` can be used to +obtain a Wold vector moving average representation for the $\bar z_t$ process: ```{math} -:label: model1_loglik -\mathcal{L}^* = -T\ln 2\pi - \tfrac{1}{2}T\ln|V_1| - - \tfrac{1}{2}\sum_{t=0}^{T-1} u_t' V_1^{-1} u_t. +:label: model1_wold +\bar z_{t+1} = (I - DL)^{-1}\bigl[\bar C(I - AL)^{-1}K_1 L + I\bigr] u_t, ``` +where $L$ is the lag operator. + +From {eq}`model1_transformed` and {eq}`model1_innov` the innovation +covariance is + +```{math} +:label: model1_V1 +V_1 = E\, u_t u_t^\top = \bar C\, S_1\, \bar C^\top + R_1. +``` + +Below we compute $K_1$, $S_1$, and $V_1$ numerically + ```{code-cell} ipython3 C_bar = C @ A - D @ C R1 = C @ Q @ C.T + R @@ -663,23 +671,11 @@ W1 = Q @ C.T K1, S1, V1 = steady_state_kalman(A, C_bar, Q, R1, W1) ``` -### Wold representation for measured data -With the innovations representation {eq}`model1_innov` in hand, we can -derive a Wold moving-average representation for the measured data -$\bar z_t$. +### Computing the Wold coefficients -From {eq}`model1_innov` and the quasi-differencing definition, the -measured data satisfy - -```{math} -:label: model1_wold -\bar z_{t+1} = (I - DL)^{-1}\bigl[\bar C(I - AL)^{-1}K_1 L + I\bigr] u_t, -``` - -where $L$ is the lag operator. - -To compute the Wold coefficients numerically, define the augmented state +To compute the Wold coefficients in {eq}`model1_wold` numerically, +define the augmented state ```{math} r_t = \begin{bmatrix} \hat x_{t-1} \\ \bar z_{t-1} \end{bmatrix}, @@ -752,6 +748,42 @@ resp1 = np.array([psi1[j] @ linalg.cholesky(V1, lower=True) for j in range(14)]) decomp1 = fev_contributions(psi1, V1, n_horizons=20) ``` +### Gaussian likelihood + +The Gaussian log-likelihood function for a sample +$\{\bar z_t,\, t=0,\ldots,T\}$, conditioned on an initial state estimate +$\hat x_0$, can be represented as + +```{math} +:label: model1_loglik +\mathcal{L}^* = -T\ln 2\pi - \tfrac{1}{2}T\ln|V_1| + - \tfrac{1}{2}\sum_{t=0}^{T-1} u_t' V_1^{-1} u_t, +``` + +where $u_t$ is a function of $\{\bar z_t\}$ defined by +{eq}`model1_recursion` below. 
+ +To use {eq}`model1_innov` to compute $\{u_t\}$, it is useful to +represent it as + +```{math} +:label: model1_recursion +\begin{aligned} +\hat x_{t+1} &= (A - K_1 \bar C)\,\hat x_t + K_1 \tilde z_t, \\ +u_t &= -\bar C\,\hat x_t + \tilde z_t, +\end{aligned} +``` + +where $\tilde z_t = \bar z_{t+1} - D\bar z_t$ is the quasi-differenced +observation. + +Given $\hat x_0$, equation {eq}`model1_recursion` can be used recursively +to compute a $\{u_t\}$ process. + +Equations {eq}`model1_loglik` and {eq}`model1_recursion` give the +likelihood function of a sample of error-corrupted data +$\{\bar z_t\}$. + ### Forecast-error-variance decomposition To measure the relative importance of each innovation, we decompose @@ -810,9 +842,7 @@ Granger-cause consumption and investment, but not vice versa. This matches the paper's message that, in a one-common-index model, the relatively best measured series has the strongest predictive content. -The covariance matrix of the innovations is not diagonal, but the -eigenvalues are well separated - +Let's look at the the covariance matrix of the innovations ```{code-cell} ipython3 print('Covariance matrix of innovations:') @@ -820,23 +850,29 @@ df_v1 = pd.DataFrame(np.round(V1, 4), index=labels, columns=labels) display(Latex(df_to_latex_matrix(df_v1))) ``` -The first eigenvalue is much larger than the others, consistent with -the presence of a dominant common shock $\theta_t$ +The covariance matrix of the innovations is not diagonal, but the +eigenvalues are well separated as shown below + ```{code-cell} ipython3 print('Eigenvalues of covariance matrix:') print(np.sort(np.linalg.eigvalsh(V1))[::-1].round(4)) ``` +The first eigenvalue is much larger than the others, consistent with +the presence of a dominant common shock $\theta_t$ + ### Wold impulse responses The Wold impulse responses are reported using orthogonalized innovations (Cholesky factorization of $V_1$ with ordering $y_n$, $c$, $\Delta k$). -Under this identification, lag-0 responses reflect both +Under this method, lag-0 responses reflect both contemporaneous covariance and the Cholesky ordering. +We first define a helper function to format the Wold responses as a LaTeX array + ```{code-cell} ipython3 lags = np.arange(14) @@ -848,6 +884,8 @@ def wold_response_table(resp, shock_idx, lags): ) ``` +Now we report the Wold responses to each orthogonalized innovation in a single table with three panels + ```{code-cell} ipython3 wold_titles = [r'\text{A. Response to } y_n \text{ innovation}', r'\text{B. Response to } c \text{ innovation}', @@ -874,94 +912,158 @@ decay according to the AR(1) structure of their respective measurement errors ($\rho_c = 0.7$, $\rho_{\Delta k} = 0.3$), with little spillover to other variables. -## Model 2 (filtered measurements) +## A Model of Optimal Estimates Reported by an Agency -Model 2 corresponds to a data collecting agency that, instead of -reporting raw error-corrupted data, applies an optimal filter -to construct least-squares estimates of the true variables. +Suppose that instead of reporting the error-corrupted data $\bar z_t$, +the data collecting agency reports linear least-squares projections of +the true data on a history of the error-corrupted data. -This is a natural model for agencies that seasonally adjust -data (one-sided filtering of current and past observations) or -publish preliminary, revised, and final estimates of the same -variable (successive conditional expectations as more data -accumulate). 
+This model provides a possible way of interpreting two features of +the data-reporting process. -Specifically, the agency uses the Kalman filter from Model 1 to form -$\hat x_t = E[x_t \mid \bar z_t, \bar z_{t-1}, \ldots]$ and reports -filtered estimates +- *seasonal adjustment*: if the components of $v_t$ have +strong seasonals, the optimal filter will assume a shape that can be +interpreted partly in terms of a seasonal adjustment filter, one that +is one-sided in current and past $\bar z_t$'s. -```{math} -\tilde z_t = G \hat x_t, -``` +- *data revisions*: if $z_t$ contains current and lagged +values of some variable of interest, then the model simultaneously +determines "preliminary," "revised," and "final" estimates as +successive conditional expectations based on progressively longer +histories of error-ridden observations. + +To make this operational, we impute to the reporting agency a model of +the joint process generating the true data and the measurement errors. + +We assume that the reporting agency has "rational expectations": it +knows the economic and measurement structure leading to +{eq}`model1_transformed`--{eq}`model1_covs`. -where $G = C$ is a selection matrix. +To prepare its estimates, the reporting agency itself computes the +Kalman filter to obtain the innovations representation {eq}`model1_innov`. -### State-space for filtered data +Rather than reporting the error-corrupted data $\bar z_t$, the agency +reports $\tilde z_t = G \hat x_t$, where $G$ is a "selection matrix," +possibly equal to $C$, for the data reported by the agency. -From the innovations representation {eq}`model1_innov`, the state -$\hat x_t$ evolves as +The data $G \hat x_t = E[G x_t \mid \bar z_t, \bar z_{t-1}, \ldots, \hat x_0]$. + +The state-space representation for the reported data is then ```{math} :label: model2_state -\hat x_{t+1} = A \hat x_t + K_1 u_t. +\begin{aligned} +\hat x_{t+1} &= A \hat x_t + K_1 u_t, \\ +\tilde z_t &= G \hat x_t, +\end{aligned} ``` -The reported filtered data are then +where the first line of {eq}`model2_state` is from the innovations +representation {eq}`model1_innov`. + +Note that $u_t$ is the innovation to $\bar z_{t+1}$ and is *not* the +innovation to $\tilde z_t$. + +To obtain a Wold representation for $\tilde z_t$ and the likelihood +function for a sample of $\tilde z_t$ requires that we obtain an +innovations representation for {eq}`model2_state`. + +### Innovations representation for filtered data + +To add a little generality to {eq}`model2_state` we amend it to the system ```{math} :label: model2_obs -\tilde z_t = C \hat x_t + \eta_t, +\begin{aligned} +\hat x_{t+1} &= A \hat x_t + K_1 u_t, \\ +\tilde z_t &= G \hat x_t + \eta_t, +\end{aligned} ``` where $\eta_t$ is a type 2 white-noise measurement error process ("typos") with presumably very small covariance matrix $R_2$. -The state noise in {eq}`model2_state` is $K_1 u_t$, which has covariance - -```{math} -:label: model2_Q -Q_2 = K_1 V_1 K_1^\top. -``` - The covariance matrix of the joint noise is ```{math} +:label: model2_Q E \begin{bmatrix} K_1 u_t \\ \eta_t \end{bmatrix} \begin{bmatrix} K_1 u_t \\ \eta_t \end{bmatrix}^\top -= \begin{bmatrix} Q_2 & 0 \\ 0 & R_2 \end{bmatrix}. += \begin{bmatrix} Q_2 & 0 \\ 0 & R_2 \end{bmatrix}, ``` -Since $R_2$ is close to or equal to zero (the filtered data have -negligible additional noise), we approximate it with a small -regularization term $R_2 = \epsilon I$ to keep the Kalman filter +where $Q_2 = K_1 V_1 K_1^\top$. 
+ +If $R_2$ is singular, it is necessary to adjust the Kalman filtering +formulas by using transformations that induce a "reduced order observer." + +In practice, we approximate a zero $R_2$ matrix with the matrix +$\epsilon I$ for a small $\epsilon > 0$ to keep the Kalman filter numerically well-conditioned. -A second Kalman filter applied to {eq}`model2_state`--{eq}`model2_obs` -yields a second innovations representation +For system {eq}`model2_obs` and {eq}`model2_Q`, an innovations +representation is ```{math} :label: model2_innov \begin{aligned} \check{x}_{t+1} &= A \check{x}_t + K_2 a_t, \\ -\tilde z_t &= C \check{x}_t + a_t. +\tilde z_t &= G \check{x}_t + a_t, +\end{aligned} +``` + +where + +```{math} +:label: model2_innov_defs +\begin{aligned} +a_t &= \tilde z_t - E[\tilde z_t \mid \tilde z_{t-1}, \tilde z_{t-2}, \ldots], \\ +\check{x}_t &= E[\hat x_t \mid \tilde z_{t-1}, \tilde z_{t-2}, \ldots, \check{x}_0], \\ +S_2 &= E[(\hat x_t - \check{x}_t)(\hat x_t - \check{x}_t)^\top], \\ +[K_2, S_2] &= \text{kelmanfilter}(A, G, Q_2, R_2, 0). \end{aligned} ``` -where $a_t$ is the innovation process for the filtered data with -covariance $V_2 = C S_2 C^\top + R_2$. +Thus $\{a_t\}$ is the innovation process for the reported data +$\tilde z_t$, with innovation covariance + +```{math} +:label: model2_V2 +V_2 = E\, a_t a_t^\top = G\, S_2\, G^\top + R_2. +``` + +### Wold representation -To compute the innovations $\{a_t\}$ from observations on -$\tilde z_t$, use +A Wold moving average representation for $\tilde z_t$ is found from +{eq}`model2_innov` to be + +```{math} +:label: model2_wold +\tilde z_t = \bigl[G(I - AL)^{-1} K_2 L + I\bigr] a_t, +``` + +with coefficients $\psi_0 = I$ and $\psi_j = G A^{j-1} K_2$ for +$j \geq 1$. + +Note that this is simpler than the Model 1 Wold +representation {eq}`model1_wold` because there is no quasi-differencing +to undo. + +### Gaussian likelihood + +When a method analogous to Model 1 is used, a Gaussian log-likelihood +for $\tilde z_t$ can be computed by first computing an $\{a_t\}$ sequence +from observations on $\tilde z_t$ by using ```{math} :label: model2_recursion \begin{aligned} -\check{x}_{t+1} &= (A - K_2 C)\,\check{x}_t + K_2 \tilde z_t, \\ -a_t &= -C\,\check{x}_t + \tilde z_t. +\check{x}_{t+1} &= (A - K_2 G)\,\check{x}_t + K_2 \tilde z_t, \\ +a_t &= -G\,\check{x}_t + \tilde z_t. \end{aligned} ``` -The Gaussian log-likelihood for a sample of $T$ observations +The likelihood function for a sample of $T$ observations $\{\tilde z_t\}$ is then ```{math} @@ -970,28 +1072,27 @@ $\{\tilde z_t\}$ is then - \tfrac{1}{2}\sum_{t=0}^{T-1} a_t' V_2^{-1} a_t. ``` -Computing {eq}`model2_loglik` requires both the first Kalman filter -(to form $\hat x_t$ and $u_t$) and the second Kalman filter -(to form $\check{x}_t$ and $a_t$). +Note that relative to computing the likelihood function +{eq}`model1_loglik` for the error-corrupted data, computing the +likelihood function for the optimally filtered data requires more +calculations. -In effect, the econometrician must retrace the steps that the agency -used to synthesize the filtered data. +Both likelihood functions require that the Kalman filter +{eq}`model1_innov_defs` be computed, while the likelihood function for +the filtered data requires that the Kalman filter +{eq}`model2_innov_defs` also be computed. -### Wold representation for filtered data +In effect, in order to interpret and use the filtered data reported by +the agency, it is necessary to retrace the steps that the agency used +to synthesize those data. 
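
To make these recursions concrete, here is a minimal sketch of a Gaussian log-likelihood evaluator for a generic innovations representation $x_{t+1} = A x_t + K u_t$, $z_t = C_{obs} x_t + u_t$; the function name and interface are ours rather than anything in {cite:t}`Sargent1989`. Fed the quasi-differenced raw reports together with $(\bar C, K_1, V_1)$ it implements {eq}`model1_recursion` and {eq}`model1_loglik`; fed the agency's filtered reports together with $(G, K_2, V_2)$, which are computed below, it implements {eq}`model2_recursion` and {eq}`model2_loglik`.

```{code-cell} ipython3
def innovations_loglik(z, A, C_obs, K, V, x0=None):
    """
    Gaussian log-likelihood of a T x n data array z under the
    innovations representation

        x_{t+1} = A x_t + K u_t,    z_t = C_obs x_t + u_t.

    The constant term follows the normalization used in the
    likelihood formulas above.
    """
    T = z.shape[0]
    x_hat = np.zeros(A.shape[0]) if x0 is None else x0
    V_inv = np.linalg.inv(V)
    _, logdet = np.linalg.slogdet(V)
    quad = 0.0
    for t in range(T):
        u = z[t] - C_obs @ x_hat      # innovation at t
        quad += u @ V_inv @ u
        x_hat = A @ x_hat + K @ u     # advance the state estimate
    return -T * np.log(2 * np.pi) - 0.5 * T * logdet - 0.5 * quad
```

Calling it twice, once on the quasi-differenced data and once on the filtered data, mirrors the point just made: the second evaluation presupposes the first filter, because $K_2$ and $V_2$ are themselves built from $K_1$ and $V_1$.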
-The Wold moving-average representation for $\tilde z_t$ is +The Kalman filter {eq}`model1_innov_defs` is supposed to be formed by +the agency. -```{math} -:label: model2_wold -\tilde z_t = \bigl[C(I - AL)^{-1} K_2 L + I\bigr] a_t, -``` - -with coefficients $\psi_0 = I$ and $\psi_j = C A^{j-1} K_2$ for -$j \geq 1$. +The agency need not use Kalman filter {eq}`model2_innov_defs` because +it does not need the Wold representation for the filtered data. -Note that this is simpler than the Model 1 Wold -representation {eq}`model1_wold` because there is no quasi-differencing -to undo +In our parameterization $G = C$. ```{code-cell} ipython3 Q2 = K1 @ V1 @ K1.T @@ -1010,7 +1111,8 @@ def filtered_wold_coeffs(A, C, K, n_terms=25): psi2 = filtered_wold_coeffs(A, C, K2, n_terms=40) -resp2 = np.array([psi2[j] @ linalg.cholesky(V2, lower=True) for j in range(14)]) +resp2 = np.array( + [psi2[j] @ linalg.cholesky(V2, lower=True) for j in range(14)]) decomp2 = fev_contributions(psi2, V2, n_horizons=20) ``` @@ -1040,10 +1142,6 @@ The second and third innovations contribute negligibly. This confirms that filtering strips away the measurement noise that created the appearance of multiple independent sources of variation in Model 1. -We invite readers to compare this table to the one for the true impulse responses in the {ref}`true-impulse-responses` section above. - -The numbers are essentially the same. - The covariance matrix and eigenvalues of the Model 2 innovations are ```{code-cell} ipython3 @@ -1081,8 +1179,12 @@ display(Latex('$' + r' \quad '.join(parts) + '$')) The income innovation in Model 2 produces responses that closely approximate the true impulse response function from the structural -shock $\theta_t$ (compare with the table in the -{ref}`true-impulse-responses` section above). +shock $\theta_t$. + +Readers can compare the left table with the table in the +{ref}`true-impulse-responses` section above. + +The numbers are essentially the same. The consumption and investment innovations produce responses that are orders of magnitude smaller, confirming that the filtered @@ -1093,13 +1195,25 @@ Unlike Model 1, the filtered data from Model 2 accelerator literature has documented empirically. +Hence, at the population level, the two measurement models imply different +empirical stories even though they share the same structural economy. + +- In Model 1 (raw data), measurement noise creates multiple innovations + and an apparent Granger-causality pattern. +- In Model 2 (filtered data), innovations collapse back to essentially + one dominant shock, mirroring the true one-index economy. + +Let's verify these implications in a finite sample simulation. + ## Simulation The tables above characterize population moments of the two models. -We now simulate 80 periods of true, measured, and filtered data +Let's simulate 80 periods of true, measured, and filtered data to compare population implications with finite-sample behavior. 
+First, we define a function to simulate the true economy, generate measured data with AR(1) measurement errors, and apply the Model 1 Kalman filter to produce filtered estimates + ```{code-cell} ipython3 def simulate_series(seed=7909, T=80, k0=10.0): """ @@ -1161,6 +1275,8 @@ def simulate_series(seed=7909, T=80, k0=10.0): sim = simulate_series(seed=7909, T=80, k0=10.0) ``` +We use the following helper function to plot the true series against either the measured or filtered series + ```{code-cell} ipython3 def plot_true_vs_other(t, true_series, other_series, other_label, ylabel=""): @@ -1178,6 +1294,8 @@ def plot_true_vs_other(t, true_series, other_series, t = np.arange(1, 81) ``` +Let's first compare the true series with the measured series to see how measurement errors distort the data + ```{code-cell} ipython3 --- mystnb: @@ -1221,8 +1339,9 @@ Investment is distorted the most because its measurement error has the largest innovation variance ($\sigma_\eta = 0.65$), while income is distorted the least ($\sigma_\eta = 0.05$). -The Kalman-filtered estimates from Model 1 remove much of the -measurement noise and track the truth closely. + +For the filtered series, we expect the Kalman filter to recover the true series more closely by stripping away measurement noise + ```{code-cell} ipython3 --- @@ -1276,6 +1395,9 @@ plot_true_vs_other(t, sim["k_true"], sim["k_filt"], "filtered", ylabel="capital stock") ``` +Indeed, Kalman-filtered estimates from Model 1 remove much of the +measurement noise and track the truth closely. + In the true model the national income identity $c_t + \Delta k_t = y_{n,t}$ holds exactly. @@ -1284,6 +1406,9 @@ in the measured data. The Kalman filter approximately restores it. +The following figure confirms this by showing the residual $c_t + \Delta k_t - y_{n,t}$ for +both measured and filtered data + ```{code-cell} ipython3 --- mystnb: @@ -1311,100 +1436,7 @@ plt.tight_layout() plt.show() ``` -For each variable $w \in \{c, \Delta k, y_n\}$ we compute the -covariance and correlation matrices among its true, measured, and -filtered versions. - -Each matrix has the structure - -```{math} -\begin{bmatrix} -\text{var}(w^{\text{true}}) & \text{cov}(w^{\text{true}}, w^{\text{meas}}) & \text{cov}(w^{\text{true}}, w^{\text{filt}}) \\ -\cdot & \text{var}(w^{\text{meas}}) & \text{cov}(w^{\text{meas}}, w^{\text{filt}}) \\ -\cdot & \cdot & \text{var}(w^{\text{filt}}) -\end{bmatrix}. -``` - -The key entries are the off-diagonal terms linking true to measured -(distortion from noise) and true to filtered (recovery by the Kalman -filter). 
- -```{code-cell} ipython3 -def cov_corr_three(a, b, c): - X = np.vstack([a, b, c]) - return np.cov(X), np.corrcoef(X) - -def matrix_df(mat, labels): - return pd.DataFrame(np.round(mat, 4), index=labels, columns=labels) - -cov_c, corr_c = cov_corr_three( - sim["c_true"], sim["c_meas"], sim["c_filt"]) -cov_i, corr_i = cov_corr_three( - sim["dk_true"], sim["dk_meas"], sim["dk_filt"]) -cov_y, corr_y = cov_corr_three( - sim["y_true"], sim["y_meas"], sim["y_filt"]) -cov_k = np.cov(np.vstack([sim["k_true"], sim["k_filt"]])) -corr_k = np.corrcoef(np.vstack([sim["k_true"], sim["k_filt"]])) - -tmf_labels = ['true', 'measured', 'filtered'] -tf_labels = ['true', 'filtered'] -``` - -For consumption, measurement error inflates the variance of measured -consumption relative to the truth, as the diagonal of the covariance -matrix shows - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(cov_c, tmf_labels)))) -``` - -The correlation matrix confirms that the filtered series recovers the -true series almost perfectly - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(corr_c, tmf_labels)))) -``` - -For investment, measurement error creates the most variance inflation here - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(cov_i, tmf_labels)))) -``` - -Despite this, the true-filtered correlation remains high, -demonstrating the filter's effectiveness even with severe noise - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(corr_i, tmf_labels)))) -``` - -Income has the smallest measurement error ($\sigma_\eta = 0.05$), -so measured and true covariances are nearly identical - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(cov_y, tmf_labels)))) -``` - -The correlation matrix shows that both measured and filtered series -track the truth very closely - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(corr_y, tmf_labels)))) -``` - -The capital stock is never directly observed, yet -the covariance matrix shows that the filter recovers it with very -high accuracy - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(cov_k, tf_labels)))) -``` - -The near-unity correlation confirms this - -```{code-cell} ipython3 -display(Latex(df_to_latex_matrix(matrix_df(corr_k, tf_labels)))) -``` +As we have predicted, the residual for the measured data is large and volatile, while the residual for the filtered data is numerically 0. ## Summary From 150f848ab9b3632a2a11d8ab6238c9f87b3665a9 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 10 Feb 2026 11:25:24 +1100 Subject: [PATCH 15/37] updates --- lectures/measurement_models.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index ebddc3fe2..39103ddd0 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -842,7 +842,7 @@ Granger-cause consumption and investment, but not vice versa. This matches the paper's message that, in a one-common-index model, the relatively best measured series has the strongest predictive content. 
-Let's look at the the covariance matrix of the innovations +Let's look at the covariance matrix of the innovations ```{code-cell} ipython3 print('Covariance matrix of innovations:') @@ -1020,7 +1020,7 @@ where a_t &= \tilde z_t - E[\tilde z_t \mid \tilde z_{t-1}, \tilde z_{t-2}, \ldots], \\ \check{x}_t &= E[\hat x_t \mid \tilde z_{t-1}, \tilde z_{t-2}, \ldots, \check{x}_0], \\ S_2 &= E[(\hat x_t - \check{x}_t)(\hat x_t - \check{x}_t)^\top], \\ -[K_2, S_2] &= \text{kelmanfilter}(A, G, Q_2, R_2, 0). +[K_2, S_2] &= \text{kalmanfilter}(A, G, Q_2, R_2, 0). \end{aligned} ``` From d6d2fe1fafbeda6461a0dd5eee76649ea6c09455 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Tue, 10 Feb 2026 11:29:37 +1100 Subject: [PATCH 16/37] updates --- lectures/measurement_models.md | 31 +++++++++++-------------------- 1 file changed, 11 insertions(+), 20 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 39103ddd0..0ae50c7c0 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -302,7 +302,7 @@ of $j$-step-ahead prediction error variances, depends on the relative variances of the measurement errors. In this case, each observed series mixes the common signal $\theta_t$ -with idiosyncratic measurement noise. +with idiosyncratic measurement noise. A series with lower measurement error variance tracks $\theta_t$ more closely, so its innovations @@ -518,7 +518,7 @@ def steady_state_kalman(A, C_obs, Q, R, W=None, tol=1e-13, max_iter=200_000): With structural matrices and tools we need in place, we now follow {cite:t}`Sargent1989`'s two reporting schemes in sequence. -## A Classical Model of Measurements Initially Collected by an Agency +## A classical model of measurements initially collected by an agency A data collecting agency observes a noise-corrupted version of $z_t$, namely @@ -595,8 +595,8 @@ where $(\varepsilon_t, \bar\nu_t)$ is a white noise process with ```{math} :label: model1_covs E \begin{bmatrix} \varepsilon_t \end{bmatrix} -\begin{bmatrix} \varepsilon_t' & \bar\nu_t' \end{bmatrix} -= \begin{bmatrix} Q & W_1 \\ W_1' & R_1 \end{bmatrix}, +\begin{bmatrix} \varepsilon_t^\top & \bar\nu_t^\top \end{bmatrix} += \begin{bmatrix} Q & W_1 \\ W_1^\top & R_1 \end{bmatrix}, \qquad R_1 = C Q C^\top + R, \quad W_1 = Q C^\top. ``` @@ -757,7 +757,7 @@ $\hat x_0$, can be represented as ```{math} :label: model1_loglik \mathcal{L}^* = -T\ln 2\pi - \tfrac{1}{2}T\ln|V_1| - - \tfrac{1}{2}\sum_{t=0}^{T-1} u_t' V_1^{-1} u_t, + - \tfrac{1}{2}\sum_{t=0}^{T-1} u_t^\top V_1^{-1} u_t, ``` where $u_t$ is a function of $\{\bar z_t\}$ defined by @@ -912,7 +912,7 @@ decay according to the AR(1) structure of their respective measurement errors ($\rho_c = 0.7$, $\rho_{\Delta k} = 0.3$), with little spillover to other variables. -## A Model of Optimal Estimates Reported by an Agency +## A model of optimal estimates reported by an agency Suppose that instead of reporting the error-corrupted data $\bar z_t$, the data collecting agency reports linear least-squares projections of @@ -1069,7 +1069,7 @@ $\{\tilde z_t\}$ is then ```{math} :label: model2_loglik \mathcal{L}^{**} = -T\ln 2\pi - \tfrac{1}{2}T\ln|V_2| - - \tfrac{1}{2}\sum_{t=0}^{T-1} a_t' V_2^{-1} a_t. + - \tfrac{1}{2}\sum_{t=0}^{T-1} a_t^\top V_2^{-1} a_t. 
``` Note that relative to computing the likelihood function @@ -1159,8 +1159,6 @@ As {cite:t}`Sargent1989` emphasizes, the two models of measurement produce quite different inferences about the economy's dynamics despite sharing identical underlying parameters. - - ### Wold impulse responses We again use orthogonalized Wold responses with a Cholesky @@ -1194,7 +1192,6 @@ Unlike Model 1, the filtered data from Model 2 *cannot* reproduce the apparent Granger causality pattern that the accelerator literature has documented empirically. - Hence, at the population level, the two measurement models imply different empirical stories even though they share the same structural economy. @@ -1339,10 +1336,8 @@ Investment is distorted the most because its measurement error has the largest innovation variance ($\sigma_\eta = 0.65$), while income is distorted the least ($\sigma_\eta = 0.05$). - For the filtered series, we expect the Kalman filter to recover the true series more closely by stripping away measurement noise - ```{code-cell} ipython3 --- mystnb: @@ -1413,24 +1408,20 @@ both measured and filtered data --- mystnb: figure: - caption: "National income identity residual: measured (left) vs. filtered (right)" + caption: National income identity residual name: fig-identity-residual - image: - alt: National income identity residual for measured and filtered data side by side --- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4)) ax1.plot(t, sim["c_meas"] + sim["dk_meas"] - sim["y_meas"], lw=2) ax1.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) ax1.set_xlabel("time", fontsize=12) -ax1.set_ylabel("residual", fontsize=12) -ax1.set_title(r'Measured: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) +ax1.set_ylabel("measured residual", fontsize=12) ax2.plot(t, sim["c_filt"] + sim["dk_filt"] - sim["y_filt"], lw=2) ax2.axhline(0, color='black', lw=0.8, ls='--', alpha=0.5) ax2.set_xlabel("time", fontsize=12) -ax2.set_ylabel("residual", fontsize=12) -ax2.set_title(r'Filtered: $c_t + \Delta k_t - y_{n,t}$', fontsize=13) +ax2.set_ylabel("filtered residual", fontsize=12) plt.tight_layout() plt.show() @@ -1440,7 +1431,7 @@ As we have predicted, the residual for the measured data is large and volatile, ## Summary -{cite}`Sargent1989` shows how measurement error alters an +{cite:t}`Sargent1989` shows how measurement error alters an econometrician's view of a permanent income economy driven by the investment accelerator. From 65b8e4ca76809eaa034a4cfb6c153fc2bb95edcc Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Tue, 10 Feb 2026 12:06:59 +0800 Subject: [PATCH 17/37] Tom's Feb 10 edits of two measurement models lecture --- lectures/measurement_models.md | 64 ++++++++++++++++++---------------- 1 file changed, 33 insertions(+), 31 deletions(-) diff --git a/lectures/measurement_models.md b/lectures/measurement_models.md index 0ae50c7c0..d657948e3 100644 --- a/lectures/measurement_models.md +++ b/lectures/measurement_models.md @@ -41,6 +41,11 @@ If accurate observations on these time series are available, one can use that mapping to implement parameter estimation methods based either on the likelihood function or on the method of moments. +```{note} This is why econometrics estimation is often called an ''inverse'' problem, while +simulating a model for given parameter values is called a ''direct problem''. 
The direct problem +refers to the mapping we have just described, while the inverse problem involves somehow applying an ''inverse'' of that mapping to a data set that is treated as if it were one draw from the joint probability distribution described by the mapping. +``` + However, if only error-ridden data exist for the variables of interest, then more steps are needed to extract parameter estimates. @@ -55,22 +60,21 @@ about the economic structure. {cite:t}`Sargent1989` describes two alternative models of data generation in a {doc}`permanent income ` economy in which the -investment accelerator, the mechanism studied in {doc}`samuelson` and -{doc}`chow_business_cycles`, drives business cycle fluctuations. +investment accelerator, the mechanism studied in these two quantecon lectures -- {doc}`samuelson` and +{doc}`chow_business_cycles` -- shapes business cycle fluctuations. - In Model 1, the data collecting agency simply reports the error-ridden data that it collects. -- In Model 2, although it collects error-ridden data that satisfy - a classical errors-in-variables model, the data collecting agency - filters the data and reports the best estimates that it possibly can. +- In Model 2, the data collection agents first collects error-ridden data that satisfy + a classical errors-in-variables model, then filters the data, and reports the filtered objects. Although the two models have the same "deep parameters," they produce quite different sets of restrictions on the data. -In this lecture we follow {cite:t}`Sargent1989` and study how -alternative measurement schemes change empirical implications. +In this lecture we follow {cite:t}`Sargent1989` and study how these +alternative measurement schemes affect empirical implications. -We start with imports and helper functions used throughout +We start with imports and helper functions to be used throughout this lecture ```{code-cell} ipython3 import numpy as np @@ -127,24 +131,24 @@ def df_to_latex_array(df): ## The economic model -The true economy is a linear-quadratic version of a stochastic -optimal growth model (see also {doc}`perm_income`). +The data are generated by a linear-quadratic version of a stochastic +optimal growth model that is an instance of models described in this quantecon lecture: {doc}`perm_income`. -A social planner maximizes +A social planner chooses a stochastic process for $\{c_t, k_{t+1}\}_{t=0}^\infty$ that maximizes ```{math} :label: planner_obj E \sum_{t=0}^{\infty} \beta^t \left( u_0 + u_1 c_t - \frac{u_2}{2} c_t^2 \right) ``` -subject to the technology +subject to the restrictions imposed by the technology ```{math} :label: tech_constraint -c_t + k_{t+1} = f k_t + \theta_t, \qquad \beta f^2 > 1, +c_t + k_{t+1} = f k_t + \theta_t, \qquad \beta f^2 > 1. ``` -where $c_t$ is consumption, $k_t$ is the capital stock, +Here $c_t$ is consumption, $k_t$ is the capital stock, $f$ is the gross rate of return on capital, and $\theta_t$ is an endowment or technology shock following @@ -152,14 +156,12 @@ and $\theta_t$ is an endowment or technology shock following :label: shock_process a(L)\,\theta_t = \varepsilon_t, ``` - -with $a(L) = 1 - a_1 L - a_2 L^2 - \cdots - a_r L^r$ having all roots +where $L$ is the backward shift (or 'lag') operator and $a(z) = 1 - a_1 z - a_2 z^2 - \cdots - a_r z^r$ having all its zeroes outside the unit circle. 
### Optimal decision rule -The solution can be represented by the optimal decision rule -for $c_t$: +The optimal decision rule for $c_t$ is ```{math} :label: opt_decision @@ -254,7 +256,7 @@ via a geometric distributed lag or "adaptive expectations" scheme. ### The accelerator puzzle When all variables are measured accurately and are driven by -the single shock $\theta_t$, the spectral density of +the single shock $\theta_t$, the spectral density matrix of $(c_t,\, k_{t+1}-k_t,\, y_{nt})$ has rank one at all frequencies. Each variable is an invertible one-sided distributed lag of the @@ -321,11 +323,11 @@ This asymmetry drives the numerical results we observe soon. ### State-space formulation Let's map the economic model and the measurement process into -a recursive state-space framework. +a linear state-space framework. Set $f = 1.05$ and $\theta_t \sim \mathcal{N}(0, 1)$. -Define the state and observable vectors +Define the state and observation vectors ```{math} x_t = \begin{bmatrix} k_t \\ \theta_t \end{bmatrix}, @@ -333,7 +335,7 @@ x_t = \begin{bmatrix} k_t \\ \theta_t \end{bmatrix}, z_t = \begin{bmatrix} y_{nt} \\ c_t \\ \Delta k_t \end{bmatrix}, ``` -so that the true economy follows the state-space system +so that the error-free data are described by the state-space system ```{math} :label: true_ss @@ -394,10 +396,10 @@ Q = np.array([ ### True impulse responses Before introducing measurement error, we compute the impulse response of -the true system to a unit shock $\theta_0 = 1$. +the error-free variables to a unit shock $\theta_0 = 1$. This benchmark clarifies what changes when we later switch from -true variables to reported variables. +error-free variables to variables reported by the statistical agency. The response shows the investment accelerator clearly: the full impact on net income $y_n$ occurs at lag 0, while consumption adjusts by only @@ -672,9 +674,9 @@ K1, S1, V1 = steady_state_kalman(A, C_bar, Q, R1, W1) ``` -### Computing the Wold coefficients +### Computing coefficients in a Wold moving average representation -To compute the Wold coefficients in {eq}`model1_wold` numerically, +To compute the moving average coefficients in {eq}`model1_wold` numerically, define the augmented state ```{math} @@ -707,7 +709,7 @@ I H_1 = [\bar C \;\; D]. ``` -The Wold coefficients are then $\psi_0 = I$ and +The moving average coefficients are then $\psi_0 = I$ and $\psi_j = H_1 F_1^{j-1} G_1$ for $j \geq 1$. ```{code-cell} ipython3 @@ -864,14 +866,14 @@ the presence of a dominant common shock $\theta_t$ ### Wold impulse responses -The Wold impulse responses are reported using orthogonalized +Impulse responses in the Wold representation are reported using orthogonalized innovations (Cholesky factorization of $V_1$ with ordering $y_n$, $c$, $\Delta k$). Under this method, lag-0 responses reflect both contemporaneous covariance and the Cholesky ordering. -We first define a helper function to format the Wold responses as a LaTeX array +We first define a helper function to format the response coefficients as a LaTeX array ```{code-cell} ipython3 lags = np.arange(14) @@ -884,7 +886,7 @@ def wold_response_table(resp, shock_idx, lags): ) ``` -Now we report the Wold responses to each orthogonalized innovation in a single table with three panels +Now we report the impulse responses to each orthogonalized innovation in a single table with three panels ```{code-cell} ipython3 wold_titles = [r'\text{A. 
Response to } y_n \text{ innovation}', @@ -1161,7 +1163,7 @@ sharing identical underlying parameters. ### Wold impulse responses -We again use orthogonalized Wold responses with a Cholesky +We again use orthogonalized Wold representation impulse responses with a Cholesky decomposition of $V_2$ ordered as $y_n$, $c$, $\Delta k$. ```{code-cell} ipython3 From 0261afa57e4c398194f0c6b2d53bf9c4aeba9ef5 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Feb 2026 00:36:23 +1100 Subject: [PATCH 18/37] updates --- lectures/_toc.yml | 1 + lectures/doubts_or_variability.md | 1242 +++++++++++++++++++++++++++++ 2 files changed, 1243 insertions(+) create mode 100644 lectures/doubts_or_variability.md diff --git a/lectures/_toc.yml b/lectures/_toc.yml index 098deaaa7..ad308c958 100644 --- a/lectures/_toc.yml +++ b/lectures/_toc.yml @@ -123,6 +123,7 @@ parts: numbered: true chapters: - file: markov_asset + - file: doubts_or_variability - file: ge_arrow - file: harrison_kreps - file: morris_learn diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md new file mode 100644 index 000000000..b6d9a1f65 --- /dev/null +++ b/lectures/doubts_or_variability.md @@ -0,0 +1,1242 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.17.1 +kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +(doubts_or_variability)= +```{raw} jupyter + +``` + +# Doubts or Variability? + +```{contents} Contents +:depth: 2 +``` + +## Overview + +Robert Lucas Jr. opened a 2003 essay with a challenge: + +> *No one has found risk aversion parameters of 50 or 100 in the diversification of +> individual portfolios, in the level of insurance deductibles, in the wage premiums +> associated with occupations with high earnings risk, or in the revenues raised by +> state-operated lotteries.* + +Tallarini {cite}`Tallarini_2000` had shown that a recursive preference specification could match the equity premium and the risk-free rate puzzle simultaneously. +But matching required setting the risk-aversion coefficient $\gamma$ to around 50 for a random-walk consumption model and around 75 for a trend-stationary model --- exactly the range that provoked Lucas's skepticism. + +{cite}`BHS_2009` ask whether those large $\gamma$ values really measure aversion to atemporal risk. +Their answer is no. +The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a +second recursion in which the agent has unit risk aversion but fears that the probability model governing consumption growth may be wrong. +Under this reading, the parameter that looked like extreme risk aversion instead measures +the agent's concern about **model misspecification**. + +The question then becomes: how much misspecification is plausible? +Rather than calibrating $\gamma$ through Pratt-style thought experiments about known gambles, +{cite}`BHS_2009` calibrate through a **detection-error probability** --- the chance of confusing the agent's baseline (approximating) model with the pessimistic (worst-case) model after seeing a finite sample. +When detection-error probabilities are moderate, the implied $\gamma$ values are large enough to reach the Hansen--Jagannathan volatility bound. + +This reinterpretation changes the welfare question that asset prices answer. +Large measured risk premia no longer imply large gains from smoothing known aggregate risk. 
+Instead, they imply large gains from resolving model uncertainty --- a very different policy object. + +```{code-cell} ipython3 +import numpy as np +import pandas as pd +import matplotlib.pyplot as plt +from scipy.stats import norm + +np.set_printoptions(precision=4, suppress=True) + + +def set_seed(seed=1234): + np.random.seed(seed) + + +set_seed() +``` + +## The economic idea + +A representative consumer has a baseline probabilistic description of consumption growth. +Call it the **approximating model**. +The consumer does not fully trust this model. +To formalize that distrust, she surrounds the approximating model with a set of nearby alternatives that are difficult to distinguish statistically in finite samples. + +Among those alternatives, a minimizing player inside the consumer's head selects a **worst-case model**. +The resulting max-min problem generates a likelihood-ratio distortion $\hat g_{t+1}$ that tilts one-step-ahead probabilities toward adverse states. +That distortion enters the stochastic discount factor alongside the usual intertemporal marginal rate of substitution, and its standard deviation is the **market price of model uncertainty** (MPU). + +The discipline on how much distortion is allowed comes not from introspection about +willingness to pay for small known gambles, but from a statistical detection problem: +given $T$ observations, how likely is a Bayesian to confuse the approximating model with the worst-case model? +The answer is a **detection-error probability** $p(\theta^{-1})$. +High $p$ means the two models are nearly indistinguishable and the consumer's fear of misspecification is hard to dismiss. + +## Four agent types and one key equivalence + +The analysis compares four preference specifications that are useful for different purposes. + +* **Type I** (Kreps--Porteus--Epstein--Zin--Tallarini): risk-sensitive recursive utility with risk-aversion parameter $\gamma$ and IES fixed at 1. +* **Type II** (multiplier preferences): unit risk aversion but a penalty parameter $\theta$ on the relative entropy of probability distortions. +* **Type III** (constraint preferences): unit risk aversion with a hard bound $\eta$ on discounted relative entropy. +* **Type IV** (pessimistic ex post Bayesian): log utility under a single pessimistic probability model $\hat\Pi_\infty$. + +The pivotal result is that **types I and II are observationally equivalent** over consumption plans in this environment. +The mapping is $\theta = [(1-\beta)(\gamma - 1)]^{-1}$. +So when Tallarini sets $\gamma = 50$ to reach the Hansen--Jagannathan bound, one can equally say +the consumer has unit risk aversion and a model-uncertainty penalty $\theta$ that corresponds to a +moderate detection-error probability. +The quantitative fit is unchanged; only the economic interpretation shifts. + +## Setup + +The calibration uses quarterly U.S. data from 1948:2--2006:4 for consumption **growth rates** (a sample length of $T = 235$ quarters). +When we plot **levels** of log consumption (as in Fig. 6), we align the time index to 1948:1--2006:4, which yields $T+1 = 236$ quarterly observations. +Parameter estimates for two consumption-growth specifications (random walk and trend stationary) +come from Table 2 of {cite}`BHS_2009`, and asset-return moments come from their Table 1. 
+Following footnote 8 in {cite}`BHS_2009`, consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. + +### Data + +Most numerical inputs in this lecture are taken directly from {cite}`BHS_2009`. +Table 2 provides the maximum-likelihood estimates of $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the two consumption-growth specifications, and Table 1 provides the asset-return moments used in the Hansen--Jagannathan bound calculation. + +Following footnote 8 of {cite}`BHS_2009`, consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. +We construct this measure from three [FRED](https://fred.stlouisfed.org) series: + +| FRED series | Description | +| --- | --- | +| `PCNDGC96` | Real PCE: nondurable goods (billions of chained 2017 \$, SAAR) | +| `PCESVC96` | Real PCE: services (billions of chained 2017 \$, SAAR) | +| `CNP16OV` | Civilian noninstitutional population, 16+ (thousands, monthly) | + +The BEA deflates each PCE component by its own chain-type price index internally. +Summing the chained-dollar components introduces a small Fisher-index non-additivity error, but this is negligible for our purposes and avoids the larger error of deflating the ND+SV nominal aggregate by the *overall* PCE deflator (which includes durables with secularly declining prices). + +The processing pipeline is: + +1. Add real nondurables and services: $C_t^{real} = C_t^{nd} + C_t^{sv}$. +2. Convert to per-capita (millions of dollars per person): divide by the quarterly average of the monthly population series and by $10^6$. + This normalization matches the units in {cite}`BHS_2009`, where the trend-stationary intercept is $\zeta = -4.48$. +3. Compute log consumption: $c_t = \log C_t^{real,pc}$. + +```{code-cell} ipython3 +start_date = "1947-01-01" +end_date = "2007-01-01" + + +def _fetch_fred_series(series_id, start_date, end_date): + # Keyless pull from FRED CSV endpoint. + url = f"https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}" + df = pd.read_csv(url) + if df.empty: + raise ValueError(f"FRED returned an empty table for '{series_id}'") + + # Be robust to header variations (e.g., DATE vs date, BOM, whitespace). + df.columns = [str(c).strip().lstrip("\ufeff") for c in df.columns] + date_col = df.columns[0] + value_col = series_id if series_id in df.columns else (df.columns[1] if len(df.columns) > 1 else None) + if value_col is None: + raise ValueError( + f"Unexpected FRED CSV format for '{series_id}'. 
Columns: {list(df.columns)}" + ) + + dates = pd.to_datetime(df[date_col], errors="coerce") + values = pd.to_numeric(df[value_col], errors="coerce") + out = pd.Series(values.to_numpy(), index=dates, name=series_id).dropna().sort_index() + out = out.loc[start_date:end_date].dropna() + if out.empty: + raise ValueError(f"FRED series '{series_id}' returned no data in sample window") + return out + + +# Fetch real PCE components and population from FRED (no API key required) +real_nd = _fetch_fred_series("PCNDGC96", start_date, end_date) +real_sv = _fetch_fred_series("PCESVC96", start_date, end_date) +pop_m = _fetch_fred_series("CNP16OV", start_date, end_date) + +# Step 1: aggregate real nondurables + services +real_total = real_nd + real_sv + +# Step 2: align to quarterly frequency first, then convert to per-capita +# real_total is in billions ($1e9), pop is in thousands ($1e3) +# per-capita in millions: real_total * 1e9 / (pop * 1e3) / 1e6 = real_total / pop +real_total_q = real_total.resample("QS").mean() +pop_q = pop_m.resample("QS").mean() +real_pc = (real_total_q / pop_q).dropna() + +# Restrict to sample period 1948Q1–2006Q4 +real_pc = real_pc.loc["1948-01-01":"2006-12-31"].dropna() + +# FRED-only fallback: use BEA per-capita quarterly components directly. +# This avoids index-alignment failures in some pandas/FRED combinations. +if real_pc.empty: + nd_pc = _fetch_fred_series("A796RX0Q048SBEA", start_date, end_date) + sv_pc = _fetch_fred_series("A797RX0Q048SBEA", start_date, end_date) + real_pc = ((nd_pc + sv_pc) / 1e6).loc["1948-01-01":"2006-12-31"].dropna() + +if real_pc.empty: + raise RuntimeError("FRED returned no usable observations after alignment/filtering") + +# Step 3: log consumption +log_c_data = np.log(real_pc.to_numpy(dtype=float).reshape(-1)) +years_data = (real_pc.index.year + (real_pc.index.month - 1) / 12.0).to_numpy(dtype=float) + +print(f"Fetched {len(log_c_data)} quarterly observations from FRED") +print(f"Sample: {years_data[0]:.1f} – {years_data[-1] + 0.25:.1f}") +print(f"Observations: {len(log_c_data)}") +print(f"c_0 = {log_c_data[0]:.3f} (paper Fig 6: ≈ −4.6)") +``` + +### Consumption plans and the state-space representation + +{cite}`BHS_2009` cast the analysis in terms of a general class of consumption plans. +Let $x_t$ be an $n \times 1$ state vector and $\varepsilon_{t+1}$ an $m \times 1$ shock. +A consumption plan belongs to the set $\mathcal{C}(A, B, H; x_0)$ if it admits the recursive representation + +```{math} +:label: bhs_state_space +x_{t+1} = A x_t + B \varepsilon_{t+1}, +\qquad +c_t = H x_t, +``` + +where the eigenvalues of $A$ are bounded in modulus by $1/\sqrt{\beta}$. +The time-$t$ element of a consumption plan can therefore be written as + +```{math} +c_t = H\!\left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\right) + HA^t x_0. +``` + +The equivalence theorems and Bellman equations in the paper are stated for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. +The random-walk and trend-stationary models below are two special cases. + +### Consumption dynamics + +Let $c_t = \log C_t$ be log consumption. + +The random-walk specification is + +```{math} +c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim \mathcal{N}(0, 1). 
+``` + +The trend-stationary specification can be written as a deterministic trend plus a stationary AR(1) component {cite}`BHS_2009`: + +```{math} +c_t = \zeta + \mu t + z_t, +\qquad +z_{t+1} = \rho z_t + \sigma_\varepsilon \varepsilon_{t+1}, +\qquad +\varepsilon_{t+1} \sim \mathcal{N}(0, 1). +``` + +Equivalently, defining the detrended series $\tilde c_t := c_t - \mu t$, + +```{math} +\tilde c_{t+1} - \zeta = \rho(\tilde c_t - \zeta) + \sigma_\varepsilon \varepsilon_{t+1}. +``` + +Table 2 in {cite}`BHS_2009` reports $(\mu, \sigma_\varepsilon)$ for the random walk and $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the trend-stationary case. + +```{code-cell} ipython3 +# Preferences and sample length +β = 0.995 +T = 235 # quarterly sample length used in the paper + +# Table 2 parameters +rw = dict(μ=0.00495, σ_ε=0.0050) +ts = dict(μ=0.00418, σ_ε=0.0050, ρ=0.980, ζ=-4.48) + +# Table 1 moments, converted from percent to decimals +r_e_mean, r_e_std = 0.0227, 0.0768 +r_f_mean, r_f_std = 0.0032, 0.0061 +r_excess_std = 0.0767 + +R_mean = np.array([1.0 + r_e_mean, 1.0 + r_f_mean]) # gross returns +cov_erf = (r_e_std**2 + r_f_std**2 - r_excess_std**2) / 2.0 +Σ_R = np.array( + [ + [r_e_std**2, cov_erf], + [cov_erf, r_f_std**2], + ] +) +Σ_R_inv = np.linalg.inv(Σ_R) + +print("Table 2 parameters") +print(f"random walk: μ={rw['μ']:.5f}, σ_ε={rw['σ_ε']:.5f}") +print( + f"trend stationary: μ={ts['μ']:.5f}, σ_ε={ts['σ_ε']:.5f}, " + f"ρ={ts['ρ']:.3f}, ζ={ts['ζ']:.2f}" +) +print() +print("Table 1 moments") +print(f"E[r_e]={r_e_mean:.4f}, std[r_e]={r_e_std:.4f}") +print(f"E[r_f]={r_f_mean:.4f}, std[r_f]={r_f_std:.4f}") +print(f"std[r_e-r_f]={r_excess_std:.4f}") +``` + +We can verify Table 2 by computing sample moments of log consumption growth from our FRED data: + +```{code-cell} ipython3 +# Growth rates: 1948Q2 to 2006Q4 (T = 235 quarters) +Δc = np.diff(log_c_data) + +μ_hat = Δc.mean() +σ_hat = Δc.std(ddof=1) + +print("Sample estimates from FRED data vs Table 2:") +print(f" μ̂ = {μ_hat:.5f} (Table 2 RW: {rw['μ']:.5f})") +print(f" σ̂_ε = {σ_hat:.4f} (Table 2: {rw['σ_ε']:.4f})") +print(f" T = {len(Δc)} quarters") +``` + +## Preferences, distortions, and detection + +The type I recursion is + +```{math} +:label: bhs_type1_recursion +\log V_t += +(1-\beta)c_t ++ +\frac{\beta}{1-\gamma} +\log E_t\left[(V_{t+1})^{1-\gamma}\right]. +``` + +### The transformed continuation value + +A key intermediate step in {cite}`BHS_2009` is to define the transformed continuation value + +```{math} +:label: bhs_Ut_def +U_t \equiv \frac{\log V_t}{1-\beta} +``` + +and the robustness parameter + +```{math} +:label: bhs_theta_def +\theta = \frac{-1}{(1-\beta)(1-\gamma)}. +``` + +Substituting into {eq}`bhs_type1_recursion` yields the **risk-sensitive recursion** + +```{math} +:label: bhs_risk_sensitive +U_t = c_t - \beta\theta \log E_t\!\left[\exp\!\left(\frac{-U_{t+1}}{\theta}\right)\right]. +``` + +When $\gamma = 1$ (equivalently $\theta = +\infty$), the $\log E \exp$ term reduces to $E_t U_{t+1}$ +and the recursion becomes standard discounted expected log utility: $U_t = c_t + \beta E_t U_{t+1}$. + +For consumption plans in $\mathcal{C}(A, B, H; x_0)$, the recursion {eq}`bhs_risk_sensitive` implies the Bellman equation + +```{math} +:label: bhs_bellman_type1 +U(x) = c - \beta\theta \log \int \exp\!\left[\frac{-U(Ax + B\varepsilon)}{\theta}\right] \pi(\varepsilon)\,d\varepsilon. 
+``` + +The stochastic discount factor can then be written as + +```{math} +:label: bhs_sdf_Ut +m_{t+1,t} += +\beta \frac{C_t}{C_{t+1}} +\cdot +\frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]}. +``` + +The second factor is the likelihood-ratio distortion $\hat g_{t+1}$: an exponential tilt of the continuation value that shifts probability toward states with low $U_{t+1}$. + +### Martingale likelihood ratios + +To formalize model distortions, {cite}`BHS_2009` use a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$ as a Radon--Nikodym derivative. +Its one-step increments + +```{math} +g_{t+1} = \frac{G_{t+1}}{G_t}, +\qquad +E_t[g_{t+1}] = 1, +\quad +g_{t+1} \ge 0, +\qquad +G_0 = 1, +``` + +define distorted conditional expectations: $\tilde E_t[b_{t+1}] = E_t[g_{t+1}\,b_{t+1}]$. +The conditional relative entropy of the distortion is $E_t[g_{t+1}\log g_{t+1}]$, and the discounted entropy over the entire path is $\beta E\bigl[\sum_{t=0}^{\infty} \beta^t G_t\,E_t(g_{t+1}\log g_{t+1})\,\big|\,x_0\bigr]$. + +### Type II: multiplier preferences + +A type II agent's **multiplier** preference ordering over consumption plans $C^\infty \in \mathcal{C}(A,B,H;x_0)$ is defined by + +```{math} +:label: bhs_type2_objective +\min_{\{g_{t+1}\}} +\sum_{t=0}^{\infty} E\!\left\{\beta^t G_t +\left[c_t + \beta\theta\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\right] +\,\Big|\, x_0\right\}, +``` + +where $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \ge 0$, and $G_0 = 1$. +The parameter $\theta > 0$ penalizes the relative entropy of probability distortions. + +The value function satisfies the Bellman equation + +```{math} +:label: bhs_bellman_type2 +W(x) += +c + \min_{g(\varepsilon) \ge 0}\; +\beta \int \bigl[g(\varepsilon)\,W(Ax + B\varepsilon) ++ \theta\,g(\varepsilon)\log g(\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon +``` + +subject to $\int g(\varepsilon)\,\pi(\varepsilon)\,d\varepsilon = 1$. +Note that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty --- this is the key structural feature that makes $\hat g$ a likelihood ratio. + +The minimizer is + +```{math} +:label: bhs_ghat +\hat g_{t+1} += +\frac{\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)}{E_t\!\left[\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)\right]}. +``` + +Substituting {eq}`bhs_ghat` back into {eq}`bhs_bellman_type2` gives + +$$W(x) = c - \beta\theta \log \int \exp\!\left[\frac{-W(Ax + B\varepsilon)}{\theta}\right]\pi(\varepsilon)\,d\varepsilon,$$ + +which is identical to {eq}`bhs_bellman_type1`. +Therefore $W(x) \equiv U(x)$, establishing that **types I and II are observationally equivalent** over elements of $\mathcal{C}(A,B,H;x_0)$. +The mapping between parameters is + +```{math} +\theta = \left[(1-\beta)(\gamma - 1)\right]^{-1}. +``` + +```{code-cell} ipython3 +def θ_from_γ(γ, β=β): + if γ <= 1: + return np.inf + return 1.0 / ((1.0 - β) * (γ - 1.0)) + + +def γ_from_θ(θ, β=β): + if np.isinf(θ): + return 1.0 + return 1.0 + 1.0 / ((1.0 - β) * θ) +``` + +### Type III: constraint preferences + +Type III (constraint) preferences replace the entropy penalty with a hard bound. 
+The agent minimizes expected discounted log consumption under the worst-case model, +subject to a cap $\eta$ on discounted relative entropy: + +```{math} +J(x_0) += +\min_{\{g_{t+1}\}} +\sum_{t=0}^{\infty} E\!\left[\beta^t G_t\,c_t \,\Big|\, x_0\right] +``` + +subject to $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \ge 0$, $G_0 = 1$, and + +```{math} +\beta E\!\left[\sum_{t=0}^{\infty} \beta^t G_t\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\,\Big|\,x_0\right] \le \eta. +``` + +The Lagrange multiplier on the entropy constraint is $\theta$, which connects type III to type II: +for the particular $A, B, H$ and $\theta$ used to derive the worst-case joint distribution $\hat\Pi_\infty$, +the shadow prices of uncertain claims for a type III agent match those of a type II agent. + +### Type IV: ex post Bayesian + +Type IV is an ordinary expected-utility agent with log preferences evaluated under a single pessimistic probability model $\hat\Pi_\infty$: + +```{math} +\hat E_0 \sum_{t=0}^{\infty} \beta^t c_t. +``` + +The joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ is the one associated with the type II agent's worst-case distortion. +For the particular $A, B, H$ and $\theta$ used to construct $\hat\Pi_\infty$, the type IV value function equals $J(x)$ from type III. + +### Stochastic discount factor + +Across all four types, the stochastic discount factor can be written compactly as + +```{math} +:label: bhs_sdf +m_{t+1,t} += +\beta \frac{C_t}{C_{t+1}} \hat g_{t+1}. +``` + +The distortion $\hat g_{t+1}$ is a likelihood ratio between the approximating and worst-case one-step models. + +With log utility, $C_t/C_{t+1} = \exp(-(c_{t+1}-c_t))$ is the usual intertemporal marginal rate of substitution. +Robustness multiplies that term by $\hat g_{t+1}$, so uncertainty aversion enters pricing only through the distortion. + +### Gaussian mean-shift distortions + +Under the random-walk model, the shock is $\varepsilon_{t+1} \sim \mathcal{N}(0, 1)$. +The worst-case model shifts its mean to $-w$, which implies the likelihood ratio + +```{math} +\hat g_{t+1} += +\exp\left(-w \varepsilon_{t+1} - \frac{1}{2}w^2\right), +\qquad +E_t[\hat g_{t+1}] = 1. +``` + +Hence $\log \hat g_{t+1}$ is normal with mean $-w^2/2$ and variance $w^2$, and + +```{math} +\operatorname{std}(\hat g_{t+1}) = \sqrt{e^{w^2}-1}. +``` + +For our Gaussian calibrations, the worst-case mean shift is summarized by + +```{math} +:label: bhs_w_formulas +w_{rw}(\theta) = -\frac{\sigma_\varepsilon}{(1-\beta)\theta}, +\qquad +w_{ts}(\theta) = -\frac{\sigma_\varepsilon}{(1-\rho\beta)\theta}. +``` + +```{code-cell} ipython3 +def w_from_θ(θ, model): + if np.isinf(θ): + return 0.0 + if model == "rw": + return -rw["σ_ε"] / ((1.0 - β) * θ) + if model == "ts": + return -ts["σ_ε"] / ((1.0 - β * ts["ρ"]) * θ) + raise ValueError("model must be 'rw' or 'ts'") +``` + +The **market price of model uncertainty** (MPU) is the conditional standard deviation of the distortion: + +```{math} +:label: bhs_mpu_formula +\text{MPU} += +\operatorname{std}(\hat g_{t+1}) += +\sqrt{e^{w(\theta)^2}-1} +\approx |w(\theta)|. +``` + +The detection error probability is + +```{math} +:label: bhs_detection_formula +p(\theta^{-1}) += +\frac{1}{2}\left(p_A + p_B\right), +``` + +and in our Gaussian mean-shift case reduces to + +```{math} +:label: bhs_detection_closed +p(\theta^{-1}) = \Phi\!\left(-\frac{|w(\theta)|\sqrt{T}}{2}\right). 
+``` + +```{code-cell} ipython3 +def detection_probability(θ, model): + w = abs(w_from_θ(θ, model)) + return norm.cdf(-0.5 * w * np.sqrt(T)) + + +def θ_from_detection_probability(p, model): + if p >= 0.5: + return np.inf + w_abs = -2.0 * norm.ppf(p) / np.sqrt(T) + if model == "rw": + return rw["σ_ε"] / ((1.0 - β) * w_abs) + if model == "ts": + return ts["σ_ε"] / ((1.0 - β * ts["ρ"]) * w_abs) + raise ValueError("model must be 'rw' or 'ts'") +``` + +### Likelihood-ratio testing and detection errors + +Let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on a sample of length $T$. +Define + +```{math} +p_A = \Pr_A(L_T < 0), +\qquad +p_B = \Pr_B(L_T > 0), +``` + +where $\Pr_A$ and $\Pr_B$ denote probabilities under the approximating and worst-case models. +Then $p(\theta^{-1}) = \frac{1}{2}(p_A + p_B)$ is the average probability of choosing the wrong model. + +In the Gaussian mean-shift setting, $L_T$ is normal with mean $\pm \tfrac{1}{2}w^2T$ and variance $w^2T$, which yields the closed-form expression above. + +### Interpreting the calibration objects + +The parameter $\theta$ indexes how expensive it is for the minimizing player to distort the approximating model. + +A small $\theta$ means a cheap distortion and therefore stronger robustness concerns. + +The associated $\gamma = 1 + \left[(1-\beta)\theta\right]^{-1}$ can be large even when we do not want to interpret behavior as extreme atemporal risk aversion. + +The distortion magnitude $|w(\theta)|$ is a direct measure of how pessimistically the agent tilts one-step probabilities. + +Detection error probability $p(\theta^{-1})$ translates that tilt into a statistical statement about finite-sample distinguishability. + +High $p(\theta^{-1})$ means the two models are hard to distinguish. + +Low $p(\theta^{-1})$ means they are easier to distinguish. + +This translation is the bridge between econometric identification and preference calibration. + +Finally, the relative-entropy distance associated with the worst-case distortion is + +```{math} +E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2, +``` + +so the discounted entropy used in type III preferences is + +```{math} +\eta += +\frac{\beta}{1-\beta}\cdot \frac{w(\theta)^2}{2}, +``` + +```{code-cell} ipython3 +def η_from_θ(θ, model): + w = w_from_θ(θ, model) + return β * w**2 / (2.0 * (1.0 - β)) +``` + +This is the mapping behind the right panel of the detection-probability figure below. + +## Tallarini's success and its cost + +Hansen and Jagannathan {cite}`Hansen_Jagannathan_1991` showed that any valid stochastic discount factor $m_{t+1,t}$ must satisfy a volatility bound: $\sigma(m)/E(m)$ must be at least as large as the maximum Sharpe ratio attainable in the market. +Using postwar U.S. returns on the value-weighted NYSE and Treasury bills, this bound sets a +high bar that time-separable CRRA preferences struggle to clear without also distorting the +risk-free rate. + +In terms of the vector of gross returns $R_{t+1}$ with mean $E(R)$ and covariance matrix $\Sigma_R$, +the bound can be written as + +```{math} +\frac{\sigma(m)}{E(m)} +\;\ge\; +\sqrt{b^\top \Sigma_R^{-1} b}, +\qquad +b = \mathbf{1} - E(m) E(R). +``` + +```{code-cell} ipython3 +def hj_std_bound(E_m): + b = np.ones(2) - E_m * R_mean + var_lb = b @ Σ_R_inv @ b + return np.sqrt(np.maximum(var_lb, 0.0)) +``` + +Tallarini {cite}`Tallarini_2000` showed that recursive preferences with IES $= 1$ can clear this bar. 
+By separating risk aversion $\gamma$ from the IES, the recursion pushes $\sigma(m)/E(m)$ upward +while leaving $E(m)$ roughly consistent with the observed risk-free rate. + +For the two consumption specifications, {cite}`BHS_2009` derive closed-form expressions for the unconditional SDF moments. + +**Random walk** (eqs 15--16 of the paper): + +```{math} +:label: bhs_Em_rw +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], +``` + +```{math} +:label: bhs_sigma_rw +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. +``` + +**Trend stationary** (eqs 17--18): + +```{math} +:label: bhs_Em_ts +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], +``` + +```{math} +:label: bhs_sigma_ts +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. +``` + +These are what the code below implements. + +The figure below makes this visible. +For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m)/E(m))$ pair +for three specifications: time-separable CRRA (crosses), type I recursive preferences with random-walk consumption (circles), and type I recursive preferences with trend-stationary consumption (pluses). + +```{code-cell} ipython3 +def moments_type1_rw(γ): + θ = θ_from_γ(γ) + w = w_from_θ(θ, "rw") + var_log_m = (w - rw["σ_ε"]) ** 2 + mean_log_m = np.log(β) - rw["μ"] - 0.5 * w**2 + E_m = np.exp(mean_log_m + 0.5 * var_log_m) + mpr = np.sqrt(np.exp(var_log_m) - 1.0) + return E_m, mpr + + +def moments_type1_ts(γ): + θ = θ_from_γ(γ) + w = w_from_θ(θ, "ts") + var_z = ts["σ_ε"] ** 2 / (1.0 - ts["ρ"] ** 2) + var_log_m = (1.0 - ts["ρ"]) ** 2 * var_z + (w - ts["σ_ε"]) ** 2 + mean_log_m = np.log(β) - ts["μ"] - 0.5 * w**2 + E_m = np.exp(mean_log_m + 0.5 * var_log_m) + mpr = np.sqrt(np.exp(var_log_m) - 1.0) + return E_m, mpr + + +def moments_crra_rw(γ): + var_log_m = (γ * rw["σ_ε"]) ** 2 + mean_log_m = np.log(β) - γ * rw["μ"] + E_m = np.exp(mean_log_m + 0.5 * var_log_m) + mpr = np.sqrt(np.exp(var_log_m) - 1.0) + return E_m, mpr +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: stochastic discount factor moments and the Hansen-Jagannathan volatility + bound + name: fig-bhs-1 +--- +γ_grid = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float) + +Em_rw = np.array([moments_type1_rw(γ)[0] for γ in γ_grid]) +MPR_rw = np.array([moments_type1_rw(γ)[1] for γ in γ_grid]) + +Em_ts = np.array([moments_type1_ts(γ)[0] for γ in γ_grid]) +MPR_ts = np.array([moments_type1_ts(γ)[1] for γ in γ_grid]) + +Em_crra = np.array([moments_crra_rw(γ)[0] for γ in γ_grid]) +MPR_crra = np.array([moments_crra_rw(γ)[1] for γ in γ_grid]) + +Em_grid = np.linspace(0.8, 1.01, 1000) +HJ_std = np.array([hj_std_bound(x) for x in Em_grid]) + +fig, ax = plt.subplots(figsize=(7, 5)) +ax.plot(Em_grid, HJ_std, lw=2, color="black", label="Hansen-Jagannathan bound") +ax.plot(Em_rw, MPR_rw, "o", lw=2, label="type I, random walk") +ax.plot(Em_ts, MPR_ts, "+", lw=2, label="type I, trend stationary") +ax.plot(Em_crra, MPR_crra, "x", lw=2, label="time-separable CRRA") + +ax.set_xlabel(r"$E(m)$") +ax.set_ylabel(r"$\sigma(m)/E(m)$") +ax.legend(frameon=False) +ax.set_xlim(0.8, 1.01) +ax.set_ylim(0.0, 0.42) + +plt.tight_layout() +plt.show() +``` + +The crosses trace the familiar CRRA failure: as $\gamma$ rises, 
$\sigma(m)/E(m)$ grows but $E(m)$ falls well below the range consistent with the observed risk-free rate. +This is the risk-free-rate puzzle of Weil {cite}`Weil_1989`. + +The circles and pluses show Tallarini's solution. +Recursive utility with IES $= 1$ pushes volatility upward while keeping $E(m)$ roughly constant near $1/(1+r^f)$. +For the random-walk model, the bound is reached around $\gamma = 50$; for the trend-stationary model, around $\gamma = 75$. + +The quantitative achievement is real. +But Lucas's challenge still stands: what microeconomic evidence supports $\gamma = 50$? +That tension is the starting point for the reinterpretation that follows. + +## A new calibration language: detection-error probabilities + +If $\gamma$ should not be calibrated by introspection about atemporal gambles, what replaces it? + +The answer is a statistical test. +Fix a sample size $T$ (here 235 quarters, matching the postwar U.S. data). +For a given $\theta$, compute the worst-case model and ask: +if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? +That fraction is the detection-error probability $p(\theta^{-1})$. + +A high $p$ (near 0.5) means the two models are nearly indistinguishable --- the consumer's fear is hard to rule out. +A low $p$ means the worst case is easy to reject and the robustness concern is less compelling. + +The left panel below plots $p(\theta^{-1})$ against $\theta^{-1}$ for the two consumption specifications. +Notice that the same numerical $\theta$ corresponds to very different detection probabilities across models, because baseline dynamics differ. +The right panel resolves this by plotting detection probabilities against discounted relative entropy $\eta$, which normalizes the statistical distance. +Indexed by $\eta$, the two curves coincide. + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: detection probabilities under random-walk and trend-stationary approximating + models + name: fig-bhs-2 +--- +θ_inv_grid = np.linspace(0.0, 1.8, 400) +θ_grid = np.full_like(θ_inv_grid, np.inf) +mask_θ = θ_inv_grid > 0.0 +θ_grid[mask_θ] = 1.0 / θ_inv_grid[mask_θ] + +p_rw = np.array([detection_probability(θ, "rw") for θ in θ_grid]) +p_ts = np.array([detection_probability(θ, "ts") for θ in θ_grid]) + +η_rw = np.array([η_from_θ(θ, "rw") for θ in θ_grid]) +η_ts = np.array([η_from_θ(θ, "ts") for θ in θ_grid]) + +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +axes[0].plot(θ_inv_grid, 100.0 * p_rw, lw=2, label="random walk") +axes[0].plot(θ_inv_grid, 100.0 * p_ts, lw=2, label="trend stationary") +axes[0].set_xlabel(r"$\theta^{-1}$") +axes[0].set_ylabel("detection error probability (percent)") +axes[0].legend(frameon=False) + +axes[1].plot(η_rw, 100.0 * p_rw, lw=2, label="random walk") +axes[1].plot(η_ts, 100.0 * p_ts, lw=2, ls="--", label="trend stationary") +axes[1].set_xlabel(r"discounted entropy $\eta$") +axes[1].set_ylabel("detection error probability (percent)") +axes[1].set_xlim(0.0, 10) +axes[1].legend(frameon=False) + +plt.tight_layout() +plt.show() +``` + +This is why detection-error probabilities (or equivalently, discounted entropy) are the right cross-model yardstick. +Holding $\theta$ fixed when switching from a random walk to a trend-stationary specification +implicitly changes how much misspecification the consumer fears. +Holding $\eta$ or $p$ fixed keeps the statistical difficulty of detecting misspecification constant. 
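
As a quick numerical check of this point, the cell below uses the helpers defined above (`detection_probability`, `θ_from_detection_probability`, `η_from_θ`, `γ_from_θ`) with two illustrative values, a common penalty $\theta = 5$ and a common detection probability $p = 0.10$, to show that a shared $\theta$ implies very different detection probabilities across the two specifications, while matching a shared detection probability requires model-specific values of $\theta$.

```{code-cell} ipython3
# A common θ is not a common amount of model uncertainty:
# the same penalty implies different detection probabilities across models
θ_common = 5.0
for model in ["rw", "ts"]:
    p = detection_probability(θ_common, model)
    print(f"{model}: θ = {θ_common:.1f} → p = {100 * p:.1f}%, "
          f"η = {η_from_θ(θ_common, model):.3f}")

# Holding the detection probability fixed instead requires model-specific θ
p_common = 0.10
for model in ["rw", "ts"]:
    θ_m = θ_from_detection_probability(p_common, model)
    print(f"{model}: p = {100 * p_common:.0f}% → θ = {θ_m:.3f}, "
          f"γ = {γ_from_θ(θ_m):.1f}")
```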
+ +The explicit mapping that equates discounted entropy across models is (eq 41 of the paper): + +```{math} +:label: bhs_theta_cross_model +\theta_{\text{TS}} += +\left(\frac{\sigma_\varepsilon^{\text{TS}}}{\sigma_\varepsilon^{\text{RW}}}\right) +\frac{1-\beta}{1-\rho\beta}\;\theta_{\text{RW}}. +``` + +At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$, this simplifies to +$\theta_{\text{TS}} = \frac{1-\beta}{1-\rho\beta}\,\theta_{\text{RW}}$. +Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ is much less than one, +so holding entropy fixed requires a substantially smaller $\theta$ (stronger robustness) for the trend-stationary model than for the random walk. + +## The punchline: detection probabilities unify the two models + +We can now redraw Tallarini's figure using the new language. +For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, +invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m),\;\sigma(m)/E(m))$ pair. + +```{code-cell} ipython3 +p_points = np.array([0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]) + +θ_rw_points = np.array([θ_from_detection_probability(p, "rw") for p in p_points]) +θ_ts_points = np.array([θ_from_detection_probability(p, "ts") for p in p_points]) + +γ_rw_points = np.array([γ_from_θ(θ) for θ in θ_rw_points]) +γ_ts_points = np.array([γ_from_θ(θ) for θ in θ_ts_points]) + +Em_rw_p = np.array([moments_type1_rw(γ)[0] for γ in γ_rw_points]) +MPR_rw_p = np.array([moments_type1_rw(γ)[1] for γ in γ_rw_points]) +Em_ts_p = np.array([moments_type1_ts(γ)[0] for γ in γ_ts_points]) +MPR_ts_p = np.array([moments_type1_ts(γ)[1] for γ in γ_ts_points]) + +print("p γ_rw γ_ts") +for p, g1, g2 in zip(p_points, γ_rw_points, γ_ts_points): + print(f"{p:>4.2f} {g1:>9.2f} {g2:>9.2f}") +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: pricing loci obtained from common detection probabilities + name: fig-bhs-3 +--- +fig, ax = plt.subplots(figsize=(7, 5)) +ax.plot(Em_rw_p, MPR_rw_p, "o-", lw=2, label="random walk") +ax.plot(Em_ts_p, MPR_ts_p, "+-", lw=2, label="trend stationary") +ax.plot(Em_grid, HJ_std, lw=2, color="black", label="Hansen-Jagannathan bound") + +ax.set_xlabel(r"$E(m)$") +ax.set_ylabel(r"$\sigma(m)/E(m)$") +ax.legend(frameon=False, loc="upper right") +ax.set_xlim(0.96, 1.05) +ax.set_ylim(0.0, 0.34) + +plt.tight_layout() +plt.show() +``` + +The striking result: the random-walk and trend-stationary loci nearly coincide. + +Recall that under Tallarini's $\gamma$-calibration, reaching the Hansen--Jagannathan bound required $\gamma \approx 50$ for the random walk but $\gamma \approx 75$ for the trend-stationary model --- very different numbers for the "same" preference parameter. +Under detection-error calibration, both models reach the bound at the same detectability level (around $p = 0.05$). + +The model dependence was an artifact of using $\gamma$ as a cross-model yardstick. +Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: +a representative consumer with moderate, difficult-to-dismiss fears about model misspecification +behaves as if she had very high risk aversion. + +## What do risk premia measure? Two mental experiments + +Lucas {cite}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate +aggregate fluctuations. 
+His answer --- very little --- rested on the assumption that the consumer knows the data-generating process. + +The robust reinterpretation introduces a second, distinct mental experiment. +Instead of eliminating all randomness, suppose we keep randomness but remove the consumer's +fear of model misspecification (set $\theta = \infty$). +How much would she pay for that relief alone? + +Formally, define $\Delta c_0$ as a permanent proportional reduction in initial consumption that leaves the agent indifferent between +the original environment and a counterfactual in which either (i) risk alone is removed or (ii) model uncertainty is removed. +Because utility is log and the consumption process is Gaussian, these compensations are available in closed form. + +For type II preferences in the random-walk model, the decomposition is + +```{math} +:label: bhs_type2_rw_decomp +\Delta c_0^{risk} += +\frac{\beta \sigma_\varepsilon^2}{2(1-\beta)}, +\qquad +\Delta c_0^{uncertainty} += +\frac{\beta \sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +``` + +For type III preferences in the random-walk model, the uncertainty term is twice as large: + +```{math} +:label: bhs_type3_rw_decomp +\Delta c_0^{uncertainty, III} += +\frac{\beta \sigma_\varepsilon^2}{(1-\beta)^2\theta}. +``` + +For the trend-stationary model, denominators replace $(1-\beta)$ with $(1-\beta \rho)$ or $(1-\beta \rho^2)$ as detailed in Table 3 of {cite}`BHS_2009`, but the qualitative message is the same. + +The risk-only term $\Delta c_0^{risk}$ is tiny at postwar consumption volatility --- this is Lucas's well-known result. +The model-uncertainty term $\Delta c_0^{uncertainty}$ can be first order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. + +## Visualizing the welfare decomposition + +We set $\beta = 0.995$ and calibrate $\theta$ so that $p(\theta^{-1}) = 0.10$, a conservative detection-error level. + +```{code-cell} ipython3 +p_star = 0.10 +θ_star = θ_from_detection_probability(p_star, "rw") +γ_star = γ_from_θ(θ_star) +w_star = w_from_θ(θ_star, "rw") + +# Type II compensations, random walk model +comp_risk_only = β * rw["σ_ε"] ** 2 / (2.0 * (1.0 - β)) +comp_risk_unc = comp_risk_only + β * rw["σ_ε"] ** 2 / (2.0 * (1.0 - β) ** 2 * θ_star) + +# Two useful decompositions in levels +risk_only_pct = 100.0 * (np.exp(comp_risk_only) - 1.0) +risk_unc_pct = 100.0 * (np.exp(comp_risk_unc) - 1.0) +uncertainty_only_pct = 100.0 * (np.exp(comp_risk_unc - comp_risk_only) - 1.0) + +print(f"p*={p_star:.2f}, θ*={θ_star:.4f}, γ*={γ_star:.2f}, w*={w_star:.4f}") +print(f"risk only compensation (log units): {comp_risk_only:.6f}") +print(f"risk + uncertainty compensation (log units): {comp_risk_unc:.6f}") +print(f"risk only compensation (percent): {risk_only_pct:.3f}%") +print(f"risk + uncertainty compensation (percent): {risk_unc_pct:.3f}%") +print(f"uncertainty component alone (percent): {uncertainty_only_pct:.3f}%") + +h = 250 +t = np.arange(h + 1) + +# Baseline approximating model fan +mean_base = rw["μ"] * t +std_base = rw["σ_ε"] * np.sqrt(t) + +# Certainty equivalent line from Eq. 
(47), shifted by compensating variations +certainty_slope = rw["μ"] + 0.5 * rw["σ_ε"] ** 2 +ce_risk = -comp_risk_only + certainty_slope * t +ce_risk_unc = -comp_risk_unc + certainty_slope * t + +# Alternative models from the ambiguity set in panel B +mean_low = (rw["μ"] + rw["σ_ε"] * w_star) * t +mean_high = (rw["μ"] - rw["σ_ε"] * w_star) * t +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: certainty-equivalent paths and the set of nearby models under robustness + name: fig-bhs-4 +--- +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +# Panel A +ax = axes[0] +ax.fill_between(t, mean_base - std_base, mean_base + std_base, alpha=0.25, color="tab:blue") +ax.plot(t, ce_risk_unc, lw=2, ls="--", color="black", label="certainty equivalent: risk + uncertainty") +ax.plot(t, ce_risk, lw=2, color="tab:orange", label="certainty equivalent: risk only") +ax.plot(t, mean_base, lw=2, color="tab:blue", label="approximating-model mean") +ax.set_xlabel("quarters") +ax.set_ylabel("log consumption") +ax.legend(frameon=False, fontsize=8, loc="upper left") + +# Panel B +ax = axes[1] +ax.fill_between(t, mean_base - std_base, mean_base + std_base, alpha=0.20, color="tab:blue") +ax.fill_between(t, mean_low - std_base, mean_low + std_base, alpha=0.20, color="tab:red") +ax.fill_between(t, mean_high - std_base, mean_high + std_base, alpha=0.20, color="tab:green") +ax.plot(t, ce_risk_unc, lw=2, ls="--", color="black", label="certainty equivalent: risk + uncertainty") +ax.plot(t, mean_base, lw=2, color="tab:blue", label="approximating-model mean") +ax.plot(t, mean_low, lw=2, color="tab:red", label="worst-case-leaning mean") +ax.plot(t, mean_high, lw=2, color="tab:green", label="best-case-leaning mean") +ax.set_xlabel("quarters") +ax.set_ylabel("log consumption") +ax.legend(frameon=False, fontsize=8, loc="upper left") + +plt.tight_layout() +plt.show() +``` + +**Left panel.** +The small gap between the baseline mean path and the "risk only" certainty equivalent is Lucas's result: +at postwar consumption volatility, the welfare gain from eliminating well-understood aggregate risk is tiny. + +The much larger gap between the baseline and the "risk + uncertainty" certainty equivalent +is the new object. +Most of that gap is compensation for model uncertainty, not risk. + +**Right panel.** +The cloud of nearby models shows what the robust consumer guards against. +The red-shaded and green-shaded fans correspond to pessimistic and optimistic mean-shift distortions +whose detection-error probability is $p = 0.10$. +These models are statistically close to the baseline (blue) but imply very different long-run consumption levels. +The consumer's caution against such alternatives is what drives the large certainty-equivalent gap in the left panel. + +## How large are the welfare gains from resolving model uncertainty? + +A type III (constraint-preference) agent evaluates the worst model inside an entropy ball of radius $\eta$. +As $\eta$ grows, the set of plausible misspecifications expands and the welfare cost of confronting model uncertainty rises. +Because $\eta$ is abstract, {cite}`BHS_2009` instead index these costs by the associated detection error probability $p(\eta)$. +The figure below reproduces their display: compensation for removing model uncertainty, measured as a proportion of consumption, plotted against $p(\eta)$. 
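
Before tracing out the whole curve, it helps to compute one point by hand. The short check below is an illustrative calculation at $p(\eta) = 0.10$ for the random-walk model: the detection probability pins down $|w|$ through the closed form above, $|w|$ pins down $\theta$, and the type III formula $\beta\sigma_\varepsilon^2/[(1-\beta)^2\theta]$ from {eq}`bhs_type3_rw_decomp` converts the result into a compensation. The value it prints should sit on the random-walk curve in the figure that follows.

```{code-cell} ipython3
# One point on the random-walk curve, computed from the closed forms
p_check = 0.10
w_check = -2.0 * norm.ppf(p_check) / np.sqrt(T)    # |w| implied by p(η)
θ_check = rw["σ_ε"] / ((1.0 - β) * w_check)        # θ implied by |w|
gain_log = β * rw["σ_ε"] ** 2 / ((1.0 - β) ** 2 * θ_check)
print(f"|w| = {w_check:.4f}, θ = {θ_check:.3f}")
print(f"type III compensation ≈ "
      f"{100.0 * (np.exp(gain_log) - 1.0):.1f}% of consumption")
```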
+ +```{code-cell} ipython3 +η_grid = np.linspace(0.0, 5.0, 300) + +# Use w and η relation, then convert to θ model by model +w_abs_grid = np.sqrt(2.0 * (1.0 - β) * η_grid / β) + +θ_rw_from_η = np.full_like(w_abs_grid, np.inf) +θ_ts_from_η = np.full_like(w_abs_grid, np.inf) +mask_w = w_abs_grid > 0.0 +θ_rw_from_η[mask_w] = rw["σ_ε"] / ((1.0 - β) * w_abs_grid[mask_w]) +θ_ts_from_η[mask_w] = ts["σ_ε"] / ((1.0 - β * ts["ρ"]) * w_abs_grid[mask_w]) + +# Type III uncertainty terms from Table 3 +gain_rw = np.where( + np.isinf(θ_rw_from_η), + 0.0, + β * rw["σ_ε"] ** 2 / ((1.0 - β) ** 2 * θ_rw_from_η), +) +gain_ts = np.where( + np.isinf(θ_ts_from_η), + 0.0, + β * ts["σ_ε"] ** 2 / ((1.0 - β * ts["ρ"]) ** 2 * θ_ts_from_η), +) + +# Convert log compensation to percent of initial consumption in levels +gain_rw_pct = 100.0 * (np.exp(gain_rw) - 1.0) +gain_ts_pct = 100.0 * (np.exp(gain_ts) - 1.0) + +# Detection error probabilities implied by η (common across RW/TS for the Gaussian mean-shift case) +p_eta_pct = 100.0 * norm.cdf(-0.5 * w_abs_grid * np.sqrt(T)) +order = np.argsort(p_eta_pct) +p_plot = p_eta_pct[order] +gain_rw_plot = gain_rw_pct[order] +gain_ts_plot = gain_ts_pct[order] +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: type III compensation for model uncertainty across detection-error probabilities + name: fig-bhs-5 +--- +fig, ax = plt.subplots(figsize=(7, 4)) +ax.plot(p_plot, gain_rw_plot, lw=2, color="black", label="RW type III") +ax.plot(p_plot, gain_ts_plot, lw=2, ls="--", color="gray", label="TS type III") +ax.set_xlabel(r"detection error probability $p(\eta)$ (percent)") +ax.set_ylabel("proportion of consumption (percent)") +ax.set_xlim(0.0, 50.0) +ax.set_ylim(0.0, 30.0) +ax.legend(frameon=False, loc="upper right") + +plt.tight_layout() +plt.show() +``` + +The random-walk model delivers somewhat larger costs than the trend-stationary model at the same detection-error probability, but both curves dwarf the classic Lucas cost of business cycles. + +To put the magnitudes in perspective: Lucas estimated that eliminating all aggregate consumption risk +is worth roughly 0.05% of consumption. +At detection-error probabilities of 10--20%, the model-uncertainty +compensation alone runs to several percent of consumption. + +This is the welfare counterpart to the pricing result. +The large risk premia that Tallarini matched with high $\gamma$ are, under the robust reading, +compensations for bearing model uncertainty --- and the implied welfare gains from resolving that uncertainty are correspondingly large. + +## Why doesn't learning eliminate these fears? + +A natural objection: if the consumer has 235 quarters of data, why can't she learn the true drift +well enough to dismiss the worst-case model? + +The answer is that drift is a low-frequency feature. +Estimating the mean of a random walk to the precision needed to reject small but economically meaningful +shifts requires far more data than estimating volatility. +The figure below makes this concrete. + +```{code-cell} ipython3 +p_fig6 = 0.20 + +# Figure 6 overlays deterministic lines on the loaded consumption data. +# Use sample-estimated RW moments to avoid data-vintage drift mismatches. 
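# The worst-case drift line below uses the mean shift implied by a detection
# error probability of p_fig6 = 0.20: w = 2 * norm.ppf(p) / sqrt(T), which is
# negative, so the distorted drift μ + σ_ε * w lies below the baseline drift μ.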
+rw_fig6 = dict(μ=μ_hat, σ_ε=σ_hat) +w_fig6 = 2.0 * norm.ppf(p_fig6) / np.sqrt(T) + +# Use FRED data loaded earlier in the lecture +c = log_c_data +years = years_data + +t6 = np.arange(T + 1) +c0 = c[0] +line_approx = c0 + rw_fig6["μ"] * t6 +line_worst = c0 + (rw_fig6["μ"] + rw_fig6["σ_ε"] * w_fig6) * t6 + +p_right = np.linspace(0.01, 0.50, 500) +w_right = 2.0 * norm.ppf(p_right) / np.sqrt(T) +μ_worst_right = rw_fig6["μ"] + rw_fig6["σ_ε"] * w_right + +μ_se = rw_fig6["σ_ε"] / np.sqrt(T) +upper_band = rw_fig6["μ"] + 2.0 * μ_se +lower_band = rw_fig6["μ"] - 2.0 * μ_se +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: robustly distorted growth rates and finite-sample uncertainty about drift + name: fig-bhs-6 +--- +fig, axes = plt.subplots(1, 2, figsize=(12, 4)) + +ax = axes[0] +ax.plot(years, c, lw=2, color="tab:blue", label="log consumption") +ax.plot(years, line_approx, lw=2, ls="--", color="black", label="approximating model") +ax.plot( + years, + line_worst, + lw=2, + ls=":", + color="black", + label=rf"wc model $p(\theta^{{-1}})={p_fig6:.1f}$", +) +ax.set_xlabel("year") +ax.set_ylabel("log consumption") +ax.legend(frameon=False, fontsize=8, loc="upper left") + +ax = axes[1] +ax.plot( + 100.0 * p_right, + 1_000.0 * μ_worst_right, + lw=2, + color="tab:red", + label=r"$\mu + \sigma_\varepsilon w(\theta)$", +) +ax.axhline(1_000.0 * rw_fig6["μ"], lw=2, color="black", label=r"$\hat\mu$") +ax.axhline(1_000.0 * upper_band, lw=2, ls="--", color="gray", label=r"$\hat\mu \pm 2\hat s.e.$") +ax.axhline(1_000.0 * lower_band, lw=2, ls="--", color="gray") +ax.set_xlabel("detection error probability (percent)") +ax.set_ylabel(r"mean consumption growth ($\times 10^{-3}$)") +ax.legend(frameon=False, fontsize=8, loc="upper right") +ax.set_title("2 standard deviation band", fontsize=10) +ax.set_xlim(0.0, 50.0) +ax.set_ylim(3.0, 6.0) + +plt.tight_layout() +plt.show() +``` + +**Left panel.** +Postwar U.S. log consumption is shown alongside two deterministic trend lines: +the approximating-model drift $\mu$ and the worst-case drift $\mu + \sigma_\varepsilon w(\theta)$ for $p(\theta^{-1}) = 0.20$. +The plotted consumption series is constructed from FRED data following the processing pipeline described in the Data section above. +The two trends are close enough that, even with decades of data, it is hard to distinguish them by eye. + +**Right panel.** +As the detection-error probability rises (models become harder to tell apart), the worst-case mean growth rate moves back toward $\hat\mu$. +The dashed gray lines mark a two-standard-error band around the maximum-likelihood estimate of $\mu$. +Even at detection probabilities in the 5--20% range, the worst-case drift remains inside (or very near) this confidence band. + +The upshot: drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. +A dogmatic Bayesian who conditions on a single approximating model and updates using Bayes' law +will not learn her way out of this problem in samples of the length available. +Robustness concerns survive long histories precisely because the low-frequency features that matter most for pricing are the hardest to pin down. + +## Concluding remarks + +The title asks a question: are large risk premia prices of **variability** (atemporal risk aversion) +or prices of **doubts** (model uncertainty)? 
+ +The analysis above shows that the answer cannot be settled by asset-pricing data alone, +because the two interpretations are observationally equivalent. +But the choice matters enormously for what we conclude. + +Under the risk-aversion reading, high Sharpe ratios imply that consumers would pay a great deal to smooth +known aggregate consumption fluctuations. +Under the robustness reading, those same Sharpe ratios tell us consumers would pay a great deal +to resolve uncertainty about which probability model governs consumption growth --- a fundamentally different policy object. + +Three features of the analysis support the robustness reading: + +1. Detection-error probabilities provide a more stable calibration language than $\gamma$: the two consumption models that required very different $\gamma$ values to match the data yield nearly identical pricing implications when indexed by detectability. +2. The welfare gains implied by asset prices decompose overwhelmingly into a model-uncertainty component, with the pure risk component remaining small --- consistent with Lucas's original finding. +3. The drift distortions that drive pricing are small enough to hide inside standard-error bands, so finite-sample learning cannot eliminate the consumer's fears. + +Whether one ultimately prefers the risk or the uncertainty interpretation, the framework +clarifies that the question is not about the size of risk premia but about the economic object those premia identify. From ed6519ced4999b406e6016a9b4131bee2db3af80 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Feb 2026 13:05:26 +1100 Subject: [PATCH 19/37] updates --- lectures/_static/quant-econ.bib | 22 + lectures/doubts_or_variability.md | 1725 ++++++++++++++++++++++------- 2 files changed, 1345 insertions(+), 402 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index bd35b4809..413758676 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -1308,6 +1308,28 @@ @article{Tall2000 month = {June} } +@article{Hansen_Jagannathan_1991, + author = {Hansen, Lars Peter and Jagannathan, Ravi}, + title = {Implications of Security Market Data for Models of Dynamic Economies}, + journal = {Journal of Political Economy}, + year = {1991}, + volume = {99}, + number = {2}, + pages = {225--262}, + doi = {10.1086/261750} +} + +@article{Weil_1989, + author = {Weil, Philippe}, + title = {The Equity Premium Puzzle and the Risk-Free Rate Puzzle}, + journal = {Journal of Monetary Economics}, + year = {1989}, + volume = {24}, + number = {3}, + pages = {401--421}, + doi = {10.1016/0304-3932(89)90028-7} +} + @book{Lucas1987, title = {Models of business cycles}, author = {Lucas, Robert E}, diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index b6d9a1f65..b2580f4b4 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -26,187 +26,363 @@ kernelspec: :depth: 2 ``` -## Overview - -Robert Lucas Jr. 
opened a 2003 essay with a challenge: - > *No one has found risk aversion parameters of 50 or 100 in the diversification of > individual portfolios, in the level of insurance deductibles, in the wage premiums > associated with occupations with high earnings risk, or in the revenues raised by -> state-operated lotteries.* +> state-operated lotteries.* -- Robert Lucas Jr., January 10, 2003 + +## Overview + +{cite:t}`Tall2000` showed that a recursive preference specification could match the equity premium and the risk-free rate puzzle simultaneously. -Tallarini {cite}`Tallarini_2000` had shown that a recursive preference specification could match the equity premium and the risk-free rate puzzle simultaneously. But matching required setting the risk-aversion coefficient $\gamma$ to around 50 for a random-walk consumption model and around 75 for a trend-stationary model --- exactly the range that provoked Lucas's skepticism. -{cite}`BHS_2009` ask whether those large $\gamma$ values really measure aversion to atemporal risk. -Their answer is no. -The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a -second recursion in which the agent has unit risk aversion but fears that the probability model governing consumption growth may be wrong. -Under this reading, the parameter that looked like extreme risk aversion instead measures -the agent's concern about **model misspecification**. +{cite:t}`BHS_2009` ask whether those large $\gamma$ values really measure aversion to atemporal risk, or whether they instead measure the agent's doubts about the underlying probability model. + +Their answer --- and the theme of this lecture --- is that much of what looks like "risk aversion" can be reinterpreted as **model uncertainty**. + +The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max–min recursion in which the agent has unit risk aversion but fears that the probability model governing consumption growth may be wrong. + +Under this reading, the parameter that looked like extreme risk aversion instead measures concern about **misspecification**. + +Rather than calibrating $\gamma$ through Pratt-style thought experiments about known gambles, we calibrate through a **detection-error probability**: the probability of confusing the agent's baseline (approximating) model with the pessimistic (worst-case) model after seeing a finite sample. -The question then becomes: how much misspecification is plausible? -Rather than calibrating $\gamma$ through Pratt-style thought experiments about known gambles, -{cite}`BHS_2009` calibrate through a **detection-error probability** --- the chance of confusing the agent's baseline (approximating) model with the pessimistic (worst-case) model after seeing a finite sample. When detection-error probabilities are moderate, the implied $\gamma$ values are large enough to reach the Hansen--Jagannathan volatility bound. -This reinterpretation changes the welfare question that asset prices answer. -Large measured risk premia no longer imply large gains from smoothing known aggregate risk. -Instead, they imply large gains from resolving model uncertainty --- a very different policy object. +This reinterpretation changes the welfare question that asset prices answer: do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or the benefits from reducing doubts about the consumption-growth model? 
+ +We start with the Hansen--Jagannathan bound, then specify the statistical environment, lay out four related preference specifications and their equivalences, and finally revisit Tallarini's calibration using detection-error probabilities. + +This lecture draws on the ideas and techniques appeared in + +- {ref}`Asset Pricing: Finite State Models ` where we introduce stochastic discount factors. +- {ref}`Likelihood Ratio Processes ` where we develop the likelihood-ratio machinery that reappears here as the worst-case distortion $\hat g$. + + +Before we start, we install a package that is not included in Anaconda by default + +```{code-cell} ipython3 +:tags: [hide-output] +!pip install pandas-datareader +``` + +We use the following imports for the rest of this lecture ```{code-cell} ipython3 +import datetime as dt import numpy as np import pandas as pd import matplotlib.pyplot as plt +from pandas_datareader import data as web from scipy.stats import norm +``` + +We also set up calibration inputs and compute the covariance matrix of equity and risk-free returns from reported moments + +```{code-cell} ipython3 +β = 0.995 +T = 235 + +# Table 2 parameters +rw = dict(μ=0.00495, σ_ε=0.0050) +ts = dict(μ=0.00418, σ_ε=0.0050, ρ=0.980, ζ=-4.48) + +# Table 1 moments, converted from percent to decimals +r_e_mean, r_e_std = 0.0227, 0.0768 +r_f_mean, r_f_std = 0.0032, 0.0061 +r_excess_std = 0.0767 + +R_mean = np.array([1.0 + r_e_mean, 1.0 + r_f_mean]) +cov_erf = (r_e_std**2 + r_f_std**2 - r_excess_std**2) / 2.0 +Σ_R = np.array( + [ + [r_e_std**2, cov_erf], + [cov_erf, r_f_std**2], + ] +) +Σ_R_inv = np.linalg.inv(Σ_R) +``` + +## The equity premium and risk-free rate puzzles -np.set_printoptions(precision=4, suppress=True) +### Pricing kernel and the risk-free rate +In this section, we review a few key concepts appeared in {ref}`Asset Pricing: Finite State Models `. + +A random variable $m_{t+1}$ is said to be a **stochastic discount factor** if it confirms the following equation for the time-$t$ price $p_t$ of a one-period payoff $y_{t+1}$: + +```{math} +:label: bhs_pricing_eq +p_t = E_t(m_{t+1}\, y_{t+1}), +``` -def set_seed(seed=1234): - np.random.seed(seed) +where $E_t$ denotes the mathematical expectation conditioned on date-$t$ information. +For time-separable CRRA preferences with discount factor $\beta$ and coefficient of relative risk aversion $\gamma$, the marginal rate of substitution gives -set_seed() +```{math} +:label: bhs_crra_sdf +m_{t+1} = \beta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}, ``` -## The economic idea +where $C_t$ is consumption. -A representative consumer has a baseline probabilistic description of consumption growth. -Call it the **approximating model**. -The consumer does not fully trust this model. -To formalize that distrust, she surrounds the approximating model with a set of nearby alternatives that are difficult to distinguish statistically in finite samples. +Setting $y_{t+1} = 1$ (a risk-free bond) in {eq}`bhs_pricing_eq` yields the reciprocal of the gross one-period risk-free rate: -Among those alternatives, a minimizing player inside the consumer's head selects a **worst-case model**. -The resulting max-min problem generates a likelihood-ratio distortion $\hat g_{t+1}$ that tilts one-step-ahead probabilities toward adverse states. -That distortion enters the stochastic discount factor alongside the usual intertemporal marginal rate of substitution, and its standard deviation is the **market price of model uncertainty** (MPU). 
+```{math} +:label: bhs_riskfree +\frac{1}{R_t^f} = E_t[m_{t+1}] = E_t\!\left[\beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right]. +``` -The discipline on how much distortion is allowed comes not from introspection about -willingness to pay for small known gambles, but from a statistical detection problem: -given $T$ observations, how likely is a Bayesian to confuse the approximating model with the worst-case model? -The answer is a **detection-error probability** $p(\theta^{-1})$. -High $p$ means the two models are nearly indistinguishable and the consumer's fear of misspecification is hard to dismiss. +### The Hansen--Jagannathan bound -## Four agent types and one key equivalence +Let $R_{t+1}^e$ denote the gross return on a risky asset (e.g., the market portfolio) and $R_{t+1}^f$ the gross return on a one-period risk-free bond. -The analysis compares four preference specifications that are useful for different purposes. +The **excess return** is -* **Type I** (Kreps--Porteus--Epstein--Zin--Tallarini): risk-sensitive recursive utility with risk-aversion parameter $\gamma$ and IES fixed at 1. -* **Type II** (multiplier preferences): unit risk aversion but a penalty parameter $\theta$ on the relative entropy of probability distortions. -* **Type III** (constraint preferences): unit risk aversion with a hard bound $\eta$ on discounted relative entropy. -* **Type IV** (pessimistic ex post Bayesian): log utility under a single pessimistic probability model $\hat\Pi_\infty$. +$$ +\xi_{t+1} = R_{t+1}^e - R_{t+1}^f. +$$ -The pivotal result is that **types I and II are observationally equivalent** over consumption plans in this environment. -The mapping is $\theta = [(1-\beta)(\gamma - 1)]^{-1}$. -So when Tallarini sets $\gamma = 50$ to reach the Hansen--Jagannathan bound, one can equally say -the consumer has unit risk aversion and a model-uncertainty penalty $\theta$ that corresponds to a -moderate detection-error probability. -The quantitative fit is unchanged; only the economic interpretation shifts. +An excess return is the payoff on a zero-cost portfolio that is long one dollar of the risky asset and short one dollar of the risk-free bond. -## Setup +Because the portfolio costs nothing to enter, its price is $p_t = 0$, so {eq}`bhs_pricing_eq` implies -The calibration uses quarterly U.S. data from 1948:2--2006:4 for consumption **growth rates** (a sample length of $T = 235$ quarters). -When we plot **levels** of log consumption (as in Fig. 6), we align the time index to 1948:1--2006:4, which yields $T+1 = 236$ quarterly observations. -Parameter estimates for two consumption-growth specifications (random walk and trend stationary) -come from Table 2 of {cite}`BHS_2009`, and asset-return moments come from their Table 1. -Following footnote 8 in {cite}`BHS_2009`, consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. +$$ +0 = E_t[m_{t+1}\,\xi_{t+1}]. +$$ -### Data +We can decompose the expectation of a product into a covariance plus a product of expectations: -Most numerical inputs in this lecture are taken directly from {cite}`BHS_2009`. -Table 2 provides the maximum-likelihood estimates of $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the two consumption-growth specifications, and Table 1 provides the asset-return moments used in the Hansen--Jagannathan bound calculation. 
+$$ +E_t[m_{t+1}\,\xi_{t+1}] += +\operatorname{cov}_t(m_{t+1},\,\xi_{t+1}) + E_t[m_{t+1}]\,E_t[\xi_{t+1}], +$$ -Following footnote 8 of {cite}`BHS_2009`, consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. -We construct this measure from three [FRED](https://fred.stlouisfed.org) series: +where $\operatorname{cov}_t$ denotes the conditional covariance and $\sigma_t$ will denote the conditional standard deviation. +Setting the left-hand side to zero and solving for the expected excess return gives -| FRED series | Description | -| --- | --- | -| `PCNDGC96` | Real PCE: nondurable goods (billions of chained 2017 \$, SAAR) | -| `PCESVC96` | Real PCE: services (billions of chained 2017 \$, SAAR) | -| `CNP16OV` | Civilian noninstitutional population, 16+ (thousands, monthly) | +$$ +E_t[\xi_{t+1}] = -\frac{\operatorname{cov}_t(m_{t+1},\,\xi_{t+1})}{E_t[m_{t+1}]}. +$$ -The BEA deflates each PCE component by its own chain-type price index internally. -Summing the chained-dollar components introduces a small Fisher-index non-additivity error, but this is negligible for our purposes and avoids the larger error of deflating the ND+SV nominal aggregate by the *overall* PCE deflator (which includes durables with secularly declining prices). +Taking absolute values and applying the **Cauchy--Schwarz inequality** $|\operatorname{cov}(X,Y)| \leq \sigma(X)\,\sigma(Y)$ yields -The processing pipeline is: +```{math} +:label: bhs_hj_bound +\frac{|E_t[\xi_{t+1}]|}{\sigma_t(\xi_{t+1})} +\;\leq\; +\frac{\sigma_t(m_{t+1})}{E_t[m_{t+1}]}. +``` -1. Add real nondurables and services: $C_t^{real} = C_t^{nd} + C_t^{sv}$. -2. Convert to per-capita (millions of dollars per person): divide by the quarterly average of the monthly population series and by $10^6$. - This normalization matches the units in {cite}`BHS_2009`, where the trend-stationary intercept is $\zeta = -4.48$. -3. Compute log consumption: $c_t = \log C_t^{real,pc}$. +The left-hand side of {eq}`bhs_hj_bound` is the **Sharpe ratio**: the expected excess return per unit of return volatility. + +The right-hand side, $\sigma_t(m)/E_t(m)$, is the **market price of risk**: the maximum Sharpe ratio attainable in the market. + +The bound says that the Sharpe ratio of any asset cannot exceed the market price of risk. + +#### Unconditional version + +The bound {eq}`bhs_hj_bound` is stated in conditional terms. +An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$. +{ref}`Exercise 1 ` asks you to derive + +```{math} +:label: bhs_hj_unconditional +\frac{\sigma(m)}{E(m)} +\;\ge\; +\sqrt{b^\top \Sigma_R^{-1} b}, +\qquad +b = \mathbb{1} - E(m)\, E(R). +``` ```{code-cell} ipython3 -start_date = "1947-01-01" -end_date = "2007-01-01" - - -def _fetch_fred_series(series_id, start_date, end_date): - # Keyless pull from FRED CSV endpoint. - url = f"https://fred.stlouisfed.org/graph/fredgraph.csv?id={series_id}" - df = pd.read_csv(url) - if df.empty: - raise ValueError(f"FRED returned an empty table for '{series_id}'") - - # Be robust to header variations (e.g., DATE vs date, BOM, whitespace). 
- df.columns = [str(c).strip().lstrip("\ufeff") for c in df.columns] - date_col = df.columns[0] - value_col = series_id if series_id in df.columns else (df.columns[1] if len(df.columns) > 1 else None) - if value_col is None: - raise ValueError( - f"Unexpected FRED CSV format for '{series_id}'. Columns: {list(df.columns)}" - ) - - dates = pd.to_datetime(df[date_col], errors="coerce") - values = pd.to_numeric(df[value_col], errors="coerce") - out = pd.Series(values.to_numpy(), index=dates, name=series_id).dropna().sort_index() - out = out.loc[start_date:end_date].dropna() - if out.empty: - raise ValueError(f"FRED series '{series_id}' returned no data in sample window") - return out +def hj_std_bound(E_m): + b = np.ones(2) - E_m * R_mean + var_lb = b @ Σ_R_inv @ b + return np.sqrt(np.maximum(var_lb, 0.0)) +``` +### The puzzle -# Fetch real PCE components and population from FRED (no API key required) -real_nd = _fetch_fred_series("PCNDGC96", start_date, end_date) -real_sv = _fetch_fred_series("PCESVC96", start_date, end_date) -pop_m = _fetch_fred_series("CNP16OV", start_date, end_date) +To reconcile formula {eq}`bhs_crra_sdf` with measures of the market price of risk extracted from data on asset returns and prices (like those in Table 1 below) requires a value of $\gamma$ so high that it provokes skepticism --- this is the **equity premium puzzle**. -# Step 1: aggregate real nondurables + services -real_total = real_nd + real_sv +But the puzzle has a second dimension. -# Step 2: align to quarterly frequency first, then convert to per-capita -# real_total is in billions ($1e9), pop is in thousands ($1e3) -# per-capita in millions: real_total * 1e9 / (pop * 1e3) / 1e6 = real_total / pop -real_total_q = real_total.resample("QS").mean() -pop_q = pop_m.resample("QS").mean() -real_pc = (real_total_q / pop_q).dropna() +High values of $\gamma$ that deliver enough volatility $\sigma(m)$ also push the reciprocal of the risk-free rate $E(m)$ down, and therefore away from the Hansen--Jagannathan bounds. -# Restrict to sample period 1948Q1–2006Q4 -real_pc = real_pc.loc["1948-01-01":"2006-12-31"].dropna() +This is the **risk-free rate puzzle** of {cite:t}`Weil_1989`. -# FRED-only fallback: use BEA per-capita quarterly components directly. -# This avoids index-alignment failures in some pandas/FRED combinations. -if real_pc.empty: - nd_pc = _fetch_fred_series("A796RX0Q048SBEA", start_date, end_date) - sv_pc = _fetch_fred_series("A797RX0Q048SBEA", start_date, end_date) - real_pc = ((nd_pc + sv_pc) / 1e6).loc["1948-01-01":"2006-12-31"].dropna() +{cite:t}`Tall2000` showed that recursive preferences with IES $= 1$ can clear the HJ bar while avoiding the risk-free rate puzzle. -if real_pc.empty: - raise RuntimeError("FRED returned no usable observations after alignment/filtering") +### Deriving SDF moments under recursive preferences -# Step 3: log consumption -log_c_data = np.log(real_pc.to_numpy(dtype=float).reshape(-1)) -years_data = (real_pc.index.year + (real_pc.index.month - 1) / 12.0).to_numpy(dtype=float) +The figure below reproduces Tallarini's key diagnostic. 
-print(f"Fetched {len(log_c_data)} quarterly observations from FRED") -print(f"Sample: {years_data[0]:.1f} – {years_data[-1] + 0.25:.1f}") -print(f"Observations: {len(log_c_data)}") -print(f"c_0 = {log_c_data[0]:.3f} (paper Fig 6: ≈ −4.6)") +For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m)/E(m))$ pair for three specifications: time-separable CRRA (crosses), recursive preferences with random-walk consumption (circles), and recursive preferences with trend-stationary consumption (pluses). + +For the two consumption specifications, we can derive closed-form expressions for the unconditional SDF moments under recursive preferences. + +Under recursive preferences with IES $= 1$, the SDF has the form (derived later in {eq}`bhs_sdf`) + +$$ +m_{t+1} = \beta \frac{C_t}{C_{t+1}} \cdot \hat{g}_{t+1}, +$$ + +where $\hat{g}_{t+1}$ is a likelihood-ratio distortion from the continuation value. + +For the random-walk model with $c_{t+1} - c_t = \mu + \sigma_\varepsilon \varepsilon_{t+1}$ and $\varepsilon_{t+1} \sim \mathcal{N}(0,1)$, the distortion is a Gaussian mean shift $w = -\sigma_\varepsilon(\gamma - 1)$, and $\log m_{t+1}$ turns out to be normally distributed: + +$$ +\log m_{t+1} = \log\beta - \mu - \tfrac{1}{2}w^2 + (w - \sigma_\varepsilon)\varepsilon_{t+1}. +$$ + +Its mean and variance are + +$$ +E[\log m] = \log\beta - \mu - \tfrac{1}{2}w^2, +\qquad +\operatorname{Var}(\log m) = (w - \sigma_\varepsilon)^2 = \sigma_\varepsilon^2 \gamma^2. +$$ + +For a lognormal random variable, $E[m] = \exp(E[\log m] + \tfrac{1}{2}\operatorname{Var}(\log m))$ and $\sigma(m)/E[m] = \sqrt{e^{\operatorname{Var}(\log m)} - 1}$. + +Substituting gives the following closed-form expressions ({ref}`Exercise 2 ` asks you to work through the full derivation): + +- *Random walk*: + +```{math} +:label: bhs_Em_rw +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], +``` + +```{math} +:label: bhs_sigma_rw +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. +``` + +Notice that in {eq}`bhs_Em_rw`, because $\sigma_\varepsilon$ is small ($\approx 0.005$), the term $\frac{\sigma_\varepsilon^2}{2}(2\gamma-1)$ grows slowly with $\gamma$, keeping $E[m]$ roughly constant near $1/(1+r^f)$. + +Meanwhile {eq}`bhs_sigma_rw` shows that $\sigma(m)/E[m] \approx \sigma_\varepsilon \gamma$ grows linearly with $\gamma$. + +This is how recursive preferences push volatility toward the HJ bound without distorting the risk-free rate. + +An analogous calculation for the trend-stationary model yields: + +- *Trend stationary*: + +```{math} +:label: bhs_Em_ts +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], +``` + +```{math} +:label: bhs_sigma_ts +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. +``` + +The code below implements these expressions (and the corresponding CRRA moments) to draw Tallarini's figure. 
+ +```{code-cell} ipython3 +def moments_type1_rw(γ): + μ, σ = rw["μ"], rw["σ_ε"] + E_m = β * np.exp(-μ + 0.5 * σ**2 * (2.0 * γ - 1.0)) + var_log_m = (σ * γ) ** 2 + mpr = np.sqrt(np.exp(var_log_m) - 1.0) + return E_m, mpr + + +def moments_type1_ts(γ): + μ, σ, ρ = ts["μ"], ts["σ_ε"], ts["ρ"] + mean_term = 1.0 - (2.0 * (1.0 - β) * (1.0 - γ)) / (1.0 - β * ρ) \ + + (1.0 - ρ) / (1.0 + ρ) + E_m = β * np.exp(-μ + 0.5 * σ**2 * mean_term) + var_term = (((1.0 - β) * (1.0 - γ)) / (1.0 - β * ρ) - 1.0) ** 2 \ + + (1.0 - ρ) / (1.0 + ρ) + var_log_m = σ**2 * var_term + mpr = np.sqrt(np.exp(var_log_m) - 1.0) + return E_m, mpr + + +def moments_crra_rw(γ): + μ, σ = rw["μ"], rw["σ_ε"] + var_log_m = (γ * σ) ** 2 + mean_log_m = np.log(β) - γ * μ + E_m = np.exp(mean_log_m + 0.5 * var_log_m) + mpr = np.sqrt(np.exp(var_log_m) - 1.0) + return E_m, mpr +``` + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: SDF moments and Hansen-Jagannathan bound + name: fig-bhs-1 +--- +γ_grid = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float) + +Em_rw = np.array([moments_type1_rw(γ)[0] for γ in γ_grid]) +MPR_rw = np.array([moments_type1_rw(γ)[1] for γ in γ_grid]) + +Em_ts = np.array([moments_type1_ts(γ)[0] for γ in γ_grid]) +MPR_ts = np.array([moments_type1_ts(γ)[1] for γ in γ_grid]) + +Em_crra = np.array([moments_crra_rw(γ)[0] for γ in γ_grid]) +MPR_crra = np.array([moments_crra_rw(γ)[1] for γ in γ_grid]) + +Em_grid = np.linspace(0.8, 1.01, 1000) +HJ_std = np.array([hj_std_bound(x) for x in Em_grid]) + +fig, ax = plt.subplots(figsize=(7, 5)) +ax.plot(Em_grid, HJ_std, lw=2, color="black", + label="Hansen-Jagannathan bound") +ax.plot(Em_rw, MPR_rw, "o", lw=2, + label="recursive, random walk") +ax.plot(Em_ts, MPR_ts, "+", lw=2, + label="recursive, trend stationary") +ax.plot(Em_crra, MPR_crra, "x", lw=2, + label="time-separable CRRA") + +ax.set_xlabel(r"$E(m)$") +ax.set_ylabel(r"$\sigma(m)/E(m)$") +ax.legend(frameon=False) +ax.set_xlim(0.8, 1.01) +ax.set_ylim(0.0, 0.42) + +plt.tight_layout() +plt.show() ``` -### Consumption plans and the state-space representation +The crosses trace the familiar CRRA failure: as $\gamma$ rises, $\sigma(m)/E(m)$ grows but $E(m)$ falls well below the range consistent with the observed risk-free rate. + +This is the risk-free-rate puzzle of {cite:t}`Weil_1989`. + +The circles and pluses show Tallarini's solution. + +Recursive utility with IES $= 1$ pushes volatility upward while keeping $E(m)$ roughly constant near $1/(1+r^f)$. + +For the random-walk model, the bound is reached around $\gamma = 50$; for the trend-stationary model, around $\gamma = 75$. + +The quantitative achievement is real. + +But Lucas's challenge still stands: what microeconomic evidence supports $\gamma = 50$? + +To make the reinterpretation precise, we now lay out the statistical environment and the preference specifications. + +## The choice setting + +The calibration uses quarterly U.S. data from 1948Q2--2006Q4 for consumption growth rates (a sample length of $T = 235$ quarters). + +We consider two consumption-growth specifications (random walk and trend stationary) with parameter estimates and asset-return moments from {cite}`BHS_2009`. + +### Shocks and consumption plans + +We cast the analysis in terms of a general class of consumption plans. -{cite}`BHS_2009` cast the analysis in terms of a general class of consumption plans. Let $x_t$ be an $n \times 1$ state vector and $\varepsilon_{t+1}$ an $m \times 1$ shock. 
+ A consumption plan belongs to the set $\mathcal{C}(A, B, H; x_0)$ if it admits the recursive representation ```{math} @@ -217,13 +393,15 @@ c_t = H x_t, ``` where the eigenvalues of $A$ are bounded in modulus by $1/\sqrt{\beta}$. + The time-$t$ element of a consumption plan can therefore be written as ```{math} c_t = H\!\left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\right) + HA^t x_0. ``` -The equivalence theorems and Bellman equations in the paper are stated for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. +The equivalence theorems and Bellman equations below hold for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. + The random-walk and trend-stationary models below are two special cases. ### Consumption dynamics @@ -236,7 +414,15 @@ The random-walk specification is c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim \mathcal{N}(0, 1). ``` -The trend-stationary specification can be written as a deterministic trend plus a stationary AR(1) component {cite}`BHS_2009`: +Iterating forward yields + +```{math} +c_t = c_0 + t\mu + \sigma_\varepsilon(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1), +\qquad +t \ge 1. +``` + +The trend-stationary specification can be written as a deterministic trend plus a stationary AR(1) component: ```{math} c_t = \zeta + \mu t + z_t, @@ -246,38 +432,27 @@ z_{t+1} = \rho z_t + \sigma_\varepsilon \varepsilon_{t+1}, \varepsilon_{t+1} \sim \mathcal{N}(0, 1). ``` +With $z_0 = c_0 - \zeta$, this implies the explicit representation + +```{math} +c_t += +\rho^t c_0 + \mu t + (1-\rho^t)\zeta ++ +\sigma_\varepsilon(\varepsilon_t + \rho \varepsilon_{t-1} + \cdots + \rho^{t-1}\varepsilon_1), +\qquad +t \ge 1. +``` + Equivalently, defining the detrended series $\tilde c_t := c_t - \mu t$, ```{math} \tilde c_{t+1} - \zeta = \rho(\tilde c_t - \zeta) + \sigma_\varepsilon \varepsilon_{t+1}. ``` -Table 2 in {cite}`BHS_2009` reports $(\mu, \sigma_\varepsilon)$ for the random walk and $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the trend-stationary case. +The estimated parameters are $(\mu, \sigma_\varepsilon)$ for the random walk and $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the trend-stationary case. ```{code-cell} ipython3 -# Preferences and sample length -β = 0.995 -T = 235 # quarterly sample length used in the paper - -# Table 2 parameters -rw = dict(μ=0.00495, σ_ε=0.0050) -ts = dict(μ=0.00418, σ_ε=0.0050, ρ=0.980, ζ=-4.48) - -# Table 1 moments, converted from percent to decimals -r_e_mean, r_e_std = 0.0227, 0.0768 -r_f_mean, r_f_std = 0.0032, 0.0061 -r_excess_std = 0.0767 - -R_mean = np.array([1.0 + r_e_mean, 1.0 + r_f_mean]) # gross returns -cov_erf = (r_e_std**2 + r_f_std**2 - r_excess_std**2) / 2.0 -Σ_R = np.array( - [ - [r_e_std**2, cov_erf], - [cov_erf, r_f_std**2], - ] -) -Σ_R_inv = np.linalg.inv(Σ_R) - print("Table 2 parameters") print(f"random walk: μ={rw['μ']:.5f}, σ_ε={rw['σ_ε']:.5f}") print( @@ -291,24 +466,53 @@ print(f"E[r_f]={r_f_mean:.4f}, std[r_f]={r_f_std:.4f}") print(f"std[r_e-r_f]={r_excess_std:.4f}") ``` -We can verify Table 2 by computing sample moments of log consumption growth from our FRED data: +### Overview of agents I, II, III, and IV -```{code-cell} ipython3 -# Growth rates: 1948Q2 to 2006Q4 (T = 235 quarters) -Δc = np.diff(log_c_data) +We compare four preference specifications over consumption plans $C^\infty \in \mathcal{C}$. 
-μ_hat = Δc.mean() -σ_hat = Δc.std(ddof=1) +**Type I agent (Kreps--Porteus--Epstein--Zin--Tallarini)** with +- a discount factor $\beta \in (0,1)$; +- an intertemporal elasticity of substitution fixed at $1$; +- a risk-aversion parameter $\gamma \ge 1$; and +- an approximating conditional density $\pi(\cdot)$ for shocks and its implied joint distribution $\Pi_\infty(\cdot \mid x_0)$. -print("Sample estimates from FRED data vs Table 2:") -print(f" μ̂ = {μ_hat:.5f} (Table 2 RW: {rw['μ']:.5f})") -print(f" σ̂_ε = {σ_hat:.4f} (Table 2: {rw['σ_ε']:.4f})") -print(f" T = {len(Δc)} quarters") -``` +**Type II agent (multiplier preferences)** with +- $\beta \in (0,1)$; +- IES $=1$; +- unit risk aversion; +- an approximating model $\Pi_\infty(\cdot \mid x_0)$; and +- a penalty parameter $\theta > 0$ that discourages probability distortions using relative entropy. + +**Type III agent (constraint preferences)** with +- $\beta \in (0,1)$; +- IES $=1$; +- unit risk aversion; +- an approximating model $\Pi_\infty(\cdot \mid x_0)$; and +- a bound $\eta$ on discounted relative entropy. + +**Type IV agent (pessimistic ex post Bayesian)** with +- $\beta \in (0,1)$; +- IES $=1$; +- unit risk aversion; and +- a single pessimistic joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ induced by the type II worst-case distortion. + +Two equivalence results organize the analysis. + +Types I and II are observationally equivalent in the strong sense that they have identical preferences over $\mathcal{C}$ (once parameters are mapped appropriately). + +Types III and IV are observationally equivalent in a weaker but still useful sense: for the particular endowment process taken as given, they deliver the same worst-case pricing implications as a type II agent (for the $\theta$ that implements the entropy constraint). ## Preferences, distortions, and detection -The type I recursion is +We now formalize each of the four agent types and develop the equivalence results that connect them. + +We begin with the type I (Kreps--Porteus--Epstein--Zin--Tallarini) agent, whose preferences are defined by a recursion over certainty equivalents, then show how a change of variables converts it into a risk-sensitive recursion that is observationally equivalent to the type II agent's max--min problem. + +Along the way we introduce the likelihood-ratio distortion that appears in the stochastic discount factor and develop the detection-error probability that will serve as our new calibration language. + +### The transformed continuation value + +The type I (Kreps--Porteus--Epstein--Zin--Tallarini) recursion with IES $= 1$ and risk-aversion parameter $\gamma$ is ```{math} :label: bhs_type1_recursion @@ -320,9 +524,7 @@ The type I recursion is \log E_t\left[(V_{t+1})^{1-\gamma}\right]. ``` -### The transformed continuation value - -A key intermediate step in {cite}`BHS_2009` is to define the transformed continuation value +A key intermediate step is to define the transformed continuation value ```{math} :label: bhs_Ut_def @@ -336,15 +538,14 @@ and the robustness parameter \theta = \frac{-1}{(1-\beta)(1-\gamma)}. ``` -Substituting into {eq}`bhs_type1_recursion` yields the **risk-sensitive recursion** +Substituting into {eq}`bhs_type1_recursion` yields the **risk-sensitive recursion** ({ref}`Exercise 3 ` asks you to verify this step) ```{math} :label: bhs_risk_sensitive U_t = c_t - \beta\theta \log E_t\!\left[\exp\!\left(\frac{-U_{t+1}}{\theta}\right)\right]. 
``` -When $\gamma = 1$ (equivalently $\theta = +\infty$), the $\log E \exp$ term reduces to $E_t U_{t+1}$ -and the recursion becomes standard discounted expected log utility: $U_t = c_t + \beta E_t U_{t+1}$. +When $\gamma = 1$ (equivalently $\theta = +\infty$), the $\log E \exp$ term reduces to $E_t U_{t+1}$ and the recursion becomes standard discounted expected log utility: $U_t = c_t + \beta E_t U_{t+1}$. For consumption plans in $\mathcal{C}(A, B, H; x_0)$, the recursion {eq}`bhs_risk_sensitive` implies the Bellman equation @@ -357,7 +558,7 @@ The stochastic discount factor can then be written as ```{math} :label: bhs_sdf_Ut -m_{t+1,t} +m_{t+1} = \beta \frac{C_t}{C_{t+1}} \cdot @@ -368,7 +569,8 @@ The second factor is the likelihood-ratio distortion $\hat g_{t+1}$: an exponent ### Martingale likelihood ratios -To formalize model distortions, {cite}`BHS_2009` use a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$ as a Radon--Nikodym derivative. +To formalize model distortions, we use a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$ as a Radon--Nikodym derivative. + Its one-step increments ```{math} @@ -382,6 +584,7 @@ G_0 = 1, ``` define distorted conditional expectations: $\tilde E_t[b_{t+1}] = E_t[g_{t+1}\,b_{t+1}]$. + The conditional relative entropy of the distortion is $E_t[g_{t+1}\log g_{t+1}]$, and the discounted entropy over the entire path is $\beta E\bigl[\sum_{t=0}^{\infty} \beta^t G_t\,E_t(g_{t+1}\log g_{t+1})\,\big|\,x_0\bigr]$. ### Type II: multiplier preferences @@ -397,6 +600,7 @@ A type II agent's **multiplier** preference ordering over consumption plans $C^\ ``` where $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \ge 0$, and $G_0 = 1$. + The parameter $\theta > 0$ penalizes the relative entropy of probability distortions. The value function satisfies the Bellman equation @@ -411,9 +615,10 @@ c + \min_{g(\varepsilon) \ge 0}\; ``` subject to $\int g(\varepsilon)\,\pi(\varepsilon)\,d\varepsilon = 1$. + Note that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty --- this is the key structural feature that makes $\hat g$ a likelihood ratio. -The minimizer is +The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equivalence $W \equiv U$) ```{math} :label: bhs_ghat @@ -427,7 +632,9 @@ Substituting {eq}`bhs_ghat` back into {eq}`bhs_bellman_type2` gives $$W(x) = c - \beta\theta \log \int \exp\!\left[\frac{-W(Ax + B\varepsilon)}{\theta}\right]\pi(\varepsilon)\,d\varepsilon,$$ which is identical to {eq}`bhs_bellman_type1`. + Therefore $W(x) \equiv U(x)$, establishing that **types I and II are observationally equivalent** over elements of $\mathcal{C}(A,B,H;x_0)$. + The mapping between parameters is ```{math} @@ -450,8 +657,8 @@ def γ_from_θ(θ, β=β): ### Type III: constraint preferences Type III (constraint) preferences replace the entropy penalty with a hard bound. -The agent minimizes expected discounted log consumption under the worst-case model, -subject to a cap $\eta$ on discounted relative entropy: + +The agent minimizes expected discounted log consumption under the worst-case model, subject to a cap $\eta$ on discounted relative entropy: ```{math} J(x_0) @@ -466,9 +673,7 @@ subject to $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \ge 0$, $G_0 = 1 \beta E\!\left[\sum_{t=0}^{\infty} \beta^t G_t\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\,\Big|\,x_0\right] \le \eta. 
``` -The Lagrange multiplier on the entropy constraint is $\theta$, which connects type III to type II: -for the particular $A, B, H$ and $\theta$ used to derive the worst-case joint distribution $\hat\Pi_\infty$, -the shadow prices of uncertain claims for a type III agent match those of a type II agent. +The Lagrange multiplier on the entropy constraint is $\theta$, which connects type III to type II: for the particular $A, B, H$ and $\theta$ used to derive the worst-case joint distribution $\hat\Pi_\infty$, the shadow prices of uncertain claims for a type III agent match those of a type II agent. ### Type IV: ex post Bayesian @@ -479,6 +684,7 @@ Type IV is an ordinary expected-utility agent with log preferences evaluated und ``` The joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ is the one associated with the type II agent's worst-case distortion. + For the particular $A, B, H$ and $\theta$ used to construct $\hat\Pi_\infty$, the type IV value function equals $J(x)$ from type III. ### Stochastic discount factor @@ -487,7 +693,7 @@ Across all four types, the stochastic discount factor can be written compactly a ```{math} :label: bhs_sdf -m_{t+1,t} +m_{t+1} = \beta \frac{C_t}{C_{t+1}} \hat g_{t+1}. ``` @@ -495,17 +701,19 @@ m_{t+1,t} The distortion $\hat g_{t+1}$ is a likelihood ratio between the approximating and worst-case one-step models. With log utility, $C_t/C_{t+1} = \exp(-(c_{t+1}-c_t))$ is the usual intertemporal marginal rate of substitution. + Robustness multiplies that term by $\hat g_{t+1}$, so uncertainty aversion enters pricing only through the distortion. ### Gaussian mean-shift distortions Under the random-walk model, the shock is $\varepsilon_{t+1} \sim \mathcal{N}(0, 1)$. -The worst-case model shifts its mean to $-w$, which implies the likelihood ratio + +The worst-case model shifts its mean to $w$ (which will be negative under our calibrations), which implies the likelihood ratio ({ref}`Exercise 5 ` verifies the properties of this distortion) ```{math} \hat g_{t+1} = -\exp\left(-w \varepsilon_{t+1} - \frac{1}{2}w^2\right), +\exp\left(w \varepsilon_{t+1} - \frac{1}{2}w^2\right), \qquad E_t[\hat g_{t+1}] = 1. ``` @@ -557,7 +765,7 @@ p(\theta^{-1}) \frac{1}{2}\left(p_A + p_B\right), ``` -and in our Gaussian mean-shift case reduces to +and in our Gaussian mean-shift case reduces to ({ref}`Exercise 6 ` derives this closed form) ```{math} :label: bhs_detection_closed @@ -583,7 +791,10 @@ def θ_from_detection_probability(p, model): ### Likelihood-ratio testing and detection errors +The likelihood-ratio machinery used here connects to several other lectures: {ref}`Likelihood Ratio Processes ` develops the properties of likelihood ratios in detail, {ref}`Heterogeneous Beliefs and Financial Markets ` applies them to asset pricing with disagreement, and {ref}`A Problem that Stumped Milton Friedman ` uses sequential likelihood-ratio tests in a closely related decision problem. + Let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on a sample of length $T$. + Define ```{math} @@ -593,6 +804,7 @@ p_B = \Pr_B(L_T > 0), ``` where $\Pr_A$ and $\Pr_B$ denote probabilities under the approximating and worst-case models. + Then $p(\theta^{-1}) = \frac{1}{2}(p_A + p_B)$ is the average probability of choosing the wrong model. In the Gaussian mean-shift setting, $L_T$ is normal with mean $\pm \tfrac{1}{2}w^2T$ and variance $w^2T$, which yields the closed-form expression above. 
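
As a quick check on {eq}`bhs_detection_closed`, the cell below simulates samples of length $T$ under the approximating model and under a mean-shifted worst-case model, classifies each sample by the sign of its log likelihood ratio, and compares the simulated error frequency with the closed form. The shift `w_check` is an illustrative value chosen only for this check, not one of the calibrated distortions used later.

```{code-cell} ipython3
# Monte Carlo check of the closed-form detection-error probability
# (illustrative mean shift, not a calibrated worst case)
rng = np.random.default_rng(0)

w_check = -0.05      # illustrative worst-case mean shift for ε
n_sims = 10_000

# samples of length T under the approximating (mean 0) and
# worst-case (mean w_check) models
ε_A = rng.standard_normal((n_sims, T))
ε_B = rng.standard_normal((n_sims, T)) + w_check


def llr(ε, w):
    # log likelihood ratio of the worst-case to the approximating model
    return w * ε.sum(axis=1) - 0.5 * w**2 * T


# error under A: the worst case looks more likely; under B: the reverse
p_A_mc = np.mean(llr(ε_A, w_check) > 0)
p_B_mc = np.mean(llr(ε_B, w_check) < 0)
p_mc = 0.5 * (p_A_mc + p_B_mc)

p_closed = norm.cdf(-0.5 * np.abs(w_check) * np.sqrt(T))
print(f"Monte Carlo detection-error probability: {p_mc:.4f}")
print(f"closed-form detection-error probability: {p_closed:.4f}")
```

The two numbers should agree up to Monte Carlo error, which is the sense in which the closed form summarizes the likelihood-ratio test.
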
@@ -637,170 +849,35 @@ def η_from_θ(θ, model): This is the mapping behind the right panel of the detection-probability figure below. -## Tallarini's success and its cost - -Hansen and Jagannathan {cite}`Hansen_Jagannathan_1991` showed that any valid stochastic discount factor $m_{t+1,t}$ must satisfy a volatility bound: $\sigma(m)/E(m)$ must be at least as large as the maximum Sharpe ratio attainable in the market. -Using postwar U.S. returns on the value-weighted NYSE and Treasury bills, this bound sets a -high bar that time-separable CRRA preferences struggle to clear without also distorting the -risk-free rate. - -In terms of the vector of gross returns $R_{t+1}$ with mean $E(R)$ and covariance matrix $\Sigma_R$, -the bound can be written as - -```{math} -\frac{\sigma(m)}{E(m)} -\;\ge\; -\sqrt{b^\top \Sigma_R^{-1} b}, -\qquad -b = \mathbf{1} - E(m) E(R). -``` - -```{code-cell} ipython3 -def hj_std_bound(E_m): - b = np.ones(2) - E_m * R_mean - var_lb = b @ Σ_R_inv @ b - return np.sqrt(np.maximum(var_lb, 0.0)) -``` - -Tallarini {cite}`Tallarini_2000` showed that recursive preferences with IES $= 1$ can clear this bar. -By separating risk aversion $\gamma$ from the IES, the recursion pushes $\sigma(m)/E(m)$ upward -while leaving $E(m)$ roughly consistent with the observed risk-free rate. - -For the two consumption specifications, {cite}`BHS_2009` derive closed-form expressions for the unconditional SDF moments. - -**Random walk** (eqs 15--16 of the paper): - -```{math} -:label: bhs_Em_rw -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], -``` - -```{math} -:label: bhs_sigma_rw -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. -``` - -**Trend stationary** (eqs 17--18): - -```{math} -:label: bhs_Em_ts -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], -``` - -```{math} -:label: bhs_sigma_ts -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. -``` - -These are what the code below implements. - -The figure below makes this visible. -For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m)/E(m))$ pair -for three specifications: time-separable CRRA (crosses), type I recursive preferences with random-walk consumption (circles), and type I recursive preferences with trend-stationary consumption (pluses). 
- -```{code-cell} ipython3 -def moments_type1_rw(γ): - θ = θ_from_γ(γ) - w = w_from_θ(θ, "rw") - var_log_m = (w - rw["σ_ε"]) ** 2 - mean_log_m = np.log(β) - rw["μ"] - 0.5 * w**2 - E_m = np.exp(mean_log_m + 0.5 * var_log_m) - mpr = np.sqrt(np.exp(var_log_m) - 1.0) - return E_m, mpr - - -def moments_type1_ts(γ): - θ = θ_from_γ(γ) - w = w_from_θ(θ, "ts") - var_z = ts["σ_ε"] ** 2 / (1.0 - ts["ρ"] ** 2) - var_log_m = (1.0 - ts["ρ"]) ** 2 * var_z + (w - ts["σ_ε"]) ** 2 - mean_log_m = np.log(β) - ts["μ"] - 0.5 * w**2 - E_m = np.exp(mean_log_m + 0.5 * var_log_m) - mpr = np.sqrt(np.exp(var_log_m) - 1.0) - return E_m, mpr - - -def moments_crra_rw(γ): - var_log_m = (γ * rw["σ_ε"]) ** 2 - mean_log_m = np.log(β) - γ * rw["μ"] - E_m = np.exp(mean_log_m + 0.5 * var_log_m) - mpr = np.sqrt(np.exp(var_log_m) - 1.0) - return E_m, mpr -``` - -```{code-cell} ipython3 ---- -mystnb: - figure: - caption: stochastic discount factor moments and the Hansen-Jagannathan volatility - bound - name: fig-bhs-1 ---- -γ_grid = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float) - -Em_rw = np.array([moments_type1_rw(γ)[0] for γ in γ_grid]) -MPR_rw = np.array([moments_type1_rw(γ)[1] for γ in γ_grid]) - -Em_ts = np.array([moments_type1_ts(γ)[0] for γ in γ_grid]) -MPR_ts = np.array([moments_type1_ts(γ)[1] for γ in γ_grid]) - -Em_crra = np.array([moments_crra_rw(γ)[0] for γ in γ_grid]) -MPR_crra = np.array([moments_crra_rw(γ)[1] for γ in γ_grid]) - -Em_grid = np.linspace(0.8, 1.01, 1000) -HJ_std = np.array([hj_std_bound(x) for x in Em_grid]) - -fig, ax = plt.subplots(figsize=(7, 5)) -ax.plot(Em_grid, HJ_std, lw=2, color="black", label="Hansen-Jagannathan bound") -ax.plot(Em_rw, MPR_rw, "o", lw=2, label="type I, random walk") -ax.plot(Em_ts, MPR_ts, "+", lw=2, label="type I, trend stationary") -ax.plot(Em_crra, MPR_crra, "x", lw=2, label="time-separable CRRA") - -ax.set_xlabel(r"$E(m)$") -ax.set_ylabel(r"$\sigma(m)/E(m)$") -ax.legend(frameon=False) -ax.set_xlim(0.8, 1.01) -ax.set_ylim(0.0, 0.42) - -plt.tight_layout() -plt.show() -``` - -The crosses trace the familiar CRRA failure: as $\gamma$ rises, $\sigma(m)/E(m)$ grows but $E(m)$ falls well below the range consistent with the observed risk-free rate. -This is the risk-free-rate puzzle of Weil {cite}`Weil_1989`. - -The circles and pluses show Tallarini's solution. -Recursive utility with IES $= 1$ pushes volatility upward while keeping $E(m)$ roughly constant near $1/(1+r^f)$. -For the random-walk model, the bound is reached around $\gamma = 50$; for the trend-stationary model, around $\gamma = 75$. - -The quantitative achievement is real. -But Lucas's challenge still stands: what microeconomic evidence supports $\gamma = 50$? -That tension is the starting point for the reinterpretation that follows. - ## A new calibration language: detection-error probabilities If $\gamma$ should not be calibrated by introspection about atemporal gambles, what replaces it? The answer is a statistical test. + Fix a sample size $T$ (here 235 quarters, matching the postwar U.S. data). -For a given $\theta$, compute the worst-case model and ask: -if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? + +For a given $\theta$, compute the worst-case model and ask: if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? 
+ That fraction is the detection-error probability $p(\theta^{-1})$. A high $p$ (near 0.5) means the two models are nearly indistinguishable --- the consumer's fear is hard to rule out. + A low $p$ means the worst case is easy to reject and the robustness concern is less compelling. The left panel below plots $p(\theta^{-1})$ against $\theta^{-1}$ for the two consumption specifications. + Notice that the same numerical $\theta$ corresponds to very different detection probabilities across models, because baseline dynamics differ. + The right panel resolves this by plotting detection probabilities against discounted relative entropy $\eta$, which normalizes the statistical distance. + Indexed by $\eta$, the two curves coincide. ```{code-cell} ipython3 --- mystnb: figure: - caption: detection probabilities under random-walk and trend-stationary approximating - models + caption: Detection probabilities across two models name: fig-bhs-2 --- θ_inv_grid = np.linspace(0.0, 1.8, 400) @@ -834,11 +911,12 @@ plt.show() ``` This is why detection-error probabilities (or equivalently, discounted entropy) are the right cross-model yardstick. -Holding $\theta$ fixed when switching from a random walk to a trend-stationary specification -implicitly changes how much misspecification the consumer fears. + +Holding $\theta$ fixed when switching from a random walk to a trend-stationary specification implicitly changes how much misspecification the consumer fears. + Holding $\eta$ or $p$ fixed keeps the statistical difficulty of detecting misspecification constant. -The explicit mapping that equates discounted entropy across models is (eq 41 of the paper): +The explicit mapping that equates discounted entropy across models is ({ref}`Exercise 7 ` derives it): ```{math} :label: bhs_theta_cross_model @@ -848,16 +926,15 @@ The explicit mapping that equates discounted entropy across models is (eq 41 of \frac{1-\beta}{1-\rho\beta}\;\theta_{\text{RW}}. ``` -At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$, this simplifies to -$\theta_{\text{TS}} = \frac{1-\beta}{1-\rho\beta}\,\theta_{\text{RW}}$. -Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ is much less than one, -so holding entropy fixed requires a substantially smaller $\theta$ (stronger robustness) for the trend-stationary model than for the random walk. +At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$, this simplifies to $\theta_{\text{TS}} = \frac{1-\beta}{1-\rho\beta}\,\theta_{\text{RW}}$. + +Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ is much less than one, so holding entropy fixed requires a substantially smaller $\theta$ (stronger robustness) for the trend-stationary model than for the random walk. ## The punchline: detection probabilities unify the two models We can now redraw Tallarini's figure using the new language. -For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, -invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m),\;\sigma(m)/E(m))$ pair. + +For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m),\;\sigma(m)/E(m))$ pair. 
```{code-cell} ipython3 p_points = np.array([0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]) @@ -882,7 +959,7 @@ for p, g1, g2 in zip(p_points, γ_rw_points, γ_ts_points): --- mystnb: figure: - caption: pricing loci obtained from common detection probabilities + caption: Pricing loci from common detectability name: fig-bhs-3 --- fig, ax = plt.subplots(figsize=(7, 5)) @@ -903,27 +980,28 @@ plt.show() The striking result: the random-walk and trend-stationary loci nearly coincide. Recall that under Tallarini's $\gamma$-calibration, reaching the Hansen--Jagannathan bound required $\gamma \approx 50$ for the random walk but $\gamma \approx 75$ for the trend-stationary model --- very different numbers for the "same" preference parameter. + Under detection-error calibration, both models reach the bound at the same detectability level (around $p = 0.05$). The model dependence was an artifact of using $\gamma$ as a cross-model yardstick. -Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: -a representative consumer with moderate, difficult-to-dismiss fears about model misspecification -behaves as if she had very high risk aversion. + +Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: a representative consumer with moderate, difficult-to-dismiss fears about model misspecification behaves as if she had very high risk aversion. ## What do risk premia measure? Two mental experiments -Lucas {cite}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate -aggregate fluctuations. +Lucas {cite}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. + His answer --- very little --- rested on the assumption that the consumer knows the data-generating process. The robust reinterpretation introduces a second, distinct mental experiment. -Instead of eliminating all randomness, suppose we keep randomness but remove the consumer's -fear of model misspecification (set $\theta = \infty$). + +Instead of eliminating all randomness, suppose we keep randomness but remove the consumer's fear of model misspecification (set $\theta = \infty$). + How much would she pay for that relief alone? -Formally, define $\Delta c_0$ as a permanent proportional reduction in initial consumption that leaves the agent indifferent between -the original environment and a counterfactual in which either (i) risk alone is removed or (ii) model uncertainty is removed. -Because utility is log and the consumption process is Gaussian, these compensations are available in closed form. +Formally, define $\Delta c_0$ as a permanent proportional reduction in initial consumption that leaves the agent indifferent between the original environment and a counterfactual in which either (i) risk alone is removed or (ii) model uncertainty is removed. + +Because utility is log and the consumption process is Gaussian, these compensations are available in closed form ({ref}`Exercise 8 ` derives them). For type II preferences in the random-walk model, the decomposition is @@ -947,9 +1025,10 @@ For type III preferences in the random-walk model, the uncertainty term is twice \frac{\beta \sigma_\varepsilon^2}{(1-\beta)^2\theta}. 
``` -For the trend-stationary model, denominators replace $(1-\beta)$ with $(1-\beta \rho)$ or $(1-\beta \rho^2)$ as detailed in Table 3 of {cite}`BHS_2009`, but the qualitative message is the same. +For the trend-stationary model, denominators replace $(1-\beta)$ with $(1-\beta \rho)$ or $(1-\beta \rho^2)$, but the qualitative message is the same. The risk-only term $\Delta c_0^{risk}$ is tiny at postwar consumption volatility --- this is Lucas's well-known result. + The model-uncertainty term $\Delta c_0^{uncertainty}$ can be first order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. ## Visualizing the welfare decomposition @@ -963,8 +1042,8 @@ p_star = 0.10 w_star = w_from_θ(θ_star, "rw") # Type II compensations, random walk model -comp_risk_only = β * rw["σ_ε"] ** 2 / (2.0 * (1.0 - β)) -comp_risk_unc = comp_risk_only + β * rw["σ_ε"] ** 2 / (2.0 * (1.0 - β) ** 2 * θ_star) +comp_risk_only = β * rw["σ_ε"]**2 / (2.0 * (1.0 - β)) +comp_risk_unc = comp_risk_only + β * rw["σ_ε"]**2 / (2.0 * (1.0 - β)**2 * θ_star) # Two useful decompositions in levels risk_only_pct = 100.0 * (np.exp(comp_risk_only) - 1.0) @@ -986,7 +1065,7 @@ mean_base = rw["μ"] * t std_base = rw["σ_ε"] * np.sqrt(t) # Certainty equivalent line from Eq. (47), shifted by compensating variations -certainty_slope = rw["μ"] + 0.5 * rw["σ_ε"] ** 2 +certainty_slope = rw["μ"] + 0.5 * rw["σ_ε"]**2 ce_risk = -comp_risk_only + certainty_slope * t ce_risk_unc = -comp_risk_unc + certainty_slope * t @@ -999,7 +1078,7 @@ mean_high = (rw["μ"] - rw["σ_ε"] * w_star) * t --- mystnb: figure: - caption: certainty-equivalent paths and the set of nearby models under robustness + caption: Certainty equivalents under robustness name: fig-bhs-4 --- fig, axes = plt.subplots(1, 2, figsize=(12, 4)) @@ -1031,27 +1110,29 @@ plt.tight_layout() plt.show() ``` -**Left panel.** -The small gap between the baseline mean path and the "risk only" certainty equivalent is Lucas's result: -at postwar consumption volatility, the welfare gain from eliminating well-understood aggregate risk is tiny. +**Left panel.** The small gap between the baseline mean path and the "risk only" certainty equivalent is Lucas's result: at postwar consumption volatility, the welfare gain from eliminating well-understood aggregate risk is tiny. + +The much larger gap between the baseline and the "risk + uncertainty" certainty equivalent is the new object. -The much larger gap between the baseline and the "risk + uncertainty" certainty equivalent -is the new object. Most of that gap is compensation for model uncertainty, not risk. -**Right panel.** -The cloud of nearby models shows what the robust consumer guards against. -The red-shaded and green-shaded fans correspond to pessimistic and optimistic mean-shift distortions -whose detection-error probability is $p = 0.10$. +**Right panel.** The cloud of nearby models shows what the robust consumer guards against. + +The red-shaded and green-shaded fans correspond to pessimistic and optimistic mean-shift distortions whose detection-error probability is $p = 0.10$. + These models are statistically close to the baseline (blue) but imply very different long-run consumption levels. + The consumer's caution against such alternatives is what drives the large certainty-equivalent gap in the left panel. ## How large are the welfare gains from resolving model uncertainty? A type III (constraint-preference) agent evaluates the worst model inside an entropy ball of radius $\eta$. 
+ As $\eta$ grows, the set of plausible misspecifications expands and the welfare cost of confronting model uncertainty rises. -Because $\eta$ is abstract, {cite}`BHS_2009` instead index these costs by the associated detection error probability $p(\eta)$. -The figure below reproduces their display: compensation for removing model uncertainty, measured as a proportion of consumption, plotted against $p(\eta)$. + +Because $\eta$ is abstract, we instead index these costs by the associated detection error probability $p(\eta)$. + +The figure below plots compensation for removing model uncertainty, measured as a proportion of consumption, against $p(\eta)$. ```{code-cell} ipython3 η_grid = np.linspace(0.0, 5.0, 300) @@ -1069,12 +1150,12 @@ mask_w = w_abs_grid > 0.0 gain_rw = np.where( np.isinf(θ_rw_from_η), 0.0, - β * rw["σ_ε"] ** 2 / ((1.0 - β) ** 2 * θ_rw_from_η), + β * rw["σ_ε"]**2 / ((1.0 - β)**2 * θ_rw_from_η), ) gain_ts = np.where( np.isinf(θ_ts_from_η), 0.0, - β * ts["σ_ε"] ** 2 / ((1.0 - β * ts["ρ"]) ** 2 * θ_ts_from_η), + β * ts["σ_ε"]**2 / ((1.0 - β * ts["ρ"])**2 * θ_ts_from_η), ) # Convert log compensation to percent of initial consumption in levels @@ -1093,7 +1174,7 @@ gain_ts_plot = gain_ts_pct[order] --- mystnb: figure: - caption: type III compensation for model uncertainty across detection-error probabilities + caption: Type III uncertainty compensation curve name: fig-bhs-5 --- fig, ax = plt.subplots(figsize=(7, 4)) @@ -1111,25 +1192,106 @@ plt.show() The random-walk model delivers somewhat larger costs than the trend-stationary model at the same detection-error probability, but both curves dwarf the classic Lucas cost of business cycles. -To put the magnitudes in perspective: Lucas estimated that eliminating all aggregate consumption risk -is worth roughly 0.05% of consumption. -At detection-error probabilities of 10--20%, the model-uncertainty -compensation alone runs to several percent of consumption. +To put the magnitudes in perspective: Lucas estimated that eliminating all aggregate consumption risk is worth roughly 0.05% of consumption. + +At detection-error probabilities of 10--20%, the model-uncertainty compensation alone runs to several percent of consumption. This is the welfare counterpart to the pricing result. -The large risk premia that Tallarini matched with high $\gamma$ are, under the robust reading, -compensations for bearing model uncertainty --- and the implied welfare gains from resolving that uncertainty are correspondingly large. + +The large risk premia that Tallarini matched with high $\gamma$ are, under the robust reading, compensations for bearing model uncertainty --- and the implied welfare gains from resolving that uncertainty are correspondingly large. ## Why doesn't learning eliminate these fears? -A natural objection: if the consumer has 235 quarters of data, why can't she learn the true drift -well enough to dismiss the worst-case model? +A natural objection: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? The answer is that drift is a low-frequency feature. -Estimating the mean of a random walk to the precision needed to reject small but economically meaningful -shifts requires far more data than estimating volatility. + +Estimating the mean of a random walk to the precision needed to reject small but economically meaningful shifts requires far more data than estimating volatility. + The figure below makes this concrete. 
+Consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. + +We construct real per-capita nondurables-plus-services consumption from three FRED series: + +| FRED series | Description | +| --- | --- | +| `PCNDGC96` | Real PCE: nondurable goods (billions of chained 2017 \$, SAAR) | +| `PCESVC96` | Real PCE: services (billions of chained 2017 \$, SAAR) | +| `CNP16OV` | Civilian noninstitutional population, 16+ (thousands, monthly) | + +The processing pipeline is: + +1. Add real nondurables and services: $C_t^{real} = C_t^{nd} + C_t^{sv}$. +2. Convert to per-capita: divide by the quarterly average of the monthly population series. +3. Compute log consumption: $c_t = \log C_t^{real,pc}$. + +When we plot *levels* of log consumption, we align the time index to 1948Q1--2006Q4, which yields $T+1 = 236$ quarterly observations. + +```{code-cell} ipython3 +start_date = dt.datetime(1947, 1, 1) +end_date = dt.datetime(2007, 1, 1) + + +def _read_fred_series(series_id, start_date, end_date): + series = web.DataReader(series_id, "fred", start_date, end_date)[series_id] + series = pd.to_numeric(series, errors="coerce").dropna().sort_index() + if series.empty: + raise ValueError(f"FRED series '{series_id}' returned no data in sample window") + return series + + +# Fetch real PCE components and population from FRED +real_nd = _read_fred_series("PCNDGC96", start_date, end_date) +real_sv = _read_fred_series("PCESVC96", start_date, end_date) +pop_m = _read_fred_series("CNP16OV", start_date, end_date) + +# Step 1: aggregate real nondurables + services +real_total = real_nd + real_sv + +# Step 2: align to quarterly frequency first, then convert to per-capita +# real_total is in billions ($1e9), pop is in thousands ($1e3) +# per-capita in millions: real_total * 1e9 / (pop * 1e3) / 1e6 = real_total / pop +real_total_q = real_total.resample("QS").mean() +pop_q = pop_m.resample("QS").mean() +real_pc = (real_total_q / pop_q).dropna() + +# Restrict to sample period 1948Q1–2006Q4 +real_pc = real_pc.loc["1948-01-01":"2006-12-31"].dropna() + +# FRED fallback: use BEA per-capita quarterly components directly. 
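# Note: the fallback series below are already real per-capita dollar amounts,
# so the division by 1e6 further down keeps the same "millions of dollars per
# person" units as the aggregate construction above.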
+if real_pc.empty: + nd_pc = _read_fred_series("A796RX0Q048SBEA", start_date, end_date) + sv_pc = _read_fred_series("A797RX0Q048SBEA", start_date, end_date) + real_pc = ((nd_pc + sv_pc) / 1e6).loc["1948-01-01":"2006-12-31"].dropna() + +if real_pc.empty: + raise RuntimeError("FRED returned no usable observations after alignment/filtering") + +# Step 3: log consumption +log_c_data = np.log(real_pc.to_numpy(dtype=float).reshape(-1)) +years_data = (real_pc.index.year + (real_pc.index.month - 1) / 12.0).to_numpy(dtype=float) + +print(f"Fetched {len(log_c_data)} quarterly observations from FRED") +print(f"Sample: {years_data[0]:.1f} – {years_data[-1] + 0.25:.1f}") +print(f"Observations: {len(log_c_data)}") +``` + +We can verify Table 2 by computing sample moments of log consumption growth from our FRED data: + +```{code-cell} ipython3 +# Growth rates: 1948Q2 to 2006Q4 (T = 235 quarters) +diff_c = np.diff(log_c_data) + +μ_hat = diff_c.mean() +σ_hat = diff_c.std(ddof=1) + +print("Sample estimates from FRED data vs Table 2:") +print(f" μ = {μ_hat:.5f} (Table 2 RW: {rw['μ']:.5f})") +print(f" σ_ε = {σ_hat:.4f} (Table 2: {rw['σ_ε']:.4f})") +print(f" T = {len(diff_c)} quarters") +``` + ```{code-cell} ipython3 p_fig6 = 0.20 @@ -1138,14 +1300,19 @@ p_fig6 = 0.20 rw_fig6 = dict(μ=μ_hat, σ_ε=σ_hat) w_fig6 = 2.0 * norm.ppf(p_fig6) / np.sqrt(T) -# Use FRED data loaded earlier in the lecture c = log_c_data years = years_data t6 = np.arange(T + 1) -c0 = c[0] -line_approx = c0 + rw_fig6["μ"] * t6 -line_worst = c0 + (rw_fig6["μ"] + rw_fig6["σ_ε"] * w_fig6) * t6 +μ_approx = rw_fig6["μ"] +μ_worst = rw_fig6["μ"] + rw_fig6["σ_ε"] * w_fig6 + +# Match BHS Figure 6 visual construction by fitting intercepts separately +# while holding the two drifts fixed. +a_approx = (c - μ_approx * t6).mean() +a_worst = (c - μ_worst * t6).mean() +line_approx = a_approx + μ_approx * t6 +line_worst = a_worst + μ_worst * t6 p_right = np.linspace(0.01, 0.50, 500) w_right = 2.0 * norm.ppf(p_right) / np.sqrt(T) @@ -1160,7 +1327,7 @@ lower_band = rw_fig6["μ"] - 2.0 * μ_se --- mystnb: figure: - caption: robustly distorted growth rates and finite-sample uncertainty about drift + caption: Drift distortion and sampling uncertainty name: fig-bhs-6 --- fig, axes = plt.subplots(1, 2, figsize=(12, 4)) @@ -1194,43 +1361,43 @@ ax.axhline(1_000.0 * lower_band, lw=2, ls="--", color="gray") ax.set_xlabel("detection error probability (percent)") ax.set_ylabel(r"mean consumption growth ($\times 10^{-3}$)") ax.legend(frameon=False, fontsize=8, loc="upper right") -ax.set_title("2 standard deviation band", fontsize=10) ax.set_xlim(0.0, 50.0) -ax.set_ylim(3.0, 6.0) plt.tight_layout() plt.show() ``` -**Left panel.** -Postwar U.S. log consumption is shown alongside two deterministic trend lines: -the approximating-model drift $\mu$ and the worst-case drift $\mu + \sigma_\varepsilon w(\theta)$ for $p(\theta^{-1}) = 0.20$. -The plotted consumption series is constructed from FRED data following the processing pipeline described in the Data section above. +**Left panel.** Postwar U.S. log consumption is shown alongside two deterministic trend lines: the approximating-model drift $\mu$ and the worst-case drift $\mu + \sigma_\varepsilon w(\theta)$ for $p(\theta^{-1}) = 0.20$. + +For comparability with BHS Fig. 6, we estimate intercepts separately for these two fixed slopes. + +The plotted consumption series is constructed from FRED data following the processing pipeline described above. 
+ The two trends are close enough that, even with decades of data, it is hard to distinguish them by eye. -**Right panel.** -As the detection-error probability rises (models become harder to tell apart), the worst-case mean growth rate moves back toward $\hat\mu$. +**Right panel.** As the detection-error probability rises (models become harder to tell apart), the worst-case mean growth rate moves back toward $\hat\mu$. + The dashed gray lines mark a two-standard-error band around the maximum-likelihood estimate of $\mu$. + Even at detection probabilities in the 5--20% range, the worst-case drift remains inside (or very near) this confidence band. The upshot: drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. -A dogmatic Bayesian who conditions on a single approximating model and updates using Bayes' law -will not learn her way out of this problem in samples of the length available. + +A dogmatic Bayesian who conditions on a single approximating model and updates using Bayes' law will not learn her way out of this problem in samples of the length available. + Robustness concerns survive long histories precisely because the low-frequency features that matter most for pricing are the hardest to pin down. ## Concluding remarks -The title asks a question: are large risk premia prices of **variability** (atemporal risk aversion) -or prices of **doubts** (model uncertainty)? +The title asks a question: are large risk premia prices of **variability** (atemporal risk aversion) or prices of **doubts** (model uncertainty)? + +The analysis above shows that the answer cannot be settled by asset-pricing data alone, because the two interpretations are observationally equivalent. -The analysis above shows that the answer cannot be settled by asset-pricing data alone, -because the two interpretations are observationally equivalent. But the choice matters enormously for what we conclude. -Under the risk-aversion reading, high Sharpe ratios imply that consumers would pay a great deal to smooth -known aggregate consumption fluctuations. -Under the robustness reading, those same Sharpe ratios tell us consumers would pay a great deal -to resolve uncertainty about which probability model governs consumption growth --- a fundamentally different policy object. +Under the risk-aversion reading, high Sharpe ratios imply that consumers would pay a great deal to smooth known aggregate consumption fluctuations. + +Under the robustness reading, those same Sharpe ratios tell us consumers would pay a great deal to resolve uncertainty about which probability model governs consumption growth. Three features of the analysis support the robustness reading: @@ -1238,5 +1405,759 @@ Three features of the analysis support the robustness reading: 2. The welfare gains implied by asset prices decompose overwhelmingly into a model-uncertainty component, with the pure risk component remaining small --- consistent with Lucas's original finding. 3. The drift distortions that drive pricing are small enough to hide inside standard-error bands, so finite-sample learning cannot eliminate the consumer's fears. -Whether one ultimately prefers the risk or the uncertainty interpretation, the framework -clarifies that the question is not about the size of risk premia but about the economic object those premia identify. 
+Whether one ultimately prefers the risk or the uncertainty interpretation, the framework clarifies that the question is not about the size of risk premia but about the economic object those premia measure. + +## Exercises + +The exercises below ask you to fill in several derivation steps. + +```{exercise} +:label: dov_ex1 + +Let $R_{t+1}$ be an $n \times 1$ vector of gross returns with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$. + +Let $m_{t+1}$ be a valid stochastic discount factor satisfying $\mathbb{1} = E[m_{t+1}\,R_{t+1}]$. + +1. Use the covariance decomposition $E[mR] = E[m]\,E[R] + \operatorname{cov}(m,R)$ to show that $\operatorname{cov}(m,R) = \mathbb{1} - E[m]\,E[R] =: b$. +2. For a portfolio with weight vector $\alpha$ and return $R^p = \alpha^\top R$, show that $\operatorname{cov}(m, R^p) = \alpha^\top b$. +3. Apply the Cauchy--Schwarz inequality to the pair $(m, R^p)$ to obtain $|\alpha^\top b| \leq \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}$. +4. Maximize the ratio $|\alpha^\top b|/\sqrt{\alpha^\top \Sigma_R\,\alpha}$ over $\alpha$ and show that the maximum is $\sqrt{b^\top \Sigma_R^{-1} b}$, attained at $\alpha^\star = \Sigma_R^{-1}b$. +5. Conclude that $\sigma(m)/E(m) \ge \sqrt{b^\top \Sigma_R^{-1} b}$, which is {eq}`bhs_hj_unconditional`. +``` + +```{solution-start} dov_ex1 +:class: dropdown +``` + +**Part 1.** From $\mathbb{1} = E[m\,R] = E[m]\,E[R] + \operatorname{cov}(m,R)$, rearranging gives $\operatorname{cov}(m,R) = \mathbb{1} - E[m]\,E[R] \equiv b$. + +**Part 2.** The portfolio return is $R^p = \alpha^\top R$, so + +$$ +\operatorname{cov}(m, R^p) = \operatorname{cov}(m, \alpha^\top R) = \alpha^\top \operatorname{cov}(m, R) = \alpha^\top b. +$$ + +**Part 3.** The Cauchy--Schwarz inequality for any two random variables $X, Y$ states $|\operatorname{cov}(X,Y)| \leq \sigma(X)\,\sigma(Y)$. +Applying it to $(m, R^p)$: + +$$ +|\alpha^\top b| = |\operatorname{cov}(m, R^p)| \leq \sigma(m)\,\sigma(R^p) = \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}. +$$ + +**Part 4.** Rearranging Part 3 gives + +$$ +\frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R\,\alpha}} \leq \sigma(m). +$$ + +To maximize the left-hand side over $\alpha$, define the $\Sigma_R$-inner product $\langle u, v \rangle_{\Sigma} = u^\top \Sigma_R\, v$. +Insert $I = \Sigma_R \Sigma_R^{-1}$: + +$$ +\alpha^\top b += \alpha^\top (\Sigma_R \Sigma_R^{-1}) b += (\alpha^\top \Sigma_R)(\Sigma_R^{-1} b) += \langle \alpha,\, \Sigma_R^{-1}b \rangle_{\Sigma}. +$$ + +Cauchy--Schwarz in this inner product gives + +$$ +|\langle \alpha,\, \Sigma_R^{-1}b \rangle_{\Sigma}| +\leq +\sqrt{\langle \alpha, \alpha \rangle_{\Sigma}}\;\sqrt{\langle \Sigma_R^{-1}b,\, \Sigma_R^{-1}b \rangle_{\Sigma}} += +\sqrt{\alpha^\top \Sigma_R\,\alpha}\;\sqrt{b^\top \Sigma_R^{-1} b}, +$$ + +with equality when $\alpha \propto \Sigma_R^{-1} b$. +Substituting $\alpha^\star = \Sigma_R^{-1} b$ confirms + +$$ +\max_\alpha \frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R\,\alpha}} = \sqrt{b^\top \Sigma_R^{-1} b}. +$$ + +**Part 5.** Combining Parts 3 and 4, $\sqrt{b^\top \Sigma_R^{-1} b} \leq \sigma(m)$. +Dividing both sides by $E[m] > 0$ yields {eq}`bhs_hj_unconditional`. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex2 + +Combine the SDF representation {eq}`bhs_sdf` with the random-walk consumption dynamics and the Gaussian mean-shift distortion to show that $\log m_{t+1}$ is normally distributed under the approximating model. + +1. Compute its mean and variance in terms of $(\beta,\mu,\sigma_\varepsilon,w)$. +2. 
Use lognormal moments to derive expressions for $E[m]$ and $\sigma(m)/E[m]$. +3. Use the parameter mapping $\theta = [(1-\beta)(\gamma-1)]^{-1}$ and the associated $w$ to obtain {eq}`bhs_Em_rw` and {eq}`bhs_sigma_rw`. +``` + +```{solution-start} dov_ex2 +:class: dropdown +``` + +Under the random walk, + +$$ +c_{t+1}-c_t=\mu+\sigma_\varepsilon \varepsilon_{t+1} + +$$ +with $\varepsilon_{t+1}\sim\mathcal{N}(0,1)$ under the approximating model. + +Using {eq}`bhs_sdf` and the Gaussian distortion + +$$ +\hat g_{t+1}=\exp\!\left(w\varepsilon_{t+1}-\tfrac{1}{2}w^2\right), + +$$ +we get + +$$ +m_{t+1} += +\beta \exp\!\left(-(c_{t+1}-c_t)\right)\hat g_{t+1} += +\beta \exp\!\left(-\mu-\sigma_\varepsilon\varepsilon_{t+1}\right)\exp\!\left(w\varepsilon_{t+1}-\frac{1}{2}w^2\right). +$$ + +Therefore + +$$ +\log m_{t+1} += +\log\beta-\mu-\frac{1}{2}w^2 + (w-\sigma_\varepsilon)\varepsilon_{t+1}, + +$$ +which is normal with mean + +$$ +E[\log m]=\log\beta-\mu-\tfrac{1}{2}w^2 + +$$ +and variance + +$$ +\operatorname{Var}(\log m)=(w-\sigma_\varepsilon)^2. +$$ + +For a lognormal random variable, + +$$ +E[m]=\exp(E[\log m]+\tfrac{1}{2}\operatorname{Var}(\log m)) + +$$ +and + +$$ +\sigma(m)/E[m]=\sqrt{e^{\operatorname{Var}(\log m)}-1}. + +$$ +Hence + +$$ +E[m] += +\beta\exp\!\left( +-\mu-\frac{1}{2}w^2+\frac{1}{2}(w-\sigma_\varepsilon)^2 +\right) += +\beta\exp\!\left(-\mu+\frac{\sigma_\varepsilon^2}{2}-\sigma_\varepsilon w\right), + +$$ +and + +$$ +\frac{\sigma(m)}{E[m]} += +\sqrt{\exp\!\left((w-\sigma_\varepsilon)^2\right)-1}. +$$ + +Now use $w_{\text{RW}}(\theta)=-\sigma_\varepsilon/[(1-\beta)\theta]$ from {eq}`bhs_w_formulas` and +$\theta=[(1-\beta)(\gamma-1)]^{-1}$ to get $w=-\sigma_\varepsilon(\gamma-1)$. +Then + +$$ +-\sigma_\varepsilon w=\sigma_\varepsilon^2(\gamma-1) + +$$ +and + +$$ +(w-\sigma_\varepsilon)^2 = (-\sigma_\varepsilon\gamma)^2=\sigma_\varepsilon^2\gamma^2. + +$$ +Substituting yields {eq}`bhs_Em_rw` and {eq}`bhs_sigma_rw`. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex3 + +Starting from the type I recursion {eq}`bhs_type1_recursion` and the definitions of $U_t$ and $\theta$ in {eq}`bhs_Ut_def`--{eq}`bhs_theta_def`, derive the risk-sensitive recursion {eq}`bhs_risk_sensitive`. + +Verify that as $\gamma \to 1$ (equivalently $\theta \to \infty$), the recursion converges to standard discounted expected log utility $U_t = c_t + \beta E_t U_{t+1}$. +``` + +```{solution-start} dov_ex3 +:class: dropdown +``` + +Start from the type I recursion {eq}`bhs_type1_recursion` and write + +$$ +(V_{t+1})^{1-\gamma} = \exp\!\bigl((1-\gamma)\log V_{t+1}\bigr). +$$ + +Using $\log V_t = (1-\beta)U_t$ from {eq}`bhs_Ut_def`, we obtain + +$$ +(1-\beta)U_t += +(1-\beta)c_t +\;+\; +\frac{\beta}{1-\gamma}\log E_t\!\left[\exp\!\bigl((1-\gamma)(1-\beta)U_{t+1}\bigr)\right]. +$$ + +Divide by $(1-\beta)$ and use {eq}`bhs_theta_def`, + +$$ +\theta = -\bigl[(1-\beta)(1-\gamma)\bigr]^{-1}. + +$$ +Then $(1-\gamma)(1-\beta)=-1/\theta$ and $\beta/[(1-\beta)(1-\gamma)]=-\beta\theta$, so + +$$ +U_t += +c_t - \beta\theta \log E_t\!\left[\exp\!\left(-\frac{U_{t+1}}{\theta}\right)\right], + +$$ +which is {eq}`bhs_risk_sensitive`. + +For $\theta\to\infty$ (equivalently $\gamma\to 1$), use the expansion + +$$ +\exp(-U_{t+1}/\theta)=1-U_{t+1}/\theta+o(1/\theta). + +$$ +Then + +$$ +\log E_t[\exp(-U_{t+1}/\theta)] += +-E_t[U_{t+1}]/\theta+o(1/\theta), + +$$ +so $-\theta\log E_t[\exp(-U_{t+1}/\theta)]\to E_t[U_{t+1}]$ and the recursion converges to + +$$ +U_t = c_t + \beta E_t U_{t+1}. 
+$$ + +```{solution-end} +``` + +```{exercise} +:label: dov_ex4 + +Consider the type II Bellman equation {eq}`bhs_bellman_type2`. + +1. Use a Lagrange multiplier to impose the normalization constraint $\int g(\varepsilon)\,\pi(\varepsilon)\,d\varepsilon = 1$. +2. Derive the first-order condition for $g(\varepsilon)$ and show that the minimizer is the exponential tilt in {eq}`bhs_ghat`. +3. Substitute your minimizing $g$ back into {eq}`bhs_bellman_type2` to recover the risk-sensitive Bellman equation {eq}`bhs_bellman_type1`. + +Conclude that $W(x) \equiv U(x)$ for consumption plans in $\mathcal{C}(A,B,H;x_0)$. +``` + +```{solution-start} dov_ex4 +:class: dropdown +``` + +Fix $x$ and write $W'(\varepsilon) := W(Ax + B\varepsilon)$ for short. + +Form the Lagrangian + +$$ +\mathcal{L}[g,\lambda] += +\beta \int \Bigl[g(\varepsilon)W'(\varepsilon) + \theta g(\varepsilon)\log g(\varepsilon)\Bigr]\pi(\varepsilon)\,d\varepsilon +\;+\; +\lambda\left(\int g(\varepsilon)\pi(\varepsilon)\,d\varepsilon - 1\right). +$$ + +The pointwise first-order condition for $g(\varepsilon)$ is + +$$ +0 += +\frac{\partial \mathcal{L}}{\partial g(\varepsilon)} += +\beta\Bigl[W'(\varepsilon) + \theta(1+\log g(\varepsilon))\Bigr]\pi(\varepsilon) +\;+\; +\lambda\,\pi(\varepsilon), + +$$ +so (dividing by $\beta\pi(\varepsilon)$) + +$$ +\log g(\varepsilon) += +-\frac{W'(\varepsilon)}{\theta} - 1 - \frac{\lambda}{\beta\theta}. +$$ + +Exponentiating yields $g(\varepsilon)=K\exp(-W'(\varepsilon)/\theta)$ for a constant $K$. +Imposing $\int g(\varepsilon)\pi(\varepsilon)d\varepsilon=1$ implies + +$$ +K^{-1} += +\int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, + +$$ +and therefore + +$$ +\hat g(\varepsilon) += +\frac{\exp\!\left(-W(Ax+B\varepsilon)/\theta\right)} +\int \exp\!\left(-W(Ax+B\tilde\varepsilon)/\theta\right)\pi(\tilde\varepsilon)\,d\tilde\varepsilon, + +$$ +which is {eq}`bhs_ghat`. + +To substitute back, define + +$$ +Z(x):=\int \exp(-W(Ax+B\varepsilon)/\theta)\pi(\varepsilon)\,d\varepsilon. + +$$ +Then $\hat g(\varepsilon)=\exp(-W(Ax+B\varepsilon)/\theta)/Z(x)$ and + +$$ +\log\hat g(\varepsilon)=-W(Ax+B\varepsilon)/\theta-\log Z(x). + +$$ +Hence + +$$ +\int \Bigl[\hat g(\varepsilon)W(Ax+B\varepsilon) + \theta \hat g(\varepsilon)\log \hat g(\varepsilon)\Bigr]\pi(\varepsilon)\,d\varepsilon += +-\theta\log Z(x), + +$$ +because the $W$ terms cancel and $\int \hat g\,\pi = 1$. + +Plugging this into {eq}`bhs_bellman_type2` gives + +$$ +W(x) += +c-\beta\theta\log Z(x) += +c-\beta\theta \log \int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, + +$$ +which is {eq}`bhs_bellman_type1`. Therefore $W(x)\equiv U(x)$. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex5 + +Let $\varepsilon \sim \mathcal{N}(0,1)$ under the approximating model and define + +$$ +\hat g(\varepsilon) = \exp\!\left(w\varepsilon - \frac{1}{2}w^2\right) + +$$ +as in the Gaussian mean-shift section. + +1. Show that $E[\hat g(\varepsilon)] = 1$. +2. Show that for any bounded measurable function $f$, + +$$ +E[\hat g(\varepsilon) f(\varepsilon)] + +$$ +equals the expectation of $f$ under $\mathcal{N}(w,1)$. +3. Compute the mean and variance of $\log \hat g(\varepsilon)$ and use these to derive + +$$ +\operatorname{std}(\hat g) = \sqrt{e^{w^2}-1}. + +$$ +4. Compute the conditional relative entropy $E[\hat g\log \hat g]$ and verify that it equals $\tfrac{1}{2}w^2$. +``` + +```{solution-start} dov_ex5 +:class: dropdown +``` + +1. 
Using the moment generating function of a standard normal, + +$$ +E[\hat g(\varepsilon)] += +e^{-w^2/2}\,E[e^{w\varepsilon}] += +e^{-w^2/2}\,e^{w^2/2} += +1. +$$ + +2. Let $\varphi(\varepsilon) = (2\pi)^{-1/2}e^{-\varepsilon^2/2}$ be the $\mathcal{N}(0,1)$ density. +Then + +$$ +\hat g(\varepsilon)\varphi(\varepsilon) += +\frac{1}{\sqrt{2\pi}} +\exp\!\left(w\varepsilon-\frac{1}{2}w^2-\frac{1}{2}\varepsilon^2\right) += +\frac{1}{\sqrt{2\pi}} +\exp\!\left(-\frac{1}{2}(\varepsilon-w)^2\right), + +$$ +which is the $\mathcal{N}(w,1)$ density. +Therefore, for bounded measurable $f$, + +$$ +E[\hat g(\varepsilon)f(\varepsilon)] += +\int f(\varepsilon)\,\hat g(\varepsilon)\varphi(\varepsilon)\,d\varepsilon += +E_{\mathcal{N}(w,1)}[f(\varepsilon)]. +$$ + +3. Since $\log \hat g(\varepsilon) = w\varepsilon - \tfrac{1}{2}w^2$ and $\varepsilon\sim\mathcal{N}(0,1)$, + +$$ +E[\log \hat g] = -\frac{1}{2}w^2, +\qquad +\operatorname{Var}(\log \hat g)=w^2. + +$$ +Moreover, $\operatorname{Var}(\hat g)=E[\hat g^2]-1$ because $E[\hat g]=1$. +Now + +$$ +E[\hat g^2] += +E\!\left[\exp\!\left(2w\varepsilon - w^2\right)\right] += +e^{-w^2}\,E[e^{2w\varepsilon}] += +e^{-w^2}\,e^{(2w)^2/2} += +e^{w^2}, + +$$ +so $\operatorname{std}(\hat g)=\sqrt{e^{w^2}-1}$. + +4. Using part 2 with $f(\varepsilon)=\log \hat g(\varepsilon)=w\varepsilon-\tfrac{1}{2}w^2$, + +$$ +E[\hat g\log \hat g] += +E_{\mathcal{N}(w,1)}\!\left[w\varepsilon-\frac{1}{2}w^2\right] += +w\cdot E_{\mathcal{N}(w,1)}[\varepsilon]-\frac{1}{2}w^2 += +w^2-\frac{1}{2}w^2 += +\frac{1}{2}w^2. +$$ + +```{solution-end} +``` + +```{exercise} +:label: dov_ex6 + +In the Gaussian mean-shift setting of {ref}`Exercise 5 `, let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on $T$ observations. + +1. Show that $L_T$ is normal under each model. +2. Compute its mean and variance under the approximating and worst-case models. +3. Using the definition of detection-error probability in {eq}`bhs_detection_formula`, derive the closed-form expression {eq}`bhs_detection_closed`. +``` + +```{solution-start} dov_ex6 +:class: dropdown +``` + +Let the approximating model be $\varepsilon_i \sim \mathcal{N}(0,1)$ and the worst-case model be $\varepsilon_i \sim \mathcal{N}(w,1)$, i.i.d. for $i=1,\ldots,T$. + +Take the log likelihood ratio in the direction that matches the definitions in the text: + +$$ +L_T += +\log \frac{\prod_{i=1}^T \varphi(\varepsilon_i)}{\prod_{i=1}^T \varphi(\varepsilon_i-w)} += +\sum_{i=1}^T \ell(\varepsilon_i), + +$$ +where $\varphi$ is the $\mathcal{N}(0,1)$ density and + +$$ +\ell(\varepsilon) += +\log \varphi(\varepsilon) - \log \varphi(\varepsilon-w) += +-\frac{1}{2}\Bigl[\varepsilon^2-(\varepsilon-w)^2\Bigr] += +-w\varepsilon + \frac{1}{2}w^2. +$$ + +Therefore + +$$ +L_T = -w\sum_{i=1}^T \varepsilon_i + \tfrac{1}{2}w^2T. +$$ + +Under the approximating model, $\sum_{i=1}^T \varepsilon_i \sim \mathcal{N}(0,T)$, so + +$$ +L_T \sim \mathcal{N}\!\left(\frac{1}{2}w^2T,\; w^2T\right). +$$ + +Under the worst-case model, $\sum_{i=1}^T \varepsilon_i \sim \mathcal{N}(wT,T)$, so + +$$ +L_T \sim \mathcal{N}\!\left(-\frac{1}{2}w^2T,\; w^2T\right). +$$ + +Now + +$$ +p_A = \Pr_A(L_T<0) += +\Phi\!\left(\frac{0-\frac{1}{2}w^2T}{|w|\sqrt{T}}\right) += +\Phi\!\left(-\frac{|w|\sqrt{T}}{2}\right), + +$$ +and + +$$ +p_B = \Pr_B(L_T>0) += +1-\Phi\!\left(\frac{0-(-\frac{1}{2}w^2T)}{|w|\sqrt{T}}\right) += +1-\Phi\!\left(\frac{|w|\sqrt{T}}{2}\right) += +\Phi\!\left(-\frac{|w|\sqrt{T}}{2}\right). 
+$$ + +Therefore + +$$ +p(\theta^{-1})=\tfrac{1}{2}(p_A+p_B)=\Phi\!\left(-\tfrac{|w|\sqrt{T}}{2}\right), + +$$ +which is {eq}`bhs_detection_closed`. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex7 + +Using the formulas for $w(\theta)$ in {eq}`bhs_w_formulas` and the definition of discounted entropy + +$$ +\eta = \frac{\beta}{1-\beta}\cdot \frac{w(\theta)^2}{2}, + +$$ +show that holding $\eta$ fixed across the random-walk and trend-stationary consumption specifications implies the mapping {eq}`bhs_theta_cross_model`. + +Specialize your result to the case $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$ and interpret the role of $\rho$. +``` + +```{solution-start} dov_ex7 +:class: dropdown +``` + +Because $\eta$ depends on $\theta$ only through $w(\theta)^2$, holding $\eta$ fixed across models is equivalent to holding $|w(\theta)|$ fixed. + +Using {eq}`bhs_w_formulas`, + +$$ +|w_{\text{RW}}(\theta_{\text{RW}})| += +\frac{\sigma_\varepsilon^{\text{RW}}}{(1-\beta)\theta_{\text{RW}}}, +\qquad +|w_{\text{TS}}(\theta_{\text{TS}})| += +\frac{\sigma_\varepsilon^{\text{TS}}}{(1-\beta\rho)\theta_{\text{TS}}}. +$$ + +Equating these magnitudes and solving for $\theta_{\text{TS}}$ gives + +$$ +\theta_{\text{TS}} += +\left(\frac{\sigma_\varepsilon^{\text{TS}}}{\sigma_\varepsilon^{\text{RW}}}\right) +\frac{1-\beta}{1-\beta\rho}\,\theta_{\text{RW}}, + +$$ +which is {eq}`bhs_theta_cross_model`. + +If $\sigma_\varepsilon^{\text{TS}}=\sigma_\varepsilon^{\text{RW}}$, then + +$$ +\theta_{\text{TS}}=\frac{1-\beta}{1-\beta\rho}\theta_{\text{RW}}. + +$$ +Since $\rho\in(0,1)$ implies $1-\beta\rho < 1-\beta$, the ratio $(1-\beta)/(1-\beta\rho)$ is less than one. +So to hold entropy fixed, the trend-stationary model requires a smaller $\theta$ (i.e., a cheaper distortion / stronger robustness) than the random-walk model. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex8 + +For type II (multiplier) preferences under random-walk consumption growth, derive the compensating-variation formulas in {eq}`bhs_type2_rw_decomp`. + +In particular, derive + +1. the **risk** term by comparing the stochastic economy to a deterministic consumption path with the same mean level of consumption (Lucas's thought experiment), and +2. the **uncertainty** term by comparing a type II agent with parameter $\theta$ to the expected-utility case $\theta=\infty$, holding the stochastic environment fixed. +``` + +```{solution-start} dov_ex8 +:class: dropdown +``` + +Write the random walk as + +$$ +c_t = c_0 + t\mu + \sigma_\varepsilon\sum_{j=1}^t \varepsilon_j + +$$ +with $\varepsilon_j\stackrel{iid}{\sim}\mathcal{N}(0,1)$. + +**Risk term.** +The mean level of consumption is + +$$ +E[C_t]=E[e^{c_t}]=\exp(c_0+t\mu+\tfrac{1}{2}t\sigma_\varepsilon^2), + +$$ +so the deterministic path with the same mean levels is + +$$ +\bar c_t = c_0 + t(\mu+\tfrac{1}{2}\sigma_\varepsilon^2). +$$ + +Under expected log utility ($\theta=\infty$), discounted expected utility is + +$$ +\sum_{t\ge 0}\beta^t E[c_t] += +\frac{c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}, + +$$ +while for the deterministic mean-level path it is + +$$ +\sum_{t\ge 0}\beta^t \bar c_t += +\frac{c_0}{1-\beta} + \frac{\beta(\mu+\tfrac{1}{2}\sigma_\varepsilon^2)}{(1-\beta)^2}. +$$ + +If we reduce initial consumption by $\Delta c_0^{risk}$ (so $\bar c_t$ shifts down by $\Delta c_0^{risk}$ for all $t$), utility falls by $\Delta c_0^{risk}/(1-\beta)$. 
+Equating the two utilities gives + +$$ +\frac{\Delta c_0^{risk}}{1-\beta} += +\frac{\beta(\tfrac{1}{2}\sigma_\varepsilon^2)}{(1-\beta)^2} +\quad\Rightarrow\quad +\Delta c_0^{risk}=\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}. +$$ + +**Uncertainty term.** +For type II multiplier preferences, the minimizing distortion is a Gaussian mean shift with parameter $w$ and per-period relative entropy $\tfrac{1}{2}w^2$. +Under the distorted model, $E[\varepsilon]=w$, so + +$$ +E[c_t]=c_0+t(\mu+\sigma_\varepsilon w). +$$ + +Plugging this into the type II objective (and using $E_t[g\log g]=\tfrac{1}{2}w^2$) gives the discounted objective as a function of $w$: + +$$ +J(w) += +\sum_{t\ge 0}\beta^t\Bigl(c_0+t(\mu+\sigma_\varepsilon w)\Bigr) +\;+\; +\sum_{t\ge 0}\beta^{t+1}\theta\cdot\frac{w^2}{2}. + +$$ +Using $\sum_{t\ge0}\beta^t=1/(1-\beta)$ and $\sum_{t\ge0}t\beta^t=\beta/(1-\beta)^2$, + +$$ +J(w) += +\frac{c_0}{1-\beta} +\;+\; +\frac{\beta(\mu+\sigma_\varepsilon w)}{(1-\beta)^2} +\;+\; +\frac{\beta\theta}{1-\beta}\cdot\frac{w^2}{2}. +$$ + +Minimizing over $w$ yields + +$$ +0=\frac{\partial J}{\partial w} += +\frac{\beta\sigma_\varepsilon}{(1-\beta)^2} +\;+\; +\frac{\beta\theta}{1-\beta}\,w +\quad\Rightarrow\quad +w^*=-\frac{\sigma_\varepsilon}{(1-\beta)\theta}, + +$$ +which matches {eq}`bhs_w_formulas`. + +Substituting $w^*$ back in gives + +$$ +J(w^*) += +\frac{c_0}{1-\beta} +\;+\; +\frac{\beta\mu}{(1-\beta)^2} +-\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta}. +$$ + +When $\theta=\infty$ (no model uncertainty), the last term disappears. +Thus the utility gain from removing model uncertainty at fixed $(\mu,\sigma_\varepsilon)$ is + +$$ +\beta\sigma_\varepsilon^2/[2(1-\beta)^3\theta]. + +$$ +To offset this by a permanent upward shift in initial log consumption, we need + +$$ +\Delta c_0^{uncertainty}/(1-\beta)=\beta\sigma_\varepsilon^2/[2(1-\beta)^3\theta], + +$$ +so + +$$ +\Delta c_0^{uncertainty} += +\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +$$ + +Together these reproduce {eq}`bhs_type2_rw_decomp`. + +```{solution-end} +``` From 803e744ed526d76018f71ed03c735e65bf62c285 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 11 Feb 2026 18:08:43 +1100 Subject: [PATCH 20/37] updates --- lectures/doubts_or_variability.md | 657 +++++++++++++++++++++--------- 1 file changed, 466 insertions(+), 191 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index b2580f4b4..7c64a07d8 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -185,13 +185,12 @@ The bound says that the Sharpe ratio of any asset cannot exceed the market price #### Unconditional version The bound {eq}`bhs_hj_bound` is stated in conditional terms. -An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$. -{ref}`Exercise 1 ` asks you to derive +An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$ ```{math} :label: bhs_hj_unconditional \frac{\sigma(m)}{E(m)} -\;\ge\; +\;\geq\; \sqrt{b^\top \Sigma_R^{-1} b}, \qquad b = \mathbb{1} - E(m)\, E(R). @@ -204,6 +203,8 @@ def hj_std_bound(E_m): return np.sqrt(np.maximum(var_lb, 0.0)) ``` +In {ref}`Exercise 1 `, we will revisit and verify this unconditional version of the HJ bound. 
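Before turning to the calibrated bound, here is a minimal numerical sketch of {eq}`bhs_hj_unconditional`; the return moments below are made up for illustration and are *not* the Table 1 estimates used elsewhere in this lecture.

```{code-cell} ipython3
# Hypothetical moments for two gross returns ("equity" and "bill"), purely
# illustrative numbers chosen only to show how the bound is computed
E_R_demo = np.array([1.020, 1.002])
Σ_R_demo = np.array([[0.0065, 0.0001],
                     [0.0001, 0.00002]])

for E_m_demo in (0.995, 0.998, 1.001):
    b_demo = np.ones(2) - E_m_demo * E_R_demo
    σ_m_floor = np.sqrt(b_demo @ np.linalg.solve(Σ_R_demo, b_demo))
    print(f"E(m) = {E_m_demo:.3f}:  σ(m) ≥ {σ_m_floor:.3f},"
          f"  σ(m)/E(m) ≥ {σ_m_floor / E_m_demo:.3f}")
```

The floor is lowest near $E(m) \approx 1/E(R^f)$ and rises steeply on either side, which is the cup shape that the Hansen-Jagannathan bound traces out in the figure below.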
+ ### The puzzle To reconcile formula {eq}`bhs_crra_sdf` with measures of the market price of risk extracted from data on asset returns and prices (like those in Table 1 below) requires a value of $\gamma$ so high that it provokes skepticism --- this is the **equity premium puzzle**. @@ -216,73 +217,13 @@ This is the **risk-free rate puzzle** of {cite:t}`Weil_1989`. {cite:t}`Tall2000` showed that recursive preferences with IES $= 1$ can clear the HJ bar while avoiding the risk-free rate puzzle. -### Deriving SDF moments under recursive preferences +### Epstein-Zin SDF moments The figure below reproduces Tallarini's key diagnostic. -For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m)/E(m))$ pair for three specifications: time-separable CRRA (crosses), recursive preferences with random-walk consumption (circles), and recursive preferences with trend-stationary consumption (pluses). - -For the two consumption specifications, we can derive closed-form expressions for the unconditional SDF moments under recursive preferences. - -Under recursive preferences with IES $= 1$, the SDF has the form (derived later in {eq}`bhs_sdf`) - -$$ -m_{t+1} = \beta \frac{C_t}{C_{t+1}} \cdot \hat{g}_{t+1}, -$$ - -where $\hat{g}_{t+1}$ is a likelihood-ratio distortion from the continuation value. - -For the random-walk model with $c_{t+1} - c_t = \mu + \sigma_\varepsilon \varepsilon_{t+1}$ and $\varepsilon_{t+1} \sim \mathcal{N}(0,1)$, the distortion is a Gaussian mean shift $w = -\sigma_\varepsilon(\gamma - 1)$, and $\log m_{t+1}$ turns out to be normally distributed: - -$$ -\log m_{t+1} = \log\beta - \mu - \tfrac{1}{2}w^2 + (w - \sigma_\varepsilon)\varepsilon_{t+1}. -$$ - -Its mean and variance are - -$$ -E[\log m] = \log\beta - \mu - \tfrac{1}{2}w^2, -\qquad -\operatorname{Var}(\log m) = (w - \sigma_\varepsilon)^2 = \sigma_\varepsilon^2 \gamma^2. -$$ - -For a lognormal random variable, $E[m] = \exp(E[\log m] + \tfrac{1}{2}\operatorname{Var}(\log m))$ and $\sigma(m)/E[m] = \sqrt{e^{\operatorname{Var}(\log m)} - 1}$. - -Substituting gives the following closed-form expressions ({ref}`Exercise 2 ` asks you to work through the full derivation): - -- *Random walk*: - -```{math} -:label: bhs_Em_rw -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], -``` - -```{math} -:label: bhs_sigma_rw -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. -``` - -Notice that in {eq}`bhs_Em_rw`, because $\sigma_\varepsilon$ is small ($\approx 0.005$), the term $\frac{\sigma_\varepsilon^2}{2}(2\gamma-1)$ grows slowly with $\gamma$, keeping $E[m]$ roughly constant near $1/(1+r^f)$. - -Meanwhile {eq}`bhs_sigma_rw` shows that $\sigma(m)/E[m] \approx \sigma_\varepsilon \gamma$ grows linearly with $\gamma$. - -This is how recursive preferences push volatility toward the HJ bound without distorting the risk-free rate. +We derive closed-form expressions for the Epstein-Zin SDF moments --- equations {eq}`bhs_Em_rw`--{eq}`bhs_sigma_ts` --- later in {ref}`ez_sdf_moments`, after developing the Epstein-Zin recursion and the Gaussian mean-shift distortion. 
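As an aside (a small sketch added here because the figure also plots the CRRA benchmark), the time-separable case needs no recursion: under CRRA preferences $m_{t+1} = \beta (C_{t+1}/C_t)^{-\gamma}$, so with random-walk consumption growth $\log m_{t+1} = \log\beta - \gamma(\mu + \sigma_\varepsilon\varepsilon_{t+1})$ is Gaussian and the usual lognormal algebra gives

$$
E[m] = \beta \exp\!\left(-\gamma\mu + \tfrac{1}{2}\gamma^2\sigma_\varepsilon^2\right),
\qquad
\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\gamma^2\sigma_\varepsilon^2\right) - 1}.
$$

Here a large $\gamma$ raises $\sigma(m)/E[m]$ only by simultaneously dragging $E[m]$ well below $1/(1+r^f)$, which is the risk-free rate puzzle in moment form.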
-An analogous calculation for the trend-stationary model yields: - -- *Trend stationary*: - -```{math} -:label: bhs_Em_ts -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], -``` - -```{math} -:label: bhs_sigma_ts -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. -``` - -The code below implements these expressions (and the corresponding CRRA moments) to draw Tallarini's figure. +The code below implements those expressions (and the corresponding CRRA moments). ```{code-cell} ipython3 def moments_type1_rw(γ): @@ -314,6 +255,9 @@ def moments_crra_rw(γ): return E_m, mpr ``` +For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m)/E(m))$ pair for three specifications: time-separable CRRA (crosses), Epstein-Zin preferences with random-walk consumption (circles), and Epstein-Zin preferences with trend-stationary consumption (pluses) + + ```{code-cell} ipython3 --- mystnb: @@ -338,10 +282,10 @@ HJ_std = np.array([hj_std_bound(x) for x in Em_grid]) fig, ax = plt.subplots(figsize=(7, 5)) ax.plot(Em_grid, HJ_std, lw=2, color="black", label="Hansen-Jagannathan bound") -ax.plot(Em_rw, MPR_rw, "o", lw=2, - label="recursive, random walk") -ax.plot(Em_ts, MPR_ts, "+", lw=2, - label="recursive, trend stationary") +ax.plot(Em_rw, MPR_rw, "o", lw=2, + label="Epstein-Zin, random walk") +ax.plot(Em_ts, MPR_ts, "+", lw=2, + label="Epstein-Zin, trend stationary") ax.plot(Em_crra, MPR_crra, "x", lw=2, label="time-separable CRRA") @@ -369,13 +313,10 @@ The quantitative achievement is real. But Lucas's challenge still stands: what microeconomic evidence supports $\gamma = 50$? -To make the reinterpretation precise, we now lay out the statistical environment and the preference specifications. - ## The choice setting -The calibration uses quarterly U.S. data from 1948Q2--2006Q4 for consumption growth rates (a sample length of $T = 235$ quarters). +To make the answer to this question precise, we now lay out the statistical environment and the preference specifications. -We consider two consumption-growth specifications (random walk and trend stationary) with parameter estimates and asset-return moments from {cite}`BHS_2009`. ### Shocks and consumption plans @@ -408,7 +349,7 @@ The random-walk and trend-stationary models below are two special cases. Let $c_t = \log C_t$ be log consumption. -The random-walk specification is +The *random-walk* specification is ```{math} c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim \mathcal{N}(0, 1). @@ -419,10 +360,10 @@ Iterating forward yields ```{math} c_t = c_0 + t\mu + \sigma_\varepsilon(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1), \qquad -t \ge 1. +t \geq 1. ``` -The trend-stationary specification can be written as a deterministic trend plus a stationary AR(1) component: +The *trend-stationary* specification can be written as a deterministic trend plus a stationary AR(1) component: ```{math} c_t = \zeta + \mu t + z_t, @@ -441,7 +382,7 @@ c_t + \sigma_\varepsilon(\varepsilon_t + \rho \varepsilon_{t-1} + \cdots + \rho^{t-1}\varepsilon_1), \qquad -t \ge 1. +t \geq 1. 
``` Equivalently, defining the detrended series $\tilde c_t := c_t - \mu t$, @@ -473,7 +414,7 @@ We compare four preference specifications over consumption plans $C^\infty \in \ **Type I agent (Kreps--Porteus--Epstein--Zin--Tallarini)** with - a discount factor $\beta \in (0,1)$; - an intertemporal elasticity of substitution fixed at $1$; -- a risk-aversion parameter $\gamma \ge 1$; and +- a risk-aversion parameter $\gamma \geq 1$; and - an approximating conditional density $\pi(\cdot)$ for shocks and its implied joint distribution $\Pi_\infty(\cdot \mid x_0)$. **Type II agent (multiplier preferences)** with @@ -496,23 +437,60 @@ We compare four preference specifications over consumption plans $C^\infty \in \ - unit risk aversion; and - a single pessimistic joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ induced by the type II worst-case distortion. -Two equivalence results organize the analysis. + +We will introduce two sets of equivalence results. Types I and II are observationally equivalent in the strong sense that they have identical preferences over $\mathcal{C}$ (once parameters are mapped appropriately). Types III and IV are observationally equivalent in a weaker but still useful sense: for the particular endowment process taken as given, they deliver the same worst-case pricing implications as a type II agent (for the $\theta$ that implements the entropy constraint). +(pref_equiv)= ## Preferences, distortions, and detection We now formalize each of the four agent types and develop the equivalence results that connect them. -We begin with the type I (Kreps--Porteus--Epstein--Zin--Tallarini) agent, whose preferences are defined by a recursion over certainty equivalents, then show how a change of variables converts it into a risk-sensitive recursion that is observationally equivalent to the type II agent's max--min problem. +For each of the four types, we will derive a Bellman equation that characterizes the agent's value function and stochastic discount factor. + +The stochastic discount factor of all four types will be in the form of + +$$ +m_{t+1} = \beta \frac{\partial U_{t+1}/\partial c_{t+1}}{\partial U_t/\partial c_t} \hat g_{t+1}, +$$ + +where $\hat g_{t+1}$ is a likelihood-ratio distortion that we will define in each case. + Along the way we introduce the likelihood-ratio distortion that appears in the stochastic discount factor and develop the detection-error probability that will serve as our new calibration language. -### The transformed continuation value +### Type I: Kreps--Porteus--Epstein--Zin--Tallarini preferences with IES $= 1$ -The type I (Kreps--Porteus--Epstein--Zin--Tallarini) recursion with IES $= 1$ and risk-aversion parameter $\gamma$ is +The general Epstein-Zin-Weil specification aggregates current consumption and a certainty equivalent of future utility using a CES function: + +```{math} +:label: bhs_ez_general +V_t = \left[(1-\beta)\, C_t^{\,\rho} + \beta\, \mathcal{R}_t(V_{t+1})^{\,\rho}\right]^{1/\rho}, +\qquad +\rho := 1 - \frac{1}{\psi}, +``` + +where $\psi > 0$ is the intertemporal elasticity of substitution and the certainty equivalent uses the risk-aversion parameter $\gamma \geq 1$: + +```{math} +:label: bhs_certainty_equiv +\mathcal{R}_t(V_{t+1}) += +\left(E_t\!\left[V_{t+1}^{1-\gamma}\right]\right)^{\!\frac{1}{1-\gamma}}. +``` + +Let $\psi = 1$, so $\rho \to 0$. + +In this limit the CES aggregator degenerates into a Cobb-Douglas: + +$$ +V_t = C_t^{1-\beta} \cdot \mathcal{R}_t(V_{t+1})^{\,\beta}. 
+$$ + +Taking logs and expanding the certainty equivalent {eq}`bhs_certainty_equiv` gives the *type I recursion*: ```{math} :label: bhs_type1_recursion @@ -554,22 +532,60 @@ For consumption plans in $\mathcal{C}(A, B, H; x_0)$, the recursion {eq}`bhs_ris U(x) = c - \beta\theta \log \int \exp\!\left[\frac{-U(Ax + B\varepsilon)}{\theta}\right] \pi(\varepsilon)\,d\varepsilon. ``` -The stochastic discount factor can then be written as +#### Deriving the stochastic discount factor + +The stochastic discount factor is the intertemporal marginal rate of substitution --- the ratio of marginal utilities of the consumption good at $t+1$ versus $t$. + +Since $c_t$ enters {eq}`bhs_risk_sensitive` linearly, $\partial U_t / \partial c_t = 1$. + +Converting from log consumption to the consumption good gives $\partial U_t / \partial C_t = 1/C_t$. + +A perturbation to $c_{t+1}$ in a particular state affects $U_t$ through the $\log E_t \exp$ term. + +Differentiating {eq}`bhs_risk_sensitive`: + +$$ +\frac{\partial U_t}{\partial c_{t+1}} += +-\beta\theta +\frac{\exp(-U_{t+1}/\theta) (-1/\theta)}{E_t[\exp(-U_{t+1}/\theta)]} +\underbrace{\frac{\partial U_{t+1}}{\partial c_{t+1}}}_{=\,1} += +\beta \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]}. +$$ + +This when converted to the consumption level gives +$\partial U_t / \partial C_{t+1} = \beta \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]} \frac{1}{C_{t+1}}$. + +Taking the ratio gives the SDF: ```{math} :label: bhs_sdf_Ut m_{t+1} = +\frac{\partial U_t / \partial C_{t+1}}{\partial U_t / \partial C_t} += \beta \frac{C_t}{C_{t+1}} -\cdot \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]}. ``` -The second factor is the likelihood-ratio distortion $\hat g_{t+1}$: an exponential tilt of the continuation value that shifts probability toward states with low $U_{t+1}$. +The first factor $\beta\,C_t/C_{t+1}$ is the standard log-utility IMRS. + +The second factor is the likelihood-ratio distortion $\hat g_{t+1}$: an exponential tilt that overweights states where the continuation value $U_{t+1}$ is low. + -### Martingale likelihood ratios +### Type II: multiplier preferences + +Now we move to the type II (multiplier) agent. + +Before we write down the preferences, we introduce the machinery of martingale likelihood ratios that will be used to formalize model distortions. + +The tools in this section build on {ref}`Likelihood Ratio Processes `, which develops properties of likelihood ratios in detail, and {ref}`Divergence Measures `, which covers relative entropy. + + +#### Martingale likelihood ratios -To formalize model distortions, we use a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$ as a Radon--Nikodym derivative. +Consider a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$ as a Radon--Nikodym derivative. Its one-step increments @@ -578,7 +594,7 @@ g_{t+1} = \frac{G_{t+1}}{G_t}, \qquad E_t[g_{t+1}] = 1, \quad -g_{t+1} \ge 0, +g_{t+1} \geq 0, \qquad G_0 = 1, ``` @@ -587,9 +603,8 @@ define distorted conditional expectations: $\tilde E_t[b_{t+1}] = E_t[g_{t+1}\,b The conditional relative entropy of the distortion is $E_t[g_{t+1}\log g_{t+1}]$, and the discounted entropy over the entire path is $\beta E\bigl[\sum_{t=0}^{\infty} \beta^t G_t\,E_t(g_{t+1}\log g_{t+1})\,\big|\,x_0\bigr]$. 
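A quick numerical sketch may help fix ideas; the exponential tilt below is just one convenient example of an increment $g$ (it anticipates the Gaussian mean-shift distortions studied later), and the check assumes the NumPy import from the top of the lecture.

```{code-cell} ipython3
# One concrete likelihood-ratio increment: the exponential tilt
# g(ε) = exp(w ε - w²/2) applied to ε ~ N(0, 1) under the approximating model
rng = np.random.default_rng(0)
ε_draws = rng.standard_normal(1_000_000)
w_demo = -0.3                      # illustrative distortion size
g_draws = np.exp(w_demo * ε_draws - 0.5 * w_demo**2)

print(f"E[g]       ≈ {g_draws.mean():.4f}   (a valid increment has E[g] = 1)")
print(f"E[g ε]     ≈ {(g_draws * ε_draws).mean():.4f}  (distorted mean of ε is w = {w_demo})")
print(f"E[g log g] ≈ {(g_draws * np.log(g_draws)).mean():.4f}   (relative entropy, here w²/2 = {0.5 * w_demo**2:.3f})")
```

The third line is the conditional relative entropy of this particular distortion; the entropy penalty in the type II preferences below puts a price on exactly this quantity.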
-### Type II: multiplier preferences -A type II agent's **multiplier** preference ordering over consumption plans $C^\infty \in \mathcal{C}(A,B,H;x_0)$ is defined by +A type II agent's *multiplier* preference ordering over consumption plans $C^\infty \in \mathcal{C}(A,B,H;x_0)$ is defined by ```{math} :label: bhs_type2_objective @@ -599,7 +614,7 @@ A type II agent's **multiplier** preference ordering over consumption plans $C^\ \,\Big|\, x_0\right\}, ``` -where $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \ge 0$, and $G_0 = 1$. +where $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \geq 0$, and $G_0 = 1$. The parameter $\theta > 0$ penalizes the relative entropy of probability distortions. @@ -609,14 +624,14 @@ The value function satisfies the Bellman equation :label: bhs_bellman_type2 W(x) = -c + \min_{g(\varepsilon) \ge 0}\; +c + \min_{g(\varepsilon) \geq 0}\; \beta \int \bigl[g(\varepsilon)\,W(Ax + B\varepsilon) + \theta\,g(\varepsilon)\log g(\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon ``` subject to $\int g(\varepsilon)\,\pi(\varepsilon)\,d\varepsilon = 1$. -Note that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty --- this is the key structural feature that makes $\hat g$ a likelihood ratio. +Inside the integral, $g(\varepsilon)\,W(Ax + B\varepsilon)$ is the continuation value under the distorted model $g\pi$, while $\theta\,g(\varepsilon)\log g(\varepsilon)$ is the entropy penalty that makes large departures from the approximating model $\pi$ costly. The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equivalence $W \equiv U$) @@ -627,13 +642,16 @@ The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equi \frac{\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)}{E_t\!\left[\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)\right]}. ``` +Note that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty --- this is the key structural feature that makes $\hat g$ a likelihood ratio. + + Substituting {eq}`bhs_ghat` back into {eq}`bhs_bellman_type2` gives $$W(x) = c - \beta\theta \log \int \exp\!\left[\frac{-W(Ax + B\varepsilon)}{\theta}\right]\pi(\varepsilon)\,d\varepsilon,$$ which is identical to {eq}`bhs_bellman_type1`. -Therefore $W(x) \equiv U(x)$, establishing that **types I and II are observationally equivalent** over elements of $\mathcal{C}(A,B,H;x_0)$. +Therefore $W(x) \equiv U(x)$, establishing that *types I and II are observationally equivalent* over elements of $\mathcal{C}(A,B,H;x_0)$. The mapping between parameters is @@ -667,13 +685,43 @@ J(x_0) \sum_{t=0}^{\infty} E\!\left[\beta^t G_t\,c_t \,\Big|\, x_0\right] ``` -subject to $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \ge 0$, $G_0 = 1$, and +subject to $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \geq 0$, $G_0 = 1$, and ```{math} -\beta E\!\left[\sum_{t=0}^{\infty} \beta^t G_t\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\,\Big|\,x_0\right] \le \eta. +\beta E\!\left[\sum_{t=0}^{\infty} \beta^t G_t\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\,\Big|\,x_0\right] \leq \eta. ``` -The Lagrange multiplier on the entropy constraint is $\theta$, which connects type III to type II: for the particular $A, B, H$ and $\theta$ used to derive the worst-case joint distribution $\hat\Pi_\infty$, the shadow prices of uncertain claims for a type III agent match those of a type II agent. 
+The Lagrangian for the type III problem is + +$$ +\mathcal{L} += +\sum_{t=0}^{\infty} E\!\left[\beta^t G_t\,c_t \,\Big|\, x_0\right] +\;+\; +\theta\!\left[ +\beta E\!\left(\sum_{t=0}^{\infty} \beta^t G_t\,E_t(g_{t+1}\log g_{t+1})\,\Big|\,x_0\right) - \eta +\right], +$$ + +where $\theta \ge 0$ is the multiplier on the entropy constraint. + +Collecting terms inside the expectation gives + +$$ +\mathcal{L} += +\sum_{t=0}^{\infty} E\!\left\{\beta^t G_t +\left[c_t + \beta\theta\,E_t(g_{t+1}\log g_{t+1})\right] +\,\Big|\, x_0\right\} - \theta\eta, +$$ + +which, apart from the constant $-\theta\eta$, has the same structure as the type II objective {eq}`bhs_type2_objective`. + +The FOC for $g_{t+1}$ is therefore identical, and the optimal distortion is the same $\hat g_{t+1}$ as in {eq}`bhs_ghat` for the $\theta$ that makes the entropy constraint bind. + +The SDF is again $m_{t+1} = \beta(C_t/C_{t+1})\hat g_{t+1}$. + +For the particular $A, B, H$ and $\theta$ used to derive the worst-case joint distribution $\hat\Pi_\infty$, the shadow prices of uncertain claims for a type III agent match those of a type II agent. ### Type IV: ex post Bayesian @@ -683,13 +731,30 @@ Type IV is an ordinary expected-utility agent with log preferences evaluated und \hat E_0 \sum_{t=0}^{\infty} \beta^t c_t. ``` +$\hat E_0$ denotes expectation under the pessimistic model $\hat\Pi_\infty$. + The joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ is the one associated with the type II agent's worst-case distortion. +Under $\hat\Pi_\infty$ the agent has log utility, so the Euler equation for any gross return $R_{t+1}$ is + +$$ +1 = \hat E_t\!\left[\beta \frac{C_t}{C_{t+1}} R_{t+1}\right]. +$$ + +To express this in terms of the approximating model $\Pi_\infty$, apply a change of measure using the one-step likelihood ratio $\hat g_{t+1} = d\hat\Pi / d\Pi$: + +$$ +1 = E_t\!\left[\hat g_{t+1} \cdot \beta \frac{C_t}{C_{t+1}} R_{t+1}\right] += E_t\!\left[m_{t+1}\, R_{t+1}\right], +$$ + +so the effective SDF under the approximating model is $m_{t+1} = \beta(C_t/C_{t+1})\hat g_{t+1}$. + For the particular $A, B, H$ and $\theta$ used to construct $\hat\Pi_\infty$, the type IV value function equals $J(x)$ from type III. ### Stochastic discount factor -Across all four types, the stochastic discount factor can be written compactly as +As we have shown in each case of the four types, the stochastic discount factor can be written compactly as ```{math} :label: bhs_sdf @@ -704,6 +769,39 @@ With log utility, $C_t/C_{t+1} = \exp(-(c_{t+1}-c_t))$ is the usual intertempora Robustness multiplies that term by $\hat g_{t+1}$, so uncertainty aversion enters pricing only through the distortion. +For constraint preferences, the worst-case distortion is the same as for multiplier preferences with the $\theta$ that makes the entropy constraint bind. + +While for the ex post Bayesian, the distortion is a change of measure from the approximating model to the pessimistic model. + +### Value function decomposition: $W = J + \theta N$ + +We can express the type II value function in a revealing way by substituting the minimizing $\hat g$ back into the Bellman equation {eq}`bhs_bellman_type2`: + +```{math} +:label: bhs_W_decomp_bellman +W(x) = c + \beta \int \bigl[\hat g(\varepsilon)\,W(Ax + B\varepsilon) + \theta\,\hat g(\varepsilon)\log \hat g(\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon. 
+``` + +Define two components: + +```{math} +:label: bhs_J_recursion +J(x) = c + \beta \int \hat g(\varepsilon)\,J(Ax + B\varepsilon)\,\pi(\varepsilon)\,d\varepsilon, +``` + +```{math} +:label: bhs_N_recursion +N(x) = \beta \int \hat g(\varepsilon)\bigl[\log \hat g(\varepsilon) + N(Ax + B\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon. +``` + +Then $W(x) = J(x) + \theta N(x)$. + +Here $J(x_t) = \hat E_t \sum_{j=0}^{\infty} \beta^j c_{t+j}$ is expected discounted log consumption under the *worst-case* model --- this is the value function for both the type III and the type IV agent. + +And $N(x)$ is discounted continuation entropy: it measures the total information cost of the probability distortion from date $t$ onward. + +This decomposition will be important for the welfare calculations in {ref}`the welfare section ` below, where it explains why type III uncertainty compensation is twice that of type II. + ### Gaussian mean-shift distortions Under the random-walk model, the shock is $\varepsilon_{t+1} \sim \mathcal{N}(0, 1)$. @@ -724,7 +822,8 @@ Hence $\log \hat g_{t+1}$ is normal with mean $-w^2/2$ and variance $w^2$, and \operatorname{std}(\hat g_{t+1}) = \sqrt{e^{w^2}-1}. ``` -For our Gaussian calibrations, the worst-case mean shift is summarized by +For our Gaussian calibrations, the worst-case mean shifts for +the random-walk model and the trend-stationary model are summarized by ```{math} :label: bhs_w_formulas @@ -744,6 +843,71 @@ def w_from_θ(θ, model): raise ValueError("model must be 'rw' or 'ts'") ``` +(ez_sdf_moments)= +### SDF moments under Epstein-Zin preferences + +We can now derive the closed-form SDF moments used to draw {numref}`fig-bhs-1`. + +Under Epstein-Zin preferences with IES $= 1$, the SDF has the form {eq}`bhs_sdf` + +$$ +m_{t+1} = \beta \frac{C_t}{C_{t+1}} \cdot \hat{g}_{t+1}, +$$ + +where $\hat{g}_{t+1}$ is the likelihood-ratio distortion from the continuation value. + +For the random-walk model with $c_{t+1} - c_t = \mu + \sigma_\varepsilon \varepsilon_{t+1}$ and $\varepsilon_{t+1} \sim \mathcal{N}(0,1)$, the distortion is a Gaussian mean shift $w = -\sigma_\varepsilon(\gamma - 1)$, and $\log m_{t+1}$ turns out to be normally distributed: + +$$ +\log m_{t+1} = \log\beta - \mu - \tfrac{1}{2}w^2 + (w - \sigma_\varepsilon)\varepsilon_{t+1}. +$$ + +Its mean and variance are + +$$ +E[\log m] = \log\beta - \mu - \tfrac{1}{2}w^2, +\qquad +\operatorname{Var}(\log m) = (w - \sigma_\varepsilon)^2 = \sigma_\varepsilon^2 \gamma^2. +$$ + +For a lognormal random variable, $E[m] = \exp(E[\log m] + \tfrac{1}{2}\operatorname{Var}(\log m))$ and $\sigma(m)/E[m] = \sqrt{e^{\operatorname{Var}(\log m)} - 1}$. + +Substituting gives the following closed-form expressions ({ref}`Exercise 2 ` asks you to work through the full derivation): + +- *Random walk*: + +```{math} +:label: bhs_Em_rw +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], +``` + +```{math} +:label: bhs_sigma_rw +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. +``` + +Notice that in {eq}`bhs_Em_rw`, because $\sigma_\varepsilon$ is small ($\approx 0.005$), the term $\frac{\sigma_\varepsilon^2}{2}(2\gamma-1)$ grows slowly with $\gamma$, keeping $E[m]$ roughly constant near $1/(1+r^f)$. + +Meanwhile {eq}`bhs_sigma_rw` shows that $\sigma(m)/E[m] \approx \sigma_\varepsilon \gamma$ grows linearly with $\gamma$. + +This is how Epstein-Zin preferences push volatility toward the HJ bound without distorting the risk-free rate. 
+ +An analogous calculation for the trend-stationary model yields: + +- *Trend stationary*: + +```{math} +:label: bhs_Em_ts +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], +``` + +```{math} +:label: bhs_sigma_ts +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. +``` + +### Market price of model uncertainty + The **market price of model uncertainty** (MPU) is the conditional standard deviation of the distortion: ```{math} @@ -791,8 +955,6 @@ def θ_from_detection_probability(p, model): ### Likelihood-ratio testing and detection errors -The likelihood-ratio machinery used here connects to several other lectures: {ref}`Likelihood Ratio Processes ` develops the properties of likelihood ratios in detail, {ref}`Heterogeneous Beliefs and Financial Markets ` applies them to asset pricing with disagreement, and {ref}`A Problem that Stumped Milton Friedman ` uses sequential likelihood-ratio tests in a closely related decision problem. - Let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on a sample of length $T$. Define @@ -849,6 +1011,36 @@ def η_from_θ(θ, model): This is the mapping behind the right panel of the detection-probability figure below. +### Closed-form value functions for random-walk consumption + +We can now evaluate the value functions $W$, $J$, and $N$ in closed form for the random-walk model. + +Substituting $w_{rw}(\theta) = -\sigma_\varepsilon / [(1-\beta)\theta]$ from {eq}`bhs_w_formulas` into the discounted entropy formula gives + +```{math} +:label: bhs_N_rw +N(x) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta^2}. +``` + +The type II value function {eq}`bhs_W_decomp_bellman` evaluates to + +```{math} +:label: bhs_W_rw +W(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. +``` + +Using $W = J + \theta N$, the type III/IV value function is + +```{math} +:label: bhs_J_rw +J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. +``` + +Note that $J$ has *twice* the uncertainty correction of $W$: the coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ in $J$ is $1$ versus $\tfrac{1}{2}$ in $W$. +This is because $W$ includes the entropy "rebate" $\theta N$ that partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model. + +This difference propagates directly into the welfare calculations below. + ## A new calibration language: detection-error probabilities If $\gamma$ should not be calibrated by introspection about atemporal gambles, what replaces it? @@ -987,6 +1179,7 @@ The model dependence was an artifact of using $\gamma$ as a cross-model yardstic Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: a representative consumer with moderate, difficult-to-dismiss fears about model misspecification behaves as if she had very high risk aversion. +(welfare_experiments)= ## What do risk premia measure? Two mental experiments Lucas {cite}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. 
@@ -999,11 +1192,52 @@ Instead of eliminating all randomness, suppose we keep randomness but remove the How much would she pay for that relief alone? -Formally, define $\Delta c_0$ as a permanent proportional reduction in initial consumption that leaves the agent indifferent between the original environment and a counterfactual in which either (i) risk alone is removed or (ii) model uncertainty is removed. +Formally, we seek a permanent proportional reduction $c_0 - c_0^J$ in initial log consumption that leaves a type $J$ agent indifferent between the original risky plan and a deterministic certainty equivalent path. -Because utility is log and the consumption process is Gaussian, these compensations are available in closed form ({ref}`Exercise 8 ` derives them). +Because utility is log and the consumption process is Gaussian, these compensations are available in closed form. -For type II preferences in the random-walk model, the decomposition is +### The certainty equivalent path + +Our point of comparison is the deterministic path with the same mean level of consumption as the stochastic plan: + +```{math} +:label: bhs_ce_path +c_{t+1}^{ce} - c_t^{ce} = \mu + \tfrac{1}{2}\sigma_\varepsilon^2. +``` + +The extra $\tfrac{1}{2}\sigma_\varepsilon^2$ is a Jensen's inequality correction: $E[C_t] = E[e^{c_t}] = \exp(c_0 + t\mu + \tfrac{1}{2}t\sigma_\varepsilon^2)$, so {eq}`bhs_ce_path` matches the mean *level* of consumption at every date. + +### Compensating variations from the value functions + +We use the closed-form value functions derived earlier: {eq}`bhs_W_rw` for the type I/II value function $W$ and {eq}`bhs_J_rw` for the type III/IV value function $J$. + +For the certainty equivalent path {eq}`bhs_ce_path`, there is no risk and no model uncertainty, so its value starting from $c_0^J$ is + +$$ +U^{ce}(c_0^J) = \frac{1}{1-\beta}\!\left[c_0^J + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right]. +$$ + +### Type I (Epstein-Zin) compensation + +Setting $U^{ce}(c_0^I) = W(x_0)$ from {eq}`bhs_W_rw` and solving for $c_0 - c_0^I$: + +```{math} +:label: bhs_comp_type1 +c_0 - c_0^I += +\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}\!\left(1 + \frac{1}{(1-\beta)\theta}\right) += +\frac{\beta\sigma_\varepsilon^2\gamma}{2(1-\beta)}, +``` + +where the last step uses $\gamma = 1 + [(1-\beta)\theta]^{-1}$. + +### Type II (multiplier) decomposition + +Because $W \equiv U$, we have $c_0^{II} = c_0^I$ and the total compensation is the same. +But the interpretation differs: we can now decompose it into **risk** and **model uncertainty** components. + +A type II agent with $\theta = \infty$ (no model uncertainty) has log preferences and requires ```{math} :label: bhs_type2_rw_decomp @@ -1016,20 +1250,50 @@ For type II preferences in the random-walk model, the decomposition is \frac{\beta \sigma_\varepsilon^2}{2(1-\beta)^2\theta}. ``` -For type III preferences in the random-walk model, the uncertainty term is twice as large: +The risk term $\Delta c_0^{risk}$ is Lucas's cost of business cycles: at postwar consumption volatility ($\sigma_\varepsilon \approx 0.005$), it is tiny. + +The uncertainty term $\Delta c_0^{uncertainty}$ is the additional compensation a type II agent requires for facing model misspecification. It can be first order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. 
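
To get a feel for relative magnitudes, the following sketch evaluates the two pieces at $\gamma = 50$, the value that reaches the Hansen--Jagannathan bound for the random-walk model, using the mapping $\gamma = 1 + [(1-\beta)\theta]^{-1}$. It assumes the calibration objects `β` and `σ_ε` defined at the top of the lecture.

```{code-cell} ipython3
# Risk versus model-uncertainty pieces of the type II compensation at γ = 50.
# A sketch, assuming β and σ_ε from the calibration cell at the top.
γ_ex = 50.0
θ_ex = 1.0 / ((1.0 - β) * (γ_ex - 1.0))      # γ = 1 + [(1-β)θ]^{-1}

Δ_risk = β * σ_ε**2 / (2 * (1 - β))
Δ_unc = β * σ_ε**2 / (2 * (1 - β)**2 * θ_ex)

print(f"risk component:        {100 * Δ_risk:.2f}% of consumption")
print(f"uncertainty component: {100 * Δ_unc:.2f}% of consumption")
```
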
+ +### Type III (constraint) compensation + +For a type III agent, the value function $J$ from {eq}`bhs_J_rw` implies ```{math} :label: bhs_type3_rw_decomp -\Delta c_0^{uncertainty, III} +c_0 - c_0^{III} = -\frac{\beta \sigma_\varepsilon^2}{(1-\beta)^2\theta}. +\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}(2\gamma - 1). ``` -For the trend-stationary model, denominators replace $(1-\beta)$ with $(1-\beta \rho)$ or $(1-\beta \rho^2)$, but the qualitative message is the same. +The uncertainty component alone is + +$$ +c_0^{III}(r) - c_0^{III} += +\frac{\beta\sigma_\varepsilon^2}{(1-\beta)^2\theta}, +$$ -The risk-only term $\Delta c_0^{risk}$ is tiny at postwar consumption volatility --- this is Lucas's well-known result. +which is *twice* the type II uncertainty compensation {eq}`bhs_type2_rw_decomp`. +The factor of two traces back to the difference between $W$ and $J$ noted after {eq}`bhs_J_rw`: the entropy rebate $\theta N$ in $W = J + \theta N$ partially offsets the pessimistic tilt for the type II agent, but not for the type III agent who evaluates consumption purely under the worst-case model. -The model-uncertainty term $\Delta c_0^{uncertainty}$ can be first order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. +### Type IV (ex post Bayesian) compensation + +A type IV agent believes the pessimistic model without doubt, so his perceived drift is $\tilde\mu = \mu - \sigma_\varepsilon^2/[(1-\beta)\theta]$. +His compensation for moving to the certainty equivalent path is the same as {eq}`bhs_type3_rw_decomp`, because he ranks plans using the same value function $J$. + +### Trend-stationary formulas + +For the trend-stationary model, the denominators $(1-\beta)$ in the uncertainty terms are replaced by $(1-\beta\rho)$, and the risk terms involve $(1-\beta\rho^2)$: + +$$ +\Delta c_0^{risk,\,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}, +\qquad +\Delta c_0^{unc,\,ts,\,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}, +\qquad +\Delta c_0^{unc,\,ts,\,III} = \frac{\beta\sigma_\varepsilon^2}{(1-\beta\rho)^2\theta}. +$$ + +The qualitative message is the same: the risk component is negligible, and the model-uncertainty component dominates. ## Visualizing the welfare decomposition @@ -1110,13 +1374,13 @@ plt.tight_layout() plt.show() ``` -**Left panel.** The small gap between the baseline mean path and the "risk only" certainty equivalent is Lucas's result: at postwar consumption volatility, the welfare gain from eliminating well-understood aggregate risk is tiny. +The left panel shows the small gap between the baseline mean path and the "risk only" certainty equivalent is Lucas's result: at postwar consumption volatility, the welfare gain from eliminating well-understood aggregate risk is tiny. The much larger gap between the baseline and the "risk + uncertainty" certainty equivalent is the new object. Most of that gap is compensation for model uncertainty, not risk. -**Right panel.** The cloud of nearby models shows what the robust consumer guards against. +The right panel shows the cloud of nearby models shows what the robust consumer guards against. The red-shaded and green-shaded fans correspond to pessimistic and optimistic mean-shift distortions whose detection-error probability is $p = 0.10$. @@ -1196,8 +1460,6 @@ To put the magnitudes in perspective: Lucas estimated that eliminating all aggre At detection-error probabilities of 10--20%, the model-uncertainty compensation alone runs to several percent of consumption. 
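
A rough check of this claim, using the Gaussian detection-error approximation $p \approx \Phi(-|w|\sqrt{T}/2)$ derived in the exercises, is sketched below. It assumes the objects `β`, `σ_ε`, `np`, and `norm` defined in the cells at the top of the lecture, and a sample length of 235 quarters.

```{code-cell} ipython3
# Map detection-error probabilities into the type II and type III
# model-uncertainty compensations for the random-walk model.
# A sketch: β, σ_ε, np, and norm are assumed to come from the cells above.
T_sample = 235                                   # quarters, as in the calibration

for p in (0.10, 0.20):
    w_abs = -2.0 * norm.ppf(p) / np.sqrt(T_sample)   # |w| implied by p
    θ_p = σ_ε / ((1.0 - β) * w_abs)                  # invert w_rw(θ)
    Δ_unc_II = β * σ_ε**2 / (2 * (1 - β)**2 * θ_p)
    Δ_unc_III = 2 * Δ_unc_II
    print(f"p = {p:.2f}:  θ ≈ {θ_p:5.2f},  "
          f"type II ≈ {100 * Δ_unc_II:.1f}%,  type III ≈ {100 * Δ_unc_III:.1f}%")
```
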
-This is the welfare counterpart to the pricing result. - The large risk premia that Tallarini matched with high $\gamma$ are, under the robust reading, compensations for bearing model uncertainty --- and the implied welfare gains from resolving that uncertainty are correspondingly large. ## Why doesn't learning eliminate these fears? @@ -1212,19 +1474,23 @@ The figure below makes this concrete. Consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. -We construct real per-capita nondurables-plus-services consumption from three FRED series: +We construct real per-capita nondurables-plus-services consumption from four FRED series: | FRED series | Description | | --- | --- | -| `PCNDGC96` | Real PCE: nondurable goods (billions of chained 2017 \$, SAAR) | -| `PCESVC96` | Real PCE: services (billions of chained 2017 \$, SAAR) | +| `PCND` | Nominal PCE: nondurable goods (billions of \$, SAAR, quarterly) | +| `PCESV` | Nominal PCE: services (billions of \$, SAAR, quarterly) | +| `DPCERD3Q086SBEA` | PCE implicit price deflator (index 2017 $= 100$, quarterly) | | `CNP16OV` | Civilian noninstitutional population, 16+ (thousands, monthly) | +We use nominal rather than chained-dollar components because chained-dollar series are not additive: chain-weighted indices update their base-period expenditure weights every period, so components deflated with different price changes do not sum to the separately chained aggregate. Adding nominal series and deflating the sum with a single price index avoids this problem. + The processing pipeline is: -1. Add real nondurables and services: $C_t^{real} = C_t^{nd} + C_t^{sv}$. -2. Convert to per-capita: divide by the quarterly average of the monthly population series. -3. Compute log consumption: $c_t = \log C_t^{real,pc}$. +1. Add nominal nondurables and services: $C_t^{nom} = C_t^{nd} + C_t^{sv}$. +2. Deflate by the PCE price index: $C_t^{real} = C_t^{nom} / (P_t / 100)$. +3. Convert to per-capita: divide by the quarterly average of the monthly population series. +4. Compute log consumption: $c_t = \log C_t^{real,pc}$. When we plot *levels* of log consumption, we align the time index to 1948Q1--2006Q4, which yields $T+1 = 236$ quarterly observations. 
@@ -1241,34 +1507,29 @@ def _read_fred_series(series_id, start_date, end_date): return series -# Fetch real PCE components and population from FRED -real_nd = _read_fred_series("PCNDGC96", start_date, end_date) -real_sv = _read_fred_series("PCESVC96", start_date, end_date) -pop_m = _read_fred_series("CNP16OV", start_date, end_date) +# Fetch nominal PCE components, deflator, and population from FRED +nom_nd = _read_fred_series("PCND", start_date, end_date) # quarterly, 1947– +nom_sv = _read_fred_series("PCESV", start_date, end_date) # quarterly, 1947– +defl = _read_fred_series("DPCERD3Q086SBEA", start_date, end_date) # quarterly, 1947– +pop_m = _read_fred_series("CNP16OV", start_date, end_date) # monthly, 1948– -# Step 1: aggregate real nondurables + services -real_total = real_nd + real_sv +# Step 1: add nominal nondurables + services (nominal $ are additive) +nom_total = nom_nd + nom_sv -# Step 2: align to quarterly frequency first, then convert to per-capita -# real_total is in billions ($1e9), pop is in thousands ($1e3) -# per-capita in millions: real_total * 1e9 / (pop * 1e3) / 1e6 = real_total / pop -real_total_q = real_total.resample("QS").mean() +# Step 2: deflate by PCE implicit price deflator (index 2017=100) +real_total = nom_total / (defl / 100.0) + +# Step 3: convert to per-capita (population is monthly, so average to quarterly) pop_q = pop_m.resample("QS").mean() -real_pc = (real_total_q / pop_q).dropna() +real_pc = (real_total / pop_q).dropna() # Restrict to sample period 1948Q1–2006Q4 real_pc = real_pc.loc["1948-01-01":"2006-12-31"].dropna() -# FRED fallback: use BEA per-capita quarterly components directly. -if real_pc.empty: - nd_pc = _read_fred_series("A796RX0Q048SBEA", start_date, end_date) - sv_pc = _read_fred_series("A797RX0Q048SBEA", start_date, end_date) - real_pc = ((nd_pc + sv_pc) / 1e6).loc["1948-01-01":"2006-12-31"].dropna() - if real_pc.empty: raise RuntimeError("FRED returned no usable observations after alignment/filtering") -# Step 3: log consumption +# Step 4: log consumption log_c_data = np.log(real_pc.to_numpy(dtype=float).reshape(-1)) years_data = (real_pc.index.year + (real_pc.index.month - 1) / 12.0).to_numpy(dtype=float) @@ -1367,23 +1628,17 @@ plt.tight_layout() plt.show() ``` -**Left panel.** Postwar U.S. log consumption is shown alongside two deterministic trend lines: the approximating-model drift $\mu$ and the worst-case drift $\mu + \sigma_\varepsilon w(\theta)$ for $p(\theta^{-1}) = 0.20$. - -For comparability with BHS Fig. 6, we estimate intercepts separately for these two fixed slopes. - -The plotted consumption series is constructed from FRED data following the processing pipeline described above. +In the left panel, postwar U.S. log consumption is shown alongside two deterministic trend lines: the approximating-model drift $\mu$ and the worst-case drift $\mu + \sigma_\varepsilon w(\theta)$ for $p(\theta^{-1}) = 0.20$. The two trends are close enough that, even with decades of data, it is hard to distinguish them by eye. -**Right panel.** As the detection-error probability rises (models become harder to tell apart), the worst-case mean growth rate moves back toward $\hat\mu$. +In the right panel, as the detection-error probability rises (models become harder to tell apart), the worst-case mean growth rate moves back toward $\hat\mu$. The dashed gray lines mark a two-standard-error band around the maximum-likelihood estimate of $\mu$. 
Even at detection probabilities in the 5--20% range, the worst-case drift remains inside (or very near) this confidence band. -The upshot: drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. - -A dogmatic Bayesian who conditions on a single approximating model and updates using Bayes' law will not learn her way out of this problem in samples of the length available. +Drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. Robustness concerns survive long histories precisely because the low-frequency features that matter most for pricing are the hardest to pin down. @@ -1422,7 +1677,7 @@ Let $m_{t+1}$ be a valid stochastic discount factor satisfying $\mathbb{1} = E[m 2. For a portfolio with weight vector $\alpha$ and return $R^p = \alpha^\top R$, show that $\operatorname{cov}(m, R^p) = \alpha^\top b$. 3. Apply the Cauchy--Schwarz inequality to the pair $(m, R^p)$ to obtain $|\alpha^\top b| \leq \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}$. 4. Maximize the ratio $|\alpha^\top b|/\sqrt{\alpha^\top \Sigma_R\,\alpha}$ over $\alpha$ and show that the maximum is $\sqrt{b^\top \Sigma_R^{-1} b}$, attained at $\alpha^\star = \Sigma_R^{-1}b$. -5. Conclude that $\sigma(m)/E(m) \ge \sqrt{b^\top \Sigma_R^{-1} b}$, which is {eq}`bhs_hj_unconditional`. +5. Conclude that $\sigma(m)/E(m) \geq \sqrt{b^\top \Sigma_R^{-1} b}$, which is {eq}`bhs_hj_unconditional`. ``` ```{solution-start} dov_ex1 @@ -1624,33 +1879,39 @@ Divide by $(1-\beta)$ and use {eq}`bhs_theta_def`, $$ \theta = -\bigl[(1-\beta)(1-\gamma)\bigr]^{-1}. - $$ + Then $(1-\gamma)(1-\beta)=-1/\theta$ and $\beta/[(1-\beta)(1-\gamma)]=-\beta\theta$, so $$ U_t = c_t - \beta\theta \log E_t\!\left[\exp\!\left(-\frac{U_{t+1}}{\theta}\right)\right], - $$ + which is {eq}`bhs_risk_sensitive`. For $\theta\to\infty$ (equivalently $\gamma\to 1$), use the expansion $$ \exp(-U_{t+1}/\theta)=1-U_{t+1}/\theta+o(1/\theta). +$$ + +Taking expectations, $$ -Then +E_t[\exp(-U_{t+1}/\theta)] = 1 - E_t[U_{t+1}]/\theta + o(1/\theta). +$$ + +Applying $\log(1+x) = x + o(x)$ with $x = -E_t[U_{t+1}]/\theta + o(1/\theta)$, $$ \log E_t[\exp(-U_{t+1}/\theta)] = --E_t[U_{t+1}]/\theta+o(1/\theta), - +-E_t[U_{t+1}]/\theta + o(1/\theta), $$ -so $-\theta\log E_t[\exp(-U_{t+1}/\theta)]\to E_t[U_{t+1}]$ and the recursion converges to + +so $-\theta\log E_t[\exp(-U_{t+1}/\theta)] \to E_t[U_{t+1}]$ and the recursion converges to $$ U_t = c_t + \beta E_t U_{t+1}. @@ -1697,8 +1958,8 @@ $$ \beta\Bigl[W'(\varepsilon) + \theta(1+\log g(\varepsilon))\Bigr]\pi(\varepsilon) \;+\; \lambda\,\pi(\varepsilon), - $$ + so (dividing by $\beta\pi(\varepsilon)$) $$ @@ -1707,46 +1968,55 @@ $$ -\frac{W'(\varepsilon)}{\theta} - 1 - \frac{\lambda}{\beta\theta}. $$ -Exponentiating yields $g(\varepsilon)=K\exp(-W'(\varepsilon)/\theta)$ for a constant $K$. -Imposing $\int g(\varepsilon)\pi(\varepsilon)d\varepsilon=1$ implies +Exponentiating yields $g(\varepsilon)=K\exp(-W'(\varepsilon)/\theta)$ where $K = \exp(-1 - \lambda/(\beta\theta))$ is a constant that does not depend on $\varepsilon$. 
+ +To pin down $K$, impose the normalization $\int g(\varepsilon)\pi(\varepsilon)\,d\varepsilon=1$: + +$$ +1 = K \int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, +$$ + +so $$ K^{-1} = -\int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, - +\int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon. $$ -and therefore + +Substituting $K^{-1}$ into the denominator of $g = K\exp(-W'/\theta)$ gives the minimizer: $$ -\hat g(\varepsilon) +g^*(\varepsilon) = -\frac{\exp\!\left(-W(Ax+B\varepsilon)/\theta\right)} -\int \exp\!\left(-W(Ax+B\tilde\varepsilon)/\theta\right)\pi(\tilde\varepsilon)\,d\tilde\varepsilon, - +\frac{\exp\!\left(-W(Ax+B\varepsilon)/\theta\right)}{ + \int \exp\!\left(-W(Ax+B\tilde\varepsilon)/\theta\right)\pi(\tilde\varepsilon)\,d\tilde\varepsilon}. $$ -which is {eq}`bhs_ghat`. + +This has exactly the same form as the distortion $\hat g_{t+1} = \exp(-U_{t+1}/\theta)/E_t[\exp(-U_{t+1}/\theta)]$ that appears in the type I SDF {eq}`bhs_sdf_Ut`, with $W$ in place of $U$. + +Once we verify below that $W \equiv U$, the minimizer $g^*$ and the SDF distortion $\hat g$ coincide, which is {eq}`bhs_ghat`. To substitute back, define $$ Z(x):=\int \exp(-W(Ax+B\varepsilon)/\theta)\pi(\varepsilon)\,d\varepsilon. - $$ + Then $\hat g(\varepsilon)=\exp(-W(Ax+B\varepsilon)/\theta)/Z(x)$ and $$ \log\hat g(\varepsilon)=-W(Ax+B\varepsilon)/\theta-\log Z(x). - $$ + Hence $$ \int \Bigl[\hat g(\varepsilon)W(Ax+B\varepsilon) + \theta \hat g(\varepsilon)\log \hat g(\varepsilon)\Bigr]\pi(\varepsilon)\,d\varepsilon = -\theta\log Z(x), - $$ + because the $W$ terms cancel and $\int \hat g\,\pi = 1$. Plugging this into {eq}`bhs_bellman_type2` gives @@ -1757,8 +2027,8 @@ W(x) c-\beta\theta\log Z(x) = c-\beta\theta \log \int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, - $$ + which is {eq}`bhs_bellman_type1`. Therefore $W(x)\equiv U(x)$. ```{solution-end} @@ -1771,24 +2041,26 @@ Let $\varepsilon \sim \mathcal{N}(0,1)$ under the approximating model and define $$ \hat g(\varepsilon) = \exp\!\left(w\varepsilon - \frac{1}{2}w^2\right) - $$ + as in the Gaussian mean-shift section. 1. Show that $E[\hat g(\varepsilon)] = 1$. + 2. Show that for any bounded measurable function $f$, $$ E[\hat g(\varepsilon) f(\varepsilon)] - $$ + equals the expectation of $f$ under $\mathcal{N}(w,1)$. + 3. Compute the mean and variance of $\log \hat g(\varepsilon)$ and use these to derive $$ \operatorname{std}(\hat g) = \sqrt{e^{w^2}-1}. - $$ + 4. Compute the conditional relative entropy $E[\hat g\log \hat g]$ and verify that it equals $\tfrac{1}{2}w^2$. ``` @@ -1809,6 +2081,7 @@ e^{-w^2/2}\,e^{w^2/2} $$ 2. Let $\varphi(\varepsilon) = (2\pi)^{-1/2}e^{-\varepsilon^2/2}$ be the $\mathcal{N}(0,1)$ density. + Then $$ @@ -1819,19 +2092,20 @@ $$ = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}(\varepsilon-w)^2\right), - $$ -which is the $\mathcal{N}(w,1)$ density. + +which is the $\mathcal{N}(w,1)$ density + Therefore, for bounded measurable $f$, $$ E[\hat g(\varepsilon)f(\varepsilon)] = \int f(\varepsilon)\,\hat g(\varepsilon)\varphi(\varepsilon)\,d\varepsilon -= -E_{\mathcal{N}(w,1)}[f(\varepsilon)]. $$ +equals the expectation of $f$ under $\mathcal{N}(w,1)$. + 3. 
Since $\log \hat g(\varepsilon) = w\varepsilon - \tfrac{1}{2}w^2$ and $\varepsilon\sim\mathcal{N}(0,1)$, $$ @@ -1841,6 +2115,7 @@ E[\log \hat g] = -\frac{1}{2}w^2, $$ Moreover, $\operatorname{Var}(\hat g)=E[\hat g^2]-1$ because $E[\hat g]=1$. + Now $$ @@ -1938,8 +2213,8 @@ p_A = \Pr_A(L_T<0) \Phi\!\left(\frac{0-\frac{1}{2}w^2T}{|w|\sqrt{T}}\right) = \Phi\!\left(-\frac{|w|\sqrt{T}}{2}\right), - $$ + and $$ @@ -1970,8 +2245,8 @@ Using the formulas for $w(\theta)$ in {eq}`bhs_w_formulas` and the definition of $$ \eta = \frac{\beta}{1-\beta}\cdot \frac{w(\theta)^2}{2}, - $$ + show that holding $\eta$ fixed across the random-walk and trend-stationary consumption specifications implies the mapping {eq}`bhs_theta_cross_model`. Specialize your result to the case $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$ and interpret the role of $\rho$. @@ -2002,16 +2277,16 @@ $$ = \left(\frac{\sigma_\varepsilon^{\text{TS}}}{\sigma_\varepsilon^{\text{RW}}}\right) \frac{1-\beta}{1-\beta\rho}\,\theta_{\text{RW}}, - $$ + which is {eq}`bhs_theta_cross_model`. If $\sigma_\varepsilon^{\text{TS}}=\sigma_\varepsilon^{\text{RW}}$, then $$ \theta_{\text{TS}}=\frac{1-\beta}{1-\beta\rho}\theta_{\text{RW}}. - $$ + Since $\rho\in(0,1)$ implies $1-\beta\rho < 1-\beta$, the ratio $(1-\beta)/(1-\beta\rho)$ is less than one. So to hold entropy fixed, the trend-stationary model requires a smaller $\theta$ (i.e., a cheaper distortion / stronger robustness) than the random-walk model. @@ -2037,8 +2312,8 @@ Write the random walk as $$ c_t = c_0 + t\mu + \sigma_\varepsilon\sum_{j=1}^t \varepsilon_j - $$ + with $\varepsilon_j\stackrel{iid}{\sim}\mathcal{N}(0,1)$. **Risk term.** @@ -2046,8 +2321,8 @@ The mean level of consumption is $$ E[C_t]=E[e^{c_t}]=\exp(c_0+t\mu+\tfrac{1}{2}t\sigma_\varepsilon^2), - $$ + so the deterministic path with the same mean levels is $$ @@ -2057,7 +2332,7 @@ $$ Under expected log utility ($\theta=\infty$), discounted expected utility is $$ -\sum_{t\ge 0}\beta^t E[c_t] +\sum_{t\geq 0}\beta^t E[c_t] = \frac{c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}, @@ -2065,7 +2340,7 @@ $$ while for the deterministic mean-level path it is $$ -\sum_{t\ge 0}\beta^t \bar c_t +\sum_{t\geq 0}\beta^t \bar c_t = \frac{c_0}{1-\beta} + \frac{\beta(\mu+\tfrac{1}{2}\sigma_\varepsilon^2)}{(1-\beta)^2}. $$ @@ -2094,11 +2369,11 @@ Plugging this into the type II objective (and using $E_t[g\log g]=\tfrac{1}{2}w^ $$ J(w) = -\sum_{t\ge 0}\beta^t\Bigl(c_0+t(\mu+\sigma_\varepsilon w)\Bigr) +\sum_{t\geq 0}\beta^t\Bigl(c_0+t(\mu+\sigma_\varepsilon w)\Bigr) \;+\; -\sum_{t\ge 0}\beta^{t+1}\theta\cdot\frac{w^2}{2}. - +\sum_{t\geq 0}\beta^{t+1}\theta\cdot\frac{w^2}{2}. $$ + Using $\sum_{t\ge0}\beta^t=1/(1-\beta)$ and $\sum_{t\ge0}t\beta^t=\beta/(1-\beta)^2$, $$ @@ -2121,8 +2396,8 @@ $$ \frac{\beta\theta}{1-\beta}\,w \quad\Rightarrow\quad w^*=-\frac{\sigma_\varepsilon}{(1-\beta)\theta}, - $$ + which matches {eq}`bhs_w_formulas`. Substituting $w^*$ back in gives @@ -2141,14 +2416,14 @@ Thus the utility gain from removing model uncertainty at fixed $(\mu,\sigma_\var $$ \beta\sigma_\varepsilon^2/[2(1-\beta)^3\theta]. 
- $$ + To offset this by a permanent upward shift in initial log consumption, we need $$ \Delta c_0^{uncertainty}/(1-\beta)=\beta\sigma_\varepsilon^2/[2(1-\beta)^3\theta], - $$ + so $$ From 0348f7adb79fafa84377372d86246b9928d33ca4 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 00:18:12 +1100 Subject: [PATCH 21/37] updates --- lectures/doubts_or_variability.md | 107 ++++++++++++++++++++++++++---- 1 file changed, 95 insertions(+), 12 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 7c64a07d8..17129d9b0 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -1037,6 +1037,7 @@ J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1- ``` Note that $J$ has *twice* the uncertainty correction of $W$: the coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ in $J$ is $1$ versus $\tfrac{1}{2}$ in $W$. + This is because $W$ includes the entropy "rebate" $\theta N$ that partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model. This difference propagates directly into the welfare calculations below. @@ -1047,7 +1048,7 @@ If $\gamma$ should not be calibrated by introspection about atemporal gambles, w The answer is a statistical test. -Fix a sample size $T$ (here 235 quarters, matching the postwar U.S. data). +Fix a sample size $T$ (here 235 quarters, matching the postwar US data used in the paper). For a given $\theta$, compute the worst-case model and ask: if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? @@ -1122,7 +1123,7 @@ At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{R Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ is much less than one, so holding entropy fixed requires a substantially smaller $\theta$ (stronger robustness) for the trend-stationary model than for the random walk. -## The punchline: detection probabilities unify the two models +## Detection probabilities unify the two models We can now redraw Tallarini's figure using the new language. @@ -1182,9 +1183,9 @@ Once we measure robustness concerns in units of statistical detectability, the t (welfare_experiments)= ## What do risk premia measure? Two mental experiments -Lucas {cite}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. +{cite:t}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. -His answer --- very little --- rested on the assumption that the consumer knows the data-generating process. +His answer rested on the assumption that the consumer knows the data-generating process. The robust reinterpretation introduces a second, distinct mental experiment. @@ -1281,6 +1282,85 @@ The factor of two traces back to the difference between $W$ and $J$ noted after A type IV agent believes the pessimistic model without doubt, so his perceived drift is $\tilde\mu = \mu - \sigma_\varepsilon^2/[(1-\beta)\theta]$. His compensation for moving to the certainty equivalent path is the same as {eq}`bhs_type3_rw_decomp`, because he ranks plans using the same value function $J$. 
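
To see how pessimistic the type IV agent's belief actually is, the sketch below computes the worst-case drift at a 10% detection-error probability. It assumes the helpers `θ_from_detection_probability` and `w_from_θ` and the calibration objects `rw` and `σ_ε` defined earlier in the lecture, and is meant only to illustrate magnitudes.

```{code-cell} ipython3
# The type IV (worst-case) drift implied by a 10% detection-error probability.
# A sketch built on the helpers and calibration objects defined above.
p_target = 0.10
θ_p = θ_from_detection_probability(p_target, "rw")
w_p = w_from_θ(θ_p, "rw")                    # negative mean shift of ε

μ_hat = rw["μ"]                              # approximating-model drift
μ_tilde = μ_hat + σ_ε * w_p                  # type IV perceived drift

print(f"quarterly drift, approximating model: {μ_hat:.5f}")
print(f"quarterly drift, worst-case model:    {μ_tilde:.5f}")
print(f"annualized gap: {400 * (μ_hat - μ_tilde):.2f} percentage points")
```
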
+### Comparison with a risky but free-of-model-uncertainty path + +The certainty equivalents above compared a risky plan to a deterministic path, eliminating both risk and uncertainty simultaneously. + +We now describe an alternative measure that isolates compensation for model uncertainty by keeping risk intact. + +We compare two situations whose risky consumptions for all dates $t \geq 1$ are identical. + +All compensation for model uncertainty is concentrated in an adjustment to date-zero consumption alone. + +Specifically, we seek $c_0^{II}(u)$ that makes a type II agent indifferent between: + +1. Facing the stochastic plan under $\theta < \infty$ (fear of model misspecification), consuming $c_0$ at date zero. +2. Facing the **same** stochastic plan under $\theta = \infty$ (no fear of misspecification), but consuming only $c_0^{II}(u) < c_0$ at date zero. + +In both cases, continuation consumptions $c_t$ for $t \geq 1$ are generated by the random walk starting from the **same** $c_0$. + +For the type II agent under $\theta < \infty$, the total value is $W(c_0)$ from {eq}`bhs_W_rw`. + +For the agent liberated from model uncertainty ($\theta = \infty$), the value is + +$$ +c_0^{II}(u) + \beta\,E\!\left[V^{\log}(c_1)\right], +$$ + +where $V^{\log}(c_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta\mu}{1-\beta}\right]$ is the log-utility value function and $c_1 = c_0 + \mu + \sigma_\varepsilon \varepsilon_1$. + +Since $c_1$ is built from $c_0$ (not $c_0^{II}(u)$), the continuation is + +$$ +\beta\,E\!\left[V^{\log}(c_1)\right] += \frac{\beta}{1-\beta}\!\left[c_0 + \frac{\mu}{1-\beta}\right] += \frac{\beta c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}. +$$ + +Setting $W(c_0)$ equal to the liberation value and simplifying: + +$$ +\frac{c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta} += +c_0^{II}(u) + \frac{\beta c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}. +$$ + +Because $\frac{c_0}{1-\beta} - \frac{\beta c_0}{1-\beta} = c_0$, solving for the compensation gives + +```{math} +:label: bhs_comp_type2u +c_0 - c_0^{II}(u) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta} = \frac{\beta\sigma_\varepsilon^2(\gamma - 1)}{2(1-\beta)^2}. +``` + +This is $\frac{1}{1-\beta}$ times the uncertainty compensation $\Delta c_0^{\text{uncertainty}}$ from {eq}`bhs_type2_rw_decomp`. + +The multiplicative factor $\frac{1}{1-\beta}$ arises because all compensation is concentrated in a single period: adjusting $c_0$ alone must offset the cumulative loss in continuation value that the uncertainty penalty imposes in every future period. + +An analogous calculation for a **type III** agent, using $J(c_0)$ from {eq}`bhs_J_rw`, gives + +```{math} +:label: bhs_comp_type3u +c_0 - c_0^{III}(u) = \frac{\beta\sigma_\varepsilon^2}{(1-\beta)^3\theta} = \frac{\beta\sigma_\varepsilon^2(\gamma - 1)}{(1-\beta)^2}, +``` + +which is $\frac{1}{1-\beta}$ times the type III uncertainty compensation and **twice** the type II compensation {eq}`bhs_comp_type2u`, again reflecting the absence of the entropy rebate in $J$. + +### Summary of welfare compensations (random walk) + +The following table collects all compensating variations for the random walk model. + +| Agent | Compensation | Formula | Measures | +|:------|:-------------|:--------|:---------| +| I, II | $c_0 - c_0^{II}$ | $\frac{\beta\sigma_\varepsilon^2\gamma}{2(1-\beta)}$ | risk + uncertainty (vs. deterministic) | +| II | $c_0 - c_0^{II}(r)$ | $\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}$ | risk only (vs. 
deterministic) | +| II | $c_0^{II}(r) - c_0^{II}$ | $\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}$ | uncertainty only (vs. deterministic) | +| II | $c_0 - c_0^{II}(u)$ | $\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta}$ | uncertainty only (vs. risky path) | +| III | $c_0 - c_0^{III}$ | $\frac{\beta\sigma_\varepsilon^2(2\gamma-1)}{2(1-\beta)}$ | risk + uncertainty (vs. deterministic) | +| III | $c_0^{III}(r) - c_0^{III}$ | $\frac{\beta\sigma_\varepsilon^2}{(1-\beta)^2\theta}$ | uncertainty only (vs. deterministic) | +| III | $c_0 - c_0^{III}(u)$ | $\frac{\beta\sigma_\varepsilon^2}{(1-\beta)^3\theta}$ | uncertainty only (vs. risky path) | + +The "vs. deterministic" rows use the certainty-equivalent path {eq}`bhs_ce_path` as a benchmark; the "vs. risky path" rows use the risky-but-uncertainty-free comparison of {eq}`bhs_comp_type2u`--{eq}`bhs_comp_type3u`. + ### Trend-stationary formulas For the trend-stationary model, the denominators $(1-\beta)$ in the uncertainty terms are replaced by $(1-\beta\rho)$, and the risk terms involve $(1-\beta\rho^2)$: @@ -1374,19 +1454,22 @@ plt.tight_layout() plt.show() ``` -The left panel shows the small gap between the baseline mean path and the "risk only" certainty equivalent is Lucas's result: at postwar consumption volatility, the welfare gain from eliminating well-understood aggregate risk is tiny. - -The much larger gap between the baseline and the "risk + uncertainty" certainty equivalent is the new object. +The left panel illustrates our elimination of model uncertainty and risk experiment for a type II agent. -Most of that gap is compensation for model uncertainty, not risk. +The grey fan shows a one-standard-deviation band for the $j$-step-ahead conditional distribution of $c_t$ under the calibrated random walk model. -The right panel shows the cloud of nearby models shows what the robust consumer guards against. +The dash-dot line $c^{II}$ shows the certainty equivalent path whose date-zero consumption is reduced by $c_0 - c_0^{II}$, making the type II agent indifferent between this deterministic trajectory and the stochastic plan --- it compensates for bearing both risk and model ambiguity. -The red-shaded and green-shaded fans correspond to pessimistic and optimistic mean-shift distortions whose detection-error probability is $p = 0.10$. +The solid line $c^r$ shows the certainty equivalent for a type II agent without model uncertainty ($\theta = \infty$), initialized at $c_0 - c_0^{II}(r)$. +At postwar calibrated values this gap is tiny, so $c^r$ sits just below the centre of the fan. -These models are statistically close to the baseline (blue) but imply very different long-run consumption levels. +Consistent with {cite:t}`Lucas_2003`, the welfare gains from eliminating well-understood risk are very small. +We reinterpret the large welfare gains found by {cite:t}`Tall2000` as coming not from reducing risk, but from reducing model uncertainty. -The consumer's caution against such alternatives is what drives the large certainty-equivalent gap in the left panel. +The right panel shows the cloud of nearby models that the robust consumer guards against. +Each grey fan depicts a one-standard-deviation band for a different model in the ambiguity set. +The models are statistically close to the baseline --- their detection-error probability is $p = 0.10$ --- but imply very different long-run consumption levels. +The consumer's caution against such alternatives drives the large certainty-equivalent gap in the left panel. 
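
Before turning to magnitudes, it is worth confirming numerically two structural features of the summary table above: the type III uncertainty compensation is twice the type II compensation, and each "vs. risky path" compensation is $1/(1-\beta)$ times its "vs. deterministic" counterpart. The sketch below checks both, assuming the calibration objects `β` and `σ_ε` defined at the top of the lecture.

```{code-cell} ipython3
# Two structural features of the table above, checked at an illustrative θ.
# A sketch, assuming β and σ_ε from the calibration cell at the top.
θ_ex = 1.0 / ((1.0 - β) * 49.0)              # e.g. the θ implied by γ = 50

unc_II_det = β * σ_ε**2 / (2 * (1 - β)**2 * θ_ex)
unc_III_det = β * σ_ε**2 / ((1 - β)**2 * θ_ex)
unc_II_risky = β * σ_ε**2 / (2 * (1 - β)**3 * θ_ex)

print(f"type III / type II (vs. det.):  {unc_III_det / unc_II_det:.2f}")
print(f"(vs. risky) / (vs. det.):       {unc_II_risky / unc_II_det:.1f}"
      f"   (1/(1-β) = {1 / (1 - β):.0f})")
```
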
## How large are the welfare gains from resolving model uncertainty? From e3c3600b9a2204ce281b3072be3fb05d3807cdcc Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 14:50:24 +1100 Subject: [PATCH 22/37] updates --- lectures/_static/quant-econ.bib | 7 + lectures/doubts_or_variability.md | 920 +++++++++++++++++++++--------- 2 files changed, 658 insertions(+), 269 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 413758676..5a2e26ca0 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -3,6 +3,13 @@ Note: Extended Information (like abstracts, doi, url's etc.) can be found in quant-econ-extendedinfo.bib file in _static/ ### +@book{Sargent_Stachurski_2025, + place={Cambridge}, + title={Dynamic Programming: Finite States}, + publisher={Cambridge University Press}, + author={Sargent, Thomas J and Stachurski, John}, + year={2025} +} @incollection{slutsky:1927, address = {Moscow}, diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 17129d9b0..602571dad 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -29,7 +29,9 @@ kernelspec: > *No one has found risk aversion parameters of 50 or 100 in the diversification of > individual portfolios, in the level of insurance deductibles, in the wage premiums > associated with occupations with high earnings risk, or in the revenues raised by -> state-operated lotteries.* -- Robert Lucas Jr., January 10, 2003 +> state-operated lotteries. It +> would be good to have the equity premium resolved, but I think we need to look beyond high +> estimates of risk aversion to do it.* -- Robert Lucas Jr., January 10, 2003 ## Overview @@ -41,32 +43,31 @@ But matching required setting the risk-aversion coefficient $\gamma$ to around 5 Their answer --- and the theme of this lecture --- is that much of what looks like "risk aversion" can be reinterpreted as **model uncertainty**. -The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max–min recursion in which the agent has unit risk aversion but fears that the probability model governing consumption growth may be wrong. +The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max–min recursion in which the agent fears that the probability model governing consumption growth may be wrong. Under this reading, the parameter that looked like extreme risk aversion instead measures concern about **misspecification**. -Rather than calibrating $\gamma$ through Pratt-style thought experiments about known gambles, we calibrate through a **detection-error probability**: the probability of confusing the agent's baseline (approximating) model with the pessimistic (worst-case) model after seeing a finite sample. +They show that modest amounts of model uncertainty can substitute for large amounts of risk aversion +in terms of choices and effects on asset prices. -When detection-error probabilities are moderate, the implied $\gamma$ values are large enough to reach the Hansen--Jagannathan volatility bound. +This reinterpretation changes the welfare question that asset prices answer: do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or the benefits from reducing doubts about the underlying model? 
-This reinterpretation changes the welfare question that asset prices answer: do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or the benefits from reducing doubts about the consumption-growth model? +We start with the Hansen--Jagannathan bound, then specify the statistical environment, lay out four related preference specifications and their relationships, and finally revisit Tallarini's calibration using detection-error probabilities. -We start with the Hansen--Jagannathan bound, then specify the statistical environment, lay out four related preference specifications and their equivalences, and finally revisit Tallarini's calibration using detection-error probabilities. - -This lecture draws on the ideas and techniques appeared in +This lecture draws on ideas and techniques that appear in - {ref}`Asset Pricing: Finite State Models ` where we introduce stochastic discount factors. - {ref}`Likelihood Ratio Processes ` where we develop the likelihood-ratio machinery that reappears here as the worst-case distortion $\hat g$. -Before we start, we install a package that is not included in Anaconda by default +In addition to what's in Anaconda, this lecture will need the following libraries: ```{code-cell} ipython3 :tags: [hide-output] !pip install pandas-datareader ``` -We use the following imports for the rest of this lecture +We use the following imports: ```{code-cell} ipython3 import datetime as dt @@ -75,9 +76,10 @@ import pandas as pd import matplotlib.pyplot as plt from pandas_datareader import data as web from scipy.stats import norm +from scipy.optimize import brentq ``` -We also set up calibration inputs and compute the covariance matrix of equity and risk-free returns from reported moments +We also set up calibration inputs and compute the covariance matrix of equity and risk-free returns from reported moments. ```{code-cell} ipython3 β = 0.995 @@ -107,9 +109,9 @@ cov_erf = (r_e_std**2 + r_f_std**2 - r_excess_std**2) / 2.0 ### Pricing kernel and the risk-free rate -In this section, we review a few key concepts appeared in {ref}`Asset Pricing: Finite State Models `. +In this section, we review a few key concepts from {ref}`Asset Pricing: Finite State Models `. -A random variable $m_{t+1}$ is said to be a **stochastic discount factor** if it confirms the following equation for the time-$t$ price $p_t$ of a one-period payoff $y_{t+1}$: +A random variable $m_{t+1}$ is said to be a **stochastic discount factor** if it satisfies the following equation for the time-$t$ price $p_t$ of a one-period payoff $y_{t+1}$: ```{math} :label: bhs_pricing_eq @@ -125,7 +127,7 @@ For time-separable CRRA preferences with discount factor $\beta$ and coefficient m_{t+1} = \beta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}, ``` -where $C_t$ is consumption. +where $C_t$ is consumption at time $t$. Setting $y_{t+1} = 1$ (a risk-free bond) in {eq}`bhs_pricing_eq` yields the reciprocal of the gross one-period risk-free rate: @@ -161,6 +163,7 @@ E_t[m_{t+1}\,\xi_{t+1}] $$ where $\operatorname{cov}_t$ denotes the conditional covariance and $\sigma_t$ will denote the conditional standard deviation. + Setting the left-hand side to zero and solving for the expected excess return gives $$ @@ -185,6 +188,7 @@ The bound says that the Sharpe ratio of any asset cannot exceed the market price #### Unconditional version The bound {eq}`bhs_hj_bound` is stated in conditional terms. 
+ An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$ ```{math} @@ -193,9 +197,13 @@ An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ ( \;\geq\; \sqrt{b^\top \Sigma_R^{-1} b}, \qquad -b = \mathbb{1} - E(m)\, E(R). +b = \mathbf{1} - E(m)\, E(R). ``` +In {ref}`Exercise 1 `, we will revisit and verify this unconditional version of the HJ bound. + +Below we implement a function that computes the right-hand side of {eq}`bhs_hj_unconditional` for any given value of $E(m)$ + ```{code-cell} ipython3 def hj_std_bound(E_m): b = np.ones(2) - E_m * R_mean @@ -203,7 +211,6 @@ def hj_std_bound(E_m): return np.sqrt(np.maximum(var_lb, 0.0)) ``` -In {ref}`Exercise 1 `, we will revisit and verify this unconditional version of the HJ bound. ### The puzzle @@ -217,13 +224,13 @@ This is the **risk-free rate puzzle** of {cite:t}`Weil_1989`. {cite:t}`Tall2000` showed that recursive preferences with IES $= 1$ can clear the HJ bar while avoiding the risk-free rate puzzle. -### Epstein-Zin SDF moments - The figure below reproduces Tallarini's key diagnostic. -We derive closed-form expressions for the Epstein-Zin SDF moments --- equations {eq}`bhs_Em_rw`--{eq}`bhs_sigma_ts` --- later in {ref}`ez_sdf_moments`, after developing the Epstein-Zin recursion and the Gaussian mean-shift distortion. +We present this figure before developing the underlying theory because it motivates much of the subsequent analysis. + +The closed-form expressions for the Epstein--Zin SDF moments used in the plot are derived in {ref}`Exercise 2 `. -The code below implements those expressions (and the corresponding CRRA moments). +The code below implements those expressions and the corresponding CRRA moments ```{code-cell} ipython3 def moments_type1_rw(γ): @@ -255,7 +262,7 @@ def moments_crra_rw(γ): return E_m, mpr ``` -For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m)/E(m))$ pair for three specifications: time-separable CRRA (crosses), Epstein-Zin preferences with random-walk consumption (circles), and Epstein-Zin preferences with trend-stationary consumption (pluses) +For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m))$ pair for three specifications: time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). 
```{code-cell} ipython3 @@ -265,32 +272,32 @@ mystnb: caption: SDF moments and Hansen-Jagannathan bound name: fig-bhs-1 --- -γ_grid = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float) +γ_grid = np.arange(1, 51, 5) Em_rw = np.array([moments_type1_rw(γ)[0] for γ in γ_grid]) -MPR_rw = np.array([moments_type1_rw(γ)[1] for γ in γ_grid]) +σ_m_rw = np.array([moments_type1_rw(γ)[0] * moments_type1_rw(γ)[1] for γ in γ_grid]) Em_ts = np.array([moments_type1_ts(γ)[0] for γ in γ_grid]) -MPR_ts = np.array([moments_type1_ts(γ)[1] for γ in γ_grid]) +σ_m_ts = np.array([moments_type1_ts(γ)[0] * moments_type1_ts(γ)[1] for γ in γ_grid]) Em_crra = np.array([moments_crra_rw(γ)[0] for γ in γ_grid]) -MPR_crra = np.array([moments_crra_rw(γ)[1] for γ in γ_grid]) +σ_m_crra = np.array([moments_crra_rw(γ)[0] * moments_crra_rw(γ)[1] for γ in γ_grid]) Em_grid = np.linspace(0.8, 1.01, 1000) HJ_std = np.array([hj_std_bound(x) for x in Em_grid]) fig, ax = plt.subplots(figsize=(7, 5)) -ax.plot(Em_grid, HJ_std, lw=2, color="black", +ax.plot(Em_grid, HJ_std, lw=2, color="black", label="Hansen-Jagannathan bound") -ax.plot(Em_rw, MPR_rw, "o", lw=2, +ax.plot(Em_rw, σ_m_rw, "o", lw=2, label="Epstein-Zin, random walk") -ax.plot(Em_ts, MPR_ts, "+", lw=2, +ax.plot(Em_ts, σ_m_ts, "+", lw=2, label="Epstein-Zin, trend stationary") -ax.plot(Em_crra, MPR_crra, "x", lw=2, +ax.plot(Em_crra, σ_m_crra, "x", lw=2, label="time-separable CRRA") ax.set_xlabel(r"$E(m)$") -ax.set_ylabel(r"$\sigma(m)/E(m)$") +ax.set_ylabel(r"$\sigma(m)$") ax.legend(frameon=False) ax.set_xlim(0.8, 1.01) ax.set_ylim(0.0, 0.42) @@ -299,7 +306,7 @@ plt.tight_layout() plt.show() ``` -The crosses trace the familiar CRRA failure: as $\gamma$ rises, $\sigma(m)/E(m)$ grows but $E(m)$ falls well below the range consistent with the observed risk-free rate. +The crosses show that as $\gamma$ rises, $\sigma(m)/E(m)$ grows but $E(m)$ falls well below the range consistent with the observed risk-free rate. This is the risk-free-rate puzzle of {cite:t}`Weil_1989`. @@ -309,18 +316,17 @@ Recursive utility with IES $= 1$ pushes volatility upward while keeping $E(m)$ r For the random-walk model, the bound is reached around $\gamma = 50$; for the trend-stationary model, around $\gamma = 75$. -The quantitative achievement is real. +The quantitative achievement is significant, but Lucas's challenge remains: what microeconomic evidence supports $\gamma = 50$? -But Lucas's challenge still stands: what microeconomic evidence supports $\gamma = 50$? +{cite:t}`BHS_2009` argue that the large $\gamma$ values are not really about risk aversion, but instead reflect the agent's doubts about the underlying probability model. ## The choice setting -To make the answer to this question precise, we now lay out the statistical environment and the preference specifications. - +To develop this reinterpretation, we first need to formalize the setting we are working in. ### Shocks and consumption plans -We cast the analysis in terms of a general class of consumption plans. +We formulate the analysis in terms of a general class of consumption plans. Let $x_t$ be an $n \times 1$ state vector and $\varepsilon_{t+1}$ an $m \times 1$ shock. @@ -335,7 +341,7 @@ c_t = H x_t, where the eigenvalues of $A$ are bounded in modulus by $1/\sqrt{\beta}$. -The time-$t$ element of a consumption plan can therefore be written as +The time-$t$ consumption can therefore be written as ```{math} c_t = H\!\left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\right) + HA^t x_0. 
@@ -343,13 +349,13 @@ c_t = H\!\left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsil The equivalence theorems and Bellman equations below hold for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. -The random-walk and trend-stationary models below are two special cases. +The random-walk and trend-stationary models below are two special cases we focus on. ### Consumption dynamics Let $c_t = \log C_t$ be log consumption. -The *random-walk* specification is +The *geometric-random-walk* specification is ```{math} c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim \mathcal{N}(0, 1). @@ -363,7 +369,7 @@ c_t = c_0 + t\mu + \sigma_\varepsilon(\varepsilon_t + \varepsilon_{t-1} + \cdots t \geq 1. ``` -The *trend-stationary* specification can be written as a deterministic trend plus a stationary AR(1) component: +The *geometric-trend-stationary* specification can be written as a deterministic trend plus a stationary AR(1) component: ```{math} c_t = \zeta + \mu t + z_t, @@ -373,7 +379,7 @@ z_{t+1} = \rho z_t + \sigma_\varepsilon \varepsilon_{t+1}, \varepsilon_{t+1} \sim \mathcal{N}(0, 1). ``` -With $z_0 = c_0 - \zeta$, this implies the explicit representation +With $z_0 = c_0 - \zeta$, this implies the representation ```{math} c_t @@ -393,6 +399,8 @@ Equivalently, defining the detrended series $\tilde c_t := c_t - \mu t$, The estimated parameters are $(\mu, \sigma_\varepsilon)$ for the random walk and $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the trend-stationary case. +Below we record these parameters and moments in the paper's tables for later reference + ```{code-cell} ipython3 print("Table 2 parameters") print(f"random walk: μ={rw['μ']:.5f}, σ_ε={rw['σ_ε']:.5f}") @@ -407,31 +415,35 @@ print(f"E[r_f]={r_f_mean:.4f}, std[r_f]={r_f_std:.4f}") print(f"std[r_e-r_f]={r_excess_std:.4f}") ``` +(pref_equiv)= +## Preferences, distortions, and detection + + ### Overview of agents I, II, III, and IV We compare four preference specifications over consumption plans $C^\infty \in \mathcal{C}$. -**Type I agent (Kreps--Porteus--Epstein--Zin--Tallarini)** with +*Type I agent (Kreps--Porteus--Epstein--Zin--Tallarini)* with - a discount factor $\beta \in (0,1)$; - an intertemporal elasticity of substitution fixed at $1$; - a risk-aversion parameter $\gamma \geq 1$; and - an approximating conditional density $\pi(\cdot)$ for shocks and its implied joint distribution $\Pi_\infty(\cdot \mid x_0)$. -**Type II agent (multiplier preferences)** with +*Type II agent (multiplier preferences)* with - $\beta \in (0,1)$; - IES $=1$; - unit risk aversion; - an approximating model $\Pi_\infty(\cdot \mid x_0)$; and - a penalty parameter $\theta > 0$ that discourages probability distortions using relative entropy. -**Type III agent (constraint preferences)** with +*Type III agent (constraint preferences)* with - $\beta \in (0,1)$; - IES $=1$; - unit risk aversion; - an approximating model $\Pi_\infty(\cdot \mid x_0)$; and - a bound $\eta$ on discounted relative entropy. -**Type IV agent (pessimistic ex post Bayesian)** with +*Type IV agent (pessimistic ex post Bayesian)* with - $\beta \in (0,1)$; - IES $=1$; - unit risk aversion; and @@ -440,18 +452,15 @@ We compare four preference specifications over consumption plans $C^\infty \in \ We will introduce two sets of equivalence results. -Types I and II are observationally equivalent in the strong sense that they have identical preferences over $\mathcal{C}$ (once parameters are mapped appropriately). 
+Types I and II are observationally equivalent in the strong sense that they have identical preferences over $\mathcal{C}$. -Types III and IV are observationally equivalent in a weaker but still useful sense: for the particular endowment process taken as given, they deliver the same worst-case pricing implications as a type II agent (for the $\theta$ that implements the entropy constraint). - -(pref_equiv)= -## Preferences, distortions, and detection +Types III and IV are observationally equivalent in a weaker but still useful sense: for the particular endowment process taken as given, they deliver the same worst-case pricing implications as a type II agent. We now formalize each of the four agent types and develop the equivalence results that connect them. For each of the four types, we will derive a Bellman equation that characterizes the agent's value function and stochastic discount factor. -The stochastic discount factor of all four types will be in the form of +The stochastic discount factor for all four types takes the form $$ m_{t+1} = \beta \frac{\partial U_{t+1}/\partial c_{t+1}}{\partial U_t/\partial c_t} \hat g_{t+1}, @@ -460,11 +469,11 @@ $$ where $\hat g_{t+1}$ is a likelihood-ratio distortion that we will define in each case. -Along the way we introduce the likelihood-ratio distortion that appears in the stochastic discount factor and develop the detection-error probability that will serve as our new calibration language. +Along the way, we introduce the likelihood-ratio distortion that appears in the stochastic discount factor and develop the detection-error probability that serves as our new calibration device. -### Type I: Kreps--Porteus--Epstein--Zin--Tallarini preferences with IES $= 1$ +### Type I: Kreps--Porteus--Epstein--Zin--Tallarini preferences -The general Epstein-Zin-Weil specification aggregates current consumption and a certainty equivalent of future utility using a CES function: +The general Epstein--Zin--Weil specification aggregates current consumption and a certainty equivalent of future utility using a CES function: ```{math} :label: bhs_ez_general @@ -482,9 +491,14 @@ where $\psi > 0$ is the intertemporal elasticity of substitution and the certain \left(E_t\!\left[V_{t+1}^{1-\gamma}\right]\right)^{\!\frac{1}{1-\gamma}}. ``` +```{note} +For readers interested in a general class of aggregatiors and certainty equivalents, see Section +7.3 of {cite:t}`Sargent_Stachurski_2025`. +``` + Let $\psi = 1$, so $\rho \to 0$. -In this limit the CES aggregator degenerates into a Cobb-Douglas: +In this limit the CES aggregator reduces to $$ V_t = C_t^{1-\beta} \cdot \mathcal{R}_t(V_{t+1})^{\,\beta}. @@ -534,7 +548,7 @@ U(x) = c - \beta\theta \log \int \exp\!\left[\frac{-U(Ax + B\varepsilon)}{\theta #### Deriving the stochastic discount factor -The stochastic discount factor is the intertemporal marginal rate of substitution --- the ratio of marginal utilities of the consumption good at $t+1$ versus $t$. +The stochastic discount factor is the intertemporal marginal rate of substitution: the ratio of marginal utilities of the consumption good at dates $t+1$ and $t$. Since $c_t$ enters {eq}`bhs_risk_sensitive` linearly, $\partial U_t / \partial c_t = 1$. @@ -554,10 +568,10 @@ $$ \beta \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]}. $$ -This when converted to the consumption level gives +Converting to consumption levels gives $\partial U_t / \partial C_{t+1} = \beta \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]} \frac{1}{C_{t+1}}$. 
-Taking the ratio gives the SDF: +The ratio of these marginal utilities gives the SDF: ```{math} :label: bhs_sdf_Ut @@ -569,23 +583,22 @@ m_{t+1} \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]}. ``` -The first factor $\beta\,C_t/C_{t+1}$ is the standard log-utility IMRS. The second factor is the likelihood-ratio distortion $\hat g_{t+1}$: an exponential tilt that overweights states where the continuation value $U_{t+1}$ is low. ### Type II: multiplier preferences -Now we move to the type II (multiplier) agent. +We now turn to the type II (multiplier) agent. -Before we write down the preferences, we introduce the machinery of martingale likelihood ratios that will be used to formalize model distortions. +Before writing down the preferences, we introduce the machinery of martingale likelihood ratios used to formalize model distortions. The tools in this section build on {ref}`Likelihood Ratio Processes `, which develops properties of likelihood ratios in detail, and {ref}`Divergence Measures `, which covers relative entropy. #### Martingale likelihood ratios -Consider a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$ as a Radon--Nikodym derivative. +Consider a nonnegative martingale $G_t$ with $E(G_t \mid x_0) = 1$. Its one-step increments @@ -642,7 +655,7 @@ The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equi \frac{\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)}{E_t\!\left[\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)\right]}. ``` -Note that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty --- this is the key structural feature that makes $\hat g$ a likelihood ratio. +The fact that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty is the key structural feature that makes $\hat g$ a likelihood ratio. Substituting {eq}`bhs_ghat` back into {eq}`bhs_bellman_type2` gives @@ -754,7 +767,7 @@ For the particular $A, B, H$ and $\theta$ used to construct $\hat\Pi_\infty$, th ### Stochastic discount factor -As we have shown in each case of the four types, the stochastic discount factor can be written compactly as +As we have shown for each of the four agent types, the stochastic discount factor can be written compactly as ```{math} :label: bhs_sdf @@ -771,11 +784,11 @@ Robustness multiplies that term by $\hat g_{t+1}$, so uncertainty aversion enter For constraint preferences, the worst-case distortion is the same as for multiplier preferences with the $\theta$ that makes the entropy constraint bind. -While for the ex post Bayesian, the distortion is a change of measure from the approximating model to the pessimistic model. +For the ex post Bayesian, the distortion is a change of measure from the approximating model to the pessimistic model. -### Value function decomposition: $W = J + \theta N$ +### Value function decomposition -We can express the type II value function in a revealing way by substituting the minimizing $\hat g$ back into the Bellman equation {eq}`bhs_bellman_type2`: +Substituting the minimizing $\hat g$ back into the Bellman equation {eq}`bhs_bellman_type2` yields a revealing decomposition of the type II value function: ```{math} :label: bhs_W_decomp_bellman @@ -796,7 +809,9 @@ N(x) = \beta \int \hat g(\varepsilon)\bigl[\log \hat g(\varepsilon) + N(Ax + B\v Then $W(x) = J(x) + \theta N(x)$. 
-Here $J(x_t) = \hat E_t \sum_{j=0}^{\infty} \beta^j c_{t+j}$ is expected discounted log consumption under the *worst-case* model --- this is the value function for both the type III and the type IV agent. +Here $J(x_t) = \hat E_t \sum_{j=0}^{\infty} \beta^j c_{t+j}$ is expected discounted log consumption under the *worst-case* model. + +$J$ is the value function for both the type III and the type IV agent: the type III agent maximizes expected utility subject to an entropy constraint, and once the worst-case model is determined, the resulting value is expected discounted consumption under that model; the type IV agent uses the same worst-case model as a fixed belief, so evaluates the same expectation. And $N(x)$ is discounted continuation entropy: it measures the total information cost of the probability distortion from date $t$ onward. @@ -804,9 +819,18 @@ This decomposition will be important for the welfare calculations in {ref}`the w ### Gaussian mean-shift distortions -Under the random-walk model, the shock is $\varepsilon_{t+1} \sim \mathcal{N}(0, 1)$. +The preceding results hold for general distortions $\hat g$. +We now specialize to the Gaussian case that underlies our two consumption models. + +Under both models, the shock is $\varepsilon_{t+1} \sim \mathcal{N}(0,1)$. + +As we verify in the next subsection, the value function $W$ is linear in the state, so the exponent in the worst-case distortion {eq}`bhs_ghat` is linear in $\varepsilon_{t+1}$. + +Exponentially tilting a Gaussian by a linear function produces another Gaussian with the same variance but a shifted mean. -The worst-case model shifts its mean to $w$ (which will be negative under our calibrations), which implies the likelihood ratio ({ref}`Exercise 5 ` verifies the properties of this distortion) +The worst-case model therefore keeps the variance at one but shifts the mean of $\varepsilon_{t+1}$ to some $w < 0$. + +The resulting likelihood ratio is ({ref}`Exercise 5 ` verifies its properties) ```{math} \hat g_{t+1} @@ -822,8 +846,11 @@ Hence $\log \hat g_{t+1}$ is normal with mean $-w^2/2$ and variance $w^2$, and \operatorname{std}(\hat g_{t+1}) = \sqrt{e^{w^2}-1}. ``` -For our Gaussian calibrations, the worst-case mean shifts for -the random-walk model and the trend-stationary model are summarized by +The mean shift $w$ is determined by how strongly each shock $\varepsilon_{t+1}$ affects continuation value. +From {eq}`bhs_ghat`, the worst-case distortion puts $\hat g \propto \exp(-W(x_{t+1})/\theta)$. +If $W(x_{t+1})$ loads on $\varepsilon_{t+1}$ with coefficient $\lambda$, then the Gaussian mean shift is $w = -\lambda/\theta$. + +By guessing linear value functions and matching coefficients in the Bellman equation ({ref}`Exercise 11 ` works out both cases), we obtain the worst-case mean shifts ```{math} :label: bhs_w_formulas @@ -832,6 +859,8 @@ w_{rw}(\theta) = -\frac{\sigma_\varepsilon}{(1-\beta)\theta}, w_{ts}(\theta) = -\frac{\sigma_\varepsilon}{(1-\rho\beta)\theta}. ``` +The denominator $(1-\beta)$ in the random-walk case is replaced by $(1-\beta\rho)$ in the trend-stationary case: because the AR(1) component is persistent, each shock has a larger effect on continuation utility. + ```{code-cell} ipython3 def w_from_θ(θ, model): if np.isinf(θ): @@ -843,69 +872,100 @@ def w_from_θ(θ, model): raise ValueError("model must be 'rw' or 'ts'") ``` -(ez_sdf_moments)= -### SDF moments under Epstein-Zin preferences +### Discounted entropy -We can now derive the closed-form SDF moments used to draw {numref}`fig-bhs-1`. 
+When the approximating and worst-case conditional densities are $\mathcal{N}(0,1)$ and $\mathcal{N}(w,1)$, conditional relative entropy is -Under Epstein-Zin preferences with IES $= 1$, the SDF has the form {eq}`bhs_sdf` +```{math} +:label: bhs_conditional_entropy +E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2. +``` -$$ -m_{t+1} = \beta \frac{C_t}{C_{t+1}} \cdot \hat{g}_{t+1}, -$$ +Because the distortion is i.i.d., the discounted entropy recursion {eq}`bhs_N_recursion` reduces to $N = \beta(\frac{1}{2}w^2 + N)$, giving discounted entropy -where $\hat{g}_{t+1}$ is the likelihood-ratio distortion from the continuation value. +```{math} +:label: bhs_eta_formula +\eta = \frac{\beta}{2(1-\beta)}\,w(\theta)^2. +``` -For the random-walk model with $c_{t+1} - c_t = \mu + \sigma_\varepsilon \varepsilon_{t+1}$ and $\varepsilon_{t+1} \sim \mathcal{N}(0,1)$, the distortion is a Gaussian mean shift $w = -\sigma_\varepsilon(\gamma - 1)$, and $\log m_{t+1}$ turns out to be normally distributed: +```{code-cell} ipython3 +def η_from_θ(θ, model): + w = w_from_θ(θ, model) + return β * w**2 / (2.0 * (1.0 - β)) +``` -$$ -\log m_{t+1} = \log\beta - \mu - \tfrac{1}{2}w^2 + (w - \sigma_\varepsilon)\varepsilon_{t+1}. -$$ +This formula provides a mapping between $\theta$ and $\eta$ that aligns multiplier and constraint preferences along an exogenous endowment process. -Its mean and variance are +In the {ref}`detection-error section ` below, we show that it is more natural to hold $\eta$ (or equivalently the detection-error probability $p$) fixed rather than $\theta$ when comparing across consumption models. -$$ -E[\log m] = \log\beta - \mu - \tfrac{1}{2}w^2, -\qquad -\operatorname{Var}(\log m) = (w - \sigma_\varepsilon)^2 = \sigma_\varepsilon^2 \gamma^2. -$$ +### Value functions for random-walk consumption -For a lognormal random variable, $E[m] = \exp(E[\log m] + \tfrac{1}{2}\operatorname{Var}(\log m))$ and $\sigma(m)/E[m] = \sqrt{e^{\operatorname{Var}(\log m)} - 1}$. +We now solve the recursions {eq}`bhs_W_decomp_bellman`, {eq}`bhs_J_recursion`, and {eq}`bhs_N_recursion` in closed form for the random-walk model, where $W$ is the type II (multiplier) value function, $J$ is the type III/IV value function, and $N$ is discounted continuation entropy. -Substituting gives the following closed-form expressions ({ref}`Exercise 2 ` asks you to work through the full derivation): +Substituting $w_{rw}(\theta) = -\sigma_\varepsilon / [(1-\beta)\theta]$ from {eq}`bhs_w_formulas` into {eq}`bhs_eta_formula` gives -- *Random walk*: +```{math} +:label: bhs_N_rw +N(x) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta^2}. +``` + +For $W$, we guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ for some constant $d$ and verify it in the risk-sensitive Bellman equation {eq}`bhs_bellman_type1`. + +Under the random walk, $W(x_{t+1}) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]$, so $-W(x_{t+1})/\theta$ is affine in the standard normal $\varepsilon_{t+1}$. + +Using the fact that $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$ for a normal random variable $Z$, the Bellman equation {eq}`bhs_bellman_type1` reduces to a constant-matching condition that pins down $d$ ({ref}`Exercise 9 ` works through the algebra): ```{math} -:label: bhs_Em_rw -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], +:label: bhs_W_rw +W(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. 
``` +Using $W = J + \theta N$, the type III/IV value function is + ```{math} -:label: bhs_sigma_rw -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. +:label: bhs_J_rw +J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. ``` -Notice that in {eq}`bhs_Em_rw`, because $\sigma_\varepsilon$ is small ($\approx 0.005$), the term $\frac{\sigma_\varepsilon^2}{2}(2\gamma-1)$ grows slowly with $\gamma$, keeping $E[m]$ roughly constant near $1/(1+r^f)$. +The coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ doubles from $\tfrac{1}{2}$ in $W$ to $1$ in $J$ because $W$ includes the entropy "rebate" $\theta N$ that partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model. -Meanwhile {eq}`bhs_sigma_rw` shows that $\sigma(m)/E[m] \approx \sigma_\varepsilon \gamma$ grows linearly with $\gamma$. +This difference propagates directly into the welfare calculations below. -This is how Epstein-Zin preferences push volatility toward the HJ bound without distorting the risk-free rate. +(detection_error_section)= +## A new calibration language: detection-error probabilities -An analogous calculation for the trend-stationary model yields: +The preceding section derived SDF moments, value functions, and worst-case distortions as functions of $\gamma$ (or equivalently $\theta$). -- *Trend stationary*: +But if $\gamma$ should not be calibrated by introspection about atemporal gambles, what replaces it? -```{math} -:label: bhs_Em_ts -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], -``` +The answer proposed by {cite:t}`BHS_2009` is a statistical test that asks how easily one could distinguish the approximating model from its worst-case alternative. + +### Likelihood-ratio testing and detection errors + +Let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on a sample of length $T$. + +Define ```{math} -:label: bhs_sigma_ts -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. +p_A = \Pr_A(L_T < 0), +\qquad +p_B = \Pr_B(L_T > 0), ``` +where $\Pr_A$ and $\Pr_B$ denote probabilities under the approximating and worst-case models. + +Then $p(\theta^{-1}) = \frac{1}{2}(p_A + p_B)$ is the average probability of choosing the wrong model. + +Fix a sample size $T$ (here 235 quarters, matching the postwar US data used in the paper). + +For a given $\theta$, compute the worst-case model and ask: if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? + +That fraction is the **detection-error probability** $p(\theta^{-1})$. + +A high $p$ (near 0.5) means the two models are nearly indistinguishable, so the consumer's fear is hard to rule out. + +A low $p$ means the worst-case model is easy to reject and the robustness concern is less compelling. + ### Market price of model uncertainty The **market price of model uncertainty** (MPU) is the conditional standard deviation of the distortion: @@ -920,7 +980,7 @@ The **market price of model uncertainty** (MPU) is the conditional standard devi \approx |w(\theta)|. 
``` -The detection error probability is +In the Gaussian mean-shift setting, $L_T$ is normal with mean $\pm \tfrac{1}{2}w^2T$ and variance $w^2T$, so the detection-error probability has the closed form ({ref}`Exercise 6 ` derives this) ```{math} :label: bhs_detection_formula @@ -929,8 +989,6 @@ p(\theta^{-1}) \frac{1}{2}\left(p_A + p_B\right), ``` -and in our Gaussian mean-shift case reduces to ({ref}`Exercise 6 ` derives this closed form) - ```{math} :label: bhs_detection_closed p(\theta^{-1}) = \Phi\!\left(-\frac{|w(\theta)|\sqrt{T}}{2}\right). @@ -953,26 +1011,10 @@ def θ_from_detection_probability(p, model): raise ValueError("model must be 'rw' or 'ts'") ``` -### Likelihood-ratio testing and detection errors - -Let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on a sample of length $T$. - -Define - -```{math} -p_A = \Pr_A(L_T < 0), -\qquad -p_B = \Pr_B(L_T > 0), -``` - -where $\Pr_A$ and $\Pr_B$ denote probabilities under the approximating and worst-case models. - -Then $p(\theta^{-1}) = \frac{1}{2}(p_A + p_B)$ is the average probability of choosing the wrong model. - -In the Gaussian mean-shift setting, $L_T$ is normal with mean $\pm \tfrac{1}{2}w^2T$ and variance $w^2T$, which yields the closed-form expression above. - ### Interpreting the calibration objects +We now summarize the chain of mappings that connects preference parameters to statistical distinguishability. + The parameter $\theta$ indexes how expensive it is for the minimizing player to distort the approximating model. A small $\theta$ means a cheap distortion and therefore stronger robustness concerns. @@ -987,76 +1029,13 @@ High $p(\theta^{-1})$ means the two models are hard to distinguish. Low $p(\theta^{-1})$ means they are easier to distinguish. -This translation is the bridge between econometric identification and preference calibration. - -Finally, the relative-entropy distance associated with the worst-case distortion is +This mapping bridges econometric identification and preference calibration. -```{math} -E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2, -``` +Finally, recall from {eq}`bhs_eta_formula` that discounted entropy is $\eta = \frac{\beta}{2(1-\beta)}w(\theta)^2$. -so the discounted entropy used in type III preferences is +This tells us that when the distortion is a Gaussian mean shift, discounted entropy is proportional to the squared market price of model uncertainty. -```{math} -\eta -= -\frac{\beta}{1-\beta}\cdot \frac{w(\theta)^2}{2}, -``` - -```{code-cell} ipython3 -def η_from_θ(θ, model): - w = w_from_θ(θ, model) - return β * w**2 / (2.0 * (1.0 - β)) -``` - -This is the mapping behind the right panel of the detection-probability figure below. - -### Closed-form value functions for random-walk consumption - -We can now evaluate the value functions $W$, $J$, and $N$ in closed form for the random-walk model. - -Substituting $w_{rw}(\theta) = -\sigma_\varepsilon / [(1-\beta)\theta]$ from {eq}`bhs_w_formulas` into the discounted entropy formula gives - -```{math} -:label: bhs_N_rw -N(x) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta^2}. -``` - -The type II value function {eq}`bhs_W_decomp_bellman` evaluates to - -```{math} -:label: bhs_W_rw -W(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. 
-``` - -Using $W = J + \theta N$, the type III/IV value function is - -```{math} -:label: bhs_J_rw -J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. -``` - -Note that $J$ has *twice* the uncertainty correction of $W$: the coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ in $J$ is $1$ versus $\tfrac{1}{2}$ in $W$. - -This is because $W$ includes the entropy "rebate" $\theta N$ that partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model. - -This difference propagates directly into the welfare calculations below. - -## A new calibration language: detection-error probabilities - -If $\gamma$ should not be calibrated by introspection about atemporal gambles, what replaces it? - -The answer is a statistical test. - -Fix a sample size $T$ (here 235 quarters, matching the postwar US data used in the paper). - -For a given $\theta$, compute the worst-case model and ask: if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? - -That fraction is the detection-error probability $p(\theta^{-1})$. - -A high $p$ (near 0.5) means the two models are nearly indistinguishable --- the consumer's fear is hard to rule out. - -A low $p$ means the worst case is easy to reject and the robustness concern is less compelling. +### Detection probabilities across the two models The left panel below plots $p(\theta^{-1})$ against $\theta^{-1}$ for the two consumption specifications. @@ -1103,7 +1082,7 @@ plt.tight_layout() plt.show() ``` -This is why detection-error probabilities (or equivalently, discounted entropy) are the right cross-model yardstick. +Detection-error probabilities (or equivalently, discounted entropy) therefore provide the right cross-model yardstick. Holding $\theta$ fixed when switching from a random walk to a trend-stationary specification implicitly changes how much misspecification the consumer fears. @@ -1125,23 +1104,30 @@ Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ i ## Detection probabilities unify the two models -We can now redraw Tallarini's figure using the new language. +We now redraw Tallarini's figure using detection-error probabilities. -For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m),\;\sigma(m)/E(m))$ pair. +For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m),\;\sigma(m))$ pair. 
```{code-cell} ipython3 -p_points = np.array([0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]) +p_points = np.array( + [0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01]) -θ_rw_points = np.array([θ_from_detection_probability(p, "rw") for p in p_points]) -θ_ts_points = np.array([θ_from_detection_probability(p, "ts") for p in p_points]) +θ_rw_points = np.array( + [θ_from_detection_probability(p, "rw") for p in p_points]) +θ_ts_points = np.array( + [θ_from_detection_probability(p, "ts") for p in p_points]) γ_rw_points = np.array([γ_from_θ(θ) for θ in θ_rw_points]) γ_ts_points = np.array([γ_from_θ(θ) for θ in θ_ts_points]) -Em_rw_p = np.array([moments_type1_rw(γ)[0] for γ in γ_rw_points]) -MPR_rw_p = np.array([moments_type1_rw(γ)[1] for γ in γ_rw_points]) -Em_ts_p = np.array([moments_type1_ts(γ)[0] for γ in γ_ts_points]) -MPR_ts_p = np.array([moments_type1_ts(γ)[1] for γ in γ_ts_points]) +Em_rw_p = np.array( + [moments_type1_rw(γ)[0] for γ in γ_rw_points]) +σ_m_rw_p = np.array( + [moments_type1_rw(γ)[0] * moments_type1_rw(γ)[1] for γ in γ_rw_points]) +Em_ts_p = np.array( + [moments_type1_ts(γ)[0] for γ in γ_ts_points]) +σ_m_ts_p = np.array( + [moments_type1_ts(γ)[0] * moments_type1_ts(γ)[1] for γ in γ_ts_points]) print("p γ_rw γ_ts") for p, g1, g2 in zip(p_points, γ_rw_points, γ_ts_points): @@ -1155,14 +1141,51 @@ mystnb: caption: Pricing loci from common detectability name: fig-bhs-3 --- +from scipy.optimize import brentq + +# Empirical Sharpe ratio — the minimum of the HJ bound curve +sharpe = (r_e_mean - r_f_mean) / r_excess_std + +def sharpe_gap(p, model): + """Market price of risk minus Sharpe ratio, as a function of p.""" + if p >= 0.5: + return -sharpe + θ = θ_from_detection_probability(p, model) + γ = γ_from_θ(θ) + _, mpr = moments_type1_rw(γ) if model == "rw" else moments_type1_ts(γ) + return mpr - sharpe + +p_hj_rw = brentq(sharpe_gap, 1e-4, 0.49, args=("rw",)) +p_hj_ts = brentq(sharpe_gap, 1e-4, 0.49, args=("ts",)) + fig, ax = plt.subplots(figsize=(7, 5)) -ax.plot(Em_rw_p, MPR_rw_p, "o-", lw=2, label="random walk") -ax.plot(Em_ts_p, MPR_ts_p, "+-", lw=2, label="trend stationary") -ax.plot(Em_grid, HJ_std, lw=2, color="black", label="Hansen-Jagannathan bound") +ax.plot(Em_rw_p, σ_m_rw_p, "o", + label="random walk") +ax.plot(Em_ts_p, σ_m_ts_p, "+", markersize=12, + label="trend stationary") +ax.plot(Em_grid, HJ_std, lw=2, + color="black", label="Hansen-Jagannathan bound") + +# Mark p where each model's market price of risk reaches the Sharpe ratio +for p_hj, model, color, name, marker in [ + (p_hj_rw, "rw", "C0", "RW", "o"), + (p_hj_ts, "ts", "C1", "TS", "+"), +]: + θ_hj = θ_from_detection_probability(p_hj, model) + γ_hj = γ_from_θ(θ_hj) + Em_hj, mpr_hj = (moments_type1_rw(γ_hj) if model == "rw" + else moments_type1_ts(γ_hj)) + σ_m_hj = Em_hj * mpr_hj + ax.axhline(σ_m_hj, ls="--", lw=1, color=color, + label=f"{name} reaches bound at $p = {p_hj:.3f}$") + if model == "ts": + ax.plot(Em_hj, σ_m_hj, marker, markersize=12, color=color) + else: + ax.plot(Em_hj, σ_m_hj, marker, color=color) ax.set_xlabel(r"$E(m)$") -ax.set_ylabel(r"$\sigma(m)/E(m)$") -ax.legend(frameon=False, loc="upper right") +ax.set_ylabel(r"$\sigma(m)$") +ax.legend(frameon=False) ax.set_xlim(0.96, 1.05) ax.set_ylim(0.0, 0.34) @@ -1170,15 +1193,87 @@ plt.tight_layout() plt.show() ``` -The striking result: the random-walk and trend-stationary loci nearly coincide. +The result is striking: the random-walk and trend-stationary loci nearly coincide. 
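To attach numbers to the crossing points marked in the figure, the short cell below reuses the `p_hj_rw` and `p_hj_ts` values computed above, together with the `θ_from_detection_probability` and `γ_from_θ` helpers, to print the detection-error probability at which each model's market price of risk attains the empirical Sharpe ratio and the risk-aversion coefficient that this implies.

```{code-cell} ipython3
# Reuse the crossing points computed in the previous cell to report, for
# each model, the detectability level at which the HJ bound is reached
# and the corresponding γ.
for name, model, p_hj in [("random walk", "rw", p_hj_rw),
                          ("trend stationary", "ts", p_hj_ts)]:
    θ_hj = θ_from_detection_probability(p_hj, model)
    γ_hj = γ_from_θ(θ_hj)
    print(f"{name:16s}: reaches the bound at p = {p_hj:.3f}, "
          f"implied γ ≈ {γ_hj:.1f}")
```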
Recall that under Tallarini's $\gamma$-calibration, reaching the Hansen--Jagannathan bound required $\gamma \approx 50$ for the random walk but $\gamma \approx 75$ for the trend-stationary model --- very different numbers for the "same" preference parameter. -Under detection-error calibration, both models reach the bound at the same detectability level (around $p = 0.05$). +Under detection-error calibration, both models reach the bound at the same detectability level. + +The apparent model dependence was an artifact of using $\gamma$ as a cross-model yardstick. + +Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: a representative consumer with moderate, difficult-to-dismiss fears about model misspecification behaves as though she has very high risk aversion. + +The following figure brings together the two key ideas of this section: a small one-step density shift that is hard to detect (left panel) compounds into a large gap in expected log consumption (right panel). -The model dependence was an artifact of using $\gamma$ as a cross-model yardstick. +At $p = 0.03$ both models share the same innovation mean shift $w$, and the left panel shows that the approximating and worst-case one-step densities nearly coincide. -Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: a representative consumer with moderate, difficult-to-dismiss fears about model misspecification behaves as if she had very high risk aversion. +The right panel reveals the cumulative consequence: a per-period shift that is virtually undetectable compounds into a large gap in expected log consumption, especially under random-walk dynamics where each shock has a permanent effect. 
+ +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Small one-step density shift (left) produces large cumulative + consumption gap (right) at detection-error probability $p = 0.03$ with $T = 240$ quarters + name: fig-bhs-fear +--- +p_star = 0.03 +θ_star = θ_from_detection_probability(p_star, "rw") +w_star = w_from_θ(θ_star, "rw") +σ_ε = rw["σ_ε"] +ρ = ts["ρ"] + +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5)) + +ε = np.linspace(-4.5, 4.5, 500) +f0 = norm.pdf(ε, 0, 1) +fw = norm.pdf(ε, w_star, 1) + +ax1.fill_between(ε, f0, alpha=0.15, color='k') +ax1.plot(ε, f0, 'k', lw=2.5, + label=r'Approximating $\mathcal{N}(0,\,1)$') +ax1.fill_between(ε, fw, alpha=0.15, color='C3') +ax1.plot(ε, fw, 'C3', lw=2, ls='--', + label=f'Worst case $\\mathcal{{N}}({w_star:.2f},\\,1)$') + +peak = norm.pdf(0, 0, 1) +ax1.annotate('', xy=(w_star, 0.55 * peak), xytext=(0, 0.55 * peak), + arrowprops=dict(arrowstyle='->', color='C3', lw=1.8)) +ax1.text(w_star / 2, 0.59 * peak, f'$w = {w_star:.2f}$', + ha='center', fontsize=11, color='C3') + +ax1.set_xlabel(r'$\varepsilon_{t+1}$') +ax1.set_ylabel('Density') +ax1.legend(frameon=False) +ax1.spines[['top', 'right']].set_visible(False) + +quarters = np.arange(0, 241) +years = quarters / 4 + +gap_rw = 100 * σ_ε * w_star * quarters +gap_ts = 100 * σ_ε * w_star * (1 - ρ**quarters) / (1 - ρ) + +ax2.plot(years, gap_rw, 'C0', lw=2.5, label='Random walk') +ax2.plot(years, gap_ts, 'C1', lw=2.5, label='Trend stationary') +ax2.fill_between(years, gap_rw, alpha=0.1, color='C0') +ax2.fill_between(years, gap_ts, alpha=0.1, color='C1') +ax2.axhline(0, color='k', lw=0.5, alpha=0.3) + +# Endpoint labels +ax2.text(61, gap_rw[-1], f'{gap_rw[-1]:.1f}%', + fontsize=10, color='C0', va='center') +ax2.text(61, gap_ts[-1], f'{gap_ts[-1]:.1f}%', + fontsize=10, color='C1', va='center') + +ax2.set_xlabel('Years') +ax2.set_ylabel('Gap in expected log consumption (%)') +ax2.legend(frameon=False, loc='lower left') +ax2.spines[['top', 'right']].set_visible(False) +ax2.set_xlim(0, 68) + +plt.tight_layout() +plt.show() +``` (welfare_experiments)= ## What do risk premia measure? Two mental experiments @@ -1187,40 +1282,67 @@ Once we measure robustness concerns in units of statistical detectability, the t His answer rested on the assumption that the consumer knows the data-generating process. -The robust reinterpretation introduces a second, distinct mental experiment. +The robust reinterpretation introduces a second, distinct thought experiment. Instead of eliminating all randomness, suppose we keep randomness but remove the consumer's fear of model misspecification (set $\theta = \infty$). -How much would she pay for that relief alone? +How much would she pay for that relief? -Formally, we seek a permanent proportional reduction $c_0 - c_0^J$ in initial log consumption that leaves a type $J$ agent indifferent between the original risky plan and a deterministic certainty equivalent path. +Formally, we seek a permanent proportional reduction $c_0 - c_0^k$ in initial log consumption that leaves an agent of type $k$ indifferent between the original risky plan and a deterministic certainty-equivalent path. Because utility is log and the consumption process is Gaussian, these compensations are available in closed form. 
### The certainty equivalent path -Our point of comparison is the deterministic path with the same mean level of consumption as the stochastic plan: +The point of comparison is the deterministic path with the same mean level of consumption as the stochastic plan: ```{math} :label: bhs_ce_path c_{t+1}^{ce} - c_t^{ce} = \mu + \tfrac{1}{2}\sigma_\varepsilon^2. ``` -The extra $\tfrac{1}{2}\sigma_\varepsilon^2$ is a Jensen's inequality correction: $E[C_t] = E[e^{c_t}] = \exp(c_0 + t\mu + \tfrac{1}{2}t\sigma_\varepsilon^2)$, so {eq}`bhs_ce_path` matches the mean *level* of consumption at every date. +The additional $\tfrac{1}{2}\sigma_\varepsilon^2$ term is a Jensen's inequality correction: $E[C_t] = E[e^{c_t}] = \exp(c_0 + t\mu + \tfrac{1}{2}t\sigma_\varepsilon^2)$, so {eq}`bhs_ce_path` matches the mean *level* of consumption at every date. ### Compensating variations from the value functions We use the closed-form value functions derived earlier: {eq}`bhs_W_rw` for the type I/II value function $W$ and {eq}`bhs_J_rw` for the type III/IV value function $J$. -For the certainty equivalent path {eq}`bhs_ce_path`, there is no risk and no model uncertainty, so its value starting from $c_0^J$ is +For the certainty-equivalent path {eq}`bhs_ce_path`, there is no risk and no model uncertainty ($\theta = \infty$, so $\hat g = 1$), so the value function reduces to discounted expected log utility. + +With $c_t^{ce} = c_0^J + t(\mu + \tfrac{1}{2}\sigma_\varepsilon^2)$, we have + +$$ +U^{ce}(c_0^J) += \sum_{t=0}^{\infty}\beta^t c_t^{ce} += \sum_{t=0}^{\infty}\beta^t \bigl[c_0^J + t(\mu + \tfrac{1}{2}\sigma_\varepsilon^2)\bigr] += \frac{c_0^J}{1-\beta} + \frac{\beta(\mu + \tfrac{1}{2}\sigma_\varepsilon^2)}{(1-\beta)^2}, +$$ + +where we used $\sum_{t \geq 0}\beta^t = \frac{1}{1-\beta}$ and $\sum_{t \geq 0}t\beta^t = \frac{\beta}{(1-\beta)^2}$. Factoring gives $$ U^{ce}(c_0^J) = \frac{1}{1-\beta}\!\left[c_0^J + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right]. $$ -### Type I (Epstein-Zin) compensation +### Type I (Epstein--Zin) compensation -Setting $U^{ce}(c_0^I) = W(x_0)$ from {eq}`bhs_W_rw` and solving for $c_0 - c_0^I$: +Setting $U^{ce}(c_0^I) = W(x_0)$ from {eq}`bhs_W_rw`: + +$$ +\frac{1}{1-\beta}\!\left[c_0^I + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right] += +\frac{1}{1-\beta}\!\left[c_0 + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. +$$ + +Multiplying both sides by $(1-\beta)$ and cancelling the common $\frac{\beta\mu}{1-\beta}$ terms gives + +$$ +c_0^I + \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)} += +c_0 - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +$$ + +Solving for $c_0 - c_0^I$: ```{math} :label: bhs_comp_type1 @@ -1236,7 +1358,8 @@ where the last step uses $\gamma = 1 + [(1-\beta)\theta]^{-1}$. ### Type II (multiplier) decomposition Because $W \equiv U$, we have $c_0^{II} = c_0^I$ and the total compensation is the same. -But the interpretation differs: we can now decompose it into **risk** and **model uncertainty** components. + +However, the interpretation differs: we can now decompose it into **risk** and **model uncertainty** components. A type II agent with $\theta = \infty$ (no model uncertainty) has log preferences and requires @@ -1251,13 +1374,29 @@ A type II agent with $\theta = \infty$ (no model uncertainty) has log preference \frac{\beta \sigma_\varepsilon^2}{2(1-\beta)^2\theta}. 
``` -The risk term $\Delta c_0^{risk}$ is Lucas's cost of business cycles: at postwar consumption volatility ($\sigma_\varepsilon \approx 0.005$), it is tiny. +The risk term $\Delta c_0^{risk}$ is Lucas's cost of business cycles: at postwar consumption volatility ($\sigma_\varepsilon \approx 0.005$), it is small. -The uncertainty term $\Delta c_0^{uncertainty}$ is the additional compensation a type II agent requires for facing model misspecification. It can be first order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. +The uncertainty term $\Delta c_0^{uncertainty}$ is the additional compensation a type II agent requires for facing model misspecification. It can be first-order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. ### Type III (constraint) compensation -For a type III agent, the value function $J$ from {eq}`bhs_J_rw` implies +For a type III agent, we set $U^{ce}(c_0^{III}) = J(x_0)$ using the value function $J$ from {eq}`bhs_J_rw`: + +$$ +\frac{1}{1-\beta}\!\left[c_0^{III} + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right] += +\frac{1}{1-\beta}\!\left[c_0 + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. +$$ + +Following the same algebra as for type I but with the doubled uncertainty correction in $J$: + +$$ +c_0 - c_0^{III} += +\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)} + \frac{\beta\sigma_\varepsilon^2}{(1-\beta)^2\theta}. +$$ + +Using $\frac{1}{(1-\beta)\theta} = \gamma - 1$, this simplifies to ```{math} :label: bhs_type3_rw_decomp @@ -1266,7 +1405,7 @@ c_0 - c_0^{III} \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}(2\gamma - 1). ``` -The uncertainty component alone is +The risk component is the same $\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}$ as before. The uncertainty component alone is $$ c_0^{III}(r) - c_0^{III} @@ -1279,18 +1418,18 @@ The factor of two traces back to the difference between $W$ and $J$ noted after ### Type IV (ex post Bayesian) compensation -A type IV agent believes the pessimistic model without doubt, so his perceived drift is $\tilde\mu = \mu - \sigma_\varepsilon^2/[(1-\beta)\theta]$. -His compensation for moving to the certainty equivalent path is the same as {eq}`bhs_type3_rw_decomp`, because he ranks plans using the same value function $J$. +A type IV agent believes the pessimistic model, so the perceived drift is $\tilde\mu = \mu - \sigma_\varepsilon^2/[(1-\beta)\theta]$. +The compensation for moving to the certainty-equivalent path is the same as {eq}`bhs_type3_rw_decomp`, because this agent ranks plans using the same value function $J$. ### Comparison with a risky but free-of-model-uncertainty path -The certainty equivalents above compared a risky plan to a deterministic path, eliminating both risk and uncertainty simultaneously. +The certainty equivalents above compare a risky plan to a deterministic path, eliminating both risk and uncertainty simultaneously. We now describe an alternative measure that isolates compensation for model uncertainty by keeping risk intact. -We compare two situations whose risky consumptions for all dates $t \geq 1$ are identical. +We compare two situations with identical risky consumption for all dates $t \geq 1$. -All compensation for model uncertainty is concentrated in an adjustment to date-zero consumption alone. +All compensation for model uncertainty is concentrated in an adjustment to date-zero consumption. 
Specifically, we seek $c_0^{II}(u)$ that makes a type II agent indifferent between: @@ -1313,7 +1452,15 @@ Since $c_1$ is built from $c_0$ (not $c_0^{II}(u)$), the continuation is $$ \beta\,E\!\left[V^{\log}(c_1)\right] -= \frac{\beta}{1-\beta}\!\left[c_0 + \frac{\mu}{1-\beta}\right] += \frac{\beta}{1-\beta}\,E\!\left[c_1 + \frac{\beta\mu}{1-\beta}\right] += \frac{\beta}{1-\beta}\!\left[c_0 + \mu + \frac{\beta\mu}{1-\beta}\right] += \frac{\beta}{1-\beta}\!\left[c_0 + \frac{\mu}{1-\beta}\right], +$$ + +where we used $E[c_1] = c_0 + \mu$ (the noise term has zero mean). Expanding gives + +$$ +\beta\,E\!\left[V^{\log}(c_1)\right] = \frac{\beta c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}. $$ @@ -1454,22 +1601,26 @@ plt.tight_layout() plt.show() ``` -The left panel illustrates our elimination of model uncertainty and risk experiment for a type II agent. +The left panel illustrates the elimination of model uncertainty and risk for a type II agent. -The grey fan shows a one-standard-deviation band for the $j$-step-ahead conditional distribution of $c_t$ under the calibrated random walk model. +The shaded fan shows a one-standard-deviation band for the $j$-step-ahead conditional distribution of $c_t$ under the calibrated random-walk model. -The dash-dot line $c^{II}$ shows the certainty equivalent path whose date-zero consumption is reduced by $c_0 - c_0^{II}$, making the type II agent indifferent between this deterministic trajectory and the stochastic plan --- it compensates for bearing both risk and model ambiguity. +The dashed line $c^{II}$ shows the certainty-equivalent path whose date-zero consumption is reduced by $c_0 - c_0^{II}$, making the type II agent indifferent between this deterministic trajectory and the stochastic plan; it compensates for bearing both risk and model ambiguity. The solid line $c^r$ shows the certainty equivalent for a type II agent without model uncertainty ($\theta = \infty$), initialized at $c_0 - c_0^{II}(r)$. -At postwar calibrated values this gap is tiny, so $c^r$ sits just below the centre of the fan. +At postwar calibrated values this gap is small, so $c^r$ sits just below the center of the fan. Consistent with {cite:t}`Lucas_2003`, the welfare gains from eliminating well-understood risk are very small. -We reinterpret the large welfare gains found by {cite:t}`Tall2000` as coming not from reducing risk, but from reducing model uncertainty. -The right panel shows the cloud of nearby models that the robust consumer guards against. -Each grey fan depicts a one-standard-deviation band for a different model in the ambiguity set. +The large welfare gains found by {cite:t}`Tall2000` can be reinterpreted as arising not from reducing risk, but from reducing model uncertainty. + +The right panel shows the set of nearby models that the robust consumer guards against. + +Each shaded fan depicts a one-standard-deviation band for a different model in the ambiguity set. + The models are statistically close to the baseline --- their detection-error probability is $p = 0.10$ --- but imply very different long-run consumption levels. -The consumer's caution against such alternatives drives the large certainty-equivalent gap in the left panel. + +The consumer's caution against such alternatives accounts for the large certainty-equivalent gap in the left panel. ## How large are the welfare gains from resolving model uncertainty? 
@@ -1477,7 +1628,7 @@ A type III (constraint-preference) agent evaluates the worst model inside an ent As $\eta$ grows, the set of plausible misspecifications expands and the welfare cost of confronting model uncertainty rises. -Because $\eta$ is abstract, we instead index these costs by the associated detection error probability $p(\eta)$. +Because $\eta$ is not directly interpretable, we instead index these costs by the associated detection-error probability $p(\eta)$. The figure below plots compensation for removing model uncertainty, measured as a proportion of consumption, against $p(\eta)$. @@ -1531,29 +1682,29 @@ ax.set_xlabel(r"detection error probability $p(\eta)$ (percent)") ax.set_ylabel("proportion of consumption (percent)") ax.set_xlim(0.0, 50.0) ax.set_ylim(0.0, 30.0) -ax.legend(frameon=False, loc="upper right") +ax.legend(frameon=False) plt.tight_layout() plt.show() ``` -The random-walk model delivers somewhat larger costs than the trend-stationary model at the same detection-error probability, but both curves dwarf the classic Lucas cost of business cycles. +The random-walk model implies somewhat larger costs than the trend-stationary model at the same detection-error probability, but both curves greatly exceed the classic Lucas cost of business cycles. -To put the magnitudes in perspective: Lucas estimated that eliminating all aggregate consumption risk is worth roughly 0.05% of consumption. +To put these magnitudes in perspective, Lucas estimated that eliminating all aggregate consumption risk is worth roughly 0.05% of consumption. At detection-error probabilities of 10--20%, the model-uncertainty compensation alone runs to several percent of consumption. -The large risk premia that Tallarini matched with high $\gamma$ are, under the robust reading, compensations for bearing model uncertainty --- and the implied welfare gains from resolving that uncertainty are correspondingly large. +Under the robust reading, the large risk premia that Tallarini matched with high $\gamma$ are compensations for bearing model uncertainty, and the implied welfare gains from resolving that uncertainty are correspondingly large. ## Why doesn't learning eliminate these fears? -A natural objection: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? +A natural objection is: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? -The answer is that drift is a low-frequency feature. +The answer is that the drift is a low-frequency feature of the data. Estimating the mean of a random walk to the precision needed to reject small but economically meaningful shifts requires far more data than estimating volatility. -The figure below makes this concrete. +The following figure makes this point concrete. Consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. 
@@ -1704,7 +1855,7 @@ ax.axhline(1_000.0 * upper_band, lw=2, ls="--", color="gray", label=r"$\hat\mu \ ax.axhline(1_000.0 * lower_band, lw=2, ls="--", color="gray") ax.set_xlabel("detection error probability (percent)") ax.set_ylabel(r"mean consumption growth ($\times 10^{-3}$)") -ax.legend(frameon=False, fontsize=8, loc="upper right") +ax.legend(frameon=False, fontsize=8) ax.set_xlim(0.0, 50.0) plt.tight_layout() @@ -1721,17 +1872,17 @@ The dashed gray lines mark a two-standard-error band around the maximum-likeliho Even at detection probabilities in the 5--20% range, the worst-case drift remains inside (or very near) this confidence band. -Drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. +Drift distortions that are economically large---large enough to generate substantial model-uncertainty premia---are statistically small relative to sampling uncertainty in $\hat\mu$. -Robustness concerns survive long histories precisely because the low-frequency features that matter most for pricing are the hardest to pin down. +Robustness concerns persist despite long histories precisely because the low-frequency features that matter most for pricing are the hardest to estimate precisely. ## Concluding remarks -The title asks a question: are large risk premia prices of **variability** (atemporal risk aversion) or prices of **doubts** (model uncertainty)? +The title of this lecture poses a question: are large risk premia prices of **variability** (atemporal risk aversion) or prices of **doubts** (model uncertainty)? The analysis above shows that the answer cannot be settled by asset-pricing data alone, because the two interpretations are observationally equivalent. -But the choice matters enormously for what we conclude. +But the choice of interpretation matters for the conclusions we draw. Under the risk-aversion reading, high Sharpe ratios imply that consumers would pay a great deal to smooth known aggregate consumption fluctuations. @@ -1747,16 +1898,16 @@ Whether one ultimately prefers the risk or the uncertainty interpretation, the f ## Exercises -The exercises below ask you to fill in several derivation steps. +The following exercises ask you to fill in several derivation steps. ```{exercise} :label: dov_ex1 Let $R_{t+1}$ be an $n \times 1$ vector of gross returns with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$. -Let $m_{t+1}$ be a valid stochastic discount factor satisfying $\mathbb{1} = E[m_{t+1}\,R_{t+1}]$. +Let $m_{t+1}$ be a stochastic discount factor satisfying $\mathbf{1} = E[m_{t+1}\,R_{t+1}]$. -1. Use the covariance decomposition $E[mR] = E[m]\,E[R] + \operatorname{cov}(m,R)$ to show that $\operatorname{cov}(m,R) = \mathbb{1} - E[m]\,E[R] =: b$. +1. Use the covariance decomposition $E[mR] = E[m]\,E[R] + \operatorname{cov}(m,R)$ to show that $\operatorname{cov}(m,R) = \mathbf{1} - E[m]\,E[R] =: b$. 2. For a portfolio with weight vector $\alpha$ and return $R^p = \alpha^\top R$, show that $\operatorname{cov}(m, R^p) = \alpha^\top b$. 3. Apply the Cauchy--Schwarz inequality to the pair $(m, R^p)$ to obtain $|\alpha^\top b| \leq \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}$. 4. Maximize the ratio $|\alpha^\top b|/\sqrt{\alpha^\top \Sigma_R\,\alpha}$ over $\alpha$ and show that the maximum is $\sqrt{b^\top \Sigma_R^{-1} b}$, attained at $\alpha^\star = \Sigma_R^{-1}b$. 
@@ -1767,7 +1918,7 @@ Let $m_{t+1}$ be a valid stochastic discount factor satisfying $\mathbb{1} = E[m :class: dropdown ``` -**Part 1.** From $\mathbb{1} = E[m\,R] = E[m]\,E[R] + \operatorname{cov}(m,R)$, rearranging gives $\operatorname{cov}(m,R) = \mathbb{1} - E[m]\,E[R] \equiv b$. +**Part 1.** From $\mathbf{1} = E[m\,R] = E[m]\,E[R] + \operatorname{cov}(m,R)$, rearranging gives $\operatorname{cov}(m,R) = \mathbf{1} - E[m]\,E[R]= b$. **Part 2.** The portfolio return is $R^p = \alpha^\top R$, so @@ -1775,8 +1926,8 @@ $$ \operatorname{cov}(m, R^p) = \operatorname{cov}(m, \alpha^\top R) = \alpha^\top \operatorname{cov}(m, R) = \alpha^\top b. $$ -**Part 3.** The Cauchy--Schwarz inequality for any two random variables $X, Y$ states $|\operatorname{cov}(X,Y)| \leq \sigma(X)\,\sigma(Y)$. -Applying it to $(m, R^p)$: +**Part 3.** +Applying Cauchy--Schwarz inequality to $(m, R^p)$: $$ |\alpha^\top b| = |\operatorname{cov}(m, R^p)| \leq \sigma(m)\,\sigma(R^p) = \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}. @@ -1789,7 +1940,8 @@ $$ $$ To maximize the left-hand side over $\alpha$, define the $\Sigma_R$-inner product $\langle u, v \rangle_{\Sigma} = u^\top \Sigma_R\, v$. -Insert $I = \Sigma_R \Sigma_R^{-1}$: + +Insert $I = \Sigma_R \Sigma_R^{-1}$ gives $$ \alpha^\top b @@ -1809,14 +1961,15 @@ $$ $$ with equality when $\alpha \propto \Sigma_R^{-1} b$. + Substituting $\alpha^\star = \Sigma_R^{-1} b$ confirms $$ \max_\alpha \frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R\,\alpha}} = \sqrt{b^\top \Sigma_R^{-1} b}. $$ -**Part 5.** Combining Parts 3 and 4, $\sqrt{b^\top \Sigma_R^{-1} b} \leq \sigma(m)$. -Dividing both sides by $E[m] > 0$ yields {eq}`bhs_hj_unconditional`. +**Part 5.** Combining Parts 3 and 4 gives $\sqrt{b^\top \Sigma_R^{-1} b} \leq \sigma(m)$ and +dividing by $E[m] > 0$ yields {eq}`bhs_hj_unconditional`. ```{solution-end} ``` @@ -1824,11 +1977,12 @@ Dividing both sides by $E[m] > 0$ yields {eq}`bhs_hj_unconditional`. ```{exercise} :label: dov_ex2 -Combine the SDF representation {eq}`bhs_sdf` with the random-walk consumption dynamics and the Gaussian mean-shift distortion to show that $\log m_{t+1}$ is normally distributed under the approximating model. +Combine the SDF representation {eq}`bhs_sdf` with the random-walk consumption dynamics and the Gaussian mean-shift distortion to derive closed-form SDF moments. -1. Compute its mean and variance in terms of $(\beta,\mu,\sigma_\varepsilon,w)$. +1. Show that $\log m_{t+1}$ is normally distributed under the approximating model and compute its mean and variance in terms of $(\beta,\mu,\sigma_\varepsilon,w)$. 2. Use lognormal moments to derive expressions for $E[m]$ and $\sigma(m)/E[m]$. -3. Use the parameter mapping $\theta = [(1-\beta)(\gamma-1)]^{-1}$ and the associated $w$ to obtain {eq}`bhs_Em_rw` and {eq}`bhs_sigma_rw`. +3. Use the parameter mapping $\theta = [(1-\beta)(\gamma-1)]^{-1}$ and the associated $w$ to obtain closed-form expressions for the random-walk model. +4. Explain why $E[m]$ stays roughly constant while $\sigma(m)/E[m]$ grows linearly with $\gamma$. ``` ```{solution-start} dov_ex2 @@ -1925,7 +2079,35 @@ $$ (w-\sigma_\varepsilon)^2 = (-\sigma_\varepsilon\gamma)^2=\sigma_\varepsilon^2\gamma^2. $$ -Substituting yields {eq}`bhs_Em_rw` and {eq}`bhs_sigma_rw`. 
+Substituting gives the closed-form expressions for the random-walk model: + +```{math} +:label: bhs_Em_rw +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], +``` + +```{math} +:label: bhs_sigma_rw +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. +``` + +Notice that in {eq}`bhs_Em_rw`, because $\sigma_\varepsilon$ is small ($\approx 0.005$), the term $\frac{\sigma_\varepsilon^2}{2}(2\gamma-1)$ grows slowly with $\gamma$, keeping $E[m]$ roughly constant near $1/(1+r^f)$. + +Meanwhile {eq}`bhs_sigma_rw` shows that $\sigma(m)/E[m] \approx \sigma_\varepsilon \gamma$ grows linearly with $\gamma$. + +This is how Epstein--Zin preferences push volatility toward the HJ bound without distorting the risk-free rate. + +An analogous calculation for the trend-stationary model yields: + +```{math} +:label: bhs_Em_ts +E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], +``` + +```{math} +:label: bhs_sigma_ts +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. +``` ```{solution-end} ``` @@ -1994,7 +2176,8 @@ $$ -E_t[U_{t+1}]/\theta + o(1/\theta), $$ -so $-\theta\log E_t[\exp(-U_{t+1}/\theta)] \to E_t[U_{t+1}]$ and the recursion converges to +so $-\theta\log E_t[\exp(-U_{t+1}/\theta)] \to E_t[U_{t+1}]$ as +$\theta\to\infty$ and the recursion converges to $$ U_t = c_t + \beta E_t U_{t+1}. @@ -2519,3 +2702,202 @@ Together these reproduce {eq}`bhs_type2_rw_decomp`. ```{solution-end} ``` + +```{exercise} +:label: dov_ex9 + +Verify the closed-form value function {eq}`bhs_W_rw` for the random-walk model by substituting a guess of the form $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ into the risk-sensitive Bellman equation {eq}`bhs_bellman_type1`. + +1. Under the random walk $c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}$, show that $W(Ax_t + B\varepsilon) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]$. +2. Substitute into the $\log E\exp$ term, using the fact that for $Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2)$ we have $\log E[\exp(Z)] = \mu_Z + \frac{1}{2}\sigma_Z^2$. +3. Solve for $d$ and confirm that it matches {eq}`bhs_W_rw`. +``` + +```{solution-start} dov_ex9 +:class: dropdown +``` + +**Part 1.** Under the random walk, $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$. Substituting the guess $W(x) = \frac{1}{1-\beta}[Hx + d]$ with $Hx_t = c_t$: + +$$ +W(Ax_t + B\varepsilon_{t+1}) = \frac{1}{1-\beta}\bigl[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d\bigr]. +$$ + +**Part 2.** The Bellman equation {eq}`bhs_bellman_type1` requires computing + +$$ +-\beta\theta\log E_t\!\left[\exp\!\left(\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta}\right)\right]. +$$ + +Substituting the guess: + +$$ +\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta} += +\frac{-1}{(1-\beta)\theta}\bigl[c_t + \mu + d + \sigma_\varepsilon\varepsilon_{t+1}\bigr]. +$$ + +This is an affine function of the standard normal $\varepsilon_{t+1}$, so the argument of the $\log E\exp$ is normal with + +$$ +\mu_Z = \frac{-(c_t + \mu + d)}{(1-\beta)\theta}, +\qquad +\sigma_Z^2 = \frac{\sigma_\varepsilon^2}{(1-\beta)^2\theta^2}. 
+$$ + +Using $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$: + +$$ +-\beta\theta\!\left[\frac{-(c_t + \mu + d)}{(1-\beta)\theta} + \frac{\sigma_\varepsilon^2}{2(1-\beta)^2\theta^2}\right] += +\frac{\beta}{1-\beta}\!\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. +$$ + +**Part 3.** The Bellman equation becomes + +$$ +\frac{1}{1-\beta}[c_t + d] += +c_t + \frac{\beta}{1-\beta}\!\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. +$$ + +Expanding the right-hand side: + +$$ +c_t + \frac{\beta c_t}{1-\beta} + \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta} += +\frac{c_t}{1-\beta} + \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +$$ + +Equating both sides and cancelling $\frac{c_t}{1-\beta}$: + +$$ +\frac{d}{1-\beta} = \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +$$ + +Solving: $d - \beta d = \beta\mu - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)\theta}$, so + +$$ +d = \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right), +$$ + +confirming {eq}`bhs_W_rw`. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex10 + +Derive the trend-stationary risk compensation stated in the lecture. + +For the trend-stationary model with $\tilde c_{t+1} - \zeta = \rho(\tilde c_t - \zeta) + \sigma_\varepsilon\varepsilon_{t+1}$, where $\tilde c_t = c_t - \mu t$, compute the risk compensation $\Delta c_0^{risk,\,ts}$ by comparing expected log utility under the stochastic plan to the deterministic certainty-equivalent path, and show that + +$$ +\Delta c_0^{risk,\,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}. +$$ + +*Hint:* You will need $\operatorname{Var}(z_t) = \sigma_\varepsilon^2(1 + \rho^2 + \cdots + \rho^{2(t-1)})$ and the formula $\sum_{t \geq 1}\beta^t \sum_{j=0}^{t-1}\rho^{2j} = \frac{\beta}{(1-\beta)(1-\beta\rho^2)}$. +``` + +```{solution-start} dov_ex10 +:class: dropdown +``` + +Under the trend-stationary model with $z_0 = 0$, $c_t = c_0 + \mu t + z_t$ and $E[c_t] = c_0 + \mu t$ (since $E[z_t] = 0$). + +The deterministic certainty-equivalent path matches $E[C_t] = \exp(c_0 + \mu t + \frac{1}{2}\operatorname{Var}(z_t))$, so its log is $c_0^{ce} + \mu t + \frac{1}{2}\operatorname{Var}(z_t)$. + +Under expected log utility ($\theta = \infty$), the value of the stochastic plan is + +$$ +\sum_{t \geq 0}\beta^t E[c_t] = \frac{c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}. +$$ + +The value of the certainty-equivalent path (matching mean levels) starting from $c_0 - \Delta c_0^{risk}$ is + +$$ +\sum_{t \geq 0}\beta^t \bigl[c_0 - \Delta c_0^{risk} + \mu t + \tfrac{1}{2}\operatorname{Var}(z_t)\bigr]. +$$ + +Since $\operatorname{Var}(z_t) = \sigma_\varepsilon^2 \sum_{j=0}^{t-1}\rho^{2j}$, the extra term sums to + +$$ +\sum_{t \geq 1}\beta^t \cdot \frac{\sigma_\varepsilon^2}{2}\sum_{j=0}^{t-1}\rho^{2j} += \frac{\sigma_\varepsilon^2}{2}\cdot\frac{\beta}{(1-\beta)(1-\beta\rho^2)}. +$$ + +Equating values and solving: + +$$ +\frac{\Delta c_0^{risk}}{1-\beta} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)(1-\beta\rho^2)} +\quad\Rightarrow\quad +\Delta c_0^{risk,\,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}. 
+$$ + +The uncertainty compensation follows from the value function: $\Delta c_0^{unc,\,ts,\,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}$, with the $(1-\beta)$ factors replaced by $(1-\beta\rho)$ because the worst-case mean shift scales with $1/(1-\beta\rho)$ rather than $1/(1-\beta)$. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex11 + +Derive the worst-case mean shifts {eq}`bhs_w_formulas` for both consumption models. + +Recall that the worst-case distortion {eq}`bhs_ghat` has $\hat g \propto \exp(-W(x_{t+1})/\theta)$. + +When $W$ is linear in the state, the exponent is linear in $\varepsilon_{t+1}$, and the Gaussian mean shift is $w = -\lambda/\theta$ where $\lambda$ is the coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$. + +1. Random-walk model: Guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$. Using $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$, find $\lambda$ and show that $w = -\sigma_\varepsilon/[(1-\beta)\theta]$. + +2. Trend-stationary model: Write $z_t = \tilde c_t - \zeta$ and guess $W(x_t) = \frac{1}{1-\beta}[c_t + \alpha_1 z_t + \alpha_0]$. Show that: + - The coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$ is $(1+\alpha_1)\sigma_\varepsilon/(1-\beta)$. + - Matching coefficients on $z_t$ in the Bellman equation gives $\alpha_1 = \beta(\rho-1)/(1-\beta\rho)$. + - Therefore $1+\alpha_1 = (1-\beta)/(1-\beta\rho)$ and $w = -\sigma_\varepsilon/[(1-\beta\rho)\theta]$. +``` + +```{solution-start} dov_ex11 +:class: dropdown +``` + +**Part 1.** +Under the guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ and $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$, + +$$ +W(x_{t+1}) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]. +$$ + +The coefficient on $\varepsilon_{t+1}$ is $\lambda = \sigma_\varepsilon/(1-\beta)$, so $w = -\lambda/\theta = -\sigma_\varepsilon/[(1-\beta)\theta]$. + +**Part 2.** +Under the guess $W(x_t) = \frac{1}{1-\beta}[c_t + \alpha_1 z_t + \alpha_0]$ with $c_{t+1} = c_t + \mu + (\rho-1)z_t + \sigma_\varepsilon\varepsilon_{t+1}$ and $z_{t+1} = \rho z_t + \sigma_\varepsilon\varepsilon_{t+1}$, + +$$ +W(x_{t+1}) = \tfrac{1}{1-\beta}\bigl[c_t + \mu + (\rho-1)z_t + \sigma_\varepsilon\varepsilon_{t+1} + \alpha_1(\rho z_t + \sigma_\varepsilon\varepsilon_{t+1}) + \alpha_0\bigr]. +$$ + +The coefficient on $\varepsilon_{t+1}$ is $(1+\alpha_1)\sigma_\varepsilon/(1-\beta)$. + +To find $\alpha_1$, substitute the guess into the Bellman equation. + +The factors of $\frac{1}{1-\beta}$ cancel on both sides, and matching coefficients on $z_t$ gives + +$$ +\alpha_1 = \beta\bigl[(\rho-1) + \alpha_1\rho\bigr] +\quad\Rightarrow\quad +\alpha_1(1-\beta\rho) = \beta(\rho-1) +\quad\Rightarrow\quad +\alpha_1 = \frac{\beta(\rho-1)}{1-\beta\rho}. +$$ + +Therefore + +$$ +1+\alpha_1 = \frac{1-\beta\rho + \beta(\rho-1)}{1-\beta\rho} = \frac{1-\beta}{1-\beta\rho}, +$$ + +and the coefficient on $\varepsilon_{t+1}$ becomes $(1+\alpha_1)\sigma_\varepsilon/(1-\beta) = \sigma_\varepsilon/(1-\beta\rho)$, giving $w = -\sigma_\varepsilon/[(1-\beta\rho)\theta]$. 
+ +```{solution-end} +``` From a2732d5dbce31210a1dd165bb558b02f7e22e669 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 16:34:39 +1100 Subject: [PATCH 23/37] updates --- lectures/doubts_or_variability.md | 479 +++++++++++++++++++----------- 1 file changed, 301 insertions(+), 178 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 602571dad..7076d1c09 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -43,7 +43,7 @@ But matching required setting the risk-aversion coefficient $\gamma$ to around 5 Their answer --- and the theme of this lecture --- is that much of what looks like "risk aversion" can be reinterpreted as **model uncertainty**. -The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max–min recursion in which the agent fears that the probability model governing consumption growth may be wrong. +The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max--min recursion in which the agent fears that the probability model governing consumption growth may be wrong. Under this reading, the parameter that looked like extreme risk aversion instead measures concern about **misspecification**. @@ -115,7 +115,7 @@ A random variable $m_{t+1}$ is said to be a **stochastic discount factor** if it ```{math} :label: bhs_pricing_eq -p_t = E_t(m_{t+1}\, y_{t+1}), +p_t = E_t(m_{t+1} y_{t+1}), ``` where $E_t$ denotes the mathematical expectation conditioned on date-$t$ information. @@ -133,7 +133,7 @@ Setting $y_{t+1} = 1$ (a risk-free bond) in {eq}`bhs_pricing_eq` yields the reci ```{math} :label: bhs_riskfree -\frac{1}{R_t^f} = E_t[m_{t+1}] = E_t\!\left[\beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right]. +\frac{1}{R_t^f} = E_t[m_{t+1}] = E_t \left[\beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}\right]. ``` ### The Hansen--Jagannathan bound @@ -151,15 +151,15 @@ An excess return is the payoff on a zero-cost portfolio that is long one dollar Because the portfolio costs nothing to enter, its price is $p_t = 0$, so {eq}`bhs_pricing_eq` implies $$ -0 = E_t[m_{t+1}\,\xi_{t+1}]. +0 = E_t[m_{t+1} \xi_{t+1}]. $$ We can decompose the expectation of a product into a covariance plus a product of expectations: $$ -E_t[m_{t+1}\,\xi_{t+1}] +E_t[m_{t+1} \xi_{t+1}] = -\operatorname{cov}_t(m_{t+1},\,\xi_{t+1}) + E_t[m_{t+1}]\,E_t[\xi_{t+1}], +\operatorname{cov}_t(m_{t+1},\xi_{t+1}) + E_t[m_{t+1}] E_t[\xi_{t+1}], $$ where $\operatorname{cov}_t$ denotes the conditional covariance and $\sigma_t$ will denote the conditional standard deviation. @@ -167,21 +167,21 @@ where $\operatorname{cov}_t$ denotes the conditional covariance and $\sigma_t$ w Setting the left-hand side to zero and solving for the expected excess return gives $$ -E_t[\xi_{t+1}] = -\frac{\operatorname{cov}_t(m_{t+1},\,\xi_{t+1})}{E_t[m_{t+1}]}. +E_t[\xi_{t+1}] = -\frac{\operatorname{cov}_t(m_{t+1}, \xi_{t+1})}{E_t[m_{t+1}]}. $$ -Taking absolute values and applying the **Cauchy--Schwarz inequality** $|\operatorname{cov}(X,Y)| \leq \sigma(X)\,\sigma(Y)$ yields +Taking absolute values and applying the **Cauchy--Schwarz inequality** $|\operatorname{cov}(X,Y)| \leq \sigma(X) \sigma(Y)$ yields ```{math} :label: bhs_hj_bound \frac{|E_t[\xi_{t+1}]|}{\sigma_t(\xi_{t+1})} -\;\leq\; +\leq \frac{\sigma_t(m_{t+1})}{E_t[m_{t+1}]}. ``` The left-hand side of {eq}`bhs_hj_bound` is the **Sharpe ratio**: the expected excess return per unit of return volatility. 
-The right-hand side, $\sigma_t(m)/E_t(m)$, is the **market price of risk**: the maximum Sharpe ratio attainable in the market. +The right-hand side, $\sigma_t(m)/E_t(m)$, is the **market price of risk**: the maximum Sharpe ratio attainable in the market. The bound says that the Sharpe ratio of any asset cannot exceed the market price of risk. @@ -189,20 +189,20 @@ The bound says that the Sharpe ratio of any asset cannot exceed the market price The bound {eq}`bhs_hj_bound` is stated in conditional terms. -An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$ +An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$: ```{math} :label: bhs_hj_unconditional -\frac{\sigma(m)}{E(m)} -\;\geq\; +\sigma(m) +\geq \sqrt{b^\top \Sigma_R^{-1} b}, \qquad -b = \mathbf{1} - E(m)\, E(R). +b = \mathbf{1} - E(m) E(R). ``` In {ref}`Exercise 1 `, we will revisit and verify this unconditional version of the HJ bound. -Below we implement a function that computes the right-hand side of {eq}`bhs_hj_unconditional` for any given value of $E(m)$ +Below we implement a function that computes the right-hand side of {eq}`bhs_hj_unconditional` for any given value of $E(m)$. ```{code-cell} ipython3 def hj_std_bound(E_m): @@ -230,7 +230,7 @@ We present this figure before developing the underlying theory because it motiva The closed-form expressions for the Epstein--Zin SDF moments used in the plot are derived in {ref}`Exercise 2 `. -The code below implements those expressions and the corresponding CRRA moments +The code below implements those expressions and the corresponding CRRA moments. ```{code-cell} ipython3 def moments_type1_rw(γ): @@ -262,7 +262,7 @@ def moments_crra_rw(γ): return E_m, mpr ``` -For each value of $\gamma \in \{1, 5, 10, \ldots, 50\}$, we plot the implied $(E(m),\;\sigma(m))$ pair for three specifications: time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). +For each value of $\gamma \in \{1, 5, 10, \ldots, 51\}$, we plot the implied $(E(m),\sigma(m))$ pair for three specifications: time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). ```{code-cell} ipython3 @@ -272,7 +272,7 @@ mystnb: caption: SDF moments and Hansen-Jagannathan bound name: fig-bhs-1 --- -γ_grid = np.arange(1, 51, 5) +γ_grid = np.arange(1, 55, 5) Em_rw = np.array([moments_type1_rw(γ)[0] for γ in γ_grid]) σ_m_rw = np.array([moments_type1_rw(γ)[0] * moments_type1_rw(γ)[1] for γ in γ_grid]) @@ -344,7 +344,7 @@ where the eigenvalues of $A$ are bounded in modulus by $1/\sqrt{\beta}$. The time-$t$ consumption can therefore be written as ```{math} -c_t = H\!\left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\right) + HA^t x_0. +c_t = H \left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\right) + HA^t x_0. ``` The equivalence theorems and Bellman equations below hold for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. 
@@ -399,7 +399,7 @@ Equivalently, defining the detrended series $\tilde c_t := c_t - \mu t$, The estimated parameters are $(\mu, \sigma_\varepsilon)$ for the random walk and $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the trend-stationary case. -Below we record these parameters and moments in the paper's tables for later reference +Below we record these parameters and moments in the paper's tables for later reference. ```{code-cell} ipython3 print("Table 2 parameters") @@ -477,7 +477,7 @@ The general Epstein--Zin--Weil specification aggregates current consumption and ```{math} :label: bhs_ez_general -V_t = \left[(1-\beta)\, C_t^{\,\rho} + \beta\, \mathcal{R}_t(V_{t+1})^{\,\rho}\right]^{1/\rho}, +V_t = \left[(1-\beta) C_t^{\rho} + \beta \mathcal{R}_t(V_{t+1})^{\rho}\right]^{1/\rho}, \qquad \rho := 1 - \frac{1}{\psi}, ``` @@ -488,11 +488,11 @@ where $\psi > 0$ is the intertemporal elasticity of substitution and the certain :label: bhs_certainty_equiv \mathcal{R}_t(V_{t+1}) = -\left(E_t\!\left[V_{t+1}^{1-\gamma}\right]\right)^{\!\frac{1}{1-\gamma}}. +\left(E_t\left[V_{t+1}^{1-\gamma}\right]\right)^{\frac{1}{1-\gamma}}. ``` ```{note} -For readers interested in a general class of aggregatiors and certainty equivalents, see Section +For readers interested in a general class of aggregators and certainty equivalents, see Section 7.3 of {cite:t}`Sargent_Stachurski_2025`. ``` @@ -501,7 +501,7 @@ Let $\psi = 1$, so $\rho \to 0$. In this limit the CES aggregator reduces to $$ -V_t = C_t^{1-\beta} \cdot \mathcal{R}_t(V_{t+1})^{\,\beta}. +V_t = C_t^{1-\beta} \cdot \mathcal{R}_t(V_{t+1})^{\beta}. $$ Taking logs and expanding the certainty equivalent {eq}`bhs_certainty_equiv` gives the *type I recursion*: @@ -534,7 +534,7 @@ Substituting into {eq}`bhs_type1_recursion` yields the **risk-sensitive recursio ```{math} :label: bhs_risk_sensitive -U_t = c_t - \beta\theta \log E_t\!\left[\exp\!\left(\frac{-U_{t+1}}{\theta}\right)\right]. +U_t = c_t - \beta\theta \log E_t\left[\exp\left(\frac{-U_{t+1}}{\theta}\right)\right]. ``` When $\gamma = 1$ (equivalently $\theta = +\infty$), the $\log E \exp$ term reduces to $E_t U_{t+1}$ and the recursion becomes standard discounted expected log utility: $U_t = c_t + \beta E_t U_{t+1}$. @@ -543,7 +543,7 @@ For consumption plans in $\mathcal{C}(A, B, H; x_0)$, the recursion {eq}`bhs_ris ```{math} :label: bhs_bellman_type1 -U(x) = c - \beta\theta \log \int \exp\!\left[\frac{-U(Ax + B\varepsilon)}{\theta}\right] \pi(\varepsilon)\,d\varepsilon. +U(x) = c - \beta\theta \log \int \exp\left[\frac{-U(Ax + B\varepsilon)}{\theta}\right] \pi(\varepsilon)d\varepsilon. ``` #### Deriving the stochastic discount factor @@ -563,7 +563,7 @@ $$ = -\beta\theta \frac{\exp(-U_{t+1}/\theta) (-1/\theta)}{E_t[\exp(-U_{t+1}/\theta)]} -\underbrace{\frac{\partial U_{t+1}}{\partial c_{t+1}}}_{=\,1} +\underbrace{\frac{\partial U_{t+1}}{\partial c_{t+1}}}_{=1} = \beta \frac{\exp(-U_{t+1}/\theta)}{E_t[\exp(-U_{t+1}/\theta)]}. $$ @@ -612,9 +612,9 @@ g_{t+1} \geq 0, G_0 = 1, ``` -define distorted conditional expectations: $\tilde E_t[b_{t+1}] = E_t[g_{t+1}\,b_{t+1}]$. +define distorted conditional expectations: $\tilde E_t[b_{t+1}] = E_t[g_{t+1}b_{t+1}]$. -The conditional relative entropy of the distortion is $E_t[g_{t+1}\log g_{t+1}]$, and the discounted entropy over the entire path is $\beta E\bigl[\sum_{t=0}^{\infty} \beta^t G_t\,E_t(g_{t+1}\log g_{t+1})\,\big|\,x_0\bigr]$. 
+The conditional relative entropy of the distortion is $E_t[g_{t+1}\log g_{t+1}]$, and the discounted entropy over the entire path is $\beta E\bigl[\sum_{t=0}^{\infty} \beta^t G_tE_t(g_{t+1}\log g_{t+1})\big|x_0\bigr]$. A type II agent's *multiplier* preference ordering over consumption plans $C^\infty \in \mathcal{C}(A,B,H;x_0)$ is defined by @@ -622,9 +622,9 @@ A type II agent's *multiplier* preference ordering over consumption plans $C^\in ```{math} :label: bhs_type2_objective \min_{\{g_{t+1}\}} -\sum_{t=0}^{\infty} E\!\left\{\beta^t G_t -\left[c_t + \beta\theta\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\right] -\,\Big|\, x_0\right\}, +\sum_{t=0}^{\infty} E\left\{\beta^t G_t +\left[c_t + \beta\theta E_t\left(g_{t+1}\log g_{t+1}\right)\right] +\Big| x_0\right\}, ``` where $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \geq 0$, and $G_0 = 1$. @@ -637,14 +637,14 @@ The value function satisfies the Bellman equation :label: bhs_bellman_type2 W(x) = -c + \min_{g(\varepsilon) \geq 0}\; -\beta \int \bigl[g(\varepsilon)\,W(Ax + B\varepsilon) -+ \theta\,g(\varepsilon)\log g(\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon +c + \min_{g(\varepsilon) \geq 0} +\beta \int \bigl[g(\varepsilon) W(Ax + B\varepsilon) ++ \theta g(\varepsilon)\log g(\varepsilon)\bigr] \pi(\varepsilon) d\varepsilon ``` -subject to $\int g(\varepsilon)\,\pi(\varepsilon)\,d\varepsilon = 1$. +subject to $\int g(\varepsilon) \pi(\varepsilon) d\varepsilon = 1$. -Inside the integral, $g(\varepsilon)\,W(Ax + B\varepsilon)$ is the continuation value under the distorted model $g\pi$, while $\theta\,g(\varepsilon)\log g(\varepsilon)$ is the entropy penalty that makes large departures from the approximating model $\pi$ costly. +Inside the integral, $g(\varepsilon) W(Ax + B\varepsilon)$ is the continuation value under the distorted model $g\pi$, while $\theta g(\varepsilon)\log g(\varepsilon)$ is the entropy penalty that makes large departures from the approximating model $\pi$ costly. The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equivalence $W \equiv U$) @@ -652,7 +652,7 @@ The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equi :label: bhs_ghat \hat g_{t+1} = -\frac{\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)}{E_t\!\left[\exp\!\bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)\right]}. +\frac{\exp \bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)}{E_t \left[\exp \bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)\right]}. ``` The fact that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty is the key structural feature that makes $\hat g$ a likelihood ratio. @@ -660,7 +660,7 @@ The fact that $g(\varepsilon)$ multiplies both the continuation value $W$ and th Substituting {eq}`bhs_ghat` back into {eq}`bhs_bellman_type2` gives -$$W(x) = c - \beta\theta \log \int \exp\!\left[\frac{-W(Ax + B\varepsilon)}{\theta}\right]\pi(\varepsilon)\,d\varepsilon,$$ +$$W(x) = c - \beta\theta \log \int \exp \left[\frac{-W(Ax + B\varepsilon)}{\theta}\right]\pi(\varepsilon) d\varepsilon,$$ which is identical to {eq}`bhs_bellman_type1`. 
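
As a numerical sanity check on this equivalence, the following sketch discretizes a standard normal shock and verifies that, for an arbitrary continuation value $W'(\varepsilon)$ and an arbitrary $\theta$ (the particular numbers below are immaterial), the minimized objective $\int [\hat g W' + \theta \hat g \log \hat g] \pi d\varepsilon$ equals $-\theta \log \int \exp(-W'/\theta) \pi d\varepsilon$; the common factor $\beta$ multiplies both sides and is dropped.

```{code-cell} ipython3
import numpy as np

# Discretize ε ~ N(0,1) on a grid with normalized quadrature weights
grid = np.linspace(-8, 8, 4001)
π_w = np.exp(-0.5 * grid**2)
π_w /= π_w.sum()

θ_chk = 2.0                     # an arbitrary penalty parameter
W_next = 1.0 + 0.3 * grid       # an arbitrary continuation value W'(ε)

# Exponential tilt ĝ ∝ exp(-W'/θ), normalized so that E[ĝ] = 1
g_hat = np.exp(-W_next / θ_chk)
g_hat /= (g_hat * π_w).sum()

# Minimized objective E[ĝ W' + θ ĝ log ĝ] versus -θ log E[exp(-W'/θ)]
lhs = ((g_hat * W_next + θ_chk * g_hat * np.log(g_hat)) * π_w).sum()
rhs = -θ_chk * np.log((np.exp(-W_next / θ_chk) * π_w).sum())

print(f"penalized minimum      = {lhs:.12f}")
print(f"-θ log E[exp(-W'/θ)]   = {rhs:.12f}")
```
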
@@ -695,13 +695,13 @@ The agent minimizes expected discounted log consumption under the worst-case mod J(x_0) = \min_{\{g_{t+1}\}} -\sum_{t=0}^{\infty} E\!\left[\beta^t G_t\,c_t \,\Big|\, x_0\right] +\sum_{t=0}^{\infty} E \left[\beta^t G_t c_t \Big| x_0\right] ``` subject to $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \geq 0$, $G_0 = 1$, and ```{math} -\beta E\!\left[\sum_{t=0}^{\infty} \beta^t G_t\,E_t\!\left(g_{t+1}\log g_{t+1}\right)\,\Big|\,x_0\right] \leq \eta. +\beta E \left[\sum_{t=0}^{\infty} \beta^t G_t E_t\left(g_{t+1}\log g_{t+1}\right)\Big|x_0\right] \leq \eta. ``` The Lagrangian for the type III problem is @@ -709,10 +709,10 @@ The Lagrangian for the type III problem is $$ \mathcal{L} = -\sum_{t=0}^{\infty} E\!\left[\beta^t G_t\,c_t \,\Big|\, x_0\right] -\;+\; -\theta\!\left[ -\beta E\!\left(\sum_{t=0}^{\infty} \beta^t G_t\,E_t(g_{t+1}\log g_{t+1})\,\Big|\,x_0\right) - \eta +\sum_{t=0}^{\infty} E\left[\beta^t G_t c_t \Big| x_0\right] ++ +\theta \left[ +\beta E \left(\sum_{t=0}^{\infty} \beta^t G_t E_t(g_{t+1}\log g_{t+1})\Big| x_0 \right) - \eta \right], $$ @@ -723,14 +723,14 @@ Collecting terms inside the expectation gives $$ \mathcal{L} = -\sum_{t=0}^{\infty} E\!\left\{\beta^t G_t -\left[c_t + \beta\theta\,E_t(g_{t+1}\log g_{t+1})\right] -\,\Big|\, x_0\right\} - \theta\eta, +\sum_{t=0}^{\infty} E \left \{\beta^t G_t +\left[c_t + \beta \theta E_t(g_{t+1}\log g_{t+1})\right] +\Big| x_0\right\} - \theta\eta, $$ which, apart from the constant $-\theta\eta$, has the same structure as the type II objective {eq}`bhs_type2_objective`. -The FOC for $g_{t+1}$ is therefore identical, and the optimal distortion is the same $\hat g_{t+1}$ as in {eq}`bhs_ghat` for the $\theta$ that makes the entropy constraint bind. +The first-order condition for $g_{t+1}$ is therefore identical, and the optimal distortion is the same $\hat g_{t+1}$ as in {eq}`bhs_ghat` for the $\theta$ that makes the entropy constraint bind. The SDF is again $m_{t+1} = \beta(C_t/C_{t+1})\hat g_{t+1}$. @@ -751,14 +751,14 @@ The joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ is the one assoc Under $\hat\Pi_\infty$ the agent has log utility, so the Euler equation for any gross return $R_{t+1}$ is $$ -1 = \hat E_t\!\left[\beta \frac{C_t}{C_{t+1}} R_{t+1}\right]. +1 = \hat E_t \left[\beta \frac{C_t}{C_{t+1}} R_{t+1}\right]. $$ To express this in terms of the approximating model $\Pi_\infty$, apply a change of measure using the one-step likelihood ratio $\hat g_{t+1} = d\hat\Pi / d\Pi$: $$ -1 = E_t\!\left[\hat g_{t+1} \cdot \beta \frac{C_t}{C_{t+1}} R_{t+1}\right] -= E_t\!\left[m_{t+1}\, R_{t+1}\right], +1 = E_t\left[\hat g_{t+1} \cdot \beta \frac{C_t}{C_{t+1}} R_{t+1}\right] += E_t\left[m_{t+1} R_{t+1}\right], $$ so the effective SDF under the approximating model is $m_{t+1} = \beta(C_t/C_{t+1})\hat g_{t+1}$. @@ -792,19 +792,19 @@ Substituting the minimizing $\hat g$ back into the Bellman equation {eq}`bhs_bel ```{math} :label: bhs_W_decomp_bellman -W(x) = c + \beta \int \bigl[\hat g(\varepsilon)\,W(Ax + B\varepsilon) + \theta\,\hat g(\varepsilon)\log \hat g(\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon. +W(x) = c + \beta \int \bigl[\hat g(\varepsilon) W(Ax + B\varepsilon) + \theta \hat g(\varepsilon)\log \hat g(\varepsilon)\bigr] \pi(\varepsilon)d\varepsilon. 
``` Define two components: ```{math} :label: bhs_J_recursion -J(x) = c + \beta \int \hat g(\varepsilon)\,J(Ax + B\varepsilon)\,\pi(\varepsilon)\,d\varepsilon, +J(x) = c + \beta \int \hat g(\varepsilon) J(Ax + B\varepsilon) \pi(\varepsilon)d\varepsilon, ``` ```{math} :label: bhs_N_recursion -N(x) = \beta \int \hat g(\varepsilon)\bigl[\log \hat g(\varepsilon) + N(Ax + B\varepsilon)\bigr]\,\pi(\varepsilon)\,d\varepsilon. +N(x) = \beta \int \hat g(\varepsilon)\bigl[\log \hat g(\varepsilon) + N(Ax + B\varepsilon)\bigr] \pi(\varepsilon)d\varepsilon. ``` Then $W(x) = J(x) + \theta N(x)$. @@ -813,7 +813,7 @@ Here $J(x_t) = \hat E_t \sum_{j=0}^{\infty} \beta^j c_{t+j}$ is expected discoun $J$ is the value function for both the type III and the type IV agent: the type III agent maximizes expected utility subject to an entropy constraint, and once the worst-case model is determined, the resulting value is expected discounted consumption under that model; the type IV agent uses the same worst-case model as a fixed belief, so evaluates the same expectation. -And $N(x)$ is discounted continuation entropy: it measures the total information cost of the probability distortion from date $t$ onward. +The term $N(x)$ is discounted continuation entropy: it measures the total information cost of the probability distortion from date $t$ onward. This decomposition will be important for the welfare calculations in {ref}`the welfare section ` below, where it explains why type III uncertainty compensation is twice that of type II. @@ -833,6 +833,7 @@ The worst-case model therefore keeps the variance at one but shifts the mean of The resulting likelihood ratio is ({ref}`Exercise 5 ` verifies its properties) ```{math} +:label: bhs_ghat_gaussian \hat g_{t+1} = \exp\left(w \varepsilon_{t+1} - \frac{1}{2}w^2\right), @@ -885,7 +886,7 @@ Because the distortion is i.i.d., the discounted entropy recursion {eq}`bhs_N_re ```{math} :label: bhs_eta_formula -\eta = \frac{\beta}{2(1-\beta)}\,w(\theta)^2. +\eta = \frac{\beta}{2(1-\beta)} w(\theta)^2. ``` ```{code-cell} ipython3 @@ -917,14 +918,14 @@ Using the fact that $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$ for a normal r ```{math} :label: bhs_W_rw -W(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. +W(x_t) = \frac{1}{1-\beta}\left[c_t + \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. ``` Using $W = J + \theta N$, the type III/IV value function is ```{math} :label: bhs_J_rw -J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. +J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\left[c_t + \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. ``` The coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ doubles from $\tfrac{1}{2}$ in $W$ to $1$ in $J$ because $W$ includes the entropy "rebate" $\theta N$ that partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model. @@ -991,7 +992,7 @@ p(\theta^{-1}) ```{math} :label: bhs_detection_closed -p(\theta^{-1}) = \Phi\!\left(-\frac{|w(\theta)|\sqrt{T}}{2}\right). +p(\theta^{-1}) = \Phi \left(-\frac{|w(\theta)|\sqrt{T}}{2}\right). 
``` ```{code-cell} ipython3 @@ -1095,10 +1096,10 @@ The explicit mapping that equates discounted entropy across models is ({ref}`Exe \theta_{\text{TS}} = \left(\frac{\sigma_\varepsilon^{\text{TS}}}{\sigma_\varepsilon^{\text{RW}}}\right) -\frac{1-\beta}{1-\rho\beta}\;\theta_{\text{RW}}. +\frac{1-\beta}{1-\rho\beta} \theta_{\text{RW}}. ``` -At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$, this simplifies to $\theta_{\text{TS}} = \frac{1-\beta}{1-\rho\beta}\,\theta_{\text{RW}}$. +At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$, this simplifies to $\theta_{\text{TS}} = \frac{1-\beta}{1-\rho\beta}\theta_{\text{RW}}$. Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ is much less than one, so holding entropy fixed requires a substantially smaller $\theta$ (stronger robustness) for the trend-stationary model than for the random walk. @@ -1106,7 +1107,7 @@ Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ i We now redraw Tallarini's figure using detection-error probabilities. -For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m),\;\sigma(m))$ pair. +For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m), \sigma(m))$ pair. ```{code-cell} ipython3 p_points = np.array( @@ -1231,10 +1232,10 @@ fw = norm.pdf(ε, w_star, 1) ax1.fill_between(ε, f0, alpha=0.15, color='k') ax1.plot(ε, f0, 'k', lw=2.5, - label=r'Approximating $\mathcal{N}(0,\,1)$') + label=r'Approximating $\mathcal{N}(0, 1)$') ax1.fill_between(ε, fw, alpha=0.15, color='C3') ax1.plot(ε, fw, 'C3', lw=2, ls='--', - label=f'Worst case $\\mathcal{{N}}({w_star:.2f},\\,1)$') + label=f'Worst case $\mathcal{{N}}({w_star:.2f},1)$') peak = norm.pdf(0, 0, 1) ax1.annotate('', xy=(w_star, 0.55 * peak), xytext=(0, 0.55 * peak), @@ -1245,7 +1246,6 @@ ax1.text(w_star / 2, 0.59 * peak, f'$w = {w_star:.2f}$', ax1.set_xlabel(r'$\varepsilon_{t+1}$') ax1.set_ylabel('Density') ax1.legend(frameon=False) -ax1.spines[['top', 'right']].set_visible(False) quarters = np.arange(0, 241) years = quarters / 4 @@ -1268,15 +1268,82 @@ ax2.text(61, gap_ts[-1], f'{gap_ts[-1]:.1f}%', ax2.set_xlabel('Years') ax2.set_ylabel('Gap in expected log consumption (%)') ax2.legend(frameon=False, loc='lower left') -ax2.spines[['top', 'right']].set_visible(False) ax2.set_xlim(0, 68) plt.tight_layout() plt.show() ``` +The next figure decomposes the log SDF into two additive components. + +Taking logs of the SDF {eq}`bhs_sdf` gives + +$$ +\log m_{t+1} += +\underbrace{\log \beta - \Delta c_{t+1}}_{\text{log-utility intertemporal MRS}} ++ +\underbrace{\log \hat g_{t+1}}_{\text{worst-case distortion}}. +$$ + +Under the random-walk model, $\Delta c_{t+1} = \mu + \sigma_\varepsilon \varepsilon_{t+1}$, and the Gaussian distortion {eq}`bhs_ghat_gaussian` gives $\log \hat g_{t+1} = w \varepsilon_{t+1} - \tfrac{1}{2}w^2$. + +Substituting, we can write + +$$ +\log m_{t+1} += +\bigl(\log\beta - \mu - \tfrac{1}{2}w^2\bigr) +- +(\sigma_\varepsilon - w)\varepsilon_{t+1}, +$$ + +so the slope of $\log m_{t+1}$ in $\varepsilon_{t+1}$ is $\sigma_\varepsilon - w$. + +Since $w < 0$, the distortion steepens the SDF relative to what log utility alone would deliver. 
+ +In the figure below, the intertemporal marginal rate of substitution (IMRS) is nearly flat: at postwar calibrated volatility ($\sigma_\varepsilon = 0.005$), it contributes almost nothing to the pricing kernel's slope. + +The distortion accounts for virtually all of the SDF volatility --- what looks like extreme risk aversion ($\gamma \approx 34$) is really log utility plus moderate fears of model misspecification. + +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: "Doubts or variability? Decomposition of the robust SDF into + log-utility IMRS and worst-case distortion at $p = 0.10$" + name: fig-bhs-sdf-decomp +--- +θ_cal = θ_from_detection_probability(0.10, "rw") +γ_cal = γ_from_θ(θ_cal) +w_cal = w_from_θ(θ_cal, "rw") + +μ_c, σ_c = rw["μ"], rw["σ_ε"] +Δc = np.linspace(μ_c - 3.5 * σ_c, μ_c + 3.5 * σ_c, 300) +ε = (Δc - μ_c) / σ_c + +log_imrs = np.log(β) - Δc +log_ghat = w_cal * ε - 0.5 * w_cal**2 +log_sdf = log_imrs + log_ghat + +fig, ax = plt.subplots(figsize=(8, 5)) + +ax.plot(100 * Δc, log_imrs, 'C1', lw=2, + label=r'IMRS: $\log\beta - \Delta c$') +ax.plot(100 * Δc, log_ghat, 'C3', lw=2, ls='--', + label=r'Distortion: $\log\hat{g}$') +ax.plot(100 * Δc, log_sdf, 'k', lw=2, + label=r'SDF: $\log m = \log\mathrm{IMRS} + \log\hat{g}$') +ax.axhline(0, color='k', lw=0.5, alpha=0.3) +ax.set_xlabel(r'Consumption growth $\Delta c_{t+1}$ (%)') +ax.set_ylabel('Log SDF component') +ax.legend(frameon=False, fontsize=10, loc='upper right') + +plt.show() +``` + (welfare_experiments)= -## What do risk premia measure? Two mental experiments +## What do risk premia measure? {cite:t}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. @@ -1307,7 +1374,7 @@ The additional $\tfrac{1}{2}\sigma_\varepsilon^2$ term is a Jensen's inequality We use the closed-form value functions derived earlier: {eq}`bhs_W_rw` for the type I/II value function $W$ and {eq}`bhs_J_rw` for the type III/IV value function $J$. -For the certainty-equivalent path {eq}`bhs_ce_path`, there is no risk and no model uncertainty ($\theta = \infty$, so $\hat g = 1$), so the value function reduces to discounted expected log utility. +For the certainty-equivalent path {eq}`bhs_ce_path`, there is no risk and no model uncertainty ($\theta = \infty$, so $\hat g = 1$), so the value function reduces to discounted expected log utility. With $c_t^{ce} = c_0^J + t(\mu + \tfrac{1}{2}\sigma_\varepsilon^2)$, we have @@ -1318,10 +1385,12 @@ U^{ce}(c_0^J) = \frac{c_0^J}{1-\beta} + \frac{\beta(\mu + \tfrac{1}{2}\sigma_\varepsilon^2)}{(1-\beta)^2}, $$ -where we used $\sum_{t \geq 0}\beta^t = \frac{1}{1-\beta}$ and $\sum_{t \geq 0}t\beta^t = \frac{\beta}{(1-\beta)^2}$. Factoring gives +where we used $\sum_{t \geq 0}\beta^t = \frac{1}{1-\beta}$ and $\sum_{t \geq 0}t\beta^t = \frac{\beta}{(1-\beta)^2}$. + +Factoring gives $$ -U^{ce}(c_0^J) = \frac{1}{1-\beta}\!\left[c_0^J + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right]. +U^{ce}(c_0^J) = \frac{1}{1-\beta}\left[c_0^J + \frac{\beta}{1-\beta}\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right]. 
$$ ### Type I (Epstein--Zin) compensation @@ -1329,9 +1398,9 @@ $$ Setting $U^{ce}(c_0^I) = W(x_0)$ from {eq}`bhs_W_rw`: $$ -\frac{1}{1-\beta}\!\left[c_0^I + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right] +\frac{1}{1-\beta}\left[c_0^I + \frac{\beta}{1-\beta}\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right] = -\frac{1}{1-\beta}\!\left[c_0 + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. +\frac{1}{1-\beta}\left[c_0 + \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right)\right]. $$ Multiplying both sides by $(1-\beta)$ and cancelling the common $\frac{\beta\mu}{1-\beta}$ terms gives @@ -1348,7 +1417,7 @@ Solving for $c_0 - c_0^I$: :label: bhs_comp_type1 c_0 - c_0^I = -\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}\!\left(1 + \frac{1}{(1-\beta)\theta}\right) +\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}\left(1 + \frac{1}{(1-\beta)\theta}\right) = \frac{\beta\sigma_\varepsilon^2\gamma}{2(1-\beta)}, ``` @@ -1383,9 +1452,9 @@ The uncertainty term $\Delta c_0^{uncertainty}$ is the additional compensation a For a type III agent, we set $U^{ce}(c_0^{III}) = J(x_0)$ using the value function $J$ from {eq}`bhs_J_rw`: $$ -\frac{1}{1-\beta}\!\left[c_0^{III} + \frac{\beta}{1-\beta}\!\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right] +\frac{1}{1-\beta}\left[c_0^{III} + \frac{\beta}{1-\beta}\left(\mu + \tfrac{1}{2}\sigma_\varepsilon^2\right)\right] = -\frac{1}{1-\beta}\!\left[c_0 + \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. +\frac{1}{1-\beta}\left[c_0 + \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. $$ Following the same algebra as for type I but with the doubled uncertainty correction in $J$: @@ -1443,24 +1512,24 @@ For the type II agent under $\theta < \infty$, the total value is $W(c_0)$ from For the agent liberated from model uncertainty ($\theta = \infty$), the value is $$ -c_0^{II}(u) + \beta\,E\!\left[V^{\log}(c_1)\right], +c_0^{II}(u) + \beta E\left[V^{\log}(c_1)\right], $$ -where $V^{\log}(c_t) = \frac{1}{1-\beta}\!\left[c_t + \frac{\beta\mu}{1-\beta}\right]$ is the log-utility value function and $c_1 = c_0 + \mu + \sigma_\varepsilon \varepsilon_1$. +where $V^{\log}(c_t) = \frac{1}{1-\beta} \left[c_t + \frac{\beta\mu}{1-\beta}\right]$ is the log-utility value function and $c_1 = c_0 + \mu + \sigma_\varepsilon \varepsilon_1$. Since $c_1$ is built from $c_0$ (not $c_0^{II}(u)$), the continuation is $$ -\beta\,E\!\left[V^{\log}(c_1)\right] -= \frac{\beta}{1-\beta}\,E\!\left[c_1 + \frac{\beta\mu}{1-\beta}\right] -= \frac{\beta}{1-\beta}\!\left[c_0 + \mu + \frac{\beta\mu}{1-\beta}\right] -= \frac{\beta}{1-\beta}\!\left[c_0 + \frac{\mu}{1-\beta}\right], +\beta E\left[V^{\log}(c_1)\right] += \frac{\beta}{1-\beta} E\left[c_1 + \frac{\beta\mu}{1-\beta}\right] += \frac{\beta}{1-\beta}\left[c_0 + \mu + \frac{\beta\mu}{1-\beta}\right] += \frac{\beta}{1-\beta}\left[c_0 + \frac{\mu}{1-\beta}\right], $$ where we used $E[c_1] = c_0 + \mu$ (the noise term has zero mean). Expanding gives $$ -\beta\,E\!\left[V^{\log}(c_1)\right] +\beta E\left[V^{\log}(c_1)\right] = \frac{\beta c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}. $$ @@ -1513,11 +1582,11 @@ The "vs. 
deterministic" rows use the certainty-equivalent path {eq}`bhs_ce_path` For the trend-stationary model, the denominators $(1-\beta)$ in the uncertainty terms are replaced by $(1-\beta\rho)$, and the risk terms involve $(1-\beta\rho^2)$: $$ -\Delta c_0^{risk,\,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}, +\Delta c_0^{risk,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}, \qquad -\Delta c_0^{unc,\,ts,\,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}, +\Delta c_0^{unc,ts,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}, \qquad -\Delta c_0^{unc,\,ts,\,III} = \frac{\beta\sigma_\varepsilon^2}{(1-\beta\rho)^2\theta}. +\Delta c_0^{unc,ts,III} = \frac{\beta\sigma_\varepsilon^2}{(1-\beta\rho)^2\theta}. $$ The qualitative message is the same: the risk component is negligible, and the model-uncertainty component dominates. @@ -1676,12 +1745,10 @@ mystnb: name: fig-bhs-5 --- fig, ax = plt.subplots(figsize=(7, 4)) -ax.plot(p_plot, gain_rw_plot, lw=2, color="black", label="RW type III") -ax.plot(p_plot, gain_ts_plot, lw=2, ls="--", color="gray", label="TS type III") +ax.plot(p_plot, gain_rw_plot, lw=2, label="RW type III") +ax.plot(p_plot, gain_ts_plot, lw=2, label="TS type III") ax.set_xlabel(r"detection error probability $p(\eta)$ (percent)") ax.set_ylabel("proportion of consumption (percent)") -ax.set_xlim(0.0, 50.0) -ax.set_ylim(0.0, 30.0) ax.legend(frameon=False) plt.tight_layout() @@ -1696,6 +1763,60 @@ At detection-error probabilities of 10--20%, the model-uncertainty compensation Under the robust reading, the large risk premia that Tallarini matched with high $\gamma$ are compensations for bearing model uncertainty, and the implied welfare gains from resolving that uncertainty are correspondingly large. +The following contour plot shows how type II (multiplier) compensation varies over a two-dimensional parameter space: the detection-error probability $p$ and the consumption volatility $\sigma_\varepsilon$. + +The star marks the calibrated point ($p = 0.10$, $\sigma_\varepsilon = 0.5\%$). + +At the calibrated volatility, moving left (lower $p$, stronger robustness concerns) increases compensation dramatically, while the classic risk-only cost (the $p = 50\%$ edge) remains negligible. + +Comparing the two panels shows that the random-walk model generates much larger welfare costs than the trend-stationary model at the same ($p$, $\sigma_\varepsilon$), because permanent shocks compound the worst-case drift indefinitely. 
+ +```{code-cell} ipython3 +--- +mystnb: + figure: + caption: Type II compensation across detection-error probability and + consumption volatility + name: fig-bhs-contour +--- +p_grid = np.linspace(0.02, 0.49, 300) +σ_grid = np.linspace(0.001, 0.015, 300) +P, Σ = np.meshgrid(p_grid, σ_grid) + +W_abs = -2 * norm.ppf(P) / np.sqrt(T) + +# RW: total type II = βσ²γ / [2(1-β)] +Γ_rw = 1 + W_abs / Σ +comp_rw = 100 * (np.exp(β * Σ**2 * Γ_rw / (2 * (1 - β))) - 1) + +# TS: risk + uncertainty +ρ_val = ts["ρ"] +risk_ts = β * Σ**2 / (2 * (1 - β * ρ_val**2)) +unc_ts = β * Σ * W_abs / (2 * (1 - β * ρ_val)) +comp_ts = 100 * (np.exp(risk_ts + unc_ts) - 1) + +levels = [0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50] + +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5.5), sharey=True) + +for ax, comp, title in [(ax1, comp_rw, 'Random walk'), + (ax2, comp_ts, 'Trend stationary')]: + cf = ax.contourf(100 * P, 100 * Σ, comp, levels=levels, + cmap='Blues', extend='both') + cs = ax.contour(100 * P, 100 * Σ, comp, levels=levels, + colors='k', linewidths=0.5) + ax.clabel(cs, fmt='%g%%', fontsize=8) + ax.plot(10, 0.5, 'x', markersize=14, color='w', + mec='k', mew=1, zorder=5) + ax.set_xlabel(r'Detection-error probability $p$ (%)') + ax.set_title(title) + +ax1.set_ylabel(r'Consumption volatility $\sigma_\varepsilon$ (%)') + +plt.tight_layout() +plt.show() +``` + ## Why doesn't learning eliminate these fears? A natural objection is: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? @@ -1717,7 +1838,9 @@ We construct real per-capita nondurables-plus-services consumption from four FRE | `DPCERD3Q086SBEA` | PCE implicit price deflator (index 2017 $= 100$, quarterly) | | `CNP16OV` | Civilian noninstitutional population, 16+ (thousands, monthly) | -We use nominal rather than chained-dollar components because chained-dollar series are not additive: chain-weighted indices update their base-period expenditure weights every period, so components deflated with different price changes do not sum to the separately chained aggregate. Adding nominal series and deflating the sum with a single price index avoids this problem. +We use nominal rather than chained-dollar components because chained-dollar series are not additive: chain-weighted indices update their base-period expenditure weights every period, so components deflated with different price changes do not sum to the separately chained aggregate. + +Adding nominal series and deflating the sum with a single price index avoids this problem. The processing pipeline is: @@ -1872,7 +1995,7 @@ The dashed gray lines mark a two-standard-error band around the maximum-likeliho Even at detection probabilities in the 5--20% range, the worst-case drift remains inside (or very near) this confidence band. -Drift distortions that are economically large---large enough to generate substantial model-uncertainty premia---are statistically small relative to sampling uncertainty in $\hat\mu$. +Drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. Robustness concerns persist despite long histories precisely because the low-frequency features that matter most for pricing are the hardest to estimate precisely. @@ -1905,20 +2028,20 @@ The following exercises ask you to fill in several derivation steps. 
Let $R_{t+1}$ be an $n \times 1$ vector of gross returns with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$. -Let $m_{t+1}$ be a stochastic discount factor satisfying $\mathbf{1} = E[m_{t+1}\,R_{t+1}]$. +Let $m_{t+1}$ be a stochastic discount factor satisfying $\mathbf{1} = E[m_{t+1} R_{t+1}]$. -1. Use the covariance decomposition $E[mR] = E[m]\,E[R] + \operatorname{cov}(m,R)$ to show that $\operatorname{cov}(m,R) = \mathbf{1} - E[m]\,E[R] =: b$. +1. Use the covariance decomposition $E[mR] = E[m] E[R] + \operatorname{cov}(m,R)$ to show that $\operatorname{cov}(m,R) = \mathbf{1} - E[m] E[R] =: b$. 2. For a portfolio with weight vector $\alpha$ and return $R^p = \alpha^\top R$, show that $\operatorname{cov}(m, R^p) = \alpha^\top b$. -3. Apply the Cauchy--Schwarz inequality to the pair $(m, R^p)$ to obtain $|\alpha^\top b| \leq \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}$. -4. Maximize the ratio $|\alpha^\top b|/\sqrt{\alpha^\top \Sigma_R\,\alpha}$ over $\alpha$ and show that the maximum is $\sqrt{b^\top \Sigma_R^{-1} b}$, attained at $\alpha^\star = \Sigma_R^{-1}b$. -5. Conclude that $\sigma(m)/E(m) \geq \sqrt{b^\top \Sigma_R^{-1} b}$, which is {eq}`bhs_hj_unconditional`. +3. Apply the Cauchy--Schwarz inequality to the pair $(m, R^p)$ to obtain $|\alpha^\top b| \leq \sigma(m)\sqrt{\alpha^\top \Sigma_R\alpha}$. +4. Maximize the ratio $|\alpha^\top b|/\sqrt{\alpha^\top \Sigma_R \alpha}$ over $\alpha$ and show that the maximum is $\sqrt{b^\top \Sigma_R^{-1} b}$, attained at $\alpha^\star = \Sigma_R^{-1}b$. +5. Conclude that $\sigma(m) \geq \sqrt{b^\top \Sigma_R^{-1} b}$, which is {eq}`bhs_hj_unconditional`. ``` ```{solution-start} dov_ex1 :class: dropdown ``` -**Part 1.** From $\mathbf{1} = E[m\,R] = E[m]\,E[R] + \operatorname{cov}(m,R)$, rearranging gives $\operatorname{cov}(m,R) = \mathbf{1} - E[m]\,E[R]= b$. +**Part 1.** From $\mathbf{1} = E[mR] = E[m] E[R] + \operatorname{cov}(m,R)$, rearranging gives $\operatorname{cov}(m,R) = \mathbf{1} - E[m] E[R]= b$. **Part 2.** The portfolio return is $R^p = \alpha^\top R$, so @@ -1927,49 +2050,48 @@ $$ $$ **Part 3.** -Applying Cauchy--Schwarz inequality to $(m, R^p)$: +Applying the Cauchy--Schwarz inequality to $(m, R^p)$: $$ -|\alpha^\top b| = |\operatorname{cov}(m, R^p)| \leq \sigma(m)\,\sigma(R^p) = \sigma(m)\,\sqrt{\alpha^\top \Sigma_R\,\alpha}. +|\alpha^\top b| = |\operatorname{cov}(m, R^p)| \leq \sigma(m) \sigma(R^p) = \sigma(m) \sqrt{\alpha^\top \Sigma_R \alpha}. $$ **Part 4.** Rearranging Part 3 gives $$ -\frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R\,\alpha}} \leq \sigma(m). +\frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R \alpha}} \leq \sigma(m). $$ -To maximize the left-hand side over $\alpha$, define the $\Sigma_R$-inner product $\langle u, v \rangle_{\Sigma} = u^\top \Sigma_R\, v$. +To maximize the left-hand side over $\alpha$, define the $\Sigma_R$-inner product $\langle u, v \rangle_{\Sigma} = u^\top \Sigma_R v$. -Insert $I = \Sigma_R \Sigma_R^{-1}$ gives +Inserting $I = \Sigma_R \Sigma_R^{-1}$ gives $$ \alpha^\top b = \alpha^\top (\Sigma_R \Sigma_R^{-1}) b = (\alpha^\top \Sigma_R)(\Sigma_R^{-1} b) -= \langle \alpha,\, \Sigma_R^{-1}b \rangle_{\Sigma}. += \langle \alpha, \Sigma_R^{-1}b \rangle_{\Sigma}. 
$$ Cauchy--Schwarz in this inner product gives $$ -|\langle \alpha,\, \Sigma_R^{-1}b \rangle_{\Sigma}| +|\langle \alpha, \Sigma_R^{-1}b \rangle_{\Sigma}| \leq -\sqrt{\langle \alpha, \alpha \rangle_{\Sigma}}\;\sqrt{\langle \Sigma_R^{-1}b,\, \Sigma_R^{-1}b \rangle_{\Sigma}} +\sqrt{\langle \alpha, \alpha \rangle_{\Sigma}}\sqrt{\langle \Sigma_R^{-1}b, \Sigma_R^{-1}b \rangle_{\Sigma}} = -\sqrt{\alpha^\top \Sigma_R\,\alpha}\;\sqrt{b^\top \Sigma_R^{-1} b}, +\sqrt{\alpha^\top \Sigma_R \alpha} \sqrt{b^\top \Sigma_R^{-1} b}, $$ with equality when $\alpha \propto \Sigma_R^{-1} b$. -Substituting $\alpha^\star = \Sigma_R^{-1} b$ confirms +Substituting $\alpha^\star = \Sigma_R^{-1} b$ verifies $$ -\max_\alpha \frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R\,\alpha}} = \sqrt{b^\top \Sigma_R^{-1} b}. +\max_\alpha \frac{|\alpha^\top b|}{\sqrt{\alpha^\top \Sigma_R \alpha}} = \sqrt{b^\top \Sigma_R^{-1} b}. $$ -**Part 5.** Combining Parts 3 and 4 gives $\sqrt{b^\top \Sigma_R^{-1} b} \leq \sigma(m)$ and -dividing by $E[m] > 0$ yields {eq}`bhs_hj_unconditional`. +**Part 5.** Combining Parts 3 and 4 gives $\sigma(m) \geq \sqrt{b^\top \Sigma_R^{-1} b}$, which is {eq}`bhs_hj_unconditional`. ```{solution-end} ``` @@ -2000,7 +2122,7 @@ with $\varepsilon_{t+1}\sim\mathcal{N}(0,1)$ under the approximating model. Using {eq}`bhs_sdf` and the Gaussian distortion $$ -\hat g_{t+1}=\exp\!\left(w\varepsilon_{t+1}-\tfrac{1}{2}w^2\right), +\hat g_{t+1}=\exp \left(w\varepsilon_{t+1}-\tfrac{1}{2}w^2\right), $$ we get @@ -2008,9 +2130,9 @@ we get $$ m_{t+1} = -\beta \exp\!\left(-(c_{t+1}-c_t)\right)\hat g_{t+1} +\beta \exp \left(-(c_{t+1}-c_t)\right)\hat g_{t+1} = -\beta \exp\!\left(-\mu-\sigma_\varepsilon\varepsilon_{t+1}\right)\exp\!\left(w\varepsilon_{t+1}-\frac{1}{2}w^2\right). +\beta \exp \left(-\mu-\sigma_\varepsilon\varepsilon_{t+1}\right)\exp \left(w\varepsilon_{t+1}-\frac{1}{2}w^2\right). $$ Therefore @@ -2050,11 +2172,11 @@ Hence $$ E[m] = -\beta\exp\!\left( +\beta\exp\left( -\mu-\frac{1}{2}w^2+\frac{1}{2}(w-\sigma_\varepsilon)^2 \right) = -\beta\exp\!\left(-\mu+\frac{\sigma_\varepsilon^2}{2}-\sigma_\varepsilon w\right), +\beta\exp\left(-\mu+\frac{\sigma_\varepsilon^2}{2}-\sigma_\varepsilon w\right), $$ and @@ -2062,7 +2184,7 @@ and $$ \frac{\sigma(m)}{E[m]} = -\sqrt{\exp\!\left((w-\sigma_\varepsilon)^2\right)-1}. +\sqrt{\exp\left((w-\sigma_\varepsilon)^2\right)-1}. $$ Now use $w_{\text{RW}}(\theta)=-\sigma_\varepsilon/[(1-\beta)\theta]$ from {eq}`bhs_w_formulas` and @@ -2083,12 +2205,12 @@ Substituting gives the closed-form expressions for the random-walk model: ```{math} :label: bhs_Em_rw -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], +E[m] = \beta \exp\left[-\mu + \frac{\sigma_\varepsilon^2}{2}(2\gamma - 1)\right], ``` ```{math} :label: bhs_sigma_rw -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\left(\sigma_\varepsilon^2 \gamma^2\right) - 1}. ``` Notice that in {eq}`bhs_Em_rw`, because $\sigma_\varepsilon$ is small ($\approx 0.005$), the term $\frac{\sigma_\varepsilon^2}{2}(2\gamma-1)$ grows slowly with $\gamma$, keeping $E[m]$ roughly constant near $1/(1+r^f)$. 
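
These closed forms are straightforward to check by simulation. The sketch below uses illustrative stand-ins for the calibrated values ($\beta = 0.995$, $\mu = 0.005$, $\sigma_\varepsilon = 0.005$, $\gamma = 30$) and confirms that sample moments of $m_{t+1} = \beta e^{-\Delta c_{t+1}} \hat g_{t+1}$ line up with {eq}`bhs_Em_rw` and {eq}`bhs_sigma_rw`.

```{code-cell} ipython3
import numpy as np

# Illustrative stand-ins for the calibrated values, not the exact estimates
β_chk, μ_chk, σ_chk, γ_chk = 0.995, 0.005, 0.005, 30.0
w_chk = -σ_chk * (γ_chk - 1)     # worst-case mean shift implied by γ

rng = np.random.default_rng(0)
ε = rng.standard_normal(2_000_000)

# m_{t+1} = β exp(-Δc_{t+1}) ĝ_{t+1} with Δc = μ + σ ε and ĝ = exp(wε - w²/2)
m = β_chk * np.exp(-μ_chk - σ_chk * ε) * np.exp(w_chk * ε - 0.5 * w_chk**2)

Em_formula = β_chk * np.exp(-μ_chk + 0.5 * σ_chk**2 * (2 * γ_chk - 1))
mpr_formula = np.sqrt(np.exp(σ_chk**2 * γ_chk**2) - 1)

print(f"E[m]:      simulated {m.mean():.6f},  closed form {Em_formula:.6f}")
print(f"σ(m)/E[m]: simulated {m.std() / m.mean():.6f},  closed form {mpr_formula:.6f}")
```
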
@@ -2101,12 +2223,12 @@ An analogous calculation for the trend-stationary model yields: ```{math} :label: bhs_Em_ts -E[m] = \beta \exp\!\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\!\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], +E[m] = \beta \exp\left[-\mu + \frac{\sigma_\varepsilon^2}{2}\left(1 - \frac{2(1-\beta)(1-\gamma)}{1-\beta\rho} + \frac{1-\rho}{1+\rho}\right)\right], ``` ```{math} :label: bhs_sigma_ts -\frac{\sigma(m)}{E[m]} = \sqrt{\exp\!\left[\sigma_\varepsilon^2\!\left(\!\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{\!2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. +\frac{\sigma(m)}{E[m]} = \sqrt{\exp\left[\sigma_\varepsilon^2\left(\left(\frac{(1-\beta)(1-\gamma)}{1-\beta\rho} - 1\right)^{2} + \frac{1-\rho}{1+\rho}\right)\right] - 1}. ``` ```{solution-end} @@ -2127,7 +2249,7 @@ Verify that as $\gamma \to 1$ (equivalently $\theta \to \infty$), the recursion Start from the type I recursion {eq}`bhs_type1_recursion` and write $$ -(V_{t+1})^{1-\gamma} = \exp\!\bigl((1-\gamma)\log V_{t+1}\bigr). +(V_{t+1})^{1-\gamma} = \exp\bigl((1-\gamma)\log V_{t+1}\bigr). $$ Using $\log V_t = (1-\beta)U_t$ from {eq}`bhs_Ut_def`, we obtain @@ -2136,8 +2258,8 @@ $$ (1-\beta)U_t = (1-\beta)c_t -\;+\; -\frac{\beta}{1-\gamma}\log E_t\!\left[\exp\!\bigl((1-\gamma)(1-\beta)U_{t+1}\bigr)\right]. ++ +\frac{\beta}{1-\gamma}\log E_t\left[\exp\bigl((1-\gamma)(1-\beta)U_{t+1}\bigr)\right]. $$ Divide by $(1-\beta)$ and use {eq}`bhs_theta_def`, @@ -2151,7 +2273,7 @@ Then $(1-\gamma)(1-\beta)=-1/\theta$ and $\beta/[(1-\beta)(1-\gamma)]=-\beta\the $$ U_t = -c_t - \beta\theta \log E_t\!\left[\exp\!\left(-\frac{U_{t+1}}{\theta}\right)\right], +c_t - \beta\theta \log E_t \left[\exp \left(-\frac{U_{t+1}}{\theta}\right)\right], $$ which is {eq}`bhs_risk_sensitive`. @@ -2191,7 +2313,7 @@ $$ Consider the type II Bellman equation {eq}`bhs_bellman_type2`. -1. Use a Lagrange multiplier to impose the normalization constraint $\int g(\varepsilon)\,\pi(\varepsilon)\,d\varepsilon = 1$. +1. Use a Lagrange multiplier to impose the normalization constraint $\int g(\varepsilon) \pi(\varepsilon) d\varepsilon = 1$. 2. Derive the first-order condition for $g(\varepsilon)$ and show that the minimizer is the exponential tilt in {eq}`bhs_ghat`. 3. Substitute your minimizing $g$ back into {eq}`bhs_bellman_type2` to recover the risk-sensitive Bellman equation {eq}`bhs_bellman_type1`. @@ -2209,9 +2331,9 @@ Form the Lagrangian $$ \mathcal{L}[g,\lambda] = -\beta \int \Bigl[g(\varepsilon)W'(\varepsilon) + \theta g(\varepsilon)\log g(\varepsilon)\Bigr]\pi(\varepsilon)\,d\varepsilon -\;+\; -\lambda\left(\int g(\varepsilon)\pi(\varepsilon)\,d\varepsilon - 1\right). +\beta \int \Bigl[g(\varepsilon)W'(\varepsilon) + \theta g(\varepsilon)\log g(\varepsilon)\Bigr]\pi(\varepsilon)d\varepsilon ++ +\lambda\left(\int g(\varepsilon)\pi(\varepsilon) d\varepsilon - 1\right). $$ The pointwise first-order condition for $g(\varepsilon)$ is @@ -2222,8 +2344,8 @@ $$ \frac{\partial \mathcal{L}}{\partial g(\varepsilon)} = \beta\Bigl[W'(\varepsilon) + \theta(1+\log g(\varepsilon))\Bigr]\pi(\varepsilon) -\;+\; -\lambda\,\pi(\varepsilon), ++ +\lambda\pi(\varepsilon), $$ so (dividing by $\beta\pi(\varepsilon)$) @@ -2236,10 +2358,10 @@ $$ Exponentiating yields $g(\varepsilon)=K\exp(-W'(\varepsilon)/\theta)$ where $K = \exp(-1 - \lambda/(\beta\theta))$ is a constant that does not depend on $\varepsilon$. 
-To pin down $K$, impose the normalization $\int g(\varepsilon)\pi(\varepsilon)\,d\varepsilon=1$: +To pin down $K$, impose the normalization $\int g(\varepsilon)\pi(\varepsilon)d\varepsilon=1$: $$ -1 = K \int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, +1 = K \int \exp \left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon) d\varepsilon, $$ so @@ -2247,7 +2369,7 @@ so $$ K^{-1} = -\int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon. +\int \exp\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon) d\varepsilon. $$ Substituting $K^{-1}$ into the denominator of $g = K\exp(-W'/\theta)$ gives the minimizer: @@ -2255,8 +2377,8 @@ Substituting $K^{-1}$ into the denominator of $g = K\exp(-W'/\theta)$ gives the $$ g^*(\varepsilon) = -\frac{\exp\!\left(-W(Ax+B\varepsilon)/\theta\right)}{ - \int \exp\!\left(-W(Ax+B\tilde\varepsilon)/\theta\right)\pi(\tilde\varepsilon)\,d\tilde\varepsilon}. +\frac{\exp\left(-W(Ax+B\varepsilon)/\theta\right)}{ + \int \exp\left(-W(Ax+B\tilde\varepsilon)/\theta\right)\pi(\tilde\varepsilon) d\tilde\varepsilon}. $$ This has exactly the same form as the distortion $\hat g_{t+1} = \exp(-U_{t+1}/\theta)/E_t[\exp(-U_{t+1}/\theta)]$ that appears in the type I SDF {eq}`bhs_sdf_Ut`, with $W$ in place of $U$. @@ -2266,7 +2388,7 @@ Once we verify below that $W \equiv U$, the minimizer $g^*$ and the SDF distorti To substitute back, define $$ -Z(x):=\int \exp(-W(Ax+B\varepsilon)/\theta)\pi(\varepsilon)\,d\varepsilon. +Z(x):=\int \exp(-W(Ax+B\varepsilon)/\theta)\pi(\varepsilon) d\varepsilon. $$ Then $\hat g(\varepsilon)=\exp(-W(Ax+B\varepsilon)/\theta)/Z(x)$ and @@ -2278,12 +2400,12 @@ $$ Hence $$ -\int \Bigl[\hat g(\varepsilon)W(Ax+B\varepsilon) + \theta \hat g(\varepsilon)\log \hat g(\varepsilon)\Bigr]\pi(\varepsilon)\,d\varepsilon +\int \Bigl[\hat g(\varepsilon)W(Ax+B\varepsilon) + \theta \hat g(\varepsilon)\log \hat g(\varepsilon)\Bigr]\pi(\varepsilon) d\varepsilon = -\theta\log Z(x), $$ -because the $W$ terms cancel and $\int \hat g\,\pi = 1$. +because the $W$ terms cancel and $\int \hat g \pi = 1$. Plugging this into {eq}`bhs_bellman_type2` gives @@ -2292,7 +2414,7 @@ W(x) = c-\beta\theta\log Z(x) = -c-\beta\theta \log \int \exp\!\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon)\,d\varepsilon, +c-\beta\theta \log \int \exp\left(-\frac{W(Ax+B\varepsilon)}{\theta}\right)\pi(\varepsilon) d\varepsilon, $$ which is {eq}`bhs_bellman_type1`. Therefore $W(x)\equiv U(x)$. @@ -2306,7 +2428,7 @@ which is {eq}`bhs_bellman_type1`. Therefore $W(x)\equiv U(x)$. Let $\varepsilon \sim \mathcal{N}(0,1)$ under the approximating model and define $$ -\hat g(\varepsilon) = \exp\!\left(w\varepsilon - \frac{1}{2}w^2\right) +\hat g(\varepsilon) = \exp\left(w\varepsilon - \frac{1}{2}w^2\right) $$ as in the Gaussian mean-shift section. @@ -2339,9 +2461,9 @@ $$ $$ E[\hat g(\varepsilon)] = -e^{-w^2/2}\,E[e^{w\varepsilon}] +e^{-w^2/2}E[e^{w\varepsilon}] = -e^{-w^2/2}\,e^{w^2/2} +e^{-w^2/2}e^{w^2/2} = 1. $$ @@ -2354,20 +2476,20 @@ $$ \hat g(\varepsilon)\varphi(\varepsilon) = \frac{1}{\sqrt{2\pi}} -\exp\!\left(w\varepsilon-\frac{1}{2}w^2-\frac{1}{2}\varepsilon^2\right) +\exp\left(w\varepsilon-\frac{1}{2}w^2-\frac{1}{2}\varepsilon^2\right) = \frac{1}{\sqrt{2\pi}} -\exp\!\left(-\frac{1}{2}(\varepsilon-w)^2\right), +\exp\left(-\frac{1}{2}(\varepsilon-w)^2\right), $$ -which is the $\mathcal{N}(w,1)$ density +which is the $\mathcal{N}(w,1)$ density. 
Therefore, for bounded measurable $f$, $$ E[\hat g(\varepsilon)f(\varepsilon)] = -\int f(\varepsilon)\,\hat g(\varepsilon)\varphi(\varepsilon)\,d\varepsilon +\int f(\varepsilon)\hat g(\varepsilon)\varphi(\varepsilon)d\varepsilon $$ equals the expectation of $f$ under $\mathcal{N}(w,1)$. @@ -2387,11 +2509,11 @@ Now $$ E[\hat g^2] = -E\!\left[\exp\!\left(2w\varepsilon - w^2\right)\right] +E\left[\exp\left(2w\varepsilon - w^2\right)\right] = -e^{-w^2}\,E[e^{2w\varepsilon}] +e^{-w^2}E[e^{2w\varepsilon}] = -e^{-w^2}\,e^{(2w)^2/2} +e^{-w^2}e^{(2w)^2/2} = e^{w^2}, @@ -2403,7 +2525,7 @@ so $\operatorname{std}(\hat g)=\sqrt{e^{w^2}-1}$. $$ E[\hat g\log \hat g] = -E_{\mathcal{N}(w,1)}\!\left[w\varepsilon-\frac{1}{2}w^2\right] +E_{\mathcal{N}(w,1)}\left[w\varepsilon-\frac{1}{2}w^2\right] = w\cdot E_{\mathcal{N}(w,1)}[\varepsilon]-\frac{1}{2}w^2 = @@ -2462,13 +2584,13 @@ $$ Under the approximating model, $\sum_{i=1}^T \varepsilon_i \sim \mathcal{N}(0,T)$, so $$ -L_T \sim \mathcal{N}\!\left(\frac{1}{2}w^2T,\; w^2T\right). +L_T \sim \mathcal{N}\left(\frac{1}{2}w^2T, w^2T\right). $$ Under the worst-case model, $\sum_{i=1}^T \varepsilon_i \sim \mathcal{N}(wT,T)$, so $$ -L_T \sim \mathcal{N}\!\left(-\frac{1}{2}w^2T,\; w^2T\right). +L_T \sim \mathcal{N}\left(-\frac{1}{2}w^2T, w^2T\right). $$ Now @@ -2476,9 +2598,9 @@ Now $$ p_A = \Pr_A(L_T<0) = -\Phi\!\left(\frac{0-\frac{1}{2}w^2T}{|w|\sqrt{T}}\right) +\Phi\left(\frac{0-\frac{1}{2}w^2T}{|w|\sqrt{T}}\right) = -\Phi\!\left(-\frac{|w|\sqrt{T}}{2}\right), +\Phi\left(-\frac{|w|\sqrt{T}}{2}\right), $$ and @@ -2486,17 +2608,17 @@ and $$ p_B = \Pr_B(L_T>0) = -1-\Phi\!\left(\frac{0-(-\frac{1}{2}w^2T)}{|w|\sqrt{T}}\right) +1-\Phi\left(\frac{0-(-\frac{1}{2}w^2T)}{|w|\sqrt{T}}\right) = -1-\Phi\!\left(\frac{|w|\sqrt{T}}{2}\right) +1-\Phi\left(\frac{|w|\sqrt{T}}{2}\right) = -\Phi\!\left(-\frac{|w|\sqrt{T}}{2}\right). +\Phi\left(-\frac{|w|\sqrt{T}}{2}\right). $$ Therefore $$ -p(\theta^{-1})=\tfrac{1}{2}(p_A+p_B)=\Phi\!\left(-\tfrac{|w|\sqrt{T}}{2}\right), +p(\theta^{-1})=\tfrac{1}{2}(p_A+p_B)=\Phi\left(-\tfrac{|w|\sqrt{T}}{2}\right), $$ which is {eq}`bhs_detection_closed`. @@ -2542,7 +2664,7 @@ $$ \theta_{\text{TS}} = \left(\frac{\sigma_\varepsilon^{\text{TS}}}{\sigma_\varepsilon^{\text{RW}}}\right) -\frac{1-\beta}{1-\beta\rho}\,\theta_{\text{RW}}, +\frac{1-\beta}{1-\beta\rho}\theta_{\text{RW}}, $$ which is {eq}`bhs_theta_cross_model`. @@ -2553,8 +2675,9 @@ $$ \theta_{\text{TS}}=\frac{1-\beta}{1-\beta\rho}\theta_{\text{RW}}. $$ -Since $\rho\in(0,1)$ implies $1-\beta\rho < 1-\beta$, the ratio $(1-\beta)/(1-\beta\rho)$ is less than one. -So to hold entropy fixed, the trend-stationary model requires a smaller $\theta$ (i.e., a cheaper distortion / stronger robustness) than the random-walk model. +Since $\rho\in(0,1)$ implies $1-\beta\rho > 1-\beta$, the ratio $(1-\beta)/(1-\beta\rho)$ is less than one. + +To hold entropy fixed, the trend-stationary model therefore requires a smaller $\theta$ (i.e., a cheaper distortion and stronger robustness) than the random-walk model. ```{solution-end} ``` @@ -2636,7 +2759,7 @@ $$ J(w) = \sum_{t\geq 0}\beta^t\Bigl(c_0+t(\mu+\sigma_\varepsilon w)\Bigr) -\;+\; ++ \sum_{t\geq 0}\beta^{t+1}\theta\cdot\frac{w^2}{2}. $$ @@ -2646,9 +2769,9 @@ $$ J(w) = \frac{c_0}{1-\beta} -\;+\; ++ \frac{\beta(\mu+\sigma_\varepsilon w)}{(1-\beta)^2} -\;+\; ++ \frac{\beta\theta}{1-\beta}\cdot\frac{w^2}{2}. 
$$ @@ -2658,8 +2781,8 @@ $$ 0=\frac{\partial J}{\partial w} = \frac{\beta\sigma_\varepsilon}{(1-\beta)^2} -\;+\; -\frac{\beta\theta}{1-\beta}\,w ++ +\frac{\beta\theta}{1-\beta}w \quad\Rightarrow\quad w^*=-\frac{\sigma_\varepsilon}{(1-\beta)\theta}, $$ @@ -2672,7 +2795,7 @@ $$ J(w^*) = \frac{c_0}{1-\beta} -\;+\; ++ \frac{\beta\mu}{(1-\beta)^2} -\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta}. $$ @@ -2726,7 +2849,7 @@ $$ **Part 2.** The Bellman equation {eq}`bhs_bellman_type1` requires computing $$ --\beta\theta\log E_t\!\left[\exp\!\left(\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta}\right)\right]. +-\beta\theta\log E_t\left[\exp\left(\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta}\right)\right]. $$ Substituting the guess: @@ -2748,9 +2871,9 @@ $$ Using $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$: $$ --\beta\theta\!\left[\frac{-(c_t + \mu + d)}{(1-\beta)\theta} + \frac{\sigma_\varepsilon^2}{2(1-\beta)^2\theta^2}\right] +-\beta\theta\left[\frac{-(c_t + \mu + d)}{(1-\beta)\theta} + \frac{\sigma_\varepsilon^2}{2(1-\beta)^2\theta^2}\right] = -\frac{\beta}{1-\beta}\!\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. +\frac{\beta}{1-\beta}\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. $$ **Part 3.** The Bellman equation becomes @@ -2758,7 +2881,7 @@ $$ $$ \frac{1}{1-\beta}[c_t + d] = -c_t + \frac{\beta}{1-\beta}\!\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. +c_t + \frac{\beta}{1-\beta}\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. $$ Expanding the right-hand side: @@ -2778,10 +2901,10 @@ $$ Solving: $d - \beta d = \beta\mu - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)\theta}$, so $$ -d = \frac{\beta}{1-\beta}\!\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right), +d = \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right), $$ -confirming {eq}`bhs_W_rw`. +which matches {eq}`bhs_W_rw`. ```{solution-end} ``` @@ -2791,10 +2914,10 @@ confirming {eq}`bhs_W_rw`. Derive the trend-stationary risk compensation stated in the lecture. -For the trend-stationary model with $\tilde c_{t+1} - \zeta = \rho(\tilde c_t - \zeta) + \sigma_\varepsilon\varepsilon_{t+1}$, where $\tilde c_t = c_t - \mu t$, compute the risk compensation $\Delta c_0^{risk,\,ts}$ by comparing expected log utility under the stochastic plan to the deterministic certainty-equivalent path, and show that +For the trend-stationary model with $\tilde c_{t+1} - \zeta = \rho(\tilde c_t - \zeta) + \sigma_\varepsilon\varepsilon_{t+1}$, where $\tilde c_t = c_t - \mu t$, compute the risk compensation $\Delta c_0^{risk,ts}$ by comparing expected log utility under the stochastic plan to the deterministic certainty-equivalent path, and show that $$ -\Delta c_0^{risk,\,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}. +\Delta c_0^{risk,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}. $$ *Hint:* You will need $\operatorname{Var}(z_t) = \sigma_\varepsilon^2(1 + \rho^2 + \cdots + \rho^{2(t-1)})$ and the formula $\sum_{t \geq 1}\beta^t \sum_{j=0}^{t-1}\rho^{2j} = \frac{\beta}{(1-\beta)(1-\beta\rho^2)}$. @@ -2832,10 +2955,10 @@ Equating values and solving: $$ \frac{\Delta c_0^{risk}}{1-\beta} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)(1-\beta\rho^2)} \quad\Rightarrow\quad -\Delta c_0^{risk,\,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}. +\Delta c_0^{risk,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}. 
$$ -The uncertainty compensation follows from the value function: $\Delta c_0^{unc,\,ts,\,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}$, with the $(1-\beta)$ factors replaced by $(1-\beta\rho)$ because the worst-case mean shift scales with $1/(1-\beta\rho)$ rather than $1/(1-\beta)$. +The uncertainty compensation follows from the value function: $\Delta c_0^{unc,ts,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}$, with the $(1-\beta)$ factors replaced by $(1-\beta\rho)$ because the worst-case mean shift scales with $1/(1-\beta\rho)$ rather than $1/(1-\beta)$. ```{solution-end} ``` From 6285f9847acdb28b5dffd4df0d5d5d842435aacd Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 18:37:53 +1100 Subject: [PATCH 24/37] updates --- lectures/doubts_or_variability.md | 357 +++++++++++++++++------------- 1 file changed, 207 insertions(+), 150 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 7076d1c09..cc1e166a1 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -37,27 +37,28 @@ kernelspec: {cite:t}`Tall2000` showed that a recursive preference specification could match the equity premium and the risk-free rate puzzle simultaneously. -But matching required setting the risk-aversion coefficient $\gamma$ to around 50 for a random-walk consumption model and around 75 for a trend-stationary model --- exactly the range that provoked Lucas's skepticism. +But matching required setting the risk-aversion coefficient $\gamma$ to around 50 for a random-walk consumption model and around 75 for a trend-stationary model, exactly the range that provoked Lucas's skepticism. {cite:t}`BHS_2009` ask whether those large $\gamma$ values really measure aversion to atemporal risk, or whether they instead measure the agent's doubts about the underlying probability model. -Their answer --- and the theme of this lecture --- is that much of what looks like "risk aversion" can be reinterpreted as **model uncertainty**. +Their answer, and the theme of this lecture, is that much of what looks like "risk aversion" can be reinterpreted as **model uncertainty**. The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max--min recursion in which the agent fears that the probability model governing consumption growth may be wrong. Under this reading, the parameter that looked like extreme risk aversion instead measures concern about **misspecification**. -They show that modest amounts of model uncertainty can substitute for large amounts of risk aversion -in terms of choices and effects on asset prices. +They show that modest amounts of model uncertainty can substitute for large amounts of risk aversion in terms of choices and effects on asset prices. -This reinterpretation changes the welfare question that asset prices answer: do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or the benefits from reducing doubts about the underlying model? +This reinterpretation changes the welfare question that asset prices answer. -We start with the Hansen--Jagannathan bound, then specify the statistical environment, lay out four related preference specifications and their relationships, and finally revisit Tallarini's calibration using detection-error probabilities. 
+Do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or the benefits from reducing doubts about the underlying model? -This lecture draws on ideas and techniques that appear in +We begin with the Hansen--Jagannathan bound, then specify the statistical environment, lay out four related preference specifications and the connections among them, and finally revisit Tallarini's calibration through the lens of detection-error probabilities. -- {ref}`Asset Pricing: Finite State Models ` where we introduce stochastic discount factors. -- {ref}`Likelihood Ratio Processes ` where we develop the likelihood-ratio machinery that reappears here as the worst-case distortion $\hat g$. +Along the way, we draw on ideas and techniques from + +- {ref}`Asset Pricing: Finite State Models `, where we introduce stochastic discount factors, and +- {ref}`Likelihood Ratio Processes `, where we develop the likelihood-ratio machinery that reappears here as the worst-case distortion $\hat g$. In addition to what's in Anaconda, this lecture will need the following libraries: @@ -109,9 +110,9 @@ cov_erf = (r_e_std**2 + r_f_std**2 - r_excess_std**2) / 2.0 ### Pricing kernel and the risk-free rate -In this section, we review a few key concepts from {ref}`Asset Pricing: Finite State Models `. +Let's briefly review a few key concepts from {ref}`Asset Pricing: Finite State Models `. -A random variable $m_{t+1}$ is said to be a **stochastic discount factor** if it satisfies the following equation for the time-$t$ price $p_t$ of a one-period payoff $y_{t+1}$: +A random variable $m_{t+1}$ is called a **stochastic discount factor** if, for a one-period payoff $y_{t+1}$ with time-$t$ price $p_t$, it satisfies ```{math} :label: bhs_pricing_eq @@ -183,13 +184,13 @@ The left-hand side of {eq}`bhs_hj_bound` is the **Sharpe ratio**: the expected e The right-hand side, $\sigma_t(m)/E_t(m)$, is the **market price of risk**: the maximum Sharpe ratio attainable in the market. -The bound says that the Sharpe ratio of any asset cannot exceed the market price of risk. +In words, no asset's Sharpe ratio can exceed the market price of risk. #### Unconditional version The bound {eq}`bhs_hj_bound` is stated in conditional terms. -An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$: +There is also an unconditional counterpart that works with a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$: ```{math} :label: bhs_hj_unconditional @@ -200,9 +201,9 @@ An unconditional counterpart considers a vector of $n$ gross returns $R_{t+1}$ ( b = \mathbf{1} - E(m) E(R). ``` -In {ref}`Exercise 1 `, we will revisit and verify this unconditional version of the HJ bound. +{ref}`Exercise 1 ` walks through a derivation of this unconditional bound. -Below we implement a function that computes the right-hand side of {eq}`bhs_hj_unconditional` for any given value of $E(m)$. +The function below computes the right-hand side of {eq}`bhs_hj_unconditional` for any given value of $E(m)$. 
```{code-cell} ipython3 def hj_std_bound(E_m): @@ -214,11 +215,13 @@ def hj_std_bound(E_m): ### The puzzle -To reconcile formula {eq}`bhs_crra_sdf` with measures of the market price of risk extracted from data on asset returns and prices (like those in Table 1 below) requires a value of $\gamma$ so high that it provokes skepticism --- this is the **equity premium puzzle**. +Reconciling formula {eq}`bhs_crra_sdf` with the market price of risk extracted from data on asset returns (like those in Table 1 below) requires a value of $\gamma$ so high that it provokes skepticism. + +This is the **equity premium puzzle**. But the puzzle has a second dimension. -High values of $\gamma$ that deliver enough volatility $\sigma(m)$ also push the reciprocal of the risk-free rate $E(m)$ down, and therefore away from the Hansen--Jagannathan bounds. +High values of $\gamma$ that deliver enough volatility $\sigma(m)$ also push $E(m)$, the reciprocal of the gross risk-free rate, too far down, away from the Hansen--Jagannathan bound. This is the **risk-free rate puzzle** of {cite:t}`Weil_1989`. @@ -226,11 +229,11 @@ This is the **risk-free rate puzzle** of {cite:t}`Weil_1989`. The figure below reproduces Tallarini's key diagnostic. -We present this figure before developing the underlying theory because it motivates much of the subsequent analysis. +We show it before developing the underlying theory because it motivates much of what follows. The closed-form expressions for the Epstein--Zin SDF moments used in the plot are derived in {ref}`Exercise 2 `. -The code below implements those expressions and the corresponding CRRA moments. +The code below implements them alongside the corresponding CRRA moments. ```{code-cell} ipython3 def moments_type1_rw(γ): @@ -262,7 +265,9 @@ def moments_crra_rw(γ): return E_m, mpr ``` -For each value of $\gamma \in \{1, 5, 10, \ldots, 51\}$, we plot the implied $(E(m),\sigma(m))$ pair for three specifications: time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). +For each value of $\gamma \in \{1, 5, 10, \ldots, 51\}$, we plot the implied $(E(m),\sigma(m))$ pair for three specifications. + +These are time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). ```{code-cell} ipython3 @@ -306,27 +311,33 @@ plt.tight_layout() plt.show() ``` -The crosses show that as $\gamma$ rises, $\sigma(m)/E(m)$ grows but $E(m)$ falls well below the range consistent with the observed risk-free rate. +The crosses tell the story of the risk-free-rate puzzle ({cite:t}`Weil_1989`). + +As $\gamma$ rises, $\sigma(m)/E(m)$ grows but $E(m)$ drifts well below the range consistent with the observed risk-free rate. + +The circles and pluses show Tallarini's way out. -This is the risk-free-rate puzzle of {cite:t}`Weil_1989`. +Recursive utility with IES $= 1$ pushes volatility upward while keeping $E(m)$ roughly pinned near $1/(1+r^f)$. -The circles and pluses show Tallarini's solution. +For the random-walk model, the bound is reached at around $\gamma = 50$. -Recursive utility with IES $= 1$ pushes volatility upward while keeping $E(m)$ roughly constant near $1/(1+r^f)$. +For the trend-stationary model, it is reached at around $\gamma = 75$. -For the random-walk model, the bound is reached around $\gamma = 50$; for the trend-stationary model, around $\gamma = 75$. 
+The quantitative achievement is impressive, but Lucas's challenge still stands. -The quantitative achievement is significant, but Lucas's challenge remains: what microeconomic evidence supports $\gamma = 50$? +Where is the microeconomic evidence for $\gamma = 50$? -{cite:t}`BHS_2009` argue that the large $\gamma$ values are not really about risk aversion, but instead reflect the agent's doubts about the underlying probability model. +{cite:t}`BHS_2009` argue that these large $\gamma$ values are not really about risk aversion. + +Instead, they reflect the agent's doubts about the probability model itself. ## The choice setting -To develop this reinterpretation, we first need to formalize the setting we are working in. +To make this reinterpretation precise, we first need to formalize the environment. ### Shocks and consumption plans -We formulate the analysis in terms of a general class of consumption plans. +We work with a general class of consumption plans. Let $x_t$ be an $n \times 1$ state vector and $\varepsilon_{t+1}$ an $m \times 1$ shock. @@ -347,9 +358,9 @@ The time-$t$ consumption can therefore be written as c_t = H \left(B\varepsilon_t + AB\varepsilon_{t-1} + \cdots + A^{t-1}B\varepsilon_1\right) + HA^t x_0. ``` -The equivalence theorems and Bellman equations below hold for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. +The equivalence theorems and Bellman equations that follow hold for arbitrary plans in $\mathcal{C}(A,B,H;x_0)$. -The random-walk and trend-stationary models below are two special cases we focus on. +We focus on the random-walk and trend-stationary models as two special cases. ### Consumption dynamics @@ -399,7 +410,7 @@ Equivalently, defining the detrended series $\tilde c_t := c_t - \mu t$, The estimated parameters are $(\mu, \sigma_\varepsilon)$ for the random walk and $(\mu, \sigma_\varepsilon, \rho, \zeta)$ for the trend-stationary case. -Below we record these parameters and moments in the paper's tables for later reference. +We record these parameters and moments from the paper's tables for later reference. ```{code-cell} ipython3 print("Table 2 parameters") @@ -450,15 +461,15 @@ We compare four preference specifications over consumption plans $C^\infty \in \ - a single pessimistic joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ induced by the type II worst-case distortion. -We will introduce two sets of equivalence results. +Two sets of equivalence results tie these agents together. -Types I and II are observationally equivalent in the strong sense that they have identical preferences over $\mathcal{C}$. +Types I and II turn out to be observationally equivalent in a strong sense, having identical preferences over $\mathcal{C}$. -Types III and IV are observationally equivalent in a weaker but still useful sense: for the particular endowment process taken as given, they deliver the same worst-case pricing implications as a type II agent. +Types III and IV are equivalent in a weaker but still useful sense, delivering the same worst-case pricing implications as a type II agent for a given endowment process. -We now formalize each of the four agent types and develop the equivalence results that connect them. +We now formalize each agent type and develop the equivalences among them. -For each of the four types, we will derive a Bellman equation that characterizes the agent's value function and stochastic discount factor. +For each type, we derive a Bellman equation that pins down the agent's value function and stochastic discount factor. 
The stochastic discount factor for all four types takes the form @@ -469,11 +480,11 @@ $$ where $\hat g_{t+1}$ is a likelihood-ratio distortion that we will define in each case. -Along the way, we introduce the likelihood-ratio distortion that appears in the stochastic discount factor and develop the detection-error probability that serves as our new calibration device. +Along the way, we introduce the likelihood-ratio distortion that enters the stochastic discount factor and develop the detection-error probability that will serve as our new calibration device. ### Type I: Kreps--Porteus--Epstein--Zin--Tallarini preferences -The general Epstein--Zin--Weil specification aggregates current consumption and a certainty equivalent of future utility using a CES function: +The general Epstein--Zin--Weil specification combines current consumption with a certainty equivalent of future utility through a CES aggregator: ```{math} :label: bhs_ez_general @@ -516,7 +527,7 @@ Taking logs and expanding the certainty equivalent {eq}`bhs_certainty_equiv` giv \log E_t\left[(V_{t+1})^{1-\gamma}\right]. ``` -A key intermediate step is to define the transformed continuation value +A useful change of variables is to define the transformed continuation value ```{math} :label: bhs_Ut_def @@ -537,7 +548,7 @@ Substituting into {eq}`bhs_type1_recursion` yields the **risk-sensitive recursio U_t = c_t - \beta\theta \log E_t\left[\exp\left(\frac{-U_{t+1}}{\theta}\right)\right]. ``` -When $\gamma = 1$ (equivalently $\theta = +\infty$), the $\log E \exp$ term reduces to $E_t U_{t+1}$ and the recursion becomes standard discounted expected log utility: $U_t = c_t + \beta E_t U_{t+1}$. +When $\gamma = 1$ (equivalently $\theta = +\infty$), the $\log E \exp$ term reduces to $E_t U_{t+1}$ and the recursion becomes standard discounted expected log utility, $U_t = c_t + \beta E_t U_{t+1}$. For consumption plans in $\mathcal{C}(A, B, H; x_0)$, the recursion {eq}`bhs_risk_sensitive` implies the Bellman equation @@ -548,13 +559,13 @@ U(x) = c - \beta\theta \log \int \exp\left[\frac{-U(Ax + B\varepsilon)}{\theta}\ #### Deriving the stochastic discount factor -The stochastic discount factor is the intertemporal marginal rate of substitution: the ratio of marginal utilities of the consumption good at dates $t+1$ and $t$. +The stochastic discount factor is the intertemporal marginal rate of substitution, the ratio of marginal utilities at dates $t+1$ and $t$. -Since $c_t$ enters {eq}`bhs_risk_sensitive` linearly, $\partial U_t / \partial c_t = 1$. +Because $c_t$ enters {eq}`bhs_risk_sensitive` linearly, $\partial U_t / \partial c_t = 1$. Converting from log consumption to the consumption good gives $\partial U_t / \partial C_t = 1/C_t$. -A perturbation to $c_{t+1}$ in a particular state affects $U_t$ through the $\log E_t \exp$ term. +A perturbation to $c_{t+1}$ in a particular state feeds into $U_t$ through the $\log E_t \exp$ term. Differentiating {eq}`bhs_risk_sensitive`: @@ -591,9 +602,9 @@ The second factor is the likelihood-ratio distortion $\hat g_{t+1}$: an exponent We now turn to the type II (multiplier) agent. -Before writing down the preferences, we introduce the machinery of martingale likelihood ratios used to formalize model distortions. +Before writing down the preferences, we need the machinery of martingale likelihood ratios that formalizes what it means to distort a probability model. 
-The tools in this section build on {ref}`Likelihood Ratio Processes `, which develops properties of likelihood ratios in detail, and {ref}`Divergence Measures `, which covers relative entropy. +These tools build on {ref}`Likelihood Ratio Processes `, which develops properties of likelihood ratios in detail, and {ref}`Divergence Measures `, which covers relative entropy. #### Martingale likelihood ratios @@ -629,7 +640,7 @@ A type II agent's *multiplier* preference ordering over consumption plans $C^\in where $G_{t+1} = g_{t+1}G_t$, $E_t[g_{t+1}] = 1$, $g_{t+1} \geq 0$, and $G_0 = 1$. -The parameter $\theta > 0$ penalizes the relative entropy of probability distortions. +A larger $\theta$ makes probability distortions more expensive, discouraging departures from the approximating model. The value function satisfies the Bellman equation @@ -655,7 +666,9 @@ The minimizer is ({ref}`Exercise 4 ` derives this and verifies the equi \frac{\exp \bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)}{E_t \left[\exp \bigl(-W(Ax_t + B\varepsilon_{t+1})/\theta\bigr)\right]}. ``` -The fact that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty is the key structural feature that makes $\hat g$ a likelihood ratio. +Notice that $g(\varepsilon)$ multiplies both the continuation value $W$ and the entropy penalty. + +This is the key structural feature that makes $\hat g$ a likelihood ratio. Substituting {eq}`bhs_ghat` back into {eq}`bhs_bellman_type2` gives @@ -687,9 +700,9 @@ def γ_from_θ(θ, β=β): ### Type III: constraint preferences -Type III (constraint) preferences replace the entropy penalty with a hard bound. +Type III (constraint) preferences swap the entropy penalty for a hard bound. -The agent minimizes expected discounted log consumption under the worst-case model, subject to a cap $\eta$ on discounted relative entropy: +Rather than penalizing distortions through $\theta$, the agent minimizes expected discounted log consumption under the worst-case model subject to a cap $\eta$ on discounted relative entropy: ```{math} J(x_0) @@ -730,15 +743,15 @@ $$ which, apart from the constant $-\theta\eta$, has the same structure as the type II objective {eq}`bhs_type2_objective`. -The first-order condition for $g_{t+1}$ is therefore identical, and the optimal distortion is the same $\hat g_{t+1}$ as in {eq}`bhs_ghat` for the $\theta$ that makes the entropy constraint bind. +The first-order condition for $g_{t+1}$ is therefore identical, and the optimal distortion is the same $\hat g_{t+1}$ as in {eq}`bhs_ghat`, evaluated at the $\theta$ that makes the entropy constraint bind. The SDF is again $m_{t+1} = \beta(C_t/C_{t+1})\hat g_{t+1}$. -For the particular $A, B, H$ and $\theta$ used to derive the worst-case joint distribution $\hat\Pi_\infty$, the shadow prices of uncertain claims for a type III agent match those of a type II agent. +So for the particular endowment process and the $\theta$ that enforces the entropy bound, a type III agent and a type II agent assign the same shadow prices to uncertain claims. ### Type IV: ex post Bayesian -Type IV is an ordinary expected-utility agent with log preferences evaluated under a single pessimistic probability model $\hat\Pi_\infty$: +The type IV agent is the simplest of the four: an ordinary expected-utility agent with log preferences who happens to hold a pessimistic probability model $\hat\Pi_\infty$: ```{math} \hat E_0 \sum_{t=0}^{\infty} \beta^t c_t. 
@@ -746,9 +759,9 @@ Type IV is an ordinary expected-utility agent with log preferences evaluated und $\hat E_0$ denotes expectation under the pessimistic model $\hat\Pi_\infty$. -The joint distribution $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ is the one associated with the type II agent's worst-case distortion. +Here $\hat\Pi_\infty(\cdot \mid x_0, \theta)$ is the joint distribution generated by the type II agent's worst-case distortion. -Under $\hat\Pi_\infty$ the agent has log utility, so the Euler equation for any gross return $R_{t+1}$ is +Since the agent has log utility under $\hat\Pi_\infty$, the Euler equation for any gross return $R_{t+1}$ is $$ 1 = \hat E_t \left[\beta \frac{C_t}{C_{t+1}} R_{t+1}\right]. @@ -767,7 +780,7 @@ For the particular $A, B, H$ and $\theta$ used to construct $\hat\Pi_\infty$, th ### Stochastic discount factor -As we have shown for each of the four agent types, the stochastic discount factor can be written compactly as +Pulling together the results for all four agent types, the stochastic discount factor can be written compactly as ```{math} :label: bhs_sdf @@ -776,15 +789,15 @@ m_{t+1} \beta \frac{C_t}{C_{t+1}} \hat g_{t+1}. ``` -The distortion $\hat g_{t+1}$ is a likelihood ratio between the approximating and worst-case one-step models. +The factor $\hat g_{t+1}$ is a likelihood ratio between the approximating and worst-case one-step models. With log utility, $C_t/C_{t+1} = \exp(-(c_{t+1}-c_t))$ is the usual intertemporal marginal rate of substitution. -Robustness multiplies that term by $\hat g_{t+1}$, so uncertainty aversion enters pricing only through the distortion. +Robustness multiplies it by $\hat g_{t+1}$, so uncertainty aversion enters pricing entirely through the distortion. -For constraint preferences, the worst-case distortion is the same as for multiplier preferences with the $\theta$ that makes the entropy constraint bind. +For the constraint-preference agent, the worst-case distortion coincides with the multiplier agent's at the $\theta$ that makes the entropy constraint bind. -For the ex post Bayesian, the distortion is a change of measure from the approximating model to the pessimistic model. +For the ex post Bayesian, it is simply a change of measure from the approximating model to the pessimistic one. ### Value function decomposition @@ -811,15 +824,20 @@ Then $W(x) = J(x) + \theta N(x)$. Here $J(x_t) = \hat E_t \sum_{j=0}^{\infty} \beta^j c_{t+j}$ is expected discounted log consumption under the *worst-case* model. -$J$ is the value function for both the type III and the type IV agent: the type III agent maximizes expected utility subject to an entropy constraint, and once the worst-case model is determined, the resulting value is expected discounted consumption under that model; the type IV agent uses the same worst-case model as a fixed belief, so evaluates the same expectation. +$J$ is the value function shared by both the type III and type IV agents. + +For the type III agent, once the worst-case model is pinned down by the entropy constraint, the resulting value is simply expected discounted consumption under that model. -The term $N(x)$ is discounted continuation entropy: it measures the total information cost of the probability distortion from date $t$ onward. +The type IV agent adopts the same model as a fixed belief, so she evaluates the same expectation. 
-This decomposition will be important for the welfare calculations in {ref}`the welfare section ` below, where it explains why type III uncertainty compensation is twice that of type II. +The term $N(x)$ is discounted continuation entropy, measuring the total information cost of the probability distortion from date $t$ onward. + +This decomposition plays a central role in the welfare calculations of {ref}`the welfare section ` below, where it explains why type III uncertainty compensation is twice that of type II. ### Gaussian mean-shift distortions -The preceding results hold for general distortions $\hat g$. +Everything so far holds for general distortions $\hat g$. + We now specialize to the Gaussian case that underlies our two consumption models. Under both models, the shock is $\varepsilon_{t+1} \sim \mathcal{N}(0,1)$. @@ -848,7 +866,9 @@ Hence $\log \hat g_{t+1}$ is normal with mean $-w^2/2$ and variance $w^2$, and ``` The mean shift $w$ is determined by how strongly each shock $\varepsilon_{t+1}$ affects continuation value. + From {eq}`bhs_ghat`, the worst-case distortion puts $\hat g \propto \exp(-W(x_{t+1})/\theta)$. + If $W(x_{t+1})$ loads on $\varepsilon_{t+1}$ with coefficient $\lambda$, then the Gaussian mean shift is $w = -\lambda/\theta$. By guessing linear value functions and matching coefficients in the Bellman equation ({ref}`Exercise 11 ` works out both cases), we obtain the worst-case mean shifts @@ -860,7 +880,9 @@ w_{rw}(\theta) = -\frac{\sigma_\varepsilon}{(1-\beta)\theta}, w_{ts}(\theta) = -\frac{\sigma_\varepsilon}{(1-\rho\beta)\theta}. ``` -The denominator $(1-\beta)$ in the random-walk case is replaced by $(1-\beta\rho)$ in the trend-stationary case: because the AR(1) component is persistent, each shock has a larger effect on continuation utility. +The denominator $(1-\beta)$ in the random-walk case becomes $(1-\beta\rho)$ in the trend-stationary case. + +Because the AR(1) component is persistent, each shock has a larger cumulative effect on continuation utility, so the worst-case distortion is more aggressive. ```{code-cell} ipython3 def w_from_θ(θ, model): @@ -895,9 +917,9 @@ def η_from_θ(θ, model): return β * w**2 / (2.0 * (1.0 - β)) ``` -This formula provides a mapping between $\theta$ and $\eta$ that aligns multiplier and constraint preferences along an exogenous endowment process. +This gives a clean mapping between $\theta$ and $\eta$ that aligns multiplier and constraint preferences along an exogenous endowment process. -In the {ref}`detection-error section ` below, we show that it is more natural to hold $\eta$ (or equivalently the detection-error probability $p$) fixed rather than $\theta$ when comparing across consumption models. +As we will see in the {ref}`detection-error section ` below, it is more natural to hold $\eta$ (or equivalently the detection-error probability $p$) fixed rather than $\theta$ when comparing across consumption models. ### Value functions for random-walk consumption @@ -928,18 +950,20 @@ Using $W = J + \theta N$, the type III/IV value function is J(x_t) = W(x_t) - \theta N(x_t) = \frac{1}{1-\beta}\left[c_t + \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{(1-\beta)\theta}\right)\right]. ``` -The coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ doubles from $\tfrac{1}{2}$ in $W$ to $1$ in $J$ because $W$ includes the entropy "rebate" $\theta N$ that partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model. 
+Notice that the coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ doubles from $\tfrac{1}{2}$ in $W$ to $1$ in $J$. -This difference propagates directly into the welfare calculations below. +The reason is that $W$ includes the entropy "rebate" $\theta N$, which partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model with no such offset. + +This seemingly small difference drives a factor-of-two wedge in the welfare calculations below. (detection_error_section)= ## A new calibration language: detection-error probabilities -The preceding section derived SDF moments, value functions, and worst-case distortions as functions of $\gamma$ (or equivalently $\theta$). +So far we have expressed SDF moments, value functions, and worst-case distortions as functions of $\gamma$ (or equivalently $\theta$). But if $\gamma$ should not be calibrated by introspection about atemporal gambles, what replaces it? -The answer proposed by {cite:t}`BHS_2009` is a statistical test that asks how easily one could distinguish the approximating model from its worst-case alternative. +The answer proposed by {cite:t}`BHS_2009` is a statistical test: how easily could an econometrician distinguish the approximating model from its worst-case alternative? ### Likelihood-ratio testing and detection errors @@ -959,13 +983,15 @@ Then $p(\theta^{-1}) = \frac{1}{2}(p_A + p_B)$ is the average probability of cho Fix a sample size $T$ (here 235 quarters, matching the postwar US data used in the paper). -For a given $\theta$, compute the worst-case model and ask: if a Bayesian ran a likelihood-ratio test to distinguish the approximating model from the worst-case model, what fraction of the time would she make an error? +For a given $\theta$, compute the worst-case model and imagine that a Bayesian runs a likelihood-ratio test to distinguish it from the approximating model. + +What fraction of the time would she pick the wrong one? That fraction is the **detection-error probability** $p(\theta^{-1})$. -A high $p$ (near 0.5) means the two models are nearly indistinguishable, so the consumer's fear is hard to rule out. +When $p$ is close to 0.5 the two models are nearly indistinguishable, so the consumer's fear is hard to rule out. -A low $p$ means the worst-case model is easy to reject and the robustness concern is less compelling. +When $p$ is small the worst-case model is easy to reject and the robustness concern carries less force. ### Market price of model uncertainty @@ -1014,37 +1040,33 @@ def θ_from_detection_probability(p, model): ### Interpreting the calibration objects -We now summarize the chain of mappings that connects preference parameters to statistical distinguishability. +Let us trace the chain of mappings that connects preference parameters to statistical distinguishability. -The parameter $\theta$ indexes how expensive it is for the minimizing player to distort the approximating model. +The parameter $\theta$ governs how expensive it is for the minimizing player to distort the approximating model. -A small $\theta$ means a cheap distortion and therefore stronger robustness concerns. +A small $\theta$ means cheap distortions and therefore stronger robustness concerns. The associated $\gamma = 1 + \left[(1-\beta)\theta\right]^{-1}$ can be large even when we do not want to interpret behavior as extreme atemporal risk aversion. -The distortion magnitude $|w(\theta)|$ is a direct measure of how pessimistically the agent tilts one-step probabilities. 
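Before explaining why, here is a quick numerical check of the identity $W = J + \theta N$ that lies behind the doubling.

It is only a sketch, not part of the original analysis: it plugs the postwar calibration quoted elsewhere in the lecture ($\beta = 0.995$, $\sigma_\varepsilon = 0.005$) and an arbitrary $\theta$ into {eq}`bhs_W_rw`, {eq}`bhs_J_rw`, and {eq}`bhs_N_rw`; the terms in $c_t$ and $\mu$ are common to $W$ and $J$ and cancel.

```{code-cell} ipython3
# A sketch (not from the paper): check W - J = θN for the random-walk model.
# β and σ_ε are the postwar calibration quoted in the lecture; θ is arbitrary.
β_c, σ_c, θ_c = 0.995, 0.005, 5.0

# Uncertainty adjustments subtracted by W and J (terms in c_t and μ cancel)
adj_W = β_c * σ_c**2 / (2 * (1 - β_c)**3 * θ_c)
adj_J = β_c * σ_c**2 / ((1 - β_c)**3 * θ_c)

# Discounted entropy N from the closed form for the random walk
N_c = β_c * σ_c**2 / (2 * (1 - β_c)**3 * θ_c**2)

print(f"W - J = {adj_J - adj_W:.4f}")
print(f"θ N   = {θ_c * N_c:.4f}")
```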
+The distortion magnitude $|w(\theta)|$ directly measures how pessimistically the agent tilts one-step probabilities. -Detection error probability $p(\theta^{-1})$ translates that tilt into a statistical statement about finite-sample distinguishability. +The detection-error probability $p(\theta^{-1})$ translates that tilt into a statistical statement about finite-sample distinguishability. -High $p(\theta^{-1})$ means the two models are hard to distinguish. +High $p$ means the two models are hard to tell apart, while low $p$ means the worst case is easier to reject. -Low $p(\theta^{-1})$ means they are easier to distinguish. +This chain bridges econometric identification and preference calibration. -This mapping bridges econometric identification and preference calibration. - -Finally, recall from {eq}`bhs_eta_formula` that discounted entropy is $\eta = \frac{\beta}{2(1-\beta)}w(\theta)^2$. - -This tells us that when the distortion is a Gaussian mean shift, discounted entropy is proportional to the squared market price of model uncertainty. +Finally, recall from {eq}`bhs_eta_formula` that discounted entropy is $\eta = \frac{\beta}{2(1-\beta)}w(\theta)^2$, so when the distortion is a Gaussian mean shift, discounted entropy is proportional to the squared market price of model uncertainty. ### Detection probabilities across the two models -The left panel below plots $p(\theta^{-1})$ against $\theta^{-1}$ for the two consumption specifications. +The left panel below plots $p(\theta^{-1})$ against $\theta^{-1}$ for both consumption specifications. -Notice that the same numerical $\theta$ corresponds to very different detection probabilities across models, because baseline dynamics differ. +Because the baseline dynamics differ, the same numerical $\theta$ implies very different detection probabilities across the two models. -The right panel resolves this by plotting detection probabilities against discounted relative entropy $\eta$, which normalizes the statistical distance. +The right panel resolves this by plotting detection probabilities against discounted relative entropy $\eta$, which normalizes the statistical distance. -Indexed by $\eta$, the two curves coincide. +Once indexed by $\eta$, the two curves fall on top of each other. ```{code-cell} ipython3 --- @@ -1083,11 +1105,11 @@ plt.tight_layout() plt.show() ``` -Detection-error probabilities (or equivalently, discounted entropy) therefore provide the right cross-model yardstick. +Detection-error probabilities (or equivalently, discounted entropy) are therefore the right yardstick for cross-model comparisons. -Holding $\theta$ fixed when switching from a random walk to a trend-stationary specification implicitly changes how much misspecification the consumer fears. +If we hold $\theta$ fixed when switching from a random walk to a trend-stationary specification, we implicitly change how much misspecification the consumer fears. -Holding $\eta$ or $p$ fixed keeps the statistical difficulty of detecting misspecification constant. +Holding $\eta$ or $p$ fixed instead keeps the statistical difficulty of detecting misspecification constant. The explicit mapping that equates discounted entropy across models is ({ref}`Exercise 7 ` derives it): @@ -1105,9 +1127,9 @@ Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ i ## Detection probabilities unify the two models -We now redraw Tallarini's figure using detection-error probabilities. 
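To make the whole chain concrete, the sketch below evaluates each object at a single illustrative $\theta$.

It uses {eq}`bhs_w_formulas`, {eq}`bhs_eta_formula`, the mapping $\gamma = 1 + [(1-\beta)\theta]^{-1}$, and, rather than the lecture's own detection functions, the standard closed form for an i.i.d. Gaussian mean shift, $p = \Phi(-\sqrt{T}\,|w|/2)$; $\theta = 6$ is an arbitrary choice, and the other constants are the calibrated values quoted in the lecture.

```{code-cell} ipython3
import numpy as np
from scipy.stats import norm

# A sketch tracing θ → (γ, w, p, η) for the random-walk model.
β_c, σ_c, T_c = 0.995, 0.005, 235
θ_c = 6.0                                        # illustrative value

γ_c = 1 + 1 / ((1 - β_c) * θ_c)                  # risk-aversion reading of θ
w_c = -σ_c / ((1 - β_c) * θ_c)                   # worst-case mean shift
p_c = norm.cdf(-np.sqrt(T_c) * abs(w_c) / 2)     # detection-error probability
η_c = β_c * w_c**2 / (2 * (1 - β_c))             # discounted relative entropy

print(f"γ = {γ_c:.1f}, w = {w_c:.3f}, p = {p_c:.3f}, η = {η_c:.2f}")
```

For this particular $\theta$, the implied detection-error probability is close to $0.10$ and $\gamma$ is close to $34$, a pairing that reappears in the figures below.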
+With this machinery in hand, we can redraw Tallarini's figure using detection-error probabilities as the common index. -For each detection-error probability $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m), \sigma(m))$ pair. +For each $p(\theta^{-1}) = 0.50, 0.45, \ldots, 0.01$, we invert to find the model-specific $\theta$, convert to $\gamma$, and plot the implied $(E(m), \sigma(m))$ pair. ```{code-cell} ipython3 p_points = np.array( @@ -1194,15 +1216,21 @@ plt.tight_layout() plt.show() ``` -The result is striking: the random-walk and trend-stationary loci nearly coincide. +The result is striking. + +The random-walk and trend-stationary loci nearly coincide. + +Recall that under Tallarini's $\gamma$-calibration, reaching the Hansen--Jagannathan bound required $\gamma \approx 50$ for the random walk but $\gamma \approx 75$ for the trend-stationary model. -Recall that under Tallarini's $\gamma$-calibration, reaching the Hansen--Jagannathan bound required $\gamma \approx 50$ for the random walk but $\gamma \approx 75$ for the trend-stationary model --- very different numbers for the "same" preference parameter. +These are very different numbers for what is supposed to be the "same" preference parameter. -Under detection-error calibration, both models reach the bound at the same detectability level. +Under detection-error calibration, both models reach the bound at essentially the same detectability level. -The apparent model dependence was an artifact of using $\gamma$ as a cross-model yardstick. +The apparent model dependence was an artifact of using $\gamma$ as the cross-model yardstick. -Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell the same story: a representative consumer with moderate, difficult-to-dismiss fears about model misspecification behaves as though she has very high risk aversion. +Once we measure robustness concerns in units of statistical detectability, the two consumption specifications tell a single, coherent story. + +A representative consumer with moderate, difficult-to-dismiss fears about model misspecification behaves as though she has very high risk aversion. The following figure brings together the two key ideas of this section: a small one-step density shift that is hard to detect (left panel) compounds into a large gap in expected log consumption (right panel). @@ -1274,9 +1302,9 @@ plt.tight_layout() plt.show() ``` -The next figure decomposes the log SDF into two additive components. +The next figure makes the "doubts or variability?" question by decomposing the log SDF into two additive components. -Taking logs of the SDF {eq}`bhs_sdf` gives +Taking logs of {eq}`bhs_sdf` gives $$ \log m_{t+1} @@ -1302,9 +1330,15 @@ so the slope of $\log m_{t+1}$ in $\varepsilon_{t+1}$ is $\sigma_\varepsilon - w Since $w < 0$, the distortion steepens the SDF relative to what log utility alone would deliver. -In the figure below, the intertemporal marginal rate of substitution (IMRS) is nearly flat: at postwar calibrated volatility ($\sigma_\varepsilon = 0.005$), it contributes almost nothing to the pricing kernel's slope. +The figure below reveals how little work log utility does on its own. + +The intertemporal marginal rate of substitution (IMRS) is nearly flat. + +At postwar calibrated volatility ($\sigma_\varepsilon = 0.005$), it contributes almost nothing to the pricing kernel's slope. 
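As a warm-up for the full figure, the sketch below performs the inversion at a single value, $p = 0.10$, for both consumption models.

It is a stand-alone illustration that relies only on {eq}`bhs_w_formulas`, the mapping $\gamma = 1 + [(1-\beta)\theta]^{-1}$, and the Gaussian detection-error formula used in the previous sketch, with the calibrated $\beta = 0.995$, $\rho = 0.98$, $\sigma_\varepsilon = 0.005$, and $T = 235$.

```{code-cell} ipython3
import numpy as np
from scipy.stats import norm

# Invert a target detection-error probability into model-specific (θ, γ).
β_c, ρ_c, σ_c, T_c = 0.995, 0.98, 0.005, 235
p_target = 0.10

# |w| that delivers the target p under an i.i.d. Gaussian mean shift
w_abs = -2 * norm.ppf(p_target) / np.sqrt(T_c)

# Model-specific θ from the worst-case mean-shift formulas
θ_rw = σ_c / ((1 - β_c) * w_abs)
θ_ts = σ_c / ((1 - ρ_c * β_c) * w_abs)

for name, θ_m in [("random walk", θ_rw), ("trend stationary", θ_ts)]:
    γ_m = 1 + 1 / ((1 - β_c) * θ_m)
    print(f"{name:17s}: θ = {θ_m:6.2f}, γ = {γ_m:6.1f}")
```

The same detectability level thus corresponds to very different values of $\theta$ and $\gamma$ across the two specifications, which is exactly why $\gamma$ is a poor cross-model yardstick.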
-The distortion accounts for virtually all of the SDF volatility --- what looks like extreme risk aversion ($\gamma \approx 34$) is really log utility plus moderate fears of model misspecification. +The worst-case distortion accounts for virtually all of the SDF's volatility. + +What looks like extreme risk aversion ($\gamma \approx 34$) is really just log utility combined with moderate fears of model misspecification. ```{code-cell} ipython3 --- @@ -1345,19 +1379,19 @@ plt.show() (welfare_experiments)= ## What do risk premia measure? -{cite:t}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. +{cite:t}`Lucas_2003` asked how much consumption a representative consumer would sacrifice to eliminate aggregate fluctuations. -His answer rested on the assumption that the consumer knows the data-generating process. +His answer rested on the assumption that the consumer knows the true data-generating process. -The robust reinterpretation introduces a second, distinct thought experiment. +The robust reinterpretation opens up a second, quite different thought experiment. -Instead of eliminating all randomness, suppose we keep randomness but remove the consumer's fear of model misspecification (set $\theta = \infty$). +Instead of eliminating all randomness, suppose we keep the randomness but remove the consumer's fear of model misspecification (set $\theta = \infty$). How much would she pay for that relief? -Formally, we seek a permanent proportional reduction $c_0 - c_0^k$ in initial log consumption that leaves an agent of type $k$ indifferent between the original risky plan and a deterministic certainty-equivalent path. +To answer this, we seek a permanent proportional reduction $c_0 - c_0^k$ in initial log consumption that leaves an agent of type $k$ indifferent between the original risky plan and a deterministic certainty-equivalent path. -Because utility is log and the consumption process is Gaussian, these compensations are available in closed form. +Because utility is log and the consumption process is Gaussian, these compensations can be computed in closed form. ### The certainty equivalent path @@ -1368,7 +1402,9 @@ The point of comparison is the deterministic path with the same mean level of co c_{t+1}^{ce} - c_t^{ce} = \mu + \tfrac{1}{2}\sigma_\varepsilon^2. ``` -The additional $\tfrac{1}{2}\sigma_\varepsilon^2$ term is a Jensen's inequality correction: $E[C_t] = E[e^{c_t}] = \exp(c_0 + t\mu + \tfrac{1}{2}t\sigma_\varepsilon^2)$, so {eq}`bhs_ce_path` matches the mean *level* of consumption at every date. +The additional $\tfrac{1}{2}\sigma_\varepsilon^2$ term is a Jensen's inequality correction. + +Since $E[C_t] = E[e^{c_t}] = \exp(c_0 + t\mu + \tfrac{1}{2}t\sigma_\varepsilon^2)$, {eq}`bhs_ce_path` matches the mean *level* of consumption at every date. ### Compensating variations from the value functions @@ -1428,7 +1464,7 @@ where the last step uses $\gamma = 1 + [(1-\beta)\theta]^{-1}$. Because $W \equiv U$, we have $c_0^{II} = c_0^I$ and the total compensation is the same. -However, the interpretation differs: we can now decompose it into **risk** and **model uncertainty** components. +However, the interpretation differs because we can now decompose it into **risk** and **model uncertainty** components. 
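A two-line calculation backs up this claim; the sketch below uses the calibrated $\sigma_\varepsilon = 0.005$ and $T = 235$ quoted in the text and recovers the worst-case shift at $p = 0.10$ from the Gaussian detection-error formula used earlier.

```{code-cell} ipython3
import numpy as np
from scipy.stats import norm

# Share of the pricing kernel's slope due to the worst-case distortion
# (a sketch at p = 0.10, not part of the original figure).
σ_c, T_c = 0.005, 235
w_c = 2 * norm.ppf(0.10) / np.sqrt(T_c)   # worst-case mean shift (negative)

slope = σ_c - w_c                         # loading of log m on ε, in absolute value
print(f"IMRS share:       {σ_c / slope:.1%}")
print(f"distortion share: {-w_c / slope:.1%}")
```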
A type II agent with $\theta = \infty$ (no model uncertainty) has log preferences and requires @@ -1443,9 +1479,13 @@ A type II agent with $\theta = \infty$ (no model uncertainty) has log preference \frac{\beta \sigma_\varepsilon^2}{2(1-\beta)^2\theta}. ``` -The risk term $\Delta c_0^{risk}$ is Lucas's cost of business cycles: at postwar consumption volatility ($\sigma_\varepsilon \approx 0.005$), it is small. +The risk term $\Delta c_0^{risk}$ is Lucas's cost of business cycles. -The uncertainty term $\Delta c_0^{uncertainty}$ is the additional compensation a type II agent requires for facing model misspecification. It can be first-order whenever the detection-error probability is moderate, because $\theta$ appears in the denominator. +At postwar consumption volatility ($\sigma_\varepsilon \approx 0.005$), it is negligibly small. + +The uncertainty term $\Delta c_0^{uncertainty}$ captures the additional compensation a type II agent demands for facing model misspecification. + +With $\theta$ in the denominator, this term can be first-order even when the detection-error probability is only moderate. ### Type III (constraint) compensation @@ -1474,7 +1514,9 @@ c_0 - c_0^{III} \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}(2\gamma - 1). ``` -The risk component is the same $\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}$ as before. The uncertainty component alone is +The risk component is the same $\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}$ as before. + +The uncertainty component alone is $$ c_0^{III}(r) - c_0^{III} @@ -1483,22 +1525,24 @@ c_0^{III}(r) - c_0^{III} $$ which is *twice* the type II uncertainty compensation {eq}`bhs_type2_rw_decomp`. -The factor of two traces back to the difference between $W$ and $J$ noted after {eq}`bhs_J_rw`: the entropy rebate $\theta N$ in $W = J + \theta N$ partially offsets the pessimistic tilt for the type II agent, but not for the type III agent who evaluates consumption purely under the worst-case model. + +The factor of two traces back to the difference between $W$ and $J$ noted after {eq}`bhs_J_rw`. + +The entropy rebate $\theta N$ in $W = J + \theta N$ partially offsets the pessimistic tilt for the type II agent, but not for the type III agent who evaluates consumption purely under the worst-case model. ### Type IV (ex post Bayesian) compensation A type IV agent believes the pessimistic model, so the perceived drift is $\tilde\mu = \mu - \sigma_\varepsilon^2/[(1-\beta)\theta]$. + The compensation for moving to the certainty-equivalent path is the same as {eq}`bhs_type3_rw_decomp`, because this agent ranks plans using the same value function $J$. ### Comparison with a risky but free-of-model-uncertainty path -The certainty equivalents above compare a risky plan to a deterministic path, eliminating both risk and uncertainty simultaneously. - -We now describe an alternative measure that isolates compensation for model uncertainty by keeping risk intact. +The certainty equivalents above compare a risky plan to a deterministic path, thereby eliminating both risk and uncertainty at once. -We compare two situations with identical risky consumption for all dates $t \geq 1$. +We now describe an alternative measure that isolates compensation for model uncertainty alone by keeping risk intact. -All compensation for model uncertainty is concentrated in an adjustment to date-zero consumption. 
+The idea is to compare two situations with identical risky consumption for all dates $t \geq 1$, concentrating all compensation for model uncertainty in a single adjustment to date-zero consumption. Specifically, we seek $c_0^{II}(u)$ that makes a type II agent indifferent between: @@ -1526,7 +1570,9 @@ $$ = \frac{\beta}{1-\beta}\left[c_0 + \frac{\mu}{1-\beta}\right], $$ -where we used $E[c_1] = c_0 + \mu$ (the noise term has zero mean). Expanding gives +where we used $E[c_1] = c_0 + \mu$ (the noise term has zero mean). + +Expanding gives $$ \beta E\left[V^{\log}(c_1)\right] @@ -1550,7 +1596,9 @@ c_0 - c_0^{II}(u) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta} = \frac This is $\frac{1}{1-\beta}$ times the uncertainty compensation $\Delta c_0^{\text{uncertainty}}$ from {eq}`bhs_type2_rw_decomp`. -The multiplicative factor $\frac{1}{1-\beta}$ arises because all compensation is concentrated in a single period: adjusting $c_0$ alone must offset the cumulative loss in continuation value that the uncertainty penalty imposes in every future period. +The extra factor of $\frac{1}{1-\beta}$ arises because all compensation is packed into a single period. + +Adjusting $c_0$ alone must offset the cumulative loss in continuation value that the uncertainty penalty imposes in every future period. An analogous calculation for a **type III** agent, using $J(c_0)$ from {eq}`bhs_J_rw`, gives @@ -1575,7 +1623,9 @@ The following table collects all compensating variations for the random walk mod | III | $c_0^{III}(r) - c_0^{III}$ | $\frac{\beta\sigma_\varepsilon^2}{(1-\beta)^2\theta}$ | uncertainty only (vs. deterministic) | | III | $c_0 - c_0^{III}(u)$ | $\frac{\beta\sigma_\varepsilon^2}{(1-\beta)^3\theta}$ | uncertainty only (vs. risky path) | -The "vs. deterministic" rows use the certainty-equivalent path {eq}`bhs_ce_path` as a benchmark; the "vs. risky path" rows use the risky-but-uncertainty-free comparison of {eq}`bhs_comp_type2u`--{eq}`bhs_comp_type3u`. +The "vs. deterministic" rows use the certainty-equivalent path {eq}`bhs_ce_path` as a benchmark. + +The "vs. risky path" rows use the risky-but-uncertainty-free comparison of {eq}`bhs_comp_type2u`--{eq}`bhs_comp_type3u`. ### Trend-stationary formulas @@ -1589,7 +1639,7 @@ $$ \Delta c_0^{unc,ts,III} = \frac{\beta\sigma_\varepsilon^2}{(1-\beta\rho)^2\theta}. $$ -The qualitative message is the same: the risk component is negligible, and the model-uncertainty component dominates. +The qualitative message carries over: the risk component is negligible, and the model-uncertainty component dominates. ## Visualizing the welfare decomposition @@ -1674,9 +1724,12 @@ The left panel illustrates the elimination of model uncertainty and risk for a t The shaded fan shows a one-standard-deviation band for the $j$-step-ahead conditional distribution of $c_t$ under the calibrated random-walk model. -The dashed line $c^{II}$ shows the certainty-equivalent path whose date-zero consumption is reduced by $c_0 - c_0^{II}$, making the type II agent indifferent between this deterministic trajectory and the stochastic plan; it compensates for bearing both risk and model ambiguity. +The dashed line $c^{II}$ shows the certainty-equivalent path whose date-zero consumption is reduced by $c_0 - c_0^{II}$, making the type II agent indifferent between this deterministic trajectory and the stochastic plan. + +It compensates for bearing both risk and model ambiguity. 
The solid line $c^r$ shows the certainty equivalent for a type II agent without model uncertainty ($\theta = \infty$), initialized at $c_0 - c_0^{II}(r)$. + At postwar calibrated values this gap is small, so $c^r$ sits just below the center of the fan. Consistent with {cite:t}`Lucas_2003`, the welfare gains from eliminating well-understood risk are very small. @@ -1687,7 +1740,7 @@ The right panel shows the set of nearby models that the robust consumer guards a Each shaded fan depicts a one-standard-deviation band for a different model in the ambiguity set. -The models are statistically close to the baseline --- their detection-error probability is $p = 0.10$ --- but imply very different long-run consumption levels. +The models are statistically close to the baseline, with detection-error probability $p = 0.10$, but imply very different long-run consumption levels. The consumer's caution against such alternatives accounts for the large certainty-equivalent gap in the left panel. @@ -1695,11 +1748,11 @@ The consumer's caution against such alternatives accounts for the large certaint A type III (constraint-preference) agent evaluates the worst model inside an entropy ball of radius $\eta$. -As $\eta$ grows, the set of plausible misspecifications expands and the welfare cost of confronting model uncertainty rises. +As $\eta$ grows the set of plausible misspecifications expands, and with it the welfare cost of confronting model uncertainty. -Because $\eta$ is not directly interpretable, we instead index these costs by the associated detection-error probability $p(\eta)$. +Since $\eta$ itself is not easy to interpret, we instead index these costs by the associated detection-error probability $p(\eta)$. -The figure below plots compensation for removing model uncertainty, measured as a proportion of consumption, against $p(\eta)$. +The figure below plots the compensation for removing model uncertainty, measured as a proportion of consumption, against $p(\eta)$. ```{code-cell} ipython3 η_grid = np.linspace(0.0, 5.0, 300) @@ -1755,21 +1808,21 @@ plt.tight_layout() plt.show() ``` -The random-walk model implies somewhat larger costs than the trend-stationary model at the same detection-error probability, but both curves greatly exceed the classic Lucas cost of business cycles. +The random-walk model implies somewhat larger costs than the trend-stationary model at the same detection-error probability, but both curves dwarf the classic Lucas cost of business cycles. -To put these magnitudes in perspective, Lucas estimated that eliminating all aggregate consumption risk is worth roughly 0.05% of consumption. +To put the magnitudes in perspective, Lucas estimated that eliminating all aggregate consumption risk is worth roughly 0.05% of consumption. -At detection-error probabilities of 10--20%, the model-uncertainty compensation alone runs to several percent of consumption. +At detection-error probabilities of 10--20%, the model-uncertainty compensation alone runs to several percent, orders of magnitude larger. -Under the robust reading, the large risk premia that Tallarini matched with high $\gamma$ are compensations for bearing model uncertainty, and the implied welfare gains from resolving that uncertainty are correspondingly large. +Under the robust reading, the large risk premia that Tallarini matched with high $\gamma$ are really compensations for bearing model uncertainty, and the implied welfare gains from resolving that uncertainty are correspondingly large. 
-The following contour plot shows how type II (multiplier) compensation varies over a two-dimensional parameter space: the detection-error probability $p$ and the consumption volatility $\sigma_\varepsilon$. +The following contour plot shows how type II (multiplier) compensation varies over two dimensions: the detection-error probability $p$ and the consumption volatility $\sigma_\varepsilon$. -The star marks the calibrated point ($p = 0.10$, $\sigma_\varepsilon = 0.5\%$). +The cross marks the calibrated point ($p = 0.10$, $\sigma_\varepsilon = 0.5\%$). At the calibrated volatility, moving left (lower $p$, stronger robustness concerns) increases compensation dramatically, while the classic risk-only cost (the $p = 50\%$ edge) remains negligible. -Comparing the two panels shows that the random-walk model generates much larger welfare costs than the trend-stationary model at the same ($p$, $\sigma_\varepsilon$), because permanent shocks compound the worst-case drift indefinitely. +A comparison of the two panels reveals that the random-walk model generates much larger welfare costs than the trend-stationary model at the same ($p$, $\sigma_\varepsilon$), because permanent shocks compound the worst-case drift indefinitely. ```{code-cell} ipython3 --- @@ -1819,17 +1872,17 @@ plt.show() ## Why doesn't learning eliminate these fears? -A natural objection is: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? +A natural objection arises: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? -The answer is that the drift is a low-frequency feature of the data. +The answer is that the drift is a low-frequency feature of the data, and low-frequency features are hard to pin down. Estimating the mean of a random walk to the precision needed to reject small but economically meaningful shifts requires far more data than estimating volatility. The following figure makes this point concrete. -Consumption is measured as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator, and expressed in per-capita terms using the civilian noninstitutional population aged 16+. +We measure consumption as real personal consumption expenditures on nondurable goods and services, deflated by its implicit chain price deflator and expressed in per-capita terms using the civilian noninstitutional population aged 16+. -We construct real per-capita nondurables-plus-services consumption from four FRED series: +The construction uses four FRED series: | FRED series | Description | | --- | --- | @@ -1838,7 +1891,9 @@ We construct real per-capita nondurables-plus-services consumption from four FRE | `DPCERD3Q086SBEA` | PCE implicit price deflator (index 2017 $= 100$, quarterly) | | `CNP16OV` | Civilian noninstitutional population, 16+ (thousands, monthly) | -We use nominal rather than chained-dollar components because chained-dollar series are not additive: chain-weighted indices update their base-period expenditure weights every period, so components deflated with different price changes do not sum to the separately chained aggregate. +We use nominal rather than chained-dollar components because chained-dollar series are not additive. + +Chain-weighted indices update their base-period expenditure weights every period, so components deflated with different price changes do not sum to the separately chained aggregate. 
Adding nominal series and deflating the sum with a single price index avoids this problem. @@ -1987,15 +2042,15 @@ plt.show() In the left panel, postwar U.S. log consumption is shown alongside two deterministic trend lines: the approximating-model drift $\mu$ and the worst-case drift $\mu + \sigma_\varepsilon w(\theta)$ for $p(\theta^{-1}) = 0.20$. -The two trends are close enough that, even with decades of data, it is hard to distinguish them by eye. +The two trends are close enough that, even with six decades of data, it is hard to distinguish them by eye. -In the right panel, as the detection-error probability rises (models become harder to tell apart), the worst-case mean growth rate moves back toward $\hat\mu$. +In the right panel, as the detection-error probability rises (the two models become harder to tell apart), the worst-case mean growth rate drifts back toward $\hat\mu$. -The dashed gray lines mark a two-standard-error band around the maximum-likelihood estimate of $\mu$. +The dashed gray lines mark a two-standard-error band around the maximum-likelihood estimate of $\mu$. Even at detection probabilities in the 5--20% range, the worst-case drift remains inside (or very near) this confidence band. -Drift distortions that are economically large --- large enough to generate substantial model-uncertainty premia --- are statistically small relative to sampling uncertainty in $\hat\mu$. +Drift distortions that are economically large, large enough to generate substantial model-uncertainty premia, are statistically small relative to sampling uncertainty in $\hat\mu$. Robustness concerns persist despite long histories precisely because the low-frequency features that matter most for pricing are the hardest to estimate precisely. @@ -2003,18 +2058,20 @@ Robustness concerns persist despite long histories precisely because the low-fre The title of this lecture poses a question: are large risk premia prices of **variability** (atemporal risk aversion) or prices of **doubts** (model uncertainty)? -The analysis above shows that the answer cannot be settled by asset-pricing data alone, because the two interpretations are observationally equivalent. +Asset-pricing data alone cannot settle the question, because the two interpretations are observationally equivalent. But the choice of interpretation matters for the conclusions we draw. Under the risk-aversion reading, high Sharpe ratios imply that consumers would pay a great deal to smooth known aggregate consumption fluctuations. -Under the robustness reading, those same Sharpe ratios tell us consumers would pay a great deal to resolve uncertainty about which probability model governs consumption growth. +Under the robustness reading, those same Sharpe ratios tell us that consumers would pay a great deal to resolve uncertainty about which probability model actually governs consumption growth. Three features of the analysis support the robustness reading: -1. Detection-error probabilities provide a more stable calibration language than $\gamma$: the two consumption models that required very different $\gamma$ values to match the data yield nearly identical pricing implications when indexed by detectability. -2. The welfare gains implied by asset prices decompose overwhelmingly into a model-uncertainty component, with the pure risk component remaining small --- consistent with Lucas's original finding. +1. Detection-error probabilities provide a more stable calibration language than $\gamma$. 
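Before constructing the series, a back-of-the-envelope calculation previews the point of the figure: with roughly $T = 235$ quarterly observations and $\sigma_\varepsilon \approx 0.005$, sampling uncertainty in the estimated drift is of the same order as the worst-case drift distortions studied above.

The sketch below uses those round numbers rather than the estimates produced from the data constructed next, and the Gaussian detection-error formula from the earlier sketches.

```{code-cell} ipython3
import numpy as np
from scipy.stats import norm

# Sampling uncertainty in the drift versus worst-case drift shifts (a sketch).
σ_c, T_c = 0.005, 235
se_μ = σ_c / np.sqrt(T_c)                  # standard error of the ML drift estimate

for p_c in (0.05, 0.10, 0.20):
    w_c = 2 * norm.ppf(p_c) / np.sqrt(T_c)          # worst-case mean shift
    drift_shift = σ_c * abs(w_c)                    # shift in the quarterly drift
    print(f"p = {p_c:.2f}: drift shift = {drift_shift:.5f}, "
          f"2 s.e. half-width = {2 * se_μ:.5f}")
```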
+ +The two consumption models that required very different $\gamma$ values to match the data yield nearly identical pricing implications when indexed by detectability. +2. The welfare gains implied by asset prices decompose overwhelmingly into a model-uncertainty component, with the pure risk component remaining small, consistent with Lucas's original finding. 3. The drift distortions that drive pricing are small enough to hide inside standard-error bands, so finite-sample learning cannot eliminate the consumer's fears. Whether one ultimately prefers the risk or the uncertainty interpretation, the framework clarifies that the question is not about the size of risk premia but about the economic object those premia measure. From 3be0eca776bbe292cd8d572d6a3b6c700a7c40b2 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 18:57:25 +1100 Subject: [PATCH 25/37] updates --- lectures/doubts_or_variability.md | 45 ++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index cc1e166a1..27fc62b5d 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -65,6 +65,7 @@ In addition to what's in Anaconda, this lecture will need the following librarie ```{code-cell} ipython3 :tags: [hide-output] + !pip install pandas-datareader ``` @@ -212,7 +213,6 @@ def hj_std_bound(E_m): return np.sqrt(np.maximum(var_lb, 0.0)) ``` - ### The puzzle Reconciling formula {eq}`bhs_crra_sdf` with the market price of risk extracted from data on asset returns (like those in Table 1 below) requires a value of $\gamma$ so high that it provokes skepticism. @@ -269,7 +269,6 @@ For each value of $\gamma \in \{1, 5, 10, \ldots, 51\}$, we plot the implied $(E These are time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). - ```{code-cell} ipython3 --- mystnb: @@ -897,18 +896,30 @@ def w_from_θ(θ, model): ### Discounted entropy -When the approximating and worst-case conditional densities are $\mathcal{N}(0,1)$ and $\mathcal{N}(w,1)$, conditional relative entropy is +When the approximating and worst-case conditional densities are $\mathcal{N}(0,1)$ and $\mathcal{N}(w,1)$, the likelihood ratio is $\hat g(\varepsilon) = \exp(w\varepsilon - \frac{1}{2}w^2)$, so $\log \hat g(\varepsilon) = w\varepsilon - \frac{1}{2}w^2$. + +Under the worst-case measure $\varepsilon \sim \mathcal{N}(w,1)$, so $E_{\hat\pi}[\varepsilon] = w$, giving conditional relative entropy ```{math} :label: bhs_conditional_entropy -E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2. +E_t[\hat g_{t+1}\log \hat g_{t+1}] = w \cdot w - \frac{1}{2}w^2 = \frac{1}{2}w(\theta)^2. ``` -Because the distortion is i.i.d., the discounted entropy recursion {eq}`bhs_N_recursion` reduces to $N = \beta(\frac{1}{2}w^2 + N)$, giving discounted entropy +Because the distortion is i.i.d., the conditional entropy $E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2$ from {eq}`bhs_conditional_entropy` is constant and $N(x)$ does not depend on $x$. + +The recursion {eq}`bhs_N_recursion` then reduces to $N(x) = \beta(\frac{1}{2}w^2 + N(x))$, where we have used $\int \hat g(\varepsilon)\pi(\varepsilon)d\varepsilon = 1$ (since $\hat g$ is a likelihood ratio). 
+ +Solving for $N(x)$ gives + +$$ +N(x)(1-\beta) = \frac{\beta}{2}w(\theta)^2 +$$ + +gives discounted entropy ```{math} :label: bhs_eta_formula -\eta = \frac{\beta}{2(1-\beta)} w(\theta)^2. +\eta = N(x) = \frac{\beta}{2(1-\beta)} w(\theta)^2. ``` ```{code-cell} ipython3 @@ -927,6 +938,14 @@ We now solve the recursions {eq}`bhs_W_decomp_bellman`, {eq}`bhs_J_recursion`, a Substituting $w_{rw}(\theta) = -\sigma_\varepsilon / [(1-\beta)\theta]$ from {eq}`bhs_w_formulas` into {eq}`bhs_eta_formula` gives +$$ +N(x) = \frac{\beta}{2(1-\beta)} w_{rw}(\theta)^2 + = \frac{\beta}{2(1-\beta)} \left(\frac{-\sigma_\varepsilon}{(1-\beta)\theta}\right)^2 + = \frac{\beta}{2(1-\beta)} \cdot \frac{\sigma_\varepsilon^2}{(1-\beta)^2\theta^2} +$$ + +so that + ```{math} :label: bhs_N_rw N(x) = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^3\theta^2}. @@ -1242,8 +1261,8 @@ The right panel reveals the cumulative consequence: a per-period shift that is v --- mystnb: figure: - caption: Small one-step density shift (left) produces large cumulative - consumption gap (right) at detection-error probability $p = 0.03$ with $T = 240$ quarters + caption: Small one-step density shift (left) produces large cumulative consumption + gap (right) at detection-error probability $p = 0.03$ with $T = 240$ quarters name: fig-bhs-fear --- p_star = 0.03 @@ -1263,7 +1282,7 @@ ax1.plot(ε, f0, 'k', lw=2.5, label=r'Approximating $\mathcal{N}(0, 1)$') ax1.fill_between(ε, fw, alpha=0.15, color='C3') ax1.plot(ε, fw, 'C3', lw=2, ls='--', - label=f'Worst case $\mathcal{{N}}({w_star:.2f},1)$') + label=fr'Worst case $\mathcal{{N}}({w_star:.2f},1)$') peak = norm.pdf(0, 0, 1) ax1.annotate('', xy=(w_star, 0.55 * peak), xytext=(0, 0.55 * peak), @@ -1344,8 +1363,8 @@ What looks like extreme risk aversion ($\gamma \approx 34$) is really just log u --- mystnb: figure: - caption: "Doubts or variability? Decomposition of the robust SDF into - log-utility IMRS and worst-case distortion at $p = 0.10$" + caption: Doubts or variability? Decomposition of the robust SDF into log-utility + IMRS and worst-case distortion at $p = 0.10$ name: fig-bhs-sdf-decomp --- θ_cal = θ_from_detection_probability(0.10, "rw") @@ -1828,8 +1847,8 @@ A comparison of the two panels reveals that the random-walk model generates much --- mystnb: figure: - caption: Type II compensation across detection-error probability and - consumption volatility + caption: Type II compensation across detection-error probability and consumption + volatility name: fig-bhs-contour --- p_grid = np.linspace(0.02, 0.49, 300) From cc2518b0bc8551a0545f868b1be746fe40cc12e3 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 18:59:16 +1100 Subject: [PATCH 26/37] updates --- lectures/doubts_or_variability.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 27fc62b5d..eac59e5cf 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -896,23 +896,23 @@ def w_from_θ(θ, model): ### Discounted entropy -When the approximating and worst-case conditional densities are $\mathcal{N}(0,1)$ and $\mathcal{N}(w,1)$, the likelihood ratio is $\hat g(\varepsilon) = \exp(w\varepsilon - \frac{1}{2}w^2)$, so $\log \hat g(\varepsilon) = w\varepsilon - \frac{1}{2}w^2$. 
+When the approximating and worst-case conditional densities are $\mathcal{N}(0,1)$ and $\mathcal{N}(w(\theta),1)$, the likelihood ratio is $\hat g(\varepsilon) = \exp(w(\theta)\varepsilon - \frac{1}{2}w(\theta)^2)$, so $\log \hat g(\varepsilon) = w(\theta)\varepsilon - \frac{1}{2}w(\theta)^2$. -Under the worst-case measure $\varepsilon \sim \mathcal{N}(w,1)$, so $E_{\hat\pi}[\varepsilon] = w$, giving conditional relative entropy +Under the worst-case measure $\varepsilon \sim \mathcal{N}(w(\theta),1)$, so $E_{\hat\pi}[\varepsilon] = w(\theta)$, giving conditional relative entropy ```{math} :label: bhs_conditional_entropy -E_t[\hat g_{t+1}\log \hat g_{t+1}] = w \cdot w - \frac{1}{2}w^2 = \frac{1}{2}w(\theta)^2. +E_t[\hat g_{t+1}\log \hat g_{t+1}] = w(\theta) \cdot w(\theta) - \frac{1}{2}w(\theta)^2 = \frac{1}{2}w(\theta)^2. ``` Because the distortion is i.i.d., the conditional entropy $E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2$ from {eq}`bhs_conditional_entropy` is constant and $N(x)$ does not depend on $x$. -The recursion {eq}`bhs_N_recursion` then reduces to $N(x) = \beta(\frac{1}{2}w^2 + N(x))$, where we have used $\int \hat g(\varepsilon)\pi(\varepsilon)d\varepsilon = 1$ (since $\hat g$ is a likelihood ratio). +The recursion {eq}`bhs_N_recursion` then reduces to $N(x) = \beta(\frac{1}{2}w(\theta)^2 + N(x))$, where we have used $\int \hat g(\varepsilon)\pi(\varepsilon)d\varepsilon = 1$ (since $\hat g$ is a likelihood ratio). -Solving for $N(x)$ gives +Solving for $N(x)$, $$ -N(x)(1-\beta) = \frac{\beta}{2}w(\theta)^2 +N(x)(1-\beta) = \frac{\beta}{2}w(\theta)^2, $$ gives discounted entropy From 1f109251246aad59225a8fa6e649c6783476fa24 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 19:04:23 +1100 Subject: [PATCH 27/37] update --- lectures/doubts_or_variability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index eac59e5cf..ea6aa795b 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -905,7 +905,7 @@ Under the worst-case measure $\varepsilon \sim \mathcal{N}(w(\theta),1)$, so $E_ E_t[\hat g_{t+1}\log \hat g_{t+1}] = w(\theta) \cdot w(\theta) - \frac{1}{2}w(\theta)^2 = \frac{1}{2}w(\theta)^2. ``` -Because the distortion is i.i.d., the conditional entropy $E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2$ from {eq}`bhs_conditional_entropy` is constant and $N(x)$ does not depend on $x$. +Because the distortion is i.i.d., the conditional entropy $E_t[\hat g_{t+1}\log \hat g_{t+1}] = \frac{1}{2}w(\theta)^2$ from {eq}`bhs_conditional_entropy` is constant and $N(x)$ does not depend on $x$. The recursion {eq}`bhs_N_recursion` then reduces to $N(x) = \beta(\frac{1}{2}w(\theta)^2 + N(x))$, where we have used $\int \hat g(\varepsilon)\pi(\varepsilon)d\varepsilon = 1$ (since $\hat g$ is a likelihood ratio). 
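A quick Monte Carlo check, a sketch with an arbitrary mean shift of roughly the calibrated magnitude, confirms that the exponential tilt integrates to one under the approximating density and that its relative entropy equals $\tfrac{1}{2}w(\theta)^2$, the value derived next.

```{code-cell} ipython3
import numpy as np

# Monte Carlo check (a sketch): ĝ(ε) = exp(wε - w²/2) is a valid likelihood
# ratio under N(0,1), and E[ĝ log ĝ] = w²/2.  w = -0.17 is an arbitrary value.
rng = np.random.default_rng(0)
w_c = -0.17
ε = rng.standard_normal(1_000_000)
g = np.exp(w_c * ε - 0.5 * w_c**2)

print(f"E[ĝ]       ≈ {g.mean():.4f}   (should be 1)")
print(f"E[ĝ log ĝ] ≈ {(g * np.log(g)).mean():.5f}  (w²/2 = {w_c**2 / 2:.5f})")
```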
From 4d3923cff8aa08ea6c74e9a33c23a225ab6e7ebb Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 19:06:26 +1100 Subject: [PATCH 28/37] update --- lectures/doubts_or_variability.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index ea6aa795b..30d8058b8 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -976,7 +976,7 @@ The reason is that $W$ includes the entropy "rebate" $\theta N$, which partially This seemingly small difference drives a factor-of-two wedge in the welfare calculations below. (detection_error_section)= -## A new calibration language: detection-error probabilities +## Detection-error probabilities So far we have expressed SDF moments, value functions, and worst-case distortions as functions of $\gamma$ (or equivalently $\theta$). @@ -1144,7 +1144,7 @@ At our calibration $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{R Because $\rho = 0.98$ and $\beta = 0.995$, the ratio $(1-\beta)/(1-\rho\beta)$ is much less than one, so holding entropy fixed requires a substantially smaller $\theta$ (stronger robustness) for the trend-stationary model than for the random walk. -## Detection probabilities unify the two models +## Unify the two models using detection-error probabilities With this machinery in hand, we can redraw Tallarini's figure using detection-error probabilities as the common index. @@ -1183,7 +1183,6 @@ mystnb: caption: Pricing loci from common detectability name: fig-bhs-3 --- -from scipy.optimize import brentq # Empirical Sharpe ratio — the minimum of the HJ bound curve sharpe = (r_e_mean - r_f_mean) / r_excess_std From b1a3345940a8f81457f868dd8bb68a5a754ff4b9 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 19:23:43 +1100 Subject: [PATCH 29/37] updates --- lectures/doubts_or_variability.md | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 30d8058b8..e34473a49 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -728,7 +728,7 @@ $$ \right], $$ -where $\theta \ge 0$ is the multiplier on the entropy constraint. +where $\theta \geq 0$ is the multiplier on the entropy constraint. Collecting terms inside the expectation gives @@ -973,8 +973,6 @@ Notice that the coefficient on $\sigma_\varepsilon^2/[(1-\beta)\theta]$ doubles The reason is that $W$ includes the entropy "rebate" $\theta N$, which partially offsets the pessimistic tilt, while $J$ evaluates consumption purely under the worst-case model with no such offset. -This seemingly small difference drives a factor-of-two wedge in the welfare calculations below. - (detection_error_section)= ## Detection-error probabilities @@ -1482,7 +1480,7 @@ where the last step uses $\gamma = 1 + [(1-\beta)\theta]^{-1}$. Because $W \equiv U$, we have $c_0^{II} = c_0^I$ and the total compensation is the same. -However, the interpretation differs because we can now decompose it into **risk** and **model uncertainty** components. +However, the interpretation differs because we can now decompose it into *risk* and *model uncertainty* components. A type II agent with $\theta = \infty$ (no model uncertainty) has log preferences and requires @@ -1649,13 +1647,14 @@ The "vs. 
risky path" rows use the risky-but-uncertainty-free comparison of {eq}` For the trend-stationary model, the denominators $(1-\beta)$ in the uncertainty terms are replaced by $(1-\beta\rho)$, and the risk terms involve $(1-\beta\rho^2)$: -$$ +```{math} +:label: bhs_ts_compensations \Delta c_0^{risk,ts} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho^2)}, \qquad \Delta c_0^{unc,ts,II} = \frac{\beta\sigma_\varepsilon^2}{2(1-\beta\rho)^2\theta}, \qquad \Delta c_0^{unc,ts,III} = \frac{\beta\sigma_\varepsilon^2}{(1-\beta\rho)^2\theta}. -$$ +``` The qualitative message carries over: the risk component is negligible, and the model-uncertainty component dominates. @@ -2764,8 +2763,8 @@ For type II (multiplier) preferences under random-walk consumption growth, deriv In particular, derive -1. the **risk** term by comparing the stochastic economy to a deterministic consumption path with the same mean level of consumption (Lucas's thought experiment), and -2. the **uncertainty** term by comparing a type II agent with parameter $\theta$ to the expected-utility case $\theta=\infty$, holding the stochastic environment fixed. +1. the *risk* term by comparing the stochastic economy to a deterministic consumption path with the same mean level of consumption (Lucas's thought experiment), and +2. the *uncertainty* term by comparing a type II agent with parameter $\theta$ to the expected-utility case $\theta=\infty$, holding the stochastic environment fixed. ``` ```{solution-start} dov_ex8 @@ -2780,7 +2779,8 @@ $$ with $\varepsilon_j\stackrel{iid}{\sim}\mathcal{N}(0,1)$. -**Risk term.** +**Risk term:** + The mean level of consumption is $$ @@ -2801,6 +2801,7 @@ $$ \frac{c_0}{1-\beta} + \frac{\beta\mu}{(1-\beta)^2}, $$ + while for the deterministic mean-level path it is $$ @@ -2810,6 +2811,7 @@ $$ $$ If we reduce initial consumption by $\Delta c_0^{risk}$ (so $\bar c_t$ shifts down by $\Delta c_0^{risk}$ for all $t$), utility falls by $\Delta c_0^{risk}/(1-\beta)$. + Equating the two utilities gives $$ @@ -2820,8 +2822,10 @@ $$ \Delta c_0^{risk}=\frac{\beta\sigma_\varepsilon^2}{2(1-\beta)}. $$ -**Uncertainty term.** +**Uncertainty term:** + For type II multiplier preferences, the minimizing distortion is a Gaussian mean shift with parameter $w$ and per-period relative entropy $\tfrac{1}{2}w^2$. + Under the distorted model, $E[\varepsilon]=w$, so $$ @@ -2838,7 +2842,7 @@ J(w) \sum_{t\geq 0}\beta^{t+1}\theta\cdot\frac{w^2}{2}. $$ -Using $\sum_{t\ge0}\beta^t=1/(1-\beta)$ and $\sum_{t\ge0}t\beta^t=\beta/(1-\beta)^2$, +Using $\sum_{t\geq0}\beta^t=1/(1-\beta)$ and $\sum_{t\geq0}t\beta^t=\beta/(1-\beta)^2$, $$ J(w) @@ -2987,7 +2991,7 @@ which matches {eq}`bhs_W_rw`. ```{exercise} :label: dov_ex10 -Derive the trend-stationary risk compensation stated in the lecture. +Derive the trend-stationary risk compensation $\Delta c_0^{risk,ts}$ in {eq}`bhs_ts_compensations`. For the trend-stationary model with $\tilde c_{t+1} - \zeta = \rho(\tilde c_t - \zeta) + \sigma_\varepsilon\varepsilon_{t+1}$, where $\tilde c_t = c_t - \mu t$, compute the risk compensation $\Delta c_0^{risk,ts}$ by comparing expected log utility under the stochastic plan to the deterministic certainty-equivalent path, and show that @@ -3043,7 +3047,7 @@ The uncertainty compensation follows from the value function: $\Delta c_0^{unc,t Derive the worst-case mean shifts {eq}`bhs_w_formulas` for both consumption models. -Recall that the worst-case distortion {eq}`bhs_ghat` has $\hat g \propto \exp(-W(x_{t+1})/\theta)$. 
+From {eq}`bhs_ghat`, $\hat g_{t+1} \propto \exp(-W(x_{t+1})/\theta)$. When $W$ is linear in the state, the exponent is linear in $\varepsilon_{t+1}$, and the Gaussian mean shift is $w = -\lambda/\theta$ where $\lambda$ is the coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$. From 9cd4afc49a72c8448e012d3d82581e986688ab26 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 19:48:31 +1100 Subject: [PATCH 30/37] updates --- lectures/doubts_or_variability.md | 316 +++++++++++++++--------------- 1 file changed, 156 insertions(+), 160 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index e34473a49..10395627b 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -870,7 +870,7 @@ From {eq}`bhs_ghat`, the worst-case distortion puts $\hat g \propto \exp(-W(x_{t If $W(x_{t+1})$ loads on $\varepsilon_{t+1}$ with coefficient $\lambda$, then the Gaussian mean shift is $w = -\lambda/\theta$. -By guessing linear value functions and matching coefficients in the Bellman equation ({ref}`Exercise 11 ` works out both cases), we obtain the worst-case mean shifts +By guessing linear value functions and matching coefficients in the Bellman equation ({ref}`Exercise 6 ` works out both cases), we obtain the worst-case mean shifts ```{math} :label: bhs_w_formulas @@ -955,7 +955,7 @@ For $W$, we guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ for some constant $d$ an Under the random walk, $W(x_{t+1}) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]$, so $-W(x_{t+1})/\theta$ is affine in the standard normal $\varepsilon_{t+1}$. -Using the fact that $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$ for a normal random variable $Z$, the Bellman equation {eq}`bhs_bellman_type1` reduces to a constant-matching condition that pins down $d$ ({ref}`Exercise 9 ` works through the algebra): +Using the fact that $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$ for a normal random variable $Z$, the Bellman equation {eq}`bhs_bellman_type1` reduces to a constant-matching condition that pins down $d$ ({ref}`Exercise 7 ` works through the algebra): ```{math} :label: bhs_W_rw @@ -1024,7 +1024,7 @@ The **market price of model uncertainty** (MPU) is the conditional standard devi \approx |w(\theta)|. ``` -In the Gaussian mean-shift setting, $L_T$ is normal with mean $\pm \tfrac{1}{2}w^2T$ and variance $w^2T$, so the detection-error probability has the closed form ({ref}`Exercise 6 ` derives this) +In the Gaussian mean-shift setting, $L_T$ is normal with mean $\pm \tfrac{1}{2}w^2T$ and variance $w^2T$, so the detection-error probability has the closed form ({ref}`Exercise 8 ` derives this) ```{math} :label: bhs_detection_formula @@ -1128,7 +1128,7 @@ If we hold $\theta$ fixed when switching from a random walk to a trend-stationar Holding $\eta$ or $p$ fixed instead keeps the statistical difficulty of detecting misspecification constant. -The explicit mapping that equates discounted entropy across models is ({ref}`Exercise 7 ` derives it): +The explicit mapping that equates discounted entropy across models is ({ref}`Exercise 9 ` derives it): ```{math} :label: bhs_theta_cross_model @@ -1985,8 +1985,6 @@ print(f" T = {len(diff_c)} quarters") ```{code-cell} ipython3 p_fig6 = 0.20 -# Figure 6 overlays deterministic lines on the loaded consumption data. -# Use sample-estimated RW moments to avoid data-vintage drift mismatches. 
rw_fig6 = dict(μ=μ_hat, σ_ε=σ_hat) w_fig6 = 2.0 * norm.ppf(p_fig6) / np.sqrt(T) @@ -1997,8 +1995,6 @@ t6 = np.arange(T + 1) μ_approx = rw_fig6["μ"] μ_worst = rw_fig6["μ"] + rw_fig6["σ_ε"] * w_fig6 -# Match BHS Figure 6 visual construction by fitting intercepts separately -# while holding the two drifts fixed. a_approx = (c - μ_approx * t6).mean() a_worst = (c - μ_worst * t6).mean() line_approx = a_approx + μ_approx * t6 @@ -2614,6 +2610,151 @@ $$ ```{exercise} :label: dov_ex6 +Derive the worst-case mean shifts {eq}`bhs_w_formulas` for both consumption models. + +From {eq}`bhs_ghat`, $\hat g_{t+1} \propto \exp(-W(x_{t+1})/\theta)$. + +When $W$ is linear in the state, the exponent is linear in $\varepsilon_{t+1}$, and the Gaussian mean shift is $w = -\lambda/\theta$ where $\lambda$ is the coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$. + +1. Random-walk model: Guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$. Using $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$, find $\lambda$ and show that $w = -\sigma_\varepsilon/[(1-\beta)\theta]$. + +2. Trend-stationary model: Write $z_t = \tilde c_t - \zeta$ and guess $W(x_t) = \frac{1}{1-\beta}[c_t + \alpha_1 z_t + \alpha_0]$. Show that: + - The coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$ is $(1+\alpha_1)\sigma_\varepsilon/(1-\beta)$. + - Matching coefficients on $z_t$ in the Bellman equation gives $\alpha_1 = \beta(\rho-1)/(1-\beta\rho)$. + - Therefore $1+\alpha_1 = (1-\beta)/(1-\beta\rho)$ and $w = -\sigma_\varepsilon/[(1-\beta\rho)\theta]$. +``` + +```{solution-start} dov_ex6 +:class: dropdown +``` + +**Part 1.** +Under the guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ and $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$, + +$$ +W(x_{t+1}) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]. +$$ + +The coefficient on $\varepsilon_{t+1}$ is $\lambda = \sigma_\varepsilon/(1-\beta)$, so $w = -\lambda/\theta = -\sigma_\varepsilon/[(1-\beta)\theta]$. + +**Part 2.** +Under the guess $W(x_t) = \frac{1}{1-\beta}[c_t + \alpha_1 z_t + \alpha_0]$ with $c_{t+1} = c_t + \mu + (\rho-1)z_t + \sigma_\varepsilon\varepsilon_{t+1}$ and $z_{t+1} = \rho z_t + \sigma_\varepsilon\varepsilon_{t+1}$, + +$$ +W(x_{t+1}) = \tfrac{1}{1-\beta}\bigl[c_t + \mu + (\rho-1)z_t + \sigma_\varepsilon\varepsilon_{t+1} + \alpha_1(\rho z_t + \sigma_\varepsilon\varepsilon_{t+1}) + \alpha_0\bigr]. +$$ + +The coefficient on $\varepsilon_{t+1}$ is $(1+\alpha_1)\sigma_\varepsilon/(1-\beta)$. + +To find $\alpha_1$, substitute the guess into the Bellman equation. + +The factors of $\frac{1}{1-\beta}$ cancel on both sides, and matching coefficients on $z_t$ gives + +$$ +\alpha_1 = \beta\bigl[(\rho-1) + \alpha_1\rho\bigr] +\quad\Rightarrow\quad +\alpha_1(1-\beta\rho) = \beta(\rho-1) +\quad\Rightarrow\quad +\alpha_1 = \frac{\beta(\rho-1)}{1-\beta\rho}. +$$ + +Therefore + +$$ +1+\alpha_1 = \frac{1-\beta\rho + \beta(\rho-1)}{1-\beta\rho} = \frac{1-\beta}{1-\beta\rho}, +$$ + +and the coefficient on $\varepsilon_{t+1}$ becomes $(1+\alpha_1)\sigma_\varepsilon/(1-\beta) = \sigma_\varepsilon/(1-\beta\rho)$, giving $w = -\sigma_\varepsilon/[(1-\beta\rho)\theta]$. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex7 + +Verify the closed-form value function {eq}`bhs_W_rw` for the random-walk model by substituting a guess of the form $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ into the risk-sensitive Bellman equation {eq}`bhs_bellman_type1`. + +1. 
Under the random walk $c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}$, show that $W(Ax_t + B\varepsilon) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]$. +2. Substitute into the $\log E\exp$ term, using the fact that for $Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2)$ we have $\log E[\exp(Z)] = \mu_Z + \frac{1}{2}\sigma_Z^2$. +3. Solve for $d$ and confirm that it matches {eq}`bhs_W_rw`. +``` + +```{solution-start} dov_ex7 +:class: dropdown +``` + +**Part 1.** Under the random walk, $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$. Substituting the guess $W(x) = \frac{1}{1-\beta}[Hx + d]$ with $Hx_t = c_t$: + +$$ +W(Ax_t + B\varepsilon_{t+1}) = \frac{1}{1-\beta}\bigl[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d\bigr]. +$$ + +**Part 2.** The Bellman equation {eq}`bhs_bellman_type1` requires computing + +$$ +-\beta\theta\log E_t\left[\exp\left(\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta}\right)\right]. +$$ + +Substituting the guess: + +$$ +\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta} += +\frac{-1}{(1-\beta)\theta}\bigl[c_t + \mu + d + \sigma_\varepsilon\varepsilon_{t+1}\bigr]. +$$ + +This is an affine function of the standard normal $\varepsilon_{t+1}$, so the argument of the $\log E\exp$ is normal with + +$$ +\mu_Z = \frac{-(c_t + \mu + d)}{(1-\beta)\theta}, +\qquad +\sigma_Z^2 = \frac{\sigma_\varepsilon^2}{(1-\beta)^2\theta^2}. +$$ + +Using $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$: + +$$ +-\beta\theta\left[\frac{-(c_t + \mu + d)}{(1-\beta)\theta} + \frac{\sigma_\varepsilon^2}{2(1-\beta)^2\theta^2}\right] += +\frac{\beta}{1-\beta}\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. +$$ + +**Part 3.** The Bellman equation becomes + +$$ +\frac{1}{1-\beta}[c_t + d] += +c_t + \frac{\beta}{1-\beta}\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. +$$ + +Expanding the right-hand side: + +$$ +c_t + \frac{\beta c_t}{1-\beta} + \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta} += +\frac{c_t}{1-\beta} + \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +$$ + +Equating both sides and cancelling $\frac{c_t}{1-\beta}$: + +$$ +\frac{d}{1-\beta} = \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. +$$ + +Solving: $d - \beta d = \beta\mu - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)\theta}$, so + +$$ +d = \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right), +$$ + +which matches {eq}`bhs_W_rw`. + +```{solution-end} +``` + +```{exercise} +:label: dov_ex8 + In the Gaussian mean-shift setting of {ref}`Exercise 5 `, let $L_T$ be the log likelihood ratio between the worst-case and approximating models based on $T$ observations. 1. Show that $L_T$ is normal under each model. @@ -2621,7 +2762,7 @@ In the Gaussian mean-shift setting of {ref}`Exercise 5 `, let $L_T$ be 3. Using the definition of detection-error probability in {eq}`bhs_detection_formula`, derive the closed-form expression {eq}`bhs_detection_closed`. ``` -```{solution-start} dov_ex6 +```{solution-start} dov_ex8 :class: dropdown ``` @@ -2701,7 +2842,7 @@ which is {eq}`bhs_detection_closed`. 
``` ```{exercise} -:label: dov_ex7 +:label: dov_ex9 Using the formulas for $w(\theta)$ in {eq}`bhs_w_formulas` and the definition of discounted entropy @@ -2714,7 +2855,7 @@ show that holding $\eta$ fixed across the random-walk and trend-stationary consu Specialize your result to the case $\sigma_\varepsilon^{\text{TS}} = \sigma_\varepsilon^{\text{RW}}$ and interpret the role of $\rho$. ``` -```{solution-start} dov_ex7 +```{solution-start} dov_ex9 :class: dropdown ``` @@ -2757,7 +2898,7 @@ To hold entropy fixed, the trend-stationary model therefore requires a smaller $ ``` ```{exercise} -:label: dov_ex8 +:label: dov_ex10 For type II (multiplier) preferences under random-walk consumption growth, derive the compensating-variation formulas in {eq}`bhs_type2_rw_decomp`. @@ -2767,7 +2908,7 @@ In particular, derive 2. the *uncertainty* term by comparing a type II agent with parameter $\theta$ to the expected-utility case $\theta=\infty$, holding the stochastic environment fixed. ``` -```{solution-start} dov_ex8 +```{solution-start} dov_ex10 :class: dropdown ``` @@ -2906,90 +3047,7 @@ Together these reproduce {eq}`bhs_type2_rw_decomp`. ``` ```{exercise} -:label: dov_ex9 - -Verify the closed-form value function {eq}`bhs_W_rw` for the random-walk model by substituting a guess of the form $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ into the risk-sensitive Bellman equation {eq}`bhs_bellman_type1`. - -1. Under the random walk $c_{t+1} = c_t + \mu + \sigma_\varepsilon \varepsilon_{t+1}$, show that $W(Ax_t + B\varepsilon) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]$. -2. Substitute into the $\log E\exp$ term, using the fact that for $Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2)$ we have $\log E[\exp(Z)] = \mu_Z + \frac{1}{2}\sigma_Z^2$. -3. Solve for $d$ and confirm that it matches {eq}`bhs_W_rw`. -``` - -```{solution-start} dov_ex9 -:class: dropdown -``` - -**Part 1.** Under the random walk, $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$. Substituting the guess $W(x) = \frac{1}{1-\beta}[Hx + d]$ with $Hx_t = c_t$: - -$$ -W(Ax_t + B\varepsilon_{t+1}) = \frac{1}{1-\beta}\bigl[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d\bigr]. -$$ - -**Part 2.** The Bellman equation {eq}`bhs_bellman_type1` requires computing - -$$ --\beta\theta\log E_t\left[\exp\left(\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta}\right)\right]. -$$ - -Substituting the guess: - -$$ -\frac{-W(Ax_t + B\varepsilon_{t+1})}{\theta} -= -\frac{-1}{(1-\beta)\theta}\bigl[c_t + \mu + d + \sigma_\varepsilon\varepsilon_{t+1}\bigr]. -$$ - -This is an affine function of the standard normal $\varepsilon_{t+1}$, so the argument of the $\log E\exp$ is normal with - -$$ -\mu_Z = \frac{-(c_t + \mu + d)}{(1-\beta)\theta}, -\qquad -\sigma_Z^2 = \frac{\sigma_\varepsilon^2}{(1-\beta)^2\theta^2}. -$$ - -Using $\log E[e^Z] = \mu_Z + \frac{1}{2}\sigma_Z^2$: - -$$ --\beta\theta\left[\frac{-(c_t + \mu + d)}{(1-\beta)\theta} + \frac{\sigma_\varepsilon^2}{2(1-\beta)^2\theta^2}\right] -= -\frac{\beta}{1-\beta}\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. -$$ - -**Part 3.** The Bellman equation becomes - -$$ -\frac{1}{1-\beta}[c_t + d] -= -c_t + \frac{\beta}{1-\beta}\left[c_t + \mu + d - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right]. 
-$$ - -Expanding the right-hand side: - -$$ -c_t + \frac{\beta c_t}{1-\beta} + \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta} -= -\frac{c_t}{1-\beta} + \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. -$$ - -Equating both sides and cancelling $\frac{c_t}{1-\beta}$: - -$$ -\frac{d}{1-\beta} = \frac{\beta(\mu + d)}{1-\beta} - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)^2\theta}. -$$ - -Solving: $d - \beta d = \beta\mu - \frac{\beta\sigma_\varepsilon^2}{2(1-\beta)\theta}$, so - -$$ -d = \frac{\beta}{1-\beta}\left(\mu - \frac{\sigma_\varepsilon^2}{2(1-\beta)\theta}\right), -$$ - -which matches {eq}`bhs_W_rw`. - -```{solution-end} -``` - -```{exercise} -:label: dov_ex10 +:label: dov_ex11 Derive the trend-stationary risk compensation $\Delta c_0^{risk,ts}$ in {eq}`bhs_ts_compensations`. @@ -3002,7 +3060,7 @@ $$ *Hint:* You will need $\operatorname{Var}(z_t) = \sigma_\varepsilon^2(1 + \rho^2 + \cdots + \rho^{2(t-1)})$ and the formula $\sum_{t \geq 1}\beta^t \sum_{j=0}^{t-1}\rho^{2j} = \frac{\beta}{(1-\beta)(1-\beta\rho^2)}$. ``` -```{solution-start} dov_ex10 +```{solution-start} dov_ex11 :class: dropdown ``` @@ -3041,65 +3099,3 @@ The uncertainty compensation follows from the value function: $\Delta c_0^{unc,t ```{solution-end} ``` - -```{exercise} -:label: dov_ex11 - -Derive the worst-case mean shifts {eq}`bhs_w_formulas` for both consumption models. - -From {eq}`bhs_ghat`, $\hat g_{t+1} \propto \exp(-W(x_{t+1})/\theta)$. - -When $W$ is linear in the state, the exponent is linear in $\varepsilon_{t+1}$, and the Gaussian mean shift is $w = -\lambda/\theta$ where $\lambda$ is the coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$. - -1. Random-walk model: Guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$. Using $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$, find $\lambda$ and show that $w = -\sigma_\varepsilon/[(1-\beta)\theta]$. - -2. Trend-stationary model: Write $z_t = \tilde c_t - \zeta$ and guess $W(x_t) = \frac{1}{1-\beta}[c_t + \alpha_1 z_t + \alpha_0]$. Show that: - - The coefficient on $\varepsilon_{t+1}$ in $W(x_{t+1})$ is $(1+\alpha_1)\sigma_\varepsilon/(1-\beta)$. - - Matching coefficients on $z_t$ in the Bellman equation gives $\alpha_1 = \beta(\rho-1)/(1-\beta\rho)$. - - Therefore $1+\alpha_1 = (1-\beta)/(1-\beta\rho)$ and $w = -\sigma_\varepsilon/[(1-\beta\rho)\theta]$. -``` - -```{solution-start} dov_ex11 -:class: dropdown -``` - -**Part 1.** -Under the guess $W(x_t) = \frac{1}{1-\beta}[c_t + d]$ and $c_{t+1} = c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1}$, - -$$ -W(x_{t+1}) = \frac{1}{1-\beta}[c_t + \mu + \sigma_\varepsilon\varepsilon_{t+1} + d]. -$$ - -The coefficient on $\varepsilon_{t+1}$ is $\lambda = \sigma_\varepsilon/(1-\beta)$, so $w = -\lambda/\theta = -\sigma_\varepsilon/[(1-\beta)\theta]$. - -**Part 2.** -Under the guess $W(x_t) = \frac{1}{1-\beta}[c_t + \alpha_1 z_t + \alpha_0]$ with $c_{t+1} = c_t + \mu + (\rho-1)z_t + \sigma_\varepsilon\varepsilon_{t+1}$ and $z_{t+1} = \rho z_t + \sigma_\varepsilon\varepsilon_{t+1}$, - -$$ -W(x_{t+1}) = \tfrac{1}{1-\beta}\bigl[c_t + \mu + (\rho-1)z_t + \sigma_\varepsilon\varepsilon_{t+1} + \alpha_1(\rho z_t + \sigma_\varepsilon\varepsilon_{t+1}) + \alpha_0\bigr]. -$$ - -The coefficient on $\varepsilon_{t+1}$ is $(1+\alpha_1)\sigma_\varepsilon/(1-\beta)$. - -To find $\alpha_1$, substitute the guess into the Bellman equation. 
- -The factors of $\frac{1}{1-\beta}$ cancel on both sides, and matching coefficients on $z_t$ gives - -$$ -\alpha_1 = \beta\bigl[(\rho-1) + \alpha_1\rho\bigr] -\quad\Rightarrow\quad -\alpha_1(1-\beta\rho) = \beta(\rho-1) -\quad\Rightarrow\quad -\alpha_1 = \frac{\beta(\rho-1)}{1-\beta\rho}. -$$ - -Therefore - -$$ -1+\alpha_1 = \frac{1-\beta\rho + \beta(\rho-1)}{1-\beta\rho} = \frac{1-\beta}{1-\beta\rho}, -$$ - -and the coefficient on $\varepsilon_{t+1}$ becomes $(1+\alpha_1)\sigma_\varepsilon/(1-\beta) = \sigma_\varepsilon/(1-\beta\rho)$, giving $w = -\sigma_\varepsilon/[(1-\beta\rho)\theta]$. - -```{solution-end} -``` From 26b6994160ba5b527f481a38fdbb8b87fd9e6fa1 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Thu, 12 Feb 2026 19:55:57 +1100 Subject: [PATCH 31/37] updates --- lectures/doubts_or_variability.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 10395627b..6a8d03168 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -45,7 +45,7 @@ Their answer, and the theme of this lecture, is that much of what looks like "ri The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max--min recursion in which the agent fears that the probability model governing consumption growth may be wrong. -Under this reading, the parameter that looked like extreme risk aversion instead measures concern about **misspecification**. +Under this reading, the parameter that looked like extreme risk aversion instead measures concern about *misspecification*. They show that modest amounts of model uncertainty can substitute for large amounts of risk aversion in terms of choices and effects on asset prices. @@ -540,7 +540,7 @@ and the robustness parameter \theta = \frac{-1}{(1-\beta)(1-\gamma)}. ``` -Substituting into {eq}`bhs_type1_recursion` yields the **risk-sensitive recursion** ({ref}`Exercise 3 ` asks you to verify this step) +Substituting into {eq}`bhs_type1_recursion` yields the *risk-sensitive recursion* ({ref}`Exercise 3 ` asks you to verify this step) ```{math} :label: bhs_risk_sensitive @@ -1563,9 +1563,9 @@ The idea is to compare two situations with identical risky consumption for all d Specifically, we seek $c_0^{II}(u)$ that makes a type II agent indifferent between: 1. Facing the stochastic plan under $\theta < \infty$ (fear of model misspecification), consuming $c_0$ at date zero. -2. Facing the **same** stochastic plan under $\theta = \infty$ (no fear of misspecification), but consuming only $c_0^{II}(u) < c_0$ at date zero. +2. Facing the *same* stochastic plan under $\theta = \infty$ (no fear of misspecification), but consuming only $c_0^{II}(u) < c_0$ at date zero. -In both cases, continuation consumptions $c_t$ for $t \geq 1$ are generated by the random walk starting from the **same** $c_0$. +In both cases, continuation consumptions $c_t$ for $t \geq 1$ are generated by the random walk starting from the *same* $c_0$. For the type II agent under $\theta < \infty$, the total value is $W(c_0)$ from {eq}`bhs_W_rw`. 
@@ -1623,7 +1623,7 @@ An analogous calculation for a **type III** agent, using $J(c_0)$ from {eq}`bhs_ c_0 - c_0^{III}(u) = \frac{\beta\sigma_\varepsilon^2}{(1-\beta)^3\theta} = \frac{\beta\sigma_\varepsilon^2(\gamma - 1)}{(1-\beta)^2}, ``` -which is $\frac{1}{1-\beta}$ times the type III uncertainty compensation and **twice** the type II compensation {eq}`bhs_comp_type2u`, again reflecting the absence of the entropy rebate in $J$. +which is $\frac{1}{1-\beta}$ times the type III uncertainty compensation and *twice* the type II compensation {eq}`bhs_comp_type2u`, again reflecting the absence of the entropy rebate in $J$. ### Summary of welfare compensations (random walk) @@ -2069,7 +2069,7 @@ Robustness concerns persist despite long histories precisely because the low-fre ## Concluding remarks -The title of this lecture poses a question: are large risk premia prices of **variability** (atemporal risk aversion) or prices of **doubts** (model uncertainty)? +The title of this lecture poses a question: are large risk premia prices of *variability* (atemporal risk aversion) or prices of *doubts* (model uncertainty)? Asset-pricing data alone cannot settle the question, because the two interpretations are observationally equivalent. From 95bd4c74e2c21806b05563b1d32692b47a2f795d Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Fri, 13 Feb 2026 11:17:57 +0800 Subject: [PATCH 32/37] Tom's Feb 13 edits of bhs lecture --- lectures/doubts_or_variability.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 6a8d03168..e34215eb9 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -31,29 +31,29 @@ kernelspec: > associated with occupations with high earnings risk, or in the revenues raised by > state-operated lotteries. It > would be good to have the equity premium resolved, but I think we need to look beyond high -> estimates of risk aversion to do it.* -- Robert Lucas Jr., January 10, 2003 +> estimates of risk aversion to do it.* -- Robert E. Lucas Jr., {cite}`Lucas_2003` ## Overview {cite:t}`Tall2000` showed that a recursive preference specification could match the equity premium and the risk-free rate puzzle simultaneously. -But matching required setting the risk-aversion coefficient $\gamma$ to around 50 for a random-walk consumption model and around 75 for a trend-stationary model, exactly the range that provoked Lucas's skepticism. +But matching required setting the risk-aversion coefficient $\gamma$ to around 50 for a random-walk consumption model and around 75 for a trend-stationary model, exactly the range that provoked the skepticism in the above quote from {cite:t}`Lucas_2003`. {cite:t}`BHS_2009` ask whether those large $\gamma$ values really measure aversion to atemporal risk, or whether they instead measure the agent's doubts about the underlying probability model. Their answer, and the theme of this lecture, is that much of what looks like "risk aversion" can be reinterpreted as **model uncertainty**. -The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a max--min recursion in which the agent fears that the probability model governing consumption growth may be wrong. +The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a another recursion that expresses an agent's concern that the probability model governing consumption growth may be wrong. 
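
A one-step numerical sketch of that observational equivalence may help. With a continuation value that is linear in a standard normal shock, the risk-sensitive $\log E \exp$ adjustment and the multiplier problem's minimization over Gaussian mean shifts return the same number; the values of $a$, $b$, and $\theta$ below are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative continuation value W(x') = a + b ε with ε ~ N(0, 1)
a, b, θ = 1.0, 0.3, 2.0

# Risk-sensitive evaluation: -θ log E[exp(-W/θ)], computed by Monte Carlo
rng = np.random.default_rng(0)
ε = rng.normal(size=1_000_000)
risk_sensitive = -θ * np.log(np.mean(np.exp(-(a + b * ε) / θ)))

# Multiplier evaluation: min over mean shifts w of E_w[W] + θ w²/2,
# where E_w[W] = a + b w under the distorted model ε ~ N(w, 1)
res = minimize_scalar(lambda w: a + b * w + θ * w**2 / 2)

print(risk_sensitive, res.fun)     # both ≈ a - b²/(2θ)
print(res.x, -b / θ)               # minimizing mean shift equals -b/θ
```
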
-Under this reading, the parameter that looked like extreme risk aversion instead measures concern about *misspecification*. +Under this reading, the parameter that indicates extreme risk aversion in one interpretation of the recursion instead indicates concerns about *misspecification* in another interpretation of the same recursion. -They show that modest amounts of model uncertainty can substitute for large amounts of risk aversion in terms of choices and effects on asset prices. +{cite:t}`BHS_2009` show that modest amounts of model uncertainty can substitute for large amounts of risk aversion in terms of choices and effects on asset prices. This reinterpretation changes the welfare question that asset prices answer. -Do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or the benefits from reducing doubts about the underlying model? +Do large risk premia measure the benefits from reducing well-understood aggregate fluctuations, or do they measure benefits from reducing doubts about the model describing consumption growth? -We begin with the Hansen--Jagannathan bound, then specify the statistical environment, lay out four related preference specifications and the connections among them, and finally revisit Tallarini's calibration through the lens of detection-error probabilities. +We begin with a {cite:t}`Hansen_Jagannathan_1991` bound, then specify the statistical environment, lay out four related preference specifications and the connections among them, and finally revisit Tallarini's calibration through the lens of detection-error probabilities. Along the way, we draw on ideas and techniques from @@ -213,7 +213,7 @@ def hj_std_bound(E_m): return np.sqrt(np.maximum(var_lb, 0.0)) ``` -### The puzzle +### Two puzzles Reconciling formula {eq}`bhs_crra_sdf` with the market price of risk extracted from data on asset returns (like those in Table 1 below) requires a value of $\gamma$ so high that it provokes skepticism. 
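
A back-of-envelope calculation conveys the size of the problem. Under the lognormal random-walk specification, the CRRA market price of risk is $\sqrt{e^{\gamma^2\sigma_c^2}-1}\approx\gamma\sigma_c$, so matching a quarterly Sharpe ratio of roughly $0.25$ with $\sigma_c$ near $0.005$ calls for $\gamma$ on the order of $50$; the numbers below are illustrative rather than the lecture's exact calibration.

```python
import numpy as np

σ_c = 0.005            # illustrative quarterly consumption-growth volatility
target_sharpe = 0.25   # illustrative quarterly market price of risk

# For m = β (C_{t+1}/C_t)^(-γ) with lognormal i.i.d. growth,
# σ(m)/E(m) = sqrt(exp(γ² σ_c²) - 1) ≈ γ σ_c
for γ in (1.0, 10.0, 50.0, target_sharpe / σ_c):
    mpr = np.sqrt(np.exp(γ**2 * σ_c**2) - 1.0)
    print(f"γ = {γ:5.1f}:  σ(m)/E(m) ≈ {mpr:.3f}")
```
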
From 790a771ea9ca2bc8da3782e5de9079aa6f446bab Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Fri, 13 Feb 2026 19:19:24 +1100 Subject: [PATCH 33/37] Tom's Feb 13 second wave of edits of bhs lecture --- lectures/_static/quant-econ.bib | 21 +++++++++++++++++++++ lectures/doubts_or_variability.md | 27 +++++++++++++++++---------- 2 files changed, 38 insertions(+), 10 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index 5a2e26ca0..756d85a8b 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -1303,6 +1303,27 @@ @incollection{Bewley86 address = {Amsterdam}, pages = {27-102} } +@article{jacobson1973optimal, + title={Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games}, + author={Jacobson, David}, + journal={IEEE Transactions on Automatic control}, + volume={18}, + number={2}, + pages={124--131}, + year={1973}, + publisher={IEEE} +} + +@article{hansen1995discounted, + title={Discounted linear exponential quadratic gaussian control}, + author={Hansen, Lars Peter and Sargent, Thomas J}, + journal={IEEE Transactions on Automatic control}, + volume={40}, + number={5}, + pages={968--971}, + year={1995}, + publisher={IEEE} +} @article{Tall2000, author = {Tallarini, Thomas D}, diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index e34215eb9..f7402d7fc 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -45,7 +45,7 @@ Their answer, and the theme of this lecture, is that much of what looks like "ri The same recursion that defines Tallarini's risk-sensitive agent is observationally equivalent to a another recursion that expresses an agent's concern that the probability model governing consumption growth may be wrong. -Under this reading, the parameter that indicates extreme risk aversion in one interpretation of the recursion instead indicates concerns about *misspecification* in another interpretation of the same recursion. +Under this reading, a parameter value that indicates extreme risk aversion in one interpretation of the recursion indicates concerns about *misspecification* in another interpretation of the same recursion. {cite:t}`BHS_2009` show that modest amounts of model uncertainty can substitute for large amounts of risk aversion in terms of choices and effects on asset prices. @@ -191,7 +191,7 @@ In words, no asset's Sharpe ratio can exceed the market price of risk. The bound {eq}`bhs_hj_bound` is stated in conditional terms. -There is also an unconditional counterpart that works with a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$: +There is an unconditional counterpart that involves a vector of $n$ gross returns $R_{t+1}$ (e.g., equity and risk-free) with unconditional mean $E(R)$ and covariance matrix $\Sigma_R$: ```{math} :label: bhs_hj_unconditional @@ -219,7 +219,7 @@ Reconciling formula {eq}`bhs_crra_sdf` with the market price of risk extracted f This is the **equity premium puzzle**. -But the puzzle has a second dimension. +But high values of $\gamma$ bring another difficulty. High values of $\gamma$ that deliver enough volatility $\sigma(m)$ also push $E(m)$, the reciprocal of the gross risk-free rate, too far down, away from the Hansen--Jagannathan bound. @@ -229,9 +229,11 @@ This is the **risk-free rate puzzle** of {cite:t}`Weil_1989`. 
The figure below reproduces Tallarini's key diagnostic. -We show it before developing the underlying theory because it motivates much of what follows. +Because it motivates much of what follow, we show Tallarini's figure before developing the underlying theory. + + +Closed-form expressions for the Epstein--Zin SDF moments used in the plot are derived in {ref}`Exercise 2 `. -The closed-form expressions for the Epstein--Zin SDF moments used in the plot are derived in {ref}`Exercise 2 `. The code below implements them alongside the corresponding CRRA moments. @@ -265,7 +267,7 @@ def moments_crra_rw(γ): return E_m, mpr ``` -For each value of $\gamma \in \{1, 5, 10, \ldots, 51\}$, we plot the implied $(E(m),\sigma(m))$ pair for three specifications. +For each value of $\gamma \in \{1, 5, 10, \ldots, 51\}$, we plot the implied $(E(m),\sigma(m))$ pair for three combinations of specifications of preferences and consumption growth processes. These are time-separable CRRA (crosses), Epstein--Zin preferences with random-walk consumption (circles), and Epstein--Zin preferences with trend-stationary consumption (pluses). @@ -332,7 +334,7 @@ Instead, they reflect the agent's doubts about the probability model itself. ## The choice setting -To make this reinterpretation precise, we first need to formalize the environment. +To understand their reinterpretation, we first need to describe their statistical models of consumption growth. ### Shocks and consumption plans @@ -432,6 +434,11 @@ print(f"std[r_e-r_f]={r_excess_std:.4f}") ### Overview of agents I, II, III, and IV We compare four preference specifications over consumption plans $C^\infty \in \mathcal{C}$. +```{note} +For origins of the names **multipler** and **constraint** preferences, see {cite:t}`HansenSargent2001`. +The risk-sensitive preference specification used here comes from {cite:t}`hansen1995discounted`, which adjusts specifications used earlier by +{cite:t}`jacobson1973optimal`, {cite:t}`Whittle_1981`, and {cite:t}`Whittle_1990` to accommodate discounting in a way that preserves time-invariant optimal decision rules. +``` *Type I agent (Kreps--Porteus--Epstein--Zin--Tallarini)* with - a discount factor $\beta \in (0,1)$; @@ -466,7 +473,7 @@ Types I and II turn out to be observationally equivalent in a strong sense, havi Types III and IV are equivalent in a weaker but still useful sense, delivering the same worst-case pricing implications as a type II agent for a given endowment process. -We now formalize each agent type and develop the equivalences among them. +We now formalize each agent type and describe relationships among them. For each type, we derive a Bellman equation that pins down the agent's value function and stochastic discount factor. @@ -479,11 +486,11 @@ $$ where $\hat g_{t+1}$ is a likelihood-ratio distortion that we will define in each case. -Along the way, we introduce the likelihood-ratio distortion that enters the stochastic discount factor and develop the detection-error probability that will serve as our new calibration device. +Along the way, we introduce the likelihood-ratio distortion that enters the stochastic discount factor and describe detection-error probabilities that will serve as our new calibration tool. 
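
As a preview of that calibration tool, the detection-error probability used later has a simple closed form in the Gaussian mean-shift case, $p = \Phi\bigl(-\tfrac{1}{2}|w|\sqrt{T}\bigr)$, so a few lines suffice to see how detectability varies with the size of the distortion; the sample length of roughly 235 quarters matches the sample discussed later, while the mean shifts are illustrative.

```python
import numpy as np
from scipy.stats import norm

T = 235                       # sample length in quarters (roughly the lecture's sample)

for w in (0.05, 0.10, 0.20):  # candidate worst-case mean shifts (illustrative)
    p = norm.cdf(-0.5 * abs(w) * np.sqrt(T))
    print(f"|w| = {w:.2f}  ->  detection-error probability ≈ {p:.3f}")
```
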
### Type I: Kreps--Porteus--Epstein--Zin--Tallarini preferences -The general Epstein--Zin--Weil specification combines current consumption with a certainty equivalent of future utility through a CES aggregator: +The Epstein--Zin--Weil specification combines current consumption with a certainty equivalent of future utility through a CES aggregator: ```{math} :label: bhs_ez_general From 7aa1fd8a7ea178535a0bc9ed00fa20e9d030fb18 Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Sat, 14 Feb 2026 12:33:55 +0800 Subject: [PATCH 34/37] Tom's Feb 14 valentine day edit of bhs lecture --- lectures/doubts_or_variability.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index f7402d7fc..878daa096 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -1768,7 +1768,7 @@ The models are statistically close to the baseline, with detection-error probabi The consumer's caution against such alternatives accounts for the large certainty-equivalent gap in the left panel. -## How large are the welfare gains from resolving model uncertainty? +## Welfare gains from removing model uncertainty A type III (constraint-preference) agent evaluates the worst model inside an entropy ball of radius $\eta$. @@ -1894,13 +1894,15 @@ plt.tight_layout() plt.show() ``` -## Why doesn't learning eliminate these fears? +## Learning doesn't eliminate misspecification fears -A natural objection arises: if the consumer has 235 quarters of data, why can't she learn the true drift well enough to dismiss the worst-case model? +A reasonable question arises: if the consumer has 235 quarters of data, can't she learn enough to dismiss the worst-case model? -The answer is that the drift is a low-frequency feature of the data, and low-frequency features are hard to pin down. +The answer is no. -Estimating the mean of a random walk to the precision needed to reject small but economically meaningful shifts requires far more data than estimating volatility. +This is because the drift is a low-frequency feature that is very hard to pin down. + +Estimating the mean of a random walk to the precision needed to reject small but economically meaningful shifts requires far more data than estimating volatility precisely does. The following figure makes this point concrete. 
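
A rough calculation along the lines of that figure: after $T$ quarters the standard error of the estimated drift is $\sigma_\varepsilon/\sqrt{T}$, while the worst-case drift shift at detection-error probability $p$ is $\sigma_\varepsilon w$ with $w = 2\Phi^{-1}(p)/\sqrt{T}$, so the shift sits inside a two-standard-error band; the parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import norm

σ_ε, T, p = 0.005, 235, 0.20        # illustrative values

se_drift = σ_ε / np.sqrt(T)         # standard error of the estimated drift
w = 2.0 * norm.ppf(p) / np.sqrt(T)  # mean shift at detection-error probability p
drift_shift = σ_ε * w               # worst-case change in the drift per quarter

print(f"s.e. of estimated drift : {se_drift:.2e}")
print(f"worst-case drift shift  : {drift_shift:.2e}")
print(f"|shift| / (2 s.e.)      : {abs(drift_shift) / (2 * se_drift):.2f}")
```
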
From 78d461a3e8e8177c0593eeaa7a6e4ce79fa79eab Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 18 Feb 2026 10:38:43 +1100 Subject: [PATCH 35/37] updates --- lectures/_static/quant-econ.bib | 23 ----------------------- 1 file changed, 23 deletions(-) diff --git a/lectures/_static/quant-econ.bib b/lectures/_static/quant-econ.bib index a50e65843..756d85a8b 100644 --- a/lectures/_static/quant-econ.bib +++ b/lectures/_static/quant-econ.bib @@ -34,29 +34,6 @@ @incollection{frisch33 year = {1933} } -@incollection{slutsky:1927, - address = {Moscow}, - author = {Slutsky, Eugen}, - booktitle = {Problems of Economic Conditions}, - date-added = {2021-02-16 14:44:03 -0600}, - date-modified = {2021-02-16 14:44:03 -0600}, - publisher = {The Conjuncture Institute}, - title = {The Summation of Random Causes as the Source of Cyclic Processes}, - volume = {3}, - year = {1927} -} - -@incollection{frisch33, - author = {Ragar Frisch}, - booktitle = {Economic Essays in Honour of Gustav Cassel}, - date-added = {2015-01-09 21:08:15 +0000}, - date-modified = {2015-01-09 21:08:15 +0000}, - pages = {171-205}, - publisher = {Allen and Unwin}, - title = {Propagation Problems and Impulse Problems in Dynamic Economics}, - year = {1933} -} - @article{harsanyi1968games, title={Games with Incomplete Information Played by ``{B}ayesian'' Players, {I}--{III} Part {II}. {B}ayesian Equilibrium Points}, author={Harsanyi, John C.}, From f28628df561849bb635bf113fbde9b5edb03bd7d Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 18 Feb 2026 11:10:02 +1100 Subject: [PATCH 36/37] updates --- lectures/doubts_or_variability.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index 878daa096..bb127467d 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -26,12 +26,12 @@ kernelspec: :depth: 2 ``` -> *No one has found risk aversion parameters of 50 or 100 in the diversification of +> No one has found risk aversion parameters of 50 or 100 in the diversification of > individual portfolios, in the level of insurance deductibles, in the wage premiums > associated with occupations with high earnings risk, or in the revenues raised by > state-operated lotteries. It > would be good to have the equity premium resolved, but I think we need to look beyond high -> estimates of risk aversion to do it.* -- Robert E. Lucas Jr., {cite}`Lucas_2003` +> estimates of risk aversion to do it. -- Robert E. Lucas Jr., {cite}`Lucas_2003` ## Overview @@ -2092,8 +2092,10 @@ Three features of the analysis support the robustness reading: 1. Detection-error probabilities provide a more stable calibration language than $\gamma$. -The two consumption models that required very different $\gamma$ values to match the data yield nearly identical pricing implications when indexed by detectability. + - The two consumption models that required very different $\gamma$ values to match the data yield nearly identical pricing implications when indexed by detectability. + 2. The welfare gains implied by asset prices decompose overwhelmingly into a model-uncertainty component, with the pure risk component remaining small, consistent with Lucas's original finding. + 3. The drift distortions that drive pricing are small enough to hide inside standard-error bands, so finite-sample learning cannot eliminate the consumer's fears. 
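
The arithmetic behind the first point above is compact enough to sketch. Fixing a detection-error probability pins down the worst-case mean shift $w$, and hence the market price of model uncertainty, for both consumption models at once, while the implied $\theta$ and $\gamma$ differ sharply across models; the parameter values below are illustrative.

```python
import numpy as np
from scipy.stats import norm

β, ρ, σ_ε, T, p = 0.995, 0.98, 0.005, 235, 0.10   # illustrative values

w = 2.0 * norm.ppf(p) / np.sqrt(T)   # worst-case mean shift at detection prob p
print(f"market price of model uncertainty ≈ |w| = {abs(w):.3f} (both models)")

# Invert w(θ): the random walk uses (1-β), the trend-stationary model (1-βρ)
θ_rw = -σ_ε / ((1 - β) * w)
θ_ts = -σ_ε / ((1 - β * ρ) * w)

# Map back to risk-aversion coefficients via γ = 1 + 1/[(1-β)θ]
γ_rw = 1 + 1 / ((1 - β) * θ_rw)
γ_ts = 1 + 1 / ((1 - β) * θ_ts)
print(f"θ: {θ_rw:.2f} (rw) vs {θ_ts:.2f} (ts)")
print(f"γ: {γ_rw:.1f} (rw) vs {γ_ts:.1f} (ts)")
```
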
Whether one ultimately prefers the risk or the uncertainty interpretation, the framework clarifies that the question is not about the size of risk premia but about the economic object those premia measure. From f330136e1cd96b288c5db50a0e94eff12115ae63 Mon Sep 17 00:00:00 2001 From: Humphrey Yang Date: Wed, 18 Feb 2026 11:14:43 +1100 Subject: [PATCH 37/37] updates --- lectures/doubts_or_variability.md | 50 ++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 17 deletions(-) diff --git a/lectures/doubts_or_variability.md b/lectures/doubts_or_variability.md index bb127467d..e5edbae28 100644 --- a/lectures/doubts_or_variability.md +++ b/lectures/doubts_or_variability.md @@ -281,13 +281,16 @@ mystnb: γ_grid = np.arange(1, 55, 5) Em_rw = np.array([moments_type1_rw(γ)[0] for γ in γ_grid]) -σ_m_rw = np.array([moments_type1_rw(γ)[0] * moments_type1_rw(γ)[1] for γ in γ_grid]) +σ_m_rw = np.array( + [moments_type1_rw(γ)[0] * moments_type1_rw(γ)[1] for γ in γ_grid]) Em_ts = np.array([moments_type1_ts(γ)[0] for γ in γ_grid]) -σ_m_ts = np.array([moments_type1_ts(γ)[0] * moments_type1_ts(γ)[1] for γ in γ_grid]) +σ_m_ts = np.array( + [moments_type1_ts(γ)[0] * moments_type1_ts(γ)[1] for γ in γ_grid]) Em_crra = np.array([moments_crra_rw(γ)[0] for γ in γ_grid]) -σ_m_crra = np.array([moments_crra_rw(γ)[0] * moments_crra_rw(γ)[1] for γ in γ_grid]) +σ_m_crra = np.array( + [moments_crra_rw(γ)[0] * moments_crra_rw(γ)[1] for γ in γ_grid]) Em_grid = np.linspace(0.8, 1.01, 1000) HJ_std = np.array([hj_std_bound(x) for x in Em_grid]) @@ -1719,20 +1722,28 @@ fig, axes = plt.subplots(1, 2, figsize=(12, 4)) # Panel A ax = axes[0] -ax.fill_between(t, mean_base - std_base, mean_base + std_base, alpha=0.25, color="tab:blue") -ax.plot(t, ce_risk_unc, lw=2, ls="--", color="black", label="certainty equivalent: risk + uncertainty") -ax.plot(t, ce_risk, lw=2, color="tab:orange", label="certainty equivalent: risk only") -ax.plot(t, mean_base, lw=2, color="tab:blue", label="approximating-model mean") +ax.fill_between(t, mean_base - std_base, mean_base + std_base, + alpha=0.25, color="tab:blue") +ax.plot(t, ce_risk_unc, lw=2, ls="--", color="black", + label="certainty equivalent: risk + uncertainty") +ax.plot(t, ce_risk, lw=2, color="tab:orange", + label="certainty equivalent: risk only") +ax.plot(t, mean_base, lw=2, + color="tab:blue", label="approximating-model mean") ax.set_xlabel("quarters") ax.set_ylabel("log consumption") ax.legend(frameon=False, fontsize=8, loc="upper left") # Panel B ax = axes[1] -ax.fill_between(t, mean_base - std_base, mean_base + std_base, alpha=0.20, color="tab:blue") -ax.fill_between(t, mean_low - std_base, mean_low + std_base, alpha=0.20, color="tab:red") -ax.fill_between(t, mean_high - std_base, mean_high + std_base, alpha=0.20, color="tab:green") -ax.plot(t, ce_risk_unc, lw=2, ls="--", color="black", label="certainty equivalent: risk + uncertainty") +ax.fill_between(t, mean_base - std_base, mean_base + std_base, + alpha=0.20, color="tab:blue") +ax.fill_between(t, mean_low - std_base, mean_low + std_base, + alpha=0.20, color="tab:red") +ax.fill_between(t, mean_high - std_base, mean_high + std_base, + alpha=0.20, color="tab:green") +ax.plot(t, ce_risk_unc, lw=2, ls="--", color="black", + label="certainty equivalent: risk + uncertainty") ax.plot(t, mean_base, lw=2, color="tab:blue", label="approximating-model mean") ax.plot(t, mean_low, lw=2, color="tab:red", label="worst-case-leaning mean") ax.plot(t, mean_high, lw=2, color="tab:green", label="best-case-leaning mean") @@ -1806,7 +1817,7 @@ 
gain_ts = np.where( gain_rw_pct = 100.0 * (np.exp(gain_rw) - 1.0) gain_ts_pct = 100.0 * (np.exp(gain_ts) - 1.0) -# Detection error probabilities implied by η (common across RW/TS for the Gaussian mean-shift case) +# Detection error probabilities implied by η p_eta_pct = 100.0 * norm.cdf(-0.5 * w_abs_grid * np.sqrt(T)) order = np.argsort(p_eta_pct) p_plot = p_eta_pct[order] @@ -1951,7 +1962,7 @@ nom_sv = _read_fred_series("PCESV", start_date, end_date) # quarterly, 194 defl = _read_fred_series("DPCERD3Q086SBEA", start_date, end_date) # quarterly, 1947– pop_m = _read_fred_series("CNP16OV", start_date, end_date) # monthly, 1948– -# Step 1: add nominal nondurables + services (nominal $ are additive) +# Step 1: add nominal nondurables + services nom_total = nom_nd + nom_sv # Step 2: deflate by PCE implicit price deflator (index 2017=100) @@ -1965,11 +1976,14 @@ real_pc = (real_total / pop_q).dropna() real_pc = real_pc.loc["1948-01-01":"2006-12-31"].dropna() if real_pc.empty: - raise RuntimeError("FRED returned no usable observations after alignment/filtering") + raise RuntimeError( + "FRED returned no usable observations after alignment/filtering") # Step 4: log consumption log_c_data = np.log(real_pc.to_numpy(dtype=float).reshape(-1)) -years_data = (real_pc.index.year + (real_pc.index.month - 1) / 12.0).to_numpy(dtype=float) +years_data = ( + real_pc.index.year + + (real_pc.index.month - 1) / 12.0).to_numpy(dtype=float) print(f"Fetched {len(log_c_data)} quarterly observations from FRED") print(f"Sample: {years_data[0]:.1f} – {years_data[-1] + 0.25:.1f}") @@ -2029,7 +2043,8 @@ fig, axes = plt.subplots(1, 2, figsize=(12, 4)) ax = axes[0] ax.plot(years, c, lw=2, color="tab:blue", label="log consumption") -ax.plot(years, line_approx, lw=2, ls="--", color="black", label="approximating model") +ax.plot(years, line_approx, lw=2, ls="--", + color="black", label="approximating model") ax.plot( years, line_worst, @@ -2051,7 +2066,8 @@ ax.plot( label=r"$\mu + \sigma_\varepsilon w(\theta)$", ) ax.axhline(1_000.0 * rw_fig6["μ"], lw=2, color="black", label=r"$\hat\mu$") -ax.axhline(1_000.0 * upper_band, lw=2, ls="--", color="gray", label=r"$\hat\mu \pm 2\hat s.e.$") +ax.axhline(1_000.0 * upper_band, lw=2, ls="--", + color="gray", label=r"$\hat\mu \pm 2\hat s.e.$") ax.axhline(1_000.0 * lower_band, lw=2, ls="--", color="gray") ax.set_xlabel("detection error probability (percent)") ax.set_ylabel(r"mean consumption growth ($\times 10^{-3}$)")