
Pedagogy: add 15_optimization/038 — optimization on a histogram surrogate (classification example) #425

@kwlee2025cpp

Description

Summary

Build 15_optimization/038_optimization_on_histogram_surrogate.ipynb as a companion to the existing 030_Classification_Optimization.ipynb. The new notebook teaches a general numerical-methods pattern: when iterative parameter optimization is expensive over $N$ raw data points, replace the per-point sum in the cost function with a per-cell sum over a multi-dimensional histogram of the feature space, optimize on the surrogate, then verify the final result on the full dataset.
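
As a one-glance contrast between the two cost evaluations, here is a minimal sketch (the function and argument names are illustrative placeholders, not the notebook's final API):

```python
import numpy as np

def cost_pointwise(theta, X, y, point_loss):
    # O(N) per evaluation: every raw data point enters the sum, so a
    # multi-start optimizer pays this full scan at every iteration.
    return point_loss(theta, X, y).mean()

def cost_surrogate(theta, centroids, counts, labels, point_loss):
    # O(C) per evaluation, C = number of occupied histogram cells: each
    # cell stands in for `counts` identical points at its centroid, so
    # the per-point sum collapses to a count-weighted per-cell sum.
    w = counts / counts.sum()
    return np.sum(w * point_loss(theta, centroids, labels))
```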

The pattern was used in Ch. 6 of:

Lee, K. (2004). Longitudinal Driver Model and Collision Warning and Avoidance Algorithms Based on Human Driving Databases. Ph.D. dissertation, Mechanical Engineering, University of Michigan.

There, two 5D histograms reduced 3.4 M data points (~400 MB) to ~1.9 MB while still approximating the confusion matrix that fmincon evaluated from 1000 random initial conditions. The same pattern transplants cleanly to a 2D classification example for teaching.

Pedagogical arc

  • Motivate the problem: iterative optimization (multi-start, hyperparameter sweeps) makes the per-point cost-function scan the bottleneck
  • Build per-class histograms from a 2D training set (two overlapping Gaussians, ~$10^6$ points); visualize side-by-side heatmaps analogous to the dissertation's Fig. 6.2 "Threatening" / "Safe" panels
  • Surrogate confusion matrix: $\widehat{TP}, \widehat{FP}, \widehat{TN}, \widehat{FN}$ as weighted sums over cell centroids (the algebra of Eqs. 38–39 in Ch. 6, restated for classification); both steps are sketched in the first code block after this list
  • Train a logistic regression model two ways, on raw data vs. on the surrogate, and overlay the decision boundaries (second sketch after this list)
  • Bias-vs-speed tradeoff via an ipywidgets slider on grid resolution $n \times n$, with a live Pareto plot
  • Verify-on-truth step: re-evaluate the surrogate-trained model point-by-point on the full dataset; tabulate the confusion matrix both ways
  • Train/evaluation split as an overfitting check (mirrors the split-half analysis in Ch. 6 §6.1)
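
A minimal sketch of the first two steps, assuming NumPy and placeholder choices (the Gaussian parameters, the 64×64 grid, and the linear decision rule stand in for whatever 038 finally uses):

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class = 500_000
# Two overlapping 2D Gaussians as a stand-in training set.
X0 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(n_per_class, 2))  # class 0, "Safe"
X1 = rng.normal(loc=[+1.0, 0.0], scale=1.0, size=(n_per_class, 2))  # class 1, "Threatening"

# Shared bin edges so both per-class histograms live on the same grid.
n_bins = 64
lo = np.minimum(X0.min(axis=0), X1.min(axis=0))
hi = np.maximum(X0.max(axis=0), X1.max(axis=0))
edges = [np.linspace(lo[d], hi[d], n_bins + 1) for d in range(2)]
H0, _, _ = np.histogram2d(X0[:, 0], X0[:, 1], bins=edges)
H1, _, _ = np.histogram2d(X1[:, 0], X1[:, 1], bins=edges)

# Cell centroids (bin midpoints).
cx = 0.5 * (edges[0][:-1] + edges[0][1:])
cy = 0.5 * (edges[1][:-1] + edges[1][1:])
CX, CY = np.meshgrid(cx, cy, indexing="ij")

def surrogate_confusion(w, b):
    # Approximate confusion matrix for the linear rule w[0]*x + w[1]*y + b > 0:
    # every point in a cell is treated as sitting at the cell centroid,
    # so each entry is a count-weighted sum over cells.
    pred_pos = (w[0] * CX + w[1] * CY + b) > 0
    TP, FN = H1[pred_pos].sum(), H1[~pred_pos].sum()
    FP, TN = H0[pred_pos].sum(), H0[~pred_pos].sum()
    return TP, FP, TN, FN
```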
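
And a sketch of the surrogate-side training plus the verify-on-truth step, continuing from the arrays above (sklearn's `sample_weight` carries the cell counts; again illustrative, not the final notebook code):

```python
from sklearn.linear_model import LogisticRegression

# Flatten occupied cells into a weighted training set: one row per
# occupied cell per class, weighted by that cell's count.
occ0, occ1 = H0 > 0, H1 > 0
Xs = np.vstack([np.column_stack([CX[occ0], CY[occ0]]),
                np.column_stack([CX[occ1], CY[occ1]])])
ys = np.concatenate([np.zeros(occ0.sum()), np.ones(occ1.sum())])
ws = np.concatenate([H0[occ0], H1[occ1]])

clf = LogisticRegression()
clf.fit(Xs, ys, sample_weight=ws)  # weighted log-loss on cell centroids

# Verify on truth: score the surrogate-trained model point-by-point
# on the full raw dataset, not on the histogram.
X_all = np.vstack([X0, X1])
y_all = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])
acc_raw = clf.score(X_all, y_all)
```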

Scope decisions

  • Classifier: likely logistic regression, since weighted log-loss drops cleanly onto the surrogate; an SVM is more visual but heavier.
  • Grid: uniform for the first notebook. Log-spaced (per dimension) and quantile-spaced are mentioned as alternatives but not implemented in 038.
  • Cost function: choose one of (a) the geometric mean of TP rate and precision, mirroring Ch. 6 Eq. 40, or (b) sklearn-style weighted log-loss; signpost the other in the closing. Both are sketched below.
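
Both candidates written against the surrogate, reusing the hypothetical `surrogate_confusion` helper and the weighted cell arrays (`Xs`, `ys`, `ws`) from the sketches above; reading Eq. 40's "TP" as the TP rate is an assumption here:

```python
import numpy as np

def objective_geomean(w, b):
    # (a) geometric mean of TP rate and precision, to be maximized
    # (assumes Ch. 6 Eq. 40 uses rates, not raw counts).
    TP, FP, TN, FN = surrogate_confusion(w, b)
    return np.sqrt((TP / (TP + FN)) * (TP / (TP + FP)))

def objective_logloss(w, b, eps=1e-12):
    # (b) count-weighted log-loss over occupied cells, to be minimized.
    p = 1.0 / (1.0 + np.exp(-(Xs @ np.asarray(w) + b)))
    return -np.average(ys * np.log(p + eps) + (1 - ys) * np.log(1 - p + eps),
                       weights=ws)
```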

Curricular hooks

  • Bridges 20_probability/ histograms (per-class joint densities) and 15_optimization/ (optimization on a surrogate)
  • Direct follow-on to 030_Classification_Optimization.ipynb
  • The train-on-surrogate, verify-on-truth pattern is a general engineering numerical-methods lesson, reusable elsewhere

Out of scope (parked as future threads)

  • Adaptive $C$-groups (k-means coreset) — extends the rectangular grid to data-adaptive cells, the move that escapes the curse of dimensionality. Would live as a sibling notebook (~039) once 038 lands.
  • Soft-assignment / Gaussian-mixture surrogate
  • Boundary-aware adaptive refinement (cluster misclassified points after a first pass)
  • Sufficient-statistics-per-cell extension (Bradley, Fayyad, Reina 1998), the high-D successor to histograms

Metadata

Labels: pedagogy (Notebook content / teaching changes)