Modal pipeline broken: HF artifacts don't match #538's expected filenames

## Problem

PR #538 renames the files that `download_calibration_inputs` expects from Hugging Face (HF) and adds new ones, but the HF repo still holds only the old artifacts. Modal workers download from HF, fail to find the expected files, and crash.

### What #538 expects vs what's on HF

| File expected by #538 | On HF? | Notes |
|---|---|---|
| `calibration/source_imputed_stratified_extended_cps.h5` | **No** | Only `stratified_extended_cps.h5` exists (old name) |
| `calibration/calibration_weights.npy` | **No** | Only `w_district_calibration.npy` exists (old name) |
| `calibration/stacked_blocks.npy` | **No** | New artifact from unified calibration |
| `calibration/geo_labels.json` | **No** | New artifact from unified calibration |
| `calibration/stacked_takeup.npz` | **No** | New artifact from `compute_stacked_takeup()` |

The first two are **required** downloads; the pipeline cannot proceed without them. The last three are optional downloads, but without them the takeup values and geography assignment will be wrong.
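A preflight check that compares the expected paths against the repo's file listing would catch this once, before any workers are dispatched. The sketch below is hypothetical: `check_calibration_inputs`, `REQUIRED`, and `OPTIONAL` are illustrative names whose contents mirror the table above, and the file listing is passed in (it could come from `huggingface_hub.list_repo_files(repo_id)`; the actual repo id is not stated in this issue).

```python
from typing import Iterable

# Hypothetical constants mirroring the table in this issue.
REQUIRED = (
    "calibration/source_imputed_stratified_extended_cps.h5",
    "calibration/calibration_weights.npy",
)
OPTIONAL = (
    "calibration/stacked_blocks.npy",
    "calibration/geo_labels.json",
    "calibration/stacked_takeup.npz",
)


def check_calibration_inputs(repo_files: Iterable[str]) -> dict[str, list[str]]:
    """Report which expected artifacts are absent from a repo file listing.

    `repo_files` is any iterable of repo-relative paths. The caller decides
    what to do: a non-empty "missing_required" should abort the run, while
    "missing_optional" means outputs would have wrong takeup/geography.
    """
    present = set(repo_files)
    return {
        "missing_required": [p for p in REQUIRED if p not in present],
        "missing_optional": [p for p in OPTIONAL if p not in present],
    }
```

Running this on the coordinator before fanning out to Modal workers turns N identical worker crashes into one clear error naming the missing files.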

### Root cause

#538 changes the contract between the calibration step and the Modal H5-building pipeline (new filenames, new artifacts), but the HF repo was never updated to match. This is a chicken-and-egg problem: the new artifacts are produced by the updated calibration pipeline, which needs to run first and upload results to HF before Modal can consume them.

### Additional concerns

1. **Modal image caching.** Modal aggressively caches container images. If the image was built from an older branch, workers run old code expecting old filenames — creating mismatches in either direction. The `branch` parameter does `git checkout` + `uv sync` inside the container, but stale dependencies or cached volume data can cause silent failures.

2. **No artifact versioning.** Weights, blocks, takeup, and dataset are all downloaded independently with no guarantee they came from the same calibration run. Updating one but not the others produces silent inconsistency.

3. **The `skip_download` workaround.** #538 added a `--skip-download` flag that lets you pre-push inputs to the Modal volume, but this requires someone to manually produce and upload the artifacts first — which is the local-only workflow described in #538's comments.
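For concern 2, one option is for the calibration run to write a manifest next to its outputs and for workers to verify every downloaded file against it. This is a minimal sketch, assuming a hypothetical `manifest.json` containing a `run_id` and a SHA-256 digest per file; no such manifest exists on HF today, and `write_manifest`/`verify_manifest` are illustrative names.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large .h5/.npy files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(run_id: str, root: Path, files: list[str]) -> Path:
    """Producer side: record which calibration run produced each artifact."""
    manifest = {
        "run_id": run_id,
        "files": {name: sha256_of(root / name) for name in files},
    }
    out = root / "manifest.json"
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out


def verify_manifest(root: Path) -> str:
    """Consumer side: raise if any listed file is missing or has a different hash.

    Returns the run_id so it can be logged alongside the H5 files a worker builds.
    """
    manifest = json.loads((root / "manifest.json").read_text())
    problems = []
    for name, expected in manifest["files"].items():
        path = root / name
        if not path.exists():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"hash mismatch (stale or mixed run?): {name}")
    if problems:
        raise RuntimeError(
            f"Artifacts do not match manifest for run {manifest['run_id']}: "
            + "; ".join(problems)
        )
    return manifest["run_id"]
```

Because the digests tie each file to one run, replacing the weights without re-uploading the blocks and takeup files (or the manifest) fails loudly instead of producing a silently inconsistent build.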

### Suggested fix

1. Run the full calibration pipeline locally (or on a single Modal worker) to produce the new artifacts
2. Upload them to HF under the new names (`source_imputed_stratified_extended_cps.h5`, `calibration_weights.npy`, `stacked_blocks.npy`, `geo_labels.json`, `stacked_takeup.npz`)
3. Consider keeping the old filenames as symlinks/copies during a transition period, or pinning artifact sets with a version/manifest so stale combinations are detected rather than silently used
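Steps 2 and 3 could be scripted so the new names and the transitional copies land on HF together. This is a sketch under several assumptions: the outputs of step 1 sit in a local `calibration/` directory, the old files lived under `calibration/` in the repo, and the old and new dataset/weights files are content-compatible (which should be confirmed before serving new files to pre-#538 code). `build_upload_plan`, `upload_plan`, and `repo_id` are placeholders. The upload uses `HfApi.create_commit` with `CommitOperationAdd` from `huggingface_hub` so that all artifacts arrive in a single commit.

```python
from pathlib import Path
from typing import Optional

# New names expected by #538 (from the table in this issue).
NEW_ARTIFACTS = [
    "calibration/source_imputed_stratified_extended_cps.h5",
    "calibration/calibration_weights.npy",
    "calibration/stacked_blocks.npy",
    "calibration/geo_labels.json",
    "calibration/stacked_takeup.npz",
]
# ASSUMPTION: transitional copies under the pre-#538 names and paths.
LEGACY_ALIASES = {
    "calibration/source_imputed_stratified_extended_cps.h5": "calibration/stratified_extended_cps.h5",
    "calibration/calibration_weights.npy": "calibration/w_district_calibration.npy",
}


def build_upload_plan(local_root: Path, keep_legacy: bool = True) -> list[tuple[Path, str]]:
    """Return (local file, path in repo) pairs; fail if a new artifact is missing locally."""
    plan = []
    for name in NEW_ARTIFACTS:
        local = local_root / name
        if not local.is_file():
            raise FileNotFoundError(f"calibration run did not produce {name}")
        plan.append((local, name))
        if keep_legacy and name in LEGACY_ALIASES:
            plan.append((local, LEGACY_ALIASES[name]))
    return plan


def upload_plan(
    plan: list[tuple[Path, str]],
    repo_id: str,
    message: str,
    repo_type: Optional[str] = None,  # e.g. "dataset" if the HF repo is a dataset repo
) -> None:
    """Push every file in one commit so the repo never holds a half-updated set."""
    from huggingface_hub import CommitOperationAdd, HfApi  # imported lazily

    ops = [
        CommitOperationAdd(path_in_repo=repo_path, path_or_fileobj=str(local))
        for local, repo_path in plan
    ]
    HfApi().create_commit(
        repo_id=repo_id, operations=ops, commit_message=message, repo_type=repo_type
    )
```

A single commit also gives each artifact set a natural version: pinning the download to that commit's revision keeps workers from mixing files from different runs.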

Related: #538, #592, #594
