Modal pipeline broken: HF artifacts don't match #538's expected filenames #599

@juaristi22

Problem

PR #538 changes the filenames and adds new artifacts that download_calibration_inputs expects from HuggingFace, but the HF repo still has the old artifacts. Modal workers download from HF, fail to find the expected files, and crash.

What #538 expects vs what's on HF

| File expected by #538 | On HF? | Notes |
| --- | --- | --- |
| calibration/source_imputed_stratified_extended_cps.h5 | No | Only stratified_extended_cps.h5 exists (old name) |
| calibration/calibration_weights.npy | No | Only w_district_calibration.npy exists (old name) |
| calibration/stacked_blocks.npy | No | New artifact from unified calibration |
| calibration/geo_labels.json | No | New artifact from unified calibration |
| calibration/stacked_takeup.npz | No | New artifact from compute_stacked_takeup() |

The first two are required downloads — the pipeline cannot proceed without them. The last three are optional but needed for correct takeup values and geography assignment.
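
A pre-flight check along these lines could surface the mismatch before any worker starts. This is a minimal sketch, not code from #538: `check_artifacts` and `ArtifactReport` are hypothetical names, the required/optional split is taken from the table above, and the file listing is assumed to come from somewhere like `huggingface_hub.list_repo_files`. The exact paths of the old files on HF are also an assumption.

```python
from dataclasses import dataclass

# File names are taken from the table above; the required/optional split
# follows the issue text.
REQUIRED = (
    "calibration/source_imputed_stratified_extended_cps.h5",
    "calibration/calibration_weights.npy",
)
OPTIONAL = (
    "calibration/stacked_blocks.npy",
    "calibration/geo_labels.json",
    "calibration/stacked_takeup.npz",
)


@dataclass(frozen=True)
class ArtifactReport:
    """Hypothetical summary of which expected files are absent."""

    missing_required: tuple
    missing_optional: tuple

    @property
    def can_run(self):
        # The pipeline can only proceed if every required file is present.
        return not self.missing_required


def check_artifacts(repo_files):
    """Compare a repo file listing against the names #538 expects."""
    present = set(repo_files)
    return ArtifactReport(
        missing_required=tuple(f for f in REQUIRED if f not in present),
        missing_optional=tuple(f for f in OPTIONAL if f not in present),
    )


# Listing that mirrors the state described above: old names only.
# The "calibration/" prefix on the old files is an assumption.
current_hf = [
    "calibration/stratified_extended_cps.h5",
    "calibration/w_district_calibration.npy",
]
report = check_artifacts(current_hf)
print(report.can_run)
print(report.missing_required)
```

Failing fast on `missing_required` and only warning on `missing_optional` would match the distinction drawn above between files the pipeline cannot run without and files that affect correctness.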

Root cause

#538 changes the contract between the calibration step and the Modal H5-building pipeline (new filenames, new artifacts), but the HF repo was never updated to match. This is a chicken-and-egg problem: the new artifacts are produced by the updated calibration pipeline, which needs to run first and upload results to HF before Modal can consume them.

Additional concerns

  1. Modal image caching. Modal aggressively caches container images. If the image was built from an older branch, workers run old code expecting old filenames — creating mismatches in either direction. The branch parameter does git checkout + uv sync inside the container, but stale dependencies or cached volume data can cause silent failures.

  2. No artifact versioning. Weights, blocks, takeup, and dataset are all downloaded independently with no guarantee they came from the same calibration run. Updating one but not the others produces silent inconsistency.

  3. The skip_download workaround. #538 added a --skip-download flag that lets you pre-push inputs to the Modal volume, but this requires someone to manually produce and upload the artifacts first, which is the local-only workflow described in #538's comments.

Suggested fix

  1. Run the full calibration pipeline locally (or on a single Modal worker) to produce the new artifacts
  2. Upload them to HF under the new names (source_imputed_stratified_extended_cps.h5, calibration_weights.npy, stacked_blocks.npy, geo_labels.json, stacked_takeup.npz)
  3. Consider keeping the old filenames as symlinks/copies during a transition period, or pinning artifact sets with a version/manifest so stale combinations are detected rather than silently used
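
The manifest idea in step 3 could look roughly like the sketch below, which also addresses the versioning concern raised earlier. It is an illustration under stated assumptions, not an existing part of the pipeline: `build_manifest`, `verify_manifest`, the `run_id` field, and the `manifest.json` name are all hypothetical, and the demo writes placeholder bytes rather than real artifacts.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large .h5 files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory, names, run_id):
    """Record one hash per artifact, tagged with the calibration run that produced them."""
    directory = Path(directory)
    return {
        "run_id": run_id,
        "files": {name: sha256_of(directory / name) for name in names},
    }


def verify_manifest(directory, manifest):
    """Return a list of problems; an empty list means the set is consistent."""
    directory = Path(directory)
    problems = []
    for name, expected in manifest["files"].items():
        path = directory / name
        if not path.exists():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"hash mismatch: {name}")
    return problems


# Demo with placeholder bytes standing in for real artifacts.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    names = ["calibration_weights.npy", "stacked_blocks.npy"]
    for name in names:
        (tmp / name).write_bytes(b"run-1 " + name.encode())
    manifest = build_manifest(tmp, names, run_id="run-1")
    (tmp / "manifest.json").write_text(json.dumps(manifest, indent=2))
    ok_problems = verify_manifest(tmp, manifest)

    # Simulate updating one artifact from a later run without the others.
    (tmp / "calibration_weights.npy").write_bytes(b"run-2 weights")
    stale_problems = verify_manifest(tmp, manifest)

print(ok_problems)
print(stale_problems)
```

If the calibration step uploaded such a manifest alongside the artifacts, a worker could verify it after download and refuse to proceed on any mismatch, turning a silent inconsistency into an explicit failure.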

Related: #538, #592, #594
