Local reproducibility snapshots for researchers using the Python package #320

@MaxGhenis

Context

At the 2026-04-21 meeting with Lars Vilhuber (transcript lines 65-75), the topic of local-copy and caching guidance for researchers using the Python package came up as a piece of reproducibility hygiene distinct from TRACE. Lars's framing: researchers running a simulation on their laptop should be able to preserve exactly what they ran (model + data + reform + environment), so that if they come back to it six months later, they can still reproduce their own work.

This is the non-TRACE version-identification workstream Casper spoke about separately (transcript lines 415-417): TRACE is for citations that a reader cannot rerun; local version identification is for the researchers themselves.

Policyengine-app#2832 implements the webapp-side version badge. This issue is the Python-package-side equivalent: help a researcher running policyengine locally keep a reproducible record of each run.

What to build

  1. A policyengine CLI command or helper that snapshots everything needed to reproduce a specific local run to a single directory:

    • Pinned package versions (pip freeze subset for pe.py + country + country-data)
    • The reform JSON (if any)
    • The h5 content hash (already in the release manifest)
    • The simulation output (results + optional per-household frame)
    • A short README documenting how to reproduce with the exact install line
  2. Documentation in household-api-docs showing researchers how to use the snapshot helper, kept distinct from the TRACE emission flow. The distinction matters because TRACE targets citation durability; local snapshots target "can I get back to my own work?"

  3. Default-on behavior for anyone using policyengine.calculate_household or policyengine.simulate via the Python API. A subdirectory under the working directory should be created automatically unless the user opts out. The cost of an extra megabyte of disk is worth the reproducibility gain.
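A minimal sketch of what such a snapshot helper could look like, to make the list above concrete. Everything named here is an assumption for illustration, not an existing policyengine API: the function names, the `POLICYENGINE_NO_SNAPSHOT` opt-out variable, the package list, and the file layout are all hypothetical. It also recomputes the h5 hash from a local file, whereas the real helper would presumably read it from the release manifest.

```python
import hashlib
import json
import os
from importlib import metadata
from pathlib import Path

# Hypothetical defaults: package names and the opt-out variable are assumptions.
PACKAGES = ("policyengine", "policyengine-core", "policyengine-us")
OPT_OUT_ENV = "POLICYENGINE_NO_SNAPSHOT"


def pinned_versions(packages=PACKAGES):
    """Return 'name==version' pins for whichever listed packages are installed."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment, so nothing to pin
    return pins


def file_sha256(path, chunk_size=1 << 20):
    """Hash a (possibly large) dataset file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_snapshot(directory, results, reform=None, dataset_path=None,
                   packages=PACKAGES):
    """Write one run's reproducibility record; return the directory or None."""
    if os.environ.get(OPT_OUT_ENV):
        return None  # user opted out of default-on snapshots
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    pins = pinned_versions(packages)
    (out / "requirements.txt").write_text("\n".join(pins) + "\n")
    if reform is not None:
        (out / "reform.json").write_text(
            json.dumps(reform, indent=2, sort_keys=True)
        )
    manifest = {
        "packages": pins,
        "dataset_sha256": file_sha256(dataset_path) if dataset_path else None,
    }
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
    (out / "results.json").write_text(json.dumps(results, indent=2))
    (out / "README.md").write_text(
        "To reproduce this run, install the pinned packages:\n\n"
        "    pip install -r requirements.txt\n\n"
        "Then re-run the simulation with reform.json (if present) and a\n"
        "dataset whose SHA-256 matches manifest.json.\n"
    )
    return out
```

Under these assumptions, a default-on hook in the Python API would call something like `write_snapshot("pe_snapshots/run-001", results, reform=reform_dict)` after each simulation, and setting the opt-out variable would skip it entirely.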

Non-goals

  • Not TRACE. Local snapshots are not signed, not institutionally attested, and not meant to serve as paper citations. They are a researcher-local cache.
  • Not preservation-grade storage. Researchers are responsible for their own backups.
