From 3437fb668625b45841628a0fb31ec22aaa617655 Mon Sep 17 00:00:00 2001 From: Anthony Volk Date: Mon, 16 Mar 2026 19:41:13 +0100 Subject: [PATCH 1/3] docs: Add missing documentation for advanced outputs, economic impact analysis, and regions Closes #255 - Add economic-impact-analysis.md: full baseline-vs-reform workflow, ChangeAggregate vs pipeline - Add advanced-outputs.md: DecileImpact, IntraDecileImpact, Poverty, Inequality, OutputCollection - Add regions-and-scoping.md: Region, RegionRegistry, scoping strategies, geographic impacts - Expand core-concepts.md: data loading, ensure() vs run(), Dynamic, simulation_modifier - Flesh out dev.md: setup, testing, CI, architecture overview - Update index.md and myst.yml TOC to include all pages (+ orphaned visualisation.md) - Align PR docs CI with production (MyST instead of Jupyter Book) - Add examples/us_budgetary_impact.py Co-Authored-By: Claude Opus 4.6 --- .github/workflows/pr_docs_changes.yaml | 33 ++- changelog.d/255.added | 1 + docs/advanced-outputs.md | 276 ++++++++++++++++++++++++ docs/core-concepts.md | 112 +++++++++- docs/dev.md | 99 ++++++++- docs/economic-impact-analysis.md | 287 +++++++++++++++++++++++++ docs/index.md | 11 +- docs/myst.yml | 8 +- docs/regions-and-scoping.md | 251 +++++++++++++++++++++ examples/us_budgetary_impact.py | 151 +++++++++++++ 10 files changed, 1198 insertions(+), 31 deletions(-) create mode 100644 changelog.d/255.added create mode 100644 docs/advanced-outputs.md create mode 100644 docs/economic-impact-analysis.md create mode 100644 docs/regions-and-scoping.md create mode 100644 examples/us_budgetary_impact.py diff --git a/.github/workflows/pr_docs_changes.yaml b/.github/workflows/pr_docs_changes.yaml index 3ede653c..2ef9a20d 100644 --- a/.github/workflows/pr_docs_changes.yaml +++ b/.github/workflows/pr_docs_changes.yaml @@ -13,24 +13,15 @@ on: jobs: Test: - runs-on: ubuntu-latest - name: Test documentation builds - steps: - - name: Checkout repo - uses: actions/checkout@v4 - - 
name: Install uv - uses: astral-sh/setup-uv@v5 - - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: '3.13' - - - name: Install package - run: uv pip install -e .[dev] --system - - name: Install policyengine - run: uv pip install policyengine --system - - name: Install JB - run: uv pip install "jupyter-book>=2.0.0a0" --system - - name: Test documentation builds - run: cd docs && jupyter book build \ No newline at end of file + runs-on: ubuntu-latest + name: Test documentation builds + steps: + - name: Checkout repo + uses: actions/checkout@v4 + - uses: actions/setup-node@v4 + with: + node-version: 18.x + - name: Install MyST + run: npm install -g mystmd + - name: Test documentation builds + run: cd docs && myst build --html diff --git a/changelog.d/255.added b/changelog.d/255.added new file mode 100644 index 00000000..ad96b4fd --- /dev/null +++ b/changelog.d/255.added @@ -0,0 +1 @@ +Added documentation for economic impact analysis, advanced outputs (DecileImpact, Poverty, Inequality, IntraDecileImpact), regions and scoping strategies, simulation lifecycle (ensure vs run), Dynamic class, data loading, and simulation modifiers. Added US budgetary impact example script. Fixed PR docs CI to use MyST matching production. diff --git a/docs/advanced-outputs.md b/docs/advanced-outputs.md new file mode 100644 index 00000000..5fdbaead --- /dev/null +++ b/docs/advanced-outputs.md @@ -0,0 +1,276 @@ +# Advanced outputs + +Beyond `Aggregate` and `ChangeAggregate` (covered in [Core concepts](core-concepts.md)), the package provides specialised output types for distributional analysis, poverty measurement, and inequality metrics. + +All output types follow the same pattern: create an instance, call `.run()`, read the result fields. Convenience functions are provided for common use cases. 
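The create, run, read lifecycle described above can be shown without the package itself. The following is a minimal sketch, assuming a hypothetical `WeightedMean` class that is not part of `policyengine`. It uses a standard-library dataclass rather than Pydantic and only mirrors the shape of the pattern.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class WeightedMean:
    """Hypothetical output type (illustration only, not a policyengine class)."""

    values: Sequence[float]
    weights: Sequence[float]
    result: Optional[float] = None  # result field, populated by run()

    def run(self) -> None:
        # Compute the weighted mean and store it on the instance.
        total_weight = sum(self.weights)
        weighted_sum = sum(v * w for v, w in zip(self.values, self.weights))
        self.result = weighted_sum / total_weight


# 1. Create an instance, 2. call run(), 3. read the result field.
output = WeightedMean(values=[10.0, 20.0, 40.0], weights=[1.0, 1.0, 2.0])
output.run()
print(output.result)
```

The result stays `None` until `run()` is called, which is why the examples in this page always call `.run()` before reading any field.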
+ +## OutputCollection + +Many convenience functions return an `OutputCollection[T]`, a container holding both the individual output objects and a pandas DataFrame: + +```python +from policyengine.core import OutputCollection + +# Returned by calculate_decile_impacts(), calculate_us_poverty_rates(), etc. +collection = calculate_us_poverty_rates(simulation) + +# Access individual objects +for poverty in collection.outputs: + print(f"{poverty.poverty_type}: {poverty.rate:.4f}") + +# Access as DataFrame +print(collection.dataframe) +``` + +## DecileImpact + +Calculates the impact of a policy reform on a single income decile: baseline and reform mean income, absolute and relative change, and counts of people better off, worse off, and unchanged. + +### Using the convenience function + +```python +from policyengine.outputs.decile_impact import calculate_decile_impacts + +decile_impacts = calculate_decile_impacts( + dataset=dataset, + tax_benefit_model_version=us_latest, + baseline_policy=None, # Current law + reform_policy=reform, + income_variable="household_net_income", # Default for US +) + +for d in decile_impacts.outputs: + print(f"Decile {d.decile}: " + f"baseline={d.baseline_mean:,.0f}, " + f"reform={d.reform_mean:,.0f}, " + f"change={d.absolute_change:+,.0f} " + f"({d.relative_change:+.2f}%)") +``` + +### Using directly + +```python +from policyengine.outputs.decile_impact import DecileImpact + +impact = DecileImpact( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + income_variable="household_net_income", + decile=5, # 5th decile +) +impact.run() + +print(f"Count better off: {impact.count_better_off:,.0f}") +print(f"Count worse off: {impact.count_worse_off:,.0f}") +``` + +### Parameters + +| Parameter | Default | Description | +|---|---|---| +| `income_variable` | `equiv_hbai_household_net_income` | Income variable to group by and measure changes | +| `decile_variable` | `None` | Use a pre-computed grouping variable instead of `qcut` | +| 
`entity` | Auto-detected | Entity level for the income variable | +| `quantiles` | `10` | Number of quantile groups (10 = deciles, 5 = quintiles) | + +For US simulations, use `income_variable="household_net_income"`. The UK default (`equiv_hbai_household_net_income`) is the equivalised HBAI measure. + +## IntraDecileImpact + +Classifies people within each decile into five income change categories: + +| Category | Threshold | +|---|---| +| Lose more than 5% | change <= -5% | +| Lose less than 5% | -5% < change <= -0.1% | +| No change | -0.1% < change <= 0.1% | +| Gain less than 5% | 0.1% < change <= 5% | +| Gain more than 5% | change > 5% | + +Proportions are people-weighted (using `household_count_people * household_weight`). + +### Using the convenience function + +```python +from policyengine.outputs.intra_decile_impact import compute_intra_decile_impacts + +intra = compute_intra_decile_impacts( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + income_variable="household_net_income", +) + +for row in intra.outputs: + if row.decile == 0: + label = "Overall" + else: + label = f"Decile {row.decile}" + print(f"{label}: " + f"lose>5%={row.lose_more_than_5pct:.2%}, " + f"lose<5%={row.lose_less_than_5pct:.2%}, " + f"no change={row.no_change:.2%}, " + f"gain<5%={row.gain_less_than_5pct:.2%}, " + f"gain>5%={row.gain_more_than_5pct:.2%}") +``` + +The function returns deciles 1-10 plus an overall average at `decile=0`. + +## Poverty + +Calculates poverty headcount and rates for a single simulation, with optional demographic filtering. 
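To make the arithmetic concrete, here is a minimal sketch of a weighted poverty headcount and rate with an optional age filter. The records and the `poverty_stats` helper are hypothetical and written in plain Python. They are not the package's implementation, only an illustration of the quantities that `Poverty` reports.

```python
# Hypothetical person-level records: (age, in_poverty, weight).
people = [
    (5, True, 1000.0),
    (12, False, 1500.0),
    (34, True, 2000.0),
    (70, False, 500.0),
]


def poverty_stats(records, max_age=None):
    """Return (headcount, total_population, rate) for the filtered records."""
    # Apply the optional demographic filter first.
    selected = [r for r in records if max_age is None or r[0] <= max_age]
    # Headcount is the sum of weights of people flagged as poor.
    headcount = sum(weight for _, poor, weight in selected if poor)
    # The denominator is the weighted population after filtering.
    total_population = sum(weight for _, _, weight in selected)
    rate = headcount / total_population if total_population else 0.0
    return headcount, total_population, rate


overall = poverty_stats(people)
children = poverty_stats(people, max_age=17)
print(overall)
print(children)
```

Note that the filter changes the denominator as well as the numerator, so a filtered rate is the rate within that group rather than the group's share of overall poverty.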
+ +### Poverty types + +**UK** (4 measures): +- Absolute before housing costs (BHC) +- Absolute after housing costs (AHC) +- Relative before housing costs (BHC) +- Relative after housing costs (AHC) + +**US** (2 measures): +- SPM poverty +- Deep SPM poverty (below 50% of SPM threshold) + +### Calculating all poverty rates + +```python +from policyengine.outputs.poverty import ( + calculate_uk_poverty_rates, + calculate_us_poverty_rates, +) + +# US +us_poverty = calculate_us_poverty_rates(simulation) +for p in us_poverty.outputs: + print(f"{p.poverty_type}: headcount={p.headcount:,.0f}, rate={p.rate:.4f}") + +# UK +uk_poverty = calculate_uk_poverty_rates(simulation) +for p in uk_poverty.outputs: + print(f"{p.poverty_type}: headcount={p.headcount:,.0f}, rate={p.rate:.4f}") +``` + +### Poverty by demographic group + +```python +from policyengine.outputs.poverty import ( + calculate_us_poverty_by_age, + calculate_us_poverty_by_gender, + calculate_us_poverty_by_race, + calculate_uk_poverty_by_age, + calculate_uk_poverty_by_gender, +) + +# By age group (child <18, adult 18-64, senior 65+) +by_age = calculate_us_poverty_by_age(simulation) +for p in by_age.outputs: + print(f"{p.filter_group} {p.poverty_type}: {p.rate:.4f}") + +# By gender +by_gender = calculate_us_poverty_by_gender(simulation) + +# By race (US only: WHITE, BLACK, HISPANIC, OTHER) +by_race = calculate_us_poverty_by_race(simulation) +``` + +### Custom filters + +```python +from policyengine.outputs.poverty import Poverty + +# Child poverty only +child_poverty = Poverty( + simulation=simulation, + poverty_variable="spm_unit_is_in_spm_poverty", + entity="person", + filter_variable="age", + filter_variable_leq=17, +) +child_poverty.run() +print(f"Child SPM poverty rate: {child_poverty.rate:.4f}") +``` + +### Result fields + +| Field | Description | +|---|---| +| `headcount` | Weighted count of people in poverty | +| `total_population` | Weighted total population (after filters) | +| `rate` | `headcount / 
total_population` | +| `filter_group` | Group label set by demographic convenience functions | + +## Inequality + +Calculates weighted inequality metrics for a single simulation: Gini coefficient and income share measures. + +### Using convenience functions + +```python +from policyengine.outputs.inequality import ( + calculate_uk_inequality, + calculate_us_inequality, +) + +# US (uses household_net_income by default) +ineq = calculate_us_inequality(simulation) +print(f"Gini: {ineq.gini:.4f}") +print(f"Top 10% share: {ineq.top_10_share:.4f}") +print(f"Top 1% share: {ineq.top_1_share:.4f}") +print(f"Bottom 50% share: {ineq.bottom_50_share:.4f}") + +# UK (uses equiv_hbai_household_net_income by default) +ineq = calculate_uk_inequality(simulation) +``` + +### With demographic filters + +```python +# Inequality among working-age adults only +ineq = calculate_us_inequality( + simulation, + filter_variable="age", + filter_variable_geq=18, + filter_variable_leq=64, +) +``` + +### Using directly + +```python +from policyengine.outputs.inequality import Inequality + +ineq = Inequality( + simulation=simulation, + income_variable="household_net_income", + entity="household", +) +ineq.run() +``` + +### Result fields + +| Field | Description | +|---|---| +| `gini` | Weighted Gini coefficient (0 = perfect equality, 1 = perfect inequality) | +| `top_10_share` | Share of total income held by top 10% | +| `top_1_share` | Share of total income held by top 1% | +| `bottom_50_share` | Share of total income held by bottom 50% | + +## Comparing baseline and reform + +Poverty and inequality are single-simulation outputs. 
To compare baseline and reform, compute both and take the difference: + +```python +baseline_poverty = calculate_us_poverty_rates(baseline_sim) +reform_poverty = calculate_us_poverty_rates(reform_sim) + +for bp, rp in zip(baseline_poverty.outputs, reform_poverty.outputs): + change = rp.rate - bp.rate + print(f"{bp.poverty_type}: {bp.rate:.4f} -> {rp.rate:.4f} ({change:+.4f})") + +baseline_ineq = calculate_us_inequality(baseline_sim) +reform_ineq = calculate_us_inequality(reform_sim) +print(f"Gini change: {reform_ineq.gini - baseline_ineq.gini:+.4f}") +``` + +The `economic_impact_analysis()` function does this automatically and returns both baseline and reform poverty/inequality in the `PolicyReformAnalysis` result. See [Economic impact analysis](economic-impact-analysis.md). diff --git a/docs/core-concepts.md b/docs/core-concepts.md index 52c3f290..8bbc2db9 100644 --- a/docs/core-concepts.md +++ b/docs/core-concepts.md @@ -117,6 +117,40 @@ dataset = PolicyEngineUKDataset( ) ``` +## Data loading + +Before running simulations, you need representative microdata. 
The package provides three functions for managing datasets: + +- **`ensure_datasets()`**: Load from disk if available, otherwise download and compute (recommended) +- **`create_datasets()`**: Always download from HuggingFace and compute from scratch +- **`load_datasets()`**: Load previously saved HDF5 files from disk + +```python +from policyengine.tax_benefit_models.us import ensure_datasets + +# First run: downloads from HuggingFace, computes variables, saves to ./data/ +# Subsequent runs: loads from disk instantly +datasets = ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[2026], + data_folder="./data", +) +dataset = datasets["enhanced_cps_2024_2026"] +``` + +```python +from policyengine.tax_benefit_models.uk import ensure_datasets + +datasets = ensure_datasets( + datasets=["hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"], + years=[2026], + data_folder="./data", +) +dataset = datasets["enhanced_frs_2023_24_2026"] +``` + +All datasets are stored as HDF5 files on disk. No database server is required. + ## Simulations Simulations apply tax-benefit models to datasets, calculating all variables for the specified year. @@ -141,6 +175,25 @@ output_household = simulation.output_dataset.data.household print(output_household[["household_id", "household_net_income", "household_tax"]]) ``` +### Simulation lifecycle: `run()` vs `ensure()` + +The `Simulation` class provides two methods for computing results: + +| Method | Behaviour | +|---|---| +| `simulation.run()` | Always recomputes from scratch. No caching. | +| `simulation.ensure()` | Checks in-memory LRU cache, then tries loading from disk, then falls back to `run()` + `save()`. 
| + +```python +# One-off computation (no caching) +simulation.run() + +# Cache-or-compute (preferred for production use) +simulation.ensure() +``` + +`ensure()` uses a module-level LRU cache (max 100 simulations) and saves output datasets as HDF5 files alongside the input dataset. On repeated calls, it returns cached results instantly. For baseline-vs-reform comparisons, `economic_impact_analysis()` calls `ensure()` internally, so you rarely need to call it yourself. + ### Accessing calculated variables After running a simulation, you can access the calculated variables from the output dataset: @@ -211,6 +264,56 @@ reform = Simulation( reform.run() ``` +### Combining policies + +Policies can be combined using the `+` operator: + +```python +combined = policy_a + policy_b +# Concatenates parameter_values and chains simulation_modifiers +``` + +### Simulation modifiers + +For reforms that cannot be expressed as parameter value changes, `Policy` accepts a `simulation_modifier` callable that directly manipulates the underlying `policyengine_core` simulation: + +```python +def my_modifier(sim): + """Custom reform logic applied to the core simulation object.""" + p = sim.tax_benefit_system.parameters + # Modify parameters programmatically + return sim + +policy = Policy( + name="Custom reform", + simulation_modifier=my_modifier, +) +``` + +Note: the UK model supports `simulation_modifier`. The US model currently only uses the `parameter_values` path. + +## Dynamic behavioural responses + +The `Dynamic` class is structurally identical to `Policy` and represents behavioural responses to policy changes (e.g., labour supply elasticities). It is applied after the policy in the simulation pipeline. 
+ +```python +from policyengine.core.dynamic import Dynamic + +dynamic = Dynamic( + name="Labour supply response", + parameter_values=[...], # Same format as Policy +) + +simulation = Simulation( + dataset=dataset, + tax_benefit_model_version=uk_latest, + policy=policy, + dynamic=dynamic, +) +``` + +Dynamic responses can also be combined using the `+` operator and support `simulation_modifier` callables. + ## Outputs Output classes provide structured analysis of simulation results. @@ -549,8 +652,11 @@ See `examples/income_distribution_us.py` for: ## Next steps -- See `examples/` for complete working examples -- Review country-specific documentation: +- [Economic impact analysis](economic-impact-analysis.md): Full baseline-vs-reform comparison workflow +- [Advanced outputs](advanced-outputs.md): DecileImpact, Poverty, Inequality, IntraDecileImpact +- [Regions and scoping](regions-and-scoping.md): Sub-national analysis (states, constituencies, districts) +- Country-specific documentation: - [UK tax-benefit model](country-models-uk.md) - [US tax-benefit model](country-models-us.md) -- Explore the API reference for detailed class documentation +- [Visualisation](visualisation.md): Publication-ready charts +- See `examples/` for complete working scripts diff --git a/docs/dev.md b/docs/dev.md index accfa48c..5ae84682 100644 --- a/docs/dev.md +++ b/docs/dev.md @@ -1,8 +1,101 @@ -# Development principles +# Development -General principles for developing this package's codebase go here. +## Principles 1. **STRONG** preference for simplicity. Let's make this package as simple as it possibly can be. 2. Remember the goal of this package: to make it easy to create, run, save and analyse PolicyEngine simulations. When considering further features, always ask: can we instead *make it super easy* for people to do this outside the package? 3. Be consistent about property names. `name` = human readable few words you could put as the noun in a sentence without fail. 
`id` = unique identifier, ideally a UUID. `description` = longer human readable text that describes the object. `created_at` and `updated_at` = timestamps for when the object was created and last updated. -4. Constraints can be good. We should set constraints where they help us simplify the codebase and usage, but not where they unnecessarily block useful functionality. \ No newline at end of file +4. Constraints can be good. We should set constraints where they help us simplify the codebase and usage, but not where they unnecessarily block useful functionality. + +## Setup + +```bash +git clone https://github.com/PolicyEngine/policyengine.py.git +cd policyengine.py +uv pip install -e .[dev] +``` + +This installs both UK and US country models plus dev dependencies (pytest, ruff, mypy, towncrier). + +## Common commands + +```bash +make format # ruff format +make test # pytest with coverage +make docs # build documentation site +make clean # remove caches, build artifacts, .h5 files +``` + +## Testing + +Tests require a `HUGGING_FACE_TOKEN` environment variable for downloading datasets: + +```bash +export HUGGING_FACE_TOKEN=hf_... +make test +``` + +To run a specific test: + +```bash +pytest tests/test_models.py -v +pytest tests/test_parametric_reforms.py -k "test_uk" -v +``` + +## Linting and formatting + +```bash +ruff format . # format code +ruff check . # lint +mypy src/policyengine # type check (informational) +``` + +## CI pipeline + +PRs trigger the following checks: + +| Check | Status | Command | +|---|---|---| +| Lint + format | Required | `ruff check .` + `ruff format --check .` | +| Tests (Python 3.13) | Required | `make test` | +| Tests (Python 3.14) | Required | `make test` | +| Mypy | Informational | `mypy src/policyengine` | +| Docs build | Required | MyST build | + +## Versioning and releases + +This project uses [towncrier](https://towncrier.readthedocs.io/) for changelog management. 
When making a PR, add a changelog fragment: + +```bash +# Fragment types: breaking, added, changed, fixed, removed +echo "Description of change" > changelog.d/my-change.added +``` + +On merge, the versioning workflow bumps the version, builds the changelog, and creates a GitHub Release. + +## Architecture + +### Package layout + +``` +src/policyengine/ +├── core/ # Domain models (Simulation, Dataset, Policy, etc.) +├── tax_benefit_models/ +│ ├── uk/ # UK model, datasets, analysis, outputs +│ └── us/ # US model, datasets, analysis, outputs +├── outputs/ # Output templates (Aggregate, Poverty, etc.) +├── countries/ # Geographic region registries +└── utils/ # Helpers (reforms, entity mapping, plotting) +``` + +### Key design decisions + +**Pydantic everywhere**: All domain objects are Pydantic `BaseModel` subclasses. This gives us validation, serialisation, and clear field documentation. + +**HDF5 for storage**: Datasets and simulation outputs are stored as HDF5 files. No database server is required. The `MicroDataFrame` from the `microdf` package wraps pandas DataFrames with weight-aware `.sum()`, `.mean()`, `.count()`. + +**Country-specific model classes**: `PolicyEngineUSLatest` and `PolicyEngineUKLatest` each implement `run()`, `save()`, and `load()`. The US model passes reforms as a dict at `Microsimulation(reform=...)` construction time. The UK model supports both parametric reforms and `simulation_modifier` callables applied post-construction. + +**LRU cache + file caching**: `Simulation.ensure()` checks an in-process LRU cache (max 100 entries), then tries loading from disk, then falls back to `run()` + `save()`. + +**Output pattern**: All output types inherit from `Output`, implement `.run()`, and populate result fields. Convenience functions (e.g., `calculate_us_poverty_rates()`) create, run, and return collections of output objects. 
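The lookup order behind the cache design decision (memory, then disk, then compute and save) can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `Simulation.ensure()`: the `ensure` function, the `sim_id` key and the use of JSON files in place of HDF5 are all hypothetical.

```python
import json
import tempfile
from collections import OrderedDict
from pathlib import Path

MAX_ENTRIES = 100  # mirrors the documented cache size
_cache = OrderedDict()  # hypothetical in-process LRU cache


def ensure(sim_id, folder, compute):
    """Return a result from memory, then from disk, then by computing and saving."""
    if sim_id in _cache:
        _cache.move_to_end(sim_id)  # mark as most recently used
        return _cache[sim_id]
    path = folder / f"{sim_id}.json"  # JSON stands in for HDF5 here
    if path.exists():
        result = json.loads(path.read_text())
    else:
        result = compute()
        path.write_text(json.dumps(result))
    _cache[sim_id] = result
    if len(_cache) > MAX_ENTRIES:
        _cache.popitem(last=False)  # evict the least recently used entry
    return result


calls = []


def expensive():
    """Stand-in for a slow simulation run; records each invocation."""
    calls.append(1)
    return {"total": 42}


folder = Path(tempfile.mkdtemp())
first = ensure("baseline", folder, expensive)   # computes and saves
second = ensure("baseline", folder, expensive)  # served from memory
_cache.clear()                                  # simulate a new process
third = ensure("baseline", folder, expensive)   # served from disk
print(len(calls))
```

The expensive function runs once across the three calls, which is the behaviour the two cache layers are meant to provide.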
diff --git a/docs/economic-impact-analysis.md b/docs/economic-impact-analysis.md new file mode 100644 index 00000000..db782729 --- /dev/null +++ b/docs/economic-impact-analysis.md @@ -0,0 +1,287 @@ +# Economic impact analysis + +The `economic_impact_analysis()` function is the canonical way to compare a baseline simulation against a reform simulation. It produces a comprehensive `PolicyReformAnalysis` containing decile impacts, programme-by-programme statistics, poverty rates, and inequality metrics in a single call. + +## Overview + +There are two approaches to comparing simulations: + +| Approach | Use case | +|---|---| +| `ChangeAggregate` | Single-metric queries: "What is the total tax revenue change?" | +| `economic_impact_analysis()` | Full analysis: decile impacts, programme stats, poverty, inequality | + +`ChangeAggregate` gives you one number per call. `economic_impact_analysis()` runs ~30+ aggregate computations and returns a structured result containing everything. + +## Full analysis workflow + +### US example + +```python +import datetime +from policyengine.core import Parameter, ParameterValue, Policy, Simulation +from policyengine.tax_benefit_models.us import ( + economic_impact_analysis, + ensure_datasets, + us_latest, +) + +# 1. Load data +datasets = ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[2026], + data_folder="./data", +) +dataset = datasets["enhanced_cps_2024_2026"] + +# 2. Define reform +param = Parameter( + name="gov.irs.deductions.standard.amount.SINGLE", + tax_benefit_model_version=us_latest, +) +reform = Policy( + name="Double standard deduction (single)", + parameter_values=[ + ParameterValue( + parameter=param, + start_date=datetime.date(2026, 1, 1), + end_date=datetime.date(2026, 12, 31), + value=30_950, + ), + ], +) + +# 3. 
Create simulations (no need to call .run() — ensure() is called internally) +baseline_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, +) +reform_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + policy=reform, +) + +# 4. Run full analysis +analysis = economic_impact_analysis(baseline_sim, reform_sim) +``` + +### UK example + +```python +import datetime +from policyengine.core import Parameter, ParameterValue, Policy, Simulation +from policyengine.tax_benefit_models.uk import ( + economic_impact_analysis, + ensure_datasets, + uk_latest, +) + +datasets = ensure_datasets( + datasets=["hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"], + years=[2026], + data_folder="./data", +) +dataset = datasets["enhanced_frs_2023_24_2026"] + +param = Parameter( + name="gov.hmrc.income_tax.allowances.personal_allowance.amount", + tax_benefit_model_version=uk_latest, +) +reform = Policy( + name="Zero personal allowance", + parameter_values=[ + ParameterValue( + parameter=param, + start_date=datetime.date(2026, 1, 1), + end_date=datetime.date(2026, 12, 31), + value=0, + ), + ], +) + +baseline_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=uk_latest, +) +reform_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=uk_latest, + policy=reform, +) + +analysis = economic_impact_analysis(baseline_sim, reform_sim) +``` + +## What `economic_impact_analysis()` computes + +The function calls `ensure()` on both simulations (run + cache if not already computed), then produces: + +### Decile impacts + +Mean income changes by income decile (1-10), with counts of people better off, worse off, and unchanged. 
+ +```python +for d in analysis.decile_impacts.outputs: + print(f"Decile {d.decile}: avg change={d.absolute_change:+.0f}, " + f"relative={d.relative_change:+.2f}%") +``` + +**Fields on each `DecileImpact`:** +- `decile`: 1-10 +- `baseline_mean`, `reform_mean`: Mean income before and after reform +- `absolute_change`: Mean absolute income change +- `relative_change`: Mean percentage income change +- `count_better_off`, `count_worse_off`, `count_no_change`: Weighted counts + +### Programme/program statistics + +Per-programme totals, changes, and winner/loser counts. + +**US programs analysed:** `income_tax`, `payroll_tax`, `state_income_tax`, `snap`, `tanf`, `ssi`, `social_security`, `medicare`, `medicaid`, `eitc`, `ctc` + +**UK programmes analysed:** `income_tax`, `national_insurance`, `vat`, `council_tax`, `universal_credit`, `child_benefit`, `pension_credit`, `income_support`, `working_tax_credit`, `child_tax_credit` + +```python +for p in analysis.program_statistics.outputs: # US + print(f"{p.program_name}: baseline=${p.baseline_total/1e9:.1f}B, " + f"reform=${p.reform_total/1e9:.1f}B, change=${p.change/1e9:+.1f}B") +``` + +**Fields on each `ProgramStatistics` / `ProgrammeStatistics`:** +- `program_name` / `programme_name`: Variable name +- `baseline_total`, `reform_total`: Weighted sums +- `change`: `reform_total - baseline_total` +- `baseline_count`, `reform_count`: Weighted recipient counts +- `winners`, `losers`: Weighted counts of people gaining/losing + +### Poverty rates + +Poverty headcount and rates for both baseline and reform simulations. + +**US poverty types:** SPM poverty, deep SPM poverty + +**UK poverty types:** Absolute BHC, absolute AHC, relative BHC, relative AHC + +```python +for bp, rp in zip(analysis.baseline_poverty.outputs, + analysis.reform_poverty.outputs): + print(f"{bp.poverty_type}: baseline={bp.rate:.4f}, reform={rp.rate:.4f}") +``` + +### Inequality metrics + +Gini coefficient and income share metrics for both simulations. 
+ +```python +bi = analysis.baseline_inequality +ri = analysis.reform_inequality +print(f"Gini: baseline={bi.gini:.4f}, reform={ri.gini:.4f}") +print(f"Top 10% share: baseline={bi.top_10_share:.4f}, reform={ri.top_10_share:.4f}") +print(f"Top 1% share: baseline={bi.top_1_share:.4f}, reform={ri.top_1_share:.4f}") +print(f"Bottom 50% share: baseline={bi.bottom_50_share:.4f}, reform={ri.bottom_50_share:.4f}") +``` + +## The `PolicyReformAnalysis` return type + +```python +class PolicyReformAnalysis(BaseModel): + decile_impacts: OutputCollection[DecileImpact] + program_statistics: OutputCollection[ProgramStatistics] # US + # programme_statistics: OutputCollection[ProgrammeStatistics] # UK + baseline_poverty: OutputCollection[Poverty] + reform_poverty: OutputCollection[Poverty] + baseline_inequality: Inequality + reform_inequality: Inequality +``` + +Each `OutputCollection` contains: +- `outputs`: List of individual output objects +- `dataframe`: A pandas DataFrame with all results in tabular form + +## Using ChangeAggregate for targeted queries + +When you only need a single metric, `ChangeAggregate` is more direct than the full analysis pipeline. It requires that both simulations have already been run (or ensure'd). 
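As a rough illustration of what a single change metric involves, the sketch below computes a weighted sum of income changes and weighted counts of winners and losers. The records and the `change_sum` and `change_count` helpers are hypothetical. The `change_geq` and `change_leq` argument names echo the filter options documented later on this page, but this is not the `ChangeAggregate` implementation.

```python
# Hypothetical household records: (baseline_income, reform_income, weight).
households = [
    (20_000.0, 20_500.0, 100.0),
    (35_000.0, 35_000.0, 200.0),
    (50_000.0, 49_000.0, 150.0),
]


def change_sum(records):
    """Weighted sum of (reform - baseline) across all records."""
    return sum((reform - base) * weight for base, reform, weight in records)


def change_count(records, change_geq=None, change_leq=None):
    """Weighted count of records whose change passes the optional thresholds."""
    total = 0.0
    for base, reform, weight in records:
        change = reform - base
        if change_geq is not None and change < change_geq:
            continue
        if change_leq is not None and change > change_leq:
            continue
        total += weight
    return total


total_change = change_sum(households)
winners = change_count(households, change_geq=1)
losers = change_count(households, change_leq=-1)
print(total_change, winners, losers)
```

Each call yields one number, which matches the one-metric-per-call framing of `ChangeAggregate` in the overview table.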
+ +### Tax revenue change + +```python +from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType + +baseline_sim.run() +reform_sim.run() + +revenue = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_tax", + aggregate_type=ChangeAggregateType.SUM, +) +revenue.run() +print(f"Revenue change: ${revenue.result / 1e9:.1f}B") +``` + +### Winners and losers + +```python +winners = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_net_income", + aggregate_type=ChangeAggregateType.COUNT, + change_geq=1, # Gained at least $1 +) +winners.run() + +losers = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_net_income", + aggregate_type=ChangeAggregateType.COUNT, + change_leq=-1, # Lost at least $1 +) +losers.run() +``` + +### Filtering by income decile + +```python +# Average loss in the 3rd income decile +avg_loss = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_net_income", + aggregate_type=ChangeAggregateType.MEAN, + filter_variable="household_net_income", + quantile=10, + quantile_eq=3, +) +avg_loss.run() +``` + +### Filter options reference + +**Absolute change filters:** +- `change_geq`: Change >= value (e.g., gain >= 500) +- `change_leq`: Change <= value (e.g., loss <= -500) +- `change_eq`: Change == value + +**Relative change filters:** +- `relative_change_geq`: Relative change >= value (decimal, e.g., 0.05 = 5%) +- `relative_change_leq`: Relative change <= value +- `relative_change_eq`: Relative change == value + +**Variable filters:** +- `filter_variable`: Variable to filter on (from the baseline simulation) +- `filter_variable_eq`, `filter_variable_leq`, `filter_variable_geq`: Comparison operators + +**Quantile filters:** +- `quantile`: Number of quantiles (e.g., 10 for deciles, 5 for quintiles) +- 
`quantile_eq`: Exact quantile (e.g., 3 for 3rd decile) +- `quantile_leq`: Maximum quantile +- `quantile_geq`: Minimum quantile + +## Examples + +- `examples/policy_change_uk.py`: Full UK reform analysis with ChangeAggregate and visualisation +- `examples/us_budgetary_impact.py`: US budgetary impact comparing both approaches diff --git a/docs/index.md b/docs/index.md index dd467d12..1eb1d322 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,4 +6,13 @@ We do this by: * Standardising around a set of core types that let us do policy analysis in an object-oriented way * Exemplifying this behaviour by using this package in all PolicyEngine's production applications, and analyses -In this documentation, we'll walk through the core concepts/types that this package makes available, and how you can use them to run policy analyses at scale. +## Documentation + +- [Core concepts](core-concepts.md): Architecture, datasets, simulations, policies, outputs, entity mapping +- [Economic impact analysis](economic-impact-analysis.md): Full baseline-vs-reform comparison workflow +- [Advanced outputs](advanced-outputs.md): DecileImpact, Poverty, Inequality, IntraDecileImpact +- [Regions and scoping](regions-and-scoping.md): Sub-national analysis (states, constituencies, districts) +- [UK tax-benefit model](country-models-uk.md): Entities, parameters, reform examples +- [US tax-benefit model](country-models-us.md): Entities, parameters, reform examples +- [Visualisation](visualisation.md): Publication-ready charts with Plotly +- [Development](dev.md): Setup, testing, CI, architecture diff --git a/docs/myst.yml b/docs/myst.yml index 053152c6..0f8d647a 100644 --- a/docs/myst.yml +++ b/docs/myst.yml @@ -7,15 +7,17 @@ project: # keywords: [] # authors: [] github: https://github.com/PolicyEngine/policyengine.py - # To autogenerate a Table of Contents, run "jupyter book init --write-toc" toc: - # Auto-generated by `myst init --write-toc` - file: index.md - file: core-concepts.md + - file: 
economic-impact-analysis.md + - file: advanced-outputs.md + - file: regions-and-scoping.md - file: country-models-uk.md - file: country-models-us.md + - file: visualisation.md - file: dev.md - + site: template: book-theme # options: diff --git a/docs/regions-and-scoping.md b/docs/regions-and-scoping.md new file mode 100644 index 00000000..9be4ddbc --- /dev/null +++ b/docs/regions-and-scoping.md @@ -0,0 +1,251 @@ +# Regions and scoping + +The package supports sub-national analysis through a geographic region system. Regions can scope simulations to states, constituencies, congressional districts, local authorities, and cities. + +## Region system + +### Region + +A `Region` represents a geographic area with a unique prefixed code: + +| Region type | Code format | Examples | +|---|---|---| +| National | `us`, `uk` | `us`, `uk` | +| State | `state/{code}` | `state/ca`, `state/ny` | +| Congressional district | `congressional_district/{ST-DD}` | `congressional_district/CA-01` | +| Place/city | `place/{ST-FIPS}` | `place/NJ-57000` | +| UK country | `country/{name}` | `country/england` | +| Constituency | `constituency/{name}` | `constituency/Sheffield Central` | +| Local authority | `local_authority/{code}` | `local_authority/E09000001` | + +### RegionRegistry + +Each model version has a `RegionRegistry` providing O(1) lookups: + +```python +from policyengine.tax_benefit_models.us import us_latest + +registry = us_latest.region_registry + +# Look up by code +california = registry.get("state/ca") +print(f"{california.label}: {california.region_type}") + +# Get all regions of a type +states = registry.get_by_type("state") +print(f"{len(states)} states") + +districts = registry.get_by_type("congressional_district") +print(f"{len(districts)} congressional districts") + +# Get children of a region +ca_districts = registry.get_children("state/ca") +``` + +```python +from policyengine.tax_benefit_models.uk import uk_latest + +registry = uk_latest.region_registry + +# UK 
countries +countries = registry.get_by_type("country") +for c in countries: + print(f"{c.code}: {c.label}") +``` + +### Region counts + +**US:** 1 national + 51 states (inc. DC) + 436 congressional districts + 333 census places = 821 regions + +**UK:** 1 national + 4 countries. Constituencies and local authorities are available via extended registry builders. + +## Scoping strategies + +Scoping strategies control how a national dataset is narrowed to represent a sub-national region. They are applied during `Simulation.run()`, before the microsimulation calculation. + +### RowFilterStrategy + +Filters dataset rows where a household-level variable matches a specific value. Used for UK countries and US places/cities. + +```python +from policyengine.core import Simulation +from policyengine.core.scoping_strategy import RowFilterStrategy + +# Simulate only California households +simulation = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + scoping_strategy=RowFilterStrategy( + variable_name="state_code", + variable_value="CA", + ), +) +simulation.run() +``` + +This removes all non-California households from the dataset before running the simulation. The remaining household weights still reflect California's population. + +```python +# UK: simulate only England +simulation = Simulation( + dataset=dataset, + tax_benefit_model_version=uk_latest, + scoping_strategy=RowFilterStrategy( + variable_name="country", + variable_value="ENGLAND", + ), +) +``` + +### WeightReplacementStrategy + +Replaces household weights from a pre-computed weight matrix stored in Google Cloud Storage. Used for UK constituencies and local authorities, where the weight matrix (shape: N_regions x N_households) reweights all households to represent each region's demographics. 
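The reweighting idea in the paragraph above can be made concrete with a toy sketch. This is not the package's implementation: the three households, the two-region weight matrix, and the helper names `weighted_mean` and `regional_mean` are all hypothetical, chosen only to show how one row of an N_regions x N_households matrix re-weights every household for a single region.

```python
# Toy illustration only: none of these names come from policyengine.

# Net income for three households in a national sample.
incomes = [20_000.0, 35_000.0, 60_000.0]

# Hypothetical weight matrix: one row per region, one column per household.
weight_matrix = [
    [100.0, 50.0, 50.0],   # region 0 leans towards lower-income households
    [10.0, 40.0, 150.0],   # region 1 leans towards higher-income households
]


def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def regional_mean(region_index):
    """Every household stays in the sample; only its weight changes."""
    return weighted_mean(incomes, weight_matrix[region_index])


region_means = [regional_mean(i) for i in range(len(weight_matrix))]
print(region_means)  # [33750.0, 53000.0]
```

Both regions draw on the same three households, so no region is left with an empty or tiny sample, which is the robustness argument for weight replacement over row filtering.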
+ +```python +from policyengine.core.scoping_strategy import WeightReplacementStrategy + +simulation = Simulation( + dataset=dataset, + tax_benefit_model_version=uk_latest, + scoping_strategy=WeightReplacementStrategy( + weight_matrix_bucket="policyengine-uk-data", + weight_matrix_key="parliamentary_constituency_weights.h5", + lookup_csv_bucket="policyengine-uk-data", + lookup_csv_key="constituencies_2024.csv", + region_code="Sheffield Central", + ), +) +``` + +Unlike row filtering, weight replacement keeps all households but assigns region-specific weights. This is more statistically robust for small geographic areas where filtering would leave too few households. + +### Legacy filter fields + +For backward compatibility, `Simulation` also accepts `filter_field` and `filter_value` parameters, which are auto-converted to a `RowFilterStrategy`: + +```python +# These two are equivalent: +simulation = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + filter_field="state_code", + filter_value="CA", +) + +simulation = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + scoping_strategy=RowFilterStrategy( + variable_name="state_code", + variable_value="CA", + ), +) +``` + +## Geographic impact outputs + +The package provides output types that compute per-region metrics across all regions simultaneously. + +### CongressionalDistrictImpact (US) + +Groups households by `congressional_district_geoid` and computes weighted average and relative income changes per district. 
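As a rough sketch of the grouping described above, the following pure-Python example aggregates hypothetical household records by district. The record layout, the `district_changes` helper, and the output keys are assumptions for illustration. In particular, relative change is taken here as total weighted change divided by total weighted baseline income, which may differ from the package's definition. The `divmod` call assumes the integer SSDD encoding of `congressional_district_geoid` (state FIPS * 100 + district number).

```python
from collections import defaultdict

# Hypothetical records: (district_geoid, household_weight, baseline, reform)
households = [
    (601, 100.0, 50_000.0, 51_000.0),
    (601, 300.0, 30_000.0, 30_200.0),
    (3612, 200.0, 40_000.0, 39_500.0),
]


def district_changes(rows):
    """Group rows by district and compute weighted income changes."""
    totals = defaultdict(lambda: {"weight": 0.0, "baseline": 0.0, "reform": 0.0})
    for geoid, weight, baseline, reform in rows:
        bucket = totals[geoid]
        bucket["weight"] += weight
        bucket["baseline"] += weight * baseline
        bucket["reform"] += weight * reform

    results = {}
    for geoid, bucket in totals.items():
        change = bucket["reform"] - bucket["baseline"]
        state_fips, district_number = divmod(geoid, 100)  # SSDD -> (SS, DD)
        results[geoid] = {
            "state_fips": state_fips,
            "district_number": district_number,
            "average_change": change / bucket["weight"],
            # Assumption: ratio of totals, not a mean of household ratios.
            "relative_change": change / bucket["baseline"],
            "population": bucket["weight"],
        }
    return results


results = district_changes(households)
print(results[601]["average_change"])   # 400.0
print(results[3612]["average_change"])  # -500.0
```

Accumulating weighted totals first and dividing once per district keeps the calculation to a single pass over households, whatever the number of districts.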
+ +```python +from policyengine.outputs.congressional_district_impact import ( + compute_us_congressional_district_impacts, +) + +baseline_sim.run() +reform_sim.run() + +impact = compute_us_congressional_district_impacts(baseline_sim, reform_sim) + +for d in impact.district_results: + print(f"District {d['state_fips']:02d}-{d['district_number']:02d}: " + f"avg change=${d['average_household_income_change']:+,.0f}, " + f"relative={d['relative_household_income_change']:+.2%}") +``` + +**Result fields per district:** +- `district_geoid`: Integer SSDD (state FIPS * 100 + district number) +- `state_fips`: State FIPS code +- `district_number`: District number within state +- `average_household_income_change`: Weighted mean change +- `relative_household_income_change`: Weighted relative change +- `population`: Weighted household count + +### ConstituencyImpact (UK) + +Uses pre-computed weight matrices (650 x N_households) to compute per-constituency income changes without filtering. + +```python +from policyengine.outputs.constituency_impact import ( + compute_uk_constituency_impacts, +) + +impact = compute_uk_constituency_impacts( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + weight_matrix_path="parliamentary_constituency_weights.h5", + constituency_csv_path="constituencies_2024.csv", + year="2025", +) + +for c in impact.constituency_results: + print(f"{c['constituency_name']}: " + f"avg change={c['average_household_income_change']:+,.0f}") +``` + +**Result fields per constituency:** +- `constituency_code`, `constituency_name`: Identifiers +- `x`, `y`: Hex map coordinates +- `average_household_income_change`, `relative_household_income_change` +- `population`: Weighted household count + +### LocalAuthorityImpact (UK) + +Works identically to `ConstituencyImpact` but for local authorities (360 x N_households weight matrix). 
+ +```python +from policyengine.outputs.local_authority_impact import ( + compute_uk_local_authority_impacts, +) + +impact = compute_uk_local_authority_impacts( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + weight_matrix_path="local_authority_weights.h5", + local_authority_csv_path="local_authorities_2024.csv", + year="2025", +) +``` + +## Using regions with `economic_impact_analysis()` + +Scoping strategies compose naturally with the full analysis pipeline: + +```python +from policyengine.core.scoping_strategy import RowFilterStrategy + +# State-level analysis +baseline_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + scoping_strategy=RowFilterStrategy( + variable_name="state_code", + variable_value="CA", + ), +) +reform_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + policy=reform, + scoping_strategy=RowFilterStrategy( + variable_name="state_code", + variable_value="CA", + ), +) + +# Full analysis scoped to California +analysis = economic_impact_analysis(baseline_sim, reform_sim) +``` diff --git a/examples/us_budgetary_impact.py b/examples/us_budgetary_impact.py new file mode 100644 index 00000000..048f7e4e --- /dev/null +++ b/examples/us_budgetary_impact.py @@ -0,0 +1,151 @@ +"""Example: US budgetary impact comparison between baseline and reform. + +Demonstrates the canonical policyengine.py workflow: +1. Ensure datasets exist (download + compute or load from cache) +2. Define a parametric reform +3. Run baseline and reform simulations +4. Use economic_impact_analysis() for the full analysis +5. 
Use ChangeAggregate for targeted single-metric queries + +Run: python examples/us_budgetary_impact.py +""" + +import datetime + +from policyengine.core import Parameter, ParameterValue, Policy, Simulation +from policyengine.outputs.change_aggregate import ( + ChangeAggregate, + ChangeAggregateType, +) +from policyengine.tax_benefit_models.us import ( + economic_impact_analysis, + ensure_datasets, + us_latest, +) + + +def main(): + year = 2026 + + # ── Step 1: Get dataset (downloads from HuggingFace on first run) ── + print("Ensuring datasets are available...") + datasets = ensure_datasets( + datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"], + years=[year], + data_folder="./data", + ) + dataset = datasets[f"enhanced_cps_2024_{year}"] + print(f" Loaded: {dataset}") + + # ── Step 2: Define a reform ── + # Example: double the standard deduction for single filers + param = Parameter( + name="gov.irs.deductions.standard.amount.SINGLE", + tax_benefit_model_version=us_latest, + ) + reform = Policy( + name="Double standard deduction (single)", + parameter_values=[ + ParameterValue( + parameter=param, + start_date=datetime.date(year, 1, 1), + end_date=datetime.date(year, 12, 31), + value=30_950, + ), + ], + ) + + # ── Step 3: Create simulations ── + baseline_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + ) + reform_sim = Simulation( + dataset=dataset, + tax_benefit_model_version=us_latest, + policy=reform, + ) + + # ── Step 4a: Quick budgetary number via ChangeAggregate ── + # This requires running the simulations first. 
+ print("\nRunning simulations...") + baseline_sim.run() + reform_sim.run() + + tax_change = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_tax", + aggregate_type=ChangeAggregateType.SUM, + ) + tax_change.run() + print(f"\nQuick budgetary result:") + print(f" Tax revenue change: ${tax_change.result / 1e9:.2f}B") + + # Count winners and losers + winners = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_net_income", + aggregate_type=ChangeAggregateType.COUNT, + change_geq=1, + ) + losers = ChangeAggregate( + baseline_simulation=baseline_sim, + reform_simulation=reform_sim, + variable="household_net_income", + aggregate_type=ChangeAggregateType.COUNT, + change_leq=-1, + ) + winners.run() + losers.run() + print(f" Winners: {winners.result / 1e6:.2f}M households") + print(f" Losers: {losers.result / 1e6:.2f}M households") + + # ── Step 4b: Full analysis via economic_impact_analysis ── + # Note: this calls .ensure() internally, which is a no-op here since + # we already ran the simulations above. If we hadn't called .run(), + # ensure() would run + cache them automatically. 
+ print("\nRunning full economic impact analysis...") + analysis = economic_impact_analysis(baseline_sim, reform_sim) + + print("\n=== Program-by-Program Impact ===") + for prog in analysis.program_statistics.outputs: + print( + f" {prog.program_name:30s} " + f"baseline=${prog.baseline_total / 1e9:8.1f}B " + f"reform=${prog.reform_total / 1e9:8.1f}B " + f"change=${prog.change / 1e9:+8.1f}B" + ) + + print("\n=== Decile Impacts ===") + for d in analysis.decile_impacts.outputs: + print( + f" Decile {d.decile:2d}: " + f"avg change=${d.absolute_change:+8.0f} " + f"relative={d.relative_change:+.2%}" + ) + + print("\n=== Poverty ===") + for bp, rp in zip( + analysis.baseline_poverty.outputs, + analysis.reform_poverty.outputs, + strict=True, + ): + print( + f" {bp.metric:30s} " + f"baseline={bp.rate:.4f} " + f"reform={rp.rate:.4f} " + f"change={rp.rate - bp.rate:+.4f}" + ) + + print("\n=== Inequality ===") + bi = analysis.baseline_inequality + ri = analysis.reform_inequality + print(f" Gini: baseline={bi.gini:.4f} reform={ri.gini:.4f}") + print(f" Top 10% share: baseline={bi.top_10_share:.4f} reform={ri.top_10_share:.4f}") + print(f" Top 1% share: baseline={bi.top_1_share:.4f} reform={ri.top_1_share:.4f}") + + +if __name__ == "__main__": + main() From e6cc8648dfc87595025f64142743a5781140623e Mon Sep 17 00:00:00 2001 From: Anthony Volk Date: Mon, 16 Mar 2026 19:46:59 +0100 Subject: [PATCH 2/3] style: Run ruff format and fix lint errors Co-Authored-By: Claude Opus 4.6 --- examples/us_budgetary_impact.py | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/examples/us_budgetary_impact.py b/examples/us_budgetary_impact.py index 048f7e4e..f99df6d1 100644 --- a/examples/us_budgetary_impact.py +++ b/examples/us_budgetary_impact.py @@ -79,7 +79,7 @@ def main(): aggregate_type=ChangeAggregateType.SUM, ) tax_change.run() - print(f"\nQuick budgetary result:") + print("\nQuick budgetary result:") print(f" Tax revenue change: ${tax_change.result / 1e9:.2f}B") # 
Count winners and losers @@ -143,8 +143,12 @@ def main(): bi = analysis.baseline_inequality ri = analysis.reform_inequality print(f" Gini: baseline={bi.gini:.4f} reform={ri.gini:.4f}") - print(f" Top 10% share: baseline={bi.top_10_share:.4f} reform={ri.top_10_share:.4f}") - print(f" Top 1% share: baseline={bi.top_1_share:.4f} reform={ri.top_1_share:.4f}") + print( + f" Top 10% share: baseline={bi.top_10_share:.4f} reform={ri.top_10_share:.4f}" + ) + print( + f" Top 1% share: baseline={bi.top_1_share:.4f} reform={ri.top_1_share:.4f}" + ) if __name__ == "__main__": From babb1eb2bd9ae782d46de8baca86c1671d248596 Mon Sep 17 00:00:00 2001 From: Anthony Volk Date: Mon, 16 Mar 2026 20:57:45 +0100 Subject: [PATCH 3/3] docs: Add examples page with literalinclude and fix cross-references - Create docs/examples.md embedding all example scripts via {literalinclude} - Add examples page to myst.yml TOC - Replace all "See examples/..." prose with links to examples.md sections Co-Authored-By: Claude Opus 4.6 --- docs/core-concepts.md | 8 ++-- docs/country-models-uk.md | 8 ++-- docs/country-models-us.md | 9 ++--- docs/economic-impact-analysis.md | 4 +- docs/examples.md | 67 ++++++++++++++++++++++++++++++++ docs/index.md | 1 + docs/myst.yml | 1 + docs/visualisation.md | 2 +- 8 files changed, 83 insertions(+), 17 deletions(-) create mode 100644 docs/examples.md diff --git a/docs/core-concepts.md b/docs/core-concepts.md index 8bbc2db9..425c5f62 100644 --- a/docs/core-concepts.md +++ b/docs/core-concepts.md @@ -583,7 +583,7 @@ COLORS = { ### 1. Analyse employment income variation -See `examples/employment_income_variation_uk.py` for a complete example of: +See [UK employment income variation](examples.md#uk-employment-income-variation) for a complete example of: - Creating custom datasets with varied parameters - Running single simulations - Extracting results with filters @@ -591,7 +591,7 @@ See `examples/employment_income_variation_uk.py` for a complete example of: ### 2. 
Policy reform analysis -See `examples/policy_change_uk.py` for: +See [UK policy reform analysis](examples.md#uk-policy-reform-analysis) for: - Applying parametric reforms - Comparing baseline and reform - Analysing winners/losers by decile @@ -599,7 +599,7 @@ See `examples/policy_change_uk.py` for: ### 3. Distributional analysis -See `examples/income_distribution_us.py` for: +See [US income distribution](examples.md#us-income-distribution) for: - Loading representative microdata - Calculating statistics by income decile - Mapping variables across entity levels @@ -659,4 +659,4 @@ See `examples/income_distribution_us.py` for: - [UK tax-benefit model](country-models-uk.md) - [US tax-benefit model](country-models-us.md) - [Visualisation](visualisation.md): Publication-ready charts -- See `examples/` for complete working scripts +- [Examples](examples.md): Complete working scripts diff --git a/docs/country-models-uk.md b/docs/country-models-uk.md index bd9d1fbd..0bc54505 100644 --- a/docs/country-models-uk.md +++ b/docs/country-models-uk.md @@ -363,11 +363,9 @@ When creating custom datasets, validate: ## Examples -See working examples in the `examples/` directory: - -- `employment_income_variation_uk.py`: Vary employment income, analyse benefit phase-outs -- `policy_change_uk.py`: Apply reforms, analyse winners/losers -- `income_bands_uk.py`: Create income band scenarios +- [UK employment income variation](examples.md#uk-employment-income-variation): Vary employment income, analyse benefit phase-outs +- [UK policy reform analysis](examples.md#uk-policy-reform-analysis): Apply reforms, analyse winners/losers +- [UK income bands](examples.md#uk-income-bands): Calculate net income and tax by income decile ## References diff --git a/docs/country-models-us.md b/docs/country-models-us.md index 547a4f3b..268c888f 100644 --- a/docs/country-models-us.md +++ b/docs/country-models-us.md @@ -431,11 +431,10 @@ When creating custom datasets, validate: ## Examples -See working 
examples in the `examples/` directory: - -- `income_distribution_us.py`: Analyse benefit distribution by income decile -- `employment_income_variation_us.py`: Vary employment income, analyse phase-outs -- `speedtest_us_simulation.py`: Performance benchmarking +- [US income distribution](examples.md#us-income-distribution): Analyse benefit distribution by income decile +- [US employment income variation](examples.md#us-employment-income-variation): Vary employment income, analyse phase-outs +- [US budgetary impact](examples.md#us-budgetary-impact): Full baseline-vs-reform comparison +- [Simulation performance](examples.md#simulation-performance): Performance benchmarking ## References diff --git a/docs/economic-impact-analysis.md b/docs/economic-impact-analysis.md index db782729..0d28dff8 100644 --- a/docs/economic-impact-analysis.md +++ b/docs/economic-impact-analysis.md @@ -283,5 +283,5 @@ avg_loss.run() ## Examples -- `examples/policy_change_uk.py`: Full UK reform analysis with ChangeAggregate and visualisation -- `examples/us_budgetary_impact.py`: US budgetary impact comparing both approaches +- [UK policy reform analysis](examples.md#uk-policy-reform-analysis): Full reform analysis with ChangeAggregate and visualisation +- [US budgetary impact](examples.md#us-budgetary-impact): Budgetary impact comparing both approaches diff --git a/docs/examples.md b/docs/examples.md new file mode 100644 index 00000000..b7b4e91a --- /dev/null +++ b/docs/examples.md @@ -0,0 +1,67 @@ +# Examples + +Complete working scripts demonstrating common workflows. Each script can be run directly with `python examples/.py`. + +## US budgetary impact + +The canonical workflow for comparing a baseline and reform simulation, using both `economic_impact_analysis()` and `ChangeAggregate`. 
+ +```{literalinclude} ../examples/us_budgetary_impact.py +:language: python +``` + +## UK policy reform analysis + +Applying parametric reforms, comparing baseline and reform with `ChangeAggregate`, analysing winners and losers by income decile, and visualising results with Plotly. + +```{literalinclude} ../examples/policy_change_uk.py +:language: python +``` + +## UK income bands + +Calculating net income and tax by income decile using representative microdata and `Aggregate` with quantile filters. + +```{literalinclude} ../examples/income_bands_uk.py +:language: python +``` + +## US income distribution + +Loading enhanced CPS microdata, running a full microsimulation, and calculating statistics within income deciles. + +```{literalinclude} ../examples/income_distribution_us.py +:language: python +``` + +## UK employment income variation + +Creating a custom dataset with varied employment income, running a single simulation, and visualising benefit phase-outs. + +```{literalinclude} ../examples/employment_income_variation_uk.py +:language: python +``` + +## US employment income variation + +Same approach as the UK version, varying employment income from $0 to $200k and plotting household net income. + +```{literalinclude} ../examples/employment_income_variation_us.py +:language: python +``` + +## Household impact calculation + +Using `calculate_household_impact()` to compute taxes and benefits for individual custom households (both UK and US). + +```{literalinclude} ../examples/household_impact_example.py +:language: python +``` + +## Simulation performance + +Benchmarking how `simulation.run()` scales with dataset size. 
+ +```{literalinclude} ../examples/speedtest_us_simulation.py +:language: python +``` diff --git a/docs/index.md b/docs/index.md index 1eb1d322..3a6d2b43 100644 --- a/docs/index.md +++ b/docs/index.md @@ -14,5 +14,6 @@ We do this by: - [Regions and scoping](regions-and-scoping.md): Sub-national analysis (states, constituencies, districts) - [UK tax-benefit model](country-models-uk.md): Entities, parameters, reform examples - [US tax-benefit model](country-models-us.md): Entities, parameters, reform examples +- [Examples](examples.md): Complete working scripts - [Visualisation](visualisation.md): Publication-ready charts with Plotly - [Development](dev.md): Setup, testing, CI, architecture diff --git a/docs/myst.yml b/docs/myst.yml index 0f8d647a..2984d6e7 100644 --- a/docs/myst.yml +++ b/docs/myst.yml @@ -15,6 +15,7 @@ project: - file: regions-and-scoping.md - file: country-models-uk.md - file: country-models-us.md + - file: examples.md - file: visualisation.md - file: dev.md diff --git a/docs/visualisation.md b/docs/visualisation.md index 639f12ae..662ec3b1 100644 --- a/docs/visualisation.md +++ b/docs/visualisation.md @@ -69,4 +69,4 @@ COLORS = { ## Complete example -See `examples/employment_income_variation.py` for a full demonstration of using `format_fig()` in an analysis workflow. +See [UK employment income variation](examples.md#uk-employment-income-variation) for a full demonstration of using `format_fig()` in an analysis workflow.