From 1c28d8524a01a306700e8b700a4d955a6ee4e013 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:41:35 +0100 Subject: [PATCH 01/10] docs: add code review roadmap for red flags in src/eo_api --- ROADMAP.md | 241 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 241 insertions(+) create mode 100644 ROADMAP.md diff --git a/ROADMAP.md b/ROADMAP.md new file mode 100644 index 0000000..0622cd1 --- /dev/null +++ b/ROADMAP.md @@ -0,0 +1,241 @@ +# Code Review: Red Flags in `src/eo_api` + +## Context + +Critical review of the `src/eo_api` codebase looking for obvious red flags -- security issues, reliability problems, and architectural concerns that could cause real trouble. + +--- + +## Phased Roadmap + +### Phase 1: Security hardening (issues #1, #3, #4) + +**1a. Fix SSRF + path traversal in `zonal_statistics.py:108-121`** +- Replace `urlopen()` with `httpx.get()` using a timeout +- Remove local file path support entirely + +**1b. Fix CORS in `main.py:12-18`** +- Drop `allow_credentials=True` (the API does not use cookies or session auth) +- Keep `allow_origins=["*"]` since this is a public data API + +**1c. Replace dynamic import with allowlist in `cache.py:166-172`** +- Build a dict mapping function path strings to actual callables +- Look up from allowlist instead of `importlib.import_module` + +### Phase 2: Reliability (issues #2, #7, #8) + +**2a. Lazy-load DHIS2 constants in `constants.py:12-14`** +- Wrap the DHIS2 call + GeoDataFrame computation in a function with `@functools.lru_cache` +- Close the client after use +- Update `cache.py` and `serialize.py` imports to call the function instead of reading module-level vars + +**2b. Narrow exception handling in `registry.py:28`** +- Catch `(yaml.YAMLError, OSError)` instead of bare `Exception` +- Re-raise on unexpected errors so they surface at startup + +**2c. Preserve exception chains in `chirps3_dhis2_pipeline.py:523-524`** +- Change `from None` to `from e` + +### Phase 3: Correctness + cleanup (issues #5, #6, #9, #10) + +**3a. Make OGCAPI base URL configurable in `tasks.py:17`** +- `OGCAPI_BASE_URL = os.getenv("OGCAPI_BASE_URL", "http://localhost:8000/ogcapi")` + +**3b. Replace `assert` with exceptions in `serialize.py:41,70`** +- `if len(...) != 1: raise ValueError(...)` + +**3c. Cache registry lookup in `registry.py:34-37`** +- Add `@functools.lru_cache` to `list_datasets()` (YAML files don't change at runtime) +- `get_dataset()` uses the cached result + +**3d. Clean up temp file on write failure in `serialize.py:87-93`** +- Wrap `ds.to_netcdf(path)` in try/except, remove file on failure + +### Verification + +- `ruff check src/` and `ruff format --check src/` pass +- `mypy src/` passes +- `pyright` passes +- `pytest` passes + +--- + +## Detailed Findings + +## 1. SSRF + path traversal in zonal_statistics.py + +**File:** `routers/ogcapi/plugins/processes/zonal_statistics.py:108-121` + +`_read_geojson_input()` accepts user-supplied strings and will: +- Fetch arbitrary URLs via `urlopen()` with **no timeout** (can hang forever) +- Read arbitrary local files with **no path restriction** (e.g. `/etc/passwd`) + +```python +if parsed.scheme in {"http", "https"}: + with urlopen(geojson_input) as response: # SSRF, no timeout + payload = response.read().decode("utf-8") + +path = Path(geojson_input) # path traversal +``` + +**Fix:** Use httpx with a timeout, remove file-path support entirely. + +--- + +## 2. Module-level DHIS2 network call blocks import + +**File:** `datasets/constants.py:12-14` + +```python +client = create_client() +ORG_UNITS_GEOJSON = get_org_units_geojson(client, level=2) +BBOX = list(map(float, gpd.read_file(json.dumps(ORG_UNITS_GEOJSON)).total_bounds)) +``` + +- App startup blocks on DHIS2 being reachable -- no timeout, no fallback +- If DHIS2 is down, the app won't start at all +- The `client` is never closed +- Imported transitively by `datasets/cache.py` and `datasets/serialize.py`, so this runs even if you only touch the datasets module + +**Fix:** Lazy-load behind a function with `@lru_cache`, add timeout/fallback. + +--- + +## 3. CORS allows all origins with credentials + +**File:** `main.py:12-18` + +```python +allow_origins=["*"], +allow_credentials=True, +``` + +`allow_origins=["*"]` + `allow_credentials=True` is the one CORS combination that's explicitly dangerous -- it enables cross-origin credential theft. Browsers actually block this combo for `Access-Control-Allow-Credentials`, but misconfigured CORS middleware may reflect the `Origin` header instead of sending `*`, which **does** work and is exploitable. + +**Fix:** Drop `allow_credentials=True` since cookies/auth headers aren't needed cross-origin. + +--- + +## 4. Dynamic import from YAML config + +**File:** `datasets/cache.py:166-172` + +```python +def _get_dynamic_function(full_path: str) -> Callable[..., Any]: + module = importlib.import_module(module_path) + return getattr(module, function_name) +``` + +The function path comes from YAML files in `datasets/registry/`. If those files are ever editable by users or loaded from an untrusted source, this is arbitrary code execution. + +Currently low-risk since the YAML is checked into the repo, but the pattern is fragile. + +**Fix:** Use an explicit allowlist mapping instead of dynamic import. + +--- + +## 5. Hardcoded localhost URL for Prefect tasks + +**File:** `prefect_flows/tasks.py:17` + +```python +OGCAPI_BASE_URL = "http://localhost:8000/ogcapi" +``` + +Prefect tasks call back to the API over HTTP using a hardcoded localhost URL. This breaks in any deployment where Prefect workers run outside the API container. + +**Fix:** Read from environment variable with sensible default. + +--- + +## 6. `assert` used for runtime validation + +**File:** `datasets/serialize.py:41,70` + +```python +assert len(temp_df[time_dim].unique()) == 1 +assert len(temp_ds[time_dim].values) == 1 +``` + +These are stripped when Python runs with `-O`. If these invariants matter (and they do -- the preview functions assume a single timestep), they should be real exceptions. + +**Fix:** Replace with `if ... : raise ValueError(...)`. + +--- + +## 7. Silent data loss on YAML parse errors + +**File:** `datasets/registry.py:28` + +```python +except Exception: + logger.exception("Error loading %s", file_path.name) +``` + +If a YAML config file is corrupted or has a syntax error, the datasets in that file silently disappear from the registry. The bare `Exception` catch also swallows unexpected errors like `TypeError` from malformed data. + +**Fix:** Catch `(yaml.YAMLError, OSError)` specifically, and consider failing loudly on startup. + +--- + +## 8. Exception chain suppressed with `from None` + +**File:** `routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py:523-524` + +```python +except Exception as e: + raise ProcessorExecuteError(str(e)) from None +``` + +`from None` suppresses the original traceback. When this pipeline fails for unexpected reasons, you'll get a flat error message with no stack trace to debug from. + +**Fix:** Use `from e` instead of `from None`. + +--- + +## 9. Registry rebuilt from disk on every request + +**File:** `datasets/registry.py:34-37` + +```python +def get_dataset(dataset_id: str) -> dict[str, Any] | None: + datasets_lookup = {d["id"]: d for d in list_datasets()} # reads all YAML files + return datasets_lookup.get(dataset_id) +``` + +Every call to `get_dataset()` reads and parses all YAML files from disk and rebuilds the lookup dict. This runs on every API request that involves a dataset. + +**Fix:** Cache with `@lru_cache` or build the lookup once at startup. + +--- + +## 10. Temp file leak on write failure + +**File:** `datasets/serialize.py:87-93` + +```python +fd = tempfile.NamedTemporaryFile(suffix=".nc", delete=False) +path = fd.name +fd.close() +ds.to_netcdf(path) # if this fails, orphaned file in /tmp +return path +``` + +If `to_netcdf()` raises, the temp file is never cleaned up. Not catastrophic since it's in `/tmp`, but worth a try/except that removes the file on failure. + +--- + +## Summary by severity + +| # | Issue | Impact | +|---|-------|--------| +| 1 | SSRF + path traversal | Security: can read local files, make internal requests | +| 2 | Module-level DHIS2 call | Reliability: app won't start if DHIS2 is down | +| 3 | CORS misconfiguration | Security: potential cross-origin credential leak | +| 4 | Dynamic import from YAML | Security: fragile pattern, code exec if YAML untrusted | +| 5 | Hardcoded localhost URL | Deployment: breaks non-local Prefect setups | +| 6 | assert for validation | Correctness: stripped with `-O` flag | +| 7 | Silent YAML error swallowing | Reliability: datasets silently disappear | +| 8 | Exception chain suppressed | Debuggability: no stack trace on failures | +| 9 | Registry rebuilt per request | Performance: unnecessary disk I/O | +| 10 | Temp file leak | Cleanliness: orphaned files on failure | From cc73b08ad04ddd3be04eaf0a26b69411ffef0ec7 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:44:37 +0100 Subject: [PATCH 02/10] fix: close SSRF and path traversal in zonal_statistics - Replace urllib urlopen() with httpx.get() with a 30s timeout - Remove local file path support entirely (was an unrestricted read) - Reject any string input that is not an HTTP(S) URL --- .../plugins/processes/zonal_statistics.py | 24 +++++++++---------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/src/eo_api/routers/ogcapi/plugins/processes/zonal_statistics.py b/src/eo_api/routers/ogcapi/plugins/processes/zonal_statistics.py index 4cce0cb..9f6edaa 100644 --- a/src/eo_api/routers/ogcapi/plugins/processes/zonal_statistics.py +++ b/src/eo_api/routers/ogcapi/plugins/processes/zonal_statistics.py @@ -3,11 +3,9 @@ from __future__ import annotations import json -from pathlib import Path from typing import Any -from urllib.parse import urlparse -from urllib.request import urlopen +import httpx import numpy as np import rasterio from pydantic import ValidationError @@ -27,7 +25,7 @@ "inputs": { "geojson": { "title": "GeoJSON FeatureCollection", - "description": "FeatureCollection object or URI/path to GeoJSON file.", + "description": "FeatureCollection object or HTTP(S) URL to GeoJSON file.", "schema": {"oneOf": [{"type": "object"}, {"type": "string"}]}, "minOccurs": 1, "maxOccurs": 1, @@ -109,16 +107,16 @@ def _read_geojson_input(geojson_input: dict[str, Any] | str) -> dict[str, Any]: if isinstance(geojson_input, dict): return geojson_input - parsed = urlparse(geojson_input) - if parsed.scheme in {"http", "https"}: - with urlopen(geojson_input) as response: - payload = response.read().decode("utf-8") - return _parse_geojson_object(payload) + if not geojson_input.startswith(("http://", "https://")): + raise ProcessorExecuteError("GeoJSON string input must be an HTTP(S) URL") - path = Path(geojson_input) - if not path.exists(): - raise ProcessorExecuteError(f"GeoJSON file not found: {geojson_input}") - return _parse_geojson_object(path.read_text(encoding="utf-8")) + try: + response = httpx.get(geojson_input, timeout=30) + response.raise_for_status() + except httpx.HTTPError as err: + raise ProcessorExecuteError(f"Failed to fetch GeoJSON from URL: {err}") from err + + return _parse_geojson_object(response.text) def _parse_geojson_object(payload: str) -> dict[str, Any]: From 754f212cb9243aca1d63b21e2993474943222e06 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:45:21 +0100 Subject: [PATCH 03/10] fix: drop allow_credentials from CORS middleware allow_origins=["*"] + allow_credentials=True is a dangerous CORS combination. Credentials are not needed for this public data API. --- src/eo_api/main.py | 1 - 1 file changed, 1 deletion(-) diff --git a/src/eo_api/main.py b/src/eo_api/main.py index f99667a..e707022 100644 --- a/src/eo_api/main.py +++ b/src/eo_api/main.py @@ -12,7 +12,6 @@ app.add_middleware( CORSMiddleware, allow_origins=["*"], - allow_credentials=True, allow_methods=["*"], allow_headers=["*"], ) From 1aba1ed45c62a526c8749c1e702852257a38bfc4 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:46:19 +0100 Subject: [PATCH 04/10] fix: preserve exception chain in chirps3_dhis2_pipeline Change `from None` to `from e` so the original traceback is retained when ProcessorExecuteError is raised. --- .../routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/eo_api/routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py b/src/eo_api/routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py index 0351b3d..8cdb5d4 100644 --- a/src/eo_api/routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py +++ b/src/eo_api/routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py @@ -521,7 +521,7 @@ def execute(self, data: dict[str, Any], outputs: Any = None) -> tuple[str, dict[ "or increase DHIS2_HTTP_TIMEOUT_SECONDS." ) from None except Exception as e: - raise ProcessorExecuteError(str(e)) from None + raise ProcessorExecuteError(str(e)) from e finally: client.close() From 6f758a08a2824dc60a7c562c3f63955afea8557d Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:46:43 +0100 Subject: [PATCH 05/10] fix: make OGCAPI base URL configurable via environment variable Read OGCAPI_BASE_URL from env so Prefect workers outside the API container can reach the correct endpoint. --- src/eo_api/prefect_flows/tasks.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/eo_api/prefect_flows/tasks.py b/src/eo_api/prefect_flows/tasks.py index d25ce95..2a045ff 100644 --- a/src/eo_api/prefect_flows/tasks.py +++ b/src/eo_api/prefect_flows/tasks.py @@ -5,6 +5,7 @@ """ import logging +import os from pathlib import Path import httpx @@ -14,7 +15,7 @@ logger = logging.getLogger(__name__) -OGCAPI_BASE_URL = "http://localhost:8000/ogcapi" +OGCAPI_BASE_URL = os.getenv("OGCAPI_BASE_URL", "http://localhost:8000/ogcapi") PROCESS_TIMEOUT_SECONDS = 600 From 25b1050746672f8c8edacb1798379d0a642aef52 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:47:34 +0100 Subject: [PATCH 06/10] fix: replace assert with ValueError in serialize preview functions Asserts are stripped with python -O. These invariants guard real logic so they need to be proper exceptions. --- src/eo_api/datasets/serialize.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/eo_api/datasets/serialize.py b/src/eo_api/datasets/serialize.py index e7761d8..1077a00 100644 --- a/src/eo_api/datasets/serialize.py +++ b/src/eo_api/datasets/serialize.py @@ -38,7 +38,8 @@ def dataframe_to_preview(df: pd.DataFrame, dataset: dict[str, Any], period_type: temp_df = df[[time_dim, "id", varname]] temp_df[time_dim] = pandas_period_string(temp_df[time_dim], period_type) - assert len(temp_df[time_dim].unique()) == 1 + if len(temp_df[time_dim].unique()) != 1: + raise ValueError("dataframe_to_preview expects exactly one timestep") org_units = gpd.read_file(json.dumps(constants.ORG_UNITS_GEOJSON)) org_units_with_temp = org_units.merge(temp_df, on="id", how="left") @@ -67,7 +68,8 @@ def xarray_to_preview(ds: xr.Dataset, dataset: dict[str, Any], period_type: str) temp_ds = ds[[time_dim, varname]] temp_ds = temp_ds.assign_coords({time_dim: lambda x: numpy_period_array(x[time_dim].values, period_type)}) - assert len(temp_ds[time_dim].values) == 1 + if len(temp_ds[time_dim].values) != 1: + raise ValueError("xarray_to_preview expects exactly one timestep") fig = Figure() ax = fig.subplots() From f182ca254ae06a65620ceef5646f4d27498df74a Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:48:03 +0100 Subject: [PATCH 07/10] fix: cache registry YAML lookup with lru_cache YAML files don't change at runtime so there's no need to re-read and re-parse them on every request. --- src/eo_api/datasets/registry.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/eo_api/datasets/registry.py b/src/eo_api/datasets/registry.py index 03ea31f..debc697 100644 --- a/src/eo_api/datasets/registry.py +++ b/src/eo_api/datasets/registry.py @@ -1,5 +1,6 @@ """Dataset registry backed by YAML config files.""" +import functools import logging from pathlib import Path from typing import Any @@ -12,6 +13,7 @@ CONFIGS_DIR = SCRIPT_DIR / "registry" +@functools.lru_cache(maxsize=1) def list_datasets() -> list[dict[str, Any]]: """Load all YAML files in the registry folder and return a flat list of datasets.""" datasets: list[dict[str, Any]] = [] From 0c00b4ed9f4dd0dfea2996db0f06e88e96758f1a Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:49:49 +0100 Subject: [PATCH 08/10] fix: narrow exception handling in registry YAML loader Catch (yaml.YAMLError, OSError) instead of bare Exception so unexpected errors like TypeError from malformed data surface immediately instead of silently dropping datasets. --- src/eo_api/datasets/registry.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/eo_api/datasets/registry.py b/src/eo_api/datasets/registry.py index debc697..fae4a85 100644 --- a/src/eo_api/datasets/registry.py +++ b/src/eo_api/datasets/registry.py @@ -27,7 +27,7 @@ def list_datasets() -> list[dict[str, Any]]: with open(file_path, encoding="utf-8") as f: file_datasets = yaml.safe_load(f) datasets.extend(file_datasets) - except Exception: + except (yaml.YAMLError, OSError): logger.exception("Error loading %s", file_path.name) return datasets From a49cfcd5615565ee3f3b2d2c9ba90acee5bbdf30 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:51:13 +0100 Subject: [PATCH 09/10] fix: clean up temp file on netcdf write failure If to_netcdf() raises, remove the orphaned temp file before re-raising the exception. --- src/eo_api/datasets/serialize.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/eo_api/datasets/serialize.py b/src/eo_api/datasets/serialize.py index 1077a00..c6266a6 100644 --- a/src/eo_api/datasets/serialize.py +++ b/src/eo_api/datasets/serialize.py @@ -91,7 +91,11 @@ def xarray_to_temporary_netcdf(ds: xr.Dataset) -> str: fd = tempfile.NamedTemporaryFile(suffix=".nc", delete=False) path = fd.name fd.close() - ds.to_netcdf(path) + try: + ds.to_netcdf(path) + except Exception: + os.remove(path) + raise return path From 595e82037f12ab116b09428112fc8ce858aedf17 Mon Sep 17 00:00:00 2001 From: Morten Hansen Date: Thu, 26 Feb 2026 19:52:04 +0100 Subject: [PATCH 10/10] chore: remove temporary roadmap document --- ROADMAP.md | 241 ----------------------------------------------------- 1 file changed, 241 deletions(-) delete mode 100644 ROADMAP.md diff --git a/ROADMAP.md b/ROADMAP.md deleted file mode 100644 index 0622cd1..0000000 --- a/ROADMAP.md +++ /dev/null @@ -1,241 +0,0 @@ -# Code Review: Red Flags in `src/eo_api` - -## Context - -Critical review of the `src/eo_api` codebase looking for obvious red flags -- security issues, reliability problems, and architectural concerns that could cause real trouble. - ---- - -## Phased Roadmap - -### Phase 1: Security hardening (issues #1, #3, #4) - -**1a. Fix SSRF + path traversal in `zonal_statistics.py:108-121`** -- Replace `urlopen()` with `httpx.get()` using a timeout -- Remove local file path support entirely - -**1b. Fix CORS in `main.py:12-18`** -- Drop `allow_credentials=True` (the API does not use cookies or session auth) -- Keep `allow_origins=["*"]` since this is a public data API - -**1c. Replace dynamic import with allowlist in `cache.py:166-172`** -- Build a dict mapping function path strings to actual callables -- Look up from allowlist instead of `importlib.import_module` - -### Phase 2: Reliability (issues #2, #7, #8) - -**2a. Lazy-load DHIS2 constants in `constants.py:12-14`** -- Wrap the DHIS2 call + GeoDataFrame computation in a function with `@functools.lru_cache` -- Close the client after use -- Update `cache.py` and `serialize.py` imports to call the function instead of reading module-level vars - -**2b. Narrow exception handling in `registry.py:28`** -- Catch `(yaml.YAMLError, OSError)` instead of bare `Exception` -- Re-raise on unexpected errors so they surface at startup - -**2c. Preserve exception chains in `chirps3_dhis2_pipeline.py:523-524`** -- Change `from None` to `from e` - -### Phase 3: Correctness + cleanup (issues #5, #6, #9, #10) - -**3a. Make OGCAPI base URL configurable in `tasks.py:17`** -- `OGCAPI_BASE_URL = os.getenv("OGCAPI_BASE_URL", "http://localhost:8000/ogcapi")` - -**3b. Replace `assert` with exceptions in `serialize.py:41,70`** -- `if len(...) != 1: raise ValueError(...)` - -**3c. Cache registry lookup in `registry.py:34-37`** -- Add `@functools.lru_cache` to `list_datasets()` (YAML files don't change at runtime) -- `get_dataset()` uses the cached result - -**3d. Clean up temp file on write failure in `serialize.py:87-93`** -- Wrap `ds.to_netcdf(path)` in try/except, remove file on failure - -### Verification - -- `ruff check src/` and `ruff format --check src/` pass -- `mypy src/` passes -- `pyright` passes -- `pytest` passes - ---- - -## Detailed Findings - -## 1. SSRF + path traversal in zonal_statistics.py - -**File:** `routers/ogcapi/plugins/processes/zonal_statistics.py:108-121` - -`_read_geojson_input()` accepts user-supplied strings and will: -- Fetch arbitrary URLs via `urlopen()` with **no timeout** (can hang forever) -- Read arbitrary local files with **no path restriction** (e.g. `/etc/passwd`) - -```python -if parsed.scheme in {"http", "https"}: - with urlopen(geojson_input) as response: # SSRF, no timeout - payload = response.read().decode("utf-8") - -path = Path(geojson_input) # path traversal -``` - -**Fix:** Use httpx with a timeout, remove file-path support entirely. - ---- - -## 2. Module-level DHIS2 network call blocks import - -**File:** `datasets/constants.py:12-14` - -```python -client = create_client() -ORG_UNITS_GEOJSON = get_org_units_geojson(client, level=2) -BBOX = list(map(float, gpd.read_file(json.dumps(ORG_UNITS_GEOJSON)).total_bounds)) -``` - -- App startup blocks on DHIS2 being reachable -- no timeout, no fallback -- If DHIS2 is down, the app won't start at all -- The `client` is never closed -- Imported transitively by `datasets/cache.py` and `datasets/serialize.py`, so this runs even if you only touch the datasets module - -**Fix:** Lazy-load behind a function with `@lru_cache`, add timeout/fallback. - ---- - -## 3. CORS allows all origins with credentials - -**File:** `main.py:12-18` - -```python -allow_origins=["*"], -allow_credentials=True, -``` - -`allow_origins=["*"]` + `allow_credentials=True` is the one CORS combination that's explicitly dangerous -- it enables cross-origin credential theft. Browsers actually block this combo for `Access-Control-Allow-Credentials`, but misconfigured CORS middleware may reflect the `Origin` header instead of sending `*`, which **does** work and is exploitable. - -**Fix:** Drop `allow_credentials=True` since cookies/auth headers aren't needed cross-origin. - ---- - -## 4. Dynamic import from YAML config - -**File:** `datasets/cache.py:166-172` - -```python -def _get_dynamic_function(full_path: str) -> Callable[..., Any]: - module = importlib.import_module(module_path) - return getattr(module, function_name) -``` - -The function path comes from YAML files in `datasets/registry/`. If those files are ever editable by users or loaded from an untrusted source, this is arbitrary code execution. - -Currently low-risk since the YAML is checked into the repo, but the pattern is fragile. - -**Fix:** Use an explicit allowlist mapping instead of dynamic import. - ---- - -## 5. Hardcoded localhost URL for Prefect tasks - -**File:** `prefect_flows/tasks.py:17` - -```python -OGCAPI_BASE_URL = "http://localhost:8000/ogcapi" -``` - -Prefect tasks call back to the API over HTTP using a hardcoded localhost URL. This breaks in any deployment where Prefect workers run outside the API container. - -**Fix:** Read from environment variable with sensible default. - ---- - -## 6. `assert` used for runtime validation - -**File:** `datasets/serialize.py:41,70` - -```python -assert len(temp_df[time_dim].unique()) == 1 -assert len(temp_ds[time_dim].values) == 1 -``` - -These are stripped when Python runs with `-O`. If these invariants matter (and they do -- the preview functions assume a single timestep), they should be real exceptions. - -**Fix:** Replace with `if ... : raise ValueError(...)`. - ---- - -## 7. Silent data loss on YAML parse errors - -**File:** `datasets/registry.py:28` - -```python -except Exception: - logger.exception("Error loading %s", file_path.name) -``` - -If a YAML config file is corrupted or has a syntax error, the datasets in that file silently disappear from the registry. The bare `Exception` catch also swallows unexpected errors like `TypeError` from malformed data. - -**Fix:** Catch `(yaml.YAMLError, OSError)` specifically, and consider failing loudly on startup. - ---- - -## 8. Exception chain suppressed with `from None` - -**File:** `routers/ogcapi/plugins/processes/chirps3_dhis2_pipeline.py:523-524` - -```python -except Exception as e: - raise ProcessorExecuteError(str(e)) from None -``` - -`from None` suppresses the original traceback. When this pipeline fails for unexpected reasons, you'll get a flat error message with no stack trace to debug from. - -**Fix:** Use `from e` instead of `from None`. - ---- - -## 9. Registry rebuilt from disk on every request - -**File:** `datasets/registry.py:34-37` - -```python -def get_dataset(dataset_id: str) -> dict[str, Any] | None: - datasets_lookup = {d["id"]: d for d in list_datasets()} # reads all YAML files - return datasets_lookup.get(dataset_id) -``` - -Every call to `get_dataset()` reads and parses all YAML files from disk and rebuilds the lookup dict. This runs on every API request that involves a dataset. - -**Fix:** Cache with `@lru_cache` or build the lookup once at startup. - ---- - -## 10. Temp file leak on write failure - -**File:** `datasets/serialize.py:87-93` - -```python -fd = tempfile.NamedTemporaryFile(suffix=".nc", delete=False) -path = fd.name -fd.close() -ds.to_netcdf(path) # if this fails, orphaned file in /tmp -return path -``` - -If `to_netcdf()` raises, the temp file is never cleaned up. Not catastrophic since it's in `/tmp`, but worth a try/except that removes the file on failure. - ---- - -## Summary by severity - -| # | Issue | Impact | -|---|-------|--------| -| 1 | SSRF + path traversal | Security: can read local files, make internal requests | -| 2 | Module-level DHIS2 call | Reliability: app won't start if DHIS2 is down | -| 3 | CORS misconfiguration | Security: potential cross-origin credential leak | -| 4 | Dynamic import from YAML | Security: fragile pattern, code exec if YAML untrusted | -| 5 | Hardcoded localhost URL | Deployment: breaks non-local Prefect setups | -| 6 | assert for validation | Correctness: stripped with `-O` flag | -| 7 | Silent YAML error swallowing | Reliability: datasets silently disappear | -| 8 | Exception chain suppressed | Debuggability: no stack trace on failures | -| 9 | Registry rebuilt per request | Performance: unnecessary disk I/O | -| 10 | Temp file leak | Cleanliness: orphaned files on failure |