🔒 [security fix: replace ast.literal_eval with json.loads]#2611
🔒 [security fix: replace ast.literal_eval with json.loads]#2611SatoryKono wants to merge 4 commits intomainfrom
Conversation
🎯 **What:** Replaced unsafe `ast.literal_eval` with `json.loads` for parsing composite metadata strings in `src/bioetl/domain/services/composite_metadata_helpers.py`. Updated producers in `src/bioetl/application/composite/merger_metrics_mixin.py` to use `json.dumps` for serialization.⚠️ **Risk:** `ast.literal_eval`, while safer than `eval`, can still be abused and is slower for parsing JSON-compatible data. It could potentially lead to performance issues or unexpected behavior when handling untrusted string payloads. **Note:** This change is backward-incompatible with metadata previously serialized using Python `str()` (single quotes). 🛡️ **Solution:** Switched to standard `json` module for both serialization and deserialization of metadata columns, ensuring a safer and more standard approach to handling structured string data. Co-authored-by: SatoryKono <13055362+SatoryKono@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughComposite metadata parsing now prefers JSON and falls back to legacy Python-literal parsing; lineage fields are stored as JSON strings in the merger DataFrame; new unit tests cover parsing behavior; CI pip-audit is run against an exported requirements file; several generated/test artifact files were removed. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant Parser as composite_metadata_helpers._parse_literal
participant Merger as merger_metrics_mixin._add_lineage
participant DF as DataFrame Storage
Caller->>Parser: provide metadata payload (string/object)
alt payload is string
Parser->>Parser: try json.loads(payload)
alt json.loads succeeds
Parser-->>Caller: return parsed structure
else json.loads fails (ValueError)
Parser->>Parser: fallback ast.literal_eval
alt fallback succeeds
Parser-->>Caller: return parsed structure
else fallback fails or MemoryError
Parser-->>Caller: return None
end
end
else payload is non-string/object
Parser-->>Caller: return input or None
end
Caller->>Merger: call _add_lineage(parsed structure)
Merger->>Merger: build status_dict and sources_used
Merger->>Merger: json.dumps(status_dict) and json.dumps(sources_used)
Merger->>DF: write JSON strings into `_enrichment_status` and `_source_providers` columns
Merger-->>Caller: lineage recorded
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1b053ccd-cee3-4ec7-aa17-2378bedeb5ee
📒 Files selected for processing (3)
src/bioetl/application/composite/merger_metrics_mixin.pysrc/bioetl/domain/services/composite_metadata_helpers.pytests/unit/domain/services/test_composite_metadata_helpers.py
🎯 **What:** - Replaced unsafe `ast.literal_eval` with `json.loads` for parsing composite metadata strings in `src/bioetl/domain/services/composite_metadata_helpers.py`. - Updated producers in `src/bioetl/application/composite/merger_metrics_mixin.py` to use `json.dumps` for serialization. - Added comprehensive unit tests for metadata parsing. - Fixed CI `root-hygiene` by removing untracked files from the repository root. - Fixed CI `pip-audit` by correctly exporting requirements with hashes before auditing.⚠️ **Risk:** `ast.literal_eval`, while safer than `eval`, can still be abused and is slower for parsing JSON-compatible data. This change is backward-incompatible with metadata previously serialized using Python `str()` (single quotes). 🛡️ **Solution:** Switched to standard `json` module for both serialization and deserialization of metadata columns, ensuring a safer and more standard approach to handling structured string data. Cleaned up the workspace and fixed CI configuration issues. Co-authored-by: SatoryKono <13055362+SatoryKono@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/security.yml (2)
12-13: Workflow changes won't self-test due topaths-ignore.The
paths-ignoreincludes.github/workflows/**, meaning changes to this security workflow file itself won't trigger the workflow. While this prevents recursive triggers, it also means workflow changes aren't validated until merged. Consider removing the workflow path exclusion or using a separate trigger strategy for workflow files.Also applies to: 20-21
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/security.yml around lines 12 - 13, The workflow's paths-ignore currently excludes the pattern '.github/workflows/**' so changes to the workflow won't trigger its own runs; update the workflow configuration by removing that exclusion from the paths-ignore list (or replace it with a safer strategy such as adding an explicit workflow_dispatch trigger or moving workflow self-tests into a separate workflow that does not ignore '.github/workflows/**') so edits to the workflow file will be tested; look for the 'paths-ignore' key and the string '.github/workflows/**' in the workflow definition and modify accordingly.
59-60: CI/local divergence: Makefilesecuritytarget doesn't use hash verification.The CI now runs
pip-audit -r requirements-audit.txt --require-hashes, but the Makefile'ssecuritytarget (line 266) runs$(RUN) pip-audit --skip-editablewithout the requirements file or hash verification. Developers runningmake securitylocally won't catch hash validation failures that CI will catch.Consider updating the Makefile target to align with CI:
security: ## Run security audit (osv-scanner + pip-audit) `@echo` "$(BLUE)Running security audit...$(NC)" `@echo` "$(BLUE)Running osv-scanner (primary, supports uv.lock)...$(NC)" `@which` osv-scanner >/dev/null 2>&1 && osv-scanner scan . || echo "$(YELLOW)osv-scanner not installed. Install from: https://github.com/google/osv-scanner$(NC)" `@echo` "$(BLUE)Running pip-audit (secondary)...$(NC)" $(RUN) uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt $(RUN) pip-audit -r requirements-audit.txt --require-hashes --disable-pip --strict🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/security.yml around lines 59 - 60, The Makefile's security target (target name "security", currently calling "$(RUN) pip-audit --skip-editable") diverges from CI which runs pip-audit with an exported requirements file and hash verification; update the "security" target to export a requirements-audit.txt (use "uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt" or equivalent) and then run pip-audit against that file with the same flags CI uses: include "-r requirements-audit.txt --require-hashes --disable-pip --strict" so local "make security" matches the CI behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/security.yml:
- Around line 56-57: The step running the pipeline command `uv export --format
requirements-txt | grep -v '^-e \.' > requirements-audit.txt` can silently
produce an empty requirements-audit.txt if `uv export` fails; update the
workflow run step to enable strict pipe failure handling (e.g. set -o pipefail)
or otherwise ensure the shell fails when `uv export` fails so the job stops and
does not create an empty file—modify the run wrapper for that command (the line
invoking `uv export --format requirements-txt | grep -v '^-e \.' >
requirements-audit.txt`) to enable pipefail or add explicit error checking so
failures are surfaced.
---
Nitpick comments:
In @.github/workflows/security.yml:
- Around line 12-13: The workflow's paths-ignore currently excludes the pattern
'.github/workflows/**' so changes to the workflow won't trigger its own runs;
update the workflow configuration by removing that exclusion from the
paths-ignore list (or replace it with a safer strategy such as adding an
explicit workflow_dispatch trigger or moving workflow self-tests into a separate
workflow that does not ignore '.github/workflows/**') so edits to the workflow
file will be tested; look for the 'paths-ignore' key and the string
'.github/workflows/**' in the workflow definition and modify accordingly.
- Around line 59-60: The Makefile's security target (target name "security",
currently calling "$(RUN) pip-audit --skip-editable") diverges from CI which
runs pip-audit with an exported requirements file and hash verification; update
the "security" target to export a requirements-audit.txt (use "uv export
--format requirements-txt | grep -v '^-e \.' > requirements-audit.txt" or
equivalent) and then run pip-audit against that file with the same flags CI
uses: include "-r requirements-audit.txt --require-hashes --disable-pip
--strict" so local "make security" matches the CI behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5f10f59c-f107-416a-b5c3-90b5d1004a0e
📒 Files selected for processing (2)
.github/workflows/security.ymlrequirements-audit.txt
| - name: Export requirements for audit | ||
| run: uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt |
There was a problem hiding this comment.
Add error handling to prevent silent failures.
If uv export fails, the pipeline continues because grep returns 0 on empty input, producing an empty requirements-audit.txt. This could cause pip-audit to pass silently without actually auditing any dependencies.
Proposed fix using `set -o pipefail`
- name: Export requirements for audit
- run: uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt
+ run: |
+ set -o pipefail
+ uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Export requirements for audit | |
| run: uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt | |
| - name: Export requirements for audit | |
| run: | | |
| set -o pipefail | |
| uv export --format requirements-txt | grep -v '^-e \.' > requirements-audit.txt |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/security.yml around lines 56 - 57, The step running the
pipeline command `uv export --format requirements-txt | grep -v '^-e \.' >
requirements-audit.txt` can silently produce an empty requirements-audit.txt if
`uv export` fails; update the workflow run step to enable strict pipe failure
handling (e.g. set -o pipefail) or otherwise ensure the shell fails when `uv
export` fails so the job stops and does not create an empty file—modify the run
wrapper for that command (the line invoking `uv export --format requirements-txt
| grep -v '^-e \.' > requirements-audit.txt`) to enable pipefail or add explicit
error checking so failures are surfaced.
Co-authored-by: SatoryKono <13055362+SatoryKono@users.noreply.github.com>
…d CI hygiene Co-authored-by: SatoryKono <13055362+SatoryKono@users.noreply.github.com>
🎯 What: Replaced unsafe
ast.literal_evalwithjson.loadsfor parsing composite metadata strings insrc/bioetl/domain/services/composite_metadata_helpers.py. Updated producers insrc/bioetl/application/composite/merger_metrics_mixin.pyto usejson.dumpsfor serialization.ast.literal_eval, while safer thaneval, can still be abused and is slower for parsing JSON-compatible data. It could potentially lead to performance issues or unexpected behavior when handling untrusted string payloads. Note: This change is backward-incompatible with metadata previously serialized using Pythonstr()(single quotes).🛡️ Solution: Switched to standard
jsonmodule for both serialization and deserialization of metadata columns, ensuring a safer and more standard approach to handling structured string data.PR created automatically by Jules for task 17293590732592342983 started by @SatoryKono
Summary by CodeRabbit