Release v0.8.0: security hardening and database mirrors by neuromechanist · Pull Request #252 · OpenScience-Collective/osa

neuromechanist · 2026-03-08T02:05:41Z

Summary

Security hardening: logging sanitization, cost protection limits, SSRF prevention, model validation (#248)
Ephemeral database mirrors for developer workflow: new API routes, CLI commands, and knowledge mirror module (#251)
Comprehensive PR review fixes: hardened error handling, type safety, updated model pricing
Version bump to 0.8.0

Changes

Security and Error Handling

Enhanced logging to prevent sensitive data exposure (SecureFormatter with API key redaction)
Added cost protection guards with warn/block thresholds for platform API keys
SSRF mitigation for URL fetching, stricter model validation
Fixed bare except Exception in mirror middleware with specific error handlers
Fixed path traversal vulnerability in _get_production_db_path (defense-in-depth)
Added CorruptMirrorError to distinguish corrupt metadata from not-found
Let OSError propagate from delete_mirror instead of masking as "not found"
Fixed SecureJSONFormatter fallback path to redact API keys
run_sync_now raises ValueError for unknown sync types instead of silent empty return
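
The key redaction item above can be illustrated with a small standard-library sketch. This is not the project's `SecureFormatter`: the class name `RedactingFormatter` and the `sk-` key pattern are assumptions made for illustration, and only the `***[key-redacted]` placeholder is taken from the commit messages in this PR.

```python
import logging
import re

# Assumed pattern: "sk-" style keys such as "sk-or-v1-..." or "sk-ant-...".
# The project's real patterns are not shown in this PR description.
_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")
_PLACEHOLDER = "***[key-redacted]"


class RedactingFormatter(logging.Formatter):
    """Hypothetical formatter that masks API-key-like substrings."""

    def format(self, record: logging.LogRecord) -> str:
        # Render first, so keys passed as %-style arguments are covered too.
        rendered = super().format(record)
        return _KEY_PATTERN.sub(_PLACEHOLDER, rendered)


def install_redacting_handler(logger: logging.Logger) -> logging.Handler:
    """Attach a stream handler that uses the redacting formatter."""
    handler = logging.StreamHandler()
    handler.setFormatter(RedactingFormatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler
```

Redacting the fully rendered line, rather than `record.msg` alone, is the design choice worth noting: a key interpolated through logging arguments would otherwise slip past the filter.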

Mirror System

New /mirrors/ API router for CRUD mirror management
osa mirror CLI subcommand (create, list, info, delete, refresh, sync, pull)
src/knowledge/mirror.py module for ephemeral local copies of production databases
Fixed ContextVar propagation for Python 3.11 compatibility (copy_context + run_in_executor)
Added active_mirror_context context manager for safe mirror routing
Validate-on-set for set_active_mirror (fail fast on invalid IDs)
Error when refresh_mirror refreshes zero communities
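
A minimal sketch of the two ContextVar items above, assuming a module-level variable and a simple alphanumeric ID rule (both are stand-ins; the project's real validation and variable layout are not shown here). `run_blocking_with_context` is a hypothetical helper name. It shows why `copy_context` is paired with `run_in_executor`: the executor does not carry ContextVar values into the worker thread on its own.

```python
import asyncio
import contextvars
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical module-level state holding the mirror ID for the current request.
_active_mirror_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "active_mirror_id", default=None
)


def get_active_mirror() -> Optional[str]:
    return _active_mirror_id.get()


@contextmanager
def active_mirror_context(mirror_id: str) -> Iterator[None]:
    """Route DB access to `mirror_id` inside the block, then restore the old value."""
    if not mirror_id.isalnum():  # stand-in for the project's real ID validation
        raise ValueError(f"invalid mirror id: {mirror_id!r}")
    token = _active_mirror_id.set(mirror_id)
    try:
        yield
    finally:
        _active_mirror_id.reset(token)  # runs even if the body raises


async def run_blocking_with_context(func, *args):
    """Run a blocking callable in the default executor with the caller's context."""
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    # Wrap the call in ctx.run so the worker thread sees the caller's ContextVars.
    return await loop.run_in_executor(None, ctx.run, func, *args)


async def demo() -> Optional[str]:
    with active_mirror_context("abc123"):
        return await run_blocking_with_context(get_active_mirror)
```

Calling `asyncio.run(demo())` returns the mirror ID as seen from the worker thread, and the variable falls back to `None` once the `with` block exits.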

Type Improvements

MirrorInfo uses datetime objects instead of strings with __post_init__ validation
ModelRate NamedTuple for self-documenting pricing entries (input_per_1m, output_per_1m)
Shared is_safe_identifier() utility (eliminates 5x duplicated validation pattern)
Public get_mirror_db_path() (stops cross-module import of private _get_mirror_dir)
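
To make the `ModelRate` and `is_safe_identifier()` items concrete, here is a hedged sketch. The field names `input_per_1m` and `output_per_1m` come from the list above; the model names, prices, the identifier rule, and the `estimate_cost` helper are placeholders invented for illustration.

```python
import re
from typing import NamedTuple


class ModelRate(NamedTuple):
    """Price in USD per one million tokens."""

    input_per_1m: float
    output_per_1m: float


# Placeholder entries with made-up prices, for illustration only.
MODEL_PRICING: dict[str, ModelRate] = {
    "example/small-model": ModelRate(input_per_1m=0.5, output_per_1m=1.5),
    "example/large-model": ModelRate(input_per_1m=20.0, output_per_1m=60.0),
}

# Assumed rule: letters, digits, underscore, hyphen. The real rule is not in this PR text.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_identifier(value: str) -> bool:
    """Return True if `value` cannot act as a path component like '..' or 'a/b'."""
    return bool(_SAFE_ID.fullmatch(value))


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost in USD for one request."""
    rate = MODEL_PRICING[model]
    return (input_tokens * rate.input_per_1m + output_tokens * rate.output_per_1m) / 1_000_000
```

Named fields remove the ambiguity of a bare `(0.5, 1.5)` tuple, and `fullmatch` avoids the trailing-newline edge case that `$` anchors allow.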

Code Quality

Extracted CLI error handling into _handle_api_errors context manager (7 repetitions removed)
Removed unnecessary _get_user_id helper and duplicate _validate_mirror_id
Added cleanup failure tracking with consecutive failure counter
MirrorSyncResponse.items_synced uses Field(default_factory=dict)
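
The first item above, folding repeated CLI error handling into one context manager, might look roughly like this. The `APIError` class, the message wording, and the exit code are assumptions; the project's `_handle_api_errors` and its HTTP client exceptions are not shown in this PR text.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


class APIError(Exception):
    """Hypothetical error raised by an API client on a non-2xx response."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


@contextmanager
def handle_api_errors(action: str) -> Iterator[None]:
    """Turn client exceptions into a one-line message and a non-zero exit."""
    try:
        yield
    except APIError as exc:
        print(f"Error while {action}: {exc.detail} (HTTP {exc.status})", file=sys.stderr)
        raise SystemExit(1) from exc
    except ConnectionError as exc:
        print(f"Could not reach the server while {action}: {exc}", file=sys.stderr)
        raise SystemExit(1) from exc


def delete_mirror_command(mirror_id: str) -> None:
    """Example command body; the raise simulates a failed API call."""
    with handle_api_errors("deleting mirror"):
        raise APIError(404, f"mirror {mirror_id} not found")
```

Each command then wraps only its API call in the `with` block, so the try/except boilerplate lives in one place.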

Model Updates (2026-03)

Updated MODEL_PRICING with ~50 models from all providers (was ~16)
Added Claude 4.5/4.6, GPT-5.x, Gemini 3.x, DeepSeek V3.2, Qwen 3.5
Updated direct API model mappings: OPENAI_MODELS (GPT-5.x, 4.1, o3/o4), ANTHROPIC_MODELS (Claude 4.x/4.5/4.6)
Added Claude 4.6 to CACHEABLE_MODELS for prompt caching
Updated widget model selector with current top models

Test plan

All 48 mirror and cost protection tests passing
Full test suite: 1664 passed, 0 new failures
CI tests passing on develop
Docker build succeeding
Ruff lint and format clean
Verify mirror CLI commands work end-to-end on dev server
Verify widget model selector shows updated models

…248)

* Security hardening: logging, cost protection
  - Wire up SecureFormatter in app startup (#65): call configure_secure_logging() before any logging occurs
  - Add cost manipulation protection (#67): block models above $15/1M input tokens on platform/community keys, warn above $5/1M; BYOK users unrestricted
  - Verified SSRF protection (#66) and model validation (#68) already have comprehensive test coverage

  Closes #65, closes #66, closes #67, closes #68
* Address PR review findings
  - Fix misleading "fallback rate" comment in _check_model_cost
  - Add logging for unknown models (operator visibility)
  - Extract _models_by_cost() test helper to reduce duplication
  - Add boundary test at exact block threshold
  - Add BYOK + unknown model test
  - Assert BYOK guidance in error message
  - Fix module docstring wording
* Fix SecureJSONFormatter broad exception catch
  Split the catch-all Exception handler into specific expected errors (ValueError, TypeError, KeyError) that include context for debugging, and unexpected errors that re-raise after printing to stderr. Matches the pattern already used in SecureFormatter.format().
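
The warn/block rule described in the first commit can be sketched as follows. The two thresholds come from the commit message; the `KeySource` enum, the `ModelTooExpensiveError` type, the function signature, and the choice of a strict `>` at the exact threshold are assumptions, not the project's `_check_model_cost`.

```python
import logging
from enum import Enum

logger = logging.getLogger("cost_protection_sketch")

WARN_INPUT_PER_1M = 5.0    # USD per 1M input tokens, from the commit message
BLOCK_INPUT_PER_1M = 15.0  # USD per 1M input tokens, from the commit message


class KeySource(Enum):
    PLATFORM = "platform"
    COMMUNITY = "community"
    BYOK = "byok"  # bring your own key


class ModelTooExpensiveError(Exception):
    """Hypothetical error type; the project's real exception is not shown."""


def check_model_cost(model: str, input_per_1m: float, key_source: KeySource) -> None:
    """Raise if `model` is too expensive for a shared key; warn if it is merely pricey."""
    if key_source is KeySource.BYOK:
        return  # users paying with their own key are unrestricted
    if input_per_1m > BLOCK_INPUT_PER_1M:  # strict ">" at the boundary is an assumption
        raise ModelTooExpensiveError(
            f"{model} costs ${input_per_1m:.2f}/1M input tokens, above the "
            f"${BLOCK_INPUT_PER_1M:.2f} limit for shared keys; supply your own API key to use it"
        )
    if input_per_1m > WARN_INPUT_PER_1M:
        logger.warning(
            "%s costs $%.2f/1M input tokens on a %s key", model, input_per_1m, key_source.value
        )
```

Checking the key source first keeps the BYOK path free of any pricing lookup, which matches the stated intent that only shared keys are protected.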

* Add ephemeral database mirrors for developer workflow
  ContextVar-based DB routing lets developers work on isolated copies of community SQLite databases via X-Mirror-ID header. Includes REST API, CLI commands (osa mirror create/list/sync/pull), auto-cleanup scheduler, and per-user rate limits. Replaces issue #219 (ephemeral backends).

  Closes #219
* Address PR review: security, error handling, test coverage
  - Fix path traversal: validate mirror_id in _get_mirror_dir
  - Handle corrupt metadata gracefully (return None, don't crash)
  - Use asyncio.to_thread for blocking sync in async endpoint
  - Fix middleware to skip ContextVar for non-mirror requests
  - Add community_id format validation on CreateMirrorRequest
  - Use Literal type for sync_type, remove invalid 'discourse'
  - Sanitize error messages in sync endpoint (no raw exceptions)
  - Fix CLI pull to exit non-zero on partial download failure
  - Add connection error handling to info/delete/refresh commands
  - Use temp file + rename for download_mirror_db (atomic writes)
  - Remove dead config settings (not wired to mirror module)
  - Fix download endpoint to import from knowledge, not CLI
  - Fix inaccurate docstrings (list_mirrors, create_mirror)
  - Add 14 new tests: path traversal, corrupt metadata, cleanup
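
The "temp file + rename" item can be illustrated with a standard-library sketch. `atomic_write_bytes` is a hypothetical helper, not the project's `download_mirror_db`; it shows the general pattern of writing to a temporary file in the destination directory and swapping it into place with `os.replace`.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(dest: Path, data: bytes) -> None:
    """Write `data` to `dest` so readers never observe a partial file."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the destination so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, dest)  # replaces any existing file in one step
    except BaseException:
        # Clean up the partial file, then let the original error propagate.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If a download is interrupted, the destination keeps its previous contents (or stays absent) instead of holding a truncated SQLite file.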

- Fix path traversal in _get_production_db_path (validate community_id)
- Replace bare except Exception in mirror middleware with specific handlers
- Fix ContextVar propagation for Python 3.11 (copy_context + run_in_executor)
- Add CorruptMirrorError to distinguish corrupt metadata from not-found
- Let OSError propagate from delete_mirror instead of returning False
- Error when refresh_mirror refreshes zero communities
- Add validate-on-set for set_active_mirror (fail fast on bad IDs)
- Add active_mirror_context context manager for safe mirror routing
- Use datetime objects in MirrorInfo instead of strings
- Extract shared is_safe_identifier and _validate_community_id
- Remove duplicate _validate_mirror_id from db.py (import from mirror.py)
- Add public get_mirror_db_path (stop importing private _get_mirror_dir)
- Use ModelRate NamedTuple for pricing entries (self-documenting fields)
- Upgrade unknown-model cost check from INFO to WARNING
- Fix SecureJSONFormatter fallback path to redact API keys
- Extract CLI error handling into _handle_api_errors context manager
- Remove unnecessary _get_user_id helper
- Update MODEL_PRICING with latest models (2026-03)
- Update widget model selector with latest models
- Add MirrorSyncResponse.items_synced default_factory
- Add cleanup failure tracking in _cleanup_mirrors
- run_sync_now raises ValueError for unknown sync types
- Fix cleanup to log instead of ignore_errors=True
- Update comment about ReDoS to accurate description
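
As an illustration of the path traversal fix in the first item, a defense-in-depth check could combine identifier validation with a resolved-path containment test. The function name `production_db_path`, the `.db` file layout, and the identifier rule are assumptions; the project's `_get_production_db_path` is not reproduced here.

```python
import re
from pathlib import Path

# Assumed identifier rule: letters, digits, underscore, hyphen.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def production_db_path(base_dir: Path, community_id: str) -> Path:
    """Return the database path for `community_id`, refusing anything outside `base_dir`."""
    # First layer: reject IDs that could contain separators or '..'.
    if not _SAFE_ID.fullmatch(community_id):
        raise ValueError(f"invalid community id: {community_id!r}")
    base = base_dir.resolve()
    candidate = (base / f"{community_id}.db").resolve()
    # Second layer: even if the rule above is loosened later, refuse escapes.
    if base not in candidate.parents:
        raise ValueError("resolved path escapes the data directory")
    return candidate
```

The second check is redundant while the identifier rule holds, which is the point of defense-in-depth: either layer alone stops an input such as `../secrets`.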

github-actions · 2026-03-08T02:23:08Z

Preview Deployment

| Name | Link |
|---|---|
| Preview URL | https://develop-demo.osc.earth |
| Branch | `develop` |
| Commit | `0b9b022` |

This preview will be updated automatically when you push new commits.

Update OPENAI_MODELS (add GPT-5.x, GPT-4.1, o3/o4; remove GPT-3.5/4), ANTHROPIC_MODELS (add Claude 4.x/4.5/4.6; remove old Claude 3 entries), and CACHEABLE_MODELS (add Claude 4.6 Opus/Sonnet). Fix corresponding tests for updated model names and pricing.

* Remove broken HED and EEGLAB doc entry, re-enable URL test
  The HedAndEEGLAB document was removed from the upstream hed-resources repository. Remove the dead documentation entry from the HED community config and re-enable the URL accessibility test that was skipped because of this broken link.

  Closes #139
* Remove broken javascriptTests.json doc entry
  Another upstream HED URL (hed-specification/tests/javascriptTests.json) returns 404. Remove this dead entry as well.

* Improve MNE system prompt: data types and uncertainty handling
  Add eye-tracking to supported modalities, add explicit uncertainty handling instructions to prevent confabulation, and add the eyetracking tutorial to documentation sources.

  Closes #250
* Address review: soften uncertainty handling, update description
  Soften uncertainty instructions to preserve helpfulness while adding verification emphasis. Update the description field to include eye-tracking for consistency with the system prompt.

* Refactor auth tests to use real communities, remove mocks
  Replace MagicMock and @patch with real community configurations loaded via discover_assistants(). Use monkeypatch for environment variables (real Settings reads them). All test scenarios preserved.

  Closes #85
* Address review: use dynamic config values, fixture for discovery
  Derive test origins and model values from loaded community configs instead of hardcoding them. Move discover_assistants() into a module-scoped fixture for better error reporting. Add helper functions for config access.

* Address PR #252 review findings: error handling, types, tests

  Error handling:
  - Add CorruptMirrorError/ValueError handling to all mirror endpoints
  - Block unknown models on platform/community keys (fail-closed)
  - Add OSError handling to create_mirror_endpoint
  - Make cleanup_expired_mirrors resilient to per-mirror failures
  - Narrow scheduler cleanup catch to expected exception types
  - Add field_validator to RefreshMirrorRequest.community_ids

  Type design:
  - Make MirrorInfo a frozen dataclass with tuple community_ids
  - Move is_safe_identifier to src/core/validation.py (shared utility)
  - Add non-negativity validation to MODEL_PRICING at import time
  - Expand SecureFormatter key patterns for Anthropic/OpenAI keys

  Code quality:
  - Replace deprecated asyncio.get_event_loop() with get_running_loop()
  - Fix ContextVar comment accuracy (request lifecycle, not per-task)
  - Use get_active_mirror() instead of _active_mirror_id.get()
  - Fix docstring inaccuracies (caching, asyncio, model names)

  Tests:
  - Add active_mirror_context tests (set/reset, exception safety)
  - Add MirrorInfo invariant tests (empty ids, invalid id, immutability)
  - Add serialization round-trip test
  - Add TTL clamping test
  - Add run_sync_now invalid sync_type test
  - Update cost protection test for fail-closed behavior

  Closes #256
* Use generic redaction placeholder, remove misleading __all__
  - Change redaction string from "sk-or-v1-***[redacted]" to "***[key-redacted]" since the pattern now covers multiple providers
  - Remove __all__ from mirror.py since no callers use wildcard imports from that module (is_safe_identifier now lives in core.validation)
* Add ValueError catch to delete endpoint, validate community IDs
  - Add missing ValueError handling in delete_mirror_endpoint for consistency with all other mirror endpoints
  - Add community ID validation in MirrorInfo.__post_init__ so corrupt metadata with path-traversal community IDs is caught at load time
  - Document CorruptMirrorError in refresh_mirror docstring
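
The "frozen dataclass with tuple community_ids" and `__post_init__` validation items could be sketched like this. Only the class name `MirrorInfo` and the `community_ids` field come from the commit messages; the other field names (`mirror_id`, `created_at`, `expires_at`) and the identifier rule are assumptions for illustration.

```python
import re
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timedelta, timezone

# Assumed identifier rule: letters, digits, underscore, hyphen.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


@dataclass(frozen=True)
class MirrorInfo:
    """Hypothetical shape of mirror metadata; validated once, then immutable."""

    mirror_id: str
    community_ids: tuple[str, ...]
    created_at: datetime
    expires_at: datetime

    def __post_init__(self) -> None:
        if not _SAFE_ID.fullmatch(self.mirror_id):
            raise ValueError(f"invalid mirror id: {self.mirror_id!r}")
        if not isinstance(self.community_ids, tuple) or not self.community_ids:
            raise ValueError("community_ids must be a non-empty tuple")
        for cid in self.community_ids:
            # Catches path-traversal IDs in corrupt metadata at load time.
            if not _SAFE_ID.fullmatch(cid):
                raise ValueError(f"invalid community id: {cid!r}")
        for name in ("created_at", "expires_at"):
            if not isinstance(getattr(self, name), datetime):
                raise ValueError(f"{name} must be a datetime")
```

Because the instance is frozen and holds a tuple, code that receives a `MirrorInfo` can rely on the invariants checked at construction for the object's whole lifetime.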

neuromechanist added 3 commits March 3, 2026 16:15

Bump version to 0.8.0 for release

45812cf

neuromechanist deployed to testpypi March 8, 2026 02:05 with GitHub Actions

neuromechanist added 4 commits March 7, 2026 18:26

neuromechanist mentioned this pull request Mar 9, 2026

Address PR #252 review findings: error handling, tests, type design #256

Closed

20 tasks

neuromechanist mentioned this pull request Mar 9, 2026

Address PR #252 review findings: error handling, types, tests #257

Merged

4 tasks

neuromechanist added 2 commits March 9, 2026 13:31

Merge branch 'main' into develop

a2d8788

neuromechanist merged commit ccb5905 into main Mar 9, 2026
20 checks passed

neuromechanist merged 10 commits into main from develop
