Release v0.8.0: security hardening and database mirrors #252

Merged: neuromechanist merged 10 commits into main from develop on Mar 9, 2026
neuromechanist commented Mar 8, 2026

Summary

Changes

Security and Error Handling

  • Enhanced logging to prevent sensitive data exposure (SecureFormatter with API key redaction)
  • Added cost protection guards with warn/block thresholds for platform API keys
  • SSRF mitigation for URL fetching, stricter model validation
  • Fixed bare except Exception in mirror middleware with specific error handlers
  • Fixed path traversal vulnerability in _get_production_db_path (defense-in-depth)
  • Added CorruptMirrorError to distinguish corrupt metadata from not-found
  • Let OSError propagate from delete_mirror instead of masking as "not found"
  • Fixed SecureJSONFormatter fallback path to redact API keys
  • run_sync_now raises ValueError for unknown sync types instead of silent empty return
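
As a rough illustration of the API key redaction in the first bullet, a formatter along these lines could mask key-like tokens before a log line is emitted. The class name `RedactingFormatter`, the regex, and the `redact` helper are assumptions made for this sketch; the project's actual SecureFormatter patterns are not shown in this PR.

```python
import logging
import re

# Assumed pattern: "sk-" followed by a long run of key characters.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{16,}")
PLACEHOLDER = "***[key-redacted]"


def redact(text: str) -> str:
    """Replace anything that looks like an API key with a placeholder."""
    return KEY_PATTERN.sub(PLACEHOLDER, text)


class RedactingFormatter(logging.Formatter):
    """Hypothetical formatter that redacts keys from the final log line."""

    def format(self, record: logging.LogRecord) -> str:
        # Format first so keys passed as %-style args are covered too.
        return redact(super().format(record))
```

Redacting the fully formatted string, rather than `record.msg` alone, also catches keys that arrive through log arguments or exception text.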

Mirror System

  • New /mirrors/ API router for CRUD mirror management
  • osa mirror CLI subcommand (create, list, info, delete, refresh, sync, pull)
  • src/knowledge/mirror.py module for ephemeral local copies of production databases
  • Fixed ContextVar propagation for Python 3.11 compatibility (copy_context + run_in_executor)
  • Added active_mirror_context context manager for safe mirror routing
  • Validate-on-set for set_active_mirror (fail fast on invalid IDs)
  • Error when refresh_mirror refreshes zero communities

Type Improvements

  • MirrorInfo uses datetime objects instead of strings with __post_init__ validation
  • ModelRate NamedTuple for self-documenting pricing entries (input_per_1m, output_per_1m)
  • Shared is_safe_identifier() utility (eliminates 5x duplicated validation pattern)
  • Public get_mirror_db_path() (stops cross-module import of private _get_mirror_dir)
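
A shared identifier check plus a public path helper might look roughly like this. The allowed character set, the 64-character limit, the `base_dir` parameter, and the `<mirror_id>/<community_id>.db` layout are all assumptions for illustration; the PR only names `is_safe_identifier()` and `get_mirror_db_path()`.

```python
import re
from pathlib import Path

# Assumed rule: letters, digits, underscore, hyphen; 1 to 64 characters.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_safe_identifier(value: str) -> bool:
    """Return True if value is safe to use as a single path component."""
    return isinstance(value, str) and _SAFE_ID.fullmatch(value) is not None


def get_mirror_db_path(base_dir: Path, mirror_id: str, community_id: str) -> Path:
    """Build the DB path for a mirror, rejecting unsafe identifiers."""
    for name, value in (("mirror_id", mirror_id), ("community_id", community_id)):
        if not is_safe_identifier(value):
            raise ValueError(f"Invalid {name}: {value!r}")
    path = (base_dir / mirror_id / f"{community_id}.db").resolve()
    # Defense in depth: the resolved path must still sit under base_dir.
    if not path.is_relative_to(base_dir.resolve()):
        raise ValueError("Resolved path escapes the mirror directory")
    return path
```

The second check is redundant when the identifier rule holds, which is the point of defense-in-depth: a later change to the regex cannot silently reopen path traversal.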

Code Quality

  • Extracted CLI error handling into _handle_api_errors context manager (7 repetitions removed)
  • Removed unnecessary _get_user_id helper and duplicate _validate_mirror_id
  • Added cleanup failure tracking with consecutive failure counter
  • MirrorSyncResponse.items_synced uses Field(default_factory=dict)
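
The extracted CLI error handling can be pictured as a small context manager. This sketch uses only the standard library; the `ApiError` class, the `action` parameter, and the exit code are assumptions, and the real `_handle_api_errors` presumably targets the project's HTTP client exceptions instead.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


class ApiError(Exception):
    """Hypothetical error raised by the CLI's HTTP client wrapper."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


@contextmanager
def handle_api_errors(action: str) -> Iterator[None]:
    """Turn common client failures into one-line messages and exit code 1."""
    try:
        yield
    except ConnectionError as exc:
        print(f"Could not reach the server while trying to {action}: {exc}", file=sys.stderr)
        raise SystemExit(1) from exc
    except ApiError as exc:
        print(f"Failed to {action} (HTTP {exc.status}): {exc.detail}", file=sys.stderr)
        raise SystemExit(1) from exc
```

Each command body then shrinks to `with handle_api_errors("delete mirror"): ...`, which is how a single helper can replace several near-identical try/except blocks.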

Model Updates (2026-03)

  • Updated MODEL_PRICING with ~50 models from all providers (was ~16)
  • Added Claude 4.5/4.6, GPT-5.x, Gemini 3.x, DeepSeek V3.2, Qwen 3.5
  • Updated direct API model mappings: OPENAI_MODELS (GPT-5.x, 4.1, o3/o4), ANTHROPIC_MODELS (Claude 4.x/4.5/4.6)
  • Added Claude 4.6 to CACHEABLE_MODELS for prompt caching
  • Updated widget model selector with current top models

Test plan

  • All 48 mirror and cost protection tests passing
  • Full test suite: 1664 passed, 0 new failures
  • CI tests passing on develop
  • Docker build succeeding
  • Ruff lint and format clean
  • Verify mirror CLI commands work end-to-end on dev server
  • Verify widget model selector shows updated models

…248)

* Security hardening: logging, cost protection

- Wire up SecureFormatter in app startup (#65): call
  configure_secure_logging() before any logging occurs
- Add cost manipulation protection (#67): block models
  above $15/1M input tokens on platform/community keys,
  warn above $5/1M; BYOK users unrestricted
- Verified SSRF protection (#66) and model validation (#68)
  already have comprehensive test coverage

Closes #65, closes #66, closes #67, closes #68
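
The warn/block thresholds described in this commit could be sketched as below. The function name `check_model_cost`, the `ModelCostError` class, the strict `>` comparison at the boundary, and the pricing entries are assumptions for illustration; only the $5 and $15 per 1M input token thresholds and the BYOK exemption come from the commit message.

```python
import logging
from typing import NamedTuple

logger = logging.getLogger(__name__)

WARN_INPUT_PER_1M = 5.0    # USD per 1M input tokens
BLOCK_INPUT_PER_1M = 15.0  # USD per 1M input tokens


class ModelRate(NamedTuple):
    input_per_1m: float
    output_per_1m: float


# Illustrative entries only; these model names and prices are made up.
MODEL_PRICING = {
    "example/cheap": ModelRate(0.5, 1.5),
    "example/mid": ModelRate(6.0, 18.0),
    "example/premium": ModelRate(20.0, 60.0),
}


class ModelCostError(ValueError):
    """Hypothetical error for models too expensive for a shared key."""


def check_model_cost(model: str, is_byok: bool) -> None:
    """Raise ModelCostError if a platform/community key may not use model."""
    if is_byok:
        return  # users on their own key are unrestricted
    rate = MODEL_PRICING.get(model)
    if rate is None:
        # At this commit unknown models are only logged; a later commit
        # in this PR makes shared keys fail-closed instead.
        logger.warning("No pricing entry for model %s", model)
        return
    if rate.input_per_1m > BLOCK_INPUT_PER_1M:
        raise ModelCostError(
            f"{model} costs ${rate.input_per_1m:.2f}/1M input tokens, above the "
            f"${BLOCK_INPUT_PER_1M:.2f} limit for shared keys. "
            "Use your own API key to access it."
        )
    if rate.input_per_1m > WARN_INPUT_PER_1M:
        logger.warning("Model %s is above the cost warning threshold", model)
```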

* Address PR review findings

- Fix misleading "fallback rate" comment in _check_model_cost
- Add logging for unknown models (operator visibility)
- Extract _models_by_cost() test helper to reduce duplication
- Add boundary test at exact block threshold
- Add BYOK + unknown model test
- Assert BYOK guidance in error message
- Fix module docstring wording

* Fix SecureJSONFormatter broad exception catch

Split the catch-all Exception handler into specific expected
errors (ValueError, TypeError, KeyError) that include context
for debugging, and unexpected errors that re-raise after
printing to stderr. Matches the pattern already used in
SecureFormatter.format().
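
The split between expected and unexpected errors might look like this in a JSON formatter. `SafeJSONFormatter` and its payload fields are hypothetical, and key redaction is left out to keep the sketch short.

```python
import json
import logging
import sys


class SafeJSONFormatter(logging.Formatter):
    """Hypothetical JSON formatter with narrow error handling."""

    def format(self, record: logging.LogRecord) -> str:
        try:
            payload = {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            return json.dumps(payload)
        except (ValueError, TypeError, KeyError) as exc:
            # Expected failures, such as mismatched %-format arguments:
            # emit a fallback line that carries context for debugging.
            return json.dumps({
                "level": record.levelname,
                "logger": record.name,
                "message": f"[format error: {type(exc).__name__}: {exc}] {record.msg!r}",
            })
        except Exception as exc:
            # Unexpected failure: report on stderr, then re-raise.
            print(f"Unexpected logging error: {exc!r}", file=sys.stderr)
            raise
```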

* Add ephemeral database mirrors for developer workflow

ContextVar-based DB routing lets developers work on isolated copies of
community SQLite databases via X-Mirror-ID header. Includes REST API,
CLI commands (osa mirror create/list/sync/pull), auto-cleanup scheduler,
and per-user rate limits. Replaces issue #219 (ephemeral backends).

Closes #219

* Address PR review: security, error handling, test coverage

- Fix path traversal: validate mirror_id in _get_mirror_dir
- Handle corrupt metadata gracefully (return None, don't crash)
- Use asyncio.to_thread for blocking sync in async endpoint
- Fix middleware to skip ContextVar for non-mirror requests
- Add community_id format validation on CreateMirrorRequest
- Use Literal type for sync_type, remove invalid 'discourse'
- Sanitize error messages in sync endpoint (no raw exceptions)
- Fix CLI pull to exit non-zero on partial download failure
- Add connection error handling to info/delete/refresh commands
- Use temp file + rename for download_mirror_db (atomic writes)
- Remove dead config settings (not wired to mirror module)
- Fix download endpoint to import from knowledge, not CLI
- Fix inaccurate docstrings (list_mirrors, create_mirror)
- Add 14 new tests: path traversal, corrupt metadata, cleanup
- Fix path traversal in _get_production_db_path (validate community_id)
- Replace bare except Exception in mirror middleware with specific handlers
- Fix ContextVar propagation for Python 3.11 (copy_context + run_in_executor)
- Add CorruptMirrorError to distinguish corrupt metadata from not-found
- Let OSError propagate from delete_mirror instead of returning False
- Error when refresh_mirror refreshes zero communities
- Add validate-on-set for set_active_mirror (fail fast on bad IDs)
- Add active_mirror_context context manager for safe mirror routing
- Use datetime objects in MirrorInfo instead of strings
- Extract shared is_safe_identifier and _validate_community_id
- Remove duplicate _validate_mirror_id from db.py (import from mirror.py)
- Add public get_mirror_db_path (stop importing private _get_mirror_dir)
- Use ModelRate NamedTuple for pricing entries (self-documenting fields)
- Upgrade unknown-model cost check from INFO to WARNING
- Fix SecureJSONFormatter fallback path to redact API keys
- Extract CLI error handling into _handle_api_errors context manager
- Remove unnecessary _get_user_id helper
- Update MODEL_PRICING with latest models (2026-03)
- Update widget model selector with latest models
- Add MirrorSyncResponse.items_synced default_factory
- Add cleanup failure tracking in _cleanup_mirrors
- run_sync_now raises ValueError for unknown sync types
- Fix cleanup to log instead of ignore_errors=True
- Update comment about ReDoS to accurate description
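
The "temp file + rename" item for download_mirror_db can be illustrated with a small helper. `write_atomically` and its parameters are hypothetical names for this sketch; the project's download function may stream from an HTTP response rather than from an iterable of chunks.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(dest: Path, chunks: Iterable[bytes]) -> None:
    """Write chunks to dest so readers never observe a partial file."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the destination directory so the final
    # os.replace stays on one filesystem and is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the partial file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the download fails partway, any existing database at `dest` is left untouched, which pairs naturally with a CLI that exits non-zero on partial failure.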

github-actions bot commented Mar 8, 2026

Preview Deployment

Preview URL: https://develop-demo.osc.earth
Branch: develop
Commit: 0b9b022

This preview will be updated automatically when you push new commits.

Update OPENAI_MODELS (add GPT-5.x, GPT-4.1, o3/o4; remove GPT-3.5/4),
ANTHROPIC_MODELS (add Claude 4.x/4.5/4.6; remove old Claude 3 entries),
and CACHEABLE_MODELS (add Claude 4.6 Opus/Sonnet). Fix corresponding
tests for updated model names and pricing.

* Remove broken HED and EEGLAB doc entry, re-enable URL test

The HedAndEEGLAB document was removed from the upstream
hed-resources repository. Remove the dead documentation entry
from the HED community config and re-enable the URL accessibility
test that was skipped because of this broken link.

Closes #139

* Remove broken javascriptTests.json doc entry

Another upstream HED URL (hed-specification/tests/javascriptTests.json)
returns 404. Remove this dead entry as well.

* Improve MNE system prompt: data types and uncertainty handling

Add eye-tracking to supported modalities, add explicit uncertainty
handling instructions to prevent confabulation, and add the
eyetracking tutorial to documentation sources.

Closes #250

* Address review: soften uncertainty handling, update description

Soften uncertainty instructions to preserve helpfulness while
adding verification emphasis. Update the description field to
include eye-tracking for consistency with the system prompt.

* Refactor auth tests to use real communities, remove mocks

Replace MagicMock and @patch with real community configurations
loaded via discover_assistants(). Use monkeypatch for environment
variables (real Settings reads them). All test scenarios preserved.

Closes #85

* Address review: use dynamic config values, fixture for discovery

Derive test origins and model values from loaded community configs
instead of hardcoding them. Move discover_assistants() into a
module-scoped fixture for better error reporting. Add helper
functions for config access.
neuromechanist added a commit that referenced this pull request Mar 9, 2026
* Address PR #252 review findings: error handling, types, tests

Error handling:
- Add CorruptMirrorError/ValueError handling to all mirror endpoints
- Block unknown models on platform/community keys (fail-closed)
- Add OSError handling to create_mirror_endpoint
- Make cleanup_expired_mirrors resilient to per-mirror failures
- Narrow scheduler cleanup catch to expected exception types
- Add field_validator to RefreshMirrorRequest.community_ids
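
A cleanup loop that tolerates per-mirror failures could be sketched like this. The function name `cleanup_expired`, the dictionary input, and the returned pair of lists are assumptions; the real cleanup_expired_mirrors presumably reads mirror metadata from disk.

```python
import logging
import shutil
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


def cleanup_expired(
    mirrors: Dict[str, Tuple[Path, datetime]], now: datetime
) -> Tuple[List[str], List[str]]:
    """Delete expired mirror directories; return (removed, failed) IDs."""
    removed: List[str] = []
    failed: List[str] = []
    for mirror_id, (path, expires_at) in mirrors.items():
        if expires_at > now:
            continue  # still valid, leave it alone
        try:
            shutil.rmtree(path)
        except OSError as exc:
            # One bad mirror must not stop the sweep: log it and move on.
            logger.warning("Failed to remove mirror %s: %s", mirror_id, exc)
            failed.append(mirror_id)
        else:
            removed.append(mirror_id)
    return removed, failed
```

Returning the failed IDs gives the scheduler something concrete to count when tracking consecutive cleanup failures.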

Type design:
- Make MirrorInfo a frozen dataclass with tuple community_ids
- Move is_safe_identifier to src/core/validation.py (shared utility)
- Add non-negativity validation to MODEL_PRICING at import time
- Expand SecureFormatter key patterns for Anthropic/OpenAI keys
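
A frozen MirrorInfo with tuple community IDs might be shaped roughly as follows. The field names other than `community_ids`, the identifier regex, and the exact checks are assumptions for this sketch; the PR only states that the dataclass is frozen, stores a tuple, uses datetime objects, and validates in `__post_init__`.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")  # assumed identifier rule


@dataclass(frozen=True)
class MirrorInfo:
    """Hypothetical reduced version of the mirror metadata record."""

    mirror_id: str
    community_ids: Tuple[str, ...]
    created_at: datetime
    expires_at: datetime

    def __post_init__(self) -> None:
        if not _SAFE_ID.fullmatch(self.mirror_id):
            raise ValueError(f"Invalid mirror_id: {self.mirror_id!r}")
        # Accept any iterable but store a tuple so the field is immutable.
        # object.__setattr__ is required because the dataclass is frozen.
        ids = tuple(self.community_ids)
        object.__setattr__(self, "community_ids", ids)
        if not ids:
            raise ValueError("community_ids must not be empty")
        for cid in ids:
            if not _SAFE_ID.fullmatch(cid):
                raise ValueError(f"Invalid community_id: {cid!r}")
        for name in ("created_at", "expires_at"):
            if not isinstance(getattr(self, name), datetime):
                raise TypeError(f"{name} must be a datetime")
```

Validating in `__post_init__` means metadata loaded from disk goes through the same checks as freshly created mirrors, so a corrupt file is rejected at load time.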

Code quality:
- Replace deprecated asyncio.get_event_loop() with get_running_loop()
- Fix ContextVar comment accuracy (request lifecycle, not per-task)
- Use get_active_mirror() instead of _active_mirror_id.get()
- Fix docstring inaccuracies (caching, asyncio, model names)

Tests:
- Add active_mirror_context tests (set/reset, exception safety)
- Add MirrorInfo invariant tests (empty ids, invalid id, immutability)
- Add serialization round-trip test
- Add TTL clamping test
- Add run_sync_now invalid sync_type test
- Update cost protection test for fail-closed behavior

Closes #256

* Use generic redaction placeholder, remove misleading __all__

- Change redaction string from "sk-or-v1-***[redacted]" to
  "***[key-redacted]" since the pattern now covers multiple providers
- Remove __all__ from mirror.py since no callers use wildcard imports
  from that module (is_safe_identifier now lives in core.validation)

* Add ValueError catch to delete endpoint, validate community IDs

- Add missing ValueError handling in delete_mirror_endpoint for
  consistency with all other mirror endpoints
- Add community ID validation in MirrorInfo.__post_init__ so corrupt
  metadata with path-traversal community IDs is caught at load time
- Document CorruptMirrorError in refresh_mirror docstring

neuromechanist merged commit ccb5905 into main on Mar 9, 2026
20 checks passed