Skip to content

refactor: tier-3 medium & low findings (config hardening, validation, cleanups)#63

Open
williaby wants to merge 1 commit into
claude/tier-2-concurrencyfrom
claude/tier-3-findings
Open

refactor: tier-3 medium & low findings (config hardening, validation, cleanups)#63
williaby wants to merge 1 commit into
claude/tier-2-concurrencyfrom
claude/tier-3-findings

Conversation

@williaby
Copy link
Copy Markdown
Contributor

@williaby williaby commented May 30, 2026

Summary

Tier 3 of the architecture review — the medium and low findings. Stacked on #62 (base is the tier-2 branch), so this diff shows only tier-3 changes. Retarget to main once #61/#62 merge.

Low findings

  • L1 Remove dead utils/financial.py stub (+ its dead ruff ignore).
  • L2 Export all routers (batch, health, ingest, user) from rag_processor.api and fix __all__.
  • L5 Readiness 503 body conforms to the declared ReadinessStatus model.
  • L6 Document why Sentry before_send drops KeyboardInterrupt/SystemExit.

Medium findings

  • M3 PDF classification thresholds moved to settings (configurable, per-instance overridable; removes 50/10/200 magic numbers).
  • M4 Shared ensure_batch_owned helper (dedupes the 404-as-403 / opaque-logging logic across get_batch/get_job).
  • M5 Real Redis readiness check (check_redis PINGs via a worker thread), gated by readiness_require_redis so it's truthful but only fails the probe when Redis is declared required.
  • M6 Strict pipeline-config loading: raise ConfigurationError on a missing file instead of silently falling back to localhost defaults (opt-in; default behavior unchanged + clearer warning).
  • M7 Inject FileRouter via a get_file_router dependency instead of a module-level singleton (enables dependency_overrides).

Deliberately deferred (with rationale)

  • L3 / L4 / M12 — defaults are sensible / low-value; can be a follow-up tier.
  • M8 JWKS backoff/circuit-breaker — the stampede-preventing lock already landed in refactor: unify Redis access, wire ingest pipeline, consolidate auth #61; a full breaker is larger.
  • M9 orjson — adds a dependency for no measured benefit.
  • M10 Broad except — remaining ones already justified inline.
  • M11 Event-replay ordering on trimmed history — low-value edge case.

Verification (final commit 4cc8d63)

  • 473 passed, 1 skipped
  • ruff check + format: clean
  • basedpyright: 0 errors
  • coverage: 92.83% (gate 80%)

New tests: test_routing_config (M3 + M7), test_health_readiness (M5 + L5), ensure_batch_owned coverage (M4), pipeline strict-mode tests (M6).

Process note: this branch was rebuilt + force-pushed once during development to recover from a partially-applied edit batch, and intermediate commits were briefly red (a stale non-editable venv install masked the real source, plus a couple of edits that didn't land). Both are resolved (editable reinstall + the fixes above); the final HEAD is verified green against the real src/.

https://claude.ai/code/session_01PA6dtgMhfzSe22VVtqBfxE

Copilot AI review requested due to automatic review settings May 30, 2026 03:39
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • master
  • develop

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d9d9c3d-018c-4036-98c9-eaaf9c1f8533

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/tier-3-findings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Tier-3 cleanup PR that lands the first three "low" findings from the architecture review: removes a dead stub module (and its associated ruff ignore), exports all API routers from the rag_processor.api package, and adds an explanatory comment to Sentry's before_send filter.

Changes:

  • Delete src/rag_processor/utils/financial.py and drop the now-obsolete per-file ruff ignore in pyproject.toml.
  • Re-export batch_router and user_router from rag_processor.api and update __all__.
  • Replace the terse "Example:" comment in before_send_hook with a rationale explaining why KeyboardInterrupt / SystemExit are dropped.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
src/rag_processor/utils/financial.py Removed dead stub module (L1).
pyproject.toml Removed the ruff per-file ignore tied to the deleted stub.
src/rag_processor/api/init.py Added batch_router and user_router exports; refreshed __all__ (L2).
src/rag_processor/core/sentry.py Documented why KeyboardInterrupt/SystemExit are filtered (L6).

@williaby williaby force-pushed the claude/tier-3-findings branch from 36a49d4 to b75fda3 Compare May 30, 2026 04:14
Addresses the medium and low findings from the repository architecture
review (stacked on the tier-2 work).

Medium:
- M3: PDF classification thresholds (scanned chars/page and the scanned/
  digital high-confidence cutoffs) move to Settings; FileClassifier reads them
  by default and accepts per-instance overrides. Removes the hardcoded
  50/10/200 magic numbers.
- M4: extract ensure_batch_owned() in auth.dependencies; get_batch/get_job
  delegate to it, so the "missing-or-not-owned -> 404 (never 403), log opaque
  IDs only" behavior lives in one place.
- M5: replace the always-pass placeholder readiness checks with a real
  check_redis() that PINGs Redis off the event loop. The check always reports
  the true status; it only fails the /health/ready probe when the new
  readiness_require_redis setting is enabled, so CI/dev without Redis still
  report ready while staying truthful in the body.
- M6: load_pipeline_config gains an opt-in strict mode that raises
  ConfigurationError on a missing config file instead of silently falling back
  to the built-in localhost defaults; the lenient warning now states the
  defaults are not production-safe.
- M7: ingest_files resolves its FileRouter via a get_file_router dependency
  (shared singleton) instead of a module-level global, enabling
  app.dependency_overrides in tests.

Low:
- L1: remove the unused utils/financial.py stub and its dead ruff ignore.
- L2: export the batch/health/ingest/user routers from rag_processor.api and
  complete __all__.
- L5: the readiness 503 body now conforms to the declared ReadinessStatus
  schema instead of an ad-hoc dict.
- L6: document why Sentry before_send drops KeyboardInterrupt/SystemExit.

Tests: add test_routing_config (M3 thresholds + M7 DI), test_health_readiness
(M5 check + L5 schema), ensure_batch_owned coverage (M4), and pipeline
strict-mode tests (M6).

https://claude.ai/code/session_01PA6dtgMhfzSe22VVtqBfxE
@williaby williaby force-pushed the claude/tier-3-findings branch from 5dffd65 to 3899270 Compare May 30, 2026 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants