Skip to content

Codebase cleanup: DRY, error handling, decomposition, docs#220

Open
tlongwell-block wants to merge 21 commits intomainfrom
quality-improvement
Open

Codebase cleanup: DRY, error handling, decomposition, docs#220
tlongwell-block wants to merge 21 commits intomainfrom
quality-improvement

Conversation

@tlongwell-block
Copy link
Copy Markdown
Collaborator

@tlongwell-block tlongwell-block commented Apr 3, 2026

Why

General cleanup pass. The codebase grew quickly and accumulated some duplication, oversized functions, inconsistent error handling, and stale docs. Nothing was on fire, but it was getting harder to navigate.

What changed

110 files · +4,255 / −4,373 (net −118 lines) · 10 crates · 8 docs

Bug fixes

  • TOCTOU race in replace_addressable_event — advisory lock serializes concurrent calls; stale-replay guard rejects older events; IS NOT DISTINCT FROM for NULL channel_id; deterministic NIP-01 tiebreak (ORDER BY created_at DESC, id DESC)
  • split_filters empty-kinds inversionSome([]) now correctly matches nothing instead of everything
  • React ErrorBoundary — prevents blank white screen on render crash
  • Desktop memory leak — ancestor resolution set capped at 500 entries

DRY / decomposition

  • Shared relay protocolsprout-core::relay_protocol replaces ~985 lines of duplication across sprout-acp and sprout-mcp
  • ingest_event decomposed from 1,524 LOC into ~130-line pipeline + 22 named step functions
  • ChannelManagementSheet.tsx decomposed (762 → 300 LOC) into 3 section components
  • AppShell.tsx hooks extracted (useSearchSession, useAncestorResolution, useViewRouter)
  • MCP helperssign_and_send (17 call sites), parse_uuid (12 call sites)
  • Desktop utilities consolidated: truncatePubkey, formatRelativeTime, useBackendProviderProbe

Error handling

  • ApiError enum (10 variants) replaces raw (StatusCode, Json<Value>) tuples across 19 API handler files
  • SideEffectError (5 variants) replaces 79 anyhow uses in side_effects.rs
  • Internal/database errors log server-side, return generic messages to clients
  • All pre-existing API error codes and response shapes preserved (invalid_token, token_revoked, token_expired, invalid_auth, channel_not_permitted, insufficient_scope, retry_after_seconds, etc.)

Proxy hardening

  • DashMap + flush-on-full → moka LRU cache in shadow_keys
  • Channel UUID validation in admin endpoints

Docs

All 8 top-level docs updated — MySQL → Postgres, LOC counts, MCP tool counts, infrastructure ports, test counts, etc.

Desktop a11y

  • window.confirm()AlertDialog
  • Button-inside-button fix
  • aria-pressed, aria-hidden additions
  • Dead code removed (sidebarData.ts, UpdateChecker.tsx)

Test infrastructure

  • Shared helpers in tests/helpers/mod.rs
  • McpSession Drop impl for cleanup
  • Stderr readiness scan replaces 10s blind sleep

What's deferred

  • Port hardening features to sprout-mcp (keepalive, retry, since-filter)
  • Split side_effects.rs by event kind family
  • Add tests/e2e_proxy.rs
  • N+1 query fix: get_thread_summaries_bulk
  • Remove canvas column from channel queries

Test results

Suite Result
Unit tests 812 passed, 0 failed
e2e_rest_api 39/40 (1 pre-existing NIP-05 test-ordering issue)
e2e_relay 27/27
e2e_mcp 14/14
e2e_nostr_interop 15/15
e2e_tokens 20/20
e2e_media 7/7
e2e_media_extended 18/18
e2e_workflows 0/7 (pre-existing — seeded UUID dependency)

Desktop pnpm check: 0 errors.
Full workspace cargo check + cargo clippy -D warnings: clean.

Critical fixes:
- C1: Fix TOCTOU race in replace_addressable_event — advisory lock serializes
  concurrent calls, IS NOT DISTINCT FROM handles NULL channel_id, excludes
  incoming event ID from soft-delete, ON CONFLICT DO UPDATE un-deletes replayed
  events, RETURNING xmax for correct duplicate detection
- C2: Add React ErrorBoundary to desktop app (prevents blank white screen)
- C3: Align nostr version 0.37→0.36 (matches workspace pin, fixed 5 files)
- C4: Fix split_filters empty kinds inversion (Some([]) now matches nothing)
- C5: Fix VISION.md marking unimplemented approval gates as done
- H2: Add delete_channel to ALL_TOOLS (channel_admin, disabled by default)
- H13: Eliminate duplicate PgPool (Db::pool() accessor, relay reuses it)

Documentation reconciliation (10 files):
- MySQL→Postgres across all docs (syntax, ports, references)
- ARCHITECTURE.md: LOC counts, MCP tools 43→44, constants, known limitations
- README.md: missing crates, NIPs, env vars
- TESTING.md: test counts, missing test files
- SECURITY.md: HMAC→SHA-256 hash chain
- NOSTR.md: psql bytea syntax
- Proxy/ACP READMEs: false claims corrected

Crossfire reviewed by codex CLI (9/10) and opus reviewer.
Theme 1 — Shared relay protocol:
- Create sprout-core::relay_protocol with RelayMessage, parse_relay_message,
  relay_ws_to_http, pct_encode, TwoGenDedup (25 tests)

Theme 5 — Unified error strategy:
- Define ApiError enum (10 variants) with IntoResponse in sprout-relay
- Replace all raw (StatusCode, Json<Value>) tuples across 19 API handler files
- 148 relay tests pass

MCP cleanup:
- Extract sign_and_send helper (17 occurrences replaced)
- Extract parse_uuid helper (12 double-parse patterns eliminated)
- Remove validate_uuid (replaced by parse_uuid)

Workspace hygiene:
- Add [workspace.lints.clippy] unwrap_used = warn
- Fix sprout-workflow non-workspace deps (evalexpr, cron)
- Fix deny.toml multiple-versions = deny (24 skip entries)
- Make sprout-media axum dep optional behind feature flag
Relay protocol deduplication:
- sprout-acp: Remove local RelayMessage, TwoGenDedup, parse_relay_message,
  relay_ws_to_http, pct_encode — all imported from sprout-core::relay_protocol
- sprout-mcp: Remove local RelayMessage, OkResponse, parse_relay_message,
  relay_ws_to_http — all imported from sprout-core::relay_protocol
- sprout-test-client: Import directly from sprout-core instead of sprout-mcp
- Add OkResponse struct to relay_protocol (backward compat with MCP/test-client)
- Add From<RelayProtocolError> impls for RelayError and RelayClientError
- Replace manual TwoGenDedup in sprout-acp/main.rs with core's TwoGenDedup

Error handling:
- Define SideEffectError enum (Database, Json, PubSub, Nostr, Internal)
- Replace all 79 anyhow uses in side_effects.rs with typed errors
- 148 relay tests pass, 234 acp tests pass, 47 mcp tests pass

~985 lines of duplication eliminated.
Theme 2 — God function decomposition:
- Decompose ingest_event (1524 LOC, 22 steps) into named pub(crate) functions;
  ingest_event is now a ~130-line readable pipeline
- Extract AppShell.tsx hooks: useSearchSession, useAncestorResolution, useViewRouter
  (AppShell 819→680 LOC)
- Decompose ChannelManagementSheet.tsx (762→300 LOC) into ChannelDetailsSection,
  ChannelMembersSection, ChannelDangerZone + shared Section component

Theme 3 — DRY fixes:
- Fan-out loop (4×) → ConnectionManager::send_event_to_matches
- effective_message_author: delete duplicate in side_effects.rs + messages.rs
- row_to_channel_record: pub(crate) in channel.rs, imported in dm.rs
- parse_api_token_row: consolidated in api_token.rs
- query_feed_* aliases removed (3 pairs → 3 functions)
- channel_base_to_json extracted for shared channel serialization
- truncatePubkey (5×) → shared/lib/pubkey.ts
- formatRelativeTime + parseTimestamp (8 callers) → shared/lib/time.ts
- useBackendProviderProbe hook extracted (2 dialogs)
- evalexpr string helpers → sprout-core/evalexpr_helpers.rs

All tests pass: relay 148, acp 234, mcp 47, db 50, workflow 140.
Desktop pnpm check: 0 errors.
Desktop fixes:
- Replace window.confirm() with AlertDialog in AgentsView.tsx (3 occurrences)
- Fix TokenSettingsCard N parallel getChannelMembers → use ch.isMember directly
- Fix ForumPostCard button-inside-button (outer button → div[role=button])
- Add aria-pressed to scope toggle buttons in CreateAgentDialogSections.tsx
- Add aria-hidden to sticky day header in MessageTimeline.tsx
- Delete dead code: sidebarData.ts, UpdateChecker.tsx

Test infrastructure:
- Extract shared test helpers into tests/helpers/mod.rs (tiny_jpeg, sign_blossom_auth,
  blossom_auth_header, sub_id, create_test_channel, http_client)
- Add Drop impl to McpSession (kills orphaned child process on panic)
- Replace 10s blind sleep with stderr readiness scan + 30s timeout
- Fix stale '43 tools' comments → 44 in e2e_mcp.rs
- Fix e2e_workflows.rs seeded pubkey → self-contained ensure_user_exists()

All cargo check clean. Desktop pnpm check: 0 errors.
Proxy fixes:
- PROXY-H2: Channel UUID validation in admin endpoints — invalid UUIDs now
  return 400 with specific error listing invalid values (was silently dropped)
- PROXY-M8: Replace DashMap + flush-on-full with moka LRU cache in shadow_keys
  (eliminates thundering herd on cache eviction)

Workspace safety:
- Add #![deny(unsafe_code)] to sprout-media and sprout-cli

All cargo check clean. Proxy: 80/80 tests pass.
…text note

- ApiError::Internal and ApiError::Database now log the real error server-side
  and return a generic 'internal server error' to API clients (prevents leaking
  DB/serialization details over the REST API)
- Fix stale 'duplicate:' message in ingest_event → 'duplicate: event already exists'
- Add hashtext collision note to advisory lock comment in replace_addressable_event
…e ErrorBoundary

TOCTOU fix (replace_addressable_event):
- Only tombstone active row if incoming event is strictly newer (created_at
  comparison + event ID tiebreaker per NIP-01). Stale replays now no-op
  instead of reviving older events over newer ones.

relay_protocol:
- relay_ws_to_http: use strip_prefix instead of blind string replace
  (preserves path/query/fragment, won't corrupt ws:// in paths)
- TwoGenDedup::new: clamp limit to minimum of 2 (prevents degenerate behavior)
- Added tests: path-with-trailing-slash, non-ws scheme, limit clamping

Desktop ErrorBoundary:
- Show generic 'An unexpected error occurred' instead of raw error.message
  (prevents leaking internal details to users; error is still logged to console)
The unwrap_used=warn lint interacts with the pre-push hook's
'cargo clippy -- -D warnings' which promotes all warnings to errors.
With ~610 pre-existing unwrap() calls across the workspace, this
breaks the build. Deferred to a follow-up that addresses the unwrap
calls incrementally.
…code in helpers

- e2e_media_extended.rs: restore nostr::{EventBuilder, Kind, Tag, Timestamp} imports
  that were incorrectly removed during helpers extraction
- e2e_workflows.rs: shadow pubkey_hex as &str to prevent move-after-use errors
- helpers/mod.rs: add #![allow(dead_code)] since not every test file uses every helper
- Full workspace clippy --all-targets -D warnings passes clean
…ecific error codes

TOCTOU fix (replace_addressable_event):
- Remove RETURNING xmax (system columns not accessible on partitioned tables
  in Postgres 17) — use ON CONFLICT DO NOTHING + rows_affected() instead
- Stale-replay guard makes ON CONFLICT DO UPDATE unnecessary

ApiError response format:
- Standard variants use original single-field format: {"error": "human text"}
- Custom variant uses two-field format: {"error": "code", "message": "text"}
  for domain-specific codes (nip98_not_supported, scope_escalation, etc.)
- Unauthorized variant now carries a message string
- Internal/Database errors log server-side, return generic message

Token handler domain codes restored:
- nip98_not_supported, invalid_scopes, scope_escalation, already_revoked,
  rate_limited, not_found — all use ApiError::Custom for backward compat

Integration test results (on fresh DB):
- REST API: 40/40 ✅
- Relay: 27/27 ✅
- MCP: 14/14 ✅
- Media: 7/7 ✅
- Media extended: 18/18 ✅
- Nostr interop: 15/15 ✅
- Tokens: 17/20 (3 pre-existing failures on fresh DB — need seeded channel)
- Workflows: 0/7 (pre-existing — all tests depend on seeded CHANNEL_GENERAL UUID)
The test_api_error_helper_maps_status_codes test used
ApiError::Unauthorized("...".into()) inside matches!() which is
a pattern context, not an expression context. Use _ wildcard instead.
…n codes

All 8 remaining token error branches now use ApiError::Custom with explicit
domain codes instead of embedding codes in human-readable message strings:
- channel_escalation (3 sites), invalid_channel_ids (2), not_channel_member,
  missing_required_scope, token_limit_exceeded

Updated IntoResponse comment to accurately describe the dual envelope format:
standard variants → {"error": "text"}, Custom → {"error": "code", "message": "text"}
…nction, remove expect()

1. replace_addressable_event: add ORDER BY created_at DESC to stale-replay
   check so it always compares against the newest active row (handles legacy
   corruption with multiple active rows for same addressable key)
2. Token mint: distinguish ChannelNotFound from real DB errors in get_channel
   call — DB failures now return 500 internal_error instead of 400 invalid_channel_ids
3. Remove expect() on Scope::from_str — use unwrap_or(Scope::Unknown) to
   avoid panicking in production even though the parse is infallible
Merges 4 commits from main:
- feat: add Workflows UI surface (#217)
- chore: clean Tauri build artifacts in just clean (#219)
- fix: debounce composer pattern detection (#222)
- fix(desktop): plug memory leaks causing 37GB+ RAM growth (#223)

Conflict resolutions:
- approvals.rs: Merged main's refactored approval handlers (validate_approval,
  execute_grant/deny, by-hash endpoints) with our ApiError conversion
- workflows.rs: Converted new list_run_approvals endpoint to ApiError
- AppShell.tsx: Merged main's 'workflows' view type with our hook extractions
- CreateAgentDialog.tsx / AddChannelBotDialog.tsx: Merged main's eslint cleanup
  with our useBackendProviderProbe extraction

All cargo clippy clean. Desktop pnpm check: 0 errors.
…n AppShell

TypeScript build failed because AppShell.tsx references AppView and MainView
types in the selectView callback, but those types were local to useViewRouter.ts.
Export them and import in AppShell.
…estors

Crossfire review (codex CLI + opus subagents) found 13 issues across
4 rounds of adversarial review. Final score: 9/10 APPROVED.

API contract restorations (mod.rs, tokens.rs, workflows.rs, error.rs):
- Restore distinct auth error codes: invalid_token, token_revoked,
  token_expired (were collapsed to generic 'authentication required')
- Restore 'authentication failed' on invalid-credential paths (JWT
  validation, dev JWT, X-Pubkey) vs 'authentication required' on
  no-auth-provided paths
- Restore retry_after_seconds in rate-limit response: ApiError::Custom
  gains Option<serde_json::Value> 4th field for extra response data
- Restore structured 403 codes: channel_not_permitted and
  insufficient_scope use ApiError::Custom for two-field envelope
- Restore NIP-98 auth failure codes in POST /api/tokens: invalid_auth
  with specific messages per failure mode
- Restore token-management NIP-98 rejection messages: 'Bearer token
  required to list/revoke tokens'
- Restore webhook auth error messages: 'authentication failed' vs
  'webhook secret required but not configured'
- Fix NIP-98 message wording: 'only supported' (was 'only valid')
- Remove double-logging: internal_error() no longer logs (IntoResponse
  is the single logging point)

TOCTOU hardening (sprout-db/lib.rs):
- Add id DESC to ORDER BY in replace_addressable_event stale-replay
  guard: deterministic NIP-01 tiebreak even with legacy corruption

Desktop fix (useAncestorResolution.ts):
- Restore MAX_REQUESTED_ANCESTORS=500 eviction cap dropped during
  hook extraction (prevents unbounded Set growth in long sessions)

Test results (clean DB, release build):
- Unit: 812 passed, 0 failed
- e2e_rest_api: 39/40 (1 pre-existing NIP-05 test-ordering issue)
- e2e_relay: 27/27
- e2e_mcp: 14/14
- e2e_nostr_interop: 15/15
- e2e_tokens: 20/20
- e2e_media: 7/7
- e2e_media_extended: 18/18
- e2e_workflows: 0/7 (pre-existing — seeded CHANNEL_GENERAL UUID)
Zero regressions from these fixes.
@tlongwell-block tlongwell-block changed the title Quality overhaul: 6.5→9/10 across 110 files (critical fixes, architecture, DRY, a11y, docs) Codebase cleanup: DRY, error handling, decomposition, docs Apr 4, 2026
Replace positional tuple Custom(StatusCode, &str, String, Option<Value>)
with named struct Coded { status, code, message, extra } for self-documenting
call sites. No behavior change — status codes, error codes, and JSON response
shapes are identical.

23 call sites updated across error.rs, api/mod.rs, and api/tokens.rs.
… check

Pre-storage validation (validate_admin_event, validate_standard_deletion_event)
now returns ValidationError with Rejected/Infra variants instead of SideEffectError.
This fixes a semantic bug where DB failures during validation were misreported as
'invalid event' (client error) instead of infrastructure failures.

Pipeline callers (check_admin_event, check_standard_deletion) now route:
  - ValidationError::Rejected → IngestError::Rejected (400 / OK false)
  - ValidationError::Infra → IngestError::Internal (500)

Also fixes check_channel_not_archived: previously silently ignored DB errors
(fail-open by accident). Now fail-closed with tracing::warn — DB errors during
archive checks return IngestError::Internal instead of letting events through.

validate_admin_event: get_channel() no longer swallows DB errors into misleading
'channel not found' message — bare ? propagates the real error as Infra.

SideEffectError unchanged for post-storage side effects (fire-and-forget path).
148 relay tests pass.
DbError::ChannelNotFound, MemberNotFound, NotFound, and AccessDenied are
client-state errors (the event references something that doesn't exist or
the caller lacks permission). The blanket From<DbError> was mapping all
variants to ValidationError::Infra, causing admin events against nonexistent
channels to return 500 instead of 400.

Now: client-state DbError variants → Rejected (400), infrastructure errors
(Sqlx, Serde, etc.) → Infra (500). 148 relay tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant