feat: report orchestrator health in readiness endpoint #8
Draft
deangoodmanson wants to merge 5 commits into develop
- Use pgvector/pgvector:pg17 image so the vector extension is available
- Add platform: linux/amd64 for py-std-worker to suppress emulation warning
- Remove ivfflat index on document_chunks (vector(3072) exceeds 2000-dim limit)
- Fix health CLI to parse flat "ok" strings from readiness endpoint

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
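The CLI fix in the last bullet can be pictured with a minimal sketch, assuming each readiness check arrives as a flat string (or is absent). The function name `render_check`, the `❌` marker, and the exact layout of the "not reported" line are assumptions for illustration, not the project's actual CLI code.

```rust
/// Hypothetical helper: turn one per-check value from the readiness
/// response into a single CLI output line.
/// `status` is `None` when the check is missing from the response.
fn render_check(name: &str, status: Option<&str>) -> String {
    match status {
        // Flat "ok" string: the healthy case.
        Some("ok") => format!("✅ {} - ok", name),
        // Any other reported value (for example "unhealthy" or "error")
        // is shown verbatim; the ❌ marker is an assumption.
        Some(other) => format!("❌ {} - {}", name, other),
        // The check key is absent from the response.
        None => format!("❓ {} - Not reported in readiness check", name),
    }
}
```

Matching on the string value directly, rather than expecting a nested object, is what lets a flat `"ok"` be recognized instead of falling through to the "not reported" branch.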
The /health/ready endpoint now includes an orchestrator check alongside database, event_source, and queue. The orchestrator task signals liveness via an Arc<AtomicBool> in AppState, which is set to true when the loop starts and to false if it exits unexpectedly before shutdown.

This closes the gap where orchestrator crashes were invisible to health checks: a stalled or panicked orchestrator task now causes the readiness probe to return 503, triggering a restart in orchestrated environments.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
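A minimal sketch of the liveness flag described in this commit, using only the standard library. It assumes a plain thread in place of the real async task, a hypothetical `shutting_down` flag to distinguish graceful shutdown, and a drop guard (a swapped-in technique, not confirmed by the PR) so that the flag is also cleared when the work panics.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the shared application state.
#[derive(Clone)]
struct AppState {
    orchestrator_alive: Arc<AtomicBool>,
    /// Assumption: some flag marks that a graceful shutdown is in progress.
    shutting_down: Arc<AtomicBool>,
}

impl AppState {
    fn new() -> Self {
        Self {
            // Starts false so health is not reported as ok before startup.
            orchestrator_alive: Arc::new(AtomicBool::new(false)),
            shutting_down: Arc::new(AtomicBool::new(false)),
        }
    }
}

/// Clears the liveness flag when dropped, unless shutdown was requested.
/// Drop also runs during panic unwinding, so a panicking loop is covered.
struct AliveGuard(AppState);

impl Drop for AliveGuard {
    fn drop(&mut self) {
        if !self.0.shutting_down.load(Ordering::Acquire) {
            self.0.orchestrator_alive.store(false, Ordering::Release);
        }
    }
}

/// Runs `work` on its own thread and tracks its liveness in `state`.
fn spawn_orchestrator<F>(state: AppState, work: F) -> thread::JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        // Mark alive as soon as the loop is about to start.
        state.orchestrator_alive.store(true, Ordering::Release);
        let _guard = AliveGuard(state);
        work();
        // `_guard` is dropped here, or earlier if `work` panics.
    })
}

/// Value a readiness handler could report for the orchestrator check.
fn orchestrator_check(state: &AppState) -> &'static str {
    if state.orchestrator_alive.load(Ordering::Acquire) {
        "ok"
    } else {
        "unhealthy"
    }
}
```

Note that a flag of this kind only changes when the loop starts or exits; detecting a loop that is still running but making no progress would need an additional signal such as a heartbeat timestamp.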
…es2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dering

- Initialize orchestrator_alive to false; set true only when orchestrator actually starts (fixes window where health reported ok before startup)
- Store true before notify_one() to eliminate race between startup signal and health check reads
- Use Release/Acquire ordering instead of Relaxed for cross-thread visibility on weak memory model architectures (arm64)
- Remove "ready" as a valid check status (it's a top-level field, not a per-check value)
- Add comment explaining defensive "error" status handling
- Add test for orchestrator unhealthy path (503 + orchestrator: "unhealthy")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
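The store-before-notify ordering from the second and third bullets can be sketched as follows. This is an illustration under stated assumptions: a standard-library channel stands in for the real `notify_one()` startup signal, and `start_and_observe` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Starts a worker, waits for its startup signal, and returns the value of
/// the liveness flag as observed immediately after the signal arrives.
fn start_and_observe() -> bool {
    let alive = Arc::new(AtomicBool::new(false));
    // Stand-in for the startup notification (assumption: the real code
    // uses an async notify primitive instead of a channel).
    let (ready_tx, ready_rx) = mpsc::channel::<()>();

    let worker_alive = Arc::clone(&alive);
    let worker = thread::spawn(move || {
        // 1. Publish liveness first, with Release ordering.
        worker_alive.store(true, Ordering::Release);
        // 2. Only then signal that startup is complete.
        ready_tx.send(()).expect("receiver dropped");
    });

    // Anyone who has seen the startup signal must also see the flag as true.
    ready_rx.recv().expect("worker exited before signaling");
    let observed = alive.load(Ordering::Acquire);

    worker.join().expect("worker panicked");
    observed
}
```

If the two steps in the worker were swapped, a reader woken by the signal could run before the store and briefly see `false`. In this sketch the channel itself also synchronizes the two threads; the `Release`/`Acquire` pair matters for readers that never wait on the signal, such as a health handler polling the flag from another thread.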
Claude Sonnet 4.6 found a handful of issues with the first version of this PR, and committed the fixes.
Summary
- Add `orchestrator_alive: Arc<AtomicBool>` to `AppState` (initializes `false`; set `true` only when orchestrator starts)
- `spawn_orchestrator()` sets the flag `true` before signaling ready, `false` on unexpected exit (not during graceful shutdown); uses `Release`/`Acquire` ordering for correct visibility on arm64
- `/health/ready` now includes `orchestrator: "ok"|"unhealthy"` in its checks and includes orchestrator in the `all_healthy` gate
- Health CLI (`kruxiaflow health`) now correctly parses flat `"ok"` strings from the readiness endpoint; it previously showed `❓ Not reported in readiness check`

Behavior
Normal operation:

    { "status": "ready", "checks": { "database": "ok", "event_source": "ok", "queue": "ok", "orchestrator": "ok" } }

Orchestrator crash:

    { "status": "not_ready", "checks": { ..., "orchestrator": "unhealthy" } }

→ Returns HTTP 503, which Kubernetes/Docker health checks treat as a failed check, triggering a restart where the check is configured to restart on failure.
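The two responses above follow from a simple gate over the per-check values. Below is a minimal sketch of that aggregation, assuming hypothetical names (`Readiness`, `readiness`) and assuming every failing check is reported as `"unhealthy"`; the source only confirms that value for the orchestrator check.

```rust
use std::collections::BTreeMap;

/// Hypothetical readiness report: HTTP status code, top-level status,
/// and one flat string per check.
struct Readiness {
    http_status: u16,
    status: &'static str,
    checks: BTreeMap<&'static str, &'static str>,
}

/// Builds the report from boolean probe results.
/// Assumption: every failing check maps to "unhealthy".
fn readiness(
    database_ok: bool,
    event_source_ok: bool,
    queue_ok: bool,
    orchestrator_alive: bool,
) -> Readiness {
    let label = |ok: bool| if ok { "ok" } else { "unhealthy" };

    let mut checks = BTreeMap::new();
    checks.insert("database", label(database_ok));
    checks.insert("event_source", label(event_source_ok));
    checks.insert("queue", label(queue_ok));
    checks.insert("orchestrator", label(orchestrator_alive));

    // The orchestrator participates in the gate like every other check.
    let all_healthy = checks.values().all(|status| *status == "ok");

    Readiness {
        http_status: if all_healthy { 200 } else { 503 },
        status: if all_healthy { "ready" } else { "not_ready" },
        checks,
    }
}
```

With all inputs `true` this yields status `ready` and code 200; flipping only `orchestrator_alive` to `false` yields `not_ready` and 503 while the other checks still read `"ok"`.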
Test plan
- `./docker exec kruxiaflow /kruxiaflow health` shows `✅ orchestrator - ok` for all services
- `cargo test -p kruxiaflow-api` passes including `test_readiness_endpoint_orchestrator_unhealthy`

🤖 Generated with Claude Code