fix: clear zombie doc-organizer indexer pid #15

Open
DevNexsler wants to merge 1 commit into feat/communications-context-enrichment from fix/doc-organizer-zombie-pid-20260521

Summary

  • treat zombie indexer.pid targets as dead instead of active
  • clean stale zombie pid files during both file_status and file_index_update
  • add regression coverage for zombie pid handling in status and index update paths

Root Cause

A background indexer crashed with free(): invalid next size (fast), left /data/index/indexer.pid behind, and remained as a zombie process. os.kill(pid, 0) succeeds for zombies because the process-table entry persists until the parent reaps it, so the MCP server kept reporting already_running and never allowed the FTS rebuild/self-heal to run.
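
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: `signal_zero_says_alive` and `make_zombie` are hypothetical names, and it assumes a Linux host where `os.fork` and `os.waitid` are available.

```python
import os


def signal_zero_says_alive(pid: int) -> bool:
    """Return True if os.kill(pid, 0) does not report the pid as missing."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the pid exists but belongs to another user
    return True


def make_zombie() -> int:
    """Fork a child that exits at once and leave it unreaped (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Block until the child has exited, but leave it in a waitable (zombie) state.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    return pid


if __name__ == "__main__":
    zombie = make_zombie()
    print("before reap:", signal_zero_says_alive(zombie))
    os.waitpid(zombie, 0)  # reap the zombie
    print("after reap:", signal_zero_says_alive(zombie))
```

The probe only stops reporting the pid once the parent has reaped it, which is why a liveness check built on signal 0 alone cannot tell a zombie from a running indexer.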

Evidence

  • /data/index/indexer.pid persisted with pid 22512
  • /proc/22512/status reported State: Z (zombie)
  • OpenClaw doc-organizer checks returned already_running with pid 22512 repeatedly for ~24h
  • indexer.log ended with free(): invalid next size (fast)

Fix

  • add shared pid-resolution helper in mcp_server.py
  • reap child zombies with waitpid(..., WNOHANG) when possible
  • fall back to /proc/<pid>/status and treat Z as dead
  • unlink stale zombie pid files before reporting/running indexer
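
The steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the helper added to mcp_server.py: the names `pid_is_live` and `resolve_pid_file` are hypothetical, and the /proc fallback assumes Linux.

```python
import os
from pathlib import Path
from typing import Optional


def pid_is_live(pid: int) -> bool:
    """Hypothetical check: True only if pid exists and is not a zombie."""
    if pid <= 0:
        return False  # avoid the special meanings of 0 and negative pids
    # Step 1: if the pid is our own child and has exited, reap it without blocking.
    try:
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child; fall through to the other checks
    # Step 2: signal-0 probe (still succeeds for zombies we cannot reap).
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        pass  # the pid exists but belongs to another user
    # Step 3: fall back to /proc/<pid>/status and treat state Z as dead.
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except FileNotFoundError:
        # With /proc mounted, a missing entry means the process is gone.
        return not Path("/proc/self").exists()
    except OSError:
        return True  # cannot inspect; trust the signal-0 result
    for line in text.splitlines():
        if line.startswith("State:"):
            return line.split()[1] != "Z"
    return True


def resolve_pid_file(pid_path: Path) -> Optional[int]:
    """Return the live pid recorded in pid_path, or unlink the stale file and return None."""
    try:
        raw = pid_path.read_text().strip()
    except FileNotFoundError:
        return None
    try:
        pid = int(raw)
    except ValueError:
        pid = -1  # unparsable contents are treated as stale
    if pid_is_live(pid):
        return pid
    pid_path.unlink(missing_ok=True)
    return None
```

In this sketch a status or index-update path would call `resolve_pid_file` first and report already_running only when it returns a pid. Trying `waitpid` before reading /proc means a zombie that is the server's own child is actually removed from the process table rather than just ignored.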

Test Report

PYTHONPATH=. ../../.venv/bin/pytest tests/test_index_nonblocking.py tests/test_mcp_contract.py -k "index_update or file_status"
9 passed, 38 deselected, 4 warnings in 3.25s

PYTHONPATH=. ../../.venv/bin/pytest tests/test_config.py tests/test_prefect_server.py
24 passed in 1.34s
