fix: auto-recover corrupt LanceDB tables on startup #14

Open
DevNexsler wants to merge 1 commit into feat/communications-context-enrichment from fix/doc-organizer-init-recovery

@DevNexsler
Owner

Summary

  • auto-recover unreadable LanceDB tables during MCP lazy init and indexer startup
  • route file_status and the first index open through a shared, targeted open-with-recovery path, so they no longer fail on known Lance corruption signatures
  • add regression coverage for the recovery open path and the MCP store bootstrap wiring

Evidence

  • doc-organizer session logs from the last 24h show repeated MCP failures around 2026-05-19 09:03-09:30 America/New_York:
    • service_unavailable: LanceError(IO): Generic memory error: Invalid range 0..0 for object of size 0 bytes
    • retrieval_failed: missing chunks.lance manifest
  • the container recovered only after restart/reindex, which means init-time corruption handling was incomplete
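
As an illustration of how the two logged errors could be treated as "known corruption signatures", here is a minimal sketch. The names CORRUPTION_SIGNATURES and is_known_corruption are hypothetical and are not taken from this PR; the actual signature list in lancedb_store.py may differ.

```python
# Hypothetical signature list, copied from the two log lines in Evidence.
# The real list used by the PR is not shown in the description.
CORRUPTION_SIGNATURES = (
    "invalid range 0..0 for object of size 0 bytes",
    "missing chunks.lance manifest",
)


def is_known_corruption(exc: BaseException) -> bool:
    """Return True if the exception text contains a known corruption signature."""
    message = str(exc).lower()
    return any(signature in message for signature in CORRUPTION_SIGNATURES)


logged = RuntimeError(
    "LanceError(IO): Generic memory error: "
    "Invalid range 0..0 for object of size 0 bytes"
)
unrelated = RuntimeError("connection refused")
```

Matching on substrings keeps recovery targeted: only errors that look like the observed corruption trigger it, and anything else still surfaces as a normal failure.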

Root Cause

flow_index_vault already had post-index recovery, but the startup paths still opened LanceDB directly. If a table was unreadable before indexing began, file_status and the first index open failed before the recovery code could run.

Fix

  • add open_store_with_recovery() and recover_corrupt_table() in lancedb_store.py
  • use recovery-aware open in mcp_server._build_store_and_embed()
  • use recovery-aware open in flow_index_vault.index_vault_flow()
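
The recovery-aware open described above can be sketched roughly as follows. This is not the PR's open_store_with_recovery(); it is a library-free illustration of the pattern, and open_with_recovery_sketch, open_fn, recover_fn, and is_corrupt are all hypothetical names. It assumes recovery is attempted at most once and that unrelated errors should propagate unchanged.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_recovery_sketch(
    open_fn: Callable[[], T],
    recover_fn: Callable[[], None],
    is_corrupt: Callable[[BaseException], bool],
) -> T:
    """Open a store; on a recognised corruption error, recover once and retry."""
    try:
        return open_fn()
    except Exception as exc:
        if not is_corrupt(exc):
            raise  # unrelated failures propagate unchanged
        recover_fn()
        # A second failure is not caught, so recovery runs at most once.
        return open_fn()


# Usage with a fake store that fails until recovery has run.
state = {"recovered": False, "opens": 0}


def fake_open() -> str:
    state["opens"] += 1
    if not state["recovered"]:
        raise RuntimeError("Invalid range 0..0 for object of size 0 bytes")
    return "store"


def fake_recover() -> None:
    state["recovered"] = True


result = open_with_recovery_sketch(
    fake_open, fake_recover, lambda exc: "Invalid range" in str(exc)
)
```

Passing the open and recover steps in as callables lets both startup paths (MCP lazy init and the indexer) share one wrapper, and makes the retry behavior easy to cover with a fake store in tests.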

Test Report

  • PYTHONPATH=. pytest -q tests/test_store.py -k 'open_store_with_recovery'
  • PYTHONPATH=. pytest -q tests/test_mcp_contract.py -k 'build_store_and_embed_uses_recovery_open or get_deps_failure_in_status'
  • PYTHONPATH=. pytest tests/test_config.py tests/test_prefect_server.py

Required suite output:

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
rootdir: /tmp/ragbox-health-fix
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 24 items

tests/test_config.py ............                                        [ 50%]
tests/test_prefect_server.py ............                                [100%]

============================== 24 passed in 1.23s ==============================
