Skip to content

feat: detect embedding model drift and add rebuild tool#1439

Open
yeyitech wants to merge 2 commits intovolcengine:mainfrom
yeyitech:feat/embedding-model-compat-check
Open

feat: detect embedding model drift and add rebuild tool#1439
yeyitech wants to merge 2 commits intovolcengine:mainfrom
yeyitech:feat/embedding-model-compat-check

Conversation

@yeyitech
Copy link
Copy Markdown
Contributor

Summary

  • add persisted embedding compatibility metadata for local vectordb collections
  • fail fast on server startup when the configured embedding identity no longer matches stored vectors
  • add openviking-rebuild-vectors to delete stale per-account vectors and reindex all scopes from AGFS content
  • cover metadata mismatch handling and rebuild traversal with targeted tests

Closes #1066.

Validation

  • ruff check openviking/service/core.py openviking/service/vector_rebuild.py openviking/storage/embedding_compat.py openviking/storage/viking_fs.py openviking_cli/rebuild_vectors.py openviking_cli/utils/config/embedding_config.py openviking_cli/exceptions.py tests/storage/test_embedding_compat.py tests/service/test_vector_rebuild.py
  • python -m compileall openviking/service/core.py openviking/service/vector_rebuild.py openviking/storage/embedding_compat.py openviking/storage/viking_fs.py openviking_cli/rebuild_vectors.py openviking_cli/utils/config/embedding_config.py tests/storage/test_embedding_compat.py tests/service/test_vector_rebuild.py
  • python -m openviking_cli.rebuild_vectors --help
  • manual validation harness for the new compatibility + rebuild logic (the repo still has the existing pytest-asyncio collection failure: AttributeError: 'Package' object has no attribute 'obj', so direct pytest collection is still blocked)

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1066 - Partially compliant

Compliant requirements:

  • Record current embedding model identifier in metadata file
  • Add openviking-rebuild-vectors tool to rebuild vectors
  • Fail fast on startup with clear error when embedding model changes

Non-compliant requirements:

  • (No requirements violated, but implementation has bugs)

Requires further human verification:

  • Manual validation of the rebuild tool in real environment
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 75
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

NameError on Startup

The initialize method uses an undefined variable config when calling ensure_embedding_collection_compatibility. This will cause a NameError and crash the server on startup.

config,
Blocking Sync Call in Async Function

The fallback path for discover_accounts calls synchronous viking_fs.agfs.ls("/local") in an async function, which blocks the event loop.

entries = viking_fs.agfs.ls("/local")

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix undefined config variable

The variable config is not defined in the initialize method scope. Use self._config
instead (the stored config instance) to avoid a NameError.

openviking/service/core.py [281-286]

 if not self._embedding_compat_check_disabled():
     await ensure_embedding_collection_compatibility(
         self._vikingdb_manager,
-        config,
+        self._config,
         config_path=os.environ.get("OPENVIKING_CONFIG_FILE"),
     )
Suggestion importance[1-10]: 9

__

Why: The variable config is indeed undefined in the initialize method scope; using self._config prevents a NameError, which is a critical correctness issue.

High
General
Remove blocking sync FS call

The fallback path uses a synchronous viking_fs.agfs.ls("/local") call inside an
async function, which blocks the event loop. Remove this fallback and require the
async list_account_roots method, or wrap the sync call in a thread pool.

openviking/service/vector_rebuild.py [53-56]

-try:
-    entries = await viking_fs.list_account_roots()
-except AttributeError:
-    entries = viking_fs.agfs.ls("/local")
+entries = await viking_fs.list_account_roots()
Suggestion importance[1-10]: 6

__

Why: The fallback uses a synchronous call inside an async function, which can block the event loop. Removing it aligns with the PR's addition of the async list_account_roots method.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Feature: auto-detect embedding model change and trigger vector rebuild

1 participant