Fix discovery watcher ignoring frontend's --model-path #1
Status: Open
…enizer

When the dynamo frontend is deployed with `--model-path` pointing to a local tokenizer directory, the discovery watcher ignored this path entirely. Instead, it attempted to load the model config/tokenizer from the path advertised by discovered workers, a local filesystem path that exists only on the worker nodes. This caused model registration to fail in disaggregated serving setups.

The fix passes the frontend's local model path into `ModelWatcher` and uses it to re-point the discovered worker's card file references (config.json, tokenizer.json, etc.) to the frontend's local directory before attempting `download_config()`. Since the files then exist locally, `download_config()` returns early via `has_local_files()` without attempting any HuggingFace download.

https://claude.ai/code/session_01FDtNTJPnuwHymY56WGNUg4
Force-pushed from a129347 to 4232f93.
Extract `prepare_card_for_download` from `do_worker_set_registration` so the watcher's card preparation logic can be unit-tested independently. The key test, `test_prepare_card_with_local_path_succeeds`, exercises the actual watcher code path and FAILS without the fix (verified by removing the `update_dir` call from `prepare_card_for_download`).

https://claude.ai/code/session_01FDtNTJPnuwHymY56WGNUg4
Force-pushed from 4232f93 to b7d4a74.
Summary
ModelWatcherignoring the frontend's--model-pathwhen processing discovered workers in disaggregated serving setupsModelWatcherand use it to re-point discovered worker card file references (config.json, tokenizer.json, etc.) to the frontend's local directory beforedownload_config()update_dir(),download_config()returns early viahas_local_files()without attempting any HuggingFace downloadProblem
When the dynamo frontend is deployed with `--model-path` pointing to a local tokenizer directory, the discovery watcher ignores this path entirely. It attempts to load the model config/tokenizer from the path advertised by discovered workers, a local filesystem path that only exists on the worker nodes. This causes model registration to fail (`/v1/models` returns empty, inference returns 404).

Changes
- `lib/llm/src/discovery/watcher.rs`: Added a `local_model_path: Option<PathBuf>` field to `ModelWatcher`. In `do_worker_set_registration()`, calls `card.update_dir()` with the frontend's local path before `download_config()`.
- `lib/llm/src/model_card.rs`: Made `update_dir()` public so the watcher can call it.
- `lib/llm/src/entrypoint/input/common.rs`: Passes `LocalModel::path()` to `ModelWatcher::new()`.
- `lib/llm/src/entrypoint/input/http.rs`: Same plumbing through `run_watcher()`.
- `lib/llm/src/entrypoint/input/grpc.rs`: Same plumbing through `run_watcher()`.
- `lib/llm/tests/http_metrics.rs`: Updated test call sites with `None` for the new parameter.

Test plan
- Deployed the frontend with `--model-path` pointing to a local tokenizer directory and `--discovery-backend=kubernetes`.
- Verified `/v1/models` returns the discovered model.
- Verified `/v1/chat/completions` returns successful inference results.
- Verified setups without `--model-path` on the frontend still work via the HF download fallback.

https://claude.ai/code/session_01FDtNTJPnuwHymY56WGNUg4