NO-ISSUE: feat(preset): refresh quickstart presets and add Ministral-3-8B #129

Merged
hhk7734 merged 2 commits into main from feat/quickstart-preset-refresh
May 13, 2026

Conversation

Member

@hhk7734 hhk7734 commented May 13, 2026

Summary

  • Cleanup: remove unmaintained presets (deepseek-r1-distill-llama-8b, ibm-granite-3.3-8b-instruct, qwen2-0.5b-instruct, qwen2.5-1.5b-instruct).
  • Image bump: update moreh-vllm image to v0.19.1.1 (from v0.17.1.1 / v0.19.1.0) across the maintained quickstart presets.
  • Prefix caching: opt all maintained presets into --enable-prefix-caching.
  • gemma-4-31b-it: add --enable-auto-tool-choice, --reasoning-parser gemma4, --tool-call-parser gemma4.
  • qwen3.6-27b: add --speculative-config '{"method":"mtp","num_speculative_tokens":3}' and new prefill/decode templates for PD disaggregation.
  • Ministral: replace mistralai/Mistral-7B-Instruct-v0.3 with mistralai/Ministral-3-8B-Reasoning-2512, configured with --tokenizer_mode mistral, --config_format mistral, --load_format mistral, --enable-auto-tool-choice, --tool-call-parser mistral, and --reasoning-parser mistral.
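
For illustration, the JSON-valued flag in the qwen3.6-27b item can be assembled programmatically so that the quoting stays shell-safe. This is a minimal sketch, not the chart's actual templating: `build_extra_args` is a hypothetical helper, and only the flag names and the JSON body come from the summary above.

```python
import json
import shlex

# Value copied from the qwen3.6-27b item in the summary.
speculative_config = {"method": "mtp", "num_speculative_tokens": 3}


def build_extra_args(flags):
    """Join (flag, value) pairs into one shell-safe argument string.

    A value of None marks a boolean switch. Dict values are serialised
    to compact JSON before being quoted for the shell.
    """
    parts = []
    for flag, value in flags:
        parts.append(flag)
        if value is None:
            continue
        if isinstance(value, dict):
            value = json.dumps(value, separators=(",", ":"))
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


qwen_args = build_extra_args([
    ("--enable-prefix-caching", None),
    ("--speculative-config", speculative_config),
])
print(qwen_args)
```

Splitting the result with `shlex.split` and decoding the last token with `json.loads` should give back the original dictionary, which is a quick way to confirm that nothing was lost in quoting.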

Test Plan

  • helm lint deploy/helm/moai-inference-preset passes.
  • helm template deploy/helm/moai-inference-preset renders the new templates including the --speculative-config and --kv-transfer-config JSON arguments without escaping issues.
  • Deploy a representative preset on MI250/MI300x and confirm the vLLM container starts with the new flags.
  • Verify Ministral-3-8B-Reasoning-2512 loads with the mistral tokenizer/config/load format.
  • Verify qwen3.6-27b PD prefill/decode pair handshakes via NixlConnector.
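
The escaping check in the second item could be automated roughly as follows. This is a sketch under assumptions: `parse_json_flags` is a hypothetical helper, the sample string stands in for real `helm template` output, and the `--kv-transfer-config` body (`kv_connector`, `kv_role`) is an assumed shape rather than a value taken from the presets.

```python
import json
import shlex

# Flags whose values are expected to be JSON objects.
JSON_FLAGS = ("--speculative-config", "--kv-transfer-config")


def parse_json_flags(arg_string):
    """Return {flag: decoded JSON} for every JSON-valued flag found.

    Raises ValueError when a flag has no value or the value is not
    valid JSON (json.JSONDecodeError is a subclass of ValueError).
    """
    tokens = shlex.split(arg_string)
    found = {}
    for i, token in enumerate(tokens):
        if token in JSON_FLAGS:
            if i + 1 >= len(tokens):
                raise ValueError(f"{token} has no value")
            found[token] = json.loads(tokens[i + 1])
    return found


# Stand-in for the argument string of a rendered preset (assumed values).
rendered = (
    "--enable-prefix-caching "
    "--speculative-config '{\"method\":\"mtp\",\"num_speculative_tokens\":3}' "
    "--kv-transfer-config '{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_consumer\"}'"
)

for flag, value in parse_json_flags(rendered).items():
    print(flag, value)
```

If the inner double quotes were stripped during rendering, for example `--speculative-config {method:mtp}`, the helper raises `ValueError` instead of returning silently.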

hhk7734 and others added 2 commits May 13, 2026 14:50
Drop deepseek-r1-distill-llama-8b, ibm-granite-3.3-8b-instruct,
qwen2-0.5b-instruct, and qwen2.5-1.5b-instruct preset templates that
are no longer maintained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…3-8B

Refresh quickstart preset templates across maintained models:

- Bump moreh-vllm image to v0.19.1.1
- Enable prefix caching by default
- Add reasoning and tool-call parsers for gemma-4-31b-it (gemma4)
- Add MTP speculative decoding to qwen3.6-27b
- Add PD-disaggregated prefill/decode templates for qwen3.6-27b
- Replace mistral-7b-instruct-v0.3 with Ministral-3-8B-Reasoning-2512,
  including mistral tokenizer/config/load format and reasoning parser

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 05:51
@hhk7734 hhk7734 requested a review from a team as a code owner May 13, 2026 05:51
Contributor

Copilot AI left a comment

Pull request overview

Refreshes the Helm “quickstart” preset catalog for vLLM by removing unmaintained model presets, bumping the moreh-vllm image across maintained presets, and updating runtime flags for newer models, tool calling, and PD-disaggregated (prefill/decode) workflows.

Changes:

  • Removed several unmaintained quickstart presets (DeepSeek R1 distill, IBM Granite, Qwen2 0.5B, Qwen2.5 1.5B).
  • Bumped moreh-vllm images to v0.19.1.1 and enabled --enable-prefix-caching broadly; added model-specific tool/reasoning parser flags.
  • Added/updated Qwen3.6-27B PD prefill/decode presets (including Nixl KV transfer) and replaced Mistral-7B preset with Ministral-3-8B-Reasoning-2512.

Reviewed changes

Copilot reviewed 71 out of 71 changed files in this pull request and generated 1 comment.

File Description
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi300x-tp2.helm.yaml New/updated Qwen3.6-27B prefill preset; image bump + prefix caching + PD args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi250-tp2.helm.yaml New/updated Qwen3.6-27B prefill preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi300x-tp2.helm.yaml New/updated Qwen3.6-27B decode preset; KV consumer + PD args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi250-tp2.helm.yaml New/updated Qwen3.6-27B decode preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-amd-mi300x-tp2.helm.yaml Qwen3.6-27B e2e preset; image bump + prefix caching + speculative config.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-amd-mi250-tp2.helm.yaml Qwen3.6-27B e2e preset for MI250; image bump + new flags.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-prefill-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-prefill-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (prefill, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-decode-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-decode-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (decode, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (e2e, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-prefill-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-prefill-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (prefill, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-decode-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-decode-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (decode, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (e2e, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (prefill, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (decode, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching (e2e, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi300x-tp2.helm.yaml Removed unmaintained Qwen2.5-1.5B preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi250-tp2.helm.yaml Removed unmaintained Qwen2.5-1.5B preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-prefill-amd-mi300x-tp2.helm.yaml Removed unmaintained Qwen2-0.5B prefill preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-prefill-amd-mi250-tp2.helm.yaml Removed unmaintained Qwen2-0.5B prefill preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-decode-amd-mi300x-tp2.helm.yaml Removed unmaintained Qwen2-0.5B decode preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-decode-amd-mi250-tp2.helm.yaml Removed unmaintained Qwen2-0.5B decode preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-amd-mi300x-tp2.helm.yaml Removed unmaintained Qwen2-0.5B e2e preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-amd-mi250-tp2.helm.yaml Removed unmaintained Qwen2-0.5B e2e preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching for GPT-OSS-20B prefill.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching for GPT-OSS-20B prefill (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-decode-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching for GPT-OSS-20B decode.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-decode-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching for GPT-OSS-20B decode (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching for GPT-OSS-20B e2e.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching for GPT-OSS-20B e2e (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi300x-tp2.helm.yaml Replaces Mistral-7B preset with Ministral-3-8B; adds Mistral-format args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi250-tp2.helm.yaml Ministral prefill preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi300x-tp2.helm.yaml Ministral decode preset; adds Mistral-format args + KV consumer.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi250-tp2.helm.yaml Ministral decode preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi300x-tp2.helm.yaml Ministral e2e preset; adds Mistral-format args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi250-tp2.helm.yaml Ministral e2e preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-prefill-amd-mi250-dp2-moe-tp2.helm.yaml Image bump + enable prefix caching for Phi Mini MoE prefill.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-decode-amd-mi250-dp2-moe-tp2.helm.yaml Image bump + enable prefix caching for Phi Mini MoE decode.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-amd-mi250-dp2-moe-tp2.helm.yaml Image bump + enable prefix caching for Phi Mini MoE e2e.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching for Llama 3.2 1B prefill.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching for Llama 3.2 1B prefill (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching for Llama 3.2 1B decode.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching for Llama 3.2 1B decode (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml Image bump + enable prefix caching for Llama 3.2 1B e2e.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.helm.yaml Image bump + enable prefix caching for Llama 3.2 1B e2e (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-prefill-amd-mi300x-tp2.helm.yaml Removed unmaintained IBM Granite prefill preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-prefill-amd-mi250-tp2.helm.yaml Removed unmaintained IBM Granite prefill preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-decode-amd-mi300x-tp2.helm.yaml Removed unmaintained IBM Granite decode preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-decode-amd-mi250-tp2.helm.yaml Removed unmaintained IBM Granite decode preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-amd-mi300x-tp2.helm.yaml Removed unmaintained IBM Granite e2e preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-amd-mi250-tp2.helm.yaml Removed unmaintained IBM Granite e2e preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-prefill-amd-mi300x-tp2.helm.yaml Image bump + prefix caching + Gemma4 tool/reasoning parser flags (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-prefill-amd-mi250-tp2.helm.yaml Same Gemma4 updates for MI250 (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-decode-amd-mi300x-tp2.helm.yaml Same Gemma4 updates (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-decode-amd-mi250-tp2.helm.yaml Same Gemma4 updates for MI250 (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-amd-mi300x-tp2.helm.yaml Same Gemma4 updates (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-amd-mi250-tp2.helm.yaml Same Gemma4 updates for MI250 (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-prefill-amd-mi300x-tp2.helm.yaml Removed unmaintained DeepSeek R1 distill prefill preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-prefill-amd-mi250-tp2.helm.yaml Removed unmaintained DeepSeek R1 distill prefill preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-decode-amd-mi300x-tp2.helm.yaml Removed unmaintained DeepSeek R1 distill decode preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-decode-amd-mi250-tp2.helm.yaml Removed unmaintained DeepSeek R1 distill decode preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-amd-mi300x-tp2.helm.yaml Removed unmaintained DeepSeek R1 distill e2e preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-amd-mi250-tp2.helm.yaml Removed unmaintained DeepSeek R1 distill e2e preset (MI250).
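
The file names in the table appear to follow one pattern: `quickstart-vllm-<model>[-prefill|-decode]-amd-<gpu>-<parallelism>.helm.yaml`. That pattern is inferred from the listing rather than documented anywhere in this pull request, so the sketch below is only a way to group presets by role and accelerator. `Preset` and `parse_preset` are hypothetical names, and presets without a role suffix are labelled "e2e" to match the descriptions above.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class Preset(NamedTuple):
    model: str
    role: str          # "prefill", "decode", or "e2e" when no suffix is present
    gpu: str           # e.g. "mi250" or "mi300x"
    parallelism: str   # e.g. "tp2" or "dp2-moe-tp2"


# Pattern inferred from the file list; not an official naming spec.
_PATTERN = re.compile(
    r"^quickstart-vllm-(?P<model>.+?)"
    r"(?:-(?P<role>prefill|decode))?"
    r"-amd-(?P<gpu>mi\d+x?)"
    r"-(?P<parallelism>.+)\.helm\.yaml$"
)


def parse_preset(path: str) -> Optional[Preset]:
    """Split a preset file path into its components, or return None."""
    match = _PATTERN.match(PurePosixPath(path).name)
    if match is None:
        return None
    return Preset(
        model=match["model"],
        role=match["role"] or "e2e",
        gpu=match["gpu"],
        parallelism=match["parallelism"],
    )


print(parse_preset(
    "deploy/helm/moai-inference-preset/templates/presets/quickstart/"
    "quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi300x-tp2.helm.yaml"
))
```
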
Comments suppressed due to low confidence (10)

The same comment was left at each of the following six locations:

deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi300x-tp2.helm.yaml:34
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi250-tp2.helm.yaml:34
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi300x-tp2.helm.yaml:34
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi250-tp2.helm.yaml:34
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi300x-tp2.helm.yaml:34
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi250-tp2.helm.yaml:34

  • ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).

The same comment was left at each of the following four locations:

deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi300x-tp2.helm.yaml:33
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi250-tp2.helm.yaml:33
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi300x-tp2.helm.yaml:33
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi250-tp2.helm.yaml:33

  • --trust-remote-code allows execution of arbitrary Python from the model repository at load time. If this is required for Qwen3.6, please pin the model code to a specific, vetted revision and ensure the runtime environment is appropriately locked down (e.g., restricted egress / hardened pod security), otherwise remove this flag.
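
The underscore concern in the Ministral comments can be reproduced with Python's standard `argparse`, which matches option strings literally. This sketch only demonstrates stock `argparse` behaviour. The vLLM entrypoint uses its own parser and may already normalise underscores to hyphens, which would need to be checked against the moreh-vllm:v0.19.1.1 image. `make_parser` and `normalize_flags` are hypothetical names.

```python
import argparse


def make_parser():
    """Plain argparse parser that defines only the hyphenated flag."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--tokenizer-mode")
    return parser


def normalize_flags(argv):
    """Rewrite underscores to hyphens in long option names only.

    Values are left untouched, including the part after '=' in the
    '--flag=value' form.
    """
    out = []
    for arg in argv:
        if arg.startswith("--"):
            name, sep, value = arg.partition("=")
            arg = name.replace("_", "-") + sep + value
        out.append(arg)
    return out


parser = make_parser()
argv = ["--tokenizer_mode", "mistral"]

# Stock argparse does not recognise the underscore spelling.
namespace, unknown = parser.parse_known_args(argv)
print(namespace.tokenizer_mode, unknown)

# After normalisation the same arguments parse cleanly.
namespace, unknown = parser.parse_known_args(normalize_flags(argv))
print(namespace.tokenizer_mode, unknown)
```

With `parse_args` instead of `parse_known_args`, the first call would exit with an "unrecognized arguments" error, which is the startup failure the comment describes for a parser without such normalisation.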

@hhk7734 hhk7734 merged commit e445f96 into main May 13, 2026
8 checks passed
@hhk7734 hhk7734 deleted the feat/quickstart-preset-refresh branch May 13, 2026 07:20