NO-ISSUE: feat(preset): refresh quickstart presets and add Ministral-3-8B by hhk7734 · Pull Request #129 · moreh-dev/mif

hhk7734 · 2026-05-13T05:51:31Z

Summary

Cleanup: remove unmaintained presets (deepseek-r1-distill-llama-8b, ibm-granite-3.3-8b-instruct, qwen2-0.5b-instruct, qwen2.5-1.5b-instruct).
Image bump: update moreh-vllm image to v0.19.1.1 (from v0.17.1.1 / v0.19.1.0) across the maintained quickstart presets.
Prefix caching: opt all maintained presets into --enable-prefix-caching.
gemma-4-31b-it: add --enable-auto-tool-choice, --reasoning-parser gemma4, --tool-call-parser gemma4.
qwen3.6-27b: add --speculative-config '{"method":"mtp","num_speculative_tokens":3}' and new prefill/decode templates for PD disaggregation.
Ministral: replace mistralai/Mistral-7B-Instruct-v0.3 with mistralai/Ministral-3-8B-Reasoning-2512, configured with --tokenizer_mode mistral, --config_format mistral, --load_format mistral, --enable-auto-tool-choice, --tool-call-parser mistral, and --reasoning-parser mistral.
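The per-model flag changes above amount to one shared flag plus model-specific extras. The Python sketch below is only a reading aid. `COMMON_FLAGS`, `MODEL_FLAGS`, and `extra_args` are hypothetical names that do not come from the chart, because the real templates embed these flags directly in each preset file. The flag strings are copied from the summary, with one exception: the Ministral flags use the hyphenated spelling (`--tokenizer-mode` and so on) that the review comments recommend. `shlex.join` quotes the JSON value so that it survives shell splitting.

```python
import json
import shlex

# Flag shared by every maintained preset after this PR.
COMMON_FLAGS = ["--enable-prefix-caching"]

# Hypothetical lookup table: model-specific flags taken from the PR summary.
MODEL_FLAGS = {
    "gemma-4-31b-it": [
        "--enable-auto-tool-choice",
        "--reasoning-parser", "gemma4",
        "--tool-call-parser", "gemma4",
    ],
    "qwen3.6-27b": [
        "--speculative-config",
        # Compact separators reproduce the exact JSON string from the summary.
        json.dumps({"method": "mtp", "num_speculative_tokens": 3}, separators=(",", ":")),
    ],
    "ministral-3-8b-reasoning-2512": [
        # Hyphenated spellings, as suggested in the review comments.
        "--tokenizer-mode", "mistral",
        "--config-format", "mistral",
        "--load-format", "mistral",
        "--enable-auto-tool-choice",
        "--tool-call-parser", "mistral",
        "--reasoning-parser", "mistral",
    ],
}


def extra_args(model: str) -> str:
    """Render a shell-safe argument string for a hypothetical preset key."""
    argv = COMMON_FLAGS + MODEL_FLAGS.get(model, [])
    # shlex.join single-quotes any element containing shell-unsafe characters.
    return shlex.join(argv)


print(extra_args("qwen3.6-27b"))
# prints: --enable-prefix-caching --speculative-config '{"method":"mtp","num_speculative_tokens":3}'
```

Rendering `qwen3.6-27b` reproduces the single-quoted `--speculative-config` argument shown in the summary.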

Test Plan

helm lint deploy/helm/moai-inference-preset passes.
helm template deploy/helm/moai-inference-preset renders the new templates including the --speculative-config and --kv-transfer-config JSON arguments without escaping issues.
Deploy a representative preset on MI250/MI300X and confirm the vLLM container starts with the new flags.
Verify Ministral-3-8B-Reasoning-2512 loads with the mistral tokenizer/config/load format.
Verify qwen3.6-27b PD prefill/decode pair handshakes via NixlConnector.
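The "without escaping issues" check can be automated. Take an args string copied from `helm template` output, split it the way a POSIX shell would, and confirm that every JSON-valued flag decodes to an object. This is a hedged sketch. It assumes the container receives the extra args as a shell-split string, since the exact mechanism is not described here. `check_json_flags` is a hypothetical helper, and the `--kv-transfer-config` value in the sample is a placeholder, not the value the qwen3.6-27b presets actually render.

```python
import json
import shlex

# Flags whose values are expected to be JSON objects.
JSON_FLAGS = ("--speculative-config", "--kv-transfer-config")


def check_json_flags(rendered: str) -> dict:
    """Split a rendered argument string and parse every JSON-valued flag.

    Raises ValueError if a JSON flag has no value or its value does not
    decode to a JSON object (a sign of broken quoting or escaping).
    """
    argv = shlex.split(rendered)  # POSIX-style splitting, quotes removed
    parsed = {}
    for i, arg in enumerate(argv):
        name, sep, inline = arg.partition("=")  # support --flag=value too
        if name not in JSON_FLAGS:
            continue
        if sep:
            value = inline
        elif i + 1 < len(argv):
            value = argv[i + 1]
        else:
            raise ValueError(f"{name} has no value")
        try:
            obj = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{name} value is not valid JSON: {value!r}") from exc
        if not isinstance(obj, dict):
            raise ValueError(f"{name} value is not a JSON object: {value!r}")
        parsed[name] = obj
    return parsed


# Hypothetical rendered value; the kv-transfer-config payload is a placeholder.
sample = (
    "--enable-prefix-caching "
    "--speculative-config '{\"method\":\"mtp\",\"num_speculative_tokens\":3}' "
    "--kv-transfer-config '{\"kv_connector\":\"NixlConnector\",\"kv_role\":\"kv_both\"}'"
)
print(check_json_flags(sample)["--kv-transfer-config"]["kv_connector"])  # NixlConnector
```

Shell splitting strips the inner double quotes from the unquoted form `--speculative-config {"method":"mtp"}`, so the helper rejects it. This is the failure mode the test plan guards against.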

Drop deepseek-r1-distill-llama-8b, ibm-granite-3.3-8b-instruct, qwen2-0.5b-instruct, and qwen2.5-1.5b-instruct preset templates that are no longer maintained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…3-8B

Refresh quickstart preset templates across maintained models:
- Bump moreh-vllm image to v0.19.1.1
- Enable prefix caching by default
- Add reasoning and tool-call parsers for gemma-4-31b-it (gemma4)
- Add MTP speculative decoding to qwen3.6-27b
- Add PD-disaggregated prefill/decode templates for qwen3.6-27b
- Replace mistral-7b-instruct-v0.3 with Ministral-3-8B-Reasoning-2512, including mistral tokenizer/config/load format and reasoning parser

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Refreshes the Helm “quickstart” preset catalog for vLLM by removing unmaintained model presets, bumping the moreh-vllm image across maintained presets, and updating runtime flags for newer models and tooling as well as PD-disaggregated (prefill/decode) workflows.

Changes:

Removed several unmaintained quickstart presets (DeepSeek R1 distill, IBM Granite, Qwen2 0.5B, Qwen2.5 1.5B).
Bumped moreh-vllm images to v0.19.1.1 and enabled --enable-prefix-caching broadly; added model-specific tool/reasoning parser flags.
Added/updated Qwen3.6-27B PD prefill/decode presets (including Nixl KV transfer) and replaced Mistral-7B preset with Ministral-3-8B-Reasoning-2512.

Reviewed changes

Copilot reviewed 71 out of 71 changed files in this pull request and generated 1 comment.


File	Description
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi300x-tp2.helm.yaml	New/updated Qwen3.6-27B prefill preset; image bump + prefix caching + PD args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi250-tp2.helm.yaml	New/updated Qwen3.6-27B prefill preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi300x-tp2.helm.yaml	New/updated Qwen3.6-27B decode preset; KV consumer + PD args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi250-tp2.helm.yaml	New/updated Qwen3.6-27B decode preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-amd-mi300x-tp2.helm.yaml	Qwen3.6-27B e2e preset; image bump + prefix caching + speculative config.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-amd-mi250-tp2.helm.yaml	Qwen3.6-27B e2e preset for MI250; image bump + new flags.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-prefill-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-prefill-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (prefill, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-decode-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-decode-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (decode, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-vl-8b-instruct-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (e2e, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-prefill-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-prefill-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (prefill, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-decode-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-decode-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (decode, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-32b-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (e2e, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-prefill-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (prefill, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-decode-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (decode, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3-1.7b-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching (e2e, MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi300x-tp2.helm.yaml	Removed unmaintained Qwen2.5-1.5B preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2.5-1.5b-instruct-amd-mi250-tp2.helm.yaml	Removed unmaintained Qwen2.5-1.5B preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-prefill-amd-mi300x-tp2.helm.yaml	Removed unmaintained Qwen2-0.5B prefill preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-prefill-amd-mi250-tp2.helm.yaml	Removed unmaintained Qwen2-0.5B prefill preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-decode-amd-mi300x-tp2.helm.yaml	Removed unmaintained Qwen2-0.5B decode preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-decode-amd-mi250-tp2.helm.yaml	Removed unmaintained Qwen2-0.5B decode preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-amd-mi300x-tp2.helm.yaml	Removed unmaintained Qwen2-0.5B e2e preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen2-0.5b-instruct-amd-mi250-tp2.helm.yaml	Removed unmaintained Qwen2-0.5B e2e preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching for GPT-OSS-20B prefill.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching for GPT-OSS-20B prefill (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-decode-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching for GPT-OSS-20B decode.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-decode-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching for GPT-OSS-20B decode (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching for GPT-OSS-20B e2e.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching for GPT-OSS-20B e2e (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi300x-tp2.helm.yaml	Replaces Mistral-7B preset with Ministral-3-8B; adds Mistral-format args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi250-tp2.helm.yaml	Ministral prefill preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi300x-tp2.helm.yaml	Ministral decode preset; adds Mistral-format args + KV consumer.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi250-tp2.helm.yaml	Ministral decode preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi300x-tp2.helm.yaml	Ministral e2e preset; adds Mistral-format args.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi250-tp2.helm.yaml	Ministral e2e preset for MI250.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-prefill-amd-mi250-dp2-moe-tp2.helm.yaml	Image bump + enable prefix caching for Phi Mini MoE prefill.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-decode-amd-mi250-dp2-moe-tp2.helm.yaml	Image bump + enable prefix caching for Phi Mini MoE decode.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-microsoft-phi-mini-moe-instruct-amd-mi250-dp2-moe-tp2.helm.yaml	Image bump + enable prefix caching for Phi Mini MoE e2e.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching for Llama 3.2 1B prefill.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-prefill-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching for Llama 3.2 1B prefill (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching for Llama 3.2 1B decode.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-decode-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching for Llama 3.2 1B decode (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi300x-tp2.helm.yaml	Image bump + enable prefix caching for Llama 3.2 1B e2e.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-meta-llama-llama-3.2-1b-instruct-amd-mi250-tp2.helm.yaml	Image bump + enable prefix caching for Llama 3.2 1B e2e (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-prefill-amd-mi300x-tp2.helm.yaml	Removed unmaintained IBM Granite prefill preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-prefill-amd-mi250-tp2.helm.yaml	Removed unmaintained IBM Granite prefill preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-decode-amd-mi300x-tp2.helm.yaml	Removed unmaintained IBM Granite decode preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-decode-amd-mi250-tp2.helm.yaml	Removed unmaintained IBM Granite decode preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-amd-mi300x-tp2.helm.yaml	Removed unmaintained IBM Granite e2e preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-ibm-granite-granite-3.3-8b-instruct-amd-mi250-tp2.helm.yaml	Removed unmaintained IBM Granite e2e preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-prefill-amd-mi300x-tp2.helm.yaml	Image bump + prefix caching + Gemma4 tool/reasoning parser flags (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-prefill-amd-mi250-tp2.helm.yaml	Same Gemma4 updates for MI250 (prefill).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-decode-amd-mi300x-tp2.helm.yaml	Same Gemma4 updates (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-decode-amd-mi250-tp2.helm.yaml	Same Gemma4 updates for MI250 (decode).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-amd-mi300x-tp2.helm.yaml	Same Gemma4 updates (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-google-gemma-4-31b-it-amd-mi250-tp2.helm.yaml	Same Gemma4 updates for MI250 (e2e).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-prefill-amd-mi300x-tp2.helm.yaml	Removed unmaintained DeepSeek R1 distill prefill preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-prefill-amd-mi250-tp2.helm.yaml	Removed unmaintained DeepSeek R1 distill prefill preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-decode-amd-mi300x-tp2.helm.yaml	Removed unmaintained DeepSeek R1 distill decode preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-decode-amd-mi250-tp2.helm.yaml	Removed unmaintained DeepSeek R1 distill decode preset (MI250).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-amd-mi300x-tp2.helm.yaml	Removed unmaintained DeepSeek R1 distill e2e preset.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-deepseek-ai-deepseek-r1-distill-llama-8b-amd-mi250-tp2.helm.yaml	Removed unmaintained DeepSeek R1 distill e2e preset (MI250).

Comments suppressed due to low confidence (10)

deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi300x-tp2.helm.yaml:34

ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-prefill-amd-mi250-tp2.helm.yaml:34
ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi300x-tp2.helm.yaml:34
ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-decode-amd-mi250-tp2.helm.yaml:34
ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi300x-tp2.helm.yaml:34
ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-mistralai-ministral-3-8b-reasoning-2512-amd-mi250-tp2.helm.yaml:34
ISVC_EXTRA_ARGS uses --tokenizer_mode/--config_format/--load_format with underscores. Existing vLLM presets in this repo use hyphenated flags (e.g., --tokenizer-mode), and argparse typically will not accept the underscore variants, which would prevent the container from starting. Please switch these flags to the hyphenated form (and verify the exact option names supported by the moreh-vllm:v0.19.1.1 entrypoint).
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi300x-tp2.helm.yaml:33
--trust-remote-code allows execution of arbitrary Python from the model repository at load time. If this is required for Qwen3.6, please pin the model code to a specific, vetted revision and ensure the runtime environment is appropriately locked down (e.g., restricted egress / hardened pod security), otherwise remove this flag.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-prefill-amd-mi250-tp2.helm.yaml:33
--trust-remote-code allows execution of arbitrary Python from the model repository at load time. If this is required for Qwen3.6, please pin the model code to a specific, vetted revision and ensure the runtime environment is appropriately locked down (e.g., restricted egress / hardened pod security), otherwise remove this flag.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi300x-tp2.helm.yaml:33
--trust-remote-code allows execution of arbitrary Python from the model repository at load time. If this is required for Qwen3.6, please pin the model code to a specific, vetted revision and ensure the runtime environment is appropriately locked down (e.g., restricted egress / hardened pod security), otherwise remove this flag.
deploy/helm/moai-inference-preset/templates/presets/quickstart/quickstart-vllm-qwen-qwen3.6-27b-decode-amd-mi250-tp2.helm.yaml:33
--trust-remote-code allows execution of arbitrary Python from the model repository at load time. If this is required for Qwen3.6, please pin the model code to a specific, vetted revision and ensure the runtime environment is appropriately locked down (e.g., restricted egress / hardened pod security), otherwise remove this flag.
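The underscore-versus-hyphen concern can be reproduced with the Python standard library alone. In the sketch below, a plain `argparse` parser registers only the hyphenated spellings, so `parse_known_args` reports `--tokenizer_mode` as unknown (plain `parse_args` would exit with an error instead). A hypothetical `normalize_flag_names` pre-pass makes the underscore spelling work. Some entrypoints apply a normalization like this before parsing. Whether the moreh-vllm:v0.19.1.1 entrypoint does is exactly what the comments ask to verify, so treat this as an illustration, not a statement about vLLM's behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the hyphenated spellings are registered, mirroring the existing presets.
    parser = argparse.ArgumentParser(prog="serve-sketch", add_help=False)
    for flag in ("--tokenizer-mode", "--config-format", "--load-format"):
        parser.add_argument(flag)
    return parser


def normalize_flag_names(argv: list[str]) -> list[str]:
    """Hypothetical pre-pass: rewrite '--foo_bar[=v]' to '--foo-bar[=v]'.

    Only the option name is rewritten; values (including any '=value' part)
    are left untouched.
    """
    out = []
    for arg in argv:
        if arg.startswith("--"):
            name, sep, value = arg.partition("=")
            out.append(name.replace("_", "-") + sep + value)
        else:
            out.append(arg)
    return out


argv = ["--tokenizer_mode", "mistral", "--load-format", "mistral"]

# Without normalization the underscore flag is not recognized.
ns, unknown = build_parser().parse_known_args(argv)
print(ns.tokenizer_mode, unknown)  # None ['--tokenizer_mode', 'mistral']

# With normalization both spellings resolve to the same option.
ns, unknown = build_parser().parse_known_args(normalize_flag_names(argv))
print(ns.tokenizer_mode, unknown)  # mistral []
```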

hhk7734 and others added 2 commits May 13, 2026 14:50

NO-ISSUE: chore(preset): remove unused quickstart presets

4c3275d


Copilot AI review requested due to automatic review settings May 13, 2026 05:51

hhk7734 requested a review from a team as a code owner May 13, 2026 05:51

gitgod-bot assigned hhk7734 May 13, 2026

Copilot started reviewing on behalf of hhk7734 May 13, 2026 05:52

Copilot AI reviewed May 13, 2026


Comment thread ...lates/presets/quickstart/quickstart-vllm-openai-gpt-oss-20b-prefill-amd-mi300x-tp2.helm.yaml

hhk7734 merged commit e445f96 into main May 13, 2026
8 checks passed

hhk7734 deleted the feat/quickstart-preset-refresh branch May 13, 2026 07:20

hhk7734 merged 2 commits into main from feat/quickstart-preset-refresh
