OpenWebUI update - new features and gpt as a main model #4102
Conversation
Pull request overview
Updates the OpenWebUI integration demo documentation to use newer default models (including a VLM) and adds guidance for newer “agentic” OpenWebUI features like Web Search, Memory, and Code Interpreter.
Changes:
- Switched the primary chat model example to `OpenVINO/gpt-oss-20b-int4-ov` and standardized the OpenWebUI Model ID to `ovms-model`.
- Replaced the VLM example model with `Junrui2021/Qwen3-VL-8B-Instruct-int4` and added a new screenshot for image upload.
- Added new documentation sections for Web Search, Memory/context, and Code Interpreter configuration in OpenWebUI.
Reviewed changes
Copilot reviewed 1 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| demos/integration_with_OpenWebUI/README.md | Updates model pull/config instructions and adds new OpenWebUI feature sections (Web Search, Memory, Code Interpreter). |
| demos/integration_with_OpenWebUI/upload_images.png | Adds/updates a screenshot used by the VLM “upload images” step. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
mkdir models
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model Godreign/llama-3.2-3b-instruct-openvino-int4-model --model_repository_path /models --task text_generation
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --add_to_config --config_path /models/config.json --model_path Godreign/llama-3.2-3b-instruct-openvino-int4-model --model_name Godreign/llama-3.2-3b-instruct-openvino-int4-model
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/gpt-oss-20b-int4-ov --model_repository_path /models --task text_generation --tool_parser gptoss --reasoning_parser gptoss
There’s a trailing whitespace at the end of this Docker command line. Trimming it avoids noisy diffs and occasional copy/paste quirks.
Suggested change:
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/gpt-oss-20b-int4-ov --model_repository_path /models --task text_generation --tool_parser gptoss --reasoning_parser gptoss
@@ -73,4 +73,4 @@
> **Important Note**: While using NPU device for acceleration or model gpt-oss-20b with GPU, it is recommended to disable `Follow-Up Auto-Generation` in `Settings > Interface` menu. It will improve response time and avoid queuing requests. For gpt-oss model it will avoid concurrent execution which in version 2026.0 has an accuracy issue.
This workaround is needed only for NPU; gpt-oss is fixed.
@@ -17,4 +17,4 @@
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* Python 3.11 with pip
While the OWU pip package does allow >=3.11, <3.13.0a1, the install instructions at https://pypi.org/project/open-webui/ ("How to install") recommend using Python 3.11.
Let's make it GPU by default, with an option to switch to CPU.
> **Important Note**: While using NPU device for acceleration or model gpt-oss-20b with GPU, it is recommended to disable `Follow-Up Auto-Generation` in `Settings > Interface` menu. It will improve response time and avoid queuing requests. For gpt-oss model it will avoid concurrent execution which in version 2026.0 has an accuracy issue.
### References
[https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html](https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html#model-preparation)
Is this reference still relevant?
Would you prefer to drop it, or replace it, possibly with:
https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching_agent.html#export-llm-model?
Add info about Native Tool Calling.
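As a sketch of what such a section could show, assuming the server's OpenAI-compatible chat completions endpoint and a purely hypothetical `get_weather` tool, a native tool-calling request body would look like:

```python
import json

# Hypothetical tool schema; the name and parameters are illustrative only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Chat completions body using the OpenAI-compatible "tools" convention;
# "ovms-model" is the Model ID used elsewhere in this demo.
payload = {
    "model": "ovms-model",
    "messages": [{"role": "user", "content": "What is the weather in Gdansk?"}],
    "tools": tools,
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain text; the client executes the tool and sends the result back in a follow-up message.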
For gpt-oss it will be `"reasoning_effort": "low"`.
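A minimal sketch of how that could be passed, assuming the OpenAI-compatible chat completions body (with `ovms-model` as the demo's Model ID):

```python
import json

# "reasoning_effort" is sent as a top-level field of the request body;
# for gpt-oss the values follow the OpenAI convention ("low"/"medium"/"high").
payload = {
    "model": "ovms-model",
    "messages": [{"role": "user", "content": "Summarize OpenVINO in one line."}],
    "reasoning_effort": "low",
}
print(json.dumps(payload, indent=2))
```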
```bash
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenVINO/InternVL2-2B-int4-ov --task text_generation
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --add_to_config --config_path /models/config.json --model_path OpenVINO/InternVL2-2B-int4-ov --model_name OpenVINO/InternVL2-2B-int4-ov
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --model_repository_path /models --model_name ovms-model-vl --task text_generation --pipeline_type VLM_CB
```
Damian used it in his demos; I assumed that this model works better with that setting.
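For context, a sketch of the multimodal request shape Open WebUI sends for such a VLM, under the OpenAI-compatible convention of a base64 data URL inside an `image_url` content part (placeholder bytes stand in for a real image file):

```python
import base64
import json

# Placeholder bytes; in practice this would be the raw contents of an image file.
image_b64 = base64.b64encode(b"<image bytes>").decode()

# "ovms-model-vl" matches the --model_name used in the pull command above.
payload = {
    "model": "ovms-model-vl",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}
print(json.dumps(payload)[:100])
```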
There are other options to fulfill the prerequisites, like [OpenVINO Model Server deployment on baremetal Linux or Windows](https://docs.openvino.ai/2026/model-server/ovms_docs_deploying_server_baremetal.html) and [Open WebUI installation with Docker](https://docs.openwebui.com/#quick-start-with-docker-). The steps in this demo can be reused across different options, and the references for each step cover both deployments.
This demo was tested on CPU but most of the models could also be run on Intel accelerators like GPU and NPU.
This demo was tested on GPU but most of the models could also be run on Intel accelerators like CPU and NPU. To load all models in this demo, a minimum of 25 GB of RAM should be free.
GPU is the accelerator here. It would be good to mention which GPU and how much RAM; what are the minimal requirements? If those requirements can't be met, other models can be applied. For some models, NPU can also be employed.
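One quick way to verify the 25 GB figure before pulling the models is a Linux-only sketch that reads `MemAvailable` from `/proc/meminfo`:

```python
import re

# Parse MemAvailable from /proc/meminfo (Linux only; value is reported in kB).
with open("/proc/meminfo") as f:
    meminfo = f.read()

match = re.search(r"MemAvailable:\s+(\d+)\s+kB", meminfo)
available_gib = int(match.group(1)) / (1024 * 1024)
print(f"Available RAM: {available_gib:.1f} GiB")
```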
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* Linux or Windows
Suggested change:
* Linux or Windows
Why should we remove that line?
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
1. Go to **Workspace** → **Models**
2. Choose model or create it.
3. In **Buildin Tools** section enable **Memory**
Suggested change:
3. In **Built-in Tools** section enable **Memory**


🛠 Summary
CVS-183785
Changing models used in OpenWebUI, adding sections about new agentic features.
Done: Update screenshots to use `ovms-model` model name instead of `Godreign/llama-3.2-3b-instruct-openvino-int4-model`.
🧪 Checklist