OpenWebUI update - new features and gpt as a main model #4102
Conversation
Pull request overview
Updates the OpenWebUI integration demo documentation to use newer default models (including a VLM) and adds guidance for newer “agentic” OpenWebUI features like Web Search, Memory, and Code Interpreter.
Changes:
- Switched the primary chat model example to `OpenVINO/gpt-oss-20b-int4-ov` and standardized the OpenWebUI Model ID to `ovms-model`.
- Replaced the VLM example model with `Junrui2021/Qwen3-VL-8B-Instruct-int4` and added a new screenshot for image upload.
- Added new documentation sections for Web Search, Memory/context, and Code Interpreter configuration in OpenWebUI.
Reviewed changes
Copilot reviewed 1 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| demos/integration_with_OpenWebUI/README.md | Updates model pull/config instructions and adds new OpenWebUI feature sections (Web Search, Memory, Code Interpreter). |
| demos/integration_with_OpenWebUI/upload_images.png | Adds/updates a screenshot used by the VLM “upload images” step. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
mkdir models
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model Godreign/llama-3.2-3b-instruct-openvino-int4-model --model_repository_path /models --task text_generation
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --add_to_config --config_path /models/config.json --model_path Godreign/llama-3.2-3b-instruct-openvino-int4-model --model_name Godreign/llama-3.2-3b-instruct-openvino-int4-model
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/gpt-oss-20b-int4-ov --model_repository_path /models --task text_generation --tool_parser gptoss --reasoning_parser gptoss
There’s a trailing whitespace at the end of this Docker command line. Trimming it avoids noisy diffs and occasional copy/paste quirks.
Suggested change:
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/gpt-oss-20b-int4-ov --model_repository_path /models --task text_generation --tool_parser gptoss --reasoning_parser gptoss
@@ -73,4 +73,4 @@
> **Important Note**: While using NPU device for acceleration or model gpt-oss-20b with GPU, it is recommended to disable `Follow-Up Auto-Generation` in `Settings > Interface` menu. It will improve response time and avoid queuing requests. For gpt-oss model it will avoid concurrent execution which in version 2026.0 has an accuracy issue.
This workaround is needed only for NPU; gpt-oss is fixed.
@@ -17,4 +17,4 @@
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* Python 3.11 with pip
While the OWU pip package does allow >=3.11, <3.13.0a1, the install instructions at https://pypi.org/project/open-webui/ ("How to install") recommend using Python 3.11.
Let's make it GPU by default, with an option to switch to CPU.
> **Important Note**: While using NPU device for acceleration or model gpt-oss-20b with GPU, it is recommended to disable `Follow-Up Auto-Generation` in `Settings > Interface` menu. It will improve response time and avoid queuing requests. For gpt-oss model it will avoid concurrent execution which in version 2026.0 has an accuracy issue.
### References
[https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html](https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html#model-preparation)
Is this reference still relevant?
Would you prefer to drop it, or replace it, possibly with:
https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching_agent.html#export-llm-model?
Add info about Native Tool Calling.
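As a sketch of what such a section could show, assuming the server's OpenAI-compatible chat completions endpoint and a purely hypothetical `get_weather` tool, a native tool-calling request body would look like:

```python
import json

# Hypothetical tool schema; the name and parameters are illustrative only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Chat completions body using the OpenAI-compatible "tools" convention;
# "ovms-model" is the Model ID used elsewhere in this demo.
payload = {
    "model": "ovms-model",
    "messages": [{"role": "user", "content": "What is the weather in Gdansk?"}],
    "tools": tools,
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain text; the client executes the tool and sends the result back in a follow-up message.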
For gpt-oss it will be `"reasoning_effort": "low"`.
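A minimal sketch of how that could be passed, assuming the OpenAI-compatible chat completions body (with `ovms-model` as the demo's Model ID):

```python
import json

# "reasoning_effort" is sent as a top-level field of the request body;
# for gpt-oss the values follow the OpenAI convention ("low"/"medium"/"high").
payload = {
    "model": "ovms-model",
    "messages": [{"role": "user", "content": "Summarize OpenVINO in one line."}],
    "reasoning_effort": "low",
}
print(json.dumps(payload, indent=2))
```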
```bash
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenVINO/InternVL2-2B-int4-ov --task text_generation
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --add_to_config --config_path /models/config.json --model_path OpenVINO/InternVL2-2B-int4-ov --model_name OpenVINO/InternVL2-2B-int4-ov
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --model_repository_path /models --model_name ovms-model-vl --task text_generation --pipeline_type VLM_CB
```
Damian used it in his demos; I assumed that this model works better with that setting.
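For context, a sketch of the multimodal request shape Open WebUI sends for such a VLM, under the OpenAI-compatible convention of a base64 data URL inside an `image_url` content part (placeholder bytes stand in for a real image file):

```python
import base64
import json

# Placeholder bytes; in practice this would be the raw contents of an image file.
image_b64 = base64.b64encode(b"<image bytes>").decode()

# "ovms-model-vl" matches the --model_name used in the pull command above.
payload = {
    "model": "ovms-model-vl",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}
print(json.dumps(payload)[:100])
```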
There are other options to fulfill the prerequisites, like [OpenVINO Model Server deployment on baremetal Linux or Windows](https://docs.openvino.ai/2026/model-server/ovms_docs_deploying_server_baremetal.html) and [Open WebUI installation with Docker](https://docs.openwebui.com/#quick-start-with-docker-). The steps in this demo can be reused across different options, and the references for each step cover both deployments.
This demo was tested on CPU but most of the models could also be run on Intel accelerators like GPU and NPU.
This demo was tested on GPU but most of the models could also be run on Intel accelerators like CPU and NPU. To load all models in this demo, a minimum of 25 GB of RAM should be free.
GPU is the accelerator here. It would be good to mention which GPU and how much RAM; what are the minimal requirements? If those requirements can't be met, other models can be applied. For some models, NPU can also be employed.
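One quick way to verify the 25 GB figure before pulling the models is a Linux-only sketch that reads `MemAvailable` from `/proc/meminfo`:

```python
import re

# Parse MemAvailable from /proc/meminfo (Linux only; value is reported in kB).
with open("/proc/meminfo") as f:
    meminfo = f.read()

match = re.search(r"MemAvailable:\s+(\d+)\s+kB", meminfo)
available_gib = int(match.group(1)) / (1024 * 1024)
print(f"Available RAM: {available_gib:.1f} GiB")
```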
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* Linux or Windows
Suggested change:
* Linux or Windows
Why should we remove that line?
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
1. Go to **Workspace** → **Models**
2. Choose model or create it.
3. In **Buildin Tools** section enable **Memory**
Suggested change:
3. In **Built-in Tools** section enable **Memory**


🛠 Summary
CVS-183785
Changing models used in OpenWebUI, adding sections about new agentic features.
Done: Update screenshots to use `ovms-model` model name instead of `Godreign/llama-3.2-3b-instruct-openvino-int4-model`.
🧪 Checklist