
Jinja Exception: System message must be at the beginning with custom provider, docker model runner and qwen 3.5 family models #2327

@k33g

Description

docker-agent version 1.42.0

In theory, when sending a set of messages with the OpenAI API to a local model, there should be only one message with a system role, and it must be the first message in the list.

Until now, inserting multiple messages with a system role hadn't really caused issues (most chat templates are fairly permissive).

However, Qwen3.5's Jinja chat template now rejects a message list of this type.

Qwen3.5's official chat_template.jinja contains logic like:

{%- for message in messages %}
    {%- set content = render_content(message.content, true)|trim %}
    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

https://huggingface.co/unsloth/Qwen3.5-2B/blob/main/chat_template.jinja#L85

This has an impact (error) on certain use cases of docker-agent with Docker Model Runner when using a custom provider.

1. docker-agent + dmr provider (it works)

If I use the model huggingface.co/unsloth/qwen3.5-2b-gguf:Q4_K_M with the dmr provider, I don't have any issues — every system-type message (such as those used for skill detection) is concatenated with the first system message:

docker-agent provider:

models:

  brain:
    provider: dmr
    model: huggingface.co/unsloth/qwen3.5-2b-gguf:Q4_K_M
    temperature: 0.0
    top_p: 0.95
    presence_penalty: 1.5 
    max_tokens: 65536

Request:

{
  "max_tokens": 65536,
  "messages": [
    {
      "content": "You are Bob, a coding expert\n\n## Custom Shell Tools\n\n### execute_command\nExecute a shell command and return its stdout and stderr output.\n- `command`: The shell command to execute.\n\n\nSkills provide specialized instructions for specific tasks. When a user's request matches a skill's description, use read_skill to load its instructions.\n\n<available_skills>\n  <skill>\n    <name>what-time-is-it</name>\n    <description>display the current date and time</description>\n  </skill>\n  <skill>\n    <name>greetings</name>\n    <description>when the user writes \"node greetings\" to somebody, the agent will run this skill with the appropriate parameter.</description>\n  </skill>\n  <skill>\n    <name>vulcan-salute</name>\n    <description>when the user writes \"vulcan salute\" or \"vulcan-salute\" to somebody, the agent will run this skill with the appropriate parameter.</description>\n  </skill>\n</available_skills>",
      "role": "system"
    },
    {
      "content": "what is your quest?",
      "role": "user"
    }
  ],
  "model": "huggingface.co/unsloth/qwen3.5-2b-gguf:Q4_K_M",
  "parallel_tool_calls": true,
  "presence_penalty": 1.5,
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "temperature": 0,
  "tools": [
    {
      "function": {
        "description": "Execute a shell command and return its stdout and stderr output.",
        "name": "execute_command",
        "parameters": {
          "properties": {
            "command": {
              "description": "The shell command to execute.",
              "type": "string"
            }
          },
          "type": "object"
        }
      },
      "type": "function"
    },
    {
      "function": {
        "description": "Read the content of a skill by name. Use this when a user's request matches an available skill.",
        "name": "read_skill",
        "parameters": {
          "properties": {
            "name": {
              "description": "The name of the skill to read",
              "type": "string"
            }
          },
          "required": [
            "name"
          ],
          "type": "object"
        }
      },
      "type": "function"
    }
  ],
  "top_p": 0.95
}

2. docker-agent + custom provider for dmr (error)

If I use the model huggingface.co/unsloth/qwen3.5-2b-gguf:Q4_K_M with a custom provider, I'm getting this error:

all models failed: error receiving from stream: HTTP 500: POST
"http://host.docker.internal:12434/engines/v1/chat/completions": 500 Internal Server Error
{"code":500,"message":"\n------------\nWhile executing CallExpression at line 85, column 32 in
source:\n...first %}↵            {{- raise_exception('System message must be at the beginnin...\n
^\nError: Jinja Exception: System message must be at the beginning.","type":"server_error"}

But why do I need a custom provider for Docker Model Runner? Because I need to connect to Docker Model Runner from inside a Docker Sandbox.

Upon examining the request content, I noticed that the custom provider created 3 messages with a system role:

  • the one I defined in the docker-agent configuration file
  • another one for a custom shell tool I defined
  • another one related to skills
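A client-side workaround would be to fold the extra system messages into the first one before the request is sent, which is essentially what the built-in dmr provider appears to do. Here is a minimal sketch in Python (this is not docker-agent's actual code; the function name and the merging separator are assumptions):

```python
# Sketch: collapse all system-role messages into a single leading one,
# so chat templates that enforce "system message must be first" accept
# the list. Mimics the behaviour observed with the built-in dmr provider;
# not docker-agent's actual implementation.

def merge_system_messages(messages):
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    # Concatenate every system message into one, placed first.
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + others

messages = [
    {"role": "system", "content": "You are Bob, a coding expert"},
    {"role": "system", "content": "## Custom Shell Tools"},
    {"role": "system", "content": "Skills provide specialized instructions..."},
    {"role": "user", "content": "what is your favourite colour?"},
]

merged = merge_system_messages(messages)
# The result has exactly one system message, and it comes first.
```

With this transformation applied, the request in section 2 would look like the request in section 1 and pass Qwen3.5's template check.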

docker-agent provider:

providers:
  host_dmr_provider:
    api_type: openai_chatcompletions
    base_url: http://host.docker.internal:12434/engines/v1

models:

  brain:
    provider: host_dmr_provider
    #provider: dmr
    model: huggingface.co/unsloth/qwen3.5-2b-gguf:Q4_K_M
    temperature: 0.0
    top_p: 0.95
    presence_penalty: 1.5
    max_tokens: 65536

Request:

{
  "max_tokens": 65536,
  "messages": [
    {
      "content": "You are Bob, a coding expert\n",
      "role": "system"
    },
    {
      "content": "## Custom Shell Tools\n\n### execute_command\nExecute a shell command and return its stdout and stderr output.\n- `command`: The shell command to execute.\n\n",
      "role": "system"
    },
    {
      "content": "Skills provide specialized instructions for specific tasks. When a user's request matches a skill's description, use read_skill to load its instructions.\n\n<available_skills>\n  <skill>\n    <name>greetings</name>\n    <description>when the user writes \"node greetings\" to somebody, the agent will run this skill with the appropriate parameter.</description>\n  </skill>\n  <skill>\n    <name>vulcan-salute</name>\n    <description>when the user writes \"vulcan salute\" or \"vulcan-salute\" to somebody, the agent will run this skill with the appropriate parameter.</description>\n  </skill>\n  <skill>\n    <name>what-time-is-it</name>\n    <description>display the current date and time</description>\n  </skill>\n</available_skills>",
      "role": "system"
    },
    {
      "content": "what is your favourite colour?",
      "role": "user"
    }
  ],
  "model": "huggingface.co/unsloth/qwen3.5-2b-gguf:Q4_K_M",
  "parallel_tool_calls": true,
  "presence_penalty": 1.5,
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "temperature": 0,
  "tools": [
    {
      "function": {
        "description": "Execute a shell command and return its stdout and stderr output.",
        "name": "execute_command",
        "parameters": {
          "additionalProperties": false,
          "properties": {
            "command": {
              "description": "The shell command to execute.",
              "type": [
                "string",
                "null"
              ]
            }
          },
          "required": [
            "command"
          ],
          "type": "object"
        }
      },
      "type": "function"
    },
    {
      "function": {
        "description": "Read the content of a skill by name. Use this when a user's request matches an available skill.",
        "name": "read_skill",
        "parameters": {
          "additionalProperties": false,
          "properties": {
            "name": {
              "description": "The name of the skill to read",
              "type": "string"
            }
          },
          "required": [
            "name"
          ],
          "type": "object"
        }
      },
      "type": "function"
    }
  ],
  "top_p": 0.95
}

3. docker-agent + custom provider for ollama (it works)

docker-agent ollama custom provider:

providers:
  host_ollama_provider:
    api_type: openai_chatcompletions
    base_url: http://host.docker.internal:11434/v1

models:

  brain:
    provider: host_ollama_provider
    #provider: dmr
    model: qwen3.5:2b
    temperature: 0.0
    top_p: 0.95
    presence_penalty: 1.5
    max_tokens: 65536

I took a look at Ollama's code: there is a dedicated renderer for Qwen3.5 models that relaxes the single-system-message rule and assumes there can be multiple system messages in a conversation: https://github.com/ollama/ollama/blob/main/model/renderers/qwen35.go#L138

Ollama doesn't execute the Jinja template embedded in the GGUF. Instead, it generates the token string directly in Go, and even when it finds system messages at a position other than 0, it injects them into the middle of the conversation:

if message.Role == "user" || (message.Role == "system" && i != 0) {
	sb.WriteString(imStartTag + message.Role + "\n" + content + imEndTag + "\n")
}
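For comparison, the same idea can be sketched in Python: a ChatML-style renderer that, like Ollama's Go code, simply emits system messages wherever they appear instead of raising an exception. The tag constants follow the usual Qwen/ChatML format; this is an illustration, not Ollama's actual renderer:

```python
# Illustration only: a ChatML-style renderer that accepts system messages
# at any position, instead of raising like Qwen3.5's Jinja template does.
# Tag constants follow the usual Qwen/ChatML prompt format.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

def render_chatml(messages):
    parts = []
    for msg in messages:
        # Every message is rendered the same way, regardless of position.
        parts.append(f"{IM_START}{msg['role']}\n{msg['content']}{IM_END}\n")
    parts.append(f"{IM_START}assistant\n")  # generation prompt
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are Bob"},
    {"role": "user", "content": "hello"},
    {"role": "system", "content": "extra instructions"},  # mid-conversation system message
])
```

A renderer like this never sees the template's `raise_exception` path, which is why the same conversation works through Ollama.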

DMR related issue: docker/model-runner#827

Conclusion

In my opinion, in a multi-agent system that shares its history, this issue is likely to occur again when using DMR.

I have several workarounds to continue preparing my demos with docker-agent + DMR + sbx:

  • don't use models from the Qwen 3.5 family
  • don't use sbx in order to use the dmr provider
  • patch the model (that's my next plan 🤓)

I'm not sure what the best strategy is to fix this:

  • do it on the docker-agent side, but the issue will arise with other agents using DMR (e.g. with shared history)
  • do it like Ollama, and create a specific renderer on the DMR side
  • document how to patch the model
  • provide our own version of the model

I'm going to work on the last two points.

Here is the source code of my experiments if you need to reproduce them: https://codeberg.org/docker-agents/custom-provider-tests
