feat: cli OpenAI-compatible API response_format support #884

Open
markstur wants to merge 13 commits into generative-computing:main from markstur:issue_824

Conversation

@markstur
Contributor

@markstur markstur commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

feat: cli OpenAI-compatible API `response_format` support

   - Added `JsonSchemaFormat` model to represent JSON schema definitions
   - Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
   - Used field alias to avoid conflict with Pydantic's `schema` method

   - Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
   - Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
   - Modified `make_chat_endpoint()` to:
     - Parse `response_format` from requests
     - Convert `json_schema` type to Pydantic models using the utility function
     - Detect if the serve function accepts a `format` parameter using `inspect.signature()`
     - Pass the generated Pydantic model as `format=` parameter to serve functions that support it
     - Handle backward compatibility with serve functions that don't accept `format`
   - Added proper error handling for invalid schemas

   - Test json_schema format is converted to Pydantic model and passed to serve
   - Test json_object format doesn't pass a schema
   - Test text format doesn't pass a schema
   - Test error handling for missing json_schema field
   - Test error handling for invalid JSON schemas
   - Test backward compatibility with serve functions without format parameter
   - Test optional fields in JSON schemas

When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method

This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
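
The dispatch described above might be sketched like this. Only the `format` parameter and the `inspect.signature()` check come from the description; the call shape and helper names are assumptions:

```python
import inspect


def make_dispatch(serve_fn):
    # Inspect once at endpoint-creation time, not on every request.
    accepts_format = "format" in inspect.signature(serve_fn).parameters

    def dispatch(messages, response_model=None, **model_options):
        kwargs = dict(model_options)
        if accepts_format and response_model is not None:
            # Pass the generated Pydantic model for constrained decoding.
            kwargs["format"] = response_model
        # Older serve functions without a `format` parameter still work.
        return serve_fn(messages, **kwargs)

    return dispatch
```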

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from a team as a code owner April 17, 2026 23:37
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@markstur markstur changed the title Issue 824 feat: cli OpenAI-compatible API response_format support Apr 17, 2026
@github-actions github-actions bot added the enhancement (New feature or request) label Apr 17, 2026
Contributor

@jakelorocco jakelorocco left a comment

I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)

* elif/pass not needed. Comment
* set finish reason from result

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@planetf1
Contributor

OpenAI will return a finish_reason of length if limits are hit.

If we're trying to be OpenAI compatible (which is the intent here), my default response would be to emulate that behaviour -- though I am somewhat uncomfortable returning syntactically invalid JSON. I guess the rationale is 'we returned what we could'. In some ways it comes down to our 'contract' with callers.

For streaming this is a key difference, as we fail validation (after the data has already been streamed).

We also have the inconsistency between streaming (here) and non-streaming. For the latter we're already returning the same as OpenAI since we missed doing any validation...

I'd say remove the validation -- then we have the same behaviour, and it comes down to the format parameter and how the backend's constrained decoding works. I think this is what @jakelorocco describes.

I think this also means my earlier comment about adding streaming validation is no longer the right way forward. In fact the opposite -- remove it.

@psschwei
Member

+1 on emulating openai as faithfully as possible

* inspect server() to set accepts_format and is_async in make_chat_endpoint instead of on every request
* improved support for json_schema that can be converted to pydantic
* no output validation. The module is given a format when applicable, but instead
  of throwing an error if the formatting is incorrect, the results are sent to the client.
* consolidated redundant blocks of kwargs by building a dict of kwargs needed.
* improved setting of finish_reason
* added tests
* improved comments/docstrings where clarification needed

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
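For illustration, an OpenAI-style finish_reason mapping might look like the following; the result attribute used here (`truncated`) is hypothetical, not Mellea's actual API:

```python
def finish_reason_from(result) -> str:
    # OpenAI returns "length" when the output was cut off by a token
    # limit, and "stop" for a normal completion.
    if getattr(result, "truncated", False):
        return "length"
    return "stop"
```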
@markstur
Contributor Author

Removed the validation, so now it is more consistent with what you might expect from an OpenAI API server, which will return what it has even if it is truncated or otherwise broken.

@markstur markstur enabled auto-merge May 1, 2026 00:23
Contributor

@planetf1 planetf1 left a comment

Suggesting some changes. I think the main one is around the enum clash -- the others add value over what was there before, so those comments are more about completeness than regression. It would be good to fix them up, but I'm minded to approve when you are comfortable with the changes; nothing else is a blocker.

markstur added 8 commits May 5, 2026 15:28
Add Pydantic model_validator to ResponseFormat class to validate that
json_schema field is provided when type is 'json_schema'. This moves
validation from the endpoint handler to the model level, providing
earlier error detection and cleaner separation of concerns.

Changes:
- Import model_validator from pydantic in cli/serve/models.py
- Add validate_json_schema_required() validator to ResponseFormat
- Update test to expect ValidationError at model instantiation

The validator raises ValueError with message "json_schema field is
required when type is 'json_schema'" when validation fails.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
- Remove redundant validation check from cli/serve/app.py
- Use cast() for type narrowing

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
The exception handler at line 178 in cli/serve/app.py now catches
both ValueError and TypeError when validating JSON schemas. This
handles edge cases where:
- Enum values contain unhashable types (lists, dicts)
- Pydantic's create_model() receives malformed field definitions

Other exceptions (KeyError, AttributeError, RecursionError) are
intentionally not caught as they indicate internal bugs in the
schema converter, not invalid user input.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
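A sketch of the narrowed handling; the wrapper name and re-raise shape are assumptions, and only the pair of caught exception types comes from the commit:

```python
from typing import Any

from pydantic import BaseModel, create_model


def build_response_model(fields: dict[str, Any]) -> type[BaseModel]:
    # Only ValueError/TypeError are treated as invalid user input; other
    # exceptions (KeyError, AttributeError, RecursionError) propagate,
    # since they would indicate bugs in the converter itself.
    try:
        return create_model("ResponseModel", **fields)
    except (ValueError, TypeError) as exc:
        raise ValueError(f"Invalid JSON schema: {exc}") from exc
```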
Replace conditional asserts with unconditional checks in
test_json_object_format_no_schema and test_text_format_no_schema.

The previous pattern 'if "format" in kwargs: assert kwargs["format"] is None'
would pass silently if format was absent, failing to verify the intended
behavior. Changed to 'assert "format" not in kwargs' to properly test
that no format parameter is passed for json_object and text response types.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
… to string

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Replace Enum generation with Literal[tuple(values)] to preserve
case-variant enum values like ["open", "OPEN"] that were previously
collapsing to a single member.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
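The fix can be illustrated with a standalone sketch (not the converter's actual code):

```python
from typing import Literal

from pydantic import create_model

# Enum values taken verbatim from a schema like {"enum": ["open", "OPEN"]}.
values = ("open", "OPEN")

# Literal[tuple(...)] keeps both case variants as distinct allowed values,
# where generating an Enum class could collapse or mangle member names.
Ticket = create_model("Ticket", status=(Literal[values], ...))
```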
@markstur
Contributor Author

markstur commented May 6, 2026

Addressed all the feedback. Thanks for the great feedback!

So, the limited finish_reason implementation is not too bad now, but really it should have been its own PR before this one, rather than this and other PRs gradually doing bits of it.

The fact that we need a home-grown schema_converter is unfortunate, but now we have one and it's gotten better and better in this PR.

@markstur markstur requested a review from planetf1 May 6, 2026 01:10

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

m serve OpenAI API structured output

4 participants