feat: cli OpenAI-compatible API response_format support #884

Open
markstur wants to merge 13 commits into generative-computing:main from markstur:issue_824

Conversation

@markstur
Contributor

@markstur markstur commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

feat: cli OpenAI-compatible API `response_format` support

   - Added `JsonSchemaFormat` model to represent JSON schema definitions
   - Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
   - Used field alias to avoid conflict with Pydantic's `schema` method

   - Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
   - Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
   - Modified `make_chat_endpoint()` to:
     - Parse `response_format` from requests
     - Convert `json_schema` type to Pydantic models using the utility function
     - Detect if the serve function accepts a `format` parameter using `inspect.signature()`
     - Pass the generated Pydantic model as `format=` parameter to serve functions that support it
     - Handle backward compatibility with serve functions that don't accept `format`
   - Added proper error handling for invalid schemas

   - Test json_schema format is converted to Pydantic model and passed to serve
   - Test json_object format doesn't pass a schema
   - Test text format doesn't pass a schema
   - Test error handling for missing json_schema field
   - Test error handling for invalid JSON schemas
   - Test backward compatibility with serve functions without format parameter
   - Test optional fields in JSON schemas

When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method

This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
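
The dispatch described above might be sketched like this. Only the `format` parameter and the `inspect.signature()` check come from the description; the call shape and helper names are assumptions:

```python
import inspect


def make_dispatch(serve_fn):
    # Inspect once at endpoint-creation time, not on every request.
    accepts_format = "format" in inspect.signature(serve_fn).parameters

    def dispatch(messages, response_model=None, **model_options):
        kwargs = dict(model_options)
        if accepts_format and response_model is not None:
            # Pass the generated Pydantic model for constrained decoding.
            kwargs["format"] = response_model
        # Older serve functions without a `format` parameter still work.
        return serve_fn(messages, **kwargs)

    return dispatch
```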

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from a team as a code owner April 17, 2026 23:37
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@markstur markstur changed the title Issue 824 feat: cli OpenAI-compatible API response_format support Apr 17, 2026
@github-actions github-actions bot added the enhancement (New feature or request) label Apr 17, 2026
Contributor

@jakelorocco jakelorocco left a comment

I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)

* elif/pass not needed. Comment
* set finish reason from result

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@planetf1
Contributor

OpenAI will return a finish_reason of length if limits are hit.

If we're trying to be OpenAI compatible (which is the intent here), my default response would be to emulate that behaviour -- though I am somewhat uncomfortable returning syntactically invalid JSON. I guess the rationale is 'we returned what we could'. In some ways it comes down to our 'contract' with callers.

For streaming this is a key difference, as we fail validation (after the data has already been streamed).

We also have the inconsistency between streaming (here) and non-streaming. For the latter we're already returning the same as OpenAI since we missed doing any validation...

I'd say remove the validation -- then we have the same behaviour, and it comes down to the format parameter and how the backend's constrained decoding works. I think this is what @jakelorocco describes.

I think this also means my earlier comment about adding streaming validation is no longer the right way forward. In fact the opposite -- remove it.

@psschwei
Member

+1 on emulating openai as faithfully as possible

* inspect server() to set accepts_format and is_async in make_chat_endpoint instead of on every request
* improved support for json_schema that can be converted to pydantic
* no output validation. The module is given a format when applicable, but instead
  of throwing an error if the formatting is incorrect, the results are sent to the client.
* consolidated redundant blocks of kwargs by building a dict of kwargs needed.
* improved setting of finish_reason
* added tests
* improved comments/docstrings where clarification needed

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
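For illustration, an OpenAI-style finish_reason mapping might look like the following; the result attribute used here (`truncated`) is hypothetical, not Mellea's actual API:

```python
def finish_reason_from(result) -> str:
    # OpenAI returns "length" when the output was cut off by a token
    # limit, and "stop" for a normal completion.
    if getattr(result, "truncated", False):
        return "length"
    return "stop"
```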
@markstur
Contributor Author

Removed the validation, so now it is more consistent with what you might expect from an OpenAI API server, which will return what it has even if it is truncated or otherwise broken.

@markstur markstur enabled auto-merge May 1, 2026 00:23
Contributor

@planetf1 planetf1 left a comment

Suggesting some changes. I think the main one is around the enum clash -- the others add value over what was there before, so those comments are more about completeness than regression. It would be good to fix them up, but I'm minded to approve when you are comfortable with the changes; nothing else is a blocker.

markstur added 8 commits May 5, 2026 15:28
Add Pydantic model_validator to ResponseFormat class to validate that
json_schema field is provided when type is 'json_schema'. This moves
validation from the endpoint handler to the model level, providing
earlier error detection and cleaner separation of concerns.

Changes:
- Import model_validator from pydantic in cli/serve/models.py
- Add validate_json_schema_required() validator to ResponseFormat
- Update test to expect ValidationError at model instantiation

The validator raises ValueError with message "json_schema field is
required when type is 'json_schema'" when validation fails.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
- Remove redundant validation check from cli/serve/app.py
- Use cast() for type narrowing

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
The exception handler at line 178 in cli/serve/app.py now catches
both ValueError and TypeError when validating JSON schemas. This
handles edge cases where:
- Enum values contain unhashable types (lists, dicts)
- Pydantic's create_model() receives malformed field definitions

Other exceptions (KeyError, AttributeError, RecursionError) are
intentionally not caught as they indicate internal bugs in the
schema converter, not invalid user input.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
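A sketch of the narrowed handling; the wrapper name and re-raise shape are assumptions, and only the pair of caught exception types comes from the commit:

```python
from typing import Any

from pydantic import BaseModel, create_model


def build_response_model(fields: dict[str, Any]) -> type[BaseModel]:
    # Only ValueError/TypeError are treated as invalid user input; other
    # exceptions (KeyError, AttributeError, RecursionError) propagate,
    # since they would indicate bugs in the converter itself.
    try:
        return create_model("ResponseModel", **fields)
    except (ValueError, TypeError) as exc:
        raise ValueError(f"Invalid JSON schema: {exc}") from exc
```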
Replace conditional asserts with unconditional checks in
test_json_object_format_no_schema and test_text_format_no_schema.

The previous pattern 'if "format" in kwargs: assert kwargs["format"] is None'
would pass silently if format was absent, failing to verify the intended
behavior. Changed to 'assert "format" not in kwargs' to properly test
that no format parameter is passed for json_object and text response types.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
… to string

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Replace Enum generation with Literal[tuple(values)] to preserve
case-variant enum values like ["open", "OPEN"] that were previously
collapsing to a single member.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
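The fix can be illustrated with a standalone sketch (not the converter's actual code):

```python
from typing import Literal

from pydantic import create_model

# Enum values taken verbatim from a schema like {"enum": ["open", "OPEN"]}.
values = ("open", "OPEN")

# Literal[tuple(...)] keeps both case variants as distinct allowed values,
# where generating an Enum class could collapse or mangle member names.
Ticket = create_model("Ticket", status=(Literal[values], ...))
```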
@markstur
Contributor Author

markstur commented May 6, 2026

Addressed all the feedback. Thanks for the great feedback!

So, the limited finish_reason implementation is not too bad now, but really it should have been its own PR before this one, rather than this and other PRs gradually doing bits of it.

The fact that we need a home-grown schema_converter is unfortunate, but now we have one and it's gotten better and better in this PR.

@markstur markstur requested a review from planetf1 May 6, 2026 01:10

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

m serve OpenAI API structured output

4 participants