Tool results are always stringified, preventing multimodal (image) tool responses with OpenAI Responses API #363

@tecoad

Description

Problem

Tool results are always converted to strings via JSON.stringify(), which prevents sending multimodal content (e.g. images) in tool responses when using the OpenAI Responses API.

The OpenAI Responses API supports multimodal tool outputs via function_call_output, where output can be either a string or an array of content parts:

{
  "type": "function_call_output",
  "call_id": "call_xyz",
  "output": [
    { "type": "input_image", "image_url": "https://example.com/screenshot.png" },
    { "type": "input_text", "text": "Screenshot of the current state" }
  ]
}

This is documented in the migration guide and the function calling guide ("For functions that return images or files, you can pass an array of image or file objects instead of a string").

However, TanStack AI forces all tool results to strings at three points, making it impossible to use this capability:

1. Server-side tool execution — tool-calls.ts

toolResultContent =
  typeof result === 'string' ? result : JSON.stringify(result)
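A minimal sketch of how this step could preserve multimodal arrays instead. The `isContentPartArray` guard and `toToolResultContent` helper are hypothetical names, not existing TanStack AI code; the guard simply checks for the `{ type: string }` shape the Responses API content parts share:

```typescript
// Hypothetical guard: treat an array as multimodal content when every
// element is an object with a string `type` field (e.g. "input_image").
function isContentPartArray(value: unknown): value is Array<{ type: string }> {
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every(
      (part) =>
        typeof part === 'object' &&
        part !== null &&
        typeof (part as { type?: unknown }).type === 'string',
    )
  )
}

// Preserve strings and content-part arrays; stringify everything else.
function toToolResultContent(
  result: unknown,
): string | Array<{ type: string }> {
  if (typeof result === 'string') return result
  if (isContentPartArray(result)) return result
  return JSON.stringify(result)
}
```

The same check could be reused at the client-side and adapter locations below, so only genuinely structured (non-content-part) results get stringified.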

2. Client-side tool result processing — processor.ts

// Step 2: Create a tool-result part (for LLM conversation history)
const content = typeof output === 'string' ? output : JSON.stringify(output)

3. OpenAI adapter message conversion — text.ts

output:
  typeof message.content === 'string'
    ? message.content
    : JSON.stringify(message.content),

Use Case

I am building a visual editor where the AI designs on a canvas. After each tool call (e.g. add_text, edit_image), I need the AI to see a screenshot of the canvas to verify its changes and iterate. The canvas screenshot is available via a URL endpoint.

With the current implementation, returning { type: "input_image", image_url: "https://..." } from a tool gets stringified to "{\"type\":\"input_image\",\"image_url\":\"https://...\"}" — the model receives it as plain text and cannot actually see the image.

I verified manually that sending the function_call_output with output as an array (with input_image parts) to the OpenAI Responses API works correctly — the model receives and processes the image.

Proposed Solution

The ModelMessage type for role: "tool" should support content as either string or Array<ContentPart>. The stringify logic in the three locations above should preserve arrays/objects when they match the expected multimodal format, and only stringify plain objects that are not content part arrays.
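As a sketch, the widened type could look like this. The names `ContentPart` and `ToolModelMessage` and their fields are assumptions modeled on the Responses API shapes above, not the actual TanStack AI types:

```typescript
// Hypothetical content-part shapes mirroring the Responses API.
type ContentPart =
  | { type: 'input_text'; text: string }
  | { type: 'input_image'; image_url: string }

// role: "tool" messages would accept a string or an array of parts.
interface ToolModelMessage {
  role: 'tool'
  toolCallId: string
  content: string | Array<ContentPart>
}

// Example: the canvas-screenshot use case from above.
const multimodalResult: ToolModelMessage = {
  role: 'tool',
  toolCallId: 'call_xyz',
  content: [
    { type: 'input_image', image_url: 'https://example.com/screenshot.png' },
    { type: 'input_text', text: 'Screenshot of the current state' },
  ],
}
```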

The OpenAI adapter should pass the array through to function_call_output.output instead of stringifying it.
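A sketch of what that adapter change might look like, assuming a helper that builds the `function_call_output` item (the `toFunctionCallOutputItem` name and signature are illustrative, not the actual `text.ts` code):

```typescript
type ContentPart = { type: string; [key: string]: unknown }

interface FunctionCallOutputItem {
  type: 'function_call_output'
  call_id: string
  output: string | Array<ContentPart>
}

// Sketch: build the Responses API item, passing strings and
// content-part arrays through untouched and stringifying the rest.
function toFunctionCallOutputItem(
  callId: string,
  content: string | Array<ContentPart> | Record<string, unknown>,
): FunctionCallOutputItem {
  return {
    type: 'function_call_output',
    call_id: callId,
    output:
      typeof content === 'string' || Array.isArray(content)
        ? content
        : JSON.stringify(content),
  }
}
```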

I am aware this raises a broader discussion: since tools can currently return anything, forcing the output to a string was presumably the immediate fix to keep tool-result handling consistent across all models (I am not sure which models accept content parts as tool results). Still, this is a crucial capability to support.

Environment

  • @tanstack/ai: latest
  • @tanstack/ai-openai: latest
  • @cloudflare/tanstack-ai: latest
  • OpenAI adapter uses Responses API (client.responses.create())
