[Feature] HiDream-O1-Image 8B

### Feature Summary

HiDream-O1-Image is a natively unified image-generation foundation model built on a Pixel-level Unified Transformer (UiT), with no external VAE and no separate text encoder.

### Detailed Description

https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev
https://huggingface.co/HiDream-ai/HiDream-O1-Image

<img width="11717" height="8744" alt="Image" src="https://github.com/user-attachments/assets/639c12c2-d029-48b8-bb74-e3bfc575ca4a" />

"Key Features
🧬 Pixel-Level Unified Transformer — One end-to-end model on raw pixels, no VAE, no disjoint text encoder.

🎨 One Model, Many Tasks — Text-to-image, long-text rendering, instruction editing, subject-driven personalization, and storyboard generation in a single architecture.

🧠 Reasoning-Driven Prompt Agent — Built-in "thinking" agent that resolves implicit knowledge, layout, and text rendering before generation.

🖼️ Native High Resolution — Direct synthesis up to 2,048 × 2,048 with sharp fine-grained detail.

⚡ Exceptional Efficiency and Versatility at 8B Scale — With only 8B parameters, achieves performance parity with or even surpasses larger open-source DiTs and leading closed-source models."
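
For a rough sense of what the "8B" and "2,048 × 2,048" figures quoted above imply, here is a back-of-the-envelope sketch. It is not taken from the model card: it assumes the parameter count is exactly 8 billion, that weights are stored at 4, 2, or 1 bytes per parameter, and that images are 3-channel RGB. The helper names `weight_size_gib` and `raw_pixel_values` are made up for illustration.

```python
# Back-of-the-envelope numbers only. The only inputs taken from the issue text
# are "8B parameters" and "2,048 x 2,048"; everything else is an assumption.

PARAMS = 8_000_000_000  # "8B" taken literally; the real count may differ
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}  # assumed storage formats


def weight_size_gib(params: int, bytes_per_param: int) -> float:
    """Size of the weights alone in GiB (no activations, no runtime overhead)."""
    return params * bytes_per_param / 2**30


def raw_pixel_values(width: int, height: int, channels: int = 3) -> int:
    """Number of scalar values in one image when working directly on pixels."""
    return width * height * channels


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weight_size_gib(PARAMS, nbytes):.1f} GiB")
    print("values per 2048x2048 RGB image:", raw_pixel_values(2048, 2048))
```

These figures cover the weights only. Actual memory use depends on the runtime, on any quantization applied, and on how the model groups pixels internally, none of which is specified here.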

### Alternatives you considered

_No response_

### Additional context

_No response_

[Feature] HiDream-O1-Image 8B #1482

Feature Summary

Detailed Description

Alternatives you considered

Additional context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Feature] HiDream-O1-Image 8B #1482

Description

Feature Summary

Detailed Description

Alternatives you considered

Additional context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions