Skip to content

[Feature] HiDream-O1-Image 8B #1482

@GreenShadows

Description

@GreenShadows

Feature Summary

HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) without external VAEs or disjoint text encoders

Detailed Description

https://huggingface.co/HiDream-ai/HiDream-O1-Image-Dev
https://huggingface.co/HiDream-ai/HiDream-O1-Image

Image

"Key Features
🧬 Pixel-Level Unified Transformer — One end-to-end model on raw pixels, no VAE, no disjoint text encoder.

🎨 One Model, Many Tasks — Text-to-image, long-text rendering, instruction editing, subject-driven personalization, and storyboard generation in a single architecture.

🧠 Reasoning-Driven Prompt Agent — Built-in "thinking" agent that resolves implicit knowledge, layout, and text rendering before generation.

🖼️ Native High Resolution — Direct synthesis up to 2,048 × 2,048 with sharp fine-grained detail.

⚡ Exceptional Efficiency and Versatility at 8B Scale — With only 8B parameters, achieves performance parity with or even surpasses larger open-source DiTs and leading closed-source models."

Alternatives you considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions