- Baidu open-sourced its 8-billion-parameter text-to-image model Ernie-Image, which runs smoothly on consumer graphics cards with 24 GB of VRAM.
- The model excels at following complex instructions and rendering text in multiple languages, with overall capabilities the company says are comparable to top-tier closed-source models.

Baidu has open-sourced its text-to-image (T2I) model Ernie-Image, pitching it as a way to get top-tier image generation on consumer-grade graphics cards.
The model has only 8 billion parameters and runs on consumer GPUs with 24 GB of VRAM, according to an announcement by Baidu's Ernie Bot team on Wednesday.
Ernie-Image is built on a single-stream Diffusion Transformer architecture and ships with a lightweight prompt enhancer, which expands brief inputs into richer, more structured descriptions.
The model has outperformed comparable open-source models on multiple international benchmarks, the announcement said.
The model is particularly strong at following complex instructions and rendering text, making it well suited to content that requires multi-panel layouts, such as posters and comics.
It supports text rendering in Chinese, English, Japanese, and Korean, with clear typography and precise strokes that the company says lead the open-source field.
The model weights and inference code for Ernie-Image are available on Hugging Face under the Apache 2.0 license.
The model already supports ComfyUI workflows, and GGUF quantized versions have been released in collaboration with Unsloth.
Before its official open-source release, the model went through a two-week closed testing phase involving more than 30 companies and 20 art designers.
According to Baidu's evaluation results, Ernie-Image leads all open-source models in overall performance.
In text rendering in particular, it achieved state-of-the-art (SOTA) results among open-source models, placing it in the top tier alongside closed-source commercial models such as Google's Nano Banana, the company said.