Baidu open-sources Ernie-Image model, bringing top-tier rendering to consumer GPUs

  • Baidu open-sourced its 8-billion-parameter text-to-image model Ernie-Image, which runs smoothly on consumer graphics cards with 24 GB of VRAM.
  • The model excels at following complex instructions and rendering multilingual text, with overall capabilities that Baidu says are comparable to top-tier closed-source models.

Baidu has open-sourced its text-to-image (T2I) model, Ernie-Image, whose main selling point is top-tier image rendering on consumer-grade graphics cards.

With only 8 billion parameters, the model can deliver top-tier rendering on consumer GPUs with 24 GB of VRAM, according to an announcement by Baidu's Ernie Bot team on Wednesday.
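As a rough back-of-the-envelope check on why an 8-billion-parameter model can fit on a 24 GB card, the sketch below estimates the memory taken by the weights alone at a few common precisions. The precisions listed are illustrative assumptions, not figures from Baidu's announcement, and the estimate ignores activations, the text encoder, the image decoder, and framework overhead, all of which add to real-world usage.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# 8 billion parameters, as stated in the announcement.
PARAMS = 8_000_000_000

# Hypothetical precisions chosen for illustration only.
PRECISIONS = [("fp32", 32), ("bf16", 16), ("int8", 8), ("4-bit", 4)]

for name, bits in PRECISIONS:
    gib = weight_memory_gib(PARAMS, bits)
    print(f"{name:>5}: {gib:5.1f} GiB for weights alone")
```

Under these assumptions, 16-bit weights come to roughly 14.9 GiB, which leaves headroom on a 24 GB card, while 32-bit weights alone would exceed it.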

Ernie-Image is built on a single-stream Diffusion Transformer architecture. Its lightweight prompt enhancer expands brief inputs into richer, more structured descriptions.

The model has surpassed similar open-source models on multiple international benchmarks, the announcement said.

The model stands out in following complex instructions and text rendering, making it highly suitable for content production requiring multi-panel layouts, such as posters and comics.

It supports multi-language generation in Chinese, English, Japanese, and Korean, featuring clear typography and precise strokes that reach a leading level in the open-source field, the company said.

The model weights and inference code for Ernie-Image are now available on the Hugging Face platform under the Apache 2.0 license.

The model already works with ComfyUI workflows, and a GGUF quantized version has been released in collaboration with Unsloth.

Before its official open-source release, the model was tested internally for two weeks by more than 30 companies and 20 art designers.

Evaluation results show that Ernie-Image holds a leading position in comprehensive performance among all open-source models.

Particularly in text rendering capabilities, it has achieved state-of-the-art (SOTA) results for open-source models, placing it in the top tier alongside closed-source commercial models like NanoBanana, according to the company.
