On the afternoon of April 2, Ouyang Jian, General Manager of Baidu Smart Chips, presented the Kunlun chip in a public class for the first time and disclosed a range of data comparing the Kunlun K200 with the NVIDIA T4 GPU.
Ouyang Jian also used video to demonstrate the Kunlun chip's killer feature: its compatibility with the China-made Feiteng processor.
At the 2018 Baidu AI Developer Conference, Baidu founder, chairman and CEO Robin Li announced Baidu's self-developed AI chip, Kunlun.
Baidu's AI chip expertise grew out of its work on FPGA-based AI acceleration, along with years of work on software-defined accelerators and the XPU architecture.
Baidu first started using FPGAs for AI architecture research and development in 2010 and put a small-scale deployment online in 2011. By 2017 it had deployed more than 10,000 FPGAs. Its AI chip was announced in 2018 and reached a further milestone in the second half of 2019, and mass production began in 2020.
Kunlun chips are positioned as general-purpose AI chips, with the goal of providing high-performance, low-cost, highly flexible AI chips.
Ouyang Jian said, "Compared to GPUs, the Kunlun chip has done a good job of being versatile and programmable, and we're still working on making the programmability better."
After Kunlun's release, further details emerged steadily. Architecturally, Kunlun has two computing units, 512 GB/s of memory bandwidth, and 16 MB of SRAM per unit.
According to Ouyang Jian, the 16 MB of SRAM helps AI inference. Within the XPU architecture, XPU-SDNN is designed for tensor computation and similar workloads, while XPU-Cluster handles general-purpose processing.
Kunlun's first-generation chips do not use NVLink but are interconnected via the PCIe 4.0 interface. Built on Samsung's 14nm manufacturing process with 2.5D packaging, the Kunlun chip reaches a peak performance of 260 TOPS at a power consumption of 150 W.
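To put the two headline figures in relation, peak efficiency can be worked out as throughput divided by power. The short Python sketch below uses only the 260 TOPS and 150 W numbers quoted above. The function name is illustrative, and the result is a peak figure on paper, not a measured one.

```python
# Figures quoted in the article for the first-generation Kunlun chip.
PEAK_TOPS = 260.0    # peak throughput, tera-operations per second
POWER_WATTS = 150.0  # quoted power consumption, watts


def tops_per_watt(peak_tops: float, power_watts: float) -> float:
    """Return peak efficiency in TOPS per watt (hypothetical helper)."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return peak_tops / power_watts


efficiency = tops_per_watt(PEAK_TOPS, POWER_WATTS)
print(f"{efficiency:.2f} TOPS/W")  # prints "1.73 TOPS/W"
```

Real workloads rarely sustain peak throughput, so this ratio is best read as an upper bound on efficiency.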
In terms of flexibility and ease of use, Kunlun offers developers a software stack similar to NVIDIA's CUDA that can be programmed in C/C++, lowering the barrier to development.
Based on the first generation of Kunlun chips, Baidu has so far launched two AI acceleration cards, the K100 and the K200, with the latter offering twice the compute and twice the power consumption of the former.
In today's talk, Ouyang Jian presented a series of K200 versus NVIDIA T4 results. On a GEMM benchmark with the INT8 data type and 4K x 4K matrices, the Kunlun K200 scored over 2,000, more than three times the NVIDIA T4's result.
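For a sense of scale, the amount of work in such a matrix multiplication can be estimated with the usual count of one multiply and one add per inner-product term. The Python sketch below assumes that "4K" means 4096 and that both matrices are square; neither detail is stated in the talk. It does not interpret the benchmark score, whose unit was not given. It only computes the operation count and the ideal run time at the quoted 260 TOPS peak.

```python
def gemm_ops(m: int, n: int, k: int) -> int:
    """Operations in an (m x k) @ (k x n) product, counting multiplies and adds."""
    return 2 * m * n * k


N = 4096           # assumption: "4K" is read as 4096
PEAK_TOPS = 260.0  # peak throughput quoted in the article

ops = gemm_ops(N, N, N)
ideal_seconds = ops / (PEAK_TOPS * 1e12)  # lower bound; ignores memory traffic

print(f"{ops:,} operations")                            # 137,438,953,472 operations
print(f"{ideal_seconds * 1e3:.3f} ms at peak throughput")  # 0.529 ms at peak throughput
```

The ideal time is a floor: a real run also has to move the matrices through memory, so measured times will be longer.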
Kunlun also has a significant performance advantage on the BERT/ERNIE test models commonly used for natural language processing.
In terms of online performance data, Kunlun is more stable than the NVIDIA T4 and has a latency advantage.
With the YOLOv3 object detection algorithm, Kunlun still has an advantage, though a less pronounced one. Ouyang Jian said Baidu is continuing to improve Kunlun's performance through ongoing optimization.
He also said that Kunlun has been deployed at scale inside Baidu. As for offering AI computing power externally, on December 13 last year Baidu began providing Kunlun computing power through Baidu Cloud on an invitation-only basis.
Beyond Baidu Cloud, Ouyang Jian also demonstrated Kunlun acceleration cards in industrial smart devices, showing product defect detection running on a CPU together with a Kunlun acceleration card. He said Kunlun can greatly increase speed but did not give specific comparative data.
Another demonstration covered Kunlun's killer feature: its adaptation to the China-made Feiteng processor platform.
At the 2019 Feiteng Eco-Partner Conference, Ouyang Jian revealed that the Kunlun AI chip was being adapted to China-made Feiteng servers, with performance tuning under way.
In today's online talk, Ouyang Jian demonstrated the significant speedup in image segmentation delivered by the Kunlun acceleration card.
As a representative of China-made chips, Kunlun chose to pair with Feiteng to address the large market for domestically produced chips.
By combining the Feiteng CPU with the Kunlun AI accelerator, the two parties can better bring China-made chips into the server market. The pairing can also be seen as an important growth driver and killer application for Kunlun AI chips and accelerator cards.