QIMING 910 AI Chip (MUSEv1 Architecture)

Last updated on Jun 6, 2021

QIMING-910 was my first solo tape-out project completed at Archip Lab during my senior year. I was in charge of all the development, from architecture design to the final RTL netlist, which was then shifted to UMC for the backend design and fabrication in the summer of 2019. Finally, with the help of some engineers in our team, we successfully verified this chip in December 2019! This experiment gave me insight into IC development, and part of this work was presented in DAC 2020.

The name of the architecture, MUSE, means MUlti-grained Sparsity Engine. The architecture of the chip leverages ternary quantization (-1, 1, 0) for weights, with which we can simply implement multiplication with MUX-based ALUs instead of the complicated logic. Moreover, using the ternary quantization significantly saves memory overhead by 16 $\times$ compared to the single floating-point baseline. It also reflects in a dramatic increase in the percentage of zeros, which provides us with an opportunity for us to skip redundant multiplication when the weight is zero. Therefore, we correspondingly propose a zero-aware processing element to process the data.

Fabricated in the UMC 55nm SP CMOS process, QIMING-910 achieves peak performance and power efficiency of 204.8 GOPS and 3.16 TOPS/W respectively. With the help of ternary-quantization, multiplier-free ALU design, and zero-aware processing, the power efficiency is 2 $\times$ ~ 10 $\times$ compared to the state-of-the-art NPU. We designed a testing board connected to XILINX Virtex-7 FPGA VC707 via FMC. VC707 here serves as a controller rather than a processor to transfer data from the PC to our chip. The overall system can accomplish the image classification task enabled by VGG-16 on the CIFAR-10 dataset.

AI chip MUSE Qiming Series

QIMING 910 AI Chip (MUSEv1 Architecture)

Zhanhong Tan

Ph.D. Candidate

Related