A 400MHz NPU with 7.8 TOPS²/W High-Performance-Guaranteed Efficiency in 55nm for Multi-Mode Pruning and Diverse Quantization Using Pattern-Kernel Encoding and Reconfigurable MAC Units

Abstract

Deployment of DNNs on edge devices relies heavily on pruning and quantization. For pruning, prior works exploited only unstructured or coarse-grained pruning. For quantization, UNPU studied various bit-widths but not diverse quantization functions. A unified architecture that supports this diversity of pruning and quantization is still lacking. We therefore introduce a 55nm, 400MHz, 8.0TOPS/W NPU that leverages pattern-kernel encoding for multi-mode pruning and linear/nonlinear quantization, achieving 7.8TOPS²/W performance-guaranteed efficiency.

Publication
In Proceedings of the IEEE Custom Integrated Circuits Conference (CICC), 2021
Zhanhong Tan
Ph.D. Candidate