Background
Deep neural networks (DNNs) are central to modern deep learning capabilities, but they have substantial computational and memory requirements. Quantization is an important model compression approach that addresses the mismatch between resource-hungry DNNs and resource-constrained devices by converting full-precision model weights or activations to lower precision. In particular, Quantization-Aware Training (QAT) has shown promising early results in producing low-bit models, but it can still incur considerable accuracy loss and does not perform consistently across model architectures. There is a need for a generalized, simple yet effective framework that can flexibly incorporate and improve QAT algorithms for both low-bit and high-bit quantization.
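For illustration only, the sketch below shows the fake-quantization step that underlies typical QAT, assuming symmetric uniform per-tensor quantization and a straight-through estimator; the function name and bit-width are illustrative and not tied to any specific QAT algorithm described in the invention.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Simulate low-bit quantization during training (symmetric, uniform).

    The rounded values are used in the forward pass, while the
    straight-through estimator lets gradients flow to the full-precision
    weights, which is the core mechanism of Quantization-Aware Training.
    """
    qmax = 2 ** (num_bits - 1) - 1                # e.g., 7 for signed 4-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # Straight-through estimator: forward uses x_q, backward uses x's gradient.
    return x + (x_q - x).detach()

# Example: quantize a weight tensor to 4 bits while keeping it trainable.
w = torch.randn(8, 8, requires_grad=True)
w_q = fake_quantize(w, num_bits=4)
w_q.sum().backward()   # gradients reach the full-precision weights w
```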
Invention Description
Researchers at Arizona State University have developed Self-Supervised Quantization-Aware Knowledge Distillation (SQAKD), a new framework that improves the performance of low-bit deep learning models by unifying quantization-aware training and knowledge distillation without requiring labeled data. The framework formulates Quantization-Aware Training (QAT) as a co-optimization problem that minimizes the divergence loss between the full-precision and low-bit models, improving both model performance and training efficiency.
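As an illustration of the label-free distillation idea, the sketch below shows one possible training step in which a frozen full-precision teacher supervises a low-bit student through a KL-divergence loss; the names (teacher, student, distillation_loss) and the temperature value are hypothetical assumptions for this example and are not taken from the SQAKD formulation itself.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """KL-divergence loss between the full-precision teacher and the
    low-bit student; no ground-truth labels are required."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2

# Illustrative training step (teacher frozen, student quantized and trainable):
# for images, _ in loader:               # labels are ignored
#     with torch.no_grad():
#         t_logits = teacher(images)     # full-precision model
#     s_logits = student(images)         # low-bit (quantized) model
#     loss = distillation_loss(s_logits, t_logits)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```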
Potential Applications:
Benefits and Advantages: