A Versatile Accelerator Design for Multiple Deep Neural Network Applications

Deep Neural Networks (DNNs) have become integral to numerous applications, from image recognition to video processing, touching almost every aspect of modern life. The expansion of DNN applications has placed growing demands on the underlying hardware, particularly in memory bandwidth and on-chip communication. Despite numerous advances, existing DNN accelerators, often built around rigid Networks-on-Chip (NoCs) and centralized buffers, struggle to efficiently support the simultaneous execution of multiple applications; the result is suboptimal performance caused by insufficient data reuse, high DRAM access rates, and inadequate support for diverse dataflows.

Researchers at George Washington University have addressed these limitations by developing Venus, a versatile DNN accelerator architecture that enhances the flexibility and scalability of DNN hardware through efficient communication and computation. Venus employs a tile-based architecture with distributed buffering, where each tile comprises an array of processing elements (PEs) and a slice of the distributed buffer. Its flexible NoC adapts to the specific communication demands of different DNN models, maximizing data reuse, minimizing DRAM accesses, and supporting multiple dataflows. Simulation results show that Venus substantially outperforms baseline designs (NVDLA, ShiDianNao, Eyeriss, Planaria, and Simba), reducing runtime by 81%, 79%, 90%, 75%, and 50% on average, respectively, and energy consumption by 73%, 71%, 86%, 69%, and 62% on average, making it a valuable contribution to the field of deep learning hardware acceleration.

Fig. 1: Proposed Venus accelerator architecture
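
To make the data-reuse argument concrete, here is a minimal first-order sketch (in Python) of how a tile-based design with distributed buffers can cut DRAM traffic. This is not the authors' simulator: the tile grid, buffer capacities, layer shape, and the two schedules being compared are all illustrative assumptions.

```python
# First-order model only: illustrative parameters, not the Venus simulator.
from dataclasses import dataclass


@dataclass
class TileGrid:
    rows: int                 # tiles per column (assumed)
    cols: int                 # tiles per row (assumed)
    buf_kib_per_tile: int     # distributed-buffer slice held by each tile (KiB)

    @property
    def total_buf_bytes(self) -> int:
        return self.rows * self.cols * self.buf_kib_per_tile * 1024


@dataclass
class ConvLayer:
    c_in: int                 # input channels
    c_out: int                # output channels
    h: int                    # output height ('same' padding assumed)
    w: int                    # output width
    k: int                    # square kernel size
    bytes_per_elem: int = 1   # int8 weights and activations

    @property
    def weight_bytes(self) -> int:
        return self.c_out * self.c_in * self.k * self.k * self.bytes_per_elem

    @property
    def input_bytes(self) -> int:
        return self.c_in * self.h * self.w * self.bytes_per_elem


def dram_traffic(layer: ConvLayer, grid: TileGrid) -> dict:
    """Estimate DRAM bytes moved under two schedules.

    no_reuse: weights are refetched for every output pixel (worst case).
    tiled:    weights stay resident in the distributed buffers; if they
              do not fit, inputs are re-streamed once per weight pass.
    """
    out_pixels = layer.h * layer.w
    no_reuse = layer.weight_bytes * out_pixels + layer.input_bytes
    passes = -(-layer.weight_bytes // grid.total_buf_bytes)  # ceiling division
    tiled = layer.weight_bytes + layer.input_bytes * passes
    return {"no_reuse_bytes": no_reuse, "tiled_bytes": tiled}


if __name__ == "__main__":
    grid = TileGrid(rows=4, cols=4, buf_kib_per_tile=64)    # 1 MiB on chip
    layer = ConvLayer(c_in=64, c_out=128, h=56, w=56, k=3)
    print(dram_traffic(layer, grid))
```

Under the assumed 16-tile grid, the layer's 72 KiB of weights fit on chip, so they are fetched from DRAM once instead of once per output pixel; that single change is the kind of reuse the distributed buffers and flexible NoC are meant to enable.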

Advantages:

  • Enhanced Flexibility and Scalability in Hardware Architecture
  • Optimized Data Reuse and Reduced DRAM Traffic
  • Adaptive Communication for Various DNN Models

Applications:

  • Enabling high-performance DNN inference for real-time video analysis and autonomous systems.
  • Versatile integration with processors and domain-specific accelerators, supporting advancements in AI/ML chip technology.
  • Multi-application support, allowing multiple DNN models to execute concurrently on a single AI/ML chip (see the sketch after this list).
  • Advancement in energy-efficient computing through reduced data movement and DRAM traffic.
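
As a rough illustration of the multi-application scenario above, the sketch below partitions a tile grid among concurrently running models in proportion to their compute demand. The greedy allocator, the model names, and the proportional policy are all hypothetical: the source does not describe Venus's actual scheduling algorithm.

```python
# Hypothetical multi-model tile-partitioning sketch; not Venus's scheduler.
from dataclasses import dataclass


@dataclass
class ModelReq:
    name: str
    macs: float  # relative compute demand, e.g. GMACs per inference


def partition_tiles(models: list[ModelReq], total_tiles: int) -> dict[str, int]:
    """Split a tile grid among models in proportion to demand.

    Every model receives at least one tile; rounding drift is repaired so
    the allocations sum exactly to total_tiles.
    """
    if len(models) > total_tiles:
        raise ValueError("more concurrent models than tiles")
    total = sum(m.macs for m in models)
    alloc = {m.name: max(1, round(total_tiles * m.macs / total)) for m in models}
    drift = total_tiles - sum(alloc.values())
    order = sorted(models, key=lambda m: m.macs, reverse=(drift > 0))
    i = 0
    while drift != 0:
        name = order[i % len(order)].name
        step = 1 if drift > 0 else -1
        if alloc[name] + step >= 1:     # never drop a model below one tile
            alloc[name] += step
            drift -= step
        i += 1
    return alloc


if __name__ == "__main__":
    demand = [ModelReq("resnet50", 4.1),
              ModelReq("mobilenetv2", 0.3),
              ModelReq("bert-tiny", 1.2)]
    print(partition_tiles(demand, total_tiles=16))
    # e.g. {'resnet50': 12, 'mobilenetv2': 1, 'bert-tiny': 3}
```
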
Patent Information: