Deep Neural Networks (DNNs) have become integral to applications ranging from image recognition to video processing, touching almost every aspect of modern life. This expansion places growing demands on the underlying hardware, particularly memory bandwidth and on-chip communication. Despite numerous advances, existing DNN accelerators, typically built around rigid Networks-on-Chip (NoCs) and centralized buffers, struggle to efficiently support the simultaneous execution of multiple applications; the result is suboptimal performance caused by insufficient data reuse, high DRAM access rates, and inadequate support for diverse dataflows.
Researchers at George Washington University address these limitations with Venus, a versatile DNN accelerator architecture that brings flexibility and scalability to DNN hardware through efficient communication and computation. Venus employs a tile-based design with distributed buffering, where each tile comprises an array of processing elements (PEs) and a slice of the distributed buffer. A flexible NoC adapts to the specific communication demands of different DNN models, maximizing data reuse, minimizing DRAM accesses, and supporting multiple dataflows. Simulation results show that Venus substantially outperforms baseline designs (NVDLA, ShiDianNao, Eyeriss, Planaria, and Simba), reducing runtime by 81%, 79%, 90%, 75%, and 50% on average, respectively, and energy consumption by 73%, 71%, 86%, 69%, and 62% on average, making it a valuable contribution to the field of deep learning hardware acceleration.
Fig. 1: Proposed Venus accelerator architecture
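To make the tile abstraction concrete, the Python sketch below models the organization described above: a grid of tiles, each pairing a PE array with its local slice of the distributed buffer, plus a NoC whose transfer pattern is chosen per layer. This is a minimal illustration, not the authors' implementation; the class names, default sizes, NoC mode names, and the mode-selection heuristic are all hypothetical.

from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingElement:
    """One compute unit (e.g., a MAC) inside a tile's PE array."""
    row: int
    col: int


@dataclass
class Tile:
    """A tile in the Venus-style layout: a PE array plus its local slice
    of the distributed buffer (sizes here are illustrative)."""
    tile_id: int
    pe_rows: int
    pe_cols: int
    buffer_kib: int  # this tile's portion of the distributed buffer
    pes: List[ProcessingElement] = field(default_factory=list)

    def __post_init__(self):
        self.pes = [ProcessingElement(r, c)
                    for r in range(self.pe_rows)
                    for c in range(self.pe_cols)]


class VenusAccelerator:
    """Grid of tiles; the NoC pattern is reconfigured per layer to match
    that layer's dominant reuse opportunity."""

    # Hypothetical NoC configurations standing in for "multiple dataflows".
    NOC_MODES = {"unicast", "multicast", "broadcast"}

    def __init__(self, grid_rows: int, grid_cols: int,
                 pe_rows: int = 8, pe_cols: int = 8, buffer_kib: int = 64):
        self.tiles = [Tile(r * grid_cols + c, pe_rows, pe_cols, buffer_kib)
                      for r in range(grid_rows) for c in range(grid_cols)]
        self.noc_mode = "unicast"

    def configure_noc(self, layer_kind: str) -> str:
        """Pick a NoC pattern from a layer's reuse characteristics.
        This mapping is a plausible heuristic, not the paper's policy."""
        if layer_kind == "conv":   # weights reused across many activations
            self.noc_mode = "multicast"
        elif layer_kind == "fc":   # little weight reuse per input
            self.noc_mode = "unicast"
        else:                      # e.g., activations shared by all tiles
            self.noc_mode = "broadcast"
        assert self.noc_mode in self.NOC_MODES
        return self.noc_mode


if __name__ == "__main__":
    venus = VenusAccelerator(grid_rows=4, grid_cols=4)
    print(len(venus.tiles), "tiles,", len(venus.tiles[0].pes), "PEs per tile")
    print("conv ->", venus.configure_noc("conv"))
    print("fc   ->", venus.configure_noc("fc"))

The point of the sketch is the design choice itself: because each tile owns a slice of the buffer and the interconnect pattern is configurable, data placement and movement can be tuned per model rather than fixed at design time, which is what enables the data reuse and DRAM-access savings reported above.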
Advantages:
- Flexible NoC and distributed buffering that adapt to each DNN model's communication demands
- Higher data reuse and fewer DRAM accesses than rigid, centralized-buffer designs
- Support for multiple dataflows and for the simultaneous execution of multiple applications
- Average runtime reductions of 50-90% and energy reductions of 62-86% versus NVDLA, ShiDianNao, Eyeriss, Planaria, and Simba in simulation
Applications:
- Deep learning hardware acceleration for image recognition, video processing, and other DNN workloads
- Systems that run multiple DNN applications concurrently on a shared accelerator