Efficient hardware/software codesign of deep learning accelerators is crucial for optimizing their performance in applications ranging from datacenters to mobile and wearable devices. Existing methods for optimizing accelerator designs are primarily black-box approaches, which ignore crucial information about the underlying accelerator system being optimized. These methods include non-feedback techniques such as grid search or random search, metaheuristics such as genetic algorithms or simulated annealing, and black-box AI approaches such as Bayesian optimization or reinforcement learning. They suffer from several limitations. They often require a large number of trials, and a given trial is not guaranteed to improve the objective cost of the design. They typically optimize hardware configurations while neglecting software configurations, which can leave the resulting hardware incompatible with, or inefficient for, the software that runs on it. Finally, because these optimizations are time-consuming, they are impractical to invoke dynamically when running deep learning models or scheduling applications in real time.
Researchers at Arizona State University have developed a design space exploration (DSE) technique for dynamic and efficient hardware/software codesign of deep learning accelerators. The technique introduces a gray-box optimization framework that uses the execution costs and characteristics of the accelerator to guide the search. Unlike black-box approaches, it considers factors such as latency, power, area, and energy consumption when calculating each new configuration, so that every step targets a known inefficiency. The technique also performs joint hardware and software codesign, exploring hardware configurations that are compatible with, and efficient for, specific software configurations. By integrating adaptive tuning of hyperparameters with efficient mappings of deep learning models onto hardware, it improves both the efficiency of the resulting designs and the convergence speed of the search, enabling dynamic deployment and scheduling of deep learning models in real-world applications.
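The difference between a gray-box and a black-box search can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions and not the researchers' implementation: the two-parameter design space (num_pes, bandwidth), the workload constants, the area formula, and the doubling rule are all hypothetical. The point it shows is that when the cost model exposes per-factor costs, each step can adjust the parameter tied to the current bottleneck instead of sampling at random.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    num_pes: int    # processing elements (hypothetical hardware parameter)
    bandwidth: int  # off-chip bytes per cycle (hypothetical hardware parameter)


# Toy workload and constraint; all values are illustrative assumptions.
MACS = 1_000_000   # multiply-accumulate operations to execute
BYTES = 400_000    # bytes moved to and from off-chip memory
AREA_BUDGET = 600  # arbitrary area units


def costs(cfg):
    """Toy analytical cost model that exposes per-factor costs."""
    compute_cycles = MACS / cfg.num_pes
    memory_cycles = BYTES / cfg.bandwidth
    return {
        "compute": compute_cycles,
        "memory": memory_cycles,
        # Assume compute and memory traffic overlap perfectly.
        "latency": max(compute_cycles, memory_cycles),
        "area": cfg.num_pes + 4 * cfg.bandwidth,
    }


def gray_box_search(cfg, max_steps=20):
    """Repeatedly scale the parameter behind the current latency bottleneck."""
    history = [cfg]
    for _ in range(max_steps):
        current = costs(cfg)
        # Gray-box step: inspect which factor dominates latency.
        if current["compute"] >= current["memory"]:
            candidate = replace(cfg, num_pes=cfg.num_pes * 2)
        else:
            candidate = replace(cfg, bandwidth=cfg.bandwidth * 2)
        proposed = costs(candidate)
        # Stop when the candidate violates the budget or does not help.
        if proposed["area"] > AREA_BUDGET or proposed["latency"] >= current["latency"]:
            break
        cfg = candidate
        history.append(cfg)
    return cfg, history


if __name__ == "__main__":
    best, history = gray_box_search(Config(num_pes=16, bandwidth=4))
    for step, cfg in enumerate(history):
        print(step, cfg, costs(cfg)["latency"])
```

In this toy setting every accepted step lowers latency, because the search only accepts a candidate that improves on the current configuration and stays within the area budget. A real accelerator cost model would have many more parameters, including software mapping choices, and a more careful rule for how far to move each one.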
Potential Applications:
Benefits and Advantages: