Hardware-Agnostic Message Passing Interface for Heterogeneous Computing Systems

Today's high-performance computing (HPC) systems are largely built around traditional central processing units (CPUs) tightly coupled with general-purpose graphics processing units (GPUs, which can be considered domain-specific accelerators). GPUs have a different programming model from CPUs. They are efficient only at exploiting spatial parallelism to accelerate high-concurrency algorithms. They do not efficiently exploit the temporal/pipeline parallelism that is vital for accelerating the high-dependency algorithms widely used in predictive simulations for computational science. As a result, today's HPC systems still have substantial room for improvement in performance and energy efficiency when running complex scientific computing tasks (e.g., many large legacy HPC codes for predictive simulations still run on CPUs).

In recent years, additional accelerator choices for heterogeneous computing systems (e.g., HPC and other large-scale computing systems) have emerged. Examples include field-programmable gate arrays (FPGAs, which can be considered reconfigurable accelerators) and tensor processing units (TPUs, which can be considered application-specific accelerators). These new accelerators offer flexible or customized hardware architectures that can exploit temporal/pipeline parallelism efficiently. Even so, their adoption in extreme-scale scientific computing is still in its infancy and is expected to be a tortuous process despite their superior performance and energy-efficiency benefits.

The fundamental challenge to adopting any new accelerator in HPC, such as an FPGA or TPU, is that each accelerator's programming model, message passing interface, and virtualization stack is developed independently and is specific to its hardware architecture. There is no clear demarcation between hardware-specific and hardware-agnostic development regions. As a result, today's programming models require domain-matter experts (DMEs) and hardware-matter experts (HMEs) to work interdependently and invest significant effort in optimizing hardware-specific code before new accelerator devices can be adopted in HPC and deliver performance benefits. This tangled dependency is a self-imposed bottleneck of existing programming models. It stands in the way of truly heterogeneous HPC and severely slows the pace of scientific discovery.

Researchers at Arizona State University have developed a compute-centric message passing interface (C2MPI) that enables hardware-agnostic message passing across heterogeneous computing systems. Hardware-agnostic programming with high performance portability is envisioned as a bedrock for adopting emerging accelerator technologies in heterogeneous computing systems, such as HPC systems, data center computing systems, and edge computing systems. Adopting emerging accelerators is key to achieving greater scale and performance in these systems.

C2MPI provides a message passing specification for hardware-agnostic accelerator orchestration within an open-ended, extensible multi-agent software framework. The framework implements a set of proposed hardware-agnostic principles that enable portable, performance-optimized execution of hardware-agnostic application host code across heterogeneous accelerator resources. It provides hardware-agnostic virtualization, routing, and arbitration layers, as well as hardware-centric partitioning and scaling layers.

Related Publication: HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC
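To make the layered idea more concrete, the following is a minimal Python sketch. It shows how host code might request work by function name alone while a runtime handles routing and arbitration across devices. It is an illustration only and assumes a simple in-process model. The classes `Agent` and `Orchestrator`, their methods, and the least-used arbitration policy are hypothetical. They are not the C2MPI specification or the HALO API, and the partitioning and scaling layers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """Hypothetical accelerator agent: one device plus the kernels it can run."""
    name: str
    device_kind: str  # descriptive label only, e.g. "cpu", "gpu", "fpga"
    kernels: Dict[str, Callable[..., object]]
    calls: int = 0  # number of requests served, used by the arbiter


@dataclass
class Orchestrator:
    """Hypothetical runtime: host code names a function, never a device."""
    agents: List[Agent] = field(default_factory=list)

    def register(self, agent: Agent) -> None:
        # Virtualization (loosely): agents join one pool behind a single interface.
        self.agents.append(agent)

    def _route(self, func_name: str) -> List[Agent]:
        # Routing: keep only the agents that implement the requested function.
        eligible = [a for a in self.agents if func_name in a.kernels]
        if not eligible:
            raise LookupError(f"no agent implements {func_name!r}")
        return eligible

    def _arbitrate(self, eligible: List[Agent]) -> Agent:
        # Arbitration: least-used agent wins; min() keeps the earliest on ties.
        return min(eligible, key=lambda a: a.calls)

    def invoke(self, func_name: str, *args) -> Tuple[str, object]:
        agent = self._arbitrate(self._route(func_name))
        agent.calls += 1
        return agent.name, agent.kernels[func_name](*args)


def vec_add(a, b):
    # A plain Python stand-in for an accelerator kernel.
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    orch = Orchestrator()
    orch.register(Agent("cpu0", "cpu", {"vec_add": vec_add}))
    orch.register(Agent("fpga0", "fpga", {
        "vec_add": vec_add,
        "scale": lambda xs, k: [k * x for x in xs],
    }))
    # The host call is identical no matter which device ends up serving it.
    for _ in range(3):
        print(orch.invoke("vec_add", [1, 2], [3, 4]))
    print(orch.invoke("scale", [1, 2], 3))
```

Because `invoke` takes only a function name and arguments, the same host call can be served by either agent. Replacing the simple least-used policy with a cost model or device-aware scheduler would change only `_arbitrate`, leaving the host code untouched.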

Potential Applications:

  • High-performance computing, cloud computing
  • GPUs, FPGAs, TPUs, and other domain- and application-specific accelerators

Benefits and Advantages:

  • Hardware-agnostic programming against a unified API and true accelerator interoperability
  • Enables host code performance portability
  • Flexible hardware-agnostic environment allows application developers to develop high-performance applications without knowledge of the underlying hardware
  • Plug-and-play application acceleration across any network infrastructure