Effective Triplication for Flexible and Real-Time Soft Error Resilience

Background

The increasing use of digital systems in everyday life has made reliability a key factor in the design of modern microprocessors. Soft errors are caused by high-energy particles, power supply noise, and transistor variability. They can modify the logic value stored in a microprocessor's memory elements, which can cause a timing or functional failure. Historically, soft errors were considered a challenge only for high-altitude applications because most high-energy particles are attenuated by the earth's atmosphere before they reach ground level. However, the problem is now expanding to terrestrial-level particles due to changes in the atmosphere.

Software-level soft error tolerance schemes are promising because, unlike hardware-based solutions, they can be applied selectively on commercial off-the-shelf processors: either to only the safety- or mission-critical applications, or to only the critical parts of an application.

Invention Description

Researchers at Arizona State University have developed NEMESIS, a novel compiler-level, fine-grained technique for soft error detection, diagnosis, and recovery that can provide a high degree of error resiliency. NEMESIS runs three versions of each computation and detects soft errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS recovery routine removes the effect of the error from the architectural state of the program and resumes normal execution.
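
The general idea of triplicated computation with checks at memory writes can be illustrated with a minimal Python sketch. This is not the NEMESIS implementation, which operates at the compiler level on machine instructions. The function names (vote, checked_store, triplicated_sum), the majority-vote repair, and the simulated bit flip are all assumptions made for illustration, and branch checking is omitted for brevity.

```python
from collections import Counter


class UnrecoverableError(Exception):
    """Hypothetical safe-stop signal: raised when no two copies agree."""


def vote(a, b, c):
    """Return (majority value, mismatch flag) for three redundant copies."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count == 1:
        # All three copies differ, so no majority exists: stop safely.
        raise UnrecoverableError("no two copies agree; stopping safely")
    return value, count != 3


def checked_store(memory, address, copies):
    """Compare the three copies before a memory write, then repair them."""
    value, mismatch = vote(*copies)
    memory[address] = value
    # Overwrite every copy with the voted value so a detected error
    # does not propagate into later computation.
    return [value, value, value], mismatch


def triplicated_sum(data, fault_at=None):
    """Sum `data` in three redundant copies; optionally inject one fault."""
    memory = {}
    acc = [0, 0, 0]
    errors_repaired = 0
    for i, x in enumerate(data):
        acc = [a + x for a in acc]
        if i == fault_at:
            acc[0] ^= 1 << 4  # simulated soft error: one bit flip in copy 0
        acc, mismatch = checked_store(memory, "sum", acc)
        errors_repaired += mismatch
    return memory["sum"], errors_repaired
```

For example, triplicated_sum([1, 2, 3, 4], fault_at=2) returns (10, 1): the corrupted copy is outvoted at the store, the copies are repaired, and execution continues with the correct result. If all three copies disagree, the sketch raises UnrecoverableError, loosely mirroring the safe stop described for unrecoverable errors.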

Potential Applications

  • Autonomous vehicles
  • Implantable medical devices
  • High-performance computing
  • Protection against hardware malfunctions for safety/security applications

Benefits & Advantages

  • Able to detect all soft errors
  • Both control and data flow detection and recovery
  • Can recover from 97% of detected errors
  • Software-only reliability solution
  • Safe stop if an error is unrecoverable

Related Publication: NEMESIS: A software approach for computing in presence of soft errors | IEEE Conference Publication

Patent Information: