Background
The increasing use of digital systems in everyday life has made reliability a key factor in the design of modern microprocessors. Soft errors, caused by high-energy particles, power supply noise, and transistor variability, can modify the logic value stored in a microprocessor's memory elements and thereby cause a timing or functional failure. Historically, soft errors were considered a challenge only for high-altitude applications, because most high-energy particles are attenuated by the earth's atmosphere before they reach ground level. However, the problem now extends to systems operating at ground level, because technology scaling has made transistors increasingly sensitive to particle strikes.
Software-level soft error tolerance schemes are promising because, unlike hardware-based solutions, they can be applied to commercial off-the-shelf (COTS) processors and can be applied selectively: either only to safety- or mission-critical applications, or only to the critical parts of an application.
Invention Description
Researchers at Arizona State University have developed NEMESIS, a novel compiler-level, fine-grained soft error technique for detection, diagnosis, and recovery that provides a high degree of error resilience. NEMESIS executes three versions of each computation and detects soft errors by checking the results of all memory write and branch operations. When a mismatch is found, the NEMESIS recovery routine removes the effect of the error from the architectural state of the program and resumes normal execution.
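The detect, diagnose, and recover flow described above can be illustrated with a small Python model. This is a toy sketch under stated assumptions, not the NEMESIS implementation: NEMESIS works at the compiler level on program instructions, whereas this model simulates three register files in Python. It assumes that the three versions are compared by majority vote, so that a single disagreeing copy is diagnosed as faulty and overwritten with the majority value; the source does not specify the voting or recovery policy, and a real recovery routine may need to repair more architectural state than the single checked register. All names (`vote`, `TripleState`, `load_const`, `inject_fault`, `UnrecoverableError`) are hypothetical.

```python
# Toy model of triple-versioned execution with checks at stores and branches.
# All names are hypothetical; this is not code from the NEMESIS compiler.


class UnrecoverableError(Exception):
    """Raised when no two of the three copies agree (assumed policy)."""


def vote(a, b, c):
    """Compare three copies of a value.

    Returns (majority_value, faulty_index), where faulty_index is None when
    all copies agree. Raises UnrecoverableError if all three differ.
    """
    if a == b == c:
        return a, None   # no error detected
    if a == b:
        return a, 2      # copy 2 disagrees with the other two
    if a == c:
        return a, 1      # copy 1 disagrees
    if b == c:
        return b, 0      # copy 0 disagrees
    raise UnrecoverableError("all three copies disagree")


class TripleState:
    """Three register files (one per computation version) and one memory."""

    def __init__(self):
        self.regs = [{}, {}, {}]
        self.memory = {}
        self.recoveries = 0

    def load_const(self, dst, value):
        # Load the same constant into all three versions.
        for copy in self.regs:
            copy[dst] = value

    def compute(self, dst, fn, *srcs):
        # Each version computes independently from its own registers.
        for copy in self.regs:
            copy[dst] = fn(*(copy[s] for s in srcs))

    def inject_fault(self, version, reg, value):
        # Simulate a soft error corrupting one version's register.
        self.regs[version][reg] = value

    def _checked(self, reg):
        # Detection and diagnosis by majority vote; recovery by overwriting
        # the faulty copy with the majority value before execution continues.
        value, faulty = vote(*(copy[reg] for copy in self.regs))
        if faulty is not None:
            self.regs[faulty][reg] = value
            self.recoveries += 1
        return value

    def store(self, addr, reg):
        # Memory writes are checked so corrupted data never reaches memory.
        self.memory[addr] = self._checked(reg)

    def branch(self, reg):
        # Branch conditions are checked so control flow is not corrupted.
        return bool(self._checked(reg))


if __name__ == "__main__":
    s = TripleState()
    s.load_const("x", 6)
    s.load_const("y", 7)
    s.compute("z", lambda a, b: a * b, "x", "y")
    s.inject_fault(1, "z", 41)  # corrupt version 1's result
    s.store(0x100, "z")
    print(s.memory[0x100], s.recoveries)  # prints "42 1"
```

Checking only at stores and branches, rather than after every instruction, reflects the idea that an error only matters once it would leave the register state (through memory) or redirect control flow; intermediate values in each version are left unchecked until then.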
Potential Applications
Benefits & Advantages
Related Publication: "NEMESIS: A software approach for computing in presence of soft errors," IEEE conference publication.