Fault-tolerant parallel processing with real-time error detection and recovery

Chung Ho Chen, Arun K. Somani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The performance of parallel processing for real-time applications is very sensitive to the reliability of the system. This paper presents a unique error recovery mechanism based on two new cache states, verified and non-verified, to detect and recover from errors produced by the processor, the cache memory, or both as a result of transient faults. The proposed scheme remedies the insufficiency of error-correcting codes when faced with processor transient faults. This cache-based recovery method not only recovers from errors in a local cache memory but also prevents errors from propagating to other caches. We show that this new error recovery scheme can be easily integrated with existing cache coherency protocols.

Original language: English
Title of host publication: Conference Record of the 26th Asilomar Conference on Signals, Systems and Computers, ACSSC 1992
Publisher: IEEE Computer Society
Pages: 994-998
Number of pages: 5
ISBN (Electronic): 0818631600
Publication status: Published - 1992
Event: 26th Asilomar Conference on Signals, Systems and Computers, ACSSC 1992 - Pacific Grove, United States
Duration: 1992 Oct 26 to 1992 Oct 28

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
ISSN (Print): 1058-6393

Conference

Conference: 26th Asilomar Conference on Signals, Systems and Computers, ACSSC 1992
Country/Territory: United States
City: Pacific Grove
Period: 92-10-26 to 92-10-28

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Networks and Communications
