Cache protocol for error detection and recovery in fault-tolerant computing systems

Chung-Ho Chen, Arun K. Somani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

We propose an error detection and recovery protocol for redundant processor systems that employ caches. The protocol allows cache-based systems to vote more often and thereby reduces the chance of losing synchronization. The scheme is based on broadcasting a dirty cache line after it has been modified, and it exploits the redundancy of a fault-tolerant system through hardware voting. It recovers from erroneous data written by a processor and thus remedies the insufficiency of error-correcting codes. The protocol can also be used to speed up the resynchronization process for a temporarily failed processor in a redundant system. More than 60% of cache lines are fully covered for recovery from errors originating in the cache itself, including unrecoverable ECC errors. The performance overhead is limited to broadcasting only 2-3% of the total memory references.
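
The voting step mentioned in the abstract can be illustrated with a minimal software sketch. This is not the protocol from the paper, which uses hardware voting and whose details are not given here; it only shows the general idea of majority voting over copies of a dirty line broadcast by redundant processors. The names `vote` and `recover`, the dictionary-based cache model, and the three-processor setup in the usage note are all assumptions made for illustration.

```python
from collections import Counter


def vote(lines):
    """Majority-vote over copies of one cache line.

    `lines` maps a (hypothetical) processor id to the bytes that
    processor broadcast for the line. Returns (value, faulty_ids).
    `value` is None when no strict majority exists, in which case
    every processor id is reported.
    """
    counts = Counter(lines.values())
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(lines):
        # No strict majority: the vote cannot identify a good copy.
        return None, sorted(lines)
    faulty = sorted(pid for pid, data in lines.items() if data != value)
    return value, faulty


def recover(caches, address):
    """Vote on the line at `address` and repair minority copies.

    `caches` maps processor id -> {address: line bytes}. This is an
    illustrative model only, not a real cache structure.
    """
    broadcast = {pid: cache[address] for pid, cache in caches.items()}
    value, faulty = vote(broadcast)
    if value is not None:
        for pid in faulty:
            # Overwrite the erroneous copy with the majority value.
            caches[pid][address] = value
    return value, faulty
```

For example, with three processors holding the line at address `0x40`, where one copy has been corrupted, `recover(caches, 0x40)` would return the majority value together with the id of the disagreeing processor and overwrite that processor's copy.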

Original language: English
Title of host publication: Digest of Papers - International Symposium on Fault-Tolerant Computing
Publisher: Publ by IEEE
Pages: 278-287
Number of pages: 10
ISBN (Print): 0818655224
Publication status: Published - 1994
Event: Proceedings of the 24th International Symposium on Fault-Tolerant Computing - Austin, TX, USA
Duration: 1994 Jun 15 → 1994 Jun 17

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Engineering (all)
