Highly reliable interleaved memory systems for uniprocessor and multiprocessor computer architectures are presented. Each memory system is divided into groups; each group consists of several banks, and each bank has several modules. The error model is defined at the memory-module level: a module is considered faulty if any single or multiple faults result in the loss of the entire module. Spare modules and spare banks are included in the systems to enhance reliability and availability. A faulty module is first replaced by a spare module within its bank; if the bank has no spare modules remaining, the whole bank is replaced by a spare bank at the next higher level. The reconfigurable memory system is structured so that replacing faulty modules (banks) with spare modules (banks) does not disturb memory references, provided each bank (group) has at most two spare modules (banks). If a bank (group) has more than two spare modules (banks), a second-level address translator prevents references to faulty modules by remapping addresses. The address translator can be implemented with a content-addressable memory (CAM) or with switches. Analysis results show that system reliability can be significantly improved with little hardware overhead, and that a typical system with one redundant row of modules has the highest cost-effectiveness during its useful lifetime. User transparency in memory access is retained.
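The two-level replacement policy described above (spare module first, then spare bank) can be illustrated with a minimal Python sketch. It is not the paper's design: the class names `Bank` and `Group`, the method names, the dictionary standing in for a CAM-style remap table, and the low-order interleaving used in `translate` are all assumptions made for illustration. Copying data into a replacement module or bank is not modeled.

```python
class Bank:
    """One bank: n_modules active modules plus n_spares spare modules (hypothetical model)."""

    def __init__(self, n_modules, n_spares):
        # CAM-like table: only remapped logical modules are stored.
        self.remap = {}
        # Physical indices n_modules .. n_modules + n_spares - 1 are the spares.
        self.free_spares = list(range(n_modules, n_modules + n_spares))

    def physical(self, logical):
        """Return the physical module currently backing a logical module."""
        return self.remap.get(logical, logical)

    def fail(self, logical):
        """Retire the module backing `logical`; return False if no spare is left."""
        if not self.free_spares:
            return False
        self.remap[logical] = self.free_spares.pop(0)
        return True


class Group:
    """One group: n_banks active banks plus n_spare_banks spare banks (hypothetical model)."""

    def __init__(self, n_banks, n_spare_banks, n_modules, n_spares):
        self.n_banks = n_banks
        self.n_modules = n_modules
        total = n_banks + n_spare_banks
        self.banks = [Bank(n_modules, n_spares) for _ in range(total)]
        self.remap = {}  # logical bank -> physical bank, for replaced banks only
        self.free_spares = list(range(n_banks, total))

    def translate(self, address):
        """Map a logical address to (physical bank, physical module, offset).

        Assumes low-order interleaving across modules, then across banks.
        """
        module = address % self.n_modules
        bank = (address // self.n_modules) % self.n_banks
        offset = address // (self.n_modules * self.n_banks)
        pbank = self.remap.get(bank, bank)
        return pbank, self.banks[pbank].physical(module), offset

    def fail_module(self, bank, module):
        """Handle a module fault: use a spare module, else swap in a spare bank."""
        pbank = self.remap.get(bank, bank)
        if self.banks[pbank].fail(module):
            return
        if not self.free_spares:
            raise RuntimeError("no spare module or spare bank left")
        self.remap[bank] = self.free_spares.pop(0)
```

In this sketch the caller always uses the same logical address; only the translation result changes after a fault, which loosely mirrors the user transparency the abstract claims. For example, with `Group(2, 1, 4, 1)`, logical address 5 initially resolves to bank 1, module 1, and resolves to the bank's spare module after `fail_module(1, 1)`.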
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics