PRIMEQUEST provides thorough error recovery mechanisms. The core of the server, where a single failure could bring down the entire system, is protected by multiple RAS (Reliability, Availability, Serviceability) mechanisms:
- The QPI (QuickPath Interconnect) protects itself by retransmitting data, degrading bandwidth, and re-routing traffic on failure
- Memory is protected by features of the Xeon® processor 7500 series and E7 family: Error Check and Correction (ECC); Single Device Data Correction (SDDC) on the Xeon 7500 series; Double Device Data Correction (DDDC) on the Xeon E7 family; and Memory Mirroring and Memory Scrubbing on all processors
- The processor protects itself with error detection and correction mechanisms built into its functions and circuits, including the Level 1, 2, and 3 caches, registers, ALUs, and TLBs
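To show the basic idea behind Error Check and Correction, here is a minimal sketch of a textbook Hamming(7,4) code, which stores three check bits alongside four data bits and can locate and flip back any single corrupted bit. This is an illustration only, not the code used by the Xeon memory controller: SDDC and DDDC are stronger symbol-based schemes that tolerate the failure of an entire DRAM device. The function names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword (illustrative only)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword positions 1..7: p1, p2, d1, p4, d2, d3, d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(code: List[int]) -> Tuple[List[int], int]:
    """Return (corrected data bits, 1-based error position, or 0 if no error)."""
    c = [0] + list(code)  # pad so that c[i] is codeword position i
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    pos = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the bad position
    if pos:
        c[pos] ^= 1  # correct the single flipped bit
    return [c[3], c[5], c[6], c[7]], pos


# Usage: flip one bit of a stored word and recover the original data.
word = encode([1, 0, 1, 1])
word[4] ^= 1                   # simulate a single-bit error at position 5
recovered, where = decode(word)  # recovered == [1, 0, 1, 1], where == 5
```

A single-error-correcting code like this cannot recover from two flipped bits in the same word, which is why server memory adds device-level correction, mirroring, and scrubbing (periodically reading and rewriting memory so that correctable errors do not accumulate).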
Even such thorough protection cannot eliminate every possible failure. PRIMEQUEST therefore also provides recovery from system board failures: the Reserved System Board mechanism automatically switches from a failed system board to a reserved board.
Reserved System Board benefits include:
- System performance can quickly recover to the original level
- While on standby, the reserved system board can be used for development purposes
Swift System Recovery on System Board Failure
- Failed system boards are automatically isolated from the system
- The failed system board is automatically replaced by a reserved board
- A simple operation moves the existing I/O resources to the new system board
- The failed system board can be safely removed for maintenance or replacement
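The recovery sequence listed above can be pictured as a small state machine. The sketch below is a hypothetical toy model, not PRIMEQUEST firmware: the `Partition` class, its method names, and board labels such as `SB0` are invented for illustration. Following the list, it assumes that isolation and switchover happen automatically, while moving I/O resources is a separate operator-triggered step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Partition:
    """Hypothetical toy model of a partition backed by a Reserved System Board pool."""
    active: List[str]    # system boards currently running the OS
    reserved: List[str]  # standby boards (may host development work meanwhile)
    io_map: Dict[str, List[str]] = field(default_factory=dict)  # board -> attached I/O units
    isolated: List[str] = field(default_factory=list)           # failed boards awaiting service

    def on_board_failure(self, failed: str) -> Optional[str]:
        """Automatic path: isolate the failed board and switch in a reserved one."""
        if failed not in self.active:
            raise ValueError(f"{failed} is not an active board")
        self.active.remove(failed)    # isolate the board from the running system
        self.isolated.append(failed)
        if not self.reserved:
            return None               # toy model: no spare left to switch to
        spare = self.reserved.pop(0)  # take the first standby board
        self.active.append(spare)
        return spare

    def move_io(self, failed: str, spare: str) -> None:
        """Operator-triggered step: reassign the failed board's I/O to the spare."""
        self.io_map[spare] = self.io_map.pop(failed, [])

    def remove_for_service(self, board: str) -> None:
        """The isolated board can now be pulled for maintenance or replacement."""
        self.isolated.remove(board)


# Usage with made-up board and I/O unit names.
p = Partition(active=["SB0", "SB1"], reserved=["SB2"],
              io_map={"SB0": ["IOU0"], "SB1": ["IOU1"]})
spare = p.on_board_failure("SB1")  # spare == "SB2"
p.move_io("SB1", spare)            # p.io_map["SB2"] == ["IOU1"]
p.remove_for_service("SB1")        # p.isolated == []
```

Keeping the I/O move as its own method mirrors the distinction the list draws between the automatic board switchover and the simple operation that moves I/O resources.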