FUJITSU

SPARC64™ V Provides the Same High Reliability Technology as Mainframes

Avoiding server system downtime typically requires a range of technologies, such as guaranteed error detection, recovery processing, isolation of defective areas (downgrade), error logging, and software alerts.

PRIMEPOWER draws on the RAS (reliability, availability, serviceability) functions of SPARC64™ V, which originate in mainframe processors, to achieve high reliability.

Error detection & correction   Primary cache tag      Parity, duplex construction   Cache way degeneracy
                               Primary cache data     ECC
                               Secondary cache tag    ECC
                               Secondary cache data   ECC
Error detection & retry        ALU                    Parity                        Instruction retry
                               Register               Parity
Other                          History

Functions for Achieving High Reliability

Thorough error detection and data protection

The arithmetic logic units (ALUs), registers, caches, system buses, and all other circuits carry a total of more than 800 (*1) checkers, so that no errors escape detection. Because data is protected by measures such as ECC, 1-bit errors are corrected automatically by the hardware itself. Data integrity is achieved by protecting data through thorough error detection.

*1 This applies to a SPARC64™ V operating at a clock speed of over 1.65 GHz.
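
As an illustration of how an error-correcting code lets hardware repair a 1-bit error on its own, the following is a minimal software sketch. It assumes a textbook Hamming(7,4) code; the source does not say which ECC scheme SPARC64™ V uses, and the function names here are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Return (data, syndrome); a non-zero syndrome is the position that was repaired."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1              # flip the single faulty bit back
    d = [code[3], code[5], code[6], code[7]]
    data = sum(bit << i for i, bit in enumerate(d))
    return data, syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                             # inject a 1-bit error
data, syndrome = hamming74_decode(word)  # data is restored to 0b1011
```

A code of this kind corrects any single flipped bit per codeword; detecting double-bit errors would need an additional overall parity bit, which this sketch leaves out.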

Self-restoration from instruction processing errors

When an error arises, the hardware automatically retries the instruction. If the retry succeeds, processing continues with the subsequent instructions without interruption. If the retry fails a given number of times, an error is assumed and the OS is notified.
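
The retry policy described here can be sketched in software as follows. This is only an analogy for what the processor does in hardware: the exception types, the `notify_os` callback, and the default retry count of 3 are assumptions made for illustration, not values from the source.

```python
class TransientFault(Exception):
    """Hypothetical signal that a checker flagged an error during execution."""


class UnrecoverableError(Exception):
    """Raised once the retry budget is exhausted."""


def execute_with_retry(instruction, max_retries=3, notify_os=None):
    """Run `instruction`; on a fault, retry up to `max_retries` more times."""
    last_fault = None
    for _attempt in range(1 + max_retries):   # first try plus the retries
        try:
            return instruction()              # success: continue normally
        except TransientFault as fault:
            last_fault = fault                # remember the fault and retry
    if notify_os is not None:
        notify_os(last_fault)                 # report the persistent error
    raise UnrecoverableError(f"retry failed {max_retries} times") from last_fault
```

A fault that clears on a later attempt is invisible to the caller, which mirrors the "without interruption" behavior; only a fault that survives every retry reaches the notification path.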

Dynamic cache degeneracy

If an error occurs more than a predetermined number of times, the point of failure is dynamically isolated from other parts. The OS does not need to be rebooted, which minimizes the impact on operations.
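
A minimal sketch of this threshold-based isolation, modeled on a single cache set, is shown below. The class name, the four ways, and the threshold of 2 are assumptions for illustration; the source does not give the actual counts or thresholds.

```python
class DegradableCacheSet:
    """Tracks errors per cache way and isolates a way that fails too often."""

    def __init__(self, num_ways=4, error_threshold=2):
        self.error_counts = [0] * num_ways
        self.enabled = [True] * num_ways
        self.error_threshold = error_threshold

    def record_error(self, way):
        """Count an error on `way`; return True while the way stays in use."""
        self.error_counts[way] += 1
        if self.error_counts[way] > self.error_threshold:
            self.enabled[way] = False     # isolate the way; the rest keep running
        return self.enabled[way]

    def usable_ways(self):
        """Ways that new cache lines may still be allocated to."""
        return [w for w, ok in enumerate(self.enabled) if ok]
```

The set keeps working with fewer ways after isolation, so capacity drops slightly but no restart is needed, which is the trade-off the paragraph above describes.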

History

This dedicated circuit logs the internal operations of the processor when an error occurs. Because the records are collected transparently to software and without affecting normal operations, the cause of a failure can be analyzed and identified promptly.
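
One common way to keep such a record at low cost is a fixed-depth ring buffer that is snapshotted when an error is detected. The sketch below assumes that design purely for illustration; the source does not describe how the history circuit is built, and the class name and depth are hypothetical.

```python
from collections import deque


class HistoryBuffer:
    """Keeps the most recent `depth` events and freezes a copy on error."""

    def __init__(self, depth=8):
        self._ring = deque(maxlen=depth)  # oldest entries drop off automatically
        self.snapshot = None

    def record(self, event):
        """Log one internal event; cost is constant regardless of history length."""
        self._ring.append(event)

    def on_error(self):
        """Capture the events leading up to the error for later analysis."""
        self.snapshot = list(self._ring)
        return self.snapshot
```

Because recording never blocks or grows without bound, logging of this kind can run continuously alongside normal work and still preserve the last few events before a failure.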