
PRIMEPOWER incorporates SPARC64 V processors, SPARC processors developed by Fujitsu. Like the UltraSPARC processors developed by Sun Microsystems, SPARC64 is based on the SPARC V9 architecture and has been granted a SPARC V9 certificate by SPARC International.
SPARC64 V is the most powerful processor Fujitsu has developed and is based on many years of Fujitsu expertise and cumulative technologies in computer development.
PRIMEPOWER servers with SPARC64 V are the servers most often used for core business systems required to operate continuously, around-the-clock and all-year-round. For this reason, in addition to performance, SPARC64 V has been developed with an emphasis on RAS functions. (Note 1)
This article explains why SPARC64 V development places such emphasis on RAS function enhancement and goes on to discuss the superiority of SPARC64 V RAS functions relative to competing products.
Note 1: RAS is an acronym for Reliability, Availability, and Serviceability.
Hardware failures are a primary cause of system stoppage. They can be divided into two general categories: data errors, which occur while data is flowing through the CPU, memory, or bus; and part failures, where a fan, power supply, or other component develops a defect. Data errors can be further classified into two subcategories: fixed failures, in which a data error always occurs at a specific location, and intermittent failures (soft errors), in which data errors occur intermittently at unspecified locations.

Data errors generally indicate that a bit or bits have been inverted (from 1 to 0 or vice versa) in a storage element such as memory or cache, on a bus, or in an arithmetic/logic circuit. Fixed failures are typically caused by hardware defects, such as disconnections or short circuits, whereas intermittent failures generally result from external sources such as radiation, microwaves, or heat.


Although fixed failures can be easily identified by their location or cause, intermittent failures occur without predictable signs or regularity in either location or timing. However, recent research indicates that the occurrence of intermittent failures is closely linked to faster processor clock rates.
For example, advances in semiconductor microfabrication have produced ever smaller transistors. While this results in faster, higher-performance transistors, it also makes them more susceptible to the effects of radiation and microwaves, making intermittent failures more likely.

What’s more, lower power supply voltages, higher LSI and bus clock frequencies, and other techniques for achieving higher processor performance may result in more frequent bit inversions. In other words, there is a trade-off between processor performance and the rate of intermittent failures.

For some time, Fujitsu mainframe developers have focused on solving the problem of both intermittent and fixed failures. The processors for Fujitsu mainframes incorporate the various RAS functions needed to detect and correct intermittent failures. SPARC64 V was developed by those same developers and incorporates the same RAS functions.
There are two important aspects to handling intermittent failures: failsafe failure identification and recovery. SPARC64 V provides powerful error detection and recovery mechanisms to achieve both.
Intermittent failures generally occur in on-board memory locations (RAM). The processor regions in which intermittent failures are most likely to occur are the storage circuits known as cache memory, which are composed of RAM. In recent years, processors for open servers, including those of our competitors, have sought to protect data in cache memory with ECC and parity functions. These have been touted as “incorporating the equivalent of mainframe-class RAS functions.” But intermittent failures can also occur in circuits other than cache memory, including arithmetic/logic units, registers, and the data buses that connect them.
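The difference between parity and ECC protection can be illustrated with a small sketch. It is not taken from SPARC64 V documentation: the article does not say which ECC code the processor uses, so the textbook Hamming(7,4) code stands in here as an assumption, and all function names are hypothetical. The point it shows is that a parity bit can only report that some bit was inverted, while an ECC code can also locate and correct a single inverted bit.

```python
def parity_bit(bits):
    """Even-parity bit: 1 if the number of 1s in `bits` is odd, else 0."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7).

    Hamming(7,4) is an illustrative stand-in, not the processor's actual ECC.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, error position); position 0 means no error."""
    c = codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome gives the 1-based bit position
    fixed = list(c)
    if position:
        fixed[position - 1] ^= 1  # invert the faulty bit back
    return fixed, position


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity: a single inverted bit is detected, but not located.
    stored_parity = parity_bit(data)
    damaged = [1, 0, 0, 1]
    print("parity mismatch:", parity_bit(damaged) != stored_parity)

    # ECC: a single inverted bit is located and corrected.
    codeword = hamming74_encode(data)
    hit = list(codeword)
    hit[4] ^= 1  # simulate a soft error at position 5
    print("corrected:", hamming74_correct(hit))
```

Because parity alone cannot repair the data, a parity-protected circuit needs a separate recovery path, which is where instruction retry comes in.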
Believing that processors in servers supporting core business operations must provide RAS functions not just for cache memory data but for other operations as well, Fujitsu has introduced mechanisms that apply parity protection to other circuits, including arithmetic/logic units and registers. This ensures that any failure occurring in the processor, whether fixed or intermittent, will not escape detection. If an error is detected, it is automatically corrected by hardware or by repeating the instruction. If the error still cannot be corrected, the section in error is automatically isolated and the system is degenerated. The server continues to operate using alternative resources.
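The detect, retry, and degenerate sequence described above can be sketched in software. This is only an analogy under stated assumptions: SPARC64 V performs these steps in hardware, and the `Unit` class, the `execute` function, and the retry limit of two are hypothetical names and values chosen for illustration.

```python
class ParityError(Exception):
    """Raised when a unit's result fails its parity check."""


class Unit:
    """Hypothetical execution unit whose next `faults` operations fail parity."""

    def __init__(self, name, faults=0):
        self.name = name
        self.faults = faults   # number of upcoming operations that will fail
        self.enabled = True    # False once the unit has been isolated

    def add(self, a, b):
        if self.faults > 0:
            self.faults -= 1
            raise ParityError(self.name)
        return a + b


def execute(units, a, b, max_retries=2):
    """Compute a + b, retrying on error and isolating a unit that keeps failing.

    Returns (result, name of the unit that produced it).
    """
    for unit in units:
        if not unit.enabled:
            continue
        for _attempt in range(1 + max_retries):
            try:
                return unit.add(a, b), unit.name
            except ParityError:
                continue       # possibly intermittent: retry the instruction
        unit.enabled = False   # still failing: isolate it and move on
    raise RuntimeError("no working unit left")


if __name__ == "__main__":
    # An intermittent fault disappears on retry; the same unit completes the work.
    print(execute([Unit("alu0", faults=1), Unit("alu1")], 2, 3))
    # A fixed fault survives every retry; the unit is isolated and alu1 takes over.
    print(execute([Unit("alu0", faults=10**6), Unit("alu1")], 2, 3))
```

The design choice worth noting is the ordering: retry comes first because it costs nothing when the fault was transient, and isolation is reserved for faults that persist.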
The details of the operation in question are logged inside the processor at all times. In contrast to competing products, Fujitsu systems maintain a history log of all operations, not just to record error information but to pinpoint when and where an error occurred. This history function helps determine the cause of an error faster and with greater accuracy.
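The idea of a continuously maintained history can be sketched as a fixed-size ring buffer. The article does not describe the actual log format, so the record fields (`cycle`, `unit`, `op`, `ok`), the capacity, and the class name below are all assumptions made for illustration.

```python
from collections import deque


class HistoryLog:
    """Fixed-size record of the most recent operations (oldest entries drop off)."""

    def __init__(self, capacity=8):
        self.entries = deque(maxlen=capacity)
        self.cycle = 0

    def record(self, unit, op, ok=True):
        """Log every operation, successful or not."""
        self.cycle += 1
        self.entries.append(
            {"cycle": self.cycle, "unit": unit, "op": op, "ok": ok}
        )

    def first_error(self):
        """Return the earliest retained failing entry, or None."""
        return next((e for e in self.entries if not e["ok"]), None)

    def context(self, cycle, before=2):
        """Return retained entries from `before` cycles ahead of `cycle` up to it."""
        return [e for e in self.entries if cycle - before <= e["cycle"] <= cycle]


if __name__ == "__main__":
    log = HistoryLog(capacity=4)
    log.record("alu0", "add")
    log.record("alu0", "mul")
    log.record("fpu0", "fadd")
    log.record("alu0", "add", ok=False)  # the failing operation
    log.record("alu1", "add")

    error = log.first_error()
    print("error at:", error)
    print("leading up to it:", log.context(error["cycle"]))
```

Because successful operations are recorded too, the entries preceding a failure are available when the error is analyzed, which is what distinguishes a history from a plain error log.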
[Figure: Error detection and self-healing range in SPARC64 V]

Fujitsu believes that without such error detection and corrective measures, no processor can be regarded as having RAS functions equal to those of mainframes. SPARC64 V is the only open-server processor that truly provides RAS functions equivalent to mainframes.
Certain competitors provide open-server processors that can also perform instruction retries, dynamic cache degeneration, or error logging with software assistance. But SPARC64 V is the only ‘autonomous’ processor capable of re-executing instructions and dynamically reconfiguring around error locations, all under hardware control.
| Category | Item | SPARC64 V | Company A | Company B |
|---|---|---|---|---|
| Error detection | Primary cache memory | Instruction: Duplex + Parity; Data: ECC | Instruction: Parity; Data: ECC | Instruction: Duplex + Parity; Data: ECC |
| Error detection | Secondary cache memory | Instruction: ECC; Data: ECC | Instruction: Parity; Data: ECC | Instruction: ECC; Data: ECC |
| Error detection | Arithmetic/logic units and registers | Parity (Note 2) | Unimplemented | Unimplemented |
| Correction | Instruction retry by hardware | Implemented | Unimplemented | Unimplemented |
| Degeneration | Dynamic way degeneration of cache memory (Note 3) | Implemented | Unimplemented | Unimplemented |
| Recording | History function | Implemented | Unimplemented | Unimplemented |

Note 2: If a parity error is detected, hardware recovery is achieved through an instruction retry function.
Note 3: A "way" is a unit of cache memory. The caches in SPARC64 V are configured with four ways.
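To make the dynamic way degeneration in Note 3 concrete, the following toy model shows a four-way set-associative cache that keeps working after one way is taken out of service. Only the four-way organization comes from the article; the number of sets, the round-robin replacement policy, and the class and method names are assumptions for illustration.

```python
class FourWayCache:
    """Toy 4-way set-associative cache whose ways can be disabled individually."""

    WAYS = 4

    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        # lines[set][way] holds a cached address, or None if the line is empty.
        self.lines = [[None] * self.WAYS for _ in range(num_sets)]
        self.enabled = [True] * self.WAYS
        self.next_victim = [0] * num_sets

    def degenerate_way(self, way):
        """Take one way out of service and drop whatever it held."""
        self.enabled[way] = False
        for cache_set in self.lines:
            cache_set[way] = None

    def capacity(self):
        """Number of cache lines still in service."""
        return self.num_sets * sum(self.enabled)

    def access(self, address):
        """Return True on a hit; on a miss, fill an enabled way and return False."""
        index = address % self.num_sets
        cache_set = self.lines[index]
        if address in cache_set:
            return True
        ways = [w for w in range(self.WAYS) if self.enabled[w]]
        if not ways:
            raise RuntimeError("all ways are disabled")
        # Prefer an empty way; otherwise evict round-robin among enabled ways.
        empty = [w for w in ways if cache_set[w] is None]
        if empty:
            way = empty[0]
        else:
            way = ways[self.next_victim[index] % len(ways)]
            self.next_victim[index] += 1
        cache_set[way] = address
        return False


if __name__ == "__main__":
    cache = FourWayCache(num_sets=2)
    cache.access(0)
    print("hit before degeneration:", cache.access(0))
    cache.degenerate_way(0)  # way 0 reported an uncorrectable error
    print("hit right after:", cache.access(0))
    print("hit once refilled:", cache.access(0))
    print("remaining capacity:", cache.capacity())
```

The cache loses a quarter of its capacity, but every set still has three usable ways, so the processor keeps running without that cache being switched off entirely.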
Although SPARC64 V was developed with an emphasis on RAS functions, it has also won first-place rankings in many well-known benchmark tests, demonstrating highly competitive performance. PRIMEPOWER currently holds world-record marks in a number of benchmark tests, including SPECjbb(R)2000 (as of August 21, 2006).
With such high performance and advanced RAS functions, SPARC64 V has been recognized by research societies worldwide, including the Processor Forum and the Ninth International Symposium on High-Performance Computer Architecture (HPCA9), and has earned the attention of engineers all over the world for its innovative technologies and development methods. Fujitsu is continuing to develop high-performance, high-reliability SPARC64 processors and UNIX servers with the ultimate goal of providing fully dependable, state-of-the-art computers to customers in all corners of the world.