High Reliability Design at the Component Level
The basic system components (disks, power supplies, fans, etc.) are redundant and hot-swappable. CPU modules support dynamic downgrading, so the system continues to operate even when a failure occurs.

Component redundancy
The system's basic components, such as disks, power supplies, and fans, are configured redundantly. Even when a problem occurs on one side of the redundant configuration, the other side remains active and operates normally, so operations continue uninterrupted.
Technical description: Mechanism of N+1 redundancy
For example, assume a server that requires 3,600 W of power and power supply units that can each supply up to 1,500 W. The server can run on three power supply units (4,500 W in total), but a fourth unit can be deployed to ensure uninterrupted system operation.
When all four power supply units supply power to the server, even if one unit fails, the remaining three units supply enough power for operation to continue uninterrupted.

By contrast, where three power supply units supply power to the server and a fourth stands by, if one of the three active units breaks down, only two units remain to supply power until the standby unit is up and running. Since two power supply units can supply only 3,000 W, the server will suffer an outage.

A configuration that prevents system outages by providing the number of units required for operation plus one extra unit, as in this case, is referred to as "N+1 redundancy." N+1 redundancy also applies to fans and other components.
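The arithmetic behind the power supply example can be sketched in a few lines of Python. This is only an illustration using the 3,600 W and 1,500 W figures from the text; the function names are hypothetical and do not correspond to any PRIMEPOWER tool or interface.

```python
import math

def units_required(load_w, unit_capacity_w):
    """Minimum number of units (N) needed to carry the load."""
    return math.ceil(load_w / unit_capacity_w)

def survives_single_failure(load_w, unit_capacity_w, active_units):
    """True if the load is still covered after one active unit fails."""
    return (active_units - 1) * unit_capacity_w >= load_w

LOAD_W = 3600   # server power requirement from the example
UNIT_W = 1500   # capacity of one power supply unit from the example

n = units_required(LOAD_W, UNIT_W)   # units needed just to run the server
n_plus_1 = n + 1                     # N+1: one extra active unit

print("N =", n, "N+1 =", n_plus_1)
# Four active units: losing one still leaves 4,500 W.
print("N+1 active:", survives_single_failure(LOAD_W, UNIT_W, n_plus_1))
# Three active units with one on standby: losing one leaves 3,000 W.
print("N active:", survives_single_failure(LOAD_W, UNIT_W, n))
```

The second check models the standby case described above: until the standby unit comes online, only the active units count toward the available power.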
Hot swap and hot-plug
To ensure continuous, 24x7 operation, PRIMEPOWER is equipped with both hot-swap and hot-plug capabilities. These allow users to replace faulty components, or add new components to meet increased system demand, while the system is running. These features help minimize maintenance-related system outages.
Hot swapping of primary system components
Hot swapping of the basic system components (e.g., fans, power supplies, disks) is supported.
Hot swapping or hot-plugging of CPU and memory
The partitioning or dynamic reconfiguration feature may be used to change or add CPUs, memory, and other components without suspending business operations. This feature is offered with PRIMEPOWER 900 and above.
Hot swapping of PCI cards
Hot swapping of Fibre Channel and LAN cards (PCI Hot Plug) is supported*. This function is offered with PRIMEPOWER 900 and above.
*: Hot swapping may not be supported for certain types of PCI cards.
Downgrading function
If a failure occurs in a CPU, memory, or the PCI bus, the system automatically isolates the faulty area and restarts. If 1-bit errors occur frequently in the instruction or data cache of a CPU, the defective area can be dynamically isolated without rebooting the OS. This downgrading function provides high fault tolerance, even for very rare failures.
Dynamic downgrading
SPARC64™ V provides dynamic downgrading functions that, in the event of a failure, isolate a CPU without stopping the system, ensuring uninterrupted operation.
Technical description: Mechanism of dynamic downgrading
Of the circuits that make up the CPU, the cache is prone to intermittent failures. In the event of frequent 1-bit errors, the cache is dynamically downgraded stepwise, one way at a time. For example, the secondary cache of SPARC64 V has a 4-way structure; even when an error occurs in way 1, the other three ways continue to operate. When all ways have been downgraded, downgrading is performed dynamically in CPU units.
Thus, even in such an unlikely event, the system continues to run uninterrupted, with minimal performance degradation. In CPUs provided by competitors, if an error occurs in the cache, either the cache is downgraded only after a reboot or the entire CPU is dynamically downgraded. This more general procedure, when applied to cache errors, significantly reduces system availability.
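The stepwise behavior described above can be pictured with a small Python model. It is a toy sketch and not the actual SPARC64 V logic: the class name, the error threshold of 3, and the per-way counters are all assumptions made for illustration.

```python
class CacheDegrader:
    """Toy model of way-by-way cache downgrading (illustrative only)."""

    def __init__(self, ways=4, error_threshold=3):
        self.error_counts = [0] * ways          # corrected 1-bit errors per way
        self.active = [True] * ways             # ways still in use
        self.error_threshold = error_threshold  # hypothetical "frequent" limit
        self.cpu_online = True

    def report_single_bit_error(self, way):
        """Record a 1-bit error; isolate the way once errors become frequent."""
        if not self.cpu_online or not self.active[way]:
            return
        self.error_counts[way] += 1
        if self.error_counts[way] >= self.error_threshold:
            self.active[way] = False            # downgrade this way only
            if not any(self.active):
                self.cpu_online = False         # no ways left: downgrade the CPU

    def active_ways(self):
        return sum(self.active)

cache = CacheDegrader()
for _ in range(3):
    cache.report_single_bit_error(1)            # frequent errors in way 1

print("active ways:", cache.active_ways())      # the other three ways keep running
print("CPU online:", cache.cpu_online)
```

In this model, the CPU is taken offline only after every way has been isolated, which mirrors the fallback to downgrading in CPU units.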

Downgrading during system startup
When the system is powered on, initial diagnostics are performed before the OS is booted. In these initial diagnostics, the system boards, crossbar boards, etc. are checked to see whether they are connected correctly. After the connections are confirmed, each SPARC64 V is activated to diagnose memory, the PCI bus, etc. If any failure is detected during initial diagnostics, the defective area is isolated.
By isolating defective components, this function ensures that the system operates only with components that have passed diagnostics. Notification of detected failures is sent to the system administrator via the system console.
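The startup flow above (diagnose, isolate, notify, then boot) can be outlined as follows. This is a minimal sketch under stated assumptions: the component names, the simulated fault, and the console message format are hypothetical and are not taken from PRIMEPOWER firmware.

```python
def run_initial_diagnostics(components, check):
    """Split components into those that pass diagnostics and those isolated."""
    confirmed, isolated = [], []
    for name in components:
        if check(name):
            confirmed.append(name)
        else:
            isolated.append(name)
    return confirmed, isolated

# Hypothetical inventory and simulated fault, for illustration only.
components = ["cpu0", "cpu1", "memory-bank0", "memory-bank1", "pci-bus0"]
faulty = {"memory-bank1"}

confirmed, isolated = run_initial_diagnostics(
    components, lambda name: name not in faulty
)

# Stand-in for the notification sent to the system console.
for name in isolated:
    print(f"console: {name} failed initial diagnostics and was isolated")

print("booting OS with:", confirmed)
```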
Dual-channel power reception
This feature permits the server to receive power from two separate power sources. Where two power feeds are available, as in a data center, power can be duplexed from different sources. If one source fails or has an outage, power continues through the alternative source, allowing the server to continue running.
This feature is supported with PRIMEPOWER 650 and above.


