# SPARC64<sup>™</sup> X+: Fujitsu's Next Generation Processor for UNIX servers

August 27, 2013

**Toshio Yoshida** 

Processor Development Division, Enterprise Server Business Unit, Fujitsu Limited

All Rights Reserved, Copyright© FUJITSU LIMITED 2013

- Fujitsu Processor Development
- SPARC64<sup>™</sup> X+
  - Design Concept and Processor Overview
  - Software on Chip (SWoC)
  - Micro-Architecture
  - System Architecture
  - RAS
  - Power Management
- Summary

# **Fujitsu Processor Development**




# Design of SPARC64<sup>TM</sup> X / X+

Combine Fujitsu HPC and UNIX processor features

### Single-Thread Performance

- Higher clock speed
- Micro-architectural enhancements
- Directly connected DIMMs

### High Throughput for massive data processing

- SIMD parallelism and more registers
- Multi-core and multi-thread
- High bandwidth interconnect and memory links
- Scalability up to 64 sockets (2048 threads)

### Software on Chip (SWoC) for specific applications

- Cipher, Decimal, Database

# **SPARC64™ X+ Chip Overview**




### Architecture Features

- 16 cores x 2 SMT threads
- Shared 24 MB L2\$
- Memory and I/O Controllers
- HPC-ACE
- SWoC (Software on Chip)

### 28nm CMOS

- 24.0mm x 25.0mm
- 2,990M transistors
- 1,500 signal pins
- 3.5GHz+

### Performance (peak)

- 448GFlops+
- 102GB/s memory throughput
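The peak figures above can be cross-checked with a little arithmetic. The sketch below assumes that the 448 GFlops figure corresponds to exactly 3.5 GHz on all 16 cores; the flops-per-cycle value and the bytes-per-flop ratio are derived from the slide numbers and are not stated in the source.

```python
# Figures taken from the chip overview; "+" suffixes on the slide are ignored.
CORES = 16
THREADS_PER_CORE = 2
CLOCK_GHZ = 3.5              # slide says "3.5GHz+"
PEAK_GFLOPS = 448.0          # slide says "448GFlops+"
MEMORY_GB_PER_S = 102.0


def flops_per_cycle_per_core(peak_gflops: float, cores: int, clock_ghz: float) -> float:
    """Per-core flops per cycle implied by the peak number (derived, not quoted)."""
    return peak_gflops / (cores * clock_ghz)


def threads_per_chip(cores: int, threads_per_core: int) -> int:
    """Hardware threads on one chip."""
    return cores * threads_per_core


def bytes_per_flop(memory_gb_per_s: float, peak_gflops: float) -> float:
    """Memory bandwidth available per peak floating-point operation."""
    return memory_gb_per_s / peak_gflops


if __name__ == "__main__":
    print(flops_per_cycle_per_core(PEAK_GFLOPS, CORES, CLOCK_GHZ))
    print(threads_per_chip(CORES, THREADS_PER_CORE))
    print(round(bytes_per_flop(MEMORY_GB_PER_S, PEAK_GFLOPS), 3))
```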

## **SPARC64<sup>™</sup> X+** Pipeline




# Software on Chip (SWoC)

### ◆SPARC64<sup>™</sup> X / X+ Software on Chip

- Cipher
- Decimal (IEEE754 DPD, NUMBER)
- Database processing

### Accelerate specific software functions in hardware

- SWoC engines are implemented in the floating-point unit, so they can use the 128 floating-point registers and software pipelining
- Area/number of gates is < 3% of the core and < 1% of the chip

Figure: SPARC64<sup>™</sup> X core block diagram with L1I\$, L1D\$, Instruction Control, Execution Unit, and the Decimal, Database, and Cipher Engines.

## **Cipher and Decimal Performance**

## Cipher

- AES/DES/SHA/RSA in SPARC64<sup>™</sup> X
- RSA further improved in SPARC64<sup>™</sup> X+
  - New instruction for RSA sign library

## Decimal

- SPARC64<sup>™</sup> X+ micro-architectural enhancements speed up several NUMBER libraries



## **Database Acceleration**

- Fine-grained data manipulation
  - Byte vector in SPARC64<sup>TM</sup> X
  - Bit vector enhanced in SPARC64<sup>™</sup> X+
- Integer Byte Compare
  - Enhanced ISA supports SIMD operation
  - Enhanced core supports instruction in both floating-point pipelines
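To illustrate what a byte-wise compare buys a database scan, here is a minimal software sketch. It is not the SPARC64 X+ instruction: the function names, the 8-byte word width, and the bit-mask result format are all assumptions made for illustration.

```python
def byte_compare_eq(a: int, b: int, width_bytes: int = 8) -> int:
    """Return a mask with bit i set when byte i of a equals byte i of b."""
    mask = 0
    for i in range(width_bytes):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a == byte_b:
            mask |= 1 << i
    return mask


def find_byte(column: bytes, key: int) -> list[int]:
    """Scan a byte column 8 bytes at a time; return positions equal to key."""
    splat = int.from_bytes(bytes([key]) * 8, "little")  # key copied into every byte
    hits = []
    for base in range(0, len(column), 8):
        chunk = column[base:base + 8]
        word = int.from_bytes(chunk.ljust(8, b"\0"), "little")
        mask = byte_compare_eq(word, splat)
        # Only positions inside the real chunk count; padding is ignored.
        hits.extend(base + i for i in range(len(chunk)) if (mask >> i) & 1)
    return hits


if __name__ == "__main__":
    print(find_byte(b"abcabcabca", ord("a")))
```

A single compare handles eight column values at once, which is the kind of fine-grained data parallelism the slide refers to.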



### **Bit Vector Operations**

Shift -> Mask -> Or



Extract 2 bit fields from rs1 -> Logical operation with rs2

Figure: Fd[rs1] and Fd[rs2] feed an extract stage and logical operations, producing Fd[rd].
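The two data flows named above can be sketched in software. The slides do not give the instruction encoding or operand format, so the field descriptors, the packing order, and the function names below are hypothetical; the sketch only shows that one extract-and-combine step can replace repeated shift, mask, and OR steps.

```python
import operator
from typing import Callable, Tuple

MASK64 = (1 << 64) - 1


def extract_field(value: int, lsb: int, width: int) -> int:
    """Shift the field down to bit 0 and mask off everything else."""
    return (value >> lsb) & ((1 << width) - 1)


def shift_mask_or(src: int, lsb: int, width: int, dst: int, dst_lsb: int) -> int:
    """Three-step sequence: shift, mask, then OR the field into dst."""
    field = extract_field(src, lsb, width)
    return (dst | (field << dst_lsb)) & MASK64


def extract2_then_logic(
    rs1: int,
    fields: Tuple[Tuple[int, int], Tuple[int, int]],
    rs2: int,
    op: Callable[[int, int], int],
) -> int:
    """Assumed fused form: pack two fields of rs1, then combine with rs2."""
    (lsb_a, width_a), (lsb_b, width_b) = fields
    packed = (extract_field(rs1, lsb_a, width_a) << width_b) | extract_field(rs1, lsb_b, width_b)
    return op(packed, rs2) & MASK64


if __name__ == "__main__":
    step = shift_mask_or(0xABCD, 0, 4, 0x100, 0)
    step = shift_mask_or(0xABCD, 12, 4, step, 4)
    fused = extract2_then_logic(0xABCD, ((12, 4), (0, 4)), 0x100, operator.or_)
    print(hex(step), hex(fused))
```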


# **Micro-Architectural Enhancements 1/2**

## Register window switches

- Out-of-order access to 48 integer registers (current & next window)
- No penalty for all window switches between the same two windows
- SPARC64<sup>™</sup> X handles only one window switch without penalty

Figure: register windows [n] and [n+1].

## Improved branch prediction

- Rehashed indirect branch predictor
  - Indirect branch with variable target address
- Local pattern branch predictor
  - More pattern history table entries
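One plausible reading of the 48-register figure follows from the standard SPARC V9 window layout, in which each window has 8 in, 8 local, and 8 out registers, adjacent windows overlap by 8 registers, and 8 globals are always visible. The slides do not spell out this breakdown, so treat the arithmetic below as an assumption.

```python
# Standard SPARC V9 register window sizes.
GLOBALS = 8
INS = 8
LOCALS = 8
OUTS = 8


def registers_for_adjacent_windows(n_windows: int) -> int:
    """Distinct integer registers covering n adjacent windows plus globals.

    The outs of one window are the ins of the next, so each additional
    window contributes only its locals and outs.
    """
    return GLOBALS + INS + n_windows * (LOCALS + OUTS)


if __name__ == "__main__":
    print(registers_for_adjacent_windows(1))  # current window only
    print(registers_for_adjacent_windows(2))  # current and next window
```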



# **Micro-Architectural Enhancements 2/2**

## L1 data cache

- Dedicated write pipeline
  - 64 RAM banks (8 sets of 8-banked RAMs)
  - One write and two reads each cycle, except when RAM bank conflict occurs
- Faster atomic memory operations
- Increased hardware prefetch throughput
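The bank-conflict condition can be pictured with a toy model. The slides only state the bank counts; the 8-byte bank width, the low-order address interleaving, and the function names below are assumptions, and the model ignores the "8 sets" dimension entirely.

```python
from typing import Iterable

BANKS_PER_SET = 8      # from the slide: 8-banked RAMs
BANK_WIDTH_BYTES = 8   # assumption: not stated in the slides


def bank_of(address: int) -> int:
    """Map a byte address to a bank index (assumed low-order interleaving)."""
    return (address // BANK_WIDTH_BYTES) % BANKS_PER_SET


def write_can_issue(write_addr: int, read_addrs: Iterable[int]) -> bool:
    """True when the write targets a bank that no read uses in this cycle."""
    busy_banks = {bank_of(addr) for addr in read_addrs}
    return bank_of(write_addr) not in busy_banks


if __name__ == "__main__":
    print(write_can_issue(0x40, [0x08, 0x10]))  # write bank 0, reads use banks 1 and 2
    print(write_can_issue(0x48, [0x08, 0x10]))  # write bank 1 collides with a read
```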



### L1-D Cache Throughput




# **SPARC64<sup>™</sup> X / X+ System Architecture**

### Scales from 1 to 64 CPU sockets (2048 threads)

- Directory-based cache coherency
- High-speed interconnect, up to <u>25Gbps</u> per lane in SPARC64<sup>™</sup> X+ (Up to 14.5Gbps in SPARC64<sup>™</sup> X)

### System Configuration

- A Building Block (BB) consists of 4 CPUs and 2 crossbars (XBs)
- Up to 4 BBs can be connected directly by their XBs
- Up to 16 BBs can be connected via XB-Boxes
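The socket and thread totals follow directly from the building-block numbers. The short sketch below reproduces them, using the per-chip core and SMT counts from the chip overview; the function names are illustrative only.

```python
CPUS_PER_BB = 4
XBS_PER_BB = 2
CORES_PER_CPU = 16
THREADS_PER_CORE = 2


def sockets(building_blocks: int) -> int:
    """CPU sockets in a system built from the given number of BBs."""
    return building_blocks * CPUS_PER_BB


def hardware_threads(building_blocks: int) -> int:
    """Hardware threads across all sockets."""
    return sockets(building_blocks) * CORES_PER_CPU * THREADS_PER_CORE


if __name__ == "__main__":
    for bbs in (1, 4, 16):
        print(bbs, sockets(bbs), hardware_threads(bbs))
```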

### **Building Block (4 CPU Sockets)**



### 16 BBs (64 CPU Sockets)

(Each line represents connections between a BB and two XBs in an XB-Box)




## **System Scalability**

- SPARC64<sup>™</sup> X systems demonstrate high scalability across a wide range of applications
  - Integer, Floating-Point, Java, ERP, DWH

### SPARC64<sup>™</sup> X efficiently scales to 64 CPU sockets




# Reliability, Availability, Serviceability

| Units        | Error Detection and Correction |
|--------------|--------------------------------|
| Cache (Tags) | ECC, Parity & Duplicate        |
| Cache (Data) | ECC, Parity                    |
| Registers    | ECC (INT/FP), Parity (Others)  |
| ALUs         | Parity, Residue                |
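Parity and residue checks from the table can be sketched generically. This is a textbook illustration, not Fujitsu's circuit: the modulus of 3, the fault-injection parameter, and the function names are assumptions. A modulus-3 residue catches any single-bit error in a sum because no power of two is divisible by 3.

```python
from typing import Optional


def parity_bit(word: int) -> int:
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def residue(value: int, modulus: int = 3) -> int:
    """Residue code of a non-negative integer."""
    return value % modulus


def checked_add(a: int, b: int, flip_bit: Optional[int] = None) -> int:
    """Add two values and verify the sum against a predicted residue.

    flip_bit injects a single-bit fault into the adder output for testing.
    Operand overflow is ignored because Python integers are unbounded.
    """
    total = a + b
    if flip_bit is not None:
        total ^= 1 << flip_bit          # simulated hardware fault
    predicted = (residue(a) + residue(b)) % 3
    if residue(total) != predicted:
        raise RuntimeError("residue mismatch: adder result is corrupt")
    return total


if __name__ == "__main__":
    print(checked_add(1234, 5678))
    try:
        checked_add(1234, 5678, flip_bit=5)
    except RuntimeError as err:
        print(err)
```

On a detected mismatch, real hardware would hand off to a recovery mechanism such as instruction retry rather than raise an exception.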

#### Other RAS features

- Cache dynamic degradation
- Hardware Instruction Retry
- Lane dynamic degradation



### ◆ Mainframe-level RAS features for SPARC64<sup>™</sup> X / X+

- Number of checkers increased to ~54,000
- System bus mechanisms for self-recovery and lane dynamic degradation

## Guarantee Data Integrity and Keep on Running


## **Power Management**

## Save energy while idle

- CPU Lower Power (LP) State introduced in SPARC64<sup>™</sup> X
  - Dynamically decrease frequency and voltage
  - Keep all data and caches coherent
  - State transition managed by software
- ✓ 45% power savings measured in SPARC64<sup>™</sup> X
- ✓ Transition time between states is ~1.7ms
- ✓ Continue working while in transition
- DIMM power saving mechanism
  - Memory controller supports two lower power states
    - Power-down
    - Self-refresh






## Summary

◆ SPARC64<sup>™</sup> X+ is Fujitsu's latest SPARC processor, designed for Fujitsu's next generation UNIX servers

- SPARC64<sup>™</sup> X+ realizes improved single-thread performance with a higher clock speed, micro-architectural enhancements, and SWoC
- SPARC64<sup>™</sup> X / X+ systems realize high scalability, from 1 to 64 CPU sockets (2048 threads)
- SPARC64<sup>™</sup> X+ implements extensive RAS features

◆ Fujitsu will continue to develop the SPARC64<sup>™</sup> series

# **Abbreviations**

### SPARC64<sup>™</sup> X+

- RSA: Reservation Station for Address generation
- RSE: Reservation Station for Execution
- RSF: Reservation Station for Floating-point
- RSBR: Reservation Station for Branch
- GUB: General-purpose Update Buffer
- FUB: Floating-point Update Buffer
- GPR: General-Purpose Register
- FPR: Floating-Point Register
- CSE: Commit Stack Entry
- EAG: Effective Address Generator
- EX: Execution unit (Integer)
- FL: Floating-point unit
- HPC-ACE: High Performance Computing-Arithmetic Computational Extensions
- ERP: Enterprise Resource Planning
- DWH: Data WareHouse