

# FUJITSU Supercomputer PRIMEHPC FX100

## The K computer and the evolution of PRIMEHPC



Fujitsu has been developing supercomputers for over 30 years and will continue that development to deliver the best application performance.



#### K computer

- CPU: SPARC64 VIIIfx, 8 cores / 128 GF
- System: 11 PF, 2010~



#### PRIMEHPC FX10

- CPU: SPARC64 IXfx, 16 cores / 236.5 GF
- System: 23 PF, 2012~



#### PRIMEHPC FX100

- CPU: SPARC64 XIfx, 32 cores / over 1 TF
- System: over 100 PF, 2015~

## PRIMEHPC FX100 design concept



#### Designed for massively parallel supercomputer systems

- High performance for a wide range of real applications

#### **Enhance and inherit K computer features**

- Many-core CPU-based architecture for application productivity
- Enhanced VISIMPACT (hardware barrier synchronization, sector cache, etc.)

#### Introduce new technologies toward exascale computing

- HPC-ACE2: Wide SIMD enhancements
- Assistant cores: Dedicated cores for non-calculation operations
- HMC: Leading-edge memory technology

## SPARC64<sup>TM</sup> XIfx



#### Over 1 TF high performance processor

- 32 compute cores
- 2 assistant cores: Offloading non-calculation operations
  - Daemons, I/O, non-blocking MPI functions, etc.

#### **HPC-ACE2: ISA enhancements**

- Two 256-bit wide SIMD units per core
- 64-bit x 4 / 32-bit x 8 FMA
- Addressing modes (stride load/store, indirect load/store)
- Cross-lane operations (compress, permutation)
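The figures above can be tied together with a little arithmetic, and the new addressing and cross-lane operations are easier to picture with a lane-by-lane model. The following is a minimal sketch in plain Python, not Fujitsu code: the function names are hypothetical, and the peak estimate assumes every SIMD unit completes one FMA on all lanes each cycle, which the slides do not state.

```python
# Sketch only: plain-Python models of the HPC-ACE2 features listed above.
# All function names are hypothetical; none of them are Fujitsu intrinsics.

CORES = 32             # compute cores per CPU (from the slides)
SIMD_UNITS = 2         # 256-bit SIMD units per core (from the slides)
DP_LANES = 256 // 64   # 64-bit x 4
SP_LANES = 256 // 32   # 32-bit x 8
FLOPS_PER_FMA = 2      # one multiply plus one add


def flops_per_cycle(lanes):
    """Peak flops per cycle, ASSUMING one FMA per SIMD unit per cycle."""
    return CORES * SIMD_UNITS * lanes * FLOPS_PER_FMA


def min_clock_ghz(target_gflops, lanes):
    """Clock (GHz) needed to reach target_gflops under the same assumption."""
    return target_gflops / flops_per_cycle(lanes)


def stride_load(memory, base, stride, lanes):
    """Load `lanes` elements spaced `stride` apart, starting at `base`."""
    return [memory[base + i * stride] for i in range(lanes)]


def indirect_load(memory, indices):
    """Load one element per lane from arbitrary indices (a gather)."""
    return [memory[i] for i in indices]


def compress(vector, mask):
    """Pack the lanes selected by `mask` together, preserving their order."""
    return [v for v, m in zip(vector, mask) if m]


def permute(vector, order):
    """Rearrange lanes: output lane i takes input lane order[i]."""
    return [vector[i] for i in order]


if __name__ == "__main__":
    print(flops_per_cycle(DP_LANES), "DP flops per cycle per CPU")
    print(min_clock_ghz(1000, DP_LANES), "GHz needed for 1 TF (DP)")
```

Under that assumption, a CPU delivers 512 double-precision flops per cycle, so "over 1 TF" implies a clock just above 1.95 GHz.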



## **Hybrid Memory Cube (HMC)**



### **Excellent byte/flop balance**

- HMC: high performance per watt in a small footprint

| Peak performance per node | K computer | PRIMEHPC FX10 | PRIMEHPC FX100 |
|---------------------------|------------|---------------|----------------|
| DP perf. (GFLOPS)         | 128        | 236.5         | Over 1000      |
| Memory BW (GB/s)          | 64         | 85            | 480            |
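The byte/flop balance referred to above is memory bandwidth divided by peak floating-point rate. As a quick check, the sketch below computes it from the table values. Because the FX100 entry is given only as "Over 1000" GFLOPS, the code uses 1000 as a stand-in, so the FX100 ratio is an upper bound rather than an exact figure.

```python
# Byte/flop ratio per node, computed from the table above.
# Each entry is (peak DP GFLOPS, memory bandwidth in GB/s).
NODES = {
    "K computer": (128.0, 64.0),
    "PRIMEHPC FX10": (236.5, 85.0),
    # The table says "Over 1000" GFLOPS; 1000 is a stand-in, so the
    # resulting ratio is an upper bound on the real value.
    "PRIMEHPC FX100": (1000.0, 480.0),
}


def bytes_per_flop(gflops, gb_per_s):
    """Memory bytes available per peak floating-point operation."""
    return gb_per_s / gflops


if __name__ == "__main__":
    for name, (gflops, bandwidth) in NODES.items():
        print(f"{name}: {bytes_per_flop(gflops, bandwidth):.2f} B/F")
```

This gives about 0.50 B/F for the K computer, 0.36 B/F for the FX10, and at most 0.48 B/F for the FX100, so the FX100 brings the balance back close to the K computer's level while raising peak performance roughly eightfold.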

## Tofu interconnect 2



#### **Enhanced Tofu interconnect**

- Highly scalable, 6-dimensional mesh/torus topology
- Link bandwidth increased 2.5 times, to 12.5 GB/s
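To make the mesh/torus idea concrete, here is a minimal sketch of how direct neighbors can be enumerated in a network that mixes torus (wrap-around) and mesh (no wrap-around) dimensions. The dimension sizes and the choice of which axes wrap are assumptions made for illustration only; the slides give no such details. The last lines also derive the earlier link bandwidth implied by the 2.5x figure.

```python
# Sketch only: neighbor enumeration in a mixed mesh/torus network.
# The shape and wrap flags used in the example are HYPOTHETICAL.

def neighbors(coord, shape, wraps):
    """Return the distinct direct neighbors of `coord`.

    shape: number of nodes along each dimension
    wraps: True where a dimension is a torus, False where it is a mesh
    """
    found = set()
    for axis, (size, wrap) in enumerate(zip(shape, wraps)):
        for step in (-1, 1):
            pos = coord[axis] + step
            if wrap:
                pos %= size            # torus: wrap around the edge
            elif not 0 <= pos < size:
                continue               # mesh: nothing beyond the edge
            if pos == coord[axis]:
                continue               # size-1 dimension has no neighbor
            nxt = list(coord)
            nxt[axis] = pos
            found.add(tuple(nxt))      # set removes duplicate wrap-arounds
    return sorted(found)


# Link bandwidth before the 2.5x increase, derived from the slide's figures.
NEW_LINK_GB_S = 12.5
PREVIOUS_LINK_GB_S = NEW_LINK_GB_S / 2.5   # 5.0 GB/s

if __name__ == "__main__":
    shape = (4, 4, 4, 2, 3, 2)                      # hypothetical sizes
    wraps = (True, True, True, False, True, False)  # hypothetical flags
    print(len(neighbors((0,) * 6, shape, wraps)), "direct neighbors")
    print(PREVIOUS_LINK_GB_S, "GB/s per link before the increase")
```

Deduplicating through a set matters for small torus dimensions: with only two nodes along an axis, the "+1" and "-1" neighbors are the same node.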

### CPU-integrated interconnect controller

- Reduced communication latency
- Improved packaging density and energy efficiency

### Optical cable connection between chassis

- Enables flexible installation



## **Enhanced VISIMPACT**



### Technology for hybrid parallelization

- Automatic parallelization technology by Fujitsu's compiler
- Hardware barrier for fast synchronization

### Advantages of hybrid parallelization

- Reduces communication cost in highly parallel programs
- Increases user memory space by reducing communication buffers
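The memory advantage can be illustrated with a deliberately simple model. The sketch below assumes that every MPI process keeps one fixed-size buffer for each other process in the job. That is an illustrative assumption, not a description of Fujitsu's MPI library, and the node count and buffer size are hypothetical. It compares flat MPI (one process per core) with hybrid parallelization (one process per node, threads across the 32 cores).

```python
# Sketch only: a toy model of per-node communication buffer memory.
# ASSUMPTION: each process holds one buffer per peer process. Real MPI
# libraries allocate buffers differently; the numbers are illustrative.

CORES_PER_NODE = 32  # compute cores per node (from the slides)


def buffer_bytes_per_node(nodes, procs_per_node, bytes_per_peer):
    """Buffer memory used on one node under the one-buffer-per-peer model."""
    total_procs = nodes * procs_per_node
    peers = total_procs - 1
    return procs_per_node * peers * bytes_per_peer


if __name__ == "__main__":
    nodes = 1000           # hypothetical job size
    bytes_per_peer = 1024  # hypothetical buffer size per peer

    flat = buffer_bytes_per_node(nodes, CORES_PER_NODE, bytes_per_peer)
    hybrid = buffer_bytes_per_node(nodes, 1, bytes_per_peer)

    print(f"flat MPI: {flat / 2**20:.1f} MiB per node")
    print(f"hybrid:   {hybrid / 2**20:.1f} MiB per node")
```

In this model the buffer footprint grows with both the number of processes on the node and the number of peers in the job, which is why replacing 32 processes per node with one multithreaded process frees memory for the application.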

#### **Enhanced hardware barriers**

- 8 sets among 32 cores



## CPU memory board





## Main unit



#### 2U rack mountable chassis

- High density: Four CPU memory boards per unit (total 12 nodes, over 12 TF)
- Water cooled: High reliability
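The packaging figures above imply three nodes per CPU memory board (12 nodes across four boards). As a rough sizing aid, the sketch below turns that into a chassis count for a given number of nodes. The 1000-node example is hypothetical, and the per-node peak is taken as 1 TF, the lower bound given in the slides, so the performance result is a lower bound too.

```python
import math

# Packaging figures from the slides.
BOARDS_PER_CHASSIS = 4
NODES_PER_CHASSIS = 12
NODES_PER_BOARD = NODES_PER_CHASSIS // BOARDS_PER_CHASSIS  # 3
MIN_TF_PER_NODE = 1.0  # slides say "over 1 TF", so this is a lower bound


def chassis_needed(nodes):
    """Number of 2U chassis needed to house `nodes` nodes."""
    return math.ceil(nodes / NODES_PER_CHASSIS)


def min_peak_tf(nodes):
    """Lower bound on aggregate peak performance in TF."""
    return nodes * MIN_TF_PER_NODE


if __name__ == "__main__":
    nodes = 1000  # hypothetical system size
    print(chassis_needed(nodes), "chassis")
    print("over", min_peak_tf(nodes), "TF peak")
```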

Figure labels: CPU memory board x 4; coolant water inlet/outlet; cooling unit; optical connectors.

## System rack





## System rack: Front View





## System rack: Rear View






