# Past, Present, and Future of SPARC64 Processors

● Takumi Maruyama ● Tsuyoshi Motokurumada ● Kuniki Morita ● Naozumi Aoki

SPARC64 is the name of the series of SPARC-V9 architecture processors that Fujitsu has developed. The development of the first SPARC64 started in the 1990s, and the development of the latest generation is actively ongoing as of 2011. The first SPARC64 ran at only 118 MHz and had only 20 million transistors, whereas the latest SPARC64 VIIIfx runs at 2 GHz and has more than 700 million transistors. This paper describes the history of SPARC64 processor development and the enhancements made in each generation.

## 1. Introduction

SPARC64 is the name of the series of SPARC-V9 architecture processors that Fujitsu has developed. The development of the first SPARC64 started in the 1990s, and the development of the latest generation is actively ongoing as of 2011 (**Figure 1**).

In the first half of the 1990s, various RISC processors emerged besides SPARC. However, among RISC processor architectures, only SPARC and POWER have survived to date in the high-end processor segment. Some processors were overtaken in performance, and some could not survive in the market despite their highly acclaimed technical advances. Both SPARC and POWER have been developed not by specialist processor vendors but by server vendors engaged in processor development. This fact suggests that the excellence of a processor is determined not by its performance alone but by how it performs in combination with the other components that constitute a server, such as the system and the OS.

This paper describes the history of SPARC64 processor development and the enhancements made in each generation.

## 2. SPARC64, SPARC64 II

The SPARC64 processor has its origin in the SPARC64<sup>1)</sup> developed by HAL Computer Systems (hereafter HAL), then Fujitsu's subsidiary in the United States. At the time of its shipment, this processor adopted several cutting-edge technologies, including out-of-order superscalar execution. More importantly, however, this model established the SPARC-V9 instruction set architecture.<sup>2)</sup>

SPARC architecture is maintained by an independent non-profit organization called SPARC International. While consultations and decisions on SPARC-V9 were made by a board called the Architecture Committee in this organization, most of the committee members joined from HAL.

Compared with SPARC-V8, SPARC-V9 adds features that are essential for server processors: a larger address space through a 64-bit architecture, support for multiprocessor configurations, and enhanced reliability.



Figure 1 Development history of Fujitsu's processors. (VISIMPACT: Virtual single processor by integrated multi-core parallel architecture.)

Although SPARC64 and SPARC64 II were used in workstations rather than servers, the SPARC-V9 instruction set architecture can be said to have established the foundation for using SPARC processors in servers.

SPARC64 was fabricated in 0.4 µm complementary metal oxide semiconductor (CMOS) technology and ran at 118 MHz. It adopted a multi-chip module (MCM) configuration in which one CPU core chip, four cache chips and one MMU chip were integrated on the same substrate. SPARC64 II raised the frequency to 161 MHz by using 0.35 µm CMOS.

## 3. SPARC64 GP

The early version of SPARC64 GP (0.24 µm CMOS) was developed by HAL and the late version (0.15 µm CMOS) by Fujitsu.<sup>3)</sup> SPARC64 GP adopted a single-chip configuration while retaining the basic pipeline structure of SPARC64. Further, the addition of features such as a large-capacity external secondary cache and multiprocessor support reinforced its capabilities as a server processor. Thanks to a duplicated design and a single-error correction mechanism based on parity or error checking and correction (ECC) for the cache and bus interface, it achieved a higher level of reliability than other UNIX processors of that time.

SPARC64 GP was used as the processor for GRANPOWER and PRIMEPOWER, Fujitsu's UNIX servers. In PRIMEPOWER 2000, it achieved what was then the world's highest performance (7800 users) in the 2-Tier SAP-SD Benchmark. Its frequency rose from an initial 250 MHz to a final 810 MHz.

## 4. Mainframe

In the beginning of the 1990s, Fujitsu changed its focus of development from conventional emitter-coupled logic (ECL) to LSI using CMOS. In 1995, the GS Series, a novel mainframe series characterized by all models integrating CMOS, was announced.

In GS8600, the first product of the GS series, features such as a two-level cache system, branch history, store ahead and pre-fetch mechanism were introduced. In GS8800B (2.5 generation model), a drastic change to the CPU control method was made, from the lock step pipeline method to the out-of-order superscalar method, to further improve performance. Thereafter, in GS8900, a totally novel design was introduced to the cache component to achieve a more sophisticated CPU based on the superscalar method. The pipeline for CPUs, developed for this GS8900, served as the basis of SPARC64 V, which will be explained in the next section.

## 5. SPARC64 V

The early models of SPARC64 V were developed by using 130 nm CMOS, while late models used 90 nm CMOS.<sup>4),5)</sup> The frequency rose from an initial 1.35 GHz to a final 2.16 GHz.

A new pipeline was introduced in SPARC64 V so that its basic structure would be identical to that of the CPU pipeline for mainframes.<sup>6)</sup> While both SPARC64 GP and SPARC64 V are four-issue, out-of-order superscalar processors, their micro-architectures differ significantly.

The main focus in designing SPARC64 GP was to increase the number of instructions executed per cycle by reducing the number of pipeline stages. With SPARC64 V, in contrast, the aim was a structure that could support a higher frequency by increasing the number of pipeline stages. In hardware, Fujitsu worked to improve performance through out-of-order execution of memory accesses, while maintaining, from the software's point of view, the appearance that accesses are performed in the memory order defined by SPARC-V9. Technologies developed for mainframe CPUs were also used for this purpose.

In the development of SPARC64 V, an innovative performance evaluation model was developed in a joint effort with Fujitsu Laboratories, and a detailed configuration was determined based on the results of performance evaluation against typical benchmarks such as Standard Performance Evaluation Corporation (SPEC) and Transaction Processing Performance Council Benchmark C (TPC-C). Further, highly reliable features conventionally adopted in mainframes such as CPU hardware retry, complete rescue of SRAM single-bit error and history function were introduced, for the first time, to CPUs for UNIX.

Moreover, to enhance the OS portability between Fujitsu's SPARC64 Series and the UltraSPARC Series developed by Sun Microsystems, a new specification called joint programmer's specification (JPS) was developed.<sup>7),8)</sup> SPARC64 V and later processors adopted instruction set architectures compliant with SPARC-V9 and JPS.

SPARC64 V was used as a processor for PRIMEPOWER, one of Fujitsu's UNIX servers. PRIMEPOWER 2500 achieved the world's highest performance (21 000 users) in 2-Tier SAP-SD Benchmark tests at that time.

## 6. SPARC64 VI

SPARC64 VI is an extension of SPARC64 V, developed as a processor for SPARC Enterprise, a UNIX server line common to Fujitsu and Sun Microsystems. It used Fujitsu's 90 nm CMOS and had a maximum frequency of 2.4 GHz. SPARC64 VI integrated two SPARC64 V-based cores on one chip, with each core executing two threads.<sup>9)</sup> This multi-thread configuration was realized by a method called vertical multi-threading (VMT), which switches from one thread to another when triggered by an event. Adopting this multi-core, multi-thread configuration based on the SPARC64 V core drastically improved throughput without sacrificing single-unit performance.
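The event-triggered switching of VMT can be illustrated with a toy model. The sketch below is not Fujitsu's implementation: the triggering event (a hypothetical cache miss every `miss_interval` instructions), the `miss_latency` value, and the function name `simulate_vmt` are all assumptions chosen only to show why switching on a stall raises throughput.

```python
def simulate_vmt(num_threads, cycles, miss_interval=10, miss_latency=20):
    """Toy vertical multi-threading model (illustrative assumptions only).

    One thread runs at a time; the core switches to another thread only
    when the running thread stalls on a (hypothetical) cache miss.
    Returns the number of instructions retired by each thread.
    """
    ready_at = [0] * num_threads   # cycle at which each thread may run again
    executed = [0] * num_threads   # instructions retired per thread
    current = 0
    for cycle in range(cycles):
        if ready_at[current] > cycle:
            # The running thread is stalled: this event triggers a switch.
            candidates = [t for t in range(num_threads) if ready_at[t] <= cycle]
            if not candidates:
                continue           # every thread is waiting, so the core idles
            current = candidates[0]
        executed[current] += 1     # retire one instruction this cycle
        if executed[current] % miss_interval == 0:
            # Hypothetical cache miss: the thread cannot run until it resolves.
            ready_at[current] = cycle + 1 + miss_latency
    return executed


one_thread = simulate_vmt(num_threads=1, cycles=300)
two_threads = simulate_vmt(num_threads=2, cycles=300)
print(one_thread, two_threads)
```

With these made-up parameters, the second thread fills cycles that would otherwise be idle, so total throughput doubles while the first thread retires as many instructions as it does when running alone.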

Further, a new CPU bus was developed. A unique W state was added to the MOESI protocol (routinely used as cache coherency protocol) based on the developers' experience with mainframes, aiming to accelerate large-scale transaction processing.

## 7. SPARC64 VII

SPARC64 VII substantially improved on the performance of SPARC64 VI by adopting four cores. In addition, the multi-thread control method was changed from VMT to simultaneous multi-threading (SMT) to further enhance throughput.<sup>10)</sup> As of September 2010, the maximum frequency achieved is 2.88 GHz, using Fujitsu's 65 nm CMOS. SPARC64 VII maintains compatibility with SPARC64 VI at the CPU module level, which enables upgrading from SPARC64 VI within the same SPARC Enterprise system.
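The throughput benefit of SMT over one-thread-at-a-time execution can be sketched by counting filled issue slots in a single cycle. This is a deliberately simplified illustration, not a model of the actual SPARC64 VII pipeline: the issue width of four is borrowed from the four-issue description of SPARC64 V, and the per-thread ready-instruction counts and the function name `issue_slots_used` are hypothetical.

```python
def issue_slots_used(per_thread_ready, issue_width, simultaneous):
    """Return how many issue slots are filled in one cycle.

    per_thread_ready: number of ready instructions each hardware thread
                      could issue this cycle (hypothetical values).
    simultaneous=False: only the first thread may issue (one thread at a time).
    simultaneous=True:  all threads share the issue slots (SMT-like).
    """
    if simultaneous:
        return min(issue_width, sum(per_thread_ready))
    return min(issue_width, per_thread_ready[0])


# Two threads, each with only two independent instructions ready this cycle.
ready = [2, 2]
print(issue_slots_used(ready, issue_width=4, simultaneous=False))
print(issue_slots_used(ready, issue_width=4, simultaneous=True))
```

When a single thread cannot fill all the issue slots by itself, letting a second thread issue in the same cycle uses the slots that would otherwise go unused.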

Because SPARC64 VII is also used as the CPU for the FX1 Technical Computing Server,<sup>11)</sup> a high-speed thread synchronization mechanism called a hardware barrier and a double-speed mechanism for the CPU bus were added. Combining it with the Jupiter system controller (JSC), an ASIC newly developed for FX1, secures the memory bandwidth that FX1 requires. JSC is also used as the chipset for SPARC Enterprise M3000.

SPARC Enterprise M9000 integrating SPARC64 VII achieved the world's highest performance (39 100 users) in 2-Tier SAP-SD Benchmark tests in July 2010.

## 8. SPARC64 VIIIfx

SPARC64 VIIIfx was developed as a processor for supercomputers. It was adopted in the Next-Generation Supercomputer (nicknamed "K computer") developed within the framework of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) program to establish a high-performance computing infrastructure (HPCI). The processor comprises eight cores and a 6 MB secondary cache.<sup>12)</sup> Memory performance was significantly improved by integrating a memory controller into a SPARC64 processor for the first time. The processor uses Fujitsu's 45 nm CMOS and its clock frequency is 2 GHz.

The instruction set was extensively enhanced with the high-performance computing arithmetic computational extensions (HPC-ACE).<sup>13)</sup> These extensions enable highly efficient computing through features such as 256 double-precision registers, floating-point single instruction multiple data (SIMD) operations, which process multiple data items with a single instruction, and a software-controlled cache.
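The idea behind SIMD, one instruction operating on several data items, can be emulated in a few lines. The sketch below is only an illustration in plain Python: the lane count of two and the multiply-add operation are assumptions made for the example, and `simd_multiply_add` is a hypothetical helper, not an HPC-ACE instruction.

```python
def simd_multiply_add(a, b, c, lanes=2):
    """Emulate a SIMD multiply-add over three equal-length lists.

    Each loop iteration stands for ONE SIMD instruction that computes
    a*b + c for `lanes` elements at once (lane count is an assumption).
    Returns the results and the number of emulated instructions.
    """
    result = []
    instructions = 0
    for i in range(0, len(a), lanes):
        chunk = zip(a[i:i + lanes], b[i:i + lanes], c[i:i + lanes])
        result.extend(x * y + z for x, y, z in chunk)
        instructions += 1
    return result, instructions


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [1.0, 1.0, 1.0, 1.0]
values, simd_count = simd_multiply_add(a, b, c, lanes=2)
_, scalar_count = simd_multiply_add(a, b, c, lanes=1)
print(values, simd_count, scalar_count)
```

With two lanes the same four results are produced by half as many emulated instructions as in the scalar case (`lanes=1`).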

Moreover, as a result of Fujitsu's commitment to energy conservation, power consumption as low as 58 W was achieved.<sup>14)</sup> The clock frequency is kept low (2 GHz) to suppress power consumption, and the clock supply to inactive circuits is suspended to avoid unnecessary power consumption. Further, water cooling is used to minimize the leakage current in the CPU's internal circuits.

The performance of SPARC64 VIIIfx per unit of power consumption has been enhanced to six times that of the previous generation's SPARC64 VII.
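The efficiency figures can be worked through with the numbers quoted in this paper: 8 cores, 2 GHz, 58 W, and the 128 GFLOPS peak that appears in the title of reference 14. The sketch below is a back-of-the-envelope calculation only. It assumes that the six-fold improvement refers to peak GFLOPS per watt, which the text does not state explicitly, so the implied SPARC64 VII figure is indicative at best.

```python
# Figures quoted in this paper for SPARC64 VIIIfx.
CORES = 8
CLOCK_GHZ = 2.0
PEAK_GFLOPS = 128.0   # from the title of reference 14
POWER_W = 58.0
RATIO_OVER_VII = 6.0  # "six times that of the previous generation"


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Performance per unit of power consumption."""
    return gflops / watts


# Floating-point operations per core per cycle implied by the peak figure.
flops_per_core_per_cycle = PEAK_GFLOPS / (CORES * CLOCK_GHZ)

viiifx_efficiency = gflops_per_watt(PEAK_GFLOPS, POWER_W)

# Assumption: the six-fold figure is measured as peak GFLOPS per watt.
implied_vii_efficiency = viiifx_efficiency / RATIO_OVER_VII

print(f"flops per core per cycle: {flops_per_core_per_cycle:.0f}")
print(f"SPARC64 VIIIfx: {viiifx_efficiency:.2f} GFLOPS/W")
print(f"implied SPARC64 VII: {implied_vii_efficiency:.2f} GFLOPS/W")
```

Under these assumptions, SPARC64 VIIIfx delivers roughly 2.2 GFLOPS per watt, which implies a figure of roughly 0.37 GFLOPS per watt for SPARC64 VII.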

## 9. Future approaches

As indicated in Figure 1, Fujitsu has reinforced the value of the SPARC64 series from various aspects to meet the needs of a new era. These include improving single-unit performance by increasing frequency, enhancing throughput with multi-core and multi-thread configurations, improving computing and memory performance, and minimizing power consumption.

We are determined to further enhance the performance of the series by evolving the processor architecture to keep pace with applications, while continuing to treat the reduction of power consumption as a priority.

## 10. Conclusion

Fujitsu has deployed a globally unique R&D system since SPARC64 V, where the same team designed processors for supercomputers, UNIX servers and mainframes. Fujitsu has enhanced the value of its technologies through their horizontal deployment on a mutual basis. For instance, the high reliability of the SPARC64 processor is one of its strengths inherited from mainframes.

Fujitsu has been developing SPARC64 processors for over 15 years. During this time, the clock frequency has increased 20 times and the number of transistors has gone up more than 30 times. Various types of progress are observed also in the approaches to micro-architecture and designing of processors. However, the basic philosophy behind these variants remains always the same. In a SPARC64 processor of any era, the designers' enthusiasm for perfection is always seen.

## References

- 1) N. Patkar et al.: Microarchitecture of HaL's CPU. COMPCON '95, pp. 259–266.
- 2) SPARC International: The SPARC Architecture Manual (Version 9). http://www.sparc.org/standards/SPARCV9.pdf
- 3) T. Hikiji et al.: 64 Bit RISC Processor: SPARC64 GP. (in Japanese), *FUJITSU*, Vol. 51, No. 4, pp. 226–231 (2000).
- 4) A. Inoue: Fujitsu's new SPARC64 V for Mission Critical Servers. Microprocessor Forum, October 15, 2002.
- 5) A. Inoue: Processor for UNIX Server: SPARC64 V. (in Japanese), *FUJITSU*, Vol. 53, No. 6, pp. 450–455 (2002).
- 6) A. Inoue: High performance, high reliability technologies of SPARC64 V/VI: Scientific System Study Group: Meeting data in FY2006. (in Japanese). http://www.ssken.gr.jp/MAINSITE/download/newsletter/2006/sci/2/3\_inoue.pdf
- 7) Sun Microsystems and Fujitsu Limited: SPARC Joint Programming Specification (JPS1): Commonality. 2002. http://jp.fujitsu.com/solutions/hpc/brochures/
- 8) Fujitsu: SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V. 2002. http://jp.fujitsu.com/solutions/hpc/brochures/
- 9) A. Inoue: SPARC64 VI: A State of the Art Dual Core Processor. Fall Processor Forum, October 10, 2006.
- 10) T. Maruyama: SPARC64 VII: Fujitsu's Next Generation Quad-Core Processor. Hot Chips 20, August 26, 2008.
- 11) T. Abe et al.: JAXA Supercomputer Systems with Fujitsu FX1 as Core Computer. *Fujitsu Sci. Tech. J.*, Vol. 44, No. 4, pp. 426–434 (2008).
- 12) T. Maruyama et al.: SPARC64 VIIIfx: A New-Generation Octocore Processor for Petascale Computing. *IEEE Micro*, Vol. 30, No. 2, pp. 30–40 (2010).
- 13) SPARC64 VIIIfx Extensions. http://jp.fujitsu.com/solutions/hpc/brochures/
- 14) H. Okano et al.: Fine Grained Power Analysis and Low-Power Techniques of a 128GFLOPS/58W SPARC64<sup>™</sup> VIIIfx Processor for Peta-scale Computing. Symposium on VLSI Circuits, June 18, 2010.



**Takumi Maruyama** *Fujitsu Ltd.* Mr. Maruyama is engaged in processor development.



**Kuniki Morita** *Fujitsu Ltd.* Mr. Morita is engaged in development of secondary cache control.



**Tsuyoshi Motokurumada** *Fujitsu Ltd.* Mr. Motokurumada is engaged in CPU core development.



**Naozumi Aoki** *Fujitsu Ltd.* Mr. Aoki is engaged in system architecture development.