

# SPARC64<sup>TM</sup> VII Extensions

Fujitsu Limited

Ver 1.0, 1 Jul. 2008

Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan

Copyright © 2007, 2008 Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan. All rights reserved.

This product and related documentation are protected by copyright and distributed under licenses restricting their use, copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and its licensors, if any.

The product(s) described in this book may be protected by one or more U.S. patents, foreign patents, or pending applications.

### TRADEMARKS

SPARC® is a registered trademark of SPARC International, Inc. Products bearing SPARC trademarks are based on an architecture developed by Sun Microsystems, Inc.

SPARC64<sup>TM</sup> is a registered trademark of SPARC International, Inc., licensed exclusively to Fujitsu Limited.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Sun, Sun Microsystems, the Sun logo, Solaris, and all Solaris-related trademarks and logos are registered trademarks of Sun Microsystems, Inc. Fujitsu and the Fujitsu logo are trademarks of Fujitsu Limited.

This publication is provided "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or noninfringement. This publication could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein; these changes will be incorporated in new editions of the publication. Fujitsu Limited may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.

# Contents

### 1. Overview 1

- 1.1 Navigating the SPARC64<sup>™</sup> VII Extensions 1
- 1.2 Fonts and Notational Conventions 1
- 1.3 The SPARC64 VII processor 2
  - 1.3.1 Component Overview 4
  - 1.3.2 Instruction Control Unit (IU) 6
  - 1.3.3 Execution Unit (EU) 6
  - 1.3.4 Storage Unit (SU) 7
  - 1.3.5 Secondary Cache and External Access Unit (SXU) 7

#### 2. Definitions 9

- 3. Architectural Overview 13
- 4. Data Formats 14

#### 5. Registers 15

- 5.1 Nonprivileged Registers 15
  - 5.1.7 Floating-Point State Register (FSR) 15
  - 5.1.9 Tick (TICK) Register 17
- 5.2 Privileged Registers 17
  - 5.2.6 Trap State (TSTATE) Register 17
  - 5.2.9 Version (VER) Register 18
  - 5.2.11 Ancillary State Registers (ASRs) 18
  - 5.2.12 Registers Referenced Through ASIs 20

  - 5.2.13 Floating-Point Deferred-Trap Queue (FQ) 22
  - 5.2.14 IU Deferred-Trap Queue 23

### 6. Instructions 25

- 6.1 Instruction Execution 25
  - 6.1.1 Data Prefetch 25
  - 6.1.2 Instruction Prefetch 26
  - 6.1.3 Syncing Instructions 26
- 6.2 Instruction Formats and Fields 27
- 6.3 Instruction Categories 28
  - 6.3.3 Control-Transfer Instructions (CTIs) 28
  - 6.3.7 Floating-Point Operate (FPop) Instructions 29
  - 6.3.8 Implementation-Dependent Instructions 29
- 6.4 Processor Pipeline 30
  - 6.4.1 Instruction Fetch Stages 30
  - 6.4.2 Issue Stages 32
  - 6.4.3 Execution Stages 32
  - 6.4.4 Completion Stages 33

### 7. Traps 35

- 7.1 Processor States, Normal and Special Traps 35
  - 7.1.1 RED\_state 36
  - 7.1.2 error\_state 36
- 7.2 Trap Categories 37
  - 7.2.2 Deferred Traps 37
  - 7.2.4 Reset Traps 37
  - 7.2.5 Uses of the Trap Categories 37
- 7.3 Trap Control 38
  - 7.3.1 PIL Control 38
- 7.4 Trap-Table Entry Addresses 38
  - 7.4.2 Trap Type (TT) 38
  - 7.4.4 Details of Supported Traps 39
- 7.5 Trap Processing 39
- 7.6 Exception and Interrupt Descriptions 39
  - 7.6.4 SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1 39

#### 8. Memory Models 41

- 8.1 Overview 42
- 8.4 SPARC V9 Memory Model 42
  - 8.4.5 Mode Control 42
  - 8.4.7 Synchronizing Instruction and Data Memory 42

#### 9. Multi-Threaded Processing 45

- 9.1 MTP structure 45
  - 9.1.1 General MTP structure 45
  - 9.1.2 MTP structure of SPARC64 VII 46
- 9.2 MTP Programming Model 47
  - 9.2.1 Thread independency 47
  - 9.2.2 How to control threads 48
  - 9.2.3 Shared registers between threads 48

#### A. Instruction Definitions 49

- A.4 Block Load and Store Instructions (VIS I) 51
- A.12 Call and Link 53
- A.24 Implementation-Dependent Instructions 54
  - A.24.1 Floating-Point Multiply-Add/Subtract 55
  - A.24.2 Suspend 59
  - A.24.3 Sleep 60
  - A.24.4 Integer Multiply-Add 61
- A.25 Jump and Link 63
- A.30 Load Quadword, Atomic [Physical] 64
- A.35 Memory Barrier 66
- A.42 Partial Store (VIS I) 68
- A.48 Population Count 69
- A.49 Prefetch Data 70
- A.51 Read State Register 72
- A.59 SHUTDOWN (VIS I) 73
- A.70 Write State Register 74
- A.71 Deprecated Instructions 75
  - A.71.10 Store Barrier 75

#### B. IEEE Std. 754-1985 Requirements for SPARC-V9 77

- B.1 Traps Inhibiting Results 77
- B.6 Floating-Point Nonstandard Mode 77
  - B.6.1 fp\_exception\_other Exception (ftt=unfinished\_FPop) 78
  - B.6.2 Operation Under FSR.NS = 1 81

#### C. Implementation Dependencies 86

- C.1 Definition of an Implementation Dependency 86
- C.2 Hardware Characteristics 86
- C.3 Implementation Dependency Categories 87
- C.4 List of Implementation Dependencies 87

#### D. Formal Specification of the Memory Models 98

#### E. Opcode Maps 100

#### F. Memory Management Unit 102

- F.1 Virtual Address Translation 102
- F.2 Translation Table Entry (TTE) 103
  - F.3.2 TSB Cacheability 105
  - F.3.3 TSB Organization 105
  - F.4.2 TSB Pointer Formation 105
- F.5 Faults and Traps 106
- F.8 Reset, Disable, and RED\_state Behavior 108
- F.10 Internal Registers and ASI Operations 109
  - F.10.1 Accessing MMU Registers 109
  - F.10.2 Context Registers 111
  - F.10.3 Instruction/Data MMU TLB Tag Access Registers 115
  - F.10.4 I/D TLB Data In, Data Access, and Tag Read Registers 116
  - F.10.6 I/D TSB Base Registers 118
  - F.10.7 I/D TSB Extension Registers 118
  - F.10.9 I/D Synchronous Fault Status Registers (I-SFSR, D-SFSR) 118
  - F.10.11 I/D MMU Demap 125
  - F.10.12 Synchronous Fault Physical Addresses 126
  - F.10.13 TSB Prefetch Registers 127
- F.11 MMU Bypass 129
- F.12 Translation Lookaside Buffer Hardware 129

  - F.12.2 TLB Replacement Policy 130

- G. Assembly Language Syntax 132
- H. Software Considerations 133
- I. Extending the SPARC V9 Architecture 134
- J. Changes from SPARC V8 to SPARC V9 135
- K. Programming with the Memory Models 136

#### L. Address Space Identifiers 137

- L.3 SPARC64 VII ASI Assignments 137
  - L.3.2 Special Memory Access ASIs 139
  - L.3.3 Hardware Barrier 141

#### M. Cache Organization 147

- M.1 Cache Types 147
  - M.1.1 Level-1 Instruction Cache (L1I Cache) 148
  - M.1.2 Level-1 Data Cache (L1D Cache) 149
  - M.1.3 Level-2 Unified Cache (L2 Cache) 149
- M.2 Cache Coherency Protocols 150
- M.3 Cache Control/Status Instructions 151
  - M.3.1 Flush Level-1 Instruction Cache (ASI\_FLUSH\_L1I) 151
  - M.3.2 Level-2 Cache Control Register (ASI\_L2\_CTRL) 152
  - M.3.3 Cache invalidation (ASI\_CACHE\_INV) 152

#### N. Interrupt Handling 155

- N.1 Interrupt Dispatch 155
- N.2 Interrupt Receive 157
- N.3 Interrupt Global Registers 158
- N.4 Interrupt-Related ASI Registers 158
  - N.4.2 Interrupt Vector Dispatch Register 158
  - N.4.3 Interrupt Vector Dispatch Status Register 158
  - N.4.5 Interrupt Vector Receive Register 158
- N.5 How to identify an interrupt target 158

#### O. Reset, RED\_state, and error\_state 161

- O.1 Reset Types 161
  - O.1.1 Power-on Reset (POR) 161
  - O.1.2 Watchdog Reset (WDR) 162
  - O.1.3 Externally Initiated Reset (XIR) 162
  - O.1.4 Software-Initiated Reset (SIR) 162
- O.2 RED\_state and error\_state 163
  - O.2.1 RED\_state 164
  - O.2.2 error\_state 164
  - O.2.3 CPU Fatal Error state 164
- O.3 Processor State after Reset and in RED\_state 165
  - O.3.1 Operating Status Register (OPSR) 169

#### P. Error Handling 171

- P.1 Error Classes and Signalling 171
  - P.1.1 Fatal Error 172
  - P.1.2 error\_state Transition Error 172
  - P.1.3 Urgent Error 173
  - P.1.4 Restrainable Error 176
  - P.1.5 instruction\_access\_error 177
  - P.1.6 data\_access\_error 177
- P.2 Action and Error Control 178
  - P.2.1 Registers Related to Error Handling 178
  - P.2.2 Summary of Actions Upon Error Detection 179
  - P.2.3 Extent of Automatic Source Data Correction for Correctable Error 182
  - P.2.4 Error Marking for Cacheable Data Error 182
  - P.2.5 ASI\_EIDR 185
  - P.2.6 Control of Error Action (ASI\_ERROR\_CONTROL) 185
- P.3 Fatal Error and error\_state Transition Error 187
  - P.3.1 ASI\_STCHG\_ERROR\_INFO 187
  - P.3.2 Error\_state Transition Error in Suspended Thread 188
- P.4 Urgent Error 189
  - P.4.1 URGENT ERROR STATUS (ASI\_UGESR) 189
  - P.4.2 Action of async\_data\_error (ADE) Trap 192
  - P.4.3 Instruction End-Method at ADE Trap 194
  - P.4.4 Expected Software Handling of ADE Trap 195

- P.5 Instruction Access Errors 197
- P.6 Data Access Errors 197
- P.7 Restrainable Errors 198
  - P.7.1 ASI\_ASYNC\_FAULT\_STATUS (ASI\_AFSR) 198
  - P.7.2 ASI\_ASYNC\_FAULT\_ADDR\_D1 199
  - P.7.3 ASI\_ASYNC\_FAULT\_ADDR\_U2 199
  - P.7.4 Expected Software Handling of Restrainable Errors 199
- P.8 Internal Register Error Handling 201
  - P.8.1 Nonprivileged and Privileged Registers Error Handling 201
  - P.8.2 ASR Error Handling 202
  - P.8.3 ASI Register Error Handling 203
- P.9 Cache Error Handling 208
  - P.9.1 Handling of a Cache Tag Error 208
  - P.9.2 Handling of an I1 Cache Data Error 209
  - P.9.3 Handling of a D1 Cache Data Error 209
  - P.9.4 Handling of a U2 Cache Data Error 211
  - P.9.5 Automatic Way Reduction of I1 Cache, D1 Cache, and U2 Cache 212
- P.10 TLB Error Handling 213
  - P.10.1 Handling of TLB Entry Errors 214
  - P.10.2 Automatic Way Reduction of sTLB 215

#### Q. Performance Instrumentation 217

- Q.1 Performance Monitor Overview 217
  - Q.1.1 Sample Pseudo-codes 217
- Q.2 Performance Event Description 219
  - Q.2.1 Instruction and trap Statistics 222
  - Q.2.2 MMU and L1 cache Event Counters 229
  - Q.2.3 L2 cache Event Counters 230
  - Q.2.4 Jupiter Bus Event Counters 232
  - Q.2.5 Multi-thread specific Event Counters 234
- Q.3 CPI analysis 236
- Q.4 Shared performance events between threads 237
- Q.5 Differences of Performance Events Between SPARC64 VI and SPARC64 VII 237

### R. Jupiter Bus Programmer's Model 239

- R.3 Jupiter Bus Config Register 239

### S. Summary Differences Between SPARC64 VI and SPARC64 VII 241

# Overview

# 1.1 Navigating the *SPARC64<sup>TM</sup> VII Extensions*

The SPARC64 VII processor fully implements the instruction set architecture that conforms to **Commonality**.

- *SPARC Joint Programming Specification 1 (JPS1): Commonality*

This *SPARC64 VII Extensions* describes the implementation-specific portions of SPARC64 VII. We suggest that you approach this specification as follows.

- **1.** Familiarize yourself with the SPARC64 VII processor and its components by reading the following sections in this specification:
  - *The SPARC64 VII processor* on page 2
  - *Component Overview* on page 4
  - *Processor Pipeline* on page 30
- **2.** Study the terminology in Chapter 2, *Definitions*.
- **3.** For details of architectural changes, see the remaining chapters in this specification as your interests dictate.

# 1.2 Fonts and Notational Conventions

Please refer to Section 1.2 of Commonality for font and notational conventions.

# 1.3 The SPARC64 VII processor

The SPARC64 VII processor is a high-performance, high-reliability, and high-integrity processor that fully implements the instruction set architecture that conforms to SPARC V9, as described in **Commonality**. In addition, the SPARC64 VII processor implements the following features:

- 64-bit virtual address space and 47-bit physical address space
- Advanced RAS features that enable high-integrity error handling
- Multi-threaded Processing (MTP)

### Microarchitecture for High Performance

The SPARC64 VII is an out-of-order execution superscalar processor that issues up to four instructions per cycle. Instructions in the predicted path are issued in program order and are stored temporarily in *reservation stations* until they are dispatched, out of program order, to the appropriate execution units. Instructions commit in program order: an instruction commits (that is, its result becomes visible) only when it has executed without exceptions and all prior instructions have committed. Out-of-order execution in SPARC64 VII contributes to high performance.
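
The interplay of in-order issue, out-of-order execution, and in-order commit can be illustrated with a small model. This is only a sketch under simplifying assumptions: the function `simulate`, the per-instruction latencies, and the sample program are hypothetical, data dependencies and reservation-station capacity are ignored, and only the issue width of four comes from the text above.

```python
ISSUE_WIDTH = 4  # from the text: up to four instructions issued per cycle


def simulate(program):
    """Return (finish_order, commit_cycle) for a list of (name, latency) pairs.

    `program` is in program order and names must be unique; `latency` is a
    hypothetical execution time in cycles. Data dependencies are not modeled.
    """
    names = [name for name, _ in program]
    finish_cycle = {}
    for position, (name, latency) in enumerate(program):
        issue_cycle = position // ISSUE_WIDTH  # in-order issue, four per cycle
        finish_cycle[name] = issue_cycle + latency

    # Execution finishes out of program order: short-latency work finishes first.
    finish_order = sorted(names, key=lambda n: (finish_cycle[n], names.index(n)))

    # Commit stays in program order: an instruction commits only after it
    # and every earlier instruction have finished.
    commit_cycle = {}
    latest = 0
    for name in names:
        latest = max(latest, finish_cycle[name])
        commit_cycle[name] = latest
    return finish_order, commit_cycle


program = [("load", 5), ("add", 1), ("mul", 3), ("sub", 1), ("or", 1)]
finish_order, commit_cycle = simulate(program)
print(finish_order)   # ['add', 'sub', 'or', 'mul', 'load']
print(commit_cycle)   # every instruction commits at cycle 5, behind 'load'
```

In this toy run the long-latency `load` finishes last but still commits first, so the younger instructions hold their results until it commits.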

SPARC64 VII implements a large branch history buffer to predict its instruction path. The history buffer is large enough to sustain a good prediction rate for large-scale programs such as DBMS and to support the advanced instruction fetch mechanism of SPARC64 VII. This instruction fetch scheme predicts the execution path beyond multiple conditional branches in accordance with the branch history. It then prefetches as many instructions on the predicted path as possible to reduce the performance penalty caused by instruction cache misses.

### High Integration

SPARC64 VII integrates an on-board, associative, level-2 cache. The level-2 cache is unified for instruction and data. It is the lowest layer in the cache hierarchy.

This integration contributes to both the performance and reliability of SPARC64 VII. It enables shorter access time and more associativity and thus contributes to higher performance. It contributes to higher reliability by eliminating the external connections for level-2 cache.

### High Reliability and High Integrity

SPARC64 VII implements the following advanced RAS features for reliability and integrity beyond that of ordinary microprocessors.

### 1. Advanced RAS features for caches

- Strong cache error protection:
  - ECC protection for D1 (Data level 1) cache data, U2 (unified level 2) cache data, and the U2 cache tag.
  - Parity protection for I1 (Instruction level 1) cache data.
  - Parity protection and duplication for the I1 cache tag and the D1 cache tag.
- Automatic correction of all types of single-bit error:
  - Automatic single-bit error correction for the ECC protected data.
  - Invalidation and refilling of I1 cache data for the I1 cache data parity error.
  - Copying from duplicated tag for I1 cache tag and D1 cache tag parity errors.
- Dynamic way reduction while cache consistency is maintained.
- Error marking for cacheable data with uncorrectable errors:
  - Special error-marking pattern for cacheable data with uncorrectable errors. The identification of the module that first detects the error is embedded in the special pattern.
  - Error-source isolation with faulty module identification in the special error-marking. The identification information enables the processor to avoid repetitive error logging for the same error cause.
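
The parity-error recovery listed above for I1 cache data (invalidate, then refill) can be sketched as follows. This is an illustration only: the single even-parity bit per word, the dictionary-based `cache` and `memory`, and the function names are assumptions made for the example, not the hardware's actual parity granularity or structures. Recovery by refill is possible because instruction cache contents are never modified, so a clean copy always exists further down the hierarchy.

```python
def parity_bit(word: int) -> int:
    """Return 1 if `word` has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def read_instruction(cache: dict, memory: dict, address: int) -> int:
    """Read a word from a parity-protected cache, refilling on miss or error.

    `cache` maps address -> (word, stored_parity); `memory` stands in for
    the next level of the hierarchy. Both are hypothetical structures.
    """
    entry = cache.get(address)
    if entry is not None:
        word, stored_parity = entry
        if parity_bit(word) == stored_parity:
            return word            # clean hit
        del cache[address]         # parity error: invalidate the bad copy
    word = memory[address]         # refill from the next level
    cache[address] = (word, parity_bit(word))
    return word


memory = {0x40: 0b1011}
cache = {0x40: (0b1010, parity_bit(0b1011))}  # one bit flipped in the cached word
print(read_instruction(cache, memory, 0x40) == 0b1011)  # True: bad copy replaced
```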

### 2. Advanced RAS features for the core

- Strong error protection:
  - Parity protection for all data paths.
  - Parity protection for most software-visible registers and internal, temporary registers.
  - Parity prediction or residue checking for the accumulator output.
- Hardware instruction retry
- Support for software instruction retry (after failure of hardware instruction retry)
- Error isolation for software recovery:
  - Error indication for each programmable register group.
  - Indication of retryability of the trapped instruction.
  - Use of different error traps to differentiate degrees of adverse effects on the CPU and the system.

### 3. Extended RAS interface to software

- Error classification according to the severity of the effect on program execution:
  - Urgent error (nonmaskable): Unable to continue execution without OS intervention; reported through a trap.
  - Restrainable error (maskable): OS controls whether the error is reported through a trap, so the error does not directly affect program execution.
- Isolated error indication to determine the effect on software
- Asynchronous data error (ADE) trap for additional errors:
  - Relaxed instruction end method (precise, retryable, not retryable) for the async\_data\_error exception to indicate how the instruction should end; depends on the executing instruction and the detected error.
  - Some ADE traps that are deferred but retryable.
  - Simultaneous reporting of all detected ADE errors at the error barrier for correct handling of retryability.
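
The difference between the two error classes listed above comes down to whether software can mask the trap. A minimal sketch, assuming a hypothetical `ErrorClass` enum and a single boolean standing in for the OS-controlled enable (the real control registers are described in Appendix P and are not modeled here):

```python
from enum import Enum


class ErrorClass(Enum):
    URGENT = "urgent"              # nonmaskable
    RESTRAINABLE = "restrainable"  # maskable


def raises_trap(error_class: ErrorClass, os_trap_enabled: bool) -> bool:
    """Decide whether a detected error is reported through a trap.

    `os_trap_enabled` is a hypothetical stand-in for the OS-controlled mask.
    """
    if error_class is ErrorClass.URGENT:
        return True             # always trapped; execution needs OS intervention
    return os_trap_enabled      # restrainable: reported only if the OS asks


print(raises_trap(ErrorClass.URGENT, os_trap_enabled=False))        # True
print(raises_trap(ErrorClass.RESTRAINABLE, os_trap_enabled=False))  # False
```

Only the two classes named in this list are covered; fatal errors and error\_state transition errors are outside the sketch.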

### Multi-threaded Processing

SPARC64 VII is an eight-thread processor with four dual-threaded physical cores. The two threads of a physical core share most of that core's physical resources, while the four cores share no physical resources other than the L2 cache and the system interface.

## 1.3.1 Component Overview

The SPARC64 VII processor contains these components.

- Instruction Control Unit (IU)
- Execution Unit (EU)
- Storage Unit (SU)
- Secondary cache and eXternal access Unit (SXU)

FIGURE 1-1 illustrates the major units; the following subsections describe them.



FIGURE 1-1 SPARC64 VII Block Diagram

## 1.3.2 Instruction Control Unit (IU)

The IU predicts the instruction execution path, fetches instructions on the predicted path, distributes the fetched instructions to the appropriate reservation stations, and dispatches the instructions to the execution pipeline. The instructions are executed out of order, and the IU commits the instructions in order. Major blocks are defined in TABLE 1-1.

TABLE 1-1 Instruction Control Unit Major Blocks

| Name                       | Description                                                                                                                                                                                                                                                                                          |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Instruction fetch pipeline | Five stages: fetch address generation, iTLB tag access, I-Cache tag match, I-Cache read, and a write to I-buffer.                                                                                                                                                                                    |
| Branch history             | A table to predict branch target and direction.                                                                                                                                                                                                                                                      |
| Instruction buffer         | A buffer to hold instructions fetched.                                                                                                                                                                                                                                                               |
| Reservation station        | Six reservation stations to hold instructions until they can execute: RSBR for branch and the other control-transfer instructions; RSA for load/store instructions; RSEA and RSEB for integer arithmetic instructions; RSFA and RSFB for floating-point arithmetic and VIS instructions. |
| Commit stack entries       | A buffer to hold information about instructions issued but not yet committed.                                                                                                                                                                                                                        |
| PC, nPC, CCR, FSR          | Program-visible registers for instruction execution control.                                                                                                                                                                                                                                         |
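
The reservation-station assignment in TABLE 1-1 can be written down as a lookup table. The instruction-class keys and the function name below are hypothetical labels chosen for this sketch; only the station names and their groupings come from the table. How the hardware chooses between the two stations of a pair (for example RSEA versus RSEB) is not described in this chapter, so the sketch returns both candidates rather than picking one.

```python
# Station names are from TABLE 1-1; the dictionary keys are illustrative labels.
RESERVATION_STATIONS = {
    "control_transfer": ("RSBR",),           # branches and other CTIs
    "load_store": ("RSA",),
    "integer": ("RSEA", "RSEB"),
    "floating_point_vis": ("RSFA", "RSFB"),
}


def candidate_stations(instruction_class: str) -> tuple:
    """Return the reservation stations that may hold this class of instruction."""
    return RESERVATION_STATIONS[instruction_class]


all_stations = [s for group in RESERVATION_STATIONS.values() for s in group]
print(candidate_stations("integer"))  # ('RSEA', 'RSEB')
print(len(all_stations))              # 6
```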

## 1.3.3 Execution Unit (EU)

The EU executes all integer arithmetic, logical, and shift instructions, all floating-point instructions, and all VIS graphics instructions. TABLE 1-2 describes the EU major blocks.

TABLE 1-2 Execution Unit Major Blocks

| Name                                                                | Description                                                                                                                                                                                              |
|---------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| GUB                                                                 | General register (gr) renaming register file.                                                                                                                                                            |
| GPR                                                                 | Gr architecture register file.                                                                                                                                                                           |
| FUB                                                                 | Floating-point (fr) renaming register file.                                                                                                                                                              |
| FPR                                                                 | Fr architecture register file.                                                                                                                                                                           |
| EU control logic                                                    | Controls the instruction execution stages: instruction selection, register read, and execution.                                                                                                          |
| Interface registers                                                 | Input/output registers to other units.                                                                                                                                                                   |
| Two integer execution pipelines (EXA, EXB)                          | 64-bit ALU and shifters.                                                                                                                                                                                 |
| Two floating-point and graphics execution pipelines (FLA, FLB)      | Each floating-point execution pipeline can execute floating-point multiply, floating-point add/sub, floating-point multiply and add, floating-point div/sqrt, and floating-point graphics instructions. |
| Two virtual address adders for memory access pipeline (EAGA, EAGB)  | Two 64-bit virtual addresses for load/store.                                                                                                                                                             |

## 1.3.4 Storage Unit (SU)

The SU handles all sourcing and sinking of data for load and store instructions. TABLE 1-3 describes the SU major blocks.

TABLE 1-3 Storage Unit Major Blocks

| Name                           | Description                                                                                                                                                                             |
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Instruction level-1 cache      | 64-Kbyte, 2-way associative, 64-byte line; provides low latency instruction source.                                                                                                     |
| Data level-1 cache             | 64-Kbyte, 2-way associative, 64-byte line, writeback; provides the low latency data source for loads and stores.                                                                        |
| Instruction Translation Buffer | 2048 entries, 2-way associative TLB (sITLB).                                                                                                                                            |
|                                | 32 entries, fully associative TLB (fITLB).                                                                                                                                              |
| Data Translation Buffer        | 2048 entries, 2-way associative TLB (sDTLB).                                                                                                                                            |
|                                | 32 entries, fully associative TLB (fDTLB).                                                                                                                                              |
| Store Buffer and Write Buffer  | Decouples the pipeline from the latency of store operations. Allows the pipeline to continue flowing while the store waits for data, and eventually writes into the data level 1 cache. |
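
The sizes in TABLE 1-3 imply the following set counts. This is a worked calculation rather than a statement about the hardware's indexing scheme: it assumes that "64-Kbyte" means 64 × 1024 bytes and that sets are a power of two; the function names are hypothetical, and the table does not say which address bits select a set.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address-bit widths for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
    }


def tlb_sets(entries: int, ways: int) -> int:
    """Number of sets in a set-associative TLB."""
    return entries // ways


# L1 instruction and data caches: 64-Kbyte, 2-way, 64-byte line.
print(cache_geometry(64 * 1024, ways=2, line_bytes=64))
# {'sets': 512, 'offset_bits': 6, 'index_bits': 9}

# sITLB and sDTLB: 2048 entries, 2-way associative.
print(tlb_sets(2048, ways=2))  # 1024
```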

## 1.3.5 Secondary Cache and External Access Unit (SXU)

The SXU controls the operation of the unified level-2 cache and the external data access interface (Jupiter Bus). TABLE 1-4 describes the major blocks of the SXU.

TABLE 1-4 Secondary Cache and External Access Unit Major Blocks

| Name                                | Description                                                                                                                                                                |
|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Unified level-2 cache               | 6-Mbyte, 12-way associative, 256-byte line (four 64-byte sublines), writeback; provides low latency data source for both instruction level-1 cache and data level-1 cache. |
| Movein buffer                       | Catches returning data from the memory system in response to the cache line read request.                                                                                  |
| Moveout buffer                      | Holds writeback data to memory.                                                                                                                                            |
| Jupiter Bus interface control logic | Send/receive transaction packets to/from Jupiter Bus interface connected to the system.                                                                                    |
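
The level-2 cache figures in TABLE 1-4 can be checked with a short calculation that also shows how a byte address falls into one of the four sublines. The sketch assumes that "6-Mbyte" means 6 × 2^20 bytes and that the set is chosen by simple modulo indexing on the line number; the latter is an assumption for illustration only, as the table does not describe the indexing function. The function `locate` is hypothetical.

```python
L2_SIZE = 6 * 1024 * 1024   # 6-Mbyte (assumed binary megabytes)
L2_WAYS = 12
L2_LINE = 256               # bytes per line
SUBLINE = 64                # bytes per subline, four sublines per line

L2_SETS = L2_SIZE // (L2_WAYS * L2_LINE)


def locate(address: int) -> dict:
    """Split a byte address into set, subline, and byte-within-subline.

    Modulo set indexing is an assumption made for this sketch.
    """
    line_number, offset = divmod(address, L2_LINE)
    subline, byte = divmod(offset, SUBLINE)
    return {"set": line_number % L2_SETS, "subline": subline, "byte": byte}


print(L2_SETS)        # 2048
print(locate(0x1C8))  # {'set': 1, 'subline': 3, 'byte': 8}
```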

# Definitions

This chapter defines concepts unique to SPARC64 VII, the Fujitsu implementation of SPARC JPS1. For definition of terms that are common to all implementations, please refer to Chapter 2 of **Commonality**.

- **committed** Term applied to an instruction when it has completed without error and *all* prior instructions have completed without error *and have been committed*. When an instruction is committed, the state of the machine is permanently changed to reflect the result of the instruction; the previously existing state is no longer needed and can be discarded.
- **completed** Term applied to an instruction after it has *finished*, has sent a non-error status to the issue unit, and all of its source operands are non-speculative. **Note:** Although the state of the machine has been temporarily altered by completion of an instruction, the state has not yet been permanently changed and the old state can be recovered until the instruction has been *committed*.
- **executed** Term applied to an instruction that has been processed by an execution unit such as a load unit. An instruction is in execution as long as it is still being processed by an execution unit.
- **fetched** Term applied to an instruction that is obtained from the I1 instruction cache or from the on-chip internal buffer and sent to the issue unit.
- **finished** Term applied to an instruction when it has completed execution in a functional unit and has forwarded its result onto a result bus. Results on the result bus are transferred to the register file, as are the waiting instructions in the instruction queues.
- **instruction initiated** Term applied to an instruction when it has all of the resources that it needs (for example, source operands) and has been selected for execution.

- **instruction dispatched** Synonym: **instruction initiated**.
- **instruction issued** Term applied to an instruction when it has been dispatched to a reservation station.

| instruction retired             | Term applied to an instruction when all machine resources (serial numbers, renamed registers) have been reclaimed and are available for use by other instructions. An instruction can only be retired after it has been <i>committed</i> .                                                                                                                                                                                                                                                                                                                   |
|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| instruction stall               | Term applied to an instruction that is not allowed to be issued. Not every instruction can be issued in a given cycle. The SPARC64 VII implementation imposes certain issue constraints based on resource availability and program requirements.                                                                                                                                                                                                                                                                                                             |
| issue-stalling<br>instruction   | An instruction that prevents new instructions from being issued until it has committed.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| machine sync                    | The state of a machine when all previously executing instructions have committed; that is, when no issued but uncommitted instructions are in the machine.                                                                                                                                                                                                                                                                                                                                                                                                   |
| Memory Management<br>Unit (MMU) | Refers to the address translation hardware in SPARC64 VII that translates a 64-bit virtual address into a physical address. The MMU is composed of the mITLB, mDTLB, uITLB, uDTLB, and the ASI registers used to manage address translation. |
| mTLB | Main TLB. Split into I and D, called mITLB and mDTLB, respectively. Contains address translations for the uITLB and uDTLB. When the uITLB or uDTLB does not contain a translation, it asks the mTLB for the translation. If the mTLB contains the translation, it sends the translation to the respective uTLB. If the mTLB does not contain the translation, it generates a fast access exception to a software translation trap handler, which will load the translation information (TTE) into the mTLB and retry the access. <i>See also</i> <b>TLB</b>. |
| uDTLB | Micro Data TLB. A small, fully associative buffer that contains address translations for data accesses. Misses in the uDTLB are handled by the mTLB. |
| uITLB | Micro Instruction TLB. A small, fully associative buffer that contains address translations for instruction accesses. Misses in the uITLB are handled by the mTLB. |
| MTP | Multi-Threaded Processor. A processor module containing more than one thread. (May also be used as an acronym for Multi-Threaded Processing.) |
| non-speculative | A distribution system whereby a result is guaranteed to be correct or an operand state is known to be valid. SPARC64 VII employs speculative distribution, meaning that results can be distributed from functional units before the point at which guaranteed validity of the result is known. |
| physical core                   | A physical core includes an execution pipeline and associated structures, such as caches, that are required for performing the execution of instructions from one or more software threads. A physical core contains one or more threads. The physical core provides the necessary resources for each thread to make forward progress at a reasonable rate.                                                                                                                                                                                                  |
| processor module | A <i>processor module</i> is the unit on which a shared interface is provided to control the configuration and execution of a collection of threads. A <i>processor module</i> contains one or more physical cores, each of which contains one or more threads. On a more physical side, a <i>processor module</i> is a physical module that plugs into a system. A <i>processor module</i> is expected to appear logically as a single agent on the system interconnect fabric. |
| reclaimed | The status when all instruction-related resources that were held until <i>commit</i> have been released and are available for subsequent instructions. Instruction resources are usually reclaimed a few cycles after they are committed. |
| rename registers | A large set of hardware registers implemented by SPARC64 VII that are invisible to the programmer. Before instructions are <i>issued</i>, source and destination registers are mapped onto this set of rename registers. This allows instructions that normally would be blocked, waiting for an architecture register, to proceed in parallel. When instructions are <i>committed</i>, results in renamed registers are posted to the architecture registers in the proper sequence to produce the correct program results. |
| reservation station | A holding location that buffers dispatched instructions until all input operands are available. SPARC64 VII implements dataflow execution based on operand availability. When operands are available, the instructions in the reservation station are scheduled for execution. Reservation stations also contain special tag-matching logic that captures the appropriate operand data. Reservation stations are sometimes referred to as queues (for example, the integer queue). |
| scan | A method used to initialize all of the machine state within a chip. In a chip that has been designed to be scannable, all of the machine state is connected in one or several loops called "scan rings." Initialization data can be scanned into the chip through the scan rings. The state of the machine also can be scanned out through the scan rings. |
| sleeping | Describes a thread that is suspended from operation. While sleeping, a thread is not issuing instructions for execution but still maintains cache coherency. Unlike <i>suspended</i>, a <i>sleeping</i> thread awakes automatically within a limited number of cycles. |
| speculative | A distribution system whereby a result is not guaranteed to be correct or an operand state is not known to be valid. SPARC64 VII employs speculative distribution, meaning results can be distributed from functional units before the point at which guaranteed validity of the result is known. |
| superscalar | An implementation that allows several instructions to be issued, executed, and committed in one clock cycle. SPARC64 VII <i>issues</i> up to 4 instructions per clock cycle. |
| suspended | Describes a thread that is suspended from operation. When suspended, a thread is not issuing instructions for execution but still maintains cache coherency. Unlike <i>sleeping</i>, a <i>suspended</i> thread does not awake automatically without certain stimuli. |
| sync | Synonym: <i>machine sync</i>. |
| syncing instruction | An instruction that causes a <i>machine sync</i>. Thus, before a syncing instruction is issued, all previous instructions (in program order) must have been committed. At that point, the syncing instruction is issued, executed, completed, and committed by itself. |
| thread | A term that identifies the hardware state used to hold a software thread in order to execute it. A thread is specifically the software-visible architecture state (PC, next PC, general purpose registers, floating-point registers, condition codes, status registers, ASRs, etc.) of a thread and any microarchitecture state required by hardware for its execution. |

F.CHAPTER 3

# Architectural Overview

Please refer to Chapter 3 in Commonality.

F.CHAPTER 4

# Data Formats

Please refer to Chapter 4 in Commonality.

F.CHAPTER 5

# Registers

The SPARC64 VII processor includes two types of registers: general-purpose—that is, working, data, control/status—and ASI registers.

The SPARC V9 architecture also defines two implementation-dependent registers: the IU Deferred-Trap Queue and the Floating-Point Deferred-Trap Queue (FQ); SPARC64 VII does not need or contain either queue. All processor traps caused by instruction execution are precise, and there are several disrupting traps caused by asynchronous events, such as interrupts, asynchronous error conditions, and RED\_state entry traps.

For general information, please see parallel subsections of Chapter 5 in **Commonality**. For easier referencing, this chapter follows the organization of Chapter 5 in **Commonality**.

For information on MMU registers, please refer to Section F.10, *Internal Registers and ASI Operations*, on page 109.

The chapter contains these sections:

- Nonprivileged Registers
- Privileged Registers

# 5.1 Nonprivileged Registers

Most of the definitions for the registers are as described in the corresponding sections of **Commonality**. Only SPARC64 VII-specific features are described in this section.

## 5.1.7 Floating-Point State Register (FSR)

Please refer to Section 5.1.7 of Commonality for the description of FSR.

The sections below describe SPARC64 VII-specific features of the FSR register.

### FSR\_nonstandard\_fp (NS)

SPARC V9 defines the FSR.NS bit which, when set to 1, causes the FPU to produce implementation-dependent results that may not conform to IEEE Std 754-1985. SPARC64 VII implements this bit.

When FSR.NS = 1, denormalized input operands and denormalized results that would otherwise trap are flushed to 0 of the same sign and an inexact exception is signalled (that may be masked by FSR.TEM.NXM). See Section B.6, *Floating-Point Nonstandard Mode*, on page 77 for details.

When FSR.NS = 0, the normal IEEE Std 754-1985 behavior is implemented.

### FSR\_version (ver)

For each SPARC V9 IU implementation (as identified by its VER.impl field), there may be one or more FPU implementations or none. This field identifies the particular FPU implementation present. For the first SPARC64 VII, FSR.ver = 0 (impl. dep. #19); however, future versions of the architecture may set FSR.ver to other values. Consult the SPARC64 VII Data Sheet for the setting of FSR.ver for your chipset.

### FSR\_floating-point\_trap\_type (*ftt*)

The complete conditions under which SPARC64 VII triggers *fp\_exception\_other* with trap type *unfinished\_FPop* are described in Section B.6, *Floating-Point Nonstandard Mode*, on page 77 (impl. dep. #248).

### FSR\_current\_exception (cexc)

Bits 4 through 0 indicate that one or more IEEE\_754 floating-point exceptions were generated by the most recently executed FPop instruction. The absence of an exception causes the corresponding bit to be cleared.

In SPARC64 VII, the cexc bits are set according to the following pseudocode:

```
if (<LDFSR or LDXFSR commits>)
        <update using data from LDFSR or LDXFSR>;
else if (<FPop commits with ftt = 0>)
        <update using value from FPU>;
else if (<FPop commits with IEEE_754_exception>)
        <set one bit in the cexc field as supplied by FPU>;
else if (<FPop commits with unfinished_FPop error>)
        <no change>;
else if (<FPop commits with unimplemented_FPop error>)
        <no change>;
else
        <no change>;
```
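The pseudocode above can be read as a small state-update function. The following Python sketch is illustrative only: the `Event` names and the `next_cexc` helper are hypothetical, and the handling of the IEEE\_754\_exception case (cexc is replaced by the single FPU-supplied bit) is an assumption based on the SPARC V9 description of cexc, not something stated in this section.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical labels for the commit events named in the pseudocode."""
    LDFSR_COMMIT = auto()         # LDFSR or LDXFSR commits
    FPOP_OK = auto()              # FPop commits with ftt = 0
    FPOP_IEEE_EXCEPTION = auto()  # FPop commits with IEEE_754_exception
    FPOP_UNFINISHED = auto()      # FPop commits with unfinished_FPop
    FPOP_UNIMPLEMENTED = auto()   # FPop commits with unimplemented_FPop
    OTHER = auto()                # anything else


CEXC_MASK = 0x1F  # cexc occupies bits 4:0 of the FSR


def next_cexc(cexc: int, event: Event, value: int = 0) -> int:
    """Return the new cexc field.

    `value` is the loaded FSR data (for LDFSR/LDXFSR) or the exception
    bits supplied by the FPU (for FPop commits).
    """
    if event in (Event.LDFSR_COMMIT, Event.FPOP_OK):
        return value & CEXC_MASK
    if event is Event.FPOP_IEEE_EXCEPTION:
        one_bit = value & CEXC_MASK
        # Assumption: exactly one bit is supplied for the trapping exception.
        if one_bit == 0 or one_bit & (one_bit - 1):
            raise ValueError("expected exactly one exception bit")
        return one_bit
    # unfinished_FPop, unimplemented_FPop, and all other cases: no change
    return cexc
```

For example, `next_cexc(0b00011, Event.FPOP_UNFINISHED, 0b11111)` leaves the field at `0b00011`, matching the "no change" branches.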

### FSR Conformance

SPARC V9 allows the TEM, cexc, and aexc fields to be implemented in hardware in either of two ways (both of which comply with IEEE Std 754-1985). SPARC64 VII follows case (1); that is, it implements all three fields in conformance with IEEE Std 754-1985. See FSR Conformance in Section 5.1.7 of **Commonality** for more information about other implementation methods.

## 5.1.9 Tick (TICK) Register

SPARC64 VII implements the TICK.counter field as a 63-bit register (impl. dep. #105).

**Implementation Note** – On SPARC64 VII, the counter part of the value returned when the TICK register is read is the value of TICK.counter when the RDTICK instruction is *executed*. The difference between the counter values read from the TICK register on two reads reflects the number of processor cycles executed between the *executions* of the RDTICK instructions, not their *commits*. In longer code sequences, the difference between this value and the value that would have been obtained when the instructions are committed would be small.
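As a rough illustration of working with a 63-bit counter, the sketch below computes the cycle difference between two raw TICK reads. It assumes the SPARC V9 layout in which bit 63 of TICK is the NPT bit and bits 62:0 hold the counter; the function names are hypothetical, and the raw values would come from RDTICK in real code.

```python
COUNTER_BITS = 63
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # TICK.counter occupies bits 62:0


def tick_counter(tick: int) -> int:
    """Strip bit 63 (NPT in SPARC V9) and keep the 63-bit counter."""
    return tick & COUNTER_MASK


def elapsed_cycles(first: int, second: int) -> int:
    """Cycles between two raw TICK reads, tolerating one counter wrap."""
    return (tick_counter(second) - tick_counter(first)) & COUNTER_MASK
```

Taking the difference modulo 2^63 keeps the result correct even if the counter wraps once between the two reads.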

# 5.2 Privileged Registers

Please refer to Section 5.2 of Commonality for the description of privileged registers.

## 5.2.6 Trap State (TSTATE) Register

SPARC64 VII implements only bits 2:0 of the TSTATE.CWP field. Writes to bits 4 and 3 are ignored, and reads of these bits always return zeroes.

**Note** – Privileged software should not set the PSTATE.RED bit spuriously, because doing so takes SPARC64 VII into RED\_state without the required sequencing.

## 5.2.9 Version (VER) Register

TABLE 5-1 shows the values for the VER register for SPARC64 VII.

TABLE 5-1 VER Register Encoding

| Bits  | Field  | Value                                                    |
|-------|--------|----------------------------------------------------------|
| 63:48 | manuf  | 0004 <sub>16</sub> (impl. dep. #104)                     |
| 47:32 | impl   | 7                                                        |
| 31:24 | mask   | n (The value of n depends on the processor chip version) |
| 15:8  | maxtl  | 5                                                        |
| 4:0   | maxwin | 7                                                        |

The manuf field contains Fujitsu's 8-bit JEDEC code in the lower 8 bits and zeroes in the upper 8 bits. The manuf, impl, and mask fields are implemented so that they may change in future SPARC64 processor versions. The mask field generally increases numerically with successive releases of the processor, but does not necessarily increase by one for consecutive releases.
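The field positions in TABLE 5-1 translate directly into shifts and masks. The following Python sketch decodes a raw VER value; the helper name `decode_ver` and the sample mask value `0x12` are hypothetical and used only for illustration.

```python
# (high bit, low bit) for each field listed in TABLE 5-1
VER_FIELDS = {
    "manuf": (63, 48),
    "impl": (47, 32),
    "mask": (31, 24),
    "maxtl": (15, 8),
    "maxwin": (4, 0),
}


def decode_ver(ver: int) -> dict:
    """Split a 64-bit VER value into its named fields."""
    fields = {}
    for name, (hi, lo) in VER_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (ver >> lo) & ((1 << width) - 1)
    return fields


# Sample value built from TABLE 5-1; the mask value 0x12 is made up.
sample = (0x0004 << 48) | (7 << 32) | (0x12 << 24) | (5 << 8) | 7
decoded = decode_ver(sample)
```

With this sample, `decoded["manuf"]` is `0x0004`, `decoded["impl"]` is 7, `decoded["maxtl"]` is 5, and `decoded["maxwin"]` is 7.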

## 5.2.11 Ancillary State Registers (ASRs)

Please refer to Section 5.2.11 of Commonality for details of the ASRs.

### Performance Control Register (PCR) (ASR 16)

SPARC64 VII implements the PCR register as described in **Commonality**, with additional features as described in this section.

In SPARC64 VII, the accessibility of PCR when PSTATE.PRIV = 0 is determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute either RDPCR or WRPCR will cause a *privileged\_action* exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without privilege violation and WRPCR causes a *privileged\_action* exception only when an attempt is made to change (that is, write 1 to) PCR.PRIV (impl. dep. #250).
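The access rule in the preceding paragraph can be summarized as a decision function. This is a minimal sketch with a hypothetical function name and signature; it only models the cases the paragraph describes.

```python
from typing import Optional


def pcr_access_exception(pstate_priv: int, pcr_priv: int,
                         is_write: bool, new_pcr_priv: int = 0) -> Optional[str]:
    """Return the exception raised by RDPCR/WRPCR, or None if none is raised.

    `new_pcr_priv` is the PRIV bit of the value being written by WRPCR.
    """
    if pstate_priv == 1:
        return None                  # privileged code may always access PCR
    if pcr_priv == 1:
        return "privileged_action"   # both RDPCR and WRPCR are blocked
    if is_write and new_pcr_priv == 1:
        return "privileged_action"   # user code tried to write 1 to PCR.PRIV
    return None
```

For instance, a nonprivileged WRPCR that leaves PCR.PRIV at 0 succeeds, while the same write with PRIV = 1 in the operand traps.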

See Appendix Q for a detailed discussion of the PCR and PIC register usage and event count definitions.

The Performance Control Register in SPARC64 VII is illustrated in FIGURE 5-1 and described in TABLE 5-2.

| 0     | OVF   | 0     | OVRO | 0  | NC    | 0  | SC    | 0  | SU    | 0  | SL  | ULRO | UT | ST | PRIV |
|-------|-------|-------|------|----|-------|----|-------|----|-------|----|-----|------|----|----|------|
| 63:48 | 47:32 | 31:27 | 26   | 25 | 24:22 | 21 | 20:18 | 17 | 16:11 | 10 | 9:4 | 3    | 2  | 1  | 0    |

FIGURE 5-1 SPARC64 VII Performance Control Register (PCR) (ASR 16)

TABLE 5-2 PCR Bit Description

| Bit   | Field | Description |
|-------|-------|-------------|
| 47:32 | OVF   | Overflow Clear/Set/Status. Used to read counter overflow status (via RDPCR) and clear or set counter overflow status bits (via WRPCR). PCR.OVF is a SPARC64 VII-specific field (impl. dep. #207).<br>The bit layout of the SPARC64 VII OVF field for four counter pairs is: bits 15:8 = 0, bit 7 = U3, bit 6 = L3, bit 5 = U2, bit 4 = L2, bit 3 = U1, bit 2 = L1, bit 1 = U0, bit 0 = L0. Counter status bits are cleared on write of 0 to the appropriate OVF bit. |
| 26    | OVRO  | Overflow read-only. Write-only/read-as-zero field specifying PCR.OVF update behavior for WRPCR. The OVRO field is implementation dependent (impl. dep. #207). WRPCR with PCR.OVRO = 1 inhibits updating of PCR.OVF for the current write only. The intention of PCR.OVRO is to write PCR while preserving the current PCR.OVF value. PCR.OVF is maintained internally by hardware, so a subsequent RDPCR returns accurate overflow status at the time. |
| 24:22 | NC    | Number of counter pairs. Three-bit, read-only field specifying the number of counter pairs, encoded as 0–7 for 1–8 counter pairs (impl. dep. #207). For SPARC64 VII, the hardcoded value of NC is 3 (indicating presence of 4 counter pairs). |
| 20:18 | SC    | Select PIC. In SPARC64 VII, three-bit field specifying which counter pair is currently selected as PIC (ASR 17) and which SU/SL values are visible to software. On write, PCR.SC selects which counter pair is updated. On read, the currently selected PIC is returned. |
| 16:11 | SU    | Defined (as S1) in <b>Commonality</b>. |
| 9:4   | SL    | Defined (as S0) in <b>Commonality</b>. |
| 3     | ULRO  | Implementation-dependent field (impl. dep. #207) that specifies whether SU/SL are read-only. In SPARC64 VII, this field is write-only/read-as-zero, specifying update behavior of SU/SL on write. On a write with PCR.ULRO = 1, SU/SL are considered as read-only; the values set on PCR.SU/PCR.SL are not written into SU/SL. When PCR.ULRO = 0, SU/SL are updated. PCR.ULRO is intended to switch the visible PIC by writing PCR.SC, without affecting the current selection of SU/SL for that PIC. On PCR read, PCR.SU/PCR.SL always shows the current setting of the PIC regardless of PCR.ULRO. |
| 2     | UT    | Defined in <b>Commonality</b>. |
| 1     | ST    | Defined in <b>Commonality</b>. |
| 0     | PRIV  | Defined in <b>Commonality</b>, with the additional function of controlling PCR accessibility as described above (impl. dep. #250). |
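To make the interplay of OVRO, SC, and ULRO concrete, here is a small behavioral sketch of WRPCR and RDPCR based on the descriptions in TABLE 5-2. The class `PcrModel` and the helpers `bits` and `pcr_value` are hypothetical. The UT, ST, and PRIV bits are omitted, and SC values above 3 are rejected because their behavior is not described in this section.

```python
from dataclasses import dataclass, field
from typing import List

NUM_PAIRS = 4  # NC reads as 3, meaning four counter pairs


def bits(value: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def pcr_value(ovf=0, ovro=0, sc=0, su=0, sl=0, ulro=0) -> int:
    """Assemble a WRPCR operand from the fields listed in TABLE 5-2."""
    return ((ovf << 32) | (ovro << 26) | (sc << 18)
            | (su << 11) | (sl << 4) | (ulro << 3))


@dataclass
class PcrModel:
    ovf: int = 0  # overflow status bits U3 L3 ... U0 L0
    sc: int = 0   # currently selected counter pair
    su: List[int] = field(default_factory=lambda: [0] * NUM_PAIRS)
    sl: List[int] = field(default_factory=lambda: [0] * NUM_PAIRS)

    def wrpcr(self, value: int) -> None:
        sc = bits(value, 20, 18)
        if sc >= NUM_PAIRS:
            raise ValueError("SC values above 3 are not modelled in this sketch")
        if bits(value, 26, 26) == 0:   # OVRO = 0: OVF takes the written bits
            self.ovf = bits(value, 47, 32) & 0xFF
        self.sc = sc
        if bits(value, 3, 3) == 0:     # ULRO = 0: SU/SL of the selected pair update
            self.su[sc] = bits(value, 16, 11)
            self.sl[sc] = bits(value, 9, 4)

    def rdpcr(self) -> int:
        # OVRO and ULRO read as zero; NC is hardcoded to 3.
        return ((self.ovf << 32) | ((NUM_PAIRS - 1) << 22) | (self.sc << 18)
                | (self.su[self.sc] << 11) | (self.sl[self.sc] << 4))
```

A write with `ovro=1` and `ulro=1` changes only the selected pair, which is the use case the table describes for switching the visible PIC without disturbing overflow status or event selection.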

### Performance Instrumentation Counter (PIC) Register (ASR 17)

The PIC register is implemented as described in Commonality.

Four PICs are implemented in SPARC64 VII. Each is accessed through ASR 17, using PCR.SC as a select field. Read/write access to the PIC will access the PICU/PICL counter pair selected by PCR.SC. For PICU/PICL encoding of specific event counters, see Appendix Q.

On overflow, counters wrap to 0, SOFTINT register bit 15 is set, and an interrupt level-15 exception is generated. The counter overflow trap is triggered on the transition from value FFFF FFFF<sub>16</sub> to value 0. If multiple overflows are generated simultaneously, then multiple overflow status bits will be set. If overflow status bits are already set, then they remain set on counter overflow.

Overflow status bits are cleared by software writing 0 to the appropriate bit of PCR.OVF and may be set by writing 1 to the appropriate bit. Setting these bits by software does not generate a level 15 interrupt.
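The overflow behavior described above can be sketched as follows. The function names are hypothetical; the OVF bit positions follow the layout given for PCR.OVF (L0 in bit 0, U0 in bit 1, and so on up to U3 in bit 7), and the counters are treated as 32-bit values because the overflow transition is from FFFF FFFF<sub>16</sub> to 0.

```python
COUNTER_MAX = 0xFFFF_FFFF
SOFTINT_BIT_15 = 1 << 15


def ovf_bit(pair: int, upper: bool) -> int:
    """OVF bit for a counter: L0 = bit 0, U0 = bit 1, ..., U3 = bit 7."""
    return 1 << (2 * pair + (1 if upper else 0))


def count_event(counter: int, ovf: int, softint: int,
                pair: int, upper: bool) -> tuple:
    """Advance one counter by one event.

    Returns the new (counter, ovf, softint) values. On the transition
    from 0xFFFFFFFF to 0 the counter wraps, its overflow status bit is
    set (already-set bits stay set), and SOFTINT bit 15 is set.
    """
    if counter == COUNTER_MAX:
        return 0, ovf | ovf_bit(pair, upper), softint | SOFTINT_BIT_15
    return counter + 1, ovf, softint
```

Note that this models only hardware-detected overflow; setting an OVF bit by software through WRPCR does not touch SOFTINT.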

### Dispatch Control Register (DCR) (ASR 18)

The DCR is not implemented in SPARC64 VII. Zero is returned on read, and writes to the register are ignored. The DCR is a privileged register; attempted access by nonprivileged (user) code generates a *privileged\_opcode* exception.

## 5.2.12 Registers Referenced Through ASIs

### Data Cache Unit Control Register (DCUCR)

ASI 45<sub>16</sub> (ASI\_DCU\_CONTROL\_REGISTER), VA = 0<sub>16</sub>.

The Data Cache Unit Control Register contains fields that control several memory-related hardware functions. The functions include instruction, prefetch, write, and data caches, MMUs, and watchpoint setting. SPARC64 VII implements most of DCUCR's functions described in Section 5.2.12 of **Commonality**.

After a power-on reset (POR), all fields of DCUCR, including implementation-dependent fields, are set to 0. After a WDR, XIR, or SIR reset, all fields of DCUCR, including implementation-dependent fields, are set to 0.

The Data Cache Unit Control Register is illustrated in FIGURE 5-2 and described in TABLE 5-3. In the table, bits are grouped by function rather than by strict bit sequence.

| -     | 0  | 0  | Implementation dependent | WEAK_SPCA | PM    | VM    | PR | PW | VR | VW | -    | DM | IM | 0 | 0 |
|-------|----|----|--------------------------|-----------|-------|-------|----|----|----|----|------|----|----|---|---|
| 63:50 | 49 | 48 | 47:42                    | 41        | 40:33 | 32:25 | 24 | 23 | 22 | 21 | 20:4 | 3  | 2  | 1 | 0 |

| Bits   | Field      | Type | Use and Description |
|--------|------------|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 49:48  | CP, CV     | RW   | Not implemented in SPARC64 VII (impl. dep. #232). These bits read as 0 and writes to them are ignored. |
| 47:42  | impl. dep. |      | Not used. It reads as 0 and writes to it are ignored.                                                                                                                                                                                                                                                                                                                                   |
| 41     | WEAK_SPCA  | RW   | Disable speculative memory access (impl. dep. #240). When setting weak_spca = 1, the branch prediction mechanism is disabled and no load, store, or instruction fetches in the speculative path are issued. Loads and stores after the CTI instruction are also paused until the correct path is determined. Also, software prefetch instructions, including strong prefetch, are lost. |
|        |            |      | Due to the absence of branch prediction, all CTI instructions are considered as not taken, and subsequent instructions beyond CTI will be fetched. Instruction fetch is eventually stopped by an internal resource limitation, so the memory area being accessed beyond CTI is predictable.                                                                                             |
|        |            |      | L2 cache flush by supervisor software is always executed regardless of the DCUCR.WEAK_SPCA setting. Autonomous L2 cache flush by RAS remains pending until every DCUCR.WEAK_SPCA bit in the CPU module is set to 0. |
|        |            |      | In SPARC64 VII, branch prediction is disabled by setting weak_spca to 1 in either of the threads. That is, even though a thread does not set weak_spca, it may sometimes run with branch prediction disabled. |
| 40:33  | PM<7:0>    |      | Defined in <b>Commonality</b> .                                                                                                                                                                                                                                                                                                                                                         |
| 32:25  | VM<7:0>    |      | Defined in <b>Commonality</b> .                                                                                                                                                                                                                                                                                                                                                         |
| 24, 23 | PR, PW     |      | Defined in <b>Commonality</b> .                                                                                                                                                                                                                                                                                                                                                         |
| 22, 21 | VR, VW     |      | Defined in <b>Commonality</b> .                                                                                                                                                                                                                                                                                                                                                         |
| 20:4   | -          |      | Reserved.                                                                                                                                                                                                                                                                                                                                                                               |
| 3      | DM         |      | Defined in <b>Commonality</b> .                                                                                                                                                                                                                                                                                                                                                         |
| 2      | IM         |      | Defined in <b>Commonality</b> .                                                                                                                                                                                                                                                                                                                                                         |
| 1      | DC         | RW   | Not implemented in SPARC64 VII (impl. dep. #252). It reads as 0 and writes to it are ignored.                                                                                                                                                                                                                                                                                           |
| 0      | IC         | RW   | Not implemented in SPARC64 VII (impl. dep. #253). It reads as 0 and writes to it are ignored.                                                                                                                                                                                                                                                                                           |

**Implementation Note** – When DCUCR.WEAK\_SPCA = 1, the memory area accessed beyond a CTI cannot extend more than 1 KB past that CTI.

**Programming Note** – Supervisor software should issue membar #Sync immediately after setting DCUCR.WEAK\_SPCA = 1, to make sure no speculative memory access is issued thereafter.

**Programming Note** – Changing IM (IMMU enable) and DM (DMMU enable) in DCUCR requires the following instruction sequences for SPARC64 VII to work correctly.

    # DCUCR.IM update
    stxa    DCUCR
    flush

    # DCUCR.DM update
    stxa    DCUCR
    membar  #Sync
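As a rough illustration of the bit layout in the DCUCR table, the following Python sketch extracts the fields in bits 40:0 from a 64-bit register image. The names `DCUCR_FIELDS` and `decode_dcucr` are hypothetical helpers, not part of any SPARC64 VII software interface. DC and IC are modelled as always reading 0, and fields above bit 40 are deliberately left out.

```python
# Bit positions (high, low) taken from the DCUCR table rows for bits 40:0.
# Fields above bit 40 are intentionally not decoded in this sketch.
DCUCR_FIELDS = {
    "PM": (40, 33),
    "VM": (32, 25),
    "PR": (24, 24),
    "PW": (23, 23),
    "VR": (22, 22),
    "VW": (21, 21),
    "DM": (3, 3),
    "IM": (2, 2),
}


def decode_dcucr(value):
    """Return a dict mapping field names to values for a 64-bit DCUCR image."""
    fields = {}
    for name, (high, low) in DCUCR_FIELDS.items():
        width = high - low + 1
        fields[name] = (value >> low) & ((1 << width) - 1)
    # DC and IC are not implemented: they read as 0 and writes are ignored.
    fields["DC"] = 0
    fields["IC"] = 0
    return fields


# Example image: PM all ones, DM = IM = 1, plus an (ignored) attempt to set DC/IC.
example = (0xFF << 33) | (1 << 3) | (1 << 2) | 0b11
decoded = decode_dcucr(example)
```

A table-driven decoder keeps the bit positions in one place, which makes it easy to compare the sketch against the register table.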

### Data Watchpoint Registers

No implementation-dependent feature of SPARC64 VII reduces the reliability of data watchpoints (impl. dep. #244).

SPARC64 VII employs a conservative check of the PA/VA watchpoint for partial store instructions. See Section A.42, *Partial Store (VIS I)*, on page 68 for details.

In SPARC64 VII, the PA/VA watchpoint register is shared by both threads in a core.

### Instruction Trap Register

SPARC64 VII implements the Instruction Trap Register (impl. dep. #205).

In SPARC64 VII, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc, FBPfcc, Bicc, BPr) instruction in the instruction cache are identical to their architectural encoding (as it appears in main memory) (impl. dep. #245).

## 5.2.13 Floating-Point Deferred-Trap Queue (FQ)

SPARC64 VII does not contain a Floating-Point Deferred-trap Queue (impl. dep. #24). An attempt to read FQ with an RDPR instruction generates an *illegal\_instruction* exception (impl. dep. #25).

## 5.2.14 IU Deferred-Trap Queue

SPARC64 VII neither has nor needs an IU deferred-trap queue (impl. dep. #16).

# Instructions

This chapter presents SPARC64 VII implementation-specific instruction details and the processor pipeline information in these subsections:

- Instruction Execution
- Instruction Formats and Fields
- Instruction Categories
- Processor Pipeline

For additional, general information, please see parallel subsections of Chapter 6 in **Commonality**. For easy referencing, we follow the organization of Chapter 6 in **Commonality**.

# 6.1 Instruction Execution

SPARC64 VII is an advanced superscalar implementation of SPARC V9. Several instructions may be issued and executed in parallel. Although SPARC64 VII provides serial program execution semantics, some of the implementation characteristics described below are part of the architecture visible to software for correctness and efficiency.

## 6.1.1 Data Prefetch

SPARC64 VII employs speculative (out of program order) execution of instructions; in most cases, the effect of these instructions can be undone if the speculation proves to be incorrect.<sup>1</sup> However, exceptions can occur because of speculative data prefetching. Formally, SPARC64 VII employs the following rules regarding speculative prefetching:

- 1. If a memory operation *x* resolves to a volatile memory address (*location*[*x*]), SPARC64 VII will not speculatively prefetch *location*[*x*] for any reason; *location*[*x*] will be fetched or stored to only when operation *x* is *committable*.
- 2. If a memory operation *x* resolves to a nonvolatile memory address (*location*[*x*]), SPARC64 VII *may* speculatively prefetch *location*[*x*], subject to the following sub-rules:
  - a. If an operation *x* can be speculatively prefetched according to the prior rule, operations with store semantics are speculatively prefetched for ownership only if they are prefetched to cacheable locations. Operations without store semantics are speculatively prefetched even if they are noncacheable, as long as they are not volatile.
  - b. Atomic operations (CAS(X)A, LDSTUB, SWAP) are never speculatively prefetched.

<sup>1.</sup> An *async\_data\_error* may be signalled during speculative data prefetching.
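The prefetch rules above can be read as a small decision procedure. The sketch below is only an illustration of that reading; the function name `may_speculatively_prefetch` and its boolean parameters are hypothetical and describe properties of a memory operation, not any hardware interface.

```python
def may_speculatively_prefetch(volatile, cacheable, store_semantics, atomic):
    """Return True if the rules permit speculative prefetch of the location.

    All four arguments are booleans describing the memory operation.
    """
    # Rule 1: volatile locations are never speculatively prefetched.
    if volatile:
        return False
    # Sub-rule b: atomic operations are never speculatively prefetched.
    if atomic:
        return False
    # Sub-rule a: operations with store semantics are prefetched (for
    # ownership) only when the location is cacheable.
    if store_semantics:
        return cacheable
    # Sub-rule a: operations without store semantics may be prefetched even
    # when noncacheable, as long as the location is not volatile.
    return True
```

Note that a True result only means prefetching is permitted; the rules say the processor *may* prefetch, not that it will.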

SPARC64 VII provides two mechanisms to avoid speculative execution of a load:

- 1. Avoid speculation by disallowing speculative accesses to certain memory pages or I/O spaces. This can be done by setting the E (side-effect) bit in the PTE for all memory pages that should not allow speculation. All accesses made to memory pages that have the E bit set in their PTE will be delayed until they are no longer speculative or until they are cancelled. See Appendix F for details.
- 2. Alternate space load instructions that force program order, such as ASI\_PHYS\_BYPASS\_WITH\_EBIT[\_L] (ASI = 15<sub>16</sub>, 1D<sub>16</sub>), will not be speculatively executed.

## 6.1.2 Instruction Prefetch

The processor prefetches instructions to minimize cases where the processor must wait for instruction fetch. In combination with branch prediction, prefetching may cause the processor to access instructions that are not subsequently executed. In some cases, the speculative instruction accesses will reference data pages. SPARC64 VII does not generate a trap for any exception that is caused by an instruction fetch until all of the instructions before it (in program order) have been committed.<sup>1</sup>

<sup>1.</sup> Hardware errors and other asynchronous errors may generate a trap even if the instruction that caused the trap is never committed.

## 6.1.3 Syncing Instructions

SPARC64 VII has instructions, called *syncing instructions*, that stop execution for the number of cycles it takes to clear the pipeline and to synchronize the processor. There are two types of synchronization, *pre* and *post*. A presyncing instruction waits for all previous instructions to commit, commits by itself, and then issues successive instructions. A postsyncing instruction issues by itself and prevents the successive instructions from issuing until it is committed. Some instructions have both pre- and post-sync attributes.

In SPARC64 VII almost all instructions commit in order, but store instructions commit before becoming globally visible. A few syncing instructions cause the processor to discard prefetched instructions and to refetch the successive instructions.

## 6.2 Instruction Formats and Fields

Instructions are encoded in five major 32-bit formats and several minor formats. Please refer to Section 6.2 of **Commonality** for illustrations of four major formats. FIGURE 6-1 illustrates Format 5, unique to SPARC64 VII.

*Format 5* (op = 2, op3 = 37<sub>16</sub>): FMADD, FMSUB, FNMADD, FNMSUB, FPMADDXHI, and FPMADDX (*in place of* IMPDEP2A and IMPDEP2B)

| Field | op    | rd    | op3   | rs1   | rs3  | var | size | rs2 |
|-------|-------|-------|-------|-------|------|-----|------|-----|
| Bits  | 31:30 | 29:25 | 24:19 | 18:14 | 13:9 | 8:7 | 6:5  | 4:0 |

FIGURE 6-1 Summary of Instruction Formats: Format 5

Instruction fields are those shown in Section 6.2 of **Commonality**. Three additional fields are implemented in SPARC64 VII. They are described in TABLE 6-1.

TABLE 6-1 Instruction Fields Specific to SPARC64 VII

| Bits | Field | Description                                                                                                                                          |
|------|-------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| 13:9 | rs3   | This 5-bit field is the address of the third f register source operand for the floating-point multiply-add and integer multiply-add instructions.    |
| 8:7  | var   | This 2-bit field specifies which specific operation (variation) to perform for the floating-point multiply-add and integer multiply-add instructions. |
| 6:5  | size  | This 2-bit field specifies the size of the operands for the floating-point multiply-add and integer multiply-add instructions.                       |

Since  $size = 11_2$  assumes quad operations but is not implemented in SPARC64 VII, an instruction with  $size = 11_2$  generates an *illegal\_instruction* exception in SPARC64 VII.
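To make the Format 5 layout concrete, the following Python sketch packs and unpacks the fields using the bit positions from FIGURE 6-1 and TABLE 6-1. The helpers `encode_format5` and `decode_format5` are hypothetical names introduced for illustration. The sketch does not assign meanings to individual `var` and `size` values other than size = 11<sub>2</sub>, which it flags as the unimplemented quad case.

```python
FORMAT5_OP = 2       # op field value for Format 5
FORMAT5_OP3 = 0x37   # op3 field value for Format 5 (IMPDEP2)


def encode_format5(rd, rs1, rs3, var, size, rs2):
    """Pack Format 5 fields into a 32-bit instruction word."""
    return ((FORMAT5_OP << 30) | (rd << 25) | (FORMAT5_OP3 << 19) |
            (rs1 << 14) | (rs3 << 9) | (var << 7) | (size << 5) | rs2)


def decode_format5(word):
    """Unpack a 32-bit word into Format 5 fields plus two derived flags."""
    fields = {
        "op": (word >> 30) & 0x3,     # bits 31:30
        "rd": (word >> 25) & 0x1F,    # bits 29:25
        "op3": (word >> 19) & 0x3F,   # bits 24:19
        "rs1": (word >> 14) & 0x1F,   # bits 18:14
        "rs3": (word >> 9) & 0x1F,    # bits 13:9
        "var": (word >> 7) & 0x3,     # bits 8:7
        "size": (word >> 5) & 0x3,    # bits 6:5
        "rs2": word & 0x1F,           # bits 4:0
    }
    fields["is_format5"] = (fields["op"] == FORMAT5_OP
                            and fields["op3"] == FORMAT5_OP3)
    # size = 11 (binary) would select quad operands, which are not
    # implemented; such an instruction raises illegal_instruction.
    fields["illegal_size"] = fields["size"] == 0b11
    return fields


word = encode_format5(rd=4, rs1=1, rs3=3, var=0, size=2, rs2=2)
decoded = decode_format5(word)
```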

# 6.3 Instruction Categories

SPARC V9 instructions comprise the categories listed below. All categories are described in Section 6.3 of **Commonality**. Subsections in bold face are SPARC64 VII implementation dependencies.

- Memory access
- Memory synchronization
- Integer arithmetic
- **Control transfer (CTI)**
- Conditional moves
- Register window management
- State register access
- Privileged register access
- **Floating-point operate (FPop)**
- **Implementation-dependent**

## 6.3.3 Control-Transfer Instructions (CTIs)

These are the basic control-transfer instruction types:

- Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)
- Unconditional branch
- Call and link (CALL)
- Jump and link (JMPL, RETURN)
- Return from trap (DONE, RETRY)
- Trap (Tcc)

Instructions other than CALL and JMPL are described in their entirety in Section 6.3.2 of **Commonality**. SPARC64 VII implements CALL and JMPL as described below.

### CALL and JMPL Instructions

SPARC64 VII writes all 64 bits of the PC into the destination register when PSTATE.AM = 0. The upper 32 bits of r[15] (CALL) or of r[rd] (JMPL) are written as zeroes when PSTATE.AM = 1 (impl. dep. #125).

SPARC64 VII implements JMPL and CALL return prediction hardware in the form of a special stack, called the Return Address Stack (RAS). Whenever a CALL or JMPL that writes to %o7 (r[15]) occurs, SPARC64 VII "pushes" the return address (%PC+8) onto the RAS. When either of the synthetic instructions *retl* (JMPL [%o7+8]) or *ret* (JMPL [%i7+8]) is subsequently executed, the return address is predicted to be the address stored on the top of the RAS and the RAS is "popped." If the prediction in the RAS is incorrect, SPARC64 VII backs up and starts issuing instructions from the correct target address. This backup takes a few extra cycles.

**Programming Note** – For maximum performance, software and compilers must take into account how the RAS works. For example, tricks that do nonstandard returns in hopes of boosting performance may require more cycles if they cause the wrong RAS value to be used for predicting the address of the return. Heavily nested calls can also cause earlier entries in the RAS to be overwritten by newer entries, since the RAS only has a limited number of entries. Eventually, some return addresses will be mis-predicted because of the overflow of the RAS.
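The overflow behavior described in this note can be pictured with a small software model. The sketch below is not a description of the actual hardware: the class name `ReturnAddressStack` is hypothetical, and the depth is an arbitrary assumption because the number of RAS entries is not stated here.

```python
class ReturnAddressStack:
    """Toy model of a fixed-depth return address stack."""

    def __init__(self, depth=4):
        # depth is an assumed value chosen for illustration only.
        self.depth = depth
        self.entries = []

    def push(self, call_pc):
        """Record the return address (%PC + 8) for a CALL or linking JMPL."""
        if len(self.entries) == self.depth:
            # The stack is full: the oldest entry is lost.
            self.entries.pop(0)
        self.entries.append(call_pc + 8)

    def predict_return(self):
        """Pop and return the predicted return address, or None if empty."""
        return self.entries.pop() if self.entries else None


# Three nested calls on a two-entry stack: the outermost return is lost.
ras = ReturnAddressStack(depth=2)
for pc in (0x1000, 0x2000, 0x3000):
    ras.push(pc)
predictions = [ras.predict_return() for _ in range(3)]
```

In this run the two innermost returns are predicted correctly, while the third has no entry left, which corresponds to the misprediction caused by deep nesting.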

## 6.3.7 Floating-Point Operate (FPop) Instructions

The complete conditions for generating an *fp\_exception\_other* exception with FSR.ftt = *unfinished\_FPop* are described in Section B.6, *Floating-Point Nonstandard Mode*, on page 77.

The SPARC64 VII-specific FMADD, FMSUB, FPMADDXHI, and FPMADDX instructions (described below) are also floating-point operations. They require the floating-point unit to be enabled; otherwise, an *fp\_disabled* trap is generated. The floating-point multiply-add instructions also affect the FSR, like FPop instructions, while the integer multiply-add instructions do not. These instructions are not included in the FPop category and, hence, reserved encodings in these opcodes generate an *illegal\_instruction* exception, as defined in Section 6.3.9 of **Commonality**.

## 6.3.8 Implementation-Dependent Instructions

SPARC64 VII uses the IMPDEP2 instruction to implement the floating-point multiply-add/subtract, negative multiply-add/subtract, and integer multiply-add instructions; these have an op3 field = 37<sub>16</sub> (IMPDEP2). See Section A.24.1, *Floating-Point Multiply-Add/Subtract*, on page 55 and Section A.24.4, *Integer Multiply-Add*, on page 61 for full definitions of these instructions. Opcode space is reserved in IMPDEP2 for the quad-precision forms of these instructions. However, SPARC64 VII does not currently implement the quad-precision forms, and the processor generates an *illegal\_instruction* exception if a quad-precision form is specified. Since these instructions are not part of the required SPARC V9 architecture, the operating system does not supply software emulation routines for the quad versions of these instructions.

SPARC64 VII uses the IMPDEP1 instruction to implement the graphics acceleration instructions.

# 6.4 Processor Pipeline

The pipeline of SPARC64 VII consists of fifteen stages, shown in FIGURE 6-2. Each stage is referenced by one or two letters as follows:

IA IT IM IB IR

E D P B X U C W Ps Ts Ms Bs Rs

FIGURE 6-2 SPARC64 VII pipeline stages

## 6.4.1 Instruction Fetch Stages

- IA: Instruction Address generation
- IT: Instruction TLB Tag access
- IM: Instruction cache tag Match
- IB: Instruction cache read to Buffer
- IR: Instruction read Result

IA through IR stages are dedicated to instruction fetch. These stages work in concert with the cache access unit to supply instructions to subsequent stages. The instructions fetched from memory or cache are stored in the Instruction Buffer (I-buffer).

SPARC64 VII has a branch prediction mechanism and resources named BRHIS (BRanch HIStory) and RAS (Return Address Stack). Instruction fetch stages use these resources to determine fetch addresses.

Instruction fetch stages are designed to work independently of subsequent stages as much as possible, and they can fetch instructions even when execution stages stall. These stages fetch until the I-buffer is full; further fetches are possible by requesting prefetches to the L1 cache.



FIGURE 6-3 SPARC64 VII Pipeline Diagram

## 6.4.2 Issue Stages

- E: Entry
- D: Decode

SPARC64 VII is an out-of-order execution CPU. It has six execution units (two arithmetic and logic units, two floating-point units, two load/store units). Every unit except the load/store unit has its own reservation station (RS). The E and D stages are issue stages that decode instructions and dispatch them to the target RS. SPARC64 VII can issue up to four instructions per cycle.

The resources needed to execute an instruction are assigned in the issue stages. The resources to be allocated include the following:

- Commit stack entry (CSE)
- Renaming registers of integer (GUB) and floating-point (FUB)
- Entries of reservation stations
- Memory access ports

Resources needed for an instruction are specific to the instruction, but all resources must be assigned at these stages. In normal execution, assigned resources are released at the very last stage of the pipeline, W-stage.<sup>1</sup> Instructions between the E-stage and W-stage are considered to be in-flight. When an exception is signalled, all in-flight instructions and the resources used by them are released immediately. This behavior enables the decoder to restart issuing instructions as quickly as possible.

## 6.4.3 Execution Stages

- P: Priority
- B: Buffer read
- X: Execute
- U: Update

Instructions in reservation stations are executed when certain conditions are met, for example, when the values of the source registers are known and the execution unit is available. Execution latency varies from one to many cycles, depending on the instruction.

<sup>1.</sup> An entry in a reservation station is released at the X-stage.

### **Execution Stages for Cache Access**

Memory access requests are passed to the cache access pipeline after the target address is calculated. Cache access stages work the same way as instruction fetch stages, except for the handling of branch prediction. See Section 6.4.1, *Instruction Fetch Stages*, for details. Stages in instruction fetch and cache access correspond as follows:

| Instruction Fetch Stages | Cache Access |
|--------------------------|--------------|
| IA                       | Ps           |
| IT                       | Ts           |
| IM                       | Ms           |
| IB                       | Bs           |
| IR                       | Rs           |

When an exception is signalled, fetch ports and store ports used by memory access instructions are released. The cache access pipeline itself remains working in order to complete outgoing memory accesses. When data is returned, it is then stored to the cache.

## 6.4.4 Completion Stages

- W: Write

After an out-of-order execution, execution reverts to program order to complete. Exception handling is done in the completion stages. Exceptions occurring in execution stages are not handled immediately but are signalled when the instruction is completed.<sup>1</sup>

<sup>1.</sup> RAS-related exception may be signalled before completion.

## Traps

Please refer to Chapter 7 of **Commonality**. Section numbers in this chapter correspond to those in Chapter 7 of **Commonality**.

This chapter adds SPARC64 VII-specific information in the following sections:

- Processor States, Normal and Special Traps
  - RED\_state
  - error\_state
- Trap Categories
  - Deferred Traps
  - Reset Traps
  - Uses of the Trap Categories
- Trap Control
  - PIL Control
- Trap-Table Entry Addresses
  - Trap Type (TT)
  - Details of Supported Traps
- Exception and Interrupt Descriptions

# 7.1 Processor States, Normal and Special Traps

Please refer to Section 7.1 of Commonality.

## 7.1.1 RED\_state

### RED\_state Trap Table

The RED\_state trap vector is located at an implementation-dependent address referred to as RSTVaddr. The value of RSTVaddr is a constant within each implementation; in SPARC64 VII this virtual address is FFFF FFFF F000 0000<sub>16</sub>, which translates to physical address 0000 07FF F000 0000<sub>16</sub> in RED\_state (impl. dep. #114).

#### **RED\_state Execution Environment**

In RED\_state, the processor is forced to execute in a restricted environment by overriding the values of some processor controls and state registers.

**Note** – The values are overridden, not set, allowing them to be switched atomically.

SPARC64 VII has the following implementation-dependent behavior in RED\_state (impl. dep. #115):

- While in RED\_state, all internal ITLB-based translation functions are disabled. DTLB-based translations are disabled upon entry but may be re-enabled by software while in RED\_state. Regardless, ASI-based access functions to the TLBs are still available.
- While mTLBs and uTLBs are disabled, all accesses are assumed to be noncacheable and strongly ordered for data access.
- XIR errors are not masked and can cause a trap.

**Note** – When RED\_state is entered because of component failures, the handler should attempt to recover from potentially catastrophic error conditions or to disable the failing components. When RED\_state is entered after a reset, the software should create the environment necessary to restore the system to a running state.

### 7.1.2 error\_state

The processor enters error\_state when a trap occurs while the processor is already at its maximum supported trap level (that is, when TL = MAXTL) (impl. dep. #39).

Although the standard behavior of the CPU upon entry into error\_state is to internally generate a *watchdog\_reset* (WDR), the CPU optionally stays halted upon entry to error\_state, depending on a setting in the OPSR register (impl. dep. #40, #254).

# 7.2 Trap Categories

Please refer to Section 7.2 of Commonality.

An exception or interrupt request can cause any of the following trap types:

- Precise trap
- Deferred trap
- Disrupting trap
- Reset trap

## 7.2.2 Deferred Traps

Please refer to Section 7.2.2 of Commonality.

SPARC64 VII implements a deferred trap to signal certain error conditions (impl. dep. #32). Please refer to the description of the I\_UGE error in the "Relation between %tpc and the instruction that caused the error" row of TABLE P-2 on page 179 for details. See also *Instruction End-Method at ADE Trap* on page 194.

## 7.2.4 Reset Traps

Please refer to Section 7.2.4 of Commonality.

In SPARC64 VII, a watchdog reset (WDR) occurs when the processor has not committed an instruction for  $2^{33}$  processor cycles.
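To get a feel for the size of this interval, the cycle count can be converted to wall-clock time. The clock frequency used below is purely an assumed value for illustration, since the operating frequency is not given in this section; `wdr_timeout_seconds` is a hypothetical helper name.

```python
WDR_CYCLES = 2 ** 33  # cycles without a committed instruction before WDR


def wdr_timeout_seconds(clock_hz):
    """Convert the watchdog cycle count to seconds at the given clock rate."""
    return WDR_CYCLES / clock_hz


# Assumed clock frequency, chosen only to make the arithmetic concrete.
ASSUMED_CLOCK_HZ = 2.5e9
timeout = wdr_timeout_seconds(ASSUMED_CLOCK_HZ)
```

With the assumed 2.5 GHz clock, 2<sup>33</sup> = 8,589,934,592 cycles corresponds to about 3.4 seconds; the interval scales inversely with the actual frequency.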

## 7.2.5 Uses of the Trap Categories

Please refer to Section 7.2.5 of Commonality.

All exceptions that occur as the result of program execution are precise in SPARC64 VII (impl. dep. #33).

An exception caused after the initial access of a multiple-access load or store instruction (LDD(A), STD(A), LDSTUB, CASA, CASXA, or SWAP) that causes a catastrophic exception is precise in SPARC64 VII.

# 7.3 Trap Control

Please refer to Section 7.3 of Commonality.

### 7.3.1 PIL Control

SPARC64 VII receives external interrupts from the Jupiter Bus. They cause an *interrupt\_vector\_trap* ( $TT = 60_{16}$ ). The interrupt vector trap handler reads the interrupt information and then schedules SPARC V9-compatible interrupts by writing bits in the SOFTINT register. Please refer to Section 5.2.11 of **Commonality** for details.

During handling of SPARC V9-compatible interrupts by SPARC64 VII, the PIL register is checked. If an interrupt has sufficient priority, SPARC64 VII will stop issuing new instructions, will flush all uncommitted instructions, and then will pass to the trap handler. The only exception to this process occurs when SPARC64 VII is processing a higher-priority trap.

SPARC64 VII takes a normal disrupting trap upon receipt of an interrupt request.

## 7.4 Trap-Table Entry Addresses

Please refer to Section 7.4 of Commonality.

## 7.4.2 Trap Type (TT)

Please refer to Section 7.4.2 of Commonality.

SPARC64 VII implements all mandatory SPARC V9 and SPARC JPS1 exceptions, as described in Chapter 7 of **Commonality**, plus the exception listed in TABLE 7-1, which is specific to SPARC64 VII (impl. dep. #35; impl. dep. #36).

TABLE 7-1 Exceptions Specific to SPARC64 VII

| Exception or Interrupt Request | TT                | Priority |
|--------------------------------|-------------------|----------|
| async_data_error               | 040 <sub>16</sub> | 2        |

## 7.4.4 Details of Supported Traps

Please refer to Section 7.4.4 in Commonality.

#### SPARC64 VII Implementation-Specific Traps

SPARC64 VII supports the following implementation-specific trap type:

async\_data\_error

# 7.5 Trap Processing

Please refer to Section 7.5 of Commonality.

# 7.6 Exception and Interrupt Descriptions

Please refer to Section 7.6 of Commonality.

## 7.6.4 SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1

Please refer to Section 7.6.4 of Commonality.

SPARC64 VII implements all six traps that are implementation dependent in SPARC V9 but mandatory in JPS1 (impl. dep. #35). See Section 7.6.4 of **Commonality** for details.

## 7.6.5 SPARC JPS1 Implementation-Dependent Traps

Please refer to Section 7.6.5 of Commonality.

SPARC64 VII implements the following traps that are implementation dependent (impl. dep. #35).

- *async\_data\_error* [tt = 040<sub>16</sub>] (Preemptive or disrupting) (impl. dep. #218)

  SPARC64 VII implements the *async\_data\_error* exception to signal the following errors.
  - Uncorrectable errors in the internal architecture registers (general registers (gr), floating-point registers (fr), ASR, ASI registers)
  - Uncorrectable errors in the core pipeline
  - Watchdog time-out (first time)
  - TLB access error upon access by an ldxa or stxa instruction

Multiple errors may be reported in a single generation of the *async\_data\_error* exception. Depending on the situation, the *async\_data\_error* trap becomes a precise trap, a disrupting trap, or a preemptive trap upon error detection. The TPC and TNPC stacked by the exception may indicate the exact instruction, the preceding instruction, or the subsequent instruction inducing the error. See Appendix P for details of the *async\_data\_error* exception in SPARC64 VII.

## Memory Models

The SPARC V9 architecture is a *model* that specifies the behavior observable by software on SPARC V9 systems. Therefore, access to memory can be implemented in any manner, as long as the behavior observed by software conforms to that of the models described in Chapter 8 of **Commonality** and defined in Appendix D, *Formal Specification of the Memory Models*, also in **Commonality**.

The SPARC V9 architecture defines three different memory models: *Total Store Order* (*TSO*), *Partial Store Order* (*PSO*), and *Relaxed Memory Order* (*RMO*). All SPARC V9 processors must provide Total Store Order (or a more strongly ordered model, for example, Sequential Consistency) to ensure SPARC V8 compatibility.

Whether the PSO or RMO models are supported by SPARC V9 systems is implementation dependent; SPARC64 VII behaves in a manner that guarantees adherence to whichever memory model is currently in effect.

This chapter describes the following major SPARC64 VII-specific details of memory models.

- SPARC V9 Memory Model

For general information, please see parallel subsections of Chapter 8 in **Commonality**. For easier referencing, this chapter follows the organization of Chapter 8 in **Commonality**, listing subsections whether or not there are implementation-specific details.

## 8.1 Overview

**Note** – The words "*hardware memory model*" denote the underlying hardware memory models as differentiated from the "SPARC V9 memory model," which is the memory model the programmer selects in PSTATE.MM.

SPARC64 VII supports only one mode of memory handling to guarantee correct operation under any of the three SPARC V9 memory ordering models (impl. dep. #113):

Total Store Order — All loads are ordered with respect to loads, and all stores are ordered with respect to loads and stores. This behavior is a superset of the requirements for the SPARC V9 memory models TSO, PSO, and RMO. When PSTATE.MM selects PSO or RMO, SPARC64 VII operates in this mode. Since programs written for PSO (or RMO) will always work if run under Total Store Order, this behavior is safe but does not take advantage of the reduced restrictions of PSO (or RMO).

# 8.4 SPARC V9 Memory Model

Please refer to Section 8.4 of Commonality.

In addition, this section describes SPARC64 VII-specific details about the processor/memory interface model.

### 8.4.5 Mode Control

SPARC64 VII implements Total Store Ordering for all PSTATE.MM. Writing  $11_2$  into PSTATE.MM also causes the machine to use TSO (impl. dep. #119). However, the encoding  $11_2$  should not be used, since future versions of SPARC64 VII may use this encoding for a new memory model.
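The mapping from the requested model to the model actually in effect can be summarized in a few lines. The sketch below assumes the standard SPARC V9 encodings of PSTATE.MM (00<sub>2</sub> = TSO, 01<sub>2</sub> = PSO, 10<sub>2</sub> = RMO, 11<sub>2</sub> reserved); the function and table names are hypothetical.

```python
# Requested model for each PSTATE.MM encoding (SPARC V9 encodings).
REQUESTED_MODEL = {
    0b00: "TSO",
    0b01: "PSO",
    0b10: "RMO",
    0b11: "reserved",
}


def effective_memory_model(mm):
    """Return the hardware memory model in effect for a PSTATE.MM value."""
    if mm not in REQUESTED_MODEL:
        raise ValueError("PSTATE.MM is a 2-bit field")
    # SPARC64 VII uses Total Store Order for every encoding, including the
    # reserved value 0b11, which software should nevertheless avoid.
    return "TSO"


effective = {mm: effective_memory_model(mm) for mm in REQUESTED_MODEL}
```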

### 8.4.7 Synchronizing Instruction and Data Memory

All caches in a SPARC64 VII-based system (uniprocessor or multiprocessor) have a unified cache consistency protocol and implement strong coherence between instruction and data caches. Writes to any data cache cause invalidations to the corresponding locations in all instruction caches; references to any instruction cache cause the corresponding modified data to be flushed and corresponding unmodified data to be invalidated from all data caches. The flush operation is still operative in SPARC64 VII, however.

## Multi-Threaded Processing

SPARC64 VII can process two threads in each of the four cores in the same processor module to provide a dense, high throughput system. This chapter specifies the required interface between hardware and software to handle multiple threads on the same processor module.

## 9.1 MTP structure

### 9.1.1 General MTP structure

Three structures are known for Multi-Threaded Processing.

1. Chip Multi Processing

One processor module includes multiple physical cores, where each physical core is able to run a single thread independently from other cores at any given time. This structure is called Chip Multi-Processing (CMP).

2. Multi-thread (MT)

One processor module includes a single physical core. The core is able to run multiple threads in parallel from the software's point of view. Although there is only a single physical core, the physical core behaves as if it were multiple virtual processors. This is because the core includes multiple software visible resources (PC, next PC, general purpose registers, floating-point registers, condition codes, status registers, ASRs, etc.). This virtual processor is called a thread.

There are two types of Multi-thread implementations.

a. Vertical Multi-thread (VMT)

The physical core is able to run only a single thread at any given time. But multiple threads can run in parallel from the software's point of view by using time-sharing techniques. That is, the core includes multiple software visible resources (PC, next PC, general purpose registers, floating-point registers, condition codes, status registers, ASRs, etc.), and hardware switches threads to run in a relatively-short time.

b. Simultaneous Multi-thread (SMT)

The physical core is able to run multiple threads at any given time. That is, the core includes multiple software visible resources (PC, next PC, general purpose registers, floating-point registers, condition codes, status registers, ASRs, etc.) as well as multiple execution units, and multiple threads run at the same time.

### 9.1.2 MTP structure of SPARC64 VII

SPARC64 VII implements a combination of CMP and SMT. That is, it has four physical cores where each core has two threads with an SMT structure. In other words, eight threads are able to run in parallel. The two threads which belong to the same physical core share most of the physical resources, while the four physical cores do not share any physical resources except the L2 cache and system interface.

Thread execution in SPARC64 VII is illustrated in FIGURE 9-1. Basically, the two threads in a core are always active and executing instructions, but a thread sometimes stops because of a cache miss, a wait for internal resources, and so on. Gaps in a thread in FIGURE 9-1 represent such pauses. A thread can also yield its execution priority with the help of software. See *How to control threads* on page 48 for details.



FIGURE 9-1 Multiple threads in SPARC64 VII

# 9.2 MTP Programming Model

## 9.2.1 Thread independency

In principle, because the software visible resources are not shared between threads, the threads of SPARC64 VII are independent of each other, as in a conventional Symmetric Multi Processor. Even for supervisor software, this is true except in the following cases:

#### Shared TLBs

Thread0 and thread1 belong to the same physical core and share fTLB and sTLB. See Section F.12, *Translation Lookaside Buffer Hardware*, on page 129 for details.

#### Error handling

An error asynchronous to thread execution is always signalled to all related threads. See Section P.1, *Error Classes and Signalling*, on page 171 for details.

#### Issue and Commit Stage Contention

Although each thread has its own hardware for issuing and committing instructions, only one thread's hardware may operate at a time. In any single cycle, only one thread's hardware has exclusive access to issue or commit instructions (up to 4). When two threads are active, the priority for both issuing and committing instructions switches automatically between thread 0 and thread 1 each cycle.
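As an illustration only, the alternation described above can be modeled in a few lines of Python. The function name `priority_thread`, the cycle numbering, and the choice of thread 0 on even cycles are assumptions of this sketch; the text does not say which thread holds priority in any particular cycle.

```python
def priority_thread(cycle, active_threads):
    """Return the thread holding issue/commit priority in a given cycle.

    active_threads: set of active thread ids on one core, e.g. {0, 1}.
    Giving thread 0 the even cycles is an assumption of this sketch.
    """
    if not active_threads:
        return None                          # no thread can issue or commit
    if len(active_threads) == 1:
        return next(iter(active_threads))    # lone thread has priority every cycle
    return cycle % 2                         # two active threads: strict alternation


both = [priority_thread(c, {0, 1}) for c in range(6)]
single = [priority_thread(c, {0}) for c in range(6)]
print(both)    # priority alternates between the two threads
print(single)  # the only active thread is selected in every cycle
```

The single-thread case mirrors the behavior described for single-thread mode, where the remaining thread receives priority each cycle.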

#### Performance

Since each thread has its own software visible resources, they are independent of each other from the programming model point of view. But this is not true for performance. Since threads belonging to the same physical core share most of the physical resources, it is highly recommended for the OS to schedule threads in the following manner:

- Run threads belonging to the same process space on thread0 and thread1
- Suspend thread1 to run a single threaded program at maximum speed

**Note** – Since threads belonging to different physical cores share no physical resources except the L2 cache and the system interface, it is not necessary to pay as much attention to them.

## 9.2.2 How to control threads

When controlling MT operation, it is important to note that a thread is always in one of three states:

- active: currently in execution
- empty: a thread is present but it is currently not undergoing execution
- suspend/sleep: no thread is present

In a single core, if one of the threads is designated as *suspend/sleep*, the core will enter single-thread mode. This is meant to enhance the execution performance of the lone thread executing in the core.

When in single-thread mode, two important things happen. First, certain resources (invisible to software) reserved for the second thread's execution are reassigned to the lone executing thread. Second, the remaining thread's issue and commit functions receive priority in every cycle. This allows the remaining thread to achieve greater instruction throughput.

There are special instructions for switching the state of a thread. For more information on placing a thread in the *suspend/sleep* state to halt its execution, see Section A.24.2, *Suspend*, on page 59 and Section A.24.3, *Sleep*, on page 60.

## 9.2.3 Shared registers between threads

The following ASR and ASI registers are shared among all the threads within a processor module.

- PA/VA Watchpoint
- ASI\_SERIAL\_ID

# Instruction Definitions

This appendix describes the SPARC64 VII-specific implementation of the instructions in Appendix A of **Commonality**. If an instruction is not described in this appendix, then no SPARC64 VII implementation-dependency applies.

- See TABLE A-1 of **Commonality** for the location at which general information about the instruction can be found.
- Section numbers refer to the parallel section numbers in Appendix A of Commonality.

TABLE A-1 lists eight instructions that are unique to SPARC64 VII.

| Operation          | Name                                    | Page |
|--------------------|-----------------------------------------|------|
| FMADD(s,d)         | Floating-point multiply add             | 55   |
| FMSUB(s,d)         | Floating-point multiply subtract        | 55   |
| FNMADD(s,d)        | Floating-point multiply negate add      | 55   |
| FNMSUB(s,d)        | Floating-point multiply negate subtract | 55   |
| POPC               | Population Count                        | 69   |
| SUSPEND            | Suspend a thread                        | 59   |
| SLEEP              | Put a thread to sleep                   | 60   |
| FPMADDX, FPMADDXHI | Integer multiply-add                    | 61   |

**TABLE A-1** Implementation-Specific Instructions

Each instruction definition consists of these parts:

- 1. A table of the opcodes defined in the subsection with the values of the field(s) that uniquely identify the instruction(s).
- 2. An illustration of the applicable instruction format(s). In these illustrations a dash (—) indicates that the field is *reserved* for future versions of the architecture and shall be 0 in any instance of the instruction. If a conforming SPARC V9 implementation encounters nonzero values in these fields, its behavior is undefined.
- 3. A list of the suggested assembly language syntax, as described in Appendix G.

- 4. A description of the features, restrictions, and exception-causing conditions.
- 5. A list of exceptions that can occur as a consequence of attempting to execute the instruction(s). Exceptions due to an *instruction\_access\_error*, *instruction\_access\_exception*, *fast\_instruction\_access\_MMU\_miss*, *async\_data\_error*, *ECC\_error*, and interrupts are not listed because they can occur on any instruction.

Also, any instruction that is not implemented in hardware shall generate an *illegal\_instruction* exception (or *fp\_exception\_other* exception with ftt = *unimplemented\_FPop* for floating-point instructions) when it is executed.

The *illegal\_instruction* trap can occur during chip debug on any instruction that has been programmed into the processor's IIU\_INST\_TRAP (ASI =  $60_{16}$ , VA = 0). These traps are also not listed under each instruction.

The following traps never occur in SPARC64 VII:

- instruction\_access\_MMU\_miss
- data\_access\_MMU\_miss
- data\_access\_protection
- unimplemented\_LDD
- unimplemented\_STD
- LDQF\_mem\_address\_not\_aligned
- STQF\_mem\_address\_not\_aligned
- internal\_processor\_error
- fp\_exception\_other (ftt = invalid\_fp\_register)

This appendix does not include any timing information (in either cycles or clock time).

The following SPARC64 VII-specific extensions are described.

- Block Load and Store Instructions (VIS I) on page 51
- *Call and Link* on page 53
- Implementation-Dependent Instructions on page 54
- Jump and Link on page 63
- Load Quadword, Atomic [Physical] on page 64
- *Memory Barrier* on page 66
- Partial Store (VIS I) on page 68
- *Prefetch Data* on page 70
- *Read State Register* on page 72
- SHUTDOWN (VIS I) on page 73
- Write State Register on page 74
- Deprecated Instructions on page 75

# A.4 Block Load and Store Instructions (VIS I)

The following notes summarize behavior of block load/store instructions in SPARC64 VII.

- 1. Block load and store operations are not atomic, in that they are internally decomposed into eight independent, 8-byte load/store operations in SPARC64 VII. Each load/store is always issued and performed in the RMO memory model and obeys all prior MEMBAR and atomic instruction-imposed ordering constraints.
- 2. Block load/store instructions are outside the scope of the V9 memory models, meaning that self-consistency of memory reference instructions is not always maintained if block load/store instructions are involved in the execution flow. The following table describes the implemented ordering constraints for block load/store instructions with respect to other memory reference instructions with an operand address conflict in SPARC64 VII:

| First (program order) | Next (program order) | Ordered/Out-of-Order |
|-----------------------|----------------------|----------------------|
| store                 | blockstore           | Ordered              |
| store                 | blockload            | Ordered              |
| load                  | blockstore           | Ordered              |
| load                  | blockload            | Ordered              |
| blockstore            | store                | Out-of-Order         |
| blockstore            | load                 | Out-of-Order         |
| blockstore            | blockstore           | Out-of-Order         |
| blockstore            | blockload            | Out-of-Order         |
| blockload             | store                | Ordered              |
| blockload             | load                 | Ordered              |
| blockload             | blockstore           | Ordered              |
| blockload             | blockload            | Ordered              |

To maintain the memory ordering even for the memory address conflicts, MEMBAR instructions shall be inserted into appropriate locations in the program.

Although self-consistency with respect to the block load/store and the other memory reference instructions is not maintained in some cases, register conflicts between the other instructions and block load/store instructions are maintained in SPARC64 VII. The read-after-write, write-after-read, and write-after-write obstructions between a block load/store instruction and the other arithmetic instructions are detected and handled appropriately.

- 3. Block load instructions operate on the cache if the operand is present.
- 4. The block store with commit instruction always stores the operand in main storage and invalidates the line in the L1D and L2 cache if it is present.

- 5. The block store instruction stores the operand into main storage if the line is not present in the L1D and its status in the L2 cache is invalid, shared, or owned. If the line is not present in the L1D cache and is exclusive or modified in the L2 cache, the block store instruction modifies only the line in the L2 cache. If the line is present in the L1D and the status is either clean/shared or clean/owned, the line is stored in main storage. If the line is present in the L1D and the status is clean/exclusive, the line in the L1D is invalidated and the operand is stored in the L2 cache. If the line is in the L1D and the status is modified/modified or clean/modified, the operand is stored in the L1D or in the L2 with L1D invalidation, respectively. The following table summarizes each cache status before the block store and the results of the block store. Cells marked with a dash mean that no action occurred in the corresponding cache or memory, and the data, if it exists, is unchanged<sup>1</sup>.

|                         | Storage | Status  |         |            |                   |        |
|-------------------------|---------|---------|---------|------------|-------------------|--------|
| Cache status before bst | L1      | Invalid | Invalid | Valid      | Valid             | Valid  |
|                         | L2      | E, M    | I, S, O | E          | M                 | S, O   |
| Action                  | L1      | —       | —       | invalidate | update/invalidate | —      |
|                         | L2      | update  | —       | update     | —/update          | —      |
|                         | Memory  | —       | update  | —          | —                 | update |

- 6. A block load or block store instruction on a page with TTE.E = 0 may signal a *fast\_data\_access\_MMU\_miss* trap on any 8-byte load or store within the 64-byte data when the TTE being used is dropped by the other thread. On a block load, the registers may contain either new or old values. The incomplete block load instruction is re-executed starting at the first 8-byte load after TLB miss handling is done. When the trap is signalled on a block store, none of the register values is written into memory or cache.
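The two tables in notes 2 and 5 can be read as simple lookups. The following Python sketch transcribes them for illustration; the function names, the state labels (`"invalid"`, `"clean"`, `"modified"` for the L1D and `"I"`, `"S"`, `"O"`, `"E"`, `"M"` for the L2), and the use of `ValueError` for combinations the tables do not list are assumptions of this sketch, not part of the specification.

```python
KINDS = ("store", "load", "blockstore", "blockload")


def needs_membar(first, nxt):
    """True if two address-conflicting accesses (in program order) are
    Out-of-Order per the table in note 2, so a MEMBAR is needed between them."""
    if first not in KINDS or nxt not in KINDS:
        raise ValueError("unknown access kind")
    if "block" not in first and "block" not in nxt:
        raise ValueError("pair without a block access is not covered by the table")
    # Every Out-of-Order row in the table has a block store as the first access.
    return first == "blockstore"


def block_store_action(l1_state, l2_state):
    """Actions of a block store (without commit) per the table in note 5.

    Returns a dict for L1, L2, and memory; None means no action.
    """
    result = {"L1": None, "L2": None, "memory": None}
    if l1_state == "invalid":
        if l2_state in ("E", "M"):
            return {**result, "L2": "update"}
        if l2_state in ("I", "S", "O"):
            return {**result, "memory": "update"}
    elif l1_state == "clean":
        if l2_state in ("E", "M"):
            return {**result, "L1": "invalidate", "L2": "update"}
        if l2_state in ("S", "O"):
            return {**result, "memory": "update"}
    elif l1_state == "modified" and l2_state == "M":
        return {**result, "L1": "update"}
    raise ValueError("combination not listed in the table")


print(needs_membar("blockstore", "load"))
print(block_store_action("clean", "M"))
```

Splitting the "Valid / M" column into the clean and modified L1D cases follows the prose of note 5 (modified/modified versus clean/modified).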

*Exceptions* fp\_disabled

PA\_watchpoint
VA\_watchpoint
illegal\_instruction (misaligned rd)
mem\_address\_not\_aligned (see *Block Load and Store ASIs* on page 140)
data\_access\_exception (see *Block Load and Store ASIs* on page 140)
LDDF\_mem\_address\_not\_aligned (see *Block Load and Store ASIs* on page 140)
data\_access\_error
fast\_data\_access\_MMU\_miss
fast\_data\_access\_protection

<sup>1.</sup> The inconsistency between memory and caches will eventually be resolved by an invalidation request from the system.

# A.12 Call and Link

SPARC64 VII clears the upper 32 bits of the PC value in r[15] when PSTATE.AM is set (impl. dep. #125). The value written into r[15] is visible to the instruction in the delay slot.

SPARC64 VII has a special hardware table, called the return address stack, to predict the return address from a subroutine. Although the return address stack improves performance in normal cases, there is a special use of the CALL instruction (call .+8) that may have an undesirable effect on it. In this case, the CALL instruction is used to read the PC contents, not to call a subroutine. In SPARC64 VII, the return address of such a CALL (PC + 8) is not stored in the return address stack, to avoid a detrimental performance effect. When a ret or retl is executed, the value in the return address stack is used to predict the return address.
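The behavior described above can be sketched as a small model. This is a conceptual illustration only: the class name, the stack depth of 8, the policy of discarding the oldest entry on overflow, and the assumption that a prediction pops the top entry are all hypothetical, since the text specifies none of them.

```python
from collections import deque


class ReturnAddressStack:
    """Toy model of a return address stack (depth and policies are assumed)."""

    def __init__(self, depth=8):
        # deque(maxlen=...) silently drops the oldest entry when full (assumption).
        self._entries = deque(maxlen=depth)

    def on_call(self, pc, target):
        """Record the return address of a CALL located at pc."""
        if target == pc + 8:
            return                      # call .+8 only reads the PC: nothing is stored
        self._entries.append(pc + 8)    # normal call: store PC + 8

    def predict_return(self):
        """Predicted target for ret/retl, or None if the stack is empty."""
        return self._entries.pop() if self._entries else None


ras = ReturnAddressStack()
ras.on_call(0x1000, 0x2000)   # ordinary subroutine call
ras.on_call(0x2000, 0x2008)   # call .+8, used only to read the PC
first = ras.predict_return()
second = ras.predict_return()
print(hex(first), second)
```

Because the call .+8 is not recorded, the entry pushed by the ordinary call is still on top when the subroutine returns, which is the effect the text describes.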

# A.24 Implementation-Dependent Instructions

| Opcode  | op3     | Operation                              |
|---------|---------|----------------------------------------|
| IMPDEP1 | 11 0110 | Implementation-Dependent Instruction 1 |
| IMPDEP2 | 11 0111 | Implementation-Dependent Instruction 2 |

The IMPDEP1 and IMPDEP2 instructions are completely implementation dependent. Implementation-dependent aspects include their operation, the interpretation of bits 29-25 and 18-0 in their encoding, and which (if any) exceptions they may cause.

SPARC64 VII uses IMPDEP1 to encode VIS, SUSPEND, and SLEEP instructions (impl. dep. #106), IMPDEP2A to encode the Integer Multiply-Add instructions, and IMPDEP2B to encode the Floating-Point Multiply Add/Subtract instructions (impl. dep. #106).

See I.1.2, *Implementation-Dependent and Reserved Opcodes*, in **Commonality** for information about extending the SPARC V9 instruction set by means of the implementation-dependent instructions.

**Compatibility Note** – These instructions replace the CPop*n* instructions in SPARC V8.

*Exceptions* implementation-dependent

## A.24.1 Floating-Point Multiply-Add/Subtract

SPARC64 VII uses IMPDEP2B opcode space to encode the Floating-Point Multiply-Add/Subtract instructions.

| Opcode  | Variation | Size<sup>1,2</sup>  | Operation                         |
|---------|-----------|---------------------|-----------------------------------|
| FMADDs  | 00        | 01                  | Multiply-Add Single               |
| FMADDd  | 00        | 10                  | Multiply-Add Double               |
| FMSUBs  | 01        | 01                  | Multiply-Subtract Single          |
| FMSUBd  | 01        | 10                  | Multiply-Subtract Double          |
| FNMSUBs | 10        | 01                  | Negative Multiply-Subtract Single |
| FNMSUBd | 10        | 10                  | Negative Multiply-Subtract Double |
| FNMADDs | 11        | 01                  | Negative Multiply-Add Single      |
| FNMADDd | 11        | 10                  | Negative Multiply-Add Double      |

1. For an instruction with size = 00, see Section A.24.4, *Integer Multiply-Add*.

2. Size = 11 is reserved for quad precision.

Format (5)

| 10    | rd    | 110111 | rs1   | rs3  | var | size | rs2 |
|-------|-------|--------|-------|------|-----|------|-----|
| 31 30 | 29 25 | 24 19  | 18 14 | 13 9 | 8 7 | 6 5  | 4 0 |

| Operation                  | Implementation                        |
|----------------------------|---------------------------------------|
| Multiply-Add               | $rd \leftarrow rs1 \times rs2 + rs3$  |
| Multiply-Subtract          | $rd \leftarrow rs1 \times rs2 - rs3$  |
| Negative Multiply-Subtract | $rd \leftarrow -rs1 \times rs2 + rs3$ |
| Negative Multiply-Add      | $rd \leftarrow -rs1 \times rs2 - rs3$ |

| Assembly Language Syntax |                                                                           |
|---------------|--------------------------------------------------------------------------------------|
| fmadds        | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fmaddd        | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fmsubs        | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fmsubd        | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fnmadds       | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fnmaddd       | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fnmsubs       | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fnmsubd       | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
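As a worked example of Format (5), the sketch below packs the fields from the opcode table into a 32-bit instruction word. The function name `encode_fma` is hypothetical, and the register arguments are the raw 5-bit field values; the mapping from floating-point register names to field values is deliberately left out.

```python
# (var, size) pairs transcribed from the opcode table in this section.
FMA_OPCODES = {
    "fmadds":  (0b00, 0b01), "fmaddd":  (0b00, 0b10),
    "fmsubs":  (0b01, 0b01), "fmsubd":  (0b01, 0b10),
    "fnmsubs": (0b10, 0b01), "fnmsubd": (0b10, 0b10),
    "fnmadds": (0b11, 0b01), "fnmaddd": (0b11, 0b10),
}


def encode_fma(mnemonic, rs1, rs2, rs3, rd):
    """Pack a Format (5) word; rs1, rs2, rs3, rd are raw 5-bit field values."""
    var, size = FMA_OPCODES[mnemonic]
    for field in (rs1, rs2, rs3, rd):
        if not 0 <= field < 32:
            raise ValueError("register field must fit in 5 bits")
    return (0b10 << 30          # bits 31:30
            | rd << 25          # bits 29:25
            | 0b110111 << 19    # bits 24:19, IMPDEP2
            | rs1 << 14         # bits 18:14
            | rs3 << 9          # bits 13:9
            | var << 7          # bits 8:7
            | size << 5         # bits 6:5
            | rs2)              # bits 4:0


word = encode_fma("fmaddd", rs1=0, rs2=2, rs3=4, rd=6)
print(hex(word))
```

Note that the assembly syntax lists the operands as rs1, rs2, rs3, rd, whereas the instruction word places rs3 before rs2.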

*Description* The Floating-point Multiply-Add instructions multiply the register(s) specified by the rs1 field times the register(s) specified by the rs2 field, add that product to the register(s) specified by the rs3 field, then write the result into the register(s) specified by the rd field.

The Floating-point Multiply-Subtract instructions multiply the register(s) specified by the rs1 field times the register(s) specified by the rs2 field, subtract from that product the register(s) specified by the rs3 field, and then write the result into the register(s) specified by the rd field.

The Floating-point Negative Multiply-Add instructions multiply the register(s) specified by the rs1 field times the register(s) specified by the rs2 field, *negate* the product, *subtract* from that negated value the register(s) specified by the rs3 field, and then write the result into the register(s) specified by the rd field.

The Floating-point Negative Multiply-Subtract instructions multiply the register(s) specified by the rs1 field times the register(s) specified by the rs2 field, *negate* the product, *add* that negated product to the register(s) specified by the rs3 field, and then write the result into the register(s) specified by the rd field.

The instruction is treated as fused multiply and add/subtract operations on SPARC64 VII. That is, a multiply operation is first performed with infinite precision without a rounding step, and then an add/subtract operation is performed with a complete rounding step. Consequently, at most one rounding error could be incurred.
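The difference between one rounding step and two can be demonstrated numerically. The sketch below is an illustration in Python, not a description of the hardware: it emulates a fused double-precision multiply-add by computing the exact rational result and rounding once, and compares it with a separately rounded multiply and add. The function names are hypothetical, only finite operands and round-to-nearest are covered, and no exception flags are modeled.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """a*b + c with a single rounding: exact arithmetic, then one float conversion."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a, b, c):
    """a*b + c with two roundings: the product is rounded before the add."""
    return a * b + c


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fused_multiply_add(a, b, c)        # exact product is 1 - 2**-60
separate = separate_multiply_add(a, b, c)  # product rounds to 1.0 first
print(fused, separate)
```

With these operands the rounded product is exactly 1.0, so the separate computation loses the small term entirely while the fused computation keeps it.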

**Programming Note** – SPARC64 V treats the instruction as separate multiply and add/subtract operations. That is, a multiply operation is first performed with a complete rounding step (as if it were a single multiply operation), and then an add/subtract operation is performed with a complete rounding step (as if it were a single add/subtract operation). Consequently, at most two rounding errors could be incurred. In addition, the behavior of fnmadd and fnmsub with rs1=NaN or rs2=NaN differs between SPARC64 V and SPARC64 VII. SPARC64 VII outputs one of the NaN inputs as it is, while SPARC64 V outputs the one with the sign bit inverted.

The behavior of SPARC64 VII in handling traps in Floating-point Multiply-Add/Subtract instructions is described in TABLE A-2. If a trapping *invalid* exception or a denormal source operand with FSR.NS=1 is detected in the multiply part in the process of a Floating-point Multiply-Add/Subtract instruction, the execution of the instruction is aborted, the exception condition is recorded in FSR.cexc, the aexc is not modified, and the CPU traps with the exception condition. The add/subtract part of the instruction is only performed when the multiply-part of the instruction does not have a trapping *invalid* exception.

If there are trapping IEEE754 exception conditions in the add/subtract part, only the trapping exception condition is recorded in the cexc, and the aexc is not modified. If there are no trapping IEEE754 exception conditions, the nontrapping exception conditions of the add/subtract part are written into the cexc and the cexc is accumulated into the aexc. The boundary conditions of an *unfinished\_FPop* trap for Floating-point Multiply-Add/Subtract instructions are the same as the FMUL boundary conditions for source operands 1 and 2, and the same as the FADD ones for source operand 3 and the destination.

**TABLE A-2** IEEE754 Exceptions in Floating-Point Multiply-Add/Subtract Instructions

| FMUL | IEEE754 trap (<i>inv</i> or <i>nx</i> only) | No trap                     | No trap                                     |
|------|----------------------------------------------|-----------------------------|---------------------------------------------|
| FADD | —                                            | IEEE754 trap                | No trap                                     |
| cexc | Exception condition of FMUL                  | Exception condition of FADD | Nontrapping exception conditions of FADD    |
| aexc | No change                                    | No change                   | Logical OR of the cexc (above) and the aexc |

Detailed contents of cexc depending on the various conditions are described in TABLE A-3 and TABLE A-4. The following terminology is used: uf, of, inv, and nx are nontrapping IEEE exception conditions—underflow, overflow, invalid operation, and inexact, respectively.

**TABLE A-3** Non-Trapping cexc When FSR.NS = 0

| FMUL (rows) / FADD (columns) | none | nx | of nx | inv |
|------------------------------|------|----|-------|-----|
| none                         | none | nx | of nx | inv |
| inv                          | inv  | —  | —     | inv |

| FMUL (rows) / FADD (columns) | none | nx | of nx | uf nx | inv    |
|------------------------------|------|----|-------|-------|--------|
| none                         | none | nx | of nx | uf nx | inv    |
| inv                          | inv  | —  | —     | —     | inv    |
| nx                           | nx   | nx | of nx | uf nx | inv nx |

In the tables, the conditions with "—" do not exist.

**Programming Note** – The Floating-point Multiply-Add instructions are encoded in the SPARC V9 IMPDEP2 opcode space, and they are specific to the SPARC64 VII implementation. They *cannot* be used in any programs that will be executed on any other SPARC V9 processor, unless that implementation exactly matches the SPARC64 VII use of the IMPDEP2 opcode.

*Exceptions* fp\_disabled

fp\_exception\_ieee\_754 (NV, NX, OF, UF)
illegal\_instruction (size = 11<sub>2</sub>) (fp\_disabled is not checked for this encoding)
For an exception of size = 00<sub>2</sub>, see Section A.24.4, *Integer Multiply-Add*.
fp\_exception\_other (unfinished\_FPop)

## A.24.2 Suspend

| opcode               | opf         | operation        |
|----------------------|-------------|------------------|
| SUSPEND <sup>P</sup> | 0 1000 0010 | suspend a thread |

Format (3)

| 10    | —     | 110110 | —     | opf  | —   |
|-------|-------|--------|-------|------|-----|
| 31 30 | 29 25 | 24 19  | 18 14 | 13 5 | 4 0 |

| Assembly Language Syntax |  |
|--------------------------|--|
| suspend                  |  |

*Description* The instruction puts the thread that executes it into the SUSPENDED state. The instruction sets PSTATE.IE to "1". Exit conditions from the SUSPENDED state are:

- POR, WDR, XIR
- interrupt\_vector trap
- interrupt\_level\_n trap

*Exceptions:* privileged\_opcode

## A.24.3 Sleep

| opcode | opf         | operation             |
|--------|-------------|-----------------------|
| SLEEP  | 0 1000 0011 | put a thread to sleep |

Format (3)

| 10    | —     | 110110 | —     | opf  | —   |
|-------|-------|--------|-------|------|-----|
| 31 30 | 29 25 | 24 19  | 18 14 | 13 5 | 4 0 |

| Assembly Language Syntax |  |
|--------------------------|--|
| sleep                    |  |

*Description* The instruction puts the thread that executes it to sleep. Conditions to wake up are:

- POR, WDR, XIR
- interrupt\_vector trap
- interrupt\_level\_n trap
- Expiration of a certain period, which is implementation-dependent. In SPARC64 VII the period is about 1.6 microseconds. The period is measured with the clock supplied to SPARC64 VII, the same clock that is used to increment STICK.
- An update of an LBSY assigned to any of the ASI\_LBSYs of the thread. An update of an LBSY that is *not* assigned to an ASI\_LBSY does not wake up the thread.

**Note** – When the instruction is executed with PSTATE.IE=0, the thread will not wake up even if there is an *interrupt\_vector*.

**Implementation Note** – If an LBSY is updated while a hardware thread that uses the LBSY is not asleep, the next sleep instruction may not put the thread to sleep.

If a given thread (A) executes the SLEEP instruction while the other thread (B) in the same core is already in the sleep state, then the thread (A) is relegated to the sleep state and the thread (B) wakes up instead.

*Exceptions:* None
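The SUSPEND and SLEEP encodings follow directly from the Format (3) layouts and opf values given in A.24.2 and A.24.3. The sketch below computes the two instruction words; the function name `encode_thread_control` is hypothetical, and the reserved fields are set to 0 as the format conventions of this appendix require.

```python
# opf values transcribed from the opcode tables of A.24.2 and A.24.3.
OPF = {
    "suspend": 0b0_1000_0010,
    "sleep":   0b0_1000_0011,
}


def encode_thread_control(mnemonic):
    """Return the 32-bit word for SUSPEND or SLEEP (reserved fields are 0)."""
    return (0b10 << 30              # bits 31:30
            | 0b110110 << 19        # bits 24:19, IMPDEP1
            | OPF[mnemonic] << 5)   # bits 13:5, opf


suspend_word = encode_thread_control("suspend")
sleep_word = encode_thread_control("sleep")
print(hex(suspend_word), hex(sleep_word))
```

One possible use of such constants is emitting the instruction as a data word when an assembler does not recognize the mnemonic; whether a given toolchain needs this is not addressed by this document.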

## A.24.4 Integer Multiply-Add

SPARC64 VII uses IMPDEP2A opcode space to encode the Integer Multiply-Add instructions.

| Opcode    | Variation | Size <sup>1</sup> | Operation                                      |
|-----------|-----------|-------------------|------------------------------------------------|
| FPMADDX   | 00        | 00                | Unsigned Integer Multiply-Add for lower 8-byte |
| FPMADDXHI | 01        | 00                | Unsigned Integer Multiply-Add for upper 8-byte |

1. For an instruction with size = 01, 10, or 11, see Section A.24.1, *Floating-Point Multiply-Add/Subtract*.

Format (5)

| 10    | rd    | 110111 | rs1   | rs3  | var | size | rs2 |
|-------|-------|--------|-------|------|-----|------|-----|
| 31 30 | 29 25 | 24 19  | 18 14 | 13 9 | 8 7 | 6 5  | 4 0 |

| Assembly Language Syntax |                                                                                      |
|--------------------------|--------------------------------------------------------------------------------------|
| fpmaddx                  | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |
| fpmaddxhi                | freg <sub>rs1</sub> , freg <sub>rs2</sub> , freg <sub>rs3</sub> , freg <sub>rd</sub> |

*Description* The Integer Multiply-Add instructions perform a fused multiply-add operation on data in double-precision floating-point registers that contain unsigned 8-byte integer values.

FPMADDX multiplies the register specified by the rs1 field and the rs2 field, adds that product to the register specified by the rs3 field, then writes the lower 8-byte result into the register specified by the rd field. rs1, rs2 and rs3 all contain unsigned 8-byte integer values.

FPMADDXHI multiplies the register specified by the rs1 field and the rs2 field, adds that product to the register specified by the rs3 field, then writes the upper 8-byte result into the register specified by the rd field. rs1, rs2 and rs3 all contain unsigned 8-byte integer values.

FPMADDX and FPMADDXHI never alter any bit of %fsr.
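The arithmetic of the two instructions can be written out with Python's arbitrary-precision integers. This is a sketch of the numeric result only; the function names are hypothetical and the operands are plain integers standing in for the 64-bit register contents.

```python
MASK64 = (1 << 64) - 1


def fpmaddx(rs1, rs2, rs3):
    """Lower 8 bytes of the unsigned result rs1 * rs2 + rs3."""
    return (rs1 * rs2 + rs3) & MASK64


def fpmaddxhi(rs1, rs2, rs3):
    """Upper 8 bytes of the unsigned result rs1 * rs2 + rs3."""
    return ((rs1 * rs2 + rs3) >> 64) & MASK64


# Largest possible operands: (2**64 - 1)**2 + (2**64 - 1) still fits in 128 bits.
big = MASK64
low = fpmaddx(big, big, big)
high = fpmaddxhi(big, big, big)
print(hex(low), hex(high))
```

Used as a pair, the two results give the full 128-bit value, which is the building block of multi-word multiplication.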

Although FPMADDX and FPMADDXHI are IMPDEP2 instructions, they are not counted by the *Impdep2\_instruction* performance counter. See Section Q.2.1, *Instruction and trap Statistics*, on page 222 for details.

*Exceptions:* fp\_disabled

*illegal\_instruction* (var = $10_2$ or $11_2$)
For an exception of size = $01_2$, $10_2$, or $11_2$, see Section A.24.1, *Floating-Point Multiply-Add/Subtract*.

# A.25 Jump and Link

SPARC64 VII clears the upper 32 bits of the PC value in r[rd] when PSTATE.AM is set (impl. dep. #125). The value written into r[rd] is visible to the instruction in the delay slot.

If either of the low-order two bits of the jump address is nonzero, a *mem\_address\_not\_aligned* exception occurs. However, when the JMPL instruction causes a *mem\_address\_not\_aligned* trap, DSFSR and DSFAR are not updated (impl. dep. #237).
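The two rules above, masking the link value under PSTATE.AM and the alignment check on the jump address, can be sketched as follows. The function and exception names are hypothetical, and the sketch models only what this section states; in particular, it does not model any effect of PSTATE.AM on the jump target itself.

```python
class MemAddressNotAligned(Exception):
    """Stands in for the mem_address_not_aligned trap."""


def jmpl(pc, target, pstate_am):
    """Return (value written to r[rd], jump target) for a JMPL located at pc."""
    if target & 0b11:
        # Either of the low-order two bits is nonzero: the instruction traps.
        raise MemAddressNotAligned(hex(target))
    link = pc & 0xFFFFFFFF if pstate_am else pc   # upper 32 bits cleared if AM is set
    return link, target


masked = jmpl(0x1_0000_4000, 0x2000, pstate_am=True)
unmasked = jmpl(0x1_0000_4000, 0x2000, pstate_am=False)
print([hex(v) for v in masked], [hex(v) for v in unmasked])
```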

If the JMPL instruction has rd = 15, SPARC64 VII stores PC + 8 in a hardware table called the return address stack (RAS). When a RET (jmpl %i7+8, %g0) or RETL (jmpl %o7+8, %g0) is executed, the value in the RAS is used to predict the return address.

JMPL with rd = 0 can be used to return from a subroutine. The typical return address is "r[31] + 8" if a non-leaf routine (one that uses the SAVE instruction) is entered by a CALL instruction, or "r[15] + 8" if a leaf routine (one that does not use the SAVE instruction) is entered by a CALL instruction or by a JMPL instruction with rd = 15.

# A.30 Load Quadword, Atomic [Physical]

The Load Quadword ASIs in this section are specific to SPARC64 VII, as an extension to SPARC JPS1.

| opcode | imm_asi             | ASI value        | operation                                                |
|--------|---------------------|------------------|----------------------------------------------------------|
| LDDA   | ASI_QUAD_LDD_PHYS   | 34 <sub>16</sub> | 128-bit atomic load, physically addressed                |
| LDDA   | ASI_QUAD_LDD_PHYS_L | 3C <sub>16</sub> | 128-bit atomic load, little-endian, physically addressed |

Format (3) LDDA

| 11    | rd    | 010011 | rs1   | i=0 | imm_asi | rs2 |
|-------|-------|--------|-------|-----|---------|-----|
| 11    | rd    | 010011 | rs1   | i=1 | simm_13 |     |
| 31 30 | 29 25 | 24 19  | 18 14 | 13  | 12 5    | 4 0 |

| Assembly Language Syntax |                                        |  |
|--------------------------|----------------------------------------|--|
| ldda                     | [reg_addr] imm_asi, reg <sub>rd</sub>  |  |
| ldda                     | [reg_plus_imm] %asi, reg <sub>rd</sub> |  |

*Description* ASIs $34_{16}$ and $3C_{16}$ are used with the LDDA instruction to atomically read a 128-bit data item, using physical addressing. The data are placed in an even/odd pair of 64-bit registers. The lower-addressed 64 bits are placed in the even-numbered register; the higher-addressed 64 bits are placed in the odd-numbered register. The reference is made from the nucleus context.

In addition to the usual traps for LDDA using a privileged ASI, a *data\_access\_exception* exception occurs for a noncacheable access or for the use of the quadword-load ASIs with any instruction other than LDDA. A *mem\_address\_not\_aligned* exception is generated if the access is not aligned on a 16-byte boundary.

ASIs  $34_{16}$  and  $3C_{16}$  are supported in SPARC64 VII in addition to those for Load Quadword Atomic for virtually addressed data (ASIs  $24_{16}$  and  $2C_{16}$ ).

The memory access for a load quad instruction with ASI\_QUAD\_LDD\_PHYS{\_L} behaves as if the following TTE fields were set:

- TTE.NFO = 0
- TTE.CP = 1
- TTE.CV = 0
- TTE.E = 0
- TTE.P = 1
- TTE.W = 0

**Note** – TTE.IE depends on the endianness of the ASI. When the ASI is $034_{16}$, TTE.IE = 0; when the ASI is $03C_{16}$, TTE.IE = 1.

Therefore, the atomic quad load physical instruction can only be applied to a cacheable memory area. Semantically, ASI\_QUAD\_LDD\_PHYS{\_L} ($034_{16}$ and $03C_{16}$) is a combination of ASI\_NUCLEUS\_QUAD\_LDD and ASI\_PHYS\_USE\_EC.

With respect to little endian memory, a Load Quadword Atomic instruction behaves as if it comprises two 64-bit loads, each of which is byte-swapped independently before being written into its respective destination register.
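The register placement and the little-endian behavior can be illustrated with a short model. The function name `quad_load` and the use of a `bytes` object to stand in for physical memory are assumptions of this sketch; atomicity, ASI checks, and the other traps are not modeled.

```python
def quad_load(memory, pa, little_endian=False):
    """Model the data path of a 128-bit load: returns (even_reg, odd_reg).

    memory: bytes-like object standing in for physical memory (assumption).
    pa: physical address, which must be aligned on a 16-byte boundary.
    """
    if pa % 16:
        raise ValueError("mem_address_not_aligned")
    order = "little" if little_endian else "big"
    # Each 64-bit half is converted on its own, so the little-endian variant
    # byte-swaps the two halves independently and never exchanges them.
    even = int.from_bytes(memory[pa:pa + 8], order)       # lower-addressed 64 bits
    odd = int.from_bytes(memory[pa + 8:pa + 16], order)   # higher-addressed 64 bits
    return even, odd


memory = bytes(range(32))
big = quad_load(memory, 16)
little = quad_load(memory, 16, little_endian=True)
print([hex(v) for v in big], [hex(v) for v in little])
```

In both variants the lower-addressed half lands in the even-numbered register; only the byte order within each register differs.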

*Exceptions:* privileged\_action

PA\_watchpoint (recognized on only the first 8 bytes of a transfer)
illegal\_instruction (misaligned rd)
mem\_address\_not\_aligned
data\_access\_exception
data\_access\_error
fast\_data\_access\_MMU\_miss
fast\_data\_access\_protection

# A.35 Memory Barrier

Format (3)



Description The memory barrier instruction, MEMBAR, has two complementary functions: to express order constraints between memory references and to provide explicit control of memory-reference completion. The membar\_mask field in the suggested assembly language is the concatenation of the cmask and mmask instruction fields.

The mmask field is encoded in bits 3 through 0 of the instruction. TABLE A-5 specifies the order constraint that each bit of mmask (selected when set to 1) imposes on memory references appearing before and after the MEMBAR. From zero to four mask bits can be selected in the mmask field.

| Mask Bit | Name        | Description |
|----------|-------------|-------------|
| mmask<3> | #StoreStore | The effects of all stores appearing before the MEMBAR instruction must be visible to all processors before the effect of any stores following the MEMBAR. Equivalent to the deprecated STBAR instruction. Has no effect on SPARC64 VII since all stores are performed in program order. |
| mmask<2> | #LoadStore  | All loads appearing before the MEMBAR instruction must have been performed before the effects of any stores following the MEMBAR are visible to any other processor. This has no effect on SPARC64 VII since all stores are performed in program order and must occur after performance of any load. |
| mmask<1> | #StoreLoad  | The effects of all stores appearing before the MEMBAR instruction must be visible to all processors before loads following the MEMBAR may be performed. |
| mmask<0> | #LoadLoad   | All loads appearing before the MEMBAR instruction must have been performed before any loads following the MEMBAR may be performed. This has no effect on SPARC64 VII since all loads are performed after any prior loads. |

TABLE A-5 Order Constraints Imposed by mmask Bits

The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask field, described in TABLE A-6, specify additional constraints on the order of memory references and the processing of instructions. If cmask is zero, then MEMBAR enforces the partial ordering specified by the mmask field; if cmask is nonzero, then completion and partial order constraints are applied.

| Mask Bit | Function                   | Name       | Description                                                                                                                                                                                                                         |
|----------|----------------------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| cmask<2> | Synchronization<br>barrier | #Sync      | All operations (including nonmemory reference operations)<br>appearing before the MEMBAR must have been performed, and the<br>effects of any exceptions become visible before any instruction after<br>the MEMBAR may be initiated. |
| cmask<1> | Memory issue<br>barrier    | #MemIssue  | All memory reference operations appearing before the MEMBAR must<br>have been performed before any memory operation after the<br>MEMBAR may be initiated. Equivalent to #Sync in SPARC64 VII.                                       |
| cmask<0> | Lookaside<br>barrier       | #Lookaside | A store appearing before the MEMBAR must complete before any load following the MEMBAR referencing the same address can be initiated. Equivalent to #Sync in SPARC64 VII.                                                           |

TABLE A-6 Bits in the cmask Field
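As an illustration of how the two fields combine, the sketch below builds the 7-bit membar\_mask from option names and then reduces it to the constraints that TABLE A-5 and TABLE A-6 describe as having an effect on SPARC64 VII. It is a reading of the two tables, not a description of the hardware; the function names `membar_mask` and `effective_on_sparc64_vii` are hypothetical.

```python
# Bit positions within each field, taken from TABLE A-5 and TABLE A-6.
MMASK_BITS = {"#LoadLoad": 1 << 0, "#StoreLoad": 1 << 1,
              "#LoadStore": 1 << 2, "#StoreStore": 1 << 3}
CMASK_BITS = {"#Lookaside": 1 << 0, "#MemIssue": 1 << 1, "#Sync": 1 << 2}


def membar_mask(*names):
    """Build membar_mask: cmask in bits 6:4, mmask in bits 3:0."""
    mmask = cmask = 0
    for name in names:
        if name in MMASK_BITS:
            mmask |= MMASK_BITS[name]
        elif name in CMASK_BITS:
            cmask |= CMASK_BITS[name]
        else:
            raise ValueError("unknown MEMBAR option: " + name)
    return (cmask << 4) | mmask


def effective_on_sparc64_vii(mask):
    """Reduce a mask to the constraints that are not no-ops (assumed model)."""
    cmask, mmask = (mask >> 4) & 0x7, mask & 0xF
    effective = set()
    if cmask:
        # #MemIssue and #Lookaside are described as equivalent to #Sync.
        effective.add("#Sync")
    if mmask & MMASK_BITS["#StoreLoad"]:
        # The only mmask bit not described as having no effect.
        effective.add("#StoreLoad")
    return effective


print(hex(membar_mask("#Sync", "#StoreLoad")))
print(effective_on_sparc64_vii(membar_mask("#LoadLoad", "#StoreStore")))
```

Under this reading, a MEMBAR that selects only #LoadLoad, #LoadStore, or #StoreStore imposes no additional ordering on SPARC64 VII.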

## A.42 Partial Store (VIS I)

Please refer to Section A.42 of **Commonality** for general details.

Watchpoint exceptions on partial store instructions occur conservatively on SPARC64 VII. The DCUCR Data Watchpoint masks are only checked for nonzero value (watchpoint enabled). The byte store mask (r[rs2]) in the partial store instruction is ignored, and a watchpoint exception can occur even if the mask is zero (that is, no store will take place) (impl. dep. #249).

**Implementation Note** – For a partial store instruction to a noncacheable area with mask = 0, SPARC64 VII still issues a Jupiter Bus transaction with zero-byte mask.

*Exceptions:*

- fp\_disabled
- PA\_watchpoint
- VA\_watchpoint
- illegal\_instruction (i = 1)
- mem\_address\_not\_aligned (see Partial Store ASIs on page 140)
- data\_access\_exception (see Partial Store ASIs on page 140)
- LDDF\_mem\_address\_not\_aligned (see Partial Store ASIs on page 140)
- data\_access\_error
- fast\_data\_access\_MMU\_miss
- fast\_data\_access\_protection

# A.48 Population Count

| opcode | op3     | operation        |
|--------|---------|------------------|
| POPC   | 10 1110 | Population Count |

Format (3)

| Bits  | 31:30 | 29:25 | 24:19 | 18:14  | 13  | 12:5         | 4:0         |
|-------|-------|-------|-------|--------|-----|--------------|-------------|
| i = 0 | 10    | rd    | op3   | 0 0000 | i=0 | —            | rs2         |
| i = 1 | 10    | rd    | op3   | 0 0000 | i=1 | simm13<12:5> | simm13<4:0> |

| Assembly Language Syntax |                              |
|--------------------------|------------------------------|
| popc                     | reg_or_imm, reg<sub>rd</sub> |

**Description** POPC counts the number of one bits in r[rs2] if i = 0, or the number of one bits in sign\_ext(simm13) if i = 1, and stores the count in r[rd]. This instruction does not modify the condition codes.

**Note** – Unlike SPARC64 V, SPARC64 VII implements the instruction in hardware.

*Exceptions:* illegal\_instruction (instruction<18:14>  $\neq$  0)
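The operation described above can be modeled in a few lines. The sketch below is an illustrative software model only (SPARC64 VII performs the operation in hardware); the names `popc` and `sign_ext13` and the argument layout are assumptions.

```python
MASK64 = (1 << 64) - 1


def sign_ext13(simm13):
    """Sign-extend a 13-bit immediate to a 64-bit two's-complement value."""
    simm13 &= 0x1FFF
    if simm13 & 0x1000:          # bit 12 is the sign bit
        simm13 -= 0x2000
    return simm13 & MASK64


def popc(i, rs2_value=0, simm13=0):
    """Count one bits in r[rs2] (i = 0) or in sign_ext(simm13) (i = 1)."""
    operand = sign_ext13(simm13) if i else (rs2_value & MASK64)
    return bin(operand).count("1")


print(popc(0, rs2_value=0xFF))
print(popc(1, simm13=0x1FFF))    # immediate -1: every bit of the operand is set
```

Note that a negative immediate is sign-extended before counting, so the count includes the replicated sign bits.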

# A.49 Prefetch Data

Please refer to Section A.49, Prefetch Data, of Commonality for principal information.

The prefetcha instruction of SPARC64 VII works for the following ASIs.

- ASI\_PRIMARY (080<sub>16</sub>), ASI\_PRIMARY\_LITTLE (088<sub>16</sub>)
- ASI\_SECONDARY (081<sub>16</sub>), ASI\_SECONDARY\_LITTLE (089<sub>16</sub>)
- ASI\_NUCLEUS (04<sub>16</sub>), ASI\_NUCLEUS\_LITTLE (0C<sub>16</sub>)
- ASI\_PRIMARY\_AS\_IF\_USER (010<sub>16</sub>), ASI\_PRIMARY\_AS\_IF\_USER\_LITTLE (018<sub>16</sub>)
- ASI\_SECONDARY\_AS\_IF\_USER (011<sub>16</sub>), ASI\_SECONDARY\_AS\_IF\_USER\_LITTLE (019<sub>16</sub>)

If an ASI other than the above is specified, prefetcha is executed as a nop.

TABLE A-7 describes prefetch variants implemented in SPARC64 VII.

| fcn   | Fetch to:                | Status | Description                                 |
|-------|--------------------------|--------|---------------------------------------------|
| 0     | L1D                      | S,E    |                                             |
| 1     | L2                       | S,E    |                                             |
| 2     | L1D                      | M,E    |                                             |
| 3     | L2                       | M,E    |                                             |
| 4     | —                        | —      | NOP                                         |
| 5-15  | reserved (SPARC V9)      |        | illegal_instruction exception is signalled. |
| 16-19 | implementation dependent |        | NOP                                         |
| 20    | L1D                      | S,E    | Strong Prefetch                             |
| 21    | L2                       | S,E    | Strong Prefetch                             |
| 22    | L1D                      | M,E    | Strong Prefetch                             |
| 23    | L2                       | M,E    | Strong Prefetch                             |
| 24-31 | implementation dependent |        | NOP                                         |

TABLE A-7 Prefetch Variants
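The fcn encoding in TABLE A-7 can be summarized as a small decoder. This is a sketch for reading the table, with a hypothetical function name and return layout; it does not model ASI checks or any other condition outside the table.

```python
def decode_prefetch_fcn(fcn):
    """Return (action, cache_level, line_status, strong) for a 5-bit fcn."""
    if not 0 <= fcn <= 31:
        raise ValueError("fcn is a 5-bit field")
    if 5 <= fcn <= 15:
        # Reserved by SPARC V9.
        return ("illegal_instruction", None, None, False)
    strong = 20 <= fcn <= 23
    variant = fcn - 20 if strong else fcn
    if strong or fcn <= 3:
        # Variants 0 and 2 fetch to L1D, 1 and 3 to L2;
        # variants 0 and 1 request S,E status, 2 and 3 request M,E.
        level = "L1D" if variant % 2 == 0 else "L2"
        status = "S,E" if variant < 2 else "M,E"
        return ("prefetch", level, status, strong)
    # fcn 4, 16-19, and 24-31 are executed as NOP.
    return ("nop", None, None, False)


for fcn in (0, 3, 4, 10, 22):
    print(fcn, decode_prefetch_fcn(fcn))
```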

## Strong Prefetch

A prefetch with fcn = 20, 21, 22 or 23 is defined as a Strong Prefetch. In SPARC64 VII, these prefetches are never lost, except on a TLB miss or when DCUCR.weak\_spca = 1.

**Programming Note** – A prefetch that is not strong may be dropped for lack of internal resources, whereas a strong prefetch is always executed in these cases. This can adversely affect subsequent loads and stores. Avoid using strong prefetch for data that is not needed.

SPARC64 VII does not cause a *fast\_data\_access\_MMU\_miss* exception on fcn = 20, 21, 22 or 23 (impl. dep. #103(2)).

# A.51 Read State Register

In SPARC64 VII, an RDPCR instruction will generate a *privileged\_action* exception if PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR will not cause any access privilege violation exceptions (impl. dep. #250).

# A.59 SHUTDOWN (VIS I)

In SPARC64 VII, SHUTDOWN acts as a NOP in privileged mode (impl. dep. #206).

# A.70 Write State Register

In SPARC64 VII, a WRPCR instruction will cause a *privileged\_action* exception if PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0, WRPCR causes a *privileged\_action* exception only when an attempt is made to change (that is, write 1 to) PCR.PRIV (impl. dep. #250).
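The privilege rules for PCR access, here and for RDPCR in A.51, reduce to two short predicates. The sketch below is illustrative; the function names are hypothetical, and the assumption that no *privileged\_action* exception occurs when PSTATE.PRIV = 1 is an inference, since the text only describes the nonprivileged cases.

```python
def rdpcr_traps(pstate_priv, pcr_priv):
    """True if RDPCR raises privileged_action."""
    return pstate_priv == 0 and pcr_priv == 1


def wrpcr_traps(pstate_priv, pcr_priv, new_pcr_priv):
    """True if WRPCR raises privileged_action.

    new_pcr_priv is the PCR.PRIV bit of the value being written.
    """
    if pstate_priv == 1:
        return False             # assumption: privileged code is not restricted
    if pcr_priv == 1:
        return True              # nonprivileged access is locked out
    return new_pcr_priv == 1     # nonprivileged code may not set PCR.PRIV


print(wrpcr_traps(pstate_priv=0, pcr_priv=0, new_pcr_priv=1))
print(wrpcr_traps(pstate_priv=0, pcr_priv=0, new_pcr_priv=0))
```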

# A.71 Deprecated Instructions

The deprecated instructions in A.71 of **Commonality** are provided only for compatibility with previous versions of the architecture. They should not be used in new software.

## A.71.10 Store Barrier

In SPARC64 VII, STBAR behaves as NOP since the hardware memory models always enforce the semantics of these MEMBARs for all memory accesses.

# IEEE Std. 754-1985 Requirements for SPARC-V9

The IEEE Std. 754-1985 floating-point standard contains a number of implementation dependencies.

Please see Appendix B of **Commonality** for choices for these implementation dependencies, to ensure that SPARC V9 implementations are as consistent as possible.

Following is information specific to the SPARC64 VII implementation of SPARC V9 in these sections:

- Traps Inhibiting Results
- Floating-Point Nonstandard Mode

## B.1 Traps Inhibiting Results

Please refer to Section B.1 of Commonality.

The SPARC64 VII hardware, in conjunction with kernel or emulation code, produces the results described in this section.

## B.6 Floating-Point Nonstandard Mode

In this section, the hardware boundary conditions for the *unfinished\_FPop* exception and the nonstandard mode of SPARC64 VII floating-point hardware are discussed.

SPARC64 VII floating-point hardware has a specific range of computation. If either the values of the input operands or the value of the intermediate result shows that the computation may not fall in the range that the hardware provides, SPARC64 VII generates an *fp\_exception\_other* exception (tt = 022<sub>16</sub>) with FSR.ftt = 02<sub>16</sub> (*unfinished\_FPop*), and the operation is taken over by software.

The kernel emulation routine completes the remaining floating-point operation in accordance with the IEEE 754-1985 floating-point standard (impl. dep. #3).

SPARC64 VII implements a nonstandard mode, enabled when FSR.NS is set (see *FSR\_nonstandard\_fp (NS)* on page 16). The floating-point behavior of SPARC64 VII varies with the setting of FSR.NS.

## B.6.1 *fp\_exception\_other* Exception (ftt=*unfinished\_FPop*)

SPARC64 VII may invoke an *fp\_exception\_other* (tt = 022<sub>16</sub>) exception with FSR.ftt = *unfinished\_FPop* (ftt = 02<sub>16</sub>) in the FsTOd, FdTOs, FADD(s, d), FSUB(s, d), FsMULd, FMUL(s, d), FDIV(s, d), and FSQRT(s, d) floating-point instructions. In addition, the Floating-point Multiply-Add/Subtract instructions FMADD(s,d), FMSUB(s,d), FNMSUB(s,d), and FNMADD(s,d) can generate the exception, since each is a combination of a multiply and an add/subtract operation.

The following basic policies govern the detection of boundary conditions:

1. When one of the operands is a denormalized number and the other operand is a normal non-zero floating-point number (except for a NaN or an infinity), an *fp\_exception\_other* with *unfinished\_FPop* condition is signalled. The cases in which the result is a zero or an overflow are excluded.
2. When all operands are denormalized numbers, except for the cases in which the result is a zero or an overflow, an *fp\_exception\_other* with *unfinished\_FPop* condition is signalled.
3. When all operands are normal, the result before rounding is a denormalized number, and TEM.UFM = 0, an *fp\_exception\_other* with *unfinished\_FPop* condition is signalled, except for the cases in which the result is a zero.

When the result is expected to be a constant, such as an exact zero or an infinity, and a trivial computation suffices to produce it, SPARC64 VII tries to calculate the result without signalling an *unfinished\_FPop* exception.

**Implementation Note** – Detecting the exact boundary conditions requires a large amount of hardware. To avoid this hardware cost, SPARC64 VII detects approximate boundary conditions by calculating the exponent of the intermediate result (the exponent before rounding) from the input operands. Since the computation of the boundary conditions is approximate, the detection of a zero result or an overflow result is pessimistic, and SPARC64 VII generates an *unfinished\_FPop* exception pessimistically.

The equations to calculate the result exponent to detect the boundary conditions from the input exponents are presented in TABLE B-1, where Er is the approximation of the biased result exponent before rounding and is calculated only from the input exponents (esrc1, esrc2). Er is to be used for detecting the boundary condition for an *unfinished\_FPop*.

TABLE B-1 Result Exponent Approximation for Detecting unfinished\_FPop Boundary Conditions

| Operation | Formula                   |
|-----------|---------------------------|
| fmuls     | Er = esrc1 + esrc2 - 126  |
| fmuld     | Er = esrc1 + esrc2 - 1022 |
| fdivs     | Er = esrc1 - esrc2 + 126  |
| fdivd     | Er = esrc1 - esrc2 + 1022 |

esrc1 and esrc2 are the biased exponents of the input operands. When the corresponding input operand is a denormalized number, the value is 0.
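The formulas in TABLE B-1 can be evaluated directly from the exponent fields of the operands. The following sketch is illustrative only: the helper names are hypothetical, and the exponent fields are extracted with Python's `struct` module rather than from any SPARC register.

```python
import struct

# Constant added or subtracted per operation, from TABLE B-1.
ADJUST = {"fmuls": 126, "fmuld": 1022, "fdivs": 126, "fdivd": 1022}


def biased_exponent_single(x):
    """8-bit biased exponent field of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def biased_exponent_double(x):
    """11-bit biased exponent field of x as an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def approx_result_exponent(op, esrc1, esrc2):
    """Er: approximate biased result exponent before rounding."""
    if op in ("fmuls", "fmuld"):
        return esrc1 + esrc2 - ADJUST[op]
    if op in ("fdivs", "fdivd"):
        return esrc1 - esrc2 + ADJUST[op]
    raise ValueError("TABLE B-1 has no formula for " + op)


# A denormalized operand has an exponent field of 0, as the text requires.
e1 = biased_exponent_double(5e-324)   # smallest denormalized double
e2 = biased_exponent_double(1.0)
print(e1, e2, approx_result_exponent("fmuld", e1, e2))
```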

From Er, eres is calculated. eres is a biased result exponent, after mantissa alignment and before rounding, where the appropriate adjustment of the exponent is applied to the result mantissa: left-shifting or right-shifting the mantissa to the implicit 1 at the left of the binary point, subtracting or adding the shift-amount to the exponent. The result mantissa is assumed to be 1.xxxx in calculating eres. If the result is a denormalized number, eres is less than zero.

TABLE B-2 describes the boundary condition of each floating-point instruction that generates an *unfinished\_FPop* exception.

| Operation | Boundary Conditions |
|-----------|---------------------|
| FdTOs | -25 < eres < 1 and TEM.UFM = 0. |
| FsTOd | Second operand (rs2) is a denormalized number. |
| FADDs, FSUBs, FADDd, FSUBd | 1. One of the operands is a denormalized number, and the other operand is a normal, nonzero floating-point number (except for a NaN and an infinity)<sup>1</sup>. |
| | 2. Both operands are denormalized numbers. |
| | 3. Both operands are normal, nonzero floating-point numbers (except for a NaN and an infinity), eres < 1, and TEM.UFM = 0. |
| FMULs, FMULd | 1. One of the operands is a denormalized number, the other operand is a normal, nonzero floating-point number (except for a NaN and an infinity), and<br>single precision: -25 < Er<br>double precision: -54 < Er |
| | 2. Both operands are normal, nonzero floating-point numbers (except for a NaN and an infinity), TEM.UFM = 0, and<br>single precision: -25 < eres < 1<br>double precision: -54 < eres < 1 |
| FsMULd | 1. One of the operands is a denormalized number, and the other operand is a normal, nonzero floating-point number (except for a NaN and an infinity). |
| | 2. Both operands are denormalized numbers. |
| FDIVs, FDIVd | 1. The dividend (operand1; rs1) is a normal, nonzero floating-point number (except for a NaN and an infinity), the divisor (operand2; rs2) is a denormalized number, and<br>single precision: Er < 255<br>double precision: Er < 2047 |
| | 2. The dividend (operand1; rs1) is a denormalized number, the divisor (operand2; rs2) is a normal, nonzero floating-point number (except for a NaN and an infinity), and<br>single precision: -25 < Er<br>double precision: -54 < Er |
| | 3. Both operands are denormalized numbers. |
| | 4. Both operands are normal, nonzero floating-point numbers (except for a NaN and an infinity), TEM.UFM = 0, and<br>single precision: -25 < eres < 1<br>double precision: -54 < eres < 1 |
| FSQRTs, FSQRTd | The input operand (operand2; rs2) is a positive nonzero and is a denormalized number. |
| FMADDs, FMADDd, FMSUBs, FMSUBd, FNMADDs, FNMADDd, FNMSUBs, FNMSUBd | Same as FMULs, FMULd for the multiplication part, and same as FADDs, FSUBs, FADDd, FSUBd for the addition/subtraction part. |

TABLE B-2 unfinished\_FPop Boundary Conditions

1. An operation on a zero and a denormalized number generates a result in accordance with the IEEE 754-1985 standard.

## Pessimistic Zero

If a condition in TABLE B-3 is true, SPARC64 VII generates the result as a pessimistic zero, meaning that the result is a minimum denormalized number or a zero, depending on the rounding mode (FSR.RD).

| Operations   | One operand is denormalized<sup>1</sup>                          | Both are denormalized | Both are normal fp-numbers<sup>2</sup>                               |
|--------------|------------------------------------------------------------------|-----------------------|----------------------------------------------------------------------|
| FdTOs        | Always                                                           | —                     | $eres \le -25$                                                       |
| FMULs, FMULd | single precision: $Er \le -25$<br>double precision: $Er \le -54$ | Always                | single precision: $eres \le -25$<br>double precision: $eres \le -54$ |
| FDIVs, FDIVd | single precision: $Er \le -25$<br>double precision: $Er \le -54$ | Never                 | single precision: $eres \le -25$<br>double precision: $eres \le -54$ |

TABLE B-3 Conditions for a Pessimistic Zero

1. Both operands are non-zero, non-NaN, and non-infinity numbers.

2. Both may be zero, but both are non-NaN and non-infinity numbers.
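To show how the thresholds in TABLE B-2 and TABLE B-3 partition the exponent range, the sketch below classifies an FMULs or FMULd operation with FSR.NS = 0. It is an interpretation of the tables, not a hardware description. The function name and the `denorm_inputs` argument are hypothetical, operands are assumed to be nonzero, non-NaN, and finite, and the "UF trap" outcome for TEM.UFM = 1 is taken from TABLE B-5.

```python
LIMIT = {"single": -25, "double": -54}


def fmul_outcome(precision, denorm_inputs, exponent, ufm=0):
    """Classify FMULs/FMULd under FSR.NS = 0.

    denorm_inputs: number of denormalized operands (0, 1, or 2).
    exponent: Er when an operand is denormalized, eres when both are normal.
    """
    if denorm_inputs not in (0, 1, 2):
        raise ValueError("denorm_inputs must be 0, 1, or 2")
    limit = LIMIT[precision]
    if denorm_inputs == 2:
        return "pessimistic zero"            # TABLE B-3: always
    if denorm_inputs == 1:
        # TABLE B-3 covers Er <= limit, TABLE B-2 covers limit < Er.
        return "pessimistic zero" if exponent <= limit else "unfinished_FPop"
    if exponent <= limit:
        return "pessimistic zero"
    if exponent < 1:
        # Denormalized result; TABLE B-2 requires TEM.UFM = 0 for unfinished_FPop.
        return "unfinished_FPop" if ufm == 0 else "UF trap"
    return "in hardware range"


print(fmul_outcome("single", 1, -25))
print(fmul_outcome("single", 1, -24))
print(fmul_outcome("double", 0, -10, ufm=1))
```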

## Pessimistic Overflow

If a condition in TABLE B-4 is true, SPARC64 VII regards the operation as having an overflow condition.

TABLE B-4 Pessimistic Overflow Conditions

| Operations | Conditions                                                                 |
|------------|----------------------------------------------------------------------------|
| FDIVs      | The divisor (operand2; rs2) is a denormalized number, and $Er \ge 255$.    |
| FDIVd      | The divisor (operand2; rs2) is a denormalized number, and $Er \ge 2047$.   |
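As a worked example, the sketch below applies the fdivs and fdivd formulas from TABLE B-1 to a division by a denormalized number, for which esrc2 = 0, and compares Er with the thresholds above. The function name is hypothetical, and the dividend is assumed to be a normal, nonzero, finite number.

```python
# (constant from TABLE B-1, overflow threshold from TABLE B-4)
PARAMS = {"single": (126, 255), "double": (1022, 2047)}


def fdiv_denorm_divisor(precision, esrc1):
    """Return (Er, outcome) for a normal dividend and a denormalized divisor."""
    adjust, limit = PARAMS[precision]
    er = esrc1 - 0 + adjust      # esrc2 is 0 for a denormalized divisor
    if er >= limit:
        return er, "pessimistic overflow"
    return er, "unfinished_FPop"  # TABLE B-2, FDIV condition 1


# Single precision: 4.0 has biased exponent 129, 2.0 has 128.
print(fdiv_denorm_divisor("single", 129))
print(fdiv_denorm_divisor("single", 128))
```

With these formulas, a single-precision dividend whose biased exponent is 129 or greater is treated as overflowing when divided by any denormalized number.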

## B.6.2 Operation Under FSR.NS = 1

When FSR.NS = 1 (nonstandard mode), SPARC64 VII zeroes all the input denormalized operands before the operation and signals an inexact exception if enabled. If the operation generates a denormalized result, SPARC64 VII zeroes the result and also signals an inexact exception if enabled. The following list defines the operation in detail.

- If either operand is a denormalized number and both operands are non-zero, non-NaN, and non-infinity numbers, the input denormalized operand is replaced with a zero with the same sign, and the operation is performed. If enabled, an inexact exception is signalled; an *fp\_exception\_ieee\_754* (tt = 021<sub>16</sub>) is generated, with nxc=1 in FSR.cexc (FSR.ftt=01<sub>16</sub>; *IEEE754\_exception*). However, if the operation is FDIV(s, d) and either a *division\_by\_zero* or an *invalid\_operation* condition is detected, the inexact condition is not reported.
- If the result before rounding is a denormalized number, the result is flushed to a zero with the same sign, and either an underflow exception or an inexact exception is signalled, depending on FSR.TEM.
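The two rules above can be approximated in software for a double-precision multiply. The sketch below is a rough model with hypothetical names. It assumes that both operands are nonzero, non-NaN, and finite, and that all traps are masked (FSR.TEM = 0), so conditions are only accumulated as flags. It also inspects the rounded Python result rather than the result before rounding, so it can differ from the hardware at the exact boundary.

```python
import math
import struct


def is_denormal(x):
    """True if x, as an IEEE 754 double, is a nonzero denormalized number."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0


def flush(x):
    """Replace a denormalized number with a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal(x) else x


def ns_fmuld(a, b):
    """Model FMULd with FSR.NS = 1; returns (result, set of flags)."""
    flags = set()
    if is_denormal(a) or is_denormal(b):
        a, b = flush(a), flush(b)      # rule 1: zero the denormalized inputs
        flags.add("nx")
    result = a * b
    if is_denormal(result):
        result = flush(result)         # rule 2: flush a denormalized result
        flags.update({"uf", "nx"})
    return result, flags


print(ns_fmuld(-5e-324, 2.0))          # denormalized input
print(ns_fmuld(1e-200, 1e-120))        # normal inputs, denormalized product
```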

As observed from the preceding, when FSR.NS = 1, SPARC64 VII generates neither an *unfinished\_FPop* exception nor a denormalized number as a result. TABLE B-5 summarizes the behavior of SPARC64 VII floating-point hardware depending on FSR.NS.

**Note** – The result and behavior of SPARC64 VII in the shaded columns of TABLE B-5 and TABLE B-6 conform to the IEEE 754-1985 standard.

**Note** – Throughout TABLE B-5 and TABLE B-6, lowercase exception conditions such as nx, uf, of, dz and nv are nontrapping IEEE 754 exceptions. Uppercase exception conditions such as NX, UF, OF, DZ and NV are trapping IEEE 754 exceptions.

| FSR.NS | Input Denorm<sup>1</sup> | Result Denorm<sup>2</sup> | Pessimistic Zero | Pessimistic Overflow | UFM | OFM | NXM | Result                                                    |
|--------|--------------------------|---------------------------|------------------|----------------------|-----|-----|-----|-----------------------------------------------------------|
| 0      | No                       | Yes                       | Yes              | —                    | 1   | —   | —   | UF                                                        |
| 0      | No                       | Yes                       | Yes              | —                    | 0   | —   | 1   | NX                                                        |
| 0      | No                       | Yes                       | Yes              | —                    | 0   | —   | 0   | uf + nx, a signed zero, or a signed Dmin<sup>3</sup>      |
| 0      | No                       | Yes                       | No               | —                    | 1   | —   | —   | UF                                                        |
| 0      | No                       | Yes                       | No               | —                    | 0   | —   | —   | unfinished_FPop<sup>4</sup>                               |
| 0      | No                       | No                        | —                | —                    | —   | —   | —   | Conforms to IEEE754-1985                                  |
| 0      | Yes                      | n/a                       | Yes              | —                    | 1   | —   | —   | UF                                                        |
| 0      | Yes                      | n/a                       | Yes              | —                    | 0   | —   | 1   | NX                                                        |
| 0      | Yes                      | n/a                       | Yes              | —                    | 0   | —   | 0   | uf + nx, a signed zero, or a signed Dmin                  |
| 0      | Yes                      | n/a                       | No               | Yes                  | —   | 1   | —   | OF                                                        |
| 0      | Yes                      | n/a                       | No               | Yes                  | —   | 0   | 1   | NX                                                        |
| 0      | Yes                      | n/a                       | No               | Yes                  | —   | 0   | 0   | of + nx, a signed infinity, or a signed Nmax<sup>5</sup>  |
| 0      | Yes                      | n/a                       | No               | No                   | —   | —   | —   | unfinished_FPop                                           |
| 1      | No                       | Yes                       | —                | —                    | 1   | —   | —   | UF                                                        |
| 1      | No                       | Yes                       | —                | —                    | 0   | —   | 1   | NX                                                        |
| 1      | No                       | Yes                       | —                | —                    | 0   | —   | 0   | uf + nx, a signed zero                                    |
| 1      | No                       | No                        | —                | —                    | —   | —   | —   | Conforms to IEEE754-1985                                  |
| 1      | Yes                      | —                         | —                | —                    | —   | —   | —   | see TABLE B-6                                             |

TABLE B-5 Floating-Point Exceptional Conditions and Results

1. One of the operands is a denormalized number, and the other operand is a normal or a denormalized number (non-zero, non-NaN, and non-infinity).

2. The result before rounding turns out to be a denormalized number.

3. Dmin = denormalized minimum.

4. If the FPop is either FADD{s,d} or FSUB{s,d} and the operands are zero and a denormalized number, SPARC64 VII does not generate an *unfinished\_FPop* and generates a result according to the IEEE 754-1985 standard.

5. Nmax = normalized maximum.
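TABLE B-5 is effectively a decision tree, which the sketch below writes out as a function. This is one reading of the table, with a hypothetical function name and argument layout; the returned strings are the entries of the Result column, and the special case in footnote 4 is not modeled.

```python
def fp_outcome(ns, input_denorm, result_denorm=False, pess_zero=False,
               pess_overflow=False, ufm=0, ofm=0, nxm=0):
    """Look up the Result column of TABLE B-5 for the given conditions."""

    def underflow(value):
        # Shared pattern: UFM traps first, then NXM, else a nontrapping result.
        if ufm:
            return "UF"
        if nxm:
            return "NX"
        return "uf + nx, " + value

    if ns == 1:
        if input_denorm:
            return "see TABLE B-6"
        if result_denorm:
            return underflow("a signed zero")
        return "Conforms to IEEE754-1985"

    if input_denorm:
        if pess_zero:
            return underflow("a signed zero, or a signed Dmin")
        if pess_overflow:
            if ofm:
                return "OF"
            if nxm:
                return "NX"
            return "of + nx, a signed infinity, or a signed Nmax"
        return "unfinished_FPop"

    if result_denorm:
        if pess_zero:
            return underflow("a signed zero, or a signed Dmin")
        return "UF" if ufm else "unfinished_FPop"
    return "Conforms to IEEE754-1985"


print(fp_outcome(0, input_denorm=True))
print(fp_outcome(1, input_denorm=False, result_denorm=True))
```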

|                            |           | Type of Value |          |     | FSR. | TEM |     |                                   |
|----------------------------|-----------|---------------|----------|-----|------|-----|-----|-----------------------------------|
| Operations                 | op1       | op2           | op3      | UFM | NXM  | DVM | NVM | Result                            |
| FsTOd                      |           | D.            |          |     | 1    | —   | —   | NX                                |
|                            |           | Denorm        | _        |     | 0    |     |     | nx, a signed zero                 |
| FdTOs                      |           |               |          | 1   |      |     |     | UF                                |
|                            | —         | Denorm        | —        | 0   | 1    |     |     | NX                                |
|                            |           |               |          | 0   | 0    |     |     | uf + nx, a signed zero            |
| FADDs, FSUBs,<br>FADDd, FSUBd | Denorm | Normal | — | — | 1 | — | — | NX |
|  | Denorm | Normal | — | — | 0 | — | — | nx, op2 |
|  | Normal | Denorm | — | — | 1 | — | — | NX |
|  | Normal | Denorm | — | — | 0 | — | — | nx, op1 |
|  | Denorm | Denorm | — | — | 1 | — | — | NX |
|  | Denorm | Denorm | — | — | 0 | — | — | nx, a signed zero |
| FMULs, FMULd,<br>FsMULd | Denorm | — | — | — | 1 | — | — | NX |
|  | Denorm | — | — | — | 0 | — | — | nx, a signed zero |
|  | — | Denorm | — | — | 1 | — | — | NX |
|  | — | Denorm | — | — | 0 | — | — | nx, a signed zero |
| FDIVs, FDIVd | Denorm | Normal | — | — | 1 | — | — | NX |
|  | Denorm | Normal | — | — | 0 | — | — | nx, a signed zero |
|  | Normal | Denorm | — | — | — | 1 | — | DZ |
|  | Normal | Denorm | — | — | — | 0 | — | dz, a signed infinity |
|  | Denorm | Denorm | — | — | — | — | 1 | NV |
|  | Denorm | Denorm | — | — | — | — | 0 | nv, dNaN <sup>1</sup> |
| FSQRTs, FSQRTd | — | Denorm and op2 > 0 | — | — | 1 | — | — | NX |
|  | — | Denorm and op2 > 0 | — | — | 0 | — | — | nx, zero |
|  | — | Denorm and op2 < 0 | — | — | — | — | 1 | NV |
|  | — | Denorm and op2 < 0 | — | — | — | — | 0 | nv, dNaN <sup>1</sup> |
| FMADD{s,d},<br>FMSUB{s,d},<br>FNMADD{s,d},<br>FNMSUB{s,d} | Denorm | — | Normal | — | 1 | — | — | NX |
|  | Denorm | — | Normal | — | 0 | — | — | nx, op3 |
|  | Denorm | — | Denorm | — | 1 | — | — | NX |
|  | Denorm | — | Denorm | — | 0 | — | — | nx, a signed zero |
|  | — | Denorm | Normal | — | 1 | — | — | NX |
|  | — | Denorm | Normal | — | 0 | — | — | nx, op3 |
|  | — | Denorm | Denorm | — | 1 | — | — | NX |
|  | — | Denorm | Denorm | — | 0 | — | — | nx, a signed zero |
|  | Normal | Normal | Denorm | — | 1 | — | — | NX |
|  | Normal | Normal | Denorm | — | 0 | — | — | nx, op1 $\times$ op2 <sup>2</sup> |

TABLE B-6 Nonarithmetic Operations Under FSR.NS = 1

1. A single-precision dNaN is 7FFF.FFFF$_{16}$, and a double-precision dNaN is 7FFF.FFFF.FFFF.FFFF$_{16}$.

2. When op1 $\times$ op2 is a denormalized number, a zero with the same sign as op1 $\times$ op2 is returned as the result.
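
As a reading aid, the FADDd rows of TABLE B-6 can be restated as a small lookup in Python. This is a minimal sketch, not Fujitsu code: the function names are hypothetical, the FSR.NS and FSR.TEM.NXM bit positions (22 and 23) are taken from the SPARC V9 FSR layout rather than from this appendix, and the returned strings simply echo the table's result column.

```python
import math
import struct

# FSR bit positions per the SPARC V9 FSR layout (assumed; not restated in this appendix).
FSR_NS = 1 << 22    # nonstandard floating-point mode
FSR_NXM = 1 << 23   # TEM bit that enables the inexact (NX) trap

# Default NaN (dNaN) bit patterns from footnote 1.
DNAN_SINGLE = 0x7FFF_FFFF
DNAN_DOUBLE = 0x7FFF_FFFF_FFFF_FFFF


def classify(x: float) -> str:
    """Classify a double as 'denorm', 'normal', or 'other' (zero, infinity, NaN)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0:
        return "denorm" if fraction else "other"
    return "other" if exponent == 0x7FF else "normal"


def faddd_ns_result(op1: float, op2: float, fsr: int) -> str:
    """Return the TABLE B-6 result entry for FADDd, or 'not in table'."""
    kinds = (classify(op1), classify(op2))
    # The table only covers FSR.NS = 1 with normal or denormalized operands.
    if not fsr & FSR_NS or "denorm" not in kinds or "other" in kinds:
        return "not in table"
    if fsr & FSR_NXM:
        return "NX"
    if kinds == ("denorm", "normal"):
        return "nx, op2"
    if kinds == ("normal", "denorm"):
        return "nx, op1"
    return "nx, a signed zero"


def dnan_values() -> tuple:
    """Decode both dNaN bit patterns into Python floats."""
    single = struct.unpack(">f", struct.pack(">I", DNAN_SINGLE))[0]
    double = struct.unpack(">d", struct.pack(">Q", DNAN_DOUBLE))[0]
    return single, double
```

With NXM clear, `faddd_ns_result(5e-324, 1.0, FSR_NS)` returns `"nx, op2"`; setting `FSR_NXM` as well changes the result to `"NX"`. Both values returned by `dnan_values()` are NaNs.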

## Implementation Dependencies

This appendix summarizes implementation dependencies. In SPARC V9 and SPARC JPS1, the notation "**IMPL. DEP. #***nn*:" identifies the definition of an implementation dependency; the notation "(impl. dep. #*nn*)" identifies a reference to an implementation dependency. These dependencies are described by their number *nn* in TABLE C-1 on page 87. These numbers have been removed from the body of this document for SPARC64 VII to make the document more readable. TABLE C-1 has been modified to include descriptions of the manner in which SPARC64 VII has resolved each implementation dependency.

**Note** – SPARC International maintains a document, *Implementation Characteristics of Current SPARC-V9-based Products, Revision 9.x*, that describes the implementation-dependent design features of all SPARC V9-compliant implementations. Contact SPARC International for this document at

> home page: www.sparc.org email: info@sparc.org

## C.1 Definition of an Implementation Dependency

Please refer to Section C.1 of Commonality.

## C.2 Hardware Characteristics

Please refer to Section C.2 of Commonality.

## C.3 Implementation Dependency Categories

Please refer to Section C.3 of Commonality.

## C.4 List of Implementation Dependencies

TABLE C-1 provides a complete list of how each implementation dependency is treated in the SPARC64 VII implementation.

TABLE C-1 SPARC64 VII Implementation Dependencies (1 of 11)

| Nbr   | SPARC64 VII Implementation Notes                                                                                                                                                                                                  | Page |
|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 1     | <b>Software emulation of instructions</b><br>The operating system emulates all instructions that generate <i>illegal_instruction</i> or <i>unimplemented_FPop</i> exceptions.                                                     | —    |
| 2     | Number of IU registers<br>SPARC64 VII supports eight register windows (NWINDOWS = 8). SPARC64 VII supports two additional global register sets (Interrupt globals and MMU globals) for a total of 160 integer registers. | — |
| 3     | Incorrect IEEE Std. 754-1985 results<br>See Section B.6, <i>Floating-Point Nonstandard Mode</i> for details.                                                                                                                      | 77   |
| 4–5   | Reserved.                                                                                                                                                                                                                         |      |
| 6     | <b>I/O registers privileged status</b><br>This dependency is beyond the scope of this publication. It should be defined in each system that uses SPARC64 VII.                                                                     | —    |
| 7     | <b>I/O register definitions</b><br>This dependency is beyond the scope of this publication. It should be defined in each system that uses SPARC64 VII.                                                                            | —    |
| 8     | RDASR/WRASR target registers<br>SPARC64 VII does not define implementation-dependent ASRs. | — |
| 9     | RDASR/WRASR <b>privileged status</b><br>SPARC64 VII does not define implementation-dependent ASRs. | — |
| 10–12 | Reserved. | |
| 13    | VER.impl<br>VER.impl = 7 for the SPARC64 VII processor.                                                                                                                                                                           | 18   |
| 14–15 | Reserved.                                                                                                                                                                                                                         | _    |
| 16    | IU deferred-trap queue<br>SPARC64 VII neither has nor needs an IU deferred-trap queue.                                                                                                                                            | 22   |
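
The register count in item 2 and the VER.impl value in item 13 can be checked with a few lines of Python. This is a sketch under stated assumptions: the VER field positions and the relation NWINDOWS = VER.maxwin + 1 are taken from SPARC V9, the sample register value sets only impl and maxwin (all other fields are placeholder zeros, not real SPARC64 VII values), and the function names are hypothetical.

```python
def ver_fields(ver: int) -> dict:
    """Split a 64-bit VER value into its SPARC V9 fields."""
    return {
        "manuf": (ver >> 48) & 0xFFFF,
        "impl": (ver >> 32) & 0xFFFF,
        "mask": (ver >> 24) & 0xFF,
        "maxtl": (ver >> 8) & 0xFF,
        "maxwin": ver & 0x1F,
    }


def integer_register_count(nwindows: int, global_sets: int) -> int:
    """Each window contributes 16 registers (8 locals + 8 ins); each global set has 8."""
    return 16 * nwindows + 8 * global_sets


# Placeholder VER value: impl = 7 and maxwin = 7 only; other fields left at zero.
SAMPLE_VER = (7 << 32) | 7

fields = ver_fields(SAMPLE_VER)
nwindows = fields["maxwin"] + 1   # V9: maxwin is the highest valid CWP index
# Four global sets: normal, alternate, interrupt, and MMU globals.
total = integer_register_count(nwindows, global_sets=4)
print(fields["impl"], nwindows, total)   # prints: 7 8 160
```

The total follows from 16 × 8 = 128 windowed registers plus 8 × 4 = 32 global registers.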

TABLE C-1 SPARC64 VII Implementation Dependencies (2 of 11)

| Nbr   | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                                                                                                                                                                                                       | Page    |
|-------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| 17    | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                              |         |
| 18    | <b>Nonstandard IEEE 754-1985 results</b><br>SPARC64 VII flushes denormalized operands and results to zero when FSR.NS = 1. For the treatment of denormalized numbers, refer to Section B.6, <i>Floating-Point Nonstandard Mode</i>. | 16 |
| 19    | <b>FPU version,</b> FSR.ver<br>FSR.ver = 0 for SPARC64 VII. | 16 |
| 20–21 | Reserved. | |
| 22    | FPU TEM, cexc, and aexc<br>SPARC64 VII implements all bits in the TEM, cexc, and aexc fields in hardware. | 15 |
| 23    | Floating-point traps<br>In SPARC64 VII, floating-point traps are always precise; no FQ is needed. | 22 |
| 24    | <b>FPU deferred-trap queue (FQ)</b><br>SPARC64 VII neither has nor needs a floating-point deferred-trap queue.                                                                                                                                                                                                                                                                                                                                                         | 22      |
| 25    | RDPR of FQ with nonexistent FQ<br>Attempting to execute an RDPR of the FQ causes an <i>illegal_instruction</i> exception.                                                                                                                                                                                                                                                                                                                                              | 23      |
| 26–28 | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                              | _       |
| 29    | Address space identifier (ASI) definitions<br>The ASIs that are supported by SPARC64 VII are defined in Appendix L.                                                                                                                                                                                                                                                                                                                                                    | —       |
| 30    | ASI address decoding<br>SPARC64 VII decodes all 8 bits of the ASI specifier. | — |
| 31    | <b>Catastrophic error exceptions</b><br>SPARC64 VII contains a watchdog timer that times out after no instruction has been committed for a specified number of cycles. If the timer times out, the CPU tries to invoke an <i>async_data_error</i> trap. If the counter continues and reaches 2<sup>33</sup>, the processor enters error_state. Upon entry to error_state, the processor optionally generates a watchdog reset (WDR) to recover from error_state. | 162 |
| 32    | <b>Deferred traps</b><br>SPARC64 VII signals a deferred trap in a few of its severe error conditions.<br>SPARC64 VII does not contain a deferred trap queue.                                                                                                                                                                                                                                                                                                           | 37, 171 |
| 33    | <b>Trap precision</b><br>There are no deferred traps in SPARC64 VII other than the trap caused by a few severe error conditions. All traps that occur as the result of program execution are precise.                                                                                                                                                                                                                                                                  | 37      |
| 34    | <b>Interrupt clearing</b><br>For details of interrupt handling see Appendix N.                                                                                                                                                                                                                                                                                                                                                                                         | 155     |
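
The watchdog behavior described in item 31 can be summarized as a two-threshold check. The sketch below is illustrative only: the first-stage timeout is "a specified number of cycles" whose value this table does not give, so `timeout_cycles` is a hypothetical parameter, and the function name and returned labels are assumptions made for the example.

```python
ERROR_STATE_CYCLES = 2 ** 33  # counter value at which the processor enters error_state


def watchdog_action(idle_cycles: int, timeout_cycles: int) -> str:
    """Map cycles elapsed without a committed instruction to the action in item 31.

    timeout_cycles is a hypothetical first-stage timeout; it is assumed to be
    smaller than 2**33, since the counter "continues" after the first timeout.
    """
    if idle_cycles >= ERROR_STATE_CYCLES:
        return "error_state"          # optionally followed by a watchdog reset
    if idle_cycles >= timeout_cycles:
        return "async_data_error"     # the CPU tries to invoke this trap
    return "none"
```

For example, with a hypothetical `timeout_cycles` of `2**20`, `watchdog_action(2**20, 2**20)` returns `"async_data_error"` and `watchdog_action(2**33, 2**20)` returns `"error_state"`.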

TABLE C-1 SPARC64 VII Implementation Dependencies (3 of 11)

| Nbr   | SPARC64 VII Implementation Notes | Page |
|-------|----------------------------------|------|
| 35    | <b>Implementation-dependent traps</b><br>[Entry text unrecoverable in this copy; see page 39.] | 39 |
| 36    | <b>Trap priorities</b><br>SPARC64 VII's implementation-dependent traps have the following priorities:<br>interrupt_vector_trap (priority = 16)<br>PA_watchpoint (priority = 12)<br>VA_watchpoint (priority = 1)<br>ECC_error (priority = 33)<br>fast_instruction_access_MMU_miss (priority = 2)<br>fast_data_access_MMU_miss (priority = 12)<br>fast_data_access_protection (priority = 12)<br>async_data_error (priority = 2) | 39 |
| 37    | <b>Reset trap</b><br>SPARC64 VII implements power-on reset (POR) and watchdog reset. | 37 |
| 38    | <b>Effect of reset trap on implementation-dependent registers</b><br>See Section O.2, <i>RED_state and error_state</i>. | 163 |
| 39    | <b>Entering</b> error_state <b>on implementation-dependent errors</b><br>CPU watchdog timeout at $2^{33}$ ticks, a normal trap, or an SIR at TL = MAXTL causes the CPU to enter error_state. | 36 |
| 40    | Error_state <b>processor state</b><br>SPARC64 VII optionally takes a watchdog reset trap after entry to error_state. Most error-logging register states will be preserved. (See also impl. dep. #254.) | 36 |
| 41    | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                |      |
| 42    | FLUSH <b>instruction</b><br>SPARC64 VII implements the FLUSH instruction in hardware.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                | _    |
| 43    | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                |      |
| 44    | <b>Data access FPU trap</b><br>The destination register(s) are unchanged if an access error occurs.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                | —    |
| 45–46 | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                |      |
| 47    | RDASR<br>SPARC64 VII does not define this implementation dependent ASR register.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                | —    |
| 48    | WRASR<br>SPARC64 VII does not define this implementation dependent ASR register.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                | —    |

### TABLE C-1 SPARC64 VII Implementation Dependencies (4 of 11)

| Nbr    | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Page |
|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 49–54  | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |      |
| 55     | Floating-point underflow detection<br>See FSR_underflow in Section 5.1.7 of Commonality for details.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | —    |
| 56–100 | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |      |
| 101    | Maximum trap level<br>MAXTL = 5.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 18   |
| 102    | <b>Clean windows trap</b><br>SPARC64 VII generates a <i>clean_window</i> exception; register windows are cleaned in software. | —    |
| 103    | <b>Prefetch instructions</b><br>SPARC64 VII implements PREFETCH variations 0–3 and 20–23 with the following implementation-dependent characteristics:<ul> <li>The prefetches have observable effects in privileged code.</li> <li>No variant causes a <i>fast_data_access_MMU_miss</i> trap.</li> <li>All prefetches are for 64-byte cache lines, which are aligned on a 64-byte boundary.</li> <li>See Section A.49, <i>Prefetch Data</i>, for implemented variations and their characteristics.</li> <li>Prefetches work normally if the ASI is ASI_PRIMARY, ASI_SECONDARY, ASI_NUCLEUS, ASI_PRIMARY_AS_IF_USER, ASI_SECONDARY_AS_IF_USER, or one of their little-endian counterparts.</li> </ul> | 70   |
| 104    | VER.manuf<br>VER.manuf = 0004<sub>16</sub>. The least significant 8 bits are Fujitsu's JEDEC manufacturing code. | 18   |
| 105    | TICK register<br>SPARC64 VII implements 63 bits of the TICK register; it increments on every clock cycle. | 17   |
| 106    | IMPDEP <i>n</i> instructions<br>SPARC64 VII uses the IMPDEP1 opcode for SUSPEND and SLEEP instructions, and the IMPDEP2 opcode for the Multiply Add/Subtract instructions. SPARC64 VII also conforms to Sun's specification for VIS-1 and VIS-2. | 54   |
| 107    | Unimplemented LDD trap<br>SPARC64 VII implements LDD in hardware. | —    |
| 108    | <b>Unimplemented</b> STD <b>trap</b><br>SPARC64 VII implements STD in hardware. |      |
| 109    | <b>LDDF_mem_address_not_aligned</b><br>If the address is word aligned but not doubleword aligned, SPARC64 VII generates the <i>LDDF_mem_address_not_aligned</i> exception. The trap handler software emulates the instruction. | —    |
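
Several entries on this page reduce to simple bit arithmetic. The following C fragment is a minimal sketch of three of them: the 64-byte line targeted by a prefetch (impl. dep. #103), the 63-bit TICK counter (impl. dep. #105), and the alignment case that raises *LDDF_mem_address_not_aligned* (impl. dep. #109). All function and type names are hypothetical. The sketch assumes the counter occupies bits 62:0 of TICK, as in SPARC V9, and that an address that is not even word aligned takes the generic *mem_address_not_aligned* trap; neither assumption is stated in the table above.

```c
#include <stdint.h>

/* Prefetch granularity taken from impl. dep. #103. */
enum { PREFETCH_LINE_BYTES = 64 };

/* Base address of the 64-byte-aligned cache line containing addr. */
uint64_t prefetch_line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PREFETCH_LINE_BYTES - 1);
}

/* Counter field of TICK. Assumption: the 63 implemented bits are
 * bits 62:0, with bit 63 excluded (NPT in SPARC V9). */
uint64_t tick_counter(uint64_t tick)
{
    return tick & ((UINT64_C(1) << 63) - 1);
}

typedef enum {
    ALIGN_OK,                /* doubleword aligned: no alignment trap      */
    ALIGN_LDDF_NOT_ALIGNED,  /* word aligned but not doubleword aligned    */
    ALIGN_MEM_NOT_ALIGNED    /* not word aligned (assumed generic V9 trap) */
} lddf_align_t;

/* Classify the effective address of an LDDF per impl. dep. #109. */
lddf_align_t classify_lddf_address(uint64_t addr)
{
    if ((addr & 7) == 0)
        return ALIGN_OK;
    if ((addr & 3) == 0)
        return ALIGN_LDDF_NOT_ALIGNED;
    return ALIGN_MEM_NOT_ALIGNED;
}
```

For instance, under these assumptions `classify_lddf_address(0x1004)` yields `ALIGN_LDDF_NOT_ALIGNED`, the case that the trap handler software is expected to emulate.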

### TABLE C-1 SPARC64 VII Implementation Dependencies (5 of 11)

| Nbr | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                       | Page |
|-----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 110 | <b>STDF_mem_address_not_aligned</b><br>If the address is word aligned but not doubleword aligned, SPARC64 VII generates the <i>STDF_mem_address_not_aligned</i> exception. The trap handler software emulates the instruction. |      |
| 111 | <b>LDQF_mem_address_not_aligned</b><br>SPARC64 VII generates an <i>illegal_instruction</i> exception for all LDQFs. The processor does not perform the check for <i>fp_disabled</i>. The trap handler software emulates the instruction. |      |
| 112 | <b>STQF_mem_address_not_aligned</b><br>SPARC64 VII generates an <i>illegal_instruction</i> exception for all STQFs. The processor does not perform the check for <i>fp_disabled</i>. The trap handler software emulates the instruction. |      |
| 113 | <b>Implemented memory models</b><br>SPARC64 VII implements Total Store Order (TSO) for all the memory models specified in PSTATE.MM. See Chapter 8, <i>Memory Models</i>, for details. | 41   |
| 114 | RED_state <b>trap vector address</b> (RSTVaddr)<br>RSTVaddr is a constant in SPARC64 VII, where:<br>VA=FFFF FFFF F000 0000<sub>16</sub> and<br>PA=07FF F000 0000<sub>16</sub> | 36   |
| 115 | RED_state <b>processor state</b><br>See <i>RED_state</i> on page 36 for details of implementation-specific actions in RED_state. | 36   |
| 116 | SIR_enable control flag<br>See Section A.60 SIR in Commonality for details.                                                                                                                                                                                                            | —    |
| 117 | <b>MMU disabled prefetch behavior</b><br>When the MMU is disabled, a prefetch completes without memory access and a nonfaulting load causes a <i>data_access_exception</i>. | 108  |
| 118 | <b>Identifying I/O locations</b><br>This dependency is beyond the scope of this publication. It should be defined in a system that uses SPARC64 VII. | —    |
| 119 | <b>Unimplemented values for</b> PSTATE.MM<br>Writing 11<sub>2</sub> into PSTATE.MM causes the machine to use the TSO memory model. However, the encoding 11<sub>2</sub> should not be used, since future versions of SPARC64 VII may use this encoding for a new memory model. | 42   |
| 120 | <b>Coherence and atomicity of memory operations</b><br>Although SPARC64 VII implements the Jupiter Bus based cache coherency mechanism, this dependency is beyond the scope of this publication. It should be defined in a system that uses SPARC64 VII. | —    |
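
The PSTATE.MM behavior described in impl. dep. #113 and #119 can be summarized as a small decode function. The C fragment below is an illustrative sketch only: the names are hypothetical, and it assumes that PSTATE.MM occupies bits 7:6 of PSTATE and that encodings 00<sub>2</sub>, 01<sub>2</sub>, and 10<sub>2</sub> select TSO, PSO, and RMO as in SPARC V9. It models just what the two table entries state: every encoding runs as TSO, and 11<sub>2</sub> should be avoided.

```c
#include <stdbool.h>
#include <stdint.h>

/* PSTATE.MM encodings (assumed SPARC V9 assignment). */
typedef enum {
    MM_TSO      = 0,  /* 00: Total Store Order   */
    MM_PSO      = 1,  /* 01: Partial Store Order */
    MM_RMO      = 2,  /* 10: Relaxed Memory Order */
    MM_RESERVED = 3   /* 11: should not be used (impl. dep. #119) */
} pstate_mm_t;

/* Extract the MM field; bits 7:6 of PSTATE is an assumption from V9. */
pstate_mm_t pstate_mm_field(uint64_t pstate)
{
    return (pstate_mm_t)((pstate >> 6) & 0x3);
}

/* Model the processor actually runs: TSO for every PSTATE.MM value
 * (impl. dep. #113), including the reserved encoding (impl. dep. #119). */
pstate_mm_t effective_memory_model(pstate_mm_t requested)
{
    (void)requested;  /* the requested encoding does not change the result */
    return MM_TSO;
}

/* False for 11, which a future processor may assign to a new model. */
bool mm_encoding_is_portable(pstate_mm_t requested)
{
    return requested != MM_RESERVED;
}
```

Software that checks `mm_encoding_is_portable()` before writing PSTATE.MM would behave the same on SPARC64 VII either way, but would avoid depending on the reserved encoding.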

### TABLE C-1 SPARC64 VII Implementation Dependencies (6 of 11)

| Nbr     | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Page          |
|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
| 121     | <b>Implementation-dependent memory model</b><br>SPARC64 VII implements TSO, PSO, and RMO memory models. See Chapter 8, <i>Memory Models</i>, for details.<br>Accesses to pages with the E (Volatile) bit of their MMU page table entry set are also made in program order. | —             |
| 122     | FLUSH <b>latency</b><br>Since the FLUSH instruction synchronizes the processor, its total latency varies depending on many portions of the SPARC64 VII processor's state. Assuming that all prior instructions are completed, the latency of FLUSH is 18 processor cycles. | —             |
| 123     | <b>Input/output (I/O) semantics</b><br>This dependency is beyond the scope of this publication. It should be defined in a system that uses SPARC64 VII.                                                                                                                                                                                                                                                                                                                                             | —             |
| 124     | <b>Implicit ASI when</b> TL > 0<br>See Section 5.1.7 of <b>Commonality</b> for details.                                                                                                                                                                                                                                                                                                                                                                                                             | —             |
| 125     | Address masking<br>When PSTATE.AM = 1, SPARC64 VII <i>does</i> mask out the high-order 32 bits of the PC when transmitting it to the destination register. | 28, 53, 63    |
| 126     | <b>Register Windows State Registers width</b><br>NWINDOWS for SPARC64 VII is 8; therefore, only 3 bits are implemented for the following registers: CWP, CANSAVE, CANRESTORE, OTHERWIN. If an attempt is made to write a value greater than NWINDOWS – 1 to any of these registers, the extraneous upper bits are discarded. The CLEANWIN register contains 3 bits. | —             |
| 127–201 | Reserved. |               |
| 202     | fast_ECC_error trap<br>The <i>fast_ECC_error</i> trap is not implemented in SPARC64 VII. | —             |
| 203     | <b>Dispatch Control Register bits 13:6 and 1</b><br>SPARC64 VII does not implement DCR.                                                                                                                                                                                                                                                                                                                                                                                                             | 20            |
| 204     | DCR <b>bits 5:3 and 0</b><br>SPARC64 VII does not implement DCR.                                                                                                                                                                                                                                                                                                                                                                                                                                    | 20            |
| 205     | <b>Instruction Trap Register</b><br>SPARC64 VII implements the Instruction Trap Register.                                                                                                                                                                                                                                                                                                                                                                                                           | 22            |
| 206     | SHUTDOWN <b>instruction</b><br>In privileged mode, the SHUTDOWN instruction executes as a NOP in<br>SPARC64 VII.                                                                                                                                                                                                                                                                                                                                                                                    | 73            |
| 207     | <b>PCR register bits 47:32, 26:17, and bit 3</b><br>SPARC64 VII uses these bits for the following purposes:<ul><li>Bits 47:32 for set/clear/show status of overflow (OVF).</li><li>Bit 26 for validity of OVF field (OVRO).</li><li>Bits 24:22 for number of counter pair (NC).</li><li>Bits 20:18 for counter selector (SC).</li><li>Bit 3 for validity of SU/SL field (ULRO).</li></ul>Other implementation-dependent bits are read as 0 and writes to them are ignored. | 18            |
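
As an illustration of the bit assignments in impl. dep. #207, the following Python sketch extracts those fields from a 64-bit PCR value. The function names and dictionary keys are illustrative choices, not identifiers defined by this document; only the bit positions come from the table entry.

```python
def _bits(value: int, hi: int, lo: int) -> int:
    """Return value<hi:lo> as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_pcr_impl_fields(pcr: int) -> dict:
    """Extract the implementation-dependent PCR fields of impl. dep. #207.

    Hypothetical helper: the name and return type are assumptions made
    for this sketch.
    """
    return {
        "OVF": _bits(pcr, 47, 32),   # overflow status, bits 47:32
        "OVRO": _bits(pcr, 26, 26),  # validity of the OVF field, bit 26
        "NC": _bits(pcr, 24, 22),    # number of counter pair, bits 24:22
        "SC": _bits(pcr, 20, 18),    # counter selector, bits 20:18
        "ULRO": _bits(pcr, 3, 3),    # validity of the SU/SL field, bit 3
    }
```

A caller would pass the raw value read from PCR and inspect, for example, `decode_pcr_impl_fields(value)["SC"]`.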

### TABLE C-1 SPARC64 VII Implementation Dependencies (7 of 11)

| Nbr | SPARC64 VII Implementation Notes                                                                                                                                                                                                 | Page |
|-----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 208 | <b>Ordering of errors captured in instruction execution</b><br>The order in which errors are captured during instruction execution is<br>implementation dependent. Ordering can be in program order or in order of<br>detection. | _    |
| 209 | <b>Software intervention after instruction-induced error</b><br>Precision of the trap to signal an instruction-induced error for which recovery<br>requires software intervention is implementation dependent.                   | —    |
| 210 | <b>ERROR output signal</b><br>The causes and the semantics of ERROR output signal are implementation dependent.                                                                                                                  | _    |
| 211 | <b>Error logging registers' information</b><br>The information that the error logging registers preserve beyond the reset induced<br>by an ERROR signal is implementation dependent.                                             | —    |
| 212 | <b>Trap with fatal error</b><br>Generation of a trap along with ERROR signal assertion upon detection of a fatal error is implementation dependent.                                                                              | —    |
| 213 | AFSR.PRIV<br>SPARC64 VII does not implement the AFSR.PRIV bit.                                                                                                                                                                   | —    |
| 214 | Enable/disable control for deferred traps<br>SPARC64 VII does not implement a control feature for deferred traps.                                                                                                                | —    |
| 215 | <b>Error barrier</b><br>DONE and RETRY instructions may implicitly provide an error barrier function, as<br>MEMBAR #Sync does. Whether DONE and RETRY instructions provide an error barrier is<br>implementation dependent.            | —    |
| 216 | <i>data_access_error</i> trap precision<br><i>data_access_error</i> trap is always precise in SPARC64 VII.                                                                                                                       | —    |
| 217 | <i>instruction_access_error</i> trap precision<br><i>instruction_access_error</i> trap is always precise in SPARC64 VII.                                                                                                         | —    |
| 218 | <b>async_data_error</b><br>async_data_error trap is implemented in SPARC64 VII, using $tt = 40_{16}$. See Appendix P for details.                                                                                                | 39   |
| 219 | Asynchronous Fault Address Register (AFAR) allocation<br>SPARC64 VII does not implement an AFAR.                                                                                                                                 | 199  |
| 220 | Addition of logging and control registers for error handling<br>SPARC64 VII implements various features for sustaining reliability. See<br>Appendix P for details.                                                               | _    |
| 221 | <b>Special/signalling ECCs</b><br>The method to generate "special" or "signalling" ECCs and whether processor-ID is<br>embedded into the data associated with special/signalling ECCs is implementation<br>dependent.            | _    |

### TABLE C-1 SPARC64 VII Implementation Dependencies (8 of 11)

| Nbr | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                                                                                                                                                                                            | Page |
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 222 | <b>TLB organization</b><br>SPARC64 VII has the following TLB organization:<ul><li>Level-1 micro ITLB (uITLB), fully associative</li><li>Level-1 micro DTLB (uDTLB), fully associative</li><li>Level-2 IMMU-TLB, consisting of sITLB (set-associative Instruction TLB) and fITLB (fully associative Instruction TLB).</li><li>Level-2 DMMU-TLB, consisting of sDTLB (set-associative Data TLB) and fDTLB (fully associative Data TLB).</li></ul> | 102  |
| 223 | <b>TLB multiple-hit detection</b><br>On SPARC64 VII, TLB multiple hit detection is supported. However, the multiple<br>hit is not detected at every TLB reference. When the micro-TLB (uTLB), which is<br>the cache of sTLB and fTLB, matches the virtual address, a multiple hit in sTLB<br>and fTLB is not detected. The multiple hit is detected only when the micro-TLB<br>misses and the main TLB is referenced.                                       | 103  |
| 224 | <b>MMU physical address width</b><br>The SPARC64 VII MMU implements 47-bit physical addresses. The PA field of the<br>TTE holds a 47-bit physical address. The MMU translates virtual addresses into<br>47-bit physical addresses. Each cache tag holds bits 46:6 of the physical addresses.                                                                                                                                                                | 104  |
| 225 | <b>TLB locking of entries</b><br>In SPARC64 VII, when a TTE with its lock bit set is written into TLB through the<br>Data In register, the TTE is automatically written into the corresponding fully<br>associative TLB and locked in the TLB. Otherwise, the TTE is written into the<br>corresponding sTLB or fTLB, depending on its page size. | 104  |
| 226 | <b>TTE support for</b> CV <b>bit</b><br>SPARC64 VII does not support the CV bit in TTE, because I1 and D1 are virtually indexed caches and unaliasing is supported by hardware. See also impl. dep. #232. | 104  |
| 227 | <b>TSB number of entries</b><br>SPARC64 VII supports a maximum of 16 million entries in the common TSB and<br>a maximum of 32 million lines in the Split TSB.                                                                                                                                                                                                                                                                                               | 105  |
| 228 | <b>TSB_Hash supplied from TSB or context-ID register</b><br>TSB_Hash is generated from the context-ID register in SPARC64 VII.                                                                                                                                                                                                                                                                                                                              | 105  |
| 229 | <b>TSB_Base address generation</b><br>SPARC64 VII generates the TSB_Base address directly from the TLB Extension<br>Registers. To maintain compatibility with UltraSPARC I/II, SPARC64 VII<br>provides mode flag MCNTL.JPS1_TSBP. When MCNTL.JPS1_TSBP = 0, the<br>TSB_Base register is used. | 105  |
| 230 | <i>data_access_exception</i> trap<br>SPARC64 VII generates <i>data_access_exception</i> only for the causes listed in<br>Appendix F.5 of <b>Commonality</b> .                                                                                                                                                                                                                                                                                               | 106  |
| 231 | <b>MMU physical address variability</b><br>The width of a physical address is 47 bits in SPARC64 VII.                                                                                                                                                                                                                                                                                                                                                       | 108  |
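
To make the address widths in impl. dep. #224 and #231 concrete, the sketch below checks that a value fits in a 47-bit physical address and extracts PA<46:6>, the bits that each cache tag is stated to hold. The constant and function names are hypothetical and chosen only for this example.

```python
PA_BITS = 47      # physical address width (impl. dep. #224 and #231)
TAG_LOW_BIT = 6   # cache tags hold PA<46:6> (impl. dep. #224)


def cache_tag(pa: int) -> int:
    """Return PA<46:6> for a 47-bit physical address.

    Hypothetical helper for illustration; raises ValueError if the
    address does not fit in PA_BITS bits.
    """
    if pa < 0 or (pa >> PA_BITS) != 0:
        raise ValueError("physical address does not fit in 47 bits")
    return pa >> TAG_LOW_BIT
```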

### TABLE C-1 SPARC64 VII Implementation Dependencies (9 of 11)

| Nbr | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                               | Page               |
|-----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|
| 232 | <b>DCU Control Register</b> CP and CV bits<br>SPARC64 VII does not implement CP and CV bits in the DCU Control Register.<br>See also impl. dep. #226.                                                                                                                                          | 20, 108            |
| 233 | TSB_Hash <b>field</b><br>SPARC64 VII does not implement TSB_Hash.                                                                                                                                                                                                                              | 109                |
| 234 | <b>TLB replacement algorithm</b><br>For fTLB, SPARC64 VII implements a pseudo-LRU. For sTLB, LRU is used. An<br>entry in the fTLB may also be replaced by a dropped TTE from the sTLB.                                                                                                         | 116                |
| 235 | <b>TLB data access address assignment</b><br>The VA of the TLB Data Access register is described in TABLE F-8. | 116                |
| 236 | <b>TSB_Size field width</b><br>In SPARC64 VII, TSB_Size is 4 bits wide, occupying bits 3:0 of the TSB register. The maximum number of TSB entries is, therefore, $512 \times 2^{15}$ (16M entries).                                                                                            | 118                |
| 237 | DSFAR/DSFSR <b>for</b> JMPL/RETURN <b>mem_address_not_aligned</b><br>A mem_address_not_aligned exception that occurs during a JMPL or RETURN<br>instruction does not update either the D-SFAR or D-SFSR register.                                                                              | 63,<br>106,<br>118 |
| 238 | <b>TLB page offset for large page sizes</b><br>On SPARC64 VII, page offset data is discarded on a TLB write, and arbitrary<br>data is returned on a read. | 104                |
| 239 | <b>Register access by ASIs 55<sub>16</sub> and 5D<sub>16</sub></b><br>In SPARC64 VII, VA<63:19> of IMMU ASI 55<sub>16</sub> and DMMU ASI 5D<sub>16</sub> are<br>ignored. An access to virtual addresses $40000_{16}$ to $60FF8_{16}$ is treated as an access<br>to $00000_{16}$ to $20FF8_{16}$. | 109                |
| 240 | DCU Control Register bits 47:41<br>SPARC64 VII uses bit 41 for WEAK_SPCA, which enables/disables memory access<br>on speculative paths.                                                                                                                                                        | 20                 |
| 241 | Address Masking and DSFAR<br>When PSTATE.AM = 1, SPARC64 VII writes zeroes to the most significant 32 bits<br>of DSFAR.                                                                                                                                                                        | ?                  |
| 242 | <b>TLB lock bit</b><br>In SPARC64 VII, only the fITLB and the fDTLB support the lock bit. The lock bit<br>in sITLB and sDTLB is read as 0 and writes to it are ignored.                                                                                                                        | 104                |
| 243 | <b>Interrupt Vector Dispatch Status Register BUSY/NACK pairs</b><br>In SPARC64 VII, 32 BUSY/NACK pairs are implemented in the Interrupt Vector Dispatch Status Register.                                                                                                                       | 158                |
| 244 | <b>Data Watchpoint Reliability</b><br>No implementation-dependent features of SPARC64 VII reduce the reliability of data watchpoints.                                                                                                                                                          | 22                 |
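
The arithmetic behind impl. dep. #236 can be sketched as follows. The note gives only the maximum, $512 \times 2^{15}$ entries for the largest 4-bit TSB_Size value. The sketch assumes the same relation, 512 × 2^TSB_Size, for smaller values. The names `TSB_SIZE_BITS` and `tsb_entries` are illustrative.

```python
TSB_SIZE_BITS = 4  # TSB_Size occupies bits 3:0 of the TSB register


def tsb_entries(tsb_size: int) -> int:
    """Number of TSB entries for a given TSB_Size value.

    Assumes entries = 512 * 2**TSB_Size for every legal value; the
    table entry itself states only the maximum case (TSB_Size = 15).
    """
    if not 0 <= tsb_size < (1 << TSB_SIZE_BITS):
        raise ValueError("TSB_Size must fit in 4 bits")
    return 512 << tsb_size
```

With `tsb_size = 15` the function yields $2^{24}$ entries, which matches the 16M figure in the table.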

### TABLE C-1 SPARC64 VII Implementation Dependencies (10 of 11)

| Nbr | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Page         |
|-----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| 245 | <b>Call/Branch displacement encoding in I-Cache</b><br>In SPARC64 VII, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical to the architectural encoding (as they appear in main memory).                                                                                                                                                                                                                     | ?            |
| 246 | VA< <b>38:29&gt; for Interrupt Vector Dispatch Register Access</b><br>SPARC64 VII ignores all 10 bits of VA<38:29> when the Interrupt Vector Dispatch<br>Register is written.                                                                                                                                                                                                                                                                                                                       | 158          |
| 247 | Interrupt Vector Receive Register SID fields<br>SID_H and SID_L values are undefined.                                                                                                                                                                                                                                                                                                                                                                                                               | 158          |
| 248 | <b>Conditions for fp_exception_other with unfinished_FPop</b><br>SPARC64 VII triggers fp_exception_other with trap type unfinished_FPop under<br>the standard conditions described in <b>Commonality</b> Section 5.1.7.                                                                                                                                                                                                                                                                             | 16           |
| 249 | <b>Data watchpoint for Partial Store instruction</b><br>Watchpoint exceptions on Partial Store instructions occur conservatively on<br>SPARC64 VII. The DCUCR Data Watchpoint masks are only checked for nonzero<br>value (watchpoint enabled). The byte store mask (r [rs2]) in the Partial Store<br>instruction is ignored, and a watchpoint exception can occur even if the mask is<br>zero (that is, no store will take place).                                                                 | 68           |
| 250 | PCR accessibility when PSTATE.PRIV = 0<br>In SPARC64 VII, the accessibility of PCR when PSTATE.PRIV = 0 is determined<br>by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute<br>either RDPCR or WRPCR will cause a <i>privileged_action</i> exception. If<br>PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without privilege<br>violation and WRPCR generates a <i>privileged_action</i> exception only when an attempt<br>is made to change (that is, write 1 to) PCR.PRIV. | 18, 20, 72 |
| 251 | Reserved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | _            |
| 252 | DCUCR.DC (Data Cache Enable)<br>SPARC64 VII does not implement DCUCR.DC.                                                                                                                                                                                                                                                                                                                                                                                                                            | 20           |
| 253 | DCUCR.IC (Instruction Cache Enable)<br>SPARC64 VII does not implement DCUCR.IC.                                                                                                                                                                                                                                                                                                                                                                                                                     | 20           |
| 254 | Means of exiting error_state<br>The standard behavior of a SPARC64 VII CPU upon entry into error_state is<br>to reset itself by internally generating a <i>watchdog_reset</i> (WDR). However, OPSR<br>can be set so that when error_state is entered, the processor remains halted in<br>error_state instead of generating a <i>watchdog_reset</i> .                                                                                                                                                | 36, 16       |
| 255 | LDDFA with ASI E0 <sub>16</sub> or E1 <sub>16</sub> and misaligned destination register number<br>No exception is generated based on the destination register $rd$ .                                                                                                                                                                                                                                                                                                                                | 140          |
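
The access rules of impl. dep. #250 can be summarized as a small decision function. This is a sketch only: the function name, the string return values, and the `new_pcr_priv` parameter are hypothetical, and the behavior for PSTATE.PRIV = 1 is an assumption because the note covers only the nonprivileged case.

```python
from typing import Optional


def pcr_access_exception(pstate_priv: int, pcr_priv: int, op: str,
                         new_pcr_priv: int = 0) -> Optional[str]:
    """Return the exception raised by RDPCR/WRPCR, or None if none.

    op is "RDPCR" or "WRPCR"; new_pcr_priv is the PCR.PRIV value that a
    WRPCR attempts to write.
    """
    if op not in ("RDPCR", "WRPCR"):
        raise ValueError("op must be 'RDPCR' or 'WRPCR'")
    if pstate_priv == 1:
        # ASSUMPTION: privileged software may always access PCR; the
        # table entry does not describe this case.
        return None
    if pcr_priv == 1:
        # PSTATE.PRIV = 0 and PCR.PRIV = 1: both instructions trap.
        return "privileged_action"
    # PSTATE.PRIV = 0 and PCR.PRIV = 0: only writing 1 to PCR.PRIV traps.
    if op == "WRPCR" and new_pcr_priv == 1:
        return "privileged_action"
    return None
```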

### TABLE C-1 SPARC64 VII Implementation Dependencies (11 of 11)

| Nbr | SPARC64 VII Implementation Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Page |
|-----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| 256 | LDDFA with ASI $E0_{16}$ or $E1_{16}$ and misaligned memory address<br>For LDDFA with ASI $E0_{16}$ or $E1_{16}$ and a memory address aligned on a $2^n$-byte<br>boundary, a SPARC64 VII processor behaves as follows:<br>$n \ge 3$ ($\ge$ 8-byte alignment): no exception related to memory address alignment is<br>generated.<br>$n = 2$ (4-byte alignment): LDDF_mem_address_not_aligned exception is<br>generated.<br>$n \le 1$ ($\le$ 2-byte alignment): mem_address_not_aligned exception is generated. | 140  |
| 257 | LDDFA with ASI C0<sub>16</sub>-C5<sub>16</sub> or C8<sub>16</sub>-CD<sub>16</sub> and misaligned memory address<br>For LDDFA with ASI C0<sub>16</sub>-C5<sub>16</sub> or C8<sub>16</sub>-CD<sub>16</sub> and a memory address aligned on a $2^n$-byte<br>boundary, a SPARC64 VII processor behaves as follows:<br>$n \ge 3$ ($\ge$ 8-byte alignment): no exception related to memory address alignment is<br>generated.<br>$n = 2$ (4-byte alignment): LDDF_mem_address_not_aligned exception is<br>generated.<br>$n \le 1$ ($\le$ 2-byte alignment): mem_address_not_aligned exception is generated. | 140  |
| 258 | ASI_SERIAL_ID<br>SPARC64 VII provides an identification code for each processor.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 139  |
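
The alignment cases in impl. dep. #256 and #257 reduce to a three-way classification of the memory address. The following sketch expresses that classification; the function name and the use of strings for exception names are illustrative assumptions, and the function models only the alignment check, not any other exception an LDDFA may raise.

```python
from typing import Optional


def lddfa_alignment_exception(address: int) -> Optional[str]:
    """Classify the alignment exception for LDDFA with the ASIs named
    in impl. dep. #256 and #257.

    Returns None when no alignment-related exception is generated.
    """
    if address % 8 == 0:
        # n >= 3: aligned on 8 bytes or more
        return None
    if address % 4 == 0:
        # n == 2: aligned on 4 bytes only
        return "LDDF_mem_address_not_aligned"
    # n <= 1: aligned on 2 bytes or less
    return "mem_address_not_aligned"
```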

APPENDIX D

# Formal Specification of the Memory Models

Please refer to Appendix D of Commonality.

# Opcode Maps

Please refer to Appendix E in *SPARC Joint Programming Specification 1 (JPS1): Commonality.* TABLE E-1 lists the opcode map for the SPARC64 VII IMPDEP2 instructions, and TABLE E-2 lists the one for the IMPDEP1 instructions.

**TABLE E-1** IMPDEP2 (op = 2, op3 = $37_{16}$)

| size (instruction<6:5>) | var (instruction<8:7>) = 00 | var = 01  | var = 10   | var = 11 |
|-------------------------|-----------------------------|-----------|------------|----------|
| 00                      | FPMADDX                     | FPMADDXHI | (reserved) |          |
| 01                      | FMADDs                      | FMSUBs    | FNMSUBs    | FNMADDs  |
| 10                      | FMADDd                      | FMSUBd    | FNMSUBd    | FNMADDd  |
| 11                      | (reserved for qu            |           |            |          |
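
As a reading aid for TABLE E-1, the sketch below decodes the `var` and `size` fields of an IMPDEP2 instruction word and looks up the mnemonic. The positions of `op` (bits 31:30) and `op3` (bits 24:19) follow the standard SPARC V9 instruction formats rather than this section. The names `IMPDEP2_MAP` and `decode_impdep2` are hypothetical, and reporting every combination not listed in the table as "reserved" is an assumption for the cells the table leaves blank.

```python
from typing import Optional

IMPDEP2_OP = 2
IMPDEP2_OP3 = 0x37

# (size, var) -> mnemonic, transcribed from TABLE E-1.
IMPDEP2_MAP = {
    (0b00, 0b00): "FPMADDX", (0b00, 0b01): "FPMADDXHI",
    (0b01, 0b00): "FMADDs",  (0b01, 0b01): "FMSUBs",
    (0b01, 0b10): "FNMSUBs", (0b01, 0b11): "FNMADDs",
    (0b10, 0b00): "FMADDd",  (0b10, 0b01): "FMSUBd",
    (0b10, 0b10): "FNMSUBd", (0b10, 0b11): "FNMADDd",
}


def decode_impdep2(insn: int) -> Optional[str]:
    """Return the IMPDEP2 mnemonic for a 32-bit instruction word.

    Returns None if the word is not an IMPDEP2 instruction, and
    "reserved" for (size, var) pairs not listed in TABLE E-1.
    """
    op = (insn >> 30) & 0x3
    op3 = (insn >> 19) & 0x3F
    if op != IMPDEP2_OP or op3 != IMPDEP2_OP3:
        return None
    var = (insn >> 7) & 0x3    # instruction<8:7>
    size = (insn >> 5) & 0x3   # instruction<6:5>
    return IMPDEP2_MAP.get((size, var), "reserved")
```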

**TABLE E-2** IMPDEP1: opf<8:0> for VIS opcodes (op = 2, op3 = $36_{16}$)

Rows are indexed by opf<8:4> and columns by opf<3:0> (both in hexadecimal).

| opf<8:4> | 0        | 1        | 2        | 3          | 4        | 5          | 6          | 7          | 8            | 9           | A                   | B        | C        | D        | E        | F      |
|----------|----------|----------|----------|------------|----------|------------|------------|------------|--------------|-------------|---------------------|----------|----------|----------|----------|--------|
| 00       | EDGE8    | EDGE8N   | EDGE8L   | EDGE8LN    | EDGE16   | EDGE16N    | EDGE16L    | EDGE16LN   | EDGE32       | EDGE32N     | EDGE32L             | EDGE32LN |          |          |          |        |
| 01       | ARRAY8   |          | ARRAY16  |            | ARRAY32  |            |            |            | ALIGNADDRESS | BMASK       | ALIGNADDRESS_LITTLE | —        |          |          | —        |        |
| 02       | FCMPLE16 |          | FCMPNE16 |            | FCMPLE32 |            | FCMPNE32   |            | FCMPGT16     | —           | FCMPEQ16            |          | FCMPGT32 |          | FCMPEQ32 |        |
| 03       |          | FMUL8x16 |          | FMUL8x16AU |          | FMUL8x16AL | FMUL8SUX16 | FMUL8ULX16 | FMULD8SUX16  | FMULD8ULX16 | FPACK32             | FPACK16  |          | FPACKFIX | PDIST    |        |
| 04       |          | —        | —        |            | —        |            | —          |            | FALIGNDATA   | —           |                     | FPMERGE  | BSHUFFLE | FEXPAND  | —        |        |
| 05       | FPADD16  | FPADD16S | FPADD32  | FPADD32S   | FPSUB16  | FPSUB16S   | FPSUB32    | FPSUB32S   |              | —           |                     | —        |          |          |          |        |
| 06       | FZERO    | FZEROS   | FNOR     | FNORS      | FANDNOT2 | FANDNOT2S  | FNOT2      | FNOT2S     | FANDNOT1     | FANDNOT1S   | FNOT1               | FNOT1S   | FXOR     | FXORS    | FNAND    | FNANDS |
| 07       | FAND     | FANDS    | FXNOR    | FXNORS     | FSRC1    | FSRC1S     | FORNOT2    | FORNOT2S   | FSRC2        | FSRC2S      | FORNOT1             | FORNOT1S | FOR      | FORS     | FONE     | FONES  |
| 08       | SHUTDOWN | SIAM     | SUSPEND  | SLEEP      |          |            |            |            |              | —           |                     |          |          |          |          |        |
| 09-1F    |          |          |          |            |          |            |            |            |              |             |                     |          |          |          |          |        |

# Memory Management Unit

The Memory Management Unit (MMU) architecture of SPARC64 VII conforms to the MMU architecture defined in Appendix F of **Commonality** but with some model dependency. See Appendix F in **Commonality** for the basic definitions of the SPARC64 VII MMU.

Section numbers in this appendix correspond to those in Appendix F of **Commonality**. Figures and tables, however, are numbered consecutively.

This appendix describes the implementation dependencies and other additional information about the SPARC64 VII MMU. For SPARC64 VII implementations, we first list the implementation dependency as given in TABLE C-1 of **Commonality**, then describe the SPARC64 VII implementation.

# F.1 Virtual Address Translation

**IMPL. DEP. #222:** TLB organization is JPS1 implementation dependent.

SPARC64 VII has the following TLB organization:

- Level-1 micro ITLB (uITLB), fully associative
- Level-1 micro DTLB (uDTLB), fully associative
- Level-2 IMMU-TLB consists of sITLB (set-associative Instruction TLB) and fITLB (fully associative Instruction TLB).
- Level-2 DMMU-TLB consists of sDTLB (set-associative Data TLB) and fDTLB (fully associative Data TLB).

TABLE F-1 shows the organization of SPARC64 VII TLBs.

The hardware contains a micro-ITLB and a micro-DTLB that act as caches of the main TLBs. In contrast to the micro-TLBs, the sTLBs and fTLBs shown in TABLE F-1 are called main TLBs.

Hardware keeps the micro-TLBs coherent with the main TLBs. The micro-TLBs are not visible to software, with the exception of their effect on TLB multiple-hit detection.

No further details of the micro-TLBs are provided, because software cannot operate on them directly and their configuration is invisible to software.

TABLE F-1 Organization of SPARC64 VII TLBs

| Feature                    | sITLB and sDTLB       | fITLB and fDTLB                                 |
|----------------------------|-----------------------|-------------------------------------------------|
| Entries                    | 2048                  | 32                                              |
| Associativity              | 2-way set associative | Fully associative                               |
| Locked translation entry   | Not supported         | Supported                                       |
| Unlocked translation entry | Supported             | Supported                                       |
| Miscellaneous              | Hashing not supported | Also works as a victim cache of sITLB and sDTLB |

**IMPL. DEP. #223:** Whether TLB multiple-hit detections are supported in JPS1 is implementation dependent.

On SPARC64 VII, TLB multiple hit detection is supported. However, the multiple hit is not detected for every TLB reference. When the micro-TLB (uTLB), which is the cache of sTLB and fTLB, matches the virtual address, the multiple hit in sTLB and fTLB is not detected. The multiple hit is detected only when the micro-TLB mismatches and main TLB is referenced.

# F.2 Translation Table Entry (TTE)

The size field of the TTE is extended from 2 bits to 3 bits on SPARC64 VII to support page sizes larger than 4 MB. The MSB of the size field is located at bit 48 of the TTE.

TABLE F-2 TSB and TTE Bit Description

| Bits             | Field Name | Description                                                                                                                                                          |
|------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data <48, 62:61> | size       | The page size of the entry, encoded as shown below.<br>Size<2:0> Page Size<br>000 = 8 KB<br>001 = 64 KB<br>010 = 512 KB<br>011 = 4 MB<br>100 = 32 MB<br>101 = 256 MB |
| Data <46:13>     | PA         | The physical page number.                                                                                                                                            |
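The split encoding of the size field can be illustrated with a short sketch. The Python below is a minimal illustration, not part of the specification: it assembles Size<2:0> from TTE Data bit 48 (the MSB) and bits 62:61, then maps the result to a page size in bytes. The names `PAGE_SIZES`, `tte_size_field`, and `tte_page_size` are hypothetical.

```python
# Page sizes in bytes, keyed by the 3-bit Size<2:0> encoding of TABLE F-2.
PAGE_SIZES = {
    0b000: 8 * 1024,
    0b001: 64 * 1024,
    0b010: 512 * 1024,
    0b011: 4 * 1024 * 1024,
    0b100: 32 * 1024 * 1024,
    0b101: 256 * 1024 * 1024,
}


def tte_size_field(tte_data):
    """Assemble Size<2:0> from TTE Data bit 48 (MSB) and bits 62:61."""
    msb = (tte_data >> 48) & 0x1
    low = (tte_data >> 61) & 0x3
    return (msb << 2) | low


def tte_page_size(tte_data):
    """Return the page size in bytes encoded in a 64-bit TTE Data word."""
    size = tte_size_field(tte_data)
    if size not in PAGE_SIZES:
        # Encodings 110 and 111 are not defined by the table.
        raise ValueError("undefined page size encoding: {:03b}".format(size))
    return PAGE_SIZES[size]
```

For example, a TTE Data word with only bit 48 set decodes to Size = 100, a 32-MB page.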

**IMPL. DEP. in Commonality TABLE F-1:** TTE\_Data bits 46:43 are implementation dependent.

On SPARC64 VII, TTE\_Data bits 46:43 are used for PA<46:43>.

**IMPL. DEP. #224:** Physical address width support by the MMU is implementation dependent in JPS1; minimum PA width is 43 bits.

The SPARC64 VII MMU implements 47-bit physical addresses. The PA field of the TTE holds a 47-bit physical address. The MMU translates virtual addresses into 47-bit physical addresses. Each cache tag holds bits 46:6 of physical addresses.

**IMPL. DEP. #238:** When page offset bits for larger page size (PA<15:13>, PA<18:13>, and PA<21:13> for 64-Kbyte, 512-Kbyte, and 4-Mbyte, respectively) are stored in the TLB, it is implementation dependent whether the data returned from those fields by a Data Access read are zero or the data previously written to them.

On SPARC64 VII, the data returned from PA<15:13>, PA<18:13>, PA<21:13>, PA<24:13>, and PA<27:13> for 64-Kbyte, 512-Kbyte, 4-Mbyte, 32-Mbyte, and 256-Mbyte pages, respectively, by a Data Access read is neither zero nor the data previously written to them; arbitrary data is returned. Likewise, the corresponding VA bits of a TLB Tag Read Register are read as arbitrary data.

**IMPL. DEP. #225:** The mechanism by which entries in TLB are locked is implementation dependent in JPS1.

In SPARC64 VII, when a TTE with its lock bit set is written into TLB through the Data In register, the TTE is automatically written into the corresponding fully associative TLB and locked in the TLB. Otherwise, the TTE is written into the corresponding sTLB or fTLB, depending on its page size.

**IMPL. DEP. #242:** An implementation containing multiple TLBs may implement the L (lock) bit in all TLBs but is only required to implement a lock bit in one TLB for each page size. If the lock bit is not implemented in a particular TLB, it is read as 0 and writes to it are ignored.

In SPARC64 VII, only the fITLB and the fDTLB support the lock bit as described in TABLE F-1. The lock bit in sITLB and sDTLB is read as 0 and writes to it are ignored.

**IMPL. DEP. #226:** Whether the CV bit is supported in TTE is implementation dependent in JPS1. When the CV bit in TTE is not provided and the implementation has virtually indexed caches, the implementation should support hardware unaliasing for the caches.

In SPARC64 VII, no TLB supports the CV bit in TTE. SPARC64 VII supports hardware unaliasing for the caches. The CV bit in any TLB entry is read as 0 and writes to it are ignored.

### F.3.2 TSB Cacheability

Since the TSB is a normal data structure and is therefore cacheable, whether the target entry is in the cache when a TLB miss occurs matters greatly to performance. When a TLB miss is signalled and the TSB access in the miss handler misses the caches, the CPU must wait until the data returns from memory. The longer the memory latency, the larger this loss. To reduce the loss, SPARC64 VII implements automatic TSB prefetch when a TLB miss is signalled.

### F.3.3 TSB Organization

**IMPL. DEP. #227:** The maximum number of entries in a TSB is implementation dependent in JPS1. See impl. dep. #228 for the limitation of TSB\_size in TSB registers.

SPARC64 VII supports a maximum of 16 million lines in the common TSB and a maximum of 32 million lines in the split TSB. The maximum number N in FIGURE F-4 of **Commonality** is 16 million ($16 \times 2^{20}$).

### F.4.2 TSB Pointer Formation

**IMPL. DEP. #228:** Whether TSB\_Hash is supplied from a TSB Extension Register or from a context-ID register is implementation dependent in JPS1. Only for cases of direct hash with context-ID can the width of the TSB size field be wider than 3 bits.

On SPARC64 VII, TSB\_Hash is supplied from a context-ID register. The width of the TSB size field is 4 bits.

**IMPL. DEP. #229:** Whether the implementation generates the TSB Base address by exclusive-ORing the TSB Base Register and a TSB Extension Register or by taking the TSB\_Base field directly from the TSB Extension Register is implementation dependent in JPS1. This implementation dependency is only to maintain compatibility with the TLB miss handling software of UltraSPARC I/II.

On SPARC64 VII, when ASI\_MCNTL.JPS1\_TSBP = 1, the TSB Base address is generated by taking the TSB\_Base field directly from the TSB Extension Register.

#### TSB Pointer Formation

On SPARC64 VII, the number N in the following equations ranges from 0 to 15; N is defined to be the TSB\_Size field of the TSB Base or TSB Extension Register.

SPARC64 VII supports the TSB Base from TSB Extension Registers as follows when ASI\_MCNTL.JPS1\_TSBP = 1.

#### For a shared TSB (TSB Register split field = 0):

```
8K_POINTER  = TSB_Extension[63:13+N] ∥ (VA[21+N:13] ⊕ TSB_Hash) ∥ 0000
64K_POINTER = TSB_Extension[63:13+N] ∥ (VA[24+N:16] ⊕ TSB_Hash) ∥ 0000
```

Here ∥ denotes bit concatenation and ⊕ denotes bitwise exclusive-OR.
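The shared-TSB pointer formation can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the hardware algorithm: the helper names `bits` and `shared_tsb_pointer` are hypothetical, and the sketch assumes that only the low 9+N bits of TSB\_Hash take part in the exclusive-OR, so that the hash matches the width of the VA field.

```python
def bits(value, hi, lo):
    """Return the bit field value[hi:lo] (inclusive), right-aligned."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def shared_tsb_pointer(tsb_extension, va, tsb_hash, n, page_64k=False):
    """Form the 8K or 64K TSB pointer for a shared TSB (split field = 0)."""
    if not 0 <= n <= 15:
        raise ValueError("TSB_Size N must be in the range 0..15")
    index_width = 9 + n                  # width of VA[21+N:13] or VA[24+N:16]
    va_lo = 16 if page_64k else 13       # 64K pointers index with VA from bit 16
    va_index = bits(va, va_lo + index_width - 1, va_lo)
    # Assumption: only the low 9+N bits of TSB_Hash are XORed with the VA field.
    index = va_index ^ (tsb_hash & ((1 << index_width) - 1))
    base = bits(tsb_extension, 63, 13 + n)
    # Concatenate the base, the hashed index, and four zero bits.
    return (base << (13 + n)) | (index << 4)
```

With N = 15 the index is 24 bits wide, which gives $2^{24} = 16 \times 2^{20}$ entries and agrees with the maximum of 16 million lines stated for the common TSB.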

#### For a split TSB (TSB Register split field = 1):

TSB\_Hash[N+8:13] = 0 (N-4 bits zero)

### F.5 Faults and Traps

**IMPL. DEP. #230:** The cause of a *data\_access\_exception* trap is implementation dependent in JPS1, but there are several mandatory causes of a *data\_access\_exception* trap.

SPARC64 VII signals a *data\_access\_exception* for the causes defined in Appendix F.5 in **Commonality**. However, caution is needed when dealing with an invalid ASI. See Section F.10.9, *I/D Synchronous Fault Status Registers (I-SFSR, D-SFSR)* for details.

**IMPL. DEP. #237:** Whether the fault status and/or address (DSFSR/DSFAR) are captured when *mem\_address\_not\_aligned* is generated during a JMPL or RETURN instruction is implementation dependent.

On SPARC64 VII, the fault status and address (DSFSR/DSFAR) are not captured when a *mem\_address\_not\_aligned* exception is generated during a JMPL or RETURN instruction.

Additional information: On SPARC64 VII, two precise traps, *instruction\_access\_error* and *data\_access\_error*, are recorded by the MMU in addition to those in TABLE F-2 of **Commonality**. A modification of that table, with the two traps added, is shown below.

|       |                                                    |                             | Registers Updated (Stored State in MMU) |                  |                   |                  |                                                                          |
|-------|----------------------------------------------------|-----------------------------|-----------------------------------------|------------------|-------------------|------------------|--------------------------------------------------------------------------|
| Ref # | Trap Name                                          | Trap Cause                  | I-SFSR                                  | I-MMU Tag Access | D-SFSR, SFAR      | D-MMU Tag Access | Trap Type                                                                |
| 1.    | fast_instruction_access_MMU_miss                   | I-TLB miss                  | X2                                      | X                |                   |                  | 64<sub>16</sub>-67<sub>16</sub>                                          |
| 2.    | instruction_access_exception                       | Several (see below)         | X2                                      | X                |                   |                  | 08<sub>16</sub>                                                          |
| 3.    | fast_data_access_MMU_miss                          | D-TLB miss                  |                                         |                  | X3                | X                | 68<sub>16</sub>-6B<sub>16</sub>                                          |
| 4.    | data_access_exception                              | Several (see below)         |                                         |                  | X3                | X1               | 30<sub>16</sub>                                                          |
| 5.    | fast_data_access_protection                        | Protection violation        |                                         |                  | X3                | X                | 6C<sub>16</sub>-6F<sub>16</sub>                                          |
| 6.    | privileged_action                                  | Use of privileged ASI       |                                         |                  | X3                |                  | 37<sub>16</sub>                                                          |
| 7.    | watchpoint                                         | Watchpoint hit              |                                         |                  | X3                |                  | 61<sub>16</sub>-62<sub>16</sub>                                          |
| 8.    | mem_address_not_aligned, *_mem_address_not_aligned | Misaligned memory operation |                                         |                  | (impl. dep. #237) |                  | 35<sub>16</sub>, 36<sub>16</sub>, 38<sub>16</sub>, 39<sub>16</sub>       |
| 9.    | instruction_access_error                           | Several (see below)         | X2                                      |                  |                   |                  | 0A<sub>16</sub>                                                          |
| 10.   | data_access_error                                  | Several (see below)         |                                         |                  | X3                |                  | 32<sub>16</sub>                                                          |

TABLE F-3 MMU Trap Types, Causes, and Stored State Register Update Policy

- X1: The contents of the context field of the D-MMU Tag Access Register are undefined after a *data\_access\_exception*.
- X2: I-SFSR is updated according to its update policy described in Section F.10.9
- X3: D-SFSR and D-SFAR are updated according to the update policy described in Section F.10.9

The traps with Ref #1 through #8 in TABLE F-3 conform to the specification defined in Section F.5 of **Commonality**.

The additional traps (Ref #9 and #10) are described below.

**Ref 9:** *instruction\_access\_error* — Signalled upon detection of at least one of the following errors.

- An uncorrectable error is detected upon an instruction fetch reference.
- A bus error response from the Jupiter Bus is detected upon an instruction fetch reference.
- fITLB multiple hits are detected in an fITLB lookup for an instruction reference.
- An fITLB entry parity error is detected in an fITLB lookup for an instruction reference.

**Ref 10:** *data\_access\_error* — Signalled upon the detection of at least one of the following errors.

- An uncorrectable error is detected upon an instruction operand access.
- A bus error response from the Jupiter Bus is detected upon an operand access.
- fDTLB multiple hits are detected in an fDTLB lookup for an operand access.
- An fDTLB entry parity error is detected in an fDTLB lookup for an instruction operand access.

**Note** – A load request may not cause *data\_access\_error* when a store with the same address is executed prior to the load and the data exists in the store buffer. In this case, a restrainable error is reported instead. See also Appendix P.7.1.

# F.8 Reset, Disable, and RED\_state Behavior

**IMPL. DEP. #231:** The variability of the width of physical address is implementation dependent in JPS1, and if variable, the initial width of the physical address after reset is also implementation dependent in JPS1.

See impl. dep. #224 on page 104 for the variability of the width of the physical address. The physical address width to pass to the Jupiter Bus interface is 47 bits.

**IMPL. DEP. #232:** Whether CP and CV bits exist in the DCU Control Register is implementation dependent in JPS1.

On SPARC64 VII, CP and CV bits do not exist in the DCU Control Register.

When DMMU is disabled, the processor behaves as if the TTE bits were set as:

- TTE.IE $\leftarrow 0$
- TTE.P $\leftarrow 0$
- TTE.W $\leftarrow 1$
- TTE.NFO $\leftarrow 0$
- TTE.CV $\leftarrow 0$
- TTE.CP $\leftarrow 0$
- TTE.E $\leftarrow 1$

**IMPL. DEP. #117:** Whether prefetch and nonfaulting loads always succeed when the MMU is disabled is implementation dependent.

On SPARC64 VII, the PREFETCH instruction completes without memory access when the DMMU is disabled.

A *data\_access\_exception* is generated at the execution of the nonfaulting load instruction when the DMMU is disabled, as defined in Appendix F.5 of **Commonality**.

# F.10 Internal Registers and ASI Operations

### F.10.1 Accessing MMU Registers

**IMPL. DEP. #233:** Whether the TSB\_Hash field is implemented in I/D Primary/Secondary/Nucleus TSB Extension Register is implementation dependent in JPS1.

In SPARC64 VII, the TSB\_Hash field is not implemented in the I/D Primary/Secondary/Nucleus TSB Extension Register. See *TSB Pointer Formation* on page 105 for details.

**IMPL. DEP. #239:** The register(s) accessed by IMMU ASI  $55_{16}$  and DMMU ASI  $5D_{16}$  at virtual addresses  $40000_{16}$  to  $60FF8_{16}$  are implementation dependent.

See impl. dep. #235 in *I/D TLB Data In, Data Access, and Tag Read Registers* on page 116.

Additional information: The ASI\_DCUCR register also affects the MMUs. ASI\_DCUCR is described in Section 5.2.12 of **Commonality**. The SPARC64 VII implementation dependency in ASI\_DCUCR is described in *Data Cache Unit Control Register (DCUCR)* on page 20.

SPARC64 VII also has an additional MMU internal register ASI\_MCNTL (Memory Control Register) that is shared between the IMMU and the DMMU. The register is illustrated in FIGURE F-1 and described in TABLE F-4.

#### ASI\_MCNTL (Memory Control Register)

| ASI:          | $45_{16}$             |
|---------------|-----------------------|
| VA:           | $08_{16}$             |
| Access Modes: | Supervisor read/write |

| reserved | NC_Cache | fw_fITLB | fw_fDTLB | RMD   | 000  | JPS1_TSBP | mpg_sITLB | mpg_sDTLB | 000000 |
|----------|----------|----------|----------|-------|------|-----------|-----------|-----------|--------|
| 63:17    | 16       | 15       | 14       | 13:12 | 11:9 | 8         | 7         | 6         | 5:0    |

FIGURE F-1 Format of ASI\_MCNTL

TABLE F-4 MCNTL Field Description

| Bits         | Field Name | RW  | Description |
|--------------|------------|-----|-------------|
| Data <16>    | NC_Cache   | R/W | Force instruction caching. When set, the instruction lines fetched from a noncacheable area are cached in the instruction cache. NC_Cache has no effect on operand references. If MCNTL.NC_Cache = 1, the CPU fetches a noncacheable line in four consecutive 16-byte fetches and stores the entire 64 bytes in the I-Cache. NC_Cache is provided for use by OBP, and OBP should clear the bit before exiting.<br>A write to ASI_FLUSH_L1I must be performed before MCNTL.NC_Cache is set to 0. Otherwise, noncacheable instructions may remain in the L1 cache. |
| Data <15>    | fw_fITLB   | R/W | Force write to fITLB. This is the mITLB version of fTLB force write. When fw_fITLB = 1, a TTE write to mITLB through the ITLB Data In Register is directed to fITLB. fw_fITLB is provided for use by OBP to register the TTEs that map the address translations themselves into fDTLB. |
| Data <14>    | fw_fDTLB   | R/W | Force write to fDTLB. When fw_fDTLB = 1, a TTE write to mDTLB through the DTLB Data In Register is directed to fDTLB. fw_fDTLB is provided for use by OBP to register the TTEs that map the address translations themselves into fDTLB. |
| Data <13:12> | RMD        | R   | TLB RAM MODE. The value is always 2. This field is read-only and writes to this field are ignored. |
| Data <8>     | JPS1_TSBP  | R/W | TSB-pointer context-hashing enable. When JPS1_TSBP = 0, SPARC64 VII does not apply the context-ID hashing for 8-Kbyte or 64-Kbyte TSB pointer generation. The pointer generation technique is compatible with UltraSPARC. When JPS1_TSBP = 1, SPARC64 VII is in JPS1_TSBP mode, meaning that the CPU applies the context-ID hashing to generate an 8-Kbyte or 64-Kbyte page TSB pointer. |
| Data <7>     | mpg_sITLB  | R/W | This bit enables translating multiple page sizes on the sITLBs.<br>When this bit is set, the page size fields in the context register are activated, and the sITLB can simultaneously have multiple page sizes dedicated for each context.<br>When this bit is cleared, the page size field in the context register and the IMMU_TAG_ACCESS_EXT register are ignored and the default page sizes (8K for the first sTLB and 4M for the second) are used. |
| Data <6>     | mpg_sDTLB  | R/W | This bit enables translating multiple page sizes on the sDTLBs.<br>When this bit is set, the page size fields in the context register are activated, and the sDTLB can simultaneously have multiple page sizes dedicated for each context.<br>When this bit is cleared, the page size field in the context register and the DMMU_TAG_ACCESS_EXT register are ignored and the default page sizes (8K for the first sTLB and 4M for the second) are used. |

Setting "10" into mpg\_sITLB and mpg\_sDTLB is not allowed. SPARC64 VII behavior is undefined with this setting.
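As an illustration of the field layout in TABLE F-4, the following Python sketch unpacks a 64-bit ASI\_MCNTL value into its named fields. It is a minimal sketch for reading the table, not driver code; the names `Mcntl` and `decode_mcntl` are hypothetical, and how the register value is actually read (a privileged load from ASI 45<sub>16</sub>, VA 08<sub>16</sub>) is outside its scope.

```python
from collections import namedtuple

# One attribute per field described in TABLE F-4.
Mcntl = namedtuple(
    "Mcntl",
    "nc_cache fw_fitlb fw_fdtlb rmd jps1_tsbp mpg_sitlb mpg_sdtlb",
)


def decode_mcntl(value):
    """Split a 64-bit ASI_MCNTL value into the fields of TABLE F-4."""
    return Mcntl(
        nc_cache=(value >> 16) & 0x1,   # Data<16>
        fw_fitlb=(value >> 15) & 0x1,   # Data<15>
        fw_fdtlb=(value >> 14) & 0x1,   # Data<14>
        rmd=(value >> 12) & 0x3,        # Data<13:12>, documented as always 2
        jps1_tsbp=(value >> 8) & 0x1,   # Data<8>
        mpg_sitlb=(value >> 7) & 0x1,   # Data<7>
        mpg_sdtlb=(value >> 6) & 0x1,   # Data<6>
    )
```

A caller could, for instance, check `decode_mcntl(value).jps1_tsbp` to learn whether context-ID hashing is applied to TSB pointer generation.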

### F.10.2 Context Registers

The sTLBs consist of two parts: the first sTLB is 1024-entry, two-way set associative, and the second sTLB is also 1024-entry, two-way set associative. Normally the first sTLB holds 8-KB pages and the second sTLB holds 4-MB pages. However, software can program the sTLBs to be used for 8 KB, 64 KB, 512 KB, 4 MB, 32 MB, and 256 MB page translations by setting MCNTL.mpg\_sITLB or MCNTL.mpg\_sDTLB. Each sTLB can hold any of the six page sizes but is programmed to only one page size at any given time. The two sTLBs can be programmed to the same page size or to different page sizes.

Each sTLB page size (PgSz) is programmable independently, one PgSz per context (Primary/Secondary/Nucleus). The kernel can set the PgSz fields in ASI\_PRIMARY\_CONTEXT\_REG and ASI\_SECONDARY\_CONTEXT\_REG. The PgSz values specified in ASI\_PRIMARY\_CONTEXT\_REG are used for both sITLBs and sDTLBs. When both sDTLBs are programmed to the same page size, they behave as a single 4-way, 2048-entry sDTLB.

The following is the page size bit encoding:

- 000 = 8 KB
- 001 = 64 KB
- 010 = 512 KB
- 011 = 4 MB
- 100 = 32 MB
- 101 = 256 MB

**Note** – SPARC64 VII behavior with the undefined page size encodings (110, 111) is undefined.

In addition to the Primary, Secondary, and Nucleus Contexts defined in **Commonality**, a Shared Context is introduced in SPARC64 VII. The Shared Context is a virtual address space shared by two or more processes, used to locate instructions or data that can be shared among them. It is similar to the Secondary Context in that it enables access to another context from within a context, but the two differ in the following points:

- An explicit ASI load/store instruction is needed to use Secondary Context Register, while Shared Context Register is used implicitly along with the memory access.
- The Shared Context Register is used both for instruction fetch and data access.

In the following description, the term 'Effective Context' is used. This term represents the context ID used in MMU. The definition is as follows:

- PContext for instruction fetch and data access without explicit ASI designation on TL = 0.
- Nucleus Context Register value, which is always zero, for instruction fetch and data access without explicit ASI designation on TL > 0.
- Value of the relevant context register for data access with an explicit ASI.
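The three cases above can be summarized in a short Python sketch. It is an illustration only; the function name `effective_context` and its parameters are hypothetical, and the caller is assumed to have already selected the context register that matches an explicit ASI.

```python
NUCLEUS_CONTEXT = 0  # the Nucleus Context Register value is always zero


def effective_context(tl, pcontext, explicit_asi_context=None):
    """Return the context ID the MMU uses for an access.

    tl                    -- current trap level
    pcontext              -- PContext field of the primary context register
    explicit_asi_context  -- value of the context register selected by an
                             explicit ASI, or None for instruction fetch and
                             data access without explicit ASI designation
    """
    if explicit_asi_context is not None:
        return explicit_asi_context
    if tl == 0:
        return pcontext
    return NUCLEUS_CONTEXT
```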

#### ASI\_PRIMARY\_CONTEXT

| ASI:          | 58 <sub>16</sub>      |
|---------------|-----------------------|
| VA:           | 08<sub>16</sub>       |
| Access Modes: | Supervisor read/write |

| N_pgsz0 | N_pgsz1 | reserved | N_Ipgsz0 | N_Ipgsz1 | reserved | P_Ipgsz1 | P_Ipgsz0 | reserved | P_pgsz1 | P_pgsz0 | reserved | PContext |
|---------|---------|----------|----------|----------|----------|----------|----------|----------|---------|---------|----------|----------|
| 63:61   | 60:58   | 57:56    | 55:53    | 52:50    | 49:30    | 29:27    | 26:24    | 23:22    | 21:19   | 18:16   | 15:13    | 12:0     |

FIGURE F-2 IMMU and DMMU Primary Context Registers

TABLE F-5 IMMU and DMMU Primary Context Registers

| Bit   | Field    | Type | Description                                     |
|-------|----------|------|-------------------------------------------------|
| 63:61 | N_pgsz0  | RW   | Nucleus context's page size at the first sDTLB  |
| 60:58 | N_pgsz1  | RW   | Nucleus context's page size at the second sDTLB |
| 55:53 | N_Ipgsz0 | RW   | Nucleus context's page size at the first sITLB  |
| 52:50 | N_Ipgsz1 | RW   | Nucleus context's page size at the second sITLB |
| 29:27 | P_Ipgsz1 | RW   | Primary context's page size at the second sITLB |
| 26:24 | P_Ipgsz0 | RW   | Primary context's page size at the first sITLB  |
| 21:19 | P_pgsz1  | RW   | Primary context's page size at the second sDTLB |
| 18:16 | P_pgsz0  | RW   | Primary context's page size at the first sDTLB  |
| 12:0  | PContext | RW   | Primary context                                 |

The value written to any of the PgSz fields can be read regardless of the MCNTL.mpg\_sITLB/mpg\_sDTLB setting.

**Programming Note** – The page size settings (mpgsz) of a context must be consistent between the two threads of a given core. If the two threads use different mpgsz settings for the same context, entries may be created that cause a multiple-hit error.
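To make the layout in TABLE F-5 concrete, the sketch below extracts each field of a 64-bit primary context register value. It is a minimal Python illustration; the names `PRIMARY_CONTEXT_FIELDS` and `decode_primary_context` are hypothetical, and the bit positions are copied from TABLE F-5.

```python
# Field name -> (high bit, low bit), taken from TABLE F-5.
PRIMARY_CONTEXT_FIELDS = {
    "N_pgsz0": (63, 61),
    "N_pgsz1": (60, 58),
    "N_Ipgsz0": (55, 53),
    "N_Ipgsz1": (52, 50),
    "P_Ipgsz1": (29, 27),
    "P_Ipgsz0": (26, 24),
    "P_pgsz1": (21, 19),
    "P_pgsz0": (18, 16),
    "PContext": (12, 0),
}


def decode_primary_context(value):
    """Return a dict mapping each TABLE F-5 field name to its value."""
    fields = {}
    for name, (hi, lo) in PRIMARY_CONTEXT_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields
```

The page size fields use the 3-bit encoding listed earlier in this section, so a decoded value of 3 in `P_pgsz1` means that the second sDTLB holds 4-MB pages for the primary context.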

#### ASI\_SECONDARY\_CONTEXT

| ASI:          | $58_{16}$             |
|---------------|-----------------------|
| VA:           | $10_{16}$             |
| Access Modes: | Supervisor read/write |

| reserved | S_pgsz1 | S_pgsz0 | reserved | SContext |
|----------|---------|---------|----------|----------|
| 63:22    | 21:19   | 18:16   | 15:13    | 12:0     |

FIGURE F-3 DMMU Secondary Context Register

TABLE F-6 DMMU Secondary Context Register

| Bit   | Field    | Type | Description                                        |
|-------|----------|------|----------------------------------------------------|
| 21:19 | S_pgsz1  | RW   | Secondary context's page size at the second sDTLB. |
| 18:16 | S_pgsz0  | RW   | Secondary context's page size at the first sDTLB.  |
| 12:0  | SContext | RW   | Secondary context                                  |

The value written to any of the PgSz fields can be read regardless of the MCNTL.mpg\_sITLB/mpg\_sDTLB setting.

**Programming Note** – The page size settings (mpgsz) of a context must be consistent between the two threads of a given core. If the two threads use different mpgsz settings for the same context, entries may be created that cause a multiple-hit error.

#### ASI\_SHARED\_CONTEXT

| ASI:          | 58 <sub>16</sub>      |
|---------------|-----------------------|
| VA:           | 68 <sub>16</sub>      |
| Access Modes: | Supervisor read/write |



FIGURE F-4 IMMU and DMMU Shared Context Register

TABLE F-7 Shared Context Register

| Bit   | Field           | Type | Description |
|-------|-----------------|------|-------------|
| 47    | IV              | RW   | Valid for Ishared_Context. When IV = 1 and the effective context is not 0, the value in Ishared_Context is valid and used by the MMU for instruction fetch as well as the effective context. When IV = 0 or the effective context is 0, only the effective context is used. |
| 44:32 | Ishared_Context | RW   | Context identifier of Shared Context for instruction fetch. |
| 15    | DV              | RW   | Valid for Dshared_Context. When DV = 1 and the effective context is not 0, the value in Dshared_Context is valid and used by the MMU for data access as well as the effective context. When DV = 0 or the effective context is 0, only the effective context is used. |
| 12:0  | Dshared_Context | RW   | Context identifier of Shared Context for data access. |

The ASI\_SHARED\_CONTEXT register is used to enable or disable lookup with the shared context id along with the effective context. The shared context id is used when IV or DV is set to 1 and the effective context id is not 0. When the effective context id is 0, the shared context id is not used regardless of the IV or DV setting. For example, a load from alternate space with ASI\_AS\_IF\_USER\_SECONDARY at TL > 0 yields SContext as the effective context; whether the lookup uses the shared context id is therefore determined by the value in SContext.
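The rule for when the shared context id takes part in a lookup can be sketched as follows. This Python fragment is illustrative only; the function name `lookup_contexts` is hypothetical, and the bit positions are those of TABLE F-7.

```python
def lookup_contexts(shared_context_reg, effective_ctx, instruction_fetch):
    """Return the tuple of context IDs used for a TLB lookup.

    The effective context is always used. The shared context is added only
    when its valid bit (IV for instruction fetch, DV for data access) is 1
    and the effective context is not 0.
    """
    if instruction_fetch:
        valid = (shared_context_reg >> 47) & 0x1       # IV
        shared = (shared_context_reg >> 32) & 0x1FFF   # Ishared_Context<44:32>
    else:
        valid = (shared_context_reg >> 15) & 0x1       # DV
        shared = shared_context_reg & 0x1FFF           # Dshared_Context<12:0>
    if valid and effective_ctx != 0:
        return (effective_ctx, shared)
    return (effective_ctx,)
```

In the ASI\_AS\_IF\_USER\_SECONDARY example, `effective_ctx` would be the SContext value, so a zero SContext suppresses the shared lookup even when DV = 1.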

The functionality of the shared context is the same as that of the effective context, except for page size assignment. SPARC64 VII has two sITLBs and two sDTLBs, and the page size of the TTEs that each sTLB holds is configurable per context id. For the shared context, however, the page size of the effective context is used for lookup. Consequently, when MCNTL.mpg\_sITLB/mpg\_sDTLB = 0, one sTLB holds 8-KB page entries and the other holds 4-MB page entries; when MCNTL.mpg\_sITLB/mpg\_sDTLB = 1, P\_pgsz0/1 or S\_pgsz0/1, depending on the effective context value, is used as the page size of the shared context.

**Note** – N\_pgsz0/1 is not used since the shared context is not valid when the effective context is 0.

**Programming Note** – The sTLB is used efficiently for shared context TTEs when the same P\_pgsz0/1 is assigned to all contexts that use the same shared context id.

### F.10.3 Instruction/Data MMU TLB Tag Access Registers

If Shared Context is enabled, then on a TLB miss, exception, or protection, the context identifier of the effective context is indicated in the Context fields of the TLB Tag Access Registers.

**Programming Note** – In order to store a shared context TTE, an explicit write of the context identifier for a shared context to the TLB Tag Access Register is needed prior to TLB Data In/Data Access.

#### ASI\_I/DMMU\_TAG\_ACCESS\_EXT

ASI:  $50_{16}$ (IMMU) /  $58_{16}$ (DMMU) VA:  $60_{16}$ Access Modes: Supervisor read/write

| —     | pgsz1 | pgsz0 | —    |
|-------|-------|-------|------|
| 63 22 | 21 19 | 18 16 | 15 0 |

FIGURE F-5 I/D MMU Tag Access Extension Register

When the MMU signals a trap due to a miss, exception, or protection, hardware automatically saves the missing VA and context to the Tag Access Register (ASI\_I/DMMU\_TAG\_ACCESS). To ease indexing of the sTLBs when the TTE data is presented (via STXA ASI\_I/DTLB\_DATA\_IN\_REG), the missing page size information of the sTLBs is captured into a new Extension Register, called ASI\_I/DMMU\_TAG\_ACCESS\_EXT.

**Note** – If SIZE of TTE to be written is different from PgSz of the ASI register, the TTE is written into fTLB rather than sTLB.

The ASI\_I/DMMU\_TAG\_ACCESS\_EXT register value on an *instruction\_access\_exception* or a *data\_access\_exception* is not valid (undefined).

The register values are not valid (undefined) when the corresponding ASI\_MCNTL.mpg\_sI/DTLB value is zero.
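As a minimal sketch, the two page size fields can be extracted from a value read from the extension register as shown below. The helper name `decode_tag_access_ext` is hypothetical, and the layout is assumed from FIGURE F-5 (pgsz1 in bits 21:19, pgsz0 in bits 18:16). The sketch returns the raw 3-bit codes and does not interpret them as page sizes.

```python
def decode_tag_access_ext(value):
    """Split a 64-bit ASI_I/DMMU_TAG_ACCESS_EXT value into its pgsz fields."""
    return {
        "pgsz0": (value >> 16) & 0x7,  # bits 18:16 (assumed layout)
        "pgsz1": (value >> 19) & 0x7,  # bits 21:19 (assumed layout)
    }
```

Software would be expected to use the result only when the register contents are defined, that is, not after an access exception and not when the corresponding mpg\_sI/DTLB value is zero.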

### F.10.4 I/D TLB Data In, Data Access, and Tag Read Registers

**IMPL. DEP. #234:** The replacement algorithm of a TLB entry is implementation dependent in JPS1.

For fTLB, SPARC64 VII implements a pseudo-LRU. For sTLB, LRU is used. An entry in the fTLB may also be replaced by dropping a TTE from the sTLB.

**IMPL. DEP. #235:** The MMU TLB data access address assignment and the purpose of the address are implementation dependent in JPS1.

The MMU TLB data access address assignment and the purpose of the address on SPARC64 VII are shown in TABLE F-8.

| VA Bit | Field     | Description |
|--------|-----------|-------------|
| 17:16  | TLB#      | TLB to be accessed: fTLB or sTLB is designated as follows.<br>00: fTLB (32 entries)<br>01: reserved<br>10: sTLB (2048 entries of 8-Kbyte page and 4-Mbyte page)<br>11: reserved |
| 15     | ER        | Error insertion into mTLB: When set on a write, an entry with parity error is inserted into a selected TLB location.<br>This field is ignored for a TLB entry read operation. |
| 13:3   | TLB index | Index number of the TLB. Specifies an index number for the TLB reference. When fTLB is specified in the TLB# field, the upper 6 bits of the specified index are ignored.<br>When sTLB is specified in the TLB# field,<br>Index 0-511 addresses way0 of 8K-byte page sTLB<br>Index 512-1023 addresses way1 of 8K-byte page sTLB<br>Index 1024-1535 addresses way0 of 4M-byte page sTLB<br>Index 1536-2047 addresses way1 of 4M-byte page sTLB<br>When the entry to be written has a lock bit set and the specified TLB is the sTLB, the entry is written into the sTLB with its lock bit cleared. When the entry is to be written into the fTLB, the entry is written without lock bit modification. |
| Other  | Reserved  | Ignored. |

TABLE F-8 MMU TLB Data Access Address Assignment
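The address assignment in TABLE F-8 can be illustrated with a short sketch that composes the VA for a TLB Data Access. The function and constant names are hypothetical, and masking the fTLB index in software merely mirrors the statement that hardware ignores the upper 6 bits. The second helper splits an sTLB index into its 512-entry bank and the slot within that bank, assuming only that the first two banks hold 8K-byte pages and the last two hold 4M-byte pages.

```python
FTLB = 0b00  # TLB# encoding for the 32-entry fTLB
STLB = 0b10  # TLB# encoding for the 2048-entry sTLB


def tlb_data_access_va(tlb, index, inject_error=False):
    """Compose the VA: TLB# in bits 17:16, ER in bit 15, index in bits 13:3."""
    if tlb not in (FTLB, STLB):
        raise ValueError("TLB# encodings 01 and 11 are reserved")
    if not 0 <= index < 2048:
        raise ValueError("index must fit in VA bits 13:3")
    if tlb == FTLB:
        index &= 0x1F  # only the low 5 bits select one of 32 fTLB entries
    return (tlb << 16) | (int(inject_error) << 15) | (index << 3)


def stlb_bank(index):
    """Return (page size of the bank, slot within the 512-entry bank)."""
    bank, slot = divmod(index, 512)
    return ("8K" if bank < 2 else "4M"), slot
```

For example, `tlb_data_access_va(STLB, 512)` addresses the first entry of the second 8K-byte bank.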

#### sTLB index hash

Unlike SPARC64 VI, SPARC64 VII no longer supports index hashing in the sTLB.

**Note** – Though hashing is not supported, pages with TTE.G = 1 are always written into the fTLB on TLB Data In.

#### fTLB as a Victim Cache

In SPARC64 VII, the fTLB may also work as a victim cache to mitigate the occurrence of thrashing in the sTLB. A victim cache generally supplements other caches by keeping the entries dropped from them. In SPARC64 VII, the fTLB is one of the main TLBs, a complement of the sTLB, and it may also work as a victim cache, saving entries dropped from the sTLB.

Because of the victim cache, an entry originally placed in the sTLB may eventually be moved to the fTLB. Therefore, even when a TTE is written by TLB Data Access and a subsequent TLB Data Access confirms that the entry has been replaced, an access that uses that TTE may still succeed without an exception.

**Programming Note** – Only the dropped entries from sTLB which would otherwise disappear are moved to fTLB. No entry is moved without replacement in the sTLB.

#### I/D MMU TLB Tag Read Register

On SPARC64 VII, the page offset bits in the VA of the Tag Read Register return arbitrary data on read (impl. dep. #238).

#### I/D MMU TLB Tag Access Register

On an ASI store to the TLB Data Access or Data In Register, SPARC64 VII verifies the consistency between the Tag Access Register and the data to be written. If their indices are inconsistent, the TLB entry is not updated. However, SPARC64 VII does not verify the consistency if TTE.V = 0 for the TTE to be written. This enables demapping of specified TLB entries through the TLB Data Access Register. Software can use this feature to invalidate faulty TLB entries.

**Implementation Note** – A read of an entry with TTE.V = 0 returns all zeros.

### F.10.6 I/D TSB Base Registers

**IMPL. DEP. #236:** The width of the TSB\_Size field in the TSB Base Register is implementation dependent; the permitted range is from 2 to 6 bits. The least significant bit of TSB\_Size is always at bit 0 of the TSB Base Register. Any bits unimplemented at the most significant end of TSB\_Size read as 0, and writes to them are ignored.

On SPARC64 VII, the width of the TSB\_Size field in the TSB Base Register is 4 bits. The number of entries in the TSB ranges from 512 entries at TSB\_Size = 0 (8 Kbyte for common TSB, 16 Kbyte for split TSB), to 16 million entries at TSB\_Size = 15 (256 Mbyte for common TSB, 512 Mbyte for split TSB).
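The sizes quoted above follow from doubling the entry count for each increment of TSB\_Size. The sketch below reproduces the arithmetic; the helper names are hypothetical, and the 16-byte entry size is inferred from the stated figure of 512 entries occupying 8 Kbyte.

```python
TSB_ENTRY_BYTES = 16  # inferred: 8 Kbyte / 512 entries


def tsb_entries(tsb_size):
    """Number of TSB entries for a 4-bit TSB_Size value (0..15)."""
    if not 0 <= tsb_size <= 15:
        raise ValueError("TSB_Size is a 4-bit field on SPARC64 VII")
    return 512 << tsb_size


def tsb_bytes(tsb_size, split=False):
    """Memory occupied by the TSB; a split TSB is twice the common size."""
    size = tsb_entries(tsb_size) * TSB_ENTRY_BYTES
    return 2 * size if split else size
```

At TSB\_Size = 15 this gives $512 \times 2^{15} = 2^{24}$ entries, which is the "16 million entries" figure in the text.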

### F.10.7 I/D TSB Extension Registers

**IMPL DEP. in Commonality FIGURE F-13:** Bits 11:3 in I/D TSB Extension Register are an implementation-dependent field.

In SPARC64 VII, bits 11:0 in I/D TSB Extension Registers are assigned as follows.

- Bits 11:4 Reserved. Always read as 0, and writes to them are ignored.
- Bits 3:0 TSB\_Size field is expanded to be a 4-bit field in SPARC64 VII.

### F.10.9 I/D Synchronous Fault Status Registers (I-SFSR, D-SFSR)

**IMPL DEP. in Commonality FIGURE F-15 and TABLE F-12:** Bits 63:25 in I/D Synchronous Fault Status Registers (I-SFSR, D-SFSR) are an implementation-dependent field.

The format of I/D-MMU SFSR in SPARC64 VII is shown in FIGURE F-6.

| TLB#  | reserved | index | reserved | MK | EID   | UE | BERR | BRTO | reserved | mTLB  | NC |
|-------|----------|-------|----------|----|-------|----|------|------|----------|-------|----|
| 63 62 | 61 60    | 59 49 | 48 47    | 46 | 45 32 | 31 | 30   | 29   | 28       | 27 26 | 25 |

| NF | ASI   | TM | reserved | FT   | E | CT  | PR | W | OW | FV |
|----|-------|----|----------|------|---|-----|----|---|----|----|
| 24 | 23 16 | 15 | 14       | 13 7 | 6 | 5 4 | 3  | 2 | 1  | 0  |



The specification of bits 24:0 in the SPARC64 VII SFSR conforms to the specification defined in Section F.10.9 in **Commonality**. Bits 63:25 in the SPARC64 VII SFSR are implementation dependent. TABLE F-9 describes the I-SFSR bits, and TABLE F-12 describes the D-SFSR bits.

TABLE F-9 I-SFSR Bit Description

| Bits  | Field Name | RW  | Description |
|-------|------------|-----|-------------|
| 63:62 | TLB#       | R/W | Faulty TLB# log. Recorded upon an mITLB error to identify the faulty TLB (fITLB: $00_2$ or sITLB: $10_2$). The priority of error logging for multiple error conditions (parity error and multiple-hit error) is as follows:<br>fTLB parity<br>sTLB parity<br>sTLB multihit<br>fTLB multihit |
| 59:49 | index      | R/W | Faulty TLB index log. Recorded upon an mITLB error and is the index number for the faulty TLB. The priority of error logging for multiple error conditions (parity error and multiple-hit error) is as follows:<br>fTLB parity (high)<br>sTLB parity<br>sTLB multihit<br>fTLB multihit (low)<br>On multiple hit error, any one of the index numbers is shown. |
| 46    | MK         | R/W | Marked UE. In SPARC64 VII, all uncorrectable errors are reported as marked, so this bit is always set whenever ISFSR.UE = 1.<br>See Appendix P.2.4, *Error Marking for Cacheable Data Error*, for details. |
| 45:32 | EID        | R/W | Error mark ID. Valid for a marked UE.<br>See Appendix P.2.4, *Error Marking for Cacheable Data Error*, for ERROR_MARK_ID. |
| 31    | UE         | R/W | Instruction error status; uncorrectable error. When UE = 1, an uncorrectable error in a fetched instruction word has been detected. Valid only for an *instruction_access_error* exception. |
| 30    | BERR       | R/W | Bus error response has been received from an instruction fetch transaction. Valid only for an *instruction_access_error* exception. |
| 29    | BRTO       | R/W | Bus time-out response has been received from an instruction fetch transaction. Valid only for an *instruction_access_error* exception. |
| 27:26 | mITLB<1:0> | R/W | mITLB error status. Either a multiple-hit status (mITLB<1>) or a parity error status (mITLB<0>) has been encountered upon an mITLB lookup. Valid only for an *instruction_access_error* exception. |
| 25    | NC         | R/W | Noncacheable reference. The reference that has invoked an exception is a noncacheable reference. Valid only for an *instruction_access_error* exception caused by ISFSR.UE, ISFSR.BERR, or ISFSR.BRTO. For other causes of the trap, the value is unknown. |
| 23:16 | ASI<7:0>   | R/W | ASI. The 8-bit address space identifier applied to the reference that has invoked an exception. This field is valid for the exception in which the ISFSR.FV bit is set. A recorded ASI is $80_{16}$ (ASI\_PRIMARY) or $04_{16}$ (ASI\_NUCLEUS), depending on the trap level (when TL > 0, the ASI is ASI_NUCLEUS). |
| 15    | TM         | R/W | Translation miss. When TM = 1, it signifies an occurrence of an mITLB miss upon an instruction reference. |
| 13:7  | FT<6:0>    | R/W | Fault type. Saves and indicates an exact condition that caused the recorded exception. See TABLE F-10 for the field encoding.<br>In the IMMU, FT is valid only for an *instruction_access_exception*. The ISFSR.FT always reads as 0 for a *fast_instruction_access_MMU_miss* and reads $01_{16}$ for an *instruction_access_exception*, since no other fault types apply. |
| 5:4   | CT<1:0>    | R/W | Context type. Saves the context attribute for the reference that invokes an exception. For a nontranslating ASI or an invalid ASI, ISFSR.CT = $11_2$.<br>$00_2$: Primary<br>$01_2$: Reserved<br>$10_2$: Nucleus<br>$11_2$: Reserved<br>Note that the context attribute for Shared Context is not indicated in any case. When multiple hits involving a shared context are detected, the CT field indicates the attribute of the effective context. |
| 3     | PR         | R/W | Privileged. Indicates the CPU privilege status during the instruction reference that generates the exception. This field is valid when ISFSR.FV = 1. |
| 1     | OW         | R/W | Overwritten. Set when ISFSR.FV = 1 upon the detection of an exception. This means that the fault valid bit is not yet cleared when another fault is detected. |
| 0     | FV         | R/W | Fault valid. Set when the IMMU detects an exception. The bit is not set on an IMMU miss. When the Fault Valid bit is not set, the values of the remaining fields in the ISFSR are undefined, except for an IMMU miss. |
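The bit positions in TABLE F-9 translate directly into a field extractor. The following is a minimal sketch: the names `ISFSR_FIELDS` and `decode_isfsr` are hypothetical, and the code only splits a 64-bit value into fields without modeling when each field is valid.

```python
# Field name -> (most significant bit, least significant bit), per TABLE F-9.
ISFSR_FIELDS = {
    "TLB#": (63, 62), "index": (59, 49), "MK": (46, 46), "EID": (45, 32),
    "UE": (31, 31), "BERR": (30, 30), "BRTO": (29, 29), "mITLB": (27, 26),
    "NC": (25, 25), "ASI": (23, 16), "TM": (15, 15), "FT": (13, 7),
    "CT": (5, 4), "PR": (3, 3), "OW": (1, 1), "FV": (0, 0),
}


def decode_isfsr(value):
    """Return a dict mapping each I-SFSR field name to its integer value."""
    fields = {}
    for name, (msb, lsb) in ISFSR_FIELDS.items():
        width = msb - lsb + 1
        fields[name] = (value >> lsb) & ((1 << width) - 1)
    return fields
```

A trap handler written along these lines would check `FV` first, since the remaining fields are undefined when it is not set, except on an IMMU miss.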

TABLE F-10 describes the field encoding for ISFSR.FT.

TABLE F-10 Instruction Synchronous Fault Status Register FT (Fault Type) Field

| FT<6:0>          | Error Description                                                                          |
|------------------|--------------------------------------------------------------------------------------------|
| 01<sub>16</sub> | Privilege violation. Set when TTE.P = 1 and PSTATE.PRIV = 0 for the instruction reference. |
| 02<sub>16</sub> | Reserved                                                                                   |
| 04<sub>16</sub> | Reserved                                                                                   |
| 08<sub>16</sub> | Reserved                                                                                   |
| 10<sub>16</sub> | Reserved                                                                                   |
| 20<sub>16</sub> | Reserved                                                                                   |
| 40<sub>16</sub> | Reserved                                                                                   |

ISFSR is updated either on an occurrence of a *fast\_instruction\_access\_MMU\_miss*, an *instruction\_access\_exception*, or an *instruction\_access\_error* trap. TABLE F-11 shows the detailed update policy of each field, and TABLE F-12 describes the fields.

TABLE F-11 ISFSR Update Policy

| Event                               | TLB#, index    | FV | OW | PR, CT<sup>1</sup> | FT | TM | ASI | UE, BERR, BRTO, mITLB, NC<sup>2</sup> |
|-------------------------------------|----------------|----|----|--------------------|----|----|-----|---------------------------------------|
| **Fresh fault or miss**<sup>3</sup> |                |    |    |                    |    |    |     |                                       |
| Miss: MMU miss                      | -              | 0  | 0  | V                  |    | 1  |     | -                                     |
| Exception: Access exception         | -              | 1  | 0  | V                  | V  | 0  | V   | -                                     |
| Error: Access error                 | V<sup>4</sup>  | 1  | 0  | V                  | -  | 0  | V   | V                                     |
| **Overwrite policy**<sup>5</sup>    |                |    |    |                    |    |    |     |                                       |
| Error on exception                  | U<sup>4</sup>  | 1  | 1  | U                  | K  | K  | U   | U                                     |
| Exception on error                  | K              | 1  | 1  | U                  | U  | K  | U   | K                                     |
| Error on miss                       | U              | 1  | K  | U                  | K  | 1  | U   | U                                     |
| Exception on miss                   | K              | 1  | K  | U                  | U  | 1  | U   | K                                     |
| Miss on exception/error             | K              | 1  | K  | K                  | K  | 1  | K   | K                                     |
| Miss on miss                        | K              | K  | K  | U                  | K  | 1  | K   | K                                     |

1. The value of ISFSR.CT is $11_2$ when the ASI is not a translating ASI. The value $11_2$ is recorded in ISFSR.CT for an illegal value in the ASI ($00_{16}-03_{16}$, $12_{16}-13_{16}$, $16_{16}-17_{16}$, $1A_{16}-1B_{16}$, $1E_{16}-23_{16}$, $2D_{16}-2F_{16}$, and $35_{16}-3B_{16}$).

2. Valid only for the *instruction\_access\_error* caused by ISFSR.UE, ISFSR.BERR, or ISFSR.BRTO.

3. Types: 0 - logical 0; 1 - logical 1; V - valid field to be updated; "-" - not a valid field.

4. Updated when a multiple hit or parity error on the mITLB is detected.

5. Types: 0 - logical 0; 1 - logical 1; K - keep; U - update as per fault/miss.
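The FV, OW, and TM columns of TABLE F-11 can be transcribed into a small lookup, shown below as an illustrative sketch. All names are hypothetical, and the caller states explicitly which kind of event is already recorded (`None` for a fresh fault or miss). How hardware makes that determination is not modeled, and combinations that the table does not list raise `KeyError`.

```python
KEEP = "K"  # "keep" in TABLE F-11: leave the bit unchanged

# (new event, event already recorded) -> (FV, OW, TM)
POLICY = {
    ("miss", None): (0, 0, 1),
    ("exception", None): (1, 0, 0),
    ("error", None): (1, 0, 0),
    ("error", "exception"): (1, 1, KEEP),
    ("exception", "error"): (1, 1, KEEP),
    ("error", "miss"): (1, KEEP, 1),
    ("exception", "miss"): (1, KEEP, 1),
    ("miss", "exception"): (1, KEEP, 1),
    ("miss", "error"): (1, KEEP, 1),
    ("miss", "miss"): (KEEP, KEEP, 1),
}


def update_isfsr_flags(flags, event, recorded=None):
    """Return a new dict of FV/OW/TM after applying one row of the table."""
    rules = POLICY[(event, recorded)]
    new_flags = dict(flags)  # do not modify the caller's dict
    for name, rule in zip(("FV", "OW", "TM"), rules):
        if rule != KEEP:
            new_flags[name] = rule
    return new_flags
```

For instance, applying a fresh miss and then an exception on that miss leaves TM set while FV becomes 1, as in the "Exception on miss" row.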

TABLE F-12 D-SFSR Bit Description (1 of 3)

| Bits  | Field Name | RW  | Description |
|-------|------------|-----|-------------|
| 63:62 | TLB#       | R/W | Faulty TLB# log. Recorded upon an mDTLB error to identify the faulty TLB (fDTLB: $00_2$ or sDTLB: $10_2$). The priority of error logging for multiple error conditions (parity error and multiple-hit error) is as follows:<br>fTLB parity<br>sTLB parity<br>sTLB multihit<br>fTLB multihit |
| 59:49 | index      | R/W | Faulty TLB index log. Recorded upon an mDTLB error. This is the index number for the faulty TLB. The priority of error logging for multiple error conditions (parity error and multiple-hit error) is as follows:<br>fTLB parity (high)<br>sTLB parity<br>sTLB multihit<br>fTLB multihit (low)<br>On multiple hit error, any one of the index numbers is shown. |

TABLE F-12 D-SFSR Bit Description (2 of 3)

| Bits  | Field Name | RW  | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |  |  |
|-------|------------|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| 46    | MK         | R/W | Marked UE. In SPARC64 VII, all uncorrectable errors are reported as marked, so this bit is always set whenever DSFSR.UE = $1$ .                                                                                                                                                                                                                                                                                                                                                                                               |  |  |  |
|       |            |     | See Appendix P.2.4, Error Marking for Cacheable Data Error for details.                                                                                                                                                                                                                                                                                                                                                                                                                                                       |  |  |  |
| 45:32 | EID        | R/W | Error-mark ID. Valid for a marked UE.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |  |  |  |
|       |            |     | See Appendix P.2.4, <i>Error Marking for Cacheable Data Error</i> for details about ERROR_MARK_ID.                                                                                                                                                                                                                                                                                                                                                                                                                            |  |  |  |
| 31    | UE         | R/W | Operand access error status. Uncorrectable error. When UE = 1, it signifies an occurrence of an uncorrectable error in an operand fetch reference. Valid only for a <i>data_access_error</i> exception.                                                                                                                                                                                                                                                                                                                       |  |  |  |
| 30    | BERR       | RW  | Bus error response has been received from an operand fetch transaction. Valid only for a <i>data_access_error</i> exception.                                                                                                                                                                                                                                                                                                                                                                                                  |  |  |  |
| 29    | BRTO       | RW  | Bus time-out response has been received from an operand fetch transaction. Valid only for a <i>data_access_error</i> exception.                                                                                                                                                                                                                                                                                                                                                                                               |  |  |  |
| 27:26 | mDTLB<1:0> | R/W | mDTLB error status. Either a multiple-hit status (mDTLB<1>) or a parity error status (mDTLB<0>) has been encountered upon a mDTLB lookup. Valid only for a <i>data_access_error</i> exception.                                                                                                                                                                                                                                                                                                                                |  |  |  |
| 25    | NC         | R/W | Noncacheable reference. The reference that invoked an exception is a non-<br>cacheable reference. This field indicates that the faulty reference is a non-<br>cacheable operand access. Valid only for an <i>data_access_error</i> exception can<br>by DSFSR.UE, DSFSR.BERR, or DSFSR.BRTO. For other causes of the trap-<br>value is unknown.                                                                                                                                                                                |  |  |  |
| 24    | NF         | R/W | Nonfaulting load. The instruction which generated the exception was a nonfaulting load instruction.                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |  |
| 23:16 | ASI<7:0>   | R/W | ASI. The 8-bit address space identifier applied to the reference that invoked the exception. This field is valid for exceptions in which the DSFSR.FV bit is set. When the reference does not specify an ASI, an implicit ASI is recorded as follows:<br>TL = 0, PSTATE.CLE = 0: $80_{16}$ (ASI_PRIMARY)<br>TL = 0, PSTATE.CLE = 1: $88_{16}$ (ASI_PRIMARY_LITTLE)<br>TL > 0, PSTATE.CLE = 0: $04_{16}$ (ASI_NUCLEUS)<br>TL > 0, PSTATE.CLE = 1: $0C_{16}$ (ASI_NUCLEUS_LITTLE) |  |  |  |
| 15    | TM         | R/W | Translation miss. TM = 1 signifies the occurrence of an mDTLB miss upon an operand reference. |  |  |  |
| 13:7  | FT<6:0>    | R/W | Fault type. Saves and indicates an exact condition that caused the recorded exception. The encoding of this field is described in TABLE F-13.                                                                                                                                                                                                                                                                                                                                                                                 |  |  |  |
| 6     | E          | R/W | Side-effect page. Associated with faulting data access. The reference is mapped to the translation with an E bit set, or the ASI for the reference was either $015_{16}$ or $01D_{16}$. Valid only for a <i>data_access_error</i> exception caused by DSFSR.UE, DSFSR.BERR, or DSFSR.BRTO. For other causes of the trap, the value is unknown. |  |  |  |

| Bits | Field Name | RW  | Description                                                                                                                                                                                                                                                                                                                                |
|------|------------|-----|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 5:4  | CT<1:0>    | R/W | Context type. Saves the context attribute for the reference that invokes an exception. For a nontranslating ASI or an invalid ASI, DSFSR.CT = $11_2$. |
|      |            |     | $00_2$: Primary, $01_2$: Secondary, $10_2$: Nucleus, $11_2$: Reserved |
|      |            |     | When a <i>data_access_exception</i> trap is caused by an invalid combination of an ASI and an opcode (e.g., atomic load quad, block load/store, block commit store, partial store, or short floating-point load/store instructions), the recording of the DSFSR.CT field is based on the encoding of the ASI specified by the instruction. |
|      |            |     | Note that the context attribute for Shared Context is not indicated in any case.<br>When multiple hits involving a shared context are detected, the CT field indicates the attribute of the effective context. |
| 3    | PR         | R/W | Privileged. Indicates the CPU privilege status during the operand reference that generates the exception. This field is valid when DSFSR.FV = 1. |
| 2    | W          | R/W | Write. W = 1 if the reference is for an operand write operation (a store or atomic load/store instruction). |
| 1    | OW         | R/W | Overwritten. Set when DSFSR.FV = 1 upon detection of an exception. This means that the fault valid bit had not yet been cleared when another fault was detected. |
| 0    | FV         | R/W | Fault valid. Set when the DMMU detects an exception. The bit is not set on a DMMU miss. When the FV bit is not set, the values of the remaining fields in the DSFSR and DSFAR are undefined, except for a DMMU miss. |

**TABLE F-12** D-SFSR Bit Description (3 of 3)
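As an illustration of the layout in TABLE F-12, the following Python sketch extracts the fields in bits 29:0 from a raw D-SFSR value and returns the implicit ASI that is recorded when a reference specifies none. The names `DSFSR_FIELDS`, `decode_dsfsr`, and `implicit_asi` are hypothetical and are not part of any Fujitsu software; fields above bit 29 are omitted.

```python
# Bit ranges (high, low) taken from TABLE F-12; only bits 29:0 are covered.
DSFSR_FIELDS = {
    "BRTO": (29, 29),
    "mDTLB": (27, 26),
    "NC": (25, 25),
    "NF": (24, 24),
    "ASI": (23, 16),
    "TM": (15, 15),
    "FT": (13, 7),
    "E": (6, 6),
    "CT": (5, 4),
    "PR": (3, 3),
    "W": (2, 2),
    "OW": (1, 1),
    "FV": (0, 0),
}

CT_NAMES = {0b00: "Primary", 0b01: "Secondary", 0b10: "Nucleus", 0b11: "Reserved"}


def extract(value, high, low):
    """Return bits high:low of value as an unsigned integer."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def decode_dsfsr(value):
    """Split a raw D-SFSR value into a dict of field values."""
    fields = {name: extract(value, hi, lo) for name, (hi, lo) in DSFSR_FIELDS.items()}
    fields["CT_name"] = CT_NAMES[fields["CT"]]
    return fields


def implicit_asi(tl, cle):
    """ASI recorded when the reference does not specify one (ASI field description)."""
    if tl == 0:
        return 0x88 if cle else 0x80  # ASI_PRIMARY_LITTLE / ASI_PRIMARY
    return 0x0C if cle else 0x04      # ASI_NUCLEUS_LITTLE / ASI_NUCLEUS
```

Software would normally check `FV` first, since most other fields are undefined when it is 0.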

TABLE F-13 defines the encoding of the FT<6:0> field.

TABLE F-13 MMU Synchronous Fault Status Register FT (Fault Type) Field

| FT<6:0>          | Error Description |
|------------------|-------------------|
| 01<sub>16</sub> | Privilege violation. An attempt was made to access a privileged page (TTE.P = 1) under nonprivileged mode (PSTATE.PRIV = 0) or through a *_AS_IF_USER ASI. This exception has priority over a <i>fast_data_access_protection</i> exception. |
| 02<sub>16</sub> | Nonfaulting load instruction to page marked with the E bit. This bit is zero for internal ASI accesses. |
| 04<sub>16</sub> | An attempt was made to access a noncacheable page or an internal ASI by an atomic instruction (CASA, CASXA, SWAP, SWAPA, LDSTUB, LDSTUBA) or an atomic quad load instruction (LDDA with ASI = $024_{16}$, $02C_{16}$, $034_{16}$, or $03C_{16}$). |
| 08<sub>16</sub> | An attempt was made to access an alternate address space with an illegal ASI value, an illegal VA, an invalid read/write attribute, or an illegally sized operand. If the quad load ASI is used with an opcode other than LDDA, this bit is set.<br><b>Note:</b> Since an illegal ASI check is done prior to a TTE unmatch check, DSFSR.FT<3> = 1 causes the value of other bits of DSFSR.FT to be undetermined and generates a <i>data_access_exception</i> exception (which otherwise has lower priority than <i>fast_data_access_MMU_miss</i>).<br>Note, too, that a reference to an internal ASI may generate a <i>mem_address_not_aligned</i> exception. |
| 10<sub>16</sub> | Access other than a nonfaulting load was made to a page marked NFO. This bit is zero for internal ASI accesses. |
| 20<sub>16</sub> | Reserved |
| 40<sub>16</sub> | Reserved |

Multiple bits of DSFSR.FT may be set by a single trap when the cause of the trap matches more than one entry in TABLE F-13.
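Because FT is a bit mask rather than an enumerated code, a handler has to test each bit separately. The Python sketch below lists the causes encoded in an FT<6:0> value; the name `ft_causes` and the abbreviated description strings are illustrative only, and the reserved bits $20_{16}$ and $40_{16}$ are ignored.

```python
# One entry per defined FT bit in TABLE F-13 (descriptions abbreviated).
FT_CAUSES = {
    0x01: "privilege violation",
    0x02: "nonfaulting load to a page marked with the E bit",
    0x04: "atomic or atomic quad load access to a noncacheable page or internal ASI",
    0x08: "illegal ASI value, VA, read/write attribute, or operand size",
    0x10: "access other than a nonfaulting load to a page marked NFO",
}


def ft_causes(ft):
    """Return the descriptions of all defined bits set in FT<6:0>."""
    return [text for bit, text in FT_CAUSES.items() if ft & bit]
```

Note that when bit 3 ($08_{16}$) is set, the other FT bits are undetermined, so a caller should treat any additional causes reported alongside it with caution.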

DSFSR is updated upon various traps, including fast\_data\_access\_MMU\_miss, data\_access\_exception, fast\_data\_access\_protection, PA\_watchpoint, VA\_watchpoint, privileged\_action, mem\_address\_not\_aligned, and data\_access\_error traps. TABLE F-14 shows the detailed update policy of each field.

TABLE F-14 DSFSR Update Policy

|                    | Field                          | TLB#,<br>index | FV | OW | W, PR,<br>NF, CT<sup>1</sup> | FT | TM | ASI | UE, BERR,<br>BRTO,<br>mDTLB, NC<sup>2</sup>, E<sup>2</sup> | DSFAR |
|--------------------|--------------------------------|----------------|----|----|------------------------------|----|----|-----|------------------------------------------------------------|-------|
| **Fresh fault or miss<sup>3</sup>** |               |                |    |    |                              |    |    |     |                                                            |       |
| Miss               | MMU miss                       |                | 0  | 0  | V                            |    | 1  |     |                                                            | V     |
| Exception          | Access exception               | —              | 1  | 0  | V                            | V  | 0  | V   | —                                                          | V     |
| Faults             | Access protection              |                | 1  | 0  | V                            |    | 0  | V   |                                                            | V     |
|                    | PA watchpoint                  | —              | 1  | 0  | V                            | —  | 0  | V   | —                                                          | V     |
|                    | VA watchpoint                  |                | 1  | 0  | V                            | —  | 0  | V   | —                                                          | V     |
|                    | Privileged action<sup>4</sup>  |                | 1  | 0  | V                            |    | 0  | V   |                                                            | V     |
|                    | Access misaligned              |                | 1  | 0  | V                            |    | 0  | V   |                                                            | V     |
|                    | Access error                   | V<sup>5</sup>  | 1  | 0  | V                            |    | 0  | V   | V                                                          | V     |
| **Overwrite Policy<sup>6</sup>** |                  |                |    |    |                              |    |    |     |                                                            |       |
| Exception on fault |                                | K              | 1  | 1  | U                            | U  | K  | U   | K                                                          | U     |
| Fault on exception |                                | U<sup>4</sup>  | 1  | 1  | U                            | K  | K  | U   | U                                                          | U     |
| Exception on miss  |                                | K              | 1  | K  | U                            | U  | 1  | U   | K                                                          | U     |
| Fault on miss      |                                | U<sup>4</sup>  | 1  | K  | U                            | K  | 1  | U   | U                                                          | U     |
| Miss on fault/exception |                           |                |    | K  | K                            | K  | 1  | K   | K                                                          | K     |
| Miss on miss       |                                | K              | K  | K  | U                            | K  | 1  | K   | K                                                          | K     |

1. The value of DSFSR.CT is $11_2$ when the ASI is not a translating ASI. The value $11_2$ is also recorded in DSFSR.CT for an illegal ASI value ($00_{16}$-$03_{16}$, $12_{16}$-$13_{16}$, $16_{16}$-$17_{16}$, $1A_{16}$-$1B_{16}$, $1E_{16}$-$23_{16}$, $2D_{16}$-$2F_{16}$, or $35_{16}$-$3B_{16}$).

2. Valid only for the data\_access\_error caused by DSFSR.UE, DSFSR.BERR, or DSFSR.BRTO.

3. Types: 0 - logic 0; 1 - logic 1; V - Valid field to be updated; "-" - not a valid field

4. Memory reference instruction only.

5. Updated when multiple hit or parity error on mDTLB is detected.

6. Types: 0 - logic 0; 1 - logic 1; V - Valid field to be updated; "-" - not a valid field

7. Fault/exception on miss means the miss happened first, then a fault/exception was encountered before software had a chance to clear the DSFSR register.
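The interaction of the FV, OW, and TM columns in TABLE F-14 can be modelled as a small state update. The Python sketch below is only one reading of the table and rests on several assumptions that the table does not state: a register is treated as "fresh" when both FV and TM are 0, a prior fault or exception is recognised by FV = 1, a prior miss by TM = 1 with FV = 0, and FV is assumed to be kept on a miss that follows a fault or exception. The function name `update_status` is hypothetical.

```python
def update_status(fv, ow, tm, event):
    """Return the new (FV, OW, TM) after `event`.

    `event` is "miss" for an MMU miss, or "fault" for any exception or
    fault row of TABLE F-14. How the prior state is classified is an
    assumption of this sketch, not a statement from the table.
    """
    if event not in ("miss", "fault"):
        raise ValueError(f"unknown event: {event!r}")

    if event == "miss":
        if not fv and not tm:
            return (0, 0, 1)   # fresh miss: FV = 0, OW = 0, TM = 1
        return (fv, ow, 1)     # miss on miss / miss on fault: FV, OW kept, TM = 1

    if fv:
        return (1, 1, tm)      # fault or exception on a pending fault: OW set, TM kept
    if tm:
        return (1, ow, 1)      # fault or exception on a pending miss: OW kept, TM = 1
    return (1, 0, 0)           # fresh fault or exception
```

For example, two back-to-back faults without an intervening clear of the register leave OW = 1, which tells software that information about the first fault may have been overwritten.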

### F.10.11 I/D MMU Demap

For Demap Page in sTLBs, the page size used to index the sTLBs is derived from the context bits (Primary/Secondary/Nucleus). Hardware automatically selects the proper PgSz fields based on the "context" field (Primary/Secondary/Nucleus) defined in ASI\_I/DMMU\_DEMAP. These two PgSz fields are used to index the first sTLB and the second sTLB.

In addition, the selected PgSz based on the context bits is used to check if the demap operation is valid or not for Demap Page and Demap Context operations with sTLBs. That is, if the PgSz is different from SIZE of the corresponding TLB entry, the TLB entry will not be demapped.

**Note** – A valid context ID should be specified on Demap Page and Demap Context operations. A demap operation with a nonexistent context ID ($01_2$ for IMMU and $11_2$ for IMMU/DMMU) might demap unexpected sTLB entries.

Demap All operations with sTLBs are straightforward.

There is no way to remove all TLB entries for a shared context by Demap Context.

**Programming Note** – To accomplish removing all shared context entries from TLB, temporary use of the secondary context register is needed.

### F.10.12 Synchronous Fault Physical Addresses

This section describes how the IMMU and DMMU obtain a fault physical address.

#### IMMU Synchronous Fault Physical Address

The Instruction Synchronous Fault Physical Address Register is newly added to capture the physical memory address of the fault recorded in the IMMU Synchronous Fault Status Register (I-SFSR). The register is updated on an *instruction\_access\_error* exception, but its value is valid only when the corresponding ISFSR.MK = 1, ISFSR.UE = 1, ISFSR.BERR = 1, or ISFSR.BRTO = 1.

The values of bits 2:0 are undefined.

ASI: $50_{16}$, VA: $78_{16}$, Access Modes: Supervisor read/write

| —     | Fault Address (PA<46:3>) | Undefined |
|-------|--------------------------|-----------|
| 63 47 | 46 3                     | 2 0       |

FIGURE F-7 MMU Instruction Synchronous Fault Physical Address Register (I-SFPAR)
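Since bits 2:0 are undefined and bits 63:47 are unused, software reading the register should mask the value before using it as an address. A minimal Python sketch, in which the names `PA_MASK` and `fault_physical_address` are hypothetical:

```python
# Bits 46:3 hold the fault physical address; bits 2:0 are undefined.
PA_MASK = ((1 << 47) - 1) & ~0x7


def fault_physical_address(sfpar):
    """Return the 8-byte-aligned fault physical address (PA<46:3>) from a raw register value."""
    return sfpar & PA_MASK
```

The result is meaningful only when one of the status bits named above is set in the corresponding fault status register.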

#### DMMU Synchronous Fault Physical Address

The Data Synchronous Fault Physical Address Register is newly added to capture the physical memory address of the fault recorded in the DMMU Synchronous Fault Status Register (D-SFSR). The register is updated on a *data\_access\_error* exception, but its value is valid only when the corresponding DSFSR.MK = 1, DSFSR.UE = 1, DSFSR.BERR = 1, or DSFSR.BRTO = 1.

The values of bits 2:0 are undefined.

ASI: $58_{16}$, VA: $78_{16}$, Access Modes: Supervisor read/write





### F.10.13 TSB Prefetch Registers

When a *fast\_instruction\_access\_MMU\_miss* or a *fast\_data\_access\_MMU\_miss* is signalled, the operating system software looks up a TSB with the help of the hardware's automatic pointer calculation. A TSB is an array of TTEs located in memory, so it is only sometimes present in the cache. When the data address calculated by hardware misses the outermost cache, the performance of TLB miss handling degrades substantially. A software prefetch would be the usual remedy. However, since the TSB index is known only after the exception is signalled, the prefetch would have to be issued by the TLB miss handler itself, which is too late to hide the memory access latency.

To deal with this difficulty, SPARC64 VII employs a TSB prefetch in hardware. When an instruction fetch or a memory access misses the TLB, then the MMU calculates a possible TSB index and then issues a prefetch request. The base address of the TSB is designated by one of the TSB Prefetch Registers, chosen by context and access type. TABLE F-15 shows all TSB Prefetch Registers.

TABLE F-15 ASI and VA assignment of TSB Prefetch Registers

| IMMU                          | DMMU                          | Description  |
|-------------------------------|-------------------------------|--------------|
| $ASI = 61_{16}, VA = 00_{16}$ | $ASI = 62_{16}, VA = 00_{16}$ | ctxnon0, 1st |
| $ASI = 61_{16}, VA = 08_{16}$ | $ASI = 62_{16}, VA = 08_{16}$ | ctxnon0, 2nd |
| $ASI = 61_{16}, VA = 40_{16}$ | $ASI = 62_{16}, VA = 40_{16}$ | ctx0, 1st    |
| $ASI = 61_{16}, VA = 48_{16}$ | $ASI = 62_{16}, VA = 48_{16}$ | ctx0, 2nd    |

There are two registers for each of four groups: instruction fetch in context 0, instruction fetch in a nonzero context, data access in context 0, and data access in a nonzero context. The two registers in a group are not distinguished from each other; they work exactly the same way. The format and bit description of the TSB Prefetch Register are similar to those of the TSB Base Register. FIGURE F-9 shows the format of the TSB Prefetch Register.

| TSB_base<63:13> (physical) | —  | page_sz | V | —   | TSB_size |
|----------------------------|----|---------|---|-----|----------|
| 63 13                      | 12 | 11 9    | 8 | 7 6 | 5 0      |

FIGURE F-9 TSB Prefetch Register

TABLE F-16 describes the bits of the TSB Prefetch Register. Note that unused bits are always read as 0 and writes to them are ignored.

| Bit   | Field Name | RW  | Description |
|-------|------------|-----|-------------|
| 63:13 | TSB_base   | R/W | Base address of the TSB array, given as a physical address. |
| 11:9  | page_sz    | R/W | Page size of the TSB. The encoding of the page size is the same as in the TTE:<br>$000_2$: 8KB, $001_2$: 64KB, $010_2$: 512KB, $011_2$: 4MB, $100_2$: 32MB, $101_2$: 256MB |
| 8     | V          | R/W | Valid. When V = 1, TSB prefetch is performed on a TLB miss; when V = 0, prefetch is not done. |
| 5:0   | TSB_size   | R/W | The size of the TSB. The number of entries in the TSB is 512 x 2<sup>TSB_size</sup>. This field is subject to <b>IMPL.DEP.#236</b>, as is the corresponding field in the TSB Base Register. In SPARC64 VII, the width of TSB_size is 4 bits; bits 5:4 are read as 0 and writes to them are ignored. See Section F.10.6, <i>I/D TSB Base Registers</i>, on page 118 for more detail. |

TABLE F-16 TSB Prefetch Register Bit Description
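To make the field layout concrete, the following Python sketch packs and unpacks a TSB Prefetch Register value according to TABLE F-16. The function names are hypothetical, and the 8 KB alignment check simply reflects that bits 12:0 of the base are not part of the TSB_base field; it is not a hardware-documented requirement.

```python
def make_tsb_prefetch_reg(tsb_base_pa, page_sz, valid, tsb_size):
    """Pack the TABLE F-16 fields into a 64-bit register value."""
    if not 0 <= tsb_base_pa < (1 << 64) or tsb_base_pa & 0x1FFF:
        raise ValueError("TSB base must be a 64-bit address with bits 12:0 clear")
    if not 0 <= page_sz <= 0b101:
        raise ValueError("page_sz encodings 000 through 101 are defined")
    if not 0 <= tsb_size <= 0xF:
        raise ValueError("TSB_size is 4 bits wide in SPARC64 VII")
    return tsb_base_pa | (page_sz << 9) | (int(bool(valid)) << 8) | tsb_size


def decode_tsb_prefetch_reg(reg):
    """Unpack a register value into its fields."""
    return {
        "TSB_base": reg & ~0x1FFF,       # bits 63:13, kept in place
        "page_sz": (reg >> 9) & 0x7,     # bits 11:9
        "V": (reg >> 8) & 0x1,           # bit 8
        "TSB_size": reg & 0xF,           # bits 3:0 implemented; 5:4 read as 0
    }


def tsb_entries(tsb_size):
    """Number of TTEs in the TSB: 512 x 2**TSB_size."""
    return 512 << tsb_size
```

For instance, a 4 MB-page TSB with 2048 entries at physical address `0x40000000` would be described by `make_tsb_prefetch_reg(0x40000000, 0b011, True, 2)`.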

The major difference between the TSB Base Register and the TSB Prefetch Register is that the base address is designated by a physical address in the TSB Prefetch Register. The result of using a nonexistent physical address is undefined.

The page size of the TTEs in a TSB is configurable through the TSB Prefetch Register, and each group has two registers. System software can therefore designate TSBs for any two page sizes per group at a given time, typically the two most important page sizes, whose TTEs system software can store in the two sTLBs.

The prefetch begins as soon as a TLB lookup fails, rather than when the exception is signalled, because the earlier a prefetch starts, the better. SPARC64 VII prefetches a TSB for a TLB miss even on a speculative path.

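Since the TSB Prefetch Register does not support index hashing or a shared/split configuration, the TSB pointer is formed by concatenating the following three fields, from most significant to least significant, where N is the value of TSB_size:

```
TSB_POINTER = TSB_Prefetch_Base[63:13+N]
              VA[21+N+3*page_sz:13+3*page_sz]
              0000
```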
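The concatenation can be written out with shifts and masks. The Python sketch below computes the prefetch address from a raw TSB Prefetch Register value and a virtual address; the function name `tsb_prefetch_pointer` is hypothetical, and the code assumes N is the implemented 4-bit TSB_size field, so the index is 9 + N bits wide and each TTE occupies 16 bytes (the four trailing zero bits).

```python
def tsb_prefetch_pointer(reg, va):
    """Compute TSB_POINTER from a TSB Prefetch Register value and a VA."""
    n = reg & 0xF                     # TSB_size (bits 3:0 implemented)
    page_sz = (reg >> 9) & 0x7        # page_sz (bits 11:9)

    low_bit = 13 + 3 * page_sz        # lowest VA bit used for the index
    index_bits = 9 + n                # VA[21+N+3*page_sz : 13+3*page_sz]
    index = (va >> low_bit) & ((1 << index_bits) - 1)

    # TSB_Prefetch_Base[63:13+N]: clear everything below bit 13+N.
    base = (reg >> (13 + n)) << (13 + n)

    return base | (index << 4)        # four zero bits: 16-byte TTE
```

With an 8 KB page size and TSB_size = 0, consecutive 8 KB pages map to consecutive 16-byte TSB entries, wrapping every 512 pages.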

# F.11 MMU Bypass

On SPARC64 VII, two additional ASIs are supported as DMMU bypass accesses: ASI\_ATOMIC\_QUAD\_LDD\_PHYS (ASI $34_{16}$) and ASI\_ATOMIC\_QUAD\_LDD\_PHYS\_LITTLE (ASI $3C_{16}$).

TABLE F-17 shows the bypass attribute bits on SPARC64 VII. The first four rows conform to the bypass attribute bits defined in TABLE F-15 of **Commonality**.

| ASI Name                            | ASI Value        | CP | IE | CV | E | P | W | NFO | Size     |
|-------------------------------------|------------------|----|----|----|---|---|---|-----|----------|
| ASI_PHYS_USE_EC                     | 14<sub>16</sub>  | 1  | 0  | 0  | 0 | 0 | 1 | 0   | 8 Kbytes |
| ASI_PHYS_USE_EC_LITTLE              | 1C<sub>16</sub>  | 1  | 0  | 0  | 0 | 0 | 1 | 0   | 8 Kbytes |
| ASI_PHYS_BYPASS_EC_WITH_EBIT        | 15<sub>16</sub>  | 0  | 0  | 0  | 1 | 0 | 1 | 0   | 8 Kbytes |
| ASI_PHYS_BYPASS_EC_WITH_EBIT_LITTLE | 1D<sub>16</sub>  | 0  | 0  | 0  | 1 | 0 | 1 | 0   | 8 Kbytes |
| ASI_ATOMIC_QUAD_LDD_PHYS            | 34<sub>16</sub>  | 1  | 0  | 0  | 0 | 0 | 0 | 0   | 8 Kbytes |
| ASI_ATOMIC_QUAD_LDD_PHYS_LITTLE     | 3C<sub>16</sub>  | 1  | 0  | 0  | 0 | 0 | 0 | 0   | 8 Kbytes |

TABLE F-17 Bypass Attribute Bits on SPARC64 VII

# F.12 Translation Lookaside Buffer Hardware

Unlike other software visible resources, thread0 and thread1 within the same core logically share fTLBs and sTLBs. That is, a TLB entry written by one thread can be referenced by the other thread.

**Note** – Threads belonging to different physical cores do not share TLBs.

If two identical TTEs are written, no multiple-hit error is detected during a virtual address translation. Instead, one of the two TTEs is used for the translation. In other words, both threads may independently write identical contents into a TLB; hardware guarantees that no multi-hit error occurs in this case.

However, it is not allowed to write two TTEs with the same VA and CONTEXT but different page sizes into a TLB. This might result in a multi-hit error.

### F.12.2 TLB Replacement Policy

#### Automatic TLB Replacement Rule

On an automatic replacement write to the TLB, the MMU picks the entry to write according to the following rules:

- 1. If all of the following conditions are satisfied, the replacement is directed to the sTLB (2-way TLB). Otherwise, the replacement occurs in the fully associative TLB (fTLB).
  - The new entry is unlocked or TTE.G = 0.
  - The page size is either 8KB or 4MB when ASI\_MCNTL.mpg\_sITLB/mpg\_sDTLB = 0, or the page size matches either the pgsz0 or pgsz1 field of the relevant CONTEXT register when ASI\_MCNTL.mpg\_sITLB/mpg\_sDTLB = 1.
  - ASI\_MCNTL.fw\_fITLB = 0 for IMMU automatic replacement.
  - ASI\_MCNTL.fw\_fDTLB = 0 for DMMU automatic replacement.

- 2. If replacement is directed to the 2-way TLB, then the replacement set index is generated from the TLB Tag Access Register with bits determined by the page size.
- 3. If a replacement is directed to the fully associative TLB (fTLB), then the following alternatives are evaluated:
  - a. The first invalid entry is replaced (measuring from entry 0). If there is no invalid entry, then
  - b. the first unused, unlocked (LRU bit clear) entry will be replaced (measuring from entry 0). If there is no unused unlocked entry, then
  - c. all used bits are reset, and the process is repeated from Step 3b.

If fTLB is the target of the automatic replacement and all entries in the fTLB have their lock bit set, the automatic replacement operation is ignored and the entries in the target fTLB remain unchanged.
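The fTLB selection in Step 3, together with the all-locked case, can be sketched as follows. This Python model is illustrative only: the class `FtlbEntry` and the function `pick_ftlb_victim` are hypothetical names, and checking for an invalid entry before checking whether every entry is locked is an assumption, since the text does not say which takes precedence when a locked entry is also invalid.

```python
from dataclasses import dataclass


@dataclass
class FtlbEntry:
    valid: bool = False
    locked: bool = False
    used: bool = False


def pick_ftlb_victim(entries):
    """Return the index of the fTLB entry to replace, or None if the write is ignored.

    May clear the used bit of every entry as a side effect (Step 3c).
    """
    # Step 3a: the first invalid entry, measuring from entry 0.
    for i, e in enumerate(entries):
        if not e.valid:
            return i

    # All entries locked: the automatic replacement is ignored.
    if all(e.locked for e in entries):
        return None

    # Steps 3b and 3c: at most two passes are needed, because after the
    # used bits are reset some unlocked entry is guaranteed to be unused.
    for _ in range(2):
        for i, e in enumerate(entries):
            if not e.used and not e.locked:
                return i          # Step 3b: first unused, unlocked entry
        for e in entries:
            e.used = False        # Step 3c: reset all used bits, repeat 3b
    return None
```

The bounded two-pass loop stands in for "the process is repeated from Step 3b" without risking an endless loop in the model.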

#### Restriction of sTLB Entry Direct Replacement

In SPARC64 VII, no restriction check is applied to the stxa address and the contents of I/D TLB Data Access Register.

# Assembly Language Syntax

Please refer to Appendix G of Commonality.

F.APPENDIX **H**

# Software Considerations

Please refer to Appendix H of Commonality.

F.APPENDIX **I** 

# Extending the SPARC V9 Architecture

Please refer to Appendix I of Commonality.

F.APPENDIX **J** 

## Changes from SPARC V8 to SPARC V9

Please refer to Appendix J of Commonality.

## Programming with the Memory Models

Please refer to Appendix K of Commonality.

## **Address Space Identifiers**

Every load or store address in a SPARC V9 processor has an 8-bit Address Space Identifier (ASI) appended to the VA. The VA plus the ASI fully specifies the address. For instruction loads and for data loads or stores that do not use the load or store alternate instructions, the ASI is an implicit ASI generated by the hardware. If a load alternate or store alternate instruction is used, the value of the ASI can be specified in the <code>%asi</code> register or as an immediate value in the instruction. In practice, ASIs are not only used to differentiate address spaces but are also used for other functions, such as referencing registers in the MMU unit.

This chapter summarizes SPARC64 VII enhanced ASIs. Please refer to **Commonality** for Sections L.1 and L.2.

## L.3 SPARC64 VII ASI Assignments

For SPARC64 VII, all accesses made with ASI values in the range $00_{16}$-$7F_{16}$ when PSTATE.PRIV = 0 cause a *privileged\_action* exception.

**Warning** – The software should follow the ASI assignments and VA assignments in TABLE L-1. Some illegal ASI or VA accesses will cause the machine to enter unknown states.

| Value                              | ASI Name (Suggested Macro Syntax) | Type | VA<sub>16</sub> | Description | Page |
|------------------------------------|-----------------------------------|------|-----------------|-------------|------|
| 00<sub>16</sub>-33<sub>16</sub>    | (JPS1)                            |      |                 |             |      |
| 34<sub>16</sub>                    | ASI_ATOMIC_QUAD_LDD_PHYS          | R    | —               |             | 64   |
| 35<sub>16</sub>-3B<sub>16</sub>    | (JPS1)                            |      |                 |             |      |
| 3C<sub>16</sub>                    | ASI_ATOMIC_QUAD_LDD_PHYS_LITTLE   | R    | —               |             | 64   |
| 3D<sub>16</sub>-44<sub>16</sub>    | (JPS1)                            |      |                 |             |      |

TABLE L-1 SPARC64 VII ASI Assignments (1 of 3)

TABLE L-1 SPARC64 VII ASI Assignments (2 of 3)

| Value                             | ASI Name (Suggested Macro Syntax)      | Type | VA<sub>16</sub>  | Description         | Page |
|-----------------------------------|----------------------------------------|------|------------------|---------------------|------|
| 45<sub>16</sub>                   | ASI_DCU_CONTROL_REG (ASI_DCUCR)        | RW   | 00<sub>16</sub>  |                     | 20   |
| 45<sub>16</sub>                   | ASI_MEMORY_CONTROL_REG (ASI_MCNTL)     | RW   | 08<sub>16</sub>  |                     | 109  |
| 46<sub>16</sub>-49<sub>16</sub>   | (JPS1)                                 |      |                  |                     |      |
| 4A<sub>16</sub>                   | ASI_JB_CONFIG_REGISTER                 | R    | 00<sub>16</sub>  |                     | 239  |
| 4B<sub>16</sub>                   | (JPS1)                                 |      |                  |                     |      |
| 4C<sub>16</sub>                   | ASI_ASYNC_FAULT_STATUS                 | RW   | 00<sub>16</sub>  |                     | 118  |
| 4C<sub>16</sub>                   | ASI_URGENT_ERROR_STATUS<br>(ASI_UGESR) | R    | 08<sub>16</sub>  |                     | 189  |
| 4C<sub>16</sub>                   | ASI_ERROR_CONTROL                      | RW   | 10<sub>16</sub>  |                     | 185  |
| 4C<sub>16</sub>                   | ASI_STCHG_ERROR_INFO                   | RW   | 18<sub>16</sub>  |                     | 187  |
| 4D<sub>16</sub>                   | ASI_ASYNC_FAULT_ADDR_D1                | R    | 00<sub>16</sub>  | Always read as zero | 199  |
| 4D<sub>16</sub>                   | ASI_ASYNC_FAULT_ADDR_U2                | R    | 08<sub>16</sub>  | Always read as zero | 199  |
| 4E<sub>16</sub>                   | (JPS1)                                 |      |                  |                     |      |
| 4F<sub>16</sub>                   | ASI_SCRATCH_REG0                       | RW   | 00<sub>16</sub>  |                     | 140  |
| 4F<sub>16</sub>                   | ASI_SCRATCH_REG1                       | RW   | 08<sub>16</sub>  |                     | 140  |
| 4F<sub>16</sub>                   | ASI_SCRATCH_REG2                       | RW   | 10<sub>16</sub>  |                     | 140  |
| 4F<sub>16</sub>                   | ASI_SCRATCH_REG3                       | RW   | 18<sub>16</sub>  |                     | 140  |
| F16                               | ASI_SCRATCH_REG4                       | RW   | $20_{16}$                            |                     | 140  |
| 16                                | ASI_SCRATCH_REG5                       | RW   | 28 <sub>16</sub>                     |                     | 140  |
| -16                               | ASI_SCRATCH_REG6                       | RW   | 3016                                 |                     | 140  |
| -16                               | ASI_SCRATCH_REG7                       | RW   | 3816                                 |                     | 140  |
| 16                                | (JPS1)                                 |      | $00_{16}$ -58 <sub>16</sub>          |                     |      |
| ) <sub>16</sub>                   | ASI_IMMU_TAG_ACCESS_EXT                | RW   | 60 <sub>16</sub>                     |                     | 115  |
| ) <sub>16</sub>                   | ASI_IMMU_SFPAR                         | RW   | 78 <sub>16</sub>                     |                     | 126  |
| 16-57 <sub>16</sub>               | (JPS1)                                 |      |                                      |                     |      |
| B <sub>16</sub>                   | ASI_DMMU_TAG_ACCESS_EXT                | RW   | 60 <sub>16</sub>                     |                     | 115  |
| B <sub>16</sub>                   | ASI_SHARED_CONTEXT_REG                 | RW   | 68 <sub>16</sub>                     |                     | 114  |
| B <sub>16</sub>                   | ASI_DMMU_SFPAR                         | RW   | 78 <sub>16</sub>                     |                     | 126  |
| 9 <sub>16</sub> -60 <sub>16</sub> | (JPS1)                                 |      |                                      |                     |      |
| l <sub>16</sub>                   | ASI_ITSB_PREFETCH                      | RW   | $00_{16}, 08_{16}, 40_{16}, 48_{16}$ |                     | 127  |
| 2 <sub>16</sub>                   | ASI_DTSB_PREFETCH                      | RW   | $00_{16}, 08_{16}, 40_{16}, 48_{16}$ |                     | 127  |

| Value                           | ASI Name (Suggested Macro Syntax) | Type | VA<sub>16</sub>                  | Description | Page |
|---------------------------------|-----------------------------------|------|----------------------------------|-------------|------|
| 63<sub>16</sub>–66<sub>16</sub> | (JPS1)                            |      |                                  |             |      |
| 67<sub>16</sub>                 | ASI_FLUSH_L1I                     | W    | —                                |             | 151  |
| 68<sub>16</sub>–69<sub>16</sub> | (JPS1)                            |      |                                  |             |      |
| 6A<sub>16</sub>                 | ASI_L2_CTRL                       | RW   | —                                |             | 152  |
| 6D<sub>16</sub>                 | ASI_BARRIER_INIT                  | RW   | 00<sub>16</sub>–3E0<sub>16</sub> |             | 143  |
| 6E<sub>16</sub>                 | ASI_ERROR_IDENT (ASI_EIDR)        | RW   | 00<sub>16</sub>                  |             | 185  |
| 6F<sub>16</sub>                 | ASI_BARRIER_ASSIGN                | RW   | 00<sub>16</sub>–50<sub>16</sub>  |             | 144  |
| 70<sub>16</sub>–73<sub>16</sub> | (JPS1)                            |      |                                  |             |      |
| 74<sub>16</sub>                 | ASI_CACHE_INV                     | W    | —                                |             | 152  |
| 75<sub>16</sub>–FD<sub>16</sub> | (JPS1)                            |      |                                  |             |      |
| FE<sub>16</sub>                 | ASI_LBSY, ASI_BST                 | RW   | —                                |             | 145  |
| FF<sub>16</sub>                 | (JPS1)                            |      |                                  |             |      |

TABLE L-1 SPARC64 VII ASI Assignments (3 of 3)

### L.3.2 Special Memory Access ASIs

Please refer to Section L.3.3 in Commonality.

In addition to the ASIs described in **Commonality**, SPARC64 VII supports the ASIs described below.

#### ASI 53<sub>16</sub> (ASI\_SERIAL\_ID)

SPARC64 VII provides an identification code for each processor; this ID is unique for each processor chip. In conjunction with the Version Register (please refer to *Version (VER) Register* on page 18), software can obtain a completely unique chip identification code.

This register is defined as read-only. A write to this register causes a *data\_access\_exception*.

| Chip_ID<63:0> |   |
|---------------|---|
| 63            | 0 |

#### ASI 4F<sub>16</sub> (ASI\_SCRATCH\_REGx)

SPARC64 VII provides eight 64-bit registers that can be used as temporary storage by supervisor software.

| Data<63:0> |   |
|------------|---|
| 63         | 0 |

| [1] | Register Name: | ASI_SCRATCH_REGx (x = 0–7)                                  |
|-----|----------------|-------------------------------------------------------------|
| [2] | ASI:           | 4F<sub>16</sub>                                             |
| [3] | VA:            | VA<5:3> = register number. The other VA bits must be zero.  |
| [4] | RW:            | Supervisor read/write                                       |
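The VA rule above (register number in VA<5:3>, all other bits zero) can be sketched in a few lines. This is only an illustration of the address arithmetic; the function name `scratch_reg_va` is hypothetical and not part of any SPARC64 VII software interface.

```python
def scratch_reg_va(x: int) -> int:
    """Return the VA that selects ASI_SCRATCH_REGx (ASI 0x4F), for x in 0..7.

    The register number occupies VA<5:3>; every other VA bit must be zero.
    """
    if not 0 <= x <= 7:
        raise ValueError("scratch register number must be in the range 0-7")
    return x << 3
```

For example, `scratch_reg_va(7)` evaluates to 0x38, which matches the VA listed for ASI_SCRATCH_REG7 in TABLE L-1.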

#### Block Load and Store ASIs

ASIs $E0_{16}$ and $E1_{16}$ exist only for use with STDFA instructions as Block Store with Commit operations (see *Block Load and Store Instructions (VIS I)* on page 51). Neither ASI $E0_{16}$ nor ASI $E1_{16}$ should be used with LDDFA; however, if either is used, the LDDFA behaves as follows:

1. No exception is generated based on the destination register rd (impl. dep. #255).
2. For LDDFA with ASI $E0_{16}$ or $E1_{16}$ and a memory address aligned on a $2^n$-byte boundary, a SPARC64 VII processor behaves as follows (impl. dep. #256):
   - $n \ge 3$ ($\ge$ 8-byte alignment): no exception related to memory address alignment is generated, but a *data\_access\_exception* is generated (see case 3, below).
   - $n = 2$ (4-byte alignment): an *LDDF\_mem\_address\_not\_aligned* exception is generated.
   - $n \le 1$ ($\le$ 2-byte alignment): a *mem\_address\_not\_aligned* exception is generated.
3. If the memory address is correctly aligned, a *data\_access\_exception* with DSFSR.FT = "invalid ASI" is generated.

#### Partial Store ASIs

ASIs $C0_{16}$–$C5_{16}$ and $C8_{16}$–$CD_{16}$ exist for use with the STDFA instruction for Partial Store operations (see *Partial Store (VIS I)* on page 68). None of these ASIs should be used with LDDFA; however, if one of them is used, the LDDFA behaves as follows on a SPARC64 VII processor (impl. dep. #257):

1. For LDDFA with $C0_{16}$–$C5_{16}$ or $C8_{16}$–$CD_{16}$ and a memory address aligned on a $2^n$-byte boundary, a SPARC64 VII processor behaves as follows:
   - $n \ge 3$ ($\ge$ 8-byte alignment): no exception related to memory address alignment is generated.
   - $n = 2$ (4-byte alignment): an *LDDF\_mem\_address\_not\_aligned* exception is generated.
   - $n \le 1$ ($\le$ 2-byte alignment): a *mem\_address\_not\_aligned* exception is generated.
2. If the memory address is correctly aligned, SPARC64 VII generates a *data\_access\_exception* with DSFSR.FT = "invalid ASI."
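The alignment cases for LDDFA with a store-only ASI (Block Store with Commit or Partial Store) reduce to a three-way decision on the low address bits. The sketch below is a software model of that decision only, under the assumption that no other exception takes priority; the function and constant names are hypothetical.

```python
# ASIs that are defined only for STDFA (taken from the text above).
BLOCK_COMMIT_ASIS = {0xE0, 0xE1}
PARTIAL_STORE_ASIS = set(range(0xC0, 0xC6)) | set(range(0xC8, 0xCE))


def lddfa_store_only_asi_exception(asi: int, address: int) -> str:
    """Name the exception raised by LDDFA when used with a store-only ASI."""
    if asi not in BLOCK_COMMIT_ASIS | PARTIAL_STORE_ASIS:
        raise ValueError("ASI is not a Block Store Commit or Partial Store ASI")
    if address % 8 == 0:
        # n >= 3: correctly aligned, so the invalid-ASI case applies.
        return "data_access_exception"
    if address % 4 == 0:
        # n == 2: 4-byte aligned but not 8-byte aligned.
        return "LDDF_mem_address_not_aligned"
    # n <= 1: 2-byte aligned or unaligned.
    return "mem_address_not_aligned"
```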

### L.3.3 Hardware Barrier

SPARC64 VII provides a hardware barrier mechanism that facilitates high-speed synchronization among threads in a CPU chip. The barrier resources are located inside the CPU chip and are shared by all executing threads. The BPU (Barrier Processing Unit) is the main barrier resource. It consists of a BST (Barrier STatus) and several BBs (Barrier Blades). FIGURE L-1 illustrates the barrier resources.



FIGURE L-1 The Barrier Resources of SPARC64 VII

SPARC64 VII has two BPUs in a CPU chip. These two BPUs are functionally equivalent. Each BPU contains a twenty-four bit BST and twelve Barrier Blades. A Barrier Blade defines a logical barrier component shared among threads for synchronization. Each Barrier Blade has a BST\_mask to select bits in BST, and a LBSY (Last Barrier SYnchronization status) which remembers the previous synchronization status of the Barrier Blade.

Barrier synchronization is established when all BST bits selected by the BST\_mask are set to the same value, either 1 or 0. When all selected bits reach the same value, that value is copied into LBSY. The update of LBSY is done atomically, so a read of LBSY before modifying a BST always returns the old value. Software threads that reach the barrier point first modify a BST bit, then wait for an update of LBSY. This is usually done by a spin loop that polls LBSY, which may negatively impact the other thread in the core. In SPARC64 VII, an update of LBSY wakes up all threads that use that LBSY, so using a sleep instruction in the spin loop achieves both high-speed synchronization and efficient use of CPU resources by the other thread in the core.

Since LBSY keeps the last synchronization status of the barrier, threads can easily determine the value to be used in the next synchronization by negating the current LBSY. When a Barrier Blade is used repeatedly in one piece of software, such as in the middle of a loop, threads set their BST bit to 1 once, then set it to 0 in the next iteration.
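The synchronization rule and the alternating 1/0 usage described above can be modelled in software. The following is a minimal sketch of a single Barrier Blade, assuming sequential (non-concurrent) arrivals; the class `BarrierBlade` and the helper `arrive` are hypothetical names, and the model ignores the window ASI indirection and the sleep/wake-up behaviour.

```python
class BarrierBlade:
    """Software model of one Barrier Blade and the BST bits it watches."""

    def __init__(self, mask: int, bst: int = 0, lbsy: int = 0):
        self.mask = mask    # BST_mask: which BST bits take part
        self.bst = bst      # BST of the BPU (24 bits in hardware)
        self.lbsy = lbsy    # last synchronization status

    def write_bst_bit(self, bit: int, value: int) -> None:
        """Set or clear one BST bit, then apply the synchronization check."""
        if value & 1:
            self.bst |= 1 << bit
        else:
            self.bst &= ~(1 << bit)
        if self.mask == 0:
            return                      # LBSY is undefined for an empty mask
        selected = self.bst & self.mask
        if selected == self.mask:
            self.lbsy = 1               # all selected bits are 1
        elif selected == 0:
            self.lbsy = 0               # all selected bits are 0


def arrive(blade: BarrierBlade, bit: int) -> int:
    """A thread reaches the barrier: negate the current LBSY and publish it."""
    target = blade.lbsy ^ 1
    blade.write_bst_bit(bit, target)
    return target                       # the thread then waits for LBSY == target
```

With three participating bits (`mask = 0b111`), LBSY stays at 0 until the third thread writes 1, then flips to 1; in the next round the same threads write 0 and LBSY returns to 0 when the last one arrives.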

User software may not operate on these resources directly. User software accesses them through the window ASI. A hardware thread has six window ASIs. The window ASI is a mechanism that eases barrier handling for user threads and isolates the resources from other threads in order to minimize the possibility of destroying the current barrier status.

The memory ordering between barrier resources, or between barrier resources and real memory, conforms to TSO as defined in Section 8 of **Commonality**. All kinds of memory accesses except a store followed by a load are performed in program order. A membar with #storeload is needed when a store through a window ASI and a subsequent load are to be performed in this order.

**Note** – Hardware barrier resources in SPARC64 VII do not provide synchronization across CPU chips.

#### Initialization and State Acquisition of Barrier Resources (ASI\_BARRIER\_INIT)



ASI\_BARRIER\_INIT initializes and gets the current status of the Barrier Blade determined by BPU\_num and BB\_num in VA. Unused bits of VA are ignored. TABLE L-2 describes the data bits of the ASI.

#### TABLE L-2 ASI\_BARRIER\_INIT Bit Description

| Bit   | Field     | Type | Description                                   |
|-------|-----------|------|-----------------------------------------------|
| 48    | LBSY      | RW   | The BST value of last synchronization.        |
| 47:24 | BST_mask  | RW   | Mask bit of the BST.                          |
| 23:0  | BST_value | RW   | BST value of the BPU to which the BB belongs. |

Unused bits are read as undefined and a write is ignored.

- On read, the LBSY and BST\_mask of the Barrier Blade designated by BPU\_num and BB\_num in VA, and the BST value of the BPU to which the BB belongs, are returned. An arbitrary number is returned when BB\_num > 11<sub>10</sub> is designated.
- On write, the LBSY and BST\_mask of the Barrier Blade designated by BPU\_num and BB\_num in VA, and the BST value of the BPU to which the BB belongs, are updated. Only the BST bits selected by the specified BST\_mask are updated. The following formula describes the write process:

BST = (BST & ~BST\_mask) | (BST\_mask & BST\_value)

A write with BB\_num >  $11_{10}$  is ignored and no exception is signalled.

After a write is completed, the hardware checks whether the Barrier Blade is synchronized, then updates the LBSY accordingly. For example, a write with all bits in BST\_mask and BST\_value set to 1 and LBSY set to 0 causes an immediate update of LBSY to 1. The LBSY value after a write with BST\_mask = 0 is undefined.

A subsequent read of ASI\_BARRIER\_INIT after a write with BST\_mask = 0 may return an arbitrary LBSY value, which is not necessarily the written value.

**Programming Note** – Hardware does not track whether a Barrier Blade or BST is designated as used. Software takes full responsibility for not initializing an in-use BB.
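The write behaviour of ASI\_BARRIER\_INIT can be sketched as a pure function over the data word laid out in TABLE L-2. This is an illustrative model, not hardware documentation: the function names are hypothetical, and returning `None` for LBSY is simply this sketch's way of marking the value that the text calls undefined.

```python
MASK24 = (1 << 24) - 1  # the BST is 24 bits wide


def decode_barrier_init(data: int) -> dict:
    """Split an ASI_BARRIER_INIT data word into its TABLE L-2 fields."""
    return {
        "lbsy": (data >> 48) & 1,
        "bst_mask": (data >> 24) & MASK24,
        "bst_value": data & MASK24,
    }


def barrier_init_write(bst: int, data: int):
    """Model a write: return (new BST, new BST_mask, resulting LBSY)."""
    fields = decode_barrier_init(data)
    mask, value, lbsy = fields["bst_mask"], fields["bst_value"], fields["lbsy"]
    # BST = (BST & ~BST_mask) | (BST_mask & BST_value)
    new_bst = (bst & ~mask & MASK24) | (mask & value)
    if mask == 0:
        lbsy = None                 # undefined after a write with an empty mask
    elif new_bst & mask == mask:
        lbsy = 1                    # synchronized at 1 immediately
    elif new_bst & mask == 0:
        lbsy = 0                    # synchronized at 0 immediately
    return new_bst, mask, lbsy
```

Writing all ones to BST\_mask and BST\_value with LBSY = 0 yields LBSY = 1 in this model, which matches the example in the text.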

#### Assignment of Barrier Resources (ASI\_BARRIER\_ASSIGN)

| ASI:          | 6F <sub>16</sub>                                       |
|---------------|--------------------------------------------------------|
| VA:           | $00_{16}, 10_{16}, 20_{16}, 30_{16}, 40_{16}, 50_{16}$ |
| Access Modes: | Supervisor read/write                                  |

DATA:

| Valid | —     | BPU_num | BB_num | BST_bit |
|-------|-------|---------|--------|---------|
| 63    | 62:10 | 9       | 8:5    | 4:0     |

ASI\_BARRIER\_ASSIGN sets and gets the mapping of barrier resources to a window ASI through which user programs can access it. There are six window ASIs in SPARC64 VII; they are distinguished by VA. TABLE L-3 describes the data bits of the ASI.

| Bit | Field   | Type | Description                                                                                                                                                                     |
|-----|---------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 63  | Valid   | RW   | Valid bit. On read, the validity of a window ASI is returned. On write, valid = 1 requests hardware to make a new assignment, while valid = 0 releases the existing assignment. |
| 9   | BPU_num | RW   | Designation of BPU.                                                                                                                                                             |
| 8:5 | BB_num  | RW   | Designation of a BB in the BPU.                                                                                                                                                 |
| 4:0 | BST_bit | RW   | Designation of a bit in the BST.                                                                                                                                                |

TABLE L-3 ASI\_BARRIER\_ASSIGN Bit Description

Unused bits are read as undefined and a write is ignored.

- On read, the assignment of a window ASI is returned. When the window ASI designated by VA is assigned to specific barrier resources, valid is set to 1 and assignment is shown in BPU\_num, BB\_num, and BST\_bit. When the window ASI designated by VA is not assigned, valid is set to 0 and other fields are meaningless.
- On write,
  - When valid = 1, a new assignment is made to the window ASI. After completion of this write, user software can write the designated bit in the BST by a write to ASI\_BST, and the LBSY value is obtained by a read of ASI\_LBSY. Note that a write operation does not alter the corresponding bit of BST\_mask in the Barrier Blade.
  - When valid = 0, the existing assignment is released. After completion of this write, a write to ASI\_BST is ignored and an undefined value is returned by a read of ASI\_LBSY.

If a nonexistent barrier resource is designated, such as BST\_bit > 23<sub>10</sub> or BB\_num > 11<sub>10</sub>, a write is ignored and no exception is signalled.

Hardware does not detect any discrepancy between initialization and assignment of barrier resources. This includes things such as initialization of Barrier Blades currently being used, assignment of a BST bit whose corresponding bit in BST\_mask is zero, or two or more Barrier Blades sharing a specific BST bit. System software takes responsibility for avoiding these discrepancies.

**Programming Note** – System software should only assign a Barrier Blade after it has been initialized. Assignment of a non-initialized Barrier Blade may cause unexpected results.
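Because hardware silently ignores a write that designates a nonexistent resource, system software may prefer to validate the fields before building the data word. The sketch below packs and unpacks the ASI\_BARRIER\_ASSIGN data word using the bit positions from TABLE L-3; the function names are hypothetical, and raising `ValueError` is a choice made for this example rather than something the hardware does.

```python
def window_va(index: int) -> int:
    """VA of window ASI number `index` (0..5): 0x00, 0x10, ..., 0x50."""
    if not 0 <= index <= 5:
        raise ValueError("SPARC64 VII has six window ASIs (0-5)")
    return index << 4


def encode_barrier_assign(valid: bool, bpu_num: int = 0,
                          bb_num: int = 0, bst_bit: int = 0) -> int:
    """Build an ASI_BARRIER_ASSIGN data word; valid=False releases the window."""
    if not valid:
        return 0
    if bpu_num not in (0, 1) or not 0 <= bb_num <= 11 or not 0 <= bst_bit <= 23:
        raise ValueError("nonexistent barrier resource")
    return (1 << 63) | (bpu_num << 9) | (bb_num << 5) | bst_bit


def decode_barrier_assign(data: int) -> dict:
    """Split an ASI_BARRIER_ASSIGN data word into its fields."""
    return {
        "valid": (data >> 63) & 1,
        "bpu_num": (data >> 9) & 1,
        "bb_num": (data >> 5) & 0xF,
        "bst_bit": data & 0x1F,
    }
```

When `valid` is 0 on a read, the other decoded fields are meaningless, as stated above.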

#### Window ASI for Barrier Resources (ASI\_LBSY/BST)

| ASI:          | EF <sub>16</sub>                                       |
|---------------|--------------------------------------------------------|
| VA:           | $00_{16}, 10_{16}, 20_{16}, 30_{16}, 40_{16}, 50_{16}$ |
| Access Modes: | Read/Write                                             |



ASI\_LBSY/BST is a window ASI through which user programs can access barrier resources. There are six window ASIs in SPARC64 VII; they are distinguished by VA. TABLE L-4 describes the data bits of the ASI.

#### TABLE L-4 ASI\_LBSY/BST Bit Description

| Bit | Field | Type | Description                                                                                                                                                          |
|-----|-------|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0   | Value | RW   | On read, LBSY of the Barrier Blade which is assigned<br>to the window is returned. On write, the value of the<br>BST bit which is assigned to the window is updated. |

Unused bits are read as undefined and a write is ignored.

A read to an unassigned window ASI returns an unknown value and a write to an unassigned window is ignored without signalling an exception.

#### Sample Code of Barrier Synchronization

```
/*
 * %r1: VA of a window ASI
 * %r2, %r3: work
 */
        ldxa    [%r1]ASI_LBSY, %r2      ! read current LBSY
        not     %r2                     ! invert LBSY
        and     %r2, 1, %r2             ! mask out reserved bits
        stxa    %r2, [%r1]ASI_BST       ! update BST
        membar  #storeload              ! to make sure stxa is complete
loop:
        ldxa    [%r1]ASI_LBSY, %r3      ! read LBSY
        and     %r3, 1, %r3             ! mask out reserved bits
        subcc   %r3, %r2, %g0           ! check if status changed
        bne,a   loop                    ! if not changed, sleep for a while
        sleep
```

## **Cache Organization**

This appendix describes SPARC64 VII cache organization in the following sections:

- *Cache Types* on page 147
- *Cache Coherency Protocols* on page 150
- *Cache Control/Status Instructions* on page 151

## M.1 Cache Types

SPARC64 VII has two levels of on-chip caches, with these characteristics:

- Level-1 cache is split for instruction and data; level-2 cache is unified.
- Level-1 caches are virtually indexed, physically tagged (VIPT), and level-2 caches are physically indexed, physically tagged (PIPT).
- Level-1 caches have a 64-byte line size, and the level-2 cache has a 256-byte line size (four 64-byte sublines).
- All lines in the level-1 caches are included in the level-2 cache.
- Between level-1 caches, or level-1 and level-2 caches, coherency is maintained by hardware. In other words,
  - eviction of a cache line from a level-2 cache causes flush-and-invalidation of all level-1 caches, and
  - self-modification of an instruction stream modifies a level-1 data cache with invalidation of a level-1 instruction cache.
- Level-1 caches are shared by the two threads in the core, and Level-2 is shared by all the threads in the processor module.

### M.1.1 Level-1 Instruction Cache (L1I Cache)

TABLE M-1 shows the characteristics of a level-1 instruction cache.

TABLE M-1 L1I Cache Characteristics

| Feature         | Value                                       |
|-----------------|---------------------------------------------|
| Size            | 64 Kbytes                                   |
| Associativity   | 2-way                                       |
| Line Size       | 64-byte                                     |
| Indexing        | Virtually indexed, physically tagged (VIPT) |
| Tag Protection  | Parity and duplicate                        |
| Data Protection | Parity                                      |

Although an L1I cache is VIPT, TTE.CV is ineffective since SPARC64 VII has unaliasing features in hardware.

Instruction fetches bypass the L1I cache when they are noncacheable accesses. Noncacheable accesses occur under one of three conditions:

- PSTATE.RED = 1
- DCUCR.IM = 0
- TTE.CP = 0

When MCNTL.NC\_CACHE = 1, SPARC64 VII treats all instructions as cacheable, regardless of the conditions listed above. See *ASI\_MCNTL* (*Memory Control Register*) on page 109 for details.

**Programming Note** – This feature is intended to be used by the OBP to facilitate diagnostics procedures. When the OBP uses this feature, it must clear MCNTL.NC\_CACHE and invalidate all L1I data by ASI\_FLUSH\_L1I before it exits.

### M.1.2 Level-1 Data Cache (L1D Cache)

The level-1 data cache is a writeback cache. Its characteristics are shown in TABLE M-2.

TABLE M-2 L1D Cache Characteristics

| Feature         | Value                                       |
|-----------------|---------------------------------------------|
| Size            | 64 Kbytes                                   |
| Associativity   | 2-way                                       |
| Line Size       | 64-byte                                     |
| Indexing        | Virtually indexed, physically tagged (VIPT) |
| Tag Protection  | Parity and duplicate                        |
| Data Protection | ECC                                         |

Although L1D cache is VIPT, TTE.CV is ineffective since SPARC64 VII has unaliasing features in hardware.

Data accesses bypass the L1D cache when they are noncacheable accesses. Noncacheable accesses occur under one of three conditions:

- The ASI used for the access is either ASI\_PHYS\_BYPASS\_EC\_WITH\_E\_BIT (15<sub>16</sub>) or ASI\_PHYS\_BYPASS\_EC\_WITH\_E\_BIT\_LITTLE (1D<sub>16</sub>).
- DCUCR.DM = 0
- TTE.CP = 0

Unlike the L1I cache, the L1D cache does not use MCNTL.NC\_CACHE.
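The cacheability conditions for the two level-1 caches can be summarized as two predicates. This is a minimal sketch that covers only the conditions listed in this appendix; the function names and the choice to pass each control bit as an integer argument are assumptions made for illustration.

```python
# Bypass ASIs with the E bit, as listed for the L1D cache.
NONCACHEABLE_DATA_ASIS = {0x15, 0x1D}


def ifetch_is_cacheable(pstate_red: int, dcucr_im: int, tte_cp: int,
                        mcntl_nc_cache: int = 0) -> bool:
    """True if an instruction fetch goes through the L1I cache."""
    if mcntl_nc_cache == 1:
        return True  # all instruction fetches are treated as cacheable
    return not (pstate_red == 1 or dcucr_im == 0 or tte_cp == 0)


def data_access_is_cacheable(asi: int, dcucr_dm: int, tte_cp: int) -> bool:
    """True if a data access goes through the L1D cache (NC_CACHE is not used)."""
    if asi in NONCACHEABLE_DATA_ASIS:
        return False
    return not (dcucr_dm == 0 or tte_cp == 0)
```

Note the asymmetry: MCNTL.NC\_CACHE overrides the noncacheable conditions only on the instruction side.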

### M.1.3 Level-2 Unified Cache (L2 Cache)

The level-2 unified cache is a writeback cache. Its characteristics are shown in TABLE M-3.

| Feature         | Value                                        |
|-----------------|----------------------------------------------|
| Size            | 6 Mbytes (max)                               |
| Associativity   | 12-way (max)                                 |
| Line Size       | 256 bytes, consisting of four 64-byte sublines |
| Indexing        | Physically indexed, physically tagged (PIPT) |
| Tag Protection  | ECC                                          |
| Data Protection | ECC                                          |

TABLE M-3 L2 Cache Characteristics

The L2 cache is bypassed when the access is noncacheable. MCNTL.NC\_CACHE is not used in the L2 cache.

## M.2 Cache Coherency Protocols

The CPU uses the enhanced MOESI cache-coherence protocol; these letters are acronyms for cache line states as follows:

- M Exclusive modified
- O Shared modified (owned)
- E Exclusive clean
- S Shared clean
- I Invalid

A subset of the MOESI protocol is used in the on-chip caches as well as the D-Tags in the system controller. TABLE M-4 shows the relationships between the protocols.

| L2-Cache               | L1D-Cache                | SAT (store ownership) | L1I-Cache                |
|------------------------|--------------------------|-----------------------|--------------------------|
| Invalid (I)            | Invalid (I)              | Invalid (I)           | Invalid (I)              |
| Shared Clean (S)       |                          |                       |                          |
| Shared Modified (O)    | Invalid (I) or Clean (C) | Invalid (I)           | Invalid (I) or Valid (V) |
| Exclusive Clean (E)    |                          | Invalid (I)           | Invalid (I) or Valid (V) |
|                        |                          |                       |                          |
| Exclusive Modified (M) | Exclusive Modified (M)   | Valid (V)             | Invalid (I)              |

TABLE M-4 Relationships Between Cache Coherency Protocols

TABLE M-5 shows the encoding of the MOESI states in the L2 Cache.

TABLE M-5 L2 Cache MOESI States

| Bit 2 (Valid) | Bit 1 (Exclusive) | Bit 0 (Modified) | State                  |
|---------------|-------------------|------------------|------------------------|
| 0             | —                 | —                | Invalid (I)            |
| 1             | 0                 | 0                | Shared clean (S)       |
| 1             | 1                 | 0                | Exclusive clean (E)    |
| 1             | 0                 | 1                | Shared modified (O)    |
| 1             | 1                 | 1                | Exclusive modified (M) |
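The encoding in TABLE M-5 can be decoded with a small lookup. The sketch below is illustrative only; `decode_l2_state` is a hypothetical helper, and it assumes the three state bits are packed as a 3-bit integer with the Valid bit as bit 2.

```python
def decode_l2_state(bits: int) -> str:
    """Map the 3-bit L2 state encoding (Valid, Exclusive, Modified) to a MOESI letter."""
    valid = (bits >> 2) & 1
    exclusive = (bits >> 1) & 1
    modified = bits & 1
    if not valid:
        return "I"  # Exclusive and Modified are don't-care when Valid = 0
    return {
        (0, 0): "S",  # shared clean
        (1, 0): "E",  # exclusive clean
        (0, 1): "O",  # shared modified (owned)
        (1, 1): "M",  # exclusive modified
    }[(exclusive, modified)]
```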

## M.3 Cache Control/Status Instructions

Several ASI instructions are defined to manipulate the caches. The following conventions are common to all of the load and store alternate instructions defined in this section:

1. The opcode of the instructions should be ldda, ldxa, lddfa, stda, stxa, or stdfa. Otherwise, a *data\_access\_exception* with D-SFSR.FT = 08<sub>16</sub> (Invalid ASI) is generated.
2. No operand address translation is performed for these instructions.
3. VA<2:0> of all of the operand addresses should be 0. Otherwise, a *mem\_address\_not\_aligned* exception is generated.
4. The don't-care bits (designated "—" in the format) in the VA of the load or store alternate can be of any value. It is recommended that software use zero for these bits in the operand address of the instruction.
5. The don't-care bits (designated "—" in the format) in DATA are read as zero and ignored on write.
6. The instruction operations are not affected by PSTATE.CLE. They are always treated as big-endian.

Multiple Asynchronous Fault Address Registers are maintained in hardware, one for each major source of asynchronous errors. These ASIs are described in *ASI\_ASYNC\_FAULT\_STATUS (ASI\_AFSR)* on page 198. The following subsections describe all other cache-related ASIs in detail.

### M.3.1 Flush Level-1 Instruction Cache (ASI\_FLUSH\_L1I)

| [1] | Register Name: | ASI_FLUSH_L1I         |
|-----|----------------|-----------------------|
| [2] | ASI:           | 67 <sub>16</sub>      |
| [3] | VA:            | 8-byte aligned any VA |
| [4] | RW             | Supervisor write      |

ASI\_FLUSH\_L1I flushes and invalidates the entire level-1 instruction cache. VA can be any value as long as it is aligned on an 8-byte boundary. A write to this ASI with any such VA and any data causes flushing and invalidation.

### M.3.2 Level-2 Cache Control Register (ASI\_L2\_CTRL)

| [1] | Register Name: | ASI_L2_CTRL           |
|-----|----------------|-----------------------|
| [2] | ASI:           | 6A <sub>16</sub>      |
| [3] | VA:            | any                   |
| [4] | RW             | Supervisor read/write |

ASI\_L2\_CTRL is a control register for L2 training, interface, and size configuration. It is illustrated below and described in TABLE M-6.

| Reserved | URGENT_ERROR_TRAP | Reserved | U2_FLUSH |
|----------|-------------------|----------|----------|
| 63:25    | 24                | 23:1     | 0        |

#### TABLE M-6 ASI\_L2\_CTRL Register Bits

| Bit | Field             | RW   | Description                                                                                                                                                                                                                                                             |
|-----|-------------------|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 24  | URGENT_ERROR_TRAP | RW1C | This bit is set to 1 when one of the error exceptions (*instruction_access_error*, *data_access_error*, or *asynchronous_data_error*) is generated. The bit remains set to 1 until supervisor software explicitly clears it by writing 1 to the bit. |
| 0   | U2_FLUSH          | W    | Setting this bit to 1 causes the entire level-2 cache to flush. Until the flushing of the level-2 cache completes, the processor ceases operation and does not perform further instruction execution. Writing 0 to this bit is ignored. |

Programming Note - To wait for completion of cache flush, a membar #sync is needed.
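The write semantics in TABLE M-6 can be sketched as a small software model. This is only an illustration of the RW1C and write-only behavior described above, not an access routine for real hardware; the function name `write_l2_ctrl` and the tuple it returns are assumptions made for this example.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
URGENT_ERROR_TRAP = 1 << 24  # bit 24, RW1C: writing 1 clears the bit
U2_FLUSH = 1 << 0            # bit 0, write-only: writing 1 requests an L2 flush


def write_l2_ctrl(current: int, written: int) -> Tuple[int, bool]:
    """Model a supervisor write to ASI_L2_CTRL (hypothetical helper).

    Returns the new register value and whether a level-2 flush was requested.
    """
    new_value = current & MASK64
    # RW1C: a written 1 clears URGENT_ERROR_TRAP; a written 0 leaves it alone.
    if written & URGENT_ERROR_TRAP:
        new_value &= ~URGENT_ERROR_TRAP
    # U2_FLUSH: a written 1 triggers a flush; a written 0 is ignored.
    flush_requested = bool(written & U2_FLUSH)
    return new_value, flush_requested
```

For instance, writing only bit 0 while URGENT_ERROR_TRAP is set requests a flush and leaves the error bit set; the bit is cleared only by writing 1 to bit 24.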

#### M.3.3 Cache invalidation (ASI\_CACHE\_INV)

| [1] | Register Name: | ASI_CACHE_INV    |
|-----|----------------|------------------|
| [2] | ASI:           | 74 <sub>16</sub> |
| [3] | VA:            | Physical Address |
| [4] | RW             | Supervisor write |

ASI\_CACHE\_INV flushes and invalidates cache lines of all processor modules in the same partition. The cache lines to be invalidated are specified by the VA field, which holds the physical address (that is, ASI\_CACHE\_INV is a bypass ASI). Thus PSTATE.AM is ignored. Unlike other bypass ASIs, the Physical Address Data Watchpoint Register (ASI 58<sub>16</sub>, VA = 40<sub>16</sub>) is also ignored.

The ASI is write-only; a read from it causes *data\_access\_exception* with AFSR.FTYPE = "invalid ASI".

**Note** – DCUCR.WEAK\_SPCA must be set to 1 before the instruction is executed.

## Interrupt Handling

Interrupt handling in SPARC64 VII is described in these sections:

- Interrupt Dispatch
- Interrupt Receive
- Interrupt-Related ASI Registers

## N.1 Interrupt Dispatch

When a processor wants to dispatch an interrupt to another processor, it first sets up the interrupt data registers (ASI\_INTR\_W data 0-7) with the outgoing interrupt packet data by using ASI instructions. It then performs an ASI\_INTR\_W (interrupt dispatch) write to trigger delivery of the interrupt. The interrupt packet and the associated data are forwarded to the target processor by the system controller. The processor polls the BUSY bit in the INTR\_DISPATCH\_STATUS register to determine whether the interrupt has been dispatched successfully.

FIGURE N-1 illustrates the steps required to dispatch an interrupt.



FIGURE N-1 Dispatching an Interrupt

## N.2 Interrupt Receive

When an interrupt packet is received, eight interrupt data registers are updated with the associated incoming data and the BUSY bit in the ASI\_INTR\_RECEIVE register is set. If interrupts are enabled (PSTATE.IE = 1), then the processor enters a trap and the interrupt data registers are read by the software to determine the appropriate trap handler. The handler may reprioritize this interrupt packet to a lower priority.

If an incoming packet is marked as an error, the BUSY bit in the ASI\_INTR\_RECEIVE register is not set. In this case, other interrupt related ASI registers may also be corrupted and should not be accessed. See Section P.8.3, *ASI Register Error Handling*, on page 203 for details.

FIGURE N-2 is an example of the interrupt receive flow.



FIGURE N-2 Interrupt Receive Flow

## N.3 Interrupt Global Registers

Please refer to Section N.3 of Commonality.

## N.4 Interrupt-Related ASI Registers

Please refer to Section N.4 of Commonality for details of these registers.

#### N.4.2 Interrupt Vector Dispatch Register

SPARC64 VII ignores all 10 bits of VA<38:29> when the Interrupt Vector Dispatch Register is written (impl. dep. #246).

#### N.4.3 Interrupt Vector Dispatch Status Register

In SPARC64 VII, 32 BUSY/NACK pairs are implemented in the Interrupt Vector Dispatch Status Register (impl. dep. #243).
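As an illustration of how software might interpret a value read from this register while polling, the sketch below decodes one BUSY/NACK pair. The bit layout used here (pair *n* has BUSY at bit 2*n* and NACK at bit 2*n*+1) is an assumption for the example; the authoritative layout is given in Commonality. The function names are hypothetical.

```python
from typing import Tuple

NUM_PAIRS = 32  # BUSY/NACK pairs implemented in SPARC64 VII (impl. dep. #243)


def decode_pair(status: int, pair: int) -> Tuple[bool, bool]:
    """Return (busy, nack) for one pair.

    Assumed layout: BUSY at bit 2*pair, NACK at bit 2*pair + 1.
    """
    if not 0 <= pair < NUM_PAIRS:
        raise ValueError("pair index out of range")
    busy = bool((status >> (2 * pair)) & 1)
    nack = bool((status >> (2 * pair + 1)) & 1)
    return busy, nack


def dispatch_outcome(status: int, pair: int) -> str:
    """Classify a polled status value for one pair (illustrative only)."""
    busy, nack = decode_pair(status, pair)
    if busy:
        return "pending"  # keep polling
    return "nacked" if nack else "delivered"
```

A dispatcher would typically keep polling while the result is `"pending"` and retry the dispatch when it is `"nacked"`.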

#### N.4.5 Interrupt Vector Receive Register

SPARC64 VII sets a 10-bit value in the SID\_H and SID\_L fields of the Interrupt Vector Receive Register, but the value that is set is undefined (impl. dep. #247).

## N.5 How to identify an interrupt target

SPARC64 VII has multiple threads in a processor module. As a result, SPARC64 VII needs a mechanism to identify which thread should receive a given interrupt (*interrupt\_vector*).

ASI\_EIDR is used to identify the thread to receive a given interrupt (interrupt\_vector).

Firmware is expected to initialize ASI\_EIDR with the Interrupt Target Identifier (ITID) at boot. The behavior of SPARC64 VII when it receives an interrupt packet is as follows.

**a.** If at least one of the ASI\_EIDRs remains uninitialized, and none of the initialized ASI\_EIDR values is equal to the ITID value in the interrupt packet:

The interrupt packet is sent to the thread specified by ITID<1:0> in the packet.

**b.** If all of the ASI\_EIDRs have been initialized, but zero or more than one of the ASI\_EIDR values are equal to the ITID value in the interrupt packet:

It is undefined which thread receives the packet, or whether any thread receives it. In both cases, however, the sender sees ASI\_INTR\_DISPATCH\_STATUS.NACK = 0.

**c.** If exactly one of the initialized ASI\_EIDR values is equal to the ITID value in the interrupt packet:

The interrupt packet is sent to the thread whose ASI\_EIDR value matches the ITID value in the packet.
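The three cases can be summarized as a small decision function. This is a sketch for illustration only: the representation of an uninitialized ASI\_EIDR as `None`, the `UNDEFINED` marker, and the function name are assumptions made here. The combination of some uninitialized registers with more than one match is not covered by the cases above, so the sketch reports it as undefined as well.

```python
from typing import Optional, Sequence, Union

UNDEFINED = "undefined"  # marker for behavior the specification leaves undefined


def select_target_thread(
    eidrs: Sequence[Optional[int]], itid: int
) -> Union[int, str]:
    """Pick the receiving thread index for an interrupt packet.

    eidrs[i] is the ASI_EIDR value of thread i, or None if uninitialized.
    """
    matches = [i for i, v in enumerate(eidrs) if v is not None and v == itid]
    any_uninitialized = any(v is None for v in eidrs)

    if len(matches) == 1:
        return matches[0]        # case c: exactly one initialized match
    if any_uninitialized and not matches:
        return itid & 0b11       # case a: thread given by ITID<1:0>
    return UNDEFINED             # case b, or a combination not listed above
```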

## Reset, RED\_state, and error\_state

This appendix contains these sections:

- Reset Types
- RED\_state and error\_state
- Processor State after Reset and in RED\_state

## O.1 Reset Types

This section describes the four reset types: power-on reset, watchdog reset, externally initiated reset, and software-initiated reset.

POR and XIR are applied to all the threads within a processor module. In other words, all the threads go through the same trap process. WDR, SIR, and RED\_state are applied only to the particular thread that invoked the reset. Other threads are unaffected and continue to run.

#### O.1.1 Power-on Reset (POR)

For execution of the power-on reset on SPARC64 VII, an external facility must issue the required sequence of JTAG commands to the processor.

While the reset pin is asserted or the Power ready signal is de-asserted, the processor stops and executes only the specified JTAG command. The processor does not change any software-visible resources in the processor except the changes by JTAG command execution and does not change any memory system state.

On POR, the processor enters RED\_state, takes a trap with TT = 1 to RSTVaddr + 20<sub>16</sub>, and starts instruction execution.

#### O.1.2 Watchdog Reset (WDR)

The watchdog reset trap is generated internally in the following cases:

- A second watchdog timeout is detected while TL < MAXTL.
- A first watchdog timeout is detected while TL = MAXTL.
- A trap occurs while TL = MAXTL.

When triggered by a watchdog timeout, a WDR trap has TT = 2 and control transfers to RSTVaddr + 40<sub>16</sub>. Otherwise, the TT of the trap is preserved, causing an entry into error\_state.

#### O.1.3 Externally Initiated Reset (XIR)

When SPARC64 VII receives a packet requesting XIR through the Jupiter Bus, it generates a trap of TT = 3 and causes the processor to transfer execution to RSTVaddr + 60<sub>16</sub> and enter RED\_state.

#### O.1.4 Software-Initiated Reset (SIR)

Any processor can initiate a software-initiated reset with an SIR instruction.

If TL (Trap Level) < MAXTL (5), an SIR instruction causes a trap of TT = 4 and causes the processor to execute instructions from RSTVaddr + 80<sub>16</sub> and enter RED\_state.

If a processor executes an SIR instruction while TL = 5, it enters error\_state and ultimately generates a watchdog reset trap.
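The trap types and vector offsets given for the four reset types follow a regular pattern: each offset equals TT × 20<sub>16</sub>. The sketch below tabulates this relationship and the SIR behavior at TL = MAXTL. It is illustrative only; `RSTV_VA` is assumed to be the 64-bit virtual address FFFFFFFFF0000000<sub>16</sub>, and the function names are hypothetical.

```python
RSTV_VA = 0xFFFF_FFFF_F000_0000  # assumed RSTVaddr (virtual address)
MAXTL = 5

# Trap type (TT) for each reset type, as described in Sections O.1.1 to O.1.4.
RESET_TT = {"POR": 1, "WDR": 2, "XIR": 3, "SIR": 4}


def reset_vector(kind: str) -> int:
    """Return the address at which execution starts for a reset type."""
    # Offsets 0x20, 0x40, 0x60, 0x80 correspond to TT * 0x20.
    return RSTV_VA + RESET_TT[kind] * 0x20


def sir_outcome(tl: int) -> str:
    """State entered when an SIR instruction executes at trap level tl."""
    if not 0 <= tl <= MAXTL:
        raise ValueError("TL out of range")
    # Below MAXTL the SIR trap (TT = 4) enters RED_state; at MAXTL the
    # processor enters error_state instead.
    return "RED_state" if tl < MAXTL else "error_state"
```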

## O.2 RED\_state and error\_state

The suspended\_state is added to support MTP effectively. A given thread has no way to tell whether the other thread is in suspended\_state.



FIGURE O-1 Processor State Diagram

\* WDT1 is the first watchdog timeout.

\*\* WDT2 is the second watchdog timeout. WDT2 takes the CPU into error\_state. In a normal setting, error\_state immediately generates a watchdog reset trap and brings the CPU into RED\_state. Thus, the state is transient. When the OPSR (Operation Status Register) specifies stop on error\_state, an entry into error\_state does not cause a watchdog reset and the CPU remains in error\_state.

\*\*\* CPU\_fatal\_error\_state signals the detection of a fatal error to the system through the P\_FERR signal, and the system causes a FATAL reset. At the FATAL reset, soft POR is applied to all threads in the system.

#### O.2.1 RED\_state

Once the processor enters RED\_state for any reason other than a power-on reset (POR), software should not attempt to return to execute\_state; if software attempts a return, the state of the processor is unpredictable.

When the processor processes a reset or a trap that enters RED\_state, it enters a trap at an offset relative to the RED\_state trap table (RSTVaddr); in the processor, this is at virtual address VA = FFFFFFFFF0000000<sub>16</sub> and physical address PA = 000007FFF0000000<sub>16</sub>.

The following list further describes the processor behavior upon entry into RED\_state, and during RED\_state:

- Whenever the processor enters RED\_state, all fetch buffers are invalidated.
- When the processor enters RED\_state because of a trap or reset, the DCUCR register is updated by hardware to disable several hardware features. Software must set these bits when required (for example, when the processor exits from RED\_state).
- When the processor enters RED\_state not because of a trap or reset (that is, when the PSTATE.RED bit has been set by WRPR), these register bits are unchanged—unlike the case above. The only side effect is the disabling of the instruction MMU.
- When the processor is in RED\_state, it behaves as if the IMMU is disabled (DCUCR.IM is clear), regardless of the actual values in the respective control register.
- Caches continue to snoop and maintain coherence while the processor is in RED\_state.

#### O.2.2 error\_state

The processor enters error\_state when a trap occurs and TL = MAXTL (5) or when the second watchdog time-out has occurred.

Under normal settings, the processor immediately generates a watchdog reset trap (WDR) and transitions to RED\_state. If, however, the OPSR (Operating Status Register) specifies stop on error\_state, the processor does not generate a watchdog reset after the transition to error\_state and remains in error\_state.
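The entry and exit conditions for error\_state described above can be written as two small predicates. This is a simplified model for illustration; the function and parameter names are assumptions, and the OPSR setting is reduced to a single boolean.

```python
MAXTL = 5


def enters_error_state(tl: int, trap: bool = False,
                       second_watchdog_timeout: bool = False) -> bool:
    """True if the described conditions put the processor in error_state."""
    # A trap taken at TL = MAXTL, or a second watchdog timeout.
    return (trap and tl == MAXTL) or second_watchdog_timeout


def state_after_error_state(stop_on_error_state: bool) -> str:
    """State reached after error_state is entered.

    stop_on_error_state models the OPSR setting (hypothetical parameter).
    """
    # Normal setting: a watchdog reset trap moves the processor to RED_state.
    return "error_state" if stop_on_error_state else "RED_state"
```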

#### O.2.3 CPU Fatal Error state

The processor enters CPU fatal error state when a fatal error is detected in the processor. A fatal error is one that breaks the cache coherency or the system data integrity.

The processor reports the fatal error detection to the system, and the system causes the fatal reset. At the fatal reset, soft POR is applied to all CPUs in the system.

## O.3 Processor State after Reset and in RED\_state

TABLE O-1 shows the various processor states after resets and when entering RED\_state.

In this table, it is assumed that RED\_state entry happens as a result of resets or traps. If RED\_state entry occurs because the WRPR instruction sets the PSTATE.RED bit, no processor state will be changed except the PSTATE.RED bit itself; the effects of this are described in *RED\_state* on page 164.

| Name                     |      | POR<sup>1</sup>                                                           | WDR<sup>2</sup>                              | XIR                            | SIR                            | RED_state                                    |
|--------------------------|------|---------------------------------------------------------------------------|----------------------------------------------|--------------------------------|--------------------------------|----------------------------------------------|
| Integer registers        |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| Floating Point registers |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| RSTV value               |      | VA = FFFF FFFF F000 0000<sub>16</sub><br>PA = 07FF F000 0000<sub>16</sub> |                                              |                                |                                |                                              |
| PC                       |      | $\mathtt{RSTV} \mid 20_{16}$                                              | $\mathtt{RSTV} \mid 40_{16}$                 | $\mathtt{RSTV} \mid 60_{16}$   | $\mathtt{RSTV} \mid 80_{16}$   | $\mathtt{RSTV} \mid \mathtt{A0}_{16}$        |
| nPC                      |      | $\mathtt{RSTV} \mid 24_{16}$                                              | $\mathtt{RSTV} \mid 44_{16}$                 | $\mathtt{RSTV} \mid 64_{16}$   | $\mathtt{RSTV} \mid 84_{16}$   | $\mathtt{RSTV} \mid \mathtt{A4}_{16}$        |
| PSTATE                   | AG   | 1 (Alternate globals)                                                     |                                              |                                |                                |                                              |
|                          | MG   | 0 (MMU globals not selected)                                              |                                              |                                |                                |                                              |
|                          | IG   | 0 (Interrupt globals not selected)                                        |                                              |                                |                                |                                              |
|                          | IE   | 0 (Interrupt disable)                                                     |                                              |                                |                                |                                              |
|                          | PRIV | 1 (Privileged mode)                                                       |                                              |                                |                                |                                              |
|                          | AM   | 0 (Full 64-bit address)                                                   |                                              |                                |                                |                                              |
|                          | PEF  | 1 (FPU on)                                                                |                                              |                                |                                |                                              |
|                          | RED  | 1 (Red_state)                                                             |                                              |                                |                                |                                              |
|                          | MM   | 00 (TSO)                                                                  |                                              |                                |                                |                                              |
|                          | TLE  | 0                                                                         | Unchanged                                    |                                |                                |                                              |
|                          | CLE  | 0                                                                         | Copied from TLE                              |                                |                                |                                              |
| TBA<63:15>               |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| Y                        |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| PIL                      |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| CWP                      |      | Unknown/Unchanged                                                         | Unchanged except for register window traps   | Unchanged                      | Unchanged                      | Unchanged except for register window traps   |
| TT[TL]                   |      | 1                                                                         | trap type or 2                               | 3                              | 4                              | trap type                                    |
| CCR                      |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| ASI                      |      | Unknown/Unchanged                                                         | Unchanged                                    |                                |                                |                                              |
| TL                       |      | MAXTL                                                                     | min (TL + 1, MAXTL)                          |                                |                                |                                              |
| TPC[TL]                  |      | Unknown/Unchanged                                                         | PC                                           |                                |                                |                                              |
| TNPC[TL]                 |      | Unknown/Unchanged                                                         | nPC                                          |                                |                                |                                              |

TABLE O-1 Nonprivileged and Privileged Register State after Reset and in RED\_state

| Name       |                                          | POR<sup>1</sup>                                                                                     | WDR<sup>2</sup>                          | XIR                       | SIR                | RED_state |
|------------|------------------------------------------|-----------------------------------------------------------------------------------------------------|------------------------------------------|---------------------------|--------------------|-----------|
| TSTATE     | CCR<br>ASI<br>PSTATE<br>CWP<br>PC<br>nPC | Unknown/Unchanged                                                                                   | CCR<br>ASI<br>PSTATE<br>CWP<br>PC<br>nPC |                           |                    |           |
| TICK       | NPT<br>Counter                           | 1<br>Restart at 0                                                                                   | Unchanged<br>Count                       | Unchanged<br>Restart at 0 | Unchanged<br>Count |           |
| CANSAVE    |                                          | Unknown/Unchanged                                                                                   | Unchanged                                |                           |                    |           |
| CANRESTORE |                                          | Unknown/Unchanged                                                                                   | Unchanged                                |                           |                    |           |
| OTHERWIN   |                                          | Unknown/Unchanged                                                                                   | Unchanged                                |                           |                    |           |
| CLEARWIN   |                                          | Unknown/Unchanged                                                                                   | Unchanged                                |                           |                    |           |
| WSTATE     | OTHER<br>NORMAL                          | Unknown/Unchanged<br>Unknown/Unchanged                                                              | Unchanged<br>Unchanged                   |                           |                    |           |
| VER        | MANUF<br>IMPL<br>MASK<br>MAXTL<br>MAXWIN | 0004<sub>16</sub><br>7<sub>16</sub><br>Mask dependent<br>5<sub>16</sub><br>7<sub>16</sub>           |                                          |                           |                    |           |
| FSR        |                                          | 0                                                                                                   | Unchanged                                |                           |                    |           |
| FPRS       |                                          | Unknown/Unchanged                                                                                   | Unchanged                                |                           |                    |           |

TABLE O-1 Nonprivileged and Privileged Register State after Reset and in RED\_state (Continued)

1. Hard POR occurs when power is cycled. Values are unknown following hard POR. Soft POR occurs when the reset signal is asserted. Values are unchanged following soft POR.

2. The first watchdog time-out trap is taken in execute\_state (that is, PSTATE.RED = 0); subsequent watchdog time-out traps, as well as watchdog traps due to a trap at TL = MAXTL, are taken in RED\_state. See Section O.1.2, Watchdog Reset (WDR), on page 162 for more details.
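The TL, TPC[TL], and TNPC[TL] rows of TABLE O-1 can be read as the following update rule. This sketch is illustrative only: the function name is hypothetical, and `None` is used here to stand for the "Unknown/Unchanged" entries in the POR column.

```python
from typing import Optional, Tuple

MAXTL = 5  # VER.MAXTL as listed in TABLE O-1


def trap_state_after_reset(
    kind: str, tl: int, pc: int, npc: int
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (new TL, TPC[TL], TNPC[TL]) after a reset or RED_state entry."""
    if kind == "POR":
        # POR forces TL to MAXTL; TPC/TNPC are Unknown/Unchanged (None here).
        return MAXTL, None, None
    if kind not in ("WDR", "XIR", "SIR", "RED_state"):
        raise ValueError("unknown reset type")
    # All other cases: TL saturates at MAXTL, and PC/nPC are saved.
    return min(tl + 1, MAXTL), pc, npc
```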

TABLE O-2 ASR State after Reset and in RED\_state

| ASR | Name    |                      | POR <sup>1</sup>            | WDR <sup>2</sup>                    | XIR | SIR | RED_state |
|-----|---------|----------------------|-----------------------------|-------------------------------------|-----|-----|-----------|
| 16  | PCR     | UT<br>ST<br>Others   | 0<br>0<br>Unknown/Unchanged | Unchanged                           |     |     |           |
| 17  | PIC     |                      | Unknown/Unchanged           | Unchanged                           |     |     |           |
| 18  | DCR     |                      | Always 0                    |                                     |     |     |           |
| 19  | GSR     | IM<br>IRND<br>Others | 0<br>0<br>Unknown/Unchanged | Unchanged<br>Unchanged<br>Unchanged |     |     |           |
| 22  | SOFTINT |                      | Unknown/Unchanged           | Unchanged                           |     |     |           |

| ASR | Name          |                        | POR<sup>1</sup>   | WDR<sup>2</sup>        | XIR | SIR | RED_state |
|-----|---------------|------------------------|-------------------|------------------------|-----|-----|-----------|
| 23  | TICK_COMPARE  | INT_DIS<br>TICK_CMPR   | 1<br>0            | Unchanged<br>Unchanged |     |     |           |
| 24  | STICK         | NPT<br>Counter         | 1<br>Restart at 0 | Unchanged<br>Count     |     |     |           |
| 25  | STICK_COMPARE | INT_DIS<br>TICK_CMPR   | 1<br>0            | Unchanged<br>Unchanged |     |     |           |

TABLE O-2 ASR State after Reset and in RED\_state (Continued)

1. Hard POR occurs when power is cycled. Values are unknown following hard POR. Soft POR occurs when the reset signal is asserted. Values are unchanged following soft POR.

2. The first watchdog time-out trap is taken in execute\_state (that is, PSTATE.RED = 0); subsequent watchdog time-out traps, as well as watchdog traps due to a trap at TL = MAXTL, are taken in RED\_state. See Section O.1.2, Watchdog Reset (WDR), on page 162 for more details.

| ASI | VA    | Name                                             | POR<sup>1</sup>                                                                                  | WDR<sup>2</sup>                                  | XIR       | SIR | RED_state |
|-----|-------|--------------------------------------------------|--------------------------------------------------------------------------------------------------|--------------------------------------------------|-----------|-----|-----------|
| 45  | 00    | DCUCR                                            | 0                                                                                                | 0                                                |           |     |           |
| 45  |       |                                                  | 2<br>0                                                                                           | 2<br>0                                           |           |     |           |
| 48  | 00    | INTR_DISPATCH_STATUS                             | 0                                                                                                | Unchanged                                        |           |     |           |
| 49  | 00    | INTR_RECEIVE                                     | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 4A  | 00    | JBUS_CONFIG<br>UC_S<br>UC_SW<br>CLK_MODE<br>ITID | Pre-defined/Unchanged<br>Pre-defined/Unchanged<br>Pre-defined/Unchanged<br>Pre-defined/Unchanged | Unchanged<br>Unchanged<br>Unchanged<br>Unchanged |           |     |           |
| 4C  | 00    | AFSR                                             | Unknown/Unchanged                                                                                | Unchanged                                        | Unchanged |     |           |
| 4C  | 08    | UGESR                                            | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 4C  | 10    | ERROR_CONTROL<br>WEAK_ED<br>Others               | 1<br>Unknown/Unchanged                                                                           | 1<br>Unchanged                                   |           |     |           |
| 4C  | 18    | STCHG_ERR_INFO                                   | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 4D  | 00    | AFAR_D1                                          | Constant Value                                                                                   | Constant Value                                   |           |     |           |
| 4D  | 08    | AFAR_U2                                          | Constant Value                                                                                   | Constant Value                                   |           |     |           |
| 4F  | 00–38 | SCRATCH_REGs                                     | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 50  | 00    | IMMU_TAG_TARGET                                  | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 50  | 18    | IMMU_SFSR                                        | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 50  | 28    | IMMU_TSB_BASE                                    | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |
| 50  | 30    | IMMU_TAG_ACCESS                                  | Unknown/Unchanged                                                                                | Unchanged                                        |           |     |           |

TABLE O-3 ASI Register State After Reset and in RED\_state (1 of 3)

| ASI | VA                | Name                | POR <sup>1</sup>  | WDR <sup>2</sup> | XIR            | SIR | RED_state |  |  |
|-----|-------------------|---------------------|-------------------|------------------|----------------|-----|-----------|--|--|
| 50  | 48                | IMMU_TAG_TSB_PEXT   | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 50  | 58                | IMMU_TAG_TSB_NEXT   | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 50  | 60                | IMMU_TAG_ACCESS_EXT | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 50  | 78                | IMMU_SFPAR          | Unknown/Unchanged | Unchanged        | Unchanged      |     |           |  |  |
| 51  |                   | IMMU_TSB_8KB_PTR    | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 52  |                   | IMMU_TSB_64KB_PTR   | Unknown/Unchanged | Unchanged        | Unchanged      |     |           |  |  |
| 53  |                   | SERIAL_ID           | Constant value    | Constant value   | Constant value |     |           |  |  |
| 54  | —                 | ITLB_DATA_IN        | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 55  | —                 | ITLB_DATA_ACCESS    | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 56  |                   | ITLB_TAG_READ       | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 57  |                   | ITLB_DEMAP          | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 00                | DMMU_TAG_TARGET     | Unknown/Unchanged | Unchanged        | Unchanged      |     |           |  |  |
| 58  | 08                | PRIMARY_CONTEXT     | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 10                | SECONDARY_CONTEXT   | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 18                | DMMU_SFSR           | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 20                | DMMU_SFAR           | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 28                | DMMU_TSB_BASE       | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 30                | DMMU_TAG_ACCESS     | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 38                | DMMU_VA_WATCHPOINT  | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 40                | DMMU_PA_WATCHPOINT  | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 48                | DMMU_TSB_PEXT       | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 50                | DMMU_TSB_SEXT       | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 58                | DMMU_TSB_NEXT       | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 60                | SHARED_CONTEXT      | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 58  | 68                | DMMU_TAG_ACCESS_EXT | Unknown/Unchanged | Unchanged        | Unchanged      |     |           |  |  |
| 58  | 78                | DMMU_SFPAR          | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 59  |                   | DMMU_TSB_8KB_PTR    | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 5A  |                   | DMMU_TSB_64KB_PTR   | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 5B  |                   | DMMU_TSB_DIRECT_PTR | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 5C  |                   | DTLB_DATA_IN        | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 5D  |                   | DTLB_DATA_ACCESS    | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 5E  |                   | DTLB_TAG_READ       | Unknown/Unchanged | Unchanged        |                |     |           |  |  |
| 5F  | —                 | DMMU_DEMAP          | Unchanged         |                  |                |     |           |  |  |
| 60  | _                 | IIU_INST_TRAP       | 0                 | Unchanged        | Unchanged      |     |           |  |  |
| 61  | 00, 08,<br>40, 48 | ITSB_PREFETCH       | 0/Unchanged       | Unchanged        |                |     |           |  |  |
| 62  | 00, 08,<br>40, 48 | DTSB_PREFETCH       | 0/Unchanged       | Unchanged        |                |     |           |  |  |

TABLE O-3 ASI Register State After Reset and in RED\_state (2 of 3)

| ASI | VA    | Name                                        | POR <sup>1</sup>  | WDR <sup>2</sup> | XIR | SIR | RED_state |
|-----|-------|---------------------------------------------|-------------------|------------------|-----|-----|-----------|
| 6D  | _     | BARRIER_INIT                                | 0                 | Unchanged        |     |     |           |
| 6E  | _     | EIDR                                        | 0/Unchanged       | Unchanged        |     |     |           |
| 6F  | 00-50 | BARRIER_ASSIGN                              | 0                 | Unchanged        |     |     |           |
| 77  | 40:68 | INTR_DATA0:5_W                              | Unknown/Unchanged | Unchanged        |     |     |           |
| 77  | 70    | INTR_DISPATCH_W                             | Unknown/Unchanged | Unchanged        |     |     |           |
| 77  | 80:88 | INTR_DATA6:7_W                              | Unknown/Unchanged | Unchanged        |     |     |           |
| 7F  | 40:88 | INTR_DATA0:7_R                              | Unknown/Unchanged | Unchanged        |     |     |           |
| EF  | 00-50 | LBSY, BST                                   | 0                 | Unchanged        |     |     |           |

TABLE O-3 ASI Register State After Reset and in RED\_state (3 of 3)

1. Hard POR occurs when power is cycled. Values are unknown following hard POR. Soft POR occurs when the reset signal is asserted. Values are unchanged following soft POR.

2. The first watchdog time-out trap is taken in execute\_state (i.e., PSTATE.RED = 0); subsequent watchdog time-out traps, as well as watchdog traps due to a trap at TL = MAX\_TL, are taken in RED\_state. See Section O.1.2, *Watchdog Reset (WDR)*, on page 162 for more details.

### O.3.1 Operating Status Register (OPSR)

OPSR is the control register in the CPU that is scanned in during the hardware power-on reset sequence before the CPU starts running.

The value of the OPSR is specified outside of the CPU and is never changed by software. OPSR is set by scan-in during hardware power-on reset and by a JTAG command after hardware POR.

Most of the OPSR settings are not visible to the software.

# Error Handling

This appendix describes processor behavior for programmers writing an operating system, firmware, or recovery code for SPARC64 VII. Section headings differ from those of Appendix P of **Commonality**.

# P.1 Error Classes and Signalling

On SPARC64 VII, an error is classified into one of the following four categories, depending on the degree to which it obstructs program execution:

1. Fatal error
2. error\_state transition error
3. Urgent error
4. Restrainable error

SPARC64 VII includes four cores in the same processor module, and each core contains two threads. When an error is detected, which threads log the error and have it reported to them depends on the error type.

An error that is detected in the course of executing an instruction, or that occurs in a resource specific to a thread (e.g., IUG\_%R), is called synchronous to thread execution. In this case, the error is logged in and reported to the thread executing the instruction or the thread that owns the resource with the error. By their nature, *instruction\_access\_error* and *data\_access\_error* belong to this category.

An error that is independent of instruction execution, or that occurs in a resource shared between multiple threads, is called asynchronous to thread execution. In this case, the error is logged in and reported to all the threads related to the resource causing the error. Error marking is essentially asynchronous to thread execution. When an L1\$ or an L2\$ raw uncorrectable error is detected, the ASI\_EIDR of the valid (that is, not degraded) thread with the smallest thread ID (core0-thread0 < core0-thread1 < core1-thread0 < ... < core3-thread1) among those related to that cache is used for error marking.
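The thread-selection rule for error marking can be illustrated with a short sketch. It is not taken from the specification: the `(core, thread)` tuple representation and the function name `marking_thread` are hypothetical, and the set of threads related to a given cache is passed in by the caller rather than derived from hardware state.

```python
from typing import Iterable, Optional, Tuple

ThreadId = Tuple[int, int]  # (core, thread); hypothetical representation


def marking_thread(related: Iterable[ThreadId],
                   degraded: Iterable[ThreadId]) -> Optional[ThreadId]:
    """Pick the valid (not degraded) thread with the smallest thread ID.

    Tuple comparison orders by core first and then by thread, which matches
    core0-thread0 < core0-thread1 < core1-thread0 < ... < core3-thread1.
    """
    bad = set(degraded)
    valid = [t for t in related if t not in bad]
    return min(valid) if valid else None


# Example: a cache related to every thread in the module (assumed here),
# with core0-thread0 degraded.
all_threads = [(core, thread) for core in range(4) for thread in range(2)]
chosen = marking_thread(all_threads, degraded=[(0, 0)])
```

In this example `chosen` is `(0, 1)`, the next thread in the ordering, because core0-thread0 is excluded as degraded.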

Another issue is how to log and report an error when a corresponding thread is in the suspended state. Except for fatal errors, the error logging and report are postponed until the corresponding thread exits from the suspended state.

## P.1.1 Fatal Error

A fatal error is one of the following errors that damages the entire system.

#### a. Error breaking data integrity in the system

All errors that break cache coherency are in this category.

#### b. Invalid system control flow is detected, so the validity of subsequent system behavior cannot be guaranteed

When the CPU detects a fatal error, the CPU enters FATAL error\_state and reports the fatal error occurrence to the system controller. The system controller transfers the entire system state to the FATAL state and stops the system. After the system stops, a FATAL reset, which is a type of power-on reset, will be issued to the whole system.

All fatal errors are asynchronous to thread execution. If a fatal error is detected in a given thread, all the threads within the processor module log the cause into ASI\_STCHG\_ERROR\_INFO and go through the POR sequence even if they are in the suspended state.

## P.1.2 error\_state Transition Error

An error\_state transition error is a serious error that prevents the CPU from reporting the error by generating a trap. However, any damage caused by the error is limited to within the CPU.

When the CPU detects an error\_state transition error, it enters error\_state. The CPU exits error\_state by causing a watchdog reset, entering RED\_state, and starting instruction execution at the watchdog reset trap handler.

### EE asynchronous to thread execution

The following error\_state transition errors are asynchronous to thread execution. If such an EE is detected in a given thread, both threads within the core that caused the error log it into ASI\_STCHG\_ERROR\_INFO and go through WDR, unless they are in the suspended state. The threads in the other cores are unaffected.

- EE\_TRAP\_ADR\_UE
- EE\_OTHER

### EE synchronous to thread execution

The following error\_state transition errors are synchronous to thread execution. If such an EE is detected in a given thread, only that thread logs the cause of the error into ASI\_STCHG\_ERROR\_INFO and goes through WDR. All the other threads are unaffected.

- EE\_SIR\_IN\_MAXTL
- EE\_TRAP\_IN\_MAXTL
- EE\_WDT\_IN\_MAXTL
- EE\_SECOND\_WDT
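The two EE groups above differ only in which threads log the error and go through WDR. The following is a minimal sketch of that rule, assuming a hypothetical `(core, thread)` tuple for thread identity; the function name and the `suspended` parameter are illustrative and not part of the specification.

```python
ASYNC_EE = {"EE_TRAP_ADR_UE", "EE_OTHER"}
SYNC_EE = {"EE_SIR_IN_MAXTL", "EE_TRAP_IN_MAXTL",
           "EE_WDT_IN_MAXTL", "EE_SECOND_WDT"}
THREADS_PER_CORE = 2


def ee_affected_threads(error, detecting, suspended=frozenset()):
    """Return the threads that log into ASI_STCHG_ERROR_INFO and take WDR."""
    core, _thread = detecting
    if error in ASYNC_EE:
        # Both threads of the detecting core, except suspended ones.
        siblings = [(core, t) for t in range(THREADS_PER_CORE)]
        return [t for t in siblings if t not in suspended]
    if error in SYNC_EE:
        # Only the detecting thread; every other thread is unaffected.
        return [detecting]
    raise ValueError(f"not an error_state transition error: {error}")
```

For example, `ee_affected_threads("EE_OTHER", (1, 0))` yields both threads of core 1, while `ee_affected_threads("EE_SECOND_WDT", (1, 0))` yields only `(1, 0)`.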

## P.1.3 Urgent Error

An urgent error (UGE) is an error that requires immediate processing by privileged software, which is reported by an error trap. The types of urgent errors are listed below and then described in further detail.

- Instruction-obstructing error
  - I\_UGE: Instruction urgent error
  - IAE: Instruction access error
  - DAE: Data access error
- Urgent error that is independent of the instruction execution
  - A\_UGE: Autonomous urgent error

## Instruction-Obstructing Error

An instruction-obstructing error is one that is detected by instruction execution and results in the instruction being unable to complete.

When the instruction-obstructing error is detected while ASI\_ERROR\_CONTROL.WEAK\_ED = 0 (as set by privileged software for a normal program execution environment), an exception is generated to report the error. This trap is nonmaskable.

Otherwise, when ASI\_ERROR\_CONTROL.WEAK\_ED = 1, as with multiple errors or a POST/OBP reset routine, one of the following actions occurs:

- Whenever possible, the CPU writes an unpredictable value to the target of the damaged instruction and the instruction ends.
- Otherwise, an error exception is generated and the damaged instruction is executed as when ASI\_ERROR\_CONTROL.WEAK\_ED = 0 is set.
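The decision described above can be summarized as a small sketch. The flag `can_complete_with_unpredictable_value` is a hypothetical stand-in for the hardware's "whenever possible" condition, which this section does not define further; the returned strings are illustrative labels only.

```python
def obstructing_error_action(weak_ed: int,
                             can_complete_with_unpredictable_value: bool) -> str:
    """Model the reaction to an instruction-obstructing error."""
    if weak_ed == 0:
        # Normal execution environment: the error is always reported by a trap.
        return "exception"
    if can_complete_with_unpredictable_value:
        # WEAK_ED = 1: the target receives an unpredictable value and the
        # instruction ends without a trap.
        return "write_unpredictable_value"
    # WEAK_ED = 1 but completion is not possible: same as WEAK_ED = 0.
    return "exception"
```

Software running with WEAK\_ED = 1 therefore cannot assume that every instruction-obstructing error is either trapped or silently completed; both outcomes remain possible.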

The three types of instruction-obstructing errors are described below.

- I\_UGE (instruction urgent error): All of the instruction-obstructing errors except IAE (instruction access error) and DAE (data access error). There are two categories of I\_UGEs.
  - An uncorrectable error in an internal program-visible register that obstructs instruction execution.

    An uncorrectable error in the PSTATE, PC, NPC, CCR, ASI, FSR, or GSR register is treated as an I\_UGE that obstructs the execution of any instruction. See Appendix P.8.1 and P.8.2 for details.

    The first-time watchdog time-out is also treated as this type of I\_UGE.

  - An error in the hardware unit executing the instruction, other than an error in a program-visible register.

    Among these errors are ALU output errors, errors in temporary registers during instruction execution, CPU internal data bus errors, and so forth.

  I\_UGE is a preemptive error with the characteristics shown in TABLE P-2.

- IAE (instruction access error): The *instruction\_access\_error* exception, as specified in JPS1 Commonality. On SPARC64 VII, only an uncorrectable error in the cache or main memory during instruction fetch is reported to software as an IAE.

  IAE is a precise error.

- DAE (data access error): The *data\_access\_error* exception, as specified in JPS1 Commonality. On SPARC64 VII, only an uncorrectable error in the cache or main memory during access by a load, store, or load-store instruction is reported to software as a DAE.

  DAE is a precise error.

## Urgent Error Independent of Instruction Execution

- A\_UGE (autonomous urgent error): An error that requires immediate processing and that occurs independently of instruction execution.

In normal program execution, ASI\_ERROR\_CONTROL.WEAK\_ED = 0 is specified by privileged software. In this case, the A\_UGE trap is suppressed only in the trap handler used to process UGE (that is, the *async\_data\_error* trap handler).

Otherwise, in special program execution such as the handling of multiple errors or the POST/OBP reset routine, ASI\_ERROR\_CONTROL.WEAK\_ED = 1 is specified by the program. In this case, no A\_UGE generates an exception.

There are two categories of A\_UGEs:

- An error in an important resource that will cause a fatal error or error\_state transition error when the resource is used.

  When the resource with the error is used, the program cannot continue execution, and an error\_state transition error or a fatal error is detected.

- An error in an important resource that is expected to invoke the operating system "panic" process.

  The operating system panic process is expected when this error is detected because normal processing cannot be expected to continue after this error occurs.

The A\_UGE is a disrupting error with the following deviations.

- The trap for A\_UGE is not masked by PSTATE.IE.
- The instruction designated by TPC may not end precisely. The instruction end-method is reported in the trap status register for A\_UGE.

### Traps for Urgent Errors

When an urgent error is detected and not masked, the error is reported to privileged software by the following exceptions:

- I\_UGE, A\_UGE: *async\_data\_error* exception
- IAE: *instruction\_access\_error* exception
- DAE: *data\_access\_error* exception
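As an illustration, the mapping above can be combined with the trap type (tt) values and the action priorities listed in TABLE P-2. The sketch below is a simplification: it assumes ASI\_ECR.UGE\_HANDLER = 0 and no masking, so the multiple-ADE case is not modeled, and the names `EXCEPTION_FOR`, `TRAPS`, and `trap_for` are hypothetical.

```python
# Urgent error class -> exception, per the list above.
EXCEPTION_FOR = {
    "I_UGE": "async_data_error",
    "A_UGE": "async_data_error",
    "IAE": "instruction_access_error",
    "DAE": "data_access_error",
}

# tt values and action priorities transcribed from TABLE P-2
# (a lower action-priority number wins).
TRAPS = {
    "async_data_error": {"tt": 0x40, "action_priority": 3},
    "data_access_error": {"tt": 0x32, "action_priority": 4},
    "instruction_access_error": {"tt": 0x0A, "action_priority": 5},
}


def trap_for(errors):
    """Return (exception name, tt) for simultaneously detected urgent errors."""
    candidates = {EXCEPTION_FOR[e] for e in errors}
    name = min(candidates, key=lambda n: TRAPS[n]["action_priority"])
    return name, TRAPS[name]["tt"]
```

With these tables, a DAE detected together with an A\_UGE is reported through *async\_data\_error*, and an IAE detected together with a DAE is reported through *data\_access\_error*.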

### Urgent error asynchronous to thread execution

The following urgent errors are asynchronous to thread execution. If such an urgent error is detected in a given thread, both of the threads within the core which caused the error log it into ASI\_UGESR and activate an *async\_data\_error* trap, unless they are in the suspended state. The threads in the other cores are unaffected.

- IAUG\_CRE
- IAUG\_TSBCTXT
- IUG\_TSBP
- IUG\_PSTATE
- IUG\_TSTATE
- IUG\_%F (except %fn parity error)
- IUG\_%R (except %rn and Y parity error)
- IUG\_WDT
- IUG\_DTLB
- IUG\_ITLB
- IUG\_COREERR

### Urgent error synchronous to thread execution

The following urgent errors are synchronous to thread execution. If such an urgent error is detected in a given thread, only that thread logs the cause of the error into ASI\_UGESR and activates an *async\_data\_error* trap, unless it is in the suspended state. All the other threads are unaffected.

- IUG\_%F (%fn parity error only)
- IUG\_%R (%rn and Y parity error only)

## P.1.4 Restrainable Error

A restrainable error is one that does not adversely affect the currently executing program and that does not require immediate handling by privileged software. A restrainable error causes a disrupting trap with low priority.

There are three types of restrainable errors.

- Correctable error (CE), corrected by hardware

  Upon detecting a CE, the hardware uses the corrected data, so a CE has no deleterious effect on the CPU.

  When a CE is detected, the data seen by the CPU is always corrected by hardware. Whether the source data containing the CE is also corrected depends on the CE type.

- Uncorrectable error without direct damage to the currently executing instruction sequence

  An error detected in cache line writeback or copyback data is of this type.

- Degradation

  SPARC64 VII can isolate an internal hardware resource that generates frequent errors and continue processing without deleterious effect to the software during program execution. However, performance is degraded by the resource isolation. This degradation is reported as a restrainable error.

The restrainable error can be reported to privileged software by the ECC\_error trap.

When PSTATE.IE = 1 and the trap enable mask for any restrainable error is 1, the *ECC\_error* exception is generated for the restrainable error.
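The enabling condition can be written out as a predicate. This sketch also folds in the additional trap-mask conditions that TABLE P-2 lists for restrainable errors (ASI\_ECR.UGE\_HANDLER, ASI\_ECR.WEAK\_ED, and the suspended state); the function and parameter names are hypothetical, and `rte_bit` stands for whichever of RTE\_CEDG or RTE\_UE applies to the error group.

```python
def ecc_error_trap_enabled(pstate_ie: int, rte_bit: int,
                           uge_handler: int = 0, weak_ed: int = 0,
                           suspended: bool = False) -> bool:
    """Return True when a detected restrainable error may raise ECC_error."""
    masked = (
        uge_handler == 1      # running in the urgent-error handler
        or weak_ed == 1       # weak error detection mode
        or pstate_ie == 0     # interrupts disabled
        or rte_bit == 0       # trap enable mask for this error group is off
        or suspended          # thread is in the suspended state
    )
    return not masked
```

Because any one of these conditions masks the trap, privileged software that wants timely ECC\_error reporting has to keep all of them clear at once.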

## DG\_U2\$, DG\_U2\$x, UE\_RAW\_L2\$INSD

DG\_U2\$, DG\_U2\$x, and UE\_RAW\_L2\$INSD are asynchronous to thread execution. If such an error is detected, all the threads within the processor module log the cause of the error into ASI\_AFSR and activate an *ECC\_error* trap, unless they are in the suspended state.

## DG\_D1\$sTLB, UE\_RAW\_D1\$INSD

These restrainable errors are asynchronous to thread execution. If such an error is detected, both the threads within the core which caused the error log it into ASI\_AFSR and activate an *ECC\_error* trap, unless they are in the suspended state. The threads in the other cores are unaffected.

## UE\_DST\_BETO

A UE\_DST\_BETO error is synchronous to thread execution. If such an error is detected in a given thread, only that thread logs the cause of the error into ASI\_AFSR and activates an *ECC\_error* trap, unless it is in the suspended state. All the other threads are unaffected.
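The three reporting scopes for restrainable errors (processor module, core, and single thread) can be sketched as follows. The `(core, thread)` tuples, the set names, and the function name are hypothetical; the error names and their grouping follow the subsections above.

```python
CORES, THREADS_PER_CORE = 4, 2

MODULE_WIDE = {"DG_U2$", "DG_U2$x", "UE_RAW_L2$INSD"}
CORE_WIDE = {"DG_D1$sTLB", "UE_RAW_D1$INSD"}
THREAD_ONLY = {"UE_DST_BETO"}


def re_logging_threads(error, detecting, suspended=frozenset()):
    """Return the threads that log into ASI_AFSR and activate ECC_error."""
    core, _thread = detecting
    if error in MODULE_WIDE:
        scope = [(c, t) for c in range(CORES) for t in range(THREADS_PER_CORE)]
    elif error in CORE_WIDE:
        scope = [(core, t) for t in range(THREADS_PER_CORE)]
    elif error in THREAD_ONLY:
        scope = [detecting]
    else:
        raise ValueError(f"unknown restrainable error: {error}")
    # Suspended threads do not log or trap at this point.
    return [t for t in scope if t not in suspended]
```

For instance, `re_logging_threads("UE_RAW_D1$INSD", (2, 1))` returns both threads of core 2, while the same call with `"UE_DST_BETO"` returns only the detecting thread.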

## P.1.5 instruction\_access\_error

*instruction\_access\_error* is synchronous to thread execution. If such an error is detected in a given thread, only that thread logs the cause of the error into ASI\_ISFSR, TPC, and ASI\_ISFPAR, and activates an *instruction\_access\_error* trap. All the other threads are unaffected.

## P.1.6 data\_access\_error

*data\_access\_error* is synchronous to thread execution. If such an error is detected in a given thread, only that thread logs the cause of the error into ASI\_DSFSR, ASI\_DSFAR, and ASI\_DSFPAR, and activates a *data\_access\_error* trap. All the other threads are unaffected.

# P.2 Action and Error Control

## P.2.1 Registers Related to Error Handling

The following registers are related to the error handling.

- ASI registers: Indicate an error. All ASI registers in TABLE P-1 except ASI\_EIDR and ASI\_ERROR\_CONTROL are used to specify the nature of an error to privileged software.
- ASI\_ERROR\_CONTROL: Controls error action. This register designates error detection masks and error trap enable masks.
- ASI\_EIDR: Marks errors. This register identifies the error source ID for error marking.

TABLE P-1 lists the registers related to the error handling.

TABLE P-1 Registers Related to Error Handling

| ASI              | VA               | R/W    | Checking Code | Name                    | Defined in             |
|------------------|------------------|--------|---------------|-------------------------|------------------------|
| 4C<sub>16</sub> | 00<sub>16</sub> | RW1C   | None          | ASI_ASYNC_FAULT_STATUS  | P.7.1                  |
| 4C<sub>16</sub> | 08<sub>16</sub> | R      | None          | ASI_URGENT_ERROR_STATUS | P.4.1                  |
| 4C<sub>16</sub> | 10<sub>16</sub> | RW     | Parity        | ASI_ERROR_CONTROL       | P.2.1                  |
| 4C<sub>16</sub> | 18<sub>16</sub> | R,W1AC | None          | ASI_STCHG_ERROR_INFO    | P.3.1                  |
| 50<sub>16</sub> | 18<sub>16</sub> | RW     | None          | ASI_IMMU_SFSR           | F.10.9                 |
| 58<sub>16</sub> | 18<sub>16</sub> | RW     | None          | ASI_DMMU_SFSR           | F.10.9                 |
| 58<sub>16</sub> | 20<sub>16</sub> | RW     | Parity        | ASI_DMMU_SFAR           | F.10.10 of Commonality |
| 6E<sub>16</sub> | 00<sub>16</sub> | RW     | Parity        | ASI_EIDR                | P.2.5                  |
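For tooling such as a log decoder, the ASI and VA columns of TABLE P-1 can be captured in a lookup table. The sketch below only transcribes the values from the table; the dictionary layout and the helper `register_name` are illustrative assumptions, not an interface defined by this document.

```python
# (ASI, VA) pairs transcribed from TABLE P-1.
ERROR_REGISTERS = {
    "ASI_ASYNC_FAULT_STATUS": (0x4C, 0x00),
    "ASI_URGENT_ERROR_STATUS": (0x4C, 0x08),
    "ASI_ERROR_CONTROL": (0x4C, 0x10),
    "ASI_STCHG_ERROR_INFO": (0x4C, 0x18),
    "ASI_IMMU_SFSR": (0x50, 0x18),
    "ASI_DMMU_SFSR": (0x58, 0x18),
    "ASI_DMMU_SFAR": (0x58, 0x20),
    "ASI_EIDR": (0x6E, 0x00),
}


def register_name(asi: int, va: int):
    """Return the register name for an (ASI, VA) pair, or None if not listed."""
    for name, address in ERROR_REGISTERS.items():
        if address == (asi, va):
            return name
    return None
```

A call such as `register_name(0x4C, 0x18)` resolves to `ASI_STCHG_ERROR_INFO`, and any pair outside the table yields `None`.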

## P.2.2 Summary of Actions Upon Error Detection

TABLE P-2 summarizes what happens when an error is detected.

|                                                                              | Fatal Error (FE) | Error State Transition<br>Error (EE)                                                   | Urgent Error (UGE)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Restrainable Error (RE)                                                                                                                                                                                                                       |
|------------------------------------------------------------------------------|------------------|----------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Error detection mask (the condition to suppress error detection) | None | When ASI_ECR.WEAK_ED = 1, the error detection is suppressed incompletely. | I_UGE, IAE, DAE:<br>When ASI_ECR.WEAK_ED = 1 or in the SUSPENDED state, error detection is suppressed incompletely.<br>A_UGE:<br>In the SUSPENDED state, error detection is suppressed incompletely.<br>Error detection except in register usage is suppressed when ASI_ECR.WEAK_ED = 1 or upon a condition unique to each error.<br>Error detection in register usage is suppressed by conditions unique to each error.<br>Only some A_UGEs have the above unique conditions to suppress error detection; most do not. | None |
| Trap mask (the condition to suppress the error trap occurrence) | None | None | I_UGE, IAE, DAE:<br>the SUSPENDED state.<br>A_UGE:<br>ASI_ECR.UGE_HANDLER = 1<br>or<br>ASI_ECR.WEAK_ED = 1<br>or<br>the SUSPENDED state.<br>An A_UGE detected while the trap is suppressed is kept pending in the hardware and causes the async_data_error trap when the trap is enabled. | ASI_ECR.UGE_HANDLER = 1<br>or<br>ASI_ECR.WEAK_ED = 1<br>or<br>PSTATE.IE = 0<br>or<br>ASI_ECR.RTE_xx = 0, where RTE_xx is the trap enable mask for each error group (RTE_CEDG or RTE_UE)<br>or<br>the SUSPENDED state. |

**TABLE P-2** Action Upon Detection of an Error (1 of 3)
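The A\_UGE column of the trap-mask row describes a pending behavior: an A\_UGE detected while its trap is masked is held in hardware and delivered once the trap is enabled. The following is a minimal software model of that behavior, assuming only the two ASI\_ECR mask bits and ignoring the suspended state; the class and method names are hypothetical.

```python
class AUgeTrapModel:
    """Toy model of A_UGE trap masking and pending delivery."""

    def __init__(self):
        self.uge_handler = 0
        self.weak_ed = 0
        self.pending = False

    def masked(self) -> bool:
        return self.uge_handler == 1 or self.weak_ed == 1

    def detect(self):
        """An A_UGE is detected; return the trap taken now, if any."""
        if self.masked():
            self.pending = True  # kept pending in hardware
            return None
        return "async_data_error"

    def write_ecr(self, uge_handler: int, weak_ed: int):
        """Update the mask bits; a pending A_UGE fires once unmasked."""
        self.uge_handler, self.weak_ed = uge_handler, weak_ed
        if self.pending and not self.masked():
            self.pending = False
            return "async_data_error"
        return None
```

In this model, an error detected while `uge_handler` is 1 returns no trap from `detect()`, and the later `write_ecr(0, 0)` call returns `"async_data_error"`.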

|                                                                                          | Fatal Error (FE)                                                                                                                                                                                                                                                                              | Error State Transition<br>Error (EE)                                              | Urgent Error (UGE)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Restrainable Error (RE)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Action upon the<br>error detection | <ol> <li>CPU enters CPU fatal state.</li> <li>CPU informs the system of fatal error occurrence.</li> <li>The FATAL reset (which is a form of POR reset) is issued to the whole system.</li> <li>POR is sent to all CPUs in the system.</li> </ol> | 1. CPU enters error_state.<br>2. Watchdog reset (WDR) is set on the CPU. | Detection of I_UGE:<br>When ASI_ECR.UGE_HANDLER = 0, a single-ADE trap is set. Otherwise, when ASI_ECR.UGE_HANDLER = 1, a multiple-ADE trap is set.<br>Detection of A_UGE:<br>When the trap is enabled, a single-ADE trap is set. When the trap is disabled, the trap condition is kept pending in hardware.<br>Detection of IAE:<br>When ASI_ECR.UGE_HANDLER = 0, an IAE trap is set. Otherwise, a multiple-ADE trap is set.<br>Detection of DAE:<br>When ASI_ECR.UGE_HANDLER = 0, a DAE trap is set. Otherwise, a multiple-ADE trap is set. | An ECC_error trap can occur even though ASI_AFSR does not indicate any detected error corresponding to a trap-enable bit (RTE_UE or RTE_CEDG) set to 1 in ASI_ECR, in the following cases:<ul> <li>A pending detected error is erased from ASI_AFSR (by writing 1 to ASI_AFSR) after the error is detected but before the ECC_error trap is generated.</li> <li>A pending CE or DG is erased by writing 1 to ASI_AFSR after the ECC_error trap is set by the UE error detection.</li> <li>A pending UE is erased by writing 1 to ASI_AFSR after the ECC_error trap is set by CE or DG detection.</li> </ul>Privileged software should ignore an ECC_error trap when ASI_AFSR contains no errors corresponding to those enabled in ASI_ECR to cause a trap. |
| Priority of action<br>when multiple<br>types of errors are<br>simultaneously<br>detected | 1 — CPU fatal<br>state                                                                                                                                                                                                                                                                        | 2 — error_state                                                                   | <ul> <li>3 — async_data_error trap</li> <li>4 — data_access_error trap</li> <li>5 — instruction_access_error trap</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 6 — ECC_error trap                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| tt (trap type) | 1(RED_state) | 2(RED_state) | async_data_error: 40<sub>16</sub><br>data_access_error: 32<sub>16</sub><br>instruction_access_error: 0A<sub>16</sub> | 63<sub>16</sub> |
| Trap priority                                                                            | 1                                                                                                                                                                                                                                                                                             | 1                                                                                 | async_data_error — 2<br>data_access_error — 12<br>instruction_access_error — 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | 32                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| End-method of<br>trapped<br>instruction                                                  | Abandoned                                                                                                                                                                                                                                                                                     | Abandoned.                                                                        | ADE trap<br>Precise, retryable or<br>nonretryable. See P.4.3.<br>IAE trap, DAE trap<br>Precise.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | Precise                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |

TABLE P-2 Action Upon Detection of an Error (2 of 3)

|                                                                        | Fatal Error (FE)                                                          | Error State Transition<br>Error (EE)                                   | Urgent Error (UGE)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | Restrainable Error (RE)                                             |
|------------------------------------------------------------------------|---------------------------------------------------------------------------|------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|
| Relation<br>between TPC<br>and instruction<br>that caused the<br>error | None                                                                      | None                                                                   | I_UGE<br>For errors other than TLB write<br>errors, the error was caused by<br>the instruction pointed to by TPC<br>or by the instruction subsequent<br>in the control flow to the one<br>indicated by TPC.<br>For a TLB write error, the<br>instruction pointed to by TPC or<br>the already executed instruction<br>previous in the control flow to<br>the one indicated by TPC wrote a<br>TLB entry and the TLB write<br>failed. The TLB write error is<br>detected after the instruction<br>execution and before any trap,<br>RETRY, or DONE instruction.<br>A_UGE<br>None.<br>IAE, DAE<br>The instruction pointed to by<br>TPC caused the error. | None                                                                |
| Register that<br>indicates the error | ASI_STCHG_<br>ERROR_INFO | ASI_STCHG_<br>ERROR_INFO | I_UGE, A_UGE<br>ASI_UGESR<br>IAE<br>ASI_ISFSR<br>DAE<br>ASI_DSFSR | ASI_AFSR |
| Number of errors<br>indicated at trap                                  | All FEs are<br>detected and<br>accumulated in<br>ASI_STCHG_<br>ERROR_INFO | All EEs are detected<br>and accumulated in<br>ASI_STCHG_<br>ERROR_INFO | Single-ADE trap<br>All I_UGEs and A_UGEs<br>detected at trap.<br>Multiple-ADE trap<br>The multiple-ADE indication +<br>UGEs at first ADE trap.<br>IAE<br>One error<br>DAE<br>One error                                                                                                                                                                                                                                                                                                                                                                                                                                                               | All restrainable errors<br>detected and accumulated in<br>ASI_AFSR. |
| Error address<br>indication register                                   | None                                                                      | None                                                                   | I_UGE, A_UGE: None<br>IAE: TPC<br>DAE: ASI_DFAR                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | ASI_AFAR_D1<br>ASI_AFAR_U2                                          |

TABLE P-2 Action Upon Detection of an Error (3 of 3)
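The priority-of-action and trap-type rows of TABLE P-2 can be read as a simple lookup. The following Python sketch is illustrative only: the identifiers `ACTION_PRIORITY`, `TRAP_TYPE`, and `select_action` are hypothetical and do not correspond to any hardware or software interface. Only the numeric values are taken from TABLE P-2.

```python
# Values copied from TABLE P-2; all identifiers are hypothetical.
# A lower number means a higher priority of action.
ACTION_PRIORITY = {
    "CPU fatal state": 1,
    "error_state": 2,
    "async_data_error": 3,
    "data_access_error": 4,
    "instruction_access_error": 5,
    "ECC_error": 6,
}

# Trap type (tt) for the actions that are reported as traps.
TRAP_TYPE = {
    "async_data_error": 0x40,
    "data_access_error": 0x32,
    "instruction_access_error": 0x0A,
    "ECC_error": 0x63,
}


def select_action(detected):
    """Return the action taken when several are requested simultaneously."""
    if not detected:
        return None
    return min(detected, key=lambda action: ACTION_PRIORITY[action])
```

For example, when a restrainable error and a data access error are detected at the same time, `select_action(["ECC_error", "data_access_error"])` selects `"data_access_error"`.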

## P.2.3 Extent of Automatic Source Data Correction for Correctable Error

Upon detection of the following correctable errors (CE), the CPU corrects the input data and uses the corrected data; however, the source data with the CE is not corrected automatically.

- CE in memory (DIMM)
- CE in ASI\_INTR\_DATA\_R

Upon detection of other correctable errors, the CPU automatically corrects the source data containing the CE.

For correctable errors in ASI\_INTR\_DATA, no special action is required by privileged software because the erroneous data will be overwritten when the next interrupt is received. For CE in memory (DIMM), it is expected that privileged software will correct the error in memory.

## P.2.4 Error Marking for Cacheable Data Error

### Error Marking for Cacheable Data

Error marking for cacheable data involves the following action:

- When a hardware unit first detects an uncorrected error in the cacheable data, the hardware unit replaces the data and ECC of the cacheable data with a special pattern that identifies the original error source and signifies that the data is already marked.

Error marking helps identify the error source and prevents a single error from being reported multiple times, even after the cache line containing the uncorrected data has been transferred several times.

The following data are protected by the single-bit error correction and double-bit error detection ECC code attached to every doubleword:

- Main memory (DIMM)
- Jupiter Bus packet data containing cache line data and interrupt packet data
- U2 (unified level 2) cache data
- D1 cache data
- The cacheable area block held by the channel

The ECC applied to these data is the ECC specified for Jupiter Bus.

When the CPU or channel detects an uncorrected error in the above cacheable data that is not yet marked, it executes error marking for the data block containing the UE. Whether data with a UE is already marked is determined by the syndrome of the doubleword data, as shown in TABLE P-3.

 TABLE P-3
 Syndrome for Data Marked for Error

| Syndrome                                    | Error Marking Status | Type of Uncorrected Error |
|---------------------------------------------|----------------------|---------------------------|
| 7F <sub>16</sub>                            | Marked               | Marked UE                 |
| Multibit error pattern except for $7F_{16}$ | Not marked yet       | Raw UE                    |

The syndrome $7F_{16}$ indicates a 3-bit error in the specified location in the doubleword. Error marking replaces the original data and ECC with the data and ECC described in the following section. The probability that syndrome $7F_{16}$ occurs for any reason other than error marking is considered to be zero.

### The Format of Error-Marking Data

When a raw UE is detected in a cacheable data doubleword, error marking replaces the erroneous doubleword and its ECC with the data and ECC listed in TABLE P-4.

 TABLE P-4
 Format of Error-Marked Data

| Data/ECC | Bit   | Value                                                                                                          |
|----------|-------|----------------------------------------------------------------------------------------------------------------|
| data     | 63    | Error bit. The value is unpredictable.                                                                         |
|          | 62:56 | 0 (7 bits).                                                                                                    |
|          | 55:42 | ERROR_MARK_ID (14 bits).                                                                                       |
|          | 41:36 | 0 (6 bits).                                                                                                    |
|          | 35    | Error bit. The value is unpredictable.                                                                         |
|          | 34:23 | 0 (12 bits).                                                                                                   |
|          | 22    | Error bit. The value is unpredictable.                                                                         |
|          | 21:14 | 0 (8 bits).                                                                                                    |
|          | 13:0  | ERROR_MARK_ID (14 bits).                                                                                       |
| ECC      |       | The pattern indicates 3-bit error in bits 63, 35, and 22, that is, the pattern causing the $7F_{16}$ syndrome. |
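The layout in TABLE P-4 can be sketched in Python as follows. This is a minimal illustration, not a description of the hardware: the names `make_marked_data` and `extract_error_mark_id` are hypothetical, the ECC field is not modeled, and the three error bits (63, 35, and 22) are treated as "don't care" because their values are unpredictable. Whether a doubleword is actually marked is decided by the syndrome (TABLE P-3), not by this data pattern alone.

```python
# Field positions taken from TABLE P-4; helper names are illustrative only.
ERROR_BITS = (1 << 63) | (1 << 35) | (1 << 22)  # values are unpredictable
ID_MASK = 0x3FFF                                 # ERROR_MARK_ID is 14 bits wide
ID_HI_SHIFT = 42                                 # second copy lives in bits 55:42
# Bits that TABLE P-4 defines as 0: 62:56, 41:36, 34:23, and 21:14.
ZERO_MASK = (0x7F << 56) | (0x3F << 36) | (0xFFF << 23) | (0xFF << 14)


def make_marked_data(error_mark_id, error_bits=0):
    """Compose an error-marked doubleword; error_bits models the unpredictable bits."""
    if not 0 <= error_mark_id <= ID_MASK:
        raise ValueError("ERROR_MARK_ID must fit in 14 bits")
    return (error_bits & ERROR_BITS) | (error_mark_id << ID_HI_SHIFT) | error_mark_id


def extract_error_mark_id(data):
    """Return ERROR_MARK_ID, or None if data does not match the TABLE P-4 layout."""
    low_copy = data & ID_MASK
    high_copy = (data >> ID_HI_SHIFT) & ID_MASK
    if (data & ZERO_MASK) or low_copy != high_copy:
        return None
    return low_copy
```

Both copies of ERROR\_MARK\_ID are compared here only as a consistency check chosen for this sketch; TABLE P-4 itself does not prescribe how software should validate the pattern.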

The ERROR\_MARK\_ID (14 bits wide) identifies the error source. The hardware unit that detects the error provides the error source ID and sets the ERROR\_MARK\_ID value.

The format of ERROR\_MARK\_ID<13:0> is defined in TABLE P-5.

 TABLE P-5
 ERROR\_MARK\_ID Bit Description

| Bit   | Value |
|-------|-------|
| 13:12 | Module_ID: Indicates the type of error source hardware as follows:<br>00<sub>2</sub>: Memory system including DIMM<br>01<sub>2</sub>: Channel<br>10<sub>2</sub>: CPU<br>11<sub>2</sub>: Reserved |
| 11:0  | Source_ID: When Module_ID = 00<sub>2</sub>, the 12-bit Source_ID field is always set to 0. Otherwise, the identification number of each Module type is set to Source_ID. |
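As an illustration of TABLE P-5, the following Python sketch splits a 14-bit ERROR\_MARK\_ID into its Module\_ID and Source\_ID fields. The function name `decode_error_mark_id` and the `MODULE_NAMES` table are hypothetical; only the bit positions and encodings come from TABLE P-5.

```python
# Module_ID encodings from TABLE P-5; identifiers are illustrative only.
MODULE_NAMES = {
    0b00: "Memory system including DIMM",
    0b01: "Channel",
    0b10: "CPU",
    0b11: "Reserved",
}


def decode_error_mark_id(error_mark_id):
    """Split ERROR_MARK_ID<13:0> into (Module_ID, module name, Source_ID)."""
    if not 0 <= error_mark_id <= 0x3FFF:
        raise ValueError("ERROR_MARK_ID must fit in 14 bits")
    module_id = (error_mark_id >> 12) & 0x3   # bits 13:12
    source_id = error_mark_id & 0xFFF         # bits 11:0
    return module_id, MODULE_NAMES[module_id], source_id
```

For instance, `decode_error_mark_id(0x2005)` yields Module\_ID 2 (CPU) with Source\_ID 5.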

### ERROR\_MARK\_ID Set by CPU

TABLE P-6 shows the ERROR\_MARK\_ID set by the CPU.

 TABLE P-6
 ERROR\_MARK\_ID Set by CPU

| Type of data with Raw UE       | Module_ID value (binary)                            | Source_ID value                   |
|--------------------------------|-----------------------------------------------------|-----------------------------------|
| Incoming data from Jupiter Bus | 00 <sub>2</sub> (Memory system)                     | 0                                 |
| Outgoing data to Jupiter Bus   | ASI_EIDR<13:12>. 10 <sub>2</sub> (CPU) is expected. | ASI_EIDR (Identifier of self CPU) |
| U2 cache data, D1 cache data   | ASI_EIDR<13:12>. 10 <sub>2</sub> (CPU) is expected. | ASI_EIDR (Identifier of self CPU) |
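The selection in TABLE P-6 can be sketched as below. This is an assumption-laden illustration: the function name `cpu_error_mark_id` and the `data_kind` strings are hypothetical, and the sketch assumes that the CPU uses ASI\_EIDR<13:0> as the complete ERROR\_MARK\_ID (Module\_ID in bits 13:12, Source\_ID in bits 11:0), which is inferred from TABLE P-6 and the ASI\_EIDR field definition rather than stated explicitly.

```python
def cpu_error_mark_id(data_kind, asi_eidr):
    """Return the ERROR_MARK_ID the CPU would use when marking a raw UE.

    data_kind is a hypothetical label for the rows of TABLE P-6.
    """
    if data_kind == "incoming_jupiter_bus":
        # Module_ID = 00 (memory system) and Source_ID = 0.
        return 0
    if data_kind in ("outgoing_jupiter_bus", "u2_cache", "d1_cache"):
        # Assumption: ASI_EIDR<13:0> supplies both Module_ID and Source_ID.
        return asi_eidr & 0x3FFF
    raise ValueError("unknown data kind: " + repr(data_kind))
```

With ASI\_EIDR set to `0x2005`, data marked in the D1 or U2 cache would carry ERROR\_MARK\_ID `0x2005` under this assumption, while incoming Jupiter Bus data would carry 0.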

## P.2.5 ASI\_EIDR

The ASI\_EIDR register designates the source ID in the ERROR\_MARK\_ID of the CPU.

| [1] | Register name:     | ASI_EIDR         |
|-----|--------------------|------------------|
| [2] | ASI:               | 6E <sub>16</sub> |
| [3] | VA:                | 00<sub>16</sub>  |
| [4] | Error checking:    | Parity.          |
| [5] | Format & function: | See TABLE P-7.   |

 TABLE P-7
 ASI\_EIDR Bit Description

| Bit   | Name          | RW | Description                                    |
|-------|---------------|----|------------------------------------------------|
| 63:14 | Reserved      | R  | Always 0.                                      |
| 13:0  | ERROR_MARK_ID | RW | ERROR_MARK_ID for the error caused by the CPU. |

## P.2.6 Control of Error Action (ASI\_ERROR\_CONTROL)

Error detection masking and the action after error detection are controlled by the value in ASI\_ERROR\_CONTROL, as defined in TABLE P-8.

| [1] | Register name:          | ASI_ERROR_CONTROL (ASI_ECR)                                                                                                                                                                                             |
|-----|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [2] | ASI:                    | 4C <sub>16</sub>                                                                                                                                                                                                        |
| [3] | VA:                     | 10 <sub>16</sub>                                                                                                                                                                                                        |
| [4] | Error checking:         | None                                                                                                                                                                                                                    |
| [5] | Format & function:      | See TABLE P-8.                                                                                                                                                                                                          |
| [6] | Initial value at reset: | Hard POR: ASI_ERROR_CONTROL.WEAK_ED is set to 1. Other<br>fields are set to 0.<br>Other resets: After UGE_HANDLER and WEAK_ED are copied into<br>ASI_STCHG_ERROR_INFO, all fields in<br>ASI_ERROR_CONTROL are set to 0. |

The ASI\_ERROR\_CONTROL register controls error detection masking, error trap occurrence masking, and the multiple-ADE trap occurrence. The register fields are described in TABLE P-8.

| Bit   | Name        | RW | Description |
|-------|-------------|----|-------------|
| 9     | RTE_UE      | RW | Restrainable Error Trap Enable submask for UE and Raw UE. The bit works as defined in TABLE P-2. |
| 8     | RTE_CEDG    | RW | Restrainable Error Trap Enable submask for Corrected Error (CE) and Degradation (DG). The bit works as defined in TABLE P-2. |
| 1     | WEAK_ED     | RW | Weak Error Detection. Controls whether the detection of I_UGE and DAE is suppressed:<br>When WEAK_ED = 0, error detection is not suppressed.<br>When WEAK_ED = 1, error detection is suppressed if the CPU can continue processing.<br>When I_UGE or DAE is detected during instruction execution while WEAK_ED = 1, the value of the output register or the store target memory location becomes unpredictable.<br>Even if WEAK_ED = 1, I_UGE or DAE is detected and the corresponding trap is set when the CPU cannot continue processing by ignoring the error.<br>WEAK_ED is the trap disabling mask for A_UGE and restrainable errors, as defined in TABLE P-2.<br>When a multiple-ADE trap is set (I_UGE, IAE, or DAE detection while ASI_ERROR_CONTROL.UGE_HANDLER = 1), WEAK_ED is set to 1 by hardware. |
| 0     | UGE_HANDLER | RW | Designates whether hardware can expect a UGE handler to run in privileged software (operating system) when a UGE error occurs:<br>0: Hardware recognizes that the privileged software UGE handler does not run.<br>1: Hardware expects that the privileged software UGE handler runs.<br>UGE_HANDLER is the trap disabling mask for A_UGE and restrainable errors, as defined in TABLE P-2.<br>The value of UGE_HANDLER determines whether a multiple-ADE trap is caused or not upon detection of I_UGE, IAE, and DAE.<br>When an <i>async_data_error</i> trap occurs, UGE_HANDLER is set to 1.<br>When a RETRY or DONE instruction is completed, UGE_HANDLER is set to 0. |
| Other | Reserved    | R  | Always reads as 0. |

 TABLE P-8
 ASI\_ERROR\_CONTROL Bit Description
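The interaction between UGE\_HANDLER, WEAK\_ED, and the traps described in TABLE P-2 and TABLE P-8 can be modeled with a short Python sketch. This is a simplified software model for reading the tables, not hardware behavior in full: the function names are hypothetical, A\_UGE and restrainable errors are not modeled, and the suppression of error detection while WEAK\_ED = 1 is left out.

```python
# Bit positions from TABLE P-8; function names are illustrative only.
RTE_UE = 1 << 9
RTE_CEDG = 1 << 8
WEAK_ED = 1 << 1
UGE_HANDLER = 1 << 0

# Hard POR: WEAK_ED is set to 1 and all other fields are set to 0.
HARD_POR_VALUE = WEAK_ED


def trap_for(error, ecr):
    """Return the trap set for I_UGE, IAE, or DAE given the ASI_ECR value."""
    single = {"I_UGE": "single-ADE", "IAE": "IAE", "DAE": "DAE"}
    if error not in single:
        raise ValueError("only I_UGE, IAE, and DAE are modeled")
    return "multiple-ADE" if ecr & UGE_HANDLER else single[error]


def ecr_after_trap(trap, ecr):
    """Apply the hardware updates to ASI_ECR when the given trap is set."""
    if trap == "multiple-ADE":
        ecr |= WEAK_ED       # set to 1 by hardware on a multiple-ADE trap
    if trap in ("single-ADE", "multiple-ADE"):
        ecr |= UGE_HANDLER   # every async_data_error trap sets UGE_HANDLER
    return ecr


def ecr_after_retry_or_done(ecr):
    """Completion of RETRY or DONE sets UGE_HANDLER to 0."""
    return ecr & ~UGE_HANDLER
```

In this model, an I\_UGE detected with ASI\_ECR = 0 causes a single-ADE trap and leaves UGE\_HANDLER = 1; a DAE detected before the handler completes then causes a multiple-ADE trap, which also sets WEAK\_ED.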

# P.3 Fatal Error and error\_state Transition Error

## P.3.1 ASI\_STCHG\_ERROR\_INFO

The ASI\_STCHG\_ERROR\_INFO register stores detected FATAL error and error\_state transition error information, for access by OBP (Open Boot PROM) software.

| [1] | Register name:          | ASI_STCHG_ERROR_INFO                                            |
|-----|-------------------------|-----------------------------------------------------------------|
| [2] | ASI:                    | 4C <sub>16</sub>                                                |
| [3] | VA:                     | 18 <sub>16</sub>                                                |
| [4] | Error checking:         | None                                                            |
| [5] | Format & function:      | See TABLE P-9                                                   |
| [6] | Initial value at reset: | Hard POR: All fields are set to 0.<br>Other resets: Values are unchanged. |
| [7] | Update policy:          | Upon detection of each related error, the corresponding bit in ASI_STCHG_ERROR_INFO is set to 1. Writing 1 to bit 0 erases all error indications in ASI_STCHG_ERROR_INFO (sets all bits in the register, including bit 0, to 0). |

TABLE P-9 describes the fields in the ASI\_STCHG\_ERROR\_INFO register.

| Bit   | Name            | RW | Description                                                                                            |
|-------|-----------------|----|--------------------------------------------------------------------------------------------------------|
| 63:34 | Reserved        | R  | Always 0.                                                                                              |
| 33    | ECR_WEAK_ED     | R  | ASI_ERROR_CONTROL.WEAK_ED is copied into this field at the beginning of a POR or watchdog reset.       |
| 32    | ECR_UGE_HANDLER | R  | ASI_ERROR_CONTROL.UGE_HANDLER is copied into this field at the beginning of a POR or watchdog reset.   |
| 31:24 | Reserved        | R  | Always 0.                                                                                              |
| 23    | EE_MODULE       | RW | error_state transition error requires module degradation. Sticky.                                      |
| 22    | EE_CORE         | RW | error_state transition error requires core degradation. Sticky.                                        |
| 21    | EE_THREAD       | RW | error_state transition error requires thread degradation. Sticky.                                      |
| 20    | UGE_MODULE      | RW | Urgent error requires module degradation. Sticky.                                                      |
| 19    | UGE_CORE        | RW | Urgent error requires core degradation. Sticky.                                                        |
| 18    | UGE_THREAD      | RW | Urgent error requires thread degradation. Sticky.                                                      |
| 17    | rawUE_MODULE    | RW | Raw UE detected in L2\$. Sticky.                                                                       |
| 16    | rawUE_CORE      | RW | Raw UE detected in L1\$. Sticky.                                                                       |
| 15  | EE_DCUCR_MCNTL_ECR | R  | Uncorrectable error in any of the following:<ul><li>ASI_DCUCR</li><li>ASI_MCNTL</li><li>ASI_ECR</li></ul> |
| 14  | EE_OTHER               | R  | Set to 1 upon detection of error_state transition errors not listed elsewhere. The field is always 0 for SPARC64 VII.                                                                                                                                    |
| 13  | EE_TRAP_ADR_UE         | R  | When hardware calculated the trap address to cause a trap,<br>the valid address could not be obtained because of a UE in<br>%tba, a UE in %tt, or a UE in the address calculator.                                                                        |
| 12  | FE_OPSR                |    | An uncorrectable error occurred in OPSR (Operation Status Register); valid CPU operation after such an error cannot be guaranteed. OPSR is the hardware mode-setting register. OPSR is not visible to software and is set by a JTAG command. |
| 11  | EE_WDT_IN_MAXTL        | R  | A watchdog time-out occurred while TL = MAXTL.                                                                                                                                                                                                           |
| 10  | EE_SECOND_WDT          | R  | A second watchdog time-out was detected after an <i>async_data_error</i> exception with watchdog time-out indication (first watchdog time-out) was generated.                                                                                            |
| 9   | EE_SIR_IN_MAXTL        | R  | An SIR occurred while TL = MAXTL.                                                                                                                                                                                                                        |
| 8   | EE_TRAP_IN_MAXTL       | R  | A trap occurred while $TL = MAXTL$ .                                                                                                                                                                                                                     |
| 7:3 | Reserved               | R  | Always 0.                                                                                                                                                                                                                                                |
| 2   | FE_OTHER               | R  | Set to 1 upon detection of urgent errors not listed elsewhere.                                                                                                                                                                                           |
| 1   | FE_U2TAG_UE            | R  | Upon detection of the corresponding error, set to 1.                                                                                                                                                                                                     |
| 0   | FE_JBUS_UE             | RW | An uncorrected error in the Jupiter bus.                                                                                                                                                                                                                 |
|     |                        |    | Writing 1 to this bit sets all fields in this register to 0.                                                                                                                                                                                             |

#### TABLE P-9 ASI\_STCHG\_ERROR\_INFO bit description

**Compatibility Note** – EE\_OPSR in SPARC64 V is changed to FE\_OPSR in SPARC64 VII. There are no changes in the other error\_state transition errors.

## P.3.2 Error\_state Transition Error in Suspended Thread

A SPARC64 VII thread can enter the suspended state by executing a suspend instruction. Only POR, WDR, XDR, *interrupt\_vector*, and *interrupt\_level\_n* exceptions can return it to the running state. If an error occurs in a resource related to those exceptions, the thread stays suspended forever. To prevent this situation, an urgent error in any of the following registers is reported as an error\_state transition error in the suspended state.

- ASI\_EIDR
- STICK, STICK\_CMPR
- TICK, TICK\_CMPR

In this case, ASI\_STCHG\_ERROR\_INFO.UGE\_CORE is set to 1, along with the corresponding bit of ASI\_UGESR.

# P.4 Urgent Error

This section presents details about urgent errors: status monitoring, actions, and end-methods.

## P.4.1 URGENT ERROR STATUS (ASI\_UGESR)

| [1] | Register name:          | ASI_URGENT_ERROR_STATUS                                         |
|-----|-------------------------|-----------------------------------------------------------------|
| [2] | ASI:                    | 4C <sub>16</sub>                                                |
| [3] | VA:                     | 08 <sub>16</sub>                                                |
| [4] | Error checking:         | None                                                            |
| [5] | Format & function:      | See TABLE P-10.                                                 |
| [6] | Initial value at reset: | Hard POR: All fields are set to 0.                              |
|     |                         | Other resets: The values of all ASI_UGESR fields are unchanged. |

The ASI\_UGESR register contains the following information when an *async\_data\_error* (ADE) exception is generated.

- Detected I\_UGEs and A\_UGEs, and related information
- The type of the second error that caused a multiple *async\_data\_error* trap

TABLE P-10 describes the fields of the ASI\_UGESR register. In the table, the prefixes in the name field have the following meaning:

- IUG\_ Instruction Urgent error
- IAG\_ Autonomous Urgent error
- IAUG\_ The error detected as both I\_UGE and A\_UGE

### TABLE P-10 ASI\_UGESR Bit Description (1 of 4)

Each bit in ASI\_UGESR<22:8> indicates the occurrence of its corresponding error in a single-ADE trap as follows:

- 0: The error is not detected.
- 1: The error is detected.

Each bit in ASI\_UGESR<22:16> indicates an error in a CPU register. The error detection conditions for these errors are defined in *Internal Register Error Handling* on page 201.

| Bit | Name          | RW | Description                                                                                                                                                                                                                                                                                                             |  |
|-----|---------------|----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| 22  | IAUG_CRE      | R  | Uncorrectable error in any of the following:<br>(IA) ASI_EIDR<br>(IA) ASI_PA_WATCH_POINT when enabled<br>(IA) ASI_VA_WATCH_POINT when enabled<br>(I) ASI_AFAR_D1<br>(I) ASI_AFAR_U2<br>(I) ASI_INTR_R<br>(A) ASI_INTR_DISPATCH_W (UE at store)<br>(IA) SOFTINT<br>(IA) STICK<br>(IA) STICK_COMP                         |  |
| 21  | IAUG_TSBCTXT  | R  | Uncorrectable error in any of the following:<br>(IA) ASI_DMMU_TSB_BASE<br>(IA) ASI_DMMU_TSB_PEXT<br>(IA) ASI_DMMU_TSB_SEXT<br>(IA) ASI_DMMU_TSB_NEXT<br>(IA) ASI_PRIMARY_CONTEXT<br>(IA) ASI_SECONDARY_CONTEXT<br>(IA) ASI_SHARED_CONTEXT<br>(IA) ASI_IMMU_TSB_BASE<br>(IA) ASI_IMMU_TSB_PEXT<br>(IA) ASI_IMMU_TSB_NEXT |  |
| 20  | IUG_TSBP      | R  | Uncorrectable error in any of the following:<br>(I) ASI_DMMU_TAG_TARGET<br>(I) ASI_DMMU_TAG_ACCESS<br>(I) ASI_DMMU_TSB_8KB_PTR<br>(I) ASI_DMMU_TSB_64KB_PTR<br>(I) ASI_DMMU_TSB_DIRECT_PTR<br>(I) ASI_IMMU_TAG_TARGET<br>(I) ASI_IMMU_TAG_ACCESS<br>(I) ASI_IMMU_TSB_8KB_PTR<br>(I) ASI_IMMU_TSB_64KB_PTR               |  |
| 19  | IUG_PSTATE    | R  | Uncorrectable error in any of the following: %pstate, %pc,<br>%npc, %cwp, %cansave, %canrestore, %otherwin,<br>%cleanwin, %pil, %wstate                                                                                                                                                                                 |  |
| 18  | IUG_TSTATE    | R  | Uncorrectable error in any of %tstate, %tpc, %tnpc.                                                                                                                                                                                                                                                                     |  |
| 17  | IUG_%F        | R  | Uncorrectable error in any floating-point register or in the FPRS, FSR, or GSR register.                                                                                                                                                                                                                                |  |
| 16  | IUG_%R        | R  | Uncorrectable error in any general-purpose (integer) register, or in the Y, CCR, or ASI register.                                                                                                                                                                                                                       |  |
| 14  | IUG_WDT       | R  | Watchdog timeout first time. Indicates the first watchdog timeout. If IUG_WDT = 1 when a single-ADE trap occurs, the instruction pointed to by TPC is abandoned and its result is unpredictable.                                                                                                                        |  |

### TABLE P-10 ASI\_UGESR Bit Description (2 of 4)

| Bit | Name        | RW | Description |
|-----|-------------|----|-------------|
| 10  | IUG_DTLB    | R  | Uncorrectable error in DTLB during load, store, or demap. Indicates that one of the following errors was detected during a data TLB access:<ul><li>An uncorrectable error in TLB data or TLB tag was detected when an LDXA instruction attempted to read ASI_DTLB_DATA_ACCESS or ASI_DTLB_TAG_ACCESS. TPC indicates either the instruction causing the error or the previous instruction.</li><li>A store to the data TLB or a demap of the data TLB failed. TPC indicates either the instruction causing the error or the instruction following the one that caused the error.</li></ul> |
| 9   | IUG_ITLB    | R  | Uncorrectable error in ITLB during load, store, or demap. Indicates that one of the following errors was detected during an instruction TLB access:<ul><li>An uncorrectable error in TLB data or TLB tag was detected when an LDXA instruction attempted to read ASI_ITLB_DATA_ACCESS or ASI_ITLB_TAG_ACCESS. TPC indicates either the instruction causing the error or the previous instruction.</li><li>A store to the instruction TLB or a demap of the instruction TLB failed. TPC indicates either the instruction causing the error or the following instruction.</li></ul> |
| 8   | IUG_COREERR | R  | CPU core error. Indicates an uncorrectable error in a CPU internal resource used to execute instructions.<br>When there is an uncorrectable error in a program-visible register and the instruction reading the register with UE is executed, the error in the register is always indicated. In this case, IUG_COREERR may or may not be indicated simultaneously with the register error. |
| 5:4 | INSTEND     | R  | Trapped instruction end-method. Upon a single <i>async_data_error</i> trap without watchdog time-out detection, INSTEND indicates the instruction end-method of the trapped instruction pointed to by TPC as follows:<br>00<sub>2</sub>: Precise<br>01<sub>2</sub>: Retryable but not precise<br>10<sub>2</sub>: Reserved<br>11<sub>2</sub>: Not retryable |

### TABLE P-10 ASI\_UGESR Bit Description (3 of 4)

| Bit | Name                | RW | Description |
|-----|---------------------|----|-------------|
| 5:4 | INSTEND (continued) | R  | See Section P.4.3 for the instruction end-method for the <i>async_data_error</i> trap. When a watchdog time-out is detected, the instruction end-method is undefined. |

#### TABLE P-10 ASI\_UGESR Bit Description (4 of 4)

| Bit   | Name      | RW | Description                                                                                                                                                                                                                                                                   |  |
|-------|-----------|----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| 3     | PRIV      | R  | Privileged mode. Upon a single <i>async_data_error</i> trap, the PRIV field is set as follows:<br>When the value of PSTATE.PRIV immediately before the single-ADE trap is unknown because of an uncorrectable error in PSTATE, ASI_UGESR.PRIV is set to 1. Otherwise, the value of PSTATE.PRIV immediately before the single-ADE trap is copied to ASI_UGESR.PRIV. |  |
| 2     | MUGE_DAE  | R  | Multiple UGEs caused by DAE. Upon a single-ADE trap, MUGE_DAE is set to 0. Upon a multiple-ADE trap caused by a DAE, MUGE_DAE is set to 1. Upon a multiple-ADE trap not caused by a DAE, MUGE_DAE is unchanged. |  |
| 1     | MUGE_IAE  | R  | Multiple UGEs caused by IAE. Upon a single-ADE trap, MUGE_IAE is set to 0. Upon a multiple-ADE trap caused by an IAE, MUGE_IAE is set to 1. Upon a multiple-ADE trap not caused by an IAE, MUGE_IAE is unchanged. |  |
| 0     | MUGE_IUGE | R  | Multiple UGEs caused by I_UGE. Upon a single-ADE trap,<br>MUGE_IUGE is set to 0. Upon a multiple-ADE trap caused by an<br>I_UGE, MUGE_IUGE is set to 1. Upon a multiple-ADE trap not<br>caused by an I_UGE, MUGE_IUGE is unchanged.                                           |  |
| Other | Reserved  | R  | Always 0.                                                                                                                                                                                                                                                                     |  |
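The bit layout in TABLE P-10 can be captured as a set of masks and small accessors. The following is a minimal sketch in C, not part of the specification: the bit positions come from the table, while every macro, type, and function name (`UGESR_*`, `ugesr_instend`, and so on) is a hypothetical name chosen for illustration. It operates on a 64-bit value that software has already read from ASI\_UGESR; how that read is performed is outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions follow TABLE P-10; all identifiers here are hypothetical. */
#define UGESR_BIT(n)        (1ULL << (n))
#define UGESR_IAUG_CRE      UGESR_BIT(22)
#define UGESR_IAUG_TSBCTXT  UGESR_BIT(21)
#define UGESR_IUG_TSBP      UGESR_BIT(20)
#define UGESR_IUG_PSTATE    UGESR_BIT(19)
#define UGESR_IUG_TSTATE    UGESR_BIT(18)
#define UGESR_IUG_F         UGESR_BIT(17)    /* IUG_%F */
#define UGESR_IUG_R         UGESR_BIT(16)    /* IUG_%R */
#define UGESR_IUG_WDT       UGESR_BIT(14)
#define UGESR_IUG_DTLB      UGESR_BIT(10)
#define UGESR_IUG_ITLB      UGESR_BIT(9)
#define UGESR_IUG_COREERR   UGESR_BIT(8)
#define UGESR_PRIV          UGESR_BIT(3)
#define UGESR_MUGE_DAE      UGESR_BIT(2)
#define UGESR_MUGE_IAE      UGESR_BIT(1)
#define UGESR_MUGE_IUGE     UGESR_BIT(0)
#define UGESR_MUGE_MASK     0x7ULL           /* bits 2:0 */
#define UGESR_REG_ERR_MASK  (0x7FULL << 16)  /* bits 22:16, CPU register errors */

/* INSTEND encodings (bits 5:4). */
typedef enum {
    UGESR_END_PRECISE       = 0,
    UGESR_END_RETRYABLE     = 1,  /* retryable but not precise */
    UGESR_END_RESERVED      = 2,
    UGESR_END_NOT_RETRYABLE = 3
} ugesr_instend_t;

/* Extract the instruction end-method field. */
ugesr_instend_t ugesr_instend(uint64_t ugesr)
{
    return (ugesr_instend_t)((ugesr >> 4) & 0x3u);
}

/* Any MUGE_* bit set means a multiple-ADE trap has occurred. */
bool ugesr_is_multiple_ade(uint64_t ugesr)
{
    return (ugesr & UGESR_MUGE_MASK) != 0;
}

/* True if any of bits 22:16 reports an error in a CPU register. */
bool ugesr_has_register_error(uint64_t ugesr)
{
    return (ugesr & UGESR_REG_ERR_MASK) != 0;
}

/* Test a single bit by position. */
bool ugesr_test(uint64_t ugesr, unsigned bit)
{
    assert(bit < 64);
    return ((ugesr >> bit) & 1u) != 0;
}
```

For example, a value with only IUG\_DTLB, PRIV, and INSTEND = 01<sub>2</sub> set decodes as a single-ADE trap with a retryable end-method and no CPU register error.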

## P.4.2 Action of *async\_data\_error* (ADE) Trap

The single-ADE trap and the multiple-ADE trap are generated upon the conditions defined in TABLE P-2 on page 179. The actions upon their occurrence are defined in more detail in this section. For convenience, the shorthand ADE is used to refer to *async\_data\_error*.

#### 1. Conditions that cause an ADE trap:

An ADE trap occurs when one of the following conditions is satisfied:

- When ASI\_ERROR\_CONTROL.UGE\_HANDLER = 0 and I\_UGEs and/or A\_UGEs are detected, a single-ADE trap is generated.
- When ASI\_ERROR\_CONTROL.UGE\_HANDLER = 1 and I\_UGEs, IAE, and/or DAE are detected, a multiple-ADE trap is generated.

#### 2. State change, trap target address calculation, and TL manipulation.

The following actions are executed in this order:

#### a. State transition

if (TL = MAXTL), the CPU enters error\_state and abandons the ADE trap;

else if (CPU is in execution state && (TL = MAXTL - 1)), then the CPU enters RED\_state.

#### b. Trap target address calculation

When the CPU is in execution state, the trap target address is calculated from %tba, %tt, and %tl.

Otherwise, the CPU is in RED\_state and the trap target address is set to RSTVaddr + A0<sub>16</sub>.

c. TL increases: $TL \leftarrow TL + 1$.
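Steps a through c can be sketched as a small C model. This is an illustration under stated assumptions, not a description of the hardware: the function and type names are hypothetical, MAXTL, RSTVaddr, and the trap type are passed in as parameters rather than assumed, and the execution-state vector is formed with the standard SPARC V9 layout (TBA<63:15>, a TL > 0 bit at position 14, and the trap type at bits 13:5), which the text above only summarizes as "calculated from %tba, %tt, and %tl".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { CPU_EXECUTE, CPU_RED, CPU_ERROR } cpu_state_t;

typedef struct {
    cpu_state_t state;   /* state after the trap is taken */
    unsigned    tl;      /* TL after the trap is taken */
    uint64_t    target;  /* trap target address; unused if state == CPU_ERROR */
} ade_entry_t;

/* Hypothetical model of steps 2a-2c. tl is the trap level before the trap. */
ade_entry_t ade_trap_entry(cpu_state_t state, unsigned tl, unsigned maxtl,
                           uint64_t tba, uint64_t rstvaddr, unsigned tt)
{
    assert(state != CPU_ERROR && tl <= maxtl && maxtl > 0);
    ade_entry_t out = { state, tl, 0 };

    /* a. State transition */
    if (tl == maxtl) {
        out.state = CPU_ERROR;       /* the ADE trap is abandoned */
        return out;
    }
    if (state == CPU_EXECUTE && tl == maxtl - 1)
        out.state = CPU_RED;

    /* b. Trap target address calculation */
    if (out.state == CPU_EXECUTE) {
        out.target = (tba & ~0x7FFFULL)                /* TBA<63:15>        */
                   | ((uint64_t)(tl > 0) << 14)        /* TL > 0 before trap */
                   | ((uint64_t)(tt & 0x1FFu) << 5);   /* trap type          */
    } else {
        out.target = rstvaddr + 0xA0;                  /* RED_state vector   */
    }

    /* c. TL increases */
    out.tl = tl + 1;
    return out;
}
```

With an illustrative MAXTL of 5, a trap taken at TL = 4 in execution state moves the model into RED\_state and vectors to RSTVaddr + A0<sub>16</sub>, while a trap at TL = 5 yields error\_state without changing TL.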

#### 3. Save the old value into TSTATE, TPC, and TNPC.

PSTATE, PC, and NPC immediately before the ADE trap are copied into TSTATE, TPC, and TNPC, respectively. If the copy source register contains an uncorrectable error, the copy target register also contains the UE.

#### 4. Set the specific registers:

The following three sets of registers are updated:

#### a. Update and validation of specific registers.

Hardware writes the registers listed in TABLE P-11.

| Register                                              | Condition For Writing           | Value Written                                                                                                                                                                                                              |
|-------------------------------------------------------|---------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| PSTATE                                                | Always                          | AG = 1, MG = 0, IG = 0, IE = 0, PRIV = 1, AM = 0, PEF = 1, RED = 0 (or 1 depending on the CPU status), MM = 00, TLE = 0, CLE = 0. |
| PC                                                    | Always                          | ADE trap address.                                                                                                                                                                                                          |
| nPC                                                   | Always                          | ADE trap address + 4.                                                                                                                                                                                                      |
| CCR                                                   | When the register contains UE   | 0.                                                                                                                                                                                                                         |
| FSR, GSR                                              | When the register contains UE   | If either FSR or GSR contains a UE, 0 is written to that register.<br>When 0 is written to FSR and/or GSR upon a single-ADE trap,<br>ASI_UGESR.IUG_%F is set to 1.                                                         |
| CWP, CANSAVE,<br>CANRESTORE,<br>OTHERWIN,<br>CLEANWIN | When the register contains UE   | Any register among CWP, CANSAVE, CANRESTORE, OTHERWIN, and CLEANWIN that contains a UE is written with 0. When 0 is written to one of these registers upon a single-ADE trap, ASI_UGESR.IUG_PSTATE is set to 1. |
| TICK                                                  | When the register contains UE   | NPT = 1, Counter = 0. |
| TICK_COMPARE                                          | When the register contains UE   | INT_DIS = 1, TICK_CMPR = 0. |

TABLE P-11 Registers Written for Update and Validation

The error(s) in a written register are removed by setting the correct value to the error checking (parity) code during the full write of the register.

Errors in registers other than those listed above and any errors in the TLB entry remain.

#### b. Update of ASI\_UGESR, as shown in TABLE P-12.

| Bit  | Field            | Update upon a Single-ADE Trap                                                                                | Update upon a Multiple-ADE Trap                                                                               |
|------|------------------|--------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
| 63:6 | Error indication | All bits in this field are updated. All I_UGEs and A_UGEs detected at the trap are indicated simultaneously. | Unchanged.                                                                                                    |
| 5:4  | INSTEND          | The instruction end-method of the instruction referenced by TPC is set.                                      | Unchanged.                                                                                                    |
| 2    | MUGE_DAE         | Set to 0.                                                                                                    | If the multiple-ADE trap was caused by a DAE, MUGE_DAE is set to 1.<br>Otherwise, MUGE_DAE is unchanged.      |
| 1    | MUGE_IAE         | Set to 0.                                                                                                    | If the multiple-ADE trap was caused by an IAE, MUGE_IAE is set to 1.<br>Otherwise, MUGE_IAE is unchanged.     |
| 0    | MUGE_IUGE        | Set to 0.                                                                                                    | If the multiple-ADE trap was caused by an I_UGE, MUGE_IUGE is set to 1.<br>Otherwise, MUGE_IUGE is unchanged. |

TABLE P-12 ASI\_UGESR Update for Single and Multiple-ADE Exceptions

#### c. Update of ASI\_ERROR\_CONTROL

Upon a single-ADE trap, ASI\_ERROR\_CONTROL.UGE\_HANDLER is set to 1. During the period after the single-ADE trap occurs and before a RETRY or DONE instruction is executed, UGE\_HANDLER = 1 tells hardware that the urgent error handler is running.

Upon a multiple *async\_data\_error* trap, ASI\_ERROR\_CONTROL.WEAK\_ED is set to 1 and the CPU starts running in the weak error detection state.

#### 5. Set ASI\_ERROR\_CONTROL.UGE\_HANDLER to 0.

Upon completion of a RETRY or DONE instruction, ASI\_ERROR\_CONTROL.UGE\_HANDLER is set to 0.
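The interaction between the trap conditions in step 1, the ASI\_UGESR updates in TABLE P-12, and the UGE\_HANDLER and WEAK\_ED flags can be summarized in a small software model. The sketch below is an assumption-laden illustration rather than a hardware description: all type and function names are hypothetical, the PRIV bit (bit 3) is simply carried over because TABLE P-12 does not list it, and error kinds that the conditions in step 1 do not name for a given UGE\_HANDLER value are treated as "no ADE trap".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ERR_I_UGE, ERR_A_UGE, ERR_IAE, ERR_DAE } err_kind_t;
typedef enum { ADE_NONE, ADE_SINGLE, ADE_MULTIPLE } ade_kind_t;

/* Hypothetical software view of the registers involved. */
typedef struct {
    uint64_t ugesr;        /* ASI_UGESR */
    bool     uge_handler;  /* ASI_ERROR_CONTROL.UGE_HANDLER */
    bool     weak_ed;      /* ASI_ERROR_CONTROL.WEAK_ED */
} ade_regs_t;

/* error_bits: detected errors in UGESR bit positions 63:6.
 * instend:    end-method of the instruction referenced by TPC (0..3). */
ade_kind_t ade_trap_update(ade_regs_t *r, err_kind_t kind,
                           uint64_t error_bits, unsigned instend)
{
    assert(r != NULL && instend <= 3);

    if (!r->uge_handler) {
        /* Only I_UGE and A_UGE are listed as single-ADE conditions. */
        if (kind != ERR_I_UGE && kind != ERR_A_UGE)
            return ADE_NONE;
        /* Single-ADE column of TABLE P-12: bits 63:6 and INSTEND are
         * rewritten, MUGE_* bits are cleared; PRIV is carried over here. */
        r->ugesr = (error_bits & ~0x3FULL)
                 | ((uint64_t)instend << 4)
                 | (r->ugesr & (1ULL << 3));
        r->uge_handler = true;
        return ADE_SINGLE;
    }

    /* UGE_HANDLER = 1: I_UGE, IAE, and DAE cause a multiple-ADE trap. */
    switch (kind) {
    case ERR_DAE:   r->ugesr |= 1ULL << 2; break;  /* MUGE_DAE  */
    case ERR_IAE:   r->ugesr |= 1ULL << 1; break;  /* MUGE_IAE  */
    case ERR_I_UGE: r->ugesr |= 1ULL << 0; break;  /* MUGE_IUGE */
    default:        return ADE_NONE;               /* A_UGE: not listed */
    }
    r->weak_ed = true;
    return ADE_MULTIPLE;
}

/* Step 5: completion of RETRY or DONE clears UGE_HANDLER. */
void ade_retry_or_done(ade_regs_t *r)
{
    assert(r != NULL);
    r->uge_handler = false;
}
```

In this model, a DAE that arrives while the handler flag is set leaves the error indication bits and INSTEND untouched and only adds MUGE\_DAE, which is how a handler can later distinguish the second error from the first.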

## P.4.3 Instruction End-Method at ADE Trap

In SPARC64 VII, upon occurrence of the ADE trap, the trapped instruction referenced by TPC ends by using one of the following instruction end-methods:

- Precise
- Retryable but not precise (not included in JPS1)
- Not retryable (not included in JPS1)

Upon a single-ADE trap, the trapped instruction end-method is indicated in ASI\_UGESR.INSTEND.

TABLE P-13 defines each instruction end-method after an ADE trap.

|                                                                                                                                                                              | Precise                                                                                                                                                                                                                                               | Retryable But Not Precise                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Not Retryable                                                                                                                                                                                                                                                                                                                                                                                               |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Instructions executed after the last ADE, IAE, or DAE trap and before the trapped instruction referenced by TPC.                                                             | Ended (Committed).<br>The instructions without UGE complete as defined in the architecture. The instruction with UGE has an unpredictable value at its output (destination register or, in the case of a store instruction, destination memory location). |  |  |  |
| The trapped instruction<br>referenced by TPC                                                                                                                                 | Not executed.                                                                                                                                                                                                                                         | <ul> <li>The output of the instruction is incomplete.</li> <li>Part of the output may be changed, or the invalid value may be written to the instruction output. However, the modification to the invalid target that is not defined as instruction output is not executed.</li> <li>The following modifications are not executed:</li> <li>Store to the cacheable area including cache.</li> <li>Store to the noncacheable area.</li> <li>Output to the source register of the instruction (destructive overlap)</li> </ul> | The output of the instruction is<br>incomplete.<br>Part of the output may be changed,<br>or the invalid value may be written<br>to the instruction output. However,<br>the modification to the invalid target<br>that is not defined as instruction<br>output is not executed.<br>A store to an invalid address is not<br>executed. (Store to a valid address<br>with uncorrected data may be<br>executed.) |  |
| Instructions to be executed<br>after the instruction referenced<br>by TPC                                                                                                    | Not executed.                                                                                                                                                                                                                                         | Not executed.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Not executed.                                                                                                                                                                                                                                                                                                                                                                                               |  |
| The possibility of resuming the<br>trapped program by executing<br>the RETRY instruction to the<br>%tpc when the trapped<br>program is not damaged at the<br>single-ADE trap | Possible.                                                                                                                                                                                                                                             | Possible.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Impossible.                                                                                                                                                                                                                                                                                                                                                                                                 |  |

TABLE P-13 Instruction End-Method After *async\_data\_error* Exception
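The last row of TABLE P-13, combined with the retry condition used in the handler flow of Section P.4.4, suggests a simple retry policy. The following C sketch is one possible reading, not a prescribed implementation: the function name, the caller-owned retry counter, and the threshold parameter are all hypothetical, and the input is assumed to be a value already read from ASI\_UGESR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy: returns true if the trapped context may be resumed
 * with RETRY. *retry_count is incremented only when the end-method and
 * error bits would otherwise permit a retry. */
bool ade_may_retry(uint64_t ugesr, unsigned *retry_count, unsigned threshold)
{
    assert(retry_count != NULL);

    unsigned instend = (unsigned)((ugesr >> 4) & 0x3u);   /* bits 5:4 */

    if ((ugesr & 0x7ULL) != 0)             /* MUGE_*: multiple-ADE trap */
        return false;
    if (((ugesr >> 14) & 0x1FFULL) != 0)   /* bits 22:14 must all be 0 */
        return false;
    if (instend > 1)                       /* reserved or not retryable */
        return false;

    ++*retry_count;
    return *retry_count < threshold;       /* too many retries: give up */
}
```

A caller would typically reset the counter once per unit of time and invoke its panic routine when the function returns false.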

## P.4.4 Expected Software Handling of ADE Trap

The expected software handling of an ADE trap is described by the pseudo C code below. The main purpose of this flow is to recover from the following errors as much as possible:

- An error in the CPU internal RAM or register file
- An error in the accumulator
- An error in the CPU internal temporary registers and data bus

```
void
expected_software_handling_of_ADE_trap()
{
    /* Only %r0-%r7 can be used from here to Point#1 because the register window
       control registers may not have valid values until Point#1. It is
       recommended that only %r0-%r7 are used as general-purpose registers (GPR)
       in the whole single-ADE trap handler, if possible. */
    ASI_SCRATCH_REGp ← %rX;
    ASI_SCRATCH_REGq ← %rY;
    %rX ← ASI_UGESR;
    if ((%rX & 0x07) ≠ 0) {
        /* multiple-ADE trap occurrence */
        invoke panic routine and take system dump as much as possible
            with the running environment of ASI_ERROR_CONTROL.WEAK_ED == 1;
    }
    if (%rX.IUG_%R == 1) {
        %r1-%r31 except %rX and %rY ← %r0;
        %y ← %r0;
        %tstate.pstate ← %r0;   /* because ccr or asi field in %tstate.pstate
                                   contains the error */
    }
    else {
        save required %r1-%r7 to the ADE trap save area, using %rX, %rY,
            ASI_SCRATCH_REGp and ASI_SCRATCH_REGq;
        /* whole %r save and restore is required to retry the context
           with PSTATE.AG == 1 */
    }
    if (ASI_UGESR.IUG_PSTATE == 1) {
        %tstate.pstate ← %r0;
        %tpc ← %r0;
        %pil ← %r0;
        %wstate ← %r0;
        All general-purpose registers in the register window ← %r0;
        Set the register window control registers
            (CWP, CANSAVE, CANRESTORE, OTHERWIN, CLEANWIN) to appropriate values;
    }
    /* Point#1: Program can use the general-purpose registers except %r0-%r7
       after this because the register window control registers were validated
       in the above step. */
    if ((ASI_UGESR.IAUG_CRE == 1) || (ASI_UGESR.IAUG_TSBCTXT == 1) ||
        (ASI_UGESR.IUG_TSBP == 1) || (ASI_UGESR.IUG_TSTATE == 1) ||
        (ASI_UGESR.IUG_%F == 1)) {
        Write to each register with an error indication, to erase as many
            register errors as possible;
    }
    if (ASI_UGESR.IUG_DTLB == 1) {
        execute demap all for DTLB;
        /* A locked fDTLB entry with uncorrectable error is not removed by this
           operation. A locked fDTLB entry with UE never detects its tag match or
           causes the data_access_error trap when its tag matches at the DTLB
           reference for address translation. */
    }
    if (ASI_UGESR.IUG_ITLB == 1) {
```

```
execute demap all for ITLB;
   /* A locked fITLB entry with uncorrectable error is not removed by this
      operation. A locked fITLB entry with UE never detects its tag match
      or causes the data access error trap when its tag matches at the ITLB
      reference for address translation. */
}
if ((ASI UGESR.bits22:14 == 0) &&
   ((ASI UGESR.INSTEND == 0) || (ASI UGESR.INSTEND == 1))) {
   ++ADE_trap_retry_per_unit_of_time;
   if (ADE trap retry per unit of time < threshold)
        resume the trapped context by use of the RETRY instruction;
   else
        invoke panic routine because of too many ADE trap retries;
}
else if ((ASI UGESR.bits22:18 == 0) &&
       (ASI UGESR.bits15:14 == 0) &&
       (ASI UGESR.PRIV == 0)) {
    ++ADE trap kill user per unit of time;
    if (ADE trap kill user per unit of time < threshold)
        kill one user process trapped and continue system operation;
   else
       invoke panic routine because of too may ADE trap user kill;
}
else
   invoke panic routine because of unrecoverable urgent error;
```
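The final decision step of the flow above (retry, kill the user process, or panic) can be sketched in plain C. This is only an illustration under stated assumptions: the function name `ade_disposition`, the enum, and the counter and threshold parameters are hypothetical, and because this section does not give the bit positions of `INSTEND` and `PRIV`, the caller is assumed to pass those fields already extracted. Only the `bits22:14`, `bits22:18`, and `bits15:14` ranges come from the pseudo code.

```c
#include <stdint.h>

/* Possible outcomes of the final decision step (names are illustrative). */
typedef enum { ADE_RETRY, ADE_KILL_USER, ADE_PANIC } ade_action_t;

/* Extract bits hi:lo (inclusive) of a register image. */
static uint64_t bits(uint64_t v, unsigned hi, unsigned lo)
{
    return (v >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

/*
 * ugesr:     value read from ASI_UGESR
 * instend:   ASI_UGESR.INSTEND, extracted by the caller (assumption)
 * priv:      ASI_UGESR.PRIV, extracted by the caller (assumption)
 * retries:   hypothetical per-unit-of-time retry counter, updated in place
 * kills:     hypothetical per-unit-of-time user-kill counter, updated in place
 * threshold: hypothetical limit applied to both counters
 */
ade_action_t ade_disposition(uint64_t ugesr, unsigned instend, unsigned priv,
                             unsigned *retries, unsigned *kills,
                             unsigned threshold)
{
    /* No error indication in bits 22:14 and the instruction ended cleanly. */
    if (bits(ugesr, 22, 14) == 0 && (instend == 0 || instend == 1)) {
        ++*retries;
        return (*retries < threshold) ? ADE_RETRY : ADE_PANIC;
    }
    /* Damage is confined to state that a nonprivileged process owns. */
    if (bits(ugesr, 22, 18) == 0 && bits(ugesr, 15, 14) == 0 && priv == 0) {
        ++*kills;
        return (*kills < threshold) ? ADE_KILL_USER : ADE_PANIC;
    }
    return ADE_PANIC;
}
```

A real handler would run this logic in assembly with only %r0-%r7 available; the C form is meant only to make the ordering of the three tests explicit.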

# P.5 Instruction Access Errors

See Appendix F for details.

# P.6 Data Access Errors

See Appendix F for details.

# P.7 Restrainable Errors

This section describes the registers—ASI\_ASYNC\_FAULT\_STATUS, ASI\_ASYNC\_FAULT\_ADDR\_D1, and ASI\_ASYNC\_FAULT\_ADDR\_U2—that define the restrainable errors and explains how software handles these errors.

## P.7.1 ASI\_ASYNC\_FAULT\_STATUS (ASI\_AFSR)

| [1] | Register name:          | ASI_ASYNC_FAULT_STATUS (ASI_AFSR)               |
|-----|-------------------------|-------------------------------------------------|
| [2] | ASI:                    | 4C<sub>16</sub>                                 |
| [3] | VA:                     | 00<sub>16</sub>                                 |
| [4] | Error checking:         | None                                            |
| [5] | Format & function:      | See TABLE P-14                                  |
| [6] | Initial value at reset: | Hard POR: All fields in ASI_AFSR are set to 0.  |
|     |                         | Other resets: Values in ASI_AFSR are unchanged. |

The ASI\_ASYNC\_FAULT\_STATUS register holds the detected restrainable error sticky bits. TABLE P-14 describes the fields of this register. In the table, the prefixes in the name field have the following meaning:

- DG\_ Degradation error
- CE\_ Correctable Error
- UE\_ Uncorrectable Error

| Bit   | Name            | RW   | Description                                                                                                                            |
|-------|-----------------|------|----------------------------------------------------------------------------------------------------------------------------------------|
| 12    | DG_U2\$x        | RW1C | Degradation in U2\$. This bit is set when automatic way reduction is applied in U2\$ due to U2\$ tag errors in system.                 |
| 11    | DG_U2\$         | RW1C | Degradation in U2\$. This bit is set when automatic way reduction is applied in U2\$ due to U2\$ errors in CPU or System.              |
| 10    | DG_D1\$sTLB     | RW1C | Degradation in L1\$ and sTLB. This bit is set when automatic way reduction is applied in I1\$, D1\$, sITLB, sDTLB, uITLB and uDTLB.    |
| 9     | Reserved        | R    | Always reads as 0; writes are ignored.                                                                                                 |
| 3     | UE_DST_BETO     | RW1C | Disrupting store JBUS bus error or time-out.                                                                                           |
| 2     | Reserved        | R    | Always reads as 0; writes are ignored.                                                                                                 |
| 1     | UE_RAW_L2\$INSD | RW1C | Raw UE in L2 cache inside data.                                                                                                        |
| 0     | UE_RAW_D1\$INSD | RW1C | Raw UE in D1 cache inside data.                                                                                                        |
| Other | Reserved        | R    | Always reads as 0; writes are ignored.                                                                                                 |

TABLE P-14 ASI\_ASYNC\_FAULT\_STATUS Bit Description

**Note** – Disrupting store bus error or time-out is reported as either AFSR.UE\_DST\_BETO, DSFSR.BERR, or DSFSR.RTO exclusively.

**Note** – A load followed by a store with the same address which causes UE\_DST\_BETO may not signal *data\_access\_error*. In this case the data is returned from the store buffer, and AFSR.UE\_DST\_BETO is set eventually.
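The RW1C behavior of the sticky bits in TABLE P-14 can be modeled with a few lines of C. This is a minimal sketch, not hardware-accurate code: the macro names and the function `afsr_after_write` are made up for illustration, and the sketch assumes that RW1C means a written 1 clears the bit while a written 0 leaves it unchanged. The bit positions are taken from TABLE P-14.

```c
#include <stdint.h>

/* Bit positions from TABLE P-14; the macro names are illustrative only. */
#define AFSR_DG_U2X          (UINT64_C(1) << 12)  /* DG_U2$x */
#define AFSR_DG_U2           (UINT64_C(1) << 11)  /* DG_U2$ */
#define AFSR_DG_L1_STLB      (UINT64_C(1) << 10)  /* L1$ and sTLB degradation */
#define AFSR_UE_DST_BETO     (UINT64_C(1) << 3)
#define AFSR_UE_RAW_L2_INSD  (UINT64_C(1) << 1)
#define AFSR_UE_RAW_D1_INSD  (UINT64_C(1) << 0)

/* All implemented RW1C bits; every other bit reads as 0. */
#define AFSR_RW1C_MASK (AFSR_DG_U2X | AFSR_DG_U2 | AFSR_DG_L1_STLB | \
                        AFSR_UE_DST_BETO | AFSR_UE_RAW_L2_INSD |     \
                        AFSR_UE_RAW_D1_INSD)

/*
 * Model of a store to an RW1C register (assumed semantics): each bit
 * written as 1 is cleared, each bit written as 0 keeps its value, and
 * reserved bits always read back as 0.
 */
uint64_t afsr_after_write(uint64_t current, uint64_t written)
{
    return (current & AFSR_RW1C_MASK) & ~written;
}
```

With this model, software that has logged an error would write back exactly the bits it observed, so that a bit set between the read and the write is not lost.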

## P.7.2 ASI\_ASYNC\_FAULT\_ADDR\_D1

This register always reads as 0; writes to this register are ignored in SPARC64 VII.

## P.7.3 ASI\_ASYNC\_FAULT\_ADDR\_U2

This register always reads as 0; writes to this register are ignored in SPARC64 VII.

## P.7.4 Expected Software Handling of Restrainable Errors

Software is expected to record and report all restrainable errors.

The expected software recovery for each type of restrainable error is described below.

- DG\_L1\$, DG\_U2\$, DG\_U2\$x The following status of the CPU is reported:
  - Performance is degraded by the way reduction in I1\$, D1\$, U2\$, sITLB, or sDTLB.
  - CPU availability may be slightly decreased. If only one way facility is available among I1\$, D1\$, U2\$, sITLB, and sDTLB and further way reduction is detected for this facility, the error\_state transition error is detected.

Software stops the use of the CPU, if required.

- UE\_DST\_BETO This error is caused by either:
  - Invalid DTLB entry is specified, or
  - Invalid memory access instruction when a physical address access ASI is executed in privileged software.

This error is always caused by a mistake in privileged software. Record the error and correct the erroneous privileged software.

- UE\_RAW\_L2\$INSD and UE\_RAW\_D1\$INSD Software handles these errors as follows:
  - Correct the cache line data containing the uncorrected error by executing a block store with commit instruction, if possible. Note that the original data is deleted by this operation.
  - For UE\_RAW\_L2\$FILL, avoid using the memory block with the UE as much as possible.
- No error indication in ASI\_AFSR at *ECC\_error* trap Ignore the *ECC\_error* trap.

This situation may occur under the condition described in TABLE P-2 on page 179 (see the third row, last column).
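The handling rules above amount to a dispatch on the ASI\_AFSR bits. The following sketch shows one way to express that in C; the enum values and the function `restrainable_actions` are hypothetical names introduced for illustration, and the returned flags only name the action that the text describes rather than performing it. Bit positions follow TABLE P-14.

```c
#include <stdint.h>

/* Action flags; all names are illustrative, not part of the architecture. */
enum {
    ACT_IGNORE_TRAP    = 0,       /* no bit set: ignore the ECC_error trap */
    ACT_REPORT_DEGRADE = 1 << 0,  /* DG_* bit: report way reduction        */
    ACT_FIX_PRIV_SW    = 1 << 1,  /* UE_DST_BETO: privileged software bug  */
    ACT_REPAIR_LINE    = 1 << 2   /* UE_RAW_*: rewrite the cache line      */
};

/* Map an ASI_AFSR value (bit layout per TABLE P-14) to a set of actions. */
unsigned restrainable_actions(uint64_t afsr)
{
    const uint64_t dg_bits = (UINT64_C(1) << 12) | (UINT64_C(1) << 11) |
                             (UINT64_C(1) << 10);
    const uint64_t beto    = UINT64_C(1) << 3;
    const uint64_t ue_raw  = (UINT64_C(1) << 1) | (UINT64_C(1) << 0);
    unsigned actions = ACT_IGNORE_TRAP;

    if (afsr & dg_bits)
        actions |= ACT_REPORT_DEGRADE;
    if (afsr & beto)
        actions |= ACT_FIX_PRIV_SW;
    if (afsr & ue_raw)
        actions |= ACT_REPAIR_LINE;
    return actions;
}
```

Returning a flag set rather than a single action reflects that several sticky bits may be set when the trap is taken.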

# P.8 Internal Register Error Handling

This section describes error handling for the following registers.

- Nonprivileged and Privileged registers
- ASR registers
- ASI registers

## P.8.1 Nonprivileged and Privileged Registers Error Handling

The terminology used in TABLE P-15 is defined as follows:

| Column                    | Term       | Meaning                                                                                                     |
|---------------------------|------------|-------------------------------------------------------------------------------------------------------------|
| Error Detect<br>Condition | InstAccess | The error is detected when the instruction accesses the register.                                           |
| Correction                | W          | The error indication is removed when an instruction performs a full write to the register.                  |
|                           | ADE trap   | The error is removed by a full write to the register in the <i>async_data_error</i> hardware trap sequence. |

TABLE P-15 shows error handling for nonprivileged and privileged registers.

| Register Name                                | RW | Error Protect | Error Detect Condition      | Error Type             | Correction                |
|----------------------------------------------|----|---------------|-----------------------------|------------------------|---------------------------|
| %rn                                          | RW | Parity        | InstAccess                  | IUG_%R                 | W                         |
| %fn                                          | RW | Parity        | InstAccess                  | IUG_%F                 | W                         |
| PC                                           |    | Parity        | Always                      | IUG_PSTATE             | ADE trap                  |
| nPC                                          |    | Parity        | Always                      | IUG_PSTATE             | ADE trap                  |
| PSTATE                                       | RW | Parity        | Always                      | IUG_PSTATE             | ADE trap, W               |
| TBA                                          | RW | Parity        | PSTATE.RED = 0              | error_state            | W (by OBP)                |
| PIL                                          | RW | Parity        | PSTATE.IE = 1<br>InstAccess | IUG_CORE<br>IUG_PSTATE | W                         |
| CWP, CANSAVE, CANRESTORE, OTHERWIN, CLEANWIN | RW | Parity        | Always                      | IUG_PSTATE             | ADE trap, W               |
| TT                                           | RW | None          | _                           | _                      | _                         |
| TL                                           | RW | Parity        | PSTATE.RED = 0              | error_state            | W (by OBP)                |
| TPC                                          | RW | Parity        | InstAccess                  | IUG_TSTATE             | W                         |
| TNPC                                         | RW | Parity        | InstAccess                  | IUG_TSTATE             | W                         |
| TSTATE                                       | RW | Parity        | InstAccess                  | IUG_TSTATE             | W                         |
| WSTATE                                       | RW | Parity        | Always                      | IUG_PSTATE             | W                         |
| VER                                          | R  | None          | _                           | _                      | _                         |
| FSR                                          | RW | Parity        | Always                      | IUG_%F                 | ADE trap, W               |
| Y                                            | RW | Parity        | InstAccess                  | IUG_%R                 | W                         |
| CCR                                          | RW | Parity        | Always                      | IUG_%R                 | ADE trap, W               |
| ASI                                          | RW | Parity        | Always                      | IUG_%R                 | ADE trap, W               |
| TICK                                         | RW | Parity        | AUG always<sup>1</sup>      | IUG_COREERR            | ADE trap<sup>2</sup>, W   |
| FPRS                                         | RW | Parity        | Always                      | IUG_%F                 | ADE trap, W               |

TABLE P-15 Nonprivileged and Privileged Registers Error Handling

1. Notified as error\_state transition error in suspended state.

2. TICK and TICK\_COMPARE are set to 0x8000\_0000\_0000 on ADE trap for correction.

## P.8.2 ASR Error Handling

The terminology used in TABLE P-16 is defined as follows:

| Column                 | Term                | Meaning                                                                                                     |
|------------------------|---------------------|-------------------------------------------------------------------------------------------------------------|
| Error Detect Condition | AUG always          | The error is detected while (ASI_ERROR_CONTROL.UGE_HANDLER = 0) && (ASI_ERROR_CONTROL.WEAK_ED = 0).         |
|                        | InstAccess          | The error is detected when the instruction accesses the register.                                           |
| Error Type             | (I)AUG_<i>xxx</i>   | The error is indicated by ASI_UGESR.IAUG_<i>xxx</i> = 1, and the error is an autonomous urgent error.       |
|                        | I(A)UG_<i>xxx</i>   | The error is indicated by ASI_UGESR.IAUG_<i>xxx</i> = 1, and the error is an instruction urgent error.      |
| Correction             | W                   | The error is removed by a full write to the register by an instruction.                                     |
|                        | ADE trap            | The error is removed by a full write to the register in the <i>async_data_error</i> hardware trap sequence. |

TABLE P-16 shows the handling of ASR errors.

### STICK Behavior upon Error

When an error occurs in the *%stick* register, counting stops regardless of the error detect condition described in TABLE P-16.

| ASR Number   | Register Name    | RW | Error Protect | Error Detect Condition  | Error Type  | Correction  |
|--------------|------------------|----|---------------|-------------------------|-------------|-------------|
| 16           | PCR              | RW | None          | _                       | _           | _           |
| 17           | PIC              | RW | None          | _                       | _           | —           |
| 18           | DCR              | R  | None          | _                       | _           | _           |
| 19           | GSR              | RW | Parity        | Always                  | IUG_%F      | ADE trap, W |
| 20           | SET_SOFTINT      | W  | None          | _                       | _           | —           |
| 21           | CLEAR_SOFTINT    | W  | None          | _                       | _           | —           |
| 22           | SOFTINT          | RW | None          | _                       | _           | _           |
| 23           | TICK_COMPARE     | RW | Parity        | AUG Always <sup>1</sup> | IUG_COREERR | ADE trap, W |
| 24           | STICK            | RW | Parity        | AUG always <sup>1</sup> | (I)AUG_CRE  | W           |
|              |                  |    |               | InstAccess              | I(A)UG_CRE  | W           |
| 25           | STICK_COMPARE    | RW | Parity        | AUG always <sup>1</sup> | (I)AUG_CRE  | W           |
|              |                  |    |               | InstAccess              | I(A)UG_CRE  | W           |

 TABLE P-16
 ASR Error Handling

1. Notified as error\_state transition error in suspended state.

## P.8.3 ASI Register Error Handling

The terminology used in TABLE P-17 is defined as follows:

| Column | Term | Meaning |
|--------|------|---------|
| Error Protect | Parity | Parity protected. |
|  | ECC | ECC (double-bit error detection, single-bit error correction) protected. |
|  | Gecc | Generated ECC. |
|  | PP | Parity propagation. A parity error in the input registers used to calculate the register value is propagated. |
| Error Detect Condition | Always | Error is always checked. |
|  | AUG always | Error is checked when (ASI_ERROR_CONTROL.UGE_HANDLER = 0) && (ASI_ERROR_CONTROL.WEAK_ED = 0). |
|  | LDXA | Error is checked when the register is read by LDXA instruction. |
|  | ldxa #I | Error is checked when the register is read by LDXA instruction.<br>Also, the register is used for the calculation of IMMU_TSB_8KB_PTR and IMMU_TSB_64KB_PTR. When the register has a UE and the register is used for the calculation of ASI_IMMU_TSB_PTR registers, the UE is propagated to the ASI_IMMU_TSB_PTR registers. Upon execution of the LDXA instruction to read ASI_IMMU_TSB_PTR with the propagated UE, the <i>IUG_TSBP</i> error is detected. |
|  | ldxa #D | Error is checked when the register is read by LDXA instruction.<br>Also, the register is used for the calculation of DMMU_TSB_8KB_PTR, DMMU_TSB_64KB_PTR, and DMMU_TSB_DIRECT_PTR. When the register has a UE and the register is used for the calculation of ASI_DMMU_TSB_PTR registers, the UE is propagated to the ASI_DMMU_TSB_PTR registers. Upon execution of the LDXA instruction to read ASI_DMMU_TSB_PTR with the propagated UE, the <i>IUG_TSBP</i> error is detected. |
|  | ITLB write | Error is checked at the ITLB update timing after completion of the STXA instruction to write or demap an ITLB entry. |
|  | DTLB write | Error is checked at the DTLB update timing after the completion of the STXA instruction to write or demap a DTLB entry. |
|  | Use for TLB | Error is checked when the register is used for a TLB reference. |
|  | Enabled | Error is checked when the facility is enabled. |
|  | intr_receive | Error is checked when the Jupiter Bus interrupt packet is received. When an uncorrectable error is detected in the received interrupt packet, the vector interrupt trap is caused but ASI_INTR_RECEIVE.BUSY = 0 is set. In this case, a new interrupt packet can be received after software writes ASI_INTR_RECEIVE.BUSY = 0. |
|  | BV interface | Uncorrected error in the Barrier Variable transfer interface between the processor and the memory system is checked during the AUG_always period. |
| Error Type | error_state | error_state transition error. |
|  | (I)AUG_<i>xxxx</i> | The error is indicated by ASI_UGESR.IAUG_<i>xxxx</i> = 1, and the error class is autonomous urgent error. |
|  | I(A)UG_<i>xxxx</i> | The error is indicated by ASI_UGESR.IAUG_<i>xxxx</i> = 1, and the error class is instruction urgent error. |
|  | Others | The name of the bit set to 1 in ASI_UGESR indicates the error type. |
| Correction | RED trap | The whole register is updated and corrected when a RED_state trap occurs. |
|  | W | The whole register is updated and corrected by use of an STXA instruction to write the register. |
|  | W1AC | The whole register is updated and corrected by use of an STXA instruction to write 1 to the specified bit in the register. |
|  | WotherI | The register is corrected by a full update of all of the following ASI registers:<br>• ASI_IMMU_TAG_ACCESS<br>• plus, when ASI_UGESR.IAUG_TSBCTXT = 1 is indicated in a single-ADE trap: ASI_IMMU_TSB_BASE, ASI_IMMU_TSB_PEXT, ASI_PRIMARY_CONTEXT, ASI_SECONDARY_CONTEXT, ASI_SHARED_CONTEXT<br>IMMU_TSB_8KB_PTR and IMMU_TSB_64KB_PTR are corrected only when a fast_instruction_access_MMU_miss trap occurs. |
|  | WotherD | The register is corrected by a full update of all of the following ASI registers:<br>• ASI_DMMU_TAG_ACCESS<br>• plus, when ASI_UGESR.IAUG_TSBCTXT = 1 is indicated in a single-ADE trap: ASI_DMMU_TSB_BASE, ASI_DMMU_TSB_PEXT, ASI_DMMU_TSB_SEXT, ASI_PRIMARY_CONTEXT, ASI_SECONDARY_CONTEXT, ASI_SHARED_CONTEXT<br>DMMU_TSB_8KB_PTR and DMMU_TSB_64KB_PTR are corrected only when a fast_data_access_MMU_miss trap occurs. |
|  | DemapAll | The error is corrected by the <i>demap all</i> operation for the TLB with the error. Note that the <i>demap all</i> operation does not remove the locked TLB entry with uncorrectable error. |
|  | Interrupt receive | The register is corrected when the Jupiter Bus interrupt packet is received. |

TABLE P-17 shows the handling of ASI register errors.

| TABLE P-17 | Handling | of ASI | Register | Errors |
|------------|----------|--------|----------|--------|
|------------|----------|--------|----------|--------|

| ASI | VA | Register Name | RW | Error<br>Protect | Error Detect<br>Condition | Error Type | Correction |
|-----|----|---------------|----|------------------|---------------------------|------------|------------|
| 45<sub>16</sub> | 00<sub>16</sub> | DCU_CONTROL | RW | Parity | Always | error_state | RED trap |
| 45<sub>16</sub> | 08<sub>16</sub> | MEMORY_CONTROL | RW | Parity | Always | error_state | RED trap |
| 48<sub>16</sub> | 00<sub>16</sub> | INTR_DISPATCH_STATUS | R | Gecc | LDXA | I(A)UG_CRE (UE)<br>ignored (CE) | None |
| 49<sub>16</sub> | 00<sub>16</sub> | INTR_RECEIVE | RW | Gecc | LDXA | I(A)UG_CRE (UE)<br>ignored (CE) | None |
| 4A<sub>16</sub> | | JB_CONFIG_REGISTER | R | None | - | - | - |


| ASI | VA | Register Name | RW | Error<br>Protect | Error Detect<br>Condition | Error Type | Correction |
|-----|----|---------------|----|------------------|---------------------------|------------|------------|
| 4C<sub>16</sub> | 00<sub>16</sub> | ASYNC_FAULT_STATUS | RW1C | None | - | - | - |
| 4C<sub>16</sub> | 08<sub>16</sub> | URGENT_ERROR_STATUS | R | None | - | - | - |
| 4C<sub>16</sub> | 10<sub>16</sub> | ERROR_CONTROL | RW | Parity | Always | error_state | RED trap |
| 4C<sub>16</sub> | 18<sub>16</sub> | STCHG_ERROR_INFO | R,W1AC | None | - | - | - |
| 4D<sub>16</sub> | 00<sub>16</sub> | AFAR_D1 | R,WAC | Parity | LDXA | I(A)UG_CRE | WAC |
| 4D<sub>16</sub> | 08<sub>16</sub> | AFAR_U2 | R,WAC | Parity | LDXA | I(A)UG_CRE | WAC |
| 50<sub>16</sub> | 00<sub>16</sub> | IMMU_TAG_TARGET | R | Parity | ldxa #I | IUG_TSBP | WotherI |
| 50<sub>16</sub> | 18<sub>16</sub> | IMMU_SFSR | RW | None | - | - | - |
| 50<sub>16</sub> | 28<sub>16</sub> | IMMU_TSB_BASE | RW | Parity | ldxa #I | I(A)UG_TSBCTXT | W |
| 50<sub>16</sub> | 30<sub>16</sub> | IMMU_TAG_ACCESS | RW | Parity | ldxa #I | IUG_TSBP | W (WotherI) |
| 50<sub>16</sub> | 48<sub>16</sub> | IMMU_TSB_PEXT | RW | Parity | = ITSB_BASE | IAUG_TSBCTXT | W |
| 50<sub>16</sub> | 58<sub>16</sub> | IMMU_TSB_NEXT | R | Parity | = ITSB_BASE | IAUG_TSBCTXT | W |
| 50<sub>16</sub> | 60<sub>16</sub> | IMMU_TAG_ACCESS_EXT | RW | Parity | ldxa #I | IUG_TSBP | W |
| 50<sub>16</sub> | 78<sub>16</sub> | IMMU_SFPAR | RW | Parity | ldxa #I | I(A)UG_CRE | W |
| 51<sub>16</sub> | | IMMU_TSB_8KB_PTR | R | PP | LDXA | IUG_TSBP | WotherI |
| 52<sub>16</sub> | | IMMU_TSB_64KB_PTR | R | PP | LDXA | IUG_TSBP | WotherI |
| 53<sub>16</sub> | | SERIAL_ID | R | None | - | - | - |
| 54<sub>16</sub> | - | ITLB_DATA_IN | W | Parity | ITLB write | IUG_ITLB | DemapAll |
| 55<sub>16</sub> | | ITLB_DATA_ACCESS | RW | Parity | LDXA | IUG_ITLB | DemapAll |
| | | | | | ITLB write | IUG_ITLB | DemapAll |
| 56<sub>16</sub> | - | ITLB_TAG_READ | R | Parity | LDXA | IUG_ITLB | DemapAll |
| 57<sub>16</sub> | - | IMMU_DEMAP | W | Parity | ITLB write | IUG_ITLB | DemapAll |
| 58<sub>16</sub> | 00<sub>16</sub> | DMMU_TAG_TARGET | R | Parity | ldxa #D | IUG_TSBP | WotherD |
| 58<sub>16</sub> | 08<sub>16</sub> | PRIMARY_CONTEXT | RW | Parity | ldxa #I,<br>ldxa #D | I(A)UG_TSBCTXT | W |
| | | | | | Use for TLB | I(A)UG_TSBCTXT | W |
| | | | | | AUG always | (I)AUG_TSBCTXT | W |
| 58<sub>16</sub> | 10<sub>16</sub> | SECONDARY_CONTEXT | RW | Parity | = P_CONTEXT | IAUG_TSBCTXT | W |
| 58<sub>16</sub> | 18<sub>16</sub> | DMMU_SFSR | RW | None | - | - | - |
| 58<sub>16</sub> | 20<sub>16</sub> | DMMU_SFAR | RW | Parity | LDXA | IAUG_CRE | W |
| 58<sub>16</sub> | 28<sub>16</sub> | DMMU_TSB_BASE | RW | Parity | ldxa #D | I(A)UG_TSBCTXT | W |
| 58<sub>16</sub> | 30<sub>16</sub> | DMMU_TAG_ACCESS | RW | Parity | ldxa #D | IUG_TSBP | W (WotherD) |
| 58<sub>16</sub> | 38<sub>16</sub> | DMMU_VA_WATCHPOINT | RW | Parity | Enabled | (I)AUG_CRE | W |
| | | | | | LDXA | I(A)UG_CRE | W |
| 58<sub>16</sub> | 40<sub>16</sub> | DMMU_PA_WATCHPOINT | RW | Parity | Enabled | (I)AUG_CRE | W |
| | | | | | LDXA | I(A)UG_CRE | W |
| 58<sub>16</sub> | 48<sub>16</sub> | DMMU_TSB_PEXT | RW | Parity | = DTSB_BASE | I(A)UG_TSBCTXT | W |

| ASI | VA | Register Name | RW | Error<br>Protect | Error Detect<br>Condition | Error Type | Correction |
|-----|----|---------------|----|------------------|---------------------------|------------|------------|
| 58<sub>16</sub> | 50<sub>16</sub> | DMMU_TSB_SEXT | RW | Parity | = DTSB_BASE | I(A)UG_TSBCTXT | W |
| 58<sub>16</sub> | 58<sub>16</sub> | DMMU_TSB_NEXT | R | Parity | = DTSB_BASE | I(A)UG_TSBCTXT | W |
| 58<sub>16</sub> | 60<sub>16</sub> | DMMU_TAG_ACCESS_EXT | RW | Parity | ldxa #D | IUG_TSBP | W |
| 58<sub>16</sub> | 68<sub>16</sub> | SHARED_CONTEXT | RW | Parity | = P_CONTEXT | (I)AUG_TSBCTXT | W |
| 58<sub>16</sub> | 78<sub>16</sub> | DMMU_SFPAR | RW | Parity | ldxa #D | I(A)UG_CRE | W |
| 59<sub>16</sub> | - | DMMU_TSB_8KB_PTR | R | PP | LDXA | IUG_TSBP | WotherD |
| 5A<sub>16</sub> | - | DMMU_TSB_64KB_PTR | R | PP | LDXA | IUG_TSBP | WotherD |
| 5B<sub>16</sub> | - | DMMU_TSB_DIRECT_PTR | R | PP | LDXA | IUG_TSBP | WotherD |
| 5C<sub>16</sub> | - | DTLB_DATA_IN | W | Parity | DTLB write | IUG_DTLB | DemapAll |
| 5D<sub>16</sub> | - | DTLB_DATA_ACCESS | RW | Parity | LDXA<br>DTLB write | IUG_DTLB<br>IUG_DTLB | DemapAll<br>DemapAll |
| 5E<sub>16</sub> | - | DTLB_TAG_READ | R | Parity | LDXA | IUG_DTLB | DemapAll |
| 5F<sub>16</sub> | | DMMU_DEMAP | W | Parity | DTLB write | IUG_DTLB | DemapAll |
| 60<sub>16</sub> | | IIU_INST_TRAP | RW | Parity | LDXA | No match at error | W |
| 61<sub>16</sub> | 00<sub>16</sub>, 08<sub>16</sub>, 40<sub>16</sub>, 48<sub>16</sub> | ITSB_PREFETCH | RW | Parity | LDXA | I(A)UG_TSBP | W |
| 62<sub>16</sub> | 00<sub>16</sub>, 08<sub>16</sub>, 40<sub>16</sub>, 48<sub>16</sub> | DTSB_PREFETCH | RW | Parity | LDXA | I(A)UG_TSBP | W |
| 6D<sub>16</sub> | 00<sub>16</sub>-3E0<sub>16</sub> | BARRIER_INIT | RW | Parity | Always if<br>assigned<br>or LDXA#D | Fatal Error | - |
| 6E<sub>16</sub> | 00<sub>16</sub> | EIDR | RW | Parity | Always<sup>1</sup> | IAUG_CRE | W |
| 6F<sub>16</sub> | 00<sub>16</sub>-50<sub>16</sub> | BARRIER_ASSIGN | RW | Parity | Always if<br>assigned | Fatal Error | - |
| 74<sub>16</sub> | addr | CACHE_INV | W | None | - | - | - |
| 77<sub>16</sub> | 40<sub>16</sub>-88<sub>16</sub> | INTR_DATA0:7_W | W | Gecc | None | - | W |
| 77<sub>16</sub> | | INTR_DISPATCH_W | W | Gecc | store | (I)AUG_CRE | W |
| 7F<sub>16</sub> | 40<sub>16</sub>-88<sub>16</sub> | INTR_DATA0:7_R | R | ECC | LDXA<br>intr_receive | IAUG_CRE<br>BUSY = 0 | Interrupt<br>Receive |
| EF<sub>16</sub> | 00<sub>16</sub>-50<sub>16</sub> | LBSY, BST | RW | Parity | Always if<br>assigned | Fatal Error | - |


1. Notified as an error\_state transition error in the suspended state.

# P.9 Cache Error Handling

In this section, handling of cache errors of the following types is specified:

- Cache tag errors
- Cache data errors in I1, D1, and U2 caches

This section concludes with the specification of automatic way reduction in the I1, D1, and U2 caches.

### P.9.1 Handling of a Cache Tag Error

#### Error in D1 Cache Tag and I1 Cache Tag

Both the D1 cache (Data level 1) and the I1 cache (Instruction level 1) maintain a copy of their cache tags in the U2 (unified level 2) cache. The D1 cache tags, the D1 cache tags copy, the I1 cache tags, and the I1 cache tags copy are each protected by parity.

When a parity error is detected in a D1 cache tag entry or in a D1 cache tag copy entry, hardware automatically corrects the error by copying the correct tag entry from the other copy of the tag entry. If the error can be corrected in this way, program execution is unaffected.

Similarly, when a parity error is detected in an I1 cache tag entry or in an I1 cache tag copy entry, hardware automatically corrects the error by copying the correct tag entry from the other copy of the tag entry. If the error can be corrected in this way, program execution is unaffected.

When the error in the level-1 cache tag or tag copy is not corrected by the tag copying operation, the tag copying is repeated. If the error is permanent, a watchdog timeout or a FATAL error is then detected.

#### Error in U2 (Unified Level 2) Cache Tag

The U2 cache tag is protected by an ECC that provides single-bit error correction and double-bit error detection.

When a correctable error is detected in a U2 cache tag, hardware automatically corrects the error by rewriting the corrected data into the U2 cache tag entry. The error is not reported to software.

When an uncorrectable error is detected in a U2 cache tag, a fatal error is detected and the CPU enters the CPU fatal error state.

### P.9.2 Handling of an I1 Cache Data Error

I1 cache data is protected by parity attached to every doubleword.

When a parity error is detected in I1 cache data during an instruction fetch, hardware executes the following sequence:

1. Reread the I1 cache line containing the parity error from the U2 cache.

   The data read from the U2 cache contains only doublewords without error or doublewords with a marked UE, because error marking is applied to outgoing U2 cache data.

2. For each doubleword read from the U2 cache:
   - a. When the doubleword does not have a UE, save the correct data in the I1 cache doubleword without parity error and supply the data for instruction fetch if required. There is no direct report to software for an I1 cache error corrected by refilling data.
   - b. When the doubleword has a marked UE, set the parity bit in the I1 cache doubleword to indicate a parity error and supply the parity error data for the instruction fetch if required.
3. Treat a fetched instruction with an error as follows:
   - When the instruction with a parity error is fetched but not executed in any way visible to software, the fetched instruction with the error is discarded.
   - Otherwise, fetch and execute the instruction with the indicated parity error. When the execution of the instruction is complete, an *instruction\_access\_error* exception will be generated (precise trap), and the marked UE detection and its ERROR\_MARK\_ID will be indicated in ASI\_ISFSR.
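The refill sequence above can be pictured with a small behavioral model. This is only an illustrative sketch in C, not a description of the actual hardware: the types `u2_dword`, `i1_dword`, and `fetch_outcome` and both function names are hypothetical, and the model tracks nothing beyond the per-doubleword error flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One doubleword as delivered by the U2 cache (hypothetical model type). */
typedef struct {
    uint64_t data;
    bool     marked_ue;      /* doubleword carries a marked UE */
} u2_dword;

/* One doubleword as held in the I1 cache (hypothetical model type). */
typedef struct {
    uint64_t data;
    bool     parity_error;   /* parity bit indicates an error */
} i1_dword;

typedef enum {
    FETCH_OK,                        /* clean instruction, executes normally */
    FETCH_DISCARDED,                 /* error never became visible to software */
    FETCH_INSTRUCTION_ACCESS_ERROR   /* precise trap once execution completes */
} fetch_outcome;

/* Step 2: refill n doublewords of an I1 line from U2 data.
 * Returns how many doublewords were left flagged with a parity error. */
size_t i1_refill_line(const u2_dword *src, i1_dword *dst, size_t n)
{
    size_t flagged = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i].data = src[i].data;
        /* 2a: clean doubleword is stored without parity error;
         * 2b: marked UE is stored with the parity error indicated. */
        dst[i].parity_error = src[i].marked_ue;
        if (src[i].marked_ue)
            flagged++;
    }
    return flagged;
}

/* Step 3: decide what happens to an instruction fetched from a doubleword. */
fetch_outcome i1_fetch_outcome(const i1_dword *dw, bool visible_to_software)
{
    if (!dw->parity_error)
        return FETCH_OK;
    return visible_to_software ? FETCH_INSTRUCTION_ACCESS_ERROR
                               : FETCH_DISCARDED;
}
```

In this model a line refilled from clean U2 data always yields `FETCH_OK`, which corresponds to the case where software sees no report at all.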

### P.9.3 Handling of a D1 Cache Data Error

D1 cache data is protected by 2-bit error detection and 1-bit error correction ECC, attached to every doubleword.

#### Correctable Error in D1 Cache Data

When a correctable error is detected in D1 cache data, the data is corrected automatically by hardware. There is no direct report to software for a D1 cache correctable error.

#### Marked Uncorrectable Error in D1 Cache Data

When a marked uncorrectable error (UE) in D1 cache data is detected during the D1 cache line writeback to the U2 cache, the D1 cache data and its ECC are written to the target U2 cache data and its ECC without modification. That is, a marked UE in D1 cache is propagated into the U2 cache. Such an error is not reported to software.

When a marked UE in D1 cache data is detected during access by a load or store (excluding doubleword store) instruction, a data access error is detected. A precise *data\_access\_error* exception is generated, and the marked UE detection and its ERROR\_MARK\_ID are indicated in ASI\_DSFSR.

#### Raw Uncorrectable Error in D1 Cache Data During D1 Cache Line Writeback

When a raw (unmarked) UE is detected in D1 cache data during the D1 cache line writeback to the U2 cache, error marking is applied to the doubleword containing the raw UE with ERROR\_MARK\_ID = ASI\_EIDR. Only the correct doubleword or the doubleword with marked UE is written into the target U2 cache line.

The restrainable error ASI\_AFSR.UE\_RAW\_D1\$INSD is detected.

#### Raw Uncorrectable Error in D1 Cache Data on Access by Load or Store Instruction

When a raw (unmarked) UE is detected in D1 cache data during access by a load or store instruction, hardware executes the following sequence:

- 1. Hardware writes back the D1 cache line and refills it from U2 cache. The D1 cache line containing the raw UE, whether it is clean or dirty, is always written back to the U2 cache. During this D1 cache line writeback to U2 cache, error marking is applied for the doubleword containing the raw UE with ERROR\_MARK\_ID = ASI\_EIDR. The D1 cache line is refilled from the U2 cache and the restrainable error ASI\_AFSR.UE\_RAW\_D1\$INSD is detected.
- 2. Normally, hardware changes the raw UE in the D1 cache data to a marked UE. However, yet another error may introduce a raw UE into the same doubleword again. When a raw UE is detected again, step 1 is repeated until the D1 cache way reduction is applied.
- 3. At this point, hardware changes the raw UE in the D1 cache data to a marked UE. The load or store instruction accesses the doubleword with the marked UE. The marked UE is detected during execution of the load or store instruction, as described in *Marked Uncorrectable Error in D1 Cache Data*, above.

### P.9.4 Handling of a U2 Cache Data Error

U2 cache data is protected by 2-bit error detection and 1-bit error correction ECC, attached to every doubleword.

#### Correctable Error in U2 Cache Data

When a correctable error is detected in the incoming U2 cache fill data from Jupiter Bus, the data is corrected by hardware and stored into the U2 cache. No exception is signalled.

When a correctable error is detected in the data from U2 cache for I1 cache fill, D1 cache fill, copyback to Jupiter Bus, or writeback to Jupiter Bus, both the transfer data and source data in U2 cache are corrected by hardware. The error is not reported to software.

#### Marked Uncorrectable Error in U2 Cache Data

For U2 cache data, a doubleword with marked UE is treated the same as a correct doubleword. No error is reported when the marked UE in U2 cache data is detected.

When a marked uncorrectable error (UE) is detected in incoming U2 cache fill data from Jupiter Bus, the doubleword with the marked UE is stored without modification in the target U2 cache line.

When a marked uncorrectable error is detected in incoming data from a D1 cache line writeback, the doubleword with the marked UE is stored without modification in the target U2 cache line. Note that there is no raw UE in D1 writeback data because error marking is applied to D1 writeback data, as described in *Handling of a D1 Cache Data Error*.

When a marked UE is detected in the data read from the U2 cache for an I1 cache fill, D1 cache fill, copyback to Jupiter Bus, or writeback to Jupiter Bus, the doubleword with the marked UE is transferred without modification.

#### Raw Uncorrectable Error in U2 Cache Data

When a raw (unmarked) UE is detected in incoming U2 cache fill data, error marking is applied for the doubleword with the raw UE, using ERROR\_MARK\_ID = 0. The doubleword and its ECC are changed to marked UE data, and the changed data is stored in the target U2 cache line. No exception is signalled.

When a raw UE is detected in data read from U2 cache, such as for I1 cache fill, D1 cache fill, copyback to Jupiter Bus, or writeback to Jupiter Bus, then error marking is applied for the doubleword with the raw UE, using ERROR\_MARK\_ID = ASI\_EIDR. Both the doubleword and its ECC in the read data and those in the source U2 cache line are changed to marked UE data. The restrainable error ASI\_AFSR.UE\_RAW\_L2\$INSD is detected.
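The three U2 data cases (correctable, marked UE, raw UE) differ mainly in whether error marking is applied, which ERROR\_MARK\_ID is used, and whether a restrainable error is detected. The following C sketch summarizes those rules as a decision function. It is an illustration under assumptions: the enum and struct names are hypothetical, the width chosen for the mark ID is arbitrary, and `asi_eidr` simply stands for the value taken from ASI\_EIDR.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DW_OK, DW_CE, DW_MARKED_UE, DW_RAW_UE } dw_state;

typedef enum {
    U2_FILL_FROM_BUS,  /* incoming U2 cache fill data from Jupiter Bus */
    U2_READ_OUT        /* I1/D1 fill, copyback, or writeback from U2 */
} u2_path;

typedef struct {
    dw_state result;            /* doubleword state after handling */
    bool     mark_applied;      /* error marking was applied */
    uint32_t error_mark_id;     /* meaningful only when mark_applied */
    bool     report_ue_raw_l2;  /* ASI_AFSR.UE_RAW_L2$INSD detected */
} u2_action;

u2_action u2_handle_dword(dw_state s, u2_path path, uint32_t asi_eidr)
{
    u2_action a = { s, false, 0, false };

    switch (s) {
    case DW_CE:
        /* Corrected by hardware; nothing is reported to software. */
        a.result = DW_OK;
        break;
    case DW_RAW_UE:
        /* Raw UE becomes a marked UE on both paths. */
        a.result = DW_MARKED_UE;
        a.mark_applied = true;
        if (path == U2_READ_OUT) {
            a.error_mark_id = asi_eidr;   /* marked with this CPU's ID */
            a.report_ue_raw_l2 = true;    /* restrainable error */
        } else {
            a.error_mark_id = 0;          /* fill data: ID 0, no exception */
        }
        break;
    case DW_OK:
    case DW_MARKED_UE:
        /* A marked UE is treated like a correct doubleword. */
        break;
    }
    return a;
}
```

Note that only the raw UE detected on data read out of the U2 cache produces a software-visible report in this model; every other combination is silent.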

### P.9.5 Automatic Way Reduction of I1 Cache, D1 Cache, and U2 Cache

When frequent errors occur in the I1, D1, or U2 cache, hardware automatically detects that condition and takes the affected way out of use while maintaining cache consistency.

#### Way Reduction Condition

Hardware counts the sum of the following error occurrences for each way of each cache:

- For each way of the I1 cache:
  - Parity error in I1 cache tag or I1 cache tag copy
  - I1 cache data parity error
- For each way of the D1 cache:
  - Parity error in D1 cache tag or D1 cache tag copy
  - Correctable error in D1 cache data
  - Raw UE in D1 cache data
- For each way of U2 cache:
  - Correctable error and uncorrectable error in U2 cache tag
  - Correctable error in U2 cache data
  - Raw UE in U2 cache data

If an error count per unit of time for one way of a cache exceeds a predefined threshold, hardware recognizes a cache way reduction condition and takes the actions described below.
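The way reduction condition amounts to a per-way error counter evaluated over a time window. The C sketch below shows one way such a check could be modeled. The threshold, the window length, and the reset-at-window-boundary policy are all assumptions made for illustration; this document does not specify the actual values or the counting mechanism used by hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error counter for one way of one cache (hypothetical model type). */
typedef struct {
    uint32_t count;         /* errors seen in the current window */
    uint64_t window_start;  /* time at which the current window began */
} way_err_counter;

/* Record one error at time 'now'. Returns true when the count within the
 * current window exceeds 'threshold', that is, when the way reduction
 * condition is recognized. 'window_len' and 'threshold' are assumed
 * parameters; their real values are implementation-defined. */
bool way_record_error(way_err_counter *c, uint64_t now,
                      uint64_t window_len, uint32_t threshold)
{
    if (now - c->window_start >= window_len) {
        /* A new unit of time has begun: restart the count. */
        c->window_start = now;
        c->count = 0;
    }
    c->count++;
    return c->count > threshold;
}
```

A model of a complete cache would keep one such counter per way and feed it every error type listed above for that cache.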

#### I1 Cache Way Reduction

When a way reduction condition is recognized for the I1 cache way W (W = 0 or 1), the following way reduction procedure is executed:

- 1. When only one way in I1 cache is active because of previous way reduction:
  - All entries in I1 cache way W are invalidated.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software.
- 2. Otherwise:
  - All entries in I1 cache way W are invalidated and the way W will never be refilled.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software.

#### D1 Cache Way Reduction

When a way reduction condition is recognized for the D1 cache way W (W = 0 or 1), the following way reduction procedure is executed:

- 1. When only one way in D1 cache is active because of previous way reduction:
  - All entries in D1 cache way W are invalidated. On invalidation of each dirty D1 cache entry, the D1 cache line is written back to its corresponding U2 cache line.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software.
- 2. Otherwise:
  - All entries in D1 cache way W are invalidated and the way W will never be refilled. On invalidation of each dirty D1 cache entry, the D1 cache line is written back to its corresponding U2 cache line.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software.

#### U2 Cache Way Reduction

When a way reduction condition is recognized for a U2 cache way, the U2 cache way reduction procedure is executed as follows:

- 1. When ASI\_L2CTL.WEAK\_SPCA = 0, the U2 cache way reduction procedure (below) is started immediately.
- 2. Otherwise, when ASI\_L2CTL.WEAK\_SPCA = 1, the U2 cache way reduction procedure (below) becomes pending until ASI\_L2CTL.WEAK\_SPCA is changed to 0. When ASI\_L2CTL.WEAK\_SPCA is changed to 0, the U2 cache way reduction procedure is started.

The U2 cache way W (W=0, 1, 2, or 3) reduction procedure:

- 1. When only one way in U2 cache is active because of previous way reductions:
  - All entries in U2 cache way W are at once invalidated (that is, all active U2 cache entries are invalidated) and U2 cache way W remains as the only available U2 cache way. The U2 cache data is invalidated to retain system consistency.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software, even though the available U2 cache configuration is not changed as a result of the error.
- 2. Otherwise:
  - All entries in available U2 cache ways, including way W, are invalidated to retain system consistency.
  - Way W becomes unavailable and is never refilled.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software.
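The interaction between the pending state controlled by ASI\_L2CTL.WEAK\_SPCA and the reduction procedure itself can be sketched as a small state model. This is a hedged illustration in C: `u2_model` and the three functions are hypothetical names, ways are tracked as a 4-bit mask, and invalidation and error reporting are reduced to counters.

```c
#include <stdbool.h>
#include <stdint.h>

enum { U2_WAYS = 4 };

typedef struct {
    uint8_t  active_ways;    /* bit w set: way w is available */
    uint8_t  pending_ways;   /* reductions deferred while WEAK_SPCA = 1 */
    bool     weak_spca;      /* models ASI_L2CTL.WEAK_SPCA */
    unsigned invalidations;  /* times all available ways were invalidated */
    unsigned dg_reports;     /* times ASI_AFSR.DG_L1$U2$STLB was reported */
} u2_model;

static unsigned count_ways(uint8_t mask)
{
    unsigned n = 0;
    for (unsigned w = 0; w < U2_WAYS; w++)
        n += (mask >> w) & 1u;
    return n;
}

/* The U2 cache way W reduction procedure. */
void u2_reduce_way(u2_model *m, unsigned w)
{
    m->invalidations++;                     /* both cases invalidate */
    if (count_ways(m->active_ways) > 1)     /* case 2: W becomes unavailable */
        m->active_ways &= (uint8_t)~(1u << w);
    /* case 1: the last way stays available */
    m->dg_reports++;                        /* reported in both cases */
}

/* Called when a way reduction condition is recognized for way w. */
void u2_way_reduction_condition(u2_model *m, unsigned w)
{
    if (m->weak_spca)
        m->pending_ways |= (uint8_t)(1u << w);  /* defer */
    else
        u2_reduce_way(m, w);                    /* start immediately */
}

/* Models a write to ASI_L2CTL.WEAK_SPCA; clearing it runs pending work. */
void u2_write_weak_spca(u2_model *m, bool value)
{
    m->weak_spca = value;
    if (value)
        return;
    for (unsigned w = 0; w < U2_WAYS; w++) {
        if (m->pending_ways & (1u << w)) {
            m->pending_ways &= (uint8_t)~(1u << w);
            u2_reduce_way(m, w);
        }
    }
}
```

Starting from four active ways, three successive reductions leave a single way; a further condition on that way invalidates it and is reported, but the way remains available.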

# P.10 TLB Error Handling

This section describes how TLB entry errors and sTLB way reduction are handled.

### P.10.1 Handling of TLB Entry Errors

Error protection and error detection in TLB entries are described in TABLE P-18.

TABLE P-18 Error Protection and Detection of TLB Entries

| TLB type        | Field               | Error Protection | Detectable Error                          |
|-----------------|---------------------|------------------|-------------------------------------------|
| sITLB and sDTLB | tag                 | Parity           | Parity error (Uncorrectable)              |
| sITLB and sDTLB | data                | Parity           | Parity error (Uncorrectable)              |
| fITLB and fDTLB | lock bit            | Triplicated      | None; the value is determined by majority |
| fITLB and fDTLB | tag except lock bit | Parity           | Parity error (Uncorrectable)              |
| fITLB and fDTLB | data                | Parity           | Parity error                              |
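The triplicated lock bit in the table above never produces a detectable error because its value is taken by majority vote over the three stored copies. A minimal C sketch of that vote follows; the packing of the three copies into the low bits of a byte and the function name are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* 'copies' holds the three stored copies of the lock bit in bits 0..2
 * (assumed packing). The effective lock bit is whatever value at least
 * two of the three copies agree on, so a single flipped copy is masked. */
bool lock_bit_value(uint8_t copies)
{
    unsigned ones = (copies & 1u) + ((copies >> 1) & 1u) + ((copies >> 2) & 1u);
    return ones >= 2;
}
```

For example, a locked entry stored as `0x7` still reads as locked if one copy flips and the stored value becomes `0x6`, `0x5`, or `0x3`.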

Errors can occur during the following events:

- Access by LDXA instruction
- Virtual address translation (sTLB)
- Virtual address translation (fTLB)

#### Error in TLB Entry Detected on LDXA Instruction Access

If a parity error is detected in a DTLB entry when an LDXA instruction attempts to read ASI\_DTLB\_DATA\_ACCESS or ASI\_DTLB\_TAG\_READ, hardware automatically demaps the entry and an instruction urgent error is indicated in ASI\_UGESR.IUG\_DTLB.

If a parity error is detected in an ITLB entry when an LDXA instruction attempts to read ASI\_ITLB\_DATA\_ACCESS or ASI\_ITLB\_TAG\_READ, hardware automatically demaps the entry and an instruction urgent error is indicated in ASI\_UGESR.IUG\_ITLB.

#### Error in sTLB Entry Detected During Virtual Address Translation

When a parity error is detected in the sTLB entry during a virtual address translation, hardware automatically demaps the entry and does not report the error to software.

#### Error in fTLB Entry Detected During Virtual Address Translation

When an fTLB tag has a parity error, the fTLB entry never matches any virtual address. An fTLB tag error in a locked entry causes a TLB miss for the virtual address already registered as the locked TLB entry.

A parity error in fTLB entry data is detected only when the tag of the fTLB entry matches a virtual address.

When a parity error in the fITLB is detected at the time of an instruction fetch, a precise *instruction\_access\_error* exception is generated. The parity error in the fITLB entry and the fITLB entry index are indicated in ASI\_ISFSR.

When a parity error in the fDTLB is detected for the memory access of a load or store instruction, a precise *data\_access\_error* exception is generated. The parity error in the fDTLB entry and the fDTLB entry index are indicated in ASI\_DSFSR.

### P.10.2 Automatic Way Reduction of sTLB

When frequent errors occur in the sITLB and sDTLB, hardware automatically detects that condition and reduces the way, with no adverse effects on software.

#### Way Reduction Condition

Hardware counts TLB entry parity error occurrences for each sITLB way and sDTLB way. If the error count per unit of time exceeds a predefined threshold, hardware recognizes an sTLB way reduction condition.

#### sTLB Way Reduction

When a way reduction condition is recognized for the sTLB way W (W = 0 or 1), hardware executes the following way reduction procedures:

- 1. When only one way in sTLB is active because of previous way reductions:
  - The previously reduced way is reactivated.
- 2. Regardless of how many ways were previously active, way reduction occurs:
  - Hardware reduces the way and invalidates all entries in sTLB way W. Way W will never be refilled.
  - The restrainable error ASI\_AFSR.DG\_L1\$U2\$STLB is reported to software.
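Because the sTLB has two ways and a previously reduced way is reactivated before the last active way is reduced, the procedure never leaves the sTLB with no active way. The C sketch below models that behavior. The struct and function names are hypothetical, and the model tracks only which ways are active and how many times the restrainable error was reported.

```c
#include <stdbool.h>

typedef struct {
    bool     active[2];   /* active[w]: sTLB way w is in use */
    unsigned dg_reports;  /* times ASI_AFSR.DG_L1$U2$STLB was reported */
} stlb_model;

/* Way reduction procedure for sTLB way w (w = 0 or 1). */
void stlb_reduce_way(stlb_model *m, unsigned w)
{
    unsigned other = w ^ 1u;
    unsigned active_count = (m->active[0] ? 1u : 0u) + (m->active[1] ? 1u : 0u);

    /* Step 1: with only one way active, reactivate the reduced way. */
    if (active_count == 1)
        m->active[other] = true;

    /* Step 2: reduce way W (its entries are invalidated) and report. */
    m->active[w] = false;
    m->dg_reports++;
}
```

Starting with both ways active, reducing way 0 and then way 1 leaves way 0 active again and way 1 reduced, with two reports to software.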

## **Performance Instrumentation**

This appendix describes and specifies performance monitors that have been implemented in the SPARC64 VII processor. The appendix contains these sections:

- Performance Monitor Overview
- Performance Event Description
  - Instruction and Trap Statistics
  - MMU and L1 Cache Event Counters
  - L2 Cache Event Counters
  - Multi-thread Specific Event Counters

# Q.1 Performance Monitor Overview

For the definitions of the performance counter registers, refer to *Performance Control Register (PCR) (ASR 16)* and *Performance Instrumentation Counter (PIC) Register (ASR 17)*.

### Q.1.1 Sample Pseudocode

#### Counter Clear/Set

The PICs are read/write registers. Writing zero will clear the counter; writing any other value will set that value. The following pseudocode procedure clears all PICs (assuming privileged access):

```
/* clear pics without altering sl/su values */
pic_init = 0x0;
pcr = rd_pcr();
pcr.ulro = 0x1;  /* don't change su/sl on write */
pcr.ovf = 0x0;  /* clear overflow bits also */
pcr.ut = 0x0;
pcr.st = 0x0;  /* disable counts for good measure */
for (i=0; i<=pcr.nc; i++) {
    /* select the pic to be written */
    pcr.sc = i;
    wr_pcr(pcr);
    wr_pic(pic_init);/* clear pic i */
}
```

#### Counter Event Selection and Start

Counter events are selected through PCR.SC and PCR.SU/PCR.SL fields. The following pseudocode selects events and enables counters (assuming privileged access):

```
pcr.ut = 0x0;   /* initially disable user counts */
pcr.st = 0x0;   /* initially disable system counts */
pcr.ulro = 0x0; /* make sure read-only disabled */
pcr.ovro = 0x1; /* do not modify overflow bits */
/* select the events without enabling counters */
for (i=0; i<=pcr.nc; i++) {
    pcr.sc = i;
    pcr.sl = select an event;
    pcr.su = select an event;
    wr_pcr(pcr);
}
/* start counting */
pcr.ut = 0x1;
pcr.st = 0x1;
/* resetting of overflow bits can be done here */
wr_pcr(pcr);
```

#### Counter Stop and Read

The following pseudocode disables and reads counters (assuming privileged access):

```
pcr.ut = 0x0;  /* disable counts */
pcr.st = 0x0;  /* disable counts */
pcr.ulro = 0x1;  /* enable sl/su read-only */
pcr.ovro = 0x1;  /* do not modify overflow bits */
for (i=0; i<=pcr.nc; i++) {
    /* assume rest of pcr data has been preserved */
    pcr.sc = i;
    wr_pcr(pcr);
    pic = rd_pic();
    picl[i] = pic.picl;
    picu[i] = pic.picu;
}
```

# Q.2 Performance Event Description

The performance events can be divided into the following groups:

- 1. Instruction and trap statistics
- 2. MMU and L1 cache event counters
- 3. L2 cache event counters
- 4. Jupiter Bus transaction event counters
- 5. Multi-thread specific event counters

SPARC64 VII has two types of performance events: basic and extended.

Basic performance events are documented in JPS (Joint Programmer's Specification) and have been verified.

Extended events are not documented in JPS, and they are intended to provide information for debugging the hardware. Users of these extended events should be aware of the following rules.

- a. Verification of the extended events is not necessarily completed. In other words, the counters might not work as expected.
- b. The definitions of the extended events may change without notice. Compatibility across SPARC64 generations is not guaranteed.

All event counters implemented in SPARC64 VII are listed in TABLE Q-1. The events in shadow are extended. The details of the performance counters are described in the following sections. Counters are updated speculatively unless marked (non-speculative).

TABLE Q-1 Events and Encoding of Performance Monitor

| Encoding | picu0 | picl0 | picu1 | picl1 | picu2 | picl2 | picu3 | picl3 |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|
| 000000 | cycle_counts |  |  |  |  |  |  |  |
| 000001 | instruction_counts |  |  |  |  |  |  |  |
| 000010 | instruction_flow_counts | only_this_thread_active |  |  | instruction_flow_counts | d_move_wait | cse_priority_wait | xma_inst |
| 000011 | iwr_empty | w_cse_window_empty | w_eu_comp_wait | w_branch_comp_wait |  | w_op_stv_wait | w_d_move | w_0endop |
| 000100 | Reserved | w_op_stv_wait_nc_pend | w_op_stv_wait_sxmiss | w_op_stv_wait_sxmiss_ex |  | w_fl_comp_wait | w_cse_window_empty_sp_full |  |
| 000101 | op_stv_wait |  |  |  |  |  |  |  |
| 000110 | Reserved |  |  |  |  |  |  |  |
| 000111 | Reserved |  |  |  |  |  |  |  |
| 001000 | load_store_instructions |  |  |  |  |  |  |  |
| 001001 | branch_instructions |  |  |  |  |  |  |  |
| 001010 | floating_instructions |  |  |  |  |  |  |  |
| 001011 | impdep2_inst |  |  |  |  |  |  |  |
| 001100 | prefetch_instructions |  |  |  |  |  |  |  |
| 001101 | Reserved |  |  |  |  |  |  |  |
| 001110 | Reserved |  |  |  |  |  |  |  |
| 001111 | Reserved |  |  |  |  |  |  |  |
| 010000 | Reserved |  |  |  |  |  |  |  |
| 010001 | Reserved |  |  |  |  |  |  |  |
| 010010 | rs1 | flush_rs | Reserved |  |  |  |  |  |
| 010011 | 1iid_use | 2iid_use | 3iid_use | 4iid_use | Reserved | sync_intlk | regwin_intlk | Reserved |
| 010100 | Reserved |  |  |  |  |  |  |  |
| 010101 | Reserved | toq_rsbr_phantom | Reserved | flush_rs | Reserved |  | rs1 | Reserved |
| 010110 | trap_all | trap_int_vector | trap_int_level | trap_spill | trap_fill | trap_trap_inst | trap_IMMU_miss | trap_DMMU_miss |
| 010111 | Reserved |  |  |  |  |  |  |  |
| 011000 | only_this_thread_active | both_threads_active | both_threads_empty | Reserved |  |  |  |  |
| 011001 | Reserved |  |  |  |  |  |  |  |
| 011010 | Reserved |  |  |  |  |  |  |  |
| 011011 | rsf_pmmi | Reserved | op_stv_wait_nc_pend | 0iid_use | flush_rs | Reserved |  | decall_intlk |
| 011100 | Reserved |  |  |  |  |  |  |  |
| 011101 | act_thread_suspend | op_stv_wait_sxmiss | op_stv_wait_sxmiss_ex | op_stv_wait_nc_pend | cse_window_empty_sp_full |  | both_threads_suspended | Reserved |
| 011110 | cse_window_empty | eu_comp_wait | branch_comp_wait | 0endop | op_stv_wait_ex | fl_comp_wait | 1endop | 2endop |
| 011111 | inh_cmit_gpr_2write | Reserved |  |  | 3endop | Reserved | op_stv_wait_sxmiss_ex | op_stv_wait_sxmiss |
| 100000 | Reserved |  |  | write_op_uTLB | if_r_iu_req_mi_go | op_r_iu_req_mi_go | if_wait_all | op_wait_all |
| 100001 | Reserved |  |  |  |  |  |  |  |
| 100010 | Reserved |  |  |  |  |  |  |  |
| 100011 | if_l1_thrashing | op_l1_thrashing | Reserved |  |  |  |  |  |
| 100100 | swpf_success_all | swpf_fail_all | Reserved |  | swpf_lbs_hit | Reserved |  |  |
| 100101 | Reserved |  |  |  |  |  |  |  |
| 100110 | Reserved |  |  |  |  |  |  |  |
| 100111 | Reserved |  |  |  |  |  |  |  |
| 110000 | sx_miss_wait_dm | sx_miss_wait_pf | sx_miss_count_dm | sx_miss_count_pf | sx_read_count_dm | sx_read_count_pf | dvp_count_dm | dvp_count_pf |
| 110001 | jbus_bi_count | jbus_cpi_count | jbus_cpb_count | jbus_cpd_count | jbus_reqbus_busy | jbus_odrbus_busy | Reserved |  |
| 110010 | Reserved |  | snres_256 | snres_64 | Reserved |  |  |  |
| 110011 | Reserved |  |  |  |  | sx_miss_count_dm_if | sx_miss_count_dm_opsh | sx_miss_count_dm_opex |
| 110100 | lost_softpf_pfp_full | Reserved | lost_softpf_by_abort | Reserved |  |  |  |  |
| 110101 | Reserved |  |  |  |  |  |  |  |
| 110110 | jbus_reqbus0_busy | jbus_reqbus1_busy | jbus_reqbus2_busy | jbus_reqbus3_busy | jbus_odrbus0_busy | jbus_odrbus1_busy | jbus_odrbus2_busy | jbus_odrbus3_busy |
| 111111 | Disabled (No PIC is counted up) |  |  |  |  |  |  |  |
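Software that programs PCR.SL and PCR.SU may find it convenient to hold the TABLE Q-1 encodings as named constants. The following is a minimal sketch, not part of this specification: the `PIC_EV_*` names and the `pic_event_name` helper are hypothetical, and only rows that the table lists without a per-counter breakdown are included, on the assumption that such rows apply to every PIC.

```c
/* Hypothetical names for a few 6-bit event encodings taken from TABLE Q-1.
 * Assumption: these rows are valid for every PIC (picu0 .. picl3). */
enum pic_event {
    PIC_EV_CYCLE_COUNTS            = 0x00, /* 000000 */
    PIC_EV_INSTRUCTION_COUNTS      = 0x01, /* 000001 */
    PIC_EV_LOAD_STORE_INSTRUCTIONS = 0x08, /* 001000 */
    PIC_EV_BRANCH_INSTRUCTIONS     = 0x09, /* 001001 */
    PIC_EV_FLOATING_INSTRUCTIONS   = 0x0a, /* 001010 */
    PIC_EV_PREFETCH_INSTRUCTIONS   = 0x0c, /* 001100 */
    PIC_EV_DISABLED                = 0x3f  /* 111111: no PIC is counted up */
};

/* Map an encoding to its event name; encodings not listed above
 * (including per-counter and Reserved rows) yield "unknown". */
const char *pic_event_name(unsigned encoding)
{
    switch (encoding) {
    case PIC_EV_CYCLE_COUNTS:            return "cycle_counts";
    case PIC_EV_INSTRUCTION_COUNTS:      return "instruction_counts";
    case PIC_EV_LOAD_STORE_INSTRUCTIONS: return "load_store_instructions";
    case PIC_EV_BRANCH_INSTRUCTIONS:     return "branch_instructions";
    case PIC_EV_FLOATING_INSTRUCTIONS:   return "floating_instructions";
    case PIC_EV_PREFETCH_INSTRUCTIONS:   return "prefetch_instructions";
    case PIC_EV_DISABLED:                return "disabled";
    default:                             return "unknown";
    }
}
```

Per-counter rows, such as encoding 010110 for the trap events, would need a lookup keyed on both the encoding and the counter.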

### Q.2.1 Instruction and Trap Statistics

#### **Basic events**

#### 1 cycle\_counts

Counts the cycles when the performance monitor is enabled. This counter is similar to the *%tick* register but can separate user cycles from system cycles, based on PCR.UT and PCR.ST selection.

#### 2 *instruction\_counts* (non-speculative)

Counts the number of committed instructions. For user or system mode counts, this counter is exact. Combined with the *cycle\_counts*, it provides instructions per cycle.

IPC = instruction\_counts / cycle\_counts

If *instruction\_counts* and *cycle\_counts* are both collected for user or system mode, IPC in user or system mode can be derived.
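The IPC calculation can be sketched in C as follows. The function name and the 64-bit accumulator types are assumptions made for illustration; the two arguments are taken to be *instruction\_counts* and *cycle\_counts* values collected over the same interval and in the same mode (PCR.UT/PCR.ST setting).

```c
#include <stdint.h>

/* Instructions per cycle from two counter readings taken over the same
 * interval. Returns 0.0 when no cycles were counted, to avoid dividing
 * by zero. */
double ipc(uint64_t instruction_counts, uint64_t cycle_counts)
{
    if (cycle_counts == 0)
        return 0.0;
    return (double)instruction_counts / (double)cycle_counts;
}
```

For example, 300 committed instructions over 200 counted cycles give an IPC of 1.5.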

#### 3 *load\_store\_instructions* (non-speculative)

Counts the committed load/store instructions. Also counts atomic load-store instructions.

#### 4 *branch\_instructions* (non-speculative)

Counts the committed branch instructions. Also counts CALL, JMPL, and RETURN instructions.

#### 5 *floating\_instructions* (non-speculative)

Counts the committed floating-point operations (FPop1 and FPop2). Does not count Floating-Point Multiply-and-Add instructions.

#### 6 *impdep2\_instructions* (non-speculative)

Counts the committed Floating Multiply-and-Add instructions.

Contrary to its name, FPMADDX and FPMADDXHI are not counted by this counter. See *xma\_inst* counter for detail.

#### 7 *prefetch\_instructions* (non-speculative)

Counts the committed prefetch instructions.

#### 8 *trap\_all* (non-speculative)

Counts all trap events. The value is equivalent to the sum of the type-specific trap counters.

#### 9 *trap\_int\_vector* (non-speculative)

Counts the occurrences of *interrupt\_vector\_trap*.

#### 10 *trap\_int\_level* (non-speculative)

Counts the occurrences of *interrupt\_level\_n*.

#### 11 *trap\_spill* (non-speculative)

Counts the occurrences of *spill\_n\_normal* and *spill\_n\_other*.

#### 12 *trap\_fill* (non-speculative)

Counts the occurrences of *fill\_n\_normal* and *fill\_n\_other*.

#### 13 *trap\_trap\_inst* (non-speculative)

Counts the occurrences of Tcc instructions.

#### 14 *trap\_IMMU\_miss* (non-speculative)

Counts the occurrences of *fast\_instruction\_access\_MMU\_miss*.

#### 15 *trap\_DMMU\_miss* (non-speculative)

Counts the occurrences of *fast\_data\_access\_MMU\_miss*.

#### **Extended events**

#### 16 *xma\_inst* (non-speculative)

Counts the committed FPMADDX and FPMADDXHI instructions.

#### 17 *instruction\_flow\_counts* (non-speculative)

Number of committed instruction flows during the measuring period. In SPARC64 VII, certain instructions may be represented internally as a set of instructions and executed as if they were multiple instructions. *instruction\_flow\_counts* measures the number of these internal instructions during the measuring period.

#### 18 *iwr\_empty*

Number of cycles that the IWR (Issue Word Register) is empty. The IWR is a four-entry register that holds instructions while the decoder is processing. An empty IWR may be caused by an instruction cache miss. Note that the IWR is shared between both threads in a core.

#### 19 rs1 (non-speculative)

The number of cycles that normal execution is halted in order to service one of the following:

- trap, interrupt
- update of privileged registers
- assurance of memory order
- hardware retry (RAS initiated)

#### 20 *flush\_rs* (non-speculative)

Number of pipeline flushes due to mis-prediction. Since SPARC64 VII employs speculative execution, a mis-prediction may cause it to execute instructions that should not have been executed. When the predicted path is found to be wrong, all instructions in the pipeline are aborted and execution of the correct path is started. A pipeline flush occurs at this time.

mis-prediction rate = *flush\_rs* / *branch\_instructions* 
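A minimal C sketch of this ratio is shown below. The function name is hypothetical, and the arguments are assumed to be *flush\_rs* and *branch\_instructions* readings from the same measurement interval.

```c
#include <stdint.h>

/* Branch mis-prediction rate: pipeline flushes per committed branch.
 * Returns 0.0 when no branch was committed. */
double misprediction_rate(uint64_t flush_rs, uint64_t branch_instructions)
{
    if (branch_instructions == 0)
        return 0.0;
    return (double)flush_rs / (double)branch_instructions;
}
```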

#### 21 *0iid\_use*

No instruction is issued in a cycle. SPARC64 VII issues up to four instructions per cycle. *0iid\_use* is incremented when no instruction is issued. In SPARC64 VII, certain instructions may be represented internally as a set of instructions. If an instruction is represented internally by multiple smaller instructions, each sub-instruction is measured.

#### 22 *1iid\_use*

One instruction is issued in a cycle.

#### 23 2iid\_use

Two instructions are issued in a cycle.

#### 24 *3iid\_use*

Three instructions are issued in a cycle.

#### 25 4iid\_use

Four instructions are issued in a cycle.
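One way to summarize the five *Niid\_use* counters is an average issue width. This metric is not defined in this specification; the sketch below is an illustration under the assumption that each counter holds the number of cycles in which exactly N instructions were issued. The function name is hypothetical.

```c
#include <stdint.h>

/* Average number of instructions issued per cycle, derived from the
 * 0iid_use .. 4iid_use cycle counts (assumed meaning: cN = cycles in
 * which exactly N instructions were issued). Returns 0.0 if all counts
 * are zero. */
double avg_issue_width(uint64_t c0, uint64_t c1, uint64_t c2,
                       uint64_t c3, uint64_t c4)
{
    uint64_t cycles = c0 + c1 + c2 + c3 + c4;
    uint64_t issued = c1 + 2 * c2 + 3 * c3 + 4 * c4;

    if (cycles == 0)
        return 0.0;
    return (double)issued / (double)cycles;
}
```

Note that TABLE Q-1 places *0iid\_use* and *4iid\_use* on the same counter (picl1) under different encodings, so the five values may have to be gathered in more than one run.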

#### 26 sync\_intlk

Number of cycles in which instruction issue is prevented by pre-sync and post-sync.

#### 27 regwin\_intlk

Number of cycles in which instruction issue is prevented by a CWR switch. CWR holds the value of window register (%r8 - %r31), and its neighbors. The contents of CWR are replaced on a save/restore or trap. Replacement is usually done concurrently in the background, but it can sometimes cause an interlock, for example on successive save/restore instructions.

#### 28 decall\_intlk

Number of cycles in which instruction issue is prevented by any static interlock condition at the decode stage. *decall\_intlk* includes *sync\_intlk* and *regwin\_intlk*, but it does not count stall cycles due to dynamic conditions such as a full reservation station.

#### 29 toq\_rsbr\_phantom

Counts when an instruction predicted as a taken branch is actually not a branch instruction. This may happen in SPARC64 VII since branch prediction is done prior to decode of the instruction.

#### 30 *op\_stv\_wait* (non-speculative)

Number of cycles in which no instruction is committed because of a data wait. SPARC64 VII has a resource named CSE (Commit Stack Entry), which holds information on in-flight instructions. The CSE is a FIFO, and information is registered in order. *op\_stv\_wait* is counted when the top entry of the CSE (TOQ: Top of Queue) is a memory access instruction whose data is not ready.

*op\_stv\_wait* does not count memory access latency for a store instruction (however, memory access latency for an atomic instruction is counted). This is because SPARC64 VII, to improve performance, commits a store instruction before its data is written to the L2 cache.

Caution is needed because not all data cache miss latency is measured by *op\_stv\_wait*. When a data cache miss occurs, the latency of that instruction is measured only after all instructions prior to it have committed.

Caution is also needed because the event is counted regardless of whether the given thread has commit priority. To measure the event only in the prioritized cycles, use *w\_op\_stv\_wait*.

#### 31 *op\_stv\_wait\_nc\_pend* (non-speculative)

*op\_stv\_wait* due to non-cache accesses regardless of a given thread having commit priority.

#### 32 *op\_stv\_wait\_ex* (non-speculative)

No instruction is committed waiting for an integer load instruction in TOQ to complete, regardless of a given thread having commit priority.

#### 33 op\_stv\_wait\_sxmiss (non-speculative)

op\_stv\_wait due to L2\$ miss regardless of a given thread having commit priority.

#### 34 *op\_stv\_wait\_sxmiss\_ex* (non-speculative)

op\_stv\_wait\_ex due to L2\$ miss regardless of a given thread having commit priority.

#### 35 *cse\_window\_empty\_sp\_full* (non-speculative)

No instruction is committed because CSE is empty while the Store Port is full, regardless of a given thread having commit priority.

#### 36 cse\_window\_empty (non-speculative)

No instruction is committed because CSE is empty, regardless of a given thread having commit priority.

#### 37 *branch\_comp\_wait* (non-speculative)

No instruction is committed waiting for a branch instruction in TOQ to complete. Its priority is lower than *eu\_comp\_wait*, regardless of a given thread having commit priority.

#### 38 *eu\_comp\_wait* (non-speculative)

No instruction is committed waiting for an integer and floating-point instruction in TOQ to complete. Its priority is higher than *branch\_comp\_wait*, regardless of a given thread having commit priority.

#### 39 *fl\_comp\_wait* (non-speculative)

No instruction is committed waiting for a floating-point instruction in TOQ to complete, regardless of a given thread having commit priority.

#### 40 *d\_move\_wait* (non-speculative)

No instruction is committed waiting for register window, regardless of a given thread having commit priority.

#### 41 cse\_priority\_wait

No instruction is committed because the thread is waiting for commit priority. In SPARC64 VII, only one thread can commit instructions in a given cycle, and the priority is switched every cycle as long as the other thread is active. *cse\_priority\_wait* counts the number of cycles in which the thread is ready to commit but does not have the right to do so. The event is counted only when the thread has an instruction to be committed.

#### 42 *0endop* (non-speculative)

No instruction is committed regardless of whether the given thread has commit priority.

#### 43 *1endop* (non-speculative)

One instruction is committed.

#### 44 2endop (non-speculative)

Two instructions are committed.

#### 45 *3endop* (non-speculative)

Number of cycles three instructions are committed.

#### 46 *inh\_cmit\_gpr\_2write* (non-speculative)

Fewer than four instructions are committed due to lack of GPR write ports.

#### 47 *w\_op\_stv\_wait* (non-speculative)

Number of cycles op\_stv\_wait is observed for the thread that has commit priority.

#### 48 *w\_op\_stv\_wait\_nc\_pend* (non-speculative)

Number of cycles *op\_stv\_wait\_nc\_pend* is observed for the thread that has commit priority.

#### 49 *w\_op\_stv\_wait\_ex* (non-speculative)

Number of cycles *op\_stv\_wait\_ex* is observed for the thread that has commit priority.

#### 50 w\_op\_stv\_wait\_sxmiss (non-speculative)

Number of cycles *op\_stv\_wait\_sxmiss* is observed for the thread that has commit priority.

#### 51 w\_op\_stv\_wait\_sxmiss\_ex (non-speculative)

Number of cycles *op\_stv\_wait\_sxmiss\_ex* is observed for the thread that has commit priority.

#### 52 w\_cse\_window\_empty\_sp\_full (non-speculative)

Number of cycles *cse\_window\_empty\_sp\_full* is observed for the thread that has commit priority.

#### 53 *w\_cse\_window\_empty* (non-speculative)

Number of cycles *cse\_window\_empty* is observed for the thread that has commit priority.

#### 54 w\_branch\_comp\_wait (non-speculative)

Number of cycles *branch\_comp\_wait* is observed for the thread that has commit priority.

#### 55 *w\_eu\_comp\_wait* (non-speculative)

Number of cycles *eu\_comp\_wait* is observed for the thread that has commit priority.

#### 56 *w\_fl\_comp\_wait* (non-speculative)

Number of cycles *fl\_comp\_wait* is observed for the thread that has commit priority.

#### 57 w\_d\_move\_wait

Number of cycles *d\_move\_wait* is observed on the thread which has no right to commit.

#### 58 *w\_0endop* (non-speculative)

Number of cycles *0endop* is observed on the thread which has no right to commit.

#### 59 *rsf\_pmmi* (non-speculative)

Number of cycles where the processor was mixing single and double precision.

### Q.2.2 MMU and L1 cache Event Counters

#### **Basic events**

#### 1 *write\_if\_uTLB*

Counts the occurrences of instruction uTLB misses.

#### 2 *write\_op\_uTLB*

Counts the occurrences of data uTLB misses.

Note – Occurrences of main TLB misses are counted by *trap\_IMMU\_miss*/*trap\_DMMU\_miss*.

#### 3 *if\_r\_iu\_req\_mi\_go*

Counts the occurrences of I1 cache misses.

#### 4 *op\_r\_iu\_req\_mi\_go*

Counts the occurrences of D1 cache misses.

#### 5 *if\_wait\_all*

Counts the total latency of I1 cache misses. The sum of if\_wait=xxx is shown. Caution must be taken as it does not represent L1 instruction cache miss latency. Events measured in if\_wait=xxx are mutually exclusive; thus, at most one of if\_wait=xxx is counted up in a cycle. SPARC64 VII can process multiple cache misses in parallel since it employs a non-blocking cache, but only one (TOQ) of those accesses is measured.

#### 6 *op\_wait\_all*

Counts the total latency of D1 cache misses. The sum of op\_wait=xxx is shown. Caution must be taken as it does not represent L1 data cache miss latency. Events measured in op\_wait=xxx are mutually exclusive; thus, at most one of op\_wait=xxx is counted up in a cycle. SPARC64 VII can process multiple cache misses in parallel since it employs a non-blocking cache, but only one (TOQ) of those accesses is measured. The condition where an access becomes a TOQ is beyond the scope of this document, but suffice it to say that a prefetch instruction can never become a TOQ.

#### **Extended events**

#### 7 swpf\_success\_all

Number of prefetch instructions not lost in SU and sent to SX successfully.

#### 8 swpf\_fail\_all

Number of prefetch instructions lost in SU.

#### 9 swpf\_lbs\_hit

Number of prefetch instructions resulting in an L1 cache hit.

The number of prefetch instructions sent to SU = *swpf\_success\_all* + *swpf\_fail\_all* + *swpf\_lbs\_hit* 
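The relation above can be sketched in C as follows. The function names are hypothetical, and the fraction of lost prefetches is a derived figure added for illustration, not a metric defined in this specification.

```c
#include <stdint.h>

/* Total software prefetch instructions sent to SU, as the sum of the
 * three swpf_* counters. */
uint64_t swpf_sent_to_su(uint64_t swpf_success_all, uint64_t swpf_fail_all,
                         uint64_t swpf_lbs_hit)
{
    return swpf_success_all + swpf_fail_all + swpf_lbs_hit;
}

/* Illustrative derived figure: share of those prefetches that were lost
 * in SU. Returns 0.0 when no prefetch was sent. */
double swpf_lost_fraction(uint64_t swpf_success_all, uint64_t swpf_fail_all,
                          uint64_t swpf_lbs_hit)
{
    uint64_t total = swpf_sent_to_su(swpf_success_all, swpf_fail_all,
                                     swpf_lbs_hit);
    if (total == 0)
        return 0.0;
    return (double)swpf_fail_all / (double)total;
}
```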

#### 10 *if\_l1\_thrashing*

Counts the occurrences of a read port issuing a move-in request twice for a cache line before releasing the port. This could happen when an L1 instruction cache miss occurs, data is obtained, but then pushed out before reading.

#### 11 op\_l1\_thrashing

Counts the occurrences of a read port issuing a move-in request twice for a cache line before releasing the port. This could happen when an L1 data cache miss occurs, data is obtained, but then pushed out before reading.

### Q.2.3 L2 cache Event Counters

Most L2 cache access related counters are categorized as dm (demand) and pf (prefetch), but for these counters, it does not always correspond to load/store/atomic or prefetch instructions. This is because:

- a. If a load/store/atomic instruction cannot be processed due to starvation of L1 cache resources, the request is handled as if it were a prefetch to the L2 cache, which does not use L1 cache resources. Such requests are treated as 'prefetch' in the L2 cache access related counters.
- b. SPARC64 VII employs hardware to prefetch data for a sequential access. A hardware prefetch request is treated as 'prefetch' in the L2 cache access related counters.

#### **Basic events**

#### 1 sx\_miss\_wait\_dm

Counts the number of cycles from the occurrence of an L2 cache miss to data returned, caused by demand access.

#### 2 sx\_miss\_wait\_pf

Counts the number of cycles from the occurrence of an L2 cache miss to data returned, caused by both software prefetch and hardware prefetch access.

#### 3 sx\_miss\_count\_dm

Counts the occurrences of L2 cache misses by demand access. A request to the same line as an outstanding (not yet completed) access is considered a "hit" and is not counted in this counter.

#### 4 sx\_miss\_count\_pf

Counts the occurrences of L2 cache miss by both software prefetch and hardware prefetch access.
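A rough average L2 miss latency can be estimated by dividing a wait counter by the matching miss counter. This ratio is not defined in this specification and is only an approximation, since how overlapping misses contribute to the wait counters is not described here. The sketch below assumes both values come from the same interval; the function name is hypothetical.

```c
#include <stdint.h>

/* Approximate average cycles per L2 cache miss, for example
 * sx_miss_wait_dm / sx_miss_count_dm for demand accesses or
 * sx_miss_wait_pf / sx_miss_count_pf for prefetch accesses.
 * Returns 0.0 when no miss was counted. */
double avg_l2_miss_latency(uint64_t sx_miss_wait, uint64_t sx_miss_count)
{
    if (sx_miss_count == 0)
        return 0.0;
    return (double)sx_miss_wait / (double)sx_miss_count;
}
```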

#### 5 *sx\_read\_count\_dm*

Counts L2 cache references by demand read access. A cache access may be aborted for many reasons such as contention of resources. *sx\_read\_count\_dm* does not measure a retry of cache accesses. It double-counts multi-flow operations. Therefore the following equation is approximately true (but not precise):

*sx\_read\_count\_dm* + *sx\_read\_count\_pf* =

number of cache misses by L1I and L1D + number of non-lost hardware prefetch + number of physical address access which bypass the L1 cache (ASI:0x14, 0x1c, 0x34, 0x3c)

Requests from other CPUs (copyback/invalidate request) are not measured by this counter.

#### 6 *sx\_read\_count\_pf*

Counts L2 cache references by both software prefetch and hardware prefetch access.

#### 7 dvp\_count\_dm

Counts the occurrences of L2 cache miss by demand with writeback request.

#### 8 *dvp\_count\_pf*

Counts the occurrences of L2 cache miss by both software prefetch and hardware prefetch, with writeback request.

#### **Extended events**

#### 9 *sx\_miss\_count\_dm\_if*

Count of L2 cache misses by demand request for instruction fetch.

#### 10 *sx\_miss\_count\_dm\_opsh*

Count of L2 cache misses by demand request of shared type for operand access.

#### 11 *sx\_miss\_count\_dm\_opex*

Count of L2 cache misses by demand request of exclusive type for operand access.

#### 12 *sx\_btc\_count*

Number of requests of exclusive type while the line exists in SX with the S or O attributes.

#### 13 lost\_softpf\_pfp\_full

Number of software prefetch requests lost due to PF port full.

#### 14 lost\_softpf\_by\_abort

Number of software prefetch requests lost due to SX pipe abort.

### Q.2.4 Jupiter Bus Event Counters

#### **Basic events**

#### 1 *jbus\_bi\_count*

Counts the number of invalidation requests received.

#### 2 *jbus\_cpi\_count*

Counts the number of copy and invalidate requests received.

#### 3 jbus\_cpb\_count

Counts the number of copyback requests received.

#### 4 jbus\_cpd\_count

Counts the number of block-load requests and reqd requests from IOs.

#### **Extended events**

#### 5 sn\_res\_64

The number of SC replies which indicate that 1 subline (64 bytes) will be transferred to the CPU.

#### 6 sn\_res\_256

The number of SC replies which indicate that 4 sublines (256 bytes) will be transferred to the CPU.

#### 7 *jbus\_odrbus\_busy*

Counts the number of busy cycles for order buses from the SCs to the CPU in Jupiter Bus cycles. There are four order buses (maximum) connecting SCs and a CPU with dedicated event counters. *jbus\_odrbus\_busy* summarizes these counters.

*jbus\_odrbus\_busy* = *jbus\_odrbus0\_busy* + *jbus\_odrbus1\_busy* + *jbus\_odrbus2\_busy* + *jbus\_odrbus3\_busy* 

#### 8 *jbus\_reqbus\_busy*

Counts the number of busy cycles for request buses from the CPU to SCs in CPU cycles. There are four request buses (maximum) connecting a CPU and SCs with dedicated event counters. *jbus\_reqbus\_busy* summarizes these counters.

jbus\_reqbus\_busy = jbus\_reqbus0\_busy + jbus\_reqbus1\_busy + jbus\_reqbus2\_busy + jbus\_reqbus3\_busy
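The two summary relations have the same shape and can be sketched with one C helper. The function name is hypothetical; the arguments are assumed to be the four per-bus counters (*jbus\_odrbus0\_busy* to *jbus\_odrbus3\_busy*, or *jbus\_reqbus0\_busy* to *jbus\_reqbus3\_busy*) read over the same interval.

```c
#include <stdint.h>

/* Sum of the four per-bus busy-cycle counters (bus 0 .. bus 3), which
 * should correspond to the matching summary counter, jbus_odrbus_busy
 * or jbus_reqbus_busy. */
uint64_t jbus_busy_total(uint64_t bus0_busy, uint64_t bus1_busy,
                         uint64_t bus2_busy, uint64_t bus3_busy)
{
    return bus0_busy + bus1_busy + bus2_busy + bus3_busy;
}
```

Keep in mind that the order-bus counters are in Jupiter Bus cycles while the request-bus counters are in CPU cycles, so the two totals are not directly comparable.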

#### 9 jbus\_odrbus0\_busy

Counts the number of busy cycles for the bus from SC0 to the CPU.

#### 10 *jbus\_reqbus0\_busy*

Counts the number of busy cycles for the bus from the CPU to SC0.

#### 11 *jbus\_odrbus1\_busy*

Counts the number of busy cycles for the bus from SC1 to the CPU.

#### 12 *jbus\_reqbus1\_busy*

Counts the number of busy cycles for the bus from the CPU to SC1.

#### 13 *jbus\_odrbus2\_busy*

Counts the number of busy cycles for the bus from SC2 to the CPU.

#### 14 jbus\_reqbus2\_busy

Counts the number of busy cycles for the bus from the CPU to SC2.

#### 15 *jbus\_odrbus3\_busy*

Counts the number of busy cycles for the bus from SC3 to the CPU.

#### 16 *jbus\_reqbus3\_busy*

Counts the number of busy cycles for the bus from the CPU to SC3.

### Q.2.5 Multi-thread specific Event Counters

#### **Extended events**

#### 1 single\_mode\_cycle\_counts

Number of cycles the thread is active in single-threaded mode.

#### 2 single\_mode\_instructions

Number of committed instructions in single-threaded mode.

#### 3 *both\_threads\_active*

Number of cycles in which both threads in a core are active and each thread has at least one CSE entry in use.

#### 4 *both\_threads\_empty*

Number of cycles in which both threads in a core are active, but the CSEs of both threads are empty.

#### 5 *both\_threads\_suspended*

Number of cycles when both of the threads in a core are in the suspended state.

#### 6 only\_this\_thread\_active

Number of cycles only this thread in a core is active and the other thread is in the suspended state.

#### 7 act\_thread\_suspend

Number of cycles that this thread is in the suspended state.
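As an illustration of how these counters might be combined, the sketch below derives an instructions-per-cycle figure for single-threaded mode from events 1 and 2. The derived metric and the function name `single_mode_ipc` are assumptions of this example; the specification only defines the raw events.

```python
def single_mode_ipc(single_mode_instructions, single_mode_cycle_counts):
    """Committed instructions per cycle while in single-threaded mode.

    Returns None when no single-threaded cycles were counted.
    """
    if single_mode_cycle_counts == 0:
        return None
    return single_mode_instructions / single_mode_cycle_counts
```

For example, 3000 committed instructions over 2000 single-threaded cycles gives 1.5 instructions per cycle.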

# Q.3 CPI analysis

A common way to identify a performance bottleneck in SPARC64 VII is to measure the number of stall cycles and the cause of the stall for each instruction. This is called CPI (Cycles Per Instruction) analysis. The performance events shown in Table Q-2 are useful for CPI analysis on a per-thread and a per-core basis. Note that summing the events of both threads yields a core-based analysis. These events are all counted at the commit stage.

| Inst. committed per cycle | Number of cycles | Factor preventing the next instruction from committing | Thread-based analysis | Core-based analysis<sup>1</sup> |
|---|---|---|---|---|
| 4 | cycle_counts<br>- 3endop - 2endop<br>- 1endop - 0endop | N/A (four instructions are committed in a cycle) | | |
| 3 | 3endop | inh_cmit_gpr_2write + misc. | | |
| 2 | 2endop | misc. = 2endop + 3endop - inh_cmit_gpr_2write | | |
| 1 | 1endop | misc. = 1endop | | |
| 0 | 0endop | Others | 0endop<br>- d_move_wait<br>- cse_priority_wait<br>- op_stv_wait<br>- cse_window_empty<br>- eu_comp_wait<br>- branch_comp_wait<br>- (instruction_flow_counts<br>- instruction_counts) | w_0endop<br>- w_d_move<br>- w_op_stv_wait<br>- w_cse_window_empty<br>- w_eu_comp_wait<br>- w_branch_comp_wait<br>- (instruction_flow_counts<br>- instruction_counts) |
| | | Wait for commit priority | cse_priority_wait | |
| | | Execution | eu_comp_wait<br>+ branch_comp_wait | w_eu_comp_wait<br>+ w_branch_comp_wait |
| | | Fetch miss | cse_window_empty | w_cse_window_empty |
| | | L1D cache miss | op_stv_wait<br>- L2 cache miss | w_op_stv_wait<br>- L2 cache miss |
| | | L2 cache miss | op_stv_wait_sxmiss<br>+ op_stv_wait_nc_pend | w_op_stv_wait_sxmiss<br>+ w_op_stv_wait_nc_pend |

TABLE Q-2 Performance events useful for CPI analysis

1. Use the sum of events in both threads.
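The thread-based column of Table Q-2 can be evaluated mechanically once the raw event counts are available. The following is a minimal sketch, assuming the counts have been collected into a dictionary keyed by event name (with the fetch-miss event spelled `cse_window_empty`); the function names, the dictionary layout, and the `cpi` helper are illustrative assumptions, not part of the specification.

```python
def four_commit_cycles(ev):
    """Cycles in which four instructions commit (first row of Table Q-2)."""
    return (ev["cycle_counts"] - ev["3endop"] - ev["2endop"]
            - ev["1endop"] - ev["0endop"])


def zero_commit_breakdown(ev):
    """Thread-based breakdown of cycles in which no instruction commits."""
    l2_miss = ev["op_stv_wait_sxmiss"] + ev["op_stv_wait_nc_pend"]
    others = (ev["0endop"] - ev["d_move_wait"] - ev["cse_priority_wait"]
              - ev["op_stv_wait"] - ev["cse_window_empty"]
              - ev["eu_comp_wait"] - ev["branch_comp_wait"]
              - (ev["instruction_flow_counts"] - ev["instruction_counts"]))
    return {
        "others": others,
        "wait_for_commit_priority": ev["cse_priority_wait"],
        "execution": ev["eu_comp_wait"] + ev["branch_comp_wait"],
        "fetch_miss": ev["cse_window_empty"],
        "l1d_cache_miss": ev["op_stv_wait"] - l2_miss,
        "l2_cache_miss": l2_miss,
    }


def cpi(ev):
    """Cycles per committed instruction (assumes instruction_counts > 0)."""
    return ev["cycle_counts"] / ev["instruction_counts"]
```

For a core-based analysis, the same arithmetic would be applied to the `w_`-prefixed events, using the sum of the events in both threads as the table footnote states.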

# Q.4 Shared performance events between threads

The performance counters (PCR and PIC) are not shared between threads. This is true for performance events as well. In other words, a given performance event increments a performance counter of one and only one thread which has triggered the event.

But there are some exceptions. The following performance events are shared among all eight threads. That is, each event increments PICs for all of the threads.

- cycle\_counts
- Jupiter Bus events

The following performance events are shared by the two threads in a core.

- both\_threads\_active
- both\_threads\_empty
- both\_threads\_suspended
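Because a shared event increments the PICs of every thread that shares it, naively adding per-thread readings would count the same occurrence more than once. The sketch below shows one possible way to combine the readings of the two threads of a core. Taking shared events once instead of summing them is an assumption of this example, not a rule stated by the specification, and the names `CHIP_SHARED`, `CORE_SHARED`, and `core_totals` are hypothetical.

```python
# Events shared among all threads of the chip (Jupiter Bus events are
# also chip-wide; add their names here if they are collected).
CHIP_SHARED = {"cycle_counts"}
# Events shared by the two threads of a core.
CORE_SHARED = {"both_threads_active", "both_threads_empty",
               "both_threads_suspended"}


def core_totals(thread0, thread1):
    """Combine per-thread event counts into per-core totals."""
    totals = {}
    for name in thread0.keys() | thread1.keys():
        if name in CHIP_SHARED or name in CORE_SHARED:
            # Assumption: both threads saw the same increments, so take
            # the value from one thread only.
            totals[name] = thread0.get(name, thread1.get(name))
        else:
            totals[name] = thread0.get(name, 0) + thread1.get(name, 0)
    return totals
```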

# Q.5 Differences of Performance Events Between SPARC64 VI and SPARC64 VII

As defined in Section Q.2, *Performance Event Description*, on page 219, extended events may change in definition, or even existence, without notice. Some events found in SPARC64 VI no longer exist in SPARC64 VII. This section summarizes the differences in extended events between these CPUs.

| Encoding | Counter | SPARC64 VI | SPARC64 VII | Reason |
|----------|---------|------------|-------------|--------|
| 000010<sub>2</sub> | picl0 | Reserved | only_this_thread_active | Add SMT event |
| 000010<sub>2</sub> | picu1 | Reserved | single_mode_cycle_counts | Add SMT event |
| 000010<sub>2</sub> | picl1 | Reserved | single_mode_instructions | Add SMT event |
| 000010<sub>2</sub> | picl2 | Reserved | d_move_wait | Microarchitecture design changed |
| 000010<sub>2</sub> | picu3 | Reserved | cse_priority_wait | Add SMT event |
| 000010<sub>2</sub> | picl3 | Reserved | xma_inst | New Instruction |
| 000011<sub>2</sub> | picl0 | Reserved | w_cse_window_empty | Add SMT event |
| 000011<sub>2</sub> | picu1 | Reserved | w_eu_comp_wait | Add SMT event |
| 000011<sub>2</sub> | picl1 | Reserved | w_branch_comp_wait | Add SMT event |
| 000011<sub>2</sub> | picl2 | Reserved | w_op_stv_wait | Add SMT event |
| 000011<sub>2</sub> | picu3 | Reserved | w_d_move | Add SMT event |
| 000011<sub>2</sub> | picl3 | Reserved | w_0endop | Add SMT event |
| 000100<sub>2</sub> | picl0 | Reserved | w_op_stv_wait_nc_pend | Add SMT event |
| 000100<sub>2</sub> | picu1 | Reserved | w_op_stv_wait_sxmiss | Add SMT event |
| 000100<sub>2</sub> | picl1 | Reserved | w_op_stv_wait_sxmiss_ex | Add SMT event |
| 000100<sub>2</sub> | picl2 | Reserved | w_fl_comp_wait | Add SMT event |
| 000100<sub>2</sub> | picu3 | Reserved | w_cse_window_empty_sp_full | Add SMT event |
| 000100<sub>2</sub> | picl3 | Reserved | w_op_stv_wait_ex | Add SMT event |
| 011000<sub>2</sub> | picu0 | thread_switch_all | only_this_thread_active | VMT to SMT |
| 011000<sub>2</sub> | picl0 | ts_by_sxmiss | both_threads_active | VMT to SMT |
| 011000<sub>2</sub> | picu1 | ts_by_data_arrive | both_threads_empty | VMT to SMT |
| 011000<sub>2</sub> | picl1 | ts_by_timer | Reserved | Remove VMT event |
| 011000<sub>2</sub> | picu2 | ts_by_intr | Reserved | Remove VMT event |
| 011000<sub>2</sub> | picl2 | ts_by_if | Reserved | Remove VMT event |
| 011000<sub>2</sub> | picl3 | ts_by_suspend | Reserved | Remove VMT event |
| 011001<sub>2</sub> | picl3 | ts_by_other | Reserved | Remove VMT event |
| 011010<sub>2</sub> | all | active_cycle_count | Reserved | Remove VMT event |
| 011101<sub>2</sub> | picl3 | Reserved | both_threads_suspended | Add SMT event |
| 100011<sub>2</sub> | picu0 | Reserved | if_l1_thrashing | Enhance Microarchitecture |
| 100011<sub>2</sub> | picl0 | Reserved | op_l1_thrashing | Enhance Microarchitecture |

# Jupiter Bus Programmer's Model

This chapter describes the programmer's model of the Jupiter Bus interface of the SPARC64 VII. It covers the registers of the Jupiter Bus interface and the method for accessing them.

# R.3 Jupiter Bus Config Register

The Jupiter Bus Config Register is an implementation-specific, read-only ASI register. This register is accessible in the ASI 4A<sub>16</sub> space from the processor.

| [1] | Register Name: | ASI_JB_CONFIG_REGISTER               |
|-----|----------------|--------------------------------------|
| [2] | ASI:           | 4A <sub>16</sub>                     |
| [3] | VA:            | 0                                    |
| [4] | RW             | Supervisor read, a write is ignored. |
| [5] | Data           |                                      |

The Jupiter Bus Config Register is illustrated below and described in TABLE R-1.

| Reserved | UC_S  | UC_SW | CLK_MODE | Reserved | ITID |
|----------|-------|-------|----------|----------|------|
| 63:20    | 19:17 | 16    | 15:11    | 10       | 9:0  |

TABLE R-1 Jupiter Bus Config Register Description

| Bits  | Field    | RW | Description |
|-------|----------|----|-------------|
| 63:20 | -        | R  | Reserved. Read as 0. |
| 19:17 | UC_S     | R  | U2 cache size:<br>100<sub>2</sub>: 4 MB<br>101<sub>2</sub>: 5 MB<br>110<sub>2</sub>: 6 MB |
| 16    | UC_SW    | R  | U2 cache size per way:<br>0: 0.5 MB<br>1: 1 MB |
| 15:11 | CLK_MODE | R  | Specifies the ratio between the CPU clock and the JBUS clock:<br>00000<sub>2</sub> - 01011<sub>2</sub>: Reserved<br>01100<sub>2</sub>: 3:1<br>01101<sub>2</sub>: 3.25:1<br>01110<sub>2</sub>: 3.5:1<br>…<br>11110<sub>2</sub>: 7.5:1 |
| 9:0   | ITID     | R  | This field shows the ITID (Interrupt Target ID) of the thread. |
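The field layout in TABLE R-1 can be decoded with shifts and masks. The following is a minimal sketch, assuming the 64-bit register value has already been obtained by privileged software (ASI 4A<sub>16</sub>, VA 0); the function name `decode_jb_config` and the returned dictionary keys are hypothetical. The table lists only some CLK_MODE encodings; the listed ones all equal the field value divided by 4, and the sketch applies that rule to the elided encodings as an unconfirmed assumption.

```python
# UC_S encodings listed in TABLE R-1 (U2 cache size in MB).
UC_S_MB = {0b100: 4, 0b101: 5, 0b110: 6}


def decode_jb_config(value):
    """Split a raw Jupiter Bus Config Register value into its fields."""
    uc_s = (value >> 17) & 0x7        # bits 19:17
    uc_sw = (value >> 16) & 0x1       # bit 16
    clk_mode = (value >> 11) & 0x1F   # bits 15:11
    itid = value & 0x3FF              # bits 9:0

    # Assumption: ratio = CLK_MODE / 4 for 01100b..11110b. This matches the
    # listed encodings (3:1, 3.25:1, 3.5:1, 7.5:1); the values in between
    # are elided in the table and are inferred here.
    ratio = clk_mode / 4 if 0b01100 <= clk_mode <= 0b11110 else None

    return {
        "u2_cache_size_mb": UC_S_MB.get(uc_s),   # None if not listed
        "u2_way_size_mb": 1.0 if uc_sw else 0.5,
        "clk_mode": clk_mode,
        "cpu_to_jbus_ratio": ratio,              # None if reserved/unlisted
        "itid": itid,
    }
```

For example, a value with UC_S = 110<sub>2</sub>, UC_SW = 1, CLK_MODE = 01101<sub>2</sub>, and ITID = 42 decodes to a 6 MB U2 cache with 1 MB per way and a 3.25:1 clock ratio.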

# Summary Differences Between SPARC64 VI and SPARC64 VII

The following table summarizes differences between SPARC64 VI and SPARC64 VII ISA. This list is a summary, not an exhaustive list.

|             |                             | SPARC64 VI                                        | SPARC64 VII                                                                      | SPARC64 VII<br>page             |
|-------------|-----------------------------|---------------------------------------------------|----------------------------------------------------------------------------------|---------------------------------|
| Chip        | Chip<br>Architecture        | 2CORE x 2VMT<br>128KB(I) + 128KB(D) L1-Cache/core | 4CORE x 2SMT<br>64KB(I) + 64KB(D) L1-Cache /core                                 | 2, 45<br>148                    |
| MMU         | Newly Added<br>Features     | N/A                                               | fTLB as a victim cache<br>Shared Context<br>TSB Prefetch                         | 117<br>114<br>127               |
|             | Removed<br>Features         | sTLB hash                                         | N/A                                                                              | 116                             |
| Instruction | Modified<br>Instructions    | N/A                                               | sleep<br>prefetch                                                                | 60<br>70                        |
|             | Newly Added<br>Instructions | N/A                                               | FPMADDXHI, FPMADDX                                                               | 61                              |
| Register    | Newly Added<br>Registers    | N/A                                               | SHARED_CONTEXT<br>I/DTSB_PREFETCH<br>BARRIER_INIT<br>BARRIER_ASSIGN<br>LBSY, BST | 114<br>127<br>143<br>144<br>145 |
|             | Removed<br>Registers        | L2_DIAG_TAG_READ<br>L2_DIAG_TAG_READ_REG          | N/A                                                                              | N/A                             |
|             | Modified<br>Registers       | VER                                               | VER                                                                              | 18                              |

# Index

# Α

A UGE categories 175 error detection action 180 error detection mask 179 specification of 175 address mask (AM) field of PSTATE register 53 address space identifier (ASI) complete list 137 ADE conditions causing 192 end-method 194 registers written for update/validation 193 software handling 195 state transition 192 see also *async\_data\_error* ASI\_AFAR\_D1 167, 190, 199, 206 ASI\_AFAR\_U2 167, 190, 199, 206 ASI\_AFSR, see ASI\_ASYNC\_FAULT\_STATUS ASI\_ASYNC\_FAULT\_STATUS 178, 198, 198, 206 ASI\_ATOMIC\_QUAD\_LDD\_PHYS 64, 129, 137, 138 ASI\_ATOMIC\_QUAD\_LDD\_PHYS\_LITTLE 64, 129, 137 ASI\_DCU\_CONTROL\_REGISTER 138 ASI\_DCUCR 138 ASI\_DMMU\_SFAR 178 ASI\_DMMU\_SFSR 178 ASI\_DMMU\_TAG\_ACCESS 190

ASI\_DMMU\_TAG\_TARGET 190 ASI\_DMMU\_TSB\_64KB\_PTR 190 ASI\_DMMU\_TSB\_8KB\_PTR 190 ASI\_DMMU\_TSB\_BASE 190 ASI\_DMMU\_TSB\_DIRECT\_PTR 190 ASI\_DMMU\_TSB\_NEXT 190 ASI\_DMMU\_TSB\_PEXT 190 ASI\_DMMU\_TSB\_PTR 204 ASI\_DMMU\_TSB\_SEXT 190 ASI\_DSFSR FTYPE field 140, 141 ASI\_DTLB\_DATA\_ACCESS 214 ASI\_DTLB\_TAG\_ACCESS 214 ASI\_ECR 185 UGE\_HANDLER 180 ASI EIDR 178, 185, 188, 190, 207, 210 ASI\_ERROR\_CONTROL 178, 185 **UGE HANDLER 192** update after ADE 193 WEAK\_ED 174 ASI\_FLUSH\_L1I 148, 151, 152 ASI\_IESR 138 ASI\_IMMU\_SFSR 178 ASI\_IMMU\_TAG\_ACCESS 190 ASI\_IMMU\_TAG\_TARGET 190 ASI\_IMMU\_TSB\_64KB\_PTR 190

ASI\_IMMU\_TSB\_8KB\_PTR 190 ASI\_IMMU\_TSB\_BASE 190 ASI\_IMMU\_TSB\_PEXT 190 ASI\_IMMU\_TSB\_SEXT 190 ASI\_INT\_ERROR\_CONTROL 138 ASI\_INT\_ERROR\_RECOVERY 138 ASI\_INT\_ERROR\_STATUS 138 ASI\_INTR\_DISPATCH\_STATUS 156 ASI\_INTR\_DISPATCH\_W 190 ASI\_INTR\_R 157, 190 ASI\_INTR\_RECEIVE 157 ASI\_INTR\_W 155, 156 ASI\_ITLB\_DATA\_ACCESS 214 ASI\_ITLB\_TAG\_ACCESS 214 ASI\_JB\_CONFIG\_REGISTER 205, 239 ASI\_L2\_CTRL 152 ASI\_MCNTL 109 JPS1\_TSBP 105 ASI\_MEMORY\_CONTROL\_REG 138 ASI\_NUCLEUS 70, 119, 122 ASI\_NUCLEUS\_LITTLE 70, 122 ASI\_PA\_WATCH\_POINT 188, 190 ASI\_PHYS\_BYPASS\_EC\_WITH\_E\_BIT 149 ASI\_PHYS\_BYPASS\_EC\_WITH\_E\_BIT\_LITTLE 149 ASI\_PHYS\_BYPASS\_WITH\_EBIT 26 ASI\_PRIMARY 70, 119, 122 ASI\_PRIMARY\_AS\_IF\_USER 70 ASI\_PRIMARY\_AS\_IF\_USER\_LITTLE 70 ASI PRIMARY CONTEXT 190 ASI\_PRIMARY\_LITTLE 70, 122 ASI\_SCRATCH 140 ASI\_SECONDARY 70 ASI\_SECONDARY\_AS\_IF\_USER 70 ASI\_SECONDARY\_AS\_IF\_USER\_LITTLE 70 ASI\_SECONDARY\_CONTEXT 190 ASI\_SECONDARY\_LITTLE 70 ASI\_SERIAL\_ID 48, 139 ASI\_STCHG\_ERROR\_INFO 178

ASI\_UGESR 189 IUG\_DTLB 214 ASI\_UPA\_CONFIGURATION\_REGISTER 138 ASI\_URGENT\_ERROR\_STATUS 178, 189 ASI\_VA\_WATCH\_POINT 188, 190 ASRs 18 *async\_data\_error* exception 3, 25, 38, 39, **39**, 39, 40, 50, 88, 89, 93, 175, 176, 179, 180, 186, 188, 189, 191, **192**, 192 asynchronous error 15 atomic load quadword 64 load-store instructions compare and swap 37

#### В

block block store with commit 140 load instructions 140 store instructions 140 blocked instructions 11 branch history buffer **2**, 2, 6, 30 branch instructions 22 BRHIS, see *branch history buffer* 30 bypass attribute bits 129

# С

cache coherence 150, 164 data cache tag error handling 208 characteristics 149 data error detection 209 description 7 modification 147 protection 209 uncorrectable data error 210 way reduction 212 error protection 3 event counting ??–232 instruction

characteristics 148 data protection 209 description 7 error handling 209 fetched 9 flushing/invalidation 151 invalidation 147 way reduction 212 level-1 characteristics 147 level-2 characteristics 147 control register 152 unified 149 use 2 snooping 164 synchronizing 42 unified characteristics 149 description 7 CALL instruction 22, 28, 63 CANRESTORE register 190 CANSAVE register 190 CASA instruction 26, 37, 123 CASXA instruction 26, 37, 123 catastrophic\_error exception 37 CE correction 182 counting in D1 cache data 212 in D1 cache data 209 in U2 cache tag 208 Chip Multi Processing 45 CLEANWIN register 92, 190 CLEAR\_SOFTINT register 203 cmask field 67 CMP 46 CMP, see Chip Multi Processing Commit Stack Entry 6, 32, 225 committed, definition 9, 10, 11, 12 compare and swap instructions 37 completed, definition 9 context ID hashing 110 core 3, 4, 10, 40, 45, 46, 47

counter disabling/reading 218 enabling 218 overflow (in PIC) 20 CPopn instructions (SPARC V8) 54 CSE, see *Commit Stack Entry* current exception (*cexc*) field of FSR register **16** CWP register 92, 190

# D

DAE error detection action 180, 186 error detection mask 179 reporting 174 data cacheable doubleword error marking 183 error marking 182 error protection 182 prefetch 25 data\_access\_error exception 65, 107, 124, 152, 175 data\_access\_exception exception 64, 107, 123, 124, 140, 151 data\_access\_MMU\_miss exception 50 data\_access\_protection exception 50, 65 data\_breakpoint exception 89 DCR error handling 203 nonprivileged access 20 DCU\_CONTROL register 205 DCUCR access data format 21 CP (cacheability) field 21 CV (cacheability) field 21 data watchpoint masks 68 DC (data cache enable) field 21 DM (DMMU enable) field 21 field setting after POR 20 IC (instruction cache enable) field 21 IM field 148, 164

IMI (IMMU enable) field 21 PM (PA data watchpoint mask) field 21 PR/PW (PA watchpoint enable) fields 21 updating 164 VM (VA data watchpoint mask) field 21 VR/VW (VA data watchpoint enable) fields 21 WEAK\_SPCA field 21 deferred trap 37 deferred-trap queue floating-point (FQ) 15, 22 integer unit (IU) 11, 15, 23, 87 denormalized operands 16 results 16 DG L1\$L2\$STLB error 213 DG\_L1\$U2\$STLB error 213 dispatch (instruction) 9 disrupting traps 15, 37 distribution nonspeculative 10 speculative 11 D-MMU Secondary Context Register 113, 115 DMMU access bypassing 129 disabled 108 internal register (ASI\_MCNTL) 109 registers accessed 109 Synchronous Fault Status Register 118 Tag Access Register 107 DMMU\_DEMAP register 207 DMMU PA WATCHPOINT register 206 DMMU\_SFAR register 206 DMMU\_SFSR register 206 DMMU\_TAG\_ACCESS register 206 DMMU\_TAG\_TARGET register 206 DMMU\_TSB\_64KB\_PTR register 207 DMMU\_TSB\_8KB\_PTR register 207 DMMU\_TSB\_BASE register 206

DMMU\_TSB\_DIRECT\_PTR register 207 DMMU TSB NEXT register 207 DMMU\_TSB\_PEXT register 206 DMMU\_TSB\_SEXT register 207 DMMU\_VA\_WATCHPOINT register 206 DSFAR on JMPL instruction error 63 update during MMU trap 107 DSFSR bit description 121 format 118 FT field 123, 124, 151 on JMPL instruction error 63 UE field 122 update during MMU trap 107 update policy 124 DTLB\_DATA\_ACCESS register 207 DTLB\_DATA\_IN register 207 DTLB\_TAG\_READ register 207

#### Ε

E bit of PTE 26 ECC\_error exception 50, 176, 180, 200 ee\_second\_watch\_dog\_timeout 188 ee\_sir\_in\_maxtl 188 ee\_trap\_addr\_uncorrected\_error 188 ee\_trap\_in\_maxtl 188 ee\_watch\_dog\_timeout\_in\_maxtl 188 error asynchronous 15 categories 171 classification 3 correctable 176, 208 correction, for single-bit errors 3 D1 cache data 209 fatal 172 handling ASI errors 205 ASR errors 202

most registers 201 isolation 3 restrainable 176 source identification 183 transition 172 U2 cache tag 208 uncorrectable 208 D1 cache data 210 without direct damage 176 urgent 173 ERROR\_CONTROL register 206 ERROR\_MARK\_ID 183, 184, 210 error\_state 36, 89, 162, 164, 180, 192 exceptions catastrophic 37 data\_access\_error 65 data\_access\_protection 65 data\_breakpoint 89 fp\_exception\_ieee\_754 58, 81 fp\_exception\_other 78,96 illegal\_instruction 29, 58, 62, 68, 87, 88, 91 LDDF\_mem\_address\_not\_aligned 97, 140 mem\_address\_not\_aligned 97, 140 persistence 38 privileged\_action 96 statistics monitoring ??–223 unfinished\_FPop 78, 81 execute\_state 164 executed, definition 9 execution EU (execution unit) 6 out-of-order 25 speculative 25

### F

fast\_data\_access\_MMU\_miss exception 107 fast\_data\_access\_protection exception 107, 123 fast\_data\_instruction\_access\_MMU\_miss exception 223 fast\_instruction\_access\_MMU\_miss exception 50, 107,

120, 121, 223 fatal error behavior of CPU 172 cache tag 208 definition 172 detection 187 U2 cache tag 208 fDTLB 94, 102, 108 fe\_other 188 fe\_upa\_addr\_uncorrected\_error 188 fetched, definition 9 *fill\_n\_normal* exception 223 fill\_n\_other exception 223 finished, definition 9 fITLB 94, 102, 107 floating-point deferred-trap queue (FQ) 15, 22 denormalized operands 16 denormalized results 16 operate (FPop) instructions 16 trap types fp\_disabled 52, 58, 68, 91 unimplemented\_FPop 87 FLUSH instruction 87.89 FMADD instruction 29, 49, 55 FMSUB instruction 29, 49, 55 FNMADD instruction 49, 55 FNMSUB instruction 49, 55 formats, instruction 27 fp\_disabled exception 29, 52, 58, 68, 91 fp\_exception\_ieee\_754 exception 58, 81 fp\_exception\_other exception 50, 78, 96 FQ 15, 22 FSR aexc field 17 cexc field 16.17 conformance 17 NS field 78 TEM field 17 VER field 16

fTLB 95, 104, 107, 115, 116, 117, 119, 121, 129, 130, 214

### G

GSR register 203

## I

I UGE definition 174 error detection action 180, 186 error detection mask 179 type 173 IAE error detection action 180 error detection mask 179 reporting 174 IEEE Std 754-1985 16, 77 IIU\_INST\_TRAP register 50, 207 illegal\_instruction exception 22, 29, 58, 62, 68, 87, 88, 91 IMMU internal register (ASI\_MCNTL) 109 registers accessed 109 Synchronous Fault Status Register 118 IMMU\_DEMAP register 206 IMMU\_SFSR register 206 IMMU\_TAG\_ACCESS register 206, 207 IMMU\_TAG\_TARGET register 206 IMMU\_TSB\_64KB\_PTR register 206 IMMU\_TSB\_8KB\_PTR register 206 IMMU\_TSB\_BASE register 206, 207 IMMU\_TSB\_NEXT register 206 IMMU\_TSB\_PEXT register 206 IMPDEP1 instruction 29, 54, 90 **IMPDEP1** instructions 101 IMPDEP2 instruction 29, 54, 57, 90, 100 IMPDEP2A instruction 61 **IMPDEP2B** instruction 27, 55 IMPDEPn instructions 54, 55

impl field of VER register 16 implementation number (impl) field of VER register 87 initiated, definition 9 instruction execution 25 formats 27 prefetch 26 instruction fields, reserved 49 instruction\_access\_error exception 50, 107, 119, 121, 152.175.215 instruction\_access\_exception exception 50, 107, 120, 121 instruction\_access\_MMU\_miss exception 50 instructions atomic load-store 37 blocked 11 cache manipulation 151-?? cacheable 148 committed, definition 9, 10, 11, 12 compare and swap 37 completed, definition 9 control unit (IU) 6 count. committed instructions 222 executed, definition 9 fetched, definition 9 fetched, with error 209 finished, definition 9 floating-point operate (FPop) 16 FLUSH 89 IMPDEP2 90 implementation-dependent (IMPDEP2) 29 implementation-dependent (IMPDEPn) 54, 55 initiated, definition 9 issued, definition 9 LDDFA 97 prefetch 108 reserved fields 49 stall 10 timing 50 integer unit (IU) deferred-trap queue 11, 15, 23, 87

internal ASI, reference to 124 interrupt causing trap 15 dispatch 155 level 15 20 Interrupt Vector Dispatch Register 158 Interrupt Vector Receive Register 158 interrupt\_level\_n exception 223 *interrupt\_level\_n* exception 60 interrupt\_vector\_trap exception 38, 60, 223 INTR\_DATA0:7\_R register, error handling 207 INTR\_DATA0:7\_W register, error handling 207 INTR\_DISPATCH\_STATUS register 155, 205 INTR\_DISPATCH\_W register 207 INTR\_RECEIVE register 205 I-SFSR update during MMU trap 107 ISFSR bit description 119 format 118 FT field 120 update policy 121 issue unit 9 issued (instruction) 9 issue-stalling instruction instructions issue-stalling 10 ITLB\_DATA\_ACCESS register 206 ITLB\_DATA\_IN register 206 ITLB\_TAG\_READ register 206

# J

JEDEC manufacturer code 18 JMPL instruction 28, 63 JPS1\_TSBP mode 110 JTAG command 188 Jupiter Bus 7, 8, 38, 68, 91, 107, 108, 162, 182, 184, 188, 204, 205, 211, 219, 232, 237, 239 Jupiter Bus Config Register 239

### L

LDD instruction 37 LDDA instruction 37, 64, 123, 124 LDDF\_mem\_address\_not\_aligned exception 97, 140 LDDFA instruction 97.140 LDQF\_mem\_address\_not\_aligned exception 50 LDSTUB instruction 26, 37, 123 LDSTUBA instruction 123 LDXA instruction 214 le 46 load quadword atomic 64 LoadLoad MEMBAR relationship 66 load-store instructions compare and swap 37 D1 cache data errors 210 memory model 51 LoadStore MEMBAR relationship 66 Lookaside MEMBAR relationship 67

#### Μ

machine sync 10 MAXTL 36, 90, 162, 164 MCNTL.NC\_CACHE 148, 149 mem\_address\_not\_aligned exception 64, 97, 107, 124, 140, 151 MEMBAR #LoadLoad 66 #LoadStore 66 #Lookaside 67 #MemIssue 67 #StoreLoad 66 #Svnc 67 blockload and blockstore 51 functions 66 in interrupt dispatch 156 instruction 66 partial ordering enforcement 67 membar\_mask field 66 memory model

**PSO 41 RMO 41** store order (STO) 91 TSO 41, 42 MEMORY\_CONTROL register 205 mmask field 66 MMU disabled 108 event counting 229, 230 exceptions recorded 107 Memory Control Register 109 physical address width 104 registers accessed 109 Synchronous Fault Address Registers 126, 163 TLB data access address assignment 116 TLB organization 102 MOESI cache-coherence protocol 150 MT, see Multi-thread Multi-thread 2, 4, 45, 45, 46

#### Ν

noncacheable access 64, 148 nonleaf routine 63 nonspeculative distribution 10 nonstandard floating-point (NS) field of FSR register 16, 88 nonstandard floating-point mode 16, 78

# 0

OBP facilitating diagnostics 148 notification of error 187 resetting WEAK\_ED 174 validating register error handling 201 with urgent error 175 Operating Status Register (OPSR) 36, 164 OTHERWIN register 92, 190 out-of-order execution 25

#### Ρ

panic process 175 parity error counting in D1 cache 212 D1 cache tag 208 fDTLB lookup 108 I1 cache data 209 I1 cache tag 208 partial ordering, specification 67 partial store instruction watchpoint exceptions 68 partial store instructions 140 partial store order (PSO) memory model 41 PC register 45, 46, 193 PCR accessibility 18 counter events, selection 218 error handling 203 NC field 19 OVF field 19 **OVRO** field 19 PRIV field 18, 72, 74 SC field 19, 218 SL field 218 ST field 222 SU field 218 UT field 222 performance monitor events/encoding 219 groups 219 pessimistic overflow 81 pessimistic zero 80 PIC register clearing 217 counter overflow 20 error handling 203 nonprivileged access 20 OVF field 20 PIL register 38

POPC instruction 49, 69 POR reset 180, 185, 187, 198 power-on reset (POR) DCUCR settings 20 implementation dependency 89 RED\_state 164 precise traps 15, 37 prefetch data 25 instruction 26, 108 variants 70 prefetcha instruction 70 PRIMARY\_CONTEXT register 206 privileged registers 17 privileged\_action exception 18, 96, 107, 124, 137 PCR access 72, 74 privileged\_opcode exception 20 processor states after reset 165 error state 36, 89, 164 execute\_state 164 RED\_state 36, 164 program counter (PC) register 92 program order 26 **PSTATE** register AM field 28, 53, 92 IE field 156. 157 MM field 42 PRIV field 18, 72, 74 RED field 17, 148, 164, 165 PTE E field 26

#### Q

quadword-load ASI 64
queues 11

#### R

RAS, see Return Address Stack 28, 29, 30, 63

RDPCR instruction 18, 72
RDTICK instruction 17
reclaimed status 11
RED state 180, 193
  entry after failure/reset 36
  entry after SIR 162
  entry after WDR 164
  entry after XIR 162
  entry trap 15
  processor states 164, 165
  restricted environment 36
  setting of PSTATE.RED 17
  trap vector 36
  trap vector address (RSTVaddr) 91
registers
  clean windows (CLEANWIN) 92
  clock-tick (TICK) 90
  current window pointer (CWP) 92
  Data Cache Unit Control (DCUCR) 21
  other windows (OTHERWIN) 92
  privileged 17
  renaming 11
  restorable windows (CANRESTORE) 92
  savable windows (CANSAVE) 92
relaxed memory order (RMO) memory model 41
reservation station 11
reserved fields in instructions 49
reset
  externally\_initiated\_reset (XIR) 162
  power\_on\_reset (POR) 89
  software\_initiated\_reset (SIR) 162
resets
  POR 180, 185, 187, 198
  WDR 180, 187
restorable windows (CANRESTORE) register 92
restrainable error
  definitions 176
  handling
    ASI\_AFSR.UE\_DST\_BETO 199
    ASI\_AFSR.UE\_RAW\_L2\$FILL 200
    UE\_RAW\_D1\$INSD 200
    UE\_RAW\_L2\$INSD 200
  software handling 199
  types 176
Return Address Stack **28**, 30, 53, 63
return prediction hardware 28
RMO, see relaxed memory ordering
rs3 field of instructions 27
RSTVaddr 36, 91, 162, 164

#### S

S\_CPB\_REQ packets received count 233
S\_CPD\_REQ packets received count 233
S\_CPI\_REQ packets received count 232
S\_INV\_REQ packets received count 232
savable windows (CANSAVE) register 92
SAVE instruction 63
scan
  definition 11
  ring 11
sDTLB 94, 102
SECONDARY\_CONTEXT register 206
SERIAL\_ID register 206
SET\_SOFTINT register 203
SHARED\_CONTEXT register 207
SHUTDOWN instruction 73
Simultaneous Multi-thread 46
SIR instruction 162
sITLB 94, 102, 107
size field of instructions 27
SLEEP instruction 49, 54, 90, 101
SMT 46, 241
SMT, see Simultaneous Multi-thread
SOFTINT register 38, 157, 190, 203
speculative
  distribution 11
  execution 25
spill\_n\_normal exception 223
spill\_n\_other exception 223
stall (instruction) 10
STBAR instruction 75
STCHG\_ERROR\_INFO register 206
STD instruction 37
STDA instruction 37
STDFA instruction 140
STICK 60
STICK register 190, 203
STICK\_COMP register 190
STICK\_COMPARE register 203
sTLB 7, 94, 95, 102, 103, 104, 110, 111, 115, 116, 119, 121, 125, 129, 130, 198, 213, 214, 215
Store Buffer 7
store order (STO) memory model 91
StoreLoad MEMBAR relationship 66
StoreStore MEMBAR relationship 66
STQF\_mem\_address\_not\_aligned exception 50
superscalar 11, 25
SUSPEND instruction 49, 54, 90, 101
suspended state 48, 59, 172, 173, 175, 176, 177, 179, 235
SWAP instruction 26, 37, 123
SWAPA instruction 123
sync (machine) 11
Sync MEMBAR relationship 67
synchronizing caches 42
syncing instruction 11

#### T

Tag Access Register 117
Tcc instruction, counting 223
Thread 46
thread 4, 11, **12**, 45, 46, 47, 48
Threads 46
threads 46
TICK register 17, 90
TICK\_COMPARE register 203
TL register 162, 164
TLB
  CP field 148
  data
    characteristics 94
    in TLB organization 102
  data access address 116
  Data Access/Data In Register 117
  index 116
  instruction
    characteristics 94
    in TLB organization 102
  main 10, 36
  multiple hit detection 103
  replacement algorithm 116
TNPC register 190
total store order (TSO) memory model 41, 42
TPC register 190
transition error 172
traps
  deferred 37
  disrupting 15, 37
  precise 15
TSB
  Base Register 118
  Extension Register 118
  size 118
TSB Prefetch 105
TSB Prefetch Registers 127
TSTATE register
  CWP field 17
  error bit in ASI\_UGESR register 190
TTE CV field 148

#### U

U2 cache
  operation control (SXU) 7
  tag error protection 208
  uncorrectable data error 211
  way reduction 213
uDTLB 10, 102
UE\_RAW\_D1\$INSD error 210
uITLB 10, 102, 107
uncorrectable error 176, 191
unfinished\_FPop exception 78, 81
unimplemented\_FPop floating-point trap type 87
unimplemented\_LDD exception 50
unimplemented\_STD exception 50
urgent error
  definition 173
  types
    A\_UGE 173
    DAE 173
    IAE 173
    instruction-obstructing 173
URGENT\_ERROR\_STATUS register 206
uTLB 10, 36, 103

#### V

VA\_watchpoint exception 124
var field of instructions 27
VER register 18, 139
version (ver) field of FSR register 88
Vertical Multi-thread 45
virtual 45
Virtual Processor 45
VIS instructions
  encoding 101

VMT 46, 241
VMT, see Vertical Multi-thread

#### W

watchdog timeout 188, 190, 208
watchdog\_reset (WDR) 37, 96, 164
watchpoint exception
  on block load-store 52
  on partial store instructions 68
  quad-load physical instruction 65

WDR reset 180, 187
Write Buffer 7

writeback cache 149
WRPCR instruction 18, 74
WRPR instruction 164, 165