PRELIMINARY

Enhanced Am486®DX Microprocessor Family

DISTINCTIVE CHARACTERISTICS

High-Performance Design
- Industry-standard write-back cache support
- Frequent instructions execute in one clock
- 105.6-million bytes/second burst bus at 33 MHz
- Flexible write-through and write-back address control
- Advanced 0.35-µ CMOS-process technology
- Dynamic bus sizing for 8-, 16-, and 32-bit buses
- Supports "soft reset" capability

High On-Chip Integration
- 16-Kbyte unified code and data cache
- Floating-point unit
- Paged, virtual memory management

Enhanced System and Power Management
- Stop clock control for reduced power consumption
- Industry-standard two-pin System Management Interrupt (SMI) for power management independent of processor operating mode and operating system
- Static design with Auto Halt power-down support
- Wide range of chipsets supporting SMM available to allow product differentiation

Complete 32-Bit Architecture
- Address and data buses
- All registers
- 8-, 16-, and 32-bit data types

Standard Features
- 3-V core with 5-V tolerant I/O
- Wide range of chipsets and support available through the AMD FusionE86(SM) Program

168-Pin PGA Package or 208-Pin SQFP Package

IEEE 1149.1 JTAG Boundary-Scan Compatibility

GENERAL DESCRIPTION

The Enhanced Am486®DX Microprocessor Family is an addition to the AMD E86 family of embedded microprocessors. This new family enhances system performance by adding a 16-Kbyte write-back cache to the existing flexible clock control and enhanced SMM features of a 486 CPU.

The Enhanced Am486DX microprocessor family enables write-back configuration through software and cacheable access control. On-chip cache lines are configurable as either write-through or write-back.

The CPU clock control feature permits the CPU clock to be stopped under controlled conditions, allowing reduced power consumption during system inactivity. The SMM function is implemented with an industry-standard two-pin interface.
Since the Enhanced Am486DX microprocessor family is supported as an embedded product, customers can rely on continued cost reduction, a long-term supply, and extended temperature products.

In addition, customers have access to a large selection of inexpensive development tools, compilers, and chipsets. A large number of PC operating systems and Real Time Operating Systems (RTOS) support the Enhanced Am486DX microprocessor family. This results in decreased development costs and improved time to market.

Table 1 shows available processors in the Enhanced Am486DX microprocessor family. See page 54 for information on how these parts differ from other Am486 processors.

Table 1. Clocking Options

Operating Frequency   Input Clock   Available Package
Am486DX5-133          33 MHz        168-pin PGA
Am486DX5-133          33 MHz        208-pin SQFP
Am486DX4-100          33 MHz        168-pin PGA
Am486DX4-100          33 MHz        208-pin SQFP
Am486DX2-66           33 MHz        168-pin PGA
Am486DX2-66           33 MHz        208-pin SQFP

This document contains information on a product under development at Advanced Micro Devices. The information is intended to help you evaluate this product. AMD reserves the right to change or discontinue work on this proposed product without notice.
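The clocking figures quoted so far can be cross-checked with a little arithmetic. The sketch below multiplies the 33-MHz input clock by the DX2/DX4/DX5 multipliers given in the ordering information, and reproduces the 105.6-million bytes/second burst figure from the feature list. The 16-byte line moved in five bus clocks (a 2-1-1-1 burst) is an assumption taken from standard 486 bus timing, not something this section states, and the function names are illustrative only.

```python
# Multipliers implied by the processor-type field of the order number:
# DX2 = clock-doubled, DX4 = clock-tripled, DX5 = clock-quadrupled.
MULTIPLIER = {"DX2": 2, "DX4": 3, "DX5": 4}


def core_clock_hz(processor_type, input_clock_hz=33_000_000):
    """Internal core frequency for a given input (bus) clock."""
    return MULTIPLIER[processor_type] * input_clock_hz


def burst_bytes_per_second(bus_clock_hz=33_000_000,
                           line_bytes=16, clocks_per_line=5):
    """Peak burst throughput of the external bus.

    ASSUMPTION: a 16-byte cache line is transferred in 5 bus clocks
    (2-1-1-1 burst), as on a standard 486 bus.
    """
    return line_bytes * bus_clock_hz // clocks_per_line


if __name__ == "__main__":
    for cpu in ("DX2", "DX4", "DX5"):
        print(cpu, core_clock_hz(cpu) / 1e6, "MHz core")
    print(burst_bytes_per_second(), "bytes/second burst")
```

With a nominal 33-MHz input the products come out as 66, 99, and 132 MHz, which correspond to the 66-, 100-, and 133-MHz speed options once the input clock is taken as 33.33 MHz.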
Publication # 20736 Rev: B Amendment/0 Issue Date: March 1997

BLOCK DIAGRAM

[Figure: block diagram of the Enhanced Am486DX CPU. Blocks shown include the clock generator and clock interface (CLK, CLKMUL, STPCLK), segmentation unit with descriptor registers, paging unit with translation lookaside buffer, cache unit with 16-Kbyte cache, prefetcher with 32-byte code queue, instruction decode, control ROM, register file, ALU, barrel shifter, floating-point unit with its register file, and the bus interface (address drivers, write buffers, copy-back and write-back buffers, data bus transceivers, bus control, burst bus control, bus size control, cache control, parity generation and control, and JTAG).]

LOGIC SYMBOL

[Figure: logic symbol of the Enhanced Am486DX CPU. Signal groups shown: clock, stop clock, clock multiplier, address mask, upgrade present, voltage detect, address bus, data bus, data parity, bus cycle definition, bus cycle control, burst control, bus arbitration, interrupts, numeric error reporting, SMM, page cacheability, cache control/invalidation, and IEEE test port access.]

ORDERING INFORMATION

Standard Products

AMD standard products are available in several
packages and operating ranges. The order number (Valid Combination) is formed by a combination of the elements below.

AM486  DX5  -133  W  16  B  H  C

Processor Family
  Am486 high-performance CPU
Processor Type
  DX2 = Clock-doubled with FPU
  DX4 = Clock-tripled with FPU
  DX5 = Clock-quadrupled with FPU
Speed Option
  133 = 133 MHz (5-class performance)
  100 = 100 MHz
  66 = 66 MHz
Voltage Range
  V = 3.3 V ± 0.3 V
  W = 3.45 V ± 0.15 V
Cache Size
  16 = 16 Kbyte
Cache Type
  B = Write-back (also supports write-through)
Package Type
  H = 208-lead Shrink Quad Flat Pack (PDE-208)
  G = 168-pin Pin Grid Array (CGM-168)
Temperature Range
  C = Commercial (Tcase = 0°C to +85°C)
  I = Industrial (Tcase = -40°C to +100°C)

Valid Combinations

AM486DX2-66V16B AM486DX4-100V16B HC GC HI GI AM486DX5-133W16B HC GC AM486DX5-133V16B HC GC HI GI HC GC

Valid Combinations list configurations planned to be supported in volume for this device. Consult the local AMD sales office to confirm availability of specific valid combinations and to check on newly released combinations.

TABLE OF CONTENTS

Distinctive Characteristics
General Description
Block Diagram
Logic Symbol
Ordering Information
Connection Diagrams and Pin Designations
168-Pin PGA (Pin Grid Array) Package
168-Pin PGA Designations (Functional Grouping)
208-Pin SQFP (Shrink Quad Flat Pack) Package
208-Pin SQFP Designations (Functional Grouping)
Pin Description
Functional Description
Overview
Memory
Modes of Operation
Cache Architecture
Write-Back Cache Protocol
Cache Replacement Description
Memory Configuration
Cache Functionality in Write-Back Mode
Cache Invalidation and Flushing in Write-Back Mode
Burst Write
Clock Control
Clock Generation
Stop Clock
Stop Grant Bus Cycle
Pin State During Stop Grant
Clock Control State Diagram
SRESET Function
System Management Mode
Overview
Terminology
System Management Interrupt Processing
Entering System Management Mode
Exiting System Management Mode
Processor Environment
Executing System Management Mode Handler
SMM System Design Considerations
SMM Software Considerations
Test Registers 4 and 5 Modifications
TR4 Definition
TR5 Definition
Using TR4 and TR5 for Cache Testing
Am486 Microprocessor Functional Differences
Enhanced Am486DX CPU Identification
DX Register at RESET
CPUID Instruction
Electrical Data
Power and Grounding
Absolute Maximum Ratings
Operating Ranges
DC Characteristics Over Commercial and Industrial Operating Ranges
Switching Characteristics Over Commercial and Industrial Operating Ranges
AC Characteristics for Boundary Scan Test Signals at 25 MHz
Switching Waveforms
Package Thermal Specifications
Physical Dimensions

LIST OF FIGURES

Figure 1   Processor-Induced Line Transitions in Write-Back Mode
Figure 2   Snooping State Transitions
Figure 3   Typical System Block Diagram for HOLD/HLDA Bus Arbitration
Figure 4   External Read
Figure 5   External Write
Figure 6   Snoop of On-Chip Cache That Does Not Hit a Line
Figure 7   Snoop of On-Chip Cache That Hits a Non-Modified Line
Figure 8   Snoop That Hits a Modified Line (Write-Back)
Figure 9   Write-Back and Pending Access
Figure 10  Valid HOLD Assertion During Write-Back
Figure 11  Closely Coupled Cache Block Diagram
Figure 12  Snoop Hit Cycle with Write-Back
Figure 13  Cycle Reordering with BOFF (Write-Back)
Figure 14  Write Cycle Reordering Due to Buffering
Figure 15  Latest Snooping of Copy-Back
Figure 16  Burst Write
Figure 17  Burst Read with BOFF Assertion
Figure 18  Burst Write with BOFF Assertion
Figure 19  Entering Stop Grant State
Figure 20  Stop Clock State Machine
Figure 21  Recognition of Inputs when Exiting Stop Grant State
Figure 22  Basic SMI Interrupt Service
Figure 23  Basic SMI Hardware Interface
Figure 24  SMI Timing for Servicing an I/O Trap
Figure 25  SMIACT Timing
Figure 26  Redirecting System Memory Address to SMRAM
Figure 27  Transition to and from SMM
Figure 28  Auto HALT Restart Register Offset
Figure 29  I/O Instruction Restart Register Offset
Figure 30  SMM Base Slot Offset
Figure 31  SRAM Usage
Figure 32  SMRAM Location
Figure 33  SMM Timing in Systems Using Non-Overlaid Memory Space and Write-Through Mode with Caching Enabled During SMM
Figure 34  SMM Timing in Systems Using Non-Overlaid Memory Space and Write-Back Mode with Caching Enabled During SMM
Figure 35  SMM Timing in Systems Using Non-Overlaid Memory Space and Write-Back Mode with Caching Disabled During SMM
Figure 36  SMM Timing in Systems Using Overlaid Memory Space and Write-Through Mode with Caching Enabled During SMM
Figure 37  SMM Timing in Systems Using Overlaid Memory Space and Write-Through Mode with Caching Disabled During SMM
Figure 38  SMM Timing in Systems Using Overlaid Memory Space and Configured in Write-Back Mode
Figure 39  CLK Waveforms
Figure 40  Output Valid Delay Timing
Figure 41  Maximum Float Delay Timing
Figure 42  PCHK Valid Delay Timing
Figure 43  Input Setup and Hold Timing
Figure 44  RDY and BRDY Input Setup and Hold Timing
Figure 45  TCK Waveforms
Figure 46  Test Signal Timing Diagram
LIST OF TABLES

Table 1   Clocking Options
Table 2   CLKMUL Settings
Table 3   EADS Sample Time
Table 4   Cache Line Organization
Table 5   Legal Cache Line States
Table 6   MESI Cache Line Status
Table 7   Key to Switching Waveforms
Table 8   WBINVD/INVD Special Bus Cycles
Table 9   FLUSH Special Bus Cycles
Table 10  Pin State During Stop Grant Bus State
Table 11  SMRAM State Save Map
Table 12  SMM Initial CPU Core Register Settings
Table 13  Segment Register Initial States
Table 14  SMM Revision Identifier
Table 15  SMM Revision Identifier Bit Definitions
Table 16  HALT Auto Restart Configuration
Table 17  I/O Trap Word Configuration
Table 18  Test Register TR4 Bit Descriptions
Table 19  Test Register TR5 Bit Descriptions
Table 20  Am486 Family Functional Differences
Table 21  CPU ID Codes
Table 22  CPUID Instruction Description
Table 23  Thermal Resistance (°C/W) θJC and θJA for the Enhanced Am486DX CPU in 168-Pin PGA Package
Table 24  Maximum TA at Various Airflows in °C for Commercial Temperatures (85°C)
Table 25  Maximum TA at Various Airflows in °C for Industrial Temperatures (100°C)

1 CONNECTION DIAGRAMS AND PIN DESIGNATIONS

1.1 168-Pin PGA (Pin Grid Array) Package

[Figure: 168-pin PGA connection diagram, pin side view.]

1.2 168-Pin PGA Designations (Functional Grouping)

Pins are listed by functional group: Address, Data, Control, Test, INC, VCC, and VSS.
Address (Pin Name, Pin No.)
A2 Q-14, A3 R-15, A4 S-16, A5 Q-12, A6 S-15, A7 Q-13,
A8 R-13, A9 Q-11, A10 S-13, A11 R-12, A12 S-7, A13 Q-10,
A14 S-5, A15 R-7, A16 Q-9, A17 Q-3, A18 R-5, A19 Q-4,
A20 Q-8, A21 Q-5, A22 Q-7, A23 S-3, A24 Q-6, A25 R-2,
A26 S-2, A27 S-1, A28 R-1, A29 P-2, A30 P-3, A31 Q-1

Data (Pin Name, Pin No.)
D0 P-1, D1 N-2, D2 N-1, D3 H-2, D4 M-3, D5 J-2, D6 L-2, D7 L-3,
D8 F-2, D9 D-1, D10 E-3, D11 C-1, D12 G-3, D13 D-2, D14 K-3, D15 F-3,
D16 J-3, D17 D-3, D18 C-2, D19 B-1, D20 A-1, D21 B-2, D22 A-2, D23 A-4,
D24 A-6, D25 B-6, D26 C-7, D27 C-6, D28 C-8, D29 A-8, D30 C-9, D31 B-8

Control (Pin Name, Pin No.)
A20M D-15, ADS S-17, AHOLD A-17, BE0 K-15, BE1 J-16, BE2 J-15,
BE3 F-17, BLAST R-16, BOFF D-17, BRDY H-15, BREQ Q-15, BS8 D-16,
BS16 C-17, CACHE B-12, CLK C-3, CLKMUL R-17, D/C M-15, DP0 N-3,
DP1 F-1, DP2 H-3, DP3 A-5, EADS B-17, FERR C-14, FLUSH C-15,
HITM A-12, HLDA P-15, HOLD E-15, IGNNE A-15, INTR A-16, INV A-10,
KEN F-15, LOCK N-15, M/IO N-16, NMI B-15, PCD J-17, PCHK Q-17,
PLOCK Q-16, PWT L-15, RDY F-16, RESET C-16, SMI B-10, SMIACT C-12,
SRESET C-10, STPCLK G-15, UP C-11, VOLDET S-4, WB/WT B-13, W/R N-17

Test (Pin Name, Pin No.)
TCK A-3, TDI A-14, TDO B-16, TMS B-14

INC (Pin No.)
A-13, C-13, J-1

VCC (Pin No.)
B-7, B-9, B-11, C-4, C-5, E-2, E-16, G-2, G-16, H-16, K-2, K-16,
L-16, M-2, M-16, P-16, R-3, R-6, R-8, R-9, R-10, R-11, R-14

VSS (Pin No.)
A-7, A-9, A-11, B-3, B-4, B-5, E-1, E-17, G-1, G-17, H-1, H-17,
K-1, K-17, L-1, L-17, M-1, M-17, P-17, Q-2, R-4, S-6, S-8, S-9,
S-10, S-11, S-12, S-14

Notes:
1. VOLDET is connected internally to VSS.
2. INC = Internal No Connect

1.3 208-Pin SQFP (Shrink Quad Flat Pack) Package

[Figure: 208-pin SQFP connection diagram, top view.]

1.4 208-Pin SQFP Designations (Functional Grouping)

Pins are listed by functional group: Address, Data, Control, Test, INC, VCC, and VSS.
Address (Pin Name, Pin No.)
A2 202, A3 197, A4 196, A5 195, A6 193, A7 192, A8 190, A9 187,
A10 186, A11 182, A12 180, A13 178, A14 177, A15 174, A16 173, A17 171,
A18 166, A19 165, A20 164, A21 161, A22 160, A23 159, A24 158, A25 154,
A26 153, A27 152, A28 151, A29 149, A30 148, A31 147

Data (Pin Name, Pin No.)
D0 144, D1 143, D2 142, D3 141, D4 140, D5 130, D6 129, D7 126,
D8 124, D9 123, D10 119, D11 118, D12 117, D13 116, D14 113, D15 112,
D16 108, D17 103, D18 101, D19 100, D20 99, D21 93, D22 92, D23 91,
D24 87, D25 85, D26 84, D27 83, D28 79, D29 78, D30 75, D31 74

Control (Pin Name, Pin No.)
A20M 47, ADS 203, AHOLD 17, BE0 31, BE1 32, BE2 33, BE3 34, BLAST 204,
BOFF 6, BRDY 5, BREQ 30, BS8 8, BS16 7, CACHE 70, CLK 24, CLKMUL 11,
D/C 39, DP0 145, DP1 125, DP2 109, DP3 90, EADS 46, FERR 66, FLUSH 49,
HITM 63, HLDA 26, HOLD 16, IGNNE 72, INTR 50, INV 71, KEN 13, LOCK 207,
M/IO 37, NMI 51, PCD 41, PCHK 4, PLOCK 206, PWT 40, RDY 12, RESET 48,
SMI 65, SRESET 58, STPCLK 73, SMIACT 59, UP 194, WB/WT 64, W/R 27

Test (Pin Name, Pin No.)
TCK 18, TDI 168, TDO 68, TMS 167

INC (Pin No.)
3, 67, 96, 127

VCC (Pin No.)
2, 9, 14, 19, 20, 22, 23, 25, 29, 35, 38, 42, 44, 45, 54, 56, 60, 62,
69, 77, 80, 82, 86, 89, 95, 98, 102, 106, 111, 114, 121, 128, 131, 133,
134, 136, 137, 139, 150, 155, 162, 163, 169, 172, 176, 179, 183, 185,
188, 191, 198, 200, 205

VSS (Pin No.)
1, 10, 15, 21, 28, 36, 43, 52, 53, 55, 57, 61, 76, 81, 88, 94, 97, 104,
105, 107, 110, 115, 120, 122, 132, 135, 138, 146, 156, 157, 170, 175,
181, 184, 189, 199, 201, 208

Note: INC = Internal No Connect

2 PIN DESCRIPTION

The Enhanced Am486DX microprocessors provide the complete interface support offered by the Enhanced Am486 family. However, the CLKMUL pin settings have changed to accommodate the higher operating speed selection. For more information on how all Am486 processors differ, see section 8 on page 54.

A20M
Address Bit 20 Mask (Active Low; Input)
A Low signal on the A20M pin causes the microprocessor to mask address line A20 before performing a lookup to the internal cache, or driving a memory cycle on the bus. Asserting A20M causes the processor to wrap the address at 1 Mbyte, emulating Real mode operation. The signal is asynchronous, but must meet setup and hold times t20 and t21 for recognition during a specific clock. During normal operation, A20M should be sampled High at the falling edge of RESET.

A31-A4/A3-A2
Address Lines (Inputs/Outputs)/(Outputs)
Pins A31-A2 define a physical area in memory or indicate an input/output (I/O) device. Address lines A31-A4 drive addresses into the microprocessor to perform cache line invalidations. Input signals must meet setup and hold times t22 and t23. A31-A2 are not driven during bus or address hold.

ADS
Address Status (Active Low; Output)
A Low output from this pin indicates that a valid bus cycle definition and address are available on the cycle definition lines and address bus. ADS is driven active by the same clock as the addresses. ADS is active Low and is not driven during bus hold.

AHOLD
Address Hold (Active High; Input)
The external system may assert AHOLD to perform a cache snoop. In response to the assertion of AHOLD, the microprocessor stops driving the address bus A31-A2 in the next clock. The data bus remains active and data can be transferred for previously issued read or write bus cycles during address hold. AHOLD is recognized even during RESET and LOCK. The earliest that AHOLD can be deasserted is two clock cycles after EADS is asserted to start a cache snoop. If HITM is activated due to a cache snoop, the microprocessor completes the current bus activity and then asserts ADS and drives the address bus while AHOLD is active. This starts the write-back of the modified line that was the target of the snoop.

BE3-BE0
Byte Enable (Active Low; Outputs)
The byte enable pins indicate which bytes are enabled and active during read or write cycles. During the first cache fill cycle, however, an external system should ignore these signals and assume that all bytes are active.
- BE3 for D31-D24
- BE2 for D23-D16
- BE1 for D15-D8
- BE0 for D7-D0
BE3-BE0 are active Low and are not driven during bus hold.

BLAST
Burst Last (Active Low; Output)
Burst Last goes Low to indicate that the next BRDY signal completes the burst bus cycle. BLAST is active for both burst and non-burst cycles. BLAST is active Low and is not driven during a bus hold.

BOFF
Back Off (Active Low; Input)
This input signal forces the microprocessor to float all pins normally floated during hold, but HLDA is not asserted in response to BOFF. BOFF has higher priority than RDY or BRDY; if both are returned in the same clock, BOFF takes effect. The microprocessor remains in bus hold until BOFF goes High. If a bus cycle is in progress when BOFF is asserted, the cycle restarts. BOFF must meet setup and hold times t18 and t19 for proper operation. BOFF has an internal weak pull-up.

BRDY
Burst Ready Input (Active Low; Input)
The BRDY signal performs the same function during a burst cycle that RDY performs during a non-burst cycle. BRDY indicates that the external system has presented valid data in response to a read, or that the external system has accepted data in response to a write. BRDY is ignored when the bus is idle and at the end of the first clock in a bus cycle. BRDY is sampled in the second and subsequent clocks of a burst cycle. The data presented on the data bus is strobed into the microprocessor when BRDY is sampled active. If RDY is returned simultaneously with BRDY, BRDY is ignored and the cycle is converted to a non-burst cycle. BRDY is active Low and has a small pull-up resistor, and must satisfy the setup and hold times t16 and t17.

BREQ
Internal Cycle Pending (Active High; Output)
BREQ indicates that the microprocessor has generated a bus request internally, whether or not the microprocessor is driving the bus. BREQ is active High and is floated only during three-state Test mode (see FLUSH).
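The byte-enable mapping and the A20M address mask described in this section can be illustrated with a short model. This is a minimal sketch, not AMD reference code: the function names, the 4-bit packing of BE3-BE0 into an integer, and the use of a byte offset within the 32-bit word are assumptions made for the example.

```python
def byte_enables(offset, size):
    """Return BE3..BE0 as a 4-bit integer (bit n = BEn, 0 = asserted).

    offset: first byte lane used (0 = D7-D0 ... 3 = D31-D24)
    size:   number of consecutive bytes transferred in this bus cycle
    """
    if not (0 <= offset <= 3 and 1 <= size <= 4 - offset):
        raise ValueError("transfer does not fit in one 32-bit bus cycle")
    lanes_used = ((1 << size) - 1) << offset   # 1 = lane carries data
    return ~lanes_used & 0xF                   # pins are active Low


def apply_a20m(address, a20m_asserted):
    """Mask address bit 20 when A20M is asserted (driven Low)."""
    return address & ~(1 << 20) if a20m_asserted else address


if __name__ == "__main__":
    print(format(byte_enables(0, 4), "04b"))      # full 32-bit access
    print(format(byte_enables(1, 2), "04b"))      # bytes on D23-D8
    print(hex(apply_a20m(0x0010FFEF, True)))      # wraps below 1 Mbyte
```

With A20M asserted, an address just above 1 Mbyte maps back into the first megabyte, which is the Real mode wrap-around the pin description refers to.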
BS8/BS16
Bus Size 8 (Active Low; Input)/Bus Size 16 (Active Low; Input)
The BS8 and BS16 signals allow the processor to operate with 8-bit and 16-bit I/O devices by running multiple bus cycles to respond to data requests: four for 8-bit devices, and two for 16-bit devices. The microprocessor samples the bus sizing pins every clock before RDY to determine the appropriate bus size for the requesting device. The signals are active Low inputs with internal pull-up resistors, and must satisfy setup and hold times t14 and t15 for correct operation. Bus sizing is not permitted during copy-back or write-back operation. BS8 and BS16 are ignored during copy-back or write-back cycles.

CACHE
Internal Cacheability (Active Low; Output)
In Write-through mode, this signal always floats. In Write-back mode for processor-initiated cycles, a Low output on this pin indicates that the current read cycle is cacheable, or that the current cycle is a burst write-back or copy-back cycle. If the CACHE signal is driven High during a read, the processor will not cache the data even if the KEN pin signal is asserted. If the processor determines that the data is cacheable, CACHE goes active when ADS is asserted and remains in that state until the next RDY or BRDY is asserted. CACHE floats in response to a BOFF or HOLD request.

CLK
Clock (Input)
The CLK input provides the basic microprocessor timing signal. The CLKMUL input selects the multiplier value used to generate the internal operating frequency for the Enhanced Am486DX microprocessors. All external timing parameters are specified with respect to the rising edge of CLK. The clock signal passes through an internal Phase-Lock Loop (PLL).

CLKMUL
Clock Multiplier (Input)
The microprocessor samples the CLKMUL input signal at RESET to determine the design operating frequency. Table 2 shows the effects CLKMUL has on system configurations for various Enhanced Am486DX microprocessors.

Table 2. CLKMUL Settings

Processor       CLKMUL=1    CLKMUL=0
Am486DX2-66     Undefined   2x
Am486DX4-100    3x          Undefined
Am486DX5-133    Undefined   4x

2x indicates that the CPU runs at twice the system bus speed.
3x indicates that the CPU runs at three times the system bus speed.
4x indicates that the CPU runs at four times the system bus speed.

D31-D0
Data Lines (Inputs/Outputs)
Lines D31-D0 define the data bus. The signals must meet setup and hold times t22 and t23 for proper read operations. These pins are driven during the second and subsequent clocks of write cycles.

D/C
Data/Control (Output)
This bus cycle definition pin distinguishes memory and I/O data cycles from control cycles. The control cycles are:
- Interrupt Acknowledge
- Halt/Special Cycle
- Code Read (instruction fetching)

DP3-DP0
Data Parity (Inputs/Outputs)
Data parity is generated on all write data cycles with the same timing as the data driven by the microprocessor. Even parity information must be driven back into the microprocessor on the data parity pins with the same timing as read information to ensure that the processor uses the correct parity check. The signals read on these pins do not affect program execution. Input signals must meet setup and hold times t22 and t23. DP3-DP0 should be connected to VCC through a pull-up resistor in systems not using parity. DP3-DP0 are active High and are driven during the second and subsequent clocks of write cycles.

EADS
External Address Strobe (Active Low; Input)
This signal indicates that a valid external address has been driven on the address pins A31-A4 of the microprocessor to be used for a cache snoop. This signal is recognized while the processor is in hold (HLDA is driven active), while forced off the bus with the BOFF input, or while AHOLD is asserted. The microprocessor ignores EADS at all other times. EADS is not recognized if HITM is active, nor during the clock after ADS, nor during the clock after a valid assertion of EADS. Snoops to the on-chip cache must be completed before another snoop cycle is initiated. Table 3 describes EADS when first sampled. EADS can be asserted every other clock cycle as long as the hold remains active and HITM remains inactive. INV is sampled in the same clock period that EADS is asserted. EADS has an internal weak pull-up.

Table 3. EADS Sample Time

Trigger   EADS First Sampled
AHOLD     Second clock after AHOLD asserted
HOLD      First clock after HLDA asserted
BOFF      Second clock after BOFF asserted

Note: The triggering signal (AHOLD, HOLD, or BOFF) must remain active for at least 1 clock after EADS to ensure proper operation.

FERR
Floating-Point Error (Active Low; Output)
Driven active when a floating-point error occurs, FERR is similar to the ERROR pin on a 387 math coprocessor. FERR is included for compatibility with systems using DOS-type floating-point error reporting. FERR is active Low, and is not floated during bus hold, except during three-state Test mode (see FLUSH).

FLUSH
Cache Flush (Active Low; Input)
In Write-back mode, FLUSH forces the microprocessor to write-back all modified cache lines and invalidate its internal cache. The microprocessor generates two flush acknowledge special bus cycles to indicate completion of the write-back and invalidation. In Write-through mode, FLUSH invalidates the cache without issuing a special bus cycle. FLUSH is an active Low input that needs to be asserted only for one clock. FLUSH is asynchronous, but setup and hold times t20 and t21 must be met for recognition in any specific clock. Sampling FLUSH Low in the clock before the falling edge of RESET causes the microprocessor to enter three-state Test mode.
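The even-parity rule described for DP3-DP0 can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and the pairing of DP0 with D7-D0 through DP3 with D31-D24 is an assumption that mirrors the byte-enable lane mapping rather than something stated in the DP3-DP0 description.

```python
def data_parity(word):
    """Return DP3..DP0 (bit n = DPn) giving even parity per byte lane.

    ASSUMPTION: DPn covers byte lane n (DP0 for D7-D0 ... DP3 for D31-D24).
    Each parity bit is chosen so that the byte plus its parity bit
    contains an even number of 1s.
    """
    dp = 0
    for lane in range(4):
        byte = (word >> (8 * lane)) & 0xFF
        ones = bin(byte).count("1")
        dp |= (ones & 1) << lane
    return dp


def parity_error(word, dp, be_active_low=0b0000):
    """True if any enabled byte lane fails the even-parity check.

    be_active_low uses the pin polarity: a 0 bit means the lane is enabled.
    A True result models the condition that would drive PCHK Low.
    """
    enabled = ~be_active_low & 0xF
    return ((data_parity(word) ^ dp) & enabled) != 0


if __name__ == "__main__":
    word = 0x00FF0301
    print(format(data_parity(word), "04b"))
    print(parity_error(word, data_parity(word)))
```

Restricting the check to enabled lanes reflects the statement in the PCHK description that parity status is checked only for enabled bytes.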
HITM
Hit Modified Line (Active Low; Output)
In Write-back mode (WB/WT=1 at RESET), HITM indicates that an external snoop cache tag comparison hit a modified line. When a snoop hits a modified line in the internal cache, the microprocessor asserts HITM two clocks after EADS is asserted. The HITM signal stays asserted (Low) until the last BRDY for the corresponding write-back cycle. At all other times, HITM is deasserted (High). During RESET, the HITM signal can be used to detect whether the CPU is operating in Write-back mode. In Write-back mode (WB/WT=1 at RESET), HITM is deasserted (driven High) until the first snoop that hits a modified line. In Write-through mode, HITM floats at all times.

HLDA
Hold Acknowledge (Active High; Output)
The HLDA signal is activated in response to a hold request presented on the HOLD pin. HLDA indicates that the microprocessor has given the bus to another local bus master. HLDA is driven active in the same clock in which the microprocessor floats its bus. HLDA is driven inactive when leaving bus hold. HLDA is active High and remains driven during bus hold. HLDA is floated only during three-state Test mode (see FLUSH).

HOLD
Bus Hold Request (Active High; Input)
HOLD gives control of the microprocessor bus to another bus master. In response to HOLD going active, the microprocessor floats most of its output and input/output pins. HLDA is asserted after completing the current bus cycle, burst cycle, or sequence of locked cycles. The microprocessor remains in this state until HOLD is deasserted. HOLD is active High and does not have an internal pull-down resistor. HOLD must satisfy setup and hold times t18 and t19 for proper operation.

IGNNE
Ignore Numeric Error (Active Low; Input)
When this pin is asserted, the Enhanced Am486DX microprocessors will ignore a numeric error and continue executing non-control floating-point instructions. When IGNNE is deasserted, the Enhanced Am486DX microprocessors will freeze on a non-control floating-point instruction if a previous floating-point instruction caused an error. IGNNE has no effect when the NE bit in Control Register 0 is set. IGNNE is active Low and is provided with a small internal pull-up resistor. IGNNE is asynchronous but must meet setup and hold times t20 and t21 to ensure recognition in any specific clock.

INTR
Maskable Interrupt (Active High; Input)
When asserted, this signal indicates that an external interrupt has been generated. If the internal interrupt flag is set in EFLAGS, active interrupt processing is initiated. The microprocessor generates two locked interrupt acknowledge bus cycles in response to the INTR pin going active. INTR must remain active until the interrupt acknowledges have been performed to ensure that the interrupt is recognized. INTR is active High and is not provided with an internal pull-down resistor. INTR is asynchronous, but must meet setup and hold times t20 and t21 for recognition in any specific clock.

INV
Invalidate (Active High; Input)
The external system asserts INV to invalidate the cache-line state when an external bus master proposes a write. It is sampled together with A31-A4 during the clock in which EADS is active. INV has an internal weak pull-up. INV is ignored in Write-through mode.

KEN
Cache Enable (Active Low; Input)
KEN determines whether the current cycle is cacheable. When the microprocessor generates a cacheable cycle and KEN is active one clock before RDY or BRDY during the first transfer of the cycle, the cycle becomes a cache line fill cycle. Returning KEN active one clock before RDY during the last read in the cache line fill causes the line to be placed in the on-chip cache. KEN is active Low and is provided with a small internal pull-up resistor. KEN must satisfy setup and hold times t14 and t15 for proper operation.

LOCK
Bus Lock (Active Low; Output)
A Low output on this pin indicates that the current bus cycle is locked. The microprocessor ignores HOLD when LOCK is asserted (although it does acknowledge AHOLD and BOFF). LOCK goes active in the first clock of the first locked bus cycle and goes inactive after the last clock of the last locked bus cycle. The last locked cycle ends when RDY is returned. LOCK is active Low and is not driven during bus hold. Locked read cycles are not transformed into cache fill cycles if KEN is active.

M/IO
Memory/Input-Output (Active High/Active Low; Output)
A High output indicates a memory cycle. A Low output indicates an I/O cycle.

NMI
Non-Maskable Interrupt (Active High; Input)
A High NMI input signal indicates that an external non-maskable interrupt has occurred. NMI is rising-edge sensitive. NMI must be held Low for at least four CLK periods before this rising edge. The NMI input does not have an internal pull-down resistor. The NMI input is asynchronous, but must meet setup and hold times t20 and t21 for recognition in any specific clock.

PCD
Page Cache Disable (Active High; Output)
This pin reflects the state of the PCD bit in the page table entry or page directory entry (programmable through the PCD bit in CR3). If paging is disabled, the CPU ignores the PCD bit and drives the PCD output Low. PCD has the same timing as the cycle definition pins (M/IO, D/C, and W/R). PCD is active High and is not driven during bus hold. PCD is masked by the Cache Disable bit (CD) in Control Register 0 (CR0).

PCHK
Parity Status (Active Low; Output)
Parity status is driven on the PCHK pin the clock after RDY for read operations. The parity status reflects data sampled at the end of the previous clock. A Low PCHK indicates a parity error. Parity status is checked only for enabled bytes, as indicated by the byte enable and bus size signals. PCHK is valid only in the clock immediately after read data is returned to the microprocessor; at all other times PCHK is inactive High. PCHK is floated only during three-state Test mode (see FLUSH).

PLOCK
Pseudo-Lock (Active Low; Output)
In Write-back mode, the processor forces the output High and the signal is always read as inactive. In Write-through mode, PLOCK operates normally. When asserted, PLOCK indicates that the current bus transaction requires more than one bus cycle. Examples of such operations are segment table descriptor reads (8 bytes) and cache line fills (16 bytes). The microprocessor drives PLOCK active until the addresses for the last bus cycle of the transaction have been driven, whether or not RDY or BRDY is returned. PLOCK is a function of the BS8, BS16, and KEN inputs. PLOCK should be sampled on the clock when RDY is returned. PLOCK is active Low and is not driven during bus hold.

PWT
Page Write-Through (Active High; Output)
This pin reflects the state of the PWT bit in the page table entry or page directory entry (programmable through the PWT bit in CR3). If paging is disabled, the CPU ignores the PWT bit and drives the PWT output Low. PWT has the same timing as the cycle definition pins (M/IO, D/C, and W/R). PWT is active High and is not driven during bus hold.

RESET
Reset (Active High; Input)
RESET forces the microprocessor to initialize. The microprocessor cannot begin execution of instructions until at least 1 ms after VCC and CLK have reached their proper DC and AC specifications. To ensure proper microprocessor operation, the RESET pin should remain active during this time. RESET is active High. RESET is asynchronous but must meet setup and hold times t20 and t21 to ensure recognition on any specific clock.
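The cycle definition pins (M/IO, D/C, and W/R) together identify the type of bus cycle. The sketch below decodes them using the encoding of the standard 486 bus definition; treating that encoding as applicable here, and treating W/R High as a write, are assumptions, since this part of the pin description gives only the M/IO polarity and the list of control cycles. The function name is illustrative.

```python
# Keys are (M/IO, D/C, W/R) pin levels, 1 = High.
# ASSUMPTION: encoding follows the standard 486 bus cycle definition.
BUS_CYCLES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "halt/special cycle",
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def decode_bus_cycle(m_io, d_c, w_r):
    """Name the bus cycle selected by the three cycle definition pins."""
    return BUS_CYCLES[(m_io, d_c, w_r)]


if __name__ == "__main__":
    print(decode_bus_cycle(1, 1, 0))
    print(decode_bus_cycle(0, 0, 0))
```

The three control cycles listed under D/C (interrupt acknowledge, halt/special cycle, and code read) are the entries with D/C Low.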
RDY
Non-Burst Ready (Active Low; Input)

A Low input on this pin indicates that the current bus cycle is complete, that is, either the external system has presented valid data on the data pins in response to a read, or the external system has accepted data from the microprocessor in response to a write. RDY is ignored when the bus is idle and at the end of the bus cycle's first clock. RDY is active during address hold. Data can be returned to the processor while AHOLD is active. RDY is active Low and does not have an internal pull-up resistor. RDY must satisfy setup and hold times t16 and t17 for proper chip operation.

SMI
SMM Interrupt (Active Low; Input)

A Low signal on the SMI pin signals the processor to enter System Management mode (SMM). SMI is the highest level processor interrupt. The SMI signal is recognized on an instruction boundary, similar to the NMI and INTR signals. SMI is sampled on every rising clock edge. SMI is a falling-edge sensitive input. The SMI input has an internal pull-up resistor. Recognition of SMI is guaranteed in a specific clock if it is asserted synchronously and meets the setup and hold times. If SMI is asserted asynchronously, it must go High for a minimum of two clocks before going Low, and it must remain Low for at least two clocks to guarantee recognition. When the CPU recognizes SMI, it enters SMM before executing the next instruction and saves internal registers in SMM space.

SMIACT
SMM Interrupt Active (Active Low; Output)

SMIACT goes Low in response to SMI. It indicates that the processor is operating under SMM control. SMIACT remains Low until the processor receives a RESET signal or executes the Resume Instruction (RSM) to leave SMM. This signal is always driven. It does not float during bus HOLD or BOFF.

Note: Do not use SRESET to exit from SMM. The system should block SRESET during SMM.

SRESET
Soft Reset (Active High; Input)

The CPU samples SRESET on every rising clock edge. If SRESET is sampled active, the SRESET sequence begins on the next instruction boundary. SRESET resets the processor, but, unlike RESET, does not cause it to sample UP or WB/WT, and does not affect the FPU, the cache, the CD and NW bits in CR0, or SMBASE. SRESET is asynchronous and must meet the same timing as RESET. The SRESET input has an internal pull-down resistor.

STPCLK
Stop Clock (Active Low; Input)

A Low input signal indicates a request has been made to turn off the CLK input. When the CPU recognizes a STPCLK, the processor:
- Stops execution on the next instruction boundary (unless superseded by a higher priority interrupt)
- Empties all internal pipelines and write buffers
- Generates a Stop Grant acknowledge bus cycle

STPCLK is active Low and has an internal pull-up resistor. STPCLK is asynchronous, but it must meet setup and hold times t20 and t21 to ensure recognition in any specific clock. STPCLK must remain active until the Stop Clock special bus cycle is issued and the system returns either RDY or BRDY.

TCK
Test Clock (Input)

Test Clock provides the clocking function for the JTAG boundary scan feature. TCK clocks state information and data into the component on the rising edge of TCK on TMS and TDI, respectively. Data is clocked out of the component on the falling edge of TCK on TDO. TCK uses an internal weak pull-up.

TDI
Test Data Input (Input)

TDI is the serial input that shifts JTAG instructions and data into the tested component. TDI is sampled on the rising edge of TCK during the SHIFT-IR and the SHIFT-DR TAP (Test Access Port) controller states. During all other TAP controller states, TDI is ignored. TDI uses an internal weak pull-up.

TDO
Test Data Output (Active High; Output)

TDO is the serial output that shifts JTAG instructions and data out of the component. TDO is driven on the falling edge of TCK during the SHIFT-IR and SHIFT-DR TAP controller states. Otherwise, TDO is three-stated.

TMS
Test Mode Select (Active High; Input)

TMS is decoded by the JTAG TAP to select the operation of the test logic. TMS is sampled on the rising edge of TCK. To guarantee deterministic behavior of the TAP controller, the TMS pin has an internal pull-up resistor.

UP
Upgrade Present (Input)

The processor samples the Upgrade Present (UP) pin in the clock before the falling edge of RESET. If it is Low, the processor three-states its outputs immediately. UP must remain asserted to keep the processor inactive. The pin uses an internal pull-up resistor.

VOLDET (168-Pin PGA Package Only)
Voltage Detect (Output)

VOLDET provides an external signal to allow the system to determine the CPU input power level (3 V or 5 V). For the Enhanced Am486DX microprocessors, the pin ties internally to VSS.

WB/WT
Write-Back/Write-Through (Input)

If the processor samples WB/WT High at RESET, the processor is configured in Write-back mode. All subsequent cache line fills then sample WB/WT on the same clock edge in which the processor finds either RDY or the first BRDY of a burst transfer, to determine whether the cache line is designated as write-back or write-through. If the signal is Low on the first BRDY or RDY, the cache line is write-through. If the signal is High, the cache line is write-back. If WB/WT is sampled Low at RESET, all cache line fills are write-through. WB/WT has an internal weak pull-down.

W/R
Write/Read (Output)

A High output indicates a write cycle. A Low output indicates a read cycle.

Note: The Enhanced Am486DX microprocessors do not use the VCC5 pin used by some 3-V, 486-based processors. The corresponding pin on the Enhanced Am486DX microprocessors is an Internal No Connect (INC).

3 FUNCTIONAL DESCRIPTION

3.1 Overview

The Enhanced Am486DX microprocessors use a 32-bit architecture with on-chip memory management and cache memory units. The instruction set includes the complete 486 microprocessor instruction set along with extensions to serve the new extended applications. All applications written for the 486 microprocessor and previous members of the x86 architectural family can run on the Enhanced Am486DX microprocessors without modification.

The on-chip Memory Management Unit (MMU) is completely compatible with the 486 MMU. The MMU includes a segmentation unit and a paging unit. Segmentation allows management of the logical address space by providing easy data and code relocatability and efficient sharing of global resources. The paging mechanism operates beneath segmentation and is transparent to the segmentation process. Paging is optional and can be disabled by system software. Each segment can be divided into one or more 4-Kbyte pages. To implement a virtual memory system, the Enhanced Am486DX microprocessors support full restartability for all page and segment faults.

3.2 Memory

Memory is organized into one or more variable length segments, each up to 4 Gbytes (2^32 bytes). A segment can have attributes associated with it, including its location, size, type (i.e., stack, code, or data), and protection characteristics. Each task on a microprocessor can have a maximum of 16,381 segments, each up to 4 Gbytes. Thus, each task has a maximum of 64 Tbytes of virtual memory.

The segmentation unit provides four levels of protection for isolating and protecting applications and the operating system from each other. The hardware-enforced protection allows high-integrity system designs.

3.3 Modes of Operation

The Enhanced Am486DX microprocessors have four modes of operation: Real Address mode (Real mode), Virtual 8086 Address mode (Virtual mode), Protected Address mode (Protected mode), and System Management mode (SMM).

3.3.1 Real Mode

In Real mode, the Enhanced Am486DX microprocessors operate as a fast 8086. Real mode is required primarily to set up the processor for Protected mode operation.

3.3.2 Virtual Mode

In Virtual mode, the processor appears to be in Real mode, but can use the extended memory accessing of Protected mode.

3.3.3 Protected Mode

Protected mode provides access to the sophisticated memory management, paging, and privilege capabilities of the processor.

3.3.4 System Management Mode

SMM is a special operating mode described in detail in Section 6, beginning on page 38.

3.4 Cache Architecture

The Enhanced Am486DX microprocessors support a superset architecture of the standard 486DX cache implementation. This architectural enhancement improves not only CPU performance, but total system performance.

3.4.1 Write-Through Cache

The standard 486DX write-through cache architecture is characterized by the following:
- External read accesses are placed in the cache if they meet proper caching requirements.
- Subsequent reads to the data in the cache are made if the address is stored in the cache tag array.
- Write operations to a valid address in the cache are updated in the cache and to external memory. This data writing technique is called write-through.

The write-through cache implementation forces all writes to flow through to the external bus and back to main memory. Consequently, the write-through cache generates a large amount of bus traffic on the external data bus.

3.4.2 Write-Back Cache

The microprocessor write-back cache architecture is characterized by the following:
- External read accesses are placed in the cache if they meet proper caching requirements.
- Subsequent reads to the data in the cache are made if the address is stored in the cache tag array.
- Write operations to a valid address in the cache that is in the write-through (shared) state are updated in the cache and to external memory.
- Write operations to a valid address in the cache that is in the write-back (exclusive or modified) state are updated only in the cache. External memory is not updated at the time of the cache update.
- Modified data is written back to external memory when the modified cache line is being replaced with a new cache line (copy-back operation) or an external bus master has snooped a modified cache line (write-back).

The write-back cache feature significantly reduces the amount of bus traffic on the external bus; however, it also adds complexity to the system design to maintain memory coherency. The write-back cache requires enhanced system support because the cache may contain data that is not identical to data in main memory at the same address location.

3.5 Write-Back Cache Protocol

The Enhanced Am486DX microprocessor write-back cache coherency protocol reduces bus activity while maintaining data coherency in a multimaster environment. The cache coherency protocol offers the following advantages:
- No unnecessary bus traffic. The protocol dynamically identifies shared data to the granularity of a cache line. This dynamic identification ensures that the traffic on the external bus is the minimum necessary to ensure coherency.
- Software-transparent. Because the protocol gives the appearance of a single, unified memory, software does not have to maintain coherency or identify shared data. Application software developed for a system without a cache can run without modification. Software support is required only in the operating system to identify non-cacheable data regions.

The Enhanced Am486DX microprocessors implement a modified MESI protocol on systems with write-back cache support. MESI allows a cache line to exist in four states: modified, exclusive, shared, and invalid. The Enhanced Am486DX microprocessors allocate memory in the cache due to a read miss. Write allocation is not implemented. To maintain coherency between cache and main memory, the MESI protocol has the following characteristics:
- The system memory is always updated during a snoop when a modified line is hit.
- If a modified line is hit by another master during snooping, the master is forced off the bus and the snooped cache writes back the modified line to the system memory. After the snooped cache completes the write, the forced-off bus master restarts the access and reads the modified data from memory.

3.5.1 Cache Line Overview

To implement the Enhanced Am486DX microprocessor cache-coherency protocol, each tag entry is expanded to 2 bits: S1 and S0. Each tag entry is associated with a cache line. Table 4 shows the cache line organization.

Table 4. Cache Line Organization

Data Words (32 Bits)    Address Tag and Status
D0                      Address Tag, S1, S0
D1
D2
D3

3.5.2 Line Status and Line State

A cache line can occupy one of four legal states as indicated by bits S0 and S1. The line states are shown in Table 5. Each line in the cache is in one of these states. The state transition is induced either by the processor or during snooping from an external bus master.

Table 5. Legal Cache Line States

S1    S0    Line State
0     0     Invalid
0     1     Exclusive
1     0     Modified
1     1     Shared

3.5.2.1 Invalid

An invalid cache line does not contain valid data for any external memory location. An invalid line does not participate in the cache coherency protocol.

3.5.2.2 Exclusive

An exclusive line contains valid data for some external memory location. The data exactly matches the data in the external memory location.

3.5.2.3 Shared

A shared line contains valid data for an external memory location, the data is shared by another cache, and the shared data matches the data in the external memory exactly; or the cache line is in Write-through mode.
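The S1/S0 encoding of Table 5 can be sketched as a small lookup. This is an illustrative model only: the names `LineState` and `decode_status` are assumptions made for the example and do not come from the datasheet.

```python
from enum import Enum


class LineState(Enum):
    """Line states keyed by the (S1, S0) status bits of Table 5."""
    INVALID = (0, 0)
    EXCLUSIVE = (0, 1)
    MODIFIED = (1, 0)
    SHARED = (1, 1)


def decode_status(s1: int, s0: int) -> LineState:
    """Map the two tag status bits to the corresponding line state."""
    return LineState((s1, s0))


# Example: status bits S1 = 1, S0 = 0 mark a modified line.
print(decode_status(1, 0).name)
```

All four bit combinations are legal, so the lookup needs no error path for unused encodings.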
3.5.2.4 Modified

A modified line contains valid data for an external memory location. However, the data does not match the data in the external location because the processor has modified the data since it was loaded from the external memory. A cache that contains a modified line is responsible for ensuring that the data is properly maintained. This means that in the case of an external access to that line from another external bus master, the modified line is first written back to the external memory before the other external bus master can complete its access. Table 6 shows the MESI cache line states and the corresponding availability of data.

Table 6. MESI Cache Line Status

Situation                      Modified                 Exclusive                Shared                         Invalid
Line valid?                    Yes                      Yes                      Yes                            No
External memory is...          out-of-date              valid                    valid                          status unknown
A write to this cache line...  does not go to the bus   does not go to the bus   goes to the bus and updates    goes directly to the bus

3.6 Cache Replacement Description

The cache line replacement algorithm uses the standard Am486 CPU pseudo LRU (Least-Recently Used) strategy. When a line must be placed in the internal cache, the microprocessor first checks to see if there is an invalid line available in the set. If no invalid line is available, the LRU algorithm replaces the least-recently used cache line in the four-way set with the new cache line. If the cache line for replacement is modified, the modified cache line is placed into the copy-back buffer for copying back to external memory, and the new cache line is placed into the cache. This copy-back ensures that the external memory is updated with the modified data upon replacement.

3.7 Memory Configuration

In computer systems, memory regions require specific caching and memory write methods. For example, some memory regions are non-cacheable while others are cacheable but are write-through. To allow maximum memory configuration, the microprocessor supports specific memory region requirements. All bus masters, such as DMA controllers, must reflect all data transfers on the microprocessor local bus so that the microprocessor can respond appropriately.

3.7.1 Cacheability

The Enhanced Am486DX microprocessors cache data based on the state of the CD and NW bits in CR0, in conjunction with the KEN signal, at the time of a burst read access from memory. If the WB/WT signal is Low during the first BRDY, KEN meets the standard setup and hold requirements and the four 32-bit doublewords are still placed in the cache. However, all cacheable accesses in this mode are considered write-through. When the WB/WT is High during the first BRDY, the entire four 32-bit doubleword transfer is considered write-back.

Note: The CD bit in CR0 enables (0) or disables (1) the internal cache. The NW bit in CR0 enables (0) or disables (1) write-through and snooping cycles. RESET sets CD and NW to 1. Unlike RESET, however, SRESET does not invalidate the cache nor does it modify the values of CD and NW in CR0.

3.7.2 Write-Through/Write-Back

If the CPU is operating in Write-back mode (i.e., the WB/WT pin was sampled High at RESET), the WB/WT pin indicates whether an individual write access is executed as write-through or write-back. The Enhanced Am486DX microprocessors do this on an access-by-access basis. Once the cache line is in the cache, the STATUS bit is tested each time the processor writes to the cache line or a tag compare results in a hit during Bus-watching mode. If the WB/WT signal is Low during the first BRDY of the cache line read access, the cache line is considered a write-through access. Therefore, all writes to this location in the cache are reflected on the external bus, even if the cache line is write protected.

3.8 Cache Functionality in Write-Back Mode

The description of cache functionality in Write-back mode is divided into two sections: processor-initiated cache functions and snooping actions.

3.8.1 Processor-Initiated Cache Functions and State Transitions

The Enhanced Am486DX microprocessors contain two new buffers for use with the MESI protocol support: the copy-back buffer and the write-back buffer. The processor uses the copy-back buffer for cache line replacement of modified lines. The write-back buffer is used when an external bus master hits a modified line in the cache during a snoop operation and the cache line is designated for write-back to main memory. Each buffer is four doublewords in size. Figure 1 shows a diagram of the state transitions induced by the local processor.

[Figure 1. Processor-Induced Line Transitions in Write-Back Mode. Transitions shown: Invalid to Exclusive on Read_Miss with (WB/WT = 1) · (PWT = 0); Invalid to Shared on Read_Miss with (WB/WT = 0) + (PWT = 1); Exclusive remains Exclusive on Read_Hit; Exclusive to Modified on Write_Hit; Shared remains Shared on Write_Hit + Read_Hit; Modified remains Modified on Read_Hit + Write_Hit. Note: Write_Hit on a shared line generates an external bus cycle.]

When a read miss occurs, the line selected for replacement remains in the modified state until overwritten. A copy of the modified line is sent to the copy-back buffer to be written back after replacement. When reload has successfully completed, the line is set either to the exclusive or the shared state, depending on the state of the PWT and WB/WT signals.

If the PWT signal is 0, the external WB/WT signal determines the new state of the line. If the WB/WT signal was asserted to 1 during reload, the line transitions to the exclusive state. If the WB/WT signal was 0, the line transitions to the shared state. If the PWT signal is 1, it overrides the WB/WT signal, forcing the line into the shared state. Therefore, if paging is enabled, the software-programmed PWT bit can override the hardware signal WB/WT.

Until the line is reallocated, a write is the only processor action that can change the state of the line. If the write occurs to a line in the exclusive state, the data is simply written into the cache and the line state is changed to modified. The modified state indicates that the contents of the line require copy-back to the main memory before the line is reallocated.

If the write occurs to a line in the shared state, the cache performs a write of the data on the external bus to update the external memory. The line remains in the shared state until it is replaced with a new cache line or until it is flushed. In the modified state, the processor continues to write the line without any further external actions or state transitions.

If the PWT or PCD bits are changed for a specified memory location, the tag bits in the cache are assumed to be correct. To avoid memory inconsistencies with respect to cacheability and write status, a cache copy-back and invalidation should be invoked either by using the WBINVD instruction or asserting the FLUSH signal.

3.8.2 Snooping Actions and State Transitions

To maintain cache coherency, the CPU must allow snooping by the current bus master. The bus master initiates a snoop cycle to check whether an address is cached in the internal cache of the microprocessor. A snoop cycle differs from any other cycle in that it is initiated externally to the microprocessor, and the signal for beginning the cycle is EADS instead of ADS. The address bus of the microprocessor is bidirectional to allow the address of the snoop to be driven by the system. A snoop access can begin during any hold state:
- While HOLD and HLDA are asserted
- While BOFF is asserted
- While AHOLD is asserted

In the clock in which EADS is asserted, the microprocessor samples the INV input to qualify the type of inquiry. INV specifies whether the line (if found) must be invalidated (i.e., the MESI status changes to Invalid or I).
A line is invalidated if the snoop access was generated due to a write of another bus master. This is indicated by INV set to 1. In the case of a read, the line does not have to be invalidated, which is indicated by INV set to 0.

The core system logic can generate EADS by watching the ADS from the current bus master, and INV by watching the W/R signal. The microprocessor compares the address of the snoop request with addresses of lines in the cache and of any line in the copy-back buffer waiting to be transferred on the bus. It does not, however, compare with the address of write-miss data in the write buffers. Two clock cycles after sampling EADS, the microprocessor drives the results of the snoop on the HITM pin. If HITM is active, the line was found in the modified state; if inactive, the line was in the exclusive or shared state, or was not found.

Figure 2 shows a diagram of the state transitions induced by snooping accesses.

[Figure 2. Snooping State Transitions. States shown: Invalid, Exclusive, Shared, Modified. Transition labels: (EADS = 0 * INV = 1) + FLUSH = 0, leading to Invalid; EADS = 0 * INV = 0 * FLUSH = 1, leading to Shared. Transitions out of Modified carry the additional label (HITM asserted + write-back).]

3.8.2.1 Difference Between Snooping Access Cases

Snooping accesses are external accesses to the microprocessor. As described earlier, the snooping logic has a set of signals independent from the processor-related signals. Those signals are:
- EADS
- INV
- HITM

In addition to these signals, the address bus is required as an input. This is achieved by setting AHOLD, HOLD, or BOFF active. Snooping can occur in parallel with a processor-initiated access that has already been started. The two accesses depend on each other only when a modified line is written back. In this case, the snoop requires the use of the cycle control signals and the data bus. The following sections describe the scenarios for the HOLD, AHOLD, and BOFF implementations.

3.8.2.2 HOLD Bus Arbitration Implementation

The HOLD/HLDA bus arbitration scheme is used primarily in systems where all memory transfers are seen by the microprocessor. The HOLD/HLDA bus arbitration scheme permits simple write-back cache design while maintaining a relatively high performing system. Figure 3 shows a typical system block diagram for HOLD/HLDA bus arbitration.

Note: To maintain proper system timing, the HOLD signal must remain active for one clock cycle after HITM transitions active. Deassertion of HOLD in the same clock cycle as HITM assertion may lead to unpredictable processor behavior.

[Figure 3. Typical System Block Diagram for HOLD/HLDA Bus Arbitration. Blocks shown: CPU, L2 Cache, DRAM, I/O Bus Interface, Local Bus Peripheral, Slow Peripheral, connected by the Address Bus and Data Bus.]

3.8.2.2.1 Processor-Induced Bus Cycles

In the following scenarios, read accesses are assumed to be cache line fills. The cases also assume that the core system logic does not return BRDY or RDY until HITM is sampled. The addition of wait states follows the standard 486 bus protocol. For demonstration purposes, only the zero wait state approach is shown. Table 7 explains the key to switching waveforms.

Table 7. Key to Switching Waveforms

Each row corresponds to one waveform symbol.

Inputs                              Outputs
Must be steady                      Will be steady
May change from H to L              Will change from H to L
May change from L to H              Will change from L to H
Don't care; any change permitted    Changing; state unknown
Does not apply                      Center line is High-impedance "Off" state

3.8.2.2.2 External Read

Scenario: The data resides in external memory (see Figure 4).

Step 1 The processor starts the external read access by asserting ADS = 0 and W/R = 0.

Step 2 WB/WT is sampled in the same cycle as BRDY. If WB/WT = 1, the data resides in a write-back cacheable memory location.

Step 3 The processor completes its burst read and asserts BLAST.

[Figure 4. External Read. Signals shown: CLK, ADR (n, n+4, n+8, n+12), M/IO, W/R, ADS, BLAST, BRDY, Data (n, n+4, n+8, n+12), KEN, WB/WT, BOFF. Note: The circled numbers in this figure represent the steps in section 3.8.2.2.2.]

3.8.2.2.3 External Write

Scenario: The data is written to the external memory (see Figure 5).

Step 1 The processor starts the external write access by asserting ADS = 0 and W/R = 1.

Step 2 The processor completes its write to the core system logic.

[Figure 5. External Write. Signals shown: CLK, ADR (n), M/IO, W/R, ADS, BLAST, BRDY, Data (n), WB/WT, BOFF. Note: The circled numbers in this figure represent the steps in section 3.8.2.2.3.]

3.8.2.2.4 HOLD/HLDA External Access Timing

In systems with two or more bus masters, each bus master is equipped with individual HOLD and HLDA control signals. These signals are then centralized to the core system logic that controls individual bus masters, depending on bus request signals and the HITM signal.

3.8.3 External Bus Master Snooping Actions

The following scenarios describe the snooping actions of an external bus master.

3.8.3.1 Snoop Miss

Scenario: A snoop of the on-chip cache does not hit a line, as shown in Figure 6.

Step 1 The microprocessor is placed in Snooping mode with HOLD. HLDA must be High for a minimum of one clock cycle before EADS assertion. In the fastest case, this means that HOLD was asserted one clock cycle before the HLDA response.

Step 2 EADS and INV are applied to the microprocessor. If INV is 0, a read access caused the snooping cycle. If INV is 1, a write access caused the snooping cycle.

Step 3 Two clock cycles after EADS is asserted, HITM becomes valid. Because the addressed line is not in the snooping cache, HITM is 1.
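The processor-induced transitions of Section 3.8.1 and the snoop outcomes of Section 3.8.2 can be summarized in a small behavioral model. This is a minimal sketch, not the device's implementation: the names `Line`, `processor_access`, and `snoop` are hypothetical, the access is assumed to be cacheable (KEN active, cache enabled), and a snoop hit on a modified line collapses the write-back and the retried snoop into a single step.

```python
from enum import Enum


class Line(Enum):
    INVALID = "I"
    EXCLUSIVE = "E"
    SHARED = "S"
    MODIFIED = "M"


def processor_access(state, op, wb_wt=1, pwt=0):
    """Processor-induced transition. Returns (next_state, external_bus_cycle)."""
    if state is Line.INVALID:
        if op == "read":
            # Read miss: the line fill allocates the line. PWT = 1 overrides WB/WT.
            exclusive = wb_wt == 1 and pwt == 0
            return (Line.EXCLUSIVE if exclusive else Line.SHARED), True
        return Line.INVALID, True      # write miss: no write allocation
    if op == "read":
        return state, False            # read hit is served from the cache
    if state is Line.SHARED:
        return Line.SHARED, True       # write hit is written through to memory
    return Line.MODIFIED, False        # exclusive or modified: write stays in cache


def snoop(state, inv):
    """Snoop with EADS asserted. Returns (next_state, hitm_pin, write_back).

    hitm_pin is the pin level: 0 = asserted (active Low), 1 = inactive.
    """
    if state is Line.INVALID:
        return Line.INVALID, 1, False  # snoop miss: HITM stays at 1
    write_back = state is Line.MODIFIED
    hitm_pin = 0 if write_back else 1
    next_state = Line.INVALID if inv == 1 else Line.SHARED
    return next_state, hitm_pin, write_back


# A line filled with WB/WT = 1, then written, then snooped by a reading master.
state, _ = processor_access(Line.INVALID, "read", wb_wt=1, pwt=0)
state, _ = processor_access(state, "write")
print(state.name, snoop(state, inv=0))
```

Keeping HITM as a pin level rather than a boolean mirrors the scenario steps, where a snoop miss is reported as HITM = 1.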
3.8.3.2 Snoop Hit to a Non-Modified Line Scenario: The snoop of the on-chip cache hits a line, and the line is not modified (see Figure 7). Step 1 The microprocessor is placed in Snooping mode with HOLD. HLDA must be High for a minimum of one clock cycle before EADS assertion. Step 2 EADS and INV are applied to the microprocessor. If INV is 0, a read access caused the snooping cycle. If INV is 1, a write access caused the snooping cycle. Step 3 Two clock cycles after EADS is asserted, HITM becomes valid. In this case, HITM is 1. 3.8.4 Write-Back Case Scenario: Write-back accesses are always burst writes with a length of four 32-bit words. For burst writes, the burst always starts with the microprocessor line offset at 0. HOLD must be deasserted before the write-back can be performed (see Figure 8). Step 1 HOLD places the microprocessor in Snooping mode. HLDA must be High for a minimum of one clock cycle before EADS assertion. In the fastest case, this means that HOLD asserts one clock cycle before the HLDA response. Step 2 EADS and INV are asserted. If INV is 0, snooping is caused by a read access. If INV is 1, snooping is caused by a write access. EADS is not sampled again until after the modified line is written back to memory. It is detected again as early as in Step 11. Step 3 Two clock cycles after EADS is asserted, HITM becomes valid, and is 0 because the line is modified. CLK valid ADR valid INV EADS HITM HOLD HLDA Note: The circled numbers in this figure represent the steps in section 4.8.3.1. Figure 6. Snoop of On-Chip Cache That Does Not Hit a Line Enhanced Am486DX Am486DX Microprocessor Family 23 P R E L I M I N A R Y CLK ADR valid INV valid EADS HITM HOLD HLDA Note: The circled numbers in this figure represent the steps in section 4.8.3.2. Figure 7. 
Snoop of On-Chip Cache That Hits a Non-Modified Line CLK ADR M/IO CACHE W/R n n n+4 n+8 n+1 n floating/three-stated floating/three-stated 5 ADS BLAST 11 6 BRDY INV valid valid 2 EADS 3 HITM 7 HOLD 1 9 HLDA Data n n+4 n+8 n+12 External bus master's BOFF signal Note: The circled numbers in this figure represent the steps in section 4.8.4. Figure 8. Snoop That Hits a Modified Line (Write-Back) 24 8 4 Enhanced Am486DX Am486DX Microprocessor Family 10 P R E L I M I N A R Y Step 9 One cycle after sampling HOLD High, the microprocessor transitions HLDA transitions to 1, acknowledging the HOLD request. Step 4 In the next clock, the core system logic deasserts the HOLD signal in response to the HITM = 0 signal. The core system logic backs off the current bus master at the same time so that the microprocessor can access the bus. HOLD can be reasserted immediately after ADS is asserted for burst cycles. Step 10 The core system logic removes hold-off control to the external bus master. This allows the external bus master to immediately retry the aborted access. ADS is strobed Low, which generates EADS Low in the same clock cycle. Step 5 The snooping cache starts its write-back of the modified line by asserting ADS = 0, CACHE = 0, and W/R = 1. The write access is a burst write. The number of clock cycles between deasserting HOLD to the snooping cache and first asserting ADS for the write-back cycles can vary. In this example, it is one clock cycle, which is the shortest possible time. Regardless of the number of clock cycles, the start of the writeback is seen by ADS going Low. Step 11 The bus master restarts the aborted access. EADS and INV are applied to the microprocessor as before. This starts another snoop cycle. The status of the addressed line is now either shared (INV = 0) or is changed to invalid (INV = 1). 
3.8.5 Write-Back and Pending Access

Scenario: The following occurs when, in addition to the write-back operation, other bus accesses initiated by the processor associated with the snooped cache are pending. The microprocessor gives the write-back access priority. This implies that if HOLD is deasserted, the microprocessor first writes back the modified line (see Figure 9).

[Timing diagram. Signals: CLK, ADR, M/IO, CACHE, W/R, ADS, BLAST, BRDY, INV, EADS, HITM, HOLD, HLDA, Data, and the external bus master's BOFF signal. Write-back transfers appear at addresses n, n+4, n+8, and n+12; the address bus is floating/three-stated outside the write-back.]
Note: The circled numbers in this figure represent the steps in section 4.8.5.
Figure 9. Write-Back and Pending Access

Step 1 HOLD places the microprocessor in Snooping mode. HLDA must be High for a minimum of one clock cycle before EADS assertion. In the fastest case, this means that HOLD asserts one clock cycle before the HLDA response.

Step 2 EADS and INV are asserted. If INV is 0, snooping is caused by a read access. If INV is 1, snooping is caused by a write access. EADS is not sampled again until after the modified line is written back to memory. It is detected again as early as in Step 11.

Step 3 Two clock cycles after EADS is asserted, HITM becomes valid, and is 0 because the line is modified.

Step 4 In the next clock, the core system logic deasserts the HOLD signal in response to HITM = 0. The core system logic backs off the current bus master at the same time so that the microprocessor can access the bus. HOLD can be reasserted immediately after ADS is asserted for burst cycles.

Step 5 The snooping cache starts its write-back of the modified line by asserting ADS = 0, CACHE = 0, and W/R = 1. The write access is a burst write. The number of clock cycles between deasserting HOLD to the snooping cache and first asserting ADS for the write-back cycles can vary. In this example, it is one clock cycle, which is the shortest possible time. Regardless of the number of clock cycles, the start of the write-back is seen by ADS going Low.

Step 6 The write-back access is finished when BLAST and BRDY both are 0.

Step 7 In the clock cycle after the final write-back access, the processor drives HITM back to 1.

Step 8 HOLD is sampled by the microprocessor.

Step 9 A minimum of one clock cycle after the completion of the pending access, HLDA transitions to 1, acknowledging the HOLD request.

Step 10 The core system logic removes hold-off control to the external bus master. This allows the external bus master to immediately retry the aborted access. ADS is strobed Low, which generates EADS Low in the same clock cycle.

Step 11 The bus master restarts the aborted access. EADS and INV are applied to the microprocessor as before. This starts another snoop cycle. The status of the addressed line is now either shared (INV = 0) or is changed to invalid (INV = 1).

3.8.5.1 HOLD/HLDA Write-Back Design Considerations

When designing a write-back cache system that uses HOLD/HLDA as the bus arbitration method, the following considerations must be observed to ensure proper operation (see Figure 10).

Step 1 During a snoop to the on-chip cache that hits a modified cache line, the HOLD signal cannot be deasserted to the microprocessor until the next clock cycle after HITM transitions active.

Step 2 After the write-back has commenced, the HOLD signal should be asserted no earlier than the next clock cycle after ADS goes active, and no later than the final BRDY of the last write. Asserting HOLD later than the final BRDY may allow the microprocessor to begin a pending access.

[Timing diagram. Signals: CLK, ADS, BLAST, BRDY, HOLD, HITM, HLDA. Annotated interval: "Valid Hold Assertion".]
Figure 10.
Valid HOLD Assertion During Write-Back

Step 3 If RDY is returned instead of BRDY during a write-back, the HOLD signal can be reasserted at any time from one clock cycle after ADS goes active in the first transfer up to the final transfer when RDY is asserted. Asserting RDY instead of BRDY does not break the write-back cycle if HOLD is asserted. The processor ignores HOLD until the final write cycle of the write-back.

3.8.5.2 AHOLD Bus Arbitration Implementation

AHOLD is often used as the control mechanism in systems where an external second-level cache is closely coupled to the microprocessor. This tight coupling allows the microprocessor to operate with the least amount of stalling from external snooping of the on-chip cache. Additionally, snooping of the cache can be performed concurrently with an access by the microprocessor. This feature further improves the performance of the total system (see Figure 11).

Note: To maintain proper system timing, the AHOLD signal must remain active for one clock cycle after HITM transitions active. Deassertion of AHOLD in the same clock cycle as HITM assertion may lead to unpredictable processor behavior.

[Block diagram. Blocks: CPU, L2 Cache, DRAM, I/O Bus Interface, Slow Peripheral, connected by address and data buses.]
Figure 11. Closely Coupled Cache Block Diagram

The following sections describe the snooping scenarios for the AHOLD implementation.

3.8.5.3 Normal Write-Back

Scenario: This scenario assumes that a processor-initiated access has already started and that the external logic can finish that access even without the address being applied after the first clock cycle. Therefore, a snooping access with AHOLD can be done in parallel. In this case, the processor-initiated access is finished first, then the write-back is executed (see Figure 12). The sequence is as follows:

Step 1 The processor initiates an external, simple, non-cacheable read access, strobing ADS = 0 and W/R = 0. The address is driven from the CPU.

Step 2 In the same cycle, AHOLD is asserted to indicate the start of snooping. The address bus floats and becomes an input in the next clock cycle.

Step 3 During the next clock cycles, the BRDY or RDY signal is not strobed Low. Therefore, the processor-initiated access is not finished.

Step 4 Two clock cycles after AHOLD is asserted, the EADS signal is activated to start an actual snooping cycle, and INV is valid. If INV is 0, a read access caused the snooping cycle. If INV is 1, a write access caused the snooping cycle. Additional EADS assertions are ignored because of the hit to a modified line; EADS is detected again after HITM goes inactive.

Step 5 Two clock cycles after EADS is asserted, the snooping signal HITM becomes valid. The line is modified; therefore, HITM is 0.

Step 6 In this cycle, the processor-initiated access is finished.

Step 7 Two clock cycles after the end of the processor-initiated access, the cache immediately starts writing back the modified line. This is indicated by ADS = 0 and W/R = 1. Note that AHOLD is still active and the address bus is still an input. However, the write-back access can be executed without any address, because the corresponding address must have been on the bus when EADS was strobed. Therefore, the core system logic must latch the write-back address with EADS so that it is available later. This is required only if AHOLD is not removed when HITM becomes 0. Otherwise, the microprocessor puts the address of the write-back onto the address bus.

Step 8 As an example, AHOLD is now removed. In the next clock cycle, the current address of the write-back access is driven onto the address bus.

Step 9 The write-back access is finished when BLAST and BRDY both transition to 0.

Step 10 In the clock cycle after the final write-back access, the snooping cache drives HITM back to 1.
The status of the snooped and written-back line is now either shared (INV = 0) or is changed to invalid (INV = 1).

[Timing diagram. Signals: CLK, ADR (annotated "from CPU" and "to CPU"), M/IO, CACHE, W/R, ADS, BLAST, BRDY, AHOLD, INV, EADS, HITM, Data (a read followed by write-back transfers W n, W n+4, W n+8, W n+C).]
Note: The circled numbers in this figure represent the steps in section 4.8.5.3.
Figure 12. Snoop Hit Cycle with Write-Back

3.8.6 Reordering of Write-Backs (AHOLD) with BOFF

As seen previously, the Bus Interface Unit (BIU) completes the processor-initiated access first if the snooping access occurs after the start of the processor-initiated access. If the HITM signal occurs one clock cycle before the ADS = 0 of the processor-initiated access, the write-back receives priority and is executed first. However, if the snooping access is executed after the start of the processor-initiated access, the access order can still be changed. The BOFF signal delays outstanding processor-initiated cycles so that a snoop write-back can occur immediately (see Figure 13).

Scenario: If there are outstanding processor-initiated cycles on the bus, asserting BOFF clears the bus pipeline. If a snoop causes HITM to be asserted, the first cycle issued by the microprocessor after deassertion of BOFF is the write-back cycle. After the write-back cycle, the microprocessor reissues the aborted cycles. This translates into the following sequence:

Step 1 The processor starts a cacheable burst read cycle.

Step 2 One clock cycle later, AHOLD is asserted. This switches the address bus into an input one clock cycle after AHOLD is asserted.

Step 3 Two clock cycles after AHOLD is asserted, the EADS and INV signals are asserted to start the snooping cycle.

Step 4 Two clock cycles after EADS is asserted, HITM becomes valid. The line is modified; therefore, HITM = 0.

Step 5 Note that the processor-initiated access is not completed because BLAST = 1.
Step 6 With HITM going Low, the core system logic asserts BOFF in the next clock cycle to the snooping processor to reorder the access. BOFF overrides BRDY. Therefore, the partial read is not used. It is reread later.

Step 7 One clock cycle later, BOFF is deasserted. The write-back access starts one clock cycle later because BOFF has cleared the bus pipeline.

Step 8 AHOLD is deasserted. In the next clock cycle, the address for the write-back is driven on the address bus.

Step 9 One clock cycle after BOFF is deasserted, the cache immediately starts writing back the modified line. This is indicated by ADS = 0 and W/R = 1.

Step 10 The write-back access is finished when BLAST and BRDY go active (0).

Step 11 The BIU restarts the aborted cache line fill with the previous read. This is indicated by ADS = 0 and W/R = 0.

Step 12 In the same clock cycle, the snooping cache drives HITM back to 1.

Step 13 The previous read is now reread.

[Timing diagram. Signals: CLK, ADR (R1 from CPU, W1 to CPU, don't care, W1 from CPU, W2, W3, W4, R2 from CPU), M/IO, CACHE, W/R, ADS, BLAST, BRDY, BOFF, AHOLD, INV, EADS, HITM, Data (R1, R2, W1, W2, W3, W4).]
Note: The circled numbers in this figure represent the steps in section 4.8.6.
Figure 13. Cycle Reordering with BOFF (Write-Back)

3.8.7 Special Scenarios for AHOLD Snooping

In addition to the previously described scenarios, there are special scenarios regarding the time of the EADS and AHOLD assertion. The final result depends on the time EADS and AHOLD are asserted relative to other processor-initiated operations.

3.8.7.1 Write Cycle Reordering Due to Buffering

Scenario: The MESI cache protocol and the ability to perform and respond to snoop cycles guarantee that writes to the cache are logically equivalent to writes to memory. In particular, the order of read and write operations on cached data is the same as if the operations were on data in memory.
Even non-cached memory read and write requests usually occur on the external bus in the same order that they were issued in the program. For example, when a write miss is followed by a read miss, the write data goes on the bus before the read request is put on the bus. However, the posting of writes in write buffers, coupled with snooping cycles, may cause the order of writes seen on the external bus to differ from the order in which they appear in the program. Consider the following example, which is illustrated in Figure 14. For simplicity, snooping signals that behave in their usual manner are not shown.

[Timing diagram. Signals: CLK, Write Buffer (XXX, then A), Cached Data (B original, then B modified), AHOLD, EADS (with an interval marked "Ignored"), HITM, ADS, BLAST, BRDY, Data (B, B+4, B+8, B+12, then A).]
Note: The circled numbers in this figure represent the steps in section 4.8.7.1.
Figure 14. Write Cycle Reordering Due to Buffering

Step 1 AHOLD is asserted. No further processor-initiated accesses to the external bus can be started. No other access is in progress.

Step 2 The processor writes data A to the cache, resulting in a write miss. Therefore, the data is put into the write buffers, assuming they are not full. No external access can be started because AHOLD is still 1.

Step 3 The next write of the processor hits the cache and the line is non-shared. Therefore, data B is written into the cache. The cache line transitions to the modified state.

Step 4 In the same clock cycle, a snoop request to the same address where data B resides is started because EADS = 0. The snoop hits a modified line. EADS is ignored due to the hit of a modified line, but is detected again as early as in Step 10.

Step 5 Two clock cycles after EADS asserts, HITM becomes valid.

Step 6 Because the processor-initiated access cannot be finished (AHOLD is still 1), the BIU gives priority to a write-back access that does not require the use of the address bus.
Therefore, in the clock cycle, the cache starts the write-back sequence indicated by ADS = 0 and W/R = 1.

Step 7 During the write-back sequence, AHOLD is deasserted.

Step 8 The write-back access is finished when BLAST and BRDY transition to 0.

Step 9 After the last write-back access, the BIU starts writing data A from the write buffers. This is indicated by ADS = 0 and W/R = 1.

Step 10 In the same clock cycle, the snooping cache drives HITM back to 1.

Step 11 The write of data A is finished when BRDY transitions to 0 (with BLAST = 0, because it is a single-word write).

The software write sequence was first data A and then data B. But on the external bus, the data appear first as data B and then as data A. The order of writes is changed. In most cases, it is unnecessary to strictly maintain the ordering of writes. However, some cases (for example, writing to hardware control registers) require writes to be observed externally in the same order as programmed. There are two options to ensure serialization of writes, both of which drive the cache to Write-through mode:

1. Set the PWT bit in the page table entries.
2. Drive the WB/WT signal Low when accessing these memory locations.

Option 1 is an operating-system-level solution not directly implemented by user-level code. Option 2, the hardware solution, is implemented at the system level.

3.8.7.2 BOFF Write-Back Arbitration Implementation

BOFF is used to snoop the on-chip cache in systems where more than one cacheable bus master resides on the microprocessor bus. The BOFF signal forces the microprocessor to relinquish the bus in the following clock cycle, regardless of the type of bus cycle it was performing at the time. Consequently, the use of BOFF as a bus arbitrator should be implemented with care to avoid system problems.

3.8.8 BOFF Design Considerations

As a bus arbitration control mechanism, BOFF takes effect immediately.
BOFF forces the microprocessor to abort an access in the clock cycle after it is asserted. The following design issues must be considered.

3.8.8.1 Cache Line Fills

The microprocessor aborts a cache line fill during a burst read if BOFF is asserted during the access. Upon regaining the bus, the read access commences where it left off when BOFF was recognized. External buffers should take this cycle continuation into consideration if BOFF is allowed to abort burst read cycles.

3.8.8.2 Cache Line Copy-Backs

Similar to the burst read, the burst write can also be aborted at any time with the BOFF signal. Upon regaining access to the bus, the write continues from where it was aborted. External buffers and control logic should take into consideration the necessary control, if any, for burst write continuations.

3.8.8.3 Locked Accesses

Locked bus cycles occur in various forms. Locked accesses occur during read-modify-write operations, interrupt acknowledges, and page table updates. Although asserting BOFF during a locked cycle is permitted, extreme care should be taken to ensure data coherency for semaphore updates and proper data ordering.

3.8.9 BOFF During Write-Back

If BOFF is asserted during a write-back, the processor performing the write-back goes off the bus in the next clock cycle. When BOFF is released, the processor restarts that write-back access from the point at which it was aborted. The behavior is identical to the normal BOFF case that includes the abort and restart behavior.

3.8.10 Snooping Characteristics During a Cache Line Fill

The microprocessor takes responsibility for responding to snoop cycles for a cache line only during the time that the line is actually in the cache or in a copy-back buffer. There are times during the cache line fill cycle and during