3DNow! Technology Manual
21928G/0, March 2000

© 2000 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

Trademarks

AMD, the AMD logo, 3DNow!, AMD Athlon, and combinations thereof, are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc. MMX is a trademark of Intel Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Contents

  Revision History
  3DNow! Technology
    Introduction
    Functionality
    Feature Detection
    Registers
    Data Types
    3DNow! Instruction Formats
    Definitions
    Execution Resources on AMD-K6 Processors
    Task Switching
    Exceptions
    Prefixes
  3DNow! Instruction Set
    FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT,
    PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT,
    PFSUB, PFSUBR, PI2FD, PMULHRW, PREFETCH/PREFETCHW
  Division and Square Root
    Division
    Divide Examples
    Square Root
    Square Root Examples
List of Figures

  3DNow!/MMX Registers
  3DNow! Data Type
  Single-Precision, Floating-Point Data Format
  Integer Data Types
  Register X Unit and Register Y Unit Resources

List of Tables

  3DNow! Technology Exponent Ranges
  3DNow! Floating-Point Instructions
  3DNow! Performance-Enhancement Instructions
  3DNow! and MMX Instruction Exceptions
  Numerical Range for the PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, and PFSUBR Instructions (one table per instruction)
  Summary of PREFETCH Instruction Type Options

Revision History

Revisions run from the initial release in 1998 to March 2000, in this order:

  - Initial Release.
  - Clarified CPUID usage in "Feature Detection".
  - Revised the description of 3DNow! instructions in "Definitions".
  - Revised the function descriptions in the table "3DNow! Floating-Point Instructions".
  - Revised the code example in the PFRSQRT instruction.
  - Changed the exceptions generated by the PREFETCH/PREFETCHW instructions to none, deleted the exception table, and revised the PREFETCHW description.
  - Added the PUNPCKLDQ instruction to the division example (24-bit precision).
  - Added sample code that tests for the presence of extended function 8000_0001h.
  - Clarified the instruction descriptions of PFRCPIT1, PFRCPIT2, and PFRSQIT1.
  - Added the PUNPCKLDQ instruction and clarified comments in the square root examples.
  - Changed a variable in the Newton-Raphson recurrence definitions, and swapped the order of the PFMUL and PUNPCKLDQ instructions in the square root example (24-bit precision).
  - Added references to the AMD Athlon processor throughout the manual.
  - Updated and clarified the PFACC instruction operation description.

Chapter 1: 3DNow! Technology

Introduction

3DNow! technology is a significant innovation to the x86 architecture that drives today's personal computers. 3DNow! technology is a group of new instructions that opens the traditional processing bottlenecks for floating-point-intensive and multimedia applications. With 3DNow! technology, hardware and software applications can implement more powerful solutions to create a more entertaining and productive PC platform. Examples of the type of improvements that 3DNow! technology enables are faster frame rates on high-resolution scenes, much better physical modeling of real-world environments, sharper and more detailed 3D imaging, smoother video playback, and near theater-quality audio.

AMD has taken a leadership role in developing these new instructions that enable exciting new levels of performance and realism. 3DNow! technology was defined and implemented in collaboration with independent software developers, including operating system designers, application developers, and graphics vendors. It is compatible with today's existing x86 software and requires no operating system support, thereby enabling 3DNow! applications to work with all existing operating systems.

3DNow! technology is implemented on the AMD-K6-2, AMD-K6-III, and AMD Athlon processors. The AMD Athlon processor adds 3DNow! technology instructions that support streaming and digital signal processing (DSP) technologies. For more information, see the AMD Extensions to the 3DNow! and MMX Instruction Sets Manual, order# 22466.

Functionality

3DNow! technology instructions are intended to open a major processing bottleneck in a 3D graphics application: floating-point operations. Today's 3D applications are facing limitations due to the fact that only one floating-point execution unit exists in the most advanced x86 processors. The front end of a typical 3D graphics software pipeline performs object physics, geometry transformations, and clipping. These computations are floating-point intensive and often limit the features and functionality of a 3D application.

The source of performance of the 3DNow! instructions originates from their single instruction multiple data (SIMD) implementation. With SIMD, each instruction not only operates on two single-precision, floating-point operands, but the microarchitecture within the processor can execute up to two 3DNow! instructions per clock through two register execution pipelines, which allows for a total of four floating-point operations per clock. In addition, because the 3DNow! instructions use the same floating-point registers as the MMX technology instructions, task switching between MMX and 3DNow! operations is eliminated.

The 3DNow! technology instruction set contains instructions that support SIMD floating-point operations and includes SIMD integer operations, data prefetching, and faster MMX-to-floating-point switching. To improve MPEG decoding, the 3DNow! instructions include a specific SIMD integer instruction created to facilitate pixel-motion compensation. Because media-based software typically operates on large data sets, the processor often needs to wait for this data to be transferred from main memory. The extra time involved with retrieving this data can be avoided by using the 3DNow! instruction called PREFETCH. This instruction can ensure that data is in the level 1 cache when it is needed. To improve the time it takes to switch between MMX and x87 code, the 3DNow! instructions include the FEMMS (fast entry/exit multimedia state) instruction, which eliminates much of the overhead involved with the switch.

The addition of 3DNow! technology expands the capabilities of the AMD family of processors and enables a new generation of enriched user applications.

Feature Detection

To properly identify and use the 3DNow! instructions, the application program must determine if the processor supports them. The CPUID instruction gives programmers the ability to determine the presence of 3DNow! technology on the processor. Software applications must first test to see if the CPUID instruction is supported. For a detailed description of the CPUID instruction, see the AMD Processor Recognition Application Note, order# 20734.

The presence of the CPUID instruction is indicated by the ID bit (21) in the EFLAGS register. If this bit is writable, the CPUID instruction is supported. The following code sample shows how to test for the presence of the CPUID instruction.

        pushfd                      ; save EFLAGS
        pop     eax                 ; store EFLAGS in EAX
        mov     ebx, eax            ; save in EBX for later testing
        xor     eax, 00200000h      ; toggle bit 21
        push    eax                 ; put to stack
        popfd                       ; save changed EAX to EFLAGS
        pushfd                      ; push EFLAGS to TOS
        pop     eax                 ; store EFLAGS in EAX
        cmp     eax, ebx            ; see if bit 21 has changed
        jz      NO_CPUID            ; if no change, no CPUID

Once the software has identified the processor's support for CPUID, it must test for extended functions by executing extended function 8000_0000h (EAX=8000_0000h). The EAX register returns the largest extended function input value defined for the CPUID instruction on the processor. If the value is greater than 8000_0000h, extended functions are supported. The following code sample shows how to test for the presence of extended function 8000_0001h.

        mov     eax, 80000000h      ; query for extended functions
        CPUID                       ; get extended function limit
        cmp     eax, 80000000h      ; is 8000_0001h supported?
        jbe     NO_EXTENDEDMSR      ; if not, 3DNow! tech. not supported

The next step is for the programmer to determine if the 3DNow! instructions are supported. Extended function 8000_0001h of the CPUID instruction provides this information by returning the extended feature bits in the EDX register. If bit 31 in the EDX register is set to 1, 3DNow! instructions are supported. The following code sample shows how to test for 3DNow! instruction support.

        mov     eax, 80000001h      ; setup ext. function 8000_0001h
        CPUID                       ; call the function
        test    edx, 80000000h      ; test bit 31
        jnz     YES_3DNOW           ; 3DNow! technology supported

3DNow! technology is supported if the processor supports all of the above features. Concatenating the code examples above will produce the basis of a detection software routine. A more comprehensive code example is available on the AMD website.

Registers

The complete multimedia units in the AMD processors combine the existing MMX instructions with the new 3DNow! instructions. In addition, by merging 3DNow! with MMX, it becomes possible to write x86 programs containing both integer, MMX, and floating-point graphics instructions with no performance penalty for switching between the multimedia (integer) and 3DNow! (floating-point) units.

The processor implements eight 64-bit 3DNow!/MMX registers. These registers are mapped onto the floating-point registers. As shown in the figure "3DNow!/MMX Registers", the 3DNow! and MMX instructions refer to these registers as mm0 to mm7. Mapping the 3DNow!/MMX registers onto the floating-point register stack enables backwards compatibility for the register saving that must occur as a result of task switching.

[Figure: 3DNow!/MMX Registers (eight 64-bit registers, mm0 to mm7)]

Aliasing the 3DNow!/MMX registers onto the floating-point register stack provides a safe method to introduce 3DNow! technology, because it does not require modifications to existing operating systems. Instead of requiring operating system modifications, 3DNow! technology applications are supported through device drivers, 3DNow! libraries, or Dynamic Link Library (DLL) files. Current operating systems have support for floating-point operations and the floating-point register state. Using the floating-point registers for 3DNow! code is a convenient way of implementing non-intrusive support for 3DNow! instructions.

Every time the processor executes a 3DNow! instruction, all the floating-point register tag bits are set to zero (00b=valid), except for the FEMMS and EMMS instructions, which set all tag bits to one (11b=empty).

Note: Executing the PREFETCH instruction does not change the tag bits.

Data Types

3DNow! technology uses a packed data format. The data is packed in a single, 64-bit 3DNow!/MMX register or quadword memory operand. The figure "3DNow! Data Type" shows the 3DNow! floating-point data type.
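As a rough illustration of the packed format just described, the following C sketch models a 64-bit operand as two 32-bit single-precision values, with one element in bits 31:0 and the other in bits 63:32. The helper names (`pack2f`, `unpack_lo`, `unpack_hi`) are hypothetical and are not part of any AMD interface; the sketch also assumes the host `float` type is IEEE single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: place two single-precision values in one quadword.
   'lo' goes to bits 31:0, 'hi' goes to bits 63:32. */
static uint64_t pack2f(float lo, float hi)
{
    uint32_t l, h;
    memcpy(&l, &lo, sizeof l);   /* reinterpret the float bits safely */
    memcpy(&h, &hi, sizeof h);
    return ((uint64_t)h << 32) | l;
}

/* Hypothetical helpers: read back each 32-bit element as a float. */
static float unpack_lo(uint64_t q)
{
    uint32_t l = (uint32_t)q;
    float f;
    memcpy(&f, &l, sizeof f);
    return f;
}

static float unpack_hi(uint64_t q)
{
    uint32_t h = (uint32_t)(q >> 32);
    float f;
    memcpy(&f, &h, sizeof f);
    return f;
}

/* Usage example: 1.0f is 3F800000h and -2.0f is C0000000h. */
static void check_pack2f(void)
{
    uint64_t q = pack2f(1.0f, -2.0f);
    assert(q == 0xC00000003F800000ULL);
    assert(unpack_lo(q) == 1.0f);
    assert(unpack_hi(q) == -2.0f);
}
```

The sketch only models the memory layout; on real hardware the quadword would be moved into an mm register with an MMX load.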
Each 32-bit half of the 64-bit operand holds an IEEE 32-bit single-precision, floating-point doubleword.

[Figure: 3DNow! Data Type (64 bits, two packed, single-precision, floating-point doublewords)]

The figure "Single-Precision, Floating-Point Data Format" shows the IEEE 32-bit, single-precision, floating-point format. The value X of a 32-bit, single-precision, floating-point doubleword with sign bit S is defined as follows:

  Biased Exponent            Value
  Biased Exponent = 0        $X = (-1)^S \times 0$
  0 < Biased Exponent < FFh  $X = (-1)^S \times 2^{(\text{Biased Exponent} - 127)} \times 1.\text{Significand}$
  Biased Exponent = FFh      X = Undefined

[Figure: Single-Precision, Floating-Point Data Format (sign, biased exponent, significand)]

The figure "Integer Data Types" shows the formats of the integer data types.

[Figure: Integer Data Types (64 bits as eight packed bytes, four packed words, two packed doublewords, or one quadword)]

3DNow! Instruction Formats

The format of 3DNow! instruction encodings is based on the conventional x86 modR/M instruction format and is similar to the format used by the MMX instructions. The assembly language syntax used for the 3DNow! instructions is as follows:

        3DNow! Mnemonic mmreg1, mmreg2/mem64

The destination and source1 operand (mmreg1) must be an MMX register (mm0 to mm7). The source2 operand (mmreg2/mem64) can be either an MMX register or a 64-bit memory value.

The encoding uses the opcode prefix 0Fh followed by a second opcode byte of 0Fh. To differentiate the various 3DNow! instructions, a third instruction suffix byte is used. This suffix byte occupies the same position in a 3DNow! instruction as an imm8 byte would. The opcode format is as follows:

        0Fh 0Fh modR/M [sib] [displacement] 3DNow!_suffix

To determine the values used for modR/M [sib] [displacement], follow the conventional x86 encodings. The 3DNow! suffix is determined by the actual 3DNow! instruction. The 3DNow! suffixes are defined in the table "3DNow! Floating-Point Instructions".

For example, the 3DNow! PFMUL instruction can produce different opcodes, depending on its use. The manual lists the following forms:

        PFMUL mm1, mm2
        PFMUL mm1, [ebx]
        PFMUL mm1, [ebx+10]
        PFMUL mm1, es:[ebx]
        PFMUL mm1, [ebx+eax*4+10]

Two instructions (FEMMS and PREFETCH) use a single opcode prefix 0Fh. The details of the opcodes for these instructions are shown on their respective pages.
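The encoding rule above (0Fh, 0Fh, modR/M, then the suffix byte) can be sketched in C for the simplest case, the register-to-register form. This is a minimal sketch under stated assumptions: the function name `encode_3dnow_rr` is hypothetical, memory operands (sib and displacement bytes) are not handled, and the PFMUL suffix value B4h is taken from AMD's published opcode table rather than from the text as it appears here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: B4h is the PFMUL suffix byte in AMD's published opcode table. */
enum { SUFFIX_PFMUL = 0xB4 };

/* Hypothetical encoder for "op mmDst, mmSrc" (register source only).
   Layout: 0Fh 0Fh modR/M suffix, with mod = 11b, reg = dst, r/m = src.
   Returns the number of bytes written (always 4 for this form). */
static size_t encode_3dnow_rr(uint8_t out[4], unsigned dst, unsigned src,
                              uint8_t suffix)
{
    assert(dst < 8 && src < 8);          /* mm0 to mm7 only */
    out[0] = 0x0F;                       /* opcode prefix */
    out[1] = 0x0F;                       /* second opcode byte */
    out[2] = (uint8_t)(0xC0 | (dst << 3) | src);
    out[3] = suffix;                     /* sits where an imm8 would */
    return 4;
}

/* Usage example: PFMUL mm1, mm2 gives modR/M = C0h | 08h | 02h = CAh. */
static void check_encode_pfmul(void)
{
    uint8_t buf[4];
    size_t n = encode_3dnow_rr(buf, 1, 2, SUFFIX_PFMUL);
    assert(n == 4);
    assert(buf[0] == 0x0F && buf[1] == 0x0F);
    assert(buf[2] == 0xCA && buf[3] == 0xB4);
}
```

Placing the selector byte last is why the suffix can be thought of as an immediate: the decoder handles modR/M, sib and displacement exactly as for any other two-byte opcode before it reads the suffix.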
Definitions

3DNow! technology provides additional instructions to support high-performance 3D graphics and audio processing. 3DNow! instructions are vector instructions that operate on 64-bit registers. 3DNow! instructions are SIMD: each instruction operates on pairs of 32-bit values.

The definitions of the 3DNow! instructions in the instruction set chapter contain designations classifying each instruction as vectored or scalar. Vector instructions operate in parallel on two sets of 32-bit, single-precision, floating-point words. Instructions that are labeled as scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).

The 3DNow! single-precision, floating-point format is compatible with the IEEE-754, single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).

In contrast to the IEEE standard that dictates four rounding modes, 3DNow! technology supports one rounding mode: either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines the rounding mode, which is round-to-nearest on the processors described here. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use the round-to-zero (truncation) mode.

The largest, representable, normal number in magnitude for this precision in hexadecimal has an exponent of FEh and a significand of 7FFFFFh, with a numerical value of $2^{127} \times (2 - 2^{-23})$. All results that overflow above the maximum-representable positive value are saturated to either the maximum-representable normal number or to positive infinity. Similarly, all results that overflow below the minimum-representable negative value are saturated to either this minimum-representable normal number or to negative infinity.

The implementation of 3DNow! technology determines if arithmetic overflow is handled by either properly signed maximum- or minimum-representable normal numbers or by properly signed infinities.
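The field layout described above (1-bit sign, 8-bit biased exponent, 23-bit significand with a hidden integer bit, bias 127) can be checked with a short C sketch. The type and function names (`sp_fields`, `sp_decode`, `sp_value`) are hypothetical, the sketch assumes the host `float` is IEEE single precision, and it only evaluates normal numbers, which are the values 3DNow! technology supports besides zero.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision value. */
typedef struct {
    unsigned sign;        /* 1 bit  */
    unsigned biased_exp;  /* 8 bits */
    uint32_t fraction;    /* 23 stored significand bits */
} sp_fields;

static sp_fields sp_decode(float x)
{
    uint32_t b;
    sp_fields f;
    memcpy(&b, &x, sizeof b);
    f.sign = b >> 31;
    f.biased_exp = (b >> 23) & 0xFFu;
    f.fraction = b & 0x7FFFFFu;
    return f;
}

/* Exact power of two as a double, for small integer exponents. */
static double pow2i(int e)
{
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r *= 0.5;
    return r;
}

/* Value of a normal number: (-1)^S * 2^(E-127) * 1.fraction. */
static double sp_value(sp_fields f)
{
    double significand = 1.0 + (double)f.fraction / 8388608.0; /* 2^23 */
    double v = significand * pow2i((int)f.biased_exp - 127);
    return f.sign ? -v : v;
}

/* Usage example: exponent FEh with significand 7FFFFFh is the largest
   normal number, 2^127 * (2 - 2^-23), which equals FLT_MAX. */
static void check_sp_format(void)
{
    sp_fields one = sp_decode(1.0f);
    sp_fields max = { 0u, 0xFEu, 0x7FFFFFu };
    assert(one.sign == 0 && one.biased_exp == 127 && one.fraction == 0);
    assert(sp_value(max) == (double)FLT_MAX);
}
```

Working in `double` keeps every intermediate exact, so the comparison with `FLT_MAX` does not depend on rounding.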
The AMD-K6 processor generates properly signed maximum- or minimum-representable normal numbers.

Infinities and NaNs are not supported as operands to 3DNow! instructions. The smallest representable normal number in magnitude for this precision in hexadecimal has an exponent of 01h and a significand of 000000h, with a numerical value of $2^{-126}$. Accordingly, all results below this minimum representable value in magnitude are held to zero.

The following table shows the exponent ranges supported by 3DNow! technology.

Table: 3DNow! Technology Exponent Ranges

  Biased Exponent   Description
  FFh               Unsupported
  00h               Zero
  00h < x < FFh     Normal
  01h               $2^{(1-127)}$, lowest possible exponent
  FEh               $2^{(254-127)}$, largest possible exponent

Note: Unsupported numbers can be used as operands. The results of operations with unsupported numbers are undefined.

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions, nor do they set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).

Execution Resources on AMD-K6 Processors

3DNow! instructions are executed in either the register X unit or the register Y unit. One operation can be issued to each register unit each clock cycle, for a maximum issue and execution rate of two 3DNow! operations per cycle. All 3DNow! operations have an execution latency of two clock cycles and are fully pipelined.

Even though 3DNow! execution resources are not duplicated in both register units (for example, there are not two pairs of 3DNow! multipliers, just one shared pair of multipliers), there are no pairing restrictions. When, for example, a 3DNow! multiply operation starts execution in a register unit, that unit grabs and uses the one shared pair of 3DNow! multipliers. Only when actual contention occurs between two 3DNow! operations starting execution at the same time is one of the operations held up for one cycle in its first execution pipe stage while the other proceeds. The delay is never more than one cycle.

For code optimization purposes, 3DNow! operations are grouped into two categories. These categories are based on execution resources and are important when creating properly scheduled code. As long as two 3DNow! operations that start execution simultaneously do not fall into the same category, both operations will start execution without delay.

  - The first category contains the operations of the following 3DNow! instructions: PFADD, PFSUB, PFSUBR, PFACC, PFCMPx, PFMIN, PFMAX, PI2FD, PF2ID, PFRCP, and PFRSQRT.
  - The second category contains the operations of the following 3DNow! instructions: PFMUL, PFRCPIT1, PFRSQIT1, and PFRCPIT2.

Note: 3DNow! add and multiply operations, among other combinations, can execute simultaneously.

Normally, in high-performance 3DNow! code, all of the 3DNow! instructions are properly scheduled apart from each other so as to avoid delays due to execution resource contentions (as well as taking into account dependencies and execution latencies).

For further information regarding code optimization, see the AMD-K6 Processor Code Optimization Application Note, order# 21924. This document provides in-depth discussions of code optimization techniques for the processor. For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order# 22007.

The 3DNow! instructions for the AMD-K6 processors are summarized in the table "3DNow! Floating-Point Instructions". The dedicated and shared execution resources of the register X unit and register Y unit are shown in the figure "Register X Unit and Register Y Unit Resources". The execution resources for some MMX operations, as well as 3DNow! operations, are shared between the two register units. For contention-checking purposes, each box in the figure represents a category of operations that cannot start execution simultaneously. In addition, MMX and 3DNow! multiplies use the same hardware, while MMX and 3DNow! adds and subtracts do not.

The 3DNow! performance-enhancement instructions for the processors are summarized in the table "3DNow! Performance-Enhancement Instructions". The FEMMS instruction does not use any specific execution resource or pipeline. The PREFETCH instruction is operated on in the Load unit.
[Figure: Register X Unit and Register Y Unit Resources. Each register execution pipeline has dedicated resources (integer ALU, integer shift, integer multiply and divide, integer byte operations, integer special registers, integer segment register loads, MMX add/subtract, compare, logical, pack, unpack) and the two pipelines share register resources (MMX shifter; 3DNow! add/subtract, compare, integer conversion, reciprocal and reciprocal square root table lookup; MMX and 3DNow! multiply, reciprocal and reciprocal square root iteration).]

Table: 3DNow! Floating-Point Instructions

  Operation   Function
  PAVGUSB     Packed 8-bit Unsigned Integer Averaging
  PFADD       Packed Floating-Point Addition
  PFSUB       Packed Floating-Point Subtraction
  PFSUBR      Packed Floating-Point Reverse Subtraction
  PFACC       Packed Floating-Point Accumulate
  PFCMPGE     Packed Floating-Point Comparison, Greater or Equal
  PFCMPGT     Packed Floating-Point Comparison, Greater
  PFCMPEQ     Packed Floating-Point Comparison, Equal
  PFMIN       Packed Floating-Point Minimum
  PFMAX       Packed Floating-Point Maximum
  PI2FD       Packed 32-bit Integer to Floating-Point Conversion
  PF2ID       Packed Floating-Point to 32-bit Integer
  PFRCP       Packed Floating-Point Reciprocal Approximation
  PFRSQRT     Packed Floating-Point Reciprocal Square Root Approximation
  PFMUL       Packed Floating-Point Multiplication
  PFRCPIT1    Packed Floating-Point Reciprocal First Iteration Step
  PFRSQIT1    Packed Floating-Point Reciprocal Square Root First Iteration Step
  PFRCPIT2    Packed Floating-Point Reciprocal/Reciprocal Square Root Second Iteration Step
  PMULHRW     Packed 16-bit Integer Multiply with rounding

The table also carries an Opcode Suffix column for each operation.

Table: 3DNow! Performance-Enhancement Instructions

  Operation           Function
  FEMMS               Faster entry/exit of the MMX or floating-point state
  PREFETCH/PREFETCHW  Prefetch at least a 32-byte line into the L1 data cache (Dcache)

The table also carries an Opcode Second Byte column for each operation.

Note: The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by providing a hint to the processor of an intent to modify the cache line.

Task Switching

With respect to task switching, treat the 3DNow! instructions exactly the same as MMX instructions. Operating system design must be taken into account when writing a 3DNow! program. The programmer must know whether the operating system automatically saves the current states when task switching, or if the 3DNow! program has to provide the code to save states.

If a task switch occurs, the Control Register (CR0) Task Switch (TS) bit is set to 1. The processor then generates an interrupt 7 (int 7, Device Not Available) when it encounters the next floating-point, 3DNow!, or MMX instruction, allowing the operating system to save the state of the 3DNow!/MMX/FP registers.

In a multitasking operating system, if there is a task switch when 3DNow!/MMX applications are running with older applications that do not include MMX instructions, the MMX/FP register state is still saved automatically through the int 7 handler.

Exceptions

The following table contains a list of exceptions that 3DNow! instructions can generate. The original table also marks, for each exception, the modes (Real, Virtual 8086, Protected) in which it applies.

Table: 3DNow! and MMX Instruction Exceptions

  Exception                              Description
  Invalid opcode (6)                     The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)               Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                   During instruction execution, the stack segment limit was exceeded.
  General protection (13)                During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                   One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                        A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)  An exception is pending due to the floating-point execution unit.
  Alignment check (17)                   An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

The rules for exceptions are the same for both MMX and 3DNow! instructions. In addition, exception detection and handling is identical for MMX and 3DNow! instructions. None of the exception handlers need modification.

Notes:
  1. An invalid opcode exception (interrupt 6) occurs if a 3DNow! instruction is executed on a processor that does not support 3DNow! instructions.
  2. If a floating-point exception is pending and the processor encounters a 3DNow! instruction, FERR# is asserted and, if CR0.NE = 1, an interrupt 16 is generated. (This is the same for MMX instructions.)

Prefixes

The following prefixes can be used with 3DNow! instructions:

  - The segment override prefixes (2Eh/CS, 36h/SS, 3Eh/DS, 26h/ES, 64h/FS, and 65h/GS) affect 3DNow! instructions that contain a memory operand.
  - The address-size override prefix (67h) affects 3DNow! instructions that contain a memory operand.
  - The operand-size override prefix (66h) is ignored.
  - The LOCK prefix (F0h) triggers an invalid opcode exception (interrupt 6).
  - The REP prefixes (F3h/REP/REPE/REPZ, F2h/REPNE/REPNZ) are ignored.

Chapter 2: 3DNow! Instruction Set

The following 3DNow! instruction definitions are in alphabetical order according to the instruction mnemonics.

FEMMS

  Mnemonic:            FEMMS
  Description:         Faster Enter/Exit of the MMX or floating-point state
  Privilege:           none
  Registers Affected:  MMX
  Flags Affected:      none

  Exceptions Generated (Real, Virtual 8086, and Protected modes):

  Exception                              Description
  Invalid opcode (6)                     The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)               Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
  Floating-point exception pending (16)  An exception is pending due to the floating-point execution unit.
Like the EMMS instruction, the FEMMS instruction can be used to clear the MMX state following the execution of a block of MMX instructions. Because the MMX registers and tag words are shared with the floating-point unit, it is necessary to clear the state before executing floating-point instructions. Unlike the EMMS instruction, the contents of the MMX/floating-point registers are undefined after a FEMMS instruction is executed. Therefore, the FEMMS instruction offers a faster context switch at the end of an MMX routine where the values in the MMX registers are no longer required. FEMMS can also be used prior to executing MMX instructions where the preceding floating-point register values are no longer required, which facilitates faster context switching.

PAVGUSB

  Mnemonic:            PAVGUSB mmreg1, mmreg2/mem64
  Description:         Average of unsigned packed 8-bit values
  Privilege:           None
  Registers Affected:  MMX
  Flags Affected:      None

  Exceptions Generated:

  Exception                              Description
  Invalid opcode (6)                     The emulate instruction bit (EM) of the control register (CR0) is set to 1.
  Device not available (7)               Save the floating-point or MMX state if the task switch bit (TS) of the control register (CR0) is set to 1.
  Stack exception (12)                   During instruction execution, the stack segment limit was exceeded.
  General protection (13)                During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
  Segment overrun (13)                   One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
  Page fault (14)                        A page fault resulted from the execution of the instruction.
  Floating-point exception pending (16)  An exception is pending due to the floating-point execution unit.
  Alignment check (17)                   An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

The PAVGUSB instruction produces the rounded averages of the eight unsigned 8-bit integer values in the source operand (an MMX register or a 64-bit memory location) and the eight corresponding unsigned 8-bit integer values in the destination operand (an MMX register). It does so by adding the source and destination byte values and then adding 001h to the 9-bit intermediate value.
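The rounded byte average (add the two bytes, add 001h to the 9-bit intermediate value, then halve it by shifting right one place) can be modeled in C as follows. This is a behavioral sketch only, not AMD's implementation; the names `avg_round_u8` and `pavgusb_model` are hypothetical, and the 64-bit operands are passed as plain integers rather than MMX registers.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes: (a + b + 1) / 2.
   The sum needs 9 bits, so it is computed in a wider type. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b + 1u;  /* 9-bit intermediate */
    return (uint8_t)(sum >> 1);                     /* divide by 2 */
}

/* Hypothetical model of PAVGUSB on two 64-bit operands:
   each of the eight byte lanes is averaged independently. */
static uint64_t pavgusb_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; ++lane) {
        uint8_t a = (uint8_t)(dst >> (8 * lane));
        uint8_t b = (uint8_t)(src >> (8 * lane));
        result |= (uint64_t)avg_round_u8(a, b) << (8 * lane);
    }
    return result;
}

/* Usage example: FFh with FFh stays FFh, FFh with 00h rounds up to 80h. */
static void check_pavgusb(void)
{
    assert(avg_round_u8(0xFF, 0xFF) == 0xFF);
    assert(avg_round_u8(0xFF, 0x00) == 0x80);
    assert(pavgusb_model(0xFF000000000000FFULL, 0xFF00000000000000ULL)
           == 0xFF00000000000080ULL);
}
```

Adding 1 before the shift is what makes the average round up on ties instead of truncating.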
intermediate value then divided (shifted right place) eight unsigned 8-bit results stored register specified destination operand. PAVGUSB instruction used pixel averaging MPEG-2 motion compensation video scaling operations. Chapter 3DNow!Instruction 3DNow!Technology Manual 21928G/0-March 2000 Functional Illustration PAVGUSB Instruction mmreg2/mem64 byte averaging mmreg1 mmreg1 Indicates value that rounded-up following list explains functional illustration PAVGUSB instruction: rounded byte average FFh. rounded byte average 80h. rounded byte average also 80h. rounded byte average 10h. rounded byte average 01h. rounded byte average 5Ah. rounded byte average 7Fh. rounded byte average A1h. equations byte averaging with rounding follows: mmreg1[63:56] (mmreg1[63:56] mmreg2/mem64[63:56] 01h)/2 mmreg1[55:48] (mmreg1[55:48] mmreg2/mem64[55:48] 01h)/2 mmreg1[47:40] (mmreg1[47:40] mmreg2/mem64[47:40] 01h)/2 mmreg1[39:32] (mmreg1[39:32] mmreg2/mem64[39:32] 01h)/2 mmreg1[31:24] (mmreg1[31:24] mmreg2/mem64[31:24] 01h)/2 mmreg1[23:16] (mmreg1[23:16] mmreg2/mem64[23:16] 01h)/2 mmreg1[15:8] (mmreg1[15:8] mmreg2/mem64[15:8] 01h)/2 mmreg1[7:0] (mmreg1[7:0] mmreg2/mem64[7:0] 01h)/2 3DNow!Instruction Chapter 21928G/0-March 2000 3DNow!Technology Manual PF2ID mnemonic PF2ID mmreg1, mmreg2/mem64 opcode/imm8 description Converts packed floating-point operand packed 32-bit integer Privilege: Registers Affected: Flags Affected: Exceptions Generated: Exception Invalid opcode Device available Stack exception (12) General protection (13) Segment overrun (13) Page fault (14) Floating-point exception pending (16) Alignment check (17) Real none none Virtual 8086 Protected Description emulate instruction (EM) control register (CR0) Save floating-point state task switch (TS) control register (CR0) During instruction execution, stack segment limit exceeded. During instruction execution, effective address segment registers used operand points illegal memory location. 
instruction data operands falls outside address range 00000h 0FFFFh. page fault resulted from execution instruction. exception pending floating-point execution unit. unaligned memory reference resulted from instruction execution, alignment mask (AM) control register (CR0) Protected Mode, inst conver gist conta ining single-precision, floating-point operands 32-bit signed integers using truncation. Table page shows numerical range PF2ID instruction. PF2ID instruction performs following operations: (mmreg2/mem64[31:0] 231) THEN mmreg1[31:0] 7FFF_FFFFh ELSEIF (mmreg2/mem64[31:0] -231) THEN mmreg1[31:0] 8000_0000h ELSE mmreg1[31:0] int(mmreg2/mem64[31:0]) (mmreg2/mem64[63:32] 231) THEN mmreg1[63:32] 7FFF_FFFFh ELSEIF (mmreg2/mem64[63:32] -231) THEN mmreg1[63:32] 8000_0000h ELSE mmreg1[63:32] int(mmreg2/mem64[63:32]) Chapter 3DNow!Instruction 3DNow!Technology Manual 21928G/0-March 2000 Table Numerical Range PF2ID Instruction Source Source Destination round zero (Source round zero (Source 7FFF_FFFFh 8000_0000h Undefined Normal, abs(Source Normal, -2147483648 Source Normal, Source 2147483648 Normal, Source 2147483648 Normal, Source -2147483648 Unsupported Related Instructions PI2FD instruction. 3DNow!Instruction Chapter 21928G/0-March 2000 3DNow!Technology Manual PFACC mnemonic PFACC mmreg1, mmreg2/mem64 Privilege: Registers Affected: Flags Affected: Exceptions Generated: Exception Invalid opcode Device available Stack exception (12) General protection (13) Segment overrun (13) Page fault (14) Floating-point exception pending (16) Alignment check (17) Real opcode/imm8 none none description Floating-point accumulate Virtual 8086 Protected Description emulate instruction (EM) control register (CR0) Save floating-point state task switch (TS) control register (CR0) During instruction execution, stack segment limit exceeded. During instruction execution, effective address segment registers used operand points illegal memory location. 
instruction data operands falls outside address range 00000h 0FFFFh. page fault resulted from execution instruction. exception pending floating-point execution unit. unaligned memory reference resulted from instruction execution, alignment mask (AM) control register (CR0) Protected Mode, PFACC vector instruction that accumulates words destination operand source operand stores results high words destination operand respectively. Both operands single-precision, floating-point operands with 24-bit significands. Table page shows numerical range PFACC instruction. PFACC instruction performs following operations: temp mmreg2/mem64 mmreg1[31:0] mmreg1[31:0] mmreg1[63:32] mmreg1[63:32] temp[31:0] temp[63:32] Chapter 3DNow!Instruction 3DNow!Technology Manual 21928G/0-March 2000 Table Numerical Range PFACC Instruction Source Source Source Normal Source Normal, Undefined Unsupported Source Undefined Undefined Source Destination Notes: Normal Unsupported sign result logical signs source operands. absolute value result less then -126, result zero with sign being sign source operand that larger magnitude magnitudes equal, sign source used). absolute value result greater than equal 128, result largest normal number with sign being sign source operand that larger magnitude. 3DNow!Instruction Chapter 21928G/0-March 2000 3DNow!Technology Manual PFADD mnemonic opcode/imm8 description Packed, floating-point addition PFADD mmreg1, mmreg2/mem64 Privilege: Registers Affected: Flags Affected: Exceptions Generated: Exception Invalid opcode Device available Stack exception (12) General protection (13) Segment overrun (13) Page fault (14) Floating-point exception pending (16) Alignment check (17) Real none none Virtual 8086 Protected Description emulate instruction (EM) control register (CR0) Save floating-point state task switch (TS) control register (CR0) During instruction execution, stack segment limit exceeded. 
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFADD is a vector instruction that performs addition of the destination operand and the source operand. Both operands are single-precision, floating-point operands with 24-bit significands. The table below shows the numerical range of the PFADD instruction.

The PFADD instruction performs the following operations:

    mmreg1[31:0]  = mmreg1[31:0] + mmreg2/mem64[31:0]
    mmreg1[63:32] = mmreg1[63:32] + mmreg2/mem64[63:32]

Table: Numerical Range for the PFADD Instruction

                                          Source 2
  Source 1 and Destination   0             Normal              Unsupported
  0                          +/-0 (1)      Source 2            Source 2
  Normal                     Source 1      Normal, +/-0 (2)    Undefined
  Unsupported                Source 1      Undefined           Undefined

Notes:
1. The sign of the result is the logical AND of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.

PFCMPEQ

mnemonic:     PFCMPEQ mmreg1, mmreg2/mem64
description:  Packed floating-point comparison, equal

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFCMPEQ is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits based on the result of the corresponding comparison. The table below shows the numerical range of the PFCMPEQ instruction.

The PFCMPEQ instruction performs the following operations:

    IF (mmreg1[31:0] = mmreg2/mem64[31:0])
        THEN mmreg1[31:0] = FFFF_FFFFh
        ELSE mmreg1[31:0] = 0000_0000h

    IF (mmreg1[63:32] = mmreg2/mem64[63:32])
        THEN mmreg1[63:32] = FFFF_FFFFh
        ELSE mmreg1[63:32] = 0000_0000h

Table: Numerical Range for the PFCMPEQ Instruction

                                          Source 2
  Source 1 and Destination   0                 Normal                         Unsupported
  0                          FFFF_FFFFh (1)    0000_0000h                     0000_0000h
  Normal                     0000_0000h        0000_0000h, FFFF_FFFFh (2)     0000_0000h
  Unsupported                0000_0000h        0000_0000h                     Undefined

Notes:
1. Positive zero is equal to negative zero.
2. The result is FFFF_FFFFh if source 1 and source 2 have identical signs, exponents, and mantissas. Otherwise, the result is 0000_0000h.

Related Instructions
See the PFCMPGE instruction.
See the PFCMPGT instruction.
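The PFCMPEQ operations above can be modeled in a few lines. The following is a minimal sketch, not AMD's implementation: the helper names `to_f32`, `pfcmpeq`, and `select` are hypothetical, each 64-bit operand is represented as a (low, high) pair of Python floats, and only zero and normal inputs are considered because the manual leaves unsupported values undefined or implementation-specific.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfcmpeq(dst, src):
    """Model PFCMPEQ on (low, high) pairs; returns two 32-bit masks.

    Python's == treats +0.0 and -0.0 as equal, which matches note 1 of the
    numerical range table.
    """
    return tuple(
        0xFFFFFFFF if to_f32(a) == to_f32(b) else 0x00000000
        for a, b in zip(dst, src)
    )


def select(mask: int, if_true_bits: int, if_false_bits: int) -> int:
    """Typical use of a compare mask: pick one of two 32-bit words bitwise."""
    return (mask & if_true_bits) | (~mask & 0xFFFFFFFF & if_false_bits)


masks = pfcmpeq((1.5, 0.0), (1.5, -0.0))  # both lanes compare equal
```

Because each lane is all ones or all zeros, the mask can be combined with bitwise AND/OR to select between values without branching, which is the role `select` illustrates.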
PFCMPGE

mnemonic:     PFCMPGE mmreg1, mmreg2/mem64
description:  Packed floating-point comparison, greater than or equal

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFCMPGE is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits based on the result of the corresponding comparison. The table below shows the numerical range of the PFCMPGE instruction.

The PFCMPGE instruction performs the following operations:

    IF (mmreg1[31:0] >= mmreg2/mem64[31:0])
        THEN mmreg1[31:0] = FFFF_FFFFh
        ELSE mmreg1[31:0] = 0000_0000h

    IF (mmreg1[63:32] >= mmreg2/mem64[63:32])
        THEN mmreg1[63:32] = FFFF_FFFFh
        ELSE mmreg1[63:32] = 0000_0000h

Table: Numerical Range for the PFCMPGE Instruction

                                          Source 2
  Source 1 and Destination   0                              Normal                         Unsupported
  0                          FFFF_FFFFh (1)                 0000_0000h, FFFF_FFFFh (2)     Undefined
  Normal                     0000_0000h, FFFF_FFFFh (3)     0000_0000h, FFFF_FFFFh (4)     Undefined
  Unsupported                Undefined                      Undefined                      Undefined

Notes:
1. Positive zero is equal to negative zero.
2. The result is FFFF_FFFFh if source 2 is negative. Otherwise, the result is 0000_0000h.
3. The result is FFFF_FFFFh if source 1 is positive. Otherwise, the result is 0000_0000h.
4. The result is FFFF_FFFFh if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller than or equal in magnitude to source 2, or if source 1 and source 2 are both positive and source 1 is greater than or equal in magnitude to source 2. The result is 0000_0000h in all other cases.

Related Instructions
See the PFCMPEQ instruction.
See the PFCMPGT instruction.

PFCMPGT

mnemonic:     PFCMPGT mmreg1, mmreg2/mem64
description:  Packed floating-point comparison, greater than

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFCMPGT is a vector instruction that performs a comparison of the destination operand and the source operand and generates all one bits or all zero bits based on the result of the corresponding comparison. The table below shows the numerical range of the PFCMPGT instruction.
The PFCMPGT instruction performs the following operations:

    IF (mmreg1[31:0] > mmreg2/mem64[31:0])
        THEN mmreg1[31:0] = FFFF_FFFFh
        ELSE mmreg1[31:0] = 0000_0000h

    IF (mmreg1[63:32] > mmreg2/mem64[63:32])
        THEN mmreg1[63:32] = FFFF_FFFFh
        ELSE mmreg1[63:32] = 0000_0000h

Table: Numerical Range for the PFCMPGT Instruction

                                          Source 2
  Source 1 and Destination   0                              Normal                         Unsupported
  0                          0000_0000h                     0000_0000h, FFFF_FFFFh (1)     Undefined
  Normal                     0000_0000h, FFFF_FFFFh (2)     0000_0000h, FFFF_FFFFh (3)     Undefined
  Unsupported                Undefined                      Undefined                      Undefined

Notes:
1. The result is FFFF_FFFFh if source 2 is negative. Otherwise, the result is 0000_0000h.
2. The result is FFFF_FFFFh if source 1 is positive. Otherwise, the result is 0000_0000h.
3. The result is FFFF_FFFFh if source 1 is positive and source 2 is negative, or if they are both negative and source 1 is smaller in magnitude than source 2, or if source 1 and source 2 are positive and source 1 is greater in magnitude than source 2. The result is 0000_0000h in all other cases.

Related Instructions
See the PFCMPEQ instruction.
See the PFCMPGE instruction.

PFMAX

mnemonic:     PFMAX mmreg1, mmreg2/mem64
description:  Packed floating-point maximum

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFMAX is a vector instruction that returns the larger of the two single-precision, floating-point operands. Any operation with a zero and a negative number returns positive zero. An operation consisting of two zeros returns positive zero. The table below shows the numerical range of the PFMAX instruction.

The PFMAX instruction performs the following operations:

    IF (mmreg1[31:0] > mmreg2/mem64[31:0])
        THEN mmreg1[31:0] = mmreg1[31:0]
        ELSE mmreg1[31:0] = mmreg2/mem64[31:0]

    IF (mmreg1[63:32] > mmreg2/mem64[63:32])
        THEN mmreg1[63:32] = mmreg1[63:32]
        ELSE mmreg1[63:32] = mmreg2/mem64[63:32]

Table: Numerical Range for the PFMAX Instruction

                                          Source 2
  Source 1 and Destination   0                   Normal                     Unsupported
  0                          +0                  Source 2, +0 (1)           Undefined
  Normal                     Source 1, +0 (2)    Source 1/Source 2 (3)      Undefined
  Unsupported                Undefined           Undefined                  Undefined

Notes:
1. The result is source 2 if source 2 is positive. Otherwise, the result is positive zero.
2. The result is source 1 if source 1 is positive. Otherwise, the result is positive zero.
3. The result is source 1 if source 1 is positive and source 2 is negative. The result is source 1 if both are positive and source 1 is greater in magnitude than source 2. The result is source 1 if both are negative and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.

Related Instructions
See the PFMIN instruction.

PFMIN

mnemonic:     PFMIN mmreg1, mmreg2/mem64
description:  Packed floating-point minimum

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFMIN is a vector instruction that returns the smaller of the two single-precision, floating-point operands. Any operation with a zero and a positive number returns positive zero. An operation consisting of two zeros returns positive zero. The table below shows the numerical range of the PFMIN instruction.

The PFMIN instruction performs the following operations:

    IF (mmreg1[31:0] < mmreg2/mem64[31:0])
        THEN mmreg1[31:0] = mmreg1[31:0]
        ELSE mmreg1[31:0] = mmreg2/mem64[31:0]

    IF (mmreg1[63:32] < mmreg2/mem64[63:32])
        THEN mmreg1[63:32] = mmreg1[63:32]
        ELSE mmreg1[63:32] = mmreg2/mem64[63:32]

Table: Numerical Range for the PFMIN Instruction

                                          Source 2
  Source 1 and Destination   0                   Normal                     Unsupported
  0                          +0                  Source 2, +0 (1)           Undefined
  Normal                     Source 1, +0 (2)    Source 1/Source 2 (3)      Undefined
  Unsupported                Undefined           Undefined                  Undefined

Notes:
1. The result is source 2 if source 2 is negative. Otherwise, the result is positive zero.
2. The result is source 1 if source 1 is negative. Otherwise, the result is positive zero.
3. The result is source 1 if source 1 is negative and source 2 is positive. The result is source 1 if both are negative and source 1 is greater in magnitude than source 2. The result is source 1 if both are positive and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.

Related Instructions
See the PFMAX instruction.
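The zero-handling rule shared by PFMAX and PFMIN (any zero result is returned as positive zero) is easy to miss when emulating these instructions in software. The sketch below models one lane at a time under stated assumptions: the function names are hypothetical, inputs are zero or normal values, and unsupported inputs are out of scope.

```python
def pfmax_lane(a: float, b: float) -> float:
    """One lane of PFMAX: IF (dst > src) THEN dst ELSE src."""
    r = a if a > b else b
    # A zero result is always returned as positive zero, never -0.0.
    return 0.0 if r == 0.0 else r


def pfmin_lane(a: float, b: float) -> float:
    """One lane of PFMIN: IF (dst < src) THEN dst ELSE src."""
    r = a if a < b else b
    return 0.0 if r == 0.0 else r


def pfmax(dst, src):
    """Apply PFMAX to (low, high) pairs of floats."""
    return tuple(pfmax_lane(a, b) for a, b in zip(dst, src))


def pfmin(dst, src):
    """Apply PFMIN to (low, high) pairs of floats."""
    return tuple(pfmin_lane(a, b) for a, b in zip(dst, src))


# A negative number against a (negative) zero yields positive zero for PFMAX.
hi_pair = pfmax((1.0, -2.0), (3.0, -0.0))
# A positive number against a (negative) zero yields positive zero for PFMIN.
lo_pair = pfmin((5.0, -1.0), (-0.0, -3.0))
```

The explicit zero check is needed because a plain comparison would otherwise pass a negative zero through unchanged.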
PFMUL

mnemonic:     PFMUL mmreg1, mmreg2/mem64
description:  Packed floating-point multiplication

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFMUL is a vector instruction that performs multiplication of the destination operand and the source operand. Both operands are single-precision, floating-point operands with 24-bit significands. The table below shows the numerical range of the PFMUL instruction.

The PFMUL instruction performs the following operations:

    mmreg1[31:0]  = mmreg1[31:0] * mmreg2/mem64[31:0]
    mmreg1[63:32] = mmreg1[63:32] * mmreg2/mem64[63:32]

Table: Numerical Range for the PFMUL Instruction

                                          Source 2
  Source 1 and Destination   0             Normal              Unsupported
  0                          +/-0 (1)      +/-0 (1)            +/-0 (1)
  Normal                     +/-0 (1)      Normal, +/-0 (2)    Undefined
  Unsupported                +/-0 (1)      Undefined           Undefined

Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number with the sign being the exclusive-OR of the signs of the source operands.
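Notes 1 and 2 describe behavior that differs from IEEE-754 defaults: tiny products are flushed to a signed zero and overflowing products are clamped to the largest normal number instead of becoming infinity. The following sketch models one PFMUL lane under those notes. It is an illustration, not a bit-exact emulator: the names `to_f32`, `pfmul_lane`, and `pfmul` are hypothetical, round-to-nearest is assumed for in-range results, and products that would round past the largest normal number are treated as overflow.

```python
import math
import struct

FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest normal single-precision value
FLT_MIN_NORMAL = 2.0 ** -126                # smallest normal single-precision value


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfmul_lane(a: float, b: float) -> float:
    """One lane of PFMUL with flush-to-zero and clamp-to-largest-normal."""
    a, b = to_f32(a), to_f32(b)
    # Sign of the result is the exclusive-OR of the operand signs.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    # Exact in double precision: two 24-bit significands need at most 48 bits.
    mag = abs(a * b)
    if mag < FLT_MIN_NORMAL:
        return math.copysign(0.0, sign)
    if mag >= FLT_MAX:
        return math.copysign(FLT_MAX, sign)
    return to_f32(math.copysign(mag, sign))


def pfmul(dst, src):
    """Apply PFMUL to (low, high) pairs of floats."""
    return tuple(pfmul_lane(a, b) for a, b in zip(dst, src))


tiny = pfmul_lane(2.0 ** -100, -(2.0 ** -100))  # underflows to negative zero
huge = pfmul_lane(2.0 ** 100, 2.0 ** 100)       # clamps to FLT_MAX
```

Computing the magnitude in double precision first keeps the range checks simple, since the double product of two single-precision values cannot itself overflow or underflow.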
PFRCP

mnemonic:     PFRCP mmreg1, mmreg2/mem64
description:  Floating-point reciprocal approximation

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFRCP is a scalar instruction that returns a low-precision estimate of the reciprocal of the source operand. The single result value is duplicated in both the high and low halves of this instruction's 64-bit result. The source operand is single-precision with a 24-bit significand, and the result is accurate to 14 bits. The table below shows the numerical range of the PFRCP instruction.

Increased accuracy (the full 24 bits of a single-precision significand) requires two additional instructions (PFRCPIT1 and PFRCPIT2). The first stage of this increase or refinement in accuracy (PFRCPIT1) requires that the input and output of the already executed PFRCP instruction be used as input to the PFRCPIT1 instruction. Refer to "Division and Square Root" for an application-specific example of this instruction and related instructions.
The PFRCP instruction performs the following operations:

    mmreg1[31:0]  = reciprocal(mmreg2/mem64[31:0])
    mmreg1[63:32] = reciprocal(mmreg2/mem64[31:0])

In the following code example, the first line illustrates the PFRCP instruction in a sequence used to compute q = a/b accurate to 24 bits:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b,X0)
    X2 = PFRCPIT2(X1,X0)
    q  = PFMUL(a,X2)

Table: Numerical Range for the PFRCP Instruction

  Source 2        Source 1 and Destination
  0               +/- Maximum Normal (1)
  Normal          Normal, +/-0 (2)
  Unsupported     Undefined

Notes:
1. The result has the same sign as the source operand.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand. Otherwise, the result is a normal with the sign being the same sign as the source operand.

Related Instructions
See the PFRCPIT1 instruction.
See the PFRCPIT2 instruction.

PFRCPIT1

mnemonic:     PFRCPIT1 mmreg1, mmreg2/mem64
description:  Packed floating-point reciprocal, first iteration step

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFRCPIT1 is a vector instruction that performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the PFRCP instruction (the second and final step completes the iteration and is accurate to 24 bits). The table below shows the numerical range of the PFRCPIT1 instruction.

The behavior of this instruction is only defined for those combinations of operands such that one source operand was the input to the PFRCP instruction and the other source operand was the output of the same PFRCP instruction. Refer to "Division and Square Root" for an application-specific example of this instruction and related instructions.

In the following code example, the second line illustrates the PFRCPIT1 instruction in a sequence used to compute q = a/b accurate to 24 bits:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b,X0)
    X2 = PFRCPIT2(X1,X0)
    q  = PFMUL(a,X2)

Table: Numerical Range for the PFRCPIT1 Instruction

                                          Source 2
  Source 1 and Destination   0             Normal         Unsupported
  0                          +/-0 (1)      +/-0 (1)       +/-0 (1)
  Normal                     +/-0 (1)      Normal (2)     Undefined
  Unsupported                +/-0 (1)      Undefined      Undefined

Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. The sign is positive.

Related Instructions
See the PFRCP instruction.
See the PFRCPIT2 instruction.

PFRCPIT2

mnemonic:     PFRCPIT2 mmreg1, mmreg2/mem64
description:  Packed floating-point reciprocal/reciprocal square root, second iteration step

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFRCPIT2 is a vector instruction that performs the second and final intermediate step in the Newton-Raphson iteration to refine the reciprocal or reciprocal square root approximation produced by the PFRCP and PFRSQRT instructions, respectively. The table below shows the numerical range of the PFRCPIT2 instruction.

The behavior of this instruction is only defined for those combinations of operands such that the first source operand (mmreg1) was the output of either the PFRCPIT1 or PFRSQIT1 instruction and the second source operand (mmreg2/mem64) was the output of either the PFRCP or PFRSQRT instruction. Refer to "Division and Square Root" for an application-specific example of this instruction and related instructions.

In the following code example, the third line illustrates the PFRCPIT2 instruction in a sequence used to compute q = a/b accurate to 24 bits:

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b,X0)
    X2 = PFRCPIT2(X1,X0)
    q  = PFMUL(a,X2)

Table: Numerical Range for the PFRCPIT2 Instruction

                                          Source 2
  Source 1 and Destination   0             Normal              Unsupported
  0                          +/-0 (1)      +/-0 (1)            +/-0 (1)
  Normal                     +/-0 (1)      Normal, +/-0 (2)    Undefined
  Unsupported                +/-0 (1)      Undefined           Undefined

Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than or equal to 2^128, the result is the largest normal number with the sign being the exclusive-OR of the signs of the source operands.

Related Instructions
See the PFRCPIT1 instruction.
See the PFRSQIT1 instruction.
See the PFRCP instruction.
See the PFRSQRT instruction.
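The PFRCP, PFRCPIT1, PFRCPIT2, PFMUL sequence is one Newton-Raphson step applied to a coarse reciprocal estimate. The sketch below shows only the mathematics of that refinement. Several things here are assumptions rather than statements about the hardware: `pfrcp_model` stands in for PFRCP by truncating the true reciprocal to 14 significant bits, the split of the Newton-Raphson step between the two iteration instructions (a correction factor, then a multiply) is a modeling choice, and double-precision arithmetic replaces the processor's internal formats.

```python
import math


def pfrcp_model(b: float, bits: int = 14) -> float:
    """Hypothetical stand-in for PFRCP: 1/b truncated to `bits` significant bits."""
    m, e = math.frexp(1.0 / b)          # 1/b = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)


def divide(a: float, b: float) -> float:
    """Compute a/b via one Newton-Raphson refinement of the coarse estimate."""
    x0 = pfrcp_model(b)       # X0 = PFRCP(b)
    x1 = 2.0 - b * x0         # X1 = PFRCPIT1(b, X0), modeled as the correction factor
    x2 = x0 * x1              # X2 = PFRCPIT2(X1, X0), refined reciprocal
    return a * x2             # q  = PFMUL(a, X2)


q = divide(1.0, 3.0)
relative_error = abs(q - 1.0 / 3.0) / (1.0 / 3.0)
```

One step roughly squares the relative error of the estimate, which is why a 14-bit starting value is enough to reach the 24 bits of a single-precision significand.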
PFRSQIT1

mnemonic:     PFRSQIT1 mmreg1, mmreg2/mem64
description:  Packed floating-point reciprocal square root, first iteration step

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFRSQIT1 is a vector instruction that performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal square root approximation produced by the PFRSQRT instruction (the second and final step completes the iteration and is accurate to 24 bits). The table below shows the numerical range of the PFRSQIT1 instruction.

The behavior of this instruction is only defined for those combinations of operands such that one source operand was the input to the PFRSQRT instruction and the other source operand is the square of the output of the same PFRSQRT instruction. Refer to "Division and Square Root" for an application-specific example of this instruction and related instructions.
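The operand requirement just described (the original input plus the square of the PFRSQRT output) matches the shape of a Newton-Raphson step for 1/sqrt(b). The following sketch illustrates that mathematics only. The assumptions are: `pfrsqrt_model` is a hypothetical stand-in that truncates the true value to 15 significant bits and copies the operand's sign onto the result, the refinement is shown for positive b, and the way the step is divided between PFRSQIT1 and PFRCPIT2 is a modeling choice rather than a description of the hardware.

```python
import math


def pfrsqrt_model(b: float, bits: int = 15) -> float:
    """Hypothetical stand-in for PFRSQRT: coarse 1/sqrt(|b|) with the sign of b."""
    m, e = math.frexp(1.0 / math.sqrt(abs(b)))
    scale = 1 << bits
    estimate = math.ldexp(math.trunc(m * scale) / scale, e)
    return math.copysign(estimate, b)


def rsqrt(b: float) -> float:
    """Refine 1/sqrt(b) for b > 0 with one Newton-Raphson step."""
    x0 = pfrsqrt_model(b)          # X0 = PFRSQRT(b)
    x1 = x0 * x0                   # X1 = PFMUL(X0, X0), square of the estimate
    x2 = (3.0 - b * x1) / 2.0      # X2 = PFRSQIT1(b, X1), modeled correction factor
    return x0 * x2                 # PFRCPIT2(X2, X0), refined result


approx = rsqrt(2.0)
exact = 2.0 ** -0.5
```

A square root can then be formed as b * rsqrt(b), which avoids a separate division.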
In the following code example, the second and third lines illustrate the PFMUL and PFRSQIT1 instructions in a sequence used to compute a = 1/sqrt(b) accurate to 24 bits:

    X0 = PFRSQRT(b)
    X1 = PFMUL(X0,X0)
    X2 = PFRSQIT1(b,X1)
    a  = PFRCPIT2(X2,X0)

Table: Numerical Range for the PFRSQIT1 Instruction

                                          Source 2
  Source 1 and Destination   0             Normal         Unsupported
  0                          +/-0 (1)      +/-0 (1)       +/-0 (1)
  Normal                     +/-0 (1)      Normal (2)     Undefined
  Unsupported                +/-0 (1)      Undefined      Undefined

Notes:
1. The sign of the result is the exclusive-OR of the signs of the source operands.
2. The sign is 0.

Related Instructions
See the PFRCPIT2 instruction.
See the PFRSQRT instruction.

PFRSQRT

mnemonic:     PFRSQRT mmreg1, mmreg2/mem64
description:  Floating-point reciprocal square root approximation

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFRSQRT is a scalar instruction that returns a low-precision estimate of the reciprocal square root of the source operand. The single result value is duplicated in both the high and low halves of this instruction's 64-bit result. The source operand is single-precision with a 24-bit significand, and the result is accurate to 15 bits. Negative operands are treated as positive operands for purposes of the reciprocal square root computation, with the sign of the result the same as the sign of the source operand.
The table below shows the numerical range of the PFRSQRT instruction.

Increased accuracy (the full 24 bits of a single-precision significand) requires two additional instructions (PFRSQIT1 and PFRCPIT2). The first stage of this increase or refinement in accuracy (PFRSQIT1) requires that the input and the squared output of the already executed PFRSQRT instruction be used as input to the PFRSQIT1 instruction. Refer to "Division and Square Root" for an application-specific example of this instruction and related instructions.

The PFRSQRT instruction performs the following operations:

    mmreg1[31:0]  = reciprocal square root(mmreg2/mem64[31:0])
    mmreg1[63:32] = reciprocal square root(mmreg2/mem64[31:0])

In the following code example, the first line illustrates the PFRSQRT instruction in a sequence used to compute a = 1/sqrt(b) accurate to 24 bits:

    X0 = PFRSQRT(b)
    X1 = PFMUL(X0,X0)
    X2 = PFRSQIT1(b,X1)
    a  = PFRCPIT2(X2,X0)

Table: Numerical Range for the PFRSQRT Instruction

  Source 2        Source 1 and Destination
  0               +/- Maximum Normal*
  Normal          Normal
  Unsupported     Undefined

Note:
* The result has the same sign as the source operand.

Related Instructions
See the PFRSQIT1 instruction.
See the PFRCPIT2 instruction.

PFSUB

mnemonic:     PFSUB mmreg1, mmreg2/mem64
description:  Packed floating-point subtraction

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFSUB is a vector instruction that performs subtraction of the source operand from the destination operand. Both operands are single-precision, floating-point operands with 24-bit significands. The table below shows the numerical range of the PFSUB instruction.

The PFSUB instruction performs the following operations:

    mmreg1[31:0]  = mmreg1[31:0] - mmreg2/mem64[31:0]
    mmreg1[63:32] = mmreg1[63:32] - mmreg2/mem64[63:32]

Table: Numerical Range for the PFSUB Instruction

                                          Source 2
  Source 1 and Destination   0             Normal              Unsupported
  0                          +/-0 (1)      -Source 2           -Source 2
  Normal                     Source 1      Normal, +/-0 (2)    Undefined
  Unsupported                Source 1      Undefined           Undefined

Notes:
1. The sign of the result is the logical AND of the sign of source 1 and the inverse of the sign of source 2.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 1 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.

Related Instructions
See the PFSUBR instruction.

PFSUBR

mnemonic:     PFSUBR mmreg1, mmreg2/mem64
description:  Packed floating-point reverse subtraction

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PFSUBR is a vector instruction that performs subtraction of the destination operand from the source operand. Both operands are single-precision, floating-point operands with 24-bit significands. The table below shows the numerical range of the PFSUBR instruction.

The PFSUBR instruction performs the following operations:

    mmreg1[31:0]  = mmreg2/mem64[31:0] - mmreg1[31:0]
    mmreg1[63:32] = mmreg2/mem64[63:32] - mmreg1[63:32]

Table: Numerical Range for the PFSUBR Instruction

                                          Source 2
  Source 1 and Destination   0             Normal              Unsupported
  0                          +/-0 (1)      Source 2            Source 2
  Normal                     -Source 1     Normal, +/-0 (2)    Undefined
  Unsupported                -Source 1     Undefined           Undefined

Notes:
1. The sign of the result is the logical AND of the sign of source 2 and the inverse of the sign of source 1.
2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the source operand that is larger in magnitude (if the magnitudes are equal, the sign of source 2 is used). If the absolute value of the result is greater than or equal to 2^128, the result is the largest normal number with the sign being the sign of the source operand that is larger in magnitude.

Related Instructions
See the PFSUB instruction.

PI2FD

mnemonic:     PI2FD mmreg1, mmreg2/mem64
description:  Packed 32-bit integer to floating-point conversion

Privilege:           none
Registers Affected:  MMX
Flags Affected:      none
Exceptions Generated (Real, Virtual 8086, Protected): Invalid opcode (6), Device not available (7), Stack exception (12), General protection (13), Segment overrun (13), Page fault (14), Floating-point exception pending (16), Alignment check (17).

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

PI2FD is a vector instruction that converts a vector register containing signed, 32-bit integers to single-precision, floating-point operands. When PI2FD converts an input operand with more significant digits than are available in the output, the output is truncated.

The PI2FD instruction performs the following operations:

    mmreg1[31:0]  = float(mmreg2/mem64[31:0])
    mmreg1[63:32] = float(mmreg2/mem64[63:32])

Related Instructions
See the PF2ID instruction.

PMULHRW

mnemonic:     PMULHRW mmreg1, mmreg2/mem64
opcode/imm8:  0Fh/B7h
description:  Multiply signed packed 16-bit values with rounding and store the high 16 bits.

Privilege:           None
Registers Affected:  MMX
Flags Affected:      None
Exceptions Generated (Real, Virtual 8086, Protected):

Invalid opcode (6): The emulate instruction bit (EM) of the control register (CR0) is set to 1.
Device not available (7): Save the floating-point state if the task switch bit (TS) of the control register (CR0) is set to 1.
Stack exception (12): During instruction execution, the stack segment limit was exceeded.
General protection (13): During instruction execution, the effective address of one of the segment registers used for the operand points to an illegal memory location.
Segment overrun (13): One of the instruction data operands falls outside the address range 00000h to 0FFFFh.
Page fault (14): A page fault resulted from the execution of the instruction.
Floating-point exception pending (16): An exception is pending due to the floating-point execution unit.
Alignment check (17): An unaligned memory reference resulted from the instruction execution, and the alignment mask bit (AM) of the control register (CR0) is set to 1. (In Protected Mode, CPL = 3.)

The PMULHRW instruction multiplies the four signed 16-bit integer values in the source operand (an MMX register or a 64-bit memory location) by the four corresponding signed 16-bit integer values in the destination operand (an MMX register).
The PMULHRW instruction then adds 8000h to the lower 16 bits of the 32-bit result, which rounds the high-order 16-bit result. The high-order 16 bits of the result (including the sign bit) are stored in the destination operand. The PMULHRW instruction provides a numerically more accurate result than the PMULHW instruction, which truncates the result instead of rounding.

Functional Illustration of the PMULHRW Instruction

    mmreg2/mem64       D250h    5321h    7007h    FFFFh
    mmreg1             8807h    EC22h    7FFEh    FFFFh
    mmreg1 (result)    1569h    F98Ch    3803h*   0000h

    * Indicates a value that was rounded up.

The following list explains the functional illustration of the PMULHRW instruction:

1. The signed 16-bit negative value D250h (-2DB0h) is multiplied by the signed 16-bit negative value 8807h (-77F9h) to produce the signed 32-bit positive result 1569_4030h. 8000h is then added to the lower 16 bits to produce a final result of 1569_C030h. This rounding does not affect the final result of 1569h. The signed high-order 16 bits of the result are stored in the destination operand.
2. The signed 16-bit positive value 5321h is multiplied by the signed 16-bit negative value EC22h (-13DEh) to produce the signed 32-bit negative result F98C_7662h (-0673_899Eh). 8000h is then added to the lower 16 bits, producing a final result of F98C_F662h. This rounding does not affect the final result of F98Ch. The signed high-order 16 bits of the result are stored in the destination operand.
3. The signed 16-bit positive value 7007h is multiplied by the signed 16-bit positive value 7FFEh to produce the signed 32-bit positive result 3802_9FF2h. 8000h is then added to the lower 16 bits to produce a final result of 3803_1FF2h. This result has been rounded up, and the signed high-order 16 bits of the result (3803h) are stored in the destination operand.
4. The signed 16-bit negative value FFFFh (-1) is multiplied by the signed 16-bit negative value FFFFh (-1) to produce the signed 32-bit positive result 0000_0001h. 8000h is then added to the lower 16 bits to produce a final result of 0000_8001h. This rounding does not affect the final result of 0000h. The signed high-order 16 bits of the result are stored in the destination operand.
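The multiply-round-store arithmetic described for PMULHRW can be checked with a small emulation. The Python sketch below is illustrative only: the helper names (to_signed16, pmulhrw_lane, pmulhw_lane, pmulhrw) are hypothetical and not part of any AMD tool, and the code models only the per-lane arithmetic from this section, not exceptions or timing.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmulhrw_lane(a, b):
    """One 16-bit lane: signed multiply, add 8000h, keep the high 16 bits."""
    product = to_signed16(a) * to_signed16(b)   # signed 32-bit product
    rounded = product + 0x8000                  # rounding step from the text
    return (rounded >> 16) & 0xFFFF             # high-order 16 bits


def pmulhw_lane(a, b):
    """Truncating variant for comparison: no rounding constant is added."""
    product = to_signed16(a) * to_signed16(b)
    return (product >> 16) & 0xFFFF


def pmulhrw(src, dst):
    """Apply the rounding multiply to four packed 16-bit lanes."""
    return [pmulhrw_lane(s, d) for s, d in zip(src, dst)]


if __name__ == "__main__":
    src = [0xD250, 0x5321, 0x7007, 0xFFFF]   # mmreg2/mem64 lanes
    dst = [0x8807, 0xEC22, 0x7FFE, 0xFFFF]   # mmreg1 lanes
    print([f"{v:04X}h" for v in pmulhrw(src, dst)])
```

Applied to the four lanes of the functional illustration, pmulhrw returns 1569h, F98Ch, 3803h, and 0000h, while the truncating pmulhw_lane gives 3802h for the third lane, which is the one case where the rounding changes the stored value.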
PREFETCH/PREFETCHW

    mnemonic            opcode      description
    PREFETCH(W) mem8    0Fh 0Dh     Prefetch processor cache line into the data cache (Dcache)

Privilege:             none
Registers Affected:    none
Flags Affected:        none
Exceptions Generated:  none

The PREFETCH instruction loads a processor cache line into the data cache. The address of this line is specified by the mem8 value. For the AMD-K6 family processors, the line size is 32 bytes. In future processors, the size of the line that is loaded by the PREFETCH instruction will be at least 32 bytes. The PREFETCH instruction loads a cache line even if the mem8 address is not aligned with the start of the line (although some implementations, including the AMD-K6 family of processors, may perform the cache fill starting from the cache miss or mem8 address). If a cache hit occurs (the line is already in the Dcache) or a memory fault is detected, no bus cycle is initiated and the instruction is treated as a NOP.

In applications where a large number of data sets must be processed, the PREFETCH instruction can pre-load the next data set into the Dcache while, simultaneously, the processor is operating on the present set of data. This instruction allows the programmer to explicitly code operation concurrency. When the present set of data values is completed, the next set is already available in the Dcache. An example of a concurrent operation is vertex processing in transformations, where the next set of vertices can be prefetched into the data cache while the present set is being transformed.

The PREFETCH instruction format is defined to allow extensions in future K86 processors. The instruction mnemonic for the PREFETCH instruction includes the modR/M byte. Only the memory form of modR/M is valid (use of the register form results in an invalid opcode exception). Because there is no destination register, the three destination register field bits of the modR/M byte are used to define the type of prefetch to be performed. The PREFETCH and PREFETCHW instructions are defined by the bit patterns 000b and 001b, respectively. All other bit patterns are reserved for future use.

The PREFETCHW instruction loads the prefetched line and sets the cache line MESI state to modified (in anticipation of subsequent data writes to the line), unlike the PREFETCH instruction, which typically sets the state to exclusive.
If the data that is prefetched into the Dcache is to be modified, the PREFETCHW instruction saves the cycle that the PREFETCH instruction requires for modifying the Dcache line state. The PREFETCHW instruction should be used when the programmer expects that the data in the cache line will be modified. Otherwise, the PREFETCH instruction should be used.

Note: The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. However, the AMD Athlon and future processors that support PREFETCHW as described above will be able to take advantage of the performance benefit provided by this instruction. For more information, see the AMD Athlon Processor Code Optimization Guide, order# 22007.

The following table summarizes the PREFETCH type options:

Table: Summary of PREFETCH Instruction Type Options

    modR/M byte    Type
    11-xxx-xxx     Results in Invalid Opcode
    mm-000-xxx     PREFETCH
    mm-001-xxx     PREFETCHW
    mm-010-xxx     Reserved
    mm-011-xxx     Reserved
    mm-100-xxx     Reserved
    mm-101-xxx     Reserved
    mm-110-xxx     Reserved
    mm-111-xxx     Reserved

Note: "Reserved" PREFETCH types do not result in an Invalid Opcode Exception if executed. Instead, for forward compatibility with future processors that may implement additional forms of the PREFETCH instruction, all "Reserved" PREFETCH types are implemented as synonyms for the basic PREFETCH type (for example, the PREFETCH instruction with type 000b).

Division and Square Root

Division

The 3DNow! instructions can be used to compute a very fast, highly accurate reciprocal or quotient. Consider the quotient q = a/b. An on-chip, ROM-based table lookup can be used to quickly produce a 14-15 bit precision approximation of 1/b (using a single two-cycle latency instruction, PFRCP). A full-precision reciprocal can then be quickly computed from this approximation using a Newton-Raphson algorithm.

The general Newton-Raphson recurrence for the reciprocal is as follows:

    Z_{i+1} = Z_i (2 - b Z_i)

Given that the initial approximation is accurate to at least 14 bits, and that full IEEE single precision contains 24 bits of mantissa, just one Newton-Raphson iteration is required. The following shows the 3DNow! instruction sequence to produce the initial reciprocal approximation, to compute the full-precision reciprocal from this, and lastly, to complete the required division a/b.

    X0 = PFRCP(b)
    X1 = PFRCPIT1(b, X0)
    X2 = PFRCPIT2(X1, X0)
    q  = PFMUL(a, X2)

The 24-bit final reciprocal value is X2. In the processor implementation, X2 is the correct round-to-nearest single-precision reciprocal for the great majority of arguments. The remaining arguments differ from the correct round-to-nearest value by a unit-in-the-last-place (ulp). The quotient is formed in the last step by multiplying the reciprocal by the dividend a.

Divide Examples

These examples illustrate how the 3DNow! instructions can be used to perform divides.

(14-Bit Precision)

    MOVD       MM0, [mem]    ; load the divisor
    PFRCP      MM0, MM0      ; reciprocal of the divisor (approx.)
    MOVQ       MM2, [mem]    ; load the two dividends
    PFMUL      MM2, MM0      ; two quotients

(24-Bit Precision)

    MOVD       MM0, [mem]    ; load the divisor
    PFRCP      MM1, MM0      ; reciprocal of the divisor (approx.)
    PUNPCKLDQ  MM0, MM0      ; copy the divisor to both halves (MMX instruction)
    PFRCPIT1   MM0, MM1      ; reciprocal (intermed.)
    MOVQ       MM2, [mem]    ; load the two dividends
    PFRCPIT2   MM0, MM1      ; reciprocal (full prec.)
    PFMUL      MM2, MM0      ; two quotients

Note: For a description of the PUNPCKLDQ instruction, see the AMD-K6 Processor Multimedia Technology Manual, order# 20726.

Square Root

The 3DNow! instructions can also be used to compute a reciprocal square root and a square root with high performance. The general Newton-Raphson reciprocal square root recurrence is as follows:

    Z_{i+1} = (1/2) Z_i (3 - b Z_i^2)

To reduce the number of iterations, the initial approximation is read from a table. The 3DNow! reciprocal square root approximation is accurate to at least 15 bits. Accordingly, to obtain a single-precision 24-bit reciprocal square root of an input operand b, one Newton-Raphson iteration is required using the following 3DNow! instructions:

    X0 = PFRSQRT(b)
    X1 = PFMUL(X0, X0)
    X2 = PFRSQIT1(b, X1)
    X3 = PFRCPIT2(X2, X0)
    X4 = PFMUL(b, X3)

The 24-bit final reciprocal square root value is X3. In the processor implementation, X3 is the correct round-to-nearest value for the majority of arguments. The remaining arguments differ from the correct round-to-nearest value by a ulp. The square root (X4) is formed in the last step by multiplying X3 by the input operand b.

Square Root Examples

These examples illustrate how 3DNow! technology can be used to perform square roots.

(15-Bit Precision)

    MOVD       MM0, [mem]    ; load the operand a
    PFRSQRT    MM1, MM0      ; 1/(sqrt a) in both halves (approx.)
    PUNPCKLDQ  MM0, MM0      ; copy a to both halves (MMX instr.)
    PFMUL      MM0, MM1      ; (sqrt a) in both halves

(24-Bit Precision)

    MOVD       MM0, [mem]    ; load the operand a
    PFRSQRT    MM1, MM0      ; 1/(sqrt a) in both halves (approx.)
    MOVQ       MM2, MM1      ; keep a copy of the approximation
    PFMUL      MM1, MM1      ; square of the approximation (approx.), step 1
    PUNPCKLDQ  MM0, MM0      ; copy a to both halves (MMX instr.)
    PFRSQIT1   MM1, MM0      ; (intermediate), step 2
    PFRCPIT2   MM1, MM2      ; 1/(sqrt a) in both halves (full prec.), step 3
    PFMUL      MM0, MM1      ; (sqrt a) in both halves
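The effect of a single Newton-Raphson step on a roughly 14-bit or 15-bit starting value, as described in the Division and Square Root sections, can be explored numerically. The Python sketch below is a model under stated assumptions, not a bit-exact emulation: truncate_mantissa is a hypothetical stand-in for the ROM table lookup behind PFRCP and PFRSQRT, the refinement is done as one formula rather than the PFRCPIT1/PFRSQIT1/PFRCPIT2 split, and all function names are invented for illustration.

```python
import math
import struct


def to_f32(x):
    """Round a Python float to IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncate_mantissa(x, bits):
    """Keep `bits` significant bits of x (assumed stand-in for a table lookup)."""
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)


def approx_recip(b):
    """Model of the initial reciprocal approximation (about 14 bits)."""
    return truncate_mantissa(1.0 / b, 14)


def refine_recip(b, z):
    """One step of Z_{i+1} = Z_i (2 - b Z_i), rounded to single precision."""
    return to_f32(z * (2.0 - b * z))


def approx_rsqrt(b):
    """Model of the initial reciprocal square root approximation (about 15 bits)."""
    return truncate_mantissa(1.0 / math.sqrt(b), 15)


def refine_rsqrt(b, z):
    """One step of Z_{i+1} = (1/2) Z_i (3 - b Z_i^2), rounded to single precision."""
    return to_f32(0.5 * z * (3.0 - b * z * z))


def divide(a, b):
    """q = a * (refined reciprocal of b)."""
    return to_f32(a * refine_recip(b, approx_recip(b)))


def square_root(b):
    """sqrt(b) = b * (refined reciprocal square root of b)."""
    return to_f32(b * refine_rsqrt(b, approx_rsqrt(b)))


if __name__ == "__main__":
    print(approx_recip(3.0), refine_recip(3.0, approx_recip(3.0)))
    print(divide(1.0, 3.0), square_root(2.0))
```

Because each Newton-Raphson step roughly squares the relative error, a starting value good to about 14 bits is enough for the refined value to be limited by single-precision rounding rather than by the iteration, which is the reasoning the text gives for needing only one iteration.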