TMS320C4x General Purpose Applications User's Guide

Literature Number: SPRU159A
Digital Signal Processing Solutions, 1999
Printed in U.S.A. on recycled paper

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE PROPERTY OR ENVIRONMENTAL DAMAGE ("CRITICAL APPLICATIONS"). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. INCLUSION OF TI PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY AT THE CUSTOMER'S RISK.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, warranty or endorsement thereof.

Copyright 1999, Texas Instruments Incorporated

Preface

Read This First

About This Manual

This user's guide serves as an applications reference book for the TMS320C40 and TMS320C44 digital signal processors (DSPs). Throughout the book, references to the TMS320C4x apply to both devices (exceptions are noted). Specifically, this book complements the TMS320C4x User's Guide by providing information to assist managers and hardware/software engineers in application development. It includes example code and hardware connections for various applications.

The guide shows how to use the instruction set, the architecture, and the 'C4x interface. It presents examples of frequently used applications and discusses more involved examples and applications. It also defines the principles involved in many applications and gives the corresponding assembly language code for instructional purposes and for immediate use. Whenever the detailed explanation of the underlying theory is too extensive to be included in this manual, appropriate references are given for further information.
How to Use This Manual

The following table summarizes the information contained in this user's guide:

    If you are looking for
    information about:       Turn to these chapters:
    -----------------------  ------------------------------------------------------------
    Arithmetic               Chapter 3, Logical and Arithmetic Operations
    Communication Ports      Chapter 8, Using the Communication Ports
    Companding               Chapter 6, Applications-Oriented Operations
    Development Support      Chapter 10, Development Support and Part Order Information
    DMA Coprocessor          Chapter 7, Programming the DMA Coprocessor
    FFTs                     Chapter 6, Applications-Oriented Operations
    Filters                  Chapter 6, Applications-Oriented Operations
    Ordering Parts           Chapter 10, Development Support and Part Order Information
    Repeat Modes             Chapter 2, Program Control
    Reset                    Chapter 1, Processor Initialization
    Stacks                   Chapter 2, Program Control
    Tips                     Chapter 5, Programming Tips
    Wait States              Chapter 4, Memory Interfacing
    XDS510 Emulator          Chapter 11, XDS510 Emulator Design Considerations

Style and Symbol Conventions

This document uses the following conventions:

- Program listings, program examples, file names, and symbol names are shown in a special font. Examples use a bold version of the special font for emphasis. Here is a sample program listing segment:

    LOOP1   RPTB    MAX
            CMPF    *AR0,R0         ;Compare number to the maximum
    MAX     LDFLT   *AR0,R0         ;If greater, this is a new maximum
            B       NEXT
    LOOP2   RPTB    MIN
            CMPF    *AR0++(1),R0    ;Compare number to the minimum
    MIN     LDFLT   *-AR0(1),R0     ;If smaller, this is a new minimum
    NEXT

- Throughout this book, MSB indicates the most significant bit and LSB indicates the least significant bit. Corresponding abbreviations indicate the most significant byte and the least significant byte.

Information About Cautions and Warnings

This book may contain cautions and warnings.

This is an example of a caution statement. A caution statement describes a situation that could potentially damage your software or equipment.

This is an example of a warning statement. A warning statement describes a situation that could potentially cause harm to you.

The information in a caution or a warning is provided for your protection. Please read each caution and warning carefully.

Related Documentation From Texas Instruments

The following books describe the TMS320 floating-point devices and related support tools.
To obtain a copy of any of these documents, call the Texas Instruments Literature Response Center at (800) 477-8924. When ordering, please identify the book by its title and literature number.

TMS320C4x User's Guide (literature number SPRU063) describes the 'C4x 32-bit floating-point processor, developed for digital signal processing as well as parallel processing applications. Covered are its architecture, internal register structure, instruction set, pipeline, specifications, and the operation of its DMA channels and communication ports.

TMS320C4x Parallel Processing Development System Technical Reference (literature number SPRU075) describes the TMS320C4x parallel processing system, a system with four 'C4xs with shared and distributed memory.

Parallel Processing with the TMS320C4x (literature number SPRA031) describes parallel processing and how the 'C4x can be used in parallel processing. It also provides sample parallel processing applications.

TMS320C3x/C4x Assembly Language Tools User's Guide (literature number SPRU035) describes the assembly language tools (assembler, linker, and other tools used to develop assembly language code), assembler directives, macros, common object file format, and symbolic debugging directives for the 'C3x and 'C4x generations of devices.

TMS320 Floating-Point DSP Optimizing C Compiler User's Guide (literature number SPRU034) describes the TMS320 floating-point C compiler. This C compiler accepts ANSI standard C source code and produces TMS320 assembly language source code for the 'C3x and 'C4x generations of devices.

TMS320C4x C Source Debugger User's Guide (literature number SPRU054) tells you how to invoke the 'C4x emulator and simulator versions of the C source debugger interface. This book discusses various aspects of the debugger interface, including window management, command entry, code execution, data management, and breakpoints. It also includes a tutorial that introduces basic debugger functionality.

TMS320C4x Technical Brief (literature number SPRU076) gives a condensed overview of the 'C4x and its development tools. It also lists TMS320C4x third parties.
TMS320 Family Development Support Reference Guide (literature number SPRU011) describes the '320 family of digital signal processors and the various products that support it. This includes code-generation tools (compilers, assemblers, linkers, etc.) and system integration and debug tools (simulators, emulators, evaluation modules, etc.). This book also lists related documentation, outlines seminars and the university program, and gives factory repair and exchange information.

TMS320 Third-Party Support Reference Guide (literature number SPRU052) alphabetically lists the third parties that supply various products that serve the family of '320 digital signal processors: software and hardware development tools, speech recognition, image processing, noise cancellation, modems, etc.

TMS320 Designer's Notebook (literature number SPRT125) presents solutions to common design problems using 'C2x, 'C3x, 'C4x, 'C5x, and other TI DSPs.

If You Need Assistance

To request more information about Texas Instruments digital signal processing (DSP) products, to order Texas Instruments documentation, to ask questions about product operation or report suspected problems, or to obtain the source code in this user's guide, use the following contacts:

    Write to:                     Texas Instruments Incorporated
                                  Market Communications Manager
                                  P.O. Box 1443
                                  Houston, Texas 77251-1443
    Literature Response Center:   (800) 477-8924
    DSP hotline:                  Phone: (713) 274-2320
                                  FAX: (713) 274-2324
                                  Electronic mail: 4389750@mcimail.com
    DSP BBS:                      (713) 274-2323
    FTP from:                     ftp.ti.com, /mirrors/tms320bbs
    TI online (including TI&ME,
    your customized page):        http://www.ti.com

To report mistakes or make comments about this or any other TI documentation:

    Send electronic mail to:      comments@books.sc.ti.com
    Send printed comments to:     Texas Instruments Incorporated
                                  Technical Publications Mgr.
                                  P.O. Box 1443
                                  Houston, Texas 77251-1443

Trademarks

MS-Windows and MS-DOS are registered trademarks of Microsoft Corp.
OS/2 is a trademark of International Business Machines Corp.
SPARC is a trademark of Sun Microsystems, Inc.
Certain other names used in this book are trademarks of Digital Equipment Corp.

Contents

1   Processor Initialization
    Provides examples for initializing the processor.
    1.1   Reset Process
    1.2   Reset Signal Generation
    1.3   Multiprocessing System Reset Considerations
    1.4   How to Initialize the Processor

2   Program Control
    Provides examples for initializing the processor and discusses program control features.
    2.1   Subroutines
          2.1.1   Regular Subroutine Calls
          2.1.2   Zero-Overhead Subroutine Calls
    2.2   Stacks and Queues
          2.2.1   System Stacks
          2.2.2   User Stacks
          2.2.3   Queues and Double-Ended Queues
    2.3   Interrupt Examples
          2.3.1   Correct Interrupt Programming
          2.3.2   Software Polling of Interrupts
          2.3.3   Using Interrupt Services
          2.3.4   Nesting Interrupts
    2.4   Context Switching in Interrupts and Subroutines
    2.5   Repeat Modes
          2.5.1   Block Repeat
          2.5.2   Delayed Block Repeat
          2.5.3   Single-Instruction Repeat
    2.6   Computed GOTOs to Select Subroutines at Runtime

3   Logical and Arithmetic Operations
    Provides examples for performing logical and arithmetic operations.
    3.1   Bit Manipulation
    3.2   Block Moves
    3.3   Byte and Half-Word Manipulation
    3.4   Bit-Reversed Addressing
          3.4.1   CPU Bit-Reversed Addressing
          3.4.2   DMA Bit-Reversed Addressing
    3.5   Integer and Floating-Point Division
          3.5.1   Integer Division
          3.5.2   Computation of Floating-Point Inverse and Division
    3.6   Calculating a Square Root
    3.7   Extended-Precision Arithmetic
    3.8   Floating-Point Format Conversion: IEEE to/From 'C4x

4   Memory Interfacing
    Provides examples for TMS320C4x system configuration, memory interfaces, and reset.
    4.1   System Configuration
    4.2   External Interfacing
    4.3   Global and Local Bus Interfaces
    4.4   Zero Wait-State Interfacing to RAMs
          4.4.1   Consecutive Reads Followed by a Write Interface Timing
          4.4.2   Consecutive Writes Followed by a Read Interface Timing
          4.4.3   RAM Interface Using One Local Strobe
          4.4.4   RAM Interface Using Both Local Strobes
    4.5   Wait States and Ready Generation
          4.5.1   ORing of the Ready Signals
          4.5.2   ANDing of the Ready Signals
          4.5.3   External Ready Generation
          4.5.4   Ready Control Logic
          4.5.5   Example Circuit
          4.5.6   Page Switching Techniques
    4.6   Parallel Processing Through Shared Memory
          4.6.1   Shared Global-Memory Interface
          4.6.2   Shared-Memory Interface Design Example

5   Programming Tips
    Provides hints for writing more efficient C and assembly-language code.
    5.1   Hints for Optimizing C Code
    5.2   Hints for Optimizing Assembly-Language Code

6   Applications-Oriented Operations
    Describes common algorithms and provides code for implementing them.
    6.1   Companding
    6.2   FIR, IIR, and Adaptive Filters
          6.2.1   FIR Filters
          6.2.2   IIR Filters
          6.2.3   Adaptive Filters (LMS Algorithm)
    6.3   Lattice Filters
    6.4   Matrix-Vector Multiplication
    6.5   Fast Fourier Transforms (FFTs)
          6.5.1   Complex Radix-2 FFT
          6.5.2   Complex Radix-4 FFT
          6.5.3   Faster Complex Radix-2 FFT
          6.5.4   Real Radix-2 FFT
    6.6   'C4x Benchmarks

7   Programming the DMA Coprocessor
    Provides examples for programming the TMS320C4x's on-chip peripherals.
    7.1   Hints for DMA Programming
    7.2   When a DMA Channel Finishes a Transfer
    7.3   DMA Assembly Programming Examples
    7.4   DMA C-Programming Examples

8   Using the Communication Ports
    Describes how to interface with the TMS320C4x communication ports.
    8.1   Communication Ports
    8.2   Signal Considerations
    8.3   Interfacing With a Non-'C4x Device
    8.4   Terminating Unused Communication Ports
    8.5   Design Tips
    8.6   Commport to Host Interface
          8.6.1   Simplified Hardware Interface for 'C40 3.3 and 'C44 Devices
          8.6.2   Improved Drive and Sense Amplifiers
          8.6.3   How the Circuit Works
          8.6.4   Interface Software
    8.7   Coprocessor-'C4x Interface
    8.8   Implementing a Token Forcer
    8.9   Implementing a CSTRB Shortener Circuit
    8.10  Parallel Processing Through Communication Ports
    8.11  Broadcasting Messages From One 'C4x to Many 'C4x Devices

9   'C4x Power Dissipation
    Explains the current consumption of the 'C4x and also provides information about the current consumption of its components.
    9.1   Capacitive and Resistive Loading
    9.2   Basic Current Consumption
          9.2.1   Current Components
          9.2.2   Current Dependency
          9.2.3   Algorithm Partitioning
          9.2.4   Test Setup Description
    9.3   Current Requirement of Internal Components
          9.3.1   Quiescent
          9.3.2   Internal Operations
          9.3.3   Internal Operations
    9.4   Current Requirement of Output Driver Components
          9.4.1   Local or Global Bus
          9.4.2   DMA
          9.4.3   Communication Port
          9.4.4   Data Dependency
          9.4.5   Capacitive Loading Dependence
    9.5   Calculation of Total Supply Current
          9.5.1   Combining Supply Current Components
          9.5.2   Supply Voltage, Operating Frequency, and Temperature Dependencies
          9.5.3   Design Equation
          9.5.4   Average Current
          9.5.5   Thermal Management Considerations
    9.6   Example Supply Current Calculations
          9.6.1   Processing
          9.6.2   Data Output
          9.6.3   Average Current
          9.6.4   Experimental Results
    9.7   Design Considerations
          9.7.1   System Clock and Signal Switching Rates
          9.7.2   Capacitive Loading of Signals
          9.7.3   Component Signal Loading

10  Development Support and Part Order Information
    Describes 'C4x support available from TI and third-party vendors.
    10.1  Development Support
          10.1.1  Third-Party Support
          10.1.2  Hotline
          10.1.3  Bulletin Board Service (BBS)
          10.1.4  Internet Services
          10.1.5  Technical Training Organization (TTO) TMS320 Workshops
    10.2  Sockets
          10.2.1  Tool-Activated Socket (TAZ)
          10.2.2  Handle-Activated Socket (HAZ)
    10.3  Part Order Information
          10.3.1  Nomenclature
          10.3.2  Device and Development Support Tools

11  XDS510 Emulator Design Considerations
    Describes the JTAG emulator cable. Tells you how to construct a 14-pin connector on your target system and how to connect the target system to the emulator.
    11.1  Designing Your Target System's Emulator Connector (14-Pin Header)
    11.2  Protocol
    11.3  IEEE 1149.1 Standard
    11.4  JTAG Emulator Cable Logic
    11.5  JTAG Emulator Cable Signal Timing
    11.6  Emulation Timing Calculations
    11.7  Connections Between the Emulator and the Target System
          11.7.1  Buffering Signals
          11.7.2  Using a Target-System Clock
          11.7.3  Configuring Multiple Processors
    11.8  Mechanical Dimensions for the 14-Pin Emulator Connector
    11.9  Emulation Design Considerations
          11.9.1  Using Scan Path Linkers
          11.9.2  Emulation Timing Calculations
          11.9.3  Using Emulation Pins
          11.9.4  Performing Diagnostic Applications

Glossary
Chapter 1

Processor Initialization

Before you execute a DSP algorithm, it is necessary to initialize the processor. Initialization brings the processor to a known state. Generally, initialization takes place any time after the processor is reset. This chapter reviews the concepts explained in the user's guide and provides examples.

    Topic
    1.1   Reset Process
    1.2   Reset Signal Generation
    1.3   Multiprocessing System Reset Considerations
    1.4   How to Initialize the Processor

1.1 Reset Process

After RESET is applied, the 'C4x jumps to the address stored in the reset vector location and starts execution from that point. In order to reset the 'C4x correctly, you need to comply with several hardware and software requirements:

- Select the reset vector location: The RESET vector of the 'C4x can be mapped to one of four different locations that are controlled by the value of the RESETLOC(1,0) pins at RESET. Table 1-1 shows the possible reset vectors for the 'C40 and 'C44. In microcomputer mode (ROMEN = 1), RESETLOC(1,0) must be equal to 0,0 for the boot loader to operate correctly. Also in microcomputer mode, the IIOFx pins must be set as discussed in the bootloader chapter of the TMS320C4x User's Guide so that the bootloader works properly.

- Provide the correct reset vector value: The RESET vector normally contains the address of the system initialization routine. In microcomputer mode, the reset vector is initialized automatically by the processor to point to the beginning of the on-chip boot loader code. No user action is required. In microprocessor mode, the reset vector is typically stored in an EPROM. Example 1-1 shows how to initialize that vector.

- Apply a low level to the RESET input. (See Section 1.2.)

Table 1-1. RESET Vector Locations in 'C40 and 'C44

    Value at RESETLOCx pins      Reset Vector From
    RESETLOC1    RESETLOC0       Memory Address (1)     Bus
    ---------    ---------       ------------------     ------
        0            0           00000 0000h            Local
        0            1           07FFF FFFFh            Local
        1            0           08000 0000h            Global
        1            1           0FFFF FFFFh            Global

    (1) This corresponds to the 32-bit address that the processor accesses. However, on the 'C44 only the 24 LSBs of the reset address are driven on pins A0-A23 and pins LA0-LA23. The corresponding LSTRBx pins are also activated.

1.2 Reset Signal Generation

Several aspects of 'C4x system hardware design are critical to overall system operation. One such aspect is reset signal generation.
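As a brief aside on Table 1-1, the pin-to-vector mapping can be written as a small lookup. The following Python sketch is only an illustration: the names `RESET_VECTORS`, `reset_vector`, and `c44_pin_address` are hypothetical, and the row order of the RESETLOC values is an assumption based on the statement that RESETLOC(1,0) = 1,0 places the vector at 08000 0000h.

```python
# Illustrative model of Table 1-1 (names are hypothetical, not TI's).
# Key: (RESETLOC1, RESETLOC0) -> (reset vector address, bus that is accessed)
RESET_VECTORS = {
    (0, 0): (0x0000_0000, "local"),
    (0, 1): (0x7FFF_FFFF, "local"),
    (1, 0): (0x8000_0000, "global"),
    (1, 1): (0xFFFF_FFFF, "global"),
}


def reset_vector(resetloc1, resetloc0):
    """Return (address, bus) of the reset vector for the given pin levels."""
    return RESET_VECTORS[(resetloc1, resetloc0)]


def c44_pin_address(address):
    """On the 'C44 only the 24 LSBs of the address appear on the address pins."""
    return address & 0x00FF_FFFF


address, bus = reset_vector(1, 0)
print(f"RESETLOC(1,0)=1,0 -> {address:08X}h on the {bus} bus")
print(f"'C44 pins carry {c44_pin_address(address):06X}h")
```

The lookup makes the microcomputer-mode restriction easy to see: only the (0, 0) entry points at address zero, where the on-chip boot loader expects the vector.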
The reset input controls initialization of internal 'C4x logic and execution of the system initialization software. For proper system initialization, the RESET signal must be applied for at least [...] cycles, that is, [...] for a 'C4x operating at [...] MHz. Upon power-up, however, it can take [...] or more before the system oscillator reaches a stable operating state. Therefore, the power-up reset circuit should generate a low pulse on RESET for [...]. Once a proper reset pulse has been applied, the processor fetches the reset vector from location zero, which contains the address of the system initialization routine. Figure 1-1 shows a circuit that will generate an appropriate power-up or push-button reset signal.

Figure 1-1. Reset Circuit
(Schematic: R1C1 network and push button driving the TMS320C4x RESET input through a 74ALS34 buffer.)

The voltage on the RESET pin is controlled by the R1C1 network. After a reset, this voltage rises exponentially according to the time constant R1C1, as shown in Figure 1-2. In Figure 1-1, the 74ALS34 provides a clean RESET signal to the 'C4x.

Figure 1-2. Voltage on the RESET Pin
(Plot: voltage versus time.)

The duration of the low pulse on RESET is approximately $t_1$, which is the time it takes for the capacitor C1 to be charged to [...]. This is approximately the voltage at which the reset input switches from a logic 0 to a logic 1. The capacitor voltage is expressed as

    $V = V_{CC}\left(1 - e^{-t/\tau}\right)$

where $\tau = R_1 C_1$ is the reset circuit time constant. Solving for $t$ results in

    $t = -R_1 C_1 \ln\left(1 - \frac{V}{V_{CC}}\right)$

Setting the following: [...] results in $t_1 =$ [...]. Therefore, the reset circuit of Figure 1-1 provides a low pulse for a long enough time to ensure the stabilization of the system oscillator upon power-up.

Note: Reset does not have internal Schmitt hysteresis. To ensure proper reset operation, avoid slow rise and fall times. Rise/fall time should not exceed one CLKIN cycle.

1.3 Multiprocessing System Reset Considerations

If synchronization of multiple 'C4x DSPs is required, all processors should be provided with the same input clock and the same reset signal. After power-up, when the clock has stabilized, take RESET high for [...] H1/H3 cycles and then low to synchronize their H1/H3 clock phases. Following the falling edge, RESET should remain low for at least [...] cycles and then be driven high. The circuit in Figure 1-1 can be used for RESET generation. Pullup resistors are recommended at each end of the connection to avoid unintended triggering after reset when RESET going low is not received by all 'C4x devices at the same time.
It is recommended that you power up the system with RESET low. This prevents 'C4x asynchronous signals from driving unknown values before RESET goes low, which could create bus contention on the communication-port pins, resulting in damage to the device.

1.4 How to Initialize the Processor

After reset, the 'C4x jumps to the address stored in the reset vector location and starts execution from that point. The RESET vector normally contains the address of the system initialization routine. The initialization routine should typically perform several tasks:

- Set the DP register.
- Set the stack pointer.
- Set the interrupt vector table.
- Set the trap vector table.
- Set the memory control register.
- Clear/enable cache.

Note: When running under microcomputer mode (ROMEN = 1), the address stored in the reset vector location points to the beginning of the bootloader code. The on-chip bootloader automatically initializes the memory-control register values from the bootloader table.

The following examples illustrate how to initialize the 'C4x when using assembly language and when using C.

Processor initialization under assembly language

If you are running under an assembly-only environment, Example 1-1 provides a basic initialization routine. This example shows code for initializing the 'C4x to the following machine state:

- Timer 0 interrupt is enabled.
- Trap 0 is initialized.
- The program cache is enabled.
- The DP is initialized to point to the .text section.
- The stack pointer is initialized to the beginning of the mystack section.
- The memory control registers are initialized.
- The 'C4x is initialized to run in microprocessor mode with the reset vector located at address 08000 0000h (RESETLOC(1,0) = 1,0). The program has already been loaded into memory at address 0x4000 0000.

You need to allocate the section addresses using a linker command file (see the TMS320 Floating-Point DSP Assembly Language Tools User's Guide for more information about linker command files), as shown in Example 1-2.

Example 1-1. Processor Initialization Example

* Create Reset Vector
         .sect   "rst_sect"      ;Named section for RESET vector
reset    .word   init            ;RESET vector
* Create Interrupt Vector Table
_myvect  .sect   "myvect"        ;Named section for int.
                                 ;vectors
         .space  [...]           ;Reserved space
         .word   tint0           ;Timer 0 ISR address
* Create Trap Vector Table
_mytrap  .sect   "mytrap"        ;Named section for trap vectors
         .word   trap0           ;Trap 0 subroutine address
* Create Stack
_mystack .usect  "mystack",500   ;Reserve 500 locations for stack
         .text
stacka   .word   _mystack        ;Address of mystack section
ivta     .word   _myvect         ;Address of myvect section
tvta     .word   _mytrap         ;Address of mytrap section
ieval    .word   1               ;IIE register value
gctrl    .word   ????????        ;Target board specific
lctrl    .word   ????????        ;Target board specific
mctrla   .word   100000h         ;Address of global memory control register
init:
* Initialize the DP Register
         LDP     stacka
* Set Expansion Register IVTP
         LDI     @ivta,AR0
         LDPE    AR0,IVTP
* Set Expansion Register TVTP
         LDI     @tvta,AR0
         LDPE    AR0,TVTP
* Initialize global memory interface control
         LDI     @mctrla,AR0
         LDI     @gctrl,R0
         STI     R0,*AR0
* Initialize local memory interface control
         LDI     @lctrl,R0
         STI     R0,*+AR0(4)
* Initialize Stack Pointer
         LDI     @stacka,SP
* Enable timer interrupt
* This is equivalent to OR 1,IIE
         LDI     @ieval,IIE
* Clear/Enable Cache and Enable Global Interrupts
         OR      3800H,ST        ;Global interrupt enable
         BR      begin           ;Branch to the beginning of application
begin    <this is your application code>
trap0    <this is your trap0 trap code>
         reti
tint0    <this is your tint0 interrupt service routine>
         reti
         .end

Example 1-2. Linker Command File for Linking the Previous Example

MEMORY
{
   EPROM: org = 0x80000000, len = 0x10     /* EPROM: reset vector location */
   RAM:   org = 0x40000000, len = 0x100    /* extend [...] */
}
/* SPECIFY THE SECTIONS ALLOCATION INTO MEMORY */
SECTIONS
{
   rst_sect: > EPROM
   myvect:   > RAM
   mystack:  > RAM
   .text:    > RAM
   mytrap:   > RAM
}

Processor initialization under C language

If you are running under a C environment, your initialization routine is typically boot.asm (from the RTS40.LIB library that comes with the floating-point compiler). In addition to initializing global variables, boot.asm initializes the DP register (pointing to the .bss section) and the SP register (pointing to the .stack section). You need to enable the cache, as shown in Example 1-3, and set up your interrupts inside your main routine before you enable interrupts.
See the Application Report, Setting Up TMS320 DSP Interrupts in C (SPRA036), for more information.

Example 1-3. Enabling the Cache

main()
{
    asm(" OR 1800h,st");   /* enable cache */
    asm(" OR 3800h,st");   /* enable cache and interrupts */
}


Chapter 2
Program Control

Several 'C4x instructions provide program control and facilitate high-speed processing. These instructions directly handle:

- Regular and zero-overhead subroutine calls
- Software stack
- Interrupts
- Delayed branches
- Single- and multiple-instruction loops without overhead

Topics:
    Subroutines
    Stacks and Queues
    Interrupt Examples
    Context Switching in Interrupts and Subroutines
    Repeat Modes
    Computed GOTOs to Select Subroutines at Runtime

2.1 Subroutines

The 'C4x provides two ways to invoke subroutine calls: regular calls and zero-overhead calls. The regular and zero-overhead subroutine calls use the software stack and extended-precision register R11, respectively, to save the return address. The following subsections use example programs to explain how this works.

2.1.1 Regular Subroutine Calls

The 'C4x has a 32-bit program counter (PC) and a virtually unlimited software stack. The CALL and CALLcond subroutine calls increment the stack pointer and store the contents of the next value of the program counter on the stack. At the end of the subroutine, RETScond performs a conditional return.

Example 2-1 illustrates the use of a subroutine to determine the dot product of two vectors. Given two vectors of length N, represented by the arrays a[0], a[1], ..., a[N-1] and b[0], b[1], ..., b[N-1], the dot product is computed from the expression

    d = a[0] b[0] + a[1] b[1] + ... + a[N-1] b[N-1]

Processing proceeds in the main routine to the point where the dot product is to be computed. It is assumed that the arguments of the subroutine have been appropriately initialized. At this point, a CALL is made to the subroutine, transferring control to that section of the program memory for execution, then returning to the calling routine via the RETS instruction when execution has completed. Note that for this particular example, it would suffice to save the register R2. However, a larger number of registers are saved for demonstration purposes. The saved registers are stored on the system stack, which should be large enough to accommodate the maximum anticipated storage requirements.
Other methods of saving registers could be used equally well.

Example 2-1. Regular Subroutine Call (Dot Product)

* TITLE REGULAR SUBROUTINE CALL (DOT PRODUCT)
*
* MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
* DOT PRODUCT OF TWO VECTORS.
*
        LDI     @blk0,AR0       ;AR0 points to vector a
        LDI     @blk1,AR1       ;AR1 points to vector b
        LDI     N,RC            ;RC contains the number of elements
        CALL    DOT
*
* SUBROUTINE DOT
*
* EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
* THE DOT PRODUCT IS PLACED IN REGISTER R0. N MUST BE GREATER THAN
* OR EQUAL TO 2.
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+-------------------------
*   AR0      | ADDRESS OF a(0)
*   AR1      | ADDRESS OF b(0)
*   RC       | LENGTH OF VECTORS (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC
* REGISTER MODIFIED: R0
* REGISTER CONTAINING RESULT: R0
*
        .global DOT
DOT     PUSH    ST              ;Save status register
        PUSH    R2              ;Use the stack to save R2's
        PUSHF   R2              ;bottom 32 and top 32 bits
        PUSH    AR0             ;Save AR0
        PUSH    AR1             ;Save AR1
        PUSH    RC              ;Save RC
        PUSH    RS
        PUSH    RE
* Initialize R0
        MPYF3   *AR0,*AR1,R0    ;a(0) * b(0) -> R0
        SUBF    R2,R2,R2        ;Initialize R2
        SUBI    2,RC            ;Set RC = N-2
* DOT PRODUCT (1 <= i < N)
        RPTS    RC              ;Set up the repeat single
        MPYF3   *++AR0(1),*++AR1(1),R0  ;a(i) * b(i) -> R0
||      ADDF3   R0,R2,R2        ;a(i-1)*b(i-1) + R2 -> R2
        ADDF3   R0,R2,R0        ;a(N-1)*b(N-1) + R2 -> R0
* RETURN SEQUENCE
        POP     RE
        POP     RS
        POP     RC              ;Restore RC
        POP     AR1             ;Restore AR1
        POP     AR0             ;Restore AR0
        POPF    R2              ;Restore top 32 bits of R2
        POP     R2              ;Restore bottom 32 bits of R2
        POP     ST              ;Restore ST
        RETS                    ;Return
        .end

2.1.2 Zero-Overhead Subroutine Calls

Two instructions, link and jump (LAJ) and link and jump conditional (LAJcond), implement zero-overhead subroutine calls on the 'C4x. Unlike CALL and CALLcond, which put the return address onto the software stack, LAJ and LAJcond put it into extended-precision register R11. The three instructions following LAJ or LAJcond are executed before going to the subroutine. The restriction that applies to these three instructions is the same as that for the three instructions following a delayed branch. At the end of the subroutine, use a delayed branch conditional (BcondD) in register addressing mode with R11 as the source to perform a zero-overhead subroutine return. For comparison, the same dot product example with a zero-overhead subroutine call is given in the following example program.
Example 2-2. Zero-Overhead Subroutine Call (Dot Product)

* TITLE ZERO-OVERHEAD SUBROUTINE CALL (DOT PRODUCT)
*
* MAIN ROUTINE THAT CALLS THE SUBROUTINE 'DOT' TO COMPUTE THE
* DOT PRODUCT OF TWO VECTORS.
*
        LAJ     DOT
        LDI     @blk0,AR0       ;AR0 points to vector a
        LDI     @blk1,AR1       ;AR1 points to vector b
        LDI     N,RC            ;RC contains the number of elements
*
* SUBROUTINE DOT
*
* EQUATION: d = a(0) * b(0) + a(1) * b(1) + ... + a(N-1) * b(N-1)
*
* THE DOT PRODUCT IS PLACED IN REGISTER R0. N MUST BE GREATER THAN
* OR EQUAL TO 2.
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+-------------------------
*   AR0      | ADDRESS OF a(0)
*   AR1      | ADDRESS OF b(0)
*   RC       | LENGTH OF VECTORS (N)
*
* REGISTERS USED AS INPUT: AR0, AR1, RC
* REGISTER MODIFIED: R0
* REGISTER CONTAINING RESULT: R0
*
        .global DOT
DOT     PUSH    ST              ;Save status register
        PUSH    R2              ;Use the stack to save R2's
        PUSHF   R2              ;bottom 32 and top 32 bits
        PUSH    AR0             ;Save AR0
        PUSH    AR1             ;Save AR1
        PUSH    RC              ;Save RC
        PUSH    RS
        PUSH    RE
* Initialize R0
        MPYF3   *AR0,*AR1,R0    ;a(0) * b(0) -> R0
        SUBF    R2,R2,R2        ;Initialize R2
        SUBI    2,RC            ;Set RC = N-2
* DOT PRODUCT (1 <= i < N)
        RPTS    RC              ;Set up the repeat single
        MPYF3   *++AR0(1),*++AR1(1),R0  ;a(i) * b(i) -> R0
||      ADDF3   R0,R2,R2        ;a(i-1)*b(i-1) + R2 -> R2
        ADDF3   R0,R2,R0        ;a(N-1)*b(N-1) + R2 -> R0
* RETURN SEQUENCE
        POP     RE
        POP     RS
        POP     RC              ;Restore RC
        POP     AR1             ;Restore AR1
        POP     AR0             ;Restore AR0
        BUD     R11             ;Return
        POPF    R2              ;Restore top 32 bits of R2
        POP     R2              ;Restore bottom 32 bits of R2
        POP     ST              ;Restore ST
        .end

2.2 Stacks and Queues

The 'C4x provides a dedicated stack pointer (SP) for building stacks in memory. Also, the auxiliary registers can be used to build user stacks and a variety of more general linear lists. This section discusses the implementation of the following types of linear lists:

Stack     A linear list for which all insertions and deletions are made at one end of the list.
Queue     A linear list for which all insertions are made at one end of the list, and all deletions are made at the other end.
Dequeue   A double-ended queue: a linear list for which insertions and deletions are made at either end of the list.

2.2.1 System Stacks

A stack in the 'C4x fills from a low-memory address to a high-memory address, as shown in Figure 2-1. A system stack stores addresses and data during subroutine calls, traps, and interrupts. The stack pointer (SP) is a 32-bit register that contains the address of the top of the system stack. The SP always points to the last element pushed onto the stack. A push performs a preincrement, and a pop performs a postdecrement of the SP. Provisions should be made to accommodate your software's anticipated storage requirements.
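The push and pop behavior described above can be modeled in a few lines of C. This is only an illustrative sketch, not TI code: the type sys_stack, the size STACK_WORDS, and the function names are hypothetical, and the stack pointer is modeled as an array index rather than a 32-bit address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u              /* hypothetical size, for illustration */

typedef struct {
    uint32_t mem[STACK_WORDS];       /* stands in for the stack memory region */
    uint32_t sp;                     /* index of the last word pushed (top) */
} sys_stack;

/* Empty stack: sp rests one word below the first usable slot (index 1). */
static void stack_init(sys_stack *s)
{
    s->sp = 0u;
}

/* Push: increment sp first, then store (preincrement), growing upward. */
static void stack_push(sys_stack *s, uint32_t word)
{
    assert(s->sp + 1u < STACK_WORDS);    /* guard the model's array bounds */
    s->mem[++s->sp] = word;
}

/* Pop: read the word at sp, then decrement sp (postdecrement). */
static uint32_t stack_pop(sys_stack *s)
{
    assert(s->sp > 0u);                  /* nothing to pop from an empty stack */
    return s->mem[s->sp--];
}
```

Pushing 0x11 and then 0x22 leaves sp at 2; the two pops that follow return 0x22 and then 0x11 and bring sp back to 0, which is the low-to-high fill and last-in, first-out order the text describes.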
The stack pointer (SP) can be read from as well as written to; multiple stacks can be created by updating the SP. The SP is not initialized by the hardware during reset; it is important to remember to initialize its value so that it points to a predetermined memory location. Example 1-1 shows how to initialize the SP. You must initialize the stack to a valid free memory space. Otherwise, use of the stack could corrupt data or program memory.

The program counter is pushed onto the system stack on subroutine calls, traps, and interrupts. It is popped from the system stack on returns. The PUSH, POP, PUSHF, and POPF instructions push and pop the system stack. The stack can be used inside subroutines as a place of temporary storage of registers, as is the case shown in Example 2-1.

Two instructions, PUSHF and POPF, are for floating-point numbers. These instructions can push and pop floating-point numbers to and from registers R0 to R11. This feature is very useful for saving the extended-precision registers (see Example 2-1 and Example 2-2). PUSH saves the lower 32 bits of an extended-precision register, and PUSHF saves the upper 32 bits. To recover this extended-precision number, execute a POPF followed by a POP. It is important to perform the integer and floating-point PUSH and POP in the above order, since POPF forces the last eight bits of the extended-precision registers to zero.

Figure 2-1. System Stack Configuration
(Memory map from low memory to high memory: bottom of stack, top of stack (SP), free locations.)

2.2.2 User Stacks

User stacks can be built to store data from low-to-high memory or from high-to-low memory. Two cases for each type of stack are shown. You can build stacks by using the preincrement/decrement and postincrement/decrement modes of modifying the auxiliary registers (AR).

You can implement stack growth from high to low memory in two ways:

Case 1: Store to memory using *--ARn to push data onto the stack, and read from memory using *ARn++ to pop data off the stack.
Case 2: Store to memory using *ARn-- to push data onto the stack, and read from memory using *++ARn to pop data off the stack.

Figure 2-2 illustrates these two cases. The only difference is that in case 1, the AR always points to the top of the stack, and in case 2, the AR always points to the next free location on the stack.

Figure 2-2.
Implementations of High-to-Low Memory Stacks
(Two memory maps, Case 1 and Case 2, each from low memory to high memory: free locations, top of stack, bottom of stack.)

You can implement stack growth from low to high memory in two ways:

Case 1: Store to memory using *++ARn to push data onto the stack, and read from memory using *ARn-- to pop data off the stack.
Case 2: Store to memory using *ARn++ to push data onto the stack, and read from memory using *--ARn to pop data off the stack.

Figure 2-3 shows these two cases. In case 1, the AR always points to the top of the stack. In case 2, the AR always points to the next free location on the stack.

Figure 2-3. Implementations of Low-to-High Memory Stacks
(Two memory maps, Case 1 and Case 2, each from low memory to high memory: bottom of stack, top of stack, free locations.)

2.2.3 Queues and Double-Ended Queues

The implementations of queues and double-ended queues are based upon the manipulation of the auxiliary registers for user stacks.

For queues, two auxiliary registers are used: one to mark the front of the queue, from which data is popped, and the other to mark the rear of the queue, where data is pushed. For double-ended queues, two auxiliary registers are also necessary. One register marks one end of the double-ended queue, and the other register marks the other end. Data can be popped from or pushed onto either end.

2.3 Interrupt Examples

When using interrupts, you must consider several issues. This section offers examples of several interrupt-related topics:

- Interrupt Service Routines
- Context Switching
- Interrupt-Vector Table (IVTP)
- Interrupt Priorities

2.3.1 Correct Interrupt Programming

For interrupts to work properly, you need to execute the following sequence of steps, as shown in Example 1-1:

1) Create an interrupt-vector table on a 512-word boundary.
2) Initialize the IVTP register.
3) Create a software stack.
4) Enable the specific interrupt.
5) Enable global interrupts.
6) Generate the interrupt signal.

2.3.2 Software Polling of Interrupts

The interrupt flag register can be polled, and action can be taken, depending on whether an interrupt has occurred. This is true even when maskable interrupts are disabled. This can be useful when an interrupt-driven interface is not implemented. Example 2-3 shows the case in which a subroutine is called when an external interrupt has not occurred.
Example 2-3. Use of Interrupts for Software Polling

* TITLE INTERRUPT POLLING
        TSTB    40H,IIF         ;Test if the interrupt has occurred
        CALLZ   SUBROUTINE      ;If not, call subroutine

When interrupt processing begins, the program counter is pushed onto the stack, and the interrupt vector is loaded into the program counter. Interrupts are disabled when the GIE bit is cleared, and the program continues from the address loaded in the program counter. Because all maskable interrupts are disabled, interrupt processing can proceed without further interruption unless the interrupt service routine re-enables interrupts, or an NMI occurs.

2.3.3 Using One Interrupt for Two Services

The IVTP can be changed to point to alternate interrupt-vector tables. This relocatable feature of the table allows a single interrupt signal to have more than one service. In Example 2-4, the IVTP is reset in the external INT0 interrupt service routines EINT0A and EINT0B. After the value of the IVTP is changed, the CPU goes to a different interrupt service routine when the same interrupt signal reoccurs.

Example 2-4. Use of One Interrupt Signal for Two Different Services

* TITLE ONE INTERRUPT SIGNAL FOR TWO DIFFERENT SERVICES
*
* IN THIS EXAMPLE, THE ADDRESSES OF EINT0A AND EINT0B ARE AT MEMORY
* LOCATIONS 03H AND 1003H, RESPECTIVELY. ASSUME THE IVTP HAS NOT BEEN
* CHANGED AFTER DEVICE RESET AND THE EXTERNAL INTERRUPT IIOF0 IS
* ENABLED. WHEN THE FIRST IIOF0 INTERRUPT SIGNAL COMES IN, THE EINT0A
* ROUTINE WILL BE EXECUTED. THEN, WHEN THE NEXT IIOF0 INTERRUPT SIGNAL
* OCCURS, THE EINT0B ROUTINE WILL BE EXECUTED. THE EINT0A AND EINT0B
* ROUTINES WILL TAKE TURNS BEING EXECUTED WHEN THE IIOF0 INTERRUPT
* SIGNAL OCCURS.
*
* External IIOF0 interrupt service routine A
        .global EINT0A
EINT0A:
        LDI     1000H,R0
        LDPE    R0,IVTP         ;Change IVTP to point to 1000H
        RETI                    ;Return and enable interrupts
* External IIOF0 interrupt service routine B
        .global EINT0B
EINT0B:
        LDI     0,R0
        LDPE    R0,IVTP         ;Change IVTP to point to 0
        RETI                    ;Return and enable interrupts

2.3.4 Nesting Interrupts

In Example 2-5, the interrupt service routine for INT2 temporarily modifies the interrupt enable register (IIE) and interrupt flag register (IIF) to permit interrupt processing when interrupt INT0 (but no other interrupt) occurs. When the routine finishes processing, the IIE register is restored to its original state.
Notice that the RETIcond instruction not only pops the next program counter address from the stack, but also restores the GIE and CF bits from the PGIE and PCF bits. This re-enables all interrupts that were enabled before the INT2 interrupt was serviced.

Example 2-5. Interrupt Service Routine

* TITLE INTERRUPT SERVICE ROUTINE
        .global ISR2
ENABLE  .set    2000h
MASK    .set    [...]
*
* INTERRUPT PROCESSING FOR EXTERNAL INTERRUPT INT2-
*
ISR2:
        PUSH    ST              ;Save status register
        PUSH    DP              ;Save data page pointer
        PUSH    IIE             ;Save interrupt enable register
        PUSH    IIF
        PUSH    [...]           ;Save lower 32 bits
        PUSHF   [...]           ;and upper 32 bits
        PUSH    [...]           ;Save lower 32 bits
        PUSHF   [...]           ;and upper 32 bits
        LDI     0,IIE           ;Unmask all internal interrupts
        LDI     MASK,IIF        ;Enable INT2
        OR      ENABLE,ST       ;Enable all interrupts
*
* MAIN PROCESSING SECTION FOR ISR2
*
        XOR     ENABLE,ST       ;Disable all interrupts
        POPF    [...]           ;Restore upper 32 bits
        POP     [...]           ;and lower 32 bits
        POPF    [...]           ;Restore upper 32 bits
        POP     [...]           ;and lower 32 bits
        POP     IIF
        POP     IIE             ;Restore interrupt enable register
        POP     DP              ;Restore data page register
        POP     ST              ;Restore status register
        RETI                    ;Return and enable interrupts

2.4 Context Switching in Interrupts and Subroutines

Context switching is commonly required when a subroutine call or interrupt is processed. It can be extensive or simple, depending on system requirements. On the 'C4x, the program counter is automatically pushed onto the stack. Important information in other 'C4x registers, such as the status, auxiliary, or extended-precision registers, must be saved on the stack with PUSH/PUSHF and recovered later with POP/POPF instructions. You need to preserve only the registers that are modified inside your subroutine or interrupt/trap service routine and that could potentially affect the previous context environment.

Note: The status register should be saved first and restored last to preserve the processor status without any further change caused by other context-switching instructions.

If the previous context environment was in C, then your program must perform one of two tasks:

- If the program is in a subroutine, it must preserve the dedicated C registers:
      Save as integers:         [...]
      Save as floating-point:   [...]
  (Some of these registers apply to the small model only or to the 'C4x only.)
- If the program is in an interrupt service routine, it must preserve all of the 'C4x registers, as Example 2-6 shows.
If the previous context environment was in assembly language, you need to determine which registers you must save based on the operations of your assembly-language code.

Example 2-6. Context Save and Context Restore

        .global ISR1
*
* TOTAL CONTEXT SAVE ON INTERRUPT.
*
ISR1:   PUSH    ST              ;Save status register
*
* SAVE THE EXTENDED PRECISION REGISTERS
*
        PUSH    R0              ;Save the lower 32 bits
        PUSHF   R0              ;and the upper 32 bits of R0
        PUSH    R1              ;Save the lower 32 bits
        PUSHF   R1              ;and the upper 32 bits of R1
        PUSH    R2              ;Save the lower 32 bits
        PUSHF   R2              ;and the upper 32 bits of R2
        PUSH    R3              ;Save the lower 32 bits
        PUSHF   R3              ;and the upper 32 bits of R3
        PUSH    R4              ;Save the lower 32 bits
        PUSHF   R4              ;and the upper 32 bits of R4
        PUSH    R5              ;Save the lower 32 bits
        PUSHF   R5              ;and the upper 32 bits of R5
        PUSH    R6              ;Save the lower 32 bits
        PUSHF   R6              ;and the upper 32 bits of R6
        PUSH    R7              ;Save the lower 32 bits
        PUSHF   R7              ;and the upper 32 bits of R7
        PUSH    R8              ;Save the lower 32 bits
        PUSHF   R8              ;and the upper 32 bits of R8
        PUSH    R9              ;Save the lower 32 bits
        PUSHF   R9              ;and the upper 32 bits of R9
        PUSH    R10             ;Save the lower 32 bits
        PUSHF   R10             ;and the upper 32 bits of R10
        PUSH    R11             ;Save the lower 32 bits
        PUSHF   R11             ;and the upper 32 bits of R11
*
* SAVE THE AUXILIARY REGISTERS
*
        PUSH    AR0             ;Save AR0
        PUSH    AR1             ;Save AR1
        PUSH    AR2             ;Save AR2
        PUSH    AR3             ;Save AR3
        PUSH    AR4             ;Save AR4
        PUSH    AR5             ;Save AR5
        PUSH    AR6             ;Save AR6
        PUSH    AR7             ;Save AR7
*
* SAVE THE REST OF THE REGISTERS FROM THE REGISTER FILE
*
        PUSH    DP              ;Save data page pointer
        PUSH    IR0             ;Save index register IR0
        PUSH    IR1             ;Save index register IR1
        PUSH    BK              ;Save block-size register
        PUSH    IIE             ;Save interrupt enable register
        PUSH    IIF             ;Save interrupt flag register
        PUSH    DIE             ;Save DMA interrupt enable register
        PUSH    RS              ;Save repeat start address
        PUSH    RE              ;Save repeat end address
        PUSH    RC              ;Save repeat counter
*
* SAVE IS COMPLETE
*
* YOUR INTERRUPT SERVICE ROUTINE CODE GOES HERE
*
        .global RESTR
*
* CONTEXT RESTORE AT THE END OF A SUBROUTINE CALL OR INTERRUPT.
RESTR:
*
* RESTORE THE REST OF THE REGISTERS FROM THE REGISTER FILE
*
        POP     RC              ;Restore repeat counter
        POP     RE              ;Restore repeat end address
        POP     RS              ;Restore repeat start address
        POP     DIE             ;Restore DMA interrupt enable register
        POP     IIF             ;Restore interrupt flag register
        POP     IIE             ;Restore interrupt enable register
        POP     BK              ;Restore block-size register
        POP     IR1             ;Restore index register IR1
        POP     IR0             ;Restore index register IR0
        POP     DP              ;Restore data page pointer
*
* RESTORE THE AUXILIARY REGISTERS
*
        POP     AR7             ;Restore AR7
        POP     AR6             ;Restore AR6
        POP     AR5             ;Restore AR5
        POP     AR4             ;Restore AR4
        POP     AR3             ;Restore AR3
        POP     AR2             ;Restore AR2
        POP     AR1             ;Restore AR1
        POP     AR0             ;Restore AR0
*
* RESTORE THE EXTENDED PRECISION REGISTERS
*
        POPF    R11             ;Restore the upper 32 bits and
        POP     R11             ;the lower 32 bits of R11
        POPF    R10             ;Restore the upper 32 bits and
        POP     R10             ;the lower 32 bits of R10
        POPF    R9              ;Restore the upper 32 bits and
        POP     R9              ;the lower 32 bits of R9
        POPF    R8              ;Restore the upper 32 bits and
        POP     R8              ;the lower 32 bits of R8
        POPF    R7              ;Restore the upper 32 bits and
        POP     R7              ;the lower 32 bits of R7
        POPF    R6              ;Restore the upper 32 bits and
        POP     R6              ;the lower 32 bits of R6
        POPF    R5              ;Restore the upper 32 bits and
        POP     R5              ;the lower 32 bits of R5
        POPF    R4              ;Restore the upper 32 bits and
        POP     R4              ;the lower 32 bits of R4
        POPF    R3              ;Restore the upper 32 bits and
        POP     R3              ;the lower 32 bits of R3
        POPF    R2              ;Restore the upper 32 bits and
        POP     R2              ;the lower 32 bits of R2
        POPF    R1              ;Restore the upper 32 bits and
        POP     R1              ;the lower 32 bits of R1
        POPF    R0              ;Restore the upper 32 bits and
        POP     R0              ;the lower 32 bits of R0
        POP     ST              ;Restore status register
*
* RESTORE IS COMPLETE
*
        RETI

2.5 Repeat Modes

The RPTB, RPTBD, and RPTS instructions support looping without any overhead. Loop execution parameters are specified by three registers, as seen in the following examples:

    RS (Repeat start address)
    RE (Repeat end address)
    RC (Repeat counter)

In principle, it is possible to nest repeat blocks. However, there is only one set of control registers (RS, RE, and RC); therefore, it is necessary to save these registers before entering an inside loop and to restore them after completing the inside loop. It takes four cycles of overhead to save and restore these registers. Hence, it is sometimes more economical to implement a nested loop by the more traditional method of using a register as a counter and then using a delayed branch, rather than using the nested repeat block approach. Often, implementing the outer loop as a counter and the inner loop as an RPTB/RPTBD instruction produces the fastest execution.

2.5.1 Block Repeat

Example 2-7 shows an application of the block repeat to find a maximum or a minimum value of 147 numbers.
The elements of the array are either all positive or all negative numbers. Because the loop cannot be predetermined, the RPTBD instruction is not suitable here.

Example 2-7. Use of Block Repeat to Find a Maximum or a Minimum

* TITLE BLOCK REPEAT TO FIND A MAXIMUM OR A MINIMUM
*
* THIS ROUTINE FINDS THE MAXIMUM OR THE MINIMUM OF N=147 NUMBERS
*
        LDI     146,RC          ;Initialize repeat counter to 147-1
        LDI     @ADDR,AR0       ;AR0 points to beginning of array
        LDF     *AR0++(1),R0    ;Initialize MAX or MIN to first value
        BLT     LOOP2           ;If negative array, find minimum
LOOP1   RPTB    MAX
        CMPF    *AR0,R0         ;Compare number to the maximum
MAX     LDFLT   *AR0++(1),R0    ;If greater, this is a new maximum
        B       NEXT
LOOP2   RPTB    MIN
        CMPF    *AR0++(1),R0    ;Compare number to the minimum
MIN     LDFGT   *-AR0(1),R0     ;If smaller, this is a new minimum
NEXT

2.5.2 Delayed Block Repeat

Example 2-8 shows an application of the delayed block-repeat construct. In this example, an array of 64 elements is flipped over by exchanging the elements that are equidistant from the ends of the array. In other words, if the original array is a(1), a(2), ..., a(31), a(32), ..., a(63), a(64), then the final array after the rearrangement is a(64), a(63), ..., a(32), a(31), ..., a(2), a(1). Because the exchange operation is performed on two elements at the same time, it requires 32 operations. The repeat counter (RC) is initialized to 31. In general, if RC contains the number N, the loop is executed N + 1 times. In the example, the loop begins at the fourth instruction following the RPTBD instruction (at the START label). RC should not be initialized in the three instructions following RPTBD.

Example 2-8. Loop Using Delayed Block Repeat

* TITLE LOOP USING DELAYED BLOCK REPEAT
*
* THIS CODE SEGMENT EXCHANGES THE VALUES OF ARRAY ELEMENTS THAT ARE
* SYMMETRIC AROUND THE MIDDLE OF THE ARRAY.
*
        LDI     31,RC           ;Initialize repeat counter
        RPTBD   EXCH            ;Repeat 32 times between START and EXCH
        LDI     @ADDR,AR0       ;AR0 points to the beginning of the array
        LDI     AR0,AR1
        ADDI    63,AR1          ;AR1 points to the end of the array
* The loop starts here
START   LDI     *AR0,R0         ;Load one memory element in R0
        LDI     *AR1,R1         ;and the other in R1
        STI     R1,*AR0++(1)    ;Then, exchange their locations
EXCH    STI     R0,*AR1--(1)

2.5.3 Single-Instruction Repeat

Example 2-9 shows an application of the repeat-single construct. In this example, the sum of the products of two arrays is computed. The arrays are not necessarily different. If the arrays are a(i) and b(i), each of length N = 512, register R2 contains the following quantity: a(1) b(1) + a(2) b(2) + ... + a(N) b(N).
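The sum of products above, computed in the same order that a repeat-single loop with a parallel multiply and add would use, can be sketched in C. This is an illustrative model only; the function name dot_product and the variable names are hypothetical. The first product is formed before the loop, each pass accumulates the previous product while forming the next one, and one final addition follows the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of a[i] * b[i] for i = 0 .. n-1, organized as a software pipeline. */
static float dot_product(const float *a, const float *b, size_t n)
{
    assert(n >= 1);                  /* at least one product is required */

    float product = a[0] * b[0];     /* first product, before the loop */
    float sum = 0.0f;                /* accumulator, cleared first */

    for (size_t i = 1; i < n; i++) {
        sum += product;              /* accumulate the previous product */
        product = a[i] * b[i];       /* while forming the next product */
    }
    return sum + product;            /* one final addition */
}
```

For a length-N array the loop body runs N - 1 times, which is why a repeat count one less than that is what a repeat-single instruction would need for N = 512.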
The value of the repeat counter (RC) is specified in the instruction.

Example 2-9. Loop Using Single Repeat

* TITLE LOOP USING SINGLE REPEAT
        LDI     @ADDR1,AR0      ;AR0 points to array a(i)
        LDI     @ADDR2,AR1      ;AR1 points to array b(i)
        LDF     0.0,R2          ;Initialize R2
        MPYF3   *AR0++(1),*AR1++(1),R1  ;Compute first product
        RPTS    510             ;Repeat 511 times
        MPYF3   *AR0++(1),*AR1++(1),R1  ;Compute next product and
||      ADDF3   R1,R2,R2        ;accumulate the previous one
        ADDF    R1,R2           ;One final addition

2.6 Computed GOTOs to Select Subroutines at Runtime

Occasionally, it is convenient to select during runtime, and not during assembly, the subroutine to be executed. The 'C4x's computed GOTO supports this selection. You can implement the computed GOTO by using the CALLcond instruction in the register addressing mode. This instruction uses the contents of the register as the address of the call. Example 2-10 shows the case of a task controller.

Example 2-10. Computed GOTO

* TITLE COMPUTED GOTO
*
* TASK CONTROLLER
*
* THIS MAIN ROUTINE CONTROLS THE ORDER OF TASK EXECUTION (6 TASKS IN
* THE PRESENT EXAMPLE). TASK0 THROUGH TASK5 ARE THE NAMES OF THE
* SUBROUTINES TO BE CALLED. THEY ARE EXECUTED IN ORDER: TASK0, TASK1,
* ..., TASK5. WHEN AN INTERRUPT OCCURS, THE INTERRUPT SERVICE ROUTINE
* IS EXECUTED, AND THE PROCESSOR CONTINUES WITH THE INSTRUCTION
* FOLLOWING THE IDLE INSTRUCTION. THIS ROUTINE SELECTS THE APPROPRIATE
* TASK FOR THE CURRENT CYCLE, CALLS THE TASK AS A SUBROUTINE, AND
* BRANCHES BACK TO THE IDLE INSTRUCTION TO WAIT FOR THE NEXT SAMPLE
* INTERRUPT WHEN THE SCHEDULED TASK HAS COMPLETED EXECUTION.
* IR0 HOLDS THE OFFSET FROM THE BASE ADDRESS OF THE TASK TO BE
* EXECUTED. THE SET COND BIT OF THE STATUS REGISTER (ST) SHOULD BE SET.
*
        LDI     5,IR0           ;Initialize IR0
        LDI     @ADDR,AR1       ;AR1 holds the base address of the table
WAIT    IDLE                    ;Wait for the next interrupt
        LDI     *+AR1(IR0),R1   ;Add the base address to the table entry
                                ;number and load the task address
        SUBI    1,IR0           ;Decrement IR0
        LDILT   5,IR0           ;If IR0<0, reinitialize it to 5
        CALLU   R1              ;Execute appropriate task
        BR      WAIT
TSKSEQ  .word   TASK5           ;Address of TASK5
        .word   TASK4           ;Address of TASK4
        .word   TASK3           ;Address of TASK3
        .word   TASK2           ;Address of TASK2
        .word   TASK1           ;Address of TASK1
        .word   TASK0           ;Address of TASK0
ADDR    .word   TSKSEQ


Chapter 3
Logical and Arithmetic Operations

The 'C4x instruction set supports both integer and floating-point arithmetic and logical operations.
The basic functions of such instructions can be combined to form more complex operations. This chapter contains the following operations examples:

- Bit manipulation
- Block moves
- Byte and half-word manipulation
- Bit-reversed addressing
- Integer and floating-point division
- Square root
- Extended-precision arithmetic
- Floating-point format conversion between IEEE and 'C4x formats

Topics:
    Bit Manipulation
    Block Moves
    Byte and Half-Word Manipulation
    Bit-Reversed Addressing
    Integer and Floating-Point Division
    Calculating a Square Root
    Extended-Precision Arithmetic
    Floating-Point Format Conversion: IEEE to/From 'C4x

3.1 Bit Manipulation

Instructions for logical operations, such as AND, OR, NOT, ANDN, and XOR, can be used together with the shift instructions for bit manipulation. A special instruction, TSTB, tests bits. TSTB does the same operation as AND, but the result of TSTB is used only to set the condition flags and is not written anywhere. Example 3-1 and Example 3-2 demonstrate the use of several instructions for bit manipulation and testing.

Example 3-1. Use of TSTB for Software-Controlled Interrupt

* TITLE USE OF TSTB FOR SOFTWARE-CONTROLLED INTERRUPT
*
* IN THIS EXAMPLE, ALL INTERRUPTS HAVE BEEN DISABLED BY RESETTING THE
* GIE BIT OF THE STATUS REGISTER. WHEN AN INTERRUPT ARRIVES, IT IS
* STORED IN THE IIF REGISTER. THE PRESENT EXAMPLE ACTIVATES THE
* INTERRUPT SERVICE ROUTINE INTR WHEN IT DETECTS THAT INT2- HAS
* OCCURRED.
*
        TSTB    4,IIF           ;Check if bit 2 of IIF is set,
        CALLNZ  INTR            ;and, if so, call subroutine INTR

Example 3-2. Copy a Bit from One Location to Another

* TITLE COPY A BIT FROM ONE LOCATION TO ANOTHER
*
* BIT I OF R1 NEEDS TO BE COPIED TO BIT J OF R2. AR0 POINTS TO A
* LOCATION HOLDING I, AND IT IS ASSUMED THAT THE NEXT MEMORY LOCATION
* HOLDS THE VALUE J.
*
        LDI     1,R0
        LSH     *AR0,R0         ;Shift 1 to align it with bit I
        TSTB    R1,R0           ;Test the I-th bit of R1
        BZD     CONT            ;If bit = 0, branch delayed
        LDI     1,R0
        LSH     *+AR0(1),R0     ;Align 1 with J-th location
        ANDN    R0,R2           ;If bit = 0, reset J-th bit of R2
        OR      R0,R2           ;If bit = 1, set J-th bit of R2
CONT

3.2 Block Moves

Because the 'C4x directly addresses a large amount of memory, blocks of data or program code can be stored off-chip in slow memories and then loaded on-chip for faster execution. Data can also be moved from on-chip memory to off-chip memory for storage or for multiprocessor data transfers. The DMA can transfer data efficiently in parallel with CPU operations.
Alternatively, you can use load and store instructions in a repeat mode to perform data transfers under program control. Example 3-3 shows how to transfer a block of floating-point numbers from external memory to a block of on-chip RAM.

Example 3-3. Block Move Under Program Control

* TITLE BLOCK MOVE UNDER PROGRAM CONTROL
extern  .word   01000H
block1  .word   02FFC00H
        LDI     @extern,AR0     ;Source address
        LDI     @block1,AR1     ;Destination address
        LDF     *AR0++,R0       ;Load the first number
        RPTS    [...]           ;Repeat following instruction [...] times
        LDF     *AR0++,R0       ;Load the next number, and...
||      STF     R0,*AR1++       ;store the previous one
        STF     R0,*AR1         ;Store the last number

3.3 Byte and Half-Word Manipulation

A set of instructions for byte and half-word accessibility, such as LB(3,2,1,0), LBU(3,2,1,0), LH(1,0), LHU(1,0), LWL(0,1,2,3), LWR(0,1,2,3), MB(3,2,1,0), and MH(1,0), is available in the 'C4x. In an application such as image processing, it is often important to be able to manipulate packed data. For example, pixels in color images are often represented as four 8-bit unsigned quantities (red, green, blue, and alpha), which are packed into a single 32-bit word. The byte and half-word instruction set makes it very easy to manipulate this packed data. Example 3-4 shows packing data from a half-word FIFO to 32-bit data memory, and Example 3-5 shows unpacking a 32-bit data array into a 4-byte-wide data array (assuming the 32-bit data array contains four 8-bit unsigned numbers).

Example 3-4. Use of Packing Data From Half-Word FIFO to 32-Bit Data Memory

* TITLE PACKING DATA FROM HALF-WORD FIFO TO 32-BIT DATA MEMORY
*
* IN THIS EXAMPLE, EVERY TWO INPUT 16-BIT DATA HAVE BEEN PACKED INTO
* ONE 32-BIT DATA MEMORY. THE LOOP SIZE USED HERE IS THE ARRAY SIZE,
* NOT THE INPUT DATA LENGTH.
*
        LDI     size-1,RC       ;Load array size
        RPTBD   PACK
        LDI     @fifo_adr,AR1   ;Load fifo address
        LDI     @array,AR2      ;Load data array address
        NOP
* >>>>>>>>>>>>>>>>              ;Loop starts here
        LWL0    *AR1,R9         ;Pack 16 LSBs
        LWL1    *AR1,R9         ;Pack 16 MSBs
PACK    STI     R9,*AR2++(1)    ;Store the data

Example 3-5. Use of Unpacking 32-Bit Data Into Four-Byte-Wide Data Array

* TITLE UNPACKING 32-BIT DATA INTO FOUR BYTE-WIDE DATA ARRAY
*
* IN THIS EXAMPLE, IT IS ASSUMED THAT THE 32-BIT DATA CONTAINS FOUR
* 8-BIT UNSIGNED DATA.
*
        LDI     size-1,RC       ;Load array size
        LDI     @input_adr,AR0  ;Load input address
        LDI     @array1,AR1     ;Load output data array 1 address
        RPTBD   UNPACK
        LDI     @array2,AR2     ;Load output data array 2 address
        LDI     @array3,AR3     ;Load output data array 3 address
        LDI     @array4,AR4     ;Load output data array 4 address
* >>>>>>>>>>>>>>>>              ;Loop starts here
        LBU0    *AR0,R8         ;Unpack first byte
        STI     R8,*AR1++(1)
        LBU1    *AR0,R8         ;Unpack second byte
        STI     R8,*AR2++(1)
        LBU2    *AR0,R8         ;Unpack third byte
        STI     R8,*AR3++(1)
        LBU3    *AR0++(1),R8    ;Unpack fourth byte
UNPACK  STI     R8,*AR4++(1)

3.4 Bit-Reversed Addressing

The 'C4x can implement fast Fourier transforms (FFT) with bit-reversed addressing. If the data to be transformed is in the correct order, the final result of the FFT is scrambled in bit-reversed order. To recover the frequency-domain data in the correct order, certain memory locations must be swapped. The bit-reversed addressing mode makes swapping unnecessary. The next time data is accessed, the access is bit-reversed rather than sequential. In the 'C4x, this bit-reversed addressing can be implemented through both the CPU and the DMA.

For correct bit-reversed operation, the base address of bit-reversed addressing must be located on a boundary of the size of the table. To clarify this point, assume an FFT of size N = 2^n. When real and imaginary data are stored in separate arrays, the n LSBs of the base address must be zero, and IR0 must be initialized to 2^(n-1) (half of the FFT size). When real and imaginary data are stored in consecutive memory locations (Re-Im-Re-Im), the n+1 LSBs of the base address must be zero, and IR0 must be equal to 2^n = N (the FFT size).

3.4.1 CPU Bit-Reversed Addressing

One auxiliary register (AR0, in this case) points to the physical location of a data value. When you add IR0 to the auxiliary register by using bit-reversed addressing, addresses are generated in a bit-reversed fashion (reverse carry propagation). The largest index (IR0, in this case) for bit reversing is 00FF FFFFh.

Example 3-6 illustrates how to move a 512-point complex FFT from the place of computation (pointed at by AR0) to a location pointed at by AR1. Reads are executed in a bit-reversed fashion and writes in a linear fashion. In this example, the real and imaginary parts XR(i) and XI(i) of the data are not stored in separate arrays, but are interleaved as XR(0), XI(0), XR(1), XI(1), ..., XR(N-1), XI(N-1).
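The reverse-carry address update mentioned above can be modeled in C to see which offsets a bit-reversed access sequence visits. This is a sketch under stated assumptions, not a description of the hardware: the function name reverse_carry_add is hypothetical, the offset is taken relative to a suitably aligned base address, and only a power-of-two step (as IR0 is in the FFT case) is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Add `step` (a power of two) to `offset`, with the carry propagating
 * from the most significant bit toward the least significant bit. */
static uint32_t reverse_carry_add(uint32_t offset, uint32_t step)
{
    assert(step != 0u && (step & (step - 1u)) == 0u);  /* power of two only */

    uint32_t bit = step;
    while (bit != 0u && (offset & bit) != 0u) {
        offset &= ~bit;      /* 1 + 1 = 0, carry moves one bit to the right */
        bit >>= 1;
    }
    return offset | bit;     /* absorb the carry; a carry past bit 0 is dropped */
}
```

Starting from offset 0 with a step of 4 (half of an 8-entry table), repeated calls visit 4, 2, 6, 1, 5, 3, 7 and then wrap to 0. With a step of 512 on an interleaved 1024-word array, the first 512 offsets visited are all even, that is, the real parts, with each imaginary part one word above.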
Because of this arrangement, the length of the array is 2N instead of N, and IR0 is set to 512 instead of 256.

Example 3-6. CPU Bit-Reversed Addressing

* TITLE BIT-REVERSED ADDRESSING
*
* THIS EXAMPLE MOVES THE RESULT OF THE 512-POINT FFT COMPUTATION, POINTED
* TO BY AR0, TO A LOCATION POINTED TO BY AR1. REAL AND IMAGINARY POINTS
* ARE ALTERNATING.
*
        LDI     511,RC            ;Repeat 511+1 times
        RPTBD   LOOP
        LDI     512,IR0           ;Load FFT size
        LDI     2,IR1
        LDF     *+AR0(1),R1       ;Load first imaginary point
        LDF     *AR0++(IR0)B,R0   ;Load real value (and point to next
||      STF     R1,*+AR1(1)       ;location) and store the imaginary value
LOOP    LDF     *+AR0(1),R1       ;Load next imaginary point and store
||      STF     R0,*AR1++(IR1)    ;previous real value

3.4.2 DMA Bit-Reversed Addressing

In DMA bit-reversed addressing, two bits in the DMA control register enable bit-reversed addressing on DMA reads (READ BIT REV) and DMA writes (WRITE BIT REV). The source address index register and the destination address index register define the size of the bit-reversed addressing. Their function is similar to that of the CPU index register IR0 described in the previous subsection.

Two DMA block transfers are required when the DMA is used for a bit-reversed transfer of complex numbers: one to transfer the real parts and one to transfer the imaginary parts. Figure 3-1 illustrates the DMA settings required for an operation equivalent to Example 3-6. Unified-autoinitialization mode with bit-reversed read is used. For more detailed information about DMA operation, refer to the DMA Coprocessor chapter of the TMS320C4x User's Guide.

Figure 3-1. DMA Bit-Reversed Addressing
[Figure not reproduced: two linked DMA register sets, each listing control register, source address, source address index, counter, destination address, destination address index, and link pointer. Surviving values: control register words 00C0 1009h and 00C0 1005h; addresses AR0+1 and AR1+1 for the second transfer.]

Integer and Floating-Point Division

The 'C4x has a single-cycle instruction, RCPF, that generates an estimate of the reciprocal of a floating-point number. This estimate has the correct exponent, and the mantissa is accurate to the eighth binary place (the error of the mantissa is < 2^-8). Often, this is a satisfactory estimate of the reciprocal of a floating-point number. In other cases, this estimate can be used as the seed for an algorithm that computes the reciprocal to even greater accuracy. The Newton-Raphson algorithm described later is such a case. Although the 'C4x provides no special instruction for integer division, its instruction set can be used to perform an efficient division routine.
Additionally, the FLOAT, RCPF, and FIX instructions can be used to produce a rough estimate.

3.5.1 Integer Division

You can implement division on the 'C4x by repeating SUBC, a special conditional subtract instruction. Consider the case of a 32-bit positive dividend with i significant bits (and 32-i sign bits), and a 32-bit positive divisor with j significant bits (and 32-j sign bits). Repeating the SUBC command i-j+1 times produces a 32-bit result in which the lower i-j+1 bits are the quotient, and the upper 31-i+j bits are the remainder of the division.

SUBC implements binary division in the same manner as long division. The divisor (assumed to be smaller than the dividend) is shifted left i-j times to align it with the dividend. Then, using SUBC, the shifted divisor is subtracted from the dividend. For each subtraction that does not produce a negative answer, the dividend is replaced by the difference. It is then shifted left, and a 1 is put into the LSB. If the difference is negative, the dividend is simply shifted left by one. This operation is repeated i-j+1 times.

As an example, consider the division of 33 by 5 using both long division and the SUBC method. In this case, i = 6, j = 3, and the SUBC operation is repeated four times.

LONG DIVISION:

       110       Quotient
    100001
   - 101
      1101
     - 101
        11       Remainder

SUBC METHOD (leading zero bits omitted; the aligned divisor is 101000):

  Command    Dividend    Difference     Result (new dividend)
  1st SUBC    100001     negative       1000010
  2nd SUBC   1000010     11010 (>0)      110101
  3rd SUBC    110101      1101 (>0)       11011
  4th SUBC     11011     negative        110110

  Final result: 110110, where the lower four bits (0110) are the quotient
  and the upper bits (11) are the remainder.

When the SUBC command is used, both the dividend and the divisor must be positive. Example 3-7 shows a realization of the integer division in which the sign of the quotient is properly handled. The last instruction before returning modifies the condition flag, in case subsequent operations depend on the sign of the result.
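The conditional subtract-and-shift procedure described above can be sketched in Python. This is only a model of the arithmetic, not 'C4x code: the names subc and divide are illustrative, Python integers stand in for 32-bit registers, and operands are assumed to fit in 31 bits so that the sign bit is never involved.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def subc(dividend: int, divisor: int) -> int:
    """Model one SUBC step: conditional subtract, then shift left by one."""
    difference = dividend - divisor
    if difference >= 0:
        # Keep the difference, shift it left, and put a 1 into the LSB.
        return ((difference << 1) | 1) & MASK32
    # Negative difference: keep the dividend and simply shift it left.
    return (dividend << 1) & MASK32


def divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder) using i-j+1 modeled SUBC steps."""
    if not 0 <= dividend < 2**31 or not 0 < divisor < 2**31:
        raise ValueError("operands must be positive and fit in 31 bits")
    if divisor > dividend:
        return 0, dividend
    i = dividend.bit_length()          # significant bits of the dividend
    j = divisor.bit_length()           # significant bits of the divisor
    count = i - j                      # left shift that aligns the divisor
    aligned = divisor << count
    result = dividend
    for _ in range(count + 1):         # i-j+1 repetitions
        result = subc(result, aligned)
    quotient = result & ((1 << (count + 1)) - 1)   # lower i-j+1 bits
    remainder = result >> (count + 1)              # remaining upper bits
    return quotient, remainder
```

For the 33-by-5 case, the four modeled steps produce 1000010, 110101, 11011, and 110110 in binary, so divide(33, 5) returns quotient 6 and remainder 3.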
Example 3-7. Integer Division

* TITLE INTEGER DIVISION
*
* SUBROUTINE DIVI
*
* INPUTS:    SIGNED INTEGER DIVIDEND IN R0,
*            SIGNED INTEGER DIVISOR IN R1
*
* OUTPUT:    R0/R1 into R0
*
* REGISTERS USED: R0-R3, IR0, IR1
*
* OPERATION: 1. NORMALIZE DIVISOR WITH DIVIDEND
*            2. REPEAT SUBC
*            3. QUOTIENT IS IN LSBs OF RESULT
*
* CYCLES: 31-62 (DEPENDS ON AMOUNT OF NORMALIZATION)
*
        .globl  DIVI
SIGN    .set    R2
TEMPF   .set    R3
TEMP    .set    IR0
COUNT   .set    IR1
*
* DIVI - SIGNED DIVISION
*
DIVI:
*
* DETERMINE SIGN OF RESULT. GET ABSOLUTE VALUE OF OPERANDS.
*
        XOR3    R0,R1,SIGN      ;Get the sign
        ABSI    R0
        ABSI    R1
*
        CMPI    R0,R1           ;Divisor > dividend ?
        BGTD    ZERO            ;If so, return 0
*
* NORMALIZE OPERANDS. USE DIFFERENCE IN EXPONENTS AS SHIFT COUNT
* FOR DIVISOR, AND AS REPEAT COUNT FOR 'SUBC'.
*
        FLOAT   R0,TEMPF        ;Normalize dividend
        PUSHF   TEMPF           ;PUSH as float
        POP     COUNT           ;POP as int
        LSH     -24,COUNT       ;Get dividend exponent
*
        FLOAT   R1,TEMPF        ;Normalize divisor
        PUSHF   TEMPF           ;PUSH as float
        POP     TEMP            ;POP as int
        LSH     -24,TEMP        ;Get divisor exponent
*
        SUBI    TEMP,COUNT      ;Get difference in exponents
        LSH     COUNT,R1        ;Align divisor with dividend
*
* DO COUNT+1 SUBTRACT & SHIFTS.
*
        RPTS    COUNT
        SUBC    R1,R0
*
* MASK OFF THE LOWER COUNT+1 BITS OF R0.
*
        SUBRI   31,COUNT        ;Shift count is (32 - (COUNT+1))
        LSH     COUNT,R0        ;Shift left
        NEGI    COUNT
        LSH     COUNT,R0        ;Shift right to get result
*
* CHECK SIGN AND NEGATE RESULT IF NECESSARY.
*
        NEGI    R0,R1           ;Negate result
        ASH     -31,SIGN        ;Check sign
        LDINZ   R1,R0           ;If set, use negative result
        CMPI    0,R0            ;Set status from result
        RETS
*
* RETURN ZERO.
*
ZERO:
        LDI     0,R0
        RETS
        .end

If the dividend is less than the divisor and you want fractional division, you can perform the division after you determine the desired accuracy of the quotient in bits. If the desired accuracy is k bits, start by shifting the dividend left by k positions. Then apply the algorithm described above, and replace i with i + k. It is assumed that i + k is less than 32.

3.5.2 Computation of Floating-Point Inverse and Division

When you use the RCPF (reciprocal of a floating-point number) instruction to generate an estimate of the reciprocal of a floating-point number, you can also use the Newton-Raphson algorithm to extend the precision of the mantissa of the reciprocal that the instruction generates. The floating-point division can then be obtained by multiplying the dividend by the reciprocal of the divisor.
The input to RCPF is assumed to be v = v(man) × 2^v(exp). The output is x = x(man) × 2^x(exp). The value v(man) (or x(man)) is composed of three fields: the sign bit v(sign), an implied nonsign bit, and the fraction field v(frac). Four rules apply to generating the reciprocal of a floating-point number:

- If v > 0, then x(exp) = -v(exp) - 1 and x(man) = 2/v(man). For the special case in which the 10 MSBs of v(man) = 01.00000000b, x(man) = 01.11111111b. In both cases, the remaining LSBs of x(frac) are 0.
- If v < 0, then x(exp) = -v(exp) - 1 and x(man) = 2/v(man). For the special case in which the 10 MSBs of v(man) = 10.00000000b, x(man) = 10.11111111b. In both cases, the remaining LSBs of x(frac) are 0.
- If v = 0 (v(exp) = -128), then x(exp) = 127 and x(man) takes its largest positive value. In other words, if v = 0, then x becomes the largest positive number representable in the extended-precision floating-point format. The overflow flag is set.
- If v(exp) = 127, then x(exp) = -128 and x(man) = 0. The zero flag is set.

The Newton-Raphson algorithm is

    x[n+1] = x[n] (2.0 - v x[n])

In this algorithm, v is the number for which the reciprocal is desired. x[0] is the seed for the algorithm and is given by RCPF. At every iteration of the algorithm, the number of bits of accuracy in the mantissa doubles. Using RCPF, accuracy starts at eight bits. With one iteration, accuracy increases to 16 bits in the mantissa, and with the second iteration, accuracy increases to 32 bits in the mantissa. Example 3-8 shows a program implementing this algorithm on the 'C4x.
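The doubling of accuracy per iteration can be checked numerically with a short Python sketch. It is a model under stated assumptions, not the 'C4x routine: seed_reciprocal is a hypothetical stand-in for RCPF that merely truncates the reciprocal of the mantissa to eight fraction bits, and Python double-precision floats replace the 'C4x extended-precision format.

```python
import math


def seed_reciprocal(v: float) -> float:
    """Hypothetical stand-in for RCPF: 1/v with only 8 mantissa fraction bits."""
    if v == 0.0:
        raise ValueError("v must be nonzero")
    mantissa, exponent = math.frexp(v)     # v = mantissa * 2**exponent
    estimate = 1.0 / mantissa              # 1 < |estimate| <= 2
    estimate = math.floor(estimate * 256.0) / 256.0   # drop bits below 2**-8
    return math.ldexp(estimate, -exponent)


def reciprocal(v: float, iterations: int = 2) -> float:
    """Refine the seed with x[n+1] = x[n] * (2.0 - v * x[n])."""
    x = seed_reciprocal(v)
    for _ in range(iterations):
        x = x * (2.0 - v * x)
    return x
```

Because 1 - v x[n+1] = (1 - v x[n])^2, a seed with relative error below 2^-8 gives an error below 2^-16 after one iteration and below 2^-32 after two, which matches the accuracy figures quoted above.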
Example 3-8. Inverse of a Floating-Point Number With 32-Bit Mantissa Accuracy

* TITLE INVERSE OF A FLOATING-POINT NUMBER WITH 32-BIT MANTISSA ACCURACY
*
* SUBROUTINE INVF
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE COMPUTATION IS
* COMPLETED, 1/v IS STORED IN R1.
*
* TYPICAL CALLING SEQUENCE:
*       LAJU    INVF
*       ...             <---- can be other non-pipeline-break
*       ...             <---- instructions
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+------------------------------------------------
*   R0       | v = NUMBER TO FIND THE RECIPROCAL OF (UPON CALL)
*   R1       | 1/v (UPON RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R1, R2
* REGISTER CONTAINING RESULT: R1
* REGISTER FOR SUBROUTINE CALL: R11
*
* CYCLES: ... (not including subroutine overhead)
* WORDS: ... (not including subroutine overhead)
*
        .global INVF
INVF:   RCPF    R0,R1           ;Get x[0] = the estimate of 1/v
*
        MPYF3   R1,R0,R2
        SUBRF   2.0,R2
        MPYF    R2,R1           ;End of first iteration (16 bits accuracy)
*
        BUD     R11             ;Delayed return to caller
        MPYF3   R1,R0,R2
        SUBRF   2.0,R2
        MPYF    R2,R1           ;End of second iteration (32 bits accuracy)
*
* R1 = 1/v, Return to caller
*
        .end

Calculating a Square Root

In many applications, normalization of data values is necessary. Often, the normalizing factor is the square root of another quantity. For example, given a vector, the unit vector in the same direction as the original vector is found by normalizing the original vector by its length. This involves division by a square root.

The 'C4x single-cycle instruction RSQRF generates an estimate of the reciprocal of the square root of a positive floating-point number. This estimate has the correct exponent, and the mantissa is accurate to the eighth binary place (the error of the mantissa is < 2^-8). Three rules apply to this algorithm:

- If v(exp) is even, then x(exp) = -(v(exp)/2) - 1 and x(man) = 2/sqrt(v(man)). For the special case in which the 10 MSBs of v(man) = 01.00000000b, x(man) = 01.11111111b. In both cases, the remaining LSBs of x(frac) are 0.
- If v(exp) is odd, then x(exp) = -((v(exp) - 1)/2) - 1 and x(man) = sqrt(2/v(man)). The remaining LSBs of x(frac) are 0.
- If v = 0 (v(exp) = -128), then x(exp) = 127 and x(man) takes its largest positive value. In other words, if v = 0, then x becomes the largest positive number representable in the extended-precision floating-point format.
The overflow flag is set.

If you need larger precision than the RSQRF instruction gives for the estimate of the reciprocal of the square root, you can use the Newton-Raphson algorithm to further extend the precision of the mantissa. The algorithm is

    x[n+1] = x[n] (1.5 - (v/2) x[n]^2)

In this equation, v is the number for which the reciprocal of the square root is desired. x[0] is the seed for the algorithm and is given by RSQRF. At every iteration of the algorithm, the number of bits of accuracy in the mantissa doubles. Using RSQRF, accuracy starts at eight bits. With one iteration, accuracy increases to 16 bits, and with the second iteration, accuracy increases to 32 bits in the mantissa. Example 3-9 shows a program implementing this algorithm on the 'C4x.

Example 3-9. Reciprocal of the Square Root of a Positive Floating Point

* TITLE RECIPROCAL OF THE SQUARE ROOT OF A POSITIVE FLOATING-POINT
*
* SUBROUTINE RCPSQRF
*
* THE FLOATING-POINT NUMBER v IS STORED IN R0. AFTER THE COMPUTATION IS
* COMPLETED, 1/SQRT(v) IS STORED IN R1.
*
* TYPICAL CALLING SEQUENCE:
*       LAJU    RCPSQRF
*
* ARGUMENT ASSIGNMENTS:
*   ARGUMENT | FUNCTION
*   ---------+------------------------------------------------
*   R0       | v = NUMBER TO FIND THE RECIPROCAL OF (UPON CALL)
*   R1       | 1/sqrt(v) (UPON RETURN)
*
* REGISTER USED AS INPUT: R0
* REGISTERS MODIFIED: R0, R1, R2
* REGISTER CONTAINING RESULT: R1
* REGISTER FOR SUBROUTINE CALL: R11
*
* CYCLES: ... (not including subroutine overhead)
* WORDS: ... (not including subroutine overhead)
*
        .global RCPSQRF
RCPSQRF:
        RSQRF   R0,R1           ;Get x[0] = the estimate of 1/sqrt(v)
        MPYF    0.5,R0          ;R0 = v/2
*
        MPYF3   R1,R1,R2        ;First iteration
        MPYF    R0,R2
        SUBRF   1.5,R2
        MPYF    R2,R1           ;End of first iteration (16 bits accuracy)
*
        MPYF3   R1,R1,R2        ;Second iteration
        BUD     R11             ;Delayed return to caller
        MPYF    R0,R2
        SUBRF   1.5,R2
        MPYF    R2,R1           ;End of second iteration (32 bits accuracy)
*
* R1 = 1/SQRT(v), Return to caller
*
        .end

You can find the square root by a simple multiplication: sqrt(v) = v x[n], in which x[n] is the estimate of 1/sqrt(v) as determined by the Newton-Raphson algorithm or another algorithm.

Extended-Precision Arithmetic

The 'C4x offers 32 bits of precision for integer arithmetic, and 24 bits of precision in the mantissa for floating-point arithmetic. For higher precision in floating-point operations, the twelve extended-precision registers, R0 to R11, contain eight more bits of accuracy.
Because no comparable extension is available for fixed-point arithmetic, this section discusses how to achieve fixed-point double precision. The technique consists of performing the arithmetic in parts, similar to the way in which longhand arithmetic is done.

Two instructions, ADDC (add with carry) and SUBB (subtract with borrow), use the status carry bit for extended-precision arithmetic. The carry bit is affected by arithmetic operations and by the rotate and shift instructions. You can also manipulate it directly by setting the status register to certain values. For proper operation, the overflow mode bit should be reset (OVM = 0) so that the accumulator results are not loaded with saturation values. Example 3-10 and Example 3-11 show 64-bit addition and 64-bit subtraction, respectively. The first operand is stored in registers R0 (low word) and R1 (high word). The second operand is stored in registers R2 and R3, respectively. The result is stored in R0 and R1.

Example 3-10. 64-Bit Addition

* TITLE 64-BIT ADDITION
*
* TWO 64-BIT NUMBERS ARE ADDED TO EACH OTHER, PRODUCING A 64-BIT RESULT.
* THE NUMBERS (R1,R0) AND (R3,R2) ARE ADDED, RESULTING IN (R1,R0).
*
        ADDI    R2,R0
        ADDC    R3,R1

Example 3-11. 64-Bit Subtraction

* TITLE 64-BIT SUBTRACTION
*
* TWO 64-BIT NUMBERS ARE SUBTRACTED FROM EACH OTHER, PRODUCING A 64-BIT
* RESULT. THE NUMBERS (R1,R0) AND (R3,R2) ARE SUBTRACTED, RESULTING IN
* (R1,R0).
*
        SUBI    R2,R0
        SUBB    R3,R1

When two 32-bit numbers are multiplied, a 64-bit product results. To do this, the 'C4x provides a 32-bit by 32-bit multiplier and two special instructions, MPYSHI (multiply signed integer and produce the 32 MSBs) and MPYUHI (multiply unsigned integer and produce the 32 MSBs). Example 3-12 shows an implementation of a 32-bit by 32-bit multiplication.

Example 3-12. 32-Bit by 32-Bit Multiplication

* TITLE 32-BIT MULTIPLICATION
*
* MULTIPLIES TWO 32-BIT NUMBERS, PRODUCING A 64-BIT RESULT.
* THE NUMBERS IN R0 AND R1 ARE MULTIPLIED, RESULTING IN (R3,R2).
*
        MPYI3   R0,R1,R2
        MPYSHI3 R0,R1,R3

Floating-Point Format Conversion: IEEE to/From 'C4x

In fixed-point arithmetic, the binary point that separates the integer from the fractional part of the number is fixed at a certain location.
Therefore, if the binary point of a 32-bit number is fixed after the most significant bit (which is also the sign bit), only a fractional number (a number with an absolute value less than 1) can be represented. In other words, there is a number with 31 fractional bits. All operations assume that the binary point is fixed at this location. The fixed-point system, although simple to implement in hardware, imposes limitations on the dynamic range of the represented number. This causes scaling problems in many applications. You can avoid this difficulty by using floating-point numbers.

A floating-point number consists of a mantissa m multiplied by a base b raised to an exponent e. In current hardware implementations, the mantissa is typically a normalized number with an absolute value between 1 and 2, and the base is 2. Although the mantissa is represented as a fixed-point number, the actual value of the overall number floats the binary point because of the multiplication by b^e. The exponent e is an integer whose value determines the position of the binary point in the number. IEEE has established a standard format for the representation of floating-point numbers.

To achieve higher efficiency in hardware implementation, the 'C4x uses a floating-point format that differs from the IEEE standard. However, the 'C4x has single-cycle instructions, TOIEEE and FRIEEE, for format conversion. These instructions can also be used with the STF instruction, which allows the data format to be converted within a memory-to-memory transfer. Here are descriptions of both formats and an example program to convert between them.

'C4x floating-point format:

    bits 31-24: e    bit 23: s    bits 22-0: f

In a 32-bit word representing a floating-point number, the first eight bits correspond to the exponent e, expressed in twos-complement format. There is one bit for the sign s, and 23 bits for the mantissa fraction f. The mantissa is expressed in twos-complement form with the binary point after the most significant nonsign bit. Because this bit is the complement of the sign bit s, it is suppressed. In other words, the mantissa actually has 24 bits. A special case occurs when e = -128. In this case, the number is interpreted as zero, independently of the values of s and f (which are, by default, zero).
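The 'C4x single-precision layout described above can be decoded with a few lines of Python. This is a minimal sketch, assuming the field positions of the single-precision format (8-bit twos-complement exponent in the top byte, then the sign bit, then a 23-bit fraction); the function name c4x_to_float is illustrative and is not part of any TI tool.

```python
import math


def c4x_to_float(word: int) -> float:
    """Decode a 32-bit 'C4x single-precision word into a Python float."""
    exponent = (word >> 24) & 0xFF
    if exponent >= 0x80:                  # twos-complement exponent field
        exponent -= 0x100
    sign = (word >> 23) & 1
    fraction = (word & 0x7FFFFF) / float(1 << 23)
    if exponent == -128:                  # special case: the value is zero
        return 0.0
    # The implied bit is the complement of the sign bit:
    # mantissa is 01.f when s = 0, and 10.f (that is, -2 + 0.f) when s = 1.
    mantissa = (-2.0 + fraction) if sign else (1.0 + fraction)
    return math.ldexp(mantissa, exponent)
```

For example, the word 00000000h decodes to 1.0 (e = 0, s = 0, f = 0), while 80000000h decodes to zero because its exponent field is -128.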
To summarize, the values of the represented numbers in the 'C4x floating-point format are as follows:

    2^e × (01.f)    if s = 0
    2^e × (10.f)    if s = 1
    0               if e = -128

IEEE floating-point format:

    bit 31: s    bits 30-23: e    bits 22-0: f

The IEEE floating-point format uses sign-magnitude notation for the mantissa. In a 32-bit word representing a floating-point number, the first bit is the sign bit. The next eight bits correspond to the exponent, expressed in an offset-by-127 format (the actual exponent is e-127). The following 23 bits represent the absolute value of the mantissa with the most significant 1 implied. The binary point is fixed after this most significant 1. In other words, the mantissa actually has 24 bits. Several special cases are summarized below. These are the values of the represented numbers in the IEEE floating-point format:

    (-1)^s × 2^(e-127) × (01.f)    if 0 < e < 255

Special cases:

    (-1)^s × 0.0                   if e = 0 and f = 0 (zero)
    (-1)^s × 2^(-126) × (0.f)      if e = 0 and f is nonzero (denormalized)
    (-1)^s × infinity              if e = 255 and f = 0 (infinity)
    NaN (not a number)             if e = 255 and f is nonzero

The 'C4x performs the conversion according to these definitions of the formats. It assumes that the source data for the IEEE format is in memory only and that the source data for the 'C4x floating-point format is in either memory or an extended-precision register. The destination for both conversions must be an extended-precision register. In the case of a block memory transfer, the no-penalty data-format conversion can be executed in a parallel instruction with STF. Example 3-13 and Example 3-14 show data-format conversion within a data transfer between a communication port and internal RAM.

Example 3-13. IEEE to 'C4x Conversion Within Block Memory Transfer

* TITLE IEEE TO 'C4x CONVERSION WITHIN BLOCK MEMORY TRANSFER
*
* PROGRAM ASSUMES THAT THE INPUT FIFO OF COMMUNICATION PORT 0 IS FULL OF
* IEEE FORMAT DATA. EIGHT DATA WORDS ARE TRANSFERRED FROM COMMUNICATION
* PORT 0 TO INTERNAL RAM BLOCK 0, AND THE DATA FORMAT IS CONVERTED FROM
* IEEE FORMAT TO 'C4x FLOATING-POINT FORMAT.
*
        LDI     @CP0_IN,AR0     ;Load comm port0 input FIFO address
        LDI     @RAM0,AR1       ;Load internal RAM block 0 address
        FRIEEE  *AR0,R0         ;Convert first data
        RPTS    6
        FRIEEE  *AR0,R0         ;Convert next data
||      STF     R0,*AR1++(1)    ;Store previous data
        STF     R0,*AR1++(1)    ;Store last data

Example 3-14. 'C4x to IEEE Conversion Within Block Memory Transfer

* TITLE 'C4x TO IEEE CONVERSION WITHIN BLOCK MEMORY TRANSFER
*
* PROGRAM ASSUMES THAT THE OUTPUT FIFO OF COMMUNICATION PORT 0 IS EMPTY.
* EIGHT DATA WORDS ARE TRANSFERRED FROM INTERNAL RAM BLOCK 0 TO
* COMMUNICATION PORT 0, AND THE DATA FORMAT IS CONVERTED FROM 'C4x
* FLOATING-POINT FORMAT TO IEEE FORMAT.
*
        LDI     @CP0_OUT,AR0    ;Load comm port0 output FIFO address
        LDI     @RAM0,AR1       ;Load internal RAM block 0 address
        TOIEEE  *AR1++(1),R0    ;Convert first data
        RPTS    6
        TOIEEE  *AR1++(1),R0    ;Convert next data
||      STF     R0,*AR0         ;Store previous data
        STF     R0,*AR0         ;Store last data

Chapter 4

Memory Interfacing

The 'C4x's advanced interface design can be used to implement a wide variety of system configurations. Its external buses provide a flexible parallel 32-bit interface to byte- or word-wide devices. This chapter describes how to use the 'C4x's memory interfaces to connect to various external devices. Specific discussions include implementation of a parallel interface to devices with and without wait states and implementing system control functions. The chapter covers these topics:

- System Configuration
- External Interfacing
- Global and Local Bus Interfaces
- Zero Wait-State Interfacing to RAMs
- Wait States and Ready Generation
- Parallel Processing Through Shared Memory

System Configuration

Figure 4-1 illustrates an expanded configuration of a 'C4x system with different types of external devices and the interfaces to which they are connected.

Figure 4-1. Possible System Configurations
[Block diagram not reproduced. Blocks shown around the 'C4x: fast local memory and peripherals on the local bus; large shared memory and peripherals on the global bus; analog peripherals; an interrupt interface; communication ports and external flags connecting to other 'C4x devices; a timer interface to I/O devices; and clock, reset generator, and system control devices.]

In your design, you can use a subset or a superset of the illustrated components.

External Interfacing

The 'C4x interfaces connect to a wide variety of device types. Each of these interfaces is tailored to a particular type of device such as memory, DMA, parallel and serial peripherals, and I/O.
In addition, 'C4x devices can interface directly with each other, without external logic, through their communication ports or their external flag pins IIOF(0-3). Each interface comprises one or more signal lines, which transfer information and control its operation. Figure 4-2 shows the signal groups for these interfaces.

Figure 4-2. External Interfaces
[Pin-group diagram not reproduced. Signal groups shown:
 Global bus: D0-31 and A0-30 (data and address buses with their enables), STAT(3-0), LOCK, STRB0, R/W0, PAGE0, RDY0, STRB1, R/W1, PAGE1, RDY1, and the STRB0/STRB1 control enables.
 Local bus: LD0-31 and LA0-30 (data and address buses with their enables), LSTAT(3-0), LLOCK, LSTRB0, LR/W0, LPAGE0, LRDY0, LCE0, LSTRB1, LR/W1, LPAGE1, LRDY1, LCE1.
 Communication port interface: CnD(7-0), CREQn, CACKn, CSTRBn, CRDYn.
 Interrupts, flags, and control: IIOF(3-0), nonmaskable interrupt, IACK, RESET, RESETLOC(1,0), ROMEN, X2/CLKIN, clock outputs.
 Timer interface: TCLK0, TCLK1.
 Emulation interface: TRST, EMU0, EMU1.]
Note: n identifies the communication port (communication port 0, communication port 1, etc.).

The global and local buses implement the primary memory-mapped interfaces to the device. These interfaces allow external devices such as DMA controllers and other microprocessors to share resources with one or more 'C4x devices through a common bus.

Global and Local Bus Interfaces

The 'C4x uses the global and local buses to access the majority of its memory-mapped locations. Since these two memory interfaces are identical in every way, except for their positions in the memory map, each example in this memory interface section focuses on only one of the two interfaces. However, the examples are applicable to either the local or the global bus. The two buses have identical but mutually exclusive sets of control signals:

Table 4-1. Local/Global Bus Control Signals

    Global Bus    Local Bus
    STRB0         LSTRB0
    STRB1         LSTRB1
    CE0           LCE0
    CE1           LCE1
    RDY0          LRDY0
    RDY1          LRDY1
    PAGE0         LPAGE0
    PAGE1         LPAGE1
    R/W0          LR/W0
    R/W1          LR/W1

While both the global and local buses can interface to a wide variety of devices, they most commonly interface to memories.
Zero Wait-State Interfacing RAMs Zero Wait-State Interfacing RAMs memory-read access time normally defined time between address valid data valid. This time determined Read access time where: tc(H) H1/H3 cycle time address valid Data valid before next (read) tc(H) (td(H1L-A) tsu(D)R) td(H1L-A) tsu(D)R full-speed, zero wait-state interface device, 50-MHz 'C4x (40-ns instruction cycle time) requires read access time from address stable data valid. most memories, access time from chip enable same access time from address; thus, possible 20-ns memories full speed with 50-MHz 'C4x. However, 20-ns memories properly, must avoid long delays between processor memories. Avoiding these delays always possible, because interconnections gating chip-enable generation cause them. addition, choose memory device with output enable, output enable must become active quickly enough ensure that memory meet data valid timing requirements 'C4x. memories with 20-ns access times, output enable active data valid timing parameter typically less than Currently available RAMs without output-enable (OE) control lines include 1-bit wide organized RAMs most 4-bit wide RAMs. Those with controls include byte-wide 4-bit wide RAMs. Many fastest RAMs provide control; they chip-enable (CE) controlled write cycles ensure that data outputs turn write operations. CE-controlled write cycles, write control line (WE) goes before goes low, internal logic holds outputs disabled until cycle completed. Using CE-controlled write cycles efficient interface fast RAMs without controls 'C4x full speed. Note: find timing parameters CLKIN, memory TMS320C40 TMS320C44 data sheets. Memory Interfacing Zero Wait-State Interfacing RAMs 4.4.1 Consecutive Reads Followed Write Interface Timing Figure shows timing consecutive reads followed write. consecutive reads, LSTRB0 stays active (low), LR/W stays high long read cycles continue. 
back-to-back reads, 'C4x requires zero-waitstate memories have address-valid data-valid time less than most memory devices, this time same memory access time, which Thus, memories with access times more cannot meet this timing. Memory device timing critical zero-wait-state nonzero-waitstate write cycles, because cycle writes 'C4x. extra cycle gives LSTRB0 enough time frame LR/W, preventing memories that into high impedance slowly read cycle from driving during subsequent write cycle. memory device used this design (Figure 4-3), data lines guaranteed into high impedance after goes inactive, which gives more than margin before 'C4x starts driving with write data. Also, extra cycle with LSTRB0 inactive prevents writes random locations memory while address changing between consecutive writes. write cycles shown Figure Figure 4-4, requires write data setup before goes high, this design provides least (t3). data hold time (t4) required RAM, this design provides greater than Finally, RAM's 20-ns setup 0-ns hold times address (with respect high) ensure clear margin. Figure 4-3. Consecutive Reads Followed Write LR/W0 LSTRB0 LD(31-0) Valid data Valid data Valid write data LA(30-0) Valid read addr Valid read addr Write address Zero Wait-State Interfacing RAMs Figure 4-4. Consecutive Writes Followed Read LR/W0 STRB0 LD(31-0) Valid write data Valid write data Valid data LA(30-0) Valid write address Valid write address Read address 4.4.2 Consecutive Writes Followed Read Interface Timing Figure shows timing consecutive writes followed read. Notice that between consecutive writes, LR/W stays low, STRB0 goes inactive frame write cycles. Although 'C4x zero-wait-state writes take cycles, writes appear take cycle internally (from perspective DMA) access interface already progress. read cycle following writes Figure 4-4, 'C4x requires zero-waitstate memories have LSTRB-active data-valid time less than (one cycle minus LSTRB active plus data setup before low)). 
most memory devices, this time same memory access time, which this design. Thus, margin only exists, leaving little allowance STRB gating desired. 4.4.3 Interface Using Local Strobe Figure shows 'C4x's local interfaced eight Integrated Device Technology IDT71258 20-ns 4-bit CMOS static RAMs with zero wait states using chip-enable controlled write cycles. SRAMs arranged implement first 64K, 32-bit words external memory, located addresses 00000h thru 0FFFFh (internal assumed disabled). these words SRAM only memory controlled LSTRB0, LSTRB ACTIVE field local memory interface control register (LMICR) should minimum value 011112, allowing LSTRB0 active only first Memory Interfacing Zero Wait-State Interfacing RAMs words 'C4x's memory space. addition, this memory only memory interfaced LSTRB0, LSTRB0 requires only page, PAGESIZE field LMICR should 011112. Also note that Figure 4-5, LRDY0 input tied low, selecting zero wait states LSTRB0 accesses local bus. With zero-wait-state memory controlled LSTRB0, LSTRB1 used control accesses slower read-only memory devices other types memory. Figure 4-5. 'C4x Interface Eight Zero-Wait-State SRAM IDT71258 SRAM A15-A0 IDT71258 SRAM 'C4x A15-A0 LSTRB0 LR/W0 I/O3 A15-A0 LRDY0 I/O3 LD(31-0) this circuit implementation, external logic necessary interface 'C4x memory device. Typically, memory devices must held inactive inactive) during changes this avoids undesired memory accesses while address changes. 'C4x ensures this glueless interface because LSTRB always frames changes LR/W. 4.4.4 Interface Using Both Local Strobes Figure shows 'C4x's local interfaced HM6708 20-ns 4-bit CMOS static RAMs with zero wait states using controlled write cycles. Zero Wait-State Interfacing RAMs These RAMs arranged allow 128K 32-bit words local memory, which implemented 32-bit banks. bank controlled each sets control signals local bus. 
these memory devices properly 'C4x's memory space, must local-memory-interface control register (LMICR) define which part local bus's memory space mapped each strobes. this implementation with internal disabled, LSTRB0 mapped first words local space (addresses through 0FFFFh), LSTRB1 mapped rest local space (addresses 10000h through 7FFF FFFFh). this memory configuration, LSTRB ACTIVE field local-memory-interface control register (LMICR) should 011112. Also, each LSTRB requires only page. PAGESIZE field LMICR should 011112. Note that Figure 4-6, LRDY inputs tied low, selecting zero wait states accesses local bus. Hence, through 'C4x's four strobes (two each local global buses), four different banks memory decoded. addition, through program control, change address decoding under program control changing LSTRB active field (bits 24-28) LMICR global-memory-interface control register (GMICR). must decode more than four banks memory chosen memory device cannot meet read cycle timing requirements 'C4x zero wait states, should page switching (discussed subsection 4.5.6 page 4-18) extra cycle read accesses outside current bank boundary. Memory Interfacing Zero Wait-State Interfacing RAMs Figure 4-6. 'C4x Interface Zero-Wait-State SRAMs, Strobes HM6708 SRAM HM6708 SRAM 'C4x A15-A0 LSTRB0 LR/W0 LRDY0 I/O3-I/O0 I/O3-I/O0 A15-A0 A15-A0 LSTRB1 LR/W1 LRDY1 LD(31-0) 4-10 Wait States Ready Generation Wait States Ready Generation Using wait states greatly increase system's flexibility reduce hardware requirement. 'C4x capable generating wait states either global local bus, both buses have independent sets ready control logic. buses' wait-state configuration determined WTCNT fields local global-bus-interface control registers. This section discusses ready generation from perspective globalbus interface; however, wait-state operation local same global bus, this discussion pertains equally well both (local global). 
Also, the local and global buses each have two sets of control signals (R/W0, STRB0, RDY0, PAGE0, CE0 and R/W1, STRB1, RDY1, PAGE1, CE1), with each set of control signals having its own ready signal, providing more flexibility to support external devices with different speeds. Since both strobes' ready signals share the same electrical characteristics, the following discussion focuses on the global bus's control signals.

Wait states are generated by the internal wait-state generator, by the external ready inputs (RDY0 or RDY1), or by the logical AND or OR of the two ready signals. When enabled, internally generated wait states affect all external cycles, regardless of the address accessed. If different numbers of wait states are required for various external devices, the external RDY input can be used to customize wait-state generation to specific system requirements.

If the logical OR (electrical AND, since the signals are true low) of the external and wait-count ready signals is selected, the earlier of the two signals generates a ready condition and allows the cycle to be completed. It is not required that both signals be present.

4.5.1 ORing of the Ready Signals (STRBx ...)

You can use the OR of the two ready signals to implement wait states for devices that require no more wait states than the internal logic can implement (up to seven). This feature is useful, for example, if a system contains some fast and some slow devices. In this case:

- Fast devices can generate ready externally with a minimum of logic. When fast devices are accessed, the external hardware responds promptly with ready, which terminates the cycle.
- Slow devices can use the internal wait counter for larger numbers of wait states. When slow devices are accessed, the external hardware does not respond, and the cycle is appropriately terminated after the internal wait count.

The OR of the two ready signals can also terminate the cycle before the number of wait states implemented with external logic allows termination. In this case, a shorter wait count is specified internally than the number of wait states implemented with the external ready logic, and the cycle is terminated after the wait count. Also, this feature can be used as a safeguard against inadvertent accesses to nonexistent memory that would never respond with ready and would, therefore, lock up the 'C4x.
If the OR of the two ready signals is used, however, and the internal wait-state count is less than the number of wait states implemented externally, the external ready generation logic must be able to reset its sequencing to allow a new cycle to begin immediately following the end of the internal wait count. Also, consecutive cycles must be from independently decoded areas of memory (or from different pages in memory). Otherwise, the external ready generation logic may lose synchronization with bus cycles and generate improperly timed wait states.

4.5.2 ANDing of the Ready Signals (STRBx ...)

If the logical AND (electrical OR) of the wait count and external ready signals is selected, the later of the two signals controls the internal ready signal, and both signals must be asserted. Accordingly, external ready control must be implemented for each wait-state device, and the wait count ready signal must be enabled.

This feature is useful if there are devices in a system that are equipped to provide a ready signal but cannot respond quickly enough to meet the 'C4x's timing requirements. If these devices normally indicate a ready condition and, when accessed, respond with a wait until they become ready, the logical AND of the two ready signals can be used to save hardware in the system. In this case, the internal wait counter can provide wait states initially, and then the external ready can provide wait states after the external device has had time to send a not-ready indication. The internal wait counter then remains ready until the external device also becomes ready, which terminates the cycle.

Additionally, the AND of the two ready signals can be used for extending the number of wait states for devices that already have external ready logic implemented, but require additional wait states under certain unique circumstances.

4.5.3 External Ready Generation

The optimum technique for implementing external ready generation hardware depends on the specific characteristics of the system, including the relative number of wait-state and nonwait-state devices in the system and the maximum number of wait states required for any device. The approaches discussed here are intended to be general enough for most applications and are easily modifiable to comprehend many different system configurations.
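The difference between ORing and ANDing the two ready sources (subsections 4.5.1 and 4.5.2) can be illustrated with a small cycle-by-cycle model in Python. This is a sketch under simplifying assumptions, not a timing-accurate description of the 'C4x: the function name wait_states is hypothetical, signals are treated as logical values (True means ready) rather than active-low pins, and the external ready line is supplied as one sample per cycle.

```python
from typing import Optional, Sequence


def wait_states(wait_count: int,
                external_ready: Sequence[bool],
                mode: str) -> Optional[int]:
    """Return the number of wait states before the access completes.

    wait_count     -- setting of the internal wait-state counter
    external_ready -- sampled external ready, one value per cycle
    mode           -- "or" (earlier signal wins) or "and" (later signal wins)
    Returns None if the access never completes within the samples given.
    """
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    for cycle, external in enumerate(external_ready):
        internal = cycle >= wait_count       # internal counter has expired
        if mode == "or":
            ready = internal or external
        else:
            ready = internal and external
        if ready:
            return cycle
    return None
```

With mode "or", a fast device that answers in the third cycle ends the access after two wait states even if the counter is set to seven, and an access to nonexistent memory still ends when the counter expires. With mode "and", a device that looks ready at first, then signals not-ready, completes only once both the counter and the device report ready.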
In general, ready generation involves the following three functions:

- Segmentation of the address space to distinguish fast and slow devices
- Generation of properly timed ready indications
- Logical ORing of the separate ready timing signals together to connect to the physical ready input

Segmentation of the address space is required to obtain a unique indication of each particular area within the address space that requires wait states. This segmentation is commonly implemented in the form of chip-select generation. Chip-select signals can initiate wait states in many cases; however, occasionally, chip-select decoding considerations provide signals that do not allow the ready input timing requirements to be met. In this case, you can segment the address space coarsely on the basis of a small number of address lines, where simpler gating allows signals to be generated more quickly. In either case, the signal that indicates that a particular area of memory is being addressed also normally initiates a ready or wait-state signal.

When the address space to be accessed has been established, a timing circuit is normally used to provide a ready indication to the processor at the appropriate point in the cycle to satisfy each device's unique requirements.

Finally, since indications of ready status from multiple devices are typically present, you should logically OR the signals using a single gate to drive the RDY input.

4.5.4 Ready Control Logic

You can take two basic approaches to implement ready control logic, depending on the state of the ready input between accesses. If RDY is low between accesses, the processor is always ready unless a wait state is required; if RDY is high between accesses, the processor will always enter a wait state unless a ready indication is generated.

If RDY is low between accesses, control of devices that run with zero wait states at full speed is straightforward; no action is necessary, because ready is always active unless otherwise required. Devices requiring wait states, however, must drive ready high fast enough to meet the input timing requirements. Then, after an appropriate delay, a ready indication must be generated. This can be difficult in many circumstances because wait-state devices are inherently slow and often require complex select decoding.
If the ready input is high between accesses, zero-wait-state devices, which tend to be inherently fast, can usually respond immediately with a ready indication. Wait-state devices can simply delay their select signals appropriately to generate a ready. Typically, this approach results in the most efficient implementation of ready control logic. Figure 4-7 shows a circuit of this type, which can be used to generate wait states for multiple devices in a system.

Figure 4-7. Logic for Generation of Wait States for Multiple Devices
[Figure: a 16R4 PLD whose inputs are the 'C4x address bits used for device selection, STRB0 and RESET from the 'C4x, and strb_syn_; its output drives RDY0 of the 'C4x.]

4.5.5 Example Circuit

Figure 4-7 shows how a single, 7-ns 16R4 programmable logic device (PLD) can be used to generate wait states for multiple devices that are interfaced to the 'C4x. In this example, distinct address bits are used to select the different wait-state devices. Here, each of the three address lines input to the 16R4 corresponds to a different speed of device. For a single 16R4 implementation, up to nine different address bits can be used to select different speed devices. The single output, RDY0, is connected directly to the RDY0 input of the 'C4x to signal the completion of an access for external wait-state generation. Because RDY0 is sampled on a falling clock edge, a 'C4x output clock is used as the PLD clock input.

Example 4-1 shows the ready logic equations for programming the 16R4 PLD. The language used is ABEL. STRB0 is an input into the PLD that indicates that a valid 'C4x bus cycle is occurring. Also, a delayed version of STRB0 (synchronized with the clock going high) is provided as the strb_syn_ input signal. This delayed signal is needed to avoid problems with a race condition that can exist between STRB0 going low and the clock rising. RESET is used to bring the state machine back to the idle state. Notice that the RDY0 output is not registered. The asynchronous RDY0 signal is necessary to generate the ready signal for zero-wait-state devices. When a zero-wait-state device is selected (ahi1 is high in Example 4-1) and STRB0 is low, the PLD asserts RDY0 within its propagation delay. Hence, RDY0 goes active fast enough to satisfy the 20-ns setup time of RDY0 before the sampling clock edge.

For generation of RDY0 with one or two wait states, the device select address bits and strb_syn_ are delayed by one or two clock cycles, respectively, before RDY0 is brought active low.
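The behavior of the ready-generation PLD can be explored with a small software model. The sketch below is an assumption-laden illustration, not the PLD source: the state names and signal names follow Example 4-1, the AND/OR operators are inferred from the surrounding description, signals are modeled as 0/1 integers with a trailing underscore meaning active low, and the helper names `next_state`, `rdy0_`, `cycle`, and `run` are hypothetical. Each list entry passed to `run` stands for one clock edge.

```python
IDLE, WAIT_ONE, WAIT_TWOA, WAIT_TWOB = "idle", "wait_one", "wait_twoa", "wait_twob"


def next_state(state, reset_, ahi2, ahi3, strb_syn_):
    """Registered part: state taken on the next clock edge."""
    if state == IDLE:
        if reset_ and ahi2 and not strb_syn_:
            return WAIT_ONE            # one-wait-state device selected
        if reset_ and ahi3 and not strb_syn_:
            return WAIT_TWOA           # two-wait-state device selected
        return IDLE
    if state == WAIT_TWOA:
        return WAIT_TWOB if reset_ else IDLE
    return IDLE                        # wait_one and wait_twob go back to idle


def rdy0_(state, reset_, ahi1, strb0_):
    """Combinational part: 0 means ready is asserted (active low)."""
    asserted = reset_ and ((ahi1 and not strb0_) or state in (WAIT_ONE, WAIT_TWOB))
    return 0 if asserted else 1


def cycle(**overrides):
    """Build one clock's worth of inputs; defaults are the inactive levels."""
    signals = {"reset_": 1, "ahi1": 0, "ahi2": 0, "ahi3": 0, "strb0_": 1, "strb_syn_": 1}
    signals.update(overrides)
    return signals


def run(cycles, state=IDLE):
    """Clock the model once per entry; return (state, rdy0_) after each edge."""
    trace = []
    for c in cycles:
        state = next_state(state, c["reset_"], c["ahi2"], c["ahi3"], c["strb_syn_"])
        trace.append((state, rdy0_(state, c["reset_"], c["ahi1"], c["strb0_"])))
    return trace


# Access to the device selected by ahi3, held for three clock edges.
for entry in run([cycle(ahi3=1, strb0_=0, strb_syn_=0)] * 3):
    print(entry)
```

The model keeps the registered state separate from the unregistered RDY0 term, so the zero-wait-state path (ahi1 high with STRB0 low) asserts ready without waiting for a clock edge.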
The one-H3-cycle delay required for ready generation for a one-wait-state device corresponds to state wait_one in Example 4-1. The two-H3-cycle delay required for two-wait-state devices corresponds to states wait_twoa and wait_twob.

This 16R4 PLD-based design can be used to implement different numbers of wait states for multiple devices. More devices can be selected with 'C4x address lines, and a higher number of wait states can be produced with the logic. Furthermore, this approach can be used in conjunction with the 'C4x's internal wait-state generator.

Example 4-1. PLD Equations for Ready Generation

    module ready_generation
    title 'ready generation logic for wait state devices interfaced to TMS320C4x'

    C40u5 device 'P16R4';

    "inputs
    "The following TMS320C40 address bits are used to
    "select the different speed devices. More can be used if
    "necessary. In this example, a zero wait state, a one wait
    "state, and a two wait state device are decoded with these
    "three address bits
    h3        pin; "clock input
    ahi1      pin; "when high selects zero wait state device
    ahi2      pin; "when high selects one wait state device
    ahi3      pin; "when high selects two wait state device
    strb0_    pin; "indicates valid TMS320C40 cycle
    reset_    pin; "reset signal from TMS320C40
    strb_syn_ pin; "strb0_ synchronized with rising clock edge

    "output
    rdy0_     pin; "ready signal to TMS320C40
    one_wait  pin; "internal flip-flop signal for one wait state
                   "device ready signal generation
    two_waita pin; "internal flip-flop signal for the first of two
                   "wait states for two wait state devices
    two_waitb pin; "internal flip-flop signal for the second of two
                   "wait states for two wait state devices

    "name substitutions for test vectors
    c,H,L,X = .C.,1,0,.X.;

    "state bits
    outstate  = [one_wait, two_waita, two_waitb];
    idle      = ^b111;
    wait_one  = ^b011;
    wait_twoa = ^b101;
    wait_twob = ^b110;

    state_diagram outstate

    state idle:      if (reset_ & ahi2 & !strb_syn_) then wait_one
                     else if (reset_ & ahi3 & !strb_syn_) then wait_twoa
                     else idle;
    state wait_one:  GOTO idle;
    state wait_twoa: if (reset_) then wait_twob
                     else idle;
    state wait_twob: GOTO idle;

    equations
    !rdy0_ = reset_ & ((ahi1 & !strb0_) # !one_wait # !two_waitb);

    @page
    "Test level global arbitration logic
    test_vectors
    ([h3, ahi1, ahi2, ahi3, strb0_, strb_syn_, reset_] -> [outstate, rdy0_])
    "The input and rdy0_ columns of the vectors are missing from this copy.
    "Surviving outstate column, in order:
    "  idle, wait_one, idle, wait_twoa, idle, wait_twoa, wait_twob, idle, idle,
    "  idle, idle, wait_one, idle, wait_twoa, wait_twob, idle, idle, idle

    end ready_generation

4.5.6 Page Switching Techniques

The 'C4x's programmable page-switching feature can greatly ease system design when large amounts of memory or slow external peripheral devices are required. This feature provides a time period for disabling all device selects. During this interval, slow devices are allowed time to turn off before other devices have the opportunity to drive the data bus, thus avoiding bus contention.

When page switching is enabled, any time a portion of the high-order address lines changes (as defined by the contents of the STRB0 and STRB1 PAGESIZE fields in the global and local memory interface control registers), the corresponding STRB and PAGE signals go high for a full cycle.
Provided that STRB is included in the chip-select decodes, this causes all devices selected by that STRB to be disabled during this period. The next page of devices is not enabled until STRB and PAGE go low again.

If the high-order address lines remain constant during a read cycle, the memory access time with page switching is the same as the memory access time without page switching. In addition, page switching is not required during writes, because write cycles exhibit an inherent one-half cycle setup of address information before STRB goes low. Thus, when you use page switching with read/write devices, a minimum of half a cycle of address setup is provided for all accesses outside a page boundary. Therefore, large amounts of memory can be implemented without wait states or the extra hardware required for isolation between pages. Also, note that the access time for cycles during page switching is the same as that for cycles without page switching, and, accordingly, full-speed accesses can still be accomplished within each page.

The circuit shown in Figure 4-8 illustrates page switching with the CY7B185 15-ns BiCMOS static RAM. This circuit implements a memory of 32-bit words with full-speed, zero-wait-state accesses within each page.

Figure 4-8. Page Switching for the CY7B185
[Figure: the 'C4x (A30-0, R/W0, STRB0, D31-0) connected to a 'P16L8 decoder that generates the bank select signals, and to the address and data lines of four banks of CY7B185 RAMs.]

A 5-ns 16L8 decodes the page-select address lines (from A13 up). These lines, along with STRB0, select each of the four pages in this circuit. With the PAGESIZE field of the STRB0 global memory interface control register set to 0Ch, the pages are selected on even 8K-word boundaries, starting at location zero in external memory space.

This circuit cannot be implemented without page switching, because the data outputs' turn-on and turn-off delays can cause bus conflicts, and full-speed accesses do not allow enough time for chip-select decoding of the four pages. Here, the propagation delay of the 16L8 is involved only during page switches, where there is sufficient time between cycles to allow the chip selects to be decoded.
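The page-boundary check that triggers a page switch can be sketched in software. This is an illustration only: the names `page_of` and `page_switches` are hypothetical, and the mapping from the PAGESIZE field to a page of 2^(PAGESIZE + 1) words is an assumption chosen because it agrees with the one data point given here (0Ch selecting 8K-word pages). The model also assumes that the first access in a sequence is not counted as a switch.

```python
def page_of(address, pagesize_field):
    """Page number of a word address.

    Assumption: a page holds 2 ** (pagesize_field + 1) words, which matches
    PAGESIZE = 0Ch giving 8K-word pages.
    """
    return address >> (pagesize_field + 1)


def page_switches(addresses, pagesize_field):
    """Flag each access that lands in a different page than the previous one."""
    flags = []
    previous = None
    for address in addresses:
        page = page_of(address, pagesize_field)
        # Modeling choice: the first access has no predecessor, so no switch.
        flags.append(previous is not None and page != previous)
        previous = page
    return flags


accesses = [0x0000, 0x1FFF, 0x2000, 0x2001, 0x6000]
print(page_switches(accesses, 0x0C))
```

Accesses flagged here correspond to the cycles in which the high-order address lines change, so only those cycles need to absorb the chip-select decode delay.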
The timing of this circuit for read operations with pa