TMS320C6211 Cache Analysis

Digital Signal Processing Solutions, September 1998

Abstract

The TMS320C6211's caches deliver high performance without the cost of large arrays of on-chip memory. The efficiency of the TMS320C6211 caches makes low-cost, high-density external memory, such as SDRAM, as effective as on-chip memory.

Contents

Cache Architecture Overview
    Level-one Program Cache (L1P)
    Level-one Data Cache (L1D)
    Level-two Cache/Unified Memory (L2)
Cache Performance
    Price Performance
    Two-Level Cache Benefits
        2.2.1 L2 reduces the latency of a cache miss
        2.2.2 Unifying program and data
    Real-time operation
        2.3.1 Predictability
        2.3.2 Interrupt Latency
    Efficient DMA Capability
    Ease of Use
Summary

Figures

Figure 1. TMS320C6211 Block Diagram
Figure 2. TMS320C6211 Two-level cache fetch flow
Figure 3. L2 Memory Configurations, Diagram 1
Figure 4. L2 Memory Configurations, Diagram 2
Figure 5. Data Allocation in Multiple Cache Ways
Figure 6. Typical processor peripheral data flow
Figure 7. TMS320C6211 peripheral data flow

Tables

Table 1. TMS320C6211 Benchmark Performance

Cache Architecture Overview

The TMS320C6211 ('C6211) utilizes a highly efficient two-level real-time cache for internal program and data storage (see Figure 1). The 'C6211's caches deliver high performance without the cost of large arrays of on-chip memory. The efficiency of the TMS320C6211 caches makes low-cost, high-density external memory, such as SDRAM, as effective as on-chip memory. The 'C6211 executes over 99.5% of cycles without going off-chip. This leads to greater than [...] of the cycle count performance of a 'C62x device with infinite internal memory.

The TMS320C6211 employs a two-level memory architecture for on-chip program and data accesses. The first level has dedicated 4-Kbyte program and data caches, L1P and L1D respectively. The second level memory is a 64-Kbyte memory block that is shared by both program and data, designated L2. Dedicated L1 caches eliminate conflicts for memory resources between the program and data busses. A unified L2 memory provides flexible memory allocation between program and data for accesses that do not reside in L1. Since data is frequently resident in L1, flexibility is more important at the L2 level than conflict reduction.
[Figure 1: TMS320C6211 Block Diagram. Blocks shown: C6200B CPU (instruction fetch, instruction dispatch, instruction decode, two data paths with register files, control registers, in-circuit emulation, interrupt control); L1P cache, direct mapped; L1D cache, 2-way associative; L2 memory, four banks; Enhanced DMA Controller; External Memory Interface (EMIF); two Multi-channel Buffered Serial Ports (McBSP); Host Port Interface (HPI); two timers; power down logic.]

Each cache consists of the cache memory, a smaller block of memory that saves the state of the cache, known as tag RAM, and a cache controller. When an access is initiated by the CPU, the cache controller checks its tag RAM to determine whether that data resides in cache. If the data does reside in cache, a cache hit occurs and the data is sent to the CPU. If the data does not reside in cache, a cache miss occurs. On a cache miss, the controller requests the data from the next level of memory. In the case of an L1P or L1D miss, the next level of memory is L2. In the case of an L2 miss, the next level of memory is external memory. The amount of data that a cache requests on a miss is referred to as the cache's line size. Figure 2 illustrates the decision process used by the 'C6211 memory system to fetch the correct data on a CPU request.

[Figure 2: TMS320C6211 Two-level cache fetch flow. The CPU requests data; if the data is found in L1 or in L2 it is sent to the CPU; otherwise the data is requested from external memory and then sent on.]

By using cache, which dynamically allocates memory to reduce the latency of slower memory, a processor's performance is dramatically increased. A cache's performance can be affected by a situation known as thrashing. For thrashing to occur, data must first be read into cache. Subsequently, another location is cached whose data overwrites the first data. When the first data is requested again, the cache must again fetch it from slow memory.

Level-one Program Cache (L1P)

L1P is organized as a direct mapped cache with a 64-byte line size. A direct mapped cache is well suited to DSP algorithms, which tend to consist of small, tight loops that rarely thrash. The line size provides a modest prefetch of the next fetch packet, eliminating the startup latency of fetching that packet. In a direct mapped cache, every cacheable memory location maps to only one location in the cache.
Thus, the cache controller needs to check only one location to determine whether the requested data is available in the cache. DSP algorithms primarily consist of loops that execute the same program kernel many times on multiple data locations. Such algorithms remain in a loop for a long time before proceeding to the next kernel. L1P is large enough to hold several typical DSP kernels simultaneously. Since these kernels execute sequentially, they will not thrash L1P. Thus, a simple direct mapped cache is all that is needed to achieve considerable program performance without requiring complex caching hardware.

When a cache miss occurs, L1P requests an entire line of data from L2. In other words, both the requested fetch packet and the next fetch packet in memory are loaded into cache. Since most applications execute sequential instructions, there is a high likelihood that the next fetch packet will be immediately available when requested by the CPU. Thus, the startup latency to fetch the next fetch packet is eliminated by bursting an entire cache line. Fetching ahead also reduces the number of cache misses. Eliminating the startup latency and reducing misses reduces the execution time of an application considerably compared to a cache with a smaller line size.

Level-one Data Cache (L1D)

L1D is organized as a 2-way associative cache with a 32-byte line size. An associative cache provides additional flexibility over a direct mapped cache. This cache architecture is beneficial for data, which tends to be more random and to have larger strides than program data. A 2-way associative cache comprises two direct mapped caches, each of which is referred to as a cache way. Each of the two ways caches 2 Kbytes of data. In a 2-way associative cache, every cacheable memory location can reside in one location in each cache way. A 2-way associative cache reduces the chance of a cache thrash, since two thrashing addresses can be stored in cache simultaneously. This is a beneficial architecture for DSP data, which often accesses multiple arrays simultaneously, such as arrays of coefficients and samples. A 2-way associative cache is also advantageous in the design of the TMS320C6211 since the CPU has two data paths, which could simultaneously access two different data arrays. L1D minimizes the chance of these data paths thrashing.
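The thrashing behaviour described above can be illustrated with a small simulation. This is a minimal sketch, not a model of the 'C6211 hardware: the class name `ToyCache`, the helper `count_misses`, the 4096-byte array spacing, and the choice to evict the least recently used way are all assumptions made for illustration. It counts misses when two arrays (say, coefficients and samples) are read alternately and happen to map to the same cache locations.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical cache model; ways=1 behaves as a direct mapped cache."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; key order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets      # which set the line maps to
        tag = line // self.num_sets       # identifies the line within that set
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used line
        entries[tag] = True
        self.misses += 1
        return False


def count_misses(cache, n_words=64, array_gap=4096):
    """Alternate 32-bit reads from two arrays placed array_gap bytes apart."""
    for i in range(n_words):
        cache.access(4 * i)               # e.g. a coefficient
        cache.access(array_gap + 4 * i)   # e.g. a sample
    return cache.misses


direct_mapped_misses = count_misses(ToyCache(4096, 32, ways=1))
two_way_misses = count_misses(ToyCache(4096, 32, ways=2))
print(direct_mapped_misses, two_way_misses)
```

With these assumed parameters the direct mapped configuration misses on all 128 accesses, because each array keeps evicting the other's line, while the 2-way configuration misses only 16 times, once per 32-byte line of each array.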
L1D replaces data with a Least Recently Used (LRU) replacement strategy. LRU replacement chooses which way to update with new data by determining which of the cache ways was accessed least recently. The new data is then placed in the appropriate location of the least recently used way. LRU is the best replacement strategy for associative caches because of the temporal locality of data: once data has been used, it will probably be needed again within a short time. Thus, the cache should always keep the data that was most recently used and replace the least recently used data.

Like L1P, the L1D line size provides a prefetch of subsequent data, minimizing the fetch latency for that data. When a cache miss occurs, L1D requests an entire line of data from L2. When fetching a data array from contiguous non-cached memory, this greatly reduces the latency of subsequent data fetches. In the case of an array of words, only the first fetch experiences the delay of going off chip. The next seven array elements are fetched from L1D, each in a single cycle. For half-word or byte arrays, the benefit is even greater, since the next 15 or 31 array elements of the 32-byte line will be in cache. The L1D memories are dual ported, which allows L1D to support two simultaneous data accesses without stalling.

Level-two Cache/Unified Memory (L2)

L2 is a 64-Kbyte SRAM divided into four 16-Kbyte blocks. L2 is a unified memory, used for both program and data. The amount of L2 used for program and for data is configurable. For example, if your application requires only [...] Kbytes of program space and [...] Kbytes of data space, then both could be linked into L2 at the same time. Likewise, if your application needed more program space than data, the majority of L2 could be linked as program space.

Each of the four blocks can be independently configured as either cache or memory-mapped RAM. This allows you to dictate how much of L2 is used as cache and how much is used as RAM. If your application uses some data that must be accessed quickly, that data can be linked into a block that is configured as RAM. The rest of L2 can be configured as cache, which provides high performance operation on the remaining program and data.

When a block is configured as RAM, no external data is cached in that block; instead, that memory is accessed by direct addressing.
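The block-by-block split of L2 can be sketched as simple arithmetic. The function names `l2_split` and `fits_in_sram` are hypothetical helpers introduced here, not part of any TI tool; the sketch only encodes what the text states (four 16-Kbyte blocks, each either cache or RAM, with each cache block contributing one cache way) and the Kbyte figures passed in the example are illustrative assumptions.

```python
L2_BLOCKS = 4        # L2 is divided into four blocks
BLOCK_KBYTES = 16    # each block is 16 Kbytes


def l2_split(cache_blocks):
    """Return the cache/RAM division of L2 for a given number of cache blocks."""
    if not 0 <= cache_blocks <= L2_BLOCKS:
        raise ValueError("cache_blocks must be between 0 and 4")
    return {
        "cache_kbytes": cache_blocks * BLOCK_KBYTES,
        "sram_kbytes": (L2_BLOCKS - cache_blocks) * BLOCK_KBYTES,
        # Each block configured as cache adds one way of associativity.
        "cache_ways": cache_blocks,
    }


def fits_in_sram(program_kbytes, data_kbytes, cache_blocks):
    """Check whether program plus data sections fit in the memory-mapped part of L2."""
    return program_kbytes + data_kbytes <= l2_split(cache_blocks)["sram_kbytes"]


for blocks in range(L2_BLOCKS + 1):
    print(blocks, l2_split(blocks))
```

A check such as `fits_in_sram(20, 12, cache_blocks=2)` then answers whether a given pair of section sizes can be linked entirely into the RAM portion while the remaining blocks serve as cache.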
Each block that is configured as cache adds a cache way to L2. For example, when only one block is configured as cache, L2 operates as a 1-way associative (direct mapped) cache with 48 Kbytes of RAM. When all four blocks are configured as cache, L2 operates as a 4-way associative cache. Figure 3 and Figure 4 illustrate the division of the L2 memory space according to the L2 mode.

[Figure 3: L2 Memory Configurations, Diagram 1. Each 16-Kbyte block is shown either as mapped RAM or as cache.]

[Figure 4: L2 Memory Configurations, Diagram 2. L2 modes shown: all SRAM; SRAM with 1-way cache; SRAM with 2-way cache; SRAM with 3-way cache; 4-way cache.]

By providing a high level of associativity, L2 minimizes thrashing between multiple data sources. For example, your application could be executing a program that uses a data array of coefficients, another data array, and a stack. The associativity of L2 eliminates thrashing between these, since each data source can be cached in a different cache way. Figure 5 depicts an example of how multiple data streams can reside in L2 without thrashing.

[Figure 5: Data Allocation in Multiple Cache Ways. Items shown: kernel, stack, function input data stream, function coefficients, globals.]

L2 supports accesses from L1P, L1D, and the Enhanced DMA (EDMA). The EDMA can access only those blocks of L2 memory that are configured as SRAM. The L2 memories are organized as four 64-bit wide banks. Simultaneous accesses are serviced without stalling as long as they are not to the same bank. Thus, concurrent accesses by the L1 and EDMA busses to different banks are serviced without stalling.

Since the same memory location can be cached simultaneously in both L1D and L2, the 'C6211 must ensure that the CPU is always accessing the current state of memory locations. For example, suppose the CPU reads data that is in neither L1D nor L2, so that data is fetched from external memory. The same data is then written into both L1D and an L2 cache block. If the CPU then modifies that memory location by writing to its address, the data is updated only in L1D. In this case, only L1D contains the correct data value. When this location is written to external memory, for example to write an array of data to an external peripheral, the 'C6211 must write the correct value. The TMS320C6211 uses snooping to ensure that the latest data is written at the time that the location is written back to external memory.
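The stale-copy scenario above can be sketched with a toy model. This is an illustration under stated assumptions, not the 'C6211 implementation: the class `TwoLevelModel` and its method names are hypothetical, addresses stand in for whole cache lines, capacity and replacement are ignored, and a write that misses L1D is simply passed to L2. The write-back step checks L1D for a modified copy before sending data off chip, which is the role the text assigns to snooping.

```python
class TwoLevelModel:
    """Hypothetical model of one data path: L1D in front of L2 in front of external memory."""

    def __init__(self, external):
        self.external = dict(external)  # address -> value held off chip
        self.l2 = {}                    # address -> value cached in L2
        self.l1d = {}                   # address -> value cached in L1D
        self.l1d_modified = set()       # addresses written by the CPU while in L1D

    def cpu_read(self, addr):
        if addr not in self.l1d:
            if addr not in self.l2:
                self.l2[addr] = self.external[addr]   # L2 miss: fetch from off chip
            self.l1d[addr] = self.l2[addr]            # L1D miss: fill from L2
        return self.l1d[addr]

    def cpu_write(self, addr, value):
        if addr in self.l1d:
            self.l1d[addr] = value        # write hit: only L1D is updated
            self.l1d_modified.add(addr)
        else:
            self.l2[addr] = value         # assumption: a write miss goes to L2

    def write_back(self, addr):
        """Write one L2 location to external memory, snooping L1D first."""
        if addr in self.l1d_modified:
            self.l2[addr] = self.l1d[addr]            # retrieve the modified data
            self.l1d_modified.discard(addr)
        self.external[addr] = self.l2[addr]           # send the latest value off chip
        self.l1d.pop(addr, None)                      # remove the data from both caches
        self.l2.pop(addr, None)


model = TwoLevelModel({0x100: 1})
model.cpu_read(0x100)        # value now cached in both L1D and L2
model.cpu_write(0x100, 7)    # L2 still holds the old value
stale_l2_value = model.l2[0x100]
model.write_back(0x100)
print(stale_l2_value, model.external[0x100])
```

Without the snoop check in `write_back`, the old L2 value would be the one written to external memory.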
L2 snoops L1D by checking whether a memory location that is cached in L2 is also cached in L1D. L1D informs L2 whether that memory location has been modified. If the data has been modified, L2 retrieves the modified data from L1D, sends that data to external memory, and removes the data from both L1D and L2.

Cache Performance

With its two-level cache, the TMS320C6211 achieves a high level of performance. Tests have shown that typical DSP applications running on the TMS320C6211 operate at greater than [...] of the optimal cycle count performance of a 'C62x device. Optimal performance would be achieved only by having infinite internal program and data memory.

[Table 1: TMS320C6211 Benchmark Performance. Columns: Application, Efficiency. Applications listed: V.34, AC-3 Decoder, Zlib File Compression, Line Echo Cancellation, Frame Encoder, Frame Decoder, ADSL.]

This level of performance is achievable because of the high hit rate of the cache. Typically, program fetches execute without a miss in L1P and data fetches hit in L1D. When L2 is configured as a 4-way associative cache, requests are normally found in cache. Over 99.5% of cycles execute without requiring access to external memory, virtually eliminating the access penalty associated with external memory devices.

Price Performance

The primary focus of the TMS320C6211 is to achieve an easy to use, low-cost device with outstanding performance. The TMS320C6211 External Memory Interface (EMIF) has been optimized to operate with a variety of devices. The 'C6211 offers 16- and 32-bit interfaces to asynchronous memory, SDRAM, and SBSRAM devices. This enables you to take advantage of a high performance processor with single chip external memory solutions, reducing total system cost and board area.

Two-Level Cache Benefits

The cache architecture of the TMS320C6211 allows the device to achieve high performance without large amounts of expensive on-chip memory. With an efficient cache, low-cost, high-density external memory, such as SDRAM, is as effective as on-chip memory. Having a two-level cache provides several benefits over a one-level cache system. It reduces the latency of an L1 cache miss, and it unifies program and data in the same on-chip memory.

2.2.1 L2 reduces the latency of a cache miss

By providing the L2 space, L1 cache misses can be serviced much more quickly.
There is a significant reduction in cycle time when retrieving data from on-chip memory rather than from external memory. External memory devices have significant startup latencies associated with them. By having an intermediate cache, this latency is hidden from the user. The external memories that interface to the 'C6211 operate at a maximum of [...] MHz, while the device operates at a maximum frequency of [...] MHz. Using fast L2 memories to cache slower external memories reduces the latency of external accesses by a factor of five. The wide, high-bandwidth on-chip interface transfers data at 1920 Mbytes/s, while the EMIF interface operates at [...] Mbytes/s.

2.2.2 Unifying program and data

By unifying the program and data space, the L2 cache is more likely to hold the memory requested by the CPU. L2 enables the on-chip memory to contain more data than program when highly computational, looping code is being used to process large data streams. For long, serial code with few data accesses, L2 can be more densely populated with program instructions. L2 unification allows you to allocate the appropriate amount of memory for both program and data, and it keeps the on-chip memory full of the instructions and data that are most likely to be requested by the CPU.

Real-time operation

An important concern with cache systems is that the device be able to perform in real time. There are several requirements on a system to ensure that real-time operation is possible. The operation of the device must be predictable, interrupts must be handled without affecting the continued real-time operation of the device, and efficient DMA must be maintained.

2.3.1 Predictability

The TMS320C6211 has a high degree of predictability. Device operation typically achieves over [...] of the performance of a 'C62x device with infinite on-chip memory. Software tools that simulate the performance of the cache will be available in early 4Q98 to help optimize system performance.

2.3.2 Interrupt Latency

Interrupt handling is an important part of DSP operation. It is crucial that the DSP be able to receive and handle interrupts while maintaining real-time operation. In typical applications, interrupt frequency has not increased in proportion to the increase in device operating frequency. As processing speeds have increased, latency requirements have not. The TMS320C6211 is capable of servicing interrupts with a latency of a fraction of a microsecond when the service routine is located in external memory.
By configuring L2 memory blocks as memory-mapped SRAM, it is possible to lock critical program and data sections into internal memory. This is ideal for situations such as interrupts and task switching. By locking in routines that need to be performed in minimal time, the microsecond delay for interrupts can be reduced to tens of nanoseconds.

Efficient DMA Capability

Peripherals are a feature of most DSP systems that can take advantage of memory-mapped RAM. Typical processors require that peripheral data first be placed in external memory before it can be accessed by the CPU. The TMS320C6211 can maintain data buffers in on-chip memory, rather than in off-chip memory, providing higher data throughput to peripherals. This increases performance when using the on-chip McBSPs, the HPI, or external peripherals. The EDMA can be used to transfer data directly into mapped L2 space while the CPU processes data. This increases performance since the CPU is not stalled while fetching data from slow external memory or directly from the peripheral. Using this method of transferring data also minimizes EMIF activity, which is crucial as data rates or the number of peripherals increase. Figure 6 illustrates the data flow from a peripheral in a typical processor. Figure 7 shows the same data flow from a peripheral in the TMS320C6211.

[Figure 6: Typical processor peripheral data flow. Blocks shown: typical processor, external peripheral, external memory, internal peripheral.]

[Figure 7: TMS320C6211 peripheral data flow. Blocks shown: 'C6211, external memory, external peripheral, internal peripheral.]

Ease of Use

The efficiency of the 'C6211 cache architecture makes the device simple to use. The cache is inherently transparent to the user. With the level of associativity and the high cache hit rate, virtually no optimization must be done to achieve high performance. Reduced time for optimization leads to reduced development time, allowing functional systems to be up and running quickly. High performance is immediately achieved with the 'C6211 cache architecture, while a Harvard architecture with small internal memory requires much more time to achieve similar performance. This is because optimizing an application for a small Harvard architecture requires several iterations to tune the application to the small, fixed internal memories.
Summary

The TMS320C6211 two-level cache architecture is optimized for DSP applications. The 'C6211's caches deliver high performance without the cost of large arrays of on-chip memory. They provide better than [...] efficiency in DSP applications while greatly reducing development time. The 'C6211 utilizes an efficient, low-cost cache architecture that will enable many new applications, such as low-cost client modems and imaging, and provides the performance to take many existing applications to the next level.

IMPORTANT NOTICE

Texas Instruments (TI) reserves the right to make changes to its products or to discontinue any semiconductor product or service without notice, and advises its customers to obtain the latest version of relevant information to verify, before placing orders, that the information being relied on is current and complete.

TI warrants performance of its semiconductor products and related software to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Certain applications using semiconductor products may involve potential risks of death, personal injury, or severe property or environmental damage ("Critical Applications").

TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, INTENDED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT APPLICATIONS, DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS.

Inclusion of TI products in such applications is understood to be fully at the risk of the customer. Use of TI products in such applications requires the written approval of an appropriate TI officer. Questions concerning potential risk applications should be directed to TI through a local sales office.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards should be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance, customer product design, software performance, or infringement of patents or services described herein.
Nor does TI warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be used.

Copyright 1998, Texas Instruments Incorporated

TI is a trademark of Texas Instruments Incorporated. Other brands and names are the property of their respective owners.