Application Report SPRA994 - March 2004

TMS320C64x EDMA Architecture

Jeffrey Ward, Jamon Bowen
TMS320C6000 Architecture

ABSTRACT

The enhanced DMA (EDMA) controller of the TMS320C64x device is a highly efficient data transfer engine. To maximize bandwidth, minimize transfer interference, and fully utilize the resources of the EDMA, it is crucial to understand the architecture of the engine. Transfer requests (TRs) originate from many requestors, including sixty-four programmable EDMA channels, the level 2 (L2) memory controller, and other master peripherals. The EDMA controls access to resources and arbitrates between concurrent transfers. Understanding the interaction points for transfer requests in the EDMA architecture is crucial to creating a system that takes full advantage of the EDMA's capabilities.

Contents

1 Introduction
2 EDMA Architecture
  2.1 Data Transfer Overview
  2.2 Transfer Requestors
    2.2.1 Level-Two Memory Controller
    2.2.2 EDMA Channel Controller
    2.2.3 Master Peripherals
  2.3 Transfer Request Bus
  2.4 Transfer Controller
  2.5 Peripheral Ports
  2.6 Transfer Controller Commands
3 Transfer Request Submission
  3.1 L2 Transfer Requests
    3.1.1 CPU and Cache Transfer Requests
    3.1.2 QDMA Transfer Requests
  3.2 EDMA Channel Transfer Requests
  3.3 HPI Transfer Requests
  3.4 PCI Transfer Requests
  3.5 EMAC Transfer Requests
4 Priority Queue Allocation
  4.1 Transfer Requestor Stalls
  4.2 Programming Priority Queue Allocation Registers
5 Transfer Interaction and Arbitration
6 Priority Inversion
  6.1 Priority Inversion Due to Port Blocking
  6.2 Priority Inversion Due to Multiple High Priority Transfers
  6.3 Priority Inversion Due to TR Stalls (TR Blocking)
  6.4 Priority Inversion Due to Read/Write Parallelism
7 Resolving Priority Inversion, TR Blocking/Stalls, and Port Blocking
8 Conclusion
9 References

List of Figures

Figure 1   EDMA Architecture Overview
Figure 2   L2 Controller Functionality
Figure 3   Peripheral Port Diagram
Figure 4   Command and Data Busses
Figure 5   L2 Services Multiple Transfers
Figure 6   Port Activity is Determined by Command Buffers
Figure 7   Example Port Blocking Scenario
Figure 8   Example Multiple High Priority Scenario
Figure 9   Example TR Stall Scenario
Figure 10  Example Parallel Read/Write Scenario

List of Tables

Table 1  Command Buffers and Burst Sizes
Table 2  EDMA/Cache Activity Due to CPU Accesses
Table 3  Data Transferred by an EDMA Channel Transfer Request
Table 4  Data Transferred by an HPI Transfer Request
Table 5  Data Transferred by a PCI Transfer Request
Table 6  Priority Queue Lengths

Trademarks are the property of their respective owners. C6000 is a trademark of Texas Instruments.

[Figure 1. EDMA Architecture Overview. Blocks shown: L2 controller node (cache/CPU/QDMA); master peripheral nodes; EDMA channel controller node (channels 0 through 63) with synchronization logic fed by completion events and all other synchronization events; EDMA transfer request nodes; TR priority queues Q0 (urgent priority), Q1 (high priority), Q2 (medium priority), and Q3 (low priority); transfer logic (queue registers); peripheral ports.]

1 Introduction

The enhanced DMA (EDMA) controller of the TMS320C64x devices is a highly efficient data transfer engine, capable of handling up to 8 bytes per EDMA cycle. At a CPU rate of 600 MHz this yields 2.4 gigabytes per second of total data throughput, because the EDMA frequency is the CPU frequency divided by two. The EDMA performs all data movement between the on-chip level-two (L2) memory, external memory (connected to the device through an external memory interface, or EMIF), and the device peripherals. These data transfers include CPU-initiated and event-triggered transfers, master peripheral accesses, cache servicing, and non-cacheable memory accesses.

The EDMA architecture has many features designed to service multiple high-speed data transfers simultaneously. With a working knowledge of this architecture and of the ways in which data transfers interact and are performed, it is possible to create an efficient system and to maximize the bandwidth utilization of the EDMA.

2 EDMA Architecture

The most important thing to understand before setting up the data movement in a system is the architecture of the transfer engine. This architecture determines the stages through which a transfer is accomplished. It is the key to knowing how multiple transfers (from multiple transfer requestors) interact with one another, and ultimately how they affect system performance.
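
The peak throughput figure quoted in the Introduction follows directly from the bus width and the EDMA clock. As a quick arithmetic check, the relationship can be sketched in a few lines of Python. The function name and its parameters are illustrative only and are not part of any TI software.

```python
def edma_peak_throughput(cpu_hz, bytes_per_cycle=8, clock_divider=2):
    """Return the peak EDMA throughput in bytes per second.

    Assumes the EDMA runs at cpu_hz / clock_divider and moves at most
    bytes_per_cycle bytes in each EDMA cycle, as stated in the text.
    """
    edma_hz = cpu_hz / clock_divider
    return edma_hz * bytes_per_cycle


peak = edma_peak_throughput(600e6)   # 600 MHz CPU clock
print(f"{peak / 1e9:.1f} GB/s")      # prints "2.4 GB/s"
```

This is an upper bound: it assumes 8 bytes move on every EDMA cycle, which the text describes as the maximum rather than a sustained rate.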

2.1 Data Transfer Overview

Each data transfer is initiated by a transfer request (TR), which contains all the information required to perform the transfer: source address, destination address, transfer priority, element count, and so on. TRs are sorted into queues based on priority. Once at the head of its queue, a TR is moved into the EDMA transfer controller's queue registers, which perform the actual data movement defined by the TR.

The entire process of TR submission, priority queuing, and arbitration occurs at the speed of the EDMA, which is the CPU frequency divided by two. Data movement at the peripheral occurs at the speed of the peripheral. The peripheral ports buffer data to isolate the high-speed EDMA from the peripherals. This is a very efficient architecture, allowing the EDMA to service multiple simultaneous data transfers.

2.2 Transfer Requestors

There are up to three types of data transfer requestor inside the DSP: the L2 cache/memory controller, the EDMA channels, and the master peripherals. The transfers requested are likely to differ because each requestor performs a different task. However, the EDMA transfer controller handles every transfer request in the same way, regardless of its requestor.

2.2.1 Level-Two Memory Controller

The L2 cache/memory controller performs many functions. It services CPU data accesses, submits quick DMA (QDMA) transfer requests, and maintains the coherency of the level-1 cache and the level-2 cache (if enabled). All communication between the CPU block and the rest of the device must pass through the L2 controller, as depicted in Figure 2.

[Figure 2. L2 Controller Functionality. Blocks shown: CPU; level-1 cache memory; level-2 memory controller; transfer requests to the EDMA; peripheral port; peripheral config bus to the memory-mapped config registers in all peripherals; peripheral ports for McBSP, UTOPIA, master peripherals, Video, EMIF A, and EMIF B.]

The L2 controller directs QDMA requests and external memory accesses to the EDMA, L2 cache/memory accesses to the L2 memory, and memory-mapped control register accesses to the peripheral configuration (config) bus. In addition, if any L2 memory is set up as cache, it maintains the coherency of the data between the cache and the cacheable memory space(s).

The L2 controller receives L2 memory accesses from the CPU side and from the EDMA side. In the case of contention, the EDMAWEIGHT register defines which requestor takes precedence. The EDMAWEIGHT register is documented in the TMS320C6000 EDMA Controller Reference Guide (literature number SPRU234).

Some accesses to the L2 controller result in an EDMA transfer request (TR); others do not. The L2 controller generates a TR for the following conditions:

· The CPU accesses a non-cacheable external memory space.
· The CPU issues a QDMA transfer.
· The L2 controller performs a cache allocation from external memory as the result of a CPU access to a cacheable memory space.
· The L2 controller performs a cache eviction to external memory as the result of a cache allocation that has no space to land in the cache memory.
· The user initiates a cache operation (flush, clean, etc.).

The L2 controller submits all CPU and cache servicing transfer requests on the EDMA priority level set in the priority bits of the cache configuration (CCFG) register. QDMA transfers can be set to any priority on a per-transfer basis via the priority bits of the QDMA options register.

Note that some accesses and data paths do not pass through the EDMA. The L2 controller does not generate an EDMA transfer request for the following conditions:

· The CPU accesses memory in the L2 SRAM space. This access goes directly to L2 within the cache memory system; refer to the TMS320C64x DSP Two-Level Internal Memory Reference Guide (literature number SPRU610).
· The CPU accesses a memory-mapped config register. This access passes through the config bus.
· The CPU accesses a cacheable external memory element that is allocated in L2 or L1 cache. This access goes directly to L2 within the cache memory system; refer to the TMS320C64x DSP Two-Level Internal Memory Reference Guide (literature number SPRU610).

Details on programming the QDMA can be found in the TMS320C6000 EDMA Controller Reference Guide (literature number SPRU234). Information about configuring the L2 cache, defining cacheable and non-cacheable external memory spaces, and programming the cache configuration registers can be found in the TMS320C64x DSP Two-Level Internal Memory Reference Guide (literature number SPRU610).

2.2.2 EDMA Channel Controller

There are sixty-four EDMA channels, which are configured in a special on-chip parameter RAM (PaRAM). Each channel corresponds to a specific synchronization event that triggers its transfer. The RAM-based structure of the EDMA allows for a great deal of flexibility. Each channel has a complete parameter set accessible via the peripheral config bus, which makes each channel's transfer parameters independent of the others. To allow for some interaction between transfers, a linking mechanism is available to EDMA channels: once a channel's parameters are fully exhausted, they can be automatically reloaded with a new set stored in the PaRAM. One EDMA TR is issued per synchronization event received.

The transfers requested by the EDMA channels depend entirely on the configuration programmed by the user. Details on programming EDMA channels are not included in this document. For transfer examples, see the examples section of the TMS320C6000 EDMA Controller Reference Guide (literature number SPRU234).

2.2.3 Master Peripherals

Master peripherals include the HPI, the PCI, and the EMAC. Master peripheral servicing is performed without any user intervention.
These peripherals have a direct connection to the EDMA, with limited user programmability. The direct connection allows master peripherals to submit transfer requests to the EDMA transfer controller in the same fashion as the L2 controller and the EDMA channels.

The requests made to the EDMA depend on the master activity, but they consist of transfers between a master peripheral and the rest of the system memory. These transfer requests can move data between any location in the DSP's memory map and the master peripherals. The priority level of these transfers is determined by the TR control register (TRCTL), located in the master peripheral register set. For information on programming the master peripherals, see the appropriate chapter referenced in the TMS320C6000 DSP Peripherals Overview Reference Guide (literature number SPRU190).

2.3 Transfer Request Bus

The transfer requestors to the EDMA are connected to the transfer controller (TC) via the transfer request (TR) bus. If multiple TRs arrive at the TR bus simultaneously, they are submitted in the order of their priority. This arbitration has little impact on performance because it completes quickly (in about 2-4 EDMA cycles) relative to the time taken by the data transfers themselves.

2.4 Transfer Controller

Transfer requests are queued in the transfer controller based on their priority. The transfer controller is the portion of the EDMA that processes the TR and performs the actual data movement (see Figure 1).

Within the TC, the TR is shifted into one of the transfer request queues to await processing. The transfer priority level determines the queue to which it is submitted. There are four queues, corresponding to four priority levels, each with a depth of 16 entries: Q0 (urgent), Q1 (high), Q2 (medium), and Q3 (low). Each TMS320C64x transfer requestor is programmable such that it can submit TRs on any priority level.
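
The queuing behavior described above can be illustrated with a small model. The following Python sketch is a simplification under stated assumptions: the class and method names are hypothetical, TRs are represented by plain strings, and a submission to a full queue is simply rejected (the text above does not describe what the hardware does in that case).

```python
from collections import deque

QUEUE_NAMES = {0: "urgent", 1: "high", 2: "medium", 3: "low"}
QUEUE_DEPTH = 16  # entries per queue, as stated in the text


class PriorityQueues:
    """Toy model of the four TR priority queues (Q0..Q3)."""

    def __init__(self):
        self.queues = {prio: deque() for prio in QUEUE_NAMES}

    def submit(self, tr_name, priority):
        """Queue one TR; return False if that queue is already full.

        Rejecting the TR is an assumption made for this sketch only.
        """
        queue = self.queues[priority]
        if len(queue) >= QUEUE_DEPTH:
            return False
        queue.append(tr_name)
        return True

    def submit_simultaneous(self, requests):
        """Submit (name, priority) pairs that arrive together.

        TRs arriving at the TR bus at the same time are submitted in
        priority order, most urgent (lowest queue number) first.
        """
        accepted = []
        for name, priority in sorted(requests, key=lambda r: r[1]):
            if self.submit(name, priority):
                accepted.append(name)
        return accepted

    def heads(self):
        """Return the TR at the head of each queue (None if empty)."""
        return {p: (q[0] if q else None) for p, q in self.queues.items()}


tc = PriorityQueues()
order = tc.submit_simultaneous(
    [("cache_fill", 1), ("mcbsp_rx", 3), ("qdma_copy", 0)]
)
print(order)  # ['qdma_copy', 'cache_fill', 'mcbsp_rx']
```

Each queue is strictly first-in, first-out, so only the TR at the head of a queue is a candidate for the queue registers.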

Once the transfer request reaches the head of its queue, it is submitted to the queue registers to be processed. The address generation/transfer logic can service only one TR from each priority queue at a time, but it can process transfers of different priorities concurrently. To maximize the data transfer bandwidth in a system, transfers should be distributed among all four priorities whenever possible. This topic is discussed at length in TMS320C6000 EDMA IO Scheduling and Performance (literature number SPRAA00).

The TC contains four queue register sets, one for each priority queue, which monitor the progress of a transfer. Within the register set for a particular queue, the current source address, destination address, and count are maintained for a transfer. These registers are not present in the device's memory map and are unavailable to the CPU.

The TC is connected to peripherals via peripheral ports. This is where the actual data movement occurs during a transfer.

2.5 Peripheral Ports

Peripherals involved in high-speed data traffic (McBSP, UTOPIA, the master peripherals discussed in individual sections, Video Port, EMIF, and L2 controller) have ports that accept commands from the TC, as shown in Figure 3. Each port includes read and write FIFO command and data buffers between the high-speed EDMA engine and the peripheral, which may operate at some lower frequency. The ports receive TC commands and access the peripherals directly, freeing the EDMA to service other transfers while waiting for a response from the peripheral. This design allows transfers to and from different peripherals on different priority levels to occur simultaneously.

[Figure 3. Peripheral Port Diagram. Blocks shown: read commands from the TC entering the high-speed read FIFO buffer; write commands and data from the TC entering the high-speed write FIFO buffer; peripheral-speed transfer logic between the buffers and the peripheral; return data to the TC. Legend: data in command buffer; read or write command.]

The number of command buffers in each peripheral port (as well as the default burst size of that port) is fixed in order to maximize efficiency.

Table 1. Command Buffers and Buffer Sizes

                        Reads                     Writes
                        Command   Buffer Size     Command   Buffer Size
Peripheral              Buffers   (words)         Buffers   (words)
L2 Memory Controller    8         2               8         2
TCP/VCP                 8         2               8         2
McBSP 0/1/2             2         1               2         1
UTOPIA                  2         16              2         8
EMIF A                  4         16              4         32
EMIF B                  4         4               4         8

Peripheral availability varies by specific device. Refer to the device data sheet.

Peripheral and EMIF ports service all commands in the order of their arrival. For example, suppose four read commands arrive, followed by four write commands. The four reads are serviced, followed by the four writes.

In contrast, the L2 port services reads and writes alternately, in the order of their arrival. Again, suppose four read commands arrive, followed by four write commands. The first read is serviced, then the first write, followed by the second read, then the second write, and so on.

2.6 Transfer Controller Commands

To perform a transfer, the TC sends commands to the source and destination ports for data to be read or written. These commands are for small bursts of data, which are less than or equal to the total transfer size of the submitted transfer request. The default burst size and the number of command buffers per port are shown in Table 1.

The TC sends commands to the ports for data transfers, but the actual data movement does not occur until the port is ready. However, waiting for the port to become ready does not stall the TC. Therefore, if the different queues request transfers to or from different ports, the transfers can occur at the same time. Transfer commands made to the same port(s) are arbitrated by the TC according to priority.

To initiate a data transfer, the TC submits a command to the source or destination pipeline. There are three commands generated by the TC: pre-write, read, and write.
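
The two port service orders described in Section 2.5 (strict arrival order for peripheral and EMIF ports, alternating reads and writes for the L2 port) can be made concrete with a short Python sketch. The function names and the "R1"/"W1" command labels are hypothetical. The rule that the L2 port starts with whichever command type arrived first, and drains the remaining type once the other runs out, is an assumption of this sketch rather than a documented hardware behavior.

```python
from collections import deque


def peripheral_port_order(commands):
    """Peripheral and EMIF ports: service commands in arrival order."""
    return list(commands)


def l2_port_order(commands):
    """L2 port: alternate between reads and writes, each in arrival order.

    Commands are strings whose first letter is 'R' (read) or 'W' (write).
    """
    reads = deque(c for c in commands if c[0] == "R")
    writes = deque(c for c in commands if c[0] == "W")
    order = []
    take_read = bool(commands) and commands[0][0] == "R"
    while reads or writes:
        if (take_read and reads) or not writes:
            order.append(reads.popleft())
        else:
            order.append(writes.popleft())
        take_read = not take_read
    return order


arrivals = ["R1", "R2", "R3", "R4", "W1", "W2", "W3", "W4"]
print(peripheral_port_order(arrivals))
# ['R1', 'R2', 'R3', 'R4', 'W1', 'W2', 'W3', 'W4']
print(l2_port_order(arrivals))
# ['R1', 'W1', 'R2', 'W2', 'R3', 'W3', 'R4', 'W4']
```

Both outputs reproduce the four-reads-then-four-writes example given in the text.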

Commands can be submitted to both pipelines once per cycle by any of the queue register sets. The TC arbitrates every cycle (separately for each pipeline) to allow the highest-priority pending command to be submitted to the appropriate port.

The pre-write command is issued to notify the destination that it is going to receive data. Once the destination has space available to accommodate the incoming data, it sends an acknowledgment to the EDMA that it is ready. After receiving the acknowledgment from the destination, the TC issues a read command to the source port. Data is read at the maximum frequency of the source into the command buffer, and then passed to the EDMA routing unit to be sent to the destination. Once the routing unit receives the data, the data is sent along with a write command to its destination port.

Because the EDMA waits until the destination is ready to receive data, the source remains free to be accessed by other transfers in the meantime. This provides excellent utilization of resources and is referred to as write-driven processing.

All write commands and data are sent from the EDMA to all resources on a single bus. The information is passed at the clock speed of the EDMA, and data from multiple transfers is interleaved based on priority when the transfers occur simultaneously. In this way, the EDMA transfer controller services commands and data from multiple transfers simultaneously. Also, ports can service more than one active transfer if they have the bandwidth to do so, always giving precedence to the higher-priority transfers. This is especially useful for the L2 memory port, which is the fastest port in the system.

The read data arrives on unique busses from each resource. This prevents contention and ensures that data can be read at the maximum rate possible.
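
The write-driven command sequence described above (pre-write, acknowledgment, read, write) can be traced with a minimal Python sketch. It is a deliberate simplification: the function name is hypothetical, bursts are handled strictly one after another (the real TC overlaps commands), and the burst size is passed in as a parameter rather than looked up per port.

```python
def command_sequence(total_words, burst_words):
    """List the commands needed to move total_words in bursts.

    Each burst follows the write-driven order from the text:
    pre-write to the destination, acknowledgment from the destination,
    read from the source, then write (with data) to the destination.
    Returns a list of (command, words) tuples.
    """
    sequence = []
    remaining = total_words
    while remaining > 0:
        words = min(burst_words, remaining)
        sequence.append(("pre-write", words))  # notify the destination
        sequence.append(("ack", words))        # destination has room
        sequence.append(("read", words))       # issued only after the ack
        sequence.append(("write", words))      # data sent to destination
        remaining -= words
    return sequence


for command, words in command_sequence(total_words=40, burst_words=16):
    print(f"{command:9s} {words:2d} words")
```

With these example values the transfer is split into bursts of 16, 16, and 8 words. In each burst the read is issued only after the destination acknowledges the pre-write, which is the property that leaves the source free for other transfers while the destination is busy.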

Once the data arrives at the routing unit, the data that is available for the highest-priority transfer is moved from its read bus to the write bus and sent to the destination port. The queue register sets, command bus, and routing unit are depicted in Figure 4.

[Figure 4. Command and Data Busses. Blocks shown: TRs from the priority queues entering the queue registers (Queue0 through Queue3 register sets); a 3-deep SRC pipeline carrying pre-write and read commands; a 3-deep DST pipeline carrying write commands plus data; the routing unit receiving read data from the ports; ports for L2, McBSP, UTOPIA, master peripherals, Video Port, EMIF A, and EMIF B.]

3 Transfer Request Submission

Knowing how and when TRs are submitted is important when scheduling data traffic in a system. The types of TRs submitted to the hardware differ slightly depending on the requestor, but all TRs contain the same essential information: source and destination addresses, element count, and the relationship between the elements within the source and destination regions (fixed, increment, decrement, or indexed).

3.1 L2 Transfer Requests

The L2 controller handles TR submission for CPU data accesses, L1 and L2 cache allocations from EMIF, L1 and L2 cache evictions to EMIF, and QDMA transfers.

3.1.1 CPU and Cache Transfer Requests

The L2 controller services CPU requests and maintains L2 cache coherency. The L2 cache has a programmable size, can be disabled, and resides between the CPU's L1 cache and the rest of the DSP's memory-mapped space. All cacheable memory spaces are serviced by the L1 cache and/or the L2 cache. If the L2 cache size is zero (L2 cache is disabled), cacheable memory is serviced by the L1 cache only. If the L2 cache size is not zero (L2 cache is enabled), cacheable memory is serviced by both the L1 and L2 caches.

When determining system traffic, it is important to know when the L2 controller generates transfer requests. There are five basic CPU/L2 actions that trigger TR submission, listed above in Section 2.2.1. However, a single CPU access can trigger one or more of these actions. To determine exactly which circumstances generate which TRs, refer to Table 2.

Table 2. EDMA/Cache Activity Due to CPU Accesses

Read/Write           L1 Controller             L2 Cache    L2 Controller            Number of  Number of
Destination          Action                    Enabled/    Action                   TRs        Elements
                                               Disabled                             Submitted  per TR

Internal registers   None                      Don't care  None                     0          -

Memory-mapped        Forward request to        Don't care  Read from config bus;    0          -
control registers    L2 controller                         no EDMA action

L2 SRAM              Hit returns data; miss    Don't care  Read SRAM;               0          -
                     allocates one L1 cache                no EDMA action
                     line from L2 controller

Non-cacheable EMIF   Forward request to        Don't care  Submit TR to EDMA        1          1
                     L2 controller

Write access to      Hit, data lands in L1D;   Disabled    Submit TR to EDMA        1          1
cacheable EMIF       miss, data is passed on
                     to L2

Read access to       Hit returns data; miss    Disabled    Requests 1 L1 cache      1          L1 line size
cacheable EMIF       allocates one L1 cache                line from EDMA                      (64 bytes)
                     line from L2 controller

Cacheable EMIF       Hit returns data; miss    Enabled     Hit returns L1 cache     2          ½ L2 line size each
                     allocates one L1 cache                line; miss allocates                (64 bytes each)
                     line from L2 controller               one L2 cache line from
                                                           EDMA

Note the two-level memory structure. A data request traverses the memory hierarchy until the data is found. The hierarchical data access sequence consists of the following:

1. The CPU requests data from the L1 controller.
2. The L1 controller checks L1 memory and requests data from the L2 controller if the data is not in L1.
3. The L2 controller checks L2 memory and, if the data is not in L2, requests data from the peripheral config bus or the EDMA (depending on the address range of the access).
4. Data requests to the EDMA result in TRs.
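
The hierarchical access sequence and the rows of Table 2 can be combined into a small lookup function. The following Python sketch is illustrative only: the enum, the function name, and the flags are hypothetical, and it assumes that any cache hit produces no TR and that the "Cacheable EMIF, L2 enabled" row of Table 2 applies to reads and writes alike.

```python
from enum import Enum


class Target(Enum):
    INTERNAL_REGISTER = "internal register"
    CONFIG_REGISTER = "memory-mapped control register"
    L2_SRAM = "L2 SRAM"
    NONCACHEABLE_EMIF = "non-cacheable EMIF"
    CACHEABLE_EMIF = "cacheable EMIF"


def trs_for_access(target, *, is_write=False, l2_cache_enabled=False,
                   l1_hit=False, l2_hit=False):
    """Return (number of TRs, payload per TR) for one CPU access.

    Follows Table 2; hits are assumed to generate no EDMA traffic.
    """
    # Steps 1-3: served by registers, the config bus, or L2 SRAM.
    if target in (Target.INTERNAL_REGISTER, Target.CONFIG_REGISTER,
                  Target.L2_SRAM):
        return (0, None)
    # Non-cacheable external memory always goes to the EDMA.
    if target is Target.NONCACHEABLE_EMIF:
        return (1, "1 element")
    # Cacheable external memory: an L1 hit ends the search.
    if l1_hit:
        return (0, None)
    if not l2_cache_enabled:
        # L1 miss with no L2 cache: a write passes one element through,
        # a read allocates one L1 line.
        return (1, "1 element") if is_write else (1, "64 bytes")
    if l2_hit:
        return (0, None)
    # L2 miss: one L2 line is fetched as two TRs of half a line each.
    return (2, "64 bytes")


print(trs_for_access(Target.CACHEABLE_EMIF, l2_cache_enabled=True))
# (2, '64 bytes')
```

The function makes the benefit of caching visible: only misses (and non-cacheable accesses) ever reach the EDMA.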

Cache hits can reduce CPU wait states, and they have the added benefit of reducing EDMA traffic. For example, when cache is used to access data in EMIF, the first request allocates one cache line from EMIF by submitting a TR. Subsequent hits to that cache line are returned quickly and no TR is issued. Furthermore, there is less EDMA latency when data is transferred in a block than when it is transferred element by element.

Also note that a cache writeback to EMIF generates a single TR to write out modified data. Writebacks occur whenever there is no space in cache memory for a pending allocation and the least recently used cache line contains dirty data.

For additional details on the two-level cache architecture of the TMS320C64x devices, including how to define memory spaces as cacheable, see the TMS320C64x DSP Two-Level Internal Memory Reference Guide (literature number SPRU610).

3.1.2 QDMA Transfer Requests

The L2 controller also submits TRs for QDMA transfers. QDMA transfers are initiated by writing to the QDMA pseudo-registers. They are intended for simple block or frame transfers and take as little as one cycle to submit. For more information on QDMA transfers, consult the TMS320C64x DSP Two-Level Internal Memory Reference Guide (literature number SPRU610).

3.2 EDMA Channel Transfer Requests

EDMA channels can be programmed to transfer data in a large variety of ways. Each channel is synchronized to a particular system event. One event corresponds to one TR submission, which transfers all or some of the data described by the parameter set. Due to the large number of configurations possible, the programming of an EDMA channel is not described in this document. For details, see the TMS320C6000 EDMA Controller Reference Guide (literature number SPRU234). The amount of data transferred by a single TR is shown in Table 3.

Table 3. Data Transferred by an EDMA Channel Transfer Request

Source      Destination
Dimension   Dimension     Synchronization     Data Transferred by TR
1-D         1-D           Read/Write (FS=0)   1 element
1-D         1-D           Frame (FS=1)        Element count (one frame)
Other                     Array (FS=0)        Element count (one array)
Other                     Block (FS=1)        (Array count + 1) x element count

3.3 HPI Transfer Requests

The HPI controller submits TRs based on programmable actions performed by the host. To maximize the bandwidth available to host data transfers, there are read and write FIFOs, each of which can contain eight 32-bit words. When possible, the HPI bursts multiple words between the HPI FIFOs and the physical memory. Table 4 describes the burst size of the TR submitted, depending on the host activity involved.

Table 4. Data Transferred by an HPI Transfer Request

                                                                       Data Transferred
Host Access                Situation                                   by TR
Non-auto-increment read    HPI reads HPID register                     1 word
Non-auto-increment write   HPI writes HPID register                    1 word
Auto-increment read        HPI reads HPID register and FIFO is empty   8 words
Auto-increment read        HPI reads HPID register, FIFO is less       4 words
                           than or equal to half full, no
                           outstanding TRs
Auto-increment write       HPI writes to HPID and data is the fourth   4 words
                           element written since last TR issued

Data is transferred based on the host activity. If the host is performing individual accesses (accessing HPID in non-auto-increment mode), a TR is submitted for each individual element. If the host is performing burst transfers (accessing HPID in auto-increment mode), TRs are submitted for multiple contiguous elements at a time.

See the TMS320C6000 DSP Host Port Interface Reference Guide (literature number SPRU578) for additional information on the HPI, including a block diagram, register descriptions, pin listing, and waveforms.

3.4 PCI Transfer Requests

The PCI submits TRs based on actions performed by the PCI master, which could be an external host or the DSP.
To maximize the bandwidth available, there are 16-word read and write buffers for each type of access: master read, master write, slave read, and slave write. When possible (multiple-word access to prefetchable memory), the PCI bursts multiple words between the FIFOs and the physical memory. Table 5 describes the burst size of the TR submitted, depending on the activity involved.

Table 5. Data Transferred by a PCI Transfer Request

PCI Access                                          Situation                                  Data Transferred by TR
PCI slave reads/writes to non-prefetchable memory   n/a                                        1 word
PCI slave writes to prefetchable memory             Slave write buffer is more than 1/4 full   4 words or transfer size (if