DMA Guide for eXpressDSP-Compliant Algorithm Producers and Consumers
SPRA445

Murat Karaorman, Vincent Wan, Jagadeesh Sankaran
Texas Instruments, Santa Barbara

ABSTRACT

This application note provides an overview of the DMA architecture specified by the TMS320 DSP Algorithm Standard (also known as XDAIS). It also describes the two sets of APIs used for accessing DMA resources: the IDMA2 abstract interface and the ACPY2 library. In addition to providing an overview of the fundamental DMA abstractions for eXpressDSP-compliant algorithms, this application note highlights the DMA-related enhancements to the Algorithm Standard in Version 2.5 of the TMS320 DSP Algorithm Standard Developer's Kit. Sections in this application note are provided for both producers and consumers of eXpressDSP-compliant algorithms.

Trademarks are the property of their respective owners.

Contents

1 Introduction
  1.1 Using This Application Note
  1.2 Overview of DMA Standard Interfaces
  1.3 Fundamental DMA Abstractions
    1.3.1 Logical DMA Channels and Handles
    1.3.2 Queuing Transfers: The DMA Queue
    1.3.3 Channel Privacy and Synchronization
    1.3.4 Transfer Configuration Settings
  1.4 Interface Summary
  1.5 What's New in IDMA2 and ACPY2?
    1.5.1 C6000 Specific Changes
    1.5.2 C5000 Specific Changes
2 Algorithm Consumers: Integrating Algorithms That Use DMA
  2.1 Integrating Algorithms That Use DMA Resources
  2.2 ACPY2 Module APIs for Frameworks
  2.3 C6000 Specific Issues for Algorithm Consumers
    2.3.1 ACPY2 Implementation Provided for C6x1x
    2.3.2 Cache Coherency Issues for Algorithm Consumers
    2.3.3 Serialization and QueueIds in ACPY2
  2.4 C55x Specific Issues for Algorithm Consumers
    2.4.1 Supporting Packed/Burst Mode DMA Transfers
    2.4.2 Addressing Automatic Endianism Conversion Issues
3 Algorithm Producers: Creating Algorithms That Use DMA
  3.1 IDMA2 and ACPY2 Related Changes That Affect Algorithm Developers
  3.2 Rules and Guidelines Summary
  3.3 Implementing the IDMA2 Interface
  3.4 Configuring Logical Channels for DMA Transfers
    3.4.1 Performance Considerations
  3.5 Scheduling Asynchronous DMA Transfers on Logical Channels
    3.5.1 Using ACPY2_start
    3.5.2 Using ACPY2_startAligned
    3.5.3 Algorithm Design Considerations
  3.6 Synchronizing and Serializing DMA Transfers
  3.7 C6000 Specific Issues for Algorithm Producers
    3.7.1 Cache Coherency Issues for Algorithm Producers
  3.8 C5000 Specific Issues for Algorithm Producers
    3.8.1 Source and Destination Types Require 32-Bit Extended Byte Addressing
    3.8.2 Source and Destination Addresses Use Byte Addressing
    3.8.3 C55x Issue: Supporting Packed/Burst Mode DMA Transfers
    3.8.4 C55x Issue: Addressing Automatic Endianism Conversion
    3.8.5 Using ACPY2_start and ACPY2_startAligned
4 Fast Copy (FCPY) Algorithm Example
  4.1 IFCPY_Interface Functions
    4.1.1 Instance Heap Memory Requirements
    4.1.2 IDMA2 and ACPY2 Interfaces
5 Conclusion
6 References
Appendix A Code for the FCPY_TI Algorithm

List of Figures

Figure 1 Transfer Block
Figure 2 Client Application and Algorithm Interaction with DMA Resources
Figure 3 Read Access Coherency Problem
Figure 4 Write Access Coherency Problem
Figure 5 Cache Line Effects on Cache Coherence
Figure 6 Performance Figures for ACPY/ACPY2 Implementation on C6711
Figure 7 1D-to-2D Transfer Example
Figure 8 Cache Line Effects on Cache Coherence
Figure 9 FCPY doCopy Operation

List of Tables

Table 1 Standard Interfaces Related to DMA
Table 2 IDMA Functions
Table 3 ACPY Functions
Table 4 IDMA2_Params Structure Fields
Table 5 IDMA2_ChannelRec Structure Fields
Table 6 Instance Heap Memory Requirements

1 Introduction

The direct memory access (DMA) controller performs asynchronously scheduled data transfers between memory regions without the intervention of the CPU. The DMA controller allows movement of data to and from internal memory, internal peripherals, or external devices to occur in the background, while the CPU continues to execute other instructions in parallel.
Algorithms and applications can achieve greater throughput by using DMA to overlap data movement with CPU processing. However, eXpressDSP-compliant algorithms are not allowed to directly access or control hardware peripherals, which include DMA. All system resources are controlled by the client application. The TMS320 DSP Algorithm Standard (also known as XDAIS) specifies DMA interfaces that, when implemented, allow the client application and the algorithm to negotiate DMA resources, which in turn grants algorithms controlled access to DMA services.

To allow an algorithm to schedule DMA transfers, the client application must inquire, from the algorithm during instance creation, about its DMA resource requirements and grant it handles for accessing DMA. Each granted handle provides the algorithm with a uniform, private "logical" DMA channel abstraction. Table 1 summarizes these interfaces. Later sections provide more information.

Table 1. Standard Interfaces Related to DMA

                     Interfaces Implemented        Implementations Called
Algorithm            IDMA2 or deprecated IDMA      ACPY2 (ACPY), to access the logical channel to
                     abstract interfaces           configure, request, and synchronize data transfers
Client Application   ACPY2 or deprecated ACPY      IDMA2 (IDMA) interface, to query and grant logical
                     concrete interfaces           channels; ACPY2 module initialization and logical
                                                   channel object size query

DMA support was introduced into the XDAIS specification during its first revision in 2000 through the introduction of DMA rules and two standard interfaces: IDMA and ACPY. We recently introduced additional rules and guidelines and enhanced APIs: IDMA2 and ACPY2. These new APIs deprecate the original IDMA and ACPY APIs. Throughout the rest of this application note we will exclusively use the IDMA2 and ACPY2 APIs. The differences between the two sets of APIs are highlighted where applicable and summarized in section 1.2.

Note that the original interface header files, idma.h and acpy.h, are still present in the XDAIS Developer's Kit (Version 2.5) for backward compatibility with existing frameworks and applications. However, these will be gradually phased out as algorithm vendors transition to the higher performance and functionality of the ACPY2 and IDMA2 interfaces.
1.1 Using This Application Note

In addition to providing an overview of DMA access for eXpressDSP-compliant algorithms, this application note highlights the DMA-related enhancements to the Algorithm Standard in Version 2.5 of the TMS320 DSP Algorithm Standard Developer's Kit. In a separate application note, Design and Implementation of an eXpressDSP-Compliant DMA Manager for C6x1x (SPRA789), the authors present a C6x1x-specific (C6211, C6711, C6416) implementation of the ACPY2 APIs along with an example algorithm and application (including source code) to demonstrate an end-to-end system with algorithms that use DMA.

This application note is intended to assist both of the following groups:

- Consumers. Application developers and system integrators who use algorithms that require DMA should read section 2, which includes several sections with information for algorithm consumers.
- Producers. Algorithm developers who implement the IDMA2 interface and use ACPY2 calls to perform DMA operations should read the following sections:
  - Section 3, which includes several sections with information for algorithm producers.
  - Appendix A, which presents a complete algorithm to illustrate how to implement the IDMA2 functions and use ACPY2 functions to perform DMA transfers.

1.2 Overview of DMA Standard Interfaces

Algorithms must access DMA hardware via "logical" channel handles, which they request and receive from the client application. Algorithms submit DMA transfer requests to these logical channels through functions provided by the client application. Two sets of interfaces are required for accessing DMA resources: IDMA2 and ACPY2.

- IDMA2. All algorithms that use DMA resources must implement the IDMA2 interface. This interface allows the algorithm to request and receive handles representing private logical DMA resources.
- ACPY2. These functions are implemented as part of the client application and called by the algorithm (and possibly by the client application). A client application must implement the ACPY2 interface (or integrate a provided ACPY2 interface) in order to use algorithms that use the DMA resource. The ACPY2 interface describes the comprehensive list of DMA operations an algorithm can access through the logical DMA channels it acquires through the IDMA2 protocol.
The ACPY2 functions allow:

- Configuring channel DMA transfer parameters
- Scheduling asynchronous DMA transfers
- Synchronizing with scheduled transfers (both blocking and non-blocking)

The chapter on use of the DMA resource in TMS320 DSP Algorithm Standard Rules and Guidelines (SPRU352) describes these interfaces. The TMS320 DSP Algorithm Standard API Reference (SPRU360) provides details on each function.

Collectively, IDMA2 and ACPY2 describe a flexible and efficient model that greatly simplifies management of system DMA resources and services by the client application, and a simple but powerful mechanism for the algorithm to configure and access the DMA services.

1.3 Fundamental DMA Abstractions

1.3.1 Logical DMA Channels and Handles

The logical DMA channel is the fundamental abstraction, defined in a hardware-independent manner through the introduction of the ACPY2 and IDMA2 specifications. Each logical channel represents a private DMA hardware resource and its private state, and is identified and accessed through its handle.

Applications are in charge of the physical DMA resources. They grant handles to algorithms, which have requested them using the IDMA2 interface. Algorithms use the handles to call ACPY2 functions in order to configure logical channel settings, request DMA transfers, and synchronize with ongoing transfers.

The configuration setting of a logical channel is similar to the hardware register settings of a particular physical DMA device. The most recently configured channel settings are applied to each transfer request.

While logical channels provide a uniform DMA resource and service abstraction, in reality DSP systems come with vastly different physical DMA architectures, limitations, and hardware characteristics. The client application can provide an implementation of the ACPY2 APIs for its target DMA hardware to match the expected performance and functional requirements. It is also up to the client application to seamlessly arrange sharing of the physical DMA resources among the algorithms that request logical channels on which to operate. For example, the ACPY2 library can use software to implement queuing behavior even when the underlying physical DMA devices have no hardware queuing capability.
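The software queuing mentioned above can be pictured with a small host-side model. This is a minimal sketch in plain C and is not the ACPY2 library: the names SwQueue, swq_submit, swq_service_one, and swq_complete are hypothetical, the queue depth is an arbitrary choice, and the copy is done synchronously with memcpy where real DMA hardware would run in the background.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SWQ_DEPTH 8  /* hypothetical queue depth chosen for this model */

/* One pending copy request. */
typedef struct {
    const void *src;
    void       *dst;
    size_t      nbytes;
} SwRequest;

/* A ring buffer of pending requests, serviced strictly oldest-first. */
typedef struct {
    SwRequest slots[SWQ_DEPTH];
    size_t    head;   /* index of the oldest pending request */
    size_t    count;  /* number of pending requests */
} SwQueue;

void swq_init(SwQueue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Queue a request without copying yet; returns 0 on success, -1 if full. */
int swq_submit(SwQueue *q, const void *src, void *dst, size_t nbytes)
{
    size_t tail;

    assert(q != NULL);
    if (q->count == SWQ_DEPTH) {
        return -1;
    }
    tail = (q->head + q->count) % SWQ_DEPTH;
    q->slots[tail].src = src;
    q->slots[tail].dst = dst;
    q->slots[tail].nbytes = nbytes;
    q->count++;
    return 0;
}

/* Carry out the oldest pending request; returns 1 if one ran, 0 if empty. */
int swq_service_one(SwQueue *q)
{
    SwRequest *r;

    assert(q != NULL);
    if (q->count == 0) {
        return 0;
    }
    r = &q->slots[q->head];
    memcpy(r->dst, r->src, r->nbytes);
    q->head = (q->head + 1) % SWQ_DEPTH;
    q->count--;
    return 1;
}

/* Non-blocking completion query: nonzero when nothing is pending. */
int swq_complete(const SwQueue *q)
{
    assert(q != NULL);
    return q->count == 0;
}
```

In this model, two requests submitted back to back that target the same destination buffer are serviced in submission order, so the second request's data is what remains once the queue is empty.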
1.3.2 Queuing Transfers: The DMA Queue

Several outstanding transfer requests can be submitted to a logical channel, owing to the asynchronous specification of the submit functions (ACPY2_start and ACPY2_startAligned). An important property of the logical channel specification is the strict first-in, first-out (FIFO) ordering in which submitted transfers are carried out. Therefore, a logical channel can also be seen as implementing a queue of transfer requests.

Each logical channel can be seen as an independent queue. However, with the newly introduced queueId attribute of the logical channel's descriptor, which is set by the requesting algorithm, the FIFO ordering property further extends to multiple logical channels sharing the same queueId.

1.3.3 Channel Privacy and Synchronization

Algorithms have exclusive ownership of each logical channel they receive. They can operate safely without fear of external components (other algorithms or other system code) accessing the channel and issuing transfer requests or changing channel configuration settings.

All DMA synchronization calls are issued on a per-channel basis, as opposed to a per-transfer basis. The algorithm can issue either a blocking wait or a non-blocking query call to synchronize with the logical channel's completion status.

1.3.4 Transfer Configuration Settings

The purpose of acquiring logical channel handles is to submit DMA transfer requests. Each submitted DMA transfer request specifies a source and a destination memory region. A background DMA activity asynchronously copies the contents of the source memory region to the destination.

Two properties of DMA transfers make them desirable for use in performance-critical algorithms:

- The physical transfer/copy operation takes place in the "background" under the close control of specialized circuitry and controllers. This allows algorithms to issue transfer requests sufficiently in advance, and to perform other useful operations while data is being copied in the background.
- The physical layout of the source and destination transfer blocks does not have to be a single contiguous chunk of memory. By setting channel configuration parameters, algorithms can specify complex layout patterns. This can lead to significant performance improvements, even when the algorithm cannot take advantage of asynchronous execution and sits idle while waiting for a transfer to complete.

The unit of transfer is a block composed of frames and elements. Each transfer is submitted to a logical channel via the ACPY2_start or ACPY2_startAligned function. The source and destination addresses of the blocks and the number of elements in each frame are passed as function arguments. The remaining configuration parameters are intrinsic properties of the logical channel and are set exclusively by the algorithm calling the ACPY2 configuration functions. The previously configured properties of the logical channel at the time of the transfer request determine the actual memory that gets copied from source to destination.

Each transfer is characterized by the following list of configurable attributes. (Figure 1 illustrates the memory layout of a transfer block characterized by these configuration parameters.)

- Transfer Type: 1D-to-1D, 1D-to-2D, 2D-to-1D, or 2D-to-2D
- Element Size: number of 8-bit bytes per element
- Number of Frames: number of frames in a block, a number up to 65535
- Source/Destination Element Index: size of the gap between two consecutive elements within a frame plus the element size, in 8-bit bytes. When the element index is zero, no element indexing is used.
- Source/Destination Frame Index: size of the gap, in 8-bit bytes, between two consecutive frames within a block. Defined for 2D transfers only.
- Number of Elements: number of elements per frame, a number up to 65535
- Source/Destination Addresses: 8-bit byte addresses

The element and frame index parameters are defined independently for both source and destination. If the hardware does not support setting these independently for source and destination, as is the case for the C6x1x EDMA architecture, they must be configured with the same value. Configure functions should indicate an error status when configuration settings are not supported by the client DMA implementation.
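The attribute list above can be made concrete with a small host-side model of a transfer block. This is a sketch under stated assumptions, not the ACPY2 implementation: BlockLayout, block_offset, and block_copy are hypothetical names; the element index is taken to be the byte distance between the starts of consecutive elements (zero meaning contiguous elements), and the frame index is taken to be the byte gap between the end of one frame and the start of the next. Real hardware may interpret the frame index differently, so check the API reference before relying on this reading.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout of one side (source or destination) of a transfer block. */
typedef struct {
    size_t elemSize;    /* bytes per element */
    size_t elemIndex;   /* start-to-start distance of elements; 0 = contiguous */
    size_t frameIndex;  /* ASSUMPTION: gap from end of a frame to the next frame */
} BlockLayout;

/* Byte offset of element `elem` of frame `frame` from the block address. */
size_t block_offset(const BlockLayout *l, size_t numElements,
                    size_t frame, size_t elem)
{
    size_t stride;
    size_t frameSpan;

    assert(l != NULL && l->elemSize > 0 && numElements > 0);
    stride = l->elemIndex ? l->elemIndex : l->elemSize;
    /* Bytes from the first byte of a frame to its last byte, inclusive. */
    frameSpan = (numElements - 1) * stride + l->elemSize;
    return frame * (frameSpan + l->frameIndex) + elem * stride;
}

/* Synchronous stand-in for a transfer: copy element by element. */
void block_copy(unsigned char *dst, const BlockLayout *dl,
                const unsigned char *src, const BlockLayout *sl,
                size_t numFrames, size_t numElements)
{
    size_t f;
    size_t e;

    assert(dst != NULL && src != NULL);
    assert(dl->elemSize == sl->elemSize);
    for (f = 0; f < numFrames; f++) {
        for (e = 0; e < numElements; e++) {
            memcpy(dst + block_offset(dl, numElements, f, e),
                   src + block_offset(sl, numElements, f, e),
                   sl->elemSize);
        }
    }
}
```

As a 1D-to-2D example under these assumptions, copying the six contiguous bytes "ABCDEF" as two frames of three one-byte elements into a destination with a frame index of 2 places "ABC" at offset 0 and "DEF" at offset 5, leaving the two bytes in between untouched.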
[Figure 1. Transfer Block. The figure labels the source (destination) address, element size, element index, and frame index of a block made up of frames, each containing a number of elements.]

1.4 Interface Summary

Two sets of standard interfaces are required for accessing DMA resources: IDMA2 and ACPY2. Figure 2 shows which modules are implemented by the client application and which by the algorithm. Arrows indicate which modules use other modules.

[Figure 2. Client Application and Algorithm Interaction with DMA Resources. The figure shows the client application framework with its DMA manager (DMAN), the ACPY2 library (ACPY2_configure(), ACPY2_start(), ...), and the DMA hardware, alongside the algorithm with its IALG functions (algAlloc(), ...) and IDMA2 functions (dmaGetChannels(), dmaInit(), ...).]

The client uses the IDMA2 interface to query and grant handles representing private "logical" DMA resources. The algorithm functions call ACPY2 module functions, which are provided by the client application, to schedule DMA transfers.

Like IALG, IDMA2 is an abstract interface accessed through the algorithm module's IDMA2 functions table. ACPY2, on the other hand, is a concrete interface whose functions are referenced directly.

Client applications can provide a general-purpose DMA manager module (depicted as DMAN in Figure 2) for granting DMA resources to algorithms, designed as a wrapper around the IDMA2 interface. A reference DMAN module has been developed in the companion application note, SPRA789, and can be used for this purpose.

The following tables summarize the functions and structures used by the IDMA2 and ACPY2 interfaces and identify items that are new in v2.5 of the XDAIS Developer's Kit. New items are described in more detail in this application note. The first column of each table indicates how the specific function relates to the corresponding deprecated APIs.

Table 2. IDMA Functions

IDMA   Functions            Description
       dmaChangeChannels    Called by an application whenever logical channels are moved at runtime.
       dmaGetChannelCnt     Called by an application to query an algorithm about its number of logical DMA channel requests.
       dmaGetChannels       Called by an application to query an algorithm about its DMA channel requests at initialization time, or to get the current channel holdings.
       dmaInit              Called by an application to grant DMA handles to the algorithm during initialization.

Table 3. ACPY Functions

ACPY     Functions                 Description
         ACPY2_complete            Check whether the data transfers on a specific logical channel have completed
Change   ACPY2_configure           Configure a logical channel
         ACPY2_exit                Free resources used by the ACPY2 module (FRAMEWORK)
         ACPY2_getChanObjSize      Get the size of the IDMA2 channel object (FRAMEWORK)
         ACPY2_init                Initialize the ACPY2 module (FRAMEWORK)
         ACPY2_initChannel         Initialize the IDMA2 channel object passed in (FRAMEWORK)
         ACPY2_setNumFrames        Rapidly configure the numFrames parameter of an IDMA2 channel
         ACPY2_setSrcFrameIndex    Rapidly configure the source frame index parameter of an IDMA2 channel
         ACPY2_setDstFrameIndex    Rapidly configure the destination frame index parameter of an IDMA2 channel
         ACPY2_start               Issue a request for a data transfer using the current channel settings
         ACPY2_startAligned        Issue a request for a data transfer using the current channel settings (assumes aligned addresses)
         ACPY2_wait                Wait for all data transfers to complete on a specific logical channel

Table 4. IDMA2_Params Structure Fields

IDMA   Structures         Description
       xType              Transfer type: 1D1D, 1D2D, 2D1D, or 2D2D
       elemSize           Element transfer size {1, 2, or 4 bytes}
       numFrames          Number of frames for 2D transfers
       srcElementIndex    elemSize plus the gap between two consecutive elements of source data (in 8-bit bytes)
       dstElementIndex    elemSize plus the gap between two consecutive elements of destination data (in 8-bit bytes)
       srcFrameIndex      Jump between source data frames for 2D transfers (in 8-bit bytes)
       dstFrameIndex      Jump between destination data frames for 2D transfers (in 8-bit bytes)

Table 5. IDMA2_ChannelRec Structure Fields

IDMA      Structures   Description
Removed   depth
Removed   dimension
          handle       Handle to the logical DMA channel
          queueId      Selects the serialization queue

1.5 What's New in IDMA2 and ACPY2?
This application note highlights the following DMA-related enhancements that were introduced to the Algorithm Standard in Version 2.5 of the TMS320 DSP Algorithm Standard Developer's Kit:

- New enhanced IDMA2 and ACPY2 APIs deprecate the current IDMA and ACPY APIs.
- New DMA guidelines for high performance. These guidelines allow vendors to extract maximum performance benefits when developing algorithms that use DMA heavily. (See sections 3.2, 3.3, 3.4, 3.5, and 3.6.)
- Submitted DMA transfers on each logical channel are performed in FIFO/serial order. This diminishes the need for ACPY2_wait() synchronization when scheduling back-to-back transfers to and from the same buffer. (See sections 2.3.3 and 3.6.)
- New "queueId" property for logical channels. In addition to the FIFO rule above, transfers submitted on separate channels sharing the same queueId must also complete sequentially. ACPY2 functions can take advantage of queueIds to map logical channels to separate physical hardware devices or queues. (See sections 2.3.3 and 3.6.)
- New ACPY2 functions for optimized configuration and optimized data transfer requests. (See section 2.2.)
- New rules and guidelines for external memory access and buffer alignments, specific to C6000 devices where cache coherence between the cache and external memory is not directly supported by hardware when simultaneous CPU and DMA accesses exist. (See sections 2.3.2 and 3.7.)
- New "frame index" (formerly stride), supporting separate source and destination strides for 2D transfer blocks.
- New "element index" field for source and destination transfer blocks.
- Removed the depth and dimension attributes from channel descriptors.

1.5.1 C6000 Specific Changes

- New rules and guidelines for external memory access and buffer alignments on devices where cache coherence between the cache and external memory is not directly supported by hardware when simultaneous CPU and DMA accesses exist.

1.5.2 C5000 Specific Changes

- Changed to using 8-bit byte addressing for source and destination addresses.
- Support for the extended memory addressing capability of DMA transfers.
- [C55x only] New rules for addressing automatic endianism conversion issues.
- [C55x only] New guidelines for enhancing DMA performance through supporting packed/burst modes of operation.
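Both kinds of buffer alignment rule listed above (cache-line alignment on C6000, 32-bit alignment on C55x) reduce to the same two checks on a buffer: its start address and its size must both be multiples of some power-of-two granule. The sketch below is a portable illustration only; round_up and buffer_ok are hypothetical helper names, and the granule is left as a parameter because the cache line length depends on the device and is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if `align` is a power of two. */
static int is_pow2(size_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

/* Round `nbytes` up to the next multiple of `align` (a power of two). */
size_t round_up(size_t nbytes, size_t align)
{
    assert(is_pow2(align));
    return (nbytes + align - 1) & ~(align - 1);
}

/* Nonzero if the buffer starts on an `align` boundary and its size is a
 * whole number of `align`-byte granules. */
int buffer_ok(const void *buf, size_t nbytes, size_t align)
{
    assert(is_pow2(align));
    return ((uintptr_t)buf % align == 0) && (nbytes % align == 0);
}
```

A client could, for instance, call round_up on a requested buffer size before allocating it, and use buffer_ok as a debug-time check on buffers it is about to hand to an algorithm.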
2 Algorithm Consumers: Integrating Algorithms That Use DMA

Algorithm consumers should be aware of a number of changes in the ways they need to integrate algorithms that use DMA:

- ACPY2 Interface. Several new ACPY2 functions should be implemented as part of the ACPY2 module. See section 2.2.
- Rules and Guidelines. A new rule and a new guideline that apply to algorithm consumers have been added to the XDAIS specification. See section 2.3.2.
- Serialization and QueueIds. All transfers submitted on the same channel or queueId must start and complete in the same order. See section 2.3.3.
- C6000 Specific Issue: Cache Coherency. Data that is stored in both external memory and the cache can cause problems with DMA transfers in several ways. See section 2.3.2.
- C5000 Specific Issue. New alignment, size, and data access rules for C55x algorithms are introduced in section 2.4.

2.1 Integrating Algorithms That Use DMA Resources

The steps for integrating an algorithm that uses DMA resources will differ depending upon your system needs and specific DMA management policy. The following steps offer a simple and generic example that can be used for any algorithm that requests DMA resources. These steps can be incorporated into the client's DMA manager module that is used to grant DMA resources to algorithms.

1. Implement or include an ACPY2 library in the application. The ACPY2 module must implement the functions specified in the XDAIS specification. This is a non-trivial step, and it is beyond the scope of this note to discuss how to implement the ACPY2 APIs. The companion application note presents details of the design and implementation of an ACPY2 library for the C6x1x, which is shipped along with that application note.

2. Initialize the algorithm by calling its initialization function, and set values for the fields of the algorithm-specific params structure, which is declared in i<mod>.h.

    IMOD_Params modParams;

    MOD_init();

    /* Set default instance creation parameters */
    modParams = IMOD_PARAMS;

3. Use the standard IALG interface to allocate and grant the memory buffers requested by the algorithm and initialize the instance object.

    IALG_Handle algHandle;

    /* ALG_create() is the instance-creation helper of the XDAIS example ALG module */
    if ((algHandle = (IALG_Handle)ALG_create((IALG_Fxns *)&MOD_VEND_IALG,
            NULL, (IALG_Params *)&modParams)) == NULL) {
        SYS_abort("could not create algorithm instance");
    }

4. Obtain a reference to the module's IDMA2 functions table. Call the module's dmaGetChannelCnt() to get the number of logical channels needed by the algorithm.

    IDMA2_Fxns *dmaFxns = &MOD_VEND_IDMA2;
    Int numChan;

    numChan = dmaFxns->dmaGetChannelCnt();

5. Allocate space on the heap or stack for the channel descriptors used by the IDMA2 calls.

    IDMA2_ChannelRec *dmaTab =
        (IDMA2_ChannelRec *)malloc(numChan * sizeof(IDMA2_ChannelRec));

6. Call dmaGetChannels() to get the channel properties requested by the algorithm.

    numChan = dmaFxns->dmaGetChannels(algHandle, dmaTab);

7. Call ACPY2_getChanObjSize() to get the size of the channel object (the structure representing logical channels). The size of each channel object is implementation-dependent and is part of the implementation of the ACPY2 library. The library must provide this size information to the client application.

    chanObjSize = ACPY2_getChanObjSize();

8. Allocate memory for each logical channel object.

    for (i = 0; i < numChan; i++) {
        dmaTab[i].handle = (IDMA2_Handle)malloc(chanObjSize);
    }

9. Call ACPY2_initChannel() to initialize each channel object.

    for (i = 0; i < numChan; i++) {
        ACPY2_initChannel(dmaTab[i].handle, dmaTab[i].queueId);
    }

10. Call dmaInit() to pass the DMA channels to the algorithm.

    if (dmaFxns->dmaInit(algHandle, dmaTab) == IALG_EOK) {
        return (TRUE);    /* init success */
    }

2.2 ACPY2 Module APIs for Frameworks

A number of new functions have been added to the ACPY2 interface specification for the client application's use. These functions are intended to give application frameworks a way to standardize the generation of logical channel handles. In a nutshell, all that an application needs to do to generate a handle is allocate and assign memory to the handle and call the channel initialization function. The memory assigned to the handle represents private channel state data and is managed entirely by the DMA manager's ACPY2 library.

    extern Void ACPY2_init(Void);

Initialize the ACPY2 module.

    extern Void ACPY2_initChannel(IDMA2_Handle handle, Int qid);

Initialize the IDMA2 channel object passed in (handle). The queue id (qid) parameter is used by the ACPY2 library to ensure FIFO completion of transfers submitted on channels sharing the same id. Depending on the DMA Resource Manager policy, either the same value set by the algorithm, or some other consistent internal mapping of that value, is passed to ACPY2_initChannel.

    extern Uns ACPY2_getChanObjSize(Void);

Get the size of the IDMA2 channel object. The application uses this size information to allocate space for a handle.

    extern Void ACPY2_exit(Void);

Free resources used by the ACPY2 module.
Refer to the TMS320 DSP Algorithm Standard API Reference (SPRU360) for details about these functions.

2.3 C6000 Specific Issues for Algorithm Consumers

2.3.1 ACPY2 Implementation Provided for C6x1x

A C6x1x-specific implementation of the ACPY2 library is provided with the companion application note. That application note provides a function-by-function description of the implementation, and its attachment provides the optimized library.

By retaining configuration and resource states with each logical channel descriptor, ACPY2 functions can be implemented very efficiently. The C6x1x implementation demonstrates how this can be done.

2.3.2 Cache Coherency Issues for Algorithm Consumers

On several C6x1x devices, data that is in both external memory and the cache can cause problems with DMA transfers in several ways. Please refer to the TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide (SPRU609) for more details. In this section we summarize the relevant issues that might affect algorithm consumers.

In Figure 3, the memory corresponding to a location in external memory has been brought into the cache. The copy in the cache has been modified, but has not been written back to external memory. If a DMA transfer copies data from that location to another location, it would be reading stale data. To avoid this problem, the cache must be flushed before the DMA read proceeds.

[Figure 3. Read Access Coherency Problem (CPU, cache, external memory)]

In Figure 4, a location has been brought into the cache. Suppose a DMA transfer writes new data to that location. In this case, the CPU would access the old cached data on a subsequent read, unless the cached copy is invalidated.

[Figure 4. Write Access Coherency Problem (CPU, cache, external memory)]

To deal with these coherency problems, several guidelines and rules have been added. Previously, the XDAIS DMA rules and guidelines applied only to algorithms. In version 2.5 of the TMS320 DSP Algorithm Standard Developer's Kit, the following guideline and rule have been added that apply to client applications.

Guideline: To ensure correctness, all C6000 algorithms that implement IDMA2 need to be supplied with the internal memory they request from the client.
This guideline has been added in conjunction with a requirement in the XDAIS specification that the client application must inform the algorithm of the type of memory (for example, internal or external) used for each buffer it allocates for the algorithm. The client application does this via the IALG_MemRec structure passed to the algorithm using algInit(). The algorithm can use this information to decide how to respond if it does not receive the type of memory it requests.

Rule: If a C6000 algorithm has implemented the IDMA2 interface, the client must allocate all the required buffers on a cache line boundary. These buffers must be a multiple of the cache line length in size. The client must also clean the cache entries for these buffers before passing them to the algorithm.

This rule is targeted at the application or client application writer. It ensures that any cached entries for the buffers passed into the algorithm are flushed, to avoid the coherency problem shown in Figure 3. For example, the fastcopytest.c example described in the companion application note, Design and Implementation of an eXpressDSP-Compliant DMA Manager for C6x1x (SPRA789), uses the following macro to clean the cache before initializing data arrays and before performing the algorithm that uses DMA transfers.

    CACHE_clean(...);

It is important that all input and output buffers are allocated on a cache line boundary and are a multiple of the cache line length in size. As shown in Figure 5, if a buffer location shares a cache line with other data, the entire cache line is brought into the cache when that other data is accessed by the CPU. The buffer location would then be in the cache, which violates the reason behind the rule.

[Figure 5. Cache Line Effects on Cache Coherence (CPU, cache line, cache, external memory)]

2.3.3 Serialization and QueueIds in ACPY2

ACPY2 requires that all transfers submitted on the same channel via ACPY2_start() or ACPY2_startAligned() must start and complete in the same real-time order in which they were issued. Additionally, transfers submitted on separate channels sharing the same queueId must also complete sequentially.

For existing C6000 devices based on EDMA, if all transfers are submitted to the same hardware queue, this requirement is implicitly met. For the same reasons, the queueId policy is also easily implemented.
In sophisticated DMA managers/ACPY2 libraries, the queueId field can be utilized, transparently to algorithms, to assign different priority levels or to improve hardware parallelism.

2.4 C55x Specific Issues for Algorithm Consumers

2.4.1 Supporting Packed/Burst Mode DMA Transfers

DMA performance studies on the C55/OMAP 1510 platform identified the need to perform DMA transfers in burst enabled/packed transfer modes as much as possible. In this mode it is possible to achieve a speedup both for transfers from internal to external memory and for transfers from external to internal memory. The guideline below is introduced to transparently assist ACPY2 library implementations on OMAP or C55x platforms in performing DMA transfers with burst enabled/packed mode. As a corollary, client applications are advised to follow the same guideline for both application-side data and application buffers that are passed as arguments to the algorithm's processing functions.

Guideline: To facilitate high performance, C55x algorithms should request DMA transfers with source and destinations aligned on 32-bit byte addresses. Additionally, where configuration options exist, the DMA manager should operate in packed/burst mode.

2.4.2 Addressing Automatic Endianism Conversion Issues

New alignment, size, and data access rules govern C55x algorithms to ensure correct operation in the presence of possible automatic endianism conversion. The issue is treated in greater detail for algorithm producers in a later section. Application developers should also follow the same rules and guidelines.

3 Algorithm Producers: Creating Algorithms That Use DMA

This section is intended for developers of eXpressDSP-compliant algorithms that use DMA. Following a brief summary of, and references to, the changes introduced by the IDMA2 and ACPY2 requirements, we discuss how algorithm producers implement the required interfaces to request and receive DMA resources, configure channels, schedule DMA transfers, and synchronize with the completion status of scheduled transfers.

3.1 IDMA2 and ACPY2 Related Changes That Affect Algorithm Developers

Algorithm producers should be aware of a number of changes in the ways they must develop algorithms that use DMA:

- Rules and guidelines. A number of new rules and guidelines have been added for algorithms that use DMA resources. For a summary, see section 3.2.
- IDMA2 interface. All algorithms that require DMA resources must implement the IDMA2 interface. See section 3.3.
- Configuring channels. Algorithms should optimize transfers by configuring each channel a single time. Optimized functions for configuring channels have been added to the ACPY2 interface. See section 3.4.
- Scheduling. The ACPY2_start() or ACPY2_startAligned() function is used to schedule a DMA transfer. The ability to configure DMA transfer parameters through an optional argument to the same scheduling function is not supported by these functions. This is a change from the deprecated ACPY_start API. See section 3.5.
- Synchronization. All transfers issued on the same channel must start and complete in the same order in which they were issued. Additionally, transfers on separate channels sharing the same queueId must also complete sequentially. See section 3.6.
- C6000 Specific: Cache Coherency. Data that is stored in both external memory and the cache can cause problems with DMA transfers in several ways. See section 3.7.1.
- C5000 Specific: Extended 8-Bit Byte Addressing. The source and destination addresses in the ACPY2_start and ACPY2_startAligned APIs are defined using a newly introduced type, IDMA2_AddrPtr, with a 32-bit representation on C5000 targets. See section 3.8.
- C55x Specific: Endianism Support. New alignment, size, and data access rules help ensure correct operation in the presence of possible automatic endianism conversion. The rules allow general-purpose C55x algorithms to be deployable on OMAP-based target DSPs.

3.2 Rules and Guidelines Summary

The previous version of the TMS320 DSP Algorithm Standard specified the following rules and guidelines for algorithms that request DMA resources.

Rule: All data transfer must be completed before return to the caller.

Rule: All algorithms using the DMA resource must implement the IDMA2 interface.

Rule: Each of the IDMA2 methods implemented by an algorithm must be independently relocateable.

Rule: All algorithms must state the maximum number of concurrent DMA transfers for each logical channel.

Rule: All algorithms must characterize the average and maximum size of the data transfers per logical channel for each operation.
Also, all algorithms must characterize the average and maximum frequency of data transfers per logical channel for each operation.

Guideline: The data transfer should complete before the CPU operations executing in parallel.

The following rules and guidelines are new in version 2.5 of the TMS320 DSP Algorithm Standard Developer's Kit for algorithms that use DMA resources and for applications that integrate such algorithms.

Guideline: Algorithms should request a distinct IDMA2 logical channel for each distinct type of DMA transfer they issue.

Guideline: To ensure correctness, algorithms using IDMA2 need to be supplied with the internal memory they request from the client application using algAlloc(). (Applies to client applications.)

Rule: An algorithm using IDMA2 must not directly access buffers in external memory that are involved in DMA transfers. This includes the input buffers passed to the algorithm through its function interface.

Rule: If an algorithm has implemented the IDMA2 interface, all input and output buffers residing in external memory, and passed to this algorithm through its function calls, should be allocated on a cache line boundary and be a multiple of cache lines in size. The application must also clean the cache entries for these buffers before passing them to the algorithm. (Applies to client applications.)

Rule: All buffers residing in external memory and involved in a DMA transfer should be allocated on a cache line boundary and be a multiple of cache lines in size.

Rule: Algorithms should not use stack-allocated buffers as the source or destination of any DMA transfer.

Rule: C55x algorithms must request all data buffers in external memory with 32-bit alignment and sizes in multiples of 4 bytes.

Rule: C55x algorithms must use the same data type and access mode when reading, writing, or transferring (by DMA) data that is stored in external memory or in application-passed data buffers.

3.3 Implementing the IDMA2 Interface

An algorithm that requests DMA resources must implement the IDMA2 interface. The FCPY_TI algorithm discussed in this application note provides an example implementation of the IDMA2 interface. The following list describes each function that must be implemented and shows an example.

- dmaChangeChannels() function.
This function should update the algorithm instance object's persistent memory using the channel descriptors table.

    /*
     *  ======== FCPY_TI_dmaChangeChannels ========
     *  Update instance object with new logical channel.
     */
    Void FCPY_TI_dmaChangeChannels(IALG_Handle handle, IDMA2_ChannelRec dmaTab[])
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;

        fcpy->dmaHandle1D1D8B = dmaTab[CHANNEL0].handle;
        fcpy->dmaHandle1D2D8B = dmaTab[CHANNEL1].handle;
        fcpy->dmaHandle2D1D8B = dmaTab[CHANNEL2].handle;
    }

dmaGetChannels() function. This function should fill the channel descriptors table (passed in by the client application) with the channel characteristics of each logical channel required by the algorithm.

    /*
     *  ======== FCPY_TI_dmaGetChannels ========
     *  Declare DMA resource requirement/holdings.
     */
    Int FCPY_TI_dmaGetChannels(IALG_Handle handle, IDMA2_ChannelRec dmaTab[])
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;

        /* Initial values on logical channels */
        dmaTab[CHANNEL0].handle = fcpy->dmaHandle1D1D8B;
        dmaTab[CHANNEL1].handle = fcpy->dmaHandle1D2D8B;
        dmaTab[CHANNEL2].handle = fcpy->dmaHandle2D1D8B;

        /* Want all transfers serialized (to simplify debugging): same queueId */
        dmaTab[CHANNEL0].queueId = 0;
        dmaTab[CHANNEL1].queueId = 0;
        dmaTab[CHANNEL2].queueId = 0;

        return (NUM_LOGICAL_CH);
    }

dmaGetChannelCnt() function. This function should return the number of channels requested in the dmaGetChannels() function.

    #define NUM_LOGICAL_CH 3

    /*
     *  ======== FCPY_TI_dmaGetChannelCnt ========
     *  Return number of logical channels requested.
     */
    Int FCPY_TI_dmaGetChannelCnt(Void)
    {
        return (NUM_LOGICAL_CH);
    }

dmaInit() function. This function should save the handles of the logical channels granted by the framework in the algorithm instance object's persistent memory.

    /*
     *  ======== FCPY_TI_dmaInit ========
     *  Initialize instance object with granted logical channel.
     */
    Int FCPY_TI_dmaInit(IALG_Handle handle, IDMA2_ChannelRec dmaTab[])
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;

        fcpy->dmaHandle1D1D8B = dmaTab[CHANNEL0].handle;
        fcpy->dmaHandle1D2D8B = dmaTab[CHANNEL1].handle;
        fcpy->dmaHandle2D1D8B = dmaTab[CHANNEL2].handle;

        return (IALG_EOK);
    }

Configuring Logical Channels and Transfers

For every logical channel, before any transfer requests are submitted on the channel, the algorithm must configure the channel's transfer parameters. Each configurable property characterizes the layout of each transfer block, is depicted graphically in the Transfer Block figure, and corresponds to fields (shown in parentheses here) of the IDMA2_Params structure described in the IDMA2_Params Structure Fields table:

Transfer Type (xType): 1D-to-1D, 1D-to-2D, 2D-to-1D, or 2D-to-2D.
Element Size (elemSize): number of 8-bit bytes per element.
Number of Frames (numFrames): number of frames in a block, up to 65535.
Element Indexes (srcElementIndex, dstElementIndex): size of the gap between consecutive elements within a frame plus the element size, in 8-bit bytes. When an element index is zero, no element indexing is used.
Frame Indexes (srcFrameIndex, dstFrameIndex): size in 8-bit bytes between consecutive frames within a block. Defined for 2D transfers only.

Logical channels always "remember" the most recently applied configuration settings, so additional reconfiguration is unnecessary unless a different type of transfer setting is needed. When a transfer request is submitted, the current channel transfer parameters are recorded and applied when the memory transfer is carried out.

There are several ACPY2 functions that configure the transfer parameters of a logical channel for a type of transfer:

Configure-all function: ACPY2_configure. It takes an IDMA2_Params argument and replaces the entire channel settings with the new configuration.
Fast configuration functions: ACPY2_setNumFrames, ACPY2_setSrcFrameIndex, and ACPY2_setDstFrameIndex. Each function selectively updates the number of frames, the source frame index, or the destination frame index parameter of the current configuration, respectively.
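To make the transfer-block properties above more concrete, the following host-side sketch models a 1D-to-2D block copy in plain C. It is only an illustration under stated assumptions, not the ACPY2 library: the Layout1D2D type, its field names, and copy_1d_to_2d() are hypothetical, memcpy stands in for the DMA engine, and the destination spacing is modeled as a start-to-start pitch, which may differ from the exact IDMA2 frame index definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side description of a 1D-to-2D block copy.
 * This is NOT an IDMA2/ACPY2 type; all names are illustrative. */
typedef struct {
    size_t elem_size;   /* bytes per element */
    size_t num_elems;   /* elements per frame */
    size_t num_frames;  /* frames per block */
    size_t dst_pitch;   /* ASSUMPTION: bytes from the start of one
                           destination frame to the start of the next */
} Layout1D2D;

/* Copy a packed (1D) source into a strided (2D) destination, one frame
 * at a time. memcpy stands in for the asynchronous DMA transfer. */
static void copy_1d_to_2d(const Layout1D2D *l,
                          const unsigned char *src,
                          unsigned char *dst)
{
    size_t frame_bytes = l->elem_size * l->num_elems;
    size_t f;

    /* Destination frames must not overlap in this model. */
    assert(l->dst_pitch >= frame_bytes);

    for (f = 0; f < l->num_frames; f++) {
        memcpy(dst + f * l->dst_pitch, src + f * frame_bytes, frame_bytes);
    }
}
```

In this model, changing only num_frames or dst_pitch between calls corresponds loosely to what the fast configuration functions are for: the rest of the layout stays as last configured.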
3.4.1 Performance Considerations

For algorithms that rely heavily on the speed and completion of DMA transfers, minimizing configuration overhead is extremely critical. Indeed, the addition of the fast configuration APIs, the change in the transfer request submission APIs (ACPY2_start and ACPY2_startAligned), and the guideline below all result from performance requirements that were not adequately addressed by the deprecated ACPY specifications.

To illustrate the cost of configuration operations, refer to the performance figure below, which shows cycle counts for an implementation of the ACPY/ACPY2 library on the EDMA device of C6x1x processors. These numbers were gathered on a C6711 using Code Composer Studio 2.1's profiler clock, by setting breakpoints before and after function calls to observe the change in the clock's cycle count. The most important numbers at this point in the discussion are the ones for ACPY_start() when passed a non-null IDMA_Params structure, ACPY2_start(), ACPY2_startAligned(), and ACPY2_configure(). Also note that the "DAT" calls refer to the DAT support module of the chip support library (CSL); their performance figures are included for reference.

[Figure: Performance Figures for the ACPY/ACPY2 Implementation on C6711. Cycle counts for ACPY_start (null and non-null params), DAT_copy, DAT_wait, ACPY2_start, ACPY2_startAligned, ACPY2_wait, ACPY2_configure (1D), ACPY2_configure (2D), ACPY2_setSrcFrameIndex, and an empty assembly call.]

In the ACPY specifications, the high number of cycles for ACPY_start() with a non-null IDMA_Params shows that when the logical channel is configured at the same time a transfer request is submitted, the function incurs a substantial amount of overhead. A straightforward optimization technique is to break the work into two parts: one for channel configuration and the other for transfer request submission. Therein lies the motivation for the guideline:

Guideline: Algorithms should minimize channel (re)configuration overhead by requesting a dedicated logical channel for each distinct type of DMA transfer they issue, and avoid calling ACPY2_configure. Algorithms should use the fast configuration APIs where possible.

This guideline is useful to follow when different types of DMA transfers are needed in a critical loop of the algorithm.
By defining different IDMA2 logical channels for each transfer type, ACPY2_configure() can be called for each channel at the beginning of the algorithm code. Then, transfer requests can be rapidly submitted on these preconfigured channels in the critical loop using ACPY2_start() calls, eliminating reconfiguration overhead. Using ACPY2_startAligned() reduces the transfer request submission overhead further; the cycle counts are given in the performance figure. This improvement is particularly significant when requests are submitted within a loop. By following the guideline, algorithm code can be structured as follows to minimize channel configuration overhead:

    Void MYALG_TI_process(...)
    {
        /* Configure logical channels */
        ACPY2_configure(..., &params1);
        ACPY2_configure(..., &params2);
        ACPY2_configure(..., &params2);

        /* Critical loop of the algorithm: channels are already configured */
        ...
    }

Scheduling Asynchronous Transfers on Logical Channels

Algorithms call the ACPY2_start() or ACPY2_startAligned() function to submit a request for an asynchronous DMA transfer, that is, a memory block copy from the specified source to the destination memory block. The only operational difference between ACPY2_startAligned() and ACPY2_start() is the additional requirement of ACPY2_startAligned() that the source and destination addresses be properly aligned with respect to the configured element size. Both functions return to the caller (the algorithm) as soon as the transfer request has been submitted to the logical or physical hardware queue of the DMA devices that will asynchronously perform the copy operation. The exact source and destination memory layout that gets physically copied is determined by the transfer parameters at the time the ACPY2_start() or ACPY2_startAligned() call is issued.

3.5.1 Using ACPY2_start

ACPY2_start() makes no assumptions about the alignment of the source and destination addresses or indexes. It accepts addresses and indexes of any alignment when allowed by the architecture, such as C6000, and adjusts the transfer parameters (including element size, number of elements, and transfer type) to transparently perform the desired transfer using the given alignment, if necessary. This is provided in the specifications with the intention of simplifying algorithm development in its initial stages.
ACPY2_start() thus strives to maintain simplicity while maintaining reasonable levels of performance.

3.5.2 Using ACPY2_startAligned

The ACPY2_startAligned() API, on the other hand, expects source and destination addresses and indexes to be properly aligned with respect to the configured element size. When using 32-bit transfer mode, these addresses must be 32-bit aligned. For 16-bit transfers, 16-bit alignment is required. Passing source or destination addresses/indexes with incorrect alignment with respect to the configured element size of the handle will result in unspecified behavior. In this respect, the sole purpose of ACPY2_startAligned() is to guarantee performance by eliminating run-time checks through a prenegotiated contract with the algorithm developer.

3.5.3 Algorithm Design Considerations

There are cases that are well served by ACPY2_startAligned and cases that are well served by ACPY2_start. Good examples of cases that are served well by ACPY2_startAligned are constant buffer rate scenarios, with input buffering of bitstreams, processing, and creating output frames in external memory. Such transfers allow full utilization of the DMA bandwidth because everything is known about the transfer up front.

[Figure: 1D-to-2D Transfer Example. The source is a contiguous run of elements; the destination is a block of frames, annotated with element size, element index, frame index, number of frames, and number of elements.]

The figure illustrates the source and destination memory layout of a typical 1D-to-2D transfer example. The following code fragment can also be used to perform the depicted transfer.
    MYALG_TI_Obj *myAlg = (Void *)handle;  /* myAlg points to the algorithm instance object */
    IDMA2_Params params;

    /* Configure Transfer Parameters */
    params.xType = IDMA2_1D2D;
    params.elemSize = IDMA2_ELEM32;
    params.numFrames = 100;
    params.srcFrameIndex = ...;            /* Not used in 1D2D transfer */
    params.dstFrameIndex = ...;
    params.srcElementIndex = ...;
    params.dstElementIndex = ...;

    /* Configure logical channel */
    ACPY2_configure(myAlg->dmaHandle1, &params);

    /*
     * Schedule transfer from the input buffer, in, to the instance working
     * buffer, workBuf1. The last argument is the number of elements in each
     * destination frame.
     */
    ACPY2_start(myAlg->dmaHandle1, (IDMA2_AdrPtr)in,
                (IDMA2_AdrPtr)(myAlg->workBuf1), 10);

Synchronizing and Serializing Transfers

An algorithm can submit several DMA transfer requests on each logical channel it owns. Actual physical transfers are started and completed as hardware resources become available to the ACPY2 module. When the algorithm needs to check the completion status of submitted DMA transfers, it can call the blocking wait function, ACPY2_wait(), or the non-blocking completion status check function, ACPY2_complete(). The wait and the completion status apply to the entire channel. That is, the status returned is that of the last submitted transfer on that logical channel.

The ACPY2 specifications ensure that all transfers issued on the same channel start and complete in the same real-time order they were issued. Additionally, separate channels sharing the same queueId also complete sequentially. The need for synchronizing transfers is reduced because of these requirements. This provides additional opportunities to optimize algorithms that use DMA resources. Consider the following pseudocode, which implements a double buffering scheme for an in-place algorithm process:

    ACPY2_start(h0, src++, buf[0]);       /* copy in buffer 0 */
    foreach (pair of buffers) {
        ACPY2_start(h1, src++, buf[1]);   /* copy in buffer 1 */
        ACPY2_wait(h0);                   /* wait for buffer 0 to be ready */
        Process(buf[0]);                  /* do work */
        ACPY2_start(h0, buf[0], dst++);   /* copy result out. Need ACPY2_wait here without serialization guarantee */
        ACPY2_start(h0, src++, buf[0]);   /* copy in buffer 0 */
        ACPY2_wait(h1);                   /* wait for buffer 1 to be ready */
        Process(buf[1]);                  /* do work */
        ACPY2_start(h1, buf[1], dst++);   /* copy result out. */
                                          /* Need ACPY2_wait here without serialization guarantee */
    }
    ACPY2_wait(h0);
    ACPY2_wait(h1);

This code is not guaranteed to work correctly under the previous XDAIS specifications, because there was no guarantee that the ACPY2_start() call that copies a result out of a buffer would complete before the next ACPY2_start() call on the same channel, which refills that buffer. Hence, depending on the implementation, the second transfer request might have been serviced before the first one, causing buf[0] or buf[1] to be overwritten with data from the source buffer before the processing results were written to the destination buffers.

The solution using the deprecated ACPY specifications is to insert an ACPY_wait call between those two calls, waiting for handle h0 or h1 to complete its previous transfer before submitting the next transfer. However, this means the algorithm has to wait for the results to be copied out before it can process the data that has been brought in through the double buffering scheme, thereby missing the goal of double buffering.

Therefore, it is necessary for the XDAIS specification to guarantee that transfers submitted on the same logical channel are handled serially. However, even this FIFO ordering guarantee is insufficient in some cases, as in the following code:

    ACPY2_start(h0, src++, buf[0]);       /* copy in buffer 0 */
    foreach (pair of buffers) {
        ACPY2_start(h1, src++, buf[1]);   /* copy in buffer 1 */
        ACPY2_wait(h0);                   /* wait for buffer 0 to be ready */
        Process(buf[0]);                  /* do work */
        ACPY2_start(h2, buf[0], dst++);   /* copy result out. Need ACPY2_wait here without serialization guarantee */
        ACPY2_start(h0, src++, buf[0]);   /* copy in buffer 0 */
        ACPY2_wait(h1);                   /* wait for buffer 1 to be ready */
        Process(buf[1]);                  /* do work */
        ACPY2_start(h3, buf[1], dst++);   /* copy result out. Need ACPY2_wait here without serialization guarantee */
    }
    ACPY2_wait(h0);
    ACPY2_wait(h1);

Here, separate logical channels (h2 and h3) are defined to copy the results to the destination buffer. This kind of situation is more likely when following the guideline to define multiple channels for different transfer types. Therefore, a mechanism is needed to allow the algorithm to specify that transfers on different channels be serviced serially.
To this effect, a field queueId has been added to the IDMA2_ChannelRec structure:

    typedef struct IDMA2_ChannelRec {
        IDMA2_Handle handle;   /* Handle to logical DMA channel */
        Int queueId;           /* Selects the serialization queue */
    } IDMA2_ChannelRec;

IDMA2 channels sharing the same queueId should have their transfers serialized by the ACPY2 library. This queueId is passed by the client application to the ACPY2 library using ACPY2_initChannel.

C6000 Specific Issues for Algorithm Producers

3.7.1 Cache Coherency Issues for Algorithm Producers

Algorithms must enforce coherence and alignment/size constraints for the internal buffers they request through the IALG interface. The figures in section 2.3.2 show the problems that can occur if coherency between the cache and external memory is not observed. Please refer to the TMS320C621x/C671x Two-Level Internal Memory Reference Guide (SPRU609) for more details. This section summarizes the relevant issues that might affect algorithm developers using DMA. To deal with these coherency problems, the following rules have been added:

Rule: C6000 algorithms using IDMA2 must not directly access buffers in external memory that are involved in DMA transfers. This includes the input buffers passed to the algorithm through its function interface.

This rule ensures that XDAIS algorithms operate correctly without the need to issue cache clean or flush operations, which are low-level operations that should be dealt with at the client application level. With the introduction of this rule, external buffers involved in DMA transfers will never be in cache, and therefore no external coherency problems would occur. Also remember that the rule targeted at the client application writer ensures that any cached entries for buffers passed into the algorithm are flushed, which avoids the coherency problems illustrated in section 2.3.2.

It is important that these buffers are allocated on a cache line boundary and are a multiple of the cache line size. As shown in the Cache Line Effects on Cache Coherence figure, if some location is accessed by DMA and other data shares the same cache line, the entire cache line is brought into cache when that other data is accessed.
The DMA-accessed location would then be in cache, which violates the purpose of the rule.

[Figure: Cache Line Effects on Cache Coherence. A cache line in cache and the corresponding region of external memory.]

Rule: For C6000 algorithms, all buffers residing in external memory involved in a DMA transfer should be allocated on a cache line boundary and be a multiple of the cache line size.

This rule was added because algorithm writers can divide buffers supplied to them through their function interface into smaller buffers, then use the smaller buffers in DMA transfers. In this case, the transfer must also occur on buffers aligned on a cache line boundary. Note that this does not mean the transfer size needs to be a multiple of the cache line length. Instead, the buffer containing the memory locations involved in the transfer must be considered as a single buffer; the algorithm must not directly access any part of that buffer, as required by the rule on direct access above.

Rule: C6000 algorithms should not use stack-allocated buffers as the source or destination of any DMA transfer.

This rule is necessary because buffers allocated on the stack are not aligned on cache line boundaries, and there is no mechanism to force alignment. Furthermore, this rule is good practice, as it helps to minimize an algorithm's stack size requirements.

C5000 Specific Issues for Algorithm Producers

3.8.1 Source and Destination Types Require 32-Bit Extended Byte Addressing

The deprecated ACPY_start specification utilized the data-address type, Void *, for the source and destination address arguments. This imposed a restriction on C5000 applications to pass only 23-bit word addresses, since on C5000 standard data pointers are 23-bit word addresses. While this is a general limitation when using data pointers with the standard C programming environment on C5000, it is especially limiting when an algorithm needs to request DMA transfers of data in external memory in the extended address space. Full 32-bit addressing capability is needed for designing and implementing algorithm interfaces in which application buffers are passed using 32-bit addresses.

With the ACPY2 specification, a new type, IDMA2_AdrPtr, with a 32-bit representation on C5000 targets has been introduced to address the ACPY limitations:

    typedef Void (*IDMA2_AdrPtr)();

This type is used to declare the source and destination addresses in the ACPY2_start API.
The function prototype for ACPY2_start is:

    Void ACPY2_start(IDMA2_Handle handle, IDMA2_AdrPtr src, IDMA2_AdrPtr dst, Uns cnt);

3.8.2 Source and Destination Addresses Use Byte Addressing

The deprecated declaration of the Void * data type for source and destination addresses in ACPY_start restricts C5000 applications to pass only 23-bit word addresses. This effectively disallows passing byte addresses for the source and destination of DMA transfers, even though most existing DMA hardware devices use byte addresses in their source and destination registers. With the change in ACPY2_start, for C5000 targets, algorithms will need to explicitly pass byte addresses by doing proper typecasting and 1-bit left shifting of the data pointer. A macro to simplify the conversion is also found in the IDMA2 specification.

    #define IDMA2_ADRPTR(addr) ...

The following code snippet illustrates how the macro can be utilized by the algorithm developer:

    ACPY2_start(..., IDMA2_ADRPTR((Int *)in + offset), IDMA2_ADRPTR((Int *)out), numElems);

3.8.3 C55x Issue: Supporting Packed/Burst Mode Transfers

On the C55/OMAP 1510 platform, algorithms and applications need to perform DMA transfers in burst-enabled/packed transfer modes as much as possible, since this speeds up both internal-to-external and external-to-internal memory transfers.

Packing: When enabled by software, packing packs several consecutive element transfers into wider DMA accesses. For example, when the element size is narrower than the port, a 32-bit wide SARAM port can pack accesses so that several bytes at a time are written into the channel FIFO. This can reduce overhead and improve channel throughput. Packing options are determined by the port access capabilities and the element size.

For the hardware to do successful packing, the source/destination start address must be aligned on the data type boundary (for a 16/32-bit transfer, the address should be aligned on a 16/32-bit boundary). If a frame index or element index is used in single- or double-indexed addressing, the resulting address should also be aligned on the data type boundary.

Bursting: Bursting is an extension of the packing concept. A burst is defined as a set number of bytes.
For successful bursting to occur, again, the source/destination addresses need to be aligned on a burst8 (16-bit) or burst8 (32-bit) boundary for a 16-bit or 32-bit data transfer, respectively.

The following guideline transparently assists ACPY2 library implementations on OMAP C55x platforms in supporting DMA transfers in burst-enabled/packed mode.

Guideline: To facilitate high DMA performance, C55x algorithms should request DMA transfers with sources and destinations aligned on the data type boundary. Also, if they use a frame index or element index, it must be ensured that the resulting address is aligned on the data type boundary.

3.8.4 C55x Issue: Addressing Automatic Endianism Conversion

3.8.4.1 Summary of the DMA Hardware Endianism Treatment (OMAP1510)

The MGS3 GDMA ports (SARAM, DARAM, EMIF, RHEA, API) have different endianisms. The GDMA must adapt the endianism of data read from the source port in order to match the internal FIFO endianism. Before writing data, the GDMA must adapt the endianism of data from the internal FIFO to match the destination port endianism. The MGS3 DMA supports endianism conversion during a transfer between internal memory (SARAM/DARAM) and EMIF; section 3.8.4.2 describes the extent to which software can influence it on a channel basis. The endianism conversion ensures scalar preservation, based on the data type. Therefore, the parameters of the transfer (for example, start address, element index, and frame index) must be chosen according to this constraint.

3.8.4.2 Endianism Support Requirements and Rules

This automatic endianism conversion takes place during any 32-bit access to external memory through the EMIF port. This affects DMA and CPU reads/writes alike. The only mechanism the DMA offers for disabling the auto-conversion (and only for DMA operations) is to initiate the transfer in non-packed/non-burst mode. Operating in non-packed/non-burst mode is otherwise not desirable, since the performance penalty is huge; the original requirement stated is to use pack/burst modes as the default channel configuration and to have the ability to configure a channel to disable pack/burst.

Further complications arise when data in external memory is not 32-bit aligned or the transfer size is not a multiple of 4 bytes (32 bits): the DMA does not perform endianism conversion in these cases. Additionally, this behavior is not restricted to DMA operations; normal 32-bit CPU reads/writes are also subject to the same laws.
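The alignment and size conditions discussed above can be checked with a couple of small helpers. The following is a minimal host-side sketch, assuming byte addresses with 8-bit bytes and a 4-byte (32-bit) word; is_aligned() and buffer_ok_for_32bit() are hypothetical names used only for illustration and are not part of IDMA2 or ACPY2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when addr is a multiple of align.
 * align must be a non-zero power of two. */
static int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical check mirroring the conditions in the text: the buffer
 * starts on a 32-bit boundary and its size is a whole number of 32-bit
 * words. ASSUMPTION: addr is a byte address and size_bytes counts
 * 8-bit bytes. */
static int buffer_ok_for_32bit(uintptr_t addr, size_t size_bytes)
{
    return is_aligned(addr, 4) && (size_bytes % 4) == 0;
}
```

A framework or test harness could run such a check on every external buffer before handing it to a C55x algorithm, so that a misaligned buffer is caught before it silently falls out of packed/burst mode.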
The following rule ensures automatic endianism conversion by the DMA controller while moving data between internal and external memory. If this rule is not enforced by algorithms, wrong data transfer (endianism issues) would occur, or pack/burst mode will be disabled by the hardware even if software enables it. For example, for a 16/32-bit transfer, buffers must be aligned on a 16/32-bit boundary and their sizes must be multiples of the corresponding number of bytes.

Rule: C55x algorithms must request all data buffers in external memory with 32-bit alignment and sizes in multiples of 4 (bytes).
Rule: C55x algorithms must use the same data type and access mode when reading, writing, or transferring data stored in external memory or in application-passed data buffers.

Note that specific ACPY2 implementations can additionally provide the framework with the ability to disable or enable packed/burst mode on a channel basis.

3.8.5 Using ACPY2_start and ACPY2_startAligned

Although the specification of ACPY2_start discussed in Section 3.5.1 makes no assumptions about the alignment of source and destination addresses and accepts addresses of any alignment, the device-specific rules and guidelines discussed in this section impose stricter alignment requirements on C55x algorithms when submitting DMA transfer requests. Therefore, even when calling ACPY2_start, algorithms must ensure that the alignment requirements are met. Hence, C55x algorithms are encouraged to use ACPY2_startAligned exclusively.

Fast Copy (FCPY) Algorithm Example

This section presents an algorithm, FCPY_TI, used to illustrate how to implement the IDMA2 interface and use the ACPY2 calls. The application note Design and Implementation of an eXpressDSP-Compliant DMA Manager for C6x1x (SPRA789) includes a C6x1x-based example application that uses the algorithm developed here. Full source code for the algorithm and the application is provided as an attachment to that application note.

The FCPY algorithm's doCopy() function illustrates a contrived scenario of copying a buffer from one location in memory to another using DMA.
In the process, doCopy() does a 2D-to-1D transfer from the source to a work buffer using the parameters srcLineLen, srcNumLines, and srcStride; copies the contents of that work buffer to a second work buffer using a 1D-to-1D transfer; and, finally, copies the contents of the second work buffer to the destination using the parameters dstLineLen, dstNumLines, and dstStride.

[Figure: FCPY doCopy Operation. The source is described by srcLineLen, srcStride, and srcNumLines; the destination by dstLineLen, dstStride, and dstNumLines.]

The FCPY instance object is configured using the following structure:

    typedef struct IFCPY_Params {
        Int size;          /* Size of this structure */
        Int srcLineLen;    /* Source line length */
        Int srcNumLines;   /* Number of lines for source */
        Int srcStride;     /* Stride between lines for source */
        Int dstLineLen;    /* Destination line length */
        Int dstNumLines;   /* Number of lines for destination */
        Int dstStride;     /* Stride between lines for destination */
    } IFCPY_Params;

Note that srcLineLen * srcNumLines = dstLineLen * dstNumLines must hold true for the algorithm to operate correctly. Otherwise, the behavior is undefined.

IFCPY_Interface Functions

The function table of the algorithm is shown below:

    typedef struct IFCPY_Fxns {
        IALG_Fxns ialg;    /* IFCPY extends IALG */
        XDAS_Bool (*control)(IFCPY_Handle handle, IFCPY_Cmd cmd, IFCPY_Status *status);
        Void (*doCopy)(IFCPY_Handle handle, Void *in, Void *out);
    } IFCPY_Fxns;

In addition to implementing the ialg interface, this algorithm also implements a control function (FCPY_TI_control) that has two commands:

IFCPY_GETSTATUS: returns the IFCPY_Params structure's non-size parameters.
IFCPY_SETSTATUS: sets the IFCPY_Params structure's non-size parameters.

The doCopy function (FCPY_TI_doCopy) is the process function of the algorithm.

4.1.1 Instance Heap Memory Requirements

This algorithm requests three buffers, listed in the following table.

Table: Instance Heap Memory Requirements

    Size                                             Space      Attrs
    sizeof(FCPY_TI_Obj)                              External   Persist
    (srcLineLen * srcNumLines) * sizeof(Char)        Internal   Scratch
    (srcLineLen * srcNumLines) * sizeof(Char)        Internal   Scratch

The alignment of the work buffers is chosen to align on cache boundaries, so that cache coherence issues do not arise.
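The three copy steps described for doCopy() can be followed more easily with a host-side model. The sketch below is not the FCPY_TI source and makes no ACPY2 calls: memcpy stands in for each DMA transfer, and CopyShape, model_do_copy(), srcGap, and dstGap are hypothetical names. It assumes the spacing between lines is expressed as the number of bytes skipped after each line; whether the real srcStride and dstStride denote that gap or a start-to-start pitch should be checked against the algorithm's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shape description for the model; not the IFCPY_Params type. */
typedef struct {
    size_t srcLineLen, srcNumLines, srcGap;  /* ASSUMPTION: gap = bytes skipped after each source line */
    size_t dstLineLen, dstNumLines, dstGap;  /* ASSUMPTION: same convention for the destination */
} CopyShape;

/* Host-side model of the three doCopy() steps, with memcpy in place of DMA. */
static void model_do_copy(const CopyShape *s,
                          const unsigned char *in, unsigned char *out,
                          unsigned char *work1, unsigned char *work2)
{
    size_t total = s->srcLineLen * s->srcNumLines;
    size_t i;

    /* The text requires the source and destination byte counts to match. */
    assert(total == s->dstLineLen * s->dstNumLines);

    /* Step 1: 2D-to-1D, gather strided source lines into work1. */
    for (i = 0; i < s->srcNumLines; i++) {
        memcpy(work1 + i * s->srcLineLen,
               in + i * (s->srcLineLen + s->srcGap),
               s->srcLineLen);
    }

    /* Step 2: 1D-to-1D, copy work1 to work2. */
    memcpy(work2, work1, total);

    /* Step 3: 1D-to-2D, scatter work2 into strided destination lines. */
    for (i = 0; i < s->dstNumLines; i++) {
        memcpy(out + i * (s->dstLineLen + s->dstGap),
               work2 + i * s->dstLineLen,
               s->dstLineLen);
    }
}
```

In the real algorithm each step is an asynchronous transfer on its own preconfigured logical channel, so the waits between steps matter; the model runs the steps synchronously and only shows the data movement.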
4.1.2 IDMA2 and ACPY2 Interfaces

The IDMA2 interface has been implemented in fcpy_ti to request three different logical channels for the three types of DMA transfers required by the algorithm, following the performance guideline. In the algorithm's processing function, FCPY_TI_doCopy, the ACPY2 runtime APIs are used to show their invocation procedures.

Conclusion

Collectively, IDMA2 and ACPY2 describe a flexible and efficient model that greatly simplifies the management of system DMA resources and services by the client application, and a simple and powerful mechanism for the algorithm to configure and access the DMA services. This application note presented an overview of the fundamental DMA abstractions specified and supported by the TMS320 Algorithm Standard. It also highlighted the DMA-related enhancements to the algorithm standard in this version of the TMS320 Algorithm Standard Developer's Kit.

References

TMS320 Algorithm Standard Rules and Guidelines (SPRU352)
TMS320 Algorithm Standard API Reference (SPRU360)
TMS320 Algorithm Standard Developer's Guide (SPRU424)
TMS320C6000 Peripherals Reference Guide (SPRU190)
TMS320C621x/C671x Two-Level Internal Memory Reference Guide (SPRU609)
Design and Implementation of an eXpressDSP-Compliant DMA Manager for C6x1x (SPRA789)

Appendix: Code for the FCPY_TI Algorithm

The FCPY_TI algorithm follows the guidelines for achieving high performance.

ifcpy.h

    /*
     *  ======== ifcpy.h ========
     *  This header defines all types, constants, and functions shared by all
     *  implementations of the FCPY algorithm.
     */
    #ifndef IFCPY_
    #define IFCPY_

    #include <ialg.h>
    #include <xdas.h>

    #ifdef __cplusplus
    extern "C" {
    #endif /* __cplusplus */

    /*
     *  ======== IFCPY_Obj ========
     *  This structure must be the first field of all FCPY instance objects.
     */
    typedef struct IFCPY_Obj {
        struct IFCPY_Fxns *fxns;
    } IFCPY_Obj;

    /*
     *  ======== IFCPY_Handle ========
     *  This handle is used to reference all FCPY instance objects.
     */
    typedef struct IFCPY_Obj *IFCPY_Handle;

    /*
     *  ======== IFCPY_Params ========
     *  This structure defines the creation parameters for all FCPY instance objects.
     */
    typedef struct IFCPY_Params {
        Int size;          /* Size of this structure */

        /* The following parameters are read-only */
        Int srcLineLen;    /* Source line length (# of 8-bit elements) */
        Int srcNumLines;   /* Number of lines for source */

        /* The following parameters are read/write */
        Int srcStride;     /* Stride between lines for source */
        Int dstLineLen;    /* Destination line length (# of 8-bit elements) */
        Int dstNumLines;   /* Number of lines for destination */
        Int dstStride;     /* Stride between lines for destination */
    } IFCPY_Params;

    extern const IFCPY_Params IFCPY_PARAMS;   /* default params */

    /*
     *  ======== IFCPY_Status ========
     *  This structure defines the parameters that can be changed at runtime
     *  (read/write), and the instance status parameters (read-only).
     */
    typedef struct IFCPY_Status {
        Int size;          /* Size of this structure */

        /* The following parameters are read-only */
        Int srcLineLen;    /* Source line length (# of 8-bit elements) */
        Int srcNumLines;   /* Number of lines for source */

        /* The following parameters are read/write */
        Int srcStride;     /* Stride between lines for source */
        Int dstLineLen;    /* Destination line length (# of 8-bit elements) */
        Int dstNumLines;   /* Number of lines for destination */
        Int dstStride;     /* Stride between lines for destination */
    } IFCPY_Status;

    /*
     *  ======== IFCPY_Cmd ========
     *  This structure defines the control commands for the FCPY module.
     */
    typedef enum IFCPY_Cmd {
        IFCPY_GETSTATUS,
        IFCPY_SETSTATUS
    } IFCPY_Cmd;

    /*
     *  ======== IFCPY_Fxns ========
     *  This structure defines all of the operations on FCPY objects.
     */
    typedef struct IFCPY_Fxns {
        IALG_Fxns ialg;    /* IFCPY extends IALG */
        XDAS_Bool (*control)(IFCPY_Handle handle, IFCPY_Cmd cmd, IFCPY_Status *status);
        Void (*doCopy)(IFCPY_Handle handle, Void *in, Void *out);
    } IFCPY_Fxns;

    #ifdef __cplusplus
    }
    #endif /* __cplusplus */

    #endif /* IFCPY_ */

fcpy_ti_priv.h

    /*
     *  ======== fcpy_ti_priv.h ========
     *  Internal vendor specific (TI) interface header for the FCPY algorithm.
     *  Only the implementation source files include this header; this header
     *  is not shipped as part of the algorithm.
     *
     *  This header contains declarations that are specific to this
     *  implementation and which do not need to be exposed in order for an
     *  application to use the FCPY algorithm.
     */
    #ifndef FCPY_TI_PRIV_
    #define FCPY_TI_PRIV_

    #include <ialg.h>
    #include <xdas.h>
    #include <ifcpy.h>
    #include <idma2.h>

    #ifdef __cplusplus
    extern "C" {
    #endif /* __cplusplus */

    typedef struct FCPY_TI_Obj {
        IALG_Obj ialg;                  /* MUST be first field of all DAIS algs */
        Int *workBuf1;                  /* on-chip scratch */
        Int *workBuf2;                  /* on-chip scratch */
        Int srcLineLen;                 /* Source line length (# of 8-bit elements) */
        Int srcNumLines;                /* Number of lines for source */
        Int srcStride;                  /* Stride between lines for source */
        Int dstLineLen;                 /* Destination line length (# of 8-bit elements) */
        Int dstNumLines;                /* Number of lines for destination */
        Int dstStride;                  /* Stride between lines for destination */
        IDMA2_Handle dmaHandle1D1D8B;   /* DMA logical channel for 1D to 1D xfers */
        IDMA2_Handle dmaHandle1D2D8B;   /* DMA logical channel for 1D to 2D xfers */
        IDMA2_Handle dmaHandle2D1D8B;   /* DMA logical channel for 2D to 1D xfers */
    } FCPY_TI_Obj;

    /* IALG fxn declarations */
    extern Void FCPY_TI_activate(IALG_Handle handle);
    extern Void FCPY_TI_deactivate(IALG_Handle handle);
    extern Int FCPY_TI_alloc(const IALG_Params *algParams, IALG_Fxns **parentFxns,
                             IALG_MemRec memTab[]);
    extern Int FCPY_TI_free(IALG_Handle handle, IALG_MemRec memTab[]);
    extern Int FCPY_TI_initObj(IALG_Handle handle, const IALG_MemRec memTab[],
                               IALG_Handle parent, const IALG_Params *algParams);
    extern Void FCPY_TI_moved(IALG_Handle handle, const IALG_MemRec memTab[],
                              IALG_Handle parent, const IALG_Params *algParams);

    /* IFCPY fxn declarations */
    extern Void FCPY_TI_doCopy(IFCPY_Handle handle, Void *in, Void *out);
    extern XDAS_Bool FCPY_TI_control(IFCPY_Handle handle, IFCPY_Cmd cmd,
                                     IFCPY_Status *status);

    /* IDMA2 fxn declarations */
    extern Void FCPY_TI_dmaChangeChannels(IALG_Handle handle, IDMA2_ChannelRec dmaTab[]);
    extern Int FCPY_TI_dmaGetChannelCnt(Void);
    extern Int FCPY_TI_dmaGetChannels(IALG_Handle handle, IDMA2_ChannelRec dmaTab[]);
    extern Int FCPY_TI_dmaInit(IALG_Handle handle, IDMA2_ChannelRec dmaTab[]);

    #ifdef __cplusplus
    }
    #endif /* __cplusplus */

    #endif /* FCPY_TI_PRIV_ */

fcpy_ti_ialg.c

    /*
     *  ======== fcpy_ti_ialg.c ========
     *  FCPY Module - TI implementation of the FCPY module.
     *
     *  This file contains the implementation of the required IALG interface.
     */
    #pragma CODE_SECTION(FCPY_TI_alloc, ".text:algAlloc")
    #pragma CODE_SECTION(FCPY_TI_free, ".text:algFree")
    #pragma CODE_SECTION(FCPY_TI_initObj, ".text:algInit")
    #pragma CODE_SECTION(FCPY_TI_moved, ".text:algMoved")

    #include <std.h>
    #include <fcpy_ti_priv.h>
    #include <ifcpy.h>
    #include <ialg.h>

    #define OBJECT   0
    #define WORKBUF1 1
    #define WORKBUF2 2
    #define NUMBUFS  3

    #define ALIGN_FOR_CACHE ...   /* alignment on cache boundary */

    /*
     *  ======== FCPY_TI_alloc ========
     *  Request memory.
     */
    Int FCPY_TI_alloc(const IALG_Params *algParams, IALG_Fxns **parentFxns,
                      IALG_MemRec memTab[])
    {
        const IFCPY_Params *params = (Void *)algParams;

        if (params == NULL) {
            params = &IFCPY_PARAMS;   /* Use interface default params */
        }

        /* Request memory for FCPY object */
        memTab[OBJECT].size = sizeof (FCPY_TI_Obj);
        memTab[OBJECT].alignment = 0;   /* No alignment required */
        memTab[OBJECT].space = IALG_EXTERNAL;
        memTab[OBJECT].attrs = IALG_PERSIST;

        /* Request memory for working buffer 1 */
        memTab[WORKBUF1].size = (params->srcLineLen) * (params->srcNumLines) * sizeof (Char);
        memTab[WORKBUF1].alignment = ALIGN_FOR_CACHE;
        memTab[WORKBUF1].space = IALG_DARAM0;
        memTab[WORKBUF1].attrs = IALG_SCRATCH;

        /* Request memory for working buffer 2 */
        memTab[WORKBUF2].size = (params->srcLineLen) * (params->srcNumLines) * sizeof (Char);
        memTab[WORKBUF2].alignment = ALIGN_FOR_CACHE;
        memTab[WORKBUF2].space = IALG_DARAM0;
        memTab[WORKBUF2].attrs = IALG_SCRATCH;

        return (NUMBUFS);
    }

    /*
     *  ======== FCPY_TI_free ========
     *  Return a complete memTab structure.
     */
    Int FCPY_TI_free(IALG_Handle handle, IALG_MemRec memTab[])
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;

        FCPY_TI_alloc(NULL, NULL, memTab);

        memTab[OBJECT].base = handle;

        /* Sizes use sizeof (Char) so that they match the FCPY_TI_alloc requests */
        memTab[WORKBUF1].base = fcpy->workBuf1;
        memTab[WORKBUF1].size = (fcpy->srcLineLen) * (fcpy->srcNumLines) * sizeof (Char);

        memTab[WORKBUF2].base = fcpy->workBuf2;
        memTab[WORKBUF2].size = (fcpy->srcLineLen) * (fcpy->srcNumLines) * sizeof (Char);

        return (NUMBUFS);
    }

    /*
     *  ======== FCPY_TI_initObj ========
     *  Initialize instance object.
     */
    Int FCPY_TI_initObj(IALG_Handle handle, const IALG_MemRec memTab[],
                        IALG_Handle parent, const IALG_Params *algParams)
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;
        const IFCPY_Params *params = (Void *)algParams;

        if (params == NULL) {
            params = &IFCPY_PARAMS;   /* Use interface default params */
        }

        /* Set addresses of internal buffers */
        fcpy->workBuf1 = memTab[WORKBUF1].base;
        fcpy->workBuf2 = memTab[WORKBUF2].base;

        /* Configure the instance object */
        fcpy->srcLineLen = params->srcLineLen;
        fcpy->srcStride = params->srcStride;
        fcpy->srcNumLines = params->srcNumLines;
        fcpy->dstLineLen = params->dstLineLen;
        fcpy->dstStride = params->dstStride;
        fcpy->dstNumLines = params->dstNumLines;

        return (IALG_EOK);
    }

    /*
     *  ======== FCPY_TI_moved ========
     *  Re-initialize buffer ptrs to new location.
     */
    Void FCPY_TI_moved(IALG_Handle handle, const IALG_MemRec memTab[],
                       IALG_Handle parent, const IALG_Params *algParams)
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;

        fcpy->workBuf1 = memTab[WORKBUF1].base;
        fcpy->workBuf2 = memTab[WORKBUF2].base;
    }

fcpy_ti_idma2.c

    /*
     *  ======== fcpy_ti_idma2.c ========
     *  FCPY Module - TI implementation of the FCPY algorithm.
     *
     *  This file contains the implementation of the IDMA2 interface.
     */
    #pragma CODE_SECTION(FCPY_TI_dmaChangeChannels, ".text:dmaChangeChannels")
    #pragma CODE_SECTION(FCPY_TI_dmaGetChannelCnt, ".text:dmaGetChannelCnt")
    #pragma CODE_SECTION(FCPY_TI_dmaGetChannels, ".text:dmaGetChannels")
    #pragma CODE_SECTION(FCPY_TI_dmaInit, ".text:dmaInit")

    #include <std.h>
    #include <fcpy_ti_priv.h>
    #include <ialg.h>
    #include <idma2.h>

    #define CHANNEL0 0
    #define CHANNEL1 1
    #define CHANNEL2 2
    #define NUM_LOGICAL_CH 3

    /*
     *  ======== FCPY_TI_dmaChangeChannels ========
     *  Update instance object with new logical channel.
     */
    Void FCPY_TI_dmaChangeChannels(IALG_Handle handle, IDMA2_ChannelRec dmaTab[])
    {
        FCPY_TI_Obj *fcpy = (Void *)handle;

        fcpy->dmaHandle1D1D8B = dmaTab[CHANNEL0].handle;
        fcpy->dmaHandle1D2D8B = dmaTab[CHANNEL1].handle;
        fcpy->dmaHandle2D1D8B = dmaTab[CHANNEL2].handle;
    }

    /*
     *  ======== FCPY_TI_dmaGetChannelCnt ========
     *  Return number of logical channels requested.
     */
    Int FCPY_TI_dmaGetChannelCnt(Void)
    {
        return (NUM_LOGICAL_CH);
    }

    /*
     *  ======== FCPY_TI_dmaGetChannels ========
     *  Declare DMA resource requirement/holdings.
     */
Int FCPY_TI_dmaGetChannels(IALG_Handle handle, IDMA2_ChannelRec dmaTab[])
{
    FCPY_TI_Obj *fcpy = (Void *)handle;

    /* Initial values on logical channels */
    dmaTab[CHANNEL0].handle = fcpy->dmaHandle1D1D8B;
    dmaTab[CHANNEL1].handle = fcpy->dmaHandle1D2D8B;
    dmaTab[CHANNEL2].handle = fcpy->dmaHandle2D1D8B;

    /* Want transfers to be serialized (to simplify debugging) */
    dmaTab[CHANNEL0].queueId = 0;
    dmaTab[CHANNEL1].queueId = 0;
    dmaTab[CHANNEL2].queueId = 0;

    return (NUM_LOGICAL_CH);
}

/*
 *  ======== FCPY_TI_dmaInit ========
 *  Initialize instance object with granted logical channel.
 */
Int FCPY_TI_dmaInit(IALG_Handle handle, IDMA2_ChannelRec dmaTab[])
{
    FCPY_TI_Obj *fcpy = (Void *)handle;

    fcpy->dmaHandle1D1D8B = dmaTab[CHANNEL0].handle;
    fcpy->dmaHandle1D2D8B = dmaTab[CHANNEL1].handle;
    fcpy->dmaHandle2D1D8B = dmaTab[CHANNEL2].handle;

    return (IALG_EOK);
}

fcpy_ti_idmavt.c

/*
 *  ======== fcpy_ti_idmavt.c ========
 *  This file contains the function table definitions for the
 *  IDMA2 interface implemented by the FCPY_TI module.
 */

#include <std.h>
#include <idma2.h>
#include <fcpy_ti.h>
#include <fcpy_ti_priv.h>

/*
 *  ======== FCPY_TI_IDMA2 ========
 *  This structure defines TI's implementation of the IDMA2 interface
 *  for the FCPY_TI module.
 */
IDMA2_Fxns FCPY_TI_IDMA2 = {       /* module_vendor_interface */
    &FCPY_TI_IALG,                 /* IALG functions */
    FCPY_TI_dmaChangeChannels,     /* ChangeChannels */
    FCPY_TI_dmaGetChannelCnt,      /* GetChannelCnt */
    FCPY_TI_dmaGetChannels,        /* GetChannels */
    FCPY_TI_dmaInit                /* initialize logical channels */
};

fcpy_ti_ifcpy.c

/*
 *  ======== fcpy_ti_ifcpy.c ========
 *  FCPY Module - TI implementation of the FCPY algorithm.
 *
 *  This file contains the implementation of the IFCPY abstract interface.
 */
#pragma CODE_SECTION(FCPY_TI_doCopy, ".text:doCopy")
#pragma CODE_SECTION(FCPY_TI_control, ".text:control")

#include <std.h>
#include <xdas.h>
#include <idma2.h>
#include <acpy2.h>
#include <ifcpy.h>
#include <fcpy_ti_priv.h>
#include <fcpy_ti.h>

/*
 *  ======== FCPY_TI_doCopy ========
 */
Void FCPY_TI_doCopy(IFCPY_Handle handle, Void *in, Void *out)
{
    FCPY_TI_Obj *fcpy = (Void *)handle;
    IDMA2_Params params;

    /* Configure the 1D-to-1D logical channel */
    params.xType = IDMA2_1D1D;
    params.elemSize = IDMA2_ELEM8;
    params.numFrames = 0;         /* Not used in 1D1D transfer */
    params.srcFrameIndex = 0;     /* Not used in 1D1D transfer */
    params.dstFrameIndex = 0;     /* Not used in 1D1D transfer */
    params.srcElementIndex = 1;
    params.dstElementIndex = 1;

    /* Configure logical channel */
    ACPY2_configure(fcpy->dmaHandle1D1D8B, &params);

    /* Configure the 2D-to-1D logical channel */
    params.xType = IDMA2_2D1D;
    params.elemSize = IDMA2_ELEM8;
    params.numFrames = fcpy->srcNumLines;
    params.srcFrameIndex = fcpy->srcStride;
    params.dstFrameIndex = 0;

    /* Configure logical channel */
    ACPY2_configure(fcpy->dmaHandle2D1D8B, &params);

    /* Configure the 1D-to-2D logical channel */
    params.xType = IDMA2_1D2D;
    params.elemSize = IDMA2_ELEM8;
    params.numFrames = 0;         /* Set below with ACPY2_setNumFrames */
    params.srcFrameIndex = 0;
    params.dstFrameIndex = 0;     /* Set below with ACPY2_setDstFrameIndex */

    /* Configure logical channel */
    ACPY2_configure(fcpy->dmaHandle1D2D8B, &params);

    /* Use DMA to fcpy input buffer into working buffer 1 */
    ACPY2_start(fcpy->dmaHandle2D1D8B, (Void *)in,
        (Void *)(fcpy->workBuf1), (Uns)(fcpy->srcLineLen));

    /* NOTE: Extra data processing could be done here */

    /* Check that the transfer has completed before finishing "processing" */
    while (!ACPY2_complete(fcpy->dmaHandle2D1D8B)) {
        ;
    }

    /* Use DMA to copy data from working buffer 1 to working buffer 2 */
    ACPY2_start(fcpy->dmaHandle1D1D8B, (Void *)(fcpy->workBuf1),
        (Void *)(fcpy->workBuf2),
        (Uns)((fcpy->srcLineLen) * (fcpy->srcNumLines)));

    /* Wait for transfer to finish */
    ACPY2_wait(fcpy->dmaHandle1D1D8B);

    /* Quickly configure NumFrames and FrameIndex values for dmaHandle1D2D8B */
    ACPY2_setNumFrames(fcpy->dmaHandle1D2D8B, fcpy->dstNumLines);
    ACPY2_setDstFrameIndex(fcpy->dmaHandle1D2D8B, fcpy->dstStride);

    /* Use DMA to copy data from working buffer 2 to output buffer */
    ACPY2_start(fcpy->dmaHandle1D2D8B, (Void *)(fcpy->workBuf2),
        (Void *)out, (Uns)(fcpy->dstLineLen));

    /* Wait for all transfers to complete before returning to the client */
    ACPY2_wait(fcpy->dmaHandle1D2D8B);
}

/*
 *  ======== FCPY_TI_control ========
 */
XDAS_Bool FCPY_TI_control(IFCPY_Handle handle, IFCPY_Cmd cmd,
    IFCPY_Status *status)
{
    FCPY_TI_Obj *fcpy = (FCPY_TI_Obj *)handle;

    if (cmd == IFCPY_GETSTATUS) {
        status->srcLineLen = fcpy->srcLineLen;
        status->srcNumLines = fcpy->srcNumLines;
        status->srcStride = fcpy->srcStride;
        status->dstLineLen = fcpy->dstLineLen;
        status->dstNumLines = fcpy->dstNumLines;
        status->dstStride = fcpy->dstStride;
        return (XDAS_TRUE);
    }
    else if (cmd == IFCPY_SETSTATUS) {
        fcpy->srcStride = status->srcStride;
        fcpy->dstLineLen = status->dstLineLen;
        fcpy->dstNumLines = status->dstNumLines;
        fcpy->dstStride = status->dstStride;
        return (XDAS_TRUE);
    }

    /* Should not happen */
    return (XDAS_FALSE);
}

fcpy_ti_ialgvt.c

/*
 *  ======== fcpy_ti_ialgvt.c ========
 *  This file contains the function table definitions for the
 *  IALG and IFCPY interfaces implemented by the FCPY_TI module.
 */

#include <std.h>
#include <fcpy_ti.h>
#include <ifcpy.h>
#include <fcpy_ti_priv.h>

#define IALGFXNS \
    &FCPY_TI_IALG,    /* module ID */                                   \
    NULL,             /* activate (NULL => no need to initialize buffers) */ \
    FCPY_TI_alloc,    /* alloc */                                       \
    NULL,             /* control (NULL => no control ops) */            \
    NULL,             /* deactivate (NULL => no need to save data) */   \
    FCPY_TI_free,     /* free */                                        \
    FCPY_TI_initObj,  /* init */                                        \
    FCPY_TI_moved,    /* moved */                                       \
    NULL              /* numAlloc (NULL => IALG_MAXMEMRECS) */

/*
 *  ======== FCPY_TI_IFCPY ========
 *  This structure defines TI's implementation of the IFCPY interface
 *  for the FCPY_TI module.
 */
IFCPY_Fxns FCPY_TI_IFCPY = {   /* module_vendor_interface */
    IALGFXNS,                  /* IALG functions */
    FCPY_TI_control,           /* Control function */
    FCPY_TI_doCopy             /* The fcpy fxn */
};

/* Overlay v-tables to save data space */
asm("_FCPY_TI_IALG .set _FCPY_TI_IFCPY");

IMPORTANT NOTICE

Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete. All products are sold subject to TI's terms and conditions of sale supplied at the time of order acknowledgment.

TI warrants performance of its hardware products to the specifications applicable at the time of sale in accordance with TI's standard warranty.
Testing and other quality control techniques are used to the extent TI deems necessary to support this warranty. Except where mandated by government requirements, testing of all parameters of each product is not necessarily performed.

TI assumes no liability for applications assistance or customer product design. Customers are responsible for their products and applications using TI components. To minimize the risks associated with customer products and applications, customers should provide adequate design and operating safeguards.

TI does not warrant or represent that any license, either express or implied, is granted under any TI patent right, copyright, mask work right, or other TI intellectual property right relating to any combination, machine, or process in which TI products or services are used. Information published by TI regarding third-party products or services does not constitute a license from TI to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from TI under the patents or other intellectual property of TI.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Reproduction of this information with alteration is an unfair and deceptive business practice. TI is not responsible or liable for such altered documentation.

Resale of TI products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service and is an unfair and deceptive business practice. TI is not responsible or liable for any such statements.

Mailing Address:
Texas Instruments
Post Office Box 655303
Dallas, Texas 75265

Copyright © 2002, Texas Instruments Incorporated