Optimized DSP Library for C Programmers on the TMS320C54x
SPRA480B
C5000 Software Application Group
Texas Instruments Incorporated

ABSTRACT

The TMS320C54x DSPLIB is an optimized DSP function library for C programmers on TMS320C54x devices. It includes a set of C-callable, assembly-optimized, general-purpose signal processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. By using these routines you can achieve execution speeds considerably faster than equivalent code written in standard ANSI C. In addition, by providing ready-to-use DSP functions, DSPLIB can significantly shorten your application development time.

DSPLIB includes commonly used DSP routines. Source code is provided to allow you to modify the functions to match your specific needs.

The routines included within the library are organized into eight functional categories:
- FFT
- Filtering and convolution
- Adaptive filtering
- Correlation
- Math
- Trigonometric
- Miscellaneous
- Matrix

Contents

Introduction
  Features and Benefits
  DSPLIB: Quality Freeware That You Can Build On and Contribute To
Installing DSPLIB
  DSPLIB Content
  How to Install DSPLIB
  How to Rebuild DSPLIB
Using DSPLIB
  DSPLIB Data Types
  DSPLIB Arguments
  Calling a DSPLIB Function from C
  Calling a DSPLIB Function from Assembly
  Where to Find Sample Code
  How DSPLIB Is Tested - Allowable Error
  How DSPLIB Deals With Overflow and Scaling Issues
  Where DSPLIB Goes from Here
Function Descriptions
  Arguments and Conventions Used
  DSPLIB Functions Summary Table
  acorr       Auto-correlation
  add         Vector Add
  atan16      Arctangent Implementation
  atan2_16    Arctangent Implementation
  bexp        Block Exponent Implementation
  cbrev       Complex Bit-Reverse
  cfir        Complex FIR Filter
  cfft        Forward Complex FFT
  cifft       Inverse Complex FFT
  convol      Convolution
  corr        Correlation (full-length)
  dlms        Adaptive Delayed LMS Filter
  expn        Exponential Base e
  fir         FIR Filter
  firdec      Decimating FIR Filter
  firinterp   Interpolating FIR Filter
  firs        Symmetric FIR Filter
  firs2       Symmetric FIR Filter (generic)
  fltoq15     Float to Q15 Conversion
  hilb16      FIR Hilbert Transformer
  iir32       Double-precision IIR Filter
  iircas4     Cascaded IIR Direct Form Using 4-Coefs per Biquad
  iircas5     Cascaded IIR Direct Form (5-Coefs per Biquad)
  iircas51    Cascaded IIR Direct Form (5-Coefs per Biquad)
  iirlat      Lattice Inverse (IIR) Filter
  firlat      Lattice Forward (FIR) Filter
  log_2       Base 2 Logarithm
  log_10      Base 10 Logarithm
  logn        Base e Logarithm (natural logarithm)
  maxidx      Index of the Maximum Element of a Vector
  maxval      Maximum Value of a Vector
  minidx      Index of the Minimum Element of a Vector
  minval      Minimum Value of a Vector
  mmul        Matrix Multiplication
  mtrans      Matrix Transpose
  mul32       32-bit Vector Multiply
  nblms       Normalized Block LMS Block Filter
  ndlms       Normalized Delayed LMS Filter
  neg         Vector Negate
  neg32       Vector Negate (double-precision)
  power       Vector Power
  q15tofl     Q15 to Float Conversion
  rand16init  Initialize Random Number Generator
  rand16      Random Vector Generation
  recip16     16-bit Reciprocal Function
  rfft        Forward Real FFT (in-place)
  rifft       Inverse Real FFT (in-place)
  sine        Sine
  sqrt_16     Square Root of a 16-bit Number
  sub         Vector Subtract
DSPLIB Benchmarks and Performance Issues
  What DSPLIB Benchmarks Are Provided
  Performance Considerations
Licensing, Warranty, and Support
  Licensing and Warranty
  DSPLIB Software Updates
  DSPLIB Customer Support
References
Acknowledgments
Appendix: Overview of Fractional Q Formats
  Q3.12 Format
  Q.15 Format
  Q.31 Format
Appendix: Calculating the Reciprocal of a Q15 Number
Appendix: Texas Instruments License Agreement for DSP Code

Introduction

Features and Benefits

- Hand-coded, assembly-optimized routines
- C-callable routines fully compatible with the 'C54x compiler
- Support also provided for 'C54x devices with extended program memory addressing (Far mode)
- Fractional Q15-format operands supported
- Complete examples of usage provided
- Benchmarks (time and code) provided
- Tested against Matlab scripts

DSPLIB: Quality Freeware That You Can Build On and Contribute To

DSPLIB is a free-of-charge product. You can use, modify, and distribute the 'C54x DSPLIB for use on 'C54x DSPs with no royalty payments.
Refer to the appendix containing the free-license agreement and to the section Where DSPLIB Goes from Here in this application report for details.

Installing DSPLIB

DSPLIB Content

The DSPLIB software consists of four parts:
1. A header file for C programmers: dsplib.h
2. Object libraries for the different memory models supported by the compiler:
     54xdsp.lib    for standard short-call mode (16-bit)
     54xdspf.lib   for far-call mode (24-bit)
3. A source library to allow function customization by the user: 54xdsp.src
4. Example programs and linker command files, under the "54x_test" subdirectory

How to Install DSPLIB

Read the README.1ST file for specific details of the release.

First Step: De-archive DSPLIB

DSPLIB is distributed in the form of an executable self-extracting file (54xdsplib.exe) that automatically restores the individual DSPLIB components in the same directory from which you execute the self-extracting file. To install DSPLIB, just type:

    54xdsplib.exe

The DSPLIB directory structure and content you will find is as follows:

    54xdsplib (dir)
        54xdsp.lib      for standard short-call mode
        54xdspf.lib     for far-call mode
        blt54x.bat      re-generates 54xdsp.lib based on 54xdsp.src
        blt54xf.bat     re-generates 54xdspf.lib based on 54xdsp.src
        examples (dir)  contains one subdirectory for each routine included in the library, where you can find complete test cases
        include (dir)
            dsplib.h    include file with data types and function prototypes
            tms320.lib  include file with type definitions to increase TMS320 portability
        doc (dir)
            dsplib.pdf  DSPLIB application report (this document) in PDF format
            code (dir)  contains the code examples shown in the application report

Second Step: Update Your C_DIR Environment Variable

Append the full path of the 54xdsplib directory to the path in your C_DIR environment variable. For example, if the 54xdsplib.exe self-extracting file was extracted in c:\54xdsplib, and your development tools were installed in c:\dsptools, add this line to your c:\autoexec.bat file:

    set C_DIR=.;c:\54xdsplib;c:\dsptools

This allows the 'C54x compiler/linker to find the 'C54x DSPLIB object libraries, 54xdsp.lib and 54xdspf.lib.

How to Rebuild DSPLIB

For a full rebuild of 54xdsp.lib and/or 54xdspf.lib:
- To rebuild 54xdsp.lib, simply execute blt54x.bat.
  Warning: This will overwrite any existing 54xdsp.lib.
- To rebuild 54xdspf.lib, simply execute blt54xf.bat.
  Warning: This will overwrite any existing 54xdspf.lib.

For a partial rebuild of 54xdsp.lib and/or 54xdspf.lib (modification of a specific DSPLIB function, for example fir.asm):
1. Extract the source for the selected function from the source archive:
       ar500 x 54xdsp.src fir.asm
2. Reassemble your new fir.asm assembly source file:
       asm500 fir.asm
3. Replace the object, fir.obj, in the 54xdsp.lib object library with the newly formed object:
       ar500 r 54xdsp.lib fir.obj

Using DSPLIB

DSPLIB Data Types

DSPLIB handles the following fractional data types:
- Q.15 (DATA): A Q.15 operand is represented by a short data type (16-bit) that is predefined as type DATA in the dsplib.h header file.
- Q.31 (LDATA): A Q.31 operand is represented by a long data type (32-bit) that is predefined as type LDATA in the dsplib.h header file.
- Q.3.12: Contains 3 integer bits and 12 fractional bits.

Unless specifically noted, DSPLIB operates on Q15-fractional data type elements. The appendix on fractional Q formats presents an overview of these formats.

DSPLIB Arguments

DSPLIB functions typically operate over vector operands for greater efficiency. Even though these routines can be used to process short arrays or even scalars (unless a minimum size requirement is noted), they will be slower in those cases.
- Vector stride is always equal to 1: vector operands are composed of vector elements held in consecutive memory locations.
- Complex elements are assumed to be stored in Re-Im format.
- In-place computation is allowed (unless specifically noted): the source operand can be equal to the destination operand to conserve memory.

Calling a DSPLIB Function from C

In addition to correctly installing the DSPLIB software, to include a DSPLIB function in your code you have to:
- Include the dsplib.h include file.
- Link your code with one of the DSPLIB object code libraries, 54xdsp.lib or 54xdspf.lib, depending on whether or not you need far mode.
- Use a correct linker command file describing the memory configuration available on your 'C54x board.
For example, the following code contains calls to the acorr, q15tofl, and fltoq15 routines in DSPLIB:

    /* User's Guide example */
    #include "dsplib.h"

    float xf[3] = {0.1, 0.2, 0.3};   /* first value is missing in this copy; 0.1 is assumed */
    float yf[3];
    short x[3];
    short y[3];
    short i;

    main()
    {
        for (i = 0; i < 3; i++)
            y[i] = x[i] = 0;         /* clear the buffers (initial value assumed) */

        fltoq15(xf, x, 3);           /* float -> Q15 */
        acorr(x, y, 3, 3, raw);      /* raw auto-correlation */
        q15tofl(y, yf, 3);           /* Q15 -> float */
    }

In this example, the fltoq15 and q15tofl DSPLIB functions are used to convert between floating-point fractional values and Q15 fractional values. However, in many applications your data is always maintained in Q15 format, so that the conversion between float and Q15 is not required.

The above code, ug.c, is available under the /doc/code subdirectory. To compile and link this code with 54xdsp.lib, simply issue one of the following commands (the second selects the device version with -v548):

    cl500 ug.c 54x.cmd 54xdsp.lib ug.map -oug.out
    cl500 -v548 ug.c 54x.cmd 54xdsp.lib ug.map -oug.out

Note: The examples presented in this application report have been tested using a Texas Instruments 'C54x board containing a 'C541. Therefore, the linker command file used reflects the memory configuration available on that board. Customization may be required to use it with a different board. No overlay mode is assumed (the default after a 'C54x device reset). Refer to the TMS320C54x Optimizing C Compiler User's Guide if a more in-depth explanation is required.

Warning: DSPLIB routines modify the FRCT bit. This can cause problems for users of older versions of the compiler (cl500) who have interrupt service routines (ISRs) implemented in C. Those older versions do not preserve FRCT on ISR entry; FRCT can therefore be corrupted and not restored, which leads to incorrect results. The solution is to implement the ISRs in assembly and preserve the FRCT bit. Users of compiler versions that preserve FRCT need not worry about this.

Calling a DSPLIB Function from Assembly

The 'C54x DSPLIB functions were written to be used from C. Calling the functions from assembly language source code is possible as long as the calling function conforms with the Texas Instruments 'C54x C compiler calling conventions. This means that DSPLIB functions expect parameters to be passed on the stack in reverse order (except for the first argument, which is passed in 'C54x accumulator A). Refer to the TMS320C54x Optimizing C Compiler User's Guide if a more in-depth explanation is required.
Keep in mind that DSPLIB is not an optimal solution for assembly-only programmers. Even though DSPLIB functions can be invoked from an assembly program, the result may not be optimal because of unnecessary C-calling overhead.

Where to Find Sample Code

You can find examples of how to use every single function in DSPLIB in the examples subdirectory. This subdirectory contains one subdirectory for each function. For example, the examples/araw subdirectory contains the following files:
- araw_t.c: main driver for testing the DSPLIB acorr (raw) function
- test.h: contains the input data (a) and the expected output data (yraw) for the acorr (raw) function. This test.h file is generated using Matlab scripts.
- test.c: contains the function used to compare the output of the araw function with the expected output data
- abias.cmd: an example of a linker command file for this function ('C541 specific)

How DSPLIB Is Tested - Allowable Error

This version of DSPLIB is tested against Matlab scripts. The expected output data has been generated from Matlab, which uses double-precision (64-bit) floating-point operations (the default precision in Matlab). Test utilities have been added to the test main drivers to automate this checking process. Notice that a maximum absolute error value (MAXERROR) is passed to the test function to set the trigger point at which a functional error is flagged.

We consider this testing methodology a good first-pass approximation. Further characterization of the quantization error ranges for each function (under random input), as well as testing against fixed-point C models, is planned for future releases. We welcome any suggestions you, the user, may have in this respect.

How DSPLIB Deals With Overflow and Scaling Issues

One of the inherent difficulties of programming fixed-point processors is determining how to deal with overflow issues. Overflow occurs as a result of addition and subtraction operations when the dynamic range of the resulting data is larger than what the intermediate and final data types can contain.

The methodology used to deal with overflow should depend on the specifics of your signal, the type of operation in your functions, and the DSP architecture used.
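The overflow described above can be illustrated with a small C sketch. It is not DSPLIB code: the helper names wrap_add_q15, sat_add_q15, and overflow_demo are hypothetical, and Q15 values are modeled with the standard int16_t type rather than the library's DATA type. The sketch contrasts a 16-bit addition that wraps around with one that saturates, one common way of handling overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; they are not part of DSPLIB. */

/* Add two Q15 values with 16-bit wraparound, as happens when the sum is
   stored back into a 16-bit type without any overflow handling. */
int16_t wrap_add_q15(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    /* Converting an out-of-range value to int16_t is implementation-defined
       before C23; two's-complement wraparound is assumed here. */
    return (int16_t)sum;
}

/* Add two Q15 values and clamp the result to the Q15 range instead. */
int16_t sat_add_q15(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* exact in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Self-check: 0.75 + 0.75 = 1.5 does not fit in the Q15 range [-1, 1). */
void overflow_demo(void)
{
    int16_t a = 0x6000;                       /* 0.75 in Q15 */
    assert(wrap_add_q15(a, a) == -16384);     /* wraps to -0.5 */
    assert(sat_add_q15(a, a) == INT16_MAX);   /* clamps just below 1.0 */
}
```

Calling overflow_demo() from a test program exercises both cases. The wrapped result has the wrong sign, whereas the saturated result is merely clipped, which is why saturation is often preferred when overflow cannot be designed out.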
In general, overflow handling methodologies can be classified in five categories: saturation, input scaling, fixed scaling, dynamic scaling, and system design considerations.

It is important to note that one 'C54x architectural feature that makes overflow easier to deal with is the presence of guard bits in both 'C54x accumulators. The 40-bit 'C54x accumulators provide eight guard bits that allow many consecutive operations before an accumulator overrun, a very useful feature when implementing, for example, FIR filters.

There are four specific ways DSPLIB deals with overflow, as reflected in each function description:
- Scaling implemented for overflow prevention: In this type of function, DSPLIB scales the intermediate results to prevent overflow. Overflow should not occur as a result. Precision is affected, but not significantly. This is the case for the FFT functions, in which scaling is used after each FFT stage.
- No scaling implemented for overflow prevention: In this type of function, DSPLIB does not scale to prevent overflow, because of the potentially strong effect on data output precision or on the number of cycles required. This is the case, for example, for MAC-based operations like filtering, correlation, and convolution. The best solution in those cases is to design your system, for example your filter coefficients, with a gain of less than 1 to prevent overflow. In this case, overflow could happen unless you scale the input or design for no overflow.
- Saturation implemented for overflow handling: In this type of function, DSPLIB enables the 'C54x 32-bit saturation mode (OVM = 1). This is the case for certain basic math functions that require saturation mode to be enabled to work.
- Not applicable: In this type of function, because of the nature of the function operations, there is no overflow to worry about.

A couple of additional DSPLIB features relate to overflow/scaling handling:
- DSPLIB reporting of overflow conditions (overflow flag): Because the risk of overflow cannot always be predicted, most DSPLIB functions have been written to return an overflow flag (oflag) as an indication of a potentially dangerous 32-bit overflow. However, keep in mind that because of the guard bits, the 'C54x is capable of dealing with intermediate 32-bit overflows and still producing the correct final result. Therefore, the oflag parameter should be taken in the context of a warning, not a definitive error.
- Functions for handling scaling of data (block exponent): DSPLIB includes bexp, which returns the maximum exponent (extra sign bits) of a vector to allow determination of the correct input scaling.

As a final note, DSPLIB is also provided in source format to allow customization of the DSPLIB functions to your specific system needs.

Where DSPLIB Goes from Here

We anticipate that DSPLIB will improve in future releases in the following areas:
- Increased number of functions: We anticipate that the number of functions in DSPLIB will grow over time. We welcome user-contributed code. If, during the process of developing your application, you develop a DSP routine that seems like a good fit for DSPLIB, let us know. We will review and test your routine and make sure to include it in the next DSPLIB software release. Your contribution will be fully acknowledged and recognized in the DSPLIB Application Report Acknowledgment section. Use this opportunity to make your name known to your industry peers. Simply email your contribution to dsph@ti.com and we will contact you.
- Improved testing methodology and function characterization: See the section How DSPLIB Is Tested - Allowable Error.
- Increased code portability: DSPLIB looks to enhance code portability across different TMS320-based platforms. The goal is to provide similar DSP libraries for other TMS320 devices that, working in conjunction with 'C54x compiler intrinsics, make C development easier for fixed-point devices. However, it is anticipated that a 100% portable library across TMS320 devices may not be possible because of normal device architectural differences. TI will continue monitoring DSP industry standardization activities in terms of DSP function libraries. In the event of the endorsement of a community standard library specification, TI will take the necessary steps to evolve DSPLIB into industry compliance.

Function Descriptions

Arguments and Conventions Used

The following convention has been followed when describing the arguments of each individual function:

    Argument     Description
    x, y         Argument reflecting an input data vector
    r            Argument reflecting an output data vector
    nx, ny, nr   Arguments reflecting the size of vectors x, y, and r, respectively. In functions in which nx = ny = nr, only nx has been used.
    h            Argument reflecting a filter coefficient vector (filter routines only)
    nh           Argument reflecting the size of vector h
    DATA         Data type definition equating a short, a 16-bit value representing a Q15 number. Use of DATA instead of short is recommended to increase future portability across devices.
    LDATA        Data type definition equating a long, a 32-bit value representing a Q31 number. Use of LDATA instead of long is recommended to increase future portability across devices.
    ushort       Unsigned short (16-bit). You can use this data type directly, because it has been defined in dsplib.h.

DSPLIB Functions Summary Table

The routines included within the library are organized into eight functional categories: FFT, filtering and convolution, adaptive filtering, correlation, math, trigonometric, miscellaneous, and matrix functions. Prototypes are abbreviated in this table; see each function description for the full argument list.

FFT
    void cfft (DATA, short scale)
        Radix-2 complex forward FFT (macro)
    void cifft (DATA, short scale)
        Radix-2 complex inverse FFT (macro)
    void rfft (DATA, short scale)
        Radix-2 real forward FFT (macro)
    void rifft (DATA, short scale)
        Radix-2 real inverse FFT (macro)
    void cbrev (DATA, DATA, ushort)
        Complex bit-reverse function

Filtering and Convolution
    short fir (DATA, DATA, DATA, DATA **dbuffer, ushort, ushort)
        FIR direct form
    short firs (DATA, DATA, DATA **dbuffer, ushort nh2, ushort)
        Symmetric FIR direct form (optimized routine)
    short firs2 (DATA, DATA, DATA, DATA **dbuffer, ushort nh2, ushort)
        Symmetric FIR direct form (generic routine)
    short firdec (DATA, DATA, DATA, DATA **dbuffer, ushort, ushort, ushort)
        Decimating FIR filter
    short firinterp (DATA, DATA, DATA, DATA **dbuffer, ushort, ushort, ushort)
        Interpolating FIR filter
    short cfir (DATA, DATA, DATA, DATA **dbuffer, ushort, ushort)
        Complex FIR direct form
    short convol (DATA, DATA, DATA, ushort, ushort)
        Convolution
    short hilb16 (DATA, DATA, DATA, DATA *db, ushort, ushort)
        16-bit FIR Hilbert transformer
    short iircas4 (DATA, DATA, DATA, DATA **dbuffer, ushort nbiq, ushort)
        IIR cascade direct form, 4 coefficients per biquad
    short iircas5 (DATA, DATA, DATA, DATA **dbuffer, ushort nbiq, ushort)
        IIR cascade direct form, 5 coefficients per biquad
    short iircas51 (DATA, DATA, DATA, DATA **dbuffer, ushort nbiq, ushort)
        IIR cascade direct form, 5 coefficients per biquad
    short iir32 (DATA, LDATA, DATA, LDATA **dbuffer, ushort nbiq, ushort)
        32-bit IIR cascade direct form
    short iirlat (DATA, DATA, DATA, DATA, ushort, ushort)
        Lattice inverse IIR filter
    short firlat (DATA, DATA, DATA, DATA, ushort, ushort)
        Lattice forward FIR filter

Adaptive Filtering
    short dlms (DATA, DATA, DATA, DATA **d, DATA *des, DATA step, ushort, ushort)
        LMS FIR (delayed version)
    short ndlms (DATA, DATA, DATA, DATA *dbuffer, DATA *des, ushort, ushort, l_tau, cutoff, gain, DATA *norm_d)
        Normalized delayed LMS implementation
    short nblms (DATA *x, DATA *h, DATA, DATA *dbuffer, DATA *des, ushort, ushort, ushort, DATA *norm_e, l_tau, cutoff, gain)
        Normalized block LMS implementation

Correlation
    short acorr (DATA, DATA, ushort, ushort, type)
        Auto-correlation (positive side only) (macro)
    short corr (DATA, DATA, DATA, ushort, ushort, type)
        Correlation (full-length) (macro)

Trigonometric
    short sine (DATA, DATA, ushort)
        Sine of a vector
    short atan2_16 (DATA, DATA, DATA, ushort)
        Quadrant inverse tangent of a vector
    short atan16 (DATA, DATA, ushort)
        Arctan of a vector

Math
    short add (DATA, DATA, DATA, ushort, ushort scale)
        Optimized vector addition
    short expn (DATA, DATA, ushort)
        Exponent of a vector
    short ldiv16 (LDATA, DATA, DATA, DATA *exp, ushort)
        Signed vector divide
    short logn (DATA, LDATA, ushort)
        Natural log of a vector
    short log_2 (DATA, LDATA, ushort)
        Log base 2 of a vector
    short log_10 (DATA, LDATA, ushort)
        Log base 10 of a vector
    short maxidx (DATA, ushort)
        Index of the maximum magnitude of a vector
    short maxval (DATA, ushort)
        Maximum magnitude of a vector
    short minidx (DATA, ushort)
        Index of the minimum magnitude of a vector
    short minval (DATA, ushort)
        Minimum element of a vector
    short mul32 (LDATA, LDATA, LDATA, ushort)
        32-bit vector multiply
    short neg (DATA, DATA, ushort)
        16-bit vector negate
    short neg32 (LDATA, LDATA, ushort)
        32-bit vector negate
    short power (DATA, LDATA, ushort)
        Sum of squares of a vector (power)
    short rand16 (DATA, ushort)
        Random number vector generator
    void rand16init (void)
        Random number generator initialization
    void recip16 (DATA, DATA, DATA *rzexp, ushort)
        Vector reciprocal
    short sqrt_16 (DATA, DATA, short)
        Square root of a vector
    short sub (DATA, DATA, DATA, ushort, ushort scale)
        Vector subtraction

Matrix
    short mmul (DATA *x1, short row1, short col1, DATA *x2, short row2, short col2, DATA)
        Matrix multiply
    short mtrans (DATA, DATA, ushort)
        Matrix transpose

Miscellaneous
    short bexp (DATA, ushort)
        Maximum exponent (extra sign bits) of a vector (to allow determination of the correct input scaling)
    void fltoq15 (float, DATA, ushort)
        Float to Q15 conversion
    void q15tofl (DATA, float, ushort)
        Q15 to float conversion

acorr    Auto-correlation

    short oflag = acorr (DATA *x, DATA *r, ushort nx, ushort nr, type)
    (defined in araw.asm, abias.asm, aubias.asm)

Arguments:
    x[nx]   Pointer to real input vector of nx real elements.
    r[nr]   Pointer to real output vector containing the first nr elements of the positive side of the auto-correlation function of vector x. r must be different than x (in-place computation is not allowed).
    nx      Number of real elements in vector x
    nr      Number of real elements in vector r
    type    Auto-correlation type selector. Types supported:
              If type = raw, r will contain the raw autocorrelation of x
              If type = bias, r will contain the biased autocorrelation of x
              If type = unbias, r will contain the unbiased autocorrelation of x
    oflag   Overflow flag.
              If oflag = 1, a 32-bit overflow has occurred
              If oflag = 0, a 32-bit overflow has not occurred

Description: Computes the first nr points of the positive side of the auto-correlation of the real vector x and stores the results in the real output vector r. Notice that the full-length auto-correlation of vector x has 2*nx-1 points with even symmetry around the lag 0 point (r[0]). This routine provides only the positive half of this for memory and computational savings.

Algorithm:
    Raw auto-correlation:       r[j] = sum(k = 0 .. nx-j-1) x[j+k] * x[k],                       0 <= j < nr
    Biased auto-correlation:    r[j] = (1/nx) * sum(k = 0 .. nx-j-1) x[j+k] * x[k],              0 <= j < nr
    Unbiased auto-correlation:  r[j] = (1/(nx-abs(j))) * sum(k = 0 .. nx-j-1) x[j+k] * x[k],     0 <= j < nr

Overflow Handling Methodology: No scaling implemented for overflow prevention

Special Requirements: None

Implementation Notes:
- Special debugging consideration: This function is implemented as a macro that invokes different autocorrelation routines according to the type selected. As a consequence, the acorr symbol is not defined. Instead, the acorr_raw, acorr_bias, and acorr_unbias symbols are defined.
- Autocorrelation is implemented using time-domain techniques.

Example: See the examples/abias, examples/aubias, and examples/araw subdirectories.

Benchmarks:
    Cycles      Abias    Core: ((na-1) (na-2)) ((nlags)     Overhead:
                Araw     Core: ((na-2) (na-3))              Overhead:
                Aubias   Core: ((nr-2) ((na-1) (na-2))      Overhead:
    Code size (in 16-bit words)   Abias:  words   Araw:  words   Aubias:  words

add    Vector Add

    short oflag = add (DATA *x, DATA *y, DATA *r, ushort nx, ushort scale)
    (defined in add.asm)

Arguments:
    x[nx]   Pointer to input data vector 1 of size nx. In-place processing is allowed (r can be equal to x or y).
    y[nx]   Pointer to input data vector 2 of size nx
    r[nx]   Pointer to output data vector of size nx containing
              (x+y) if scale = 0
              (x+y)/2 if scale = 1
    nx      Number of elements of the input and output vectors
    scale   Scale selection
              Scale = 1: divide the result by 2 to prevent overflow
              Scale = 0: does not divide by 2
    oflag   Overflow flag.
              If oflag = 1, a 32-bit overflow has occurred
              If oflag = 0, a 32-bit overflow has not occurred

Description: This function adds two vectors, element by element.

Algorithm:
    for (i = 0; i < nx; i++)
        r[i] = x[i] + y[i];

Overflow Handling Methodology: Scaling implemented for overflow prevention (user selectable)

Special Requirements: None

Implementation Notes: None

Example: See the examples/add subdirectory

Benchmarks:
    Cycles      Core: 3*nx/2     Overhead:
    Code size (in 16-bit words)

atan16    Arctangent Implementation

    short oflag = atan16 (DATA *x, DATA *r, ushort nx)
    (defined in atant.asm)

Arguments:
    x[nx]   Pointer to input data vector of size nx. x contains the tangent of r, where |x| < 1.
    r[nx]   Pointer to output data vector of size nx containing the arctangent of x in the range [-pi/4, pi/4] radians. In-place processing is allowed (r can be equal to x). For example, atan(1.0) = 0.7854, or 6478h.
    nx      Number of elements of the input and output vectors
    oflag   Overflow flag.
              If oflag = 1, a 32-bit overflow has occurred
              If oflag = 0, a 32-bit overflow has not occurred

Description: This function calculates the arctangent of each of the elements of vector x. The result is placed in the resultant vector r and is in the range [-pi/4, pi/4] radians. For example, if x = [0x7fff, 0x3505, 0x1976, 0x0] (equivalent to tan(PI/4), tan(PI/8), tan(PI/16), 0 in float), atan16(x, r, 4) should give r = [0x6478, 0x3243, 0x1921, 0x0], equivalent to [PI/4, PI/8, PI/16, 0].

Algorithm:
    for (i = 0; i < nx; i++)
        r(i) = atan(x(i));

Overflow Handling Methodology: Not applicable

Special Requirements: Linker command file: you must allocate the .data section (for polynomial coefficients)

Implementation Notes:
- Computes atan(x), with an output scaling factor.
- Uses a polynomial to compute the arctan of x for |x| < 1. For |x| > 1, express the number as a ratio of two fractional numbers and use the atan2_16 function.

Example: See the examples/atant subdirectory

Benchmarks:
    Cycles      Core:     Overhead:
    Code size (in 16-bit words)

atan2_16    Arctangent Implementation

    short oflag = atan2_16 (DATA *q, DATA *i, DATA *r, ushort nx)
    (defined in arct2.asm)

Arguments:
    q[nx]   Pointer to quadrature input vector (in Q15 format) of size nx
    i[nx]   Pointer to in-phase input vector (in Q15 format) of size nx
    r[nx]   Pointer to output data vector (in Q15 format) of size nx. In-place processing is allowed (r can be equal to q or i). On output, r contains the arctangent of (q/i) * (1/PI).
    nx      Number of elements of the input and output vectors
    oflag   Overflow flag.
              If oflag = 1, a 32-bit overflow has occurred
              If oflag = 0, a 32-bit overflow has not occurred

Description: This function calculates the arctangent of the ratio q/i, where -1 <= atan2_16(q/i) <= 1 represents an actual range of -PI <= atan2(q/i) <= PI. The result is placed in the resultant vector r. The output scale factor correction is PI. For example, if y = [0x1999, 0x1999, 0x0, 0xe667, 0x1999] (equivalent to [0.2, 0.2, 0, -0.2, 0.2] in float) and x = [0x1999, 0x3dcc, 0x7fff, 0x3dcc, 0xc234] (equivalent to [0.2, 0.4828, 1, 0.4828, -0.4828] in float), atan2_16(y, x, r, 5) should give r = [0x2000, 0x1000, 0x0, 0xf000, 0x7000], equivalent to [0.25, 0.125, 0, -0.125, 0.875]*pi.

Algorithm:
    for (j = 0; j < nx; j++)
        r[j] = atan2(q(j)/i(j));

Overflow Handling Methodology: Not applicable

Special Requirements: Linker command file: you must allocate the .data section (for polynomial coefficients)

Implementation Notes: None

Example: See the examples/arct2 subdirectory

Benchmarks:
    Cycles      Core:     Overhead:
    Code size (in 16-bit words)

bexp    Block Exponent Implementation

    short maxexp = bexp (DATA *x, ushort nx)

Arguments:
    x[nx]   Pointer to input vector of size nx
    nx      Number of elements of the input vector
    maxexp  Return value: the maximum exponent that may be used in scaling
    oflag   Overflow flag.
              If oflag = 1, a 32-bit overflow has occurred
              If oflag = 0, a 32-bit overflow has not occurred

Description: Computes the exponents (number of extra sign bits) of all values in the input vector and returns the minimum exponent. This is useful in determining the maximum shift value that may be used in scaling a block of data.

Algorithm:
    for (short j = 0; j < nx; j++) {
        temp = exp(x[j]);
        if (temp < maxexp)
            maxexp = temp;
    }
    return maxexp;

Overflow Handling Methodology: Not applicable

Special Requirements: None

Implementation Notes: None

Example: See the examples/bexp subdirectory

Benchmarks:
    Cycles      Core:     Overhead:
    Code size (in 16-bit words):  words

cbrev    Complex Bit-Reverse

    void cbrev (DATA *x, DATA *r, ushort nx)
    (defined in cbrev.asm)

Arguments:
    x[2*nx]   Pointer to complex input vector x
    r[2*nx]   Pointer to complex output vector r
    nx        Number of complex elements of vectors x and r
                To bit-reverse the input of a complex FFT, nx should be the complex FFT size.
                To bit-reverse the input of a real FFT, nx should be half the real FFT size.

Description: This function bit-reverses the position of the elements of complex vector x into output vector r. In-place bit-reversing is allowed. Use this function in conjunction with the FFT routines to provide the correct format for the FFT input or output data. If you bit-reverse a linear-order array, you obtain a bit-reversed order array. If you bit-reverse a bit-reversed order array, you obtain a linear-order array.

Algorithm: Not applicable

Overflow Handling Methodology: Not applicable

Special Requirements: None

Implementation Notes:
- x is read with bit-reversed addressing and r is written with normal linear addressing.
- In-place bit-reversing is much more cycle consuming than off-place bit-reversing. However, off-place bit-reversing comes at the expense of doubling the data memory requirements.

Example: See the examples/cfft and examples/rfft subdirectories

Benchmarks:
    Cycles      Core: (off-place)   (in-place)     Overhead:
    Code size (in 16-bit words)     (includes support for both in-place and off-place bit-reverse)

Note: The 'C54x is capable of performing an off-place bit-reverse with a short code sequence that loads #N into ar0, uses #INPUT as the source address of the data, and repeats mvdk *ar2+0b, #DATA over the data (loop count based on #N*2). The drawback of this implementation is the hard-coding of the destination address with the label #DATA. The cbrev DSPLIB implementation was chosen as a more generic solution at the expense of extra cycles (3*nx).
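The reordering that cbrev performs can be sketched in portable C. This is only a reference model under stated assumptions, not the DSPLIB assembly routine: the names cbrev_ref and reverse_bits are hypothetical, int16_t stands in for DATA, nx is assumed to be a power of two, and only the off-place case (x and r not overlapping) is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the lowest nbits bits of index (for example 001 -> 100 for nbits = 3). */
static unsigned reverse_bits(unsigned index, unsigned nbits)
{
    unsigned reversed = 0;
    unsigned b;
    for (b = 0; b < nbits; b++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

/* Hypothetical off-place reference model of a complex bit-reverse.
   x and r each hold nx complex elements stored as Re-Im pairs (2*nx values)
   and must not overlap. nx must be a power of two. */
void cbrev_ref(const int16_t *x, int16_t *r, unsigned nx)
{
    unsigned nbits = 0;
    unsigned i;

    assert(nx != 0 && (nx & (nx - 1)) == 0);   /* power-of-two sizes only */
    while ((1u << nbits) < nx)
        nbits++;                               /* nbits = log2(nx) */

    for (i = 0; i < nx; i++) {
        unsigned j = reverse_bits(i, nbits);
        r[2 * j]     = x[2 * i];               /* real part */
        r[2 * j + 1] = x[2 * i + 1];           /* imaginary part */
    }
}
```

Applying cbrev_ref twice returns the original ordering, which matches the statement above that bit-reversing a bit-reversed array yields a linear-order array.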
Optimized Library Programmers TMS320C54x SPRA480B cfir Complex Filter short oflag cfir (DATA DATA DATA DATA **dbuffer, ushort ushort Arguments: x[2*nx] h[2*nh] Pointer compex input vector complex elements (re-Im consecutive locations) Pointer coefficient vector size 2*nh complex elements with re-Im consecutive locations) normal order. example nh=3: b0re, b0im, b1re,b1im,b2re,b2im. Memory alignment: this circular buffer must start k-bit boundary (that LSBs starting address must zeros) where log2 (2*nh). r[2*nx] Pointer complex output vector complex elements (re-Im consecutive locations) In-place computation allowed Delay buffer case multiple-buffering schemes, this array should initialized first block only. Between consecutive blocks, delay buffer preserves previous output elements needed. Memory alignment: this circular buffer must start k-bit boundary (that LSBs starting address must zeros) where log2 (2*nh). dbuffer[2*nh] oflag Number complex elements vector (input samples) Number complex coefficients Overflow error flag 32-bit data overflow occurred intermediate final result 32-bit data overflow occurred intermediate final result Description: Computes real filter (direct-form) using coefficient stored vector real data input stored vector filter output result stored vector This function retains address delay filter memory containing previous delayed values allow consecutive processing blocks. This function used both block-by-block sample-by-sample filtering (nx=1) r[j] Algorithm: <=nx Overflow Handling Methodology: scaling implemented overflow prevention. 
Special Requirements: None Optimized Library Programmers TMS320C54x SPRA480B Implementation Notes: None Example: examples/cfir subdirectory Benchmarks: Cycles Core: nx*(13 8*nh) Overhead Code size 16-bit words) Optimized Library Programmers TMS320C54x SPRA480B Cfft Forward Complex void cfft (DATA short scale); (defined cfft#.asm where #=nx) Arguments: x[2*nx] Pointer input vector containing complex elements (2*nx real elements) bit-reversed order. output, vector contains complex elements FFT(x). Complex numbers stored Re-Im. must aligned 2*nx boundary, where size. log(nx) LSBits address must zero Number complex elements vector must constant number (not variable) take following values. 8,16,32,64,128,256,512,1024 scale Flag indicate whether scaling should implemented during computation. (scale scale factor else scale factor Description: Computes Radix-2 complex complex elements stored vector bit-reversed order. original content vector destroyed process. complex elements result stored vector normal-order. (DFT) Algorithm: y[k] 1/(scale factor) (cos(2 sin(2 Overflow Handling Methodology: Scaling implemented overflow prevention Special Requirements: Special linker command file sections required: .sintab (containing twiddle table). .sintab section size refer benchmark information below. This function requires inclusion other files during assembling (automatically included): macros.asm (contains macros used this code) sintab.q15 (contains twiddle table section .sintab) Implementation Notes: This optimized time. Space consumption high separate sine table each stage. This reduce MIPS count also increases twiddle table data space. Optimized Library Programmers TMS320C54x SPRA480B First stages implemented implemented radix-4. Last stage also unrolled optimization. Twiddle factors built-in provided sintab.q15 that automatically included during assembly process. Special debugging consideration: This function implemented macro that invokes different routines according size. 
consequence, instead cfft symbol being defined, multiple cfft# symbols (where complex size). This routine prevents overflow scaling each intermediate stages. Example: examples/cfft subdirectory Benchmarks: cycles (butterfly core only) size 1024 Cycles (Note) 1672 3795 8542 19049 42098 Code-Size (words) .text section Data-Size (words) .sintab section 1517 Note: Assumes data on-chip dual access that there conflict twiddle table reads instruction fetches (provided linker command file reflects those conditions) Optimized Library Programmers TMS320C54x SPRA480B cifft Inverse Complex void cifft (DATA short scale) (defined cfft#.asm where #=nx) Arguments: x[2*nx] Pointer input vector containing complex elements (2*nx real elements) bit-reversed order representing complex signal. output, vector contains complex elements IFFT(x) signal itself. Complex numbers stored Re-Im format. must aligned 2*nx boundary, where IFFT size. log(nx) LSBits address must zero Number complex elements vector must constant number (not variable) take following values. 8,16,32,64,128,256,512,1024 scale Flag indicate whether scaling should implemented during computation. (scale scale factor else scale factor Description: Computes Radix-2 complex IFFT complex elements stored vector bit-reversed order. original content vector destroyed process. complex elements result stored vector normal-order. (IDFT) Algorithm: y[k] 1/(scale factor)* (cos(2 sin(2 Overflow Handling Methodology: Scaling implemented overflow prevention Special Requirements: Special linker command file sections required: .sintab (containing twiddle table). .sintab section size refer benchmark information below. This function requires inclusion other files during assembling (automatically included): macrosi.asm (contains macros used this code) sintab.q15 (contains twiddle table section .sintab) Implementation Notes: This IFFT optimized time. Space consumption high separate sine table each stage. 
This reduces the MIPS count but also increases the twiddle table data space.
  The first two IFFT stages are implemented in radix-4. The last stage is also unrolled for optimization. Twiddle factors are built in and provided in sintab.q15, which is automatically included during the assembly process.
  Special debugging consideration: This function is implemented as a macro that invokes different IFFT routines according to the size. As a consequence, instead of the cifft symbol being defined, multiple cifft# symbols are defined (where # = nx = IFFT complex size).
  This routine prevents overflow by scaling at each of the intermediate IFFT stages.

Example: See the examples/cfft subdirectory.

Benchmarks: cycles (butterfly core only)
  IFFT size                              ... 1024
  Cycles (see Note)                      ... 1672  3795  8542  19049  42098
  Code size (words), .text section:
  Data size (words), .sintab section:    1517
  Note: Assumes all data is in on-chip dual-access RAM and that there is no conflict between twiddle table reads and instruction fetches (provided the linker command file reflects those conditions).

convol    Convolution

short oflag = convol (DATA *x, DATA *h, DATA *r, ushort nr, ushort nh)

Arguments:
  x[nr+nh-1]   Pointer to real input vector of nr+nh-1 real elements.
  h[nh]        Pointer to real input vector of nh real elements.
  r[nr]        Pointer to real output vector of nr real elements.
  nr           Number of real elements in vector r.
  nh           Number of elements in vector h.
  oflag        Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes the real convolution (positive) of two real vectors x and h and places the results in vector r. Typically used for block-by-block FIR filter computation without the need for circular addressing or restricted data alignment. This function can be used for both block-by-block and sample-by-sample filtering (nr=1).
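As an illustration of the block convolution described above, here is a minimal plain-C sketch. It is not the DSPLIB assembly routine: convol_ref is a hypothetical name, the accumulator is a plain long with no Q15 scaling and no overflow flag, and the code assumes that the first nh-1 elements of x hold the history samples, which is one reading of the x[nr+nh-1] argument.

```c
#include <stddef.h>

/* Reference block convolution (illustrative only, not the DSPLIB code).
 * Assumption: x holds nr + nh - 1 samples, the first nh - 1 being history,
 * so output j is formed from x[j + nh - 1 - k] for k = 0 .. nh - 1. */
void convol_ref(const short *x, const short *h, long *r, size_t nr, size_t nh)
{
    for (size_t j = 0; j < nr; j++) {
        long acc = 0;                       /* wide accumulator, no saturation */
        for (size_t k = 0; k < nh; k++)
            acc += (long)h[k] * x[j + nh - 1 - k];
        r[j] = acc;
    }
}
```

With nr = 1 the same loop performs sample-by-sample filtering, which is the second use case the description mentions.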
Algorithm:
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\,x[j-k], \qquad 0 \le j \le nr \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: None
Example: See the examples/convol subdirectory.
Benchmarks:
  Cycles   Core:
           Overhead:
  Code size (in 16-bit words):

corr    Correlation (full-length)

short oflag = corr (DATA *x, DATA *y, DATA *r, ushort nx, ushort ny, type)   (defined in craw.asm, cbias.asm and cubias.asm)

Arguments:
  x[nx]        Pointer to real input vector of nx real elements.
  y[ny]        Pointer to real input vector of ny real elements.
  r[nx+ny-1]   Pointer to real output vector containing the full-length correlation (nx+ny-1 elements) of vector x with y. r must be different from both x and y (in-place computation is not allowed).
  nx           Number of real elements in vector x.
  ny           Number of real elements in vector y.
  type         Correlation type selector. Types supported:
               type = raw: r will contain the raw correlation.
               type = bias: r will contain the biased correlation.
               type = unbias: r will contain the unbiased correlation.
  oflag        Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes the full-length correlation of vectors x and y and stores the result in vector r, using time-domain techniques.

Algorithm (with nr = nx+ny-1 and 0 <= j <= nr in all three cases):
  Raw correlation:
  \[ r[j] = \sum_{k=0}^{nr-j-1} x[j+k]\,y[k] \]
  Biased correlation:
  \[ r[j] = \frac{1}{nr} \sum_{k=0}^{nr-j-1} x[j+k]\,y[k] \]
  Unbiased correlation:
  \[ r[j] = \frac{1}{nx - |j|} \sum_{k=0}^{nr-j-1} x[j+k]\,y[k] \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes:
  Special debugging consideration: This function is implemented as a macro that invokes different correlation routines according to the type selected. As a consequence, the corr symbol is not defined. Instead, the corr_raw, corr_bias and corr_unbias symbols are defined.
  Correlation is implemented using time-domain techniques.
Example: See the examples/cbias, examples/cubias and examples/craw subdirectories.
Benchmarks:
  Cycles
    Raw:     Core: (na-3)(na-2) (na-3)) (nb-na+1)(na-2+8)
             Overhead:
    Unbias:  Core: (((na-3)*53) (na-3)(na-2))+ (nb-na+1)*(11+na-2))
             Overhead:
    Bias:    Core: ((na-3)*12 (na-3)(na-2)/2)) ((nb na-2))
             Overhead:
  Code size (in 16-bit words)
    Raw:
    Unbias:
    Bias:

dlms    Adaptive Delayed LMS Filter

short oflag = dlms (DATA *x, DATA *h, DATA *r, DATA **d, DATA *des, DATA step, ushort nh, ushort nx)   (defined in dlms.asm)

Arguments:
  x[nx]         Pointer to input vector of size nx.
  h[nh]         Pointer to filter coefficient vector of size nh. h is stored in reversed order: h(n-1), ..., h(0), where h[n] is at the lowest memory address. Memory alignment: h is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx]         Pointer to output data vector of size nx. r can be equal to x.
  dbuffer[nh]   Pointer to location containing the address of the delay buffer. Memory alignment: the delay buffer is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  des[nx]       Pointer to expected output array.
  step          Scale factor to control the learning curve rate = 2*mu.
  nh            Number of filter coefficients. Filter order = nh-1.
  nx            Length of input and output data vectors.
  oflag         Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Adaptive delayed LMS (least-mean-square) FIR filter using coefficients stored in vector h. Coefficients are updated after each sample based on the LMS algorithm, using a constant step = 2*mu. The real data input is stored in vector x. The filter output result is stored in vector r. The LMS algorithm is used for adaptation, but using the previous error and the previous sample ("delayed") to take advantage of the 'C54x LMS instruction.
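To make the "delayed" update concrete, the following is a floating-point sketch of a delayed LMS loop. It is only an illustration under stated assumptions, not the DSPLIB routine: dlms_ref, the linear (non-circular) delay line d of nh+1 elements, the in-order coefficient storage, and the prev_err state variable are all hypothetical, and no Q15 arithmetic or overflow flag is modeled.

```c
#include <stddef.h>

/* Delayed LMS sketch (illustrative only, not the DSPLIB code).
 * h        : nh coefficients, updated in place
 * d        : nh + 1 past inputs, d[k] = x(i-k); zero before the first block
 * prev_err : e(i-1), carried across calls; zero before the first block
 * step     : 2*mu */
void dlms_ref(const double *x, const double *des, double *r,
              double *h, double *d, double *prev_err,
              double step, size_t nh, size_t nx)
{
    for (size_t i = 0; i < nx; i++) {
        /* shift the delay line and insert the newest sample */
        for (size_t k = nh; k > 0; k--)
            d[k] = d[k - 1];
        d[0] = x[i];

        /* FIR portion: r(i) = sum h[k] * x(i-k) */
        double acc = 0.0;
        for (size_t k = 0; k < nh; k++)
            acc += h[k] * d[k];
        r[i] = acc;

        /* delayed adaptation: previous error times previous samples,
         * h[k] += 2*mu * e(i-1) * x(i-k-1) */
        for (size_t k = 0; k < nh; k++)
            h[k] += step * (*prev_err) * d[k + 1];

        *prev_err = des[i] - acc;   /* e(i), used on the next sample */
    }
}
```

Replacing `(*prev_err) * d[k + 1]` with the current error and `d[k]` would give the regular, non-delayed LMS update.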
Algorithm:
  FIR portion:
  \[ r[i] = \sum_{k=0}^{nh-1} h[k]\,x[i-k], \qquad 0 \le i \le nx \]
  Adaptation using the previous error and the previous sample:
  \[ e(i) = des(i) - r(i) \]
  \[ b_k(i+1) = b_k(i) + 2\mu\, e(i-1)\, x(i-k-1) \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: The delayed version is implemented to take advantage of the 'C54x LMS instruction. The effect on convergence is minimal. For reference, the following is the algorithm for the regular (non-delayed) LMS:
  FIR portion:
  \[ r[i] = \sum_{k=0}^{nh-1} h[k]\,x[i-k], \qquad 0 \le i \le nx \]
  Adaptation using the current error and the current sample:
  \[ e(i) = des(i) - r(i) \]
  \[ b_k(i+1) = b_k(i) + 2\mu\, e(i)\, x(i-k) \]
Example: See the examples/dlms subdirectory.
Benchmarks:
  Cycles   Core: 2*(nh-2)) (14+ 2*nh)
           Overhead:
  Code size (in 16-bit words):

expn    Exponential Base e

short oflag = expn (DATA *x, DATA *r, ushort nx)   (defined in expn.asm)

Arguments:
  x[nx]   Pointer to input vector of size nx. x contains numbers normalized between (-1,1) in Q15 format.
  r[nx]   Pointer to output data vector (Q3.12 format) of size nx. r can be equal to x.
  nx      Length of input and output data vectors.
  oflag   Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes the exponent of the elements of vector x using a Taylor series.

Algorithm: for (i=0; i<nx; i++) y(i) = e^x(i), where -1 < x(i) < 1

Overflow Handling Methodology: Not applicable
Special Requirements: Linker command file: you must allocate a .data section (for the polynomial coefficients).
Implementation Notes: Computes the exponent of the elements of vector x. It uses the following Taylor series:
  \[ \exp(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 \]
  where c5 = 0.0139, c4 = 0.0348, c3 = 0.1705, c2 = 0.4990, c1 = 1.0001.
Example: See the examples/expn subdirectory.
Benchmarks:
  Cycles   Core: 12*nx
           Overhead:
  Code size (in 16-bit words):

fir    FIR Filter

short oflag = fir (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh, ushort nx)

Arguments:
  x[nx]         Pointer to real input vector of nx real elements.
  h[nh]         Pointer to coefficient vector of size nh in normal order. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx]         Pointer to real output vector of nx real elements.
                In-place computation (r = x) is allowed.
  dbuffer[nh]   Delay buffer. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  nx            Number of real elements in vector x (input samples).
  nh            Number of coefficients.
  oflag         Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a real FIR filter (direct-form) using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\,x[j-k], \qquad 0 \le j \le nx \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: You can also use the convolution function for filtering, by having an input buffer x padded with nh-1 zeros at the beginning of the buffer. However, having an FIR filter implementation that uses a totally independent delay buffer (dbuffer) gives you more control over the relocation in memory of your data buffers in the case of a dual-buffering filtering scheme.
Example: See the examples/fir subdirectory.
Benchmarks:
  Cycles   Core: nx*(4+nh)
           Overhead:
  Code size (in 16-bit words):

firdec    Decimating FIR Filter

short oflag = firdec (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh, ushort nx, ushort D)   (defined in decimate.asm)

Arguments:
  x[nx]         Pointer to real input vector of nx real elements.
  h[nh]         Pointer to coefficient vector of size nh in normal order. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx/D]       Pointer to real output vector of nx/D real elements. In-place computation (r = x) is allowed.
  dbuffer[nh]   Delay buffer. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  nx            Number of real elements in vector x.
  nh            Number of coefficients.
  D             Decimation factor. For example, D = 2 means drop every other sample. Ideally, nx should be a multiple of D. If not, the trailing samples will be lost in the process.
  oflag         Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a decimating real FIR filter (direct-form) using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\,x[jD-k], \qquad 0 \le j \le nx \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: None
Example: See the examples/decim subdirectory.
Benchmarks:
  Cycles   Core: (nx/D)*(12+nh+4(D-1))
           Overhead:
  Code size (in 16-bit words):

firinterp    Interpolating FIR Filter

short oflag = firinterp (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh, ushort nx, ushort I)   (defined in interp.asm)

Arguments:
  x[nx]         Pointer to real input vector of nx real elements.
  h[nh]         Pointer to coefficient vector of size nh in normal order. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx*I]       Pointer to real output vector of nx*I real elements. In-place computation (r = x) is allowed.
  dbuffer[nh]   Delay buffer. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed.
                Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  nx            Number of real elements in vector x.
  nh            Number of coefficients.
  I             Interpolation factor. For example, I = 2 means you will add one sample result for every sample.
  oflag         Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes an interpolating real FIR filter (direct-form) using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ r[t] = \sum_{k=0}^{nh-1} h[k]\,x[t/I-k], \qquad 0 \le t \le nr \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: None
Example: See the examples/decimate subdirectory.
Benchmarks:
  Cycles   Core: nx*(6+(I-1)*(17+(nh/I)
           Overhead:
  Code size (in 16-bit words):

firs    Symmetric FIR Filter

short oflag = firs (DATA *x, DATA *r, DATA **dbuffer, ushort nh2, ushort nx)

Arguments:
  x[nx]            Pointer to real input vector of nx real elements.
  r[nx]            Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
  dbuffer[2*nh2]   Delay buffer of size nh = 2*nh2. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nh2).
  nx               Number of real elements in vector x (input samples).
  nh2              Half the number of coefficients of the filter (due to symmetry there is no need to provide the other half).
  oflag            Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a real FIR filter (direct-form) using the nh2 coefficients stored in the program location pointed to by the TI_LIB_COEFFS global label.
The filter is assumed to have a symmetric impulse response, with the first half of the filter coefficients stored in the locations pointed to by TI_LIB_COEFFS. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\,x[j-k], \qquad 0 \le j \le nx \]
where h is symmetric, with nh = 2*nh2. Only the first half of h is stored, in program memory pointed to by the TI_LIB_COEFFS global label.

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: Filter coefficients must be provided in program space, with a global label called TI_LIB_COEFFS pointing to the start of the coefficient table.
Implementation Notes: Although this routine is faster than the generic symmetric filter routine (firs2) also included in DSPLIB, it is restrictive in that the address of the coefficients is hard-coded to the global label TI_LIB_COEFFS in program memory. This could be a problem in the event that you want to use multiple filtering routines with different coefficient values. In that case, use the firs2 routine.
Example: See the examples/firs subdirectory.
Benchmarks:
  Cycles   Core: (16+nh)
           Overhead:
  Code size (in 16-bit words):

firs2    Symmetric FIR Filter (generic)

short oflag = firs2 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nh2, ushort nx)

Arguments:
  x[nx]            Pointer to real input vector of nx real elements.
  r[nx]            Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
  h[nh2]           Pointer to vector containing half of the filter coefficients. It is assumed that the filter has a symmetric impulse response (filter coefficients). The total number of filter coefficients is 2*nh2.
  dbuffer[2*nh2]   Delay buffer of size nh = 2*nh2. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed.
                   Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nh2).
  nx               Number of real elements in vector x (input samples).
  nh2              Half the number of coefficients of the filter (due to symmetry there is no need to provide the other half).
  oflag            Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a real FIR filter (direct-form) using the nh2 coefficients stored in array h (data memory). The filter is assumed to have a symmetric impulse response, so array h stores only the first half of the filter coefficients. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\,x[j-k], \qquad 0 \le j \le nx \]
where h is symmetric, with nh = 2*nh2. Only the first half of h is stored, in data memory.

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: Although this routine is slower than the symmetric filter routine (firs) also included in DSPLIB, it does not impose restrictions on the location of the coefficient vector, which allows multiple filtering routines in the same executable.
Example: See the examples/firs2 subdirectory.
Benchmarks:
  Cycles   Core: nx*(15 + 2*nh2)
           Overhead:
  Code size (in 16-bit words):

fltoq15    Float to Q15 Conversion

short errorcode = fltoq15 (float *x, DATA *r, ushort nx)   (defined in fltoq15.asm)

Arguments:
  x[nx]       Pointer to floating-point input vector of size nx. x should contain numbers normalized between (-1,1). The errorcode returned value will reflect whether that condition is not met.
  r[nx]       Pointer to output data vector of size nx containing the Q15 equivalent of vector x.
  nx          Length of input and output data vectors.
  errorcode   The function returns error codes for the following conditions:
              an element is too large to represent in Q15 format;
              an element is too small to represent in Q15 format;
              both conditions were encountered.

Description: Converts the IEEE floating-point numbers stored in vector x into Q15 numbers stored in vector r. The function returns error codes if any element x[i] is not representable in Q15 format. All values that exceed the size limit are saturated depending on sign (0x7fff if the value is positive, 0x8000 if the value is negative). All values too small to be correctly represented are truncated to 0.

Algorithm: Not applicable
Overflow Handling Methodology: Saturation is implemented for overflow handling.
Special Requirements: None
Implementation Notes: None
Example: See the examples/expn subdirectory.
Benchmarks:
  Cycles   Core: 40*nx
           Overhead:
  Code size (in 16-bit words):

hilb16    FIR Hilbert Transformer

short oflag = hilb16 (DATA *x, DATA *h, DATA *r, DATA *dbuffer, ushort nh, ushort nx)

Arguments:
  x[nx]         Pointer to real input vector of nx real elements.
  h[nh]         Pointer to coefficient vector of size nh in normal order. Every odd-valued filter coefficient has to be 0. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx]         Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
  dbuffer[nh]   Delay buffer. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  nx            Number of real elements in vector x (input samples).
  nh            Number of coefficients.
  oflag         Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a real FIR filter (direct-form) using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ r[j] = \sum_{k=0}^{nh-1} h[k]\,x[j-k], \qquad 0 \le j \le nx \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: Every odd-valued filter coefficient has to be 0. This is a requirement for the Hilbert transformer. For example, a filter may look like this: h = [0.876 0 -0.324 0 -0.002 0].
Implementation Notes: You can also use the convolution function for filtering, by having an input buffer x padded with nh-1 zeros at the beginning of the buffer. However, having a filter implementation that uses a totally independent delay buffer (dbuffer) gives you more control over the relocation in memory of your data buffers in the case of a dual-buffering filtering scheme.
Example: See the examples/fir subdirectory.
Benchmarks:
  Cycles   Core: nx*(4+nh)
           Overhead:
  Code size (in 16-bit words):

iir32    Double-precision IIR Filter

short oflag = iir32 (DATA *x, LDATA *h, DATA *r, LDATA **dbuffer, ushort nbiq, ushort nx)

Arguments:
  x[nx]             Pointer to input data vector of size nx.
  h[5*nbiq]         Pointer to the 32-bit filter coefficient vector. Each coefficient is stored as a low word and a high word; the five coefficients of the first biquad are followed by the five coefficients of the next biquad, and so on.
  r[nx]             Pointer to output data vector of size nx. r can be equal to x.
  dbuffer[3*nbiq]   Pointer to the address of the 32-bit delay line dbuffer. Each biquad has 3 consecutive delay line elements.
                    For example, for nbiq = 2:
                      d1(n-2) low, d1(n-2) high    (beginning of biquad 1)
                      d1(n-1) low, d1(n-1) high
                      d1(n) low,   d1(n) high
                      d2(n-2) low, d2(n-2) high    (beginning of biquad 2)
                      d2(n-1) low, d2(n-1) high
                      d2(n) low,   d2(n) high
                    In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(3*nbiq).
  nbiq              Number of biquads.
  nx                Number of elements of the input and output vectors.
  oflag             Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes a cascaded IIR filter of nbiq biquad sections using 32-bit coefficients and 32-bit delay buffers. The input data is assumed to be single-precision (16 bits). Each biquad section is implemented using Direct-form II. All biquad coefficients (5 per biquad) are stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function is more efficient for block-by-block filter implementation because of the C-calling overhead. However, it can be used for sample-by-sample filtering (nx=1).

Algorithm (for each biquad):
\[ d(n) = x(n) - a_1 d(n-1) - a_2 d(n-2) \]
\[ y(n) = b_0 d(n) + b_1 d(n-1) + b_2 d(n-2) \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: None
Example: See the examples/iir32 subdirectory.
Benchmarks:
  Cycles   Core: nx*(12 + 48*nbiq)
           Overhead:
  Code size (in 16-bit words):

iircas4    Cascaded IIR Direct Form II Using 4-Coefs per Biquad

short oflag = iircas4 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nbiq, ushort nx)   (defined in iir4cas4.asm)

Arguments:
  x[nx]             Pointer to input data vector of size nx.
  h[4*nbiq]         Pointer to filter coefficient vector holding, for each biquad i, its coefficients (... a1i ...), where i is the biquad index; for example,
                    a21 is the a2 coefficient of biquad 1. Pole (recursive) coefficients = a. Zero (non-recursive) coefficients = b.
  r[nx]             Pointer to output data vector of size nx. r can be equal to x.
  dbuffer[2*nbiq]   Pointer to the address of the delay line d. Each biquad has 2 delay line elements separated by nbiq locations, in the following format: d1(n-1), d2(n-1), ..., di(n-1), d1(n-2), d2(n-2), ..., di(n-2), where i is the biquad index (for example, d2(n-1) is the (n-1)th delay element for biquad 2). In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nbiq).
  nbiq              Number of biquads.
  nx                Number of elements of the input and output vectors.
  oflag             Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is implemented using Direct-form II. All biquad coefficients (4 per biquad) are stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function is more efficient for block-by-block filter implementation because of the C-calling overhead. However, it can be used for sample-by-sample filtering (nx=1).

Algorithm (for each biquad):
\[ d(n) = x(n) - a_1 d(n-1) - a_2 d(n-2) \]
\[ y(n) = d(n) + b_1 d(n-1) + b_2 d(n-2) \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: None
Example: See the examples/iircas4 subdirectory.
Benchmarks:
  Cycles   Core: 4*nbiq)
           Overhead:
  Code size (in 16-bit words):

iircas5    Cascaded IIR Direct Form II (5-Coefs per Biquad)

short oflag = iircas5 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nbiq, ushort nx)   (defined in iircas5.asm)

Arguments:
  x[nx]             Pointer to input data vector of size nx.
  h[5*nbiq]         Pointer to filter coefficient vector holding, for each biquad i, its coefficients (... a1i ...), where i is the biquad index; for example,
                    a21 is the a2 coefficient of biquad 1. Pole (recursive) coefficients = a. Zero (non-recursive) coefficients = b.
  r[nx]             Pointer to output data vector of size nx. r can be equal to x.
  dbuffer[2*nbiq]   Pointer to the address of the delay line d. Each biquad has 2 delay line elements separated by nbiq locations, in the following format: d1(n-1), d2(n-1), ..., di(n-1), d1(n-2), d2(n-2), ..., di(n-2), where i is the biquad index (for example, d2(n-1) is the (n-1)th delay element for biquad 2). In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(2*nbiq).
  nbiq              Number of biquads.
  nx                Number of elements of the input and output vectors.
  oflag             Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is implemented using Direct-form II. All biquad coefficients (5 per biquad) are stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function is more efficient for block-by-block filter implementation because of the C-calling overhead. However, it can be used for sample-by-sample filtering (nx=1). The use of 5 coefficients instead of 4 facilitates the design of filters with a unit gain of less than 1 (for overflow avoidance), which is typically achieved by filter coefficient scaling.

Algorithm (for each biquad):
\[ d(n) = x(n) - a_1 d(n-1) - a_2 d(n-2) \]
\[ y(n) = b_0 d(n) + b_1 d(n-1) + b_2 d(n-2) \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
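The Direct-form II recurrence above can be sketched in plain C as follows. This is a floating-point illustration only, not the DSPLIB routine: iir_df2_cascade, the biquad_coefs struct, and the per-biquad delay layout (two adjacent elements rather than elements separated by nbiq locations) are hypothetical choices made for readability, and no Q15 scaling or overflow flag is modeled.

```c
#include <stddef.h>

/* Hypothetical coefficient layout for one biquad (not the DSPLIB h[] order). */
typedef struct {
    double a1, a2;      /* pole (recursive) coefficients     */
    double b0, b1, b2;  /* zero (non-recursive) coefficients */
} biquad_coefs;

/* Cascade of Direct-form II biquads (illustrative only).
 * d holds 2 delay elements per biquad: d[2*i] = d(n-1), d[2*i+1] = d(n-2).
 * d must be zeroed before the first block and preserved between blocks. */
void iir_df2_cascade(const double *x, double *r, const biquad_coefs *c,
                     double *d, size_t nbiq, size_t nx)
{
    for (size_t n = 0; n < nx; n++) {
        double s = x[n];                      /* input to the first section */
        for (size_t i = 0; i < nbiq; i++) {
            double d1 = d[2 * i];
            double d2 = d[2 * i + 1];
            double dn = s - c[i].a1 * d1 - c[i].a2 * d2;      /* d(n) */
            s = c[i].b0 * dn + c[i].b1 * d1 + c[i].b2 * d2;   /* y(n) */
            d[2 * i + 1] = d1;                /* age the delay line */
            d[2 * i] = dn;
        }
        r[n] = s;                             /* output of the last section */
    }
}
```

Fixing b0 at 1 in this sketch gives the 4-coefficient variant, where gain control has to come from scaling the remaining coefficients.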
Special Requirements: None
Implementation Notes: None
Example: See the examples/iircas5 subdirectory.
Benchmarks:
  Cycles   Core: 5*nbiq)
           Overhead:
  Code size (in 16-bit words):

iircas51    Cascaded IIR Direct Form I (5-Coefs per Biquad)

short oflag = iircas51 (DATA *x, DATA *h, DATA *r, DATA **dbuffer, ushort nbiq, ushort nx)   (defined in iircas51.asm)

Arguments:
  x[nx]             Pointer to input data vector of size nx.
  h[5*nbiq]         Pointer to filter coefficient vector holding, for each biquad i, its coefficients (... b0i ...), where i is the biquad index (for example, a21 is the a2 coefficient of biquad 1). Pole (recursive) coefficients = a. Zero (non-recursive) coefficients = b.
  r[nx]             Pointer to output data vector of size nx. r can be equal to x.
  dbuffer[4*nbiq]   Pointer to the address of the delay line dbuffer. Each biquad has 4 delay line elements stored consecutively in memory, in the following format: x1(n-1), x1(n-2), y1(n-1), y1(n-2), ..., xi(n-1), xi(n-2), yi(n-1), yi(n-2), where i is the biquad index (for example, x1(n-1) is the (n-1)th delay element for biquad 1). In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: no memory alignment is needed.
  nbiq              Number of biquads.
  nx                Number of elements of the input and output vectors.
  oflag             Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes a cascade IIR filter of nbiq biquad sections. Each biquad section is implemented using Direct-form I. All biquad coefficients (5 per biquad) are stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. The use of 5 coefficients instead of 4 facilitates the design of filters with a unit gain of less than 1 (for overflow avoidance), which is typically achieved by filter coefficient scaling.
Algorithm (for each biquad):
\[ y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) - a_1 y(n-1) - a_2 y(n-2) \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: This implementation does not use circular addressing for the delay buffer. Instead, it takes advantage of the 'C54x DELAY instruction. For this reason, the delay buffer pointer does not change between successive block calls.
Example: See the examples/iircas51 subdirectory.
Benchmarks:
  Cycles   Core: 8*nbiq)
           Overhead:
  Code size (in 16-bit words):

iirlat    Lattice Inverse (IIR) Filter

short oflag = iirlat (DATA *x, DATA *h, DATA *r, DATA *d, …)

Arguments:
  x[nx]   Pointer to real input vector of nx real elements.
  h[nh]   Pointer to lattice coefficient vector of size nh in normal order. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx]   Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
  d[nh]   Delay buffer. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  nx      Number of real elements in vector x (input samples).
  nh      Number of coefficients.
  oflag   Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a real lattice IIR filter implementation using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ e_N[n] = x[n] \]
\[ e_{i-1}[n] = e_i[n] + h_i\, e'_{i-1}[n-1], \qquad i = N, (N-1), \ldots, 1 \]
\[ e'_i[n] = -h_i\, e_{i-1}[n] + e'_{i-1}[n-1], \qquad i = N, (N-1), \ldots, 1 \]
\[ y[n] = e_0[n] = e'_0[n] \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes: None
Example: See the examples/iirlat subdirectory.
Benchmarks:
  Cycles   Core: nx[(3*nh)
           Overhead:
  Code size (in 16-bit words):

firlat    Lattice Forward (FIR) Filter

short oflag = firlat (DATA *x, DATA *h, DATA *r, DATA *d, …)

Arguments:
  x[nx]   Pointer to real input vector of nx real elements.
  h[nh]   Pointer to lattice coefficient vector of size nh in normal order. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  r[nx]   Pointer to real output vector of nx real elements. In-place computation (r = x) is allowed.
  d[nh]   Delay buffer. In the case of multiple-buffering schemes, this array should be initialized to 0 for the first block only. Between consecutive blocks, the delay buffer preserves the previous elements needed. Memory alignment: this is a circular buffer and must start at a k-bit boundary (that is, the k LSBs of the starting address must be zeros), where k = log2(nh).
  nx      Number of real elements in vector x (input samples).
  nh      Number of coefficients.
  oflag   Overflow error flag. oflag = 1: a 32-bit data overflow occurred in an intermediate or final result. oflag = 0: no 32-bit data overflow occurred.

Description: Computes a real lattice FIR filter implementation using the coefficients stored in vector h. The real data input is stored in vector x. The filter output result is stored in vector r. This function retains the address of the delay filter memory containing the previous delayed values to allow consecutive processing of blocks. This function can be used for both block-by-block and sample-by-sample filtering (nx=1).

Algorithm:
\[ e_0[n] = e'_0[n] = x[n] \]
\[ e_i[n] = e_{i-1}[n] - h_i\, e'_{i-1}[n-1], \qquad i = 1, 2, \ldots, N \]
\[ e'_i[n] = -h_i\, e_{i-1}[n] + e'_{i-1}[n-1], \qquad i = 1, 2, \ldots, N \]
\[ y[n] = e_N[n] \]

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
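As a rough illustration of a forward (FIR) lattice structure, the sketch below runs the stage recursion in floating point. Several things here are assumptions rather than facts about DSPLIB: the name firlat_ref, the linear delay array d, and in particular the sign convention (both cross terms use -h[i]), since the operators in the source equations did not survive extraction. No Q15 scaling or overflow flag is modeled.

```c
#include <stddef.h>

/* FIR lattice sketch (illustrative only, sign convention assumed).
 * h : nh lattice coefficients, stage i uses h[i]
 * d : nh delay elements, d[i] holds the backward signal entering stage i
 *     from the previous sample; zero before the first block */
void firlat_ref(const double *x, const double *h, double *r,
                double *d, size_t nh, size_t nx)
{
    for (size_t n = 0; n < nx; n++) {
        double f = x[n];   /* forward signal entering the first stage  */
        double g = x[n];   /* backward signal entering the first stage */
        for (size_t i = 0; i < nh; i++) {
            double g_delayed = d[i];                 /* backward input, one sample old */
            double f_next = f - h[i] * g_delayed;    /* forward output of this stage   */
            double g_next = -h[i] * f + g_delayed;   /* backward output of this stage  */
            d[i] = g;                                /* store for the next sample      */
            f = f_next;
            g = g_next;
        }
        r[n] = f;          /* output is the last forward signal */
    }
}
```

With a single stage and h[0] = 0.5, this convention yields the impulse response 1, -0.5, which is a quick way to check which sign a given implementation uses.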
Special Requirements: None
Implementation Notes: None
Example: See the examples/firlat subdirectory.
Benchmarks:
  Cycles   Core: nx[(3*nh)
           Overhead:
  Code size (in 16-bit words):

log_2    Base 2 Logarithm

short oflag = log_2 (DATA *x, LDATA *r, ushort nx)   (defined in log_2.asm)

Arguments:
  x[nx]   Pointer to input vector of size nx.
  r[nx]   Pointer to output data vector (Q31 format) of size nx.
  nx      Length of input and output data vectors.
  oflag   Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes the log base 2 of the elements of vector x using a Taylor series.

Algorithm: for (i=0; i<nx; i++) y(i) = log2 x(i)

Overflow Handling Methodology: No scaling is implemented for overflow prevention.
Special Requirements: None
Implementation Notes:
  y = 1.4427 * ln(x), with x = M(x)*2^P(x) = M*2^P
  y = 1.4427 * (ln(M) + ln(2)*P)
  y = 1.4427 * (ln(2*M) + (P-1)*ln(2))
  y = 1.4427 * (ln((2*M-1)+1) + (P-1)*ln(2))
  y = 1.4427 * (f(2*M-1) + (P-1)*ln(2))
  with f(u) = ln(1+u).
  f(u) is evaluated with a polynomial approximation:
  \[ f(u) \approx C_0 + C_1 u + C_2 u^2 + C_3 u^3 + C_4 u^4 + C_5 u^5 + C_6 u^6 \]
  The polynomial coefficients are as follows:
    C0 = 0.000, C1 = 0.999, C2 = -0.497, C3 = 0.315, C4 = -0.190, C5 = 0.082, C6 = -0.017
  The coefficients used in the calculation are derived from these as follows:
    Decimal    Hex
    1581d      0062Dh
    16381d     03FFDh
    -16298d    0C056h
    20693d     050D5h
    -24950d    09E8Ah
    21677d     054ADh
    -9130d     0DC56h
Example: See the examples/log_2 subdirectory.
Benchmarks:
  Cycles   Core: 60*nx
           Overhead:
  Code size (in 16-bit words):

log_10    Base 10 Logarithm

short oflag = log_10 (DATA *x, LDATA *r, ushort nx)   (defined in log_10.asm)

Arguments:
  x[nx]   Pointer to input vector of size nx.
  r[nx]   Pointer to output data vector (Q31 format) of size nx.
  nx      Length of input and output data vectors.
  oflag   Overflow flag. oflag = 1: a 32-bit overflow occurred. oflag = 0: no 32-bit overflow occurred.

Description: Computes the log base 10 of the elements of vector x using a Taylor series.
(i=0; i<nx; i++) y(i)= log10 x(i) where x(i) Overflow Handling Methodology: scaling implemented overflow prevention Special Requirements: None Implementation Notes: 0.4343 ln(x) with M(x)*2^P(x) M*2^P 0.4343 (ln(M) ln(2)*P) 0.4343 (ln(2*M) (P-1)*ln(2)) 0.4343 (ln((2*M-1)+1) (P-1)*ln(2)) 0.4343 (f(2*M-1) (P-1)*ln(2)) with f(u) ln(1+u). polynomial approximation f(u): f(u) polynomial coefficients follows: 0.000 0.999 -0.497 0.315 -0.190 0.082 -0.017 Optimized Library Programmers TMS320C54x SPRA480B coefficients used calculation derived from follows: 1581d 16381d -16298d 20693d -24950d 21677d -9130d 0062Dh 03FFDh 0C056h 050D5h 09E8Ah 054ADh 0DC56h Example: examples/log_10 subdirectory Benchmarks: Cycles Core: 55*nx Overhead Code size 16-bit words) Optimized Library Programmers TMS320C54x SPRA480B logn Base Logarithm (natural logarithm) short oflag logn (DATA LDATA ushort (defined logn.asm) Arguments: x[nx] r[nx] oflag Pointer input vector size Pointer output data vector (Q31 format) size Length input output data vectors Overflow flag oflag 32-bit overflow occurred oflag 32-bit overflow occurred Description: Algorithm: Computes base elements vector using Taylor series. (i=0; i<nx; i++) y(i)= logn x(i) where x(i) Overflow Handling Methodology: scaling implemented overflow prevention Special Requirements: None Implementation Notes: 0.4343 ln(x) with M(x)*2^P(x) M*2^P 0.4343 (ln(M) ln(2)*P) 0.4343 (ln(2*M) (P-1)*ln(2)) 0.4343 (ln((2*M-1)+1) (P-1)*ln(2)) 0.4343 (f(2*M-1) (P-1)*ln(2)) with f(u) ln(1+u). 
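The range reduction shared by the three logarithm routines, x = M*2^P followed by ln(x) = f(2*M-1) + (P-1)*ln(2) with f(u) = ln(1+u), can be tried out in portable C. The sketch below is a floating-point illustration under stated assumptions: the function names are hypothetical, the seven rounded coefficients listed in the implementation notes are assumed to be in ascending powers of u, and the accuracy (roughly three decimal digits) reflects that rounding rather than the fixed-point library code.

```c
#include <assert.h>

/* Rounded coefficients from the implementation notes, assumed to be
 * c0..c6 of f(u) = c0 + c1*u + ... + c6*u^6 approximating ln(1+u). */
static const double LN_COEF[7] = {
    0.000, 0.999, -0.497, 0.315, -0.190, 0.082, -0.017
};

/* Horner evaluation of the polynomial for 0 <= u < 1. */
static double ln1p_poly(double u)
{
    double acc = 0.0;
    for (int k = 6; k >= 0; k--)
        acc = acc * u + LN_COEF[k];
    return acc;
}

/* ln(x) for x > 0: normalize x = m * 2^p with 0.5 <= m < 1, then
 * ln(x) = f(2*m - 1) + (p - 1) * ln(2). */
double ln_approx(double x)
{
    int p = 0;
    double m = x;
    assert(x > 0.0);
    while (m >= 1.0) { m *= 0.5; p++; }
    while (m < 0.5)  { m *= 2.0; p--; }
    return ln1p_poly(2.0 * m - 1.0) + (p - 1) * 0.6931471805599453;
}

/* Base conversions: 1/ln(2) ~ 1.4427 and 1/ln(10) ~ 0.4343. */
double log2_approx(double x)  { return 1.4426950408889634 * ln_approx(x); }
double log10_approx(double x) { return 0.4342944819032518 * ln_approx(x); }
```

For a natural logarithm the scale factor is 1, so ln_approx is used directly; only the base-2 and base-10 variants multiply by a conversion constant.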
A polynomial approximation is used for f(u). The polynomial coefficients are as follows:
0.000 0.999 -0.497 0.315 -0.190 0.082 -0.017
The coefficients used in the calculation are derived from these as follows:
1581d 16381d -16298d 20693d -24950d 21677d -9130d
0062Dh 03FFDh 0C056h 050D5h 09E8Ah 054ADh 0DC56h
Example: examples/logn subdirectory
Benchmarks: Cycles Core: 39*nx Overhead Code size 16-bit words)

maxidx Index of the Maximum Element of a Vector

short maxidx (DATA ushort (defined maxidx.asm)

Arguments:
x[nx]: Pointer to input vector of size nx.
r: Index of the vector element with maximum value.
nx: Length of input data vector.
Description: Returns the index of the maximum element of a vector. In case of multiple maximum elements, r contains the index of the last maximum element found.
Algorithm: Not applicable
Overflow Handling Methodology: Not applicable
Special Requirements: None
Implementation Notes: None
Example: examples/maxidx subdirectory
Benchmarks: Cycles Core: 3*nx even) approx 3*nx Overhead Code size 16-bit words)

maxval Maximum Value of a Vector

short maxval (DATA ushort (defined maxval.asm)

Arguments:
x[nx]: Pointer to input vector of size nx.
r: Maximum value of the vector.
nx: Length of input data vector.
Description: Returns the maximum element of a vector.
Algorithm: Not applicable
Overflow Handling Methodology: Not applicable
Special Requirements: None
Implementation Notes: None
Example: examples/maxval subdirectory
Benchmarks: Cycles Core: 2*nx Overhead Code size 16-bit words)

minidx Index of the Minimum Element of a Vector

short minidx (DATA ushort (defined minidx.asm)

Arguments:
x[nx]: Pointer to input vector of size nx.
r: Index of the vector element with minimum value.
nx: Length of input data vector.
Description: Returns the index of the minimum element of a vector. In case of multiple minimum elements, r contains the index of the last minimum element found.
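The tie-breaking rule stated for maxidx and minidx (the index of the last extreme element is returned) is easy to get wrong when writing a host-side reference. A minimal C sketch follows; the function names are hypothetical and the code assumes nx is at least 1.

```c
/* Host-side reference helpers (hypothetical names, not the DSPLIB routines).
 * Using <= and >= instead of < and > makes a later equal element replace
 * an earlier one, so the LAST minimum or maximum index is returned. */
short minidx_ref(const short *x, unsigned short nx)
{
    unsigned short idx = 0;
    for (unsigned short i = 1; i < nx; i++) {
        if (x[i] <= x[idx])
            idx = i;
    }
    return (short)idx;
}

short maxidx_ref(const short *x, unsigned short nx)
{
    unsigned short idx = 0;
    for (unsigned short i = 1; i < nx; i++) {
        if (x[i] >= x[idx])
            idx = i;
    }
    return (short)idx;
}
```

With the input {3, -7, 2, -7, 5}, the minimum value -7 occurs at indices 1 and 3, and minidx_ref returns 3.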
Algorithm: Not applicable
Overflow Handling Methodology: Not applicable
Special Requirements: None
Implementation Notes: Different implementation than maxidx because the cmps instruction cannot be used with min.
Example: examples/minidx subdirectory
Benchmarks: Cycles Core: 5*nx Overhead Code size 16-bit words)

minval Minimum Value of a Vector

short minval (DATA ushort (defined minval.asm)

Arguments:
x[nx]: Pointer to input vector of size nx.
r: Minimum value of the vector.
nx: Length of input data vector.
Description: Returns the minimum element of a vector.
Algorithm: Not applicable
Overflow Handling Methodology: Not applicable
Special Requirements: None
Implementation Notes: None
Example: examples/minval subdirectory
Benchmarks: Cycles Core: 2*nx Overhead Code size 16-bit words)

mmul Matrix Multiplication

short oflag mmul (DATA *x1,short row1,short col1,DATA *x2,short row2,short col2,DATA

Arguments:
x1[row1*col1]: Pointer to input matrix of size row1*col1.
row1: Number of rows in the first input matrix.
col1: Number of columns in the first input matrix.
x2[row2*col2]: Pointer to input matrix of size row2*col2.
row2: Number of rows in the second input matrix.
col2: Number of columns in the second input matrix.
r[row1*col2]: Pointer to output matrix of size row1*col2.
Description: Multiplies input matrix x1 by input matrix x2 using nested loops.
Algorithm: For each i and k: temp = 0; for each j: temp = temp + A(i,j) * B(j,k); then C(i,k) = temp.
Overflow Handling Methodology: Not applicable
Special Requirements: Verify that the dimensions of the input matrices are legal.
Implementation Notes: None
Example: examples/minval subdirectory
Benchmarks: Cycles Overhead Core: row1*(7+(11+(6*col1))*col2) Code size 16-bit words)

mtrans Matrix Transpose

short oflag mtrans(DATA DATA ushort (defined mtrans.asm) Arguments: x[row*col] r[row*col] Description: Algorithm: Pointer input matrix. In-place processing allowed.
Number rows matrix Number columns matrix Pointer output data vector size containing This function transponse matrix C(j,i) A(i,j) Overflow Handling Methodology: Scaling implemented overflow prevention (User selectable) Special Requirements: None Implementation Notes: None Example: examples/mtrans subdirectory Benchmarks: Cycles Core: [5+(col*6)] Overhead Code size 16-bit words) Optimized Library Programmers TMS320C54x SPRA480B mul32 32-bit Vector Multiply short oflag mul32(LDATA LDATA LDATA ushort (defined mul32.asm) Arguments: x[nx] y[nx] r[nx] oflag Pointer input data vector size In-place processing allowed Pointer input data vector size Pointer output data vector size containing Number elements input output vectors Overflow flag. oflag 32-bit overflow occurred oflag 32-bit overflow occurred Description: Algorithm: This function multiply 32-bit vectors, element element, produce 32-bit vector. (i=0; i++) Overflow Handling Methodology: Scaling implemented overflow prevention (User selectable) Special Requirements: None Implementation Notes: None Example: examples/add subdirectory Benchmarks: Cycles Core: 7*nx Overhead Code size 16-bit words) Optimized Library Programmers TMS320C54x SPRA480B nblms Normalized Block Block Filter short oflag nblms (DATA *x,DATA *h,DATA DATA **dbuffer, DATA *des, ushort ushort ushort DATA **norm_e, l_tau, cutoff, gain) (defined nblms.asm) Arguments: x[nx] h(nh) Input data vector size (reference input) Pointer filter coefficient vector size stored reversed order: h(n-1), h(0) where h[n] lowest memory address. 
Memory alignment: circular buffer must start k-bit boundary (that LSBs starting address must zeros) where log2(nh) r[nx] dbuffer[nh] Pointer output data vector size equal Pointer location containing address delay buffer Memory alignment: delay buffer circular buffer must start k-bit boundary (that LSBs starting address must zeros) where log2(nh) des[nx] bsize norm_e l_tau cutoff gain oflag Pointer expected output array Number filter coefficients. Filter order nh-1. Length input output data vectors number blocks blocksize (number coefficients updated each input sample) Note: (number coefficients) nb*bsize pointer normalized error buffer decay constant long-term filtering power estimate lowest allowed value power estimate step size constant: 2*beta= beta1/abs_power 2^(gain) abs_power Overflow flag oflag 32-bit overflow occurred oflag 32-bit overflow occurred Optimized Library Programmers TMS320C54x SPRA480B Description: Normalized Delayed (NDLMS) Block implementation using coefficients stored vector Coefficients updated after each sample based algorithm. real data input stored vector filter output result stored vector algorithm used adaptation uses previous error previous sample ("delayed") takes advantage 'C54x instruction. Restrictions: This version does allow consecutive calls this routine dual buffering fashion. Algorithm: portion more detailed description algorithm, refer [4]. r[i] <=nx Adaptation using previous error previous sample e(i) d(i) y(i); var(i) (1-beta)*var(i-1) beta*[abs(x(i)) cutoff]; (j=0: j<nb; j++) bkj(i+1) bkj(i) [2*mu*e(i)*x(i-k)]/[var(i)^2] Overflow Handling Methodology: scaling implemented overflow prevention Special Requirements: Linker command file: must allocate .ebuffer section (for polynomial coefficients) (error) (signal power estimate) Implementation Notes: Delayed version implemented take advantage 'C54x instruction. Effect covergence minimum. 
reference, following algorithm regular (non-delayed): portion r[i] <=nx Adaptation using current error current sample: e(i) des(i)- r(i) bk(i+1) bk(i) 2*mu*e(i)*x(i-k) Example: examples/ndlms subdirectory Optimized Library Programmers TMS320C54x SPRA480B Benchmarks: Cycles Core: Overhead Code size 16-bit words) Optimized Library Programmers TMS320C54x SPRA480B ndlms Normalized Delayed Filter short oflag ndlms (DATA DATA DATA DATA *dbuffer, DATA *des, ushort ushort l_tau, cutoff, gain, DATA *norm_d) (defined ndlms.asm) Arguments: x[nx] h(nh) input data vector size (reference input) Pointer filter coefficient vector size stored reversed order h(n-1), h(0) where h[n] lowest memory address. Memory alignment: circular buffer must start k-bit boundary (that LSBs starting address must zeros) where log2(nh) r[nx] dbuffer[nh] Pointer output data vector size equal Pointer location containing address delay buffer Memory alignment: delay buffer circular buffer must start k-bit boundary (that LSBs starting address must zeros) where log2(nh) des[nx] l_tau cutoff gain norm_d oflag Pointer expected output array Number filter coefficients. Filter order nh-1. Length input output data vectors Decay constant long-term filtering power estimate lowest allowed value power estimate step size constant: 2*beta= beta1/abs_power 2^(gain) abs_power pointer normalized delay buffer Overflow flag. oflag 32-bit overflow occurred oflag 32-bit overflow occurred Description: Normalized Delayed (NDLMS) Block implementation using coefficients stored vector Coefficients updated after each sample based algorithm. real data input stored vector filter output result stored vector algorithm used adaptation using previous error previous sample ("delayed") take advantage 'C54x instruction. Restrictions: This version does allow consecutive calls this routine dual buffering fashion. Algorithm: more detailed description algorithm, refer [4]. 
portion: r[i] <=nx
Adaptation using previous error and previous sample:
e(i) = des(i) - r(i)
var(i) (1-beta)*var(i-1) beta*[abs(x(i)) cutoff];
bk(i+1) bk(i)
Overflow Handling Methodology: No scaling implemented for overflow prevention
Special Requirements: None
Implementation Notes: Delayed version implemented to take advantage of the 'C54x LMS instruction. The effect on convergence is minimal.
Example: examples/ndlms subdirectory
Benchmarks: Cycles Core: Overhead Code size 16-bit words)

neg Vector Negate

short oflag neg (DATA DATA ushort (defined neg.asm)

Arguments:
x[nx]: Pointer to input data vector of size nx. In-place processing allowed.
r[nx]: Pointer to output data vector of size nx. In-place processing allowed. Special cases: x[i] 32768, then 32767 with oflag; 32767, then 32768 with oflag.
nx: Number of elements of input and output vectors.
oflag: Overflow flag, indicating whether a 32-bit overflow occurred. This should be taken as a warning: overflow in negation of a number can happen naturally when negating (-1).
Description: This function negates each of the elements of a vector (fractional values).
Algorithm: for (i=0; i<nx; i++) r(i) = -x(i)
Overflow Handling Methodology: Saturation implemented for overflow handling
Special Requirements: None
Implementation Notes: None
Example: examples/neg subdirectory
Benchmarks: Cycles Core: Overhead Code size 16-bit words)

neg32 Vector Negate (double-precision)

short oflag neg32 (LDATA LDATA ushort (defined neg32.asm)

Arguments:
x[nx]: Pointer to input data vector of size nx. In-place processing allowed.
r[nx]: Pointer to output data vector of size nx. In-place processing allowed. Special cases: 32768*2^16 then 32767*2^16 with oflag; 32767*2^16 then 32768*2^16 with oflag.
nx: Number of elements of input and output vectors.
oflag: Overflow flag, indicating whether a 32-bit overflow occurred. This should be taken as a warning: overflow in negation of a number can happen naturally when negating (-1).
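The special case noted for the 16-bit neg routine, where negating the most negative value cannot be represented and the result saturates, can be illustrated with a short C sketch. The function name is hypothetical, and raising the flag on 16-bit saturation is an assumption made for illustration; the library's exact flag semantics are defined by its assembly source.

```c
#include <stdint.h>

/* Saturating Q15 negate (hypothetical reference, not the DSPLIB routine).
 * -(-32768) = +32768 does not fit in 16 bits, so the result is clipped
 * to +32767 and the returned flag is set. In-place use (r == x) is fine
 * because each element is read before it is written. */
short neg_q15_ref(const int16_t *x, int16_t *r, unsigned short nx)
{
    short oflag = 0;
    for (unsigned short i = 0; i < nx; i++) {
        int32_t v = -(int32_t)x[i];      /* widen first so -(-32768) is exact */
        if (v > INT16_MAX) {
            v = INT16_MAX;               /* saturate instead of wrapping */
            oflag = 1;
        }
        r[i] = (int16_t)v;
    }
    return oflag;
}
```

In fractional terms, -32768 is -1.0 in Q15 and +32767 is the largest value just below +1.0, which is why negating (-1) triggers the warning described above.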
Description: This function negates each of the elements of a vector (fractional values).
Algorithm: for (i=0; i<nx; i++) r(i) = -x(i)
Overflow Handling Methodology: Saturation implemented for overflow handling
Special Requirements: None
Implementation Notes: None
Example: examples/neg32 subdirectory
Benchmarks: Cycles Core: 4*nx Overhead Code size 16-bit words)

power Vector Power

short oflag power (DATA LDATA ushort (defined power.asm)

Arguments:
x[nx]: Pointer to input data vector of size nx. In-place processing allowed.
r[1]: Pointer to output data vector element format. Special cases: 32768*2^16 then 32767*2^16 with oflag; 32767*2^16 then 32768*2^16 with oflag.
nx: Number of elements of input vectors.
oflag: Overflow flag, indicating whether a 32-bit overflow occurred.
Description: This function calculates the power (sum of products) of a vector.
Algorithm: power = 0; for (i=0; i<nx; i++) power += x(i) * x(i)
Overflow Handling Methodology: No scaling implemented for overflow handling
Special Requirements: None
Implementation Notes: None
Example: examples/power subdirectory
Benchmarks: Cycles Core: Overhead Code size 16-bit words)

q15tofl Q15 to Float Conversion

void q15tofl (DATA float ushort (defined q152fl.asm)

Arguments:
x[nx]: Pointer to Q15 input vector of size nx.
r[nx]: Pointer to floating-point output data vector of size nx containing the floating-point equivalent of vector x.
nx: Length of input and output data vectors.
Description: Converts the Q15 values stored in vector x to IEEE floating point numbers stored in vector r.
Algorithm: Not applicable
Overflow Handling Methodology: Saturation implemented for overflow handling
Special Requirements: None
Implementation Notes: None
Example: examples/ug subdirectory
Benchmarks: Cycles Core: 11+36*nx Overhead Code size 16-bit words)

rand16init Initialize Random Number Generator

void rand16init(void) (defined rand16i.asm)

Arguments: None
Description: Initializes the seed for the random number generation routine.
Algorithm: Not applicable
Overflow Handling
Methodology: scaling implemented overflow handling Special Requirements: Implementation Notes: Allocation .bss section required linker command file. This function initializes global variable rndnum global memory used random number generation routine (rand16) Example: examples/rand subdirectory Benchmarks: Cycles Code size 16-bit words) Total Optimized Library Programmers TMS320C54x SPRA480B rand16 Random Vector Generation short oflag rand16(DATA ushort (defined rand16.asm) Arguments: x[nx] oflag Pointer input data vector size Number elements input output vectors Overflow flag oflag 32-bit overflow occurred oflag 32-bit overflow occurred Description: Algorithm: Computes vector random numbers Linear Congruential Method Overflow Handling Methodology: applicable Special Requirements: None Implementation Notes: None Example: examples/rand16 subdirectory Benchmarks: Cycles Core: nx*4 Overhead Code size 16-bit words) Optimized Library Programmers TMS320C54x SPRA480B recip16 16-bit Reciprocal Function void recip16 (DATA DATA DATA *rexp, ushort (defined recip16.asm) Arguments: x[nx] r[nx] rexp[nx] Description: Pointer input data vector size Pointer output data buffer Pointer exponent buffer output values. These exponent values integer format. Number elements input output vectors This routine returns fractional exponential portion reciprocal number. Since reciprocal always greater than returns exponent such that: r[i] rexp[i] true reciprocal floating-point Appendix-Calculating reciprocal number Algorithm: Overflow Handling Methodology: None Special Requirements: None Implementation Notes: None Example examples/recip16 subdirectory Benchmarks: Cycles Core: Overhead Code size 16-bit words) words words data space Optimized Library Programmers TMS320C54x SPRA480B rfft Forward Real (in-place) void rfft (DATA short scale) (defined rfft#.asm where #=nx) Arguments: x[nx] Pointer input vector containing real elements bit-reversed order. 
output, vector contains half (nx/2 complex elements) output following order. Real symmetric function around Nyquist point, this reason only half FFT(x) elements required. output will contain FFT(x) following format: y(0)Re y(nx/2)im Nyquist y(1)Re y(1)Im y(2)Re y(2)Im y(nx/2)Re y(nx/2)Im Complex numbers stored Re-Im format must aligned 2*nx boundary, where size. log(nx) LSBits address must zero Number real elements vector must constant number (not variable) take following values. =16,32,64,128,256,512,1024 scale Flag indicate whether scaling should implemented during computation. (scale scale factor else scale factor Description: Computes Radix-2 real real elements stored vector bit-reversed order. original content vector destroyed process. first nx/2 complex elements FFT(x) stored vector normal-order. (DFT) Algorithm: y[k] 1/(scale factor) (cos(2 sin(2 Overflow Handling Methodology: Scaling implemented overflow prevention (See section 6.3) Optimized Library Programmers TMS320C54x SPRA480B Special Requirements: Special linker command file sections required: .sintab (containing twiddle table). .sintab section size refer benchmark information below. This function requires inclusion other files during assembling (automatically included): macros.asm (contains macros used this code) sintab.q15 (contains twiddle table section .sintab) unpack.asm (containing code unpacking results) Implementation Notes: Implemented complex size nx/2 followed unpack stage unpack real results. Therefore, implementation Notes cfft function apply this case. Notice that normally real sequence size produces complex sequence size real numbers) that will input sequence. accomodate results without requiring extra memory locations, output reflects only half spectrum (complex output). This still provides full information because real sequence even symmetry around center nyquist point(N/2). 
Special debugging consideration: This function implemented macro that invokes different routines according size. consequence, instead rfft symbol being defined, multiple rfft# symbols (where real size) When scale this routine prevents overflow scaling each intermediate stages unpacking stage. Example: examples/rfft subdirectory Benchmarks: cycles (butterfly core only) size 1024 Cycles(Note) 'C541 1160 2516 5470 11881 25716 Code-size (words) .text section 1517 data-size (words) .sintab section Note: Assumes data on-chip dual access that there conflict twiddle table reads instruction fetches (provided linker command file reflects that) Optimized Library Programmers TMS320C54x SPRA480B rifft Inverse Real (in-place) void rifft (DATA short scale) (defined rifft#.asm where #=nx) Arguments: x[nx] Pointer input vector containing real elements bit-reversed order, shown below Y(0)Re y(nx/2)im Nyquist y(2)Re y(2)Im y(1)Re y(1)Im y(nx/2)Re y(nx/2)Im where fft(x) output, vector contains complex elements corresponding IFFT(x) signal itself. Complex numbers stored Re-Im format must aligned 2*nx boundary, where IFFT size. log(nx) LSBits address must zero Number real elements vector must constant number (not variable) take following values. =16,32,64,128,256,512,1024 scale Flag indicate whether scaling should implemented during computation. (scale scale factor else scale factor Description: Computes Radix-2 real IFFT real elements stored vector bitreversed order. original content vector destroyed process. nx/2 complex elements IFFT(x) stored vector normal-order. (IDFT) Algorithm: y[k] 1/(scale factor) (cos(2 sin(2 Overflow Handling Methodology: Scaling implemented overflow prevention Optimized Library Programmers TMS320C54x SPRA480B Special Requirements: Special linker command file sections required: .sintab (containing twiddle table). .sintab section size refer benchmark information below. 
This function requires inclusion other files during assembling (automatically included): macrosi.asm (contains macros used this code) sintab.q15 (contains twiddle table section .sintab) unpacki.asm (containing code unpacking results) Implementation Notes: Implemented complex IFFT size nx/2 followed unpack stage unpack real IFFT results. Therefore, implementation Notes cfft function apply this case. Notice that normally IFFT real sequence size produces complex sequence size real numbers) that will input sequence. accomodate results without requiring extra memory locations, output reflects only half spectrum (complex output). This still provides full information because IFFT real sequence even symmetry around center nyquist point(N/2). Special debugging consideration: This function implemented macro that invokes different IFFT routines according size. consequence, instead rfft symbol being defined, multiple rifft# symbols (where IFFT real size) When scale this routine prevents overflow scaling each IFFT intermediate stages unpacking stage. Example: examples/rifft subdirectory Benchmarks: cycles (butterfly core only) IFFT size 1024 Cycles (Note) 'C541 1160 2516 5470 11881 25716 Code-size (words) .text section data-size (words) .sintab section 1517 Note: Assumes data on-chip dual access that there conflict twiddle table reads instruction fetches (provided linker command file reflects that) Optimized Library Programmers TMS320C54x SPRA480B sine Sine short oflag sine (DATA DATA ushort (defined sine.asm) Arguments: x[nx] Pointer input vector size contains angle radians between [-pi, normalized between [-1,1) format xrad example: pi/4 will equivalent 0.25 0x200 format. r[nx] oflag Pointer output vector containing sine vector format Number elements input output vectors Overflow flag. 
oflag 32-bit overflow occurred oflag 32-bit overflow occurred Description: Algorithm: Computes sine elements vector uses following Taylor series compute angle quadrant (0-pi/2) (i=0; i<nx; i++) y(i)= sin(x(i)) where x(i) xrad/pi Overflow Handling Methodology: applicable Special Requirements: Linker command file: .data section must allocated. Implementation Notes: Computes sine elements vector uses following Taylor series compute angle quadrant (0-pi/2) sin(x) c1*x c2*x^2 c3*x^3 c4*x^4 c5*x^5 3.140625x 0.02026367 5.3251 0.5446778 1.800293 angle other quadrant calculated using symmetries that angle into quadrant Example: examples/sine subdirectory Optimized Library Programmers TMS320C54x SPRA480B Benchmarks: Cycles Core: 20*nx 18*nx Overhead Code size 16-bit words) (worst case) (best case) program space) data space) Optimized Library Programmers TMS320C54x SPRA480B sqrt_16 Square Root 16-bit Number short oflag sqrt_16 (DATA DATA short (defined sqrtv.asm) Arguments: x[nx] r[nx] oflag Pointer input vector size Pointer output vector size containing sqrt(x). 
In-place operation is allowed (r can be equal to x).
nx: Number of elements of input and output vectors.
oflag: Overflow flag, indicating whether a 32-bit overflow occurred.
Description: Calculates the square root of each element of the input vector x, storing the results in output vector r.
Algorithm: for (i=0; i<nx; i++) r[i] = sqrt(x(i)), where 0 <= i < nx
Overflow Handling Methodology: Not applicable
Special Requirements: None
Implementation Notes: None
Example: examples/sine subdirectory
Benchmarks: Cycles Core: 42*nx Overhead Code size 16-bit words)

sub Vector Subtract

short oflag sub (DATA DATA DATA ushort ushort scale) (defined sub.asm)

Arguments:
x[nx]: Pointer to input data vector of size nx. In-place processing allowed.
y[nx]: Pointer to input data vector of size nx.
r[nx]: Pointer to output data vector of size nx containing (x-y), scaled or unscaled according to scale.
nx: Number of elements of input and output vectors.
scale: Scale selection. When scaling is selected, the result is divided to prevent overflow; otherwise the result is not divided.
oflag: Overflow flag, indicating whether a 32-bit overflow occurred.
Description: This function subtracts two vectors, element by element.
Algorithm: for (i=0; i<nx; i++) r(i) = x(i) - y(i)
Overflow Handling Methodology: Scaling implemented for overflow prevention (User selectable)
Special Requirements: None
Implementation Notes: None
Example: examples/sub subdirectory
Benchmarks: Cycles Core: 3*nx/2 Overhead Code size 16-bit words)

DSPLIB Benchmarks and Performance Issues

All functions in DSPLIB are provided with execution time and code size benchmarks. While developing the included functions, we tried to compromise between speed, code size, and ease of use. However, with few exceptions, the highest priority was given to optimizing for speed and ease of use, and last for code size. Even though DSPLIB can be used as a first estimation of processor performance for a specific function, you should have in mind that the generic nature of DSPLIB might add extra cycles not required for customer-specific use.

What DSPLIB Benchmarks are Provided

DSPLIB documentation includes benchmarks for instruction cycles and memory consumption.
following benchmarks typically included: Calling register initialization overhead Number cycles kernel code: Typically provided form equation that function data size parameters. consider kernel core) code, instructions contained between _start _end labels that each functions Memory consumption: Typically program size 16-bit words reported. functions requiring significant internal data allocation, data memory consumption also provided. When stack usage local variables minimum, that data consumption reported. functions which difficult determine number cycles kernel code function data size parameters, have included direct cycle count specific data sizes. Performance Considerations Benchmark cycles presented assume best case conditions, typically assuming: 0-wait state memory external memory program data data allocation on-chip DARAM no-pipeline hits. linker command file showing memory allocation used during testing benchmarking 'C54x included under example subdirectory. Remember, execution speed system dependent where different sections program data located memory. sure account such differences, when trying explain routine taking more time that reported DSPLIB benchmarks. Optimized Library Programmers TMS320C54x SPRA480B Licensing, Warranty Support Licensing Warranty 'C54x DSPLIB distributed free-of-charge product under generic Texas Instrument License Form presented Appendix BETA RELEASE SPECIAL DISCLAIMER: This DSPLIB software release preliminary (Beta). intended evaluation only. Testing characterization been fully completed. Production release will typically follow after month Beta release explicit guarantees paced that date. DSPLIB Software Updates 'C54x DSPLIB software updates will periodically released, incorporating product enhancement fixes. DSPLIB software updates will posted they become available same location download this information. 
Source code for previous releases will be kept public to prevent any customer problem in case we decide to discontinue or change the functionality of one of the DSPLIB functions. Make sure to read the readme.1st file available in the root directory of every release.

DSPLIB Customer Support

If you have a question or want to report problems or suggestions regarding the 'C54x DSPLIB, contact Texas Instruments at dsph@ti.com. We encourage the use of the software report form (report.txt) contained in the DSPLIB directory to report any problem associated with the 'C54x DSPLIB.

References

[1] The MathWorks, Inc. Matlab Signal Processing Toolbox User's Guide. Natick: The MathWorks, Inc., 1996.
[2] Lehmer, D.H. "Mathematical Methods in large-scale computing units." Proc. Sympos. on Large-Scale Digital Calculating Machinery, Cambridge, 1949. Cambridge: Harvard University Press, 1951.
[3] Oppenheim, Alan V., and Ronald W. Schafer. Discrete-Time Signal Processing. Englewood Cliffs: Prentice Hall, 1989.
[4] Digital Signal Processing with TMS320 Family (SPR012)
[5] TMS320C54x Peripherals. Reference Volume (SPRU131)
[6] TMS320C54x Optimizing Compiler User's Guide (SPRU103)

Matlab is a registered trademark of The MathWorks, Inc.

Acknowledgments

DSPLIB includes code contributed by the following people: Aaron Aboagye, Jeff Axelrod, Karen Baldwin, Philippe Cavalier, Pascal Dorster, Allison Frantz, Pedro Gelabert, Mike Hannah, Jeff Hayes, Natalie Messine, Jelena Nikolic, Greg Peake, Rosemarie Piedra, Cesar Ramirez, Alex Tessarolo, Carol Chow, Pierre Ponce, Julius Kusuma.

Appendix: Overview of Fractional Q Formats

Unless specifically noted, DSPLIB functions use the Q15 format or, to be more exact, Q0.15. In a Qm.n format, there are m bits used to represent the twos complement integer portion of the number, and n bits used to represent the twos complement fractional portion. m+n+1 bits are needed to store a general Qm.n number. The extra bit is needed to store the sign of the number in the most-significant bit position. The representable integer range is specified by $(-2^m, 2^m)$ and the finest fractional resolution is $2^{-n}$. For example, the most commonly used format is Q.15.
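As a concrete illustration of the Q.15 case, the following C sketch converts between a real value and its 16-bit Q.15 code. The helper names are hypothetical, and the rounding and saturation policy shown here is one reasonable choice for a host-side utility rather than the behavior of any particular DSPLIB routine.

```c
#include <stdint.h>

/* Convert a real value to Q.15 (hypothetical helper): scale by 2^15,
 * round to nearest, and saturate to the representable range
 * [-32768, 32767], which corresponds to [-1, 1 - 2^-15]. */
int16_t float_to_q15(double v)
{
    double scaled = v * 32768.0;
    if (scaled >= 32767.0)
        return INT16_MAX;
    if (scaled <= -32768.0)
        return INT16_MIN;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a Q.15 code back to a real value: divide by 2^15. */
double q15_to_float(int16_t q)
{
    return (double)q / 32768.0;
}
```

For example, 0.25 maps to 8192 (0x2000), and the code 1 maps back to 2^-15, the finest resolution of the format.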
Q.15 means that a 16-bit word is used to express a signed number between positive and negative one. The most-significant binary digit is interpreted as the sign bit in any Q format number. Thus, in Q.15 format, the decimal point is placed immediately to the right of the sign bit. The fractional portion to the right of the sign bit is stored in regular twos complement format.

Q3.12 Format

Q.3.12 format places the sign bit after the fourth binary digit from the right, and the next 12 bits contain the twos complement fractional component. The approximate allowable range of numbers in Q.3.12 representation is (-8,8) and the finest fractional resolution is $2^{-12} = 2.441 \times 10^{-4}$.

Table: Q3.12 Fields

Q.15 Format

Q.15 format places the sign bit at the leftmost binary digit, and the next 15 leftmost bits contain the twos complement fractional component. The approximate allowable range of numbers in Q.15 representation is (-1,1) and the finest fractional resolution is $2^{-15} = 3.05 \times 10^{-5}$.

Table: Q.15 Fields

Q.31 Format

Q.31 format spans two 16-bit memory words. The 16-bit word stored in the lower memory location contains the 16 least-significant bits, and the higher memory location contains the most-significant 15 bits and the sign bit. The approximate allowable range of numbers in Q.31 representation is (-1,1) and the finest fractional resolution is $2^{-31} = 4.66 \times 10^{-10}$.

Table: Q.31 Low Memory Location Fields
Table: Q.31 High Memory Location Fields

Appendix: Calculating the Reciprocal of a Number

The most optimal method for calculating the inverse of a fractional number (Y=1/X) is to normalize the number first. This limits the range of the number as follows:

0.5 <= Xnorm < 1 or -1 <= Xnorm <= -0.5 (1)

The resulting equation becomes:

Y = 1/(Xnorm*2^-n) or Y = 2^n/Xnorm (2)

where n = 1,2,3,...,14,15

Letting Ye = 2^n:

Ye = 2^n (3)

Substituting (3) into equation (2):

Y = Ye * 1/Xnorm (4)

Letting Ym = 1/Xnorm:

Ym = 1/Xnorm (5)

Substituting (5) into equation (4):

Y = Ye * Ym (6)

For the given range of Xnorm, the range of Ym is:

1 < Ym <= 2 or -2 <= Ym <= -1 (7)

To calculate the value of Ym, various options are possible:
(a) Taylor Series Expansion
(b) 2nd, 3rd, 4th, ... Order Polynomial (Line of Best Fit)
(c) Successive Approximation

The method chosen for this example is (c). Successive approximation yields the most optimum code versus speed versus accuracy option. method outlined below yields accuracy bits.
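The successive approximation chosen here is the iteration that the derivation arrives at as equation (c9), Ym = 2*Ym - Ym^2*Xnorm. A floating-point C sketch of it follows. The function name, the restriction to positive normalized inputs, and the initial estimate 3 - 2*Xnorm (a straight line through the end points of the range) are assumptions made for illustration; the library works in fixed point and derives its starting value from the bits of Xnorm.

```c
#include <assert.h>

/* Successive approximation of Ym = 1/Xnorm for 0.5 <= Xnorm < 1
 * (hypothetical reference, not the recip16 routine).
 * Each pass applies Ym <- 2*Ym - Ym*Ym*Xnorm, which squares the
 * relative error, so very few passes are needed. */
double recip_norm(double xnorm, int iterations)
{
    assert(xnorm >= 0.5 && xnorm < 1.0);

    /* Assumed starting value: linear interpolation between
     * (0.5, 2.0) and (1.0, 1.0). */
    double ym = 3.0 - 2.0 * xnorm;

    for (int k = 0; k < iterations; k++)
        ym = 2.0 * ym - ym * ym * xnorm;

    return ym;
}
```

With this starting value the initial relative error is at most 0.125, so three passes reduce it to about 0.125^8, which is below the resolution of a 16-bit result.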
Assume Ym(new) = exact value of 1/Xnorm:

Ym(new) = 1/Xnorm (c1)
or
Ym(new)*Xnorm = 1 (c2)

Assume Ym(old) = estimate of the value 1/Xnorm:

Ym(old)*Xnorm = 1 + Dxy
or
Dxy = Ym(old)*Xnorm - 1 (c3)

where Dxy = error in calculation.

Assume that Ym(new) and Ym(old) are related as follows:

Ym(new) = Ym(old) - Dy (c4)

where Dy = difference in values.

Substituting (c2) and (c4) into (c3):

Ym(old)*Xnorm = Ym(new)*Xnorm + Dxy
(Ym(new) + Dy)*Xnorm = Ym(new)*Xnorm + Dxy
Ym(new)*Xnorm + Dy*Xnorm = Ym(new)*Xnorm + Dxy
Dy*Xnorm = Dxy
Dy = Dxy * 1/Xnorm (c5)

Assume that 1/Xnorm is approximately equal to Ym(old):

Dy = Dxy * Ym(old) (approx) (c6)

Substituting (c6) into (c4):

Ym(new) = Ym(old) - Dxy*Ym(old) (c7)

Substituting for Dxy from (c3) into (c7):

Ym(new) = Ym(old) - (Ym(old)*Xnorm - 1)*Ym(old)
Ym(new) = Ym(old) - Ym(old)^2*Xnorm + Ym(old)
Ym(new) = 2*Ym(old) - Ym(old)^2*Xnorm (c8)

If after each calculation we equate Ym(old) to Ym(new):

Ym(old) = Ym(new) = Ym

Then equation (c8) evaluates to:

Ym = 2*Ym - Ym^2*Xnorm (c9)

If we start with an initial estimate of Ym, then equation (c9) will converge to a solution very rapidly (typically within a few iterations for 16-bit resolution). The initial estimate can either be obtained from a look-up table, from choosing a mid-point, or simply from linear interpolation. The method chosen for this problem is the latter. This is simply accomplished by taking the complement of the least significant bits of the Xnorm value.

Appendix Texas Instruments License Agreement Code

DOWNLOAD THIS PROGRAM AGREE THESE TERMS. Texas Instruments Incorporated grants license Program only country where acquired Program copyrighted licensed (not sold). transfer title Program you. obtain rights other than those granted under this license. Under this license, may: Program more machines time; Make copies Program backup purposes within your enterprise; Modify Program merge into another program; Make copies original file downloaded distribute provided that transfer copy this license other party. other party agrees these terms first Program. must reproduce copyright notice other legend ownership each copy partial copy, Program.
You may NOT:

1. Sublicense, rent, lease, or assign the Program;
2. Reverse assemble, reverse compile, or otherwise translate the Program; or
3. Use the Program on non-TI DSPs.

We do not warrant that the Program is free from claims by a third party of copyright, patent, trademark, trade secret, or any other intellectual property infringement.

Under no circumstances are we liable for any of the following:

1. Third-party claims against you for losses or damages;
2. Loss of, or damage to, your records or data; or
3. Economic consequential damages (including lost profits or savings) or incidental damages, even if we are informed of their possibility.

Some jurisdictions do not allow these limitations or exclusions, so they may not apply to you.

We do not warrant uninterrupted or error free operation of the Program. We have no obligation to provide service, defect correction, or any maintenance for the Program. We have no obligation to supply any Program updates or enhancements to you even if such are or later become available.

IF YOU DOWNLOAD OR USE THIS PROGRAM YOU AGREE TO THESE TERMS.

THERE ARE NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you.

You may terminate this license at any time. We may terminate this license if you fail to comply with any of its terms. In either event, you must destroy all your copies of the Program.

You are responsible for the payment of any taxes resulting from this license.

You may not sell, transfer, assign, or subcontract any of your rights or obligations under this license. Any attempt to do so is void.

Neither of us may bring a legal action more than [...] years after the cause of action arose.

This license is governed by the laws of the State of Texas.

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgment, including those pertaining to warranty, patent infringement, and limitation of liability.
TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Customers are responsible for their applications using TI components.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, warranty or endorsement thereof.

Copyright 2000, Texas Instruments Incorporated