Using TMS320C6x Non-Traditional Applications
Mathew George, (Joe) Mohsen Khayami Digital Signal Processing Solutions
Abstract
The Texas Instruments (TI(TM)) TMS320C6x digital signal processor (DSP) architecture, with its RISC-like instruction set, flexible parallelism, and conditional execution, can be used in non-typical DSP applications ranging from microcontroller-type tasks to FPGA/ASIC/data flow-type tasks. This paper uses code examples to explore ways to efficiently handle bit manipulation, address manipulation, and dataflow configurations. In addition, this document includes an example table lookup benchmark and a system architecture discussion of data input/output.
Contents
Introduction
CPU/Instruction Features With Code Examples
    Bit Manipulation
    Address Manipulation
    Decision Execution (Conditionally Execute Advantages Over Test/Branch)
Application Example
    Table Lookup Example Description
    Table Lookup Example Code
System Discussion-C6x DMAs Data (Eliminate Components)
Conclusion
Appendix: Table Lookup Code
    ipp.c
    iploop.sa
    ipp.cmd
    ipploop.asm (tool generated)
    ipptab.asm
Digital Signal Processing Solutions
1999
Figures
Figure 1. Clear/Set/Toggle Example
Figure 2. Clear/Set/Toggle Code
Figure 3. Byte-Swap Example
Figure 4. Byte-Swap Code
Figure 5. Table Parsing Example
Figure 6. Table Parsing Code
Figure 7. Link List Example
Figure 8. Link List Code
Figure 9. Decision Execution Concepts
Figure 10. Decision Execution Example (Comparator)
Figure 11. Decision Execution Code (Bit Test/Branch)
Figure 12. Decision Execution Code (Conditional Execute)
Figure 13. Decision Execution Code (Conditional Execute Parallel)
Figure 14. Decision Execution Code (Conditional Execute Parallel-II)
Figure 15. Decision Execution Code (Conditional Execute-Software Pipelined)
Figure 16. Table Lookup Example Description
Figure 17. Table Lookup Example Code Initialization
Figure 18. Primary Lookup Table Loop Linear Assembly
Figure 19. Primary Lookup Table Loop Pure Assembly
Figure 20. C6x) Architecture
Figure 21. Architecture (Size)
Figure 22. Architecture (Speed)
Figure 23. C6202 Architecture With Second
Introduction
The TMS320C6x is not a traditional DSP. Even though it handles the traditional DSP applications, such as filtering, FFTs, and vocoders, that other DSPs do, it also includes a variety of additional features that make it attractive for non-standard applications. These applications include, but are not limited to:
- Microcontroller-style bit manipulation (or "bit banging," as it is often called) instructions, in some cases performed even better than with microcontrollers (in a single 5-ns cycle)
- Byte addressability
- Address manipulation (with improved results over the C3x/C4x)
- Dynamic operations (performed as well as those on the C3x/C4x)
- An efficient "conditionally execute" method in place of the classic test/branch seen in "controller"-type housekeeping code
- The ability to replace an FPGA/ASIC with a "dataflow" style of design, and innovative tools to develop these operations easily
- Elegant four-channel DMA (direct memory access) data movement
This application report examines various aspects of their implementation in code and hardware. Specific architectural features are described and accompanied by code examples. The document includes an application example and discusses tools that assist in the process of optimization. An elegant hardware architecture is also presented for data movement and processing. The information presented in this document should encourage an appreciation of the power of the C6x DSP.
CPU/Instruction Features With Code Examples
We examine various non-traditional features of the C6x architecture through its instruction set, focusing on what the C6x does well that traditional DSPs often do not. A graphic example of the concept and/or application, followed by a code segment, is provided for each. Note that the code examples are authentic (assembled and simulated) C6x assembly code, but most are UNOPTIMIZED and meant purely for descriptive, academic purposes. The features are classified into three descriptive groups: bit manipulation, address manipulation, and decision execution. All of these features are important in enabling the C6x to optimally execute some of these non-traditional functions.
Bit Manipulation
This section examines the types of bit manipulation done in both microcontroller-type and ASIC/FPGA-type applications. In microcontroller-type applications, registers are often manipulated to control peripherals or to perform housekeeping functions. In ASIC/FPGA-type applications, fast data streams are often manipulated. Note that bit manipulation is usually done on data, not addresses (for more information, see the section, Address Manipulation).
Clear/Set/Toggle
Figure 1 shows a value in a register being set, cleared, and toggled, and then placed in another register. This value might have been loaded from a register or as part of a data stream.
Figure 1. Clear/Set/Toggle Example
[Figure: Clear/Set/Toggle with a single-cycle instruction. Panels show 1234h cleared to 0000h, 1234h set to 0ffffh, and 5555h toggled to 0aaaah.]
The code that corresponds to each of these operations is shown in Figure 2.
Figure 2. Clear/Set/Toggle Code
Set/Clear/Toggle with a single-cycle instruction*.
        .text
; Typical bit banging
bitbang:                    ; Initialize pointer to bbdata with MVK/MVKH**
                            ; Load value from *A15: bbdata=12345555h
                            ; CLEAR upper byte of upper halfword: 12345555h -> 00345555h
                            ; SET lower byte of upper halfword:   00345555h -> 00ff5555h
                            ; TOGGLE:                             00ff5555h -> 0ff00aaaah
        .data
bbdata: .word 012345555h
*See TMS320C62xx Instruction Reference Guide, 3-42, 117.
**See TMS320C62xx Instruction Reference Guide, 3-77, 3-80.
The three operations are shown in bold in Figure 2. Each instruction is executed in a single cycle. In the example, the address (pointer) of bbdata, where bbdata is located in the .data section, is loaded into a register using the MVK/MVKH instructions. The value at bbdata is loaded into a register using a load instruction with *A15 acting as the pointer register. The NOPs are there for pipeline considerations. NOTE: Remember that most of the following code examples are UNOPTIMIZED.
SET/CLR is accomplished by specifying from which bit to which bit the field needs to be set or cleared. The bit range is specified here as constants (which fit well in the opcode because they are 5-bit constants). With XOR, all of the bits are toggled because the constant value "-1" is sign-extended before the operation is done. Please note that the above instructions can also be executed using a mask (and hence in real time, dynamically, if needed) held in a register.
These operations are standard on microcontrollers but are not as well supported on other TMS320 DSPs. The C2xx requires the accumulator and the C54x requires its accumulators, making them unavailable for other operations. A PLU (parallel logic unit) directly manipulates data, thus off-loading the accumulator and offering an advantage over other fixed-point processors. Even so, it does not have SET/CLR. In the TMS320 family, only the parallel processor (PP) offers improved performance for these types of operations.
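For readers more at home in C, the three single-cycle operations can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the report: the helper names field_mask, clr_field, set_field, and toggle_all are hypothetical, and the bit-range arguments follow the from-bit/to-bit convention described above.

```c
#include <stdint.h>

/* Build a mask with bits lsb..msb (inclusive) set.
 * Assumes lsb <= msb <= 31. */
static uint32_t field_mask(unsigned lsb, unsigned msb)
{
    unsigned width = msb - lsb + 1u;
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu
                                   : (((uint32_t)1 << width) - 1u);
    return ones << lsb;
}

/* Clear bits lsb..msb, leaving the rest of the word untouched. */
uint32_t clr_field(uint32_t value, unsigned lsb, unsigned msb)
{
    return value & ~field_mask(lsb, msb);
}

/* Set bits lsb..msb, leaving the rest of the word untouched. */
uint32_t set_field(uint32_t value, unsigned lsb, unsigned msb)
{
    return value | field_mask(lsb, msb);
}

/* Toggle every bit: XOR with -1 (all ones after sign extension). */
uint32_t toggle_all(uint32_t value)
{
    return value ^ 0xFFFFFFFFu;
}
```

Applied in sequence to 12345555h, these reproduce the values from the listing: clearing bits 24-31, setting bits 16-23, then toggling.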
Byte-Swapping
Byte swapping is a classic operation going back to the Intel and Motorola models of big-endian and little-endian. It is usually very difficult in software; shifters made the problem much easier on other TMS320 DSPs. The byte-swapping operation is shown in Figure 3.
Figure 3. Byte-Swap Example
[Figure: Byte swap done with the "Extract" instruction EXT/EXTU. Byte swap: 1234h -> 3412h.]
Without a shifter, this operation must be accomplished in pure software. Multiplication (which is slow) is most probably required, along with some masking and addition. In the case of the C6x, the extract instruction makes it even easier by letting the programmer actually pick the contiguous bits to manipulate. This powerful feature is implemented in an interesting way, with two shifts in one cycle. (See Figure 4 for the code.)
Figure 4. Byte-Swap Code
Byte swapping (EXTU* does two shifts in one cycle).
        .text
; Byteswap using EXTU (extract unsigned) instruction
byteswap:                   ; Initialize pointer to bbdata (MVK/MVKH)
                            ; Load value from *A15: bbdata=12345555h
                            ; EXTU: extract "34" from 12345555h -> 00000034h
                            ; EXTU: extract "12" from 12345555h -> 00000012h
                            ; Shift**/align to make "3400": 00000034h -> 00003400h
                            ; Swap by adding: 00003400h + 00000012h -> 00003412h
                            ; Store "3412" to *A15
Dynamic EXTU (with registers) is shown in the Table Lookup example.
*See TMS320C62xx Instruction Reference Guide, 3-55.
**See TMS320C62xx Instruction Reference Guide, 3-94.
After loading the pointer and value as in the last example, EXTU (the U means unsigned) pulls out the appropriate bits, specified in this case as constants. The first bolded EXTU pulls out the "34" and saves it in one register, while the second bolded EXTU pulls out the "12" and saves it in another register. The "34" is then left-shifted to make "3400" and added to the "12". On other processors without this instruction, the values must be masked off.
Using the instruction, the first number denotes how many bits on the left to throw out. A slight wrench in the system is the fact that the second number denotes how many bits on the right to throw out PLUS how many bits on the left were already thrown out. That's right: you must specify the bits on the left twice. This is because the operation is accomplished with two shifts in one cycle. After you shift left to throw out the bits on the left, you must shift right the same distance to return to the original position before you start shifting right, if you want a right-justified answer in the destination register. (In some operations, such as an optimized byte-swap, you might not want the answer right justified.) This instruction is graphically explained in the TMS320C62xx Instruction Reference Guide, 3-55. Again, note that the value can be dynamic in a register (this is shown in the Application Example section).
Again, remember that this example is UNOPTIMIZED. (This operation can be accomplished in two cycles instead of four by parallelizing the two EXTUs, not right justifying the "34" in the first cycle, and then adding in the second cycle.)
Data is not the only item that might need to be manipulated. Addresses also often need some manipulation, as discussed in the following section.
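The "specify the left bits twice" rule is easier to see when the extract is written out as its two shifts. The sketch below models that behaviour in C; the function names extract_unsigned and swap_upper_halfword_bytes are hypothetical, and the model assumes both shift counts stay in the range 0 to 31.

```c
#include <stdint.h>

/* Unsigned field extract modelled as two shifts:
 * shift left by `left` to throw out the high bits, then shift
 * right by `right` to throw out the low bits and right-justify.
 * `right` therefore includes `left`. Both must be 0..31. */
uint32_t extract_unsigned(uint32_t src, unsigned left, unsigned right)
{
    return (src << left) >> right;
}

/* Swap the two bytes of the upper halfword and return them
 * right-justified, e.g. 12345555h -> 00003412h. */
uint32_t swap_upper_halfword_bytes(uint32_t word)
{
    /* "34": throw out 8 bits on the left, then 16 on the right
     * plus the 8 already thrown out on the left (24 in total). */
    uint32_t low_byte = extract_unsigned(word, 8, 24);

    /* "12": nothing to throw out on the left, 24 bits on the right. */
    uint32_t high_byte = extract_unsigned(word, 0, 24);

    /* Align "34" to make "3400", then add "12". */
    return (low_byte << 8) + high_byte;
}
```

For the "34" field, 16 bits lie to its right and 8 to its left, so the second shift count is 16 + 8 = 24.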
Address Manipulation
In the Bit Manipulation section, we said that bit manipulation is most often performed on data. Theoretically, you can perform bit manipulation on addresses also, by treating them as data. This section describes how the C6x treats addresses. Most processors provide specific, separate address registers to allow "pointer"-type address manipulation. In contrast, on the C6x, the "general-purpose" registers can be used as address/pointer registers.
Table Parsing
Table parsing/lookup makes it important to allow a base-pointer register to be set up, from which offsets can be applied to jump through the table. The C3x/C4x does this well, but only the C54x came close among fixed-point processors, in that a constant (immediate) modify is usually only good for stacks. Figure 5 shows a contrived example of a table lookup summing some Pythagorean triples. The base register holds the address "8000h" and the offset is indicated by the value in "[]".
Figure 5. Table Parsing Example
[Figure: Table parsing (base address + offset) in a single cycle (and using byte addressability). Byte addresses (le) 8000h through 8008h, with the base at 8000h.]
Although this example is not complex, it shows a byte-addressing capability that other DSPs, with limited exceptions, do not support. The corresponding code itself is shown in Figure 6.
Figure 6. Table Parsing Code
Table parsing (base address + offset)* in a single cycle.
        .text
; Table lookup (base address + offset)
; Pythagorean triples c^2=a^2+b^2 calculation
tablel:                     ; Initialize pointer to table (MVK/MVKH)
                            ; Load a^2 from *A15[0]: 00000009h
                            ; Load b^2 from *A15[1]: 00000010h
                            ; Calculate c^2: 00000019h
                            ; Store c^2 to *A15[2]
        .data
table:  .byte               ; (3^2) (4^2) (5^2)
        .byte               ; (6^2) (8^2) (10^2)
Dynamic parsing/addressing is available with registers.
*See TMS320C62xx Instruction Reference Guide, 3-20.
As in previous examples, the pointer is loaded into a register using the MVK/MVKH instructions. The value is loaded into a register using a byte-load instruction (to show byte addressability), with *A15 acting as the pointer register. The NOPs are there for pipeline considerations. Remember that the code examples are UNOPTIMIZED. This contrived example reads two squares from a table of bytes and adds them together. The resulting value is then written over an initialized "zero", again as a byte. The index of each element is denoted by "[]" in the instruction. Remember that in a pure load/store architecture, only load and store instructions perform address accesses. Thus, "*" is used with only these instructions.
This method also works well for manipulating the registers dedicated to a peripheral (such as the McBSP or DMA). The main peripheral control register often comes first in the memory map, so it can be used as the base. Other secondary peripheral registers can be reached as offsets. In this example, the offsets are the constant (immediate) offsets that were derided at the beginning of this section. The following section shows not only a dynamic example but also other features.
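In C terms, the base-plus-offset byte access above is plain array indexing on a byte pointer. The following sketch mirrors the contrived Pythagorean example; the function name sum_squares_in_place is hypothetical, and the table contents are supplied by the caller rather than taken from the report's data section.

```c
#include <stdint.h>

/* Read a^2 and b^2 at byte offsets 0 and 1 from `base`, store their
 * sum at byte offset 2 (overwriting the initialized zero), and
 * return the stored value. */
uint8_t sum_squares_in_place(uint8_t *base)
{
    uint8_t a_squared = base[0];   /* like a byte load from base[0] */
    uint8_t b_squared = base[1];   /* like a byte load from base[1] */

    base[2] = (uint8_t)(a_squared + b_squared);  /* byte store */
    return base[2];
}
```

With a table holding 9, 16, 0 the third byte becomes 25 (19h), matching the value shown in the listing.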
Link Lists
Dynamically calculating pointer addresses often constitutes bad programming practice, but it is often used extensively in real-time processor code. This example first calculates the initial address of a linked list (dynamically). It then shows how pointer access is accomplished using the same-register feature (mentioned in the Address Manipulation section). Finally, the example shows the subtle feature of how a link list can be made circular with one instruction on the C6x. Figure 7 shows the example.
Figure 7. Link List Example
[Figure: Pointer/address calculation (dynamic) including link lists (the example becomes circular after the initial calculation). The initial pointer calculation starts at firstlnk (80000000h) and reaches xptr (80008000h); xptr points to yptr (80008100h), yptr points to zptr (80008200h), and zptr points back to xptr. Entries are marked (le).]
The value of xptr is initially dynamically calculated (and forced to 80008000h), and then the link list points to the next location in a circular fashion. Note that each "ptr" could be at an arbitrary place in memory, in that it just points to the next "ptr" at another arbitrary place in memory. Figure 8 shows the corresponding code:
Figure 8. Link List Code
Link lists, using the fact that registers are used for both calculation/general-purpose and pointer/address functions.
        .text
; Circular three-element link list load
llcirc:                     ; Initialize firstlnk pointer (MVK/MVKH firstlnk)
                            ; Hand calc xptr offset (08000h)
                            ; Clear upper bits (MVKH): A1=8000h, A15=80000000h
                            ; Add offset to firstlnk (bad programming practice): xptr=80008000h
circ:                       ; Load next link from *A15 into A15: yptr, zptr, xptr, yptr, ...
                            ; Branch to circ: repeat infinitely
        .data
firstlnk .word firstlnk     ; 80000000h
        .sect "ptrs"
xptr    .word yptr          ; 80008000h
yptr    .word zptr          ; 80008100h
zptr    .word xptr          ; 80008200h
The pointer "firstlnk" is initialized in data memory at 80000000h (to push the example) and loaded with MVK/MVKH. Then 8000h is added to the address that the pointer "firstlnk" points to, giving the hard-coded address (hard-coded in the linker command file) of xptr. (Technically, there should be a separate "ptrs" section with a .sect directive for EACH pointer to get the accurate addresses shown in the comments of the .sect directive above.) Then the load overwrites the present pointer with the next one, in a circular, endless-loop fashion. This single-instruction pointer update/overwrite is possible because the registers (A0-A15 and B0-B15) are BOTH calculation (general-purpose) and address (auxiliary) registers. No other TMS320 does this. Thus, no cycles are wasted moving a value from a general-purpose register/accumulator to an address/auxiliary register.
Address and data manipulation are not the only features that the C6x does well. The execution of code, especially decision execution, should be examined next.
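The pointer-overwrite idea maps directly onto "ptr = ptr->next" in C. The sketch below is an illustration only: the struct link type and the follow function are hypothetical names, and the nodes may live anywhere in memory rather than at the hard-coded addresses used in the listing.

```c
#include <stddef.h>

/* Each element holds only the address of the next element. */
struct link {
    struct link *next;
};

/* Follow `steps` links starting from `start`. Each step overwrites
 * the current pointer with the pointer it points to, which is the
 * single-load update described above. */
struct link *follow(struct link *start, unsigned steps)
{
    struct link *ptr = start;

    while (steps-- > 0) {
        ptr = ptr->next;   /* pointer = *pointer */
    }
    return ptr;
}
```

With three nodes linked x to y, y to z, and z back to x, following three links from x returns to x, which is the circular behaviour of the example.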
Decision Execution (Conditionally Execute Advantages Over Test/Branch)
Much non-traditional code involves "controller", housekeeping-type functions that often involve decision trees with testing and branches. The disadvantage of this on many DSPs (and other microprocessors) is the branching overhead caused by deep pipelines. Previous TMS320 DSPs needed several cycles of overhead to branch (sometimes the overhead was reduced with a delayed branch instruction). On the C6x, these are not overhead cycles in the traditional sense, because the delay slots can be used. (Microcontrollers often have shorter pipelines but much slower cycle times, so overall execution speed is much worse than with a DSP.) Often delay slots do not help when there are tight data dependencies; that is, when the next decision is based on the very operations following the results of the last decision. Such a configuration is inherently inefficient. One option to optimally execute decisions with tight data dependencies uses a C6x feature in which every instruction can be conditionally executed. This option presents a linear, non-branching method of achieving these decision trees. The concept is shown in Figure 9.
Figure 9. Decision Execution Concepts
[Figure: Conditionally execute (in parallel) advantages over test/branch. Left side: a tree of "test, branch" steps. Right side: a sequence of "conditionally execute" steps.]
Instead of the classic "bit test and branch" that flushes the pipeline on each decision, as shown on the left side of the diagram, we either execute an instruction or not, based on a condition. This method avoids the branching overhead.
Other microprocessors, often RISCs, use this methodology. Some, such as the Intel IA64, execute both legs of a branch ahead of time until it is determined which will be used, at which point the other is voided. Of course, this method is expensive in hardware. The C6x method is software-based and less expensive in hardware.
Comparator Example
A real-world example to illustrate the concept is the saturation of an input signal, as seen in Figure 10, using a comparator function on the C6x.
Figure 10. Decision Execution Example (Comparator)
[Figure: Comparator example (unsigned). A 16-bit input, ranging between the negative rail and the positive rail (FFFF) with midpoint 8000, passes through compare and saturate. The 16-bit output sits at either the positive rail or the negative rail.]
In this example system, an analog signal is converted to digital, resulting in a 16-bit unsigned value. If the voltage is high (that is, above 8000h), it will be saturated to the maximum positive unsigned value of 0ffffh by the C6x. If the voltage is low (that is, not above 8000h), it will be saturated to the minimum ("negative") unsigned value of 0000h by the C6x. The digital signal is then converted back to analog. This example does nothing fancy, but it does allow us to compare styles of decision execution. The more classical "bit test and branch" is shown in Figure 11, implemented in C6x assembly with a conditional branch instruction (called BCND on other TMS320 DSPs).
Figure 11. Decision Execution Code (Bit Test/Branch)
Every instruction is conditional. Instead of the typical "bit test and branch" with much pipeline overhead (using registers or constants):
OLD:                        ; Preloaded: 8000h, 0000h, A4=0ffffh
LOOP:                       ; Load value from *A15
        CMPGT               ; Test if greater than (8000h):
                            ;   a000h, then 00000001h; 2000h, then 00000000h
  [A1]  B possat            ; If so, branch (pos sat)
negsat:                     ; If not, fall thru and clear (neg sat): 00000000h
        B LOOP
possat:                     ; Set (pos sat): 0000ffffh
        B LOOP
Note that some values have been pre-loaded into registers so that register operations, rather than short constants, can be used. (For brevity we have omitted every MVK/MVKH seen in previous examples.) The data is LDWed (into a register) and tested (with CMPGT) against the register holding 8000h. The result is written to a register, and the branch is conditioned on [A1] (other TMS320 DSPs have a specific "branch conditional" instruction). If the value is above 8000h, the code branches to possat: and positively saturates the value. If the value is not above 8000h, it falls through the branch to negsat: and negatively saturates the value. Often you design code so that the condition that statistically happens more often will fall through, although this is not applicable when comparing sine waves. The count of cycles to execute the loop once is 1(LDW) + 4(NOP) + 1(test) + 6(B) + 1(AND/OR) = 13 cycles. Can we improve this number? The "conditionally execute" method is more conducive, as shown in Figure 12.
Figure 12. Decision Execution Code (Conditional Execute)
Operations using the "Conditional Execute" method, saving pipeline overhead:
NEW:                        ; Preloaded: 8000h, 0000h, A4=0ffffh
LOOP:                       ; Load value from *A15
                            ; Mask for 0/!0 check:
                            ;   a000h, then 00000001h; 2000h, then 00000000h
negsa:  [!A1]               ; If !=1, clear (neg sat): 00000000h
possa:  [A1]                ; Set (pos sat): 0000ffffh
        B LOOP              ; Tight loop
Note that in this code example also, some values have been pre-loaded into registers so that register operations, rather than short constants, can be used. (Again, for brevity we have omitted every MVK/MVKH seen in previous examples.) Again the data is LDWed, but this time it is tested by doing an AND operation with the value 8000h, which will result in either a zero or a non-zero value in the register. (The reason for using "AND", along with other optimization methods, is discussed in the section, Optimization Methods/Rationales-ASIC/FPGA. CMPGT would have been just as valid.) Then A1 is used as the conditional test for negative or positive saturation, identical to Figure 11. The conditions are mutually exclusive; thus, one is executed while the other becomes a NOP. No branches are needed. (The tight loop is just meant to give an example.) This code is equivalent to that seen in Figure 11. Now think about benchmarking the number of cycles to execute the code. The count of cycles to execute the loop once is 1(LDW) + 4(NOP) + 1(AND) + 2(AND/OR) = 8 cycles. Again, can we improve this number?
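The contrast between the two styles can be sketched in C. This is an illustration under stated assumptions rather than a translation of the listings: the function names are hypothetical, both versions use the mask test against 8000h, and whether a given compiler actually emits a branch for either version is up to that compiler.

```c
#include <stdint.h>

/* Test-and-branch style: control flow selects the result. */
uint16_t saturate_branch(uint16_t sample)
{
    if ((sample & 0x8000u) != 0u) {
        return 0xFFFFu;            /* positive saturation */
    }
    return 0x0000u;                /* negative saturation */
}

/* Conditional-execute style: the test produces a 0/1 predicate
 * (playing the role of A1) and the result is computed from it
 * with straight-line arithmetic, no decision tree. */
uint16_t saturate_predicated(uint16_t sample)
{
    uint16_t predicate = (uint16_t)((sample & 0x8000u) != 0u);

    /* 0 - 1 wraps to all ones (0xFFFF); 0 - 0 stays 0x0000. */
    return (uint16_t)(0u - predicate);
}
```

Both functions return the same result for every 16-bit input, which mirrors the equivalence of the two assembly versions.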
Optimizing Code (Parallelism Unit Utilization)
Examining the code, we see that the positive and negative saturation instructions have no data dependencies between them (for more information on data dependencies, see the TMS320C6000 Programmer's Guide, literature number SPRU198). Thus nothing prevents us from executing them at the same time. We start optimizing the code by adding "||" to perform the negative and positive saturation in the same cycle. Again note that the conditions are mutually exclusive; thus, one is executed while the other becomes a NOP in parallel, as shown in Figure 13.
Figure 13. Decision Execution Code (Conditional Execute Parallel)
We can also start parallelizing the code and start using more functional units (except the multiplier?) to take seven cycles:
NEW:
LOOP:                       ; Load value from *A15
                            ; Mask for 0/!0 check
negsapossa:
        [!A1]               ; If !=1, clear (neg sat)
     || [A1]                ; Set (pos sat)
        B LOOP              ; Tight loop
Note that with mutually exclusive conditionals (like [!A1] and [A1]), one conditional is always acting as a NOP.
A second issue to be concerned about is unit resources. Because a unit cannot be used twice in the same cycle, the second saturation must also be given its own unit, as shown in Figure 13. The count of cycles to execute the loop once is 1(LDW) + 4(NOP) + 1(AND) + 1(AND/OR) = 7 cycles. It is interesting to note that the cost of not branching is that one unit becomes a NOP. Thus, we could almost say that instead of losing the six branch cycles from Figure 11 (eight units x six cycles = forty-eight potential units), we "lose" only one unit instead of forty-eight.
Now consider how to parallelize using more units. This is accomplished by bringing in two values at a time, keeping them on separate sides, and executing in parallel. Each of two units loads a value into a register on its own side in the first cycle, and we wait for the appropriate NOPs. Each of two units tests one of the values and writes the result into registers A1 and B1, respectively, in the sixth cycle. Then we conditionally positively saturate the values using two units, and even use a trick to conditionally negatively saturate the values using the multiplier units (multiply the value by "0"), in the seventh cycle. The code is in Figure 14.
Figure 14. Decision Execution Code (Conditional Execute Parallel-II)
Better yet, bring in two values to be saturated, parallelize the algorithm, and execute in seven cycles (but doubling the throughput to 3.5 cycles/val), even using the multiplier to clear, as shown below:
nosw:                       ; Load value from *A15
                            ; Load value from *B15
                            ; Mask for 0/!0 check
                            ; Mask for 0/!0 check
        [A1]                ; 0Fh: LSN=Fh (pos sat)
        [B1]                ; 0Fh: LSN=Fh (pos sat)
     || [!A1]               ; 00h: if !=1, clear, LSN=0 (neg sat)
     || [!B1]               ; 00h: if !=1, clear, LSN=0 (neg sat)
Note that this is a 4-bit "nibble" saturation.
Technically, the multiplier units have a latency, so the negatively saturated values would not be ready until the eighth cycle. Nevertheless, counting the "||" combinations, the number of cycles comes to seven for two values, thus averaging 3.5 cycles per value. Note that the example was simplified to nibbles so that we could stick with constants. Using registers is possible, but resource conflicts will start to appear in Figure 14 unless we spread the accesses among the registers. Finally, with software pipelining, the kernel shown in Figure 15 is possible (for more information, see the TMS320C6000 Programmer's Guide, literature number SPRU198).
Figure 15. Decision Execution Code (Conditional Execute-Software Pipelined)
Best yet, bring in two values to be saturated, heavily software pipelined, and execute in a single cycle, even using the multiplier to clear, as shown below:
; PIPED LOOP PROLOG
; PIPED LOOP KERNEL
                            ; Load value from *A15
                            ; Load value from *B15
                            ; Mask for 0/!0 check
                            ; Mask for 0/!0 check
        [A1]                ; 0Fh: LSB=Fh (pos sat)
        [B1]                ; 0Fh: LSB=Fh (pos sat)
     || [!A1]               ; 00h: if !=1, clear, LSB=0 (neg sat)
     || [!B1]               ; 00h: if !=1, clear, LSB=0 (neg sat)
; PIPED LOOP EPILOG
With prolog and epilog, this code would be running at 1600 MIPS, except that no unit is left for looping!
After some prolog to initialize the pipeline, the above kernel uses all eight units to execute two samples per cycle. Then some epilog code is often needed to gracefully exit from the kernel. This method allows two values to be loaded, compared, and saturated in a single cycle, assuming, of course, the appropriate prolog and epilog code. This eliminates the "NOP 4" following the "LDW" seen in previous code examples. Thus, 50 cycles is the theoretical figure in which 100 values could be processed; with prolog/epilog overhead it is probably more like 55-60 cycles. Thus, the effective benchmark is about one cycle per two values, or about 0.5 cycles per value.
No unit in Figure 15 is available for looping. Thus, there are two ways to repeat this instruction, for example, 50 times. One method is a dual-cycle loop, which will cause it to take 105-110 cycles (for more information, see the TMS320C6000 Programmer's Guide, literature number SPRU198). The second method is to unroll the loop. In other words, repeat/copy it 50 times, if you have the code space available. Thus, the "loop" benchmark remains within 55-60 cycles, with the classic code size versus speed tradeoff.
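The loop-versus-unroll tradeoff can be seen in a small C sketch. It is an illustration only: the names saturate_nibble, saturate_rolled, and saturate_unrolled2 are hypothetical, and the test bit 08h is an assumption chosen to mimic the 0/!0 mask check on a nibble, since the listings do not preserve the mask value.

```c
#include <stddef.h>
#include <stdint.h>

/* Nibble saturation: 0Fh when the assumed test bit (08h) is set,
 * otherwise 00h. */
static uint8_t saturate_nibble(uint8_t sample)
{
    return (sample & 0x08u) ? 0x0Fu : 0x00u;
}

/* Rolled form: one value per loop iteration. */
void saturate_rolled(uint8_t *samples, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        samples[i] = saturate_nibble(samples[i]);
    }
}

/* Unrolled by two: each iteration handles two independent values,
 * mirroring the A-side/B-side pairing. Loop overhead is paid half
 * as often, at the cost of a larger loop body. */
void saturate_unrolled2(uint8_t *samples, size_t count)
{
    size_t i = 0;

    for (; i + 1 < count; i += 2) {
        samples[i]     = saturate_nibble(samples[i]);
        samples[i + 1] = saturate_nibble(samples[i + 1]);
    }
    if (i < count) {               /* leftover value when count is odd */
        samples[i] = saturate_nibble(samples[i]);
    }
}
```

Copying the body further (fully unrolling) removes the loop control entirely, which is the code-size-for-speed choice described above.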
Optimization Methods/Rationales-ASIC/FPGA
In the section, Optimizing Code (Parallelism Unit Utilization), Figure 11 shows the 8000h test performed using the CMPGT instruction. In Figure 12 through Figure 15, an equivalent test is done using the AND instruction. This method was chosen to allow flexibility in the later unit allocation of instructions, because CMPGT is available on fewer functional units than AND. Because AND is available on more units, using this equivalent test makes later flexibility in the allocation of units possible.
The compiler uses a similar trick when it tests for a value being equal to something. For example, (i==5) could be tested by subtracting 5 from the variable and testing for 0/!0. Because this operation is available on more units (.L, .D's), it gives greater flexibility in unit allocation.
The feature of "conditionally execute," allowing NOPs (such as with mutually exclusive conditions) when an operation is not performed, combined with the parallelism of the architecture, offers an interesting observation. This architecture allows operation in a sequential execution mode with very few branches and many conditionals, similar to the dataflow seen in an FPGA or ASIC. These functions could run in lockstep at very fast speeds and are much easier to program/route than when implemented in an FPGA/ASIC.
Decision Execution Cycle Summary
Thus, we summarize the cycle savings in Table 1 (please bear with the relative levels of optimization, which were presented academically to get the concepts across):
Table 1. Decision Execution Cycle Summary
Coding Style                                                                          Cycles per value
Bit test and branch (Figure 11)                                                       13
Conditionally execute (Figure 12)                                                     8
Conditionally execute with parallel saturate (Figure 13)                              7
Dual value conditionally execute with parallel saturate (Figure 14)                   3.5
Software pipelined dual value conditionally execute with parallel saturate (Figure 15)  ~0.5
Now that we have seen the specifics at the heart of the architecture in assembly, we turn to advanced tools that help make using this architecture easier.
Application Example
In the section, CPU/Instruction Features With Code Examples, we examined various specific features of the C6x architecture, albeit written in assembly. Often a programmer, especially when starting out, does not want to get involved in the intricacies of a certain CPU's assembly language. Thus, they write in ANSI C to produce portable, general code. There are various code optimization levels between ANSI C and pure assembly (intrinsics, C-callable assembly, etc.) that will be more fully explored, with benchmarks, in a future application report. In this section, we write in something unique to the C6x called "linear assembly" and run it through a code-generation tool called the "assembly optimizer" (for more information, see the TMS320C6000 Optimizing C Compiler User's Guide, literature number SPRU187). The presented example is just a first pass at a non-traditional application.
Table Lookup Example Description
We examine a certain networking lookup algorithm, implemented on the C6x, as implemented in the TNETX15VE address lookup engine. Of course, additional optimizations are possible in hand assembly, but the assembly optimizer tool can accomplish some of the functions we have mentioned. The algorithm is explained in Figure 16, and the full code is listed in the Appendix.
Figure 16. Table Lookup Example Description
Table Lookup Example using EXTU (Algorithm)
Code summary (assume the table is set up already):
- Input a 32-bit value to look up.
- Traverse through the table in 6-bit chunks.
- Read the pointer value/link list for the next lookup.
- Six-iteration loop (32/6 ~= 6).
- Written in linear assembly (using the assembly optimizer).
Example steps:
1. Load 0851C928h into a register. Base = table = 80000000h.
2. Extract, using the EXTU instruction, the first 6 bits (= 2h) as the offset.
3. Add the offset to the base: 80000000h + 2h = 80000002h.
4. Load the value at 80000002h = 01h.
5. New base = table + (value<<6) = 80000000h + (40h).
6. Extract, using the EXTU instruction, the next 6 bits (= 5h) as the offset.
7. Add the offset to the base: 80000040h + 5h = 80000045h.
8. Load the value at 80000045h = 02h. New base = table + (value<<6) = 80000000h + (80h).
9. Repeat from EXTU four more times.
Internal memory (block base address: entry used in this example):
0x80000000: 80000002h
0x80000040: 80000045h
0x80000080: 80000087h
0x800000C0: 800000C9h
0x80000100: 8000010ah
0x80000140: 80000140h
Value 0851C928h in binary, split into 6-bit chunks: 000010 000101 000111 001001 001010 00 (02h 05h 07h 09h 0Ah, with two bits left over).
The code summary gives an overview of what the code does, while the example steps through it with a contrived actual data value. The actual data value is displayed in the lower right-hand corner in hex, in binary, and as 6-bit values coded in hex. The table, hard-coded in internal memory, is displayed on the upper right side of the graphic. The specific initialized values (along with their addresses) used in this contrived example are displayed in boxes, not to scale. This example shows off the EXTU instruction. It is assumed that the lookup table is already built, so the code and benchmarks apply to processing one 32-bit value. The algorithm ended up being a six-iteration loop. Loops are obviously good on DSPs. More iterations would be helpful but would require buffering much more data at the system level (2K bytes per packet). In other words, 2K bytes of buffer space per six iterations of the loop are needed. Thus, for a thousand iterations, you would need (1000/6) x 2K = 333K bytes, which is prohibitive in some systems.
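The example steps can be followed in a few lines of C. This is a hedged sketch of the walk described above, not the report's linear assembly: the names extu and chunk_lookup are hypothetical, the table is an ordinary byte array indexed from zero instead of living at 80000000h, and the last iteration simply takes the two leftover bits left-aligned in a 6-bit field (an assumption, since the report's shift constants are not preserved here).

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BITS 6u   /* the table is traversed in 6-bit chunks */

/* Two-shift unsigned extract; shift counts must be 0..31. */
static uint32_t extu(uint32_t src, unsigned left, unsigned right)
{
    return (src << left) >> right;
}

/* Walk the table: each chunk of `key` is an offset into the current
 * 64-byte block, and the byte found there selects the next block
 * (next base = value << 6). Returns the last byte read. */
uint8_t chunk_lookup(const uint8_t *table, uint32_t key, unsigned iterations)
{
    size_t base = 0;       /* block offset from the start of the table */
    uint8_t value = 0;
    unsigned left = 0;     /* bits already consumed on the left */

    for (unsigned i = 0; i < iterations; i++) {
        uint32_t offset = extu(key, left, 32u - CHUNK_BITS);

        value = table[base + offset];        /* load the next link */
        base = (size_t)value << CHUNK_BITS;  /* offset -> base */
        left += CHUNK_BITS;
    }
    return value;
}
```

For the key 0851C928h the offsets visited are 02h, 45h, 87h, C9h, 10Ah, and 140h from the table start, matching the addresses listed in the figure.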
Table Lookup Example Code
The initialization code is written in C (initialization code should be in C). The actual lookup function could be in ANSI C with intrinsics, linear assembly, or pure assembly. Figure 17 shows both the main C code at the beginning and the called linear assembly function named "iploop". (The code in Figure 17 actually resides in two separate files. The C code is in a ".c" file. The linear assembly is in a ".sa" file, which stands for "serial assembly".)
Figure 17. Table Lookup Example Code Initialization
main()
{
    /* Init pointer and data */
    int *llptr;
    int data = 0x0851C928;

    /* Assign 0x80000000 (reserved by linker; bad programming practice) and call */
    llptr = (int *) 0x80000000;
    ipploop(llptr, data);
}   /* end main */

_ipploop: .cproc llptr, data
          .reg   count, cstal, cstbr, cstfinal
          .reg   base, offset
                              ; init count
          mvk    0, cstal     ; init shift
                              ; init cstbr
          mvk    0, base      ; init base
A called linear assembly function, seen from the calling C function, resembles a C function with passable parameters and a return value. The top half shows the C code that hard-codes a pointer to the internal memory location 0x80000000 (and allocates the memory using the linker). Then the function is called, like a C function, with the pointer and the data value as passed parameters. When using .cproc, the called linear assembly function understands the parameters passed from the calling C function. The bottom half shows the linear assembly function file, with the parameters received and used in the function along with some initializations. Figure 18 shows the iploop() function written in linear assembly, which appears in the same file shown in Figure 17 (for more information, see the TMS320C6000 Optimizing C Compiler User's Guide, literature number SPRU187).
Figure 18. Primary Lookup Table Loop Linear Assembly
loop: build cstal, cstfinal cstal, cstfinal, cstfinal; annoying cstbr, cstfinal, cstfinal llptr, base, base base with llptr extu data, cstfinal, offset offset base, offset, base base *base, offset next offset
update base offset, base ;increment cstal cstbr cstal, cstal cstbr, cstbr [count] [count] count, count loop
offset->base
.return count
Linear assembly allows mnemonics with symbolic (including passed parameter) values. It is written "as you think," without optimization or software pipelining. Figure 18 shows the meat of the code. The EXTU instruction is the meat of the loop. It extracts bits from the data value as an offset and looks up the base of the next table location. EXTU is used dynamically, with the cstal and cstbr variables specifying which bits to extract. They are pasted together into the register cstfinal at the beginning of the loop and updated toward the end. The loop counter operation is seen in the last lines of code before the return. Note that the return value merely confirms that the loop executed. After the code is run through the assembly optimizer, pure assembly is automatically generated. Figure 19 shows the kernel of the optimized assembly that would reside in an ".asm" file. Note that the epilog and prolog have been omitted for brevity and that little time was spent optimizing this by looking at data dependencies.
Figure 19. Primary Lookup Table Loop Pure Assembly
PIPED LOOP KERNEL EXTU A0,A5,A5 A4,A6,A6 base with llptr offset
B0,0x1,B0 A5,A6,A5 base *A5,B4 next offset
0x6,A7,A7 A7,0x5,A6 A3,0x6,A3 A7,A6,A6 annoying
.S1X B4,0x6,A5 offset->base A3,A6,A6
Thus, Figure 19 shows the assembly code generated by the assembly optimizer for the software-pipelined kernel. You do not have to think about software pipelining or optimization because it is done for you. The code clearly shows the number of cycles required. To count cycles and data dependencies, follow the sets of parallel bars.
System Discussion-C6x DMAs Data (Eliminate Components)
As mentioned earlier, the C6x has certain architectural features that make it powerful in operation for "dataflow" applications. In addition, the C6x provides an efficient configuration for bringing data on-chip and taking data off-chip without much CPU overhead. Also, the internal memory allows the elimination of expensive external I/O devices, such as FIFOs. A networking data mover is a typical example, shown in Figure 20.
Figure C6x) Architecture
[Block diagram: a router connects through an FPGA and four FIFOs to a quad MAC, which serves four 10/100 Mbit/s physical layers (PHYs). Diagram notes: "Mbit/s/32=25"; "Let's eliminate the FIFOs and FPGA!"]
In this networking example, the physical layer (PHY) is akin to the speech codec in a typical system. The media access controller (MAC) receives digital data from the Ethernet wire, and the data is sent to the router (which you could imagine to be like a host, only much faster) through the FIFOs and FPGA. Everything is running fast, with many parts, in a bi-directional manner. The maximum size of an Ethernet packet is 1538 bytes. The next figure shows what can be substituted for the FIFOs in the figure above.
Figure Architecture (Size)
DMA's Dataflow (memory size)
[Block diagram: a TMS320C62xx (EMIF, program memory, data memory, two MCSPs) sits between the router and a quad MAC serving four physical layers (PHYs).]
Size of Ethernet packet: 1538 bytes -> (2K * 4 MAC * 2 bi-dir) = 16K
On chip: 16K * 2 = 32K (for efficient ping-pong)
The internal memory is able to replace the FIFOs. Size-wise, there is easily enough internal memory on the C6201B silicon to hold an entire maximum-size Ethernet packet of 1538 bytes in each direction (discussed in the Architecture (Size) figure above). Because the internal memory is well partitioned on the C6201B silicon, doubling the buffer size with a ping-pong approach would cause fewer CPU/DMA conflicts. The next figure shows how the DMA/EMIF replaces the FPGA and addresses the speeds and bandwidths necessary for system operation.
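The sizing argument can be checked with a few constants. The sketch below is illustrative only: the 2K slot per packet, the four MACs, the two directions, and the doubling for ping-pong come from the Architecture (Size) figure, while the type packet_buffers and the function names are hypothetical and do not come from any TI library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MAX_PACKET_BYTES = 1538,  /* maximum Ethernet packet size (from the text) */
    SLOT_BYTES       = 2048,  /* one packet rounded up to 2K                  */
    NUM_MACS         = 4,
    NUM_DIRECTIONS   = 2,
    NUM_HALVES       = 2      /* ping and pong                                */
};

/* Bytes for one buffer set: one 2K slot per MAC per direction. */
size_t single_buffer_bytes(void)
{
    return (size_t)SLOT_BYTES * NUM_MACS * NUM_DIRECTIONS;
}

/* Bytes for the doubled (ping-pong) arrangement. */
size_t ping_pong_bytes(void)
{
    return single_buffer_bytes() * NUM_HALVES;
}

/* Hypothetical on-chip layout: two halves, each with a slot per MAC
 * and direction. */
typedef struct {
    uint8_t  slot[NUM_HALVES][NUM_MACS][NUM_DIRECTIONS][SLOT_BYTES];
    unsigned dma_half;        /* half currently being filled by the DMA */
} packet_buffers;

/* Hand the half the DMA was filling to the CPU and point the DMA at the
 * other half. Returns the index of the half the CPU may now process. */
unsigned swap_halves(packet_buffers *pb)
{
    unsigned cpu_half;

    assert(pb != NULL);
    cpu_half = pb->dma_half;
    pb->dma_half ^= 1u;
    return cpu_half;
}
```

Keeping the CPU and the DMA on different halves is what the ping-pong approach relies on to reduce CPU/DMA conflicts; how the halves map onto the C6201B memory banks is not modeled here.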
Figure Architecture (Speed)
DMA's Dataflow (speed)
[Block diagram: a TMS320C62xx (EMIF, data memory, program memory, two MCSPs) sits between the router and a quad MAC serving four physical layers (PHYs).]
Function: the DMA does the data-moving work. What does the CPU do that is conveniently in the path? Protocol conversion, VOIP switch, repeater, encryption, compression, echo cancellation.
Speed: [(100 Mbit/s) * 4 MACs * 2 dir] / 32 bits. Unidirectional * 2 = bidirectional. Turnaround?
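The four DMA channels give an elegant solution: one for each direction of each of the "ports" that we are hooking up. Speed-wise, we might have trouble keeping up because of presently uncharacterized "bus turnaround" issues when operating in a bi-directional manner. To enhance the discussion, if we modify the system to have a second parallel bus, as we have on the C6202, some enhancements to the system architecture can be made, as shown in the next figure.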
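The bandwidth expression from the Architecture (Speed) figure can be evaluated directly. This is a back-of-the-envelope sketch: the function name words_per_second is hypothetical, and the "bus turnaround" cost is left out because the text describes it as uncharacterized.

```c
#include <assert.h>
#include <stdint.h>

/* Bus words per second needed to carry the aggregate MAC traffic:
 * (bit rate per MAC * number of MACs * directions) / bus width in bits. */
uint64_t words_per_second(uint64_t bits_per_second_per_mac,
                          unsigned macs,
                          unsigned directions,
                          unsigned bus_width_bits)
{
    assert(bus_width_bits != 0);
    return bits_per_second_per_mac * macs * directions / bus_width_bits;
}
```

For four 100 Mbit/s MACs running in both directions over a 32-bit bus, this gives 25 million words per second, which agrees with the "/32=25" note in the first architecture figure.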
Figure C6202 Architecture With Second
[Diagram, unidirectional case: parallel data, C6202 (process), parallel data.]
A second parallel bus (in addition to the EMIF) would speed up unidirectional systems by eliminating "bus turnaround" overhead. Each bus handles one direction.
[Diagram, bidirectional case: router data in/out, C6202 (process), quad MAC data in/out.]
A second parallel bus (in addition to the EMIF) would simplify the interface logic of bidirectional systems by providing a second "port" for parallel access.
The second parallel bus adds some major advantages for system interfacing, not only in reducing bandwidths but also in the following ways:
In the simpler uni-directional system, there is no turnaround overhead because one side is writing and the other side is reading.
In the more complex bi-directional system, the second bus provides a second "port" for parallel access and simpler decode (see Figure 23).
The latter looks like the router described in this section.
Conclusion
The TMS320C6x CPU/architecture has a variety of features that are attractive for non-typical functions, especially in a dataflow/"virtual FPGA"-type architecture. It is preferable to write much of this code in linear assembly because the compiler does not comprehend these features. The four-channel DMA provides an attractive architecture for such dataflow applications (the second parallel bus is appropriate for both uni- and bi-directional applications).
Appendix Table Lookup Code
The following code is used in the example described in the section, Application Example. The following command lines were used to invoke the tools:
cl6x ipp.c
cl6x ipploop.sa
asm6x ipptab.asm
lnk6x ipp.cmd
ipp.c
#include #include #include #include
extern void ipploop();
main()
{
    int *llptr;
    int data = 0x0851C928;

    /* Could hack by making this 0x80000000 in the simulator; this works */
    llptr = (int *) 0x80000000;
    ipploop (llptr, data);
} /* main */
ipploop.sa
Texas Instruments, Inc. Linear Assembly perform Packet Parsing Executive Author: David Alter, PhD.
Author: George Date: 02/02/98 Description: Requirements: Parameters: Return: finds doesn't llptr Table parse Parse header chunks
        .def    _ipploop

_ipploop:       .cproc  llptr, data

        .reg    count, cstal, cstbr, cstfinal
        .reg    base, offset

        mvk     6, count                ; init count
        mvk     0, cstal                ; init shift
        mvk     26, cstbr
        mvk     0, base                 ; init base
loop:
; build cstfinal
        shl     cstal, 5, cstfinal
        add     cstal, cstfinal, cstfinal       ; annoying
        add     cstbr, cstfinal, cstfinal
        add     llptr, base, base               ; base with llptr
        extu    data, cstfinal, offset          ; offset
        add     base, offset, base              ; base
        ldb     *base, offset                   ; next offset
; update base
        shl     offset, 6, base                 ; offset->base
; increment cstal and cstbr
        add     6, cstal, cstal
        sub     cstbr, 6, cstbr
[count] sub     count, 1, count
[count] b       loop

        .return count
        .endproc
ipp.cmd
/* lnk.cmd v1.00                          */
/* Texas Instruments Incorporated         */
/* Copyright 1996-1997                    */

-heap  0x2000
-stack 0x0800

/* Link Command file for test code */

-o ipp.out
-m ipp.map

ipp.obj
ipploop.obj
ipptab.obj
-l c:\dsp\c6x\c6xc\lib\rts6201.lib

MEMORY
{
    VECS:    o = 00000000h   l = 00400h   /* reset & interrupt vectors      */
    PMEM:    o = 00000400h   l = 0FC00h   /* intended for initialization    */
    LTABLE0: o = 80000000h   l = 0003Fh   /* table                          */
    LTABLE1: o = 80000040h   l = 0003Fh   /* table                          */
    LTABLE2: o = 80000080h   l = 0003Fh   /* table                          */
    LTABLE3: o = 800000C0h   l = 0003Fh   /* table                          */
    LTABLE4: o = 80000100h   l = 0003Fh   /* table                          */
    LTABLE5: o = 80000140h   l = 0003Fh   /* table                          */
    BMEM:    o = 80008000h   l = 08000h   /* .bss, .system, .stack, cinit   */
}

SECTIONS
{
    vectors    > VECS
    .text      > PMEM
    lnktable0  > LTABLE0
    lnktable1  > LTABLE1
    lnktable2  > LTABLE2
    lnktable3  > LTABLE3
    lnktable4  > LTABLE4
    lnktable5  > LTABLE5
    .tables    > BMEM
    .data      > BMEM
    .stack     > BMEM
    .bss       > BMEM
    .sysmem    > BMEM
    .cinit     > BMEM
    .const     > BMEM
    .cio       > BMEM
    .far       > BMEM
}
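The LTABLE origins in this command file are spaced 64 bytes (40h) apart, which is what lets the loop turn a table index into a table base with a shift by 6. A minimal sketch of that relationship, with the hypothetical names LTABLE_ORIGIN, TABLE_SHIFT, and table_address:

```c
#include <assert.h>
#include <stdint.h>

#define LTABLE_ORIGIN 0x80000000u   /* origin of LTABLE0 in ipp.cmd  */
#define TABLE_SHIFT   6u            /* tables are 64 bytes apart     */

/* Address of table number table_index, assuming the tables sit at
 * consecutive 64-byte boundaries starting at LTABLE_ORIGIN. */
uint32_t table_address(uint32_t table_index)
{
    return LTABLE_ORIGIN + (table_index << TABLE_SHIFT);
}
```

Evaluating table_address for indices 0 through 5 reproduces the six LTABLE origins listed in the MEMORY directive.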
ipploop.asm (tool generated)
TMS320C6x ANSI Codegen 1.10 Date/Time created: 14:12:41 1998 Version
GLOBAL FILE PARAMETERS Architecture Endian Memory Model TMS320C6200 Little Small
Redundant Loops Enabled Pipelining Debug Info Enabled Debug
.set .set .set
.file "ipploop.sa"
Texas Instruments, Inc.
Linear Assembly perform Packet Parsing Executive Author: David Alter, PhD. Author: George Date: 02/02/98
Description: Requirements: Parameters: Return: finds doesn't llptr Table parse Parse header chunks
.def
_ipploop
.sect ".text" .align .sym _ipploop,_ipploop,36,2,0
.func
FUNCTION NAME: _ipploop Regs Modified Regs Used A0,A1,A3,A4,A5,A6,A7,B0,B4,B5
_ipploop: _ipploop: .cproc llptr, data .reg .reg .sym .sym count, cstal, cstbr, cstfinal base, offset
llptr,0,4,4,32 data,4,4,4,32
.line
.L1X
B4,A4 A4,A0
.sym .sym .sym .sym .sym .sym
count,16,4,4,32 cstal,7,4,4,32 cstbr,3,4,4,32 cstfinal,6,4,4,32 base,5,4,4,32 offset,6,4,4,32
.line .line .line 0x1a,A3 0x0,A7 init shift 0x6,B0 init cnount
.line CMPGTU BRANCH OCCURS loop: .line A7,0x5,A6 .L1X 0x0,A5 B0,1,A1 init base
.line A7,A6,A6 annoying
.line A3,A6,A6
.line A0,A5,A5 base with llptr
.line EXTU A4,A6,A6 offset
.line A5,A6,A5 base
.line *A5,A6 next offset
.line
A6,0x6,A5
offset->base
.line 0x6,A7,A7
.line A3,0x6,A3
.line B0,0x1,B0
.line BRANCH OCCURS BRANCH OCCURS CSR,B5 -2,B5,B4 loop
B4,CSR B0,1,B0
PIPED LOOP PROLOG A7,0x5,A6 A7,A6,A6 A3,A6,A6 annoying
PIPED LOOP KERNEL
EXTU
A0,A5,A5 A4,A6,A6
base with llptr offset
B0,0x1,B0
A5,A6,A5
base
*A5,B4
next offset
0x6,A7,A7 A7,0x5,A6
A3,0x6,A3 A7,A6,A6
annoying
.S1X
B4,0x6,A5 A3,A6,A6
offset->base
PIPED LOOP EPILOG
EXTU
A0,A5,A5 A4,A6,A6
base with llptr offset
A5,A6,A5 *A5,B4
base next offset
0x6,A7,A7
.S1X
A3,0x6,A3 B4,0x6,A5
offset->base
B5,CSR
.line BRANCH OCCURS L10:
.line .L1X B0,A4
BRANCH OCCURS .endfunc 64,000000000h,0
.endproc
ipptab.asm
.global ippacket .sect data Table USAGE This table Packet Revision Data: 04/22/97 TEXAS INSTRUMENTS, INC.
ippacket:
values value
.word
.sect "lnktable0" t000: .byte .byte
.byte 01h; Packet .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte
.sect "lnktable1" t001: .byte .byte .byte .byte .byte .byte 02h; Packet .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte
.sect "lnktable2" t002: .byte .byte
.byte .byte .byte .byte .byte .byte 03h; Packet .byte .byte .byte .byte .byte .byte .byte .byte
.sect "lnktable3" t003: .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte 04h; Packet .byte .byte .byte .byte .byte .byte
.sect "lnktable4" t004: .byte .byte .byte
.byte .byte .byte .byte .byte .byte .byte .byte 05h; Packet .byte .byte .byte .byte .byte
.sect "lnktable5" t005: .byte 00h; Packet .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte .byte
.end
Contact Numbers
INTERNET Semiconductor Home Page www.ti.com/sc Distributors www.ti.com/sc/docs/distmenu.htm PRODUCT INFORMATION CENTERS Americas Phone +1(972) 644-5580 +1(972) 480-7800 Email sc-infomaster@ti.com Europe, Middle East, Africa Phone Deutsch +49-(0) 8161 3311 English +44-(0) 1604 3399 +34-(0) Francais +33-(0) 1-30 Italiano +33-(0) 1-30 +44-(0) 1604 Email epic@ti.com Japan Phone International +81-3-3344-5311 Domestic 0120-81-0026 International +81-3-3344-5317 Domestic 0120-81-0036 Email pic-japan@ti.com
Asia Phone International +886-2-23786800 Domestic Australia 1-800-881-011 Number -800-800-1450 China 10810 Number -800-800-1450 Hong Kong 800-96-1111 Number -800-800-1450 India 000-117 Number -800-800-1450 Indonesia 001-801-10 Number -800-800-1450 Korea 080-551-2804 Malaysia 1-800-800-011 Number -800-800-1450 Zealand 000-911 Number -800-800-1450 Philippines 105-11 Number -800-800-1450 Singapore 800-0111-111 Number -800-800-1450 Taiwan 080-006800 Thailand 0019-991-1111 Number -800-800-1450 886-2-2378-6808 Email tiasia@ti.com
TI is a trademark of Texas Instruments Incorporated. Other brands and names are the property of their respective owners.
IMPORTANT NOTICE
Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.
TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.
CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE PROPERTY OR ENVIRONMENTAL DAMAGE ("CRITICAL APPLICATIONS"). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. INCLUSION OF TI PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY AT THE CUSTOMER'S RISK.
In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.
TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, warranty, or endorsement thereof.
Copyright 1999 Texas Instruments Incorporated