Implementation of MPEG-4 Motion Compensation Using the TMS320C62x

Eduardo Asbun, Chiouguey Chen
Texas Instruments, Inc.

Abstract

This application report describes the implementation of MPEG-4 motion compensation on the Texas Instruments (TI(TM)) TMS320C62x digital signal processor (DSP). MPEG-4 is a standard for the coding of audiovisual information being developed by the Moving Picture Experts Group (MPEG). MPEG-4 became an International Standard in December 1998. Motion compensation is a basic component of MPEG-4 and other video compression standards such as MPEG-1, MPEG-2, H.261, H.263, and H.263+. The code development flow used to increase performance is discussed. Implementation issues, such as memory access pattern and code size versus performance, are also examined.

Digital Signal Processing Solutions, August 1999

Contents

Introduction
TMS320C62x Fixed-Point DSP
Block-Based Video Compression
Motion Estimation and Motion Compensation
Implementing Motion Compensation on the 'C62x
    Case A: Integer Accuracy
    Case B: Half-Pixel Accuracy in the Horizontal Direction
    Case C: Half-Pixel Accuracy in the Vertical Direction
    Case D: Half-Pixel Accuracy in Both Horizontal and Vertical Directions
Code Benchmarks
Conclusions
References
Appendix A Motion Compensation Code: Case A
Appendix B Motion Compensation Code: Case B
Appendix C Motion Compensation Code: Case C
Appendix D Motion Compensation Code: Case D
Appendix E Complete Code for Motion Compensation

Figures

Figure 1. 4:2:0 Chrominance Subsampling
Figure 2. Motion-Compensated Prediction
Figure 3. Bilinear Interpolation Scheme
Figure 4. Four Possible Memory Alignments of the Reference Block
Figure 5. Example of Data Manipulation for Case A

Examples

Example 1. C Language Implementation of Case A
Example 2. C Language Implementation of Case B
Example 3. C Language Implementation of Case C
Example 4. C Language Implementation of Case D

Introduction

This application report describes the implementation of MPEG-4 motion compensation on the TMS320C62x. MPEG-4 is an ISO/IEC standard for the coding of audiovisual information being developed by the Moving Picture Experts Group (MPEG).
Tools are being specified to support functionalities such as interactivity between the user and the application, universal accessibility, and a high degree of compression. Motion compensation is a basic component of MPEG-4 video and of other video compression standards such as MPEG-1, MPEG-2, H.261, H.263, and H.263+. Therefore, the work presented in this application report is relevant to implementations of these standards on the 'C62x.

TMS320C62x Fixed-Point DSP

The TMS320C62x devices are fixed-point DSPs that feature the VelociTI architecture [2]. The VelociTI architecture is a high-performance, advanced, very-long-instruction-word (VLIW) architecture developed by Texas Instruments. VelociTI, together with the development tool set and evaluation tools, provides faster development time and higher performance for embedded DSP applications through increased instruction-level parallelism.

DSPs such as the 'C62x are well suited for the implementation of motion compensation. This application presents a high degree of parallelism that can be exploited to meet the real-time requirements of typical video conferencing applications. The operations required for motion compensation, such as bilinear interpolation, are readily implemented on a fixed-point DSP.

Block-Based Video Compression

Contiguous frames in a video sequence have a high degree of temporal correlation, as they are taken at small intervals of time (typically 10-30 frames per second). In order to achieve a significant compression ratio, the temporal redundancy between frames is exploited using a compression technique known as interframe coding. As described in the Motion Estimation and Motion Compensation section, one frame is selected as a reference, and subsequent frames are predicted from it.

Interframe coding, however, does not work well for frames that exhibit low temporal correlation, such as scene changes. In this case a different model is used, where the spatial correlation between adjacent pixels is exploited. This technique is known as intraframe coding. Pixels are transformed into another domain in order to reduce redundancy and compact the energy of the signal. The Discrete Cosine Transform and the Wavelet Transform, among others, have been used for this purpose.
The color space used in this implementation of MPEG-4 motion compensation has one luminance and two chrominance components, with 4:2:0 chrominance subsampling. The chrominance components are subsampled by a factor of 2 in both the horizontal and vertical directions, as shown in Figure 1. Each pixel uses 1 byte, and can take a value between 0 and 255.

Figure 1. 4:2:0 Chrominance Subsampling

For the purpose of motion estimation and compensation, a frame is partitioned into macroblocks: arrays of 16x16 nonoverlapping luminance pixels together with one 8x8 block of spatially corresponding pixels for each of the chrominance components. A macroblock of luminance pixels is composed of four blocks of 8x8 pixels.

MPEG-4 makes provisions to handle a variety of frame input formats. Our implementation targets the Common Intermediate Format (CIF), 352x288 pixels for the luminance component. Other formats (QCIF, 4CIF, ITU-R 601, etc.) are supported, as the frame size is an input parameter to the program. A CIF-sized 4:2:0 frame occupies 101,376 bytes for the luminance component and 25,344 bytes for each of the chrominance components, a total of 152,064 bytes. Pixels in a frame are stored row-wise in memory; that is, pixel (i,j) is adjacent in memory to pixels (i,j-1) and (i,j+1). The luminance component is stored first, followed by the two chrominance components.

Motion Estimation and Motion Compensation

The movement of a macroblock from the reference frame to the current frame is determined using a technique known as motion estimation. A search for the macroblock of the current frame is conducted over a portion (or all) of the reference frame. The best matching macroblock (under certain criteria) is selected, and a motion vector is obtained, as shown in Figure 2. A motion vector consists of horizontal and vertical components. This motion vector can be expressed in integer or half-pixel accuracy. Half-pixel accuracy corresponds to bilinear interpolation.

Figure 2. Motion-Compensated Prediction

In regions where the motion vector of a macroblock does not provide an accurate description of the translation of the pixels in the macroblock, four motion vectors per macroblock (one motion vector per block) can be used. This mode can be used on a macroblock-by-macroblock basis. In MPEG-4, a mode known as unrestricted motion vectors can also be used. In this mode, motion vectors can point to positions outside the reference frame.
This mode is particularly useful in scenes with a higher degree of motion, when objects move around the edges, entering or exiting the frame.

A predictive frame is constructed from the motion vectors obtained for all macroblocks in the frame. Macroblocks from the reference frame are replicated at the locations indicated by the motion vectors. This technique is known as motion compensation. A predictive error frame (PEF) is calculated by taking the difference between the current and predicted frames, and is intraframe encoded. Since the energy of the PEF is likely to be low, the amount of bits necessary for encoding is small.

Motion compensation is used in both the encoder and the decoder to produce a motion compensated version of the current frame. This motion compensated frame is reconstructed using the predicted frame and the PEF. Since motion compensation is used at both ends of a video codec and is a time-consuming task, an efficient implementation is of the essence.

Implementing Motion Compensation on the 'C62x

The motion compensation module receives the frame size, the motion vectors, and the reference frame as input. For each macroblock, there are one or four motion vectors. In our implementation, motion compensation is done on a block-by-block basis. Therefore, when one motion vector per macroblock is used, the motion compensation routine is called four times with the same motion vector, once for each block in the macroblock. This is done to reduce code size. However, the implementation can be extended to operate on macroblocks.

A C language version of motion compensation (referred to as "C code") was implemented first. The goal was to validate the code and verify its correctness using actual sequences and motion vector data. The sequences were reconstructed using the C code, and compared to the reconstruction done by an implementation of H.263 from which the motion vector data was obtained.

The C code was profiled to identify portions of the code where performance could be increased. Intrinsics were introduced to produce the "Natural C" version of the code. Intrinsics are special functions that map directly to inlined 'C62x/'C67x instructions. "_nassert" was used to indicate to the optimizer that loops were executed a certain number of times. "Optimized C" was obtained after refining the Natural C code, using code transformations and unrolling certain loops.
For example, when a block of 8x8 pixels is copied from one location in memory to another, four pixels are copied at a time by casting a pointer to byte into a pointer to integer. Other optimizations are described in the following sections.

The critical performance areas in the Optimized C code were isolated. Linear assembly code (assembly code that has not been register-allocated and is unscheduled) was written for these loops. See the TMS320C62x/C67x Programmer's Guide (SPRU198) for more information about the code development flow. The decision to use linear assembly was based on the following considerations:

- Computationally intensive routines in the C code can be written as C-callable routines
- It gives the programmer greater control over resources
- It can pass memory bank information to the tools
- The code can be recompiled to make use of improved versions of the Code Generation Tools as they become available
- The performance obtained with linear assembly is very close to the performance that would be obtained using hand-optimized assembly
- It is easier to write and debug than hand-optimized assembly
- It allows the code to be reused on other 'C6x platforms

Our implementation of motion compensation supports integer and half-pixel accuracy. Therefore, four cases are considered:

- Case A: Integer accuracy
- Case B: Half-pixel accuracy in the horizontal direction
- Case C: Half-pixel accuracy in the vertical direction
- Case D: Half-pixel accuracy in both horizontal and vertical directions

These four cases are illustrated in Figure 3.

Figure 3. Bilinear Interpolation Scheme

In the bilinear interpolation scheme, A, B, C, and D denote pixels at integer positions of the reference frame (B to the right of A, C below A, D diagonally below and to the right), and a, b, c, and d denote the predicted values for cases A to D:

    a = A
    b = (A + B + 1 - rounding_type) / 2
    c = (A + C + 1 - rounding_type) / 2
    d = (A + B + C + D + 2 - rounding_type) / 4

where "/" denotes division by truncation and rounding_type takes the value 0 or 1.

These four cases represent the core computation of motion compensation and are the kernels that have been optimized for the 'C62x. Code for cases A, B, C, and D (C code, Natural C, Optimized C, Linear Assembly, and the Assembly Code generated by the Code Generation Tools) is shown in Appendices A, B, C, and D, respectively. The complete code for motion compensation, including driver programs, is shown in Appendix E.

Memory bank conflicts are an issue that can impact the performance of an implementation of motion compensation. The 'C62x has four memory banks, each 2 bytes wide. Because each bank is single-ported memory, only one access to each bank is allowed per cycle.
Two accesses to a single bank in a given cycle result in a memory stall that halts all pipeline operation for one cycle, while the second value is read from memory. (See the TMS320C62x/C67x Programmer's Guide for more information about internal memory banks in the C6000 family.) In our implementation, care has been taken to minimize memory bank conflicts by staggering accesses to data in internal memory. We assume that both the current and reference blocks are in internal memory.

The motion vectors are represented in a fixed-point notation with one fractional bit; that is, the binary point is placed between bits 0 and 1. This notation allows the handling of half-pixel accuracy in the motion vector information. For example, "0000 0011b" represents the number 1.5.

Case A: Integer Accuracy

Motion compensation in case A involves copying a block of 8x8 pixels from the reference frame into the current frame. The corresponding C language implementation is shown in Example 1.

Example 1. C Language Implementation of Case A

    for (m=0; m<8; m++)
        for (n=0; n<8; n++)
            current_block[m][n] = reference_block[m][n];

An improvement over the original C code would be to transfer four pixels at a time (one word) by casting a pointer to byte into a pointer to word (4 bytes). Also, loop unrolling (used in the inner loop) would improve performance. However, writing linear assembly gives better control over how the memory accesses are performed.

Since a motion vector can point to any pixel in the reference frame, the alignment of the reference block is not known. When a word is loaded and the word address is not aligned to a word boundary, an incorrect value is loaded. Therefore, the worst case (byte alignment) needs to be assumed, and four cases are considered, as shown in Figure 4.

Figure 4. Four Possible Memory Alignments of the Reference Block (in bytes and words)

A possible solution to the memory alignment problem in the implementation of case A would be to transfer one byte at a time, as in the original C code. A more efficient strategy is to copy eight pixels at a time. To assure correct memory alignment, three words (which always contain the eight pixels) are read. Since the memory address of the reference block is known, the corresponding case in Figure 4 can be determined. By manipulating the words accordingly, it is possible to pack the eight pixels into two words that are stored in memory.
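The three-word strategy described above can be sketched in portable C. This is only an illustration of the shift-and-merge idea, not the report's linear assembly: the names `load_word`, `store_word`, and `copy_row8` are hypothetical, `ref` is assumed to start on a word boundary, and words are assembled least significant byte first so that the sketch behaves the same on any host. On the 'C62x each `load_word` would correspond to a single aligned word load.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit word from four bytes, least significant byte first.
   Hypothetical stand-in for one aligned word load. */
static uint32_t load_word(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit word as four bytes, least significant byte first. */
static void store_word(uint8_t *p, uint32_t w)
{
    p[0] = (uint8_t)w;
    p[1] = (uint8_t)(w >> 8);
    p[2] = (uint8_t)(w >> 16);
    p[3] = (uint8_t)(w >> 24);
}

/* Copy eight pixels that start at byte offset `index` of `ref` into `dst`,
   using only word-aligned reads. `ref` is assumed to be word aligned and
   to hold at least 12 bytes from the aligned base of `index`. */
void copy_row8(const uint8_t *ref, size_t index, uint8_t *dst)
{
    size_t   base   = index & ~(size_t)3;   /* word-aligned start       */
    unsigned offset = (unsigned)(index & 3); /* alignment case, 0 to 3  */
    uint32_t w1 = load_word(ref + base);
    uint32_t w2 = load_word(ref + base + 4);
    uint32_t out1 = w1;
    uint32_t out2 = w2;

    if (offset != 0) {
        /* Unaligned: a third word is needed to complete the 8 pixels. */
        uint32_t w3     = load_word(ref + base + 8);
        unsigned rshift = 8u * offset;       /* drop leading bytes      */
        unsigned lshift = 32u - rshift;      /* bring in trailing bytes */
        out1 = (w1 >> rshift) | (w2 << lshift);
        out2 = (w2 >> rshift) | (w3 << lshift);
    }
    store_word(dst, out1);
    store_word(dst + 4, out2);
}
```

The aligned case is handled separately because a 32-bit shift by 32 is undefined in C; the sketch therefore reads the third word only when the block is not word aligned.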
An example of access to a reference block that is byte aligned is shown in Figure 5. The linear assembly code for case A is shown in Appendix A.

Figure 5. Example of Data Manipulation for Case A (memory of the reference frame, register contents, memory of the current frame)

In Figure 5, bytes in memory are shown in Big Endian order, while bytes in registers are shown in Little Endian order.

Case B: Half-Pixel Accuracy in the Horizontal Direction

Motion compensation in case B involves bilinear interpolation according to the formulas previously shown in the Implementing Motion Compensation on the 'C62x section. The corresponding C language implementation is shown in Example 2.

Example 2. C Language Implementation of Case B

    for (m=0; m<8; m++)
        for (n=0; n<8; n++)
            current_block[m][n] = (reference_block[m][n] + reference_block[m][n+1] + 1 - rounding_type) / 2;

The first improvement to the code is to perform the division by 2 using a right shift by 1 position. The standard calls for division by truncation; therefore, a right shift is a fast, correct implementation. Another improvement is to compute a constant that replaces the term (1 - rounding_type), saving a subtraction per iteration.

The memory alignment issues are similar to those of case A. The strategy chosen is to interpolate one pixel at a time. To produce efficient code, the inner loop is unrolled in order to convert the nested loop into a single loop. The reason is that the code generation tools produce more efficient code for single loops than for nested loops.

The pixels of a row reside in contiguous memory locations. Therefore, the access pattern to internal memory can be predetermined in order to avoid memory bank hits. The linear assembly code for case B is shown in Appendix B.

Case C: Half-Pixel Accuracy in the Vertical Direction

Motion compensation in case C, as in case B, involves bilinear interpolation according to the formulas previously shown in the Implementing Motion Compensation on the 'C62x section.
The corresponding C language implementation is shown in Example 3.

Example 3. C Language Implementation of Case C

    for (m=0; m<8; m++)
        for (n=0; n<8; n++)
            current_block[m][n] = (reference_block[m][n] + reference_block[m+1][n] + 1 - rounding_type) / 2;

Pixels located in the rows above and below the current pixel do not reside in adjacent memory locations, as they are located at a distance equal to the frame width. Since the width of a CIF-size frame (352 pixels) is a multiple of 8, the combined width in bytes of the four memory banks, these pixels reside in the same memory bank. Therefore, if they are read in the same cycle, a pipeline stall would occur.

To solve this problem, a strategy similar to that of case B is used: pixels are processed column-by-column, creating a single loop. The advantage of this strategy is that it facilitates the work of the code generation tools. The linear assembly code for case C is shown in Appendix C.

Case D: Half-Pixel Accuracy in Both Horizontal and Vertical Directions

In motion compensation case D, four pixels from the reference frame are used to produce one pixel in the current frame. The corresponding C language implementation is shown in Example 4.

Example 4. C Language Implementation of Case D

    for (m=0; m<8; m++)
        for (n=0; n<8; n++)
            current_block[m][n] = (reference_block[m][n] + reference_block[m][n+1] + reference_block[m+1][n] + reference_block[m+1][n+1] + 2 - rounding_type) / 4;

The strategy used is similar to that of case B, in that pixels are processed row-wise. The inner loop is unrolled to create a single loop, and two pointers are used to access pixels in contiguous rows. With software pipelining, accesses to contiguous rows do not cause memory bank conflicts, because the instructions are reordered. Even though a single pointer would have been sufficient, using two pointers for loading pixels from memory indicates to the code generation tools the independence between the load instructions. The linear assembly code for case D is shown in Appendix D.

Code Benchmarks

Benchmarks for the code described in the Implementing Motion Compensation on the 'C62x section are summarized in Table 1. The code benchmarked is shown in Appendices A-D. These benchmarks were obtained using the Code Generation Tools version 2.10. The options used for the compiler were: -pm. See the TMS320C6000 Optimizing Compiler User's Guide (SPRU187) for more information about the compiler tools.
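The two improvements described for the half-pixel cases (a right shift in place of the truncating division, and a loop-invariant constant in place of the rounding term) can be sketched for cases B, C, and D as follows. This is a minimal illustration under stated assumptions, not the code from the appendices: the function names `interp_h`, `interp_v`, and `interp_hv` are hypothetical, and both blocks are addressed through a flat pointer with a `stride` parameter equal to the frame width in bytes.

```c
#include <stdint.h>

#define BLOCK_SIZE 8  /* block is 8x8 pixels */

/* Case B: half-pixel in the horizontal direction. */
void interp_h(const uint8_t *ref, uint8_t *cur, int stride, int rounding_type)
{
    const int k = 1 - rounding_type;  /* computed once, outside the loops */
    for (int m = 0; m < BLOCK_SIZE; m++)
        for (int n = 0; n < BLOCK_SIZE; n++) {
            const uint8_t *p = ref + m * stride + n;
            /* Operands are non-negative, so >> 1 equals truncating / 2. */
            cur[m * stride + n] = (uint8_t)((p[0] + p[1] + k) >> 1);
        }
}

/* Case C: half-pixel in the vertical direction. */
void interp_v(const uint8_t *ref, uint8_t *cur, int stride, int rounding_type)
{
    const int k = 1 - rounding_type;
    for (int m = 0; m < BLOCK_SIZE; m++)
        for (int n = 0; n < BLOCK_SIZE; n++) {
            const uint8_t *p = ref + m * stride + n;
            cur[m * stride + n] = (uint8_t)((p[0] + p[stride] + k) >> 1);
        }
}

/* Case D: half-pixel in both directions, four reference pixels per output. */
void interp_hv(const uint8_t *ref, uint8_t *cur, int stride, int rounding_type)
{
    const int k = 2 - rounding_type;
    for (int m = 0; m < BLOCK_SIZE; m++)
        for (int n = 0; n < BLOCK_SIZE; n++) {
            const uint8_t *p = ref + m * stride + n;
            cur[m * stride + n] =
                (uint8_t)((p[0] + p[1] + p[stride] + p[stride + 1] + k) >> 2);
        }
}
```

Each routine reads one extra column and/or row of the reference block, so the caller must make sure a 9-pixel-wide or 9-pixel-high region is available around the block.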
Table 1. Code Benchmarks for Motion Compensation of a Block

[Table 1 lists, for cases A to D, the clock count and the code size in fetch packets (FPs) of the C code, Natural C, Optimized C, and Linear Assembly versions. Most entries did not survive in this copy; the surviving clock counts are 1020, 1020, 1341 and 1023, 1023, 1346.]

For case A there is a large difference in code size between Natural C and Optimized C because of loop unrolling. The same effect is present in cases B and C between Optimized C and Linear Assembly. For case D, the difference in code size is larger because two rows are being processed at a time.

Conclusions

In this application report we have presented an implementation of MPEG-4 motion compensation on the 'C62x. C code was written and validated, and the most computationally intensive portions of the code were optimized for the 'C62x. Linear assembly code was written for certain routines to maximize performance. Linear assembly has several advantages over C and hand-optimized assembly:

- It gives the programmer greater flexibility than C to access the resources available
- It reduces the overhead of calling a function
- It is easier to write and debug than assembly code
- The code generation tools are able to generate very efficient code, with performance almost as good as hand-optimized assembly

Imaging applications exhibit a high degree of parallelism that can be exploited by VLIW architectures such as the 'C6x. Although the code generation tools are able to produce efficient code, it is important to identify the portions of the code where the program spends most of its time. Improving the performance of this code will boost the overall performance of the application.

The advantage of using software pipelining in this application is limited by the number of times a loop is performed. Since one block is processed at a time, the trip count of both the inner and outer loops is 8. Consider a nested loop, where the inner loop is software pipelined. The kernel of the loop is executed only a small number of times, because the inner loop is unrolled several times to software pipeline the loop. Therefore, the 'C62x is not being used at full capacity. To solve this problem, the outer loop could be folded into the inner loop to maximize the number of cycles in which the 'C62x runs at peak performance. This solution would require writing hand-optimized code.
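The idea of folding the outer loop into the inner loop can be pictured at the C level as one loop with a trip count of 64 instead of two nested loops with a trip count of 8 each. The sketch below shows this for case B. It only illustrates the loop structure: the report states that the actual solution would need hand-optimized assembly, and the function name `interp_h_folded` and the `stride` parameter are assumptions made for this example.

```c
#include <stdint.h>

/* Case B (horizontal half-pixel) over an 8x8 block, written as a single
   loop. The row and column are recovered from the loop counter, so a
   software-pipelined kernel would run for 64 iterations rather than 8. */
void interp_h_folded(const uint8_t *ref, uint8_t *cur, int stride,
                     int rounding_type)
{
    const int k = 1 - rounding_type;      /* loop-invariant rounding term */

    for (int i = 0; i < 64; i++) {
        int row = i >> 3;                 /* i / 8 */
        int col = i & 7;                  /* i % 8 */
        int off = row * stride + col;
        cur[off] = (uint8_t)((ref[off] + ref[off + 1] + k) >> 1);
    }
}
```

Whether a compiler produces a better schedule from this form than from the nested loops depends on the tools and options used; the sketch makes no performance claim.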
Another way to further improve performance would be to have instructions on the 'C6x that operate on pixels. For example, an instruction that adds four pairs of pixels would accelerate the interpolation step of motion compensation.

The size of the internal memory available is an issue that can impact the performance of applications that operate on large amounts of data, such as video coding. Although this issue is not addressed in this application report (we assumed that the data is already in internal memory), it is important to devise a strategy to manage the data so that it is available when needed. The efficiency of this strategy will determine the overall performance of an implementation of MPEG-4 motion compensation.

References

[1] MPEG-4 Video Group, Overview of the MPEG-4 Standard, ISO/IEC JTC1/SC29/WG11 N2323, Dublin, Ireland, July 1998.
[2] TMS320C62x/C67x Technical Brief (SPRU197), April 1998.
[3] MPEG-4 Video Group, MPEG-4 Video Verification Model version 10.1, ISO/IEC JTC1/SC29/WG11 MPEG98/M3464, Tokyo, Japan, March 1998.
[4] Bhaskaran and Konstantinides, Image and Video Compression Standards, Second edition, Kluwer Academic Publishers, Norwell, Massachusetts, 1997.
[5] Mitchell, Pennebaker, Fogg, and LeGall, MPEG Video Compression Standard, Chapman and Hall, Digital Multimedia Standards Series, New York, 1996.
Appendix A Motion Compensation Code: Case A

C Code

    /* Integer accuracy: copy block */
    void MC_case_a(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                   const int r_x, const int c_x, const int r_y, const int c_y,
                   const int size)
    {
        int m, n;

        for (m=0; m<size; m++)
            for (n=0; n<size; n++)
                curr[c_x+m][c_y+n] = ref[r_x+m][r_y+n];
    }

Natural C Code

    /* Integer accuracy: copy block */
    void MC_case_a(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                   const int r_x, const int c_x, const int r_y, const int c_y,
                   const int size)
    {
        int m, n;

        _nassert(size==8);
        for (m=0; m<size; m++)
            for (n=0; n<size; n++)
                curr[c_x+m][c_y+n] = ref[r_x+m][r_y+n];
    }

Optimized C Code

    /* Integer accuracy: copy block */
    void MC_case_a(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                   const int r_x, const int c_x, const int r_y, const int c_y,
                   const int size)
    {
        int m;

        _nassert(size==8);
        for (m=0; m<size; m++)
        {
            /* Inner loop unrolled: copy the eight pixels of one row */
            curr[c_x+m][c_y  ] = ref[r_x+m][r_y  ];
            curr[c_x+m][c_y+1] = ref[r_x+m][r_y+1];
            curr[c_x+m][c_y+2] = ref[r_x+m][r_y+2];
            curr[c_x+m][c_y+3] = ref[r_x+m][r_y+3];
            curr[c_x+m][c_y+4] = ref[r_x+m][r_y+4];
            curr[c_x+m][c_y+5] = ref[r_x+m][r_y+5];
            curr[c_x+m][c_y+6] = ref[r_x+m][r_y+6];
            curr[c_x+m][c_y+7] = ref[r_x+m][r_y+7];
        }
    }

Linear Assembly

    * Linear Assembly version of "MC_case_a"
            .def    _MC_case_a
            .sect   ".text"
    _MC_case_a: .cproc ref, curr, r_x, c_x, r_y, c_y, num_cols
            .reg    r_temp1, r_temp2, c_temp1, c_temp2
            .reg    p_r, p_c, np_r
            .reg    lshift, rshift, count
            .reg    r_w1, r_w2, r_w3, r_w4
            .reg    temp

    * Calculate pointers "p_c" and "p_r"
r_x, 0x05, r_temp1 c_x, 0x05, c_temp1 r_y, ref, r_temp2 c_y, curr, c_temp2 r_temp1, r_temp2, c_temp1, c_temp2, num_cols, num_cols r_temp1 NUM_COLS c_temp1 NUM_COLS r_temp2 c_temp2 curr r_temp1 r_temp2 c_temp1 c_temp2 update np_r Initialize loop counter count Loop performed times Obtain distance shifting SUB.L Loop loop: .trip SHRU SHRU *np_r++[1], r_w1 Load first word *np_r++[1], r_w2 Load second word *np_r++[num_cols], r_w3 Load third word r_w1, r_w3, r_w2, r_w2, rshift, lshift, lshift, rshift,
r_w1 r_w3 r_w4 r_w2 Make room LSByte Make room MSByte LSByte MSByte 0xFFFC, temp p_r, temp, p_r, 0x0003, 0x04, rshift, rshift, 0x03, lshift, 0x03, obtain LSBits Word-aligned access only define alignment Obtain "lshift" bits left shift bits right shift np_r rshift lshift rshift lshift r_w1, r_w4, r_w1 r_w2, r_w3, r_w2 Obtain actual word Obtain actual word Implementation MPEG-4 Motion Compensation Using TMS320C62x r_w1, *p_c++[1] Store word r_w2, *p_c++[num_cols] Store word p_c, Since num_cols short word Loop back [count] count, count [count] loop .endproc Output from Assembly Optimizer TMS320C6x ANSI Codegen Version 2.10.beta Date/Time created: 11:57:22 1998 GLOBAL FILE PARAMETERS Architecture TMS320C6200 Endian Little Interrupt Threshold Disabled Memory Model Small Speculative Load Threshold Redundant Loops Enabled Pipelining Enabled Debug Info Debug Info .set .set .set .global $bss Linear Assembly version "MC_case_a" .def .sect .sect _MC_case_a ".text" ".text" FUNCTION NAME: _MC_case_a Regs Modified Regs Used _MC_case_a: _MC_case_a: .cproc ref, curr, r_x, c_x, r_y, c_y, num_cols .reg r_temp1, r_temp2, c_temp1, c_temp2 .reg p_r, p_c, np_r .reg lshift, rshift, count .reg r_w1, r_w2, r_w3, r_w4 .reg temp Implementation MPEG-4 Motion Compensation Using TMS320C62x .L1X .L2X .L2X .L1X .L2X .L2X A6,0x5,A3 A8,A4,A0 A3,A0,A4 B6,0x5,B4 B8,B4,B5 B4,A0 0x3,A4,A6 A10,0x2,A3 A3,B7 A6,0x3,A6 0x4,A6,A7 A6,B9 A0,B5,A8 0xfffc,A5 A8,B5 A4,A5,A0 0x8,B0 CSR,B1 A0,B4 A7,0x3,A7 -2,B1,B6 B6,CSR B0,4,B0 |19| |25| |20| r_temp1 r_x*NUM_COLS r_temp1 r_temp2 c_temp1 c_x*NUM_COLS |38| |37| define alignment |40| |39| |38| |40| |26| |35| bits left Obtain "lshift" shift c_temp1 c_temp2 obtain LSBits |40| |36| Word-aligned access only |31| Loop performed times |40| |40| |40| |40| |40| |40| bits right shift ;*-* SOFTWARE PIPELINE INFORMATION Loop label loop Known Minimum Trip Count Known Trip Count Factor Loop Carried Dependency Bound(^) Unpartitioned Resource Bound Partitioned Resource Bound(*) 
Resource Partition: A-side B-side units units units units cross paths address paths Long read paths Long write paths Logical (.LS) unit) Addition (.LSD) unit) Bound(.L .LS) Bound(.L .LSD) Implementation MPEG-4 Motion Compensation Using TMS320C62x Searching software pipeline schedule Schedule found with iterations parallel Done Epilog removed Speculative load beyond user threshold Speculative Load Threshold Unknown ;*-* PIPED LOOP PROLOG loop: .trip 4,B4,B4 ^|9| SHRU SHRU .D1T1 .D2T2 .D1T1 .L1X .D1T1 .D2T2 .D1T1 .S2X .L1X .S1X .D1T1 *A0++,A3 4,B4,B4 ^|44| ^|9| ^|46| ^|45| Load first word *B4++[B7],B6 *A0++,A5 B4,A0 4,B4,B4 *A0++,A3 4,B4,B4 Load third word Load second word ^|9| ^|9| ^|44| ^|9| Load first word A3,A6,A3 |48| Make room LSByte *B4++[B7],B6 ^|46| Load third word *A0++,A5 ^|45| Load second word A5,B9,B8 A5,A7,A4 B4,A0 A3,A4,A4 loop 4,B4,B4 B6,A7,A5 *A0++,A3 4,B4,B4 |51| MSByte |50| LSByte ^|9| |53| Obtain actual word ^|62| ^|9| |49| Make room MSByte ^|44| Load first word ^|9| loop: PIPED LOOP KERNEL SHRU SHRU .L2X .D2T2 .D1T1 .D1T1 .D2T2 .S2X .L1X 4,B5,B5 B8,A5,B8 A3,A6,A3 *B4++[B7],B6 *A0++,A5 A4,*A8++ B8,*B5++[B7] B0,0x1,B0 A5,B9,B8 A5,A7,A4 B4,A0 ^|9| |54| Obtain actual word @|48| Make room LSByte ^|46| Load third word ^|45| Load second word ^|56| Store word ^|57| Store word @|61| Loop back @|51| MSByte @|50| LSByte ^|9| Implementation MPEG-4 Motion Compensation Using TMS320C62x .S1X .S1X .D1T1 0x4,B5,B5 B5,A8 A3,A4,A4 loop 4,B4,B4 4,A8,A8 B6,A7,A5 *A0++,A3 4,B4,B4 ^|59| Since num_cols short word ^|9| @|53| Obtain actual word ^|62| ^|9| ^|9| @|49| Make room MSByte ^|44| Load first word ^|9| PIPED LOOP EPILOG SHRU SHRU SHRU SHRU .L2X .D2T2 .D1T1 .D1T1 .D2T2 .S2X .L1X .S1X .S1X .L2X .D1T1 .D2T2 .S2X .S1X .S1X .L2X .D1T1 .D2T2 4,B5,B5 B8,A5,B8 A3,A6,A3 *B4++[B7],B6 *A0++,A5 A4,*A8++ B8,*B5++[B7] A5,B9,B8 A5,A7,A4 B4,A0 0x4,B5,B5 B5,A8 A3,A4,A4 4,A8,A8 B6,A7,A5 4,B5,B5 B8,A5,B8 A3,A6,A3 A4,*A8++ B8,*B5++[B7] A5,B9,B8 A5,A7,A4 0x4,B5,B5 B5,A8 A3,A4,A4 
4,A8,A8 B6,A7,A5 4,B5,B5 B8,A5,B8 ^|9| @|54| Obtain actual word @@|48| Make room LSByte ^|46| Load third word ^|45| Load second word ^|56| Store word ^|57| Store word @@|51| MSByte @@|50| LSByte ^|9| ^|59| Since num_cols short word ^|9| @@|53| Obtain actual word ^|9| @@|49| Make room MSByte ^|9| @@|54| Obtain actual word @@@|48| Make room LSByte ^|56| Store word ^|57| Store word @@@|51| MSByte @@@|50| LSByte ^|59| Since num_cols short word ^|9| @@@|53|Obtain actual word ^|9| @@@|49| Make room MSByte ^|9| @@@|54|Obtain actual word Store word Store word A4,*A8++ ^|56| B8,*B5++[B7] ^|57| .S1X 0x4,B5,B5 B5,A8 4,A8,A8 ^|59| ^|9| ^|9| Since num_cols short word B1,CSR |40| BRANCH OCCURS .endproc

Appendix B Motion Compensation Code: Case B

C Code

    /* Interpolate rows */
    void MC_case_b(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                   const int r_x, const int c_x, const int r_y, const int c_y,
                   const int size, const int rounding_type)
    {
        int m, n;

        for (m=0; m<size; m++)
            for (n=0; n<size; n++)
                curr[c_x+m][c_y+n] = (ref[r_x+m][r_y+n  ] +
                                      ref[r_x+m][r_y+n+1] + 1 - rounding_type)/2;
    }

Natural C Code

    /* Interpolate rows */
    void MC_case_b(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                   const int r_x, const int c_x, const int r_y, const int c_y,
                   const int size, const int rounding_type)
    {
        int m, n;

        _nassert(size>=8);
        for (m=0; m<size; m++)
            for (n=0; n<size; n++)
                curr[c_x+m][c_y+n] = (ref[r_x+m][r_y+n  ] +
                                      ref[r_x+m][r_y+n+1] + 1 - rounding_type)/2;
    }

Optimized C Code

    /* Interpolate rows */
    void MC_case_b(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                   const int r_x, const int c_x, const int r_y, const int c_y,
                   const int size, const int rounding_type)
    {
        int m, n;

        _nassert(size>=8);
        for (m=0; m<size; m++)
            for (n=0; n<size; n++)
                /* Division by 2 replaced by a right shift */
                curr[c_x+m][c_y+n] = (ref[r_x+m][r_y+n  ] +
                                      ref[r_x+m][r_y+n+1] + 1 - rounding_type)>>1;
    }

Linear Assembly

Linear Assembly version "MC_case_b" .def .sect MC_case_b: .cproc .reg .reg .reg .reg Calculate pointers _MC_case_b ".text"
ref, curr, r_x, c_x, r_y, c_y, num_cols, rounding p_r, r_temp1, r_temp2, c_temp1, c_temp2 r_a, r_b, temp count, const "p_c" "p_r" r_x, 0x05, r_temp1 c_x, 0x05, c_temp1 r_y, ref, r_temp2 c_y, curr, c_temp2 r_temp1, r_temp2, c_temp1, c_temp2, rounding, const r_temp1 NUM_COLS c_temp1 NUM_COLS r_temp2 c_temp2 curr r_temp1 r_temp2 c_temp1 c_temp2 const rounding Initialize loop counter Loop loop: .trip count Loop performed times LDBU LDBU SHRU LDBU SHRU *+p_r[0], *+p_r[1], r_a, const, temp r_b, temp, temp temp, temp temp, *+p_c[0] *+p_r[2], r_b, const, temp r_a, temp, temp temp, temp temp, *+p_c[1] Load byte Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU SHRU LDBU SHRU LDBU SHRU LDBU SHRU LDBU SHRU LDBU SHRU *+p_r[3], r_a, const, temp r_b, temp, temp temp, temp temp, *+p_c[2] *+p_r[4], r_b, const, temp r_a, temp, temp temp, temp temp, *+p_c[3] *+p_r[5], r_a, const, temp r_b, temp, temp temp, temp temp, *+p_c[4] *+p_r[6], r_b, const, temp r_a, temp, temp temp, temp temp, *+p_c[5] *+p_r[7], r_a, const, temp r_b, temp, temp temp, temp temp, *+p_c[6] *+p_r[8], r_b, const, temp r_a, temp, temp temp, temp temp, *+p_c[7] Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Move next Move next Loop back p_c, num_cols, p_r, num_cols, [count] count, count [count] loop .endproc Implementation MPEG-4 Motion Compensation Using TMS320C62x Output from Assembly Optimizer TMS320C6x ANSI Codegen Version 2.10.beta Date/Time created: 11:57:23 1998 GLOBAL FILE PARAMETERS Architecture 
TMS320C6200 Endian Little Interrupt Threshold Disabled Memory Model Small Speculative Load Threshold Redundant Loops Enabled Pipelining Enabled Debug Info Debug Info .set .set .set .global $bss Linear Assembly version "MC_case_b" .def .sect .sect _MC_case_b ".text" ".text" FUNCTION NAME: _MC_case_b Regs Modified: B8,B9 Regs Used B6,B7,B8,B9,B10 _MC_case_b: _MC_case_b: .cproc ref, curr, r_x, c_x, r_y, c_y, num_cols, rounding .reg p_r, .reg r_temp1, r_temp2, c_temp1, c_temp2 .reg r_a, r_b, temp .reg count, const .L1X 0x1,B10,A3 .L1X B3,A9 B6,0x5,B4 A6,0x5,A5 A8,A4,A4 B8,B4,B5 |19| |18| c_temp1 c_x*NUM_COLS r_temp1 r_x*NUM_COLS Implementation MPEG-4 Motion Compensation Using TMS320C62x .S1X .L2X .L1X .L2X .L2X .L1X A5,A4,A6 B4,A0 A10,B9 A0,B5,A5 A6,B2 0x8,B4 CSR,B3 A5,B7 B4,A1 -2,B3,B4 B4,CSR A1,1,A1 |24| |19| |25| |30| |30| |30| |30| |30| |30| |30| |30| r_temp1 r_temp2 c_temp1 c_temp2 Loop performed times ;*-* SOFTWARE PIPELINE INFORMATION Loop label loop Known Minimum Trip Count Known Trip Count Factor Loop Carried Dependency Bound(^) Unpartitioned Resource Bound Partitioned Resource Bound(*) Resource Partition: A-side B-side units units units units cross paths address paths Long read paths Long write paths Logical (.LS) unit) Addition (.LSD) unit) Bound(.L .LS) Bound(.L .LSD) Searching software pipeline schedule find schedule Schedule found with iterations parallel Done Epilog removed Speculative load beyond user threshold Speculative Load Threshold Unknown ;*-* PIPED LOOP PROLOG loop: .trip LDBU .D2T2 *+B2(3),B4 ^|49| Load byte LDBU .D1T2 *+A6(4),B8 ^|55| Load byte LDBU .D1T1 *+A6(5),A0 ^|61| Load byte Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU LDBU LDBU LDBU SHRU .D1T1 .D2T2 .D2T2 .L2X .L2X .D2T2 .D2T2 .L2X *+A6(6),A4 *+B2(1),B1 *+B2(2),B8 B4,A3,B6 B8,A3,B6 *B2,B6 B8,B6,B8 B8,0x1,B5 B5,*+B7(3) A0,B6,B5 A0,A3,A0 ^|67| ^|37| Load byte Load byte ^|43| Load byte |56| First part |62| First part ^|35| Load byte |57| Second part |58| Divide (w/tru) 
^|59| Store result |63| Second part |68| First part loop: PIPED LOOP KERNEL SHRU LDBU LDBU SHRU LDBU SHRU LDBU SHRU LDBU SHRU LDBU LDBU SHRU .D1T1 .L2X .D1T1 .L2X .L1X .D2T2 .D1T2 .D1T1 .D2T2 .L2X .D2T2 .D1T1 .D1T1 .D2T2 B5,0x1,B5 A4,A0,A4 A4,A3,A0 *+A6(7),A4 B1,A3,B6 B2,B9,B2 *+A6(8),A4 A4,0x1,A4 B8,B6,B8 B8,A3,B0 B2,A6 *+B2(3),B4 B8,0x1,B8 B4,B0,B4 *+A6(4),B8 A1,0x1,A1 A4,*+A5(5) B8,*+B7(1) B4,0x1,B4 B6,A3,B6 B4,*+B7(2) loop B1,B6,B4 *+A6(5),A0 A4,A0,A0 A4,A3,A7 B4,0x1,B6 *+A6(6),A4 *+B2(1),B1 A0,0x1,A7 A4,A7,A4 |64| Divide |69| Second |74| First ^|73| Load |44| First ^|86| Move (w/tru) part part byte part next ^|79| Load byte |70| Divide (w/tru) |45| Second part |50| First part ^|9| ^|49| Load byte |46| Divide (w/tru) |51| Second part ^|55| Load byte |88| Loop back ^|71| Store result ^|47| Store result |52| Divide (w/tru) |38| First part ^|53| Store result ^|89| |39| Second part ^|61| Load byte |75| Second part |80| First part |40| Divide (w/tru) ^|67| Load byte ^|37| Load byte Divide (w/tru) Second part |76| |81| Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU SHRU LDBU SHRU .L1X .D2T2 .L2X .D1T1 .L2X .D2T2 .D1T2 .D2T1 .D1T1 .L1X .D2T2 .L2X B6,A0 *+B2(2),B8 B4,A3,B6 A7,*+A5(6) A4,0x1,A4 B8,A3,B6 *B2,B6 B8,B6,B8 B5,*+A5(4) B7,B9,B7 A0,*B7 B8,0x1,B5 A4,*+A5(7) B7,A5 B5,*+B7(3) A0,B6,B5 A0,A3,A0 ^|43| Load byte @|56| First part ^|77| Store result |82| Divide (w/tru) @|62| First part ^|35| Load byte @|57| Second part ^|65| Store result ^|85| Move next ^|41| Store result @|58| Divide (w/tr) ^|83| Store result ^|9| ^|59| Store result @|63| Second part @|68| First part PIPED LOOP EPILOG SHRU LDBU LDBU SHRU SHRU SHRU SHRU SHRU .D1T1 .L2X .D1T1 .L2X .L1X .D1T1 .D2T2 .L2X .D2T2 .L1X B5,0x1,B5 A4,A0,A4 A4,A3,A0 *+A6(7),A4 B1,A3,B6 B2,B9,B2 *+A6(8),A4 A4,0x1,A4 B8,B6,B8 B8,A3,B0 B2,A6 B8,0x1,B8 B4,B0,B4 A4,*+A5(5) B8,*+B7(1) B4,0x1,B4 B6,A3,B6 B4,*+B7(2) B1,B6,B4 A4,A0,A0 A4,A3,A7 B4,0x1,B6 A0,0x1,A7 A4,A7,A4 B6,A0 @|64| Divide @|69| Second @|74| First 
^|73| Load @|44| First ^|86| Move ^|79| Load @|70| Divide @|45| Second @|50| First ^|9| (w/tr) part part byte part next byte (w/tr) part part @|46| @|51| Divide (w/tr) Second part ^|71| Store result ^|47| Store result @|52| Divide (w/tr) @|38| First part ^|53| Store result @|39| Second part @|75| @|80| @|40| @|76| @|81| @|9| Second part First part Divide (w/tr) Divide (w/tr) Second part SHRU .D1T1 .D1T2 .D2T1 .D1T1 .L1X A7,*+A5(6) A4,0x1,A4 B5,*+A5(4) B7,B9,B7 A0,*B7 A4,*+A5(7) B7,A5 ^|77| Store result @|82| Divide (w/tr) ^|65| ^|85| ^|41| ^|83| ^|9| Store result Move next Store result Store result B3,CSR |30| .S2X BRANCH OCCURS .endproc

Appendix C  Motion Compensation Code: Case c

C Code

/* Interpolate rows */
void MC_case_c(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
               const int r_x, const int c_x, const int r_y, const int c_y,
               const int size, const int rounding_type)
{
  int m, n;

  for(m=0; m<size; m++)
    for(n=0; n<size; n++)
      curr[c_x+m][c_y+n] = (ref[r_x+m  ][r_y+n] +
                            ref[r_x+m+1][r_y+n] + 1 - rounding_type)/2;
}

Natural C Code

/* Interpolate rows */
void MC_case_c(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
               const int r_x, const int c_x, const int r_y, const int c_y,
               const int size, const int rounding_type)
{
  int m, n;

  _nassert(size>=8);
  for(m=0; m<size; m++)
    for(n=0; n<size; n++)
      curr[c_x+m][c_y+n] = (ref[r_x+m  ][r_y+n] +
                            ref[r_x+m+1][r_y+n] + 1 - rounding_type)/2;
}

Optimized C Code

/* Interpolate rows */
void MC_case_c(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
               const int r_x, const int c_x, const int r_y, const int c_y,
               const int size, const int rounding_type)
{
  int m, n;

  _nassert(size>=8);
  for(m=0; m<size; m++)
    for(n=0; n<size; n++)
      curr[c_x+m][c_y+n] = (ref[r_x+m  ][r_y+n] +
                            ref[r_x+m+1][r_y+n] + 1 - rounding_type)>>1;
}

Linear Assembly

Linear Assembly version "MC_case_c" .def .sect _MC_case_c: .cproc .reg .reg .reg .reg _MC_case_c ".text" ref, curr, r_x, c_x, r_y, c_y, num_cols, rounding p_r, p_c, 
ptr_temp r_temp1, r_temp2, c_temp1, c_temp2 r_a, r_b, temp count, const "p_c" "p_r" r_temp1 NUM_COLS c_temp1 NUM_COLS r_temp2 c_temp2 curr r_temp1 r_temp2 c_temp1 c_temp2 const rounding Calculate pointers r_x, 0x05, r_temp1 c_x, 0x05, c_temp1 r_y, ref, r_temp2 c_y, curr, c_temp2 r_temp1, r_temp2, c_temp1, c_temp2, rounding, const Initialize loop counter count Loop loop: Loop performed times .trip LDBU LDBU SHRU LDBU SHRU LDBU SHRU *p_r++[num_cols], *p_r++[num_cols], r_a, const, temp r_b, temp, temp temp, temp temp, *p_c++[num_cols] *p_r++[num_cols], r_b, const, temp r_a, temp, temp temp,1, temp temp, *p_c++[num_cols] *p_r++[num_cols], r_a, const, temp r_b, temp, temp temp, temp Load byte Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU SHRU LDBU SHRU LDBU SHRU LDBU SHRU LDBU SHRU temp, *p_c++[num_cols] Store result *p_r++[num_cols], r_b, const, temp r_a, temp, temp temp, temp temp, *p_c++[num_cols] *p_r++[num_cols], r_a, const, temp r_b, temp, temp temp, temp temp, *p_c++[num_cols] *p_r++[num_cols], r_b, const, temp r_a, temp, temp temp, temp temp, *p_c++[num_cols] *p_r++[num_cols], r_a, const, temp r_b, temp, temp temp, temp temp, *p_c++[num_cols] *p_r, r_b, const, temp r_a, temp, temp temp, temp temp, *p_c++[num_cols] Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Load byte First part Second part Divide (w/truncation) Store result Multiply Adjust frame Move next Adjust frame Move next Loop back num_cols, 0x03, ptr_temp p_r, ptr_temp, p_r, 0x01, p_c, ptr_temp, p_c, 0x01, [count] count, count [count] loop .endproc Output from Assembly 
Optimizer TMS320C6x ANSI Codegen Version 2.10.beta Date/Time created: 11:57:25 1998 GLOBAL FILE PARAMETERS Architecture TMS320C6200 Endian Little Interrupt Threshold Disabled Implementation MPEG-4 Motion Compensation Using TMS320C62x Memory Model Small Speculative Load Threshold Redundant Loops Enabled Pipelining Enabled Debug Info Debug Info .set .set .set .global $bss Linear Assembly version "MC_case_c" .def .sect .sect _MC_case_c ".text" ".text" FUNCTION NAME: _MC_case_c Regs Modified: Regs Used B7,B8,B9,B10 _MC_case_c: _MC_case_c: .cproc ref, curr, r_x, c_x, r_y, c_y, num_cols, rounding .reg p_r, p_c, ptr_temp .reg r_temp1, r_temp2, c_temp1, c_temp2 .reg r_a, r_b, temp .reg count, const .L1X .L1X .L2X .L1X 0x1,B10,A4 B6,0x5,B4 A8,A4,A3 B8,B4,B5 B4,A0 0x8,B0 A6,0x5,A5 CSR,B2 A10,B5 A0,B5,A9 A5,A3,A3 A10,A7 -2,B2,B4 B4,CSR B0,1,B0 |19| c_temp1 c_x*NUM_COLS |19| |30| |18| |30| |25| |25| |24| |25| Loop performed times r_temp1 r_x*NUM_COLS c_temp1 c_temp2 r_temp1 r_temp2 |30| |30| |30| Implementation MPEG-4 Motion Compensation Using TMS320C62x ;*-* SOFTWARE PIPELINE INFORMATION Loop label loop Known Minimum Trip Count Known Trip Count Factor Loop Carried Dependency Bound(^) Unpartitioned Resource Bound Partitioned Resource Bound(*) Resource Partition: A-side B-side units units units units cross paths address paths Long read paths Long write paths Logical (.LS) unit) Addition (.LSD) unit) Bound(.L .LS) Bound(.L .LSD) Searching software pipeline schedule find schedule Schedule found with iterations parallel Done Epilog removed Speculative load beyond user threshold Speculative Load Threshold Unknown ;*-* PIPED LOOP PROLOG loop: .trip LDBU .D1T1 *A3++[A7],A0 ^|35| Load byte LDBU .D1T1 *A3++[A7],A0 ^|37| Load byte LDBU .D1T1 *A3++[A7],A3 ^|43| Load byte LDBU .D1T1 *A3++[A7],A3 ^|49| Load byte LDBU .D1T1 *A3++[A7],A5 ^|55| Load byte LDBU LDBU SHRU LDBU .L2X .D2T2 .D2T2 .D2T2 A0,A4,A3 A3,B4 |38| First ^|9| part A0,A3,A6 |39| Second part A0,A4,A0 |44| First part 
*B4++[B5],B6 ^|61| Load byte A3,A0,A0 |45| Second part A3,A4,A5 |50| First part *B4++[B5],B7 ^|67| Load byte B5,0x3,B9 A3,A5,A6 A6,0x1,A8 *B4++[B5],B8 |85| Multiply |51| Second part |40| Divide (w/tru) ^|73| Load byte Implementation MPEG-4 Motion Compensation Using TMS320C62x loop: PIPED LOOP KERNEL SHRU LDBU SHRU SHRU SHRU LDBU SHRU LDBU SHRU LDBU SHRU LDBU LDBU LDBU .D1T1 .D2T2 .D1T1 .D1T1 .L2X .L1X .L2X .L1X .D1T1 .L1X .D1T1 .L2X .L2X .D1T1 .L2X .D2T2 .D1T1 .D2T2 .D1T1 .D2T2 .D1T1 .D2T2 .L2X .D2T2 A8,*A9++[A7] A0,0x1,A8 A5,A4,A0 B4,B9,B4 *B4,B7 A6,0x1,A6 A8,*A9++[A7] A3,A4,A3 0x1,B4,B4 A6,*A9++[A7] A5,A3,A0 B6,A0,B4 B4,A3 B7,A4,B6 B4,0x1,B4 A0,0x1,A5 B6,A4,A0 *A3++[A7],A0 B8,B6,B6 B8,A4,A0 A5,*A9++[A7] B7,A0,B8 B0,0x1,B0 A9,B1 B8,0x1,B8 *A3++[A7],A0 B7,A0,B6 B6,0x1,B4 B4,*B1++[B5] *A3++[A7],A3 B6,0x1,B6 loop B8,*B1++[B5] *A3++[A7],A3 ^|41| Store result |46| Divide (w/tru) |62| First part ^|87| Adjust |79| Load byte |52| Divide (w/tru) ^|47| Store result |56| First part ^|88| Move next ^|53| Store result |57| Second part |63| Second part ^|9| |74| First part |64| Divide (w/tru) |58| Divide (w/tru) |68| First part ^|35| Load byte |75| Second part |80| First part ^|59| Store result |69| Second part |93| Loop back ^|9| |70| Divide (w/tru) ^|37| Load byte |81| Second part |76| Divide (w/tru) ^|65| Store result ^|43| Load byte |82| Divide (w/tru) ^|94| ^|71| Store result ^|49| Load byte B4,*B1++[B5] ^|77| Store result *A3++[A7],A5 ^|55| Load byte B6,*B1++[B5] ^|83| Store result A0,A4,A3 @|38| First part A3,B4 ^|9| B1,B9,B6 A0,A3,A6 A0,A4,A0 *B4++[B5],B6 ^|90| Adjust @|39| Second part @|44| First part ^|61| Load byte Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU SHRU LDBU .D2T2 .L1X .D2T2 0x1,B6,B1 A3,A0,A0 A3,A4,A5 *B4++[B5],B7 B1,A9 B5,0x3,B9 A3,A5,A6 A6,0x1,A8 *B4++[B5],B8 ^|91| Move next @|45| Second part @|50| First part ^|67| Load byte ^|9| @|85| Multiply @|51| Second part @|40| Divide (w/tr) ^|73| Load byte PIPED LOOP EPILOG SHRU LDBU SHRU SHRU 
SHRU SHRU SHRU SHRU .D1T1 .D2T2 .D1T1 .D1T1 .L2X .L1X .L2X .L1X .L1X .D1T1 .L2X .L2X .L2X .D2T2 .D2T2 A8,*A9++[A7] A0,0x1,A8 A5,A4,A0 B4,B9,B4 *B4,B7 A6,0x1,A6 A8,*A9++[A7] A3,A4,A3 0x1,B4,B4 A6,*A9++[A7] A5,A3,A0 B6,A0,B4 B4,A3 B7,A4,B6 B4,0x1,B4 A0,0x1,A5 B6,A4,A0 B8,B6,B6 B8,A4,A0 A5,*A9++[A7] B7,A0,B8 A9,B1 B8,0x1,B8 ^|41| Store result @|46| Divide (w/tr) @|62| First part ^|87| Adjust @|79| Load byte @|52| Divide (w/tr) ^|47| Store result @|56| First part ^|88| Move next ^|53| Store result @|57| Second part @|63| Second part ^|9| @|74| @|64| @|58| @|68| First Divide Divide First part part (w/tr) (w/tr) @|75| Second part @|80| First part ^|59| Store result @|69| Second part ^|9| @|70| Divide (w/tr) B7,A0,B6 @|81| Second part B6,0x1,B4 @|76| Divide (w/tr) B4,*B1++[B5] ^|65| Store result B6,0x1,B6 @|82| Divide (w/tr) B8,*B1++[B5] ^|71| Store result .D2T2 B4,*B1++[B5] ^|77| Store result .D2T2 B6,*B1++[B5] ^|83| Store result B1,B9,B6 ^|90| Adjust 0x1,B6,B1 ^|91| Move next .L1X B1,A9 ^|9| B2,CSR BRANCH OCCURS |30| .endproc

Appendix D  Motion Compensation Code: Case d

C Code

/* Interpolate rows and columns */
void MC_case_d(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
               const int r_x, const int c_x, const int r_y, const int c_y,
               const int size, const int rounding_type)
{
  int m, n;

  for(m=0; m<size; m++)
    for(n=0; n<size; n++)
      curr[c_x+m][c_y+n] = (ref[r_x+m  ][r_y+n  ] +
                            ref[r_x+m  ][r_y+n+1] +
                            ref[r_x+m+1][r_y+n  ] +
                            ref[r_x+m+1][r_y+n+1] + 2 - rounding_type)>>2;
}

Natural C Code

/* Interpolate rows and columns */
void MC_case_d(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
               const int r_x, const int c_x, const int r_y, const int c_y,
               const int size, const int rounding_type)
{
  int m, n;

  _nassert(size>=8);
  for(m=0; m<size; m++)
    for(n=0; n<size; n++)
      curr[c_x+m][c_y+n] = (ref[r_x+m  ][r_y+n  ] +
                            ref[r_x+m  ][r_y+n+1] +
                            ref[r_x+m+1][r_y+n  ] +
                            ref[r_x+m+1][r_y+n+1] + 2 - rounding_type)/4;
}

Optimized C Code

/* Interpolate rows and columns */
void MC_case_d(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
               const int r_x, const int c_x, const int r_y, const int c_y,
               const int size, const int rounding_type)
{
  int m, n;

  _nassert(size>=8);
  for(m=0; m<size; m++)
    for(n=0; n<size; n++)
      curr[c_x+m][c_y+n] = (ref[r_x+m  ][r_y+n  ] +
                            ref[r_x+m  ][r_y+n+1] +
                            ref[r_x+m+1][r_y+n  ] +
                            ref[r_x+m+1][r_y+n+1] + 2 - rounding_type)>>2;
}

Linear Assembly

Linear Assembly version of "MC_case_d"

            .def    _MC_case_d
            .sect   ".text"
_MC_case_d: .cproc  ref, curr, r_x, c_x, r_y, c_y, num_cols, rounding
            .reg    p_r, p_r1, p_r2, p_c
            .reg    r_temp1, r_temp2, c_temp1, c_temp2
            .reg    r_a1, r_a2, r_b1, r_b2, temp, temp1, temp2
            .reg    count, const, ptr_temp

"p_c" "p_r" r_temp1 NUM_COLS c_temp1 NUM_COLS r_temp2 c_temp2 curr r_temp1 r_temp2 c_temp1 c_temp2 const rounding Calculate pointers r_x, 0x05, r_temp1 c_x, 0x05, c_temp1 r_y, ref, r_temp2 c_y, curr, c_temp2 r_temp1, r_temp2, c_temp1, c_temp2, rounding, const Initialize loop counter count p_r, p_r1 p_r, num_cols, p_r2 Loop loop: .trip LDBU LDBU LDBU LDBU SHRU *+p_r1[0], r_a1 *+p_r2[0], r_a2 r_a1, r_a2, temp1 *+p_r1[1], r_b1 *+p_r2[1], r_b2 temp1, const, temp1 r_b1, r_b2, temp2 temp1, temp2, temp temp, temp temp, *+p_c[0] Loop performed times Load byte/1st pair Load byte/1st pair Load byte/2nd pair Load byte/2nd pair First part Second part Third part Divide (w/truncation) Store result LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU SHRU *+p_r1[2], r_a1 *+p_r2[2], r_a2 temp2, const, temp2 r_a1, r_a2, temp1 temp1, temp2, temp temp, temp temp, *+p_c[1] *+p_r1[3], r_b1 *+p_r2[3], r_b2 temp1, const, temp1 r_b1, r_b2, temp2 temp1, temp2, temp temp, temp temp, *+p_c[2] *+p_r1[4], r_a1 *+p_r2[4], r_a2 temp2, const, temp2 r_a1, r_a2, temp1 temp1, temp2, temp temp, temp temp, *+p_c[3] *+p_r1[5], r_b1 *+p_r2[5], r_b2 temp1, const, temp1 r_b1, r_b2, temp2 temp1, temp2, temp temp, temp temp, *+p_c[4] *+p_r1[6], r_a1 
*+p_r2[6], r_a2 temp2, const, temp2 r_a1, r_a2, temp1 temp1, temp2, temp temp, temp temp, *+p_c[5] *+p_r1[7], r_b1 *+p_r2[7], r_b2 temp1, const, temp1 r_b1, r_b2, temp2 temp1, temp2, temp temp, temp temp, *+p_c[6] *+p_r1[8], r_a1 *+p_r2[8], r_a2 temp2, const, temp2 r_a1, r_a2, temp1 temp1, temp2, temp temp, temp temp, *+p_c[7] p_r1, num_cols, p_r1 p_r2, num_cols, p_r2 Load byte/3rd pair Load byte/3rd pair First part Second part Third part Divide (w/truncation) Store result Load byte/4th pair Load byte/4th pair First part Second part Third part Divide (w/truncation) Store result Load byte/5th pair Load byte/5th pair First part Second part Third part Divide (w/truncation) Store result Load byte/6th pair Load byte/6th pair First part Second part Third part Divide (w/truncation) Store result Load byte/7th pair Load byte/7th pair First part Second part Third part Divide (w/truncation) Store result Load byte/8th pair Load byte/8th pair First part Second part Third part Divide (w/truncation) Store result Load byte/9th pair Load byte/9th pair First part Second part Third part Divide (w/truncation) Store result Move p_r1 next Move p_r2 next Implementation MPEG-4 Motion Compensation Using TMS320C62x p_c, num_cols, Move Loop back next [count] count, count [count] loop .endproc Output from Assembly Optimizer TMS320C6x ANSI Codegen Version 2.10.beta Date/Time created: 11:57:27 1998 GLOBAL FILE PARAMETERS Architecture TMS320C6200 Endian Little Interrupt Threshold Disabled Memory Model Small Speculative Load Threshold Redundant Loops Enabled Pipelining Enabled Debug Info Debug Info .set .set .set .global $bss Linear Assembly version "MC_case_d" .def .sect .sect _MC_case_d ".text" ".text" FUNCTION NAME: _MC_case_d Regs Modified: Regs Used: _MC_case_d: _MC_case_d: .cproc ref, curr, r_x, c_x, r_y, c_y, num_cols, rounding .reg p_r, p_r1, p_r2, .reg r_temp1, r_temp2, c_temp1, c_temp2 .reg r_a1, r_a2, r_b1, r_b2, temp, temp1, temp2 .reg count, const, ptr_temp .D2T2 B11,*SP-(32); 
Implementation MPEG-4 Motion Compensation Using TMS320C62x .D2T1 .D2T2 .D2T2 .D2T1 .L2X .L1X .D2T1 .L1X .L2X .D2T1 .L2X .L1X .D2T1 A10,*+SP(4) B3,*+SP(24) B10,*+SP(28) A6,0x5,A3 |18| A8,A4,A0 A11,*+SP(8) A3,A0,A3 B6,0x5,B4 B8,B4,B5 A3,B1 B4,A0 A3,A10,A6 A14,*+SP(20) A0,B5,A10 A6,B3 A13,*+SP(16) 0x8,B0 A10,A2 CSR,B11 A10,B10 0x2,B10,A11 A12,*+SP(12) -2,B11,B4 B4,CSR B0,2,B0 |24| |19| |32| |24| |24| |25| |32| |30| |24| |32| |32| r_temp1 r_x*NUM_COLS r_temp1 r_temp2 c_temp1 c_x*NUM_COLS c_temp1 c_temp2 Loop performed times |32| |32| |32| ;*-* SOFTWARE PIPELINE INFORMATION Loop label loop Known Minimum Trip Count Known Trip Count Factor Loop Carried Dependency Bound(^) Unpartitioned Resource Bound Partitioned Resource Bound(*) Resource Partition: A-side B-side units units units units cross paths address paths Long read paths Long write paths Logical (.LS) unit) Addition (.LSD) unit) Bound(.L .LS) Bound(.L .LSD) Implementation MPEG-4 Motion Compensation Using TMS320C62x Searching software pipeline schedule Schedule found with iterations parallel Done Epilog removed Speculative load beyond user threshold Speculative Load Threshold Unknown ;*-* PIPED LOOP PROLOG loop: .trip LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU LDBU .D2T2 .D1T1 .D2T2 .D2T2 .D1T1 .D2T2 .D1T1 .D2T2 .D1T1 .D2T2 .D1T1 .D2T2 .D1T1 .D2T2 .L1X .D1T1 .D2T2 .L1X .D2T2 .L2X .D2T2 .L2X .L1X .L2X .L1X .L2X .L1X .D2T2 .D1T1 *+B1(8),B7 *+A6(3),A3 *+B3(1),B4 *+B1(7),B5 *+A6(7),A4 *+B1(6),B2 *+A6(4),A4 *+B1(5),B5 *+A6(5),A7 *+B1(4),B5 *+A6(2),A7 *+B1(3),B6 *+A6(6),A4 *+B1(2),B8 B5,A4,A1 *+A6(8),A3 *B1,B8 A1,A11,A9 B5,A7,A8 *+B1(1),B7 B5,A4,B5 A8,A11,A5 *B3,B5 A6,A2,A6 B5,A11,B6 B6,A3,A0 A0,A11,A7 B1,A2,B1 B8,A7,A12 A6,B3 A12,A11,A14 B7,A3,A3 *+B1(8),B7 *+A6(3),A3 ^|97| ^|58| ^|42| ^|89| ^|90| ^|81| ^|66| ^|73| ^|74| ^|65| ^|50| Load byte/9th Load byte/4th Load byte/2nd Load byte/8th Load byte/8th Load byte/7th Load byte/5th Load byte/6th Load byte/6th Load byte/5th 
Load byte/3rd byte/4th byte/7th byte/3rd part byte/9th ^|57| Load ^|82| Load ^|49| Load |92| Second ^|98| Load ^|37| Load byte/1st |99| First part |76| Second part ^|41| Load byte/2nd |68| Second part |83| First part ^|38| Load byte/1st ^|106| Move p_r2 next |75| |60| First part Second part |67| First part ^|105| Move p_r1 next |52| Second part ^|9| |59| First part |100| Second part ^|97| Load byte/9th ^|58| Load byte/4th Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU SHRU LDBU LDBU .L2X .D2T2 .D2T2 .D1T1 B5,A7,B7 B7,B4,B4 A3,A9,A3 *+B3(1),B4 B8,B5,B5 A14,A0,A0 A3,0x2,A13 *+B1(7),B5 *+A6(7),A4 |69| Third part |44| Second part |101| Third part ^|42| Load byte/2nd |39| |61| Third part |102| Divide (w/tr) ^|89| Load byte/8th ^|90| Load byte/8th loop: PIPED LOOP KERNEL SHRU SHRU LDBU LDBU LDBU LDBU SHRU SHRU LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU LDBU SHRU LDBU .L1X .L2X .D2T2 .D1T1 .L2X .L1X .D2T2 .D1T1 .D2T2 .D1T1 .L1X .D2T2 .D1T1 .D2T2 .L1X .D1T1 .D1T1 .D2T2 .D1T1 .L1X .D2T2 B7,0x2,B8 B4,A11,A0 B5,A11,B5 A0,0x2,A7 *+B1(6),B2 *+A6(4),A4 B6,A8,B5 B2,A4,A9 A12,A0,A0 B5,B4,B4 *+B1(5),B5 *+A6(5),A7 A9,A11,A0 A0,0x2,A12 B4,0x2,B6 *+B1(4),B5 *+A6(2),A7 B5,0x2,B9 A0,A1,A8 B6,A0 *+B1(3),B6 *+A6(6),A4 A8,0x2,A8 *+B1(2),B8 B5,A4,A1 *+A6(8),A3 A9,A5,A5 A7,*+A10(2) *B1,B8 A1,A11,A9 B0,0x1,B0 A5,0x2,A5 A8,*+A10(6) B5,A7,A8 *+B1(1),B7 |70| Divide (w/tru) |51| First part |43| First part |62| Divide (w/tru) ^|81| Load byte/7th ^|66| Load byte/5th |77| Third part |84| Second part |53| Third part |45| Third part ^|73| Load byte/6th ^|74| Load byte/6th |91| First part |54| Divide (w/tru) |46| Divide (w/tru) ^|65| Load byte/5th ^|50| Load byte/3rd |78| Divide (w/tru) |93| Third part ^|57| Load byte/4th ^|82| Load byte/7th |94| Divide (w/tru) ^|49| Load byte/3rd @|92| Second part ^|98| Load byte/9th |85| Third part ^|63| Store result ^|37| Load byte/1st @|99| First part |109| Loop back |86| Divide (w/tru) ^|95| Store result @|76| Second part ^|41| Load byte/2nd 
Implementation MPEG-4 Motion Compensation Using TMS320C62x LDBU LDBU LDBU LDBU SHRU LDBU LDBU .D1T1 .L2X .D2T2 .D1T2 .D2T1 .L2X .L1X .D1T2 .D2T1 .L2X .L1X .L2X .L1X .D2T2 .D1T1 .D1T1 .L2X .D2T2 .L2X .D2T2 .D1T1 A5,*+A10(5) loop B5,A4,B5 A8,A11,A5 *B3,B5 A6,A2,A6 B8,*+A10(3) A0,*B10 B5,A11,B6 B6,A3,A0 B9,*+A10(4) A12,*+B10(1) A0,A11,A7 B1,A2,B1 B8,A7,A12 A6,B3 A12,A11,A14 B7,A3,A3 *+B1(8),B7 *+A6(3),A3 A10,A2,A10 A13,*+A10(7) B5,A7,B7 B7,B4,B4 A3,A9,A3 *+B3(1),B4 A10,B10 B8,B5,B5 A14,A0,A0 A3,0x2,A13 *+B1(7),B5 *+A6(7),A4 ^|87| Store result ^|110| @|68| Second part @|83| First part ^|38| Load byte/1st ^|106| Move p_r2 next result result part part ^|71| Store ^|47| Store @|75| First @|60| Second ^|79| Store result ^|55| Store result @|67| First part ^|105| Move p_r1 next @|52| Second part ^|9| @|59| First part @|100| Second part ^|97| Load byte/9th ^|58| Load byte/4th ^|107| Move next ^|103| Store result @|69| Third part @|44| Second part @|101| Third part ^|42| Load byte/2nd ^|9| @|39| @|61| Third part @|102| Divide (w/t) ^|89| Load byte/8th ^|90| Load byte/8th PIPED LOOP EPILOG SHRU SHRU LDBU LDBU LDBU LDBU SHRU .L1X .L2X .D2T2 .D1T1 .L2X .L1X .D2T2 .D1T1 B7,0x2,B8 B4,A11,A0 B5,A11,B5 A0,0x2,A7 *+B1(6),B2 *+A6(4),A4 B6,A8,B5 B2,A4,A9 A12,A0,A0 B5,B4,B4 *+B1(5),B5 *+A6(5),A7 A9,A11,A0 A0,0x2,A12 @|70| Divide (w/tr) @|51| First part @|43| First part @|62| Divide (w/tr) ^|81| Load byte/7th ^|66| Load byte/5th @|77| Third part @|84| Second part @|53| Third part @|45| Third part ^|73| Load byte/6th ^|74| Load byte/6th First part Divide (w/tr) @|91| @|54| Implementation MPEG-4 Motion Compensation Using TMS320C62x SHRU LDBU LDBU SHRU LDBU LDBU SHRU LDBU LDBU LDBU SHRU LDBU LDBU SHRU .D2T2 .D1T1 .L1X .D2T2 .D1T1 .D2T2 .L1X .D1T1 .D1T1 .D2T2 .D1T1 .L1X .D2T2 .D1T1 .L2X .D2T2 .D1T2 .D2T1 .L2X .L1X .D1T2 .D2T1 .L2X .L1X .L2X .L1X .D1T1 .L2X .L2X B4,0x2,B6 *+B1(4),B5 *+A6(2),A7 B5,0x2,B9 A0,A1,A8 B6,A0 *+B1(3),B6 *+A6(6),A4 A8,0x2,A8 *+B1(2),B8 B5,A4,A1 *+A6(8),A3 A9,A5,A5 
A7,*+A10(2) *B1,B8 A1,A11,A9 A5,0x2,A5 A8,*+A10(6) B5,A7,A8 *+B1(1),B7 A5,*+A10(5) B5,A4,B5 A8,A11,A5 *B3,B5 A6,A2,A6 B8,*+A10(3) A0,*B10 B5,A11,B6 B6,A3,A0 B9,*+A10(4) A12,*+B10(1) A0,A11,A7 B1,A2,B1 B8,A7,A12 A6,B3 A12,A11,A14 B7,A3,A3 A10,A2,A10 A13,*+A10(7) B5,A7,B7 B7,B4,B4 A3,A9,A3 A10,B10 B8,B5,B5 A14,A0,A0 A3,0x2,A13 @|46| Divide (w/tr) ^|65| Load byte/5th ^|50| Load byte/3rd @|78| Divide (w/tr) @|93| Third part @|9| ^|57| Load byte/4th ^|82| Load byte/7th @|94| Divide (w/tr) ^|49| Load byte/3rd @@|92| Second part ^|98| Load byte/9th @|85| Third part ^|63| Store result ^|37| Load byte/1st @@|99| First part @|86| Divide (w/tr) ^|95| Store result @@|76| Second part ^|41| Load byte/2nd ^|87| Store result @@|68| Second part @@|83| First part ^|38| Load byte/1st ^|106| Move p_r2 next ^|71| Store ^|47| Store @@|75| First @@|60| Second result result part part ^|79| Store result ^|55| Store result @@|67| First part ^|105| Move p_r1 next @@|52| Second part ^|9| @@|59| First part @@|100| Second part ^|107| Move next ^|103| Store result @@|69| Third part @@|44| Second part @@|101| Third part ^|9| @@|39| @@|61| Third part @@|102| Divide (w/) Implementation MPEG-4 Motion Compensation Using TMS320C62x SHRU SHRU SHRU SHRU SHRU SHRU SHRU .L1X .L2X .L2X .L1X .L1X .D1T1 .D1T1 .D1T1 .D1T2 .D2T1 .D1T2 .D2T1 B7,0x2,B8 B4,A11,A0 B5,A11,B5 A0,0x2,A7 B6,A8,B5 B2,A4,A9 A12,A0,A0 B5,B4,B4 A9,A11,A0 A0,0x2,A12 B4,0x2,B6 B5,0x2,B9 A0,A1,A8 B6,A0 A8,0x2,A8 A9,A5,A5 A7,*+A10(2) A5,0x2,A5 A8,*+A10(6) A5,*+A10(5) B8,*+A10(3) A0,*B10 @@|70| @@|51| @@|43| @@|62| @@|77| @@|84| @@|53| @@|45| Divide First First Divide Third Second Third Third part part part part part part (w/t) (w/t) @@|91| @@|54| @@|46| @@|78| @@|93| @@|9| @@|94| First part Divide (w/t) Divide (w/t) Divide (w/t) Third part Divide (w/t) @@|85| Third part ^|63| Store result @@|86| Divide (w/t) ^|95| Store result ^|87| ^|71| ^|47| Store result Store result Store result Store result Store result B9,*+A10(4) ^|79| A12,*+B10(1) 
^|55| .D1T1 .L2X A10,A2,A10 ^|107| A13,*+A10(7) ^|103| A10,B10 ^|9| Move next Store result .D2T2 *+SP(24),B3 B11,CSR |32| .D2T1 .D2T1 .D2T1 .D2T1 .D2T1 .D2T2 *+SP(20),A14 *+SP(16),A13 *+SP(12),A12 *+SP(8),A11 *+SP(4),A10 *+SP(28),B10 .D2T2 *++SP(32),B11 BRANCH OCCURS .endproc

Appendix E  Complete Code for Motion Compensation

motion_comp.c

/* Motion Compensation for MPEG-4 */
/* Program: Motion_comp.c */
#include <stdio.h>
#include "globals.h"

/* Perform motion compensation */
void perform_MC(uchar ref[NUM_ROWS][NUM_COLS], uchar curr[NUM_ROWS][NUM_COLS],
                const Pixel_Pos position, const short size,
                const motion_vector)
{
  int c_x, c_y, r_x, r_y, mv_x, mv_y;
  int mv_case;                      /* Type of interpolation */

  /* First, determine the type of interpolation that will be performed.
     Half-pixel accuracy is supported. In quarter-pixel format, which is
     not currently supported, it results in the following:
       If  X.25 or -X.75, no interpolation is done.
       If -X.25 or  X.75, half-pixel interpolation is done.
     X is a positive integer. The above results from checking bit 1. */
  mv_case = CASE_a;                 /* Initially assume integer accuracy */

  /* Notice that CASE_d occurs when both CASE_b and CASE_c occur.
     Half-pixel accuracy is done when bit 1 (Q2) is "1". */
  if (motion_vector.x & 0x02)       /* Half-pixel Top-Down */
    mv_case |= CASE_c;
  if (motion_vector.y & 0x02)       /* Half-pixel Left-Right */
    mv_case |= CASE_b;

  /* TL corner coord of the appropriate component of the curr image */
  c_x = (int)position.x;
  c_y = (int)position.y;

  /* Coord of the TL corner of the appropriate component of the ref image. */
  /* These coord are relative to the TL corner of the current block */
  mv_x = (int)(motion_vector.x>>2);
  mv_y = (int)(motion_vector.y>>2);

  /* These coord are relative to the TL corner of the reference image */
  r_x = c_x + mv_x;
  r_y = c_y + mv_y;

  switch (mv_case)
  {
    case CASE_a:   /* Integer accuracy */
      MC_case_a(ref, curr, r_x, c_x, r_y, c_y, 0);   /* Copy block */
      break;
    case CASE_b:   /* Interpolate rows */
      MC_case_b(ref, curr, r_x, c_x, r_y, c_y, NUM_COLS,
      break;
    case CASE_c:   /* Interpolate columns */
      MC_case_c(ref, curr, r_x, c_x, r_y, c_y, NUM_COLS,
      break;
    case CASE_d:   /* Interpolate rows and columns */
      MC_case_d(ref, curr, r_x, c_x, r_y, c_y, NUM_COLS,
      break;
    default:
      puts("Invalid interpolation scheme!");
      puts("Error in motion_comp");
      exit(1);
  } /* switch */
}

/* Motion compensation using four motion vectors per macroblock is
   performed on the signal components. */
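The case selection in perform_MC and the per-pixel arithmetic behind the four cases can be sketched compactly. The following is a minimal illustration, not the report's code: the function names select_case and predict_pixel, the numeric values of the case codes, and the flat stride addressing are assumptions made for the example. The rounding terms (+ 1 - rounding for two samples, + 2 - rounding for four) follow the half-sample interpolation rule that the MC_case_b, MC_case_c and MC_case_d listings implement.

```c
#include <assert.h>

/* Assumed encoding: bit 0 = horizontal half-pel, bit 1 = vertical half-pel. */
enum { CASE_A = 0, CASE_B = 1, CASE_C = 2, CASE_D = 3 };

/* Pick the interpolation case from bit 1 of each vector component.
   As in perform_MC, x runs top-down (rows) and y left-right (columns). */
static int select_case(int mv_x, int mv_y)
{
    int mv_case = CASE_A;            /* start from integer accuracy */
    if (mv_x & 0x02)
        mv_case |= CASE_C;           /* half-pel between two rows */
    if (mv_y & 0x02)
        mv_case |= CASE_B;           /* half-pel between two columns */
    return mv_case;
}

/* Predict one pixel from the reference sample at row r, column c.
   ref is a flat image with `stride` bytes per row; rounding is 0 or 1. */
static unsigned char predict_pixel(const unsigned char *ref, int stride,
                                   int r, int c, int mv_case, int rounding)
{
    const unsigned char *p = ref + r * stride + c;

    assert(rounding == 0 || rounding == 1);
    switch (mv_case) {
    case CASE_B:                     /* average with right neighbour */
        return (unsigned char)((p[0] + p[1] + 1 - rounding) >> 1);
    case CASE_C:                     /* average with neighbour below */
        return (unsigned char)((p[0] + p[stride] + 1 - rounding) >> 1);
    case CASE_D:                     /* bilinear over the 2x2 neighbourhood */
        return (unsigned char)((p[0] + p[1] + p[stride] + p[stride + 1]
                                + 2 - rounding) >> 2);
    default:                         /* CASE_A: plain copy */
        return p[0];
    }
}
```

With a 2x2 reference holding 10, 21 on the first row and 30, 41 on the second, the four cases at position (0, 0) differ only in which neighbours are summed before the shift, which is why the report can share one pointer setup across all four routines.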
void motion_comp_4mv(Image *image_ref, Image *image_curr,
                     const Image_Size size, const short mb_size,
                     const Four_MV *motion_vector)
{
  short i, j, k;                  /* Indices */
  short mv_x, mv_y;
  short num_mb_tb, num_mb_lr;     /* MBs in frame: top-bottom and left-right */
  short blk_size;                 /* Block size is half of MB size */
  short mv_case;                  /* Indicates type of interpolation */
  Pixel_Pos tl_pixel;             /* Indicates top-left pixel of subblock */
  chrom_mv;

  num_mb_tb = size.rows >> MB_SIZE_EXP;
  num_mb_lr = size.cols >> MB_SIZE_EXP;
  blk_size  = mb_size>>1;

  _nassert(num_mb_tb
  _nassert(num_mb_lr

  for(i=0; i<num_mb_tb; i++)
    for(j=0; j<num_mb_lr; j++)
    {
      k = (i*num_mb_lr) + j;      /* Index into mv's list */

      /* Y component */
      /* FOUR_MV: four blocks */

      /* Block (TL) */
      tl_pixel.x = i*mb_size;
      tl_pixel.y = j*mb_size;
      perform_MC(image_ref->y, image_curr->y, tl_pixel, blk_size,
                 motion_vector[k].mv[0]);

      /* Block (TR) */
      tl_pixel.y += blk_size;
      perform_MC(image_ref->y, image_curr->y, tl_pixel, blk_size,
                 motion_vector[k].mv[1]);

      /* Block (BL) */
      tl_pixel.x += blk_size;
      tl_pixel.y  = j*mb_size;
      perform_MC(image_ref->y, image_curr->y, tl_pixel, blk_size,
                 motion_vector[k].mv[2]);

      /* Block (BR) */
      tl_pixel.y += blk_size;
      perform_MC(image_ref->y, image_curr->y, tl_pixel, blk_size,
                 motion_vector[k].mv[3]);

      /* U and V components */
      /* There is only one block for each of the chrom components */
      tl_pixel.x = i*blk_size;
      tl_pixel.y = j*blk_size;

      if (motion_vector[k].mode == ONE_MV)
      {
        chrom_mv.x = motion_vector[k].mv[0].x;
        chrom_mv.y = motion_vector[k].mv[0].y;
      }
      else
      {
        /* FOUR_MV: the motion vector for the chrominance components is
           the sum of the four, divided */
        chrom_mv.x = (motion_vector[k].mv[0].x + motion_vector[k].mv[1].x +
                      motion_vector[k].mv[2].x + motion_vector[k].mv[3].x)>>3;
        chrom_mv.y = (motion_vector[k].mv[0].y + motion_vector[k].mv[1].y +
                      motion_vector[k].mv[2].y + motion_vector[k].mv[3].y)>>3;
      }

      /* U component */
      perform_MC(image_ref->u, image_curr->u, tl_pixel, blk_size, chrom_mv);

      /* V component */
      perform_MC(image_ref->v, image_curr->v, tl_pixel, blk_size, chrom_mv);
    }
}

main.c

/* Motion Compensation for MPEG-4 */
/* Program: Main.c */
#include <stdio.h>
#include "globals.h"
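The chrominance vector derivation used in motion_comp_4mv can be isolated as a small helper. This is a sketch under stated assumptions: the types Vec2 and MacroblockVectors and the two mode constants are hypothetical stand-ins for the definitions in globals.h, which is not shown here. The shift by 3 mirrors the listing; one plausible reading is a divide by 4 to average the four luminance vectors followed by a divide by 2 for the half-resolution chrominance planes of 4:2:0 video.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the report's motion vector types. */
typedef struct { short x, y; } Vec2;
enum { MODE_ONE_MV, MODE_FOUR_MV };
typedef struct { int mode; Vec2 mv[4]; } MacroblockVectors;

/* Derive the single chrominance vector of a macroblock.
   One-vector mode reuses mv[0] unchanged, as the listing does;
   four-vector mode sums the four block vectors and shifts right by 3.
   Note: >> on a negative sum is implementation-defined in C
   (an arithmetic shift on most compilers). */
static Vec2 chroma_vector(const MacroblockVectors *m)
{
    Vec2 c;
    int sum_x, sum_y;

    assert(m != NULL);
    if (m->mode == MODE_ONE_MV) {
        c = m->mv[0];
    } else {
        sum_x = m->mv[0].x + m->mv[1].x + m->mv[2].x + m->mv[3].x;
        sum_y = m->mv[0].y + m->mv[1].y + m->mv[2].y + m->mv[3].y;
        c.x = (short)(sum_x >> 3);
        c.y = (short)(sum_y >> 3);
    }
    return c;
}
```

Keeping this in one helper makes the truncation behaviour easy to see: the three low bits of the sum are simply discarded, so no extra rounding is applied to the chrominance vector in this sketch.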
void motion_comp_4mv(Image *image_curr, Image *image_ref, const Image_Size,
                     const short mb_size, const Four_MV *);
Image *allocate_image(Image_Size);
Four_MV *allocate_mv(Image_Size, short);
void create_image(Image *, Image_Size);
void get_motion_vectors(Four_MV *, Image_Size, short);
void print_image(Image *, Image_Size);

main(void)
{
  Image *image_curr;            /* Current image */
  Image *image_ref;             /* Reference image */
  Image_Size image_size;        /* Image size */
  short mb_size;                /* Macroblock size */
  Four_MV *motion_vectors;      /* Motion vectors (hor and vert) */

  /* Setting parameters */
  image_size.rows = NUM_ROWS;
  image_size.cols = NUM_COLS;
  mb_size = MB_SIZE;

  /* Allocate reference and current images */
  image_ref  = allocate_image( image_size );
  image_curr = allocate_image( image_size );

  /* Allocate motion vectors */
  motion_vectors = allocate_mv(image_size, mb_size);

  /* Create reference image */
  create_image(image_ref, image_size);

  /* Read motion vector information */
  get_motion_vectors(motion_vectors, image_size, mb_size);

  /* Do motion compensation */
  puts("\nPerforming Motion Compensation.");

  /* Done. Exit. */
  puts("Motion Compensation: done.");
  print_image( image_curr, image_size);
}

TI Contact Numbers

INTERNET
TI Semiconductor Home Page    www.ti.com/sc
TI Distributors               www.ti.com/sc/docs/distmenu.htm

PRODUCT INFORMATION CENTERS

Americas
  Phone   +1(972) 644-5580
  Fax     +1(972) 480-7800
  Email   sc-infomaster@ti.com

Europe, Middle East, and Africa
  Phone Deutsch +49-(0) 8161 3311 English +44-(0) 1604 3399 +34-(0) Francais +33-(0) 1-30 Italiano +33-(0) 1-30 +44-(0) 1604
  Email   epic@ti.com

Japan
  Phone   International +81-3-3344-5311   Domestic 0120-81-0026
  Fax     International +81-3-3344-5317   Domestic 0120-81-0036
  Email   pic-japan@ti.com

Asia
  Phone   International +886-2-23786800
  Domestic        Local access number   TI Number
    Australia     1-800-881-011         -800-800-1450
    China         10810                 -800-800-1450
    Hong Kong     800-96-1111           -800-800-1450
    India         000-117               -800-800-1450
    Indonesia     001-801-10            -800-800-1450
    Korea         080-551-2804
    Malaysia      1-800-800-011         -800-800-1450
    New Zealand   000-911               -800-800-1450
    Philippines   105-11                -800-800-1450
    Singapore     800-0111-111          -800-800-1450
    Taiwan        080-006800
    Thailand      0019-991-1111         -800-800-1450
  Fax     886-2-2378-6808
  Email   tiasia@ti.com

TI and VelociTI are trademarks of Texas Instruments Incorporated.

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty.
Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE PROPERTY OR ENVIRONMENTAL DAMAGE ("CRITICAL APPLICATIONS"). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. INCLUSION OF TI PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY AT THE CUSTOMER'S RISK.

In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, warranty, or endorsement thereof.

Copyright 1999 Texas Instruments Incorporated