The Datasheet Archive - 100 Million Datasheets from 7500 Manufacturers.    


Datasheet Search Engine   
 
Part # or Description: • 5V RS232 Driver • 2SC5066* • "Real Time Clock" • "USB connector" • "blue led" 5mm • 10 watt zener diode • 2N3055* motorola
 
Search Tip: Try entering the part number only. Include a wildcard (eg. lm317* or 1n4148*)

 

 

Jagadeesh Sankaran ABSTRACT This application report describes Ree


Datasheet Thumbnail

  

Download PDF



Top Searches for this datasheet



Reed Solomon Decoder: TMS320C64x Implementation
Jagadeesh Sankaran
ABSTRACT This application report describes Reed Solomon decoder implementation TMS320C64xDSP family. Reed Solomon codes have been widely accepted preferred (ECC) error control coding scheme ADSL networks, digital cellular phones high-definition television systems. reason their widespread prevalence their excellent robustness burst errors. Programmable implementations Reed Solomon decoder offer system designer unique flexibility trade data bandwidth vary error correcting capability that desired given channel. efficient method perform Reed Solomon decoding Peteren-Gorenstein-Zierler (PGZ) algorithm. Digital signal processors TMS320C6000DSP platform seek exploit data level instruction level parallelism algorithms having multiple units capable working tandem, obtain extremely high levels performance. advanced code generation tools help user identifying parallelism decoding algorithm multiple units exploit. This application note helps overcome this challenge showing steps involved developing efficient implementation complete (204,188,8) Reed Solomon decoder TMS320C64x family. This application report
Digital Signal Processing Solutions
Identifies various processing steps that involved development Reed Solomon decoder. Discusses instructions features C6400 used implementing efficient Reed Solomon decoder. Presents complete implementation (204,188,8) Reed Solomon decoder C6400 DSP.
Contents Introduction Reed Solomon Codes Galois Fields Overview C6400 Examples using GMPY4 different GF(2^M)
TMS320C64x TMS320C6000 trademarks Texas Instruments.
SPRA686
Peterson-Gorenstein-Zierler (PGZ) Reed Solomon Decoder Syndrome Computation Berlekamp Massey Algorithm Chien Search Forney Algorithm C6400 Implementation Reed Solomon Decoder Syndrome Computation: Case Case 5.8.1 First Recombination Step Similar Case 5.8.2 Second recombination step Case 5.10 Code Implementation 5.10.1 Berlekamp Massey Algorithm 5.10.2 Chien Search Algorithm 5.11 Forney Algorithm 5.12 Decoder driver function 5.13 Performance Reed Solomon Decoder References
List Figures Figure Figure Figure Figure Figure Figure Moulo Finite Field Math C62x/C67x C64x GMPY4 Operation C64x Default Polynomial GFPGFR Generator Polynomial GFPGFR G(x) Programming GFPGFR Generator Polynomial G(x) GF(64) List Tables Table Performance Decoder (204,188,8) Code Table Code Size (204,188,8) Decoder Table Performance Decoder (102,94,4) Code
Introduction Reed Solomon Codes
This section presents brief overview Reed Solomon codes their associated terminology. also discusses advantages programmable implementation Reed Solomon decoder. Reed Solomon codes particular case non-binary codes. They extremely popular because their capacity correct burst errors. Their capacity correct burst errors stems from fact that they word oriented rather than bit-oriented. bit-oriented code such code would treat this situation many independent single-bit errors. Reed Solomon code, however single error means all-incorrect bits within single word. Therefore (Reed Solomon) codes designed combat burst errors channel. fact codes particular case non-binary codes.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
structure Reed Solomon code specified following parameters:
length code-word bits, often chosen number errors correct
code-word this code then takes form block words. number words block which always equal words, which words parity check words. example, code uses block length bytes, which parity data bytes. number data bytes usually referred symbol Thus code usually described compact (N,K,T) notation. code discussed above example compact notation (255,249,3). When number data bytes protected close block length defined words technique called shortening used change block length. shortened code which both encoder decoder agree part allowable code space. example, (204,188,8) code would only allowable code words defined Reed Solomon code. error correcting code, such code, said systematic user data encoded appears verbatim encoded code word. Thus systematic (204,188,8) code would have data bytes provided user appearing verbatim encoded code word, appended parity words encoder form block words. choice using systematic code merely from point simplicity lets decoder recover data bytes strip parity bytes easily, because structure systematic code. programmable implementation encoder decoder attractive solution offers system designer unique flexibility trade-off data bandwidth error correcting capability that desired based condition channel. This done providing user capability vary data bandwidth error correcting capability that required. C6400 offers unique rich instruction that allows development high performance Reed Solomon decoder minimizing development time required without compromising flexibility that desired. This application note will discuss following sections develop efficient implementation complete (204,188,8) decoder solution C6400 DSP. This Reed Solomon code chosen example because used widely scheme ADSL modems.
Galois Fields
This section presents brief review properties Galois fields. This section presents utmost minimum detail that required order understand encoding decoding. comprehensive review Galois fields obtained from references coding theory [1]. field elements which binary operations performed. Addition multiplication must satisfy commutative, associative distributive laws. field with finite number elements finite field. Finite fields also called Galois fields after their inventor [1]. example binary field {0,1} under modulo addition modulo multiplication denoted GF(2). modulo addition subtraction operations defined tables shown following figure. first first column indicate inputs Galois field adder multiplier. e.g.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Modulo Addition (XOR)
Modulo Multiplication
Figure Moulo Finite Field Math general prime number then shown that GF(p) finite field with elements that GF(pm) extension field with elements. addition various elements field generated various powers field element raising different powers. example GF(256) elements which generated raising primitive element different powers. addition, polynomials whose coefficients binary belong GF(2). polynomial over GF(2) degree said irreducible divisible polynomial over GF(2) degree less than greater than zero. polynomial F(X) irreducible polynomial divisible either irreducible polynomial degree which divides X2m-1 known primitive polynomial. given there more than primitive polynomial. example primitive polynomial which often used most communication standards F(X) Galois field addition easy implement software, same modulo addition. e.g. elements GF(28) then their addition done simply operation follows: (11101) (10000) (01101). Galois field multiplication other hand more complicated shown following example, which computes elements GF(24), repeated multiplication primitive element generate field elements GF(24) primitive polynomial G(x) degree chosen follows G(x) order make multiplication modulo that results multiplication still elements field, element that fifth brought back into 4-bit result using following identity This identity used repeatedly form different elements field, setting Thus elements field enumerated follows: {0,1,,2,3,1 3,.1 Since primitive element GF(24), generate field elements GF(24) {0,1,2,4,8,3,6,7,12,11,.9}.
Overview C6400
This section presents overview C6400DSP. discusses specific architectural enhancements that have been made significantly increase performance Reed Solomon encoding decoding. C6400 uniquely tailored implementing Reed Solomon based error control coding because provides hardware support performing Galois field multiplies. absence hardware effectively perform Galois field math, previous implementations made logarithms perform multiplication finite fields. This limited performance programmable implementations Reed Solomon decoders architectures.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
C62x/C67x Instruction Fetch Instruction Dispatch Instruction Decode Control Register Emulation Instruction Fetch Interrupt Control Instruction Dispatch Advanced Instruction Packing Instruction Decode Data Path Register File A15-A0 Data Path Register File B15-B0 Data Path Register File A15-A0 A31-A16 Data Path Register File B15-B0 B31-B16 Advanced Emulation C64x Control Registers Interrupt Control
Dual 32-Bit Load/Store Path (Dual 64-Bit Load Path C67x Only)
Dual 64-Bit Load/Store Paths
Figure C62x/C67x C64x Galois field addition performed operation, multiplication operation performed GMPY4 instruction. C6400 allows 8-bit operations performed parallel every cycle. addition general-purpose registers that allow architecture obtain extremely high levels performance. action Galois field multiplier shown figure below. Galois field multiplier accepts integers, each which contains packed bytes multiplies them shown below produce four packed bytes integer.
A0,C1
A1,C2
B2,C3
where
denotes Galois field multiplication.
Figure GMPY4 Operation C64x
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
"GMPY4" instruction denotes that four Galois field multiplies being performed parallel. architecture issue such GMPY4s parallel every cycle, thus performing eight Galois field multiplies parallel. This provides architecture capability attain levels performance Reed Solomon based coding. addition Galois field used, programmed using GFPGFR register. ability these instructions directly from "intrinsics" helps considerably reduce software development time. Galois field division used often finite field math operations, that implemented look-up table required.
Examples using GMPY4 different GF(2^M)
following code fragment illustrates "gmpy4" instruction used directly from perform four Galois field multiplies parallel. Previous DSPs that have this instruction, would typically perform Galois field addition using logarithms. example, field elements would multiplied
exp[log[a] log[b]]. seen that three lookup-table operations have performed each Galois field multiply. some computational stages Reed-Solomon such syndrome accumulate Chien search inputs multiplier fixed, hence table look avoided, thereby allowing Galois field multiplies every cycle. architectural capabilities C6400 directly give boost terms Galois field multiplier capability. C6400 allows eight Galois field multiplies performed parallel, gmpy4 instructions, each data-path. This example performs Galois field multiplies GF(256) with generator polynomial defined follows: G(X) generator polynomial written pattern shown below 1+4+8+16) 0x1D:
0x11D
Figure Default Polynomial GFPGFR device comes powered with G(x) shown above generator polynomial GF(256), most communications standards make this polynomial Reed Solomon based coding. some other generator polynomial some other GF(2m) desired then user should initialize GFPGFR (Galois field polynomial generator) [5]. behavior GMPY4 instruction controlled programming GFPGFR (Galois field polynomial generator). parameters required program GFPGFR namely size polynomial generator. size field three bits smaller than degree generator polynomial, this case generator polynomial eight field computed from LSBs pattern represented 0x11D hexadecimal. always GF(256) hence only LSBs need represented generator polynomial control register. behavior GMPY4 instruction controlled programming GFPGFR (Galois field polynomial generator). parameters required program GFPGFR namely size polynomial generator. size field seven bits smaller than degree generator polynomial this case generator polynomial eight field computed from eight LSBs pattern represented hexadecimal. ninth always GF(256) hence only eight LSBs need represented generator polynomial control register.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Example Example Showing Galois Field Multiplies
inline GMPY( op1, Operands polynomial representation. multiplication power representation. exp_table2[log_table[op1] log_table[op2]]; ((op1 (op2 return(t0); void main() symbol_word0 0xFFCADEBA; symbol_word1 0xABDE876E; Previous DSP's would logarithm tables implement Galois field multiplication. unsigned char byte0 GMPY(0xBA, 0x6E); unsigned char byte1 GMPY(0xDE, 0x87); unsigned char byte2 GMPY(0xCA, 0xDE); unsigned char byte3 GMPY(0xFF, 0xAB); C6400 uses dedicated instruction accessible from shown below, performs four multiplies parallel. symbol_word0 0xFFCADEBA symbol_word1 0xABDE876E prod_word=(0xFF prod_word _gmpy4(symbol_word0, symbol_word1);
example shown above illustrates _gmpy4 instruction C64x with default generator polynomial GF(256) from code. example shown next page illustrates different generator polynomial case GF(256). this case generator polynomial defined follows: G(x) shown below. this case GFPGFR needs programmed, writing control word register. parameters required program this control register, size polynomial generator. size smaller than degree generator polynomial, this case generator polynomial hexadecimal pattern generator polynomial, shown below. Note that this field since always assumed This shown Figure field, always assumed
G(x) 0x69
Control word: 0x69
Figure Generator Polynomial GFPGFR G(x)
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Example Example Showing GMPY4 Instruction C64x
#include <stdio.h> #include <stdlib.h> #include <c6x.h> void main() Default Gen. Polynomial Default Control word 700001D, =(16+8+4+1)=29 Control word G(X): 0x69 Field size: Polynomial degree Control word: 7000069, *128 (64+32+8+1) Capture Default control word Perform Galois field unsigned control_word 0x7000069; printf("Default GFPGFR \n", GFPGFR); printf("2 GMPY \n", _gmpy4(2,128)); Update GFPGFR then perform Galois field multiply GFPGFR control_word; printf("GFPGFR \n", GFPGFR); printf("2 GMPY \n", _gmpy4(2,128));
next example shown below illustrates _gmpy4 instruction field sizes other than GF(256). generator polynomial chosen this case uses Galois field GF(64) defined G(x) this case each field element represented using bits. this case generator polynomial left justified that corresponding always assumed control word derived from this left shifted polynomial which denoted G'(x) control word derived shown below. field size this case derived from original polynomial from left shifted polynomial G'(x). control word shown Figure field, always assumed 0x69:
Generator polynomial: 0x9C
Figure Programming GFPGFR Generator Polynomial G(x) GF(64)
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Example Example Showing Program GFPGFR Generator Polynomial GF(64)
#include <stdio.h> #include <stdlib.h> #include <c6x.h> void main() Consider: GF(2^6) Left justify get: 1001 1100 Field size: (order poly elements field need left justified hence 2*32 _gmpy4(8,128) Result needs right shifted obtain final result. computations done with numbers that have been left justified, produce number finally numbers right shifted convert them back numbers. unsigned control_word 0x500009C; GFPGFR control_word; printf("2 GMPY4 \n", _gmpy4(8,128)
Peterson-Gorenstein-Zierler (PGZ) Reed Solomon Decoder
Peterson-Gorenstein-Zierler algorithm decoding Reed Solomon codes consists four steps shown below: Syndrome Computation Berlekamp Massey Algorithm solving error locator polynomial Chien Search Algorithm solving roots error locator polynomial Forney algorithm computing error magnitudes
algorithmic details computational complexity each these algorithms discussed below.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Syndrome Computation
syndrome accumulate first step Reed-Solomon decoding process. This done detect there errors received code word. Since code word generated multiplying with generator polynomial, received code word error free then modulus with respect generator polynomial should evaluate zero. code word without errors, received code word error that channel introduces. encoder takes data encoded encodes codeword which multiple generator polynomial case systematic codes adding parity words after data bytes. syndrome then defined remainder obtained dividing received code word generator polynomial received code word then expressed where associated channel error. Since multiple definition, actually evaluates following: Conceptually this amounts computing Fourier transform received message powers primitive element Evaluating Fourier transform error polynomial these locations checking spectral nulls lets know received code word error free. Galois fields form GF(2m) primitive element Thus powers listed 128, 116, 232, 205, 135, case G(X) Thus case code like (204,188,8) GF(256) code there will 2*T) syndromes where symbols stand following: number words received code word. number data bytes encoded. number byte errors that Reed Solomon code correct. number parity bytes added code, 2*T, GF(256): stands Galois field with elements. this case primitive element elements field generated various powers this primitive element definition. syndromes expressed follows: modG where (0,1,2,3,.15). received code word expressed polynomial form follows:, r0xN-1 r1XN-2 rN-1 where length received code word first powers beta specified follows: Beta {0,1,.15} where i'th power syndromes expanded follows:
)AAA )AAA
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
)AAA
Since elements involved computation belong Galois field operations addition multiplication also done over finite fields. Addition over finite fields merely operation while multiplication over finite fields Galois field multiply. Thus .itimes) where denotes Galois field multiplication.
Berlekamp Massey Algorithm
been shown previous section, syndrome computation, that syndromes merely depend error polynomial. assumed that error pattern e(X) errors locations Xj11 .Xjg. errors assumed have unit magnitude keep following equations simple understand. Since E(ai), syndromes written shown below:
)@@@
where unknown. method solving these equations decoding algorithm Reed Solomon codes. Once aj1, values found, tell locations where error occurred. general there many solutions, equations shown above, however solution that yields error pattern with smallest number errors right solution, represents most probable error pattern e(X) caused channel noise. denote where then equation shown above also equivalently expressed
)AAA )AAA 2T*1
error locator polynomial defined follows:
Lambda(X) 1X)(1 2X)(1 3X).(1 seen that roots this polynomial inverses error locations, expanded shown, Lambda(X) 20X2 coefficients related shown below:
g-1g. These known elementary power symmetric functions i's. fact relating them syndromes gives equations known Newton's identities that used Berlekamp-Massey algorithm.
Sg-1 Sg-2
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Berlekamp-Massey serves iterative algorithm solve outlined below. steps through iterated times Berlekamp algorithm. syndromes denoted Initialize algorithm variables: k=0, T(x) where degree Lambda(x) this iteration. k+1, Compute discrepancy follows:
then step Modify Lambda polynomial follows: (k-1) DkT(x) then step T(x) (k-1) (x)/Dk T(x) x.T(x) present iteration then
seen that algorithm tries iteratively solve error locator polynomial solving equation after another updating error locator polynomial. turns that cannot solve equation some step, then computes error weights last non-zero discriminant found, delays weighted result increase polynomial degree
Chien Search
Chien search algorithm efficient technique determining zeroes error locator polynomial degree lam_deg. error locator polynomial polynomial whose roots constructed reciprocal locations where errors occurred. error locator polynomial typically obtained solving minimum degree solution that satisfies Newton's power symmetric equations. error locator polynomial thus polynomial degree depending number errors that have occurred received code word. There closed form solution solving roots degree polynomial. Since root obviously elements field, exhaustive search substituting each field elements error locator polynomial only out. Chien search effective algorithm this exhaustive search efficient manner.
Forney Algorithm
This algorithm addresses problem calculating error magnitudes given values error polynomial Lambda(X) syndromes. Chien search determines roots error locator polynomial, which enables construct linear equations, find non-zero value coefficients. error polynomial non-zero except errors. Fourier transform non-zero coefficients gives following:
This system linear equations unknowns, unknown coefficients known error locations. matrix form this written
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
This solved using traditional methods matrix inversion which would take O(u3), however Forney algorithm presents faster method same. order this done, polynomial introduced, which referred error evaluator polynomial (X). From definition error locator polynomial Lambda(x) error polynomial e(x) satisfy following equation time domain: ekgk 0,1,2,.,n-1. This equation transformed into frequency domain shown, E(x).Lambda(x) 0,1,., Multiplication polynomials equivalent linear convolution since field finite, actually ends being circular convolution. Therefore equation rewritten shown: E(x).Lambda(x)mod(xn-1) This implies that product multiple (xn-1), multiple being error evaluator polynomial. Thus equation re-written follows: E(x).Lambda(x) (x)(xn-1). Therefore equation solved error evaluator polynomial follows:
E(x) W(x)(x Lambda(x)
coefficients error polynomial calculated evaluating inverse Fourier transform known error locations that computed from results Chien search follows:
W(x)(x ,.,l Lambda(x)
However these precisely locations where both numerator denominator evaluate zero. Therefore applying L'Hospitals rule, derived that
kW(a l1,l2,.l. Lambda
where Lambda' derivative error locator polynomial Lambda(x). turns that derivative polynomial GF(2) simple compute powers zeroed even powers shifted down one. This enables easy computation error magnitudes. With knowledge these four algorithms decoder implementation C6400 developed.
C6400 Implementation Reed Solomon Decoder
four steps required perform Reed Solomon decoding were discussed earlier. This section focuses these different pieces implemented C6400 make full available resources. each algorithm will maximize number Galois field multiplies use. Galois field multiply latency cycles, hence certain loops need unrolled take full advantage latency, that GMPY4s issued every cycle. addition byte quantities need packed together form packed integer serve inputs multiplier.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Syndrome Computation:
syndrome computation algorithm first developed case This case involves computing syndromes. This section also examines code developed re-used case This section also discusses implement algorithm
Case
Since GF(256) elements each element field expressed byte bits). Galois field multiplier that accepts quantities input each which contains packed field elements returns output results multiply quantity containing packed field elements together. syndrome formally defined follows:
modG where (0,1,2,3,.15).
received code word expressed polynomial form follows:
where length received code word case (204,188,8) code equal 204. first powers beta specified shown below, where Beta {0,1,.15} where j'th power i'th root generator polynomial. syndromes expanded follows:
)AAA )AAA )AAA
seen that computing syndromes amounts polynomial evaluation roots defined Beta. This done recursively follows using Horner's rule. Using Horner's rule example, written recursively shown below: (x(x(x(x serves efficient computing polynomial equations. shall denote these 0123, 4567, 891011 12131415. recursive computation shown below:
(((((r )AAA N*2) N*1)
computations that need performed seen from following code description algorithm:
syn_acc_cn const unsigned char *restrict unsigned char *restrict unsigned char *restrict const unsigned char *restrict RS_N, RS_2T byte_i,
beta,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
index, s1_i; beta_i, tmp1, ret_val; ret_val syndrome accumulate routine contains loops. outer loop iterates times evaluates syndromes. inner loop index iterates number elements finite field performs syndrome calculation. tmp1 multiplies terms together gets updated adding next byte. iterations this code traced shown below: Iteration tmp1 beta0 s1_i beta0 Iteration tmp1 beta0 beta0 s1_I r0*beta0*beta r1*beta0 RS_2T; i++) s1_i byte_i[0]; beta_i beta[i]; (index index RS_N index++) tmp1 _gmpy4(s1_i, beta_i]; s1_i tmp1 byte_i[index] s1[i] s1_i; syndromes non-zero then return value zero This indicates that received codeword does have errors. received codeword syndromes non-zero*/ then return value indicate error. RS_2T; i++) (s1[i] ret_val return(ret_val);
file: syn_acc_c.c Copyright 2000 Texas Instruments, Incorporated. Rights Reserved.
clearly seen that syndrome computation involves loops, outer loop that iterates once every syndrome inner loop that iterates over field elements. order obtain best performance from architecture, need unroll compute this case) parallel. various powers read packed quantities into register. order this shall approach similar that
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
radix FFT. received code word read starting locations N/4, Horner's rule applied recursively four these packed beta-words using input data read four locations times. following equations packed quantity. addition each term form ijkl where denotes gmpy4 instruction. syndromes evaluated partially using following words shown below. operations required first four syndromes contained terms s1_word0, s2_word0, s3_word0 s4_word0. seen that these partial results first four syndromes
s1_word0 0123
0123
AAAAAAAAAAAA.
4*2b 0123
s2_word0 0123 s3_word0 0123
4)1b 0123 2)1b 0123
AAAAAA. AAA.
2*2b 0123 4*2b 0123
s4_word0 0123
4)1b 0123
AAA. N*2b 0123
comparison with definition syndromes shows that these partial results need re-combined shown below yield first four syndromes packed word S0123:
0123 s1_word0 0123 S2_word0 0123 s3_word0b 0123 s4_word0
equations that shown below, outline similar steps other syndromes, wherein four words type{s1_wordj, s2_wordj,s3_wordj,s4_wordj} contain partial results syndromes {4*j, 4*j+1,4*j+2,4*j+3}. recombination equations these partial results also shown right after expressions partial results. These partial words contained words re-combined form words, where each word contains four syndromes. re-combination equations done outside main loop. main loop iterates times produces words with partial results.
s1_word1 4567
4567
AAAAAAAAAAAA.
4*2b 4567
s2_word1 4567 s3_word1 4567
4567 4567
AAAAAA. AAA.
2*2b 4567 4*2b 4567
s4_word1 4567
4567
AAA. N*2b 4567
4567 s1_word1b 4567 s2_word1b 4567 s3word1b 4567 s4_word1
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
s1_word2 891011 891011 AAAAAAAAAAAA. s2_word2 891011 s3_word2 891011
891011 891011
4*2b 891011
AAAAAA. AAA.
2*2b 891011 4*2b 891011
s4_word2 891011
891011
AAA. N*2b 891011
891011 s1_word2b 891011 s2_word2b 891011 s3word2b 891011 s4_word2
s1_word3 12131415 12131415 AAAAAAAAAAAA. s2_word3 12131415 s3_word3 12131415
12131415 12131415
4*2b 12131415
AAAAAA. AAA.
2*2b 12131415 4*2b 12131415
s4_word3 12131415
12131415
AAA. N*2b 12131415
12131415 s1_word3b 12131415 s2_word3b 12131415 s3_word3b 12131415 s4_word3
Case
case there syndromes compute. Instead using separate scheme implement calculation same computational elements will used case However, this case, received data will read locations starting from N/8, N/4, 3N/8, N/2, 5N/8, 7N/8. This approach similar reading input data radix FFT. Horner's rule applied recursively times compute these partial results. Once again, equations involved first four syndromes shown. case eight partial results computed that need re-combined two-step re-combination. first step re-uses combination rules that were used that second step re-combination coded additional step.
s1_word0 0123
0123
AAAAAAAAA
8*2b 0123
s2_word0 0123 s3_word0 0123
0123 0123
AAA.
8*2b 0123
AAA.
8*2b 0123
s4_word0 0123
0123
8*2b 0123
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
s1_word2 0123
0123
AAAAAA. AAA. AAA.R
4*2b 0123 2*2b 0123
s2_word2 0123 s3_word2 0123 s4_word2 0123
0123 0123 0123
4*2b 0123
AAAR N*2b 0123
5.8.1
First Recombination Step Similar Case
Sword0 s1_word0 0123 s2_word0b 0123 s3_word0b 0123 s4_word0b 0123 Sword2 s1_word2 0123 s2_word2b 0123 s3_word2b 0123 s4_word2
5.8.2
Second recombination step
0123 Sword0 Sword2 This second step recombination gives first four syndromes. similar equations written next four syndromes shown below. steps recombination also shown. This gives next four syndromes. loop that produces partial results iterates times, which half iteration count case results case produced half time that required case
s1_word1 4567
4567
AAAAAA.R
8*2b 4567
s2_word1 4567 s3_word1 4567
4567 4567
4567
AAA.
4567
s4_word1 4567
4567
AAA.
4567
s1_word3 4567
4567
AAAAAA. AAA.
4*2b 4567 2*2b 4567
s2_word3 4567 s3_word3 4567 s4_word3 4567
4567 4567 4567
AAA. AAA.
4567
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
These words that contain partial results re-combined shown below. Unlike case there stages re-combination re-combination rules case specified below:
Sword1 s1_word1b 4567 s2_word1b 4567 s3_word1b 4567 s4_word1b 4567 Sword3 s1_word3 4567 s2_word3b 4567 s3_word3b 4567 s4_word3
4567 Sword1 Sword3 this same computational elements used computing syndromes case
Case
same algorithm used computing syndromes case this case certain number zeroes appended received code word make augmented code word whose length multiple multiple However section that computes partial results using Horner's rule uses length augmented code word compute hence slight overhead. None less same computational elements used. case (255, 239, code first syndrome written follows:
AAA.
This also computed augmented code word length case appending original code word with zero. this case first syndrome written
AAA.
implementation thus keeps zeroing first byte, prepare this augmented code word.
5.10 Code Implementation
C6400 good compiler target allows obtaining highest levels performance directly from code. shown earlier Galois field multiplier accessed _gmpy4 instruction. syndrome accumulate requires syndromes computed computing syndrome polynomial size each. Hence total number galois field multiplies required case 204, 188, code, syndrome accumulate requires 204*16 3264 Galois field multiplies. C64x perform eight Galois field multiplies given cycle allowing core syndrome computation loop performed about cycles. C6400 allows users obtain this performance directly from code. piped loop kernel that performs this operation cycle loop case performs computations syndromes parallel. piped loop kernel performs Galois field multiplies every cycle. complete code provided below enable clear understanding various operations that being performed core loop that computes partial syndromes recombination loop that performs final computation syndromes. resulting piped loop kernel also shown below. roots, various powers {N/4, N/2, 3N/4] powers shown below. 5N/4 power table used hence case number iterations accumulate loop case number iterations N>>3.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
beta[ These tables shown case default generator polynomial, G(X) shown below: beta[ beta_RS_N_4[ beta_RS_N_2[ beta_RS_3N-4[ 146, 221, 146, 221, 221, 146, 146, 128, 116, 232, 205, 135, 38}; 146, 221, 146, 221, 221, 146, 221, 146, 221, 146, 221,
case each array length data massaged right powers. beta {beta0123, beta4567, beta0123, beta4567}; beta_RS_N_4 beta_RS_N_2 beta_RS_3N_4 beta_RS_5N_4
NAME: syn_acc Syndrome Accumulate REED-SOLOMON decoder USAGE This routine following prototype: syn_acc const unsigned char *restrict byte_i, unsigned char *restrict unsigned char *restrict const unsigned char *restrict beta, const unsigned char *restrict beta_RS_N_4, const unsigned char *restrict beta_RS_N_2, const unsigned char *restrict beta_RS_3N_4, const unsigned char *restrict beta_RS_5N_4, RS_N, RS_2T syn_accumulate routine accepts received codeword byte_i array computes RS_2T syndromes stores them output array. array scratch array characters. beta arrays contain various powers primitive element finite field. Copyright 2000 Texas Instruments, Incorporated. Rights Reserved. syn_acc_c const unsigned char *restrict byte_i, unsigned char *s1, unsigned char *s2, const unsigned char *restrict beta, const unsigned char *restrict beta_RS_N_4, const unsigned char *restrict beta_RS_N_2, const unsigned char *restrict beta_RS_3N_4,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
const unsigned char *restrict beta_RS_5N_4, RS_N, RS_2T double double double double betadword0, betadword1, beta4dword0, beta4dword1; beta2dword0, beta2dword1, beta3dword0, beta3dword1; beta5dword0, beta5dword1; *betaptr, *beta4ptr, *beta2ptr, *beta3ptr, *beta5ptr; betaword0, beta4word0, beta2word0, beta3word0, beta5word0, betaword1, beta4word1, beta2word1, beta3word1, beta5word1; betaword2, beta4word2, beta2word2, beta3word2, betaword3; beta4word3; beta2word3; beta3word3;
unsigned unsigned unsigned unsigned unsigned
offset1, offset2, offset3; offset4, offset5, offset6, offset7; const unsigned char *byte_iptr, *byte_ioff1ptr, *byte_ioff2ptr; const unsigned char *byte_ioff3ptr,*byte_ioff4ptr, *byte_ioff5ptr; const unsigned char *byte_ioff6ptr, *byte_ioff7ptr; unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned byteval, byteval1, byteval2, byteval3; byteval4, byteval5, byteval6, byteval7; s0_startval, s1_startval, s2_startval, s3_startval; s4_startval, s5_startval, s6_startval, s7_startval; s1_w0, s2_w0, s3_w0, s4_w0; s1_w1, s2_w1, s3_w1, s4_w1; s1_w2, s2_w2, s3_w2, s4_w2; s1_w3, s2_w3, s3_w3, s4_w3; iters; status, modval, diff, val; zero ret0, ret1, ret2, ret3, ret10, ret23, ret; byte_i0, byte_i4, tmp1w0, tmp2w0, tmp3w0, tmp4w0, temp; temp1w0, temp1w1, temp1w2, temp1w3, byte_i1, byte_i5, tmp1w1, tmp2w1, tmp3w1, tmp4w1, temp2w0, temp2w1, temp2w2, temp2w3, byte_i2, byte_i6, tmp1w2, tmp2w2, tmp3w2, tmp4w2, temp3w0, temp3w1, temp3w2, temp3w3, byte_i3; byte_i7; tmp1w3; tmp2w3; tmp3w3; tmp4w3; sum2w0, sum2w1, sum2w2, sum2w3, sum1w0, sum1w1, sum1w2, sum1w3, sumw0; sumw1; sumw2; sumw3;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Obtain integer pointer store syndromes word-wide quants*/ Load various powers beta from tables dword wide quantities extract high halves intrinsics. betaword0,.betaword3: Roots described Assumptions beta4word0,.beta4word3: various powers beta^N/4 beta2word0,.beta2word3: various powers beta^N/2 beta3word0,./beta3word3: various powers beta^3N/4 unsigned betaptr beta4ptr beta2ptr beta3ptr beta5ptr *s1ptr (double (double (double (double (double (unsigned (s1); (beta); (beta_RS_N_4); (beta_RS_N_2); (beta_RS_3N_4); (beta_RS_5N_4); betadword1 beta4dword1 beta2dword1 beta3dword1 betaptr[1]; beta4ptr[1]; beta2ptr[1]; beta3ptr[1];
betadword0 beta4dword0 beta2dword0 beta3dword0 beta5dword0
betaptr[0]; beta4ptr[0]; beta2ptr[0]; beta3ptr[0]; beta5ptr[0];
betaword0 _lo(betadword0); betaword2 _lo(betadword1); beta4word0 _lo(beta4dword0); beta4word2 _lo(beta4dword1); beta2word0 _lo(beta2dword0); beta2word2 _lo(beta2dword1); beta3word0 _lo(beta3dword0); beta3word2 _lo(beta3dword1); beta5word0 _lo(beta5dword0);
betaword1 _hi(betadword0); betaword3 _hi(betadword1); beta4word1 _hi(beta4dword0); beta4word3 _hi(beta4dword1); beta2word1 _hi(beta2dword0); beta2word3 _hi(beta2dword1); beta3word1 _hi(beta3dword0); beta3word3 _hi(beta3dword1); beta5word1 _hi(beta5dword0);
Adjust closest multiple closest multiple*/ status else status This done adding This modified anded with mask that compliment result zero then result reset either indicates that already multiple this case does change. result represents remainder obtained dividing*/ original difference between remainder number zeroes that needs inserted. RS_N; RS_2T status 1:0; (status) temp
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
modval temp val; (!modval) modval diff modval; diff; case pointers perform radix4 Fourier transform. case pointers perform radix transform. order reduce conditional code pointers computed speculatively. diff non-zero these pointers subtracted from diff offset them appropriate amount These pointers byte_iptr.byte_ioff7ptr offset1 offset3 offset6 offset1 offset3 offset5 offset7 offset1 offset2; offset2 offset4; diff; diff; diff; diff; byte_i; offset2 offset5 offset7 offset2 offset4 offset6 offset4 offset1 offset4; offset1 offset6; diff; diff; diff;
byte_iptr
byte_ioff1ptr byte_iptr offset1; byte_ioff3ptr byte_iptr offset3; byte_ioff5ptr byte_iptr offset5;
byte_ioff2ptr byte_iptr offset2; byte_ioff4ptr byte_iptr offset4;
byte_ioff6ptr byte_iptr offset6; byte_ioff7ptr byte_iptr offset7; Load data from these offsets. diff non-zero then zero first byte, otherwise start loading using byte_iptr. values various offsets loaded speculatively into byteval,. byteval7. byteval (!diff) byteval (diff) diff--; byteval1 byteval3 byteval5 byteval7
*byte_iptr++;
*byte_ioff1ptr++; *byte_ioff3ptr++; *byte_ioff5ptr++; *byte_ioff7ptr++;
byteval2 *byte_ioff2ptr++; byteval4 *byte_ioff4ptr++; byteval6 *byte_ioff6ptr++;
Adjust loop iteration count status; iters (st) iters iters--;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Prepare packed quad with replicated bytes each quad using pack2 packl4 instructions. Notice that then four words i={0,.3} identical. only words identical. s0_startval s1-startval then s0_strtval s1_startval different Therefore s0_startval s1-startval r0r0r0r0 s2_startval s3_startval rN/4 rN/4 rN/4 rN/4 s4_startval s5_startval rN/2 rN/2 rN/2 rN/2 s6-startval s7_startval r3N/4 r3N/4 r3N/4 r3N/4 s0_startval r0r0r0r0 s1-startval rN/8rn/8rN/8rN/8 s2_startval rN/4rN/4rN/4rN/4 s3_startval r3N/8r3N/8r3N/8r3N/8 s4_startval rN/2rN/2rN/2rN/2 s5_startval r5N/8r5N/8r5N/8r5N/8 s6-startval r3N/4r3N/4r3N/4r3N/4 temp _pack2(byteval,byteval); s1_startval _packl4(temp, temp); s0_startval s1_startval; temp _pack2(byteval1,byteval1); (st) s0_startval _packl4(temp,temp); temp _pack2(byteval2,byteval2); s3_startval _packl4(temp, temp); s2_startval s3_startval; temp _pack2(byteval3,byteval3); (st) s2_startval _packl4(temp,temp); temp _pack2(byteval4,byteval4); s5_startval _packl4(temp, temp); s4_startval s5_startval; temp _pack2(byteval5,byteval5); (st) s4_startval _packl4(temp,temp); temp _pack2(byteval6,byteval6); s7_startval _packl4(temp, temp); s6_startval s7_startval; temp _pack2(byteval7,byteval7); (st) s6_startval _packl4(temp,temp); s1_w0 s1_w1 s1_startval; s3_w0 s3_w1 s5_startval; s1_w2 s1_w3 s0_startval; s3_w2 s3_w3 s4_startval; s2_w0 s2_w1 s3_startval; s4_w0 s4_w1 s7_startval; s2_w2 s2_w3 s2_startval; s4_w2 s4_w3 s6_startval;
Compute words each s1,s2,s3 Perform Horner's rule expansion using partial words
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
iters; i++) Load bytes locations N/8, N/4,.7N/8. Note that loads*/ N/8, 3N/8, 5N/8 7N/8 needed only first byte should delayed till appropriate number zeroes been inserted case (!diff) byteval *byte_iptr++; (diff) diff--; byteval2 *byte_ioff2ptr++; byteval4 *byte_ioff4ptr++; byteval6 *byte_ioff6ptr++; (st) byteval1 *byte_ioff1ptr++; (st) byteval3 *byte_ioff3ptr++; (st) byteval5 *byte_ioff5ptr++; (st) byteval7 *byte_ioff7ptr++; temp byte_i1 temp (st) temp byte_i3 temp (st) temp byte_i5 temp (st) temp byte_i7 temp (st) _pack2(byteval, byteval); byte_i0 _packl4(temp,temp); _pack2(byteval1,byteval1); byte_i1 _packl4(temp,temp); _pack2(byteval2, byteval2); byte_i2 _packl4(temp,temp); _pack2(byteval3,byteval3); byte_i3 _packl4(temp,temp); _pack2(byteval4, byteval4); byte_i4 _packl4(temp,temp); _pack2(byteval5,byteval5); byte_i5 _packl4(temp,temp); _pack2(byteval6, byteval6); byte_i6 _packl4(temp,temp); _pack2(byteval7,byteval7); byte_i7 _packl4(temp,temp);
s1_word0 s2_word0 s3_word0 s4_word0
0123 AAAAAAAAAAAA. 0123
0123
0123
0123 0123
AAAAAA. AAA.
0123 0123
0123
0123
0123
AAA. N*2b 0123
tmp1w0 s1_w0 tmp2w0 s2_w0 tmp3w0 _gmpy4(s1_w0, betaword0); tmp1w0 byte_i0; _gmpy4(s2_w0, betaword0); tmp2w0 byte_i2; _gmpy4(s3_w0, betaword0);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
s3_w0 tmp4w0 s4_w0
tmp3w0 byte_i4; _gmpy4(s4_w0, betaword0); tmp4w0 byte_i6;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
s1_word1 s2_word1 s3_word1 s4_word1
4567
4567
AAAAAAAAAAAA.
AAAAAA. AAA.
4567
4567 4567
4567 4567
4567 4567
4567
4567
AAA. N*2b 4567
tmp1w1 s1_w1 tmp2w1 s2_w1 tmp3w1 s3_w1 tmp4w1 s4_w1 _gmpy4(s1_w1, betaword1); tmp1w1 byte_i0; _gmpy4(s2_w1, betaword1); tmp2w1 byte_i2; _gmpy4(s3_w1, betaword1); tmp3w1 byte_i4; _gmpy4(s4_w1, betaword1); tmp4w1 byte_i6;
s1_word2 s2_word2 s3_word2 s4_word2
891011 AAAAAAAAAAAA. 891011
891011 891011
891011 891011
891011 891011
AAAAAA. AAA.
891011
891011
891011
AAA. N*2b 891011
tmp1w2 s1_w2 tmp2w2 s2_w2 tmp3w2 s3_w2 tmp4w2 s4_w2 _gmpy4(s1_w2, betaword2); tmp1w2 byte_i1; _gmpy4(s2_w2, betaword2); tmp2w2 byte_i3; _gmpy4(s3_w2, betaword2); tmp3w2 byte_i5; _gmpy4(s4_w2, betaword2); tmp4w2 byte_i7;
s1_word3 s2_word3 s3_word3 s4_word3
12131415 AAAAAAAAAAAA. 12131415
12131415 12131415
12131415 12131415
12131415 12131415
AAAAAA. AAA.
12131415
12131415
12131415
AAA. N*2b 12131415
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
tmp1w3 s1_w3 tmp2w3 s2_w3 tmp3w3 s3_w3 tmp4w3 s4_w3
_gmpy4(s1_w3, betaword3); tmp1w3 byte_i1; _gmpy4(s2_w3, betaword3); tmp2w3 byte_i3; _gmpy4(s3_w3, betaword3); tmp3w3 byte_i5; _gmpy4(s4_w3, betaword3); tmp4w3 byte_i7;
Completely unroll perform re-combination loop using other powers beta.
0123 s1_word0 S2_word0 s3_word0b s4_word0b 0123 0123 0123 0123 4567 s1_word1b s2_word1b s3word1b s4_word1 4567 4567 4567
891011 s1_word2b s2_word2b s3word2b 891011 s4_word2 891011 891011
12131415 s1_word3b s2_word3b 12131415 s3word2b 12131415 s4_word3 12131415
temp1w0 temp2w0 temp3w0 sum2w0 (st) sum1w0 sumw0 temp1w1 temp2w1 temp3w1 sum2w1 (st) sum1w1 sumw1 temp1w2 temp2w2 temp3w2 sum2w2 sum1w2 sumw2 temp1w3 temp2w3 temp3w3 sum2w3 sum1w3 sumw3 s4_w0 _gmpy4(s1_w0, beta3word0); _gmpy4(s2_w0, beta2word0); _gmpy4(s3_w0, beta4word0); temp1w0 temp2w0; _gmpy4(s4_w0,beta5word0); temp3w0 s4_w0; sum1w0 sum2w0; _gmpy4(s1_w1, beta3word1); _gmpy4(s2_w1, beta2word1); _gmpy4(s3_w1, beta4word1); temp1w1 temp2w1; _gmpy4(s4_w1,beta5word1); temp3w1 s4_w1; sum1w1 sum2w1;
s4_w1
_gmpy4(s1_w2, beta3word2); _gmpy4(s2_w2, beta2word2); _gmpy4(s3_w2, beta4word2); temp1w2 temp2w2; temp3w2 s4_w2; sum1w2 sum2w2; _gmpy4(s1_w3, beta3word3); _gmpy4(s2_w3, beta2word3); _gmpy4(s3_w3, beta4word3); temp1w3 temp2w3; temp3w3 s4_w3; sum1w3 sum2w3;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Perform more level re-combination (st) (st) *s1ptr++ sumw0 sumw0 sumw2; sumw1 sumw1 sumw3; sumw0; *s1ptr++ sumw1;
Check syndromes non-zero using split compares. just check sumw0 sumw1 eight syndromes. check syndromes. result non-zero return ret0 _cmpgtu4(sumw0, zero); ret10 ret0 ret1; (!st) *s1ptr++ sumw2; ret2 _cmpgtu4(sumw2, zero); ret23 ret2 ret3; (!st) ret10 ret23; (st) ret10; (ret) return(ret); ret1 _cmpgtu4(sumw1, zero);
(!st) *s1ptr++ sumw3; ret3 _cmpgtu4(sumw3, zero);
file syn_acc_i.c Copyright 1999 Texas Instruments, Incorporated. Rights Reserved.
This piped loop kernel iterates times depending whether piped loop kernel computes Galois field multiplies cycles maximizing Galois field multiplier bandwidth. compiler flags used were -mwtx -mv6400". numbers listed corresponding line numbers code. seen that every cycle both GMPY4 instructions being used perform eight multiplies parallel.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
PIPED LOOP KERNEL
Cycle Cycle [!A0] Cycle [!A0] Cycle Cycle [!A0]
PACK2 PACK2 GMPY4 GMPY4 PACKL4 PACK2 LDBU LDBU
.M2X .D2T2 .D1T1
B27,B27,B9 A7,A7,A21 A20,A24,A17 B17,A26,B19 A8,A8,A5 B30,B30,B16 *B4++,B27 *A29++,A8
|538| |544| |553| |578| |529| |535| |518| |515|
PACK2 GMPY4 GMPY4 PACK2 PACKL4 PACKL4 LDBU LDBU
.M2X .D2T2 .D1T1
A6,A6,A22 A4,A25,A4 B18,A26,B0 B22,B22,B21 B16,B16,B18 A19,A19,A20 *B8++,B30 *A27++,A16
|532| |571| |574| |511| |535| |541| |514| |513|
GMPY4 BDEC GMPY4 PACKL4
.M2X
A17,A26,A31 A5,A4 A20,A16,A3 L2,A1 B20,A25,B31 B18,B1,B29 B18,B17 B21,B21,B5
|580| |530| |562| |565| |560| |536| |511|
GMPY4 PACKL4 GMPY4
.D2X
A18,A24,A16 A20,A18 A0,1,A0 B5,B2 B5,B20 A5,B0,B16 A22,A22,A4 B29,B23,B1
|551| |542| |512|
|558| |532| |560|
PACKL4 GMPY4 LDBU GMPY4
.S2X .M1X .D2T2
B2,B31,B25 A5,B30,B26 A4,A8,A30 A4,A16,A23 A21,A21,A18 A3,B23,A16 *B28++,B22 B16,B23,B0
|556| |549| |567| |576| |544| |562| |511| |558|
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Cycle Cycle Cycle
PACK2 PACKL4 GMPY4 GMPY4 LDBU PACKL4 GMPY4 GMPY4 PACK2 PACK2 LDBU
.S1X .D1T1 .M2X .D1T1
B7,B7,B24 B9,B9,B17 A18,A4,A4 B2,A19,A5 B25,B23,B31 A30,A25,A8 *A9++,A7 B17,B1,B19 B17,B19,B17 B24,B24,B20 B26,A24,B30 A5,A24,A19 A8,A8,A19 A16,A16,A8 *A28++,A6
|526| |538| |571| |547| |556| |567| |519| |569| |578| |526| |549| |547| |541| |529| |517|
LDBU GMPY4 GMPY4
.L1X .D2T2 .M2X
B18,A16,A18 A18,A31,A17 A20,A17,A20 B20,B0,B18 B20,B31,B20 *B6++,B7 B19,A25,B1 A23,A26,A16
|551| |580| |553| |574| |565| |517| |569| |576|
5.10.1
Berlekamp Massey Algorithm
Berleykamp Massey Algorithm solves lambda polynomial equation where
convolution lambda syndromes zero. code solve lambda given particular syndrome given below:
TEXAS INSTRUMENTS, INC. NAME bk_massey USAGE This routine C-callable called void bk_massey_cn(unsigned char unsigned GF_inv, fail_code, lam_deg, unsigned char lambda);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
pointer syndromes GF_inv pointer inverse table, each entry inverses number errors correct fail_code type zero failure lam_deg pointer lambda degree lambda lamda polynomial (See compiler reference guide.) DESCRIPTION Berleykamp Massey function solves error locator polynomial equation, Lambda where denotes convolution. Both Lambda Syndrome polynomials order respectively. #define RS_2T (16) #define RS_T void bk_massey_cn(unsigned char unsigned GF_inv, fail_code, lam_deg, unsigned char lambda unsigned char Told[RS_2T]; unsigned char Tnew[RS_2T]; unsigned char syn_rev[RS_T]; time reversed syndrome input unsigned char delta, i,j,k,L,case0, case1; RS_T; i++) lambda[i] syn_rev[i] Told[i] lambda[0] Told[1] delta s[0]; syn_rev[1] s[0]; 2*T; j++) case0 2*L) case1 delta ~case0; (case1) RS_T; i--) Tnew[i] Told[i-1]; (unsigned char) (0xff GF_inv[0xff delta]); RS_T; i--) case1) Tnew[i] GMPY(lambda[i-1],q); RS_T; i--) lambda[i] GMPY(delta,Told[i]);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
RS_T; i--) Told[i] Tnew[i]; delta s[k]; RS_T; i--) delta GMPY(lambda[i], syn_rev[i]); RS_T; i--) syn_rev[i] syn_rev[i-1]; syn_rev[1] s[k]; k++; *lam_deg *fail_code Copyright 1997-2000 Texas Instruments Incorporated. Rights Reserved
functions GMPY refer generic Galois field multiply which generic code call function using lookup tables. GF_inv table containing inverses elements Galois field used. This contain entries. optimized intrinsic code below shows this code optimized.
TEXAS INSTRUMENTS, INC. NAME bk_massey USAGE This routine C-callable called void bk_massey_c(unsigned char unsigned GF_inv, fail_code, lam_deg, unsigned char lambda); pointer syndromes GF_inv pointer inverse table, each entry inverses number errors correct fail_code type zero failure lam_deg pointer lambda degree lambda lamda polynomial (See compiler reference guide.)
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
DESCRIPTION berlekamp massey function solves error locator polynomial equation, Lambda where denotes convolution. Both Lambda Syndrome polynomials order respectively. ASSUMPTIONS input data coeeficients stored double word aligned boundaries. table GF_inv must aligned byte boundary. Each entry table inverse Galois field. Four inverses, packed each int. GF_inv table centered circular buffer using circular buffer pointer. This code assumes LITTLE_ENDIAN system. code interrupt tolerant. Interupts disabled reenabled start code. RS_T function works TECHNIQUES special table GF_inv inverses packed into word. This simplifies code slightly. Also inner loops order RS_T completely unrolled placed registers. SHLMB used advance polynomial alpha. error detected this code. inner loops only iterate times, this increase speed. incorrect lamda found, chien search will identify BIBLIOGRAPHY This algorithm taken from University Washington, Dept. electrical engineering, Matache Copyright 1997-1999 Texas Instruments Incorporated. Rights Reserved void bk_massey_c(unsigned char unsigned GF_inv, fail_code, lam_deg, unsigned char lambda) unsigned unsigned delta; case0, case1; unsigned q3210; unsigned lam8765, lam4321_, lam4321, lam0; unsigned Tn8765, Tn4321; unsigned Tnew8765, Tnew4321; unsigned T8765, T4321; unsigned dlam8765, dlam4321; unsigned unsigned s1234, s5678; unsigned d8765, d4321; unsigned del3210, del1032, del2301, del0123; unsigned delta_1, delta_2; lam8765 lam4321 T8765 T4321
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
s1234 s5678 (unsigned int) s[0]; delta s5678; delta _packl4(delta, delta); delta _packl4(delta, delta); 2*T; i++) case0 2*L; case0 case0 case1 delta ~case0; if(case1) q3210 GF_inv[0xff delta]; lam4321_ lam4321; Tn4321 _gmpy4(lam4321_, q3210); Tn8765 _gmpy4(lam8765, q3210); Tnew8765 _shlmb(Tn4321, Tn8765); Tnew4321 _shlmb(q3210, Tn4321); (!case1) Tnew8765 _shlmb(T4321, T8765); (!case1) Tnew4321 T4321 dlam4321 _gmpy4(delta, T4321); lam4321 dlam4321 lam4321; dlam8765 _gmpy4(delta, T8765); lam8765 dlam8765 lam8765; T8765 Tnew8765; T4321 Tnew4321; (unsigned int) *++s; _packl4(d0, d0); _packl4(d1, d1); d8765 _gmpy4(s1234, lam8765); d4321 _gmpy4(s5678, lam4321); del3210 d8765 d4321; del1032 _packlh2(del3210, del3210); del2301 _swap4(del3210); del3210 del3210 del0123 _swap4(del1032); delta_1 del3210 del1032; delta_2 del0123 del2301; delta delta_2 delta_1; s1234 _shlmb(s5678, s1234); s5678 (s5678 lam0 lambda[0] lam0; _memd8(&lambda[1]) _itod(lam8765, lam4321);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
lam_deg[0] fail_code[0] Copyright 1997-1999 Texas Instruments Incorporated. Rights Reserved
section optimized assembly code shows inner loop used
LOOP: PACKL4 GMPY4 PACKL4 GMPY4 ANDN .M2X .D2T1 .M2X .S2X B_d0, B_d0, B_delta, A_T4321, *B_GF_inv[B_delta], A_case0, B_d1, B_delta, B_delta, A_k, B_d1, A_T8765, A_case0, A_L, B_d1 B_dlam4321 A_q3210 A_case0 ;expand syndrome ;make lambda change ;q=inv_table[delta] ;make signed mask
B_d2 ;expand s[k] B_dlam8765 ;make lamda changes B_case1 B_lam4321 ;case1=delta&&case0 ;update degree ;lambda [4.1]
B_case1]SUB GMPY4 GMPY4 GMPY4 GMPY4 BDEC SHLMB SHLMB SHLMB LDBU ROTL .D1X .D1T2 .M1X .M2X .M1X .M2X
B_dlam4321, B_lam4321,
A_s5678, B_lam4321, B_d4321 ;see lam's==0? B_dlam8765, B_lam8765, B_lam8765 ;update lam[8.5] B_lam8765, A_q3210, A_Tn8765 ;generate gradient A_s1234, A_lam4321, LOOP, A_s5678_, A_s5678, A_k, B_d8765, A_q3210, A_Tn4321, *++A_s[1], B_lam4321, B_del3210, B_del3210, A_T4321, A_T4321, B_d0, A_s1234, B_d4321, A_Tn4321, A_Tn8765, B_d0 B_del2301 B_del3210, A_T8765, B_lam8765, B_d8765 A_q3210, A_Tn4321 ;see lam's ;generate gradient ;GMPY pipeline delay ;for(i=0;i<2T;i++){ ;shift window ;shift window ;update count
A_s5678 A_s1234
B_del3210 ;check solution A_Tnew4321 ;update T(x) A_Tnew8765 ;update T(x) ;get next s[k] A_lam4321 ;live long ;check solution B_del1032 ;check solution A_Tnew4321 ;shift x.T(x) poly A_Tnew8765 ;shift x.T(x) poly ;check solution ;check solution ;update poly
SWAP4 PACKLH2 ||[!B_case1]SHL ||[!B_case1]SHLMB.L1 SWAP4
B_del1032, B_del0123 B_del3210, B_d2, A_Tnew4321, A_T4321
B_del3210
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
SUBAH
A_s5678, B_del0123, B_del3210,
B_del2301, B_del1032,
A_s5678_ B_delta_2 B_delta_1 B_delta A_case0
;part window ;check solution ;check solution ;check solution ;update poly -2*L
B_delta_2, B_delta_1, A_Tnew8765, A_T8765 A_k, A_L,
optimized assembly produces cycle loop. uses packed lookup table each entry inverse duplicated times word. table shown here GF(2^8) field generator polynomial 0x1D.
unsigned GF_inv[GF8+PAD] 0x00000000, 0x01010101, 0x47474747, 0xa7a7a7a7, 0xadadadad, 0x9d9d9d9d, 0x3d3d3d3d, 0xaaaaaaaa, 0xd8d8d8d8, 0x72727272, 0xe0e0e0e0, 0x3e3e3e3e, 0x90909090, 0xdededede, 0xa0a0a0a0, 0x83838383, 0x6c6c6c6c, 0xedededed, 0x60606060, 0x56565656, 0x70707070, 0xd0d0d0d0, 0x26262626, 0x8b8b8b8b, 0x48484848, 0x89898989, 0xa4a4a4a4, 0xc3c3c3c3, 0x50505050, 0x22222222, 0xabababab, 0x0c0c0c0c, 0x36363636, 0x5f5f5f5f, 0x92929292, 0x4e4e4e4e, 0x30303030, 0x88888888, 0x16161616, 0x67676767, 0x38383838, 0x23232323, 0x81818181, 0x1a1a1a1a, 0x13131313, 0xc1c1c1c1, 0x97979797, 0x0e0e0e0e, 0x24242424, 0x57575757, 0xb9b9b9b9, 0xc4c4c4c4, 0x52525252, 0x8d8d8d8d, 0x20202020, 0xecececec, 0x28282828, 0xd1d1d1d1, 0xe9e9e9e9, 0xfbfbfbfb, 0xdbdbdbdb, 0x77777777, 0x84848484, 0xcdcdcdcd, 0x1b1b1b1b, 0x54545454, 0x7c7c7c7c, 0xcccccccc, 0x49494949, 0x31313131, 0x53535353, 0x69696969, 0x18181818, 0xdfdfdfdf, 0x9b9b9b9b, 0xbcbcbcbc, 0x8e8e8e8e, 0x7a7a7a7a, 0xdddddddd, 0x5d5d5d5d, 0xc0c0c0c0, 0x4c4c4c4c, 0x55555555, 0x4b4b4b4b, 0x39393939, 0x2c2c2c2c, 0x1f1f1f1f, 0x33333333, 0x6f6f6f6f, 0x40404040, 0xcfcfcfcf, 0x15151515, 0xf8f8f8f8, 0xa6a6a6a6, 0x2b2b2b2b, 0x45454545, 0x68686868, 0x25252525, 0xcbcbcbcb, 0x37373737, 0xcacacaca, 0x17171717, 0xefefefef, 0x2f2f2f2f, 0x11111111, 0xdadadada, 0x06060606, 0xfefefefe, 0xa1a1a1a1, 0xe4e4e4e4, 0x27272727, 0x02020202, 0x44444444, 0x0f0f0f0f, 0xf4f4f4f4, 0xbabababa, 0x98989898, 0x96969696, 0x58585858, 0x66666666, 0x80808080, 0x2a2a2a2a, 0x51515151, 0x8a8a8a8a, 0x4a4a4a4a, 0x6e6e6e6e, 0x2e2e2e2e, 0x5e5e5e5e, 0xa9a9a9a9, 0xe1e1e1e1, 0xd5d5d5d5, 0x04040404, 0x1e1e1e1e, 0x93939393, 0x8c8c8c8c, 0x61616161, 0x63636363, 0x41414141, 0x5b5b5b5b, 0x4d4d4d4d, 0xb3b3b3b3, 0x32323232, 0xd9d9d9d9, 0x79797979, 0xbbbbbbbb, 0xfcfcfcfc, 0x1d1d1d1d, 0xb0b0b0b0, 0x2d2d2d2d, 0xf5f5f5f5, 0x4f4f4f4f, 0x5c5c5c5c,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
0x0b0b0b0b, 0xacacacac, 0x1c1c1c1c, 0x34343434, 0xcececece, 0x9c9c9c9c, 0x87878787, 0xebebebeb, 0xc5c5c5c5, 0x95959595, 0x12121212, 0x65656565, 0xd2d2d2d2, 0x85858585, 0x29292929, 0xf9f9f9f9, 0x10101010, 0x99999999, 0x14141414, 0x86868686, 0xfafafafa, 0x6d6d6d6d, 0xe3e3e3e3, 0x03030303, 0x42424242, 0x7f7f7f7f,
0xdcdcdcdc, 0x09090909, 0x82828282, 0xc2c2c2c2, 0x3b3b3b3b, 0x08080808, 0xe5e5e5e5, 0xf2f2f2f2, 0x64646464, 0x9a9a9a9a, 0x59595959, 0xb8b8b8b8, 0xf7f7f7f7, 0x7d7d7d7d, 0x71717171, 0x43434343, 0x73737373, 0x0a0a0a0a, 0x3f3f3f3f, 0xb1b1b1b1, 0x74747474, 0x21212121, 0xe7e7e7e7, 0x8f8f8f8f, 0xd4d4d4d4, 0xffffffff,
0xbdbdbdbd, 0xc7c7c7c7, 0x9f9f9f9f, 0x46464646, 0x0d0d0d0d, 0xbebebebe, 0xeeeeeeee, 0xbfbfbfbf, 0x07070707, 0xaeaeaeae, 0xa5a5a5a5, 0xa3a3a3a3, 0x62626262, 0xa8a8a8a8, 0xc8c8c8c8, 0xd7d7d7d7, 0x76767676, 0x19191919, 0xe6e6e6e6, 0xe2e2e2e2, 0xf3f3f3f3, 0xb2b2b2b2, 0xb5b5b5b5, 0xd3d3d3d3, 0xe8e8e8e8, 0x7e7e7e7e,
0x94949494, 0xa2a2a2a2, 0xc6c6c6c6, 0x05050505, 0x3c3c3c3c, 0xb7b7b7b7, 0x6b6b6b6b, 0xafafafaf, 0x7b7b7b7b, 0xb6b6b6b6, 0x35353535, 0x9e9e9e9e, 0x5a5a5a5a, 0x3a3a3a3a, 0xf6f6f6f6, 0xd6d6d6d6, 0x78787878, 0x91919191, 0xf0f0f0f0, 0xf1f1f1f1, 0xb4b4b4b4, 0x6a6a6a6a, 0xeaeaeaea, 0xc9c9c9c9, 0x75757575, 0xfdfdfdfd};
performance algorithm 30*T where number errors corrected.
5.10.2
Chien Search Algorithm
Chien search algorithm solves roots error locator polynomial plugging each field element after other checks valid solution. gain clear understanding, algorithm code algorithm shown below:
void ch_srch restrict unsigned char lambda[], lam_deg, restrict unsigned char zeros[], restrict *fail, const restrict unsigned char alpha[], restrict state[], RS_T, const restrict unsigned char exp_table2[], const restrict unsigned char log_table[], const restrict unsigned char div_inv_table[]
ptr,result0; *state0; const unsigned char *alpha0 alpha
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
state0 state; Initialize elements zeros array zero. addition compute lam1*alpha, lam2*alpha^2,.lamK*alpha^K where lami: term lambda polynomial alpha: primitive element Galois field degree lambda polynomial lam_deg store these state[0], state[1], operator denotes Galois field multiplication operator*/ denotes Galois field addition. RS_T; j++) zeros[j] _gmpy4(alpha0[j]] lambda[j+1]); state0[j] each 256, elements GF(256) field compute polynomial check zero. Multiply each state entry state[j] corresponding power alpha^j computing next iteration loop. zeros found, store index where zero found. 256; i++) result0 result0 state0[j] (!result0) index where zero found find value zero,*/ using exponentiation table. number zeroes found equal degree lambda polynomial fail code zeroes lam_deg greater than RS_T, maximum number errors that corrected, fail code zeros RS_T; j++) (result0^state0[j]); _gmpy4(state0[j],alpha0[j]); zeros[ptr++]=
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
RS_T; i++) zeros[i] exp_table2[zeros[i]];
code clearly shows that Chien search algorithm loops, outer loop that iterates once over field elements, inner loop that iterates over Since C6400 perform up-to Galois field multiplies parallel, possible completely unroll inner loop search elements field rate cycle/field element. principal idea behind efficient implementation Chien search maximize vector Galois field multiplies. This done splitting search elements into four parallel searches elements performing each them parallel. Since power definition, divisible case GF(256) example. Thus efficient implementation Chien search splits finite field elements into four sub-fields shown below: {1,.64} {65,.128} {129.192} {193,.256}
Note that consequence splitting finite field into four sub-fields that order which zeroes found different from obtained doing straight search elements. However this problem because relative order zeroes important Reed Solomon decoding process. Thus i'th iteration Chien search loop evaluates error locator polynomial with i'th element from fields named error locator polynomial Lambda(X) assumed have errors. error locator polynomial then defined shown below. worst case degree polynomial same number errors code correct. order gain clear understanding steps involved first iterations loop associated values four possible zeroes each iteration shown. iteration:
1a64 2a128 .Ta64 1a128 2a256 .Ta128 1a192 2a384 .Ta192
iteration:
.Ta2 1a65 2a130 .Ta65
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
1a129 2a258 .Ta129 1a193 2a386 .Ta193
seen that individual terms multiplied {a,a2,.a update individual terms then summed together. following code intended show these operations that clear understanding different operations gained. This version code also been optimized handle cases
NAME ch_srch Chien Search Reed Solomon Decoding USAGE This routine following prototype: void ch_srch_c const restrict unsigned char lambda[], lam_deg, restrict unsigned char zeros[], restrict *fail, const restrict unsigned char alpha[], restrict state[], RS_T, const restrict unsigned char exp_table2[], const restrict unsigned char log_table[], const restrict unsigned char div_inv_table[] where: lambda: Error locator polynomial lam_deg: Degree error locator polynomial. have maximum value zeros: Output array where roots error locator polynomial stored. fail: Pointer where fail/pass state algorithm stored alpha: Array which organized follows alpha0, alpha1,. alpha7 alpha0^64, alpha1^64,.alpha7^64, alpha0^128, alpha1^128,.alpha7^128, alpha0^192,.alpha7^192} total elements that contain 0,64,128 192th powers various roots generator polynomial. case Reed Solomon, this various powers primitive element alpha.Thus case 204,188,8 code alpha given below unsigned char alpha[] 128, 190, 100, 169, 116, 184, 169, 205, 128, state: Temporary scratch array used other versions code.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
RS_T: maximum number errors that Reed Solomon code correct. exp_table2: array that contains various powers primitive element alpha. log_table: array that contains various logarithms primitive element alpha. div_inv_table: table containing various multiplicative inverses each field element. chien search routine accepts error locator polynomial lambda degree lam_deg finds zeroes polynomial returns these zeroes array. number zeroes degree polynomial then fail code returned. ASSUMPTIONS specific alignment requirements expected. This code Endian Neutral. scratch array state bytes does alias with other memory. exp_table2: contains values each cases exp[loga +logb]*/ log_table: contains values each field element. div_inv_table: contains values each elements. NOTES being used alpha table needs modified. different field being used tables need modified. Copyright 1999 Texas Instruments, Incorporated. Rights Reserved. void ch_srch_c( unsigned char restrict *lambda, lam_deg, unsigned char restrict *zeros, restrict *fail, const unsigned char restrict *alpha, restrict *state, RS_T, const unsigned char restrict *exp_table2, const unsigned char restrict *log_table, const unsigned char restrict *div_inv_table
pointers load various powers alphaI where: alpha0: reads zeroth power alpha0,.alpha7 alpha1: reads sixty fourth power alpha0.alpha7 alpha2: reads hundred twenty eight power alpha0.alpha7 alpha3: reads hundred ninety second power alpha0.alpha7*/ alpha array dword aligned then these individual pointers*/ will also dword aligned hence they cast double point- perform loads using double word wide loads. resulting high halves loaded using intrinsics.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
const const const const double double double double
unsigned unsigned unsigned unsigned
char char char char
*alpha0 *alpha1 *alpha2 *alpha3 (double (double (double (double
alpha; alpha alpha alpha (alpha0); (alpha1); (alpha2); (alpha3);
*alpha0ptr *alpha1ptr *alpha2ptr *alpha3ptr
double alpha0_dword, alpha1_dword, alpha2_dword, alpha3_dword; unsigned unsigned unsigned unsigned unsigned unsigned alpha00, alpha20, temp0, temp2, temp4, temp6, alpha01, alpha21, temp1, temp3, temp5, temp7, alpha10, alpha30, temp10, temp12, temp14, temp16, alpha11; alpha31; temp11; temp13; temp15; temp17;
unsigned aword00, aword01, aword02, aword03; unsigned aword0, aword1, aword2, aword3; unsigned aword4, aword5, aword6, aword7; unsigned alpha0123word0, alpha0123word1, alpha0123word2; unsigned alpha0123word4, alpha0123word5, alpha0123word6; unsigned alpha0123word7, alpha0123word3; double double unsigned unsigned unsigned unsigned *lamptr (double (lambda); lam_dword0, lam_dword1; lam0123, lam4567, lam8xxx; lamword00, lamword01, lamword02, lamword03; lamword0, lamword1, lamword2, lamword3; lamword4, lamword5, lamword6, lamword7;
unsigned state0123word0, state0123word1, state0123word2; unsigned state0123word4, state0123word5, state0123word6; unsigned state0123word3, state0123word7; unsigned constant0; unsigned unityconstant; unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned unsigned char *zerosptr1, *zerosptr2; char *zerosptr1orig, *zerosptr2orig; result0123word0, result0123word1, result0123word2; result0123word4, result0123word5, result0123word6; result0123word3, result0123word7; result0, result1, result2, result3; ptr, limit; elements1, elements2; lam01, zero_val;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
alpha0_dword alpha1_dword alpha2_dword alpha3_dword
alpha0ptr[0]; alpha1ptr[0]; alpha2ptr[0]; alpha3ptr[0];
alpha01:alpha00 alpha11:alpha10 alpha21:alpha20 alpha31:alpha30 Since inner loop unrolled completely, four powers each alpha need packed together inner most loop. example first word must contain, powers alpha0 following order: This requires packing first words This done using packs, namely _pack2/_packh2 _packl4 /_packh4 This results following: alpha0123word0 alpha0123word1 alpha0123word2 alpha0123word3 alpha0123word4 alpha0123word5 alpha0123word6 alpha0123word7 alpha00 _lo(alpha0_dword); alpha01 _hi(alpha0_dword); alpha10 _lo(alpha1_dword); alpha11 _hi(alpha1_dword); alpha20 _lo(alpha2_dword); alpha21 _hi(alpha2_dword); alpha30 _lo(alpha3_dword); alpha31 _hi(alpha3_dword); temp0 _pack2(alpha10, alpha00); temp1 _pack2(alpha30, alpha20); temp10 _packl4(temp1, temp0); temp11 _packh4(temp1, temp0); temp2 _packh2(alpha10, alpha00); temp3 _packh2(alpha30, alpha20); temp12 _packl4(temp3, temp2); temp13 _packh4(temp3, temp2); temp4 _pack2(alpha11, alpha01); temp5 _pack2(alpha31, alpha21); temp14 _packl4(temp5, temp4); temp15 _packh4(temp5, temp4); temp6 _packh2(alpha11, alpha01); temp7 _packh2(alpha31, alpha21); temp16 _packl4(temp7, temp6); temp17 _packh4(temp7, temp6);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
alpha0123word0 temp10; alpha0123word1 temp11; alpha0123word2 temp12; alpha0123word3 temp13; alpha0123word4 temp14; alpha0123word5 temp15; alpha0123word6 temp16; alpha0123word7 temp17; addition each state needs updated multiplying appropriate power alpha. Hence since four them need updated various powers alpha packed together. this case variables contain foll: aword0 aword1 where aword2 where aword3 where aword4 where aword7 where aword00 _pack2(alpha00, alpha00); aword0 _packl4(aword00, aword00); aword1 _packh4(aword00, aword00); aword01 _packh2(alpha00, alpha00); aword2 _packl4(aword01, aword01); aword3 _packh4(aword01, aword01); aword02 _pack2(alpha01, alpha01); aword4 _packl4(aword02, aword02); aword5 _packh4(aword02, aword02); aword03 _packh2(alpha01, alpha01); aword6 _packl4(aword03, aword03); aword7 _packh4(aword03, aword03); addition each lambda needs packed replicated four bytes. Therefore lamwordj contains lamj lamj lamj lamj packed quantity where j<=8 lam_dword0 lamptr[0]; lam_dword1 lamptr[1]; lam0123 _lo(lam_dword0); lam4567 _hi(lam_dword0); lam8xxx _lo(lam_dword1); lam0123 lam4567 lamword00 lamword0 lamword1 lamword01 lamword2 lamword3 _shrmb(lam4567,lam0123); _shrmb(lam8xxx,lam4567); _pack2(lam0123, lam0123); _packl4(lamword00, lamword00); _packh4(lamword00,lamword00); _packh2(lam0123, lam0123); _packl4(lamword01, lamword01); _packh4(lamword01, lamword01);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
lamword02 _pack2(lam4567, lam4567); lamword4 _packl4(lamword02, lamword02); lamword5 _packh4(lamword02, lamword02); lamword03 _packh2(lam4567, lam4567); lamword6 _packl4(lamword03, lamword03); lamword7 _packh4(lamword03, lamword03); Completely unroll loop that does initial state computation perform following: word0 lam0*alp0 lam0*alp0^64 lam0*alp0^128 lam0*alp0^192 word1 lam1*alp0^2 lam1*alp0^128 lam1*alp0^256 lam0^alp0^384 word7 lam0^alp0^8 lam1*alp0^64^8 lam0*alp0^192^8 state0123word0 _gmpy4(alpha0123word0, lamword0); state0123word1 _gmpy4(alpha0123word1, lamword1); state0123word2 _gmpy4(alpha0123word2, lamword2); state0123word3 _gmpy4(alpha0123word3, lamword3); state0123word4 _gmpy4(alpha0123word4, lamword4); state0123word5 _gmpy4(alpha0123word5, lamword5); state0123word6 _gmpy4(alpha0123word6, lamword6); state0123word7 _gmpy4(alpha0123word7, lamword7); constants needed loop namely mask lower byte packed computation four possible zeroes. constant0 0x000000FF; unityconstant 0x01010101; Twin pointers used zeros aray away with dep- endency pointer. then zerosptr2 zeros else zeros Also four state words 4,.7 zeroed that result0123word3 percolate through result0123word7 case permitting same code used zerosptr1 zeros; zerosptr2 zeros (RS_T zerosptr2 zeros (RS_T state0123word4 state0123word5 state0123word6 state0123word7 Save original value pointers figure number zeroes found outside loop. zerosptr1orig zerosptr1; zerosptr2orig zerosptr2; equal then div_inv_table find solution lam01 (lam0123 &0x000000FF);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
(RS_T zeros[0] div_inv_table[lam01]; RS_T either then elemts state-words*/ update state words check zeroes store indices, zero been found. <=64; i++) result0123word0 unityconstant; result0123word0 (result0123word0 state0123word0); state0123word0 _gmpy4(state0123word0, aword0); result0123word1 (result0123word0 state0123word1); state0123word1 _gmpy4(state0123word1, aword1); result0123word2 (result0123word1 state0123word2); state0123word2 _gmpy4(state0123word2, aword2); result0123word3 (result0123word2 state0123word3); state0123word3 _gmpy4(state0123word3, aword3); result0123word4 (result0123word3 state0123word4); state0123word4 _gmpy4(state0123word4, aword4); result0123word5 (result0123word4 state0123word5); state0123word5 _gmpy4(state0123word5, aword5); result0123word6 (result0123word5 state0123word6); state0123word6 _gmpy4(state0123word6, aword6); result0123word7 (result0123word6 state0123word7); state0123word7 _gmpy4(state0123word7, aword7); result0 result0123word7 constant0; result1 (result0123word7 constant0; result2 (result0123word7 constant0; result3 (result0123word7 24); (!result0) *zerosptr1++ (!result1) *zerosptr1++ (!result2) *zerosptr2-- 128; (!result3) *zerosptr2-- 192; nuber zeroes found elements1 elements2. elements1 turn found deducting incremented pointers from original pointers. then zeroes found elements1 zerosptr1 zerosptr1orig; elements2 zerosptr2orig zerosptr2; elements1 elements2; (RS_T
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
equal lam_degree then fail code set. lam_degree greater than fail code set. both cases zeros array filled with 1's. same lam_deg find actual zeroes from indices zeroes through exponentiation tables. *fail limit (ptr lam_deg) *fail (ptr lam_deg) limit (lam_deg RS_T) *fail (lam_deg RS_T) limit RS_T RS_T; i++) zero_val exp_table2[zeros[i]]; (!limit) zero_val zeros[i] zero_val; file: ch_srch_i.c Copyright 1999 Texas Instruments, Incorporated. Rights Reserved.
piped loop kernel shown below obtained using assembly optimizer write serial assembly version loop, obtain cycle loop which Galois field multiplies performed rate multiplies/cycle.
L_1: [!A_res0]STB .D1T1 A_i0, SHRU .S2X A_result0123w7, A_res123x, GMPY4 A_state0123w6, A_result0123w5, B_result0123w2, GMPY4 B_state0123w0, B_unityconstant, L_2: A_i1, A_i0, ||[!A_res1]STB .D1T1 A_i1, B_res23xx, SHRU .S2X A_result0123w7, GMPY4 A_state0123w7, GMPY4 B_state0123w3, B_result0123w0, *A_zerosptr1++ A_constant0, A_aword6, A_state0123w6, B_state0123w3, B_aword0, B_result0123w0 Cycle zero low16 sw6*aw6 rw5^sw6 rw2^sw3 sw0*aw0 Init.res Cycle i1++ i0++ zero Low16 res>>16 sw7*aw7 sw3*aw3 rw0^sw0
B_res23xx A_res1 A_state0123w6 A_result0123w6 B_result0123w3 B_state0123w0
A_i1 A_i0 *A_zerosptr1++ B_constant0, B_res2 B_res3 A_aword7, A_state0123w7 B_aword3, B_state0123w3 B_state0123w0, B_result0123w0
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
L_3: B_i2, ||[!B_res2]STB .D2T2 B_i2, BDEC LOOP8, A_result0123w6, GMPY4 A_state0123w4, .L1X B_result0123w3, GMPY4 B_state0123w1, B_result0123w0, L_4: B_i3, ||[!B_res3]STB .D2T2 B_i3, SHRU A_result0123w7, A_result0123w7, GMPY4 A_state0123w5, A_result0123w4, GMPY4 B_state0123w2, B_result0123w1, *B_zerosptr2-- A_iters A_state0123w7, A_aword4, A_state0123w4, B_aword1, B_state0123w1, B_i2
A_result0123w7 A_state0123w4 A_result0123w4 B_state0123w1 B_result0123w1
B_i3 *B_zerosptr2-- A_res123x A_constant0, A_res0 A_aword5, A_state0123w5 A_state0123w5, A_result0123w5 B_aword2, B_state0123w2 B_state0123w2, B_result0123w2
Cycle i2++ zero Branch Cycle zero res>>16 Low16
5.11 Forney Algorithm
This routine accepts inputs from other three routines, namely syndrome, Chien search Berleykamp Massey computes error magnitudes performs correction. principal equation that needs evaluated repeated once again order understand various computational elements involved: Lambda *k). Forney algorithm computes first values, omega although practice up-to values computed using syndrome. Sine most errors corrected values Omega suffice. addition derivative lambda polynomial also needs computed, both these polynomials need evaluated each zeroes found using Chien search plugged into expression solve error magnitudes these locations. With knowledge error magnitudes error locations where errors corrected. definition omega polynomial repeated here order understand computed: E(x).Lambda(x) (x)(x addition recall that E(x) Fourier transform error polynomial evaluated roots, which syndromes 2,., addition recall that because field finite, this corresponds circular convolution. terms Omega enumerated shown below:
next term evaluate derivative lambda polynomial, which merely shifted version lambda polynomial zeroed even locations because property GF(2^m). actual error magnitude evaluation viewed evaluation numerator denominator. numerator evaluation Omega polynomial k'th zero, denominator evaluation derivative Lambda polynomial obtained from Berleykamp-Massey zero. Each denominator term also individually multiplied zero instead multiplying numerator These concepts seen clearly following code.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
NAME Forney Forney Algorithm Reed Solomon Decoder USAGE This routine following prototype: void forney_cn( const unsigned char restrict s[], unsigned char restrict lambda[], lam_deg, unsigned char restrict zeros[], unsigned char restrict byte_i[], *fail, const unsigned char *restrict tables[], RS_N, unsigned char restrict scratch[] errors that code correct array that contains syndromes lambda: array error locator polynomial degree where lambda[0] lam_deg:Degree lambda polynomial. zeros: Zeroes error locator polynomial found using Chien search. byte_i: received code word bytes fail: pointer store status correction tables: array pointers that contain foll: pointers tables[0] ----> exp_table exponenentials primitive element tables[1] ----> log_table logarithms field elements exp_table[8] inversely log[29] case (204,188,8) code default generator polynomial 100011101 GF(256) RS_N: total number bytes including parity bytes, case (204,188,8) code. scratch: temporary scratch array size fold temporary results. forney routine accepts syndromes array computed using syndrome routine, "lambda" error locator polynomial computed using "Berley Kamp" equivalent, "zeros" array that contains roots error locator polynomial, "byte_i" received code word corrects errors received codeword array.
ASSUMPTIONS array syndromes size lambda: error locator polynomial computed Berley Kamp equiv. lam_deg: degree error locator polynomial. zeros: zeroes error locator polynomial found Chien search. byte_i: received code word size fail: location where fail status stored
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
None these arrays overlap memory. Copyright 2000 Texas Instruments, Incorporated. Rights Reserved. void forney_cn( const unsigned char *restrict unsigned char *restrict lambda, lam_deg, unsigned char *restrict zeros, unsigned char *restrict byte_i, *fail, const unsigned char *restrict *restrict tables, RS_N, unsigned char *restrict scratch i,j,k; omega[3T] syn[4T] den[T] num[T] +poly[2T]*/ unsigned char *omega scratch; unsigned char *syn omega 3*T; unsigned char *numerator 4*T; unsigned char *denominator numerator unsigned char *poly denominator unsigned char err_pos_i, err_mag_i, RS_T RS_2T 2*T; const unsigned char *exp_table2 tables[0]; const unsigned char *div_inv_table tables[1]; const unsigned char *log_table tables[2];
Create array size copy syndrome values from Compute omega polynomial circular convolution Lambda polynomial Berlekamp-Massey syndrome, recursive- Horner's rule. RS_T; i++) syn[i] RS_T; (RS_2T RS_T); i++) syn[i] s[i-RS_T]; RS_2T RS_T; i<(RS_2T RS_2T); i++) syn[i] RS_2T j--) omega[j]= RS_T; i++) omega[j]^=
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
compute derivative lambda poly store result This done shifting terms lambda polynomial zeroing even terms. RS_T; RS_2T; i++) poly[i] RS_T;i++) poly[i] lambda[i]; (i%2 poly[i] find error magnitudes evaluating Omega polynomial derivative Lambda polynomial zeroes found using Chien search routine. addition instead multiplying numerator alpha^k multiply denominator alpha^-k which contained zeroes array. (i=0; RS_T; i++) denominator[i] numerator[i] RS_T j--) RS_T; i++) _gmpy4(zeros[i]] log_table[denominator[i]); denominator[i] denominator[i]^= poly[1+j]; _gmpy4([zeros[i],numerator[i]); numerator[i] numerator[i]^= omega[j]; RS_T; i++) _gmpy4(denominator[i], zeros[i]); denominator[i] div_inv table divide denominator zeroes array. log_table find from alpha^k. Multiply result 1/denominator with numerator obtain error magnitude. Deduct from error location modified error location. fail code been perform error correction. RS_T; i++) err_pos_i err_mag_i
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
div_inv_table[denominator[i]]; div_inv_table[zeros[i]]; lam_deg) _gmpy4(numerator[i], err_mag_i err_pos_i log_table[a]; (!denominator[i]) *fail (err_pos_i RS_N *fail RS_N-1 err_pos_i; subtract error from r(x) (*fail byte_i[k] byte_i[k] err_mag_i; file: forney_c.c Copyright 2000 Texas Instruments, Incorporated. Rights Reserved.
This code optimized capabilities C6400 optimizing each computational steps Forney algorithm. These steps shown below: Omega polynomial calculation: Each term lambda polynomial (byte) read replicated four bytes form packed word. This packed word then multiplied with certain number syndromes shown below. example multiples first syndromes, while multiplies first syndromes, till multiplies only first syndrome These partial terms written explicitly. Each column indicates many multiplies need performed given Consider case errors:
2,., 2,.,
zeroes found from Chien search read into registers numerator denominator computations done using registers instead reading writing memory. This results words numerator words denominator.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
loop that detects error magnitudes reads denominator numerator shifting calculated denominator words byte time, switch between words after four bytes first word have been consumed.
NAME forney Forney Algorithm Reed Solomon Decoder USAGE This routine following prototype: void forney_cn( const unsigned char restrict s[], unsigned char restrict lambda[], lam_deg, unsigned char restrict zeros[], unsigned char restrict byte_i[], *fail, const unsigned char *restrict tables[], RS_N, unsigned char restrict scratch[] errors that code correct array that contains syndromes lambda: array error locator polynomial degree where lambda[0] lam_deg:Degree lambda polynomial. zeros: Zeroes error locator polynomial found using Chien search. byte_i: received code word bytes fail: pointer store status correction tables: array pointers that contain foll: pointers tables[0] ----> exp_table exponenentials primitive element tables[1] ----> log_table logarithms field elements exp_table[8] inversely log[29] case (204,188,8) code default generator polynomial 100011101 GF(256) RS_N: total number bytes including parity bytes, case (204,188,8) code. scratch: temporary scratch array size fold temporary results. forney routine accepts syndromes array computed using syndrome routine, "lambda" error locator polynomial computed using "Berleykamp" equivalent, "zeros" array that contains roots error locator polynomial, "byte_i" received code word corrects errors received codeword array.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
ASSUMPTIONS array syndromes size lambda: error locator polynomial computed equiv. lam_deg: degree error locator polynomial. zeros: zeroes error locator polynomial found Chien search. byte_i: received code word size fail: location where fail status stored None these arrays overlap memory. Copyright 2000 Texas Instruments, Incorporated. Rights Reserved. void forney_c( const unsigned char *restrict unsigned char *restrict lambda, lam_deg, unsigned char *restrict zeros, unsigned char *restrict byte_i, *fail, const unsigned char *restrict *restrict tables, RS_N, unsigned char *restrict scratch i,j,k; omega[3T] syn[4T] den[T] num[T] +poly[2T]*/ unsigned char *omega scratch; unsigned char *syn omega 3*T; unsigned char *numerator 4*T; unsigned char *denominator numerator unsigned char *poly denominator unsigned char err_pos_i, err_mag_i, unsigned char constant 0xFF; unsigned lam_t20, lam_word0; unsigned lam_t31, lam_word1; unsigned lam_t64, lam_word2; unsigned lam_t75, lam_word3; unsigned lam_word4, lam_word00; unsigned lam_word5, lam_word01; unsigned lam_word6; unsigned lam_word7; unsigned *syn_ptr; unsigned sword3210, sword7654; unsigned statew0, statew1; unsigned statew2, statew3; unsigned statew4, statew5; unsigned statew6, statew7; unsigned statew8, statew9;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
unsigned statew10, statew11; unsigned *omega_ptr (unsigned (omega); unsigned omega_word0, omega_word1; unsigned *lamptr (unsigned (lambda); unsigned *poly_ptr (unsigned (poly); unsigned poly_word0, poly_word1, poly_word2, poly_word3; RS_T const unsigned char *div_inv_table tables[1]; const unsigned char *log_table tables[2]; double *zerosptr, zero_val; unsigned zeroword0, zeroword1; unsigned denword0, denword1; unsigned numword0, numword1; unsigned char *ptr_omega, *ptr_poly; unsigned polyval, polyword, omegaval, omegaword; unsigned denval, numval, zerval; unsigned denom_i, numer_i, zero_i; Compute omega(x) S(x)*lambda(x) Note: array already form S(x) syndrome polynomial. omega[0] lam0 omega[1] lam0 lam1 omega[2] lam0 lam1 lam2 omega[3] lam0 lam1 lam2 lam3 omega[7] lam0 lam1 lam2 lam3 +lam4 lam5 lam6 lam_word00 *lamptr++ /*lam3210*/ lam_word01 *lamptr++ /*lam7654*/ lam_t20 _pack2(lam_word00, lam_word00); lam20 lam_t31 _packh2(lam_word00, lam_word00); lam31 lam_t64 _pack2(lam_word01, lam_word01); lam64 lam_t75 _packh2(lam_word01, lam_word01); lam75 lam_word0 _packl4(lam_t20, lam_t20) lam_word1 _packh4(lam_t20, lam_t20) lam_word2 _packl4(lam_t31, lam_t31) lam_word3 _packh4(lam_t31, lam_t31) lam_word4 _packl4(lam_t64, lam_t64) lam_word5 _packh4(lam_t64, lam_t64) lam_word6 _packl4(lam_t75, lam_t75) lam_word7 _packh4(lam_t75, lam_t75) syn_ptr (unsigned sword3210 *syn_ptr++ sword7654 *syn_ptr++
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
statew0 statew1 sword7654 sword3210
_gmpy4(lam_word0, sword3210); _gmpy4(lam_word0, sword7654); _shlmb(sword3210, sword7654);
/*-----------------------*/ l0s3 l0s2 l0s1 l0s0 l0s7 l0s6 l0s5 l0s4 sword7654 sword6543 sword3210 sword210x
statew2 _gmpy4(lam_word1, sword3210); l1s2 l1s1 l1s0 l100 statew3 _gmpy4(lam_word1, sword7654); l1s6 l1s5 l1s4 l1s3 sword7654 _shlmb(sword3210, sword7654); sword7654 sword5432 sword3210 sword3210 sword10xx statew4 _gmpy4(lam_word2, sword3210); l2s1 l2s0 l200 l200 statew5 _gmpy4(lam_word2, sword7654); l2s5 l2s4 l2s3 l2s2 sword7654 _shlmb(sword3210, sword7654); sword7654 sword5432 sword3210 sword3210 sword10xx statew6 _gmpy4(lam_word3, sword3210); l3s0 l3s0 l300 l300 statew7 _gmpy4(lam_word3, sword7654); l3s4 l3s3 l3s2 l3s1 sword7654 _shlmb(sword3210, sword7654); sword7654 sword4321 sword3210 sword3210 sword0xxx statew8 _gmpy4(lam_word4, sword7654); l4s3 l4s2 l4s1 l4s0 sword7654 statew9 _gmpy4(lam_word5, sword7654); l5s2 l5s1 l5s0 l500 sword7654 statew10 _gmpy4(lam_word6, sword7654); l6s1 l6s0 l600 l600 sword7654 statew11 _gmpy4(lam_word7, sword7654); l7s0 l700 l700 l700 partial results together form terms omega words store them out. omega_word0 statew0 statew2 statew4 statew6; omega_word1 statew1 statew3 statew5 statew7 statew8 statew9 statew10 statew11; *omega_ptr++ omega_word0; *omega_ptr++ omega_word1; compute derivative lambda poly store result lambda polynomial's terms ANDED stored form derivative. poly_word3 poly_word2 poly_word0 (lam_word00 0xFF00FF00); poly_word1 (lam_word01 0xFF00FF00); *poly_ptr++ poly_word0; *poly_ptr++ poly_word1; *poly_ptr++ poly_word2; *poly_ptr++ poly_word3;
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
find error magnitudes error positions compu- ting numerator denominator that contain terms, where assumed utmost words. words numerator numword0 numword1 words denominator denword0 denword1 zerosptr (double (zeros); zero_val *zerosptr; zeroword0 _lo(zero_val); zeroword1 _hi(zero_val); denword1 denword0 numword1 numword0 Omega polynomial "poly" polynomial read bytes next loop, starting with omega[8], poly[9] which ptr_omega omega ptr_poly poly _nassert(RS_T Inner loop that iterates each zeroes completely unrolled loop iterates times building result evaluating omega polynomial Lambda' polynomial eight zeroes recursively using Horner's rule. denword0 computes denominator first four zeroes denword1 computes denominator second four zeroes numword0 computes numerator first four zeroes numword1 computes numerator second four zeroes RS_T-1; j--) polyval=*ptr_poly omegaval=*ptr_omega denword0=(denword0^polyword); numword0=(numword0^omegaword); denword1=(denword1^polyword); numword1=(numword1^omegaword);
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Instead scaling numerator alpha^k denominator scaled alpha^-k which definition zero zeroes array. This done multiplying denword0 zeroword0 denword1 zeroword1. denword0 _gmpy4(denword0,zeroword0); denword1 _gmpy4(denword1,zeroword1); Core loop that performs division each byte denomina- with numerator. loop that switches from denominator/numerator word next after four bytes previous denominator/numerator word have been exhausted lookup table perform division then multiply numerator denominator error magnitude. Also perform division convert alpha^-k alpha^k then perform logarithm table lookup determine location. With error magnitude location known, error corrected, performing received data byte with error magnitude. denom_i denword0; zero_i zeroword0; numer_i numword0; for(i i++) denval (denom_i constant); zerval (zero_i constant); numval (numer_i constant); denom_i (denom_i zero_i (zero_i numer_i (numer_i div_inv_table[denval]; j--; if(!j) denom_i denword1; if(!j) numer_i numword1; if(!j) zero_i zeroword1; if(!j) err_mag_i _gmpy4( numval, err_pos_i (!denval) *fail (err_pos_i RS_N-1) *fail RS_N-1 err_pos_i; subtract error from r(x) (!*fail) byte_i[k] byte_i[k]^err_mag_i
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
file: forney_i.c Copyright 2000 Texas Instruments, Incorporated. Rights Reserved.
assembly optimizer used obtain optimal performance writing serial assembly version loop shown above achieve correction eight errors cycles.
5.12 Decoder driver function
following code shows different pieces called from using simple driver function. driver function additional overhead cycles worst case when there errors then overhead cycles. Thus, overhead from driver about 6.3%. code driver shown below:
NAME rs_dec_driver-- Driver file Reed Solomon Decoder USAGE This routine following prototype: rs_dec_n_k( unsigned char byte_i[], unsigned char byte_o[], fail_epg) byte_i: input array encoded characters length byte_o: decoded array data bytes length fail_epg:fail code DESCRIPTION Reed Solmon decoder accepts input array byte_i corrects upto errors returns corrected array byte_o. Assumptions bytes byte_o contain valid data bytes parity bytes longer valid byte_o tables shown this file required valid decode following fail codes Fail code: Number zeroes found, equal lam_deg degree Lambda polynomial
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Fail code: Degree Lambda Polynomial greater than maximum number errors that corrected Fail code: Denominator Forney expression Fail code: Error position past Copyright 1999 Texas Instruments, Incorporated. Rights Reserved. Arguments decoder: (204,188,8) #define RS_N #define RS_K #define RS_T #define RS_2T RS_T Log-table entries code words: bytes Exp_table entries code words: bytes Div_inv_table entries bytes Five beta tables: bytes Syndrome array syndromes: bytes Lambda array entries: bytes alpha array various powers primitive element: bytes Chien search state_asm: bytes zeros_asm: bytes scratch: bytes GF_inv: 1024 bytes Total Data Memory: 2342 bytes
unsigned char log_table[256] 255, 198, 223, 238, 104, 199, 100, 224, 141, 239, 129, 193, 105, 248, 200, 113, 138, 101, 225, 147, 142, 218, 240, 130, 181, 194, 125, 106, 249, 185, 201, 154, 120, 228, 114, 166, 191, 139, 102, 221, 253, 226, 152, 179, 145, 136, 208, 148, 206, 143, 150, 219, 189, 241, 210, 131, 182, 163, 195, 126, 110, 107, 250, 133, 186, 202, 155, 159, 121, 212, 229, 172, 115, 243, 167, 112, 192, 247, 140, 128, 103, 222, 237, 197, 254, 227, 165, 153, 119, 184, 180, 124, 146, 217, 137, 209, 149, 188, 207, 205, 144, 135, 151, 178, 220, 252, 190, 242, 211, 171, 158, 132, 109, 162, 216, 183, 123, 164, 118, 196, 236, 127, 111, 246, 108, 161, 157, 170, 251, 134, 177, 187, 204, 203, 176, 156, 169, 160, 245, 235, 122, 117, 215, 79,174, 213, 233, 230, 231, 173, 232, 116, 214, 244, 234, 168,
unsigned char exp_table2[512]
128, 116, 232, 205, 135,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
152, 180, 117, 234, 201, 143, 192, 157, 156, 148, 106, 212, 181, 119, 238, 193, 159, 140, 160, 186, 105, 210, 185, 111, 222, 161, 190, 194, 153, 188, 101, 202, 137, 120, 240, 253, 231, 211, 187, 107, 214, 177, 127, 254, 225, 223, 163, 182, 113, 226, 217, 175, 134, 136, 104, 208, 189, 103, 206, 129, 124, 248, 237, 199, 147, 118, 236, 197, 151, 102, 204, 133, 184, 109, 218, 169, 158, 132, 168, 154, 164, 170, 146, 114, 228, 213, 183, 115, 230, 209, 191, 198, 145, 126, 252, 229, 215, 179, 123, 246, 241, 255, 227, 219, 171, 150, 196, 149, 110, 220, 165, 174, 130, 100, 200, 141, 112, 224, 221, 167, 166, 162, 178, 121, 242, 249, 239, 195, 155, 172, 138, 144, 122, 244, 245, 247, 243, 251, 235, 203, 139, 176, 125, 250, 233, 207, 131, 108, 216, 173, 142, 128, 116, 232, 205, 135, 152, 180, 117, 234, 201, 143, 192, 157, 156, 148, 106, 212, 181, 119, 238, 193, 159, 140, 160, 186, 105, 210, 185, 111, 222, 161, 190, 194, 153, 188, 101, 202, 137, 120, 240, 253, 231, 211, 187, 107, 214, 177, 127, 254, 225, 223, 163, 182, 113, 226, 217, 175, 134, 136, 104, 208, 189, 103, 206, 129, 124, 248, 237, 199, 147, 118, 236, 197, 151, 102, 204, 133, 184, 109, 218, 169, 158, 132, 168, 154, 164, 170, 146, 114, 228, 213, 183, 115, 230, 209, 191, 198, 145, 126, 252, 229, 215, 179, 123, 246, 241, 255, 227, 219, 171, 150, 196, 149, 110, 220, 165, 174, 130, 100, 200, 141, 112, 224, 221, 167, 166, 162, 178, 121, 242, 249, 239, 195, 155, 172, 138, 144, 122, 244, 245, 247, 243, 251, 235, 203, 139, 176, 125, 250, 233, 207, 131, 108, 216, 173, 142,
unsigned char div_inv_table[256]
216, 108, 135, 227, 142, 244, 114, 192, 237, 137, 111, 248, 213, 104, 140, 202, 209, 217, 161, 223, 130, 159, 198, 229, 238, 107, 165, 113, 200, 246, 230, 240, 231, 181, 234, 167, 122, 186, 173, 157, 221, 152, 224, 102, 144, 222, 128, 138, 112, 208, 164, 195, 207, 169, 146, 166, 136, 129, 193, 203, 185, 196, 141, 239, 179, 233, 251, 218, 121, 219, 119, 187, 124, 204, 228, 176, 155, 188, 220, 189, 148, 194, 206, 235, 242, 191, 175, 197, 100, 123, 101, 184, 163, 158, 210, 247, 249, 215, 214, 115, 118, 120, 134, 177, 226, 241, 250, 116, 243, 180, 143, 211, 201, 212, 232, 117, 170, 160, 131, 139, 171, 103, 151, 236, 132, 205, 105, 172, 156, 149, 154, 133, 125, 153, 109, 127, 255, 150, 110, 225, 147, 254, 252, 245, 199, 162, 190, 183, 174, 182, 168, 145, 178, 106, 126,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Array pointers tables that used Forney const unsigned char *tables[3] {exp_table2, div_inv_table, log_table}; Header files Library components different algorithms #include "syn_accumulate_h.h" #include "bk_massey_h.h" #include "ch_srch_h.h" #include "forney_h.h" Align following arrays double word boundary GF_inv boundary, because circular buffering used. This decoder work correctly. #pragma DATA_ALIGN(beta, #pragma DATA_ALIGN(beta_RS_N_4, #pragma DATA_ALIGN(beta_RS_N_2, #pragma DATA_ALIGN(beta_RS_3N_4, #pragma DATA_ALIGN(beta_RS_5N_4, #pragma DATA_ALIGN(syn_ASM, #pragma DATA_ALIGN(lambda_ASM, 16); #pragma DATA_ALIGN(zeros_ASM, #pragma DATA_ALIGN(alpha, #pragma DATA_ALIGN(state_ASM, #pragma DATA_ALIGN(scratch, #pragma DATA_ALIGN(GF_inv, 1024); beta array contains powers primitive elements, beta_RS_N_4 contains power roots generator polynomial, beta_RS_N_2 contains power roots generator polynomial These tables used syndrome accumulate routine, syn_acc. unsigned char beta[16] 128, 116, 232, 205, 135, unsigned char beta_RS_N_4[16]= 146, 221, 146, 221, 146, 221, unsigned char beta_RS_N_2[16]= 221, 146, 221, 146, 221, 146, unsigned char beta_RS_3N_4[16]={ 146, 221, 146, 221, 146, 221, unsigned char beta_RS_5N_4[16]={ const unsigned char alpha[] 128,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
190, 100, 169, 184, 169, 128, unsigned GF_inv[256] 0x00000000, 0x01010101, 0x8e8e8e8e, 0x47474747, 0xa7a7a7a7, 0x7a7a7a7a, 0xadadadad, 0x9d9d9d9d, 0xdddddddd, 0x3d3d3d3d, 0xaaaaaaaa, 0x5d5d5d5d, 0xd8d8d8d8, 0x72727272, 0xc0c0c0c0, 0xe0e0e0e0, 0x3e3e3e3e, 0x4c4c4c4c, 0x90909090, 0xdededede, 0x55555555, 0xa0a0a0a0, 0x83838383, 0x4b4b4b4b, 0x6c6c6c6c, 0xedededed, 0x39393939, 0x60606060, 0x56565656, 0x2c2c2c2c, 0x70707070, 0xd0d0d0d0, 0x1f1f1f1f, 0x26262626, 0x8b8b8b8b, 0x33333333, 0x48484848, 0x89898989, 0x6f6f6f6f, 0xa4a4a4a4, 0xc3c3c3c3, 0x40404040, 0x50505050, 0x22222222, 0xcfcfcfcf, 0xabababab, 0x0c0c0c0c, 0x15151515, 0x36363636, 0x5f5f5f5f, 0xf8f8f8f8, 0x92929292, 0x4e4e4e4e, 0xa6a6a6a6, 0x30303030, 0x88888888, 0x2b2b2b2b, 0x16161616, 0x67676767, 0x45454545, 0x38383838, 0x23232323, 0x68686868, 0x81818181, 0x1a1a1a1a, 0x25252525, 0x13131313, 0xc1c1c1c1, 0xcbcbcbcb, 0x97979797, 0x0e0e0e0e, 0x37373737, 0x24242424, 0x57575757, 0xcacacaca, 0xb9b9b9b9, 0xc4c4c4c4, 0x17171717, 0x52525252, 0x8d8d8d8d, 0xefefefef, 0x20202020, 0xecececec, 0x2f2f2f2f, 0x28282828, 0xd1d1d1d1, 0x11111111, 0xe9e9e9e9, 0xfbfbfbfb, 0xdadadada, 0xdbdbdbdb, 0x77777777, 0x06060606, 0x84848484, 0xcdcdcdcd, 0xfefefefe, 0x1b1b1b1b, 0x54545454, 0xa1a1a1a1, 0x7c7c7c7c, 0xcccccccc, 0xe4e4e4e4, 0x49494949, 0x31313131, 0x27272727, 0x53535353, 0x69696969, 0x02020202, 0x18181818, 0xdfdfdfdf, 0x44444444, 0x9b9b9b9b, 0xbcbcbcbc, 0x0f0f0f0f, 0x0b0b0b0b, 0xdcdcdcdc, 0xbdbdbdbd, 0xacacacac, 0x09090909, 0xc7c7c7c7, 0x1c1c1c1c, 0x82828282, 0x9f9f9f9f, 0x34343434, 0xc2c2c2c2, 0x46464646, 0xcececece, 0x3b3b3b3b, 0x0d0d0d0d, 0x9c9c9c9c, 0x08080808, 0xbebebebe, 0x87878787, 0xe5e5e5e5, 0xeeeeeeee, 0xebebebeb, 0xf2f2f2f2, 0xbfbfbfbf, 0xc5c5c5c5, 0x64646464, 0x07070707, 0x95959595, 0x9a9a9a9a, 0xaeaeaeae,
116, 205,
0xf4f4f4f4, 0xbabababa, 0x98989898, 0x96969696, 0x58585858, 0x66666666, 0x80808080, 0x2a2a2a2a, 0x51515151, 0x8a8a8a8a, 0x4a4a4a4a, 0x6e6e6e6e, 0x2e2e2e2e, 0x5e5e5e5e, 0xa9a9a9a9, 0xe1e1e1e1, 0xd5d5d5d5, 0x04040404, 0x1e1e1e1e, 0x93939393, 0x8c8c8c8c, 0x61616161, 0x63636363, 0x41414141, 0x5b5b5b5b, 0x4d4d4d4d, 0xb3b3b3b3, 0x32323232, 0xd9d9d9d9, 0x79797979, 0xbbbbbbbb, 0xfcfcfcfc, 0x1d1d1d1d, 0xb0b0b0b0, 0x2d2d2d2d, 0xf5f5f5f5, 0x4f4f4f4f, 0x5c5c5c5c, 0x94949494, 0xa2a2a2a2, 0xc6c6c6c6, 0x05050505, 0x3c3c3c3c, 0xb7b7b7b7, 0x6b6b6b6b, 0xafafafaf, 0x7b7b7b7b, 0xb6b6b6b6,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
0x12121212, 0x59595959, 0xa5a5a5a5, 0x35353535, 0x65656565, 0xb8b8b8b8, 0xa3a3a3a3, 0x9e9e9e9e, 0xd2d2d2d2, 0xf7f7f7f7, 0x62626262, 0x5a5a5a5a, 0x85858585, 0x7d7d7d7d, 0xa8a8a8a8, 0x3a3a3a3a, 0x29292929, 0x71717171, 0xc8c8c8c8, 0xf6f6f6f6, 0xf9f9f9f9, 0x43434343, 0xd7d7d7d7, 0xd6d6d6d6, 0x10101010, 0x73737373, 0x76767676, 0x78787878, 0x99999999, 0x0a0a0a0a, 0x19191919, 0x91919191, 0x14141414, 0x3f3f3f3f, 0xe6e6e6e6, 0xf0f0f0f0, 0x86868686, 0xb1b1b1b1, 0xe2e2e2e2, 0xf1f1f1f1, 0xfafafafa, 0x74747474, 0xf3f3f3f3, 0xb4b4b4b4, 0x6d6d6d6d, 0x21212121, 0xb2b2b2b2, 0x6a6a6a6a, 0xe3e3e3e3, 0xe7e7e7e7, 0xb5b5b5b5, 0xeaeaeaea, 0x03030303, 0x8f8f8f8f, 0xd3d3d3d3, 0xc9c9c9c9, 0x42424242, 0xd4d4d4d4, 0xe8e8e8e8, 0x75757575, 0x7f7f7f7f, 0xffffffff, 0x7e7e7e7e, 0xfdfdfdfd syn_ASM: array which syndromes computed, lambda_ASM: array which lambda values computed zeros_ASM: array which zeroes computed unsigned char syn_ASM[RS_2T]; unsigned char lambda_ASM[RS_T+1]; unsigned char zeros_ASM[RS_T]; arrays performing temporary calculations holding state information. unsigned char scratch[11 RS_T]; state_ASM[36]; Reed Solomon decoder accepts encoded data array byte_i size returning decoded output array byte_o. Fail codes returned Fail codes returned fail_epg. rs_dec_n_k( unsigned char byte_i[], unsigned char byte_o[], fail_epg) lam_deg error
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Initialize syndrome array, lambda array, zeros, scratch area using memset memset( syn_ASM, sizeof(syn_ASM)); memset( lambda_ASM, sizeof(lambda_ASM)); memset( zeros_ASM, sizeof(zeros_ASM)); memset( scratch, sizeof(scratch)); Perform syndrome computation received code word. there errors*/"error" zero, this used avoid performing other steps when there errors. error syn_asm( byte_i, syn_ASM, beta, beta_RS_N_4, beta_RS_N_2, beta_RS_3N_4, beta_RS_5N_4, 204, Initialize fail_epg zero, assume that data decoded vaid. *fail_epg Perform Berley_Kamp Massey algorithm detemine Lambda polynomial bk_massey_asm( syn_ASM, GF_inv, RS_T, fail_epg, &lam_deg, lambda_ASM Perform Chien search algorithm determine zeroes error locator polynomial. fail code set, then implies that there more than errors. This used exit decoder, without perfoming forney algorithm. However Forney algorithm still called without checkinh fail status because correction input codeword performed, fail code set. ch_srch_asm( lambda_ASM, alpha, log_table, lam_deg, zeros_ASM, state_ASM, RS_T, div_inv_table); fail_epg, exp_table2,
errors have occurred, call forney perform correction, otherwise exit. This equivalent checking error syndome routine non_zero. (lam_deg) forney_asm syn_ASM, lambda_ASM, lam_deg, zeros_ASM, byte_i, fail_epg, tables, RS_T,
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
RS_N,
scratch
Return fail code calling routine return(*fail_epg);
5.13 Performance Reed Solomon Decoder
following table indicates cycles case There additional optimizations that performed yield lower cycle count. each section cycle count obtained from tools using compiler, assembly optimizer hand coding mentioned indicate that tools only reach performance levels hand coding also drastically reduce software development time. Case errors (204,188,8) code Table Performance Decoder (204,188,8) Code
Name Module Syndrome Accumulate Chien Search Berlekamp-Massey Forney Driver Function Reed Solomon Decoder Code cycles 1110 cycles cycles cycles cycles 2180 cycles Assembly Optimizer cycles cycles cycles cycles cycles 1293 cycles Hand Optimized cycles cycles cycles cycles cycles 1268 cycles
data shown case (204,188,8) worst case running time, example when eight errors have occurred received data. there been errors syndromes would have been zero decoder proceed decode next block without executing other three algorithms. Similarly there more than eight errors then Chien search sets fail flag error correction attempted. three columns indicate performance obtained from existing compiler time this document being written. compiler's performance likely improve significantly immediate future. performance data shown compares intrinsic code, partitioned linear assembly code hand optimized code. seen that barring Chien search algorithm, possible obtain performance comparable hand assembly code directly from performance case optimized further obtain decoder under 1000 cycles. code-size each these modules also shown below. seen that code-size about bytes entire decoder.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Table Code Size (204,188,8) Decoder
Name Module Syndrome Accumulate Chien Search Berlekamp-Massey Forney Total Code size Code 1084 bytes bytes bytes bytes 3032 bytes Assembly Optimizer 1100 bytes bytes bytes 1036 bytes 3684 bytes Hand Optimized 1128 bytes bytes bytes bytes 3088 bytes
Case errors (102,94,4) code Table Performance Decoder (102,94,4) Code
Name Module Syndrome Accumulate Chien Search Berlekamp-Massey Forney Reed Solomon Decoder Code cycles 1110 cycles cycles cycles 1600 cycles Assembly Optimizer cycles cycles cycles cycles cycles Hand assembly cycles cycles cycles cycles cycles
Chien search code been modified same code used, because performance this case limited latency, instead writing separate code, code re-used. performance case further optimized under cycles. Since same code used both code-size table remains same.
References
Texas Instruments Inc. eXpressDSP Algorithm Standard (xDAIS): Rules Guidelines, SPRU352, September 1999. Error Control Coding Costello Linn Prentice Hall TMS320C6000 Instruction SPRU189
Acknowledgements
author would like thank David Hoyle Todd Wolf their contributions C64x implementation Reed Solomon Decoder.
Reed Solomon Decoder: TMS320C64x Implementation
SPRA686
Conclusions
This application report demonstrates implementation (204, 188, Reed Solomon decoder TMS320C64x architecture. Techniques that yield highly parallel implementations individual pieces Petersen-Gorenstein-Zierler (PGZ) algorithm were presented. These techniques clearly demonstrate benefits having DSPs that provide support Galois Field based operations. These algorithms able make maximum Galois field multiplier bandwidth availability eight distinct units sustain parallel processing TMS320C64x architecture. been shown that optimizing compiler C6000 architecture able extract data instruction level parallelism complicated algorithms involved developing Reed Solomon decoder, thus minimizing software development time effort. clock speeds 800MHZ, Reed Solomon decoder 6Mbytes/sec ADSL channel requires processing load 4.75 This allows multichannel implementation Reed Solomon decoder C64x DSP.
Reed Solomon Decoder: TMS320C64x Implementation
IMPORTANT NOTICE Texas Instruments subsidiaries (TI) reserve right make changes their products discontinue product service without notice, advise customers obtain latest version relevant information verify, before placing orders, that information being relied current complete. products sold subject terms conditions sale supplied time order acknowledgment, including those pertaining warranty, patent infringement, limitation liability. warrants performance semiconductor products specifications applicable time sale accordance with TI's standard warranty. Testing other quality control techniques utilized extent deems necessary support this warranty. Specific testing parameters each device necessarily performed, except those mandated government requirements. Customers responsible their applications using components. order minimize risks associated with customer's applications, adequate design operating safeguards must provided customer minimize inherent procedural hazards. assumes liability applications assistance customer product design. does warrant represent that license, either express implied, granted under patent right, copyright, mask work right, other intellectual property right covering relating combination, machine, process which such semiconductor products services might used. TI's publication information regarding third party's products services does constitute TI's approval, warranty endorsement thereof.
Copyright 2000, Texas Instruments Incorporated

Other recent searches


XAPP283 - XAPP283   XAPP283 Datasheet
Color - Color   Color Datasheet
Space - Space   Space Datasheet
Converter - Converter   Converter Datasheet
CrCb - CrCb   CrCb Datasheet
SCHS293 - SCHS293   SCHS293 Datasheet
MALS068X - MALS068X   MALS068X Datasheet
ISA1286AS1 - ISA1286AS1   ISA1286AS1 Datasheet
HMC326MS8G - HMC326MS8G   HMC326MS8G Datasheet
FW050R - FW050R   FW050R Datasheet
FW100R - FW100R   FW100R Datasheet
FW150R - FW150R   FW150R Datasheet
F1001 - F1001   F1001 Datasheet
CXT853 - CXT853   CXT853 Datasheet

 

Privacy Policy | Disclaimer
© 2012 Datasheet Archive