Fast Fourier Transform v7.0
DS260 June 2009, Product Specification

The Xilinx® LogiCORE IP Fast Fourier Transform (FFT) implements the Cooley-Tukey FFT algorithm, a computationally efficient method for calculating the Discrete Fourier Transform (DFT).

Overview
The FFT core computes an N-point forward DFT or inverse DFT (IDFT), where N = 2^m and m = 3-16. For fixed-point inputs, the input data is a vector of N complex values represented as dual bx-bit two's-complement numbers, that is, bx bits for each of the real and imaginary components of the data sample, where bx can be any width in the supported range, inclusive. The phase factor width is configurable in the same way. For single-precision floating-point inputs, the input data is a vector of N complex values represented as dual 32-bit floating-point numbers, with the phase factors represented as 24- or 25-bit fixed-point numbers. All memory is on-chip, using either block RAM or distributed RAM. Each element of the N-element output vector is represented using a fixed number of bits for each of the real and imaginary components of the output data. Input data is presented in natural order, and the output data can be in either natural or bit/digit reversed order. The complex nature of data input and output is intrinsic to the FFT algorithm, not the implementation.
Three arithmetic options are available for computing the FFT:
- Full-precision unscaled arithmetic
- Scaled fixed-point, where the user provides the scaling schedule
- Block floating-point (run-time adjusted scaling)

Features
- Drop-in module for Virtex®-6, Virtex-5, Virtex-4, Spartan®-6, Spartan-3/XA, Spartan-3E/XA and Spartan-3A/XA/AN/3A DSP FPGAs
- Forward and inverse complex FFT, run-time configurable
- Transform sizes N = 2^m, m = 3-16
- Configurable data sample precision
- Configurable phase factor precision
- Arithmetic types: unscaled (full-precision) fixed-point, scaled fixed-point, block floating-point
- Fixed-point or floating-point interface
- Rounding or truncation after the butterfly
- Block RAM or distributed RAM for data and phase-factor storage
- Optional run-time configurable transform point size
- Run-time configurable scaling schedule for scaled fixed-point cores
- Bit/digit reversed or natural output order
- Optional cyclic prefix insertion for digital communications systems
- Four architectures offer a trade-off between core size and transform time
- Bit-accurate C model and MEX function for system modeling available for download
- For use with Xilinx CORE Generator and Xilinx System Generator for DSP v11.2 and higher

The point size N, the choice of forward or inverse transform, the scaling schedule and the cyclic prefix length are run-time configurable. Transform type (forward or inverse), scaling schedule and cyclic prefix length can be changed on a frame-by-frame basis. Changing the point size resets the core.

Four architecture options are available: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; and Radix-2 Lite, Burst I/O. For detailed information about each architecture, see "Architecture Options."

© 2003-2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, the Brand Window, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.
Theory of Operation
The FFT is a computationally efficient algorithm for computing a Discrete Fourier Transform (DFT) of sample sizes that are a positive integer power of 2. The DFT X(k), k = 0, ..., N-1, of a sequence x(n), n = 0, ..., N-1, is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, \ldots, N-1 \qquad \text{(Equation 1)}$$

where N is the transform size and j = \sqrt{-1}. The inverse DFT (IDFT) is given by

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}, \qquad n = 0, \ldots, N-1 \qquad \text{(Equation 2)}$$

Algorithm
The FFT core uses the Radix-4 and Radix-2 decompositions for computing the DFT. For the Burst I/O architectures, the decimation-in-time (DIT) method is used, while the decimation-in-frequency (DIF) method is used for the Pipelined, Streaming I/O architecture. When using the Radix-4 decomposition, the N-point FFT consists of log4(N) stages, with each stage containing N/4 Radix-4 butterflies. Point sizes that are not a power of 4 need an extra Radix-2 stage for combining data. An N-point FFT using the Radix-2 decomposition has log2(N) stages, with each stage containing N/2 Radix-2 butterflies. The inverse FFT (IFFT) is computed by conjugating the phase factors of the corresponding forward FFT.

Finite Word Length Considerations
The Burst I/O architectures process an array of data by successive passes over the input data array. On each pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four or two complex numbers, respectively, and returns four or two complex numbers to the same memory. The numbers returned to memory by the core are potentially larger than the numbers picked up from memory. A strategy must be employed to accommodate this dynamic range expansion. A full explanation of scaling strategies and their implications is beyond the scope of this document; for more information about this topic, see [Ref] and [Ref].

For a Radix-4 FFT, the values computed in a butterfly stage can experience a growth factor of up to 1 + 3\sqrt{2} ≈ 5.242. This implies a bit growth of up to 3 bits. For Radix-2, the growth factor can be up to 1 + \sqrt{2} ≈ 2.414. This implies a bit growth of up to 2 bits. This bit growth can be handled in three ways:
- Performing the calculations with no scaling and carrying all significant integer bits to the end of the computation
- Scaling at each stage using a fixed-scaling schedule
- Scaling automatically using block floating point

All significant integer bits are retained when using full-precision unscaled arithmetic.
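The width of the data path increases to accommodate the bit growth through the butterfly. The growth of the fractional bits created from the multiplication is truncated (or rounded) after the multiplication. The width of the output is (input width + log2(transform length) + 1). This accommodates the worst case scenario for bit growth.

Consider an unscaled Radix-2 FFT: the datapath in each stage must grow by 1 bit, because the adder and subtractor in the butterfly can add or subtract two full-scale values and produce a sample which has grown in width by 1 bit. This yields the log2(transform length) part of the increase in the output width relative to the input width. The complex multiplier preserves the magnitude of its input (it only applies a rotation about the complex plane), but can theoretically produce bit growth when the magnitude of the input is greater than 1 (for example, 1 + j has a magnitude of 1.414). This means that the complex multiplier bit growth must only be considered once in the entire FFT process, yielding the additional 1-bit increase in the output width relative to the input width. For example, for a 1024-point transform with an input of 16 bits consisting of 1 integer bit and 15 fractional bits, the output is 27 bits with 12 integer bits and 15 fractional bits.

Note that the core does not have a specific location for the binary point. The output simply maintains the same binary point location as the input. In the example above, a 16-bit input with 8 integer bits and 8 fractional bits would have an unscaled output of 27 bits with 19 integer bits and 8 fractional bits.

When using scaling, a scaling schedule is used to divide by a factor of 1, 2, 4, or 8 in each stage. If scaling is insufficient, a butterfly output can grow beyond the dynamic range and cause an overflow. As a result of the scaling applied in the FFT implementation, the transform computed is a scaled transform. The scale factor s is defined as

$$s = 2^{\sum_{i} b_i} \qquad \text{(Equation 3)}$$

where b_i is the scaling (specified in bits) applied in stage i, and the sum runs over all stages. The scaling results in the final output sequence being modified by the factor 1/s. For the forward FFT, the output sequence X'(k), k = 0, ..., N-1, computed by the core is defined as

$$X'(k) = \frac{1}{s} X(k) = \frac{1}{s} \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, \ldots, N-1 \qquad \text{(Equation 4)}$$

For the inverse FFT, the output sequence is

$$x(n) = \frac{1}{s} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}, \qquad n = 0, \ldots, N-1 \qquad \text{(Equation 5)}$$

If a Radix-4 algorithm scales by a factor of 4 in each stage, the factor of 1/s is equal to the factor of 1/N in the inverse FFT equation (Equation 2). For Radix-2, scaling by a factor of 2 in each stage provides the factor of 1/N.

With block floating point, each stage applies sufficient scaling to keep numbers in range, and the scaling is tracked by a block exponent.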
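The bit growth, output width and scale factor relations above can be checked with a short Python sketch. The function names are illustrative only and are not part of the core or its tools; the sketch assumes the growth factors 1 + sqrt(2) and 1 + 3*sqrt(2) quoted under Finite Word Length Considerations and the scale factor definition of Equation 3.

```python
import math

def growth_bits(radix):
    """Worst-case integer bit growth of one butterfly stage (radix 2 or 4)."""
    factor = {2: 1 + math.sqrt(2), 4: 1 + 3 * math.sqrt(2)}[radix]
    return math.ceil(math.log2(factor))

def unscaled_output_width(input_width, transform_length):
    """Output width for unscaled arithmetic: input width + log2(N) + 1."""
    if transform_length < 2 or transform_length & (transform_length - 1):
        raise ValueError("transform length must be a power of 2")
    log2_n = transform_length.bit_length() - 1  # exact log2 for powers of 2
    return input_width + log2_n + 1

def scale_factor(schedule):
    """Scale factor s = 2 ** (sum of per-stage shifts in bits)."""
    return 2 ** sum(schedule)

# 1024-point Radix-4 transform: five stages, each scaling by 4 (2 bits).
print(growth_bits(2), growth_bits(4))     # 2 3
print(unscaled_output_width(16, 1024))    # 27
print(scale_factor([2, 2, 2, 2, 2]))      # 1024, that is, 1/s = 1/N
```

With every Radix-4 stage shifting by 2 bits, or every Radix-2 stage shifting by 1 bit, the scale factor equals N, which matches the 1/N factor of the inverse transform.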
As with unscaled arithmetic, for scaled and block floating-point arithmetic the core does not have a specific location for the binary point. The location of the binary point in the output data is inherited from the input data and then shifted by the scaling applied.

Floating Point Considerations
The FFT core optionally accepts data in IEEE-754 single precision format with 32-bit words consisting of a 1-bit sign, 8-bit exponent, and 23-bit fraction. The construction of the word matches that of the Xilinx® Floating Point Operator core.

Implementing a full floating point FFT on an FPGA is expensive in terms of the resources required. The floating-point option in the Xilinx® FFT core uses a higher precision fixed-point FFT internally to achieve similar noise performance to a full floating-point FFT, with significantly fewer resources. The figure illustrates the two levels of noise performance possible by selecting either 24 bits or 25 bits for the phase factor width. By increasing the phase factor width to 25 bits, more resources may be required, depending on the target FPGA device.

Figure: Comparison of Levels of Noise Performance

The figure shows the ratio of the difference between the FFT results of various models and double precision MATLAB® FFT data to the peak amplitude. The models shown are the single-precision MATLAB FFT function (calculated by casting the input data to a single-precision floating-point type), the Xilinx FFT core using a 24-bit phase factor width, and the Xilinx FFT core using a 25-bit phase factor width. To calculate the error signal, a randomized impulse (in magnitude and time) was used as the input signal, with the error averaged over five simulation runs.

All optimization options (memory types and XtremeDSP slice optimization) remain available when floating point input data is selected, allowing the user to trade off resources against transform time. Transform time for the Burst I/O architectures is increased by approximately N, the number of points in the transform, because of input normalization requirements. For the Pipelined, Streaming I/O architecture, the initial latency to fill the pipeline is increased, but data still streams through the core with no gaps.
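The single precision word layout described under Floating Point Considerations (1-bit sign, 8-bit exponent, 23-bit fraction) can be illustrated with a small Python sketch. This is only a host-side illustration using the standard `struct` module; the helper name `float32_fields` is an assumption made for this example and has nothing to do with the core's interface.

```python
import struct

def float32_fields(value):
    """Split a number, stored as IEEE-754 single precision, into its fields."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned word.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31                # 1-bit sign
    exponent = (word >> 23) & 0xFF   # 8-bit biased exponent
    fraction = word & 0x7FFFFF       # 23-bit fraction
    return sign, exponent, fraction

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-2.5))   # (1, 128, 2097152)
```

A value whose exponent field is zero and whose fraction field is nonzero is a denormalized number, which is the case the next section addresses.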
Denormalized Numbers
The floating-point interface to the FFT core does not support denormalized numbers. To match the behavior of the Xilinx Floating Point Operator core, the FFT core treats denormalized operands as zero, with the sign taken from the denormalized number.

NaNs and Infinity
If the core detects a NaN or Infinity value on the input, all output samples associated with the current input frame are set to NaN, with a sign bit of zero and a fixed pattern of exponent and fraction bits.

Real-Valued Input Data
The FFT core accepts complex data samples, but can perform the transform on real-valued data by setting all imaginary input samples to zero. Because of the finite wordlength effects described above, noise is introduced during the transform, resulting in the output data not being perfectly symmetric. The FFT algorithms have different noise effects because of their different calculation order. For a thorough treatment of this topic, refer to [Ref] and [Ref].

The asymmetry between the two halves of the result is more noticeable at larger point sizes. In addition, the noise is more prominent in the lower frequency bins. Therefore, Xilinx recommends that the upper half (N/2+1 points) of the output data is used when performing a real-valued FFT.

Rounding Implementation
An option is available, in all architectures, to apply convergent rounding to the data after the butterfly stage. However, selecting this option does not apply convergent rounding to all points in the datapath where wordlength reduction occurs. In particular, the outputs of the complex multipliers in the datapath are either truncated to reduce the datapath width (while still maintaining adequate precision) or have a simple rounding constant added to the fractional bits. This constant implements non-symmetric, round-towards-minus-infinity rounding, and can introduce a small bias to results over a large number of samples.

Architecture Options
The FFT core provides four architecture options to offer a trade-off between core size and transform time.
- "Pipelined, Streaming I/O". Allows continuous data processing.
- "Radix-4, Burst I/O". Loads and processes data separately, using an iterative approach. It is smaller in size than the pipelined solution but has a longer transform time.
- "Radix-2, Burst I/O". Uses the same iterative approach as Radix-4, but the butterfly is smaller.
This means it is smaller in size than the Radix-4 solution, but the transform time is longer.
- "Radix-2 Lite, Burst I/O". Based on the Radix-2 architecture, this variant uses a time-multiplexed approach to the butterfly for an even smaller core, at the cost of longer transform time.

The figure illustrates the trade-off of throughput versus resource use for the four architectures. As a rule of thumb, each step from one architecture to the next changes resource use by a roughly constant factor. The example is for an even power of 2 point size, which does not require the Radix-4 architecture to have an additional Radix-2 stage.

All four architectures can be configured to use a fixed-point interface with one of three fixed-point arithmetic methods (unscaled, scaled or block floating-point), or can instead use a floating-point interface.

Figure: Resource versus Throughput for Architecture Options

Bit and Digit Reversal
Each architecture offers the option of natural or reversed ordering of the output data, with data always being input in natural order. The FFT algorithm reorders samples during processing such that data input in natural order is output in reversed order. The core can optionally output data in natural order. However, this imposes a cost on each architecture. For the Burst I/O architectures, it imposes a time penalty: because unloading data cannot take place at the same time as loading input data for the next frame, separate unload and load phases are required. For the pipelined architecture, it requires additional storage to perform the reordering.

For the Radix-2, Burst I/O, Radix-2 Lite, Burst I/O and Pipelined, Streaming I/O architectures, Bit Reverse order is simple to calculate by taking the index of the data point, written in binary, and reversing the order of the digits. Hence, 0000, 0001, 0010, 0011, 0100, ... (0, 1, 2, 3, 4, ...) becomes 0000, 1000, 0100, 1100, 0010, ... (0, 8, 4, 12, 2, ...).

In the case of the Radix-4, Burst I/O architecture, the reversal applies to digits and is therefore called Digit Reversal. A digit in Radix-4 is two bits. Hence, 0000, 0001, 0010, 0011, 0100, ... (0, 1, 2, 3, 4, ...) becomes 0000, 0100, 1000, 1100, 0001, ... (0, 4, 8, 12, 1, ...), as pairs of digits are reversed.
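The index mappings described above can be reproduced with a minimal Python sketch. The function names are hypothetical and the code only models the ordering of indices, not the core itself. For an odd number of index bits, the sketch assumes the least significant place is a single-bit digit that moves to the most significant place, which is the behavior this datasheet describes for such transform sizes.

```python
def bit_reverse(index, nbits):
    """Reverse the order of the nbits binary digits of index (Radix-2)."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def digit_reverse_radix4(index, nbits):
    """Reverse the order of the 2-bit digits of index (Radix-4)."""
    result = 0
    remaining = nbits
    if nbits % 2 == 1:
        # Odd bit count: the least significant digit is a single bit,
        # and it ends up in the most significant place.
        result = index & 1
        index >>= 1
        remaining -= 1
    while remaining > 0:
        result = (result << 2) | (index & 3)
        index >>= 2
        remaining -= 2
    return result

print([bit_reverse(i, 4) for i in range(5)])           # [0, 8, 4, 12, 2]
print([digit_reverse_radix4(i, 4) for i in range(5)])  # [0, 4, 8, 12, 1]
print([digit_reverse_radix4(i, 5) for i in range(5)])  # [0, 16, 4, 20, 8]
```

Both mappings are their own inverse when the index width is even, so the same function converts reversed-order indices back to natural order in that case.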
Where the transform size requires an odd number of index bits, the digit in the least significant place is moved to the most significant place, so that 00000, 00001, 00010, 00011, 00100, ... (0, 1, 2, 3, 4, ...) becomes 00000, 10000, 00100, 10100, 01000, ... (0, 16, 4, 20, 8, ...).

Note: The core outputs the data point index along with the data, so this section is for information only.

Pipelined, Streaming I/O
The Pipelined, Streaming I/O solution pipelines several Radix-2 butterfly processing engines to offer continuous data processing. Each processing engine has its own memory banks to store the input and intermediate data (see the figure for this architecture). The core has the ability to simultaneously perform transform calculations on the current frame of data, load input data for the next frame of data, and unload the results of the previous frame of data. The user can continuously stream in data and, after the calculation latency, can continuously unload the results. If preferred, this design can also calculate one frame by itself or frames with gaps in between.

In the scaled fixed-point mode, the data is scaled after every pair of Radix-2 stages. The block floating-point mode can use significantly more resources than the scaled mode, because it must maintain extra bits of precision to allow dynamic scaling without impacting performance. Therefore, if the input data is well understood and is unlikely to exhibit large amplitude fluctuation, using scaled arithmetic (with a suitable scaling schedule to avoid overflow in the known worst case) is sufficient and resources can be saved.

The input data is presented in natural order. The unloaded output data can be in either bit reversed order or natural order. When natural order output data is selected, additional memory resource is used.

This architecture covers point sizes from 8 to 65536. The user has the flexibility to select the number of stages that use block RAM for data and phase factor storage. The remaining stages use distributed memory.
Figure: Pipelined, Streaming I/O (input data passes through groups of Radix-2 butterfly stages, each with its own memories, followed by output shuffling to the output data)

Radix-4, Burst I/O
With the Radix-4, Burst I/O solution, the FFT core uses one Radix-4 butterfly processing engine (see the figure for this architecture). It loads and/or unloads data separately from calculating the transform. Data I/O and processing are not simultaneous. When the FFT is started, the data is loaded. After a full frame has been loaded, the core computes the FFT. When the computation has finished, the data can be unloaded, but data cannot be loaded or unloaded during the calculation process. The data loading and unloading processes can be overlapped if the data is unloaded in digit reversed order.

This architecture has lower resource usage than the Pipelined, Streaming I/O architecture but a longer transform time, and supports point sizes up to 65536. Data and phase factors can be stored in block RAM or in distributed RAM (the latter for point sizes less than or equal to 1024).

Figure: Radix-4, Burst I/O (input data, data memories, switches and twiddles around a Radix-4 dragonfly, feeding the output data)

Radix-2, Burst I/O
The Radix-2, Burst I/O architecture uses one Radix-2 butterfly processing engine (see the figure for this architecture). After a frame of data is loaded, the input data stream must halt until the transform calculation is completed. Then, the data can be unloaded. As with the Radix-4, Burst I/O architecture, data can be simultaneously loaded and unloaded when the output samples are in bit reversed order. This solution supports point sizes from 8 to 65536. Both the data memories and phase factor memories can be in either block RAM or distributed RAM (the latter for point sizes less than or equal to 1024).
Figure: Radix-2, Burst I/O (input data, data memories, switches and twiddles around a Radix-2 butterfly, feeding the output data)

Radix-2 Lite, Burst I/O
This architecture differs from the Radix-2, Burst I/O architecture in that the butterfly processing engine uses one shared adder/subtractor, hence reducing resources at the expense of an additional delay per butterfly calculation. Again, as with the Radix-4 and Radix-2, Burst I/O architectures, data can be simultaneously loaded and unloaded if the output samples are in bit reversed order. This solution supports point sizes from 8 to 65536.

Figure: Radix-2 Lite, Burst I/O (the twiddle memory supplies sine on one cycle and cosine on the next; the Radix-2 butterfly multiplies the real part on one cycle and the imaginary part on the next; one output is generated each cycle)

Core Symbol and Port Definitions
The core schematic symbol figure shows the ports listed below, and the pinout table lists the core pinout for single channel configurations.

Figure: Core Schematic Symbol (Single Channel). Inputs: XN_RE, XN_IM, START, UNLOAD, NFFT, NFFT_WE, FWD_INV, FWD_INV_WE, SCALE_SCH, SCALE_SCH_WE, CP_LEN, CP_LEN_WE, SCLR. Outputs: XK_RE, XK_IM, XN_INDEX, XK_INDEX, BUSY, EDONE, DONE, BLK_EXP, OVFLO.

Table: Core Pinout (Single Channel)

XN_RE (Input, bxn bits): Input data bus: Real component, in two's complement or single precision floating point format.

XN_IM (Input, bxn bits): Input data bus: Imaginary component, in two's complement or single precision floating point format.

START (Input): FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the Burst I/O architectures). For Streaming I/O, START begins data loading, which proceeds directly to transform calculation and then data unloading.

UNLOAD (Input): Result unloading (Active High): For the Burst I/O architectures, UNLOAD starts the unloading of the results in natural order. The UNLOAD port is not necessary for the Pipelined, Streaming I/O architecture or for bit/digit reversed unloading.
NFFT (Input): Point size of the transform: NFFT can be the size of the maximum transform or any smaller point size. For example, a 1024-point FFT can compute point sizes 1024, 512, 256, and so on. The value of NFFT is log2 (point size). This port is only used with run-time configurable transform point size.

NFFT_WE (Input): Write enable for NFFT (Active High): Asserting NFFT_WE causes the FFT core to stop all processes and to initialize the state of the core to the new point size on the NFFT port. This port is only used with run-time configurable transform point size. Clock Enable overrides NFFT_WE if both signals are present.

FWD_INV (Input): Control signal that indicates whether a forward or inverse FFT is performed. When FWD_INV=1, the forward transform is computed. If FWD_INV=0, the inverse transform is computed.

FWD_INV_WE (Input): Write enable for FWD_INV (Active High).

SCALE_SCH (Input): Scaling schedule. Port width: 2 × ceil(NFFT/2) for the Pipelined, Streaming I/O and Radix-4, Burst I/O architectures, or 2 × NFFT for the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures, where NFFT is log2 (maximum point size) or the number of stages.
For the Burst I/O architectures, the scaling schedule is specified with two bits for each stage, with the scaling for the first stage given by the two LSBs. The scaling can be specified as 3, 2, 1, or 0, which represents the number of bits to be shifted. Example schedules are written ordered from the last stage to the first stage.
For the Pipelined, Streaming I/O architecture, the scaling schedule is specified with two bits for every pair of Radix-2 stages, starting at the two LSBs. When N is not a power of 4, the maximum bit growth for the last stage is one bit. For N=512, for instance, a schedule is valid only if its last-stage entry is 0 or 1, so for this transform length the two MSBs of SCALE_SCH can only encode a shift of 0 or 1.
This port is only available with scaled arithmetic (not unscaled, block floating-point or single precision floating-point).

SCALE_SCH_WE (Input): Write enable for SCALE_SCH (Active High): This port is available only with scaled arithmetic.

CP_LEN (Input, log2 (maximum point size) bits):
Cyclic prefix length: The number of samples from the end of the transform that are initially output as a cyclic prefix, before the whole transform is output. CP_LEN can be any number from zero to one less than the point size. This port is only available with cyclic prefix insertion.

CP_LEN_WE (Input): Write enable for CP_LEN (Active High): This port is only available with cyclic prefix insertion.

SCLR (Input): Master synchronous reset (Active High): Optional port. The synchronous reset overrides clock enable when both are present on the core.

CE (Input): Clock enable (Active High): Optional port.

CLK (Input): Rising-edge clock.

XK_RE (Output, bxk bits): Output data bus: Real component, in two's complement or single precision floating-point format. (For scaled arithmetic and block floating-point arithmetic, bxk = bxn. For unscaled arithmetic, bxk = bxn + log2 (maximum point size) + 1. For single precision floating-point, bxk = 32.)

XK_IM (Output, bxk bits): Output data bus: Imaginary component, in two's complement or single precision floating-point format. The width bxk is the same as for XK_RE.

XN_INDEX (Output, log2 (maximum point size) bits): Index of input data.

XK_INDEX (Output, log2 (maximum point size) bits): Index of output data.

RFD (Output): Ready for data (Active High): RFD is High during the load operation.

BUSY (Output): Core activity indicator (Active High): This signal goes High while the core is computing the transform.

DV (Output): Data valid (Active High): This signal is High when valid data is presented at the output.

EDONE (Output): Early done strobe (Active High): EDONE goes High one clock cycle immediately prior to DONE going High.

DONE (Output): FFT complete strobe (Active High): DONE transitions High for one clock cycle when the transform calculation has completed.

BLK_EXP (Output): Block exponent: The amount of scaling applied. Available only when block floating point is used.

OVFLO (Output): Arithmetic overflow indicator (Active High): OVFLO is High during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic or single precision floating-point I/O.
CPV (Output): Cyclic prefix valid (Active High): This signal is High when valid data that is part of the cyclic prefix is presented at the output. This port is only available with cyclic prefix insertion.

RFS (Output): Ready for start (Active High): This signal goes High when the core is ready to accept an assertion of the START input to begin data loading. This port is only available with cyclic prefix insertion in the Pipelined, Streaming I/O architecture.

Multichannel Pinout
Up to 12 channels are supported, for the Burst I/O architectures only. The following table shows how the pinout above must be adapted for multichannel operation.

Table: Single to Multichannel Pinout Conversion

Single Channel    Multichannel
SCLR              SCLR
NFFT              NFFT
NFFT_WE           NFFT_WE
FWD_INV           FWD_INV0, ..., FWD_INV11
FWD_INV_WE        FWD_INV0_WE, ..., FWD_INV11_WE
START             START
UNLOAD            UNLOAD
XN_RE             XN0_RE, ..., XN11_RE
XN_IM             XN0_IM, ..., XN11_IM
SCALE_SCH         SCALE_SCH0, ..., SCALE_SCH11
SCALE_SCH_WE      SCALE_SCH0_WE, ..., SCALE_SCH11_WE
CP_LEN            CP_LEN
CP_LEN_WE         CP_LEN_WE
XN_INDEX          XN_INDEX
BUSY              BUSY
EDONE             EDONE
DONE              DONE
XK_INDEX          XK_INDEX
XK_RE             XK0_RE, ..., XK11_RE
XK_IM             XK0_IM, ..., XK11_IM
BLK_EXP           BLK_EXP0, ..., BLK_EXP11
OVFLO             OVFLO0, ..., OVFLO11

CORE Generator Graphical User Interface
The FFT core graphical user interface (GUI) provides several screens with fields to set the parameter values for the particular instantiation required. A description of each CORE Generator field follows.

Page 1
Component Name: The name of the core component to be instantiated. The name must begin with a letter and be composed of the following characters: a to z, 0 to 9 and "_".
Channels: Select the number of channels from 1 to 12. Multichannel operation is available for the three Burst I/O architectures.

Transform Length: Select the desired point size. All powers of two from 8 to 65536 are available.

Implementation Options: Select an implementation option, as described in "Architecture Options."
- The Pipelined, Streaming I/O, Radix-2, Burst I/O, and Radix-2 Lite, Burst I/O architectures support point sizes 8 to 65536.
- The Radix-4, Burst I/O architecture supports point sizes up to 65536.
- Check Automatically Select to choose the smallest implementation that meets the specified Target Data Throughput, provided that the specified Target Clock Frequency is achieved when the core is implemented on an FPGA device.

Target Clock Frequency and Target Data Throughput are only used to automatically select an implementation and to calculate latency. The core is not guaranteed to run at the specified target clock frequency or target data throughput.

Transform Length Options: Select whether the transform length is run-time configurable or not. The core uses fewer logic resources and has a faster maximum clock speed when the transform length is not run-time configurable.

Page 2
Data Format: Select whether the input and output data samples are in Fixed Point format or in IEEE-754 single precision (32-bit) Floating Point format. Floating Point format is not available when the core is in a multichannel configuration.

Precision Options: Input data and phase factors can be independently configured to any width in the supported range, inclusive. When the Data Format is Floating Point, the input data width is fixed at 32 bits and the phase factor width can be set to 24 or 25 bits, depending on the noise performance required and the available resources.

Scaling Options: Three options are available, in all architectures:
- Unscaled: All integer bit growth is carried to the output. This can use more FPGA resources.
- Scaled: A user-defined scaling schedule determines how data is scaled between FFT stages.
- Block Floating Point: The core determines how much scaling is necessary to make the best use of the available dynamic range, and reports the scaling factor as a block exponent.

Optional Pins: Clock Enable (CE), Synchronous Clear (SCLR), and Overflow (OVFLO) are optional pins. Synchronous Clear overrides Clock Enable if both are selected.
If an option is not selected, some logic resources can be saved and a higher clock frequency can be attainable.

Rounding Modes: At the output of the butterfly, the LSBs in the datapath need to be trimmed. These bits can be truncated or rounded using convergent rounding, which is an unbiased rounding scheme. When the fractional part of a number is equal to exactly one-half, convergent rounding rounds up if the number is odd, and rounds down if the number is even. Convergent rounding can be used to avoid the bias that would otherwise be introduced by truncation after the butterfly stages. Selecting this option increases slice usage and yields a small increase in transform time because of additional latency.

Output Ordering: The output data selections are either Bit/Digit Reversed Order or Natural Order. The Radix-2 based architectures (Pipelined, Streaming I/O, Radix-2, Burst I/O and Radix-2 Lite, Burst I/O) offer bit-reversed ordering, and the Radix-4 based architecture (Radix-4, Burst I/O) offers digit-reversed ordering. For the Pipelined, Streaming I/O architecture, selecting natural order output ordering results in an increase in the memory used by the core. For the Burst I/O architectures, selecting natural order output increases the overall transform time because a separate unloading phase is required.

Cyclic Prefix Insertion can be selected if the output ordering is Natural Order. Cyclic Prefix Insertion is available for all architectures, and is typically used in OFDM wireless communications systems.

Input Data Timing: In previous versions of the Xilinx FFT core, input data was applied three cycles after the corresponding sample index, to allow a block memory containing the data samples to be addressed. In many cases this is not necessary, and applying data on the wrong cycle made it appear that the core was functioning incorrectly. This timing can be configured to be backwards-compatible with previous versions, or to have no delay between the sample index and the applied data (default).

Page 3
Memory Options:
Data and Phase Factors (Burst I/O architectures): For the Burst I/O architectures, either block RAM or distributed RAM can be used for data and phase factor storage. Data and phase factor storage can be in distributed RAM for all point sizes up to and including 1024 points.

Data and Phase Factors (Pipelined, Streaming I/O): In the Pipelined, Streaming I/O solution, the data can be stored partially in block RAM and partially in distributed RAM.
Each pipeline stage, counting from the input side, uses smaller data and phase factor memories than the preceding stages. The user can select the number of pipeline stages that use block RAM for data and phase factor storage. Later stages use distributed RAM. The default displayed offers a good balance between both.

If the output ordering is Natural Order, the memory used for the reorder buffer can be either block RAM or distributed RAM. The reorder buffer can be in distributed RAM for point sizes less than or equal to 1024. When block floating point is selected for the Pipelined, Streaming I/O architecture, a buffer is required for both natural order and bit reversed order output data. In this case, the reorder buffer options remain available, but distributed RAM can only be selected for point sizes below 2048.

Hybrid Memories: Where data, phase factor, or reorder buffer memories are stored in block RAM and the size of the memory is greater than one block RAM, the memory can be constructed from a hybrid of block RAMs and distributed RAM, where the majority of the data is stored in block RAMs and the few bits that are left over are stored in distributed RAM. This Hybrid Memory is an alternative to constructing the memory entirely from multiple block RAMs. It provides a reduction in the block RAM count, at the cost of an increase in the number of slices used. Hybrid Memories are only available when block RAM is used for one or more memories and the number of slices required for a Hybrid Memory implementation is below an internal threshold of LUTs per memory. If these conditions are met, Hybrid Memories are made available to be selected.

Optimize Options:
Complex Multipliers: Three options are available for customization of the complex multiplier implementation:
- Use CLB logic: All complex multipliers are constructed using slice logic. This is appropriate for target applications which have low performance requirements, or target devices which have few XtremeDSP slices/Mult18x18s.
- Use 3-multiplier structure (resource optimization): All complex multipliers use a three real multiply, five add/subtract structure, where the multipliers use XtremeDSP slices/Mult18x18s. This reduces the XtremeDSP slice/Mult18x18 count, but uses some slice logic.
In Spartan-3A DSP, Spartan-6 and Virtex-6 devices, this structure can make use of the XtremeDSP slice's pre-adder to reduce or remove the need for extra slice logic, and improve performance.
- Use 4-multiplier structure (performance optimization): All complex multipliers use a four real multiply, two add/subtract structure, utilizing XtremeDSP slices/Mult18x18s. This structure yields the highest clock performance at the expense of more dedicated multipliers. In devices with XtremeDSP slices, the add/subtract operations are implemented within the XtremeDSP slices. In devices with Mult18x18s, the add/subtract operations use slice logic.

Note: The core can override the complex multiplier implementation internally to ensure the fewest number of XtremeDSP slices/Mult18x18s are used, without impacting performance. For this reason, some core configurations show no difference in XtremeDSP slice/Mult18x18 usage when toggling between the 3-multiplier and 4-multiplier options. If "Use CLB logic" is selected, however, slice logic is always utilized.

Butterfly Arithmetic: Two options are available for customization of the butterfly implementation:
- Use CLB logic: All butterfly stages are constructed using slice logic.
- Use XtremeDSP Slices: For devices with XtremeDSP slices, this option forces all butterfly stages to be implemented using the adder/subtracters in the XtremeDSP slices.

Information Tabs
Implementation: This field displays the currently selected architecture. This is useful to see the result of automatic architecture selection.

Transform Size: When the transform length is run-time configurable, the core has the ability to reprogram the point size while the core is running; that is, the core can support the selected point size and any smaller point size. This field displays the supported point sizes based on the Transform Length, Transform Length Options, and Implementation Options selected.

Output Data Width: The output data width equals the input data width for scaled arithmetic and block floating-point arithmetic.
With unscaled arithmetic, the output data width equals (input data width + log2(point size) + 1).

Resource Estimates: Based on the options selected, this field displays the XtremeDSP slice or Mult18x18 count and the block RAM numbers (with separate block RAM numbers for Spartan-6 devices). The resource numbers are just an estimate. For exact resource usage, and slice/LUT-FlipFlop pair information, the implementation report should be consulted.

Latency: This shows the latency of the core in clock cycles and microseconds for each point size supported. The latency is measured from asserting the START input to the last sample of output data coming out of the core, assuming that the UNLOAD input (if present) is asserted as soon as DONE goes High. Note that this is not the minimum number of cycles between starting consecutive frames, as frames can overlap in some cases. The latency in microseconds is based on the target clock frequency. The latency figures can be copied to the Clipboard and pasted as plain text into other applications.

Model: This provides a link to the Xilinx® LogiCORE IP FFT page, where the core's C model can be downloaded. For details of the model, see "Bit-Accurate C Model."

XCO Parameters
The following table defines valid entries for the XCO parameters. Parameters are not case sensitive. Default values are displayed in bold in the original table. Xilinx strongly recommends that XCO parameters are not manually edited in the XCO file; instead, use the CORE Generator GUI to configure the core and perform range and parameter value checking.

Table: XCO Parameters (parameter: valid values)
component_name: Name must begin with a letter and be composed of the following characters: a to z, 0 to 9 and "_".
(default value 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536 automatically_select pipelined_streaming_io radix_4_burst_io radix_2_burst_io radix_2_lite_burst_io (default 250) target_clock_frequency www.xilinx.com DS260 June 2009 Product Specification Fast Fourier Transform v7.0 Table Parameters (Cont'd) Parameter target_data_throughput data_format input_width phase_factor_width scaling_options Valid Values (default false true fixed_point floating_point (default value (default value scaled unscaled block_floating_point truncation convergent_rounding false true false true false true bit_reversed_order natural_order false true block_ram distributed_ram block_ram distributed_ram block_ram distributed_ram rounding_modes sclr ovflo output_ordering cyclic_prefix_insertion memory_options_data memory_options_phase_factors memory_options_reorder (default value depends transform length) phase_factors memory_options_hybrid input_data_offset complex_mult_type false true no_offset three_cycle_offset use_luts use_mults_resources use_mults_performance use_luts use_xtremedsp_slices butterfly_type DS260 June 2009 Product Specification www.xilinx.com Fast Fourier Transform v7.0 Simulation Models When core generated using CORE Generator software, UNISIM-based simulation model created. core does have VHDL Verilog functional behavioral model. this reason, core overrides CORE Generator Project Options always delivers Structural model type. Xilinx recommends that designer simulations using resolution Some Xilinx library components require resolution work properly either functional timing simulation. core's UNISIM-based structural model produce incorrect results simulated with resolution other than "Register Transfer Level (RTL) Simulation Using Xilinx Libraries" section Chapter Synthesis Simulation Design Guide more information. 
This document is part of the ISE® Software Manuals.

System Generator for DSP Graphical User Interface

This section describes each tab of the System Generator GUI and details the parameters that differ from the CORE Generator GUI. See "CORE Generator Graphical User Interface" for more detailed information about the other parameters.

Basic Tab

The Basic tab is used to specify the transform configuration and architecture, similar to the corresponding page of the CORE Generator GUI.

Implementation Options: Select an implementation option, as described in "Architecture Options." The Pipelined, Streaming I/O, Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures support point sizes N = 8 to 65536. The Radix-4, Burst I/O architecture supports point sizes N = 64 to 65536. The option to automatically select an architecture is not currently available with System Generator and, therefore, Target Clock Frequency and Target Data Throughput are not available as options. System Generator only supports a single-channel implementation and, hence, Channels is not an available option.

Advanced Tab

The Advanced tab is used to specify phase factor precision, scaling, rounding, and optional port options, similar to the corresponding page of the CORE Generator GUI.

EN: Specifies whether the core will have a clock enable pin (the equivalent of selecting the CE option in the CORE Generator GUI).
RST: Specifies whether the core will have a synchronous reset pin (the equivalent of selecting the SCLR option in the CORE Generator GUI).

System Generator automatically sets the Input Data Width parameter based on the signal properties of the XN_RE and XN_IM ports. System Generator only supports fixed-point data types and, hence, Data Format is not an available option in the GUI.

Implementation Tab

The Implementation tab is used to specify memory and optimization options, similar to the corresponding page of the CORE Generator GUI.

Number of stages using Block RAM: Specifies the number of stages in the Pipelined, Streaming I/O architecture that use Block RAM for data and phase factor storage. As dynamic list boxes are not offered with the System Generator GUI, this option displays the full range of selection, but only allows the user to select the valid values visible in the CORE Generator GUI.

FPGA Area Estimation: See the System Generator documentation for detailed information about this option.
Bit-Accurate C Model

The FFT core has a bit-accurate C model designed for system modeling and for selecting parameters before generating a core. The model is bit-accurate but not cycle-accurate, so it produces exactly the same output data as the core on a frame-by-frame basis. However, it does not model the core's latency or its interface signals.

The C model is generally required before generating a core, so it is not delivered as an output of the CORE Generator software. Instead, it is available for download from the Xilinx LogiCORE FFT web page. The C model is available as a dynamically linked library for 32-bit and 64-bit Windows platforms, and for 32-bit and 64-bit Linux platforms. The C model is also available as a MATLAB® MEX function for 32-bit Windows only. Download the zip file and unzip it to install the C model. The README.txt file describes the contents of the installed directory structure, and any further platform-specific installation instructions.

C Model Interface

The C model is used through three functions, declared in the header file xfft_v7_0_bitacc_cmodel.h:

First function (create state): returns struct xilinx_ip_xfft_v7_0_state*; takes struct xilinx_ip_xfft_v7_0_generics generics.
Second function (simulate): takes struct xilinx_ip_xfft_v7_0_state* state, struct xilinx_ip_xfft_v7_0_inputs inputs, and struct xilinx_ip_xfft_v7_0_outputs* outputs.
Third function (destroy state): returns void; takes struct xilinx_ip_xfft_v7_0_state* state.

The first function creates a new state structure for the FFT C model, allocating memory to store the state as required, and returns a pointer to that state structure. The state structure contains all the information required to define the FFT being modelled. The function is called with a structure containing the core's generics: these are the parameters that define the bit-accurate numerical performance of the core, represented as integers, derived from the XCO parameters that result from the selections in the CORE Generator GUI.
The generics required for the C model and their mappings from the XCO parameters are shown in the following table.

Table: C Model Generics
C_NFFT_MAX: log2(maximum point size). Range: 3-16. XCO parameter mapping: transform_length (take log2).
C_ARCH: Architecture. XCO parameter mapping: implementation_options (radix_4_burst_io, radix_2_burst_io, pipelined_streaming_io, radix_2_lite_burst_io).
C_HAS_NFFT: Run-time configurable transform length. XCO parameter mapping: false, true.
C_INPUT_WIDTH: Input data width (bits). Range: 8-34. XCO parameter mapping: input_width.
C_TWIDDLE_WIDTH: Phase factor width (bits). Range: 8-34. XCO parameter mapping: phase_factor_width.
C_USE_FLT_PT: Input/output data format. XCO parameter mapping: data_format (fixed_point, floating_point).
C_HAS_SCALING: Scaling option: unscaled or not. Ignored when C_USE_FLT_PT selects floating point. XCO parameter mapping: scaling_options (unscaled, scaled, block_floating_point).
C_HAS_BFP: Scaling option: scaled or block floating point. Ignored when C_USE_FLT_PT selects floating point. XCO parameter mapping: scaling_options (unscaled, scaled, block_floating_point).
C_HAS_ROUNDING: Rounding mode. Ignored when C_USE_FLT_PT selects floating point. XCO parameter mapping: rounding_modes (truncation, convergent_rounding).

After the state structure has been created, it can be used as many times as required to simulate the FFT core. A simulation is run using the second function. Call this function with a pointer to the existing state structure, and with structures that hold the inputs and outputs of the C model. These input and output structures are fully defined and described in the C model's header file. Note that memory for the input and output data arrays must be allocated in the calling program before simulating the C model.

Finally, the state structure must be destroyed to free the memory used to store the state, using the third function, called with a pointer to the existing state structure.

If the generics of the core need to be changed, destroy the existing state structure and create a new state structure using the new generics. There is no way to change the generics of an existing state structure.

An example C file, run_bitacc_cmodel.c, is included in the C model zip file. This shows the stages required to run the C model.
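The one mapping in the generics table that involves arithmetic is C_NFFT_MAX, which is log2 of transform_length and must lie in the range 3-16. The following Python sketch shows that derivation only. The function name `c_nfft_max` is hypothetical and is not part of the C model or its header file.

```python
def c_nfft_max(transform_length: int) -> int:
    """Return log2(transform_length), the value the table assigns to C_NFFT_MAX.

    Hypothetical helper for illustration; not part of the Xilinx C model.
    """
    # A valid transform length is a power of two, so exactly one bit is set.
    if transform_length <= 0 or transform_length & (transform_length - 1):
        raise ValueError("transform_length must be a power of two")
    nfft = transform_length.bit_length() - 1
    # The generics table gives the range of C_NFFT_MAX as 3-16.
    if not 3 <= nfft <= 16:
        raise ValueError("log2(transform_length) must be in the range 3-16")
    return nfft


if __name__ == "__main__":
    for length in (8, 1024, 65536):
        print(length, c_nfft_max(length))
```

The range check mirrors the 3-16 range in the table, so a length such as 4 or 131072 is rejected before it reaches the model.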
Due to differences between the core and the C model in the order of operations within the processing phase, when the Pipelined, Streaming I/O architecture is used, fixed-point data is being processed, the scaling option is Scaled, and overflow occurs, the xk_re and xk_im data outputs of the C model might not match the XK_RE and XK_IM data outputs of the core. The overflow output of the C model and the OVFLO output of the core (if present) match in all cases. The overflow output of the C model is always set correctly when the scaling option is Scaled (as selected by the C model generics C_HAS_SCALING and C_HAS_BFP).

Therefore, Xilinx recommends that the overflow output of the C model is always checked when the scaling option is Scaled and the architecture is Pipelined, Streaming I/O, and that, if overflow has occurred, the xk_re and xk_im outputs of the C model are ignored. This is the only case where the C model is not entirely bit-accurate to the core.

Using the C Model to Select a Scaling Schedule

When the scaling option of the core is Scaled, the user is given great flexibility to set the scaling schedule that determines how much to scale the data values by in each stage of the processing phase. See "Forward/Inverse and Scaling Schedule." It can be difficult to choose the best scaling schedule that avoids overflow in a sufficiently large proportion of frames for a particular type of input data. The C model is a tool that can help with the selection of the scaling schedule. The process is as follows:

1. Create a set of frames of typical input data for the intended application.
2. Create an FFT state structure using the required generics. Set the scaling option to Scaled through the C model generics C_HAS_SCALING and C_HAS_BFP.
3. Set the scaling schedule in the structure of inputs to some initial scaling schedule, such as the reset value: a scaling of 1 in each stage for the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures, or a scaling of 2 in each stage for the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures.
4. Simulate the C model with each frame of typical input data in turn.
5. Count the number of frames for which overflow occurred.
6. If the percentage of frames for which overflow occurred is lower than the acceptable overflow rate, reduce the scaling value in one or more stages of the scaling schedule. If the percentage is higher than the acceptable overflow rate, increase the scaling value in one or more stages of the scaling schedule.
7. Repeat the simulate, count, and adjust stages until the percentage of frames for which overflow occurred matches the acceptable overflow rate.

This process produces a scaling schedule that is tailored to the typical input data of the intended application.

Control Signals and Timing

Clock Enable

If the Clock Enable (CE) pin is present on the core, driving the pin Low will pause the core in its current state. All logic within the core will be paused. Driving the CE pin High will allow the core to continue processing.

Synchronous Clear

Synchronous Clear overrides Clock Enable if both are present on the core. Asserting the Synchronous Clear (SCLR) pin results in all output pins, internal counters, and state variables being reset to their initial values. All pending load processes, transform calculations, and unload processes stop and are reinitialized. NFFT is set to the largest FFT point size permitted (the Transform Length value set in the GUI). The scaling schedule is set to 1/N. For the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures with a non-power-of-four point size, the last stage has a scaling of 1 and the rest have a scaling of 2. See the following table.

Table: Synchronous Clear Reset Values
Signal: Initial/Reset Value
NFFT: maximum point size (N)
FWD_INV: Forward
SCALE_SCH: 1/N. A scaling of 2 in every stage for the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures when N is a power of 4; a scaling of 1 in the last stage and 2 in the other stages for those architectures when N is not a power of 4; a scaling of 1 in every stage for the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures.

Note: If the run-time configurable transform length option is selected, asserting NFFT_WE resets the core in the same way as asserting the SCLR pin, except that NFFT_WE does not reset the latched scaling schedule or transform type (forward or inverse). Note that NFFT_WE does not override Clock Enable, unlike Synchronous Clear. Therefore, Synchronous Clear might not be required in addition to run-time configurable transform length. Omitting Synchronous Clear can save logic resources and can allow a higher maximum clock frequency.

Transform Size

The transform point size can be set through the NFFT port if the run-time configurable transform length option is selected. Valid settings and the corresponding transform sizes are provided in the Valid NFFT Settings table. If the NFFT value entered is too large, the core sets itself to the largest available point size (selected in the GUI).
If the value is too small, the core sets itself to the smallest available point size: 64 for the Radix-4, Burst I/O architecture and 8 for the other architectures.

NFFT values are read in on the rising clock edge when NFFT_WE is High. A new transform size retimes all current processes within the core, so every time a new transform size is latched in, regardless of whether or not the new point size differs from the current point size, the core is internally reset. (FWD_INV and SCALE_SCH are not reset.) Holding NFFT_WE High continues to reset the core on every clock cycle.

Table: Valid NFFT Settings
NFFT[4:0]: Transform size
00011: 8
00100: 16
00101: 32
00110: 64
00111: 128
01000: 256
01001: 512
01010: 1024
01011: 2048
01100: 4096
01101: 8192
01110: 16384
01111: 32768
10000: 65536

Forward/Inverse and Scaling Schedule

The transform type (forward or inverse) and the scaling schedule can be set frame-by-frame without interrupting frame processing. Both the transform type and the scaling schedule can be set independently for each FFT channel in a multichannel core. A single-channel core uses the FWD_INV pin to set the transform type and the SCALE_SCH bus to set the scaling schedule. A multichannel core has a FWD_INV pin for each channel, named FWD_INV0, FWD_INV1, and so on, and a SCALE_SCH bus for each channel, named SCALE_SCH0, SCALE_SCH1, and so on.

The transform type is set using the FWD_INV pin. Setting FWD_INV to 0 produces an inverse FFT, and setting FWD_INV to 1 creates the forward transform.

Burst I/O Architectures

The scaling performed during successive stages can be set using the SCALE_SCH bus. For the Radix-4, Burst I/O and Radix-2 architectures, the value of SCALE_SCH is used as pairs of bits [... N4, N3, N2, N1, N0], each pair representing the scaling value for the corresponding stage. Stages are computed starting with stage 0 as the two LSBs. There are log4(point size) stages in Radix-4 and log2(point size) stages in Radix-2. In each stage, the data can be shifted by 0, 1, 2, or 3 bits, which corresponds to SCALE_SCH values of 00, 01, 10, and 11.

For example, for Radix-4, when N = 1024, the schedule specifies a right shift for each of stage 0, stage 1, stage 2, stage 3, and stage 4 (there are log4(1024) = 5 Radix-4 stages). The schedule in this example scales by a total of 8 bits, which gives a scaling factor of 1/256. A conservative schedule completely avoids overflows in the Radix-4, Burst I/O architecture.
For the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures, a conservative scaling schedule prevents overflow for N = 1024 (there are log2(1024) = 10 Radix-2 stages).

Pipelined, Streaming I/O Architecture

For the Pipelined, Streaming I/O architecture, consider every pair of adjacent Radix-2 stages as a group. That is, group 0 contains stages 0 and 1, group 1 contains stages 2 and 3, and so forth. The value of SCALE_SCH is also used as pairs of bits [... N4, N3, N2, N1, N0]. Each pair represents the scaling value for the corresponding group of two stages. Groups are computed starting with group 0 as the two LSBs. In each group, the data can be shifted by 0, 1, 2, or 3 bits, which corresponds to SCALE_SCH values of 00, 01, 10, and 11.

For example, when N = 1024, the schedule specifies a right shift for group 0 (stages 0 and 1), group 1 (stages 2 and 3), group 2 (stages 4 and 5), group 3 (stages 6 and 7), and group 4 (stages 8 and 9). A conservative schedule completely avoids overflows in the Pipelined, Streaming I/O architecture. When the point size is not a power of 4, the last group only contains one stage, and the maximum bit growth for the last group is one bit. Therefore, the two MSBs of the scaling schedule can only be 00 or 01. A conservative scaling schedule for N = 512 is SCALE_SCH = [01 10 10 10 11].

The user is allowed great flexibility to set the transform type (Forward/Inverse) and the scaling schedule. The FWD_INV and SCALE_SCH values are latched into temporary registers whenever the corresponding WE pins are High. FWD_INV_WE and SCALE_SCH_WE can be asserted at any time up to a fixed number of cycles after START is asserted, irrespective of the Input Data Timing parameter value. The core then reads these temporary registers, and these are the values that are used for that frame of data. There is no way to alter those values once the transform calculation phase has started. WE assertions later than that point after START is asserted affect the frame that follows.

In a multichannel core, there are separate FWD_INV_WE and SCALE_SCH_WE pins for each channel, named FWD_INV0_WE, FWD_INV1_WE, and so on, and SCALE_SCH0_WE, SCALE_SCH1_WE, and so on.

Both the scaling schedule and the transform type are registered internally, so there is no need to hold these values on the pins. Also, if the scaling and transform type are constant through multiple frames (that is, no new values are latched in), the registered values apply for successive frames. The scaling schedule and transform type are not reset when NFFT_WE is asserted.
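The bit-pair layout of SCALE_SCH described above can be sketched in a few lines of Python. This is a minimal illustration of the stated rule (stage or group 0 occupies the two LSBs, and each pair encodes a right shift of 0 to 3 bits). The function names and the example schedule value are hypothetical and are not taken from the core or its C model.

```python
from typing import List


def decode_scale_sch(scale_sch: int, num_stages: int) -> List[int]:
    """Split a SCALE_SCH value into per-stage right shifts.

    Index 0 of the result is stage (or group) 0, taken from the two LSBs.
    """
    return [(scale_sch >> (2 * stage)) & 0b11 for stage in range(num_stages)]


def total_shift(scale_sch: int, num_stages: int) -> int:
    """Total number of bits the data is shifted right over all stages."""
    return sum(decode_scale_sch(scale_sch, num_stages))


if __name__ == "__main__":
    # Hypothetical schedule for a 1024-point Radix-4 transform (5 stages).
    # Bit pairs are written MSB pair first, so the right-most pair is stage 0.
    schedule = 0b01_00_10_11_10
    shifts = decode_scale_sch(schedule, 5)
    bits = total_shift(schedule, 5)
    print(shifts, bits, "scaling factor 1/%d" % (1 << bits))
```

With this example value the shifts sum to 8 bits, which corresponds to the 1/256 overall scaling factor used in the Radix-4 example. Bits above the (2 x number of stages) LSBs are simply not read by the decoder, in the same way that the core ignores leftover MSBs for smaller point sizes.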
The initial value and reset value of FWD_INV is forward, and the scaling schedule is set to 1/N. That translates to a scaling of 2 in each stage for the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures, and a scaling of 1 in each stage for the Radix-2 architectures.

The core uses the (2*number of stages) LSBs for the scaling schedule. So, when the point size decreases, the leftover MSBs are ignored. However, all bits are latched into the core on SCALE_SCH_WE and are used in later transforms if the point size increases.

Cyclic Prefix Insertion

Cyclic prefix insertion takes a section of the output of the FFT and prefixes it to the beginning of the transform. The resultant output data consists of the cyclic prefix (a copy of the end of the output data) followed by the complete output data, all in natural order. Cyclic prefix insertion is only available when the output ordering is Natural Order.

When cyclic prefix insertion is used, the length of the cyclic prefix can be set frame-by-frame without interrupting frame processing. The cyclic prefix length can be any number of samples from zero to one less than the point size. The cyclic prefix length is set by the CP_LEN bus. For example, when N = 1024, the cyclic prefix length can be from 0 to 1023 samples, and a CP_LEN value of 0010010110 will produce a cyclic prefix consisting of the last 150 samples of the output data.

The user is allowed great flexibility to set the cyclic prefix length. The CP_LEN value is latched into a temporary register whenever CP_LEN_WE is High. CP_LEN_WE can be asserted at any time before the frame data is loaded in. The core reads this temporary register a fixed number of cycles after START is asserted, irrespective of the Input Data Timing parameter. This is the value that is used for the current frame of data. There is no way to alter this value once the transform calculation phase has started. CP_LEN_WE assertions later than that point after START is asserted affect the frame that follows.

The cyclic prefix length is registered internally, so there is no need to hold the value on the CP_LEN bus. Also, if the cyclic prefix length is constant through multiple frames (that is, no new values are latched in), the registered values apply for successive frames. The cyclic prefix length is not reset when NFFT_WE is asserted.

The initial value and reset value of CP_LEN is 0 (no cyclic prefix). The core uses the log2(point size) MSBs of CP_LEN for the cyclic prefix length. So, when the point size decreases, the leftover LSBs are ignored. This effectively scales the cyclic prefix length with the point size, keeping them in approximately constant proportion. However, all bits of CP_LEN are latched into the core on CP_LEN_WE and are used in later transforms if the point size increases.

Overflow

Fixed-Point Data

The Overflow (OVFLO) signal is only available when Scaled arithmetic is used. OVFLO is driven High during unloading if any point in the data frame overflowed. For a multichannel core, there is a separate OVFLO output for each channel, named OVFLO0, OVFLO1, and so on.

In the Burst I/O architectures, the OVFLO signal goes High as soon as an overflow occurs during the computation and remains High during the entire time the frame is unloading. In the Pipelined, Streaming I/O architecture, the OVFLO signal goes High during unloading as soon as an overflow is detected in that frame and is held High for the remainder of the frame.

When an overflow occurs in the core, the data is wrapped rather than saturated, resulting in the transformed data becoming unusable for most applications.

Floating-Point Data

The Overflow signal is used to indicate an exponent overflow when processing floating-point data. When an exponent overflow occurs, the OVFLO signal goes High as soon as the overflow is detected in that frame, and remains High for the remainder of the frame. This behavior is the same for both the Burst I/O and Pipelined, Streaming I/O architectures, which is different from the Overflow behavior for fixed-point data described above.

The output sample which overflowed will be set to +/- Infinity, depending on the sign of the internal result. The Overflow signal will not be asserted when a NaN value is present on the output. NaN values can only occur at the FFT output when the input data frame contains NaN or +/- Infinity samples.

Block Exponent

The Block Exponent (BLK_EXP) signal (used only with the block floating-point option) contains the block exponent. For a multichannel core, there is a separate BLK_EXP output for each channel, named BLK_EXP0, BLK_EXP1, and so on. The signal is valid during the unloading of the data frame. The value present on the port represents the total number of bits the data was scaled by during the transform. For example, if BLK_EXP has a value of 00101 = 5, this means the output data (XK_RE, XK_IM) was scaled by 5 bits (shifted right by 5 bits), or in other words, was divided by 32, to fully utilize the available dynamic range of the output data path without overflowing.
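Because BLK_EXP reports how many bits the output was shifted right, downstream processing can restore the unscaled magnitude by multiplying by 2 to the power of BLK_EXP. The Python sketch below shows that arithmetic only. The function name `undo_block_scaling` is hypothetical, and the sketch assumes XK_RE and XK_IM have already been converted to signed integers.

```python
def undo_block_scaling(xk_re: int, xk_im: int, blk_exp: int) -> complex:
    """Rescale one block floating-point output sample by 2**blk_exp.

    Hypothetical helper: xk_re and xk_im are assumed to be signed integers
    read from XK_RE and XK_IM, and blk_exp the value read from BLK_EXP.
    """
    if blk_exp < 0:
        raise ValueError("blk_exp must be non-negative")
    scale = 1 << blk_exp  # a right shift by blk_exp bits divided by this
    return complex(xk_re * scale, xk_im * scale)


if __name__ == "__main__":
    # BLK_EXP = 00101 = 5: the data was divided by 32 inside the core.
    print(undo_block_scaling(3, -2, 0b00101))
```

The bits discarded by the right shift inside the core are not recovered; the multiplication restores the magnitude only.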
Timing: Pipelined, Streaming I/O Architecture

Setting Up and Starting the Transform

Asserting START starts the data loading phase, which immediately flows into the transform calculation phase and then the data unloading phase. Pulsing START once allows the transform calculation for a single frame. Pulsing START every N clock cycles allows continuous data processing. Alternatively, holding START High also allows continuous data processing (see Figure 8, or Figure 9 if cyclic prefix insertion is used). START is ignored except when the core can begin loading a frame, i.e., when no data is being loaded, or when the last value of a data frame is being loaded.

If NFFT_WE, FWD_INV_WE, or SCALE_SCH_WE were not asserted before the initial START, then the defaults are used.

This architecture can also support extended intervals between frames (Figure 10). Simply assert START at any time to begin data loading. After the data frame is loaded, the core proceeds to calculate the transform and then output the results. Figure 10 is intended to show the timing of entire frames. It does not show the small skews between signals which occur at the start of frames.

Applying Data

Data is applied in a contiguous burst. The point at which the data input should start relative to the START pulse is determined by the Input Data Timing parameter in the GUI.

If "No offset" is selected for the Input Data Timing parameter, the input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive on the same cycle as the XN_INDEX it matches. The first data sample should therefore be applied as soon as RFD goes High, such that the first sample pair is read into the core on the first transition of XN_INDEX.

If "3 clock cycle offset" is selected for the Input Data Timing parameter, the input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive three clock cycles later than the XN_INDEX it matches (see Figure 11). In this way, XN_INDEX can be used to address external memory or a frame buffer storing the input data.

RFD remains High with XN_INDEX during the loading phase and indicates that data is being input.

Data Processing and Data Output

BUSY goes High while the core is calculating the transform. DONE goes High when the calculation is complete. EDONE goes High one cycle before that, i.e., during the last cycle of the calculation phase. On the cycle in which DONE goes High, the core begins unloading.

During the unloading phase, while valid output results are present on XK_RE/XK_IM, DV (Data Valid) is High. During unloading, XK_INDEX corresponds to the XK_RE/XK_IM being presented.

If cyclic prefix insertion is used, the cyclic prefix is unloaded first. CPV goes High to indicate that the cyclic prefix is being unloaded, and XK_INDEX counts from (point size) - (cyclic prefix length) up to (point size) - 1. After the cyclic prefix has been unloaded, or if the cyclic prefix length is zero, or if cyclic prefix insertion is not used, the whole frame of output data is unloaded. CPV goes Low (if present) and XK_INDEX counts from 0 up to (point size) - 1.

Cyclic Prefix Considerations

If cyclic prefix insertion is used, more samples are unloaded from the core than are loaded. Therefore, the core cannot continuously stream frames, but must insert a gap of (cyclic prefix length) clock cycles between each frame of input data to accommodate the additional clock cycles required to unload the cyclic prefix. This is indicated by the Ready For Start (RFS) pin. RFS goes High when the core is ready for START to be asserted to begin loading the next frame of data. START is ignored except when RFS is High. RFS remains Low for (cyclic prefix length) clock cycles after RFD has gone Low, to allow for unloading the cyclic prefix.
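The XK_INDEX sequence described for unloading with a cyclic prefix can be written out as a short Python sketch. It models only the index order (prefix first, then the whole frame) and not the timing of the signals. The function name `unload_indices` is hypothetical.

```python
from typing import List


def unload_indices(point_size: int, cp_len: int) -> List[int]:
    """Return the XK_INDEX values in the order they are unloaded.

    The prefix counts from point_size - cp_len up to point_size - 1,
    then the whole frame counts from 0 up to point_size - 1.
    """
    # The cyclic prefix length runs from zero to one less than the point size.
    if not 0 <= cp_len < point_size:
        raise ValueError("cp_len must be between 0 and point_size - 1")
    prefix = list(range(point_size - cp_len, point_size))
    frame = list(range(point_size))
    return prefix + frame


if __name__ == "__main__":
    print(unload_indices(8, 3))
```

The result has point_size + cp_len entries, which is why (cyclic prefix length) extra clock cycles are needed between frames of input data when a cyclic prefix is inserted.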
[Figure 8: Timing for Continuous Streaming Data]

[Figure 9: Timing for Continuous Streaming Data with Cyclic Prefix Insertion]

[Figure 10: Timing for Non-Continuous Data Stream. Note: All transitions are synchronous with the rising edge of the clock.]

[Figure 11: Beginning of Data Frame (Input Data Timing = "3 clock cycle offset")]

Timing: Radix-4, Burst I/O, Radix-2, Burst I/O, and Radix-2 Lite, Burst I/O Architectures

Setting Up and Starting the Transform

The START signal begins the data loading phase, which leads directly to the calculation phase.
START is ignored except when the core can begin loading a frame, i.e., when the core is idle, or on the last cycle of calculation (bit-reversed output) or unloading (natural order output).

Applying Data

Data is applied in a contiguous burst. The point at which the data input should start relative to the START pulse is determined by the Input Data Timing parameter in the GUI.

If "No offset" is selected for the Input Data Timing parameter, the input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive on the same cycle as the XN_INDEX it matches. The first data sample should therefore be applied as soon as RFD goes High, such that the first sample pair is read into the core on the first transition of XN_INDEX.

If "3 clock cycle offset" is selected for the Input Data Timing parameter, the input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive three clock cycles later than the XN_INDEX it matches (see Figure 11). In this way, XN_INDEX can be used to address external memory or a frame buffer storing the input data.

RFD remains High with XN_INDEX during the loading phase and indicates that data is being input.

Data Processing

BUSY goes High while the core is calculating the transform. DONE goes High when the calculation is complete. EDONE goes High one cycle before that, i.e., during the last cycle of the calculation phase.

Data Output

After the data is loaded and processed, two options are available to unload the data.

Natural Order: If output in natural order is selected, UNLOAD should be asserted (Figure 12, or Figure 13 if cyclic prefix insertion is used) to output the data. During the unloading phase, while valid output results are present on XK_RE/XK_IM, DV (Data Valid) is High. During unloading, XK_INDEX corresponds to the XK_RE/XK_IM being presented.

If cyclic prefix insertion is used, the cyclic prefix is unloaded first. CPV goes High to indicate that the cyclic prefix is being unloaded, and XK_INDEX counts from (point size) - (cyclic prefix length) up to (point size) - 1. After the cyclic prefix has been unloaded, or if the cyclic prefix length is zero, or if cyclic prefix insertion is not used, the whole frame of output data is unloaded. CPV goes Low (if present) and XK_INDEX counts from 0 up to (point size) - 1.

UNLOAD can be asserted at any time from when EDONE goes High. UNLOAD is ignored except when the core can begin unloading. In addition to using pulses, START and UNLOAD can be tied High (Figure 14).
In this case, the core continuously loads, processes, and unloads data. Figure 14 is intended to show the timing of entire frames. It does not show the small skews between signals which occur at the start of frames, and it does not show the length of each phase of the transform to scale. The processing time is much longer than the time required to input or output a frame.

Bit/Digit-Reversed Order: If output in bit/digit-reversed order is selected, the user can assert START again (Figure 15). While the next frame of data is loaded, the results are output at the same time. START can be asserted at any time from when EDONE goes High. If START is tied High, the core continuously loads/unloads then processes, loads/unloads then processes, and so on (Figure 16). DV remains High during data unloading in both cases.

There is a latency of several clock cycles after triggering unload with UNLOAD or START before output data on XK_RE/XK_IM is presented. This latency varies as a function of several core parameters, so output data is qualified by DV (Data Valid) and XK_INDEX, which should be considered a handshake.

[Figure 12: Unload Output Results in Natural Order]

[Figure 13: Unload Output Results in Natural Order with Cyclic Prefix Insertion]

[Figure 14: Timing for Burst I/O Solutions with Natural Order Output. Note: All transitions are synchronous with the rising edge of the clock.]

[Figure 15: Unload Results in Bit/Digit Reversed Order (Input Data Timing = "3 clock cycle offset")]

[Figure 16: Continuous Processing with Bit/Digit Reversed Order (Input Data Timing = "3 clock cycle offset")]

Known Device-Specific Constraints

This section details issues which might be encountered when mapping the core to a particular device. In many cases, it is possible to work around these issues by adjusting the core configuration without having to alter the target FPGA device.

Spartan-3 FPGA Constraints

Explanation

In these device architectures, the multiplier site adjacent to the location of a 512x36 block RAM component must remain free because of interconnect resource sharing between multipliers and block RAMs. This means that an adjacent block RAM can be used only at a restricted bit width when the multiplier is used.

Error Message

During place and route, the Place tool generates a message similar to the following:

ERROR:Place:341 The design contains <n> Block RAM components that are configured as 512x36 Block RAMs and <n> Multiplier components. The Multiplier site adjacent to the location of a 512x36 Block RAM component must remain free because of resource sharing. Therefore the device must have at least <n> Multiplier sites for this design to fit. The current device has only <n> Multiplier sites.

Other Placer errors might also be present.

Solution

There are a number of solutions to this issue:

- Reduce the input data width and/or phase factor width far enough to allow adjacent block RAMs and multipliers to be used.
- If the FFT uses the Pipelined, Streaming I/O architecture, reduce the value of the Number of Stages Using Block RAM parameter to reduce the number of block RAMs required. This would increase the number of slices used by the core.
- If the FFT uses a Burst I/O architecture, use distributed RAM for data or phase factor memory, or hybrid memory optimization (if available), to reduce the number of block RAMs required. This would increase the number of slices used by the core.
- If an unscaled FFT is implemented, utilize a scaled or block floating-point FFT instead, to reduce the output width far enough to allow adjacent block RAMs and multipliers to be used.
- Use a larger device with more block RAM and multiplier components.

Spartan-3A DSP FPGA Constraints

Explanation

The Spartan-3A DSP device has a split in the left-most and right-most XtremeDSP slice columns to accommodate clock tiles. The complex multipliers in the FFT core use dedicated cascade routing between XtremeDSP slices to enable high performance and reduce power consumption. The cascade routing cannot cross the clock tile in these particular columns. In densely-packed devices where many XtremeDSP slices have been used, the placer might have no option but to attempt to place cascaded XtremeDSP slices in these split columns, which is not possible. This can occur in multichannel Burst I/O FFTs or large Pipelined, Streaming I/O FFTs.

Error Message

During place and route, the Place tool generates a message similar to the following:

WARNING:Place:119 Unable to find location. DSP48A component blk00000003/blk00002ec4 not placed.
WARNING:Place:119 Unable to find location. DSP48A component blk00000003/blk00002ec9 not placed.
WARNING:Place:119 Unable to find location. DSP48A component blk00000003/blk00002ec8 not placed.
Comps belong to structure: Multiplier Cascade <number of instance names listed>
ERROR:Place:120 There were not enough sites to place all selected components.
Solutions

- If the option to optimize complex multipliers for speed using XtremeDSP slices has been checked, un-checking this option will use fewer XtremeDSP slices, and might permit packing into the device. This can impact the maximum achievable clock frequency.
- If the option to optimize butterflies using XtremeDSP slices has been checked, un-checking this option will free up XtremeDSP slice locations, which might allow placement to succeed. This can impact the maximum achievable clock frequency.
- Reduce the data or phase factor widths until the number of XtremeDSP slices is reduced. Because the phase factor width is increased internally, reducing it far enough will allow a smaller complex multiplier architecture to be utilized. This will not impact the maximum achievable clock frequency (and might improve it), but yields a small reduction in data precision.
- Use a larger device. The left-most and right-most columns of the XC3SD1800A device are shorter (fewer XtremeDSP slices) than the equivalent columns of the XC3SD3400A device. See the references for further details on Spartan-3A DSP XtremeDSP slices.

Performance and Resource Usage

The following tables list the resource usage and transform time for selected parameters. This core does not use placement constraints, allowing Place and Route full flexibility. The slice count, block RAM count, and XtremeDSP slice count are listed. The maximum clock frequency is listed with the transform latency. The latency is from asserting the START input to the last sample of output data coming out of the core, assuming that the UNLOAD input is asserted as soon as possible, if present.

The following device architectures are represented:

"Virtex-6 FPGA Family"
"Virtex-5 FPGA Family"
"Spartan-6 Family"
"Spartan-3A DSP Family"

The maximum clock frequency for each test case was determined iteratively. For the determination of the maximum frequency, the core was generated with double registers on each input and output. The inner registers are directly connected to the core and use the core clock, whereas the outer registers use a separate clock. This ensures that all paths in the core are included in the timing constraint without artificially distorting the design on the chip. The slowest speed grade was used for each family.

The parameters used are as follows: high high

The maximum achievable clock frequency and the resource counts may also be affected by other tool options, additional logic in the FPGA device, using a different version of the Xilinx tools, and other factors. Improved performance and resource usage may be achieved if you apply an area group, or by using arguments such as "-lc area". Consult the ISE 11.2 software documentation for more details on the available options. When comparing performance and resource usage with Fast Fourier Transform v5.0, note that an option used in the arguments above leads to higher slice counts, but improved performance.

Virtex-6 FPGA Family

The following table shows the performance and resource usage numbers for Virtex-6 FPGAs. A range of cores is shown for several typical applications: Baseband 3GPP LTE, Baseband OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid used. The performance and resource usage numbers were produced using ISE 11.2 software, with speed file version "PREVIEW 0.63 2009-04-27."
DS260 June 2009 Product Specification www.xilinx.com Fast Fourier Transform v7.0 Table Virtex-6 FPGA Family Performance Resource Utilization Stages Using Block Clock Frequency Cyclic Prefix Insertion Optimize Speed Latency (cycles) 12453 12473 26804 26826 12453 12453 26804 26804 7364 7354 15575 15564 7364 7364 7364 15575 15575 15575 1652 1670 1670 Phase Factor Width Variable Point Size Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T 1841 2482 1951 2602 1018 1099 1604 2789 4542 1732 3003 4940 1031 1033 1084 1086 2603 4699 2674 4794 1204 1090 1265 1153 1978 3526 6622 2077 3701 6949 28.89 28.03 69.08 70.59 30.37 29.86 69.08 70.54 17.40 18.62 40.14 39.40 16.55 17.96 17.96 38.65 37.99 45.94 3.91 3.81 4.23 XC6VLX130T 1847 XC6VLX240T 4091 XC6VLX130T 1961 XC6VLX240T 4218 XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T 1029 1106 Baseband 3GPP XC6VLX130T 1617 XC6VLX130T 2807 XC6VLX240T 5899 XC6VLX130T 1743 XC6VLX130T 3016 XC6VLX240T 6312 XC6VLX75T XC6VLX75T XC6VLX75T OFDM www.xilinx.com DS260 June 2009 Product Specification Latency (10) Input Data Width Memory Type Scaling Type LUT/FF Pairs Application Xilinx Part Point Size Channels LUTs Fast Fourier Transform v7.0 Table Virtex-6 FPGA Family Performance Resource Utilization (Cont'd) Stages Using Block Clock Frequency Cyclic Prefix Insertion Optimize Speed Latency (cycles) 2179 2167 2179 2179 3207 3203 3216 2181 2171 2171 3199 3195 3223 3223 3225 12445 12441 24748 24746 5800 24758 24745 49341 49327 1411 5529 22703 2225 9427 41205 3169 14441 65649 3209 12445 Phase Factor Width Variable Point Size Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T 
XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T 3196 3909 3930 3092 3319 3361 4177 3273 2286 3232 2394 2460 4703 4727 5100 4109 4279 4639 4985 6278 7918 6839 8699 3211 3359 3443 2005 2074 2141 1862 1928 2009 3524 4342 5201 3126 3865 3834 3003 3224 3265 4083 3213 2221 3189 2297 2399 4585 4623 4995 4006 4202 4492 4837 6185 7838 6751 8585 3164 3299 3387 1981 2046 2099 1843 1901 1967 3444 4236 5082 4848 5331 4885 4829 4991 5431 5949 4844 3307 4173 3434 3892 7335 7919 8122 6207 6953 6908 7936 1224 9394 10738 5.31 4.87 5.06 5.41 7.58 7.81 7.98 4.90 4.88 5.13 7.30 7.29 9.13 8.62 8.96 36.71 32.06 63.78 81.67 13.03 70.74 79.82 162.84 170.09 3.71 14.00 58.51 5.00 22.29 95.60 7.49 36.56 160.12 2.02 7.96 33.28 Scanners (11) 10160 11562 4282 4432 4533 2618 2687 2743 2511 2574 2621 5161 6516 7962 Test DS260 June 2009 Product Specification www.xilinx.com Latency (10) Input Data Width Memory Type Scaling Type LUT/FF Pairs Application Xilinx Part Point Size Channels LUTs Fast Fourier Transform v7.0 Table Virtex-6 FPGA Family Performance Resource Utilization (Cont'd) Stages Using Block Clock Frequency Cyclic Prefix Insertion Optimize Speed Latency (cycles) 3446 3451 3446 3446 3446 131256 131256 98497 Phase Factor Width Variable Point Size Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T 2031 1617 2096 2392 2460 2543 2617 8376 2023 1593 2087 2375 2451 2522 2591 8257 2450 2144 2474 2876 2900 2833 2845 8.60 7.76 8.15 8.40 8.40 364.6 371.83 386.26 Radar 10698 Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O. Scaling types: scaled; unscaled; block floating point; single precision floating point Rounding modes: convergent rounding; truncation. Output ordering: Natural Order; Bit/Digit Reversed Order. Memory types: block RAM, distributed RAM. 
Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
Optimize for Speed using XtremeDSP slices: both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
Virtex-6 FPGAs have block RAMs that can be packed in pairs to form larger block RAMs. The tools report the numbers of both block RAM sizes, which may not match the number of block RAMs given here.
Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tool options, and other releases of the Xilinx implementation tools.
Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
Latency in clock cycles for the largest transform size.
Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
Ultrasound.

Virtex-5 FPGA Family

The following table shows the performance and resource usage numbers for Virtex-5 FPGAs. A range of cores is shown for several typical applications: Baseband 3GPP LTE, Baseband OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid used. The performance and resource usage numbers were produced using ISE 11.2 software, with speed file version "PRODUCTION 1.65 2009-04-27."
www.xilinx.com DS260 June 2009 Product Specification Latency (10) Input Data Width Memory Type Scaling Type LUT/FF Pairs Application Xilinx Part Point Size Channels LUTs Fast Fourier Transform v7.0 Table Virtex-5 Family Performance Resource Utilization Latency (Clock Cycles) Stages Using Block Clock Frequency Configurable Point Size Cyclic Prefix Insertion Optimize speed Phase Factor Width Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation Input Data Width Memory Type XC5VSX95T 1179 XC5VSX95T 1181 XC5VSX95T 1287 XC5VSX95T 1288 XC5VSX95T 2772 XC5VLX330 4889 1906 3302 2034 2776 1114 1205 1799 3169 5909 1943 3419 6371 1031 1033 1084 1086 2603 4699 2674 4794 1271 1100 1333 1164 2102 3764 7088 2202 3940 7416 12453 12473 26804 26826 12453 12453 26804 26804 7364 7364 15575 15564 7364 7364 7364 15575 15575 15575 1652 1670 1670 30.90 31.58 70.54 62.24 33.30 42.94 74.46 92.43 20.12 17.66 41.64 37.96 21.22 21.72 26.68 43.26 46.91 68.61 4.35 4.47 4.47 Baseband 3GPP XC5VSX95T 2906 XC5VLX330 5026 XC5VSX95T 1447 XC5VSX95T 1253 XC5VSX95T 1539 XC5VSX95T 1336 XC5VSX95T 2318 XC5VSX95T 4061 XC5VLX330 7544 XC5VSX95T 2475 XC5VSX95T 4316 XC5VLX330 8034 OFDM XC5VSX95T 1142 XC5VSX95T XC5VSX95T 1001 DS260 June 2009 Product Specification www.xilinx.com Latency (10) Scaling Type LUT/FF Paris Application Xilinx Part Point size Channels LUTs Fast Fourier Transform v7.0 Table Virtex-5 Family Performance Resource Utilization (Cont'd) Latency (Clock Cycles) Stages Using Block Clock Frequency Configurable Point Size Cyclic Prefix Insertion Optimize speed Phase Factor Width Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation Input Data Width Memory Type XC5VSX95T 5034 XC5VSX95T 5495 XC5VSX95T 5653 XC5VSX95T 4904 XC5VSX95T 5170 XC5VSX95T 5595 XC5VSX95T 6188 XC5VSX95T 4997 XC5VSX95T 3468 XC5VSX95T 4625 XC5VSX95T 3579 XC5VSX95T 4048 XC5VSX95T 7574 XC5VSX95T 8141 XC5VSX95T 8551 XC5VSX95T 6378 XC5VSX95T 7156 XC5VSX95T 7168 XC5VSX95T 8220 XC5VSX95T 1316 XC5VSX95T 
9786 3936 4600 4742 3708 4037 4115 4854 3853 2712 3791 2797 2972 5924 5971 6287 4890 5154 5440 5855 7105 4590 5077 4627 4571 4733 5151 5494 4447 3129 4222 3256 3692 6977 7539 7742 5845 6545 6492 7460 1224 8646 9885 2179 2167 2179 2179 3207 3203 3216 2181 2171 2171 3199 3195 3223 3223 3225 12445 12441 24748 24746 5800 24758 24745 49341 49327 1411 5529 22703 2225 9427 41205 3169 14441 65649 3209 12445 5.06 5.70 5.52 5.06 7.44 7.95 8.79 5.23 4.88 5.93 7.19 7.41 8.48 8.81 9.14 31.51 34.56 74.54 79.83 14.15 77.86 77.81 193.49 188.27 4.16 14.78 64.31 5.73 25.21 110.17 8.47 42.60 193.65 2.24 8.77 34.57 Scanners (11) XC5VSX95T 11178 8649 XC5VSX95T 10635 7771 9337 XC5VSX95T 11979 9434 10618 XC5VSX95T 4434 XC5VSX95T 4621 XC5VSX95T 4673 XC5VSX95T 2751 XC5VSX95T 2878 XC5VSX95T 2936 XC5VSX95T 2641 XC5VSX95T 2724 XC5VSX95T 2793 XC5VSX95T 5377 XC5VSX95T 6715 XC5VSX95T 8242 2771 2900 3034 1878 1942 2059 1698 1721 1886 4205 5105 6117 4243 4401 4494 2626 2697 2757 2511 2574 2621 4983 6209 7512 Test www.xilinx.com DS260 June 2009 Product Specification Latency (10) Scaling Type LUT/FF Paris Application Xilinx Part Point size Channels LUTs Fast Fourier Transform v7.0 Table Virtex-5 Family Performance Resource Utilization (Cont'd) Latency (Clock Cycles) Stages Using Block Clock Frequency Configurable Point Size Cyclic Prefix Insertion Optimize speed Phase Factor Width Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation Input Data Width Memory Type XC5VSX95T 2813 XC5VSX95T 2259 XC5VSX95T 2887 XC5VSX95T 3158 XC5VSX95T 3297 XC5VSX95T 3332 XC5VSX95T 3369 2220 1451 2325 2532 2637 2753 2832 2614 2134 2614 3022 3022 2966 2978 3446 3451 3446 3446 3446 131256 131256 98497 9.93 8.89 9.93 9.93 10.64 463.80 423.41 447.71 Radar XC5VSX95T 11687 9175 9727 Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst Scaling types: scaled; unscaled; block floating point; single precision floating point. 
Rounding modes: convergent rounding; truncation.
Output ordering: Natural Order; Bit/Digit Reversed Order.
Memory types: block RAM, distributed RAM. Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
Optimize for Speed using XtremeDSP slices: both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
Virtex-5 FPGAs have block RAMs that can be packed in pairs to form larger block RAMs. The tools report the numbers of both block RAM sizes, which may not match the number of block RAMs given here.
Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tool options, and other releases of the Xilinx implementation tools.
Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
Latency in clock cycles for the largest transform size.
Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
Ultrasound.

Spartan-6 Family

The following table shows the performance and resource usage numbers for Spartan-6 FPGAs. A range of cores is shown for several typical applications: Baseband 3GPP LTE, Baseband OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. Some rows of the table are grayed-out to indicate that these cores would not fit in the device because of FPGA resource requirements (typically insufficient pins to route the core signals outside the device). None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid used. The performance and resource usage numbers were produced using ISE 11.2 software, with speed file version "ADVANCED 0.94 2009-04-27."
DS260 June 2009 Product Specification www.xilinx.com Latency (10) Scaling Type LUT/FF Paris Application Xilinx Part Point size Channels LUTs Fast Fourier Transform v7.0 Table Spartan-6 Family Performance Resource Utilization Latency (clock cycles) 12453 12473 26804 26826 12453 26804 7364 7364 15575 15564 7364 7364 15575 15575 1652 1670 1670 clock frequency Stages Using Block Configurable Point Size Cyclic Prefix Insertion Optimize Speed Phase Factor Width Output Ordering Rounding Mode XtremeDSP slices Implementation Block RAMs Input Data Width XC6SLX150T XC6SLX150T XC6SLX150T XC6SLX150T 1784 1032 1034 1084 1086 2610 51.89 52.85 119.66 117.66 63.54 XC6SLX150T 1835 XC6SLX150T 2458 XC6SLX150T 1023 XC6SLX150T Baseband 3GPP 1546 2674 127.64 1000 1077 1562 2699 1204 1090 1265 1153 1978 3526 30.18 31.20 68.31 71.07 32.30 34.74 XC6SLX150T 1109 XC6SLX150T XC6SLX150T 1589 XC6SLX150T 2745 XC6SLX150T 1716 XC6SLX150T 3466 XC6SLX150T XC6SLX150T XC6SLX150T 1681 2672 2078 3702 71.12 74.17 OFDM 7.54 7.63 7.32 www.xilinx.com DS260 June 2009 Product Specification Latency(s) (10) Memory Type Scaling Type LUT/FF Paris Application Xilinx Part Point Size Channels LUTs Fast Fourier Transform v7.0 Table Spartan-6 Family Performance Resource Utilization (Cont'd) Latency (clock cycles) 2203 2215 2203 2203 3231 3227 3240 2202 2171 2159 3199 3204 3251 3247 3253 12475 12471 24784 24785 5854 clock frequency Stages Using Block Configurable Point Size Cyclic Prefix Insertion Optimize Speed Phase Factor Width Output Ordering Rounding Mode XtremeDSP slices Implementation Block RAMs Input Data Width XC6SLX150T 3574 XC6SLX150T 5438 XC6SLX150T 4255 XC6SLX150T 3446 XC6SLX150T 3646 XC6SLX150T 3769 XC6SLX150T 4451 XC6SLX150T 3525 XC6SLX150T 2315 XC6SLX150T 2789 XC6SLX150T 2388 XC6SLX150T 2617 XC6SLX150T 5031 XC6SLX150T 5069 XC6SLX150T 5455 XC6SLX150T 4486 XC6SLX150T 4793 XC6SLX150T 5093 XC6SLX150T 5563 XC6SLX150T 3414 5355 4115 3279 3493 3619 4309 3398 2189 2693 2273 2485 4875 4860 5224 4328 4628 4865 
5363 5425 7051 5462 5407 5568 6106 6527 5305 3296 3633 3423 4053 7904 8382 8585 6927 7827 7779 9029 1296 13.35 13.42 17.62 18.67 22.91 19.56 18.84 13.35 9.91 10.64 17.02 22.72 31.87 27.52 29.57 113.41 83.70 242.98 242.99 23.99 scanners (11) DS260 June 2009 Product Specification www.xilinx.com Latency(s) (10) Memory Type Scaling Type LUT/FF Paris Application Xilinx Part Point Size Channels LUTs Fast Fourier Transform v7.0 Table Spartan-6 Family Performance Resource Utilization (Cont'd) Latency (clock cycles) 24797 24819 49380 49401 1435 5559 22739 2273 9487 41277 3175 14447 65655 3236 12481 3446 3451 3446 3466 3466 98515 clock frequency Stages Using Block Configurable Point Size Cyclic Prefix Insertion Optimize Speed Phase Factor Width Output Ordering Rounding Mode XtremeDSP slices Implementation Block RAMs Input Data Width XC6SLX150T 6831 XC6SLX150T 10030 XC6SLX150T 7414 6621 9878 7205 10490 13272 11248 243.11 166.57 525.32 449.10 10.71 41.49 241.90 13.22 46.73 239.98 14.50 63.36 381.72 5.47 27.42 122.36 16.98 16.28 17.58 17.07 17.68 XC6SLX150T 10700 10579 14104 XC6SLX150T 3343 XC6SLX150T 3513 XC6SLX150T 3609 XC6SLX150T 2061 XC6SLX150T 2136 XC6SLX150T 2206 XC6SLX150T 1898 XC6SLX150T 1943 XC6SLX150T 2016 XC6SLX150T 3749 XC6SLX150T 4703 XC6SLX150T 5642 XC6SLX150T 1963 XC6SLX150T 1656 XC6SLX150T 2101 XC6SLX150T 2544 XC6SLX150T 2601 XC6SLX150T 2512 XC6SLX150T 2578 XC6SLX150T 8451 3278 3433 3532 1993 2063 2118 1834 1878 1937 3623 4506 5434 1940 1603 2059 2496 2557 2464 2524 8303 4710 4869 4970 2770 2838 2894 2610 2671 2718 5667 7227 8890 2450 2142 2474 3002 3026 2833 2845 10525 Test Radar 131256 1050.05 131256 979.52 965.83 Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O. Scaling types: scaled; unscaled; block floating point; single precision floating point. Rounding modes: convergent rounding; truncation. Output ordering: Natural Order; Bit/Digit Reversed Order. Memory types: block RAM, distributed RAM. 
Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
Optimize for Speed using XtremeDSP slices: both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
Spartan-6 FPGAs have block RAMs that can be packed in pairs to form larger block RAMs. The tools report the numbers of both block RAM sizes, which may not match the number of block RAMs given here.
Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tool options, and other releases of the Xilinx implementation tools.
Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
Latency in clock cycles for the largest transform size.
Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
Ultrasound.

Spartan-3A Family

The following table shows the performance and resource usage numbers for Spartan-3A FPGAs. A range of cores is shown for several typical applications: Baseband 3GPP LTE, Baseband OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. Some rows of the table are grayed-out to indicate that these cores would not fit in the device because of FPGA resource requirements (typically insufficient pins to route the core signals outside the device). None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid used.
performance resource usage numbers were produced using 11.2 software, with speed file version "PRODUCTION 1.33 2009-04-27" DS260 June 2009 Product Specification www.xilinx.com Fast Fourier Transform v7.0 Table Spartan-3A Family Performance Resource Utilization Latency (clock cycles) 12453 12473 26804 26826 12453 26804 7364 7354 15575 15564 7364 7364 15575 15575 1679 1670 1670 clock frequency Stages Using Block Configurable Point Size Cyclic Prefix Insertion Optimize Speed Phase Factor Width Output Ordering Rounding Mode XtremeDSP slices Implementation Input Data Width Block RAMs Memory Type XC3SD3400A XC3SD3400A XC3SD3400A XC3SD3400A 1004 1160 1130 2233 1062 1064 1117 1119 2640 61.34 61.44 155.84 149.03 66.24 XC3SD3400A 1710 Baseband 3GPP XC3SD3400A 1805 2395 2718 136.76 XC3SD3400A XC3SD3400A XC3SD3400A XC3SD3400A 1397 1198 1515 1280 2245 3874 1273 1160 1339 1227 2049 3597 36.28 36.23 76.72 73.42 39.17 40.91 XC3SD3400A 1409 XC3SD3400A 2468 XC3SD3400A 1501 XC3SD3400A 2618 2412 4144 2155 3779 82.85 86.53 OFDM XC3SD3400A 1197 XC3SD3400A XC3SD3400A 2063 1433 7.92 7.88 8.52 www.xilinx.com DS260 June 2009 Product Specification Latency(s) Scaling Type Application Xilinx Part Point Size Channels Slices LUTs Fast Fourier Transform v7.0 Table Spartan-3A Family Performance Resource Utilization (Cont'd) Latency (clock cycles) 2203 2215 2203 2203 3231 3227 3240 2202 2171 2159 3199 3204 3251 3247 3253 12475 12471 5854 clock frequency Stages Using Block Configurable Point Size Cyclic Prefix Insertion Optimize Speed Phase Factor Width Output Ordering Rounding Mode XtremeDSP slices Implementation Input Data Width Block RAMs Memory Type XC3SD3400A 3408 XC3SD3400A 4440 XC3SD3400A 4373 XC3SD3400A 3204 XC3SD3400A 3494 XC3SD3400A 3600 XC3SD3400A 4144 XC3SD3400A 3277 XC3SD3400A 2151 XC3SD3400A 2307 XC3SD3400A 2230 XC3SD3400A 2445 XC3SD3400A 4844 XC3SD3400A 4884 XC3SD3400A 5280 XC3SD3400A 4303 XC3SD3400A 4561 5076 7339 7157 4521 5175 4997 6147 4654 3263 3693 3346 3443 7321 6891 7606 6261 
6239 5167 6843 5358 5149 5310 5826 6072 4908 3118 3466 3245 3853 7546 8002 8412 6565 7419 10.85 13.42 16.44 10.39 15.92 17.16 16.53 11.23 10.24 11.02 14.61 15.11 18.90 19.68 19.72 61.45 69.28 scanners (10) XC3SD3400A 1158 1323 31.14 DS260 June 2009 Product Specification www.xilinx.com Latency(s) Scaling Type Application Xilinx Part Point Size Channels Slices LUTs Fast Fourier Transform v7.0 Table Spartan-3A Family Performance Resource Utilization (Cont'd) Latency (clock cycles) 24819 1435 5559 22739 2273 9487 41277 3175 14447 65655 3236 3466 3451 3446 3466 3466 clock frequency Stages Using Block Configurable Point Size Cyclic Prefix Insertion Optimize Speed Phase Factor Width Output Ordering Rounding Mode XtremeDSP slices Implementation Input Data Width Block RAMs Memory Type XC3SD3400A 1691 XC3SD3400A 1529 XC3SD3400A 1764 XC3SD3400A 2098 XC3SD3400A 2170 XC3SD3400A 2085 XC3SD3400A 2156 2737 2069 2873 3402 3538 3476 3617 2441 2135 2465 3017 3040 2816 2830 18.44 18.36 20.03 21.01 21.01 841.38 841.38 XC3SD3400A 3170 XC3SD3400A 3298 XC3SD3400A 3367 XC3SD3400A 1893 XC3SD3400A 1968 XC3SD3400A 2021 XC3SD3400A 1682 XC3SD3400A 1746 XC3SD3400A 1785 XC3SD3400A 3566 XC3SD3400A 4439 4157 4322 4489 2526 2654 2795 2128 2200 2381 5200 6276 4679 4845 4938 2786 2883 2957 2606 2699 2752 5504 6932 7.97 33.69 126.33 11.60 50.46 219.56 16.89 78.85 364.75 5.25 18.81 XC3SD3400A 8047 12691 12600 159.10 Test Radar 131256 131256 Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O. Scaling types: scaled; unscaled; block floating point; single precision floating point. Rounding modes: convergent rounding; truncation. Output ordering: Natural Order; Bit/Digit Reversed Order. Memory types: block RAM, distributed RAM. Applies data phase factor storage Burst architectures, output reorder buffer Pipelined, Streaming architecture. Optimize Speed using XtremeDSP slices both Complex Multipliers (4-multiplier structure) Butterfly Arithmetic. 
Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tool options, and other releases of the Xilinx implementation tools.
Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
Latency in clock cycles for the largest transform size.
Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
Ultrasound.

Dynamic Range Characteristics

The dynamic range characteristics are shown by performing slot noise tests. First, a frame of complex Gaussian noise data samples is created. An FFT is taken to acquire the spectrum of the data. To create the slot, a range of frequencies in the spectrum is set to zero. To create the input slot noise data frame, the inverse FFT is taken, and the data is then quantized to use the full input dynamic range. Because of the quantization, even if a perfect FFT is performed on the frame, the noise floor at the bottom of the slot is nonzero. The Input Data figures, which essentially represent the dynamic range of the input format, display this.

This slot noise input data frame is then fed to the FFT core to see how shallow the slot becomes because of finite precision arithmetic. The depth of the slot shows the dynamic range of the FFT.

The following figures show the effect of input data width on the dynamic range. All FFTs have the same bit width for both data and phase factors. Block floating-point arithmetic is used with rounding after the butterfly.
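
The slot noise construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the figures: the frame length, slot position, seed, the plain recursive radix-2 FFT, and the helper names (`fft`, `ifft`, `make_slot_noise`, `slot_depth_db`) are all hypothetical choices made for the example. It builds a quantized slot noise frame and then measures the slot depth seen by a double-precision FFT, which corresponds to the "Input Data" reference case rather than to the core itself.

```python
import cmath
import math
import random


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with 1/N normalization."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def make_slot_noise(n, slot, bits, seed=0):
    """Gaussian noise -> FFT -> zero the slot bins -> IFFT -> quantize to `bits`."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    rng = random.Random(seed)
    noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    spectrum = fft(noise)
    for k in slot:
        spectrum[k] = 0j
    frame = ifft(spectrum)
    # Scale so the largest component uses the full two's-complement range.
    peak = max(max(abs(v.real), abs(v.imag)) for v in frame)
    full_scale = 2 ** (bits - 1) - 1
    return [complex(round(v.real / peak * full_scale),
                    round(v.imag / peak * full_scale)) for v in frame]


def slot_depth_db(frame, slot):
    """Mean out-of-slot bin power over mean in-slot bin power, in dB."""
    spectrum = fft(frame)
    slot_bins = set(slot)
    inside = [abs(spectrum[k]) ** 2 for k in slot_bins]
    outside = [abs(v) ** 2 for k, v in enumerate(spectrum) if k not in slot_bins]
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    if mean_in == 0:
        return math.inf
    return 10 * math.log10(mean_out / mean_in)


slot = range(96, 160)  # hypothetical slot position in a 256-point frame
for bits in (16, 8):
    frame = make_slot_noise(256, slot, bits)
    print(bits, "bits:", round(slot_depth_db(frame, slot), 1), "dB")
```

In this sketch the slot is deeper for wider input words, since the only thing filling the slot is input quantization noise. Feeding the same frame to a finite-precision FFT would make the slot shallower still, which is the effect the dynamic range figures are meant to show.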
The figures show the input data slot and the output data slot for a range of bit widths.

[Figures: five pairs of slot noise plots, each pair consisting of "Input Data" and "Core Results" for one bit width; horizontal axis: Bin Number.]

There are several options available that also affect the dynamic range. Consider the arithmetic type used. The next three figures display the results of using unscaled, scaled (scaling of 1/1024), and block floating-point arithmetic. All three FFTs are 1024-point, Radix-4, Burst I/O transforms with 16-bit input, 16-bit phase factors, and convergent rounding.

[Figures: "Full-Precision Unscaled Arithmetic", "Scaled (scaling of 1/N) Arithmetic", and "Block Floating Point Arithmetic"; horizontal axis: Bin Number.]

After the butterfly computation, the LSBs of the data path can be truncated or rounded.
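
The difference between the two ways of discarding LSBs can be sketched as follows. This is a generic illustration of truncation and convergent (round-half-to-even) rounding on two's-complement integers, not a description of the core's internal data path; the function names and the test range are assumptions made for the example.

```python
def truncate(value: int, shift: int) -> int:
    """Drop `shift` LSBs; on two's-complement data this rounds toward -infinity."""
    return value >> shift


def convergent_round(value: int, shift: int) -> int:
    """Drop `shift` LSBs, rounding to nearest with exact ties going to even."""
    if shift == 0:
        return value
    floor = value >> shift
    remainder = value & ((1 << shift) - 1)  # discarded bits, always non-negative
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and floor & 1):
        floor += 1
    return floor


def mean_error(rounder, shift: int, values) -> float:
    """Average of (rounded result - exact quotient) over the given values."""
    values = list(values)
    scale = 1 << shift
    return sum(rounder(v, shift) - v / scale for v in values) / len(values)


values = range(-64, 64)  # hypothetical symmetric test range
print("truncation bias:", mean_error(truncate, 2, values))
print("convergent bias:", mean_error(convergent_round, 2, values))
```

Truncation always rounds down, so its error has a nonzero mean that accumulates as a DC offset through successive butterfly stages. Convergent rounding splits ties evenly between up and down, which removes that bias at the cost of a little extra logic.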
The effects of these options are shown in the following two figures. Both transforms are 1024 points with 16-bit data and phase factors using block floating-point arithmetic.

[Figures: "Convergent Rounding" and "Truncation"; horizontal axis: Bin Number.]

For illustration purposes, the effect of point size on dynamic range is displayed in the next three figures. The FFTs in these figures use 16-bit input and phase factors along with convergent rounding and block floating-point arithmetic.

[Figures: "64-point Transform", "2048-point Transform", and "8192-point Transform"; horizontal axis: Bin Number.]

All of the preceding dynamic range plots show the results for the Radix-4, Burst I/O architecture. The next two figures show plots for the Radix-2, Burst I/O architecture. Both use 16-bit input and phase factors along with convergent rounding and block floating point.

[Figures: "64-point Radix-2 Transform" and "1024-point Radix-2 Transform"; horizontal axis: Bin Number.]

References

Knight and Kaiser, A Simple Fixed-Point Error Bound for the Fast Fourier Transform, IEEE Trans. Acoustics, Speech and Signal Proc., pp. 615-620, December 1979.
Rabiner and Gold, Theory and Application of Digital Signal Processing, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1975.
Quang Hung Nguyen and Istvan Kollar, Limited Dynamic Range of Spectrum Analysis Due to Roundoff Errors of the FFT.
Szolik, Kovac, Smiesko, Influence of Digital Signal Processing on Precision of Power Quality Parameters Measurement.
Xilinx, Inc., XtremeDSP DSP48A for Spartan-3A DSP FPGAs User Guide, UG431.
Cooley and Tukey, An Algorithm for the Machine Computation of Complex Fourier Series, Mathematics of Computation, pp. 297-301, April 1965.
Proakis and Manolakis, Digital Signal Processing Principles, Algorithms and Applications, Second Edition, Maxwell Macmillan International, New York, 1992.

Support

Xilinx provides technical support at www.xilinx.com/support for this LogiCORE product when used as described in the product documentation. Xilinx cannot guarantee timing, functionality, or support of the product if it is implemented in devices that are not defined in the documentation, if it is customized beyond that allowed in the product documentation, or if changes are made to any section of the design labeled DO NOT MODIFY.

Refer to the Release Notes Guide (XTP025) for further information on this core. The guide links to the relevant core being designed with. For each core, there is a master Answer Record that contains the Release Notes and Known Issues list for the core being used. The following information is listed for each version of the core: Features, Fixes, Known Issues.

Ordering Information

This core may be downloaded from the Xilinx IP Center for use with Xilinx® CORE Generator software v11.2 and higher. The Xilinx CORE Generator software is bundled with the ISE® Foundation software packages at no additional charge. Information about additional Xilinx® LogiCORE modules is available at the Xilinx IP Center. To order Xilinx software, contact your local Xilinx sales representative.

Revision History

Date       Revision
03/28/03   Xilinx release template.
07/14/03   Modified Figures through inclusive.
12/11/03   Updated for v2.1 release.
05/21/04   Updated for v3.0 release.
11/11/04   Updated document to support core v3.1 release and updated performance and resource utilization tables for Virtex-II and Virtex-II Pro FPGAs. Also added performance and resource utilization tables for Virtex-4 FPGAs.
8/31/05    Updated documentation for v3.2 core release; updated performance and resource utilization tables; updated for v7.1i software.
1/11/06    Corrected table for XtremeDSP Slices.
11/30/06   Updated for v4.0 release.
02/15/07   Updated for v4.1 release.
10/10/07   Updated for v5.0 release.
09/19/08   Updated for v6.0 release.
06/24/09   Updated for v7.0 release.

Notice of Disclaimer

Xilinx is providing this design, code, or information (collectively, the "Information") to you "AS-IS" with no warranty of any kind, express or implied. Xilinx makes no representation that the Information, or any particular implementation thereof, is free from any claims of infringement. You are responsible for obtaining any rights you may require for any implementation based on the Information. XILINX EXPRESSLY DISCLAIMS ANY WARRANTY WHATSOEVER WITH RESPECT TO THE ADEQUACY OF THE INFORMATION OR ANY IMPLEMENTATION BASED THEREON, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OR REPRESENTATIONS THAT THIS IMPLEMENTATION IS FREE FROM CLAIMS OF INFRINGEMENT AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Except as stated herein, none of the Information may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx.