Fast Fourier Transform v7.0
DS260 June 2009 Product Specification
The Xilinx® LogiCORE IP Fast Fourier Transform (FFT) implements the Cooley-Tukey FFT algorithm, a computationally efficient method for calculating the Discrete Fourier Transform (DFT).
Overview
The FFT core computes an N-point forward DFT or inverse DFT (IDFT), where N = 2^m, m = 3-16. For fixed-point inputs, the input data is a vector of N complex values represented as dual b_x-bit two's-complement numbers, that is, b_x bits for each of the real and imaginary components of the data sample. The phase factors have their own configurable bit width. For single-precision floating-point inputs, the input data is a vector of N complex values represented as dual 32-bit floating-point numbers, with the phase factors represented as 24- or 25-bit fixed-point numbers.
All memory is on-chip, using either Block RAM or Distributed RAM. Each element of the N-element output vector is represented using a fixed number of bits for each of the real and imaginary components of the output data. Input data is presented in natural order, and the output data can be in either natural or bit/digit reversed order. The complex nature of the data input and output is intrinsic to the FFT algorithm, not to the implementation.
Three arithmetic options are available for computing the FFT:
- Full-precision unscaled arithmetic
- Scaled fixed-point, where the user provides the scaling schedule
- Block floating-point (run-time adjusted scaling)
Features
- Drop-in module for Virtex®-6, Virtex-5, Virtex-4, Spartan®-6, Spartan-3/XA, Spartan-3E/XA, and Spartan-3A/XA/AN/3A DSP FPGAs
- Forward and inverse complex FFT, run-time configurable
- Configurable transform size, data sample precision, and phase factor precision
- Arithmetic types:
  - Unscaled (full-precision) fixed-point
  - Scaled fixed-point
  - Block floating-point
- Fixed-point or floating-point interface
- Rounding or truncation after the butterfly
- Block RAM or Distributed RAM for data and phase-factor storage
- Optional run-time configurable transform point size
- Run-time configurable scaling schedule for scaled fixed-point cores
- Bit/digit reversed or natural output order
- Optional cyclic prefix insertion for digital communications systems
- Four architectures offer a trade-off between core size and transform time
- Bit-accurate model for system modeling available for download
- For use with Xilinx CORE Generator and Xilinx System Generator for DSP v11.2 or higher
The point size N, the choice of forward or inverse transform, the scaling schedule, and the cyclic prefix length are run-time configurable. The transform type (forward or inverse), scaling schedule, and cyclic prefix length can be changed on a frame-by-frame basis. Changing the point size resets the core. Four architecture options are available: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; and Radix-2 Lite, Burst I/O. For detailed information about each architecture, see "Architecture Options."
© 2003-2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, the Brand Window, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.
www.xilinx.com
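Theory of Operation
The FFT is a computationally efficient algorithm for computing a Discrete Fourier Transform (DFT) of sample sizes that are a positive integer power of 2. The DFT X(k), k = 0, ..., N-1, of a sequence x(n), n = 0, ..., N-1, is defined as
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j n k 2\pi / N}, \quad k = 0, \ldots, N-1 \]   (Equation 1)
where N is the transform size and j = \sqrt{-1}. The inverse DFT (IDFT) is given by
\[ x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j n k 2\pi / N}, \quad n = 0, \ldots, N-1 \]   (Equation 2)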
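As a reference point for the two definitions above, the following is a minimal sketch that evaluates the forward and inverse DFT directly from the sums. It is an O(N^2) software illustration only, not the core's algorithm; the function names `dft` and `idft` and the sample data are assumptions made for this example.

```python
import cmath


def dft(x):
    """Forward DFT evaluated directly from the defining sum (O(N^2))."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def idft(big_x):
    """Inverse DFT: conjugated exponent and a 1/N normalization."""
    n_points = len(big_x)
    return [
        sum(big_x[k] * cmath.exp(2j * cmath.pi * n * k / n_points)
            for k in range(n_points)) / n_points
        for n in range(n_points)
    ]


# Hypothetical 8-point input frame (N = 2^3).
samples = [1, 2, 3, 4, 0, 0, 0, 0]
spectrum = dft(samples)
recovered = idft(spectrum)
```

Applying `idft` to the output of `dft` returns the original samples to within floating-point rounding, which is a quick sanity check on the sign and normalization conventions.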
Algorithm
The FFT core uses the Radix-4 and Radix-2 decompositions for computing the DFT. For the Burst I/O architectures, the decimation-in-time (DIT) method is used, while the decimation-in-frequency (DIF) method is used for the Pipelined, Streaming I/O architecture. When using Radix-4 decomposition, the N-point FFT consists of log4(N) stages, with each stage containing N/4 Radix-4 butterflies. Point sizes that are not a power of 4 need an extra Radix-2 stage for combining data. An N-point FFT using Radix-2 decomposition has log2(N) stages, with each stage containing N/2 Radix-2 butterflies. The inverse FFT (IFFT) is computed by conjugating the phase factors of the corresponding forward FFT.
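The stage counts described above can be worked out with a few lines of arithmetic. This is a sketch under the assumption that N is a power of 2; the helper names `radix2_stages` and `radix4_stages` are hypothetical and not part of any Xilinx tool.

```python
def radix2_stages(n_points):
    """Return (stage count, butterflies per stage) for a Radix-2 FFT."""
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("point size must be a power of 2")
    return stages, n_points // 2


def radix4_stages(n_points):
    """Return (Radix-4 stage count, extra Radix-2 stage count).

    The extra Radix-2 stage is needed when N is not a power of 4.
    """
    log2_n, _ = radix2_stages(n_points)
    return log2_n // 2, log2_n % 2
```

For example, a 1024-point transform needs five Radix-4 stages and no extra stage, while a 512-point transform needs four Radix-4 stages plus one Radix-2 stage.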
Finite Word Length Considerations
The Burst I/O architectures process an array of data by successive passes over the input data array. On each pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four or two complex numbers, respectively, and returns four or two complex numbers to the same memory. The numbers returned to memory by the core are potentially larger than the numbers picked up from memory. A strategy must be employed to accommodate this dynamic range expansion. A full explanation of scaling strategies and their implications is beyond the scope of this document; for more information about this topic, see the references cited in the specification.
For a Radix-4 FFT, the values computed in a butterfly stage can experience growth by a factor of up to 1 + 3√2 ≈ 5.242. This implies a bit growth of up to 3 bits. For Radix-2, the growth factor is up to 1 + √2 ≈ 2.414. This implies a bit growth of up to 2 bits. This bit growth can be handled in three ways:
- Performing the calculations with no scaling and carrying all significant integer bits to the end of the computation
- Scaling at each stage using a fixed-scaling schedule
- Scaling automatically using block floating point
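The link between the quoted growth factors and the number of extra bits is a base-2 logarithm rounded up. The sketch below checks that arithmetic; the function name `max_bit_growth` is an assumption for illustration.

```python
import math


def max_bit_growth(growth_factor):
    """Whole bits needed to absorb a worst-case magnitude growth factor."""
    return math.ceil(math.log2(growth_factor))


radix4_growth = 1 + 3 * math.sqrt(2)  # approximately 5.242
radix2_growth = 1 + math.sqrt(2)      # approximately 2.414
radix4_bits = max_bit_growth(radix4_growth)
radix2_bits = max_bit_growth(radix2_growth)
```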
All significant integer bits are retained when using full-precision unscaled arithmetic. The width of the data path increases to accommodate the bit growth through the butterfly. The growth of the fractional bits created by the multiplication is truncated (or rounded) after the multiplication. The width of the output is (input width + log2(transform length) + 1). This accommodates the worst-case scenario for bit growth.
Consider an unscaled Radix-2 FFT: the datapath in each stage must grow by 1 bit, because the adder and subtractor in the butterfly might add or subtract two full-scale values and produce a sample which has grown in width by 1 bit. This yields the log2(transform length) part of the increase in the output width relative to the input width. The complex multiplier preserves the magnitude of an input (it applies a rotation in the complex plane), but can theoretically produce bit growth when the magnitude of the input is greater than 1 (for example, 1 + j has a magnitude of 1.414). This means that the complex multiplier bit growth must only be considered once in the entire FFT process, yielding the additional 1-bit increase in the output width relative to the input width.
For example, a 1024-point transform adds log2(1024) + 1 = 11 bits to the input width. The number of fractional bits is unchanged, so all of the growth appears as integer bits. Note that the core does not have a specific location for the binary point. The output simply maintains the same binary point location as the input.
When using scaling, a scaling schedule is used to divide by a factor of 1, 2, 4, or 8 in each stage. If scaling is insufficient, a butterfly output might grow beyond the dynamic range and cause an overflow. As a result of the scaling applied in the FFT implementation, the transform computed is a scaled transform. The scale factor s is defined as
\[ s = 2^{\sum_{i} b_i} \]   (Equation 3)
where b_i is the scaling (specified in bits) applied in stage i, and the sum runs over all stages. The scaling results in the final output sequence being modified by the factor 1/s. For the forward FFT, the output sequence X'(k), k = 0, ..., N-1, computed by the core is defined as
\[ X'(k) = \frac{1}{s} X(k) = \frac{1}{s} \sum_{n=0}^{N-1} x(n)\, e^{-j n k 2\pi / N}, \quad k = 0, \ldots, N-1 \]   (Equation 4)
For the inverse FFT, the output sequence is
\[ x(n) = \frac{1}{s} \sum_{k=0}^{N-1} X(k)\, e^{j n k 2\pi / N}, \quad n = 0, \ldots, N-1 \]   (Equation 5)
If a Radix-4 algorithm scales by a factor of 4 in each stage, the factor of 1/s is equal to the factor of 1/N in the inverse FFT equation (Equation 2). For Radix-2, scaling by a factor of 2 in each stage provides the factor of 1/N. With block floating point, each stage applies sufficient scaling to keep numbers in range, and the scaling is tracked by a block exponent.
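The width rule for unscaled arithmetic and the scale factor for scaled arithmetic both reduce to small integer calculations. The following sketch illustrates them; the function names and the 16-bit input width are assumptions chosen for the example, not values taken from this specification.

```python
def unscaled_output_width(input_width, n_points):
    """Output width for unscaled arithmetic: input + log2(N) + 1 bits."""
    log2_n = n_points.bit_length() - 1
    if (1 << log2_n) != n_points:
        raise ValueError("point size must be a power of 2")
    return input_width + log2_n + 1


def scale_factor(shifts_per_stage):
    """Scale factor s = 2 ** (sum of per-stage shifts, in bits)."""
    return 2 ** sum(shifts_per_stage)


# Hypothetical 16-bit input to a 1024-point unscaled transform.
width = unscaled_output_width(16, 1024)

# Radix-4, N = 1024: five stages, each scaling by 4 (a 2-bit shift).
s_radix4 = scale_factor([2, 2, 2, 2, 2])

# Radix-2, N = 128: seven stages, each scaling by 2 (a 1-bit shift).
s_radix2 = scale_factor([1, 1, 1, 1, 1, 1, 1])
```

In both scaled cases s equals N, which matches the statement that such a schedule supplies the 1/N factor of the inverse transform.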
As with unscaled arithmetic, for scaled and block floating-point arithmetic the core does not have a specific location for the binary point. The location of the binary point in the output data is inherited from the input data and then shifted by the scaling applied.
Floating Point Considerations
The FFT core optionally accepts data in IEEE-754 single precision format with 32-bit words consisting of a 1-bit sign, 8-bit exponent, and 23-bit fraction. The construction of the word matches that of the Xilinx® Floating Point Operator core. Implementing a full floating-point FFT on an FPGA is expensive in terms of the resources required. The floating-point option in the Xilinx® FFT core uses a higher-precision fixed-point FFT internally to achieve noise performance similar to a full floating-point FFT, with significantly fewer resources. The figure below illustrates the two levels of noise performance possible by selecting either 24 bits or 25 bits for the phase factor width. Increasing the phase factor width to 25 bits might require more resources, depending on the target FPGA device.
[Figure: Comparison of FFT Levels of Noise Performance]
The figure shows the ratio of the difference between various FFT models and the double precision MATLAB® FFT to the peak data amplitude. The models shown are the single-precision MATLAB FFT function (calculated by casting the input data to the single-precision floating-point type), the Xilinx FFT core using a 24-bit phase factor width, and the Xilinx FFT core using a 25-bit phase factor width. To calculate the error signal, an impulse randomized in both magnitude and time was used as the input signal, with the error averaged over five simulation runs. All optimization options (memory types and XtremeDSP slice optimization) remain available when floating-point input data is selected, allowing the user to trade off resources against transform time.
Transform time for the Burst I/O architectures is increased by approximately N, the number of points in the transform, due to input normalization requirements. For the Pipelined, Streaming I/O architecture, the initial latency to fill the pipeline is increased, but data still streams through the core with no gaps.
Denormalized Numbers
The floating-point interface of the FFT core does not support denormalized numbers. To match the behavior of the Xilinx Floating Point Operator core, the core treats denormalized operands as zero, with the sign taken from the denormalized number.
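The flush-to-zero behavior described above can be modeled in software when preparing test vectors. This is a minimal sketch that operates on the IEEE-754 single precision bit pattern; the function name `flush_denormal` is an assumption, and the code models only the documented rule (denormal becomes zero, sign preserved), not the core's internals.

```python
import struct


def flush_denormal(value):
    """Return value as a single-precision number, with denormals replaced
    by a zero that keeps the original sign bit."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    fraction = bits & 0x7FFFFF       # 23-bit fraction field
    if exponent == 0 and fraction != 0:
        bits &= 0x80000000           # keep only the sign bit
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

A value such as 1e-40 lies below the smallest normalized single precision number, so it is returned as +0.0, and -1e-40 is returned as -0.0.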
NaNs and Infinity
If the core detects a NaN or Infinity value on the input, all output samples associated with the current input frame are set to NaN. The NaN that is output has a sign bit of zero.
Real-Valued Input Data
The FFT core accepts complex data samples, but can perform the transform on real-valued data by setting all imaginary input samples to zero. Due to the finite wordlength effects described above, noise is introduced during the transform, resulting in the output data not being perfectly symmetric. The FFT algorithms have different noise effects due to the different calculation order. For a thorough treatment of this topic, refer to the references cited in the specification. The asymmetry between the two halves of the result is more noticeable at larger point sizes. In addition, the noise is more prominent in the lower frequency bins. Therefore, Xilinx recommends that the upper half (N/2+1 points) of the output data is used when performing a real-valued FFT.
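The symmetry referred to above is the conjugate symmetry X(N-k) = X(k)* that holds for the DFT of real-valued data. The sketch below measures the deviation from that symmetry for a direct double-precision DFT, where it is negligible; in a fixed-point core the same measurement would show the noise-induced asymmetry the text describes. The `dft` helper and the sample data are assumptions for this example.

```python
import cmath


def dft(x):
    """Forward DFT evaluated directly from the defining sum."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


# Hypothetical real-valued frame: imaginary parts are implicitly zero.
real_samples = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.0, 1.0]
n_points = len(real_samples)
spectrum = dft(real_samples)

# Largest deviation from conjugate symmetry across all mirrored bin pairs.
max_asymmetry = max(
    abs(spectrum[k] - spectrum[n_points - k].conjugate())
    for k in range(1, n_points)
)
```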
Rounding Implementation
An option is available, in all architectures, to apply convergent rounding to the data after the butterfly stage. However, selecting this option does not apply convergent rounding to all points in the datapath where wordlength reduction occurs. In particular, the outputs of the complex multipliers in the datapath are truncated to reduce the datapath width (while still maintaining adequate precision), and a simple rounding constant is added to the fractional bits. This constant implements non-symmetric, round-towards-minus-infinity rounding, which might introduce a small bias to results over a large number of samples.
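To make the difference between truncation and convergent rounding concrete, the sketch below drops a number of LSBs from a two's-complement integer in both ways. It is a software model under the assumption that values are plain Python integers; the function names are hypothetical and the code does not reproduce the core's datapath.

```python
def truncate(value, drop_bits):
    """Drop LSBs by arithmetic shift, which rounds toward minus infinity."""
    return value >> drop_bits


def convergent_round(value, drop_bits):
    """Drop LSBs, rounding to nearest with exact halves going to even."""
    if drop_bits == 0:
        return value
    kept = value >> drop_bits                    # floor of value / 2**drop_bits
    remainder = value & ((1 << drop_bits) - 1)   # discarded bits, non-negative
    half = 1 << (drop_bits - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    return kept
```

With two bits dropped, the inputs 6 (1.5) and 10 (2.5) both round to 2 under convergent rounding, whereas truncation gives 1 and 2. Over many samples, truncation therefore carries a negative bias that convergent rounding avoids.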
Architecture Options
The FFT core provides four architecture options to offer a trade-off between core size and transform time.
- "Pipelined, Streaming I/O". Allows continuous data processing.
- "Radix-4, Burst I/O". Loads and processes data separately, using an iterative approach. It is smaller in size than the pipelined solution but has a longer transform time.
- "Radix-2, Burst I/O". Uses the same iterative approach as Radix-4, but the butterfly is smaller. This means it is smaller in size than the Radix-4 solution, but the transform time is longer.
- "Radix-2 Lite, Burst I/O". Based on the Radix-2 architecture, this variant uses a time-multiplexed approach to the butterfly for an even smaller core, at the cost of a longer transform time.
The figure below illustrates the trade-off of throughput versus resource use for the four architectures. As a rule of thumb, each architecture differs in resource use from the next architecture by a roughly constant factor. The example is for a point size that is an even power of 2, which does not require the Radix-4 architecture to have an additional Radix-2 stage. All four architectures can be configured to use a fixed-point interface with one of three fixed-point arithmetic methods (unscaled, scaled, or block floating-point), or can instead use a floating-point interface.
[Figure: Resource versus Throughput for Architecture Options]
Digit Reversal
Each architecture offers the option of natural or reversed ordering of the output data, with data being input in natural order. The FFT algorithm reorders samples during processing such that data input in natural order is output in reversed order. The core can optionally output data in natural order. However, this imposes a cost on each architecture. For the Burst I/O architectures, this imposes a time penalty: because unloading data cannot take place at the same time as loading input data for the next frame, separate unload and load phases are required. For the pipelined architecture, it requires additional storage to perform the reordering.
In the Radix-2, Burst I/O; Radix-2 Lite, Burst I/O; and Pipelined, Streaming I/O architectures, Bit Reverse order is simple to calculate by taking the index of the data point, written in binary, and reversing the order of the digits. Hence, 0000, 0001, 0010, 0011, 0100, ... (0, 1, 2, 3, 4, ...) becomes 0000, 1000, 0100, 1100, 0010, ... (0, 8, 4, 12, 2, ...).
In the case of the Radix-4, Burst I/O architecture, the reversal applies to digits and is therefore called Digit Reversal. A digit in Radix-4 is two bits. Hence, 0000, 0001, 0010, 0011, 0100, ... (0, 1, 2, 3, 4, ...) becomes 0000, 0100, 1000, 1100, 0001, ... (0, 4, 8, 12, 1, ...), because pairs of bits are reversed. Where the transform size requires an odd number of index bits, the odd bit in the least significant place is moved to the most significant place, so 00000, 00001, 00010, 00011, 00100, ... (0, 1, 2, 3, 4, ...) becomes 00000, 10000, 00100, 10100, 01000, ... (0, 16, 4, 20, 8, ...).
Note: The core outputs the data point index along with the data, so this section is for information only.
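The index mappings described above can be reproduced with the following sketch. The function names `bit_reverse` and `digit_reverse_radix4` are assumptions for illustration; the odd-width handling follows the rule stated in the text (the least significant bit moves to the most significant place, and the remaining 2-bit digits are reversed).

```python
def bit_reverse(index, nbits):
    """Reverse the order of the nbits binary digits of index."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def digit_reverse_radix4(index, nbits):
    """Reverse the order of the 2-bit (Radix-4) digits of index.

    If nbits is odd, the least significant bit is peeled off first so that
    it ends up in the most significant place of the result.
    """
    result = 0
    remaining = nbits
    if nbits % 2 == 1:
        result = index & 1
        index >>= 1
        remaining -= 1
    for _ in range(remaining // 2):
        result = (result << 2) | (index & 3)
        index >>= 2
    return result


bit_order = [bit_reverse(i, 4) for i in range(5)]
digit_order = [digit_reverse_radix4(i, 4) for i in range(5)]
odd_width_order = [digit_reverse_radix4(i, 5) for i in range(5)]
```

The three lists reproduce the sequences given in the text for 4-bit bit reversal, 4-bit digit reversal, and 5-bit digit reversal.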
Pipelined, Streaming I/O
The Pipelined, Streaming I/O solution pipelines several Radix-2 butterfly processing engines to offer continuous data processing. Each processing engine has its own memory banks to store the input and intermediate data (see the figure below). The core has the ability to simultaneously perform transform calculations on the current frame of data, load input data for the next frame of data, and unload the results of the previous frame of data. The user can continuously stream in data and, after the calculation latency, can continuously unload the results. If preferred, this design can also calculate one frame by itself or frames with gaps in between.
In the scaled fixed-point mode, the data is scaled after every pair of Radix-2 stages. The block floating-point mode might use significantly more resources than the scaled mode, because it must maintain extra bits of precision to allow dynamic scaling without impacting performance. Therefore, if the input data is well understood and is unlikely to exhibit large amplitude fluctuation, using scaled arithmetic (with a suitable scaling schedule to avoid overflow in the known worst case) is sufficient, and resources can be saved.
The input data is presented in natural order. The unloaded output data can be in either bit reversed order or natural order. When natural order output data is selected, additional memory resource is utilized. This architecture covers point sizes up to 65536. The user has the flexibility to select the number of stages that use block RAM for data and phase factor storage. The remaining stages use distributed memory.
[Figure: Pipelined, Streaming I/O. Input Data passes through a chain of Radix-2 Butterfly stages arranged in groups, each stage with its own memory, followed by Output Shuffling to produce Output Data.]
Radix-4, Burst I/O
With the Radix-4, Burst I/O solution, the FFT core uses one Radix-4 butterfly processing engine (see the figure below). It loads and/or unloads data separately from calculating the transform. Data I/O and processing are not simultaneous. When the FFT is started, the data is loaded. After a full frame has been loaded, the core computes the FFT. When the computation has finished, the data can be unloaded, but data cannot be loaded or unloaded during the calculation process. The data loading and unloading processes can be overlapped if the data is unloaded in digit reversed order. This architecture has lower resource usage than the Pipelined, Streaming I/O architecture but a longer transform time, and supports point sizes up to 65536. Data and phase factors can be stored in block RAM or in distributed RAM (the latter for point sizes less than or equal to 1024).
[Figure: Radix-4, Burst I/O. Input Data and Twiddles feed the RADIX-4 DRAGONFLY; data memories are connected through switches, leading to Output Data.]
Radix-2, Burst I/O
The Radix-2, Burst I/O architecture uses one Radix-2 butterfly processing engine (see the figure below). After a frame of data is loaded, the input data stream must halt until the transform calculation is completed. Then, the data can be unloaded. As with the Radix-4, Burst I/O architecture, data can be simultaneously loaded and unloaded when the output samples are in bit reversed order. This solution supports point sizes up to 65536. Both the data memories and the phase factor memories can be in either block RAM or distributed RAM (the latter for point sizes less than or equal to 1024).
[Figure: Radix-2, Burst I/O. Input Data and Twiddles feed the RADIX-2 BUTTERFLY; data memories are connected through switches, leading to Output Data.]
Radix-2 Lite, Burst I/O
This architecture differs from the Radix-2, Burst I/O architecture in that the butterfly processing engine uses one shared adder/subtractor, hence reducing resources at the expense of an additional delay per butterfly calculation. Again, as with the Radix-4 and Radix-2, Burst I/O architectures, data can be simultaneously loaded and unloaded if the output samples are in bit reversed order. This solution supports point sizes up to 65536. See the figure below.
[Figure: Radix-2 Lite, Burst I/O. Input Data and Twiddles feed the RADIX-2 BUTTERFLY, leading to Output Data. Annotations: data is stored in a single memory; sine is read on one cycle and cosine on the next; the real part is multiplied on one cycle and the imaginary part on the next; one output is generated each cycle.]
Core Symbol and Port Definitions
The figure below shows the Core Schematic Symbol, and the table that follows lists the core pinout for single channel configurations.
[Figure: Core Schematic Symbol (Single Channel).
Inputs shown: XN_RE, XN_IM, START, UNLOAD, NFFT, NFFT_WE, FWD_INV, FWD_INV_WE, SCALE_SCH, SCALE_SCH_WE, CP_LEN, CP_LEN_WE, SCLR.
Outputs shown: XK_RE, XK_IM, XN_INDEX, XK_INDEX, BUSY, EDONE, DONE, BLK_EXP, OVFLO.]
Table: Core Pinout (Single Channel)
Port Name | Port Width | Direction | Description
XN_RE | b_xn | Input | Input data bus: Real component (width b_xn), in two's complement or single precision floating-point format.
XN_IM | b_xn | Input | Input data bus: Imaginary component (width b_xn), in two's complement or single precision floating-point format.
START | - | Input | FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the Burst I/O architectures). For Streaming I/O, START begins data loading, which proceeds directly to transform calculation and then data unloading.
UNLOAD | - | Input | Result unloading (Active High): For the Burst I/O architectures, UNLOAD starts the unloading of the results in natural order. The UNLOAD port is not necessary for the Pipelined, Streaming I/O architecture or for bit/digit reversed unloading.
NFFT | - | Input | Point size of the transform: NFFT can be the size of the maximum transform or any smaller point size. For example, a 1024-point FFT can compute point sizes 1024, 512, 256, and so on. The value of NFFT is log2(point size). This port is only used with run-time configurable transform point size.
NFFT_WE | - | Input | Write enable for NFFT (Active High): Asserting NFFT_WE causes the core to stop all processes and to initialize the state of the core to the point size on the NFFT port. This port is only used with run-time configurable transform point size. Clock Enable overrides NFFT_WE if both signals are present.
FWD_INV | - | Input | Control signal that indicates whether a forward or inverse FFT is performed. When FWD_INV = 1, the forward transform is computed. When FWD_INV = 0, the inverse transform is computed.
FWD_INV_WE | - | Input | Write enable for FWD_INV (Active High).
SCALE_SCH | 2 × ceil(NFFT / 2) for the Pipelined, Streaming I/O and Radix-4, Burst I/O architectures; 2 × NFFT for the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures, where NFFT is log2(maximum point size), that is, the number of stages | Input | Scaling schedule: For the Burst I/O architectures, the scaling schedule is specified with two bits for each stage, with the scaling for the first stage given by the two LSBs. The scaling is specified as 3, 2, 1, or 0, which represents the number of bits to be shifted. Example schedules are written ordered from the last stage to the first stage. For the Pipelined, Streaming I/O architecture, the scaling schedule is specified with two bits for every pair of Radix-2 stages, starting at the two LSBs. When N is not a power of 4, the maximum bit growth for the last stage is one bit, so a schedule whose last-stage entry is greater than 1 is invalid; for such a transform length, the two MSBs of SCALE_SCH can only be 00 or 01. This port is only available with scaled arithmetic (not unscaled, block floating-point, or single precision floating-point).
SCALE_SCH_WE | - | Input | Write enable for SCALE_SCH (Active High): This port is available only with scaled arithmetic.
CP_LEN | log2(maximum point size) | Input | Cyclic prefix length: The number of samples from the transform that are initially output as a cyclic prefix, before the whole transform is output. CP_LEN can be any number from zero to one less than the point size. This port is only available with cyclic prefix insertion.
CP_LEN_WE | - | Input | Write enable for CP_LEN (Active High): This port is only available with cyclic prefix insertion.
SCLR | - | Input | Master synchronous reset (Active High): Optional port. A synchronous reset overrides clock enable when both are present on the core.
CE | - | Input | Clock enable (Active High): Optional port.
(clock) | - | Input | Rising-edge clock.
XK_RE | b_xk | Output | Output data bus: Real component, in two's complement or single precision floating-point format. For scaled arithmetic and block floating-point arithmetic, b_xk = b_xn. For unscaled arithmetic, b_xk = b_xn + log2(maximum point size) + 1. For single precision floating-point, b_xk = 32.
XK_IM | b_xk | Output | Output data bus: Imaginary component, in two's complement or single precision floating-point format. The width b_xk is the same as for XK_RE.
XN_INDEX | log2(maximum point size) | Output | Index of input data.
XK_INDEX | log2(maximum point size) | Output | Index of output data.
(ready for data) | - | Output | Ready for data (Active High): High during the load operation.
BUSY | - | Output | Core activity indicator (Active High): This signal goes High while the core is computing the transform.
(data valid) | - | Output | Data valid (Active High): This signal is High when valid data is presented at the output.
EDONE | - | Output | Early done strobe (Active High): EDONE goes High one clock cycle immediately prior to DONE going High.
DONE | - | Output | FFT complete strobe (Active High): DONE transitions High for one clock cycle when the transform calculation has completed.
BLK_EXP | - | Output | Block exponent: The amount of scaling applied. Available only when block floating point is used.
OVFLO | - | Output | Arithmetic overflow indicator (Active High): OVFLO is High during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic or single precision floating-point I/O.
(cyclic prefix valid) | - | Output | Cyclic prefix valid (Active High): This signal is High when valid data that is part of the cyclic prefix is presented at the output. This port is only available with cyclic prefix insertion.
(ready for start) | - | Output | Ready for start (Active High): This signal goes High when the core is ready to accept an assertion of the START input to begin data loading. This port is only available with cyclic prefix insertion for the Pipelined, Streaming I/O architecture.
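The SCALE_SCH encoding described in the pinout (two bits per stage, first stage in the two LSBs) can be sketched as a small packing routine. The function name `pack_scale_sch` and the example schedule are hypothetical and chosen only to illustrate the bit layout; for the Pipelined, Streaming I/O architecture each list entry would stand for a pair of Radix-2 stages instead of a single stage.

```python
def pack_scale_sch(shifts_first_to_last):
    """Pack per-stage shifts (0 to 3 bits each) into a SCALE_SCH word.

    The first stage occupies the two LSBs; each later stage occupies the
    next two bits up.
    """
    word = 0
    for stage, shift in enumerate(shifts_first_to_last):
        if not 0 <= shift <= 3:
            raise ValueError("each stage shift must be 0, 1, 2, or 3")
        word |= shift << (2 * stage)
    return word


# Hypothetical five-stage schedule, listed here from first to last stage.
# Written from last to first stage, the same schedule reads 1 0 2 3 2.
example_word = pack_scale_sch([2, 3, 2, 0, 1])
example_bits = format(example_word, "010b")
```

Reading `example_bits` from left to right gives the schedule in last-to-first order, two bits per stage, which is the ordering the text uses for its examples.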
Multichannel Pinout
Up to 12 channels are supported, for the Burst I/O architectures only. The table below shows how the pinout above must be adapted for multichannel operation.
Table: Single to Multichannel Pinout Conversion
Single Channel | Multichannel
SCLR | SCLR
NFFT | NFFT
NFFT_WE | NFFT_WE
FWD_INV | FWD_INV0, ..., FWD_INV11
FWD_INV_WE | FWD_INV0_WE, ..., FWD_INV11_WE
START | START
UNLOAD | UNLOAD
XN_RE | XN0_RE, ..., XN11_RE
XN_IM | XN0_IM, ..., XN11_IM
SCALE_SCH | SCALE_SCH0, ..., SCALE_SCH11
SCALE_SCH_WE | SCALE_SCH0_WE, ..., SCALE_SCH11_WE
CP_LEN | CP_LEN
CP_LEN_WE | CP_LEN_WE
XN_INDEX | XN_INDEX
BUSY | BUSY
EDONE | EDONE
DONE | DONE
XK_INDEX | XK_INDEX
XK_RE | XK0_RE, ..., XK11_RE
XK_IM | XK0_IM, ..., XK11_IM
BLK_EXP | BLK_EXP0, ..., BLK_EXP11
OVFLO | OVFLO0, ..., OVFLO11
CORE Generator Graphical User Interface
The FFT core graphical user interface (GUI) provides several screens with fields to set the parameter values for the particular instantiation required. A description of each CORE Generator GUI field follows:
Page 1
- Component Name: The name of the core component to be instantiated. The name must begin with a letter and be composed of the following characters: a to z, 0 to 9, and "_".
- Channels: Select the number of channels, from 1 to 12. Multichannel operation is available for the three Burst I/O architectures.
- Transform Length: Select the desired point size. All powers of 2 from 8 to 65536 are available.
- Implementation Options: Select an implementation option, as described in "Architecture Options."
  - The Pipelined, Streaming I/O; Radix-2, Burst I/O; and Radix-2 Lite, Burst I/O architectures support point sizes up to 65536. The Radix-4, Burst I/O architecture has its own supported range of point sizes, also up to 65536.
  - Check Automatically Select to choose the smallest implementation that meets the specified Target Data Throughput, provided that the specified Target Clock Frequency is achieved when the FFT core is implemented on an FPGA device.
  - Target Clock Frequency and Target Data Throughput are only used to automatically select an implementation and to calculate latency. The core is not guaranteed to run at the specified target clock frequency or target data throughput.
- Transform Length Options: Select whether the transform length is run-time configurable or not. The core uses fewer logic resources and has a faster maximum clock speed when the transform length is not run-time configurable.
Page 2
- Data Format: Select whether the input and output data samples are in Fixed Point format or in IEEE-754 single precision (32-bit) Floating Point format. Floating Point format is not available when the core is in a multichannel configuration.
- Precision Options: Input data and phase factors can be independently configured to a range of bit widths. When the Data Format is Floating Point, the input data width is fixed at 32 bits and the phase factor width can be set to 24 or 25 bits, depending on the noise performance required and the available resources.
- Scaling Options: Three options are available, in all architectures:
  - Unscaled: All integer bit growth is carried to the output. This can use more FPGA resources.
  - Scaled: A user-defined scaling schedule determines how data is scaled between FFT stages.
  - Block Floating Point: The core determines how much scaling is necessary to make best use of the available dynamic range, and reports the scaling factor as a block exponent.
- Optional Pins: Clock Enable (CE), Synchronous Clear (SCLR), and Overflow (OVFLO) are optional pins. Synchronous Clear overrides Clock Enable if both are selected. If an option is not selected, some logic resources can be saved and a higher clock frequency might be attainable.
- Rounding Modes: At the output of the butterfly, the LSBs in the datapath need to be trimmed. These bits can be truncated or rounded using convergent rounding, which is an unbiased rounding scheme. When the fractional part of a number is equal to exactly one-half, convergent rounding rounds up if the number is odd, and rounds down if the number is even. Convergent rounding can be used to avoid the bias that would otherwise be introduced by truncation after the butterfly stages. Selecting this option increases slice usage and yields a small increase in transform time due to additional latency.
- Output Ordering: Output data selections are either Bit/Digit Reversed Order or Natural Order. The Radix-2 based architectures (Pipelined, Streaming I/O; Radix-2, Burst I/O; and Radix-2 Lite, Burst I/O) offer bit-reversed ordering, and the Radix-4 based architecture (Radix-4, Burst I/O) offers digit-reversed ordering. For the Pipelined, Streaming I/O architecture, selecting natural order output ordering results in an increase in the memory used by the core. For the Burst I/O architectures, selecting natural order output increases the overall transform time because a separate unloading phase is required.
  - Cyclic Prefix Insertion can be selected if the output ordering is Natural Order. Cyclic Prefix Insertion is available for all architectures, and is typically used in OFDM wireless communications systems.
- Input Data Timing: In previous versions of the Xilinx FFT core, input data was applied three cycles after the corresponding sample index, to allow a block memory containing the data samples to be addressed. In many cases, this was not necessary, and applying the data on the wrong cycle made it appear that the core was functioning incorrectly. This timing can be configured to be backwards-compatible with previous versions, or to have no delay between the sample index and the applied data (default).
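Cyclic prefix insertion, mentioned under Output Ordering above, can be illustrated in software as follows. This sketch assumes the prefix is copied from the end of the natural-order frame, which is the usual OFDM convention; the surviving text here does not state which end the core uses, and the function name `with_cyclic_prefix` is hypothetical.

```python
def with_cyclic_prefix(frame, cp_len):
    """Return the frame preceded by a copy of its last cp_len samples.

    cp_len follows the CP_LEN rule: zero up to one less than the point size.
    """
    if not 0 <= cp_len < len(frame):
        raise ValueError("cp_len must be from 0 to point size - 1")
    prefix = frame[len(frame) - cp_len:]
    return list(prefix) + list(frame)


# Hypothetical 8-point natural-order output frame with a 2-sample prefix.
frame = [0, 1, 2, 3, 4, 5, 6, 7]
prefixed = with_cyclic_prefix(frame, 2)
```

The requirement that output ordering be Natural Order makes sense in this model, since the prefix is defined in terms of consecutive samples of the ordered frame.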
Page 3
Memory Options:
- Data and Phase Factors (Burst I/O architectures): For the Burst I/O architectures, either block RAM or distributed RAM can be used for data and phase factor storage. Data and phase factor storage can be in distributed RAM for all point sizes up to and including 1024 points.
- Data and Phase Factors (Pipelined, Streaming I/O): In the Pipelined, Streaming I/O solution, the data can be partially stored in block RAM and partially in distributed RAM. Each pipeline stage, counting from the input side, uses smaller data and phase factor memories than preceding stages. The user can select the number of pipeline stages that use block RAM for data and phase factor storage. Later stages use distributed RAM. The default displayed offers a good balance between both.
- Reorder Buffer: If the output ordering is Natural Order, the memory used for the reorder buffer can be either block RAM or distributed RAM. The reorder buffer can be in distributed RAM for point sizes less than or equal to 1024. When block floating point is selected for the Pipelined, Streaming I/O architecture, a buffer is required for both natural order and bit reversed order output data. In this case, the reorder buffer options remain available, and distributed RAM can be selected for all point sizes below 2048.
- Hybrid Memories: Where data, phase factor, or reorder buffer memories are stored in block RAM, and the size of the memory is greater than one block RAM, the memory can be constructed from a hybrid of block RAMs and distributed RAM, where the majority of the data is stored in block RAMs and the few bits that are left over are stored in distributed RAM. This Hybrid Memory is an alternative to constructing the memory entirely from multiple block RAMs. It provides a reduction in block RAM count, at the cost of an increase in the number of slices used. Hybrid Memories are only available when block RAM is used for one or more memories and the number of slices required for the Hybrid Memory implementation is below an internal threshold of LUTs per memory. If these conditions are met, Hybrid Memories are made available and can be selected.
Optimize Options:
- Complex Multipliers: Three options are available for customization of the complex multiplier implementation:
  - Use logic: All complex multipliers are constructed using slice logic. This is appropriate for target applications which have low performance requirements, or for target devices which have few XtremeDSP slices/Mult18x18s.
  - Use 3-multiplier structure (resource optimization): All complex multipliers use a three real multiply, five add/subtract structure, where the multipliers use XtremeDSP slices/Mult18x18s. This reduces the XtremeDSP slice/Mult18x18 count, but uses some slice logic. In Spartan-3A DSP, Spartan-6, and Virtex-6 devices, this structure can make use of the XtremeDSP slice's pre-adder to reduce or remove the need for extra slice logic, and improve performance.
  - Use 4-multiplier structure (performance optimization): All complex multipliers use a four real multiply, two add/subtract structure, utilizing XtremeDSP slices/Mult18x18s. This structure yields the highest clock performance at the expense of more dedicated multipliers. In devices with XtremeDSP slices, the add/subtract operations are implemented within the XtremeDSP slices. In devices with Mult18x18s, the add/subtract operations use slice logic.
  Note: The core can override the complex multiplier implementation internally to ensure that the fewest number of XtremeDSP slices/Mult18x18s are used, without impacting performance. For this reason, some core configurations show no difference in XtremeDSP slice/Mult18x18 usage when toggling between the 3-multiplier and 4-multiplier options. If "Use logic" is selected, however, slice logic is always utilized.
- Butterfly Arithmetic: Two options are available for customization of the butterfly implementation:
  - Use logic: All butterfly stages are constructed using slice logic.
  - Use XtremeDSP Slices: For devices with XtremeDSP slices, this option forces all butterfly stages to be implemented using the adder/subtracters in the XtremeDSP slices.
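The two complex multiplier structures named under Complex Multipliers above trade multipliers for adders. The sketch below shows one common formulation of each for (a + jb)(c + jd); it is a software illustration of the arithmetic only, the function names are hypothetical, and the exact factorization used inside the core is not stated in this document.

```python
def complex_mult_4(a, b, c, d):
    """Four real multiplies, two add/subtract operations."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def complex_mult_3(a, b, c, d):
    """Three real multiplies, five add/subtract operations."""
    k1 = c * (a + b)   # add 1, multiply 1
    k2 = a * (d - c)   # subtract 2, multiply 2
    k3 = b * (c + d)   # add 3, multiply 3
    real = k1 - k3     # subtract 4: ac - bd
    imag = k1 + k2     # add 5: ad + bc
    return real, imag


# (3 + 4j) * (5 - 2j) = 23 + 14j
product_4 = complex_mult_4(3, 4, 5, -2)
product_3 = complex_mult_3(3, 4, 5, -2)
```

Both functions give the same result for exact integer inputs. The sums a + b, d - c, and c + d in the 3-multiplier form are the kind of operation a DSP slice pre-adder can absorb, which is consistent with the remark about pre-adders above.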
Information Tabs
Implementation: This field displays the currently selected architecture. This is useful for seeing the result of automatic architecture selection.
Transform Size: When the transform length is run-time configurable, the core has the ability to reprogram the point size while the core is running; that is, the core can support the selected point size and any smaller point size. This field displays the supported point sizes based on the Transform Length, Transform Length Options, and Implementation Options selected.
Output Data Width: The output data width equals the input data width for scaled arithmetic and block floating-point arithmetic. With unscaled arithmetic, the output data width equals (input data width + log2(point size) + 1).
Resource Estimates: Based on the options selected, this field displays the XtremeDSP slice or Mult18x18 count and the block RAM numbers (including the block RAM numbers for Spartan-6 devices). The resource numbers are just an estimate. For exact resource usage, and slice/LUT-FlipFlop pair information, the implementation report should be consulted.
Latency: This shows the latency of the core in clock cycles and microseconds for each point size supported. The latency is measured from asserting the START input to the last sample of output data coming out of the core, assuming that the UNLOAD input (if present) is asserted as soon as DONE goes High. Note that this is not the minimum number of cycles between starting consecutive frames, because frames can overlap in some cases. The latency in microseconds is based on the target clock frequency. The latency figures can be copied to the Clipboard and pasted as plain text into other applications.
Model: This provides a link to the Xilinx LogiCORE IP FFT web page where the core's C model can be downloaded. For details on the C model, see "Bit-Accurate C Model."
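The output width rule and the cycles-to-microseconds conversion described above can be sketched in a few lines. This is an illustrative helper, not part of any Xilinx tool: the function names are hypothetical, and the scaling option strings are borrowed from the scaling_options parameter values.

```python
def output_width(input_width: int, point_size: int, scaling: str) -> int:
    """Output data width in bits for each real/imaginary component."""
    if point_size < 1 or point_size & (point_size - 1):
        raise ValueError("point size must be a power of two")
    if scaling in ("scaled", "block_floating_point"):
        # Scaled and block floating-point keep the input width.
        return input_width
    if scaling == "unscaled":
        log2_n = point_size.bit_length() - 1
        # Unscaled arithmetic grows by log2(point size) + 1 bits.
        return input_width + log2_n + 1
    raise ValueError(f"unknown scaling option: {scaling}")


def latency_us(cycles: int, clock_mhz: float) -> float:
    """Convert a latency in clock cycles to microseconds."""
    return cycles / clock_mhz
```

For example, a 16-bit input with a 1024-point unscaled transform gives 16 + 10 + 1 = 27 output bits per component.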
XCO Parameters
The following table defines valid entries for the XCO parameters. Parameters are not case sensitive. Default values are displayed in bold. Xilinx strongly recommends that XCO parameters are not manually edited in the XCO file; instead, use the CORE Generator GUI to configure the core and perform range and parameter value checking.
Table: XCO Parameters
Parameter: Valid Values
component_name: Name must begin with a letter and be composed of the following characters: a to z, 0 to 9, and "_".
channels: number of channels
transform_length: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
implementation_options: automatically_select, pipelined_streaming_io, radix_4_burst_io, radix_2_burst_io, radix_2_lite_burst_io
target_clock_frequency: (default 250)
target_data_throughput: numeric value
data_format: fixed_point, floating_point
input_width: numeric value
phase_factor_width: numeric value
scaling_options: scaled, unscaled, block_floating_point
rounding_modes: truncation, convergent_rounding
ce: false, true
sclr: false, true
ovflo: false, true
output_ordering: bit_reversed_order, natural_order
cyclic_prefix_insertion: false, true
memory_options_data: block_ram, distributed_ram
memory_options_phase_factors: block_ram, distributed_ram
memory_options_reorder: block_ram, distributed_ram
phase_factors: (default value depends on transform length)
memory_options_hybrid: false, true
input_data_offset: no_offset, three_cycle_offset
complex_mult_type: use_luts, use_mults_resources, use_mults_performance
butterfly_type: use_luts, use_xtremedsp_slices
Simulation Models
When the core is generated using the CORE Generator software, a UNISIM-based simulation model is created. The core does not have a VHDL or Verilog functional behavioral model. For this reason, the core overrides the CORE Generator Project Options and always delivers a Structural model type. Xilinx recommends that the designer run simulations using a resolution of 1 ps. Some Xilinx library components require a 1 ps resolution to work properly in either functional or timing simulation. The core's UNISIM-based structural model may produce incorrect results if simulated with a resolution other than 1 ps. See the "Register Transfer Level (RTL) Simulation Using Xilinx Libraries" section of the Synthesis and Simulation Design Guide for more information. This document is part of the ISE® Software Manuals.
System Generator Graphical User Interface
This section describes each tab of the System Generator GUI and details the parameters that differ from the CORE Generator GUI. See "CORE Generator Graphical User Interface" for more detailed information about all other parameters.
Basic
The Basic tab is used to specify the transform configuration and architecture, similar to the corresponding page of the CORE Generator GUI.
Implementation Options: Select an implementation option, as described in "Architecture Options."
The Pipelined, Streaming I/O, Radix-2, Burst I/O, Radix-2 Lite, Burst I/O, and Radix-4, Burst I/O architectures all support point sizes up to 65536; the smallest supported point size depends on the architecture. The option to automatically select an architecture is not currently available with System Generator and, therefore, Target Clock Frequency and Target Data Throughput are not available options.
System Generator only supports a single-channel implementation and, hence, Channels is not an available option.
Advanced
The Advanced tab is used to specify phase factor precision, scaling, rounding, and optional port options, similar to the corresponding page of the CORE Generator GUI.
Clock enable: Specifies whether the core will have a clock enable pin (the equivalent of selecting the CE option in the CORE Generator GUI).
RST: Specifies whether the core will have a synchronous reset pin (the equivalent of selecting the SCLR option in the CORE Generator GUI).
System Generator automatically sets the Input Data Width parameter based on the signal properties of the XN_RE and XN_IM ports. System Generator only supports fixed-point data types and, hence, Data Format is not an available option in the GUI.
Implementation
The Implementation tab is used to specify memory and optimization options, similar to the corresponding page of the CORE Generator GUI.
Number of stages using Block RAM: Specifies the number of stages of the Pipelined, Streaming I/O architecture that use Block RAM for data and phase factor storage. Because dynamic list boxes are not offered with the System Generator GUI, this option displays the full range of selection, but only allows the user to select the valid values visible in the CORE Generator GUI.
FPGA Area Estimation: See the System Generator documentation for detailed information about this option.
Bit-Accurate C Model
The FFT core has a bit-accurate C model designed for system modeling and for selecting parameters before generating a core. The model is bit-accurate but not cycle-accurate, so it produces exactly the same output data as the core on a frame-by-frame basis. However, it does not model the core's latency or its interface signals. The C model is generally required before generating a core, so it is not delivered as an output of the CORE Generator software. Instead, it is available for download from the Xilinx LogiCORE IP FFT web page. The C model is available as a dynamically linked library for 32-bit and 64-bit Windows platforms, and 32-bit and 64-bit Linux platforms. The C model is also available as a MATLAB® function for 32-bit Windows only. Download the zip file and unzip it to install the C model. The README.txt file describes the contents and installed directory structure, and any further platform-specific installation instructions.
C Model Interface
The C model is used through three functions, declared in the header file xfft_v7_0_bitacc_cmodel.h:
struct xilinx_ip_xfft_v7_0_state* xilinx_ip_xfft_v7_0_generics generics); struct xilinx_ip_xfft_v7_0_state* state, struct xilinx_ip_xfft_v7_0_inputs inputs, struct xilinx_ip_xfft_v7_0_outputs* outputs void xilinx_ip_xfft_v7_0_state* state);
The first function creates a state structure for the model, allocating memory to store the state as required, and returns a pointer to that state structure. The state structure contains all information required to define the FFT being modelled. The function is called with a structure containing the core's generics: these are the parameters that define the bit-accurate numerical performance of the core, represented as integers, derived from the XCO parameters that result from selections in the CORE Generator GUI. The generics required by the model and their mappings from XCO parameters are shown in the following table.
Table: C Model Generics
Generic: Description. Range. XCO parameter mapping.
C_NFFT_MAX: log2(maximum point size). Range: 3-16. Mapping: transform_length (take log2).
C_ARCH: Architecture. Mapping: implementation_options (radix_4_burst_io, radix_2_burst_io, pipelined_streaming_io, radix_2_lite_burst_io).
C_HAS_NFFT: Run-time configurable transform length. Mapping: run-time configurable transform length option (false, true).
C_INPUT_WIDTH: Input data width (bits). Range: 8-34. Mapping: input_width.
C_TWIDDLE_WIDTH: Phase factor width (bits). Range: 8-34. Mapping: phase_factor_width.
C_USE_FLT_PT: Input/output data format. Mapping: data_format (fixed_point, floating_point).
C_HAS_SCALING: Scaling option: unscaled or not. Ignored when C_USE_FLT_PT selects floating-point data. Mapping: scaling_options (unscaled, scaled, block_floating_point).
C_HAS_BFP: Scaling option: if not unscaled, scaled or block floating point. Ignored when C_USE_FLT_PT selects floating-point data. Mapping: scaling_options (unscaled, scaled, block_floating_point).
C_HAS_ROUNDING: Rounding mode. Ignored when C_USE_FLT_PT selects floating-point data. Mapping: rounding_modes (truncation, convergent_rounding).
After the state structure has been created, it can be used as many times as required to simulate the FFT core. Run a simulation using the second function. Call this function with a pointer to the existing state structure, and with structures that hold the inputs and outputs of the model. These input and output structures are fully defined and described in the model's header file. Note that memory for the input and output data arrays must be allocated by the calling program before simulating the model.
Finally, the state structure must be destroyed to free the memory used to store the state, using the third function, called with a pointer to the existing state structure. If the generics of the core need to be changed, destroy the existing state structure and create a new state structure using the new generics. There is no way to change the generics of an existing state structure. An example C file, run_bitacc_cmodel.c, is included in the model zip file. This shows the basic stages required to use the model.
Due to differences between the core and the model in the order of operations within the processing phase, when the Pipelined, Streaming I/O architecture is used, fixed-point data is being processed, the scaling option is Scaled, and overflow occurs, the xk_re and xk_im data outputs of the model may not match the XK_RE and XK_IM data outputs of the core. The overflow output of the model and the OVFLO output of the core (if present) match in all cases. The overflow output of the model is always set correctly when the scaling option is Scaled (as selected through the model generics C_HAS_SCALING and C_HAS_BFP).
Therefore, Xilinx recommends that the overflow output of the model is always checked when the scaling option is Scaled and the architecture is Pipelined, Streaming I/O, and that if overflow occurred (the overflow output is set), the xk_re and xk_im outputs of the model are ignored. This is the only case where the model is not entirely bit-accurate to the core.
Using the C Model to Select a Scaling Schedule
When the scaling option of the core is Scaled, the user is given great flexibility to set the scaling schedule that determines how much to scale data values by in each stage of the processing phase. See "Forward/Inverse and Scaling Schedule." It can be difficult to choose the best scaling schedule that avoids overflow for a sufficiently large proportion of frames for a particular type of input data. The C model is a tool that can help with the selection of the scaling schedule. The process is as follows:
1. Create frames of typical input data for the intended application.
2. Create a state structure using the required generics. Set the scaling option to Scaled through the model generics C_HAS_SCALING and C_HAS_BFP.
3. Set the scaling schedule in the inputs structure to some initial scaling schedule, such as the reset value, which scales by 1/N overall: a shift of one bit in each stage for the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures, and a shift of two bits in each stage for the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures.
4. Simulate the model with each frame of typical input data in turn. Count the number of frames for which overflow occurred (the overflow output is set).
5. If the percentage of frames for which overflow occurred is lower than the acceptable overflow rate, reduce the scaling value in one or more stages of the scaling schedule. If the percentage is higher than the acceptable overflow rate, increase the scaling value in one or more stages of the scaling schedule.
6. Repeat steps 4 and 5 until the percentage of frames for which overflow occurred matches the acceptable overflow rate.
This process produces a scaling schedule that is tailored to the typical input data for the intended application.
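The selection process above can be sketched as a small search loop. This is only an illustration under stated assumptions: `simulate` is a hypothetical callback standing in for a call into the bit-accurate model (it must return True when the overflow output is set for a frame), the schedule is held as a list of per-stage shifts with stage 0 first, and the greedy order in which stages are adjusted is a choice made here, not something the datasheet prescribes.

```python
from typing import Callable, List, Sequence

Frame = Sequence[int]
Simulate = Callable[[Frame, List[int]], bool]  # hypothetical model wrapper


def overflow_rate(frames: Sequence[Frame], schedule: List[int],
                  simulate: Simulate) -> float:
    """Fraction of frames for which the model reports overflow."""
    return sum(1 for f in frames if simulate(f, schedule)) / len(frames)


def tune_schedule(frames: Sequence[Frame], schedule: List[int],
                  simulate: Simulate, acceptable_rate: float,
                  max_shift: int = 3) -> List[int]:
    sched = list(schedule)
    # Too many overflows: increase scaling, earliest stage first.
    while overflow_rate(frames, sched, simulate) > acceptable_rate:
        idx = next((i for i, s in enumerate(sched) if s < max_shift), None)
        if idx is None:
            break  # every stage already at its maximum shift
        sched[idx] += 1
    # Fewer overflows than needed: reduce scaling while the rate stays acceptable.
    for i in reversed(range(len(sched))):
        while sched[i] > 0:
            sched[i] -= 1
            if overflow_rate(frames, sched, simulate) > acceptable_rate:
                sched[i] += 1  # undo the step that broke the target
                break
    return sched


def toy_simulate(frame: Frame, schedule: List[int]) -> bool:
    """Stand-in for the model: worst-case growth against an 8-bit limit."""
    peak = max(abs(x) for x in frame) * len(frame)
    return (peak >> sum(schedule)) > 127
```

In real use, `toy_simulate` would be replaced by a wrapper around the downloaded model; the search logic itself does not depend on how overflow is detected.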
Control Signals and Timing
Clock Enable
If the Clock Enable (CE) pin is present on the core, driving the pin Low will pause the core in its current state. All logic within the core will be paused. Driving the CE pin High will allow the core to continue processing.
Synchronous Clear
Synchronous Clear (SCLR) overrides Clock Enable (CE) if both are present on the core. Asserting SCLR results in all output pins, internal counters, and state variables being reset to their initial values. All pending load processes, transform calculations, and unload processes stop and are reinitialized. NFFT is set to the largest FFT point size permitted (the Transform Length value set in the GUI). The scaling schedule is set to 1/N. For the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures with a non-power-of-four point size, the last stage has a scaling of one bit and the rest have a scaling of two bits. See the following table.
Table: Synchronous Clear Reset Values
Signal: Initial/Reset Value
NFFT: maximum point size (N)
FWD_INV: Forward
SCALE_SCH: 1/N. For the Radix-4, Burst I/O or Pipelined, Streaming I/O architectures when N is a power of 4, a two-bit shift in every stage; for the same architectures when N is not a power of 4, a one-bit shift in the last stage and a two-bit shift in the others; for the Radix-2, Burst I/O or Radix-2 Lite, Burst I/O architectures, a one-bit shift in every stage.
Note: If the run-time configurable transform length option is selected, asserting NFFT_WE resets the core in the same way as asserting the SCLR pin, except that NFFT_WE does not reset the latched scaling schedule or transform type (forward or inverse). Note that NFFT_WE does not override Clock Enable, unlike Synchronous Clear. Therefore, Synchronous Clear may not be required in addition to run-time configurable transform length. Omitting Synchronous Clear can save logic resources and allow a higher maximum clock frequency.
Transform Size
The transform point size can be set through the NFFT port if the run-time configurable transform length option is selected. Valid settings and the corresponding transform sizes are provided in the following table. If the NFFT value entered is too large, the core sets itself to the largest available point size (selected in the GUI). If the value is too small, the core sets itself to the smallest available point size, which differs between the Radix-4, Burst I/O architecture and the other architectures.
NFFT values are read in on the rising clock edge when NFFT_WE is High. A transform size change re-times all current processes within the core, so every time a transform size is latched in, regardless of whether the point size differs from the current point size, the core is internally reset. (FWD_INV and SCALE_SCH are not reset.) Holding NFFT_WE High continues to reset the core on every clock cycle.
Table: Valid NFFT Settings
NFFT[4:0]   Transform size (N)
00011       8
00100       16
00101       32
00110       64
00111       128
01000       256
01001       512
01010       1024
01011       2048
01100       4096
01101       8192
01110       16384
01111       32768
10000       65536
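The table above is simply log2(N) written as a 5-bit binary value, which the following sketch makes explicit. The function names are hypothetical, and the minimum log2 size passed to `clamp_nfft` is an input supplied by the caller because it depends on the architecture.

```python
def nfft_code(point_size: int) -> str:
    """Return the 5-bit NFFT[4:0] setting for a power-of-two point size."""
    if point_size < 1 or point_size & (point_size - 1):
        raise ValueError("point size must be a power of two")
    log2_n = point_size.bit_length() - 1
    if not 3 <= log2_n <= 16:
        raise ValueError("point size must be between 8 and 65536")
    return format(log2_n, "05b")


def point_size_from_nfft(code: str) -> int:
    """Inverse mapping: NFFT bit string to transform size."""
    return 1 << int(code, 2)


def clamp_nfft(requested_log2: int, min_log2: int, max_log2: int) -> int:
    """Model the core's behavior for values that are too small or too large."""
    return max(min_log2, min(max_log2, requested_log2))
```

For instance, `nfft_code(1024)` gives "01010", matching the table row for N = 1024.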
Forward/Inverse and Scaling Schedule
The transform type (forward or inverse) and the scaling schedule can be set frame-by-frame without interrupting frame processing. Both the transform type and the scaling schedule can be set independently for each FFT channel in a multichannel core. A single-channel core uses FWD_INV to set the transform type and SCALE_SCH to set the scaling schedule. A multichannel core has a FWD_INV pin for each channel, named FWD_INV0, FWD_INV1, and so on, and a SCALE_SCH bus for each channel, named SCALE_SCH0, SCALE_SCH1, and so on. The transform type is set using the FWD_INV pin. Setting FWD_INV to 0 produces an inverse FFT, and setting FWD_INV to 1 creates the forward transform.
Burst I/O Architectures
The scaling performed during successive stages is set through the SCALE_SCH bus. For the Radix-4, Burst I/O and Radix-2 architectures, the value of SCALE_SCH is used as pairs of bits [... N4, N3, N2, N1, N0], each pair representing the scaling value for the corresponding stage. Stages are computed starting with stage 0 as the two LSBs. There are log4(point size) stages for Radix-4 and log2(point size) stages for Radix-2. In each stage, the data can be shifted by 0, 1, 2, or 3 bits, which corresponds to SCALE_SCH values of 00, 01, 10, and 11.
For example, for Radix-4 with N = 1024 there are log4(1024) = 5 Radix-4 stages, and each bit pair of SCALE_SCH gives the right shift for one of stage 0 through stage 4. The example scaling schedule scales by a total of 8 bits, which gives a scaling factor of 1/256. A conservative schedule completely avoids overflows in the Radix-4, Burst I/O architecture. For the Radix-2, Burst I/O and Radix-2 Lite, Burst I/O architectures, a conservative scaling schedule likewise prevents overflow for N = 1024 (there are log2(1024) = 10 Radix-2 stages).
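A short sketch of how a SCALE_SCH word breaks down into per-stage shifts for the Burst I/O architectures. The function names are hypothetical, and the schedule value used in the usage note is only an illustrative one chosen so that the shifts total 8 bits; it is not taken from the datasheet text.

```python
from typing import List


def num_stages(point_size: int, radix: int) -> int:
    """Stage count: log2(N) for Radix-2, ceil(log2(N) / 2) for Radix-4."""
    if point_size < 1 or point_size & (point_size - 1):
        raise ValueError("point size must be a power of two")
    if radix not in (2, 4):
        raise ValueError("radix must be 2 or 4")
    log2_n = point_size.bit_length() - 1
    return log2_n if radix == 2 else (log2_n + 1) // 2


def stage_shifts(scale_sch: int, stages: int) -> List[int]:
    """Split SCALE_SCH into 2-bit fields; stage 0 is the two LSBs."""
    return [(scale_sch >> (2 * s)) & 0b11 for s in range(stages)]


def scaling_factor(shifts: List[int]) -> float:
    """Overall scaling applied by the schedule, e.g. 8 bits -> 1/256."""
    return 1.0 / (1 << sum(shifts))
```

With the illustrative value `0b0100101110` and five Radix-4 stages, `stage_shifts` returns `[2, 3, 2, 0, 1]`, a total of 8 bits and a factor of 1/256.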
Pipelined, Streaming I/O Architecture
For the Pipelined, Streaming I/O architecture, consider every pair of adjacent Radix-2 stages as a group. That is, group 0 contains stages 0 and 1, group 1 contains stages 2 and 3, and so forth. The value of SCALE_SCH is also used as pairs of bits [... N4, N3, N2, N1, N0]. Each pair represents the scaling value for the corresponding group of two stages. Groups are computed starting with group 0 as the two LSBs. In each group, the data can be shifted by 0, 1, 2, or 3 bits, which corresponds to SCALE_SCH values of 00, 01, 10, and 11. For example, when N = 1024, each bit pair gives the right shift for one of group 0 (stages 0 and 1) through group 4 (stages 8 and 9). A conservative schedule completely avoids overflows in the Pipelined, Streaming I/O architecture.
When the point size is not a power of 4, the last group only contains one stage, and the maximum bit growth for the last group is one bit. Therefore, the two MSBs of the scaling schedule can only be 00 or 01. A conservative scaling schedule for N = 512 is SCALE_SCH = [01 ... 11].
The user is allowed great flexibility to set the transform type (Forward/Inverse) and the scaling schedule. The FWD_INV and SCALE_SCH values are latched into temporary registers whenever the corresponding write-enable pins are High. FWD_INV_WE and SCALE_SCH_WE can be asserted at any time until a fixed number of cycles after START is asserted, irrespective of the Input Data Timing parameter value. The core then reads these temporary registers, and these are the values used for that frame of data. There is no way to alter those values once the transform calculation phase has started. Write-enable assertions later than that point after START is asserted affect the frame that follows.
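The grouping of Radix-2 stages in the Pipelined, Streaming I/O architecture can be written out as follows. The function names are hypothetical; the per-group limits assume, as read from the text above, that a full two-stage group accepts any 2-bit shift while a final single-stage group is limited to a one-bit shift.

```python
from typing import List, Tuple


def stage_groups(point_size: int) -> List[Tuple[int, ...]]:
    """Pair adjacent Radix-2 stages: group 0 = stages (0, 1), and so on."""
    if point_size < 1 or point_size & (point_size - 1):
        raise ValueError("point size must be a power of two")
    log2_n = point_size.bit_length() - 1
    return [tuple(range(i, min(i + 2, log2_n))) for i in range(0, log2_n, 2)]


def max_group_shifts(point_size: int) -> List[int]:
    """Largest shift allowed per group, group 0 first."""
    return [3 if len(group) == 2 else 1 for group in stage_groups(point_size)]


def schedule_is_valid(shifts: List[int], point_size: int) -> bool:
    """Check a list of per-group shifts (group 0 first) against the limits."""
    limits = max_group_shifts(point_size)
    return len(shifts) == len(limits) and all(
        0 <= s <= limit for s, limit in zip(shifts, limits)
    )
```

For N = 512 the last group is `(8,)`, so a schedule whose final shift is 2 or 3 is rejected by `schedule_is_valid`.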
In a multichannel core, there are separate FWD_INV_WE and SCALE_SCH_WE pins for each channel, named FWD_INV0_WE, FWD_INV1_WE, and so on, and SCALE_SCH0_WE, SCALE_SCH1_WE, and so on. Both the scaling schedule and the transform type are registered internally, so there is no need to hold these values on the pins. Also, if the scaling or transform type is constant through multiple frames (that is, no new values are latched in), the registered values apply for successive frames. The scaling schedule and transform type are not reset when NFFT_WE is asserted.
The initial value and reset value of FWD_INV is forward, and of the scaling schedule is 1/N. That translates to a two-bit shift per stage for the Radix-4, Burst I/O and Pipelined, Streaming I/O architectures, and a one-bit shift per stage for the Radix-2 architectures. The core uses the (2 × number of stages) LSBs of SCALE_SCH for the scaling schedule. So, when the point size decreases, the leftover MSBs are ignored. However, all bits are latched into the core on SCALE_SCH_WE and are used in later transforms if the point size increases.
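The rule that only the (2 × number of stages) LSBs of the latched schedule are used can be illustrated with a mask. This is a sketch with hypothetical names; the `radix2` flag selects the Radix-2 stage count, and otherwise the Radix-4 / Pipelined, Streaming I/O count of ceil(log2(N) / 2) is assumed.

```python
def active_schedule_bits(latched: int, point_size: int, radix2: bool) -> int:
    """Return the part of a latched SCALE_SCH value used at this point size."""
    if point_size < 1 or point_size & (point_size - 1):
        raise ValueError("point size must be a power of two")
    log2_n = point_size.bit_length() - 1
    stages = log2_n if radix2 else (log2_n + 1) // 2
    mask = (1 << (2 * stages)) - 1  # keep the 2 * stages LSBs
    return latched & mask
```

The full latched value is left untouched by this function, mirroring the statement that the ignored MSBs come back into use if the point size later increases.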
Cyclic Prefix Insertion
Cyclic prefix insertion takes a section of the output of the FFT and prefixes it to the beginning of the transform. The resultant output data consists of the cyclic prefix (a copy of the end of the output data) followed by the complete output data, all in natural order. Cyclic prefix insertion is only available when output ordering is Natural Order.
When cyclic prefix insertion is used, the length of the cyclic prefix can be set frame-by-frame without interrupting frame processing. The cyclic prefix length can be any number of samples from zero to one less than the point size. The cyclic prefix length is set by the CP_LEN bus. For example, when N = 1024, the cyclic prefix length can be from 0 to 1023 samples, and a CP_LEN value of 0010010110 will produce a cyclic prefix consisting of the last 150 samples of the output data.
The user is allowed great flexibility to set the cyclic prefix length. The CP_LEN value is latched into a temporary register whenever CP_LEN_WE is High. CP_LEN_WE can be asserted at any time before the frame of data is loaded in, and the core reads this temporary register a fixed number of cycles after START is asserted, irrespective of the Input Data Timing parameter. This is the value that is used for the current frame of data. There is no way to alter this value once the transform calculation phase has started. CP_LEN_WE assertions later than that point after START is asserted affect the frame that follows.
The cyclic prefix length is registered internally, so there is no need to hold the value on the CP_LEN bus. Also, if the cyclic prefix length is constant through multiple frames (that is, no new values are latched in), the registered values apply for successive frames. The cyclic prefix length is not reset when NFFT_WE is asserted. The initial value and reset value of CP_LEN is zero (no cyclic prefix).
The core uses the log2(point size) MSBs of CP_LEN for the cyclic prefix length. So, when the point size decreases, the leftover LSBs are ignored. This effectively scales the cyclic prefix length with the point size, keeping them in approximately constant proportion. However, all bits of CP_LEN are latched into the core on CP_LEN_WE and are used in later transforms if the point size increases.
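The use of only the log2(point size) MSBs of CP_LEN amounts to a right shift of the latched value. The sketch below assumes that the CP_LEN bus is log2(maximum point size) bits wide; that width and the function name are assumptions made for illustration.

```python
def effective_cp_len(cp_len: int, max_point_size: int, point_size: int) -> int:
    """Cyclic prefix length actually used at the current point size."""
    for n in (max_point_size, point_size):
        if n < 1 or n & (n - 1):
            raise ValueError("point sizes must be powers of two")
    if point_size > max_point_size:
        raise ValueError("point size exceeds the configured maximum")
    max_bits = max_point_size.bit_length() - 1  # assumed CP_LEN bus width
    used_bits = point_size.bit_length() - 1
    # Keep the MSBs; the leftover LSBs are ignored.
    return cp_len >> (max_bits - used_bits)
```

With the example value 0010010110 (150) latched for a 1024-point maximum, the prefix is 150 samples at N = 1024 and 75 samples at N = 512, which keeps the ratio of prefix to frame length roughly constant.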
Overflow
Fixed-Point Data
The Overflow (OVFLO) signal is only available when Scaled arithmetic is used. OVFLO is driven High during unloading if any point in the data frame overflowed. In a multichannel core, there is a separate OVFLO output for each channel, named OVFLO0, OVFLO1, and so on. For the Burst I/O architectures, the OVFLO signal goes High as soon as an overflow occurs during computation and remains High during the entire time the frame is unloading. For the Pipelined, Streaming I/O architecture, the OVFLO signal goes High during unloading as soon as an overflow is detected in that frame and is held High for the remainder of the frame. When an overflow occurs in the core, the data is wrapped rather than saturated, resulting in the transformed data becoming unusable for most applications.
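The difference between wrapping and saturating on overflow is easy to see with two's-complement arithmetic. The helpers below are a generic illustration with hypothetical names; they do not model the core's internal data path.

```python
def wrap(value: int, width: int) -> int:
    """Two's-complement wraparound to a signed field of the given width."""
    masked = value & ((1 << width) - 1)
    if masked & (1 << (width - 1)):
        return masked - (1 << width)
    return masked


def saturate(value: int, width: int) -> int:
    """Clamp to the representable signed range instead of wrapping."""
    hi = (1 << (width - 1)) - 1
    lo = -(1 << (width - 1))
    return max(lo, min(hi, value))
```

An 8-bit result of 130 wraps to -126, a large sign and magnitude error, whereas saturation would give 127. This is why a frame flagged by OVFLO is generally unusable.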
Floating-Point Data
The Overflow signal is used to indicate an exponent overflow when processing floating-point data. When an exponent overflow occurs, the OVFLO signal goes High as soon as the overflow is detected in that frame, and remains High for the remainder of the frame. This behavior is the same for both the Burst I/O and Pipelined, Streaming I/O architectures, which is different from the Overflow behavior for fixed-point data described above. The output sample which overflowed will be set to +/- Infinity, depending on the sign of the internal result. The Overflow signal will not be asserted when a NaN value is present on the output. NaN values can only occur on the output when the input data frame contains NaN or Infinity samples.
Block Exponent
The Block Exponent (BLK_EXP) signal (used only with the block floating-point option) contains the block exponent. In a multichannel core, there is a separate BLK_EXP output for each channel, named BLK_EXP0, BLK_EXP1, and so on. The signal is valid during the unloading of the data frame. The value present on the port represents the total number of bits the data was scaled by during the transform. For example, if BLK_EXP has a value of 00101 = 5, this means the output data (XK_RE, XK_IM) was scaled by 5 bits (shifted right by 5 bits), or in other words, was divided by 32, to fully utilize the available dynamic range of the output data path without overflowing.
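A downstream consumer that needs values on the unscaled magnitude scale can undo the block scaling by multiplying by 2^BLK_EXP. The sketch below uses hypothetical names and plain Python integers, and it ignores the low-order bits already discarded by the right shift inside the core.

```python
from typing import Tuple


def undo_block_scaling(xk_re: int, xk_im: int, blk_exp: int) -> Tuple[int, int]:
    """Scale an output sample back up by the block exponent."""
    factor = 1 << blk_exp  # BLK_EXP of 5 means the data was divided by 32
    return xk_re * factor, xk_im * factor
```

With BLK_EXP = 00101, a sample of (3, -4) corresponds to (96, -128) on the unscaled magnitude scale.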
Timing for Pipelined, Streaming I/O Architecture
Setting Up and Starting the Transform
Asserting START starts the data loading phase, which immediately flows into the transform calculation phase and then the data unloading phase. Pulsing START once allows the transform calculation for a single frame. Pulsing START every N clock cycles allows continuous data processing. Alternatively, holding START High also allows continuous data processing (see the continuous streaming timing figure, or its cyclic prefix variant if cyclic prefix insertion is used). START is ignored except when the core can begin loading a frame, i.e., when no data is being loaded, or the last value of a data frame is being loaded. If NFFT_WE, FWD_INV_WE, or SCALE_SCH_WE were not asserted before the initial START, then the defaults are used.
This architecture can also support extended intervals between frames (Figure 10). Simply assert START at any time to begin data loading. After the data frame is loaded, the core proceeds to calculate the transform and then output the results. The figure is intended to show the timing of entire frames. It does not show the small skews between signals which occur at the start of frames.
Applying Data
Data is applied in a contiguous burst. The point at which data input should start relative to the START pulse is determined by the Input Data Timing parameter in the GUI. If "No offset" is selected for the Input Data Timing parameter, input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive on the same cycle as the XN_INDEX it matches. The first data sample should therefore be applied as soon as RFD (Ready For Data) goes High, such that the first sample pair is read into the core on the first transition of XN_INDEX. If "3 clock cycle offset" is selected for the Input Data Timing parameter, input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive three clock cycles later than the XN_INDEX it matches (see Figure 11). In this way, XN_INDEX can be used to address external memory or a frame buffer storing the input data. RFD remains High with XN_INDEX during the loading phase and indicates that data can be input.
Data Processing and Data Output
BUSY goes High while the core is calculating the transform. DONE goes High when the calculation is complete. EDONE goes High one cycle before that, i.e., during the last cycle of the calculation phase. When DONE goes High, the core begins unloading. During the unloading phase, while valid output results are present on XK_RE/XK_IM, DV (Data Valid) is High. During unloading, XK_INDEX corresponds to the XK_RE/XK_IM being presented.
If cyclic prefix insertion is used, the cyclic prefix is unloaded first. The cyclic prefix valid (CPV) output goes High to indicate that the cyclic prefix is being unloaded, and XK_INDEX counts from (point size) - (cyclic prefix length) up to (point size) - 1. After the cyclic prefix has been unloaded, or if the cyclic prefix length is zero, or if cyclic prefix insertion is not used, the whole frame of output data is unloaded. CPV goes Low (if present) and XK_INDEX counts from 0 up to (point size) - 1.
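The XK_INDEX sequence during unloading, with and without a cyclic prefix, can be generated as follows. The function name is hypothetical; the sketch models only the index order described above, not the timing of the signals.

```python
from typing import List


def unload_indices(point_size: int, cp_len: int = 0) -> List[int]:
    """XK_INDEX values in unload order: cyclic prefix first, then the frame."""
    if not 0 <= cp_len < point_size:
        raise ValueError("cyclic prefix length must be in [0, point size - 1]")
    prefix = list(range(point_size - cp_len, point_size))  # end of the frame
    frame = list(range(point_size))                        # whole frame
    return prefix + frame
```

For an 8-point frame with a prefix of 2, the sequence is 6, 7, 0, 1, ..., 7, so point size + cyclic prefix length samples are unloaded in total.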
Cyclic Prefix Considerations
If cyclic prefix insertion is used, more samples are unloaded from the core than are loaded. Therefore, the core cannot continuously stream frames, and must insert a gap of (cyclic prefix length) clock cycles between each frame of input data to accommodate the additional clock cycles required to unload the cyclic prefix. This is indicated by the Ready For Start (RFS) pin. RFS goes High when the core is ready for START to be asserted to begin loading the next frame of data. START is ignored except when RFS is High. RFS remains Low for (cyclic prefix length) clock cycles after RFD has gone Low, to allow for the unloading of the cyclic prefix.
[Figure: Timing for Continuous Streaming Data. Signals shown: sclr, nfft, nfft_we, fwd_inv, fwd_inv_we, scale_sch, scale_sch_we, start, xn_re, xn_im, xn_index, busy, edone, done, xk_re, xk_im, xk_index.]
[Figure: Timing for Continuous Streaming Data with Cyclic Prefix Insertion. Additional signals shown: cp_len, cp_len_we. The waveform shows xk(N-2) and xk(N-1) unloaded before xk(0).]
[Figure 10: Timing for Non-Continuous Data Stream. Frames are loaded, processed, and unloaded in sequence. Note: All transitions are synchronous with the rising edge of the clock.]
[Figure 11: Beginning of a Data Frame (Input Data Timing = "3 clock cycle offset"). Signals shown: sclr, nfft, nfft_we, fwd_inv, fwd_inv_we, scale_sch, scale_sch_we, cp_len, cp_len_we, start, xn_re, xn_im, xn_index, busy, edone, done.]
Timing for Radix-4, Burst I/O, Radix-2, Burst I/O, and Radix-2 Lite, Burst I/O Architectures
Setting Up and Starting the Transform
The START signal begins the data loading phase, which leads directly to the calculation phase. START is ignored except when the core can begin loading a frame, i.e., when the core is idle, or in the last cycle of calculation (bit-reversed output) or unloading (natural order output).
Applying Data
Data is applied in a contiguous burst. The point at which data input should start relative to the START pulse is determined by the Input Data Timing parameter in the GUI. If "No offset" is selected for the Input Data Timing parameter, input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive on the same cycle as the XN_INDEX it matches. The first data sample should therefore be applied as soon as RFD (Ready For Data) goes High, such that the first sample pair is read into the core on the first transition of XN_INDEX. If "3 clock cycle offset" is selected for the Input Data Timing parameter, input data (XN_RE, XN_IM) corresponding to a given XN_INDEX should arrive three clock cycles later than the XN_INDEX it matches (see Figure 11). In this way, XN_INDEX can be used to address external memory or a frame buffer storing the input data. RFD remains High with XN_INDEX during the loading phase and indicates that data can be input.
Data Processing
BUSY goes High while the core is calculating the transform. DONE goes High when the calculation is complete. EDONE goes High one cycle before that, i.e., during the last cycle of the calculation phase.
Data Output
After the data is loaded and processed, two options are available to unload the data:
Natural Order: If natural order output is selected, UNLOAD should be asserted to output the data (see the natural order unload figure, or its cyclic prefix variant if cyclic prefix insertion is used). During the unloading phase, while valid output results are present on XK_RE/XK_IM, DV (Data Valid) is High. During unloading, XK_INDEX corresponds to the XK_RE/XK_IM being presented. If cyclic prefix insertion is used, the cyclic prefix is unloaded first. The cyclic prefix valid (CPV) output goes High to indicate that the cyclic prefix is being unloaded, and XK_INDEX counts from (point size) - (cyclic prefix length) up to (point size) - 1. After the cyclic prefix has been unloaded, or if the cyclic prefix length is zero, or if cyclic prefix insertion is not used, the whole frame of output data is unloaded. CPV goes Low (if present) and XK_INDEX counts from 0 up to (point size) - 1. UNLOAD can be asserted at any time from when EDONE goes High. UNLOAD is ignored except when the core can begin unloading. In addition to using pulses, START and UNLOAD can be tied High (Figure 14). In this case, the core continuously loads, processes, and unloads data. The figure is intended to show the timing of entire frames. It does not show the small skews between signals which occur at the start of frames, and it does not show the length of each phase of the transform to scale. The processing time is much longer than the time required to input or output a frame.
Bit/Digit-Reversed Order: If bit/digit-reversed output order is selected, the user can assert START again (Figure 15). While the next frame of data is loaded, the results are output at the same time. START can be asserted at any time from when EDONE goes High. If START is tied High, the core continuously loads/unloads then processes, loads/unloads then processes, and so on (Figure 16).
DV remains High during data unloading in both cases.
There is a latency of several clock cycles after triggering the unload with UNLOAD or START before output data on XK_RE/XK_IM is presented. This latency varies as a function of several core parameters, so output data is qualified by DV (Data Valid) and XK_INDEX, which should be treated as the handshake for output data.
[Figure: Unload Output Results in Natural Order. Signals shown: sclr, nfft, nfft_we, fwd_inv, fwd_inv_we, scale_sch, scale_sch_we, start, xn_re, xn_im, xn_index, unload, busy, edone, done, xk_re, xk_im, xk_index.]
[Figure: Unload Output Results in Natural Order with Cyclic Prefix Insertion. Additional signals shown: cp_len, cp_len_we. The waveform shows xk(N-2) and xk(N-1) unloaded before xk(0).]
[Figure 14: Timing for Burst I/O Solutions with Natural Order Output. Frames are loaded, processed, and unloaded in sequence. Note: All transitions are synchronous with the rising edge of the clock.]
[Figure 15: Unload Results in Bit/Digit Reversed Order (Input Data Timing = "3 clock cycle offset"). Output samples appear in digit-reversed order.]
[Figure 16: Continuous Processing with Bit/Digit Reversed Order (Input Data Timing = "3 clock cycle offset"). The digit-reversed output of the previously entered frame is unloaded while the next input data frame is loaded.]
Known Device-Specific Constraints
This section details issues which may be encountered when mapping the core to a particular device. In many cases, it is possible to work around these issues by adjusting the core configuration, without having to alter the target FPGA device.
Spartan-3 FPGA Constraints
Explanation
In these device architectures, the multiplier site adjacent to the location of a 512x36 block RAM component must remain free because of interconnect resource sharing between multipliers and block RAMs. This means that an adjacent block RAM can only be used at a reduced width when the multiplier is used.
Error Message
During place and route, the Place tool generates a message similar to the following:
ERROR:Place:341 design contains Block components that configured 512x36 Block RAMs Multiplier components. Multiplier site adjacent location 512x36 Block component must remain free because resource sharing. Therefore device must have least Multiplier sites this design fit. current device only Multiplier sites.
Other placer errors may also be present.
Solution
There are a number of solutions to this issue:
Reduce the input data width and/or phase factor width far enough to allow adjacent block RAMs and multipliers to be used.
If the FFT uses the Pipelined, Streaming I/O architecture, reduce the value of the Number of Stages Using Block RAM parameter to reduce the number of block RAMs required. This would increase the number of slices used by the core.
If the FFT uses a Burst I/O architecture, use distributed RAM for data or phase factor memory, or use hybrid memory optimization (if available), to reduce the number of block RAMs required. This would increase the number of slices used by the core.
If an unscaled FFT is implemented, utilize scaled or block floating-point arithmetic instead, to reduce the output width far enough to allow adjacent block RAMs and multipliers to be used.
Use a larger device with more block RAM and multiplier components.
Spartan-3A DSP FPGA Constraints
Explanation
Spartan-3A DSP devices split the left-most and right-most XtremeDSP slice columns to accommodate clock tiles. The complex multipliers in the FFT core use dedicated cascade routing between XtremeDSP slices to enable high performance and reduce power consumption. The cascade routing cannot cross the clock tile in these particular columns. In densely-packed devices where many XtremeDSP slices have been used, the placer may have no option but to attempt to place cascaded XtremeDSP slices in these split columns, which is not possible. This may occur for multichannel Burst I/O FFTs or large Pipelined, Streaming I/O FFTs.
Error Message
During place and route, the Place tool generates a message similar to the following:
WARNING:Place:119 Unable find location. DSP48A component blk00000003/blk00002ec4 placed. WARNING:Place:119 Unable find location. DSP48A component blk00000003/blk00002ec9 placed. WARNING:Place:119 Unable find location. DSP48A component blk00000003/blk00002ec8 placed. Comps belong structure: Multiplier Cascade number instance names listed> ERROR:Place:120 There were enough sites place selected components.
Solutions
If the option to optimize complex multipliers for speed using XtremeDSP slices has been checked, un-checking this option will use fewer XtremeDSP slices, and may permit packing into the device. This may impact the maximum achievable clock frequency.
If the option to optimize butterflies using XtremeDSP slices has been checked, un-checking this option will free XtremeDSP slice locations, which may allow placement to succeed. This may impact the maximum achievable clock frequency.
Reduce the data or phase factor widths until the number of XtremeDSP slices is reduced. Because the phase factor width is increased internally, reducing it far enough allows a smaller complex multiplier architecture to be utilized. This will not impact the maximum achievable clock frequency (and may improve it), but yields a small reduction in data precision.
Use a larger device. The left-most and right-most columns in the XC3SD1800A device are shorter (fewer XtremeDSP slices) than the equivalent columns in the XC3SD3400A device. See the XtremeDSP DSP48A user guide (UG431) listed in References for further details on Spartan-3A XtremeDSP slices.
Performance and Resource Usage
The following tables list resource usage and transform time for selected parameters. This core does not use placement constraints, allowing Place and Route full flexibility. The slice count, block RAM count, and XtremeDSP slice count are listed. The maximum clock frequency is listed with the transform latency. The latency is measured from asserting the START input to the last sample of output data coming out of the core, assuming that the UNLOAD input is asserted as soon as possible, if present. The following device architectures are represented:
"Virtex-6 FPGA Family"
"Virtex-5 FPGA Family"
"Spartan-6 Family"
"Spartan-3A Family"
The maximum clock frequency for each test case was determined iteratively. For the determination of maximum frequency, the core was generated with double registers on each input and output. The inner registers are directly connected to the core and are on the core clock, whereas the outer registers are on a separate clock. This ensures that all paths in the core are included in the timing constraint without artificially distorting the design in the chip. The slowest speed grade was used for each family. The parameters used were as follows: high high
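The tables report latency both in clock cycles and in microseconds at the maximum achievable clock frequency. The relationship between the two can be sketched as below. The helper names and the 400 MHz clock are illustrative assumptions, not values taken from the tables.

```python
def latency_us(cycles: int, clock_mhz: float) -> float:
    """Latency in microseconds for a cycle count at a clock given in MHz.

    One MHz is one cycle per microsecond, so the division yields microseconds.
    """
    return cycles / clock_mhz


def implied_clock_mhz(cycles: int, latency_in_us: float) -> float:
    """Clock frequency in MHz implied by a cycle count and a latency in microseconds."""
    return cycles / latency_in_us


cycles = 12453       # a latency-in-cycles figure of the kind listed in the tables
clock_mhz = 400.0    # hypothetical clock frequency, for illustration only
print(f"{latency_us(cycles, clock_mhz):.2f} us")  # prints "31.13 us"
```

The second helper runs the conversion in reverse, which is useful when a table row gives both latency figures but the clock frequency column is hard to read.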
The maximum achievable clock frequency and the resource counts may also be affected by other tool options, additional logic in the FPGA device, using a different version of Xilinx tools, and other factors. Improved performance or resource usage may be achieved if you apply an area group, or by using arguments such as "-lc area". Consult the Xilinx 11.2 software documentation for more details on available options. When comparing performance and resource usage with Fast Fourier Transform v5.0, note the options used in the arguments above, which lead to higher slice counts but improved performance.
Virtex-6 FPGA Family
The table shows performance and resource usage numbers for Virtex-6 FPGAs. A range of cores is shown for several typical applications: Baseband for 3GPP LTE, Baseband for OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid memory is not used. The performance and resource usage numbers were produced using 11.2 software, with speed file version "PREVIEW 0.63 2009-04-27".
Table Virtex-6 FPGA Family Performance Resource Utilization
Stages Using Block Clock Frequency Cyclic Prefix Insertion Optimize Speed
Latency (cycles)
12453 12473 26804 26826 12453 12453 26804 26804 7364 7354 15575 15564 7364 7364 7364 15575 15575 15575 1652 1670 1670
Phase Factor Width
Variable Point Size
Output Ordering
Block RAMs
XtremeDSP Slices
Rounding Mode
Implementation
XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T
1841 2482 1951 2602 1018 1099 1604 2789 4542 1732 3003 4940
1031 1033 1084 1086 2603 4699 2674 4794 1204 1090 1265 1153 1978 3526 6622 2077 3701 6949
28.89 28.03 69.08 70.59 30.37 29.86 69.08 70.54 17.40 18.62 40.14 39.40 16.55 17.96 17.96 38.65 37.99 45.94 3.91 3.81 4.23
XC6VLX130T 1847 XC6VLX240T 4091 XC6VLX130T 1961 XC6VLX240T 4218 XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T 1029 1106
Baseband 3GPP
XC6VLX130T 1617 XC6VLX130T 2807 XC6VLX240T 5899 XC6VLX130T 1743 XC6VLX130T 3016 XC6VLX240T 6312 XC6VLX75T XC6VLX75T XC6VLX75T
OFDM
Latency (10)
Input Data Width
Memory Type
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point Size
Channels
LUTs
Table Virtex-6 FPGA Family Performance Resource Utilization (Cont'd)
Stages Using Block Clock Frequency Cyclic Prefix Insertion Optimize Speed
Latency (cycles)
2179 2167 2179 2179 3207 3203 3216 2181 2171 2171 3199 3195 3223 3223 3225 12445 12441 24748 24746 5800 24758 24745 49341 49327 1411 5529 22703 2225 9427 41205 3169 14441 65649 3209 12445
Phase Factor Width
Variable Point Size
Output Ordering
Block RAMs
XtremeDSP Slices
Rounding Mode
Implementation
XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T
3196 3909 3930 3092 3319 3361 4177 3273 2286 3232 2394 2460 4703 4727 5100 4109 4279 4639 4985 6278 7918 6839 8699 3211 3359 3443 2005 2074 2141 1862 1928 2009 3524 4342 5201
3126 3865 3834 3003 3224 3265 4083 3213 2221 3189 2297 2399 4585 4623 4995 4006 4202 4492 4837 6185 7838 6751 8585 3164 3299 3387 1981 2046 2099 1843 1901 1967 3444 4236 5082
4848 5331 4885 4829 4991 5431 5949 4844 3307 4173 3434 3892 7335 7919 8122 6207 6953 6908 7936 1224 9394 10738
5.31 4.87 5.06 5.41 7.58 7.81 7.98 4.90 4.88 5.13 7.30 7.29 9.13 8.62 8.96 36.71 32.06 63.78 81.67 13.03 70.74 79.82 162.84 170.09 3.71 14.00 58.51 5.00 22.29 95.60 7.49 36.56 160.12 2.02 7.96 33.28
Scanners
(11)
10160 11562 4282 4432 4533 2618 2687 2743 2511 2574 2621 5161 6516 7962
Test
Latency (10)
Input Data Width
Memory Type
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point Size
Channels
LUTs
Table Virtex-6 FPGA Family Performance Resource Utilization (Cont'd)
Stages Using Block Clock Frequency Cyclic Prefix Insertion Optimize Speed
Latency (cycles)
3446 3451 3446 3446 3446 131256 131256 98497
Phase Factor Width
Variable Point Size
Output Ordering
Block RAMs
XtremeDSP Slices
Rounding Mode
Implementation
XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T XC6VLX75T
2031 1617 2096 2392 2460 2543 2617 8376
2023 1593 2087 2375 2451 2522 2591 8257
2450 2144 2474 2876 2900 2833 2845
8.60 7.76 8.15 8.40 8.40 364.6 371.83 386.26
Radar
10698
1. Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O.
2. Scaling types: scaled; unscaled; block floating point; single precision floating point.
3. Rounding modes: convergent rounding; truncation.
4. Output ordering: Natural Order; Bit/Digit Reversed Order.
5. Memory types: block RAM, distributed RAM. Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
6. Optimize for Speed using XtremeDSP slices for both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
7. Virtex-6 FPGAs have block RAMs that are packed in pairs to form larger block RAMs. The tools report the number of block RAMs of both sizes, which may not match the number of block RAMs given here.
8. Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tools options, and other releases of Xilinx implementation tools. Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
9. Latency in clock cycles for the largest transform size.
10. Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
11. Ultrasound.
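The notes say that the listed clock frequencies should be de-rated for clock source jitter but do not give a method. One common approach, shown here purely as an assumption, is to add the jitter to the clock period and convert back to a frequency. The function name and the example figures are hypothetical.

```python
def derated_clock_mhz(clock_mhz: float, jitter_ps: float) -> float:
    """De-rate a clock frequency by lengthening its period by the jitter.

    This is one possible interpretation of "de-rated by an amount
    appropriate to the clock source jitter specification"; the source
    does not prescribe a formula.
    """
    period_ns = 1000.0 / clock_mhz          # clock period in nanoseconds
    jitter_ns = jitter_ps / 1000.0          # jitter converted to nanoseconds
    return 1000.0 / (period_ns + jitter_ns)


# Hypothetical example: a 400 MHz result and 100 ps of clock jitter.
print(f"{derated_clock_mhz(400.0, 100.0):.1f} MHz")  # prints "384.6 MHz"
```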
Virtex-5 FPGA Family
The table shows performance and resource usage numbers for Virtex-5 FPGAs. A range of cores is shown for several typical applications: Baseband for 3GPP LTE, Baseband for OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid memory is not used. The performance and resource usage numbers were produced using 11.2 software, with speed file version "PRODUCTION 1.65 2009-04-27".
Latency (10)
Input Data Width
Memory Type
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point Size
Channels
LUTs
Table Virtex-5 Family Performance Resource Utilization
Latency (Clock Cycles) Stages Using Block Clock Frequency Configurable Point Size Cyclic Prefix Insertion Optimize speed Phase Factor Width Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation
Input Data Width
Memory Type
XC5VSX95T 1179 XC5VSX95T 1181 XC5VSX95T 1287 XC5VSX95T 1288 XC5VSX95T 2772 XC5VLX330 4889
1906 3302 2034 2776 1114 1205 1799 3169 5909 1943 3419 6371
1031 1033 1084 1086 2603 4699 2674 4794 1271 1100 1333 1164 2102 3764 7088 2202 3940 7416
12453 12473 26804 26826 12453 12453 26804 26804 7364 7364 15575 15564 7364 7364 7364 15575 15575 15575 1652 1670 1670
30.90 31.58 70.54 62.24 33.30 42.94 74.46 92.43 20.12 17.66 41.64 37.96 21.22 21.72 26.68 43.26 46.91 68.61 4.35 4.47 4.47
Baseband 3GPP
XC5VSX95T 2906 XC5VLX330 5026
XC5VSX95T 1447 XC5VSX95T 1253 XC5VSX95T 1539 XC5VSX95T 1336 XC5VSX95T 2318 XC5VSX95T 4061 XC5VLX330 7544
XC5VSX95T 2475 XC5VSX95T 4316 XC5VLX330 8034
OFDM
XC5VSX95T 1142 XC5VSX95T
XC5VSX95T 1001
Latency (10)
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point size
Channels
LUTs
Table Virtex-5 Family Performance Resource Utilization (Cont'd)
Latency (Clock Cycles) Stages Using Block Clock Frequency Configurable Point Size Cyclic Prefix Insertion Optimize speed Phase Factor Width Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation
Input Data Width
Memory Type
XC5VSX95T 5034 XC5VSX95T 5495 XC5VSX95T 5653 XC5VSX95T 4904 XC5VSX95T 5170 XC5VSX95T 5595 XC5VSX95T 6188 XC5VSX95T 4997 XC5VSX95T 3468 XC5VSX95T 4625 XC5VSX95T 3579 XC5VSX95T 4048 XC5VSX95T 7574 XC5VSX95T 8141 XC5VSX95T 8551 XC5VSX95T 6378 XC5VSX95T 7156 XC5VSX95T 7168 XC5VSX95T 8220 XC5VSX95T 1316 XC5VSX95T 9786
3936 4600 4742 3708 4037 4115 4854 3853 2712 3791 2797 2972 5924 5971 6287 4890 5154 5440 5855 7105
4590 5077 4627 4571 4733 5151 5494 4447 3129 4222 3256 3692 6977 7539 7742 5845 6545 6492 7460 1224 8646 9885
2179 2167 2179 2179 3207 3203 3216 2181 2171 2171 3199 3195 3223 3223 3225 12445 12441 24748 24746 5800 24758 24745 49341 49327 1411 5529 22703 2225 9427 41205 3169 14441 65649 3209 12445
5.06 5.70 5.52 5.06 7.44 7.95 8.79 5.23 4.88 5.93 7.19 7.41 8.48 8.81 9.14 31.51 34.56 74.54 79.83 14.15 77.86 77.81 193.49 188.27 4.16 14.78 64.31 5.73 25.21 110.17 8.47 42.60 193.65 2.24 8.77 34.57
Scanners
(11)
XC5VSX95T 11178 8649 XC5VSX95T 10635 7771
9337
XC5VSX95T 11979 9434 10618 XC5VSX95T 4434 XC5VSX95T 4621 XC5VSX95T 4673 XC5VSX95T 2751 XC5VSX95T 2878 XC5VSX95T 2936 XC5VSX95T 2641 XC5VSX95T 2724 XC5VSX95T 2793 XC5VSX95T 5377 XC5VSX95T 6715 XC5VSX95T 8242 2771 2900 3034 1878 1942 2059 1698 1721 1886 4205 5105 6117 4243 4401 4494 2626 2697 2757 2511 2574 2621 4983 6209 7512
Test
Latency (10)
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point size
Channels
LUTs
Table Virtex-5 Family Performance Resource Utilization (Cont'd)
Latency (Clock Cycles) Stages Using Block Clock Frequency Configurable Point Size Cyclic Prefix Insertion Optimize speed Phase Factor Width Output Ordering Block RAMs XtremeDSP Slices Rounding Mode Implementation
Input Data Width
Memory Type
XC5VSX95T 2813 XC5VSX95T 2259 XC5VSX95T 2887 XC5VSX95T 3158 XC5VSX95T 3297 XC5VSX95T 3332 XC5VSX95T 3369
2220 1451 2325 2532 2637 2753 2832
2614 2134 2614 3022 3022 2966 2978
3446 3451 3446 3446 3446 131256 131256 98497
9.93 8.89 9.93 9.93 10.64 463.80 423.41 447.71
Radar
XC5VSX95T 11687 9175
9727
1. Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O.
2. Scaling types: scaled; unscaled; block floating point; single precision floating point.
3. Rounding modes: convergent rounding; truncation.
4. Output ordering: Natural Order; Bit/Digit Reversed Order.
5. Memory types: block RAM, distributed RAM. Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
6. Optimize for Speed using XtremeDSP slices for both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
7. Virtex-5 FPGAs have block RAMs that are packed in pairs to form larger block RAMs. The tools report the number of block RAMs of both sizes, which may not match the number of block RAMs given here.
8. Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tools options, and other releases of Xilinx implementation tools. Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
9. Latency in clock cycles for the largest transform size.
10. Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
11. Ultrasound.
Spartan-6 Family
The table shows performance and resource usage numbers for Spartan-6 FPGAs. A range of cores is shown for several typical applications: Baseband for 3GPP LTE, Baseband for OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. Some rows in the table are grayed-out to indicate that these cores would not fit on the device due to FPGA resource requirements (typically insufficient pins to route core signals outside the device). None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid memory is not used. The performance and resource usage numbers were produced using 11.2 software, with speed file version "ADVANCED 0.94 2009-04-27".
Latency (10)
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point size
Channels
LUTs
Table Spartan-6 Family Performance Resource Utilization Latency (clock cycles)
12453 12473 26804 26826 12453 26804 7364 7364 15575 15564 7364 7364 15575 15575 1652 1670 1670
clock frequency
Stages Using Block
Configurable Point Size
Cyclic Prefix Insertion
Optimize Speed
Phase Factor Width
Output Ordering
Rounding Mode
XtremeDSP slices
Implementation
Block RAMs
Input Data Width
XC6SLX150T XC6SLX150T XC6SLX150T XC6SLX150T
1784
1032 1034 1084 1086 2610
51.89 52.85 119.66 117.66 63.54
XC6SLX150T 1835 XC6SLX150T 2458 XC6SLX150T 1023 XC6SLX150T
Baseband 3GPP
1546
2674
127.64
1000 1077 1562 2699
1204 1090 1265 1153 1978 3526
30.18 31.20 68.31 71.07 32.30 34.74
XC6SLX150T 1109 XC6SLX150T
XC6SLX150T 1589 XC6SLX150T 2745 XC6SLX150T 1716 XC6SLX150T 3466 XC6SLX150T XC6SLX150T XC6SLX150T
1681 2672
2078 3702
71.12 74.17
OFDM
7.54 7.63 7.32
Latency (µs) (10)
Memory Type
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point Size
Channels
LUTs
Table Spartan-6 Family Performance Resource Utilization (Cont'd) Latency (clock cycles)
2203 2215 2203 2203 3231 3227 3240 2202 2171 2159 3199 3204 3251 3247 3253 12475 12471 24784 24785 5854
clock frequency
Stages Using Block
Configurable Point Size
Cyclic Prefix Insertion
Optimize Speed
Phase Factor Width
Output Ordering
Rounding Mode
XtremeDSP slices
Implementation
Block RAMs
Input Data Width
XC6SLX150T 3574 XC6SLX150T 5438 XC6SLX150T 4255 XC6SLX150T 3446 XC6SLX150T 3646 XC6SLX150T 3769 XC6SLX150T 4451 XC6SLX150T 3525 XC6SLX150T 2315 XC6SLX150T 2789 XC6SLX150T 2388 XC6SLX150T 2617 XC6SLX150T 5031 XC6SLX150T 5069 XC6SLX150T 5455 XC6SLX150T 4486 XC6SLX150T 4793 XC6SLX150T 5093 XC6SLX150T 5563 XC6SLX150T
3414 5355 4115 3279 3493 3619 4309 3398 2189 2693 2273 2485 4875 4860 5224 4328 4628 4865 5363
5425 7051 5462 5407 5568 6106 6527 5305 3296 3633 3423 4053 7904 8382 8585 6927 7827 7779 9029 1296
13.35 13.42 17.62 18.67 22.91 19.56 18.84 13.35 9.91 10.64 17.02 22.72 31.87 27.52 29.57 113.41 83.70 242.98 242.99 23.99
scanners
(11)
Latency (µs) (10)
Memory Type
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point Size
Channels
LUTs
Table Spartan-6 Family Performance Resource Utilization (Cont'd) Latency (clock cycles)
24797 24819 49380 49401 1435 5559 22739 2273 9487 41277 3175 14447 65655 3236 12481 3446 3451 3446 3466 3466 98515
clock frequency
Stages Using Block
Configurable Point Size
Cyclic Prefix Insertion
Optimize Speed
Phase Factor Width
Output Ordering
Rounding Mode
XtremeDSP slices
Implementation
Block RAMs
Input Data Width
XC6SLX150T 6831 XC6SLX150T 10030 XC6SLX150T 7414
6621 9878 7205
10490 13272 11248
243.11 166.57 525.32 449.10 10.71 41.49 241.90 13.22 46.73 239.98 14.50 63.36 381.72 5.47 27.42 122.36 16.98 16.28 17.58 17.07 17.68
XC6SLX150T 10700 10579 14104 XC6SLX150T 3343 XC6SLX150T 3513 XC6SLX150T 3609 XC6SLX150T 2061 XC6SLX150T 2136 XC6SLX150T 2206 XC6SLX150T 1898 XC6SLX150T 1943 XC6SLX150T 2016 XC6SLX150T 3749 XC6SLX150T 4703 XC6SLX150T 5642 XC6SLX150T 1963 XC6SLX150T 1656 XC6SLX150T 2101 XC6SLX150T 2544 XC6SLX150T 2601 XC6SLX150T 2512 XC6SLX150T 2578 XC6SLX150T 8451 3278 3433 3532 1993 2063 2118 1834 1878 1937 3623 4506 5434 1940 1603 2059 2496 2557 2464 2524 8303 4710 4869 4970 2770 2838 2894 2610 2671 2718 5667 7227 8890 2450 2142 2474 3002 3026 2833 2845 10525
Test
Radar
131256 1050.05 131256 979.52 965.83
1. Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O.
2. Scaling types: scaled; unscaled; block floating point; single precision floating point.
3. Rounding modes: convergent rounding; truncation.
4. Output ordering: Natural Order; Bit/Digit Reversed Order.
5. Memory types: block RAM, distributed RAM. Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
6. Optimize for Speed using XtremeDSP slices for both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
7. Spartan-6 FPGAs have block RAMs that are packed in pairs to form larger block RAMs. The tools report the number of block RAMs of both sizes, which may not match the number of block RAMs given here.
8. Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tools options, and other releases of Xilinx implementation tools. Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
9. Latency in clock cycles for the largest transform size.
10. Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
11. Ultrasound.
Latency (µs) (10)
Memory Type
Scaling Type
LUT/FF Pairs
Application
Xilinx Part
Point Size
Channels
LUTs
Spartan-3A Family
The table shows performance and resource usage numbers for Spartan-3A FPGAs. A range of cores is shown for several typical applications: Baseband for 3GPP LTE, Baseband for OFDM, scanners, Ultrasound, Test and measurement, and Radar. The parameters for each core are shown in the table. Some rows in the table are grayed-out to indicate that these cores would not fit on the device due to FPGA resource requirements (typically insufficient pins to route core signals outside the device). None of the optional pins (CE, SCLR, OVFLO) are used. Hybrid memory is not used. The performance and resource usage numbers were produced using 11.2 software, with speed file version "PRODUCTION 1.33 2009-04-27".
Table Spartan-3A Family Performance Resource Utilization Latency (clock cycles)
12453 12473 26804 26826 12453 26804 7364 7354 15575 15564 7364 7364 15575 15575 1679 1670 1670
clock frequency
Stages Using Block
Configurable Point Size
Cyclic Prefix Insertion
Optimize Speed
Phase Factor Width
Output Ordering
Rounding Mode
XtremeDSP slices
Implementation
Input Data Width
Block RAMs
Memory Type
XC3SD3400A XC3SD3400A XC3SD3400A XC3SD3400A
1004 1160 1130 2233
1062 1064 1117 1119 2640
61.34 61.44 155.84 149.03 66.24
XC3SD3400A 1710
Baseband 3GPP
XC3SD3400A 1805
2395
2718
136.76
XC3SD3400A XC3SD3400A XC3SD3400A XC3SD3400A
1397 1198 1515 1280 2245 3874
1273 1160 1339 1227 2049 3597
36.28 36.23 76.72 73.42 39.17 40.91
XC3SD3400A 1409 XC3SD3400A 2468
XC3SD3400A 1501 XC3SD3400A 2618
2412 4144
2155 3779
82.85 86.53
OFDM
XC3SD3400A 1197 XC3SD3400A XC3SD3400A
2063
1433
7.92 7.88 8.52
Latency (µs)
Scaling Type
Application
Xilinx Part
Point Size
Channels
Slices
LUTs
Table Spartan-3A Family Performance Resource Utilization (Cont'd) Latency (clock cycles)
2203 2215 2203 2203 3231 3227 3240 2202 2171 2159 3199 3204 3251 3247 3253 12475 12471 5854
clock frequency
Stages Using Block
Configurable Point Size
Cyclic Prefix Insertion
Optimize Speed
Phase Factor Width
Output Ordering
Rounding Mode
XtremeDSP slices
Implementation
Input Data Width
Block RAMs
Memory Type
XC3SD3400A 3408 XC3SD3400A 4440 XC3SD3400A 4373 XC3SD3400A 3204 XC3SD3400A 3494 XC3SD3400A 3600 XC3SD3400A 4144 XC3SD3400A 3277 XC3SD3400A 2151 XC3SD3400A 2307 XC3SD3400A 2230 XC3SD3400A 2445 XC3SD3400A 4844 XC3SD3400A 4884 XC3SD3400A 5280 XC3SD3400A 4303 XC3SD3400A 4561
5076 7339 7157 4521 5175 4997 6147 4654 3263 3693 3346 3443 7321 6891 7606 6261 6239
5167 6843 5358 5149 5310 5826 6072 4908 3118 3466 3245 3853 7546 8002 8412 6565 7419
10.85 13.42 16.44 10.39 15.92 17.16 16.53 11.23 10.24 11.02 14.61 15.11 18.90 19.68 19.72 61.45 69.28
scanners
(10)
XC3SD3400A
1158
1323
31.14
Latency (µs)
Scaling Type
Application
Xilinx Part
Point Size
Channels
Slices
LUTs
Table Spartan-3A Family Performance Resource Utilization (Cont'd) Latency (clock cycles)
24819 1435 5559 22739 2273 9487 41277 3175 14447 65655 3236 3466 3451 3446 3466 3466
clock frequency
Stages Using Block
Configurable Point Size
Cyclic Prefix Insertion
Optimize Speed
Phase Factor Width
Output Ordering
Rounding Mode
XtremeDSP slices
Implementation
Input Data Width
Block RAMs
Memory Type
XC3SD3400A 1691 XC3SD3400A 1529 XC3SD3400A 1764 XC3SD3400A 2098 XC3SD3400A 2170 XC3SD3400A 2085 XC3SD3400A 2156 2737 2069 2873 3402 3538 3476 3617 2441 2135 2465 3017 3040 2816 2830 18.44 18.36 20.03 21.01 21.01 841.38 841.38 XC3SD3400A 3170 XC3SD3400A 3298 XC3SD3400A 3367 XC3SD3400A 1893 XC3SD3400A 1968 XC3SD3400A 2021 XC3SD3400A 1682 XC3SD3400A 1746 XC3SD3400A 1785 XC3SD3400A 3566 XC3SD3400A 4439 4157 4322 4489 2526 2654 2795 2128 2200 2381 5200 6276 4679 4845 4938 2786 2883 2957 2606 2699 2752 5504 6932 7.97 33.69 126.33 11.60 50.46 219.56 16.89 78.85 364.75 5.25 18.81 XC3SD3400A 8047 12691 12600 159.10
Test
Radar
131256 131256
1. Implementations: Pipelined, Streaming I/O; Radix-4, Burst I/O; Radix-2, Burst I/O; Radix-2 Lite, Burst I/O.
2. Scaling types: scaled; unscaled; block floating point; single precision floating point.
3. Rounding modes: convergent rounding; truncation.
4. Output ordering: Natural Order; Bit/Digit Reversed Order.
5. Memory types: block RAM, distributed RAM. Applies to data and phase factor storage in Burst I/O architectures, and to the output reorder buffer in the Pipelined, Streaming I/O architecture.
6. Optimize for Speed using XtremeDSP slices for both Complex Multipliers (4-multiplier structure) and Butterfly Arithmetic.
7. Area and maximum clock frequencies are provided as a guide. They may vary with the amount of other logic in the FPGA device, tools options, and other releases of Xilinx implementation tools. Clock frequency does not take jitter into account and should be de-rated by an amount appropriate to the clock source jitter specification.
8. Latency in clock cycles for the largest transform size.
9. Latency in microseconds for the largest transform size, when running at the maximum achievable clock frequency.
10. Ultrasound.
Latency (µs)
Scaling Type
Application
Xilinx Part
Point Size
Channels
Slices
LUTs
Dynamic Range Characteristics
The dynamic range characteristics are shown by performing slot noise tests. First, a frame of complex Gaussian noise data samples is created. An FFT is taken to acquire the spectrum of the data. To create the slot, a range of frequencies in the spectrum is set to zero. To create the input slot noise data frame, the inverse FFT is taken, then the data is quantized to use the full input dynamic range. Because of the quantization, if a perfect FFT is done on the frame, the noise floor at the bottom of the slot is nonzero. The Input Data figures, which basically represent the dynamic range of the input format, display this. This slot noise input data frame is fed to the FFT core to see how shallow the slot becomes due to the finite precision arithmetic. The depth of the slot shows the dynamic range of the FFT. The following figures show the effect of input data width on the dynamic range. All FFTs have the same bit width for both data and phase factors. Block floating-point arithmetic is used with rounding after the butterfly. The figures show the input data slot and the output data slot for several bit widths.
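The slot noise procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the test bench used for the datasheet: the frame length, slot position, seed, and all function names are assumptions, and the final transform is a double-precision reference FFT, so the measured depth reflects input quantization only (the "Input Data" case), not the finite precision arithmetic of the core.

```python
import cmath
import math
import random


def fft(x, inverse=False):
    """Recursive radix-2 FFT, unnormalized; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def slot_noise_frame(n, slot, bits, rng):
    """Gaussian noise -> FFT -> zero the slot bins -> inverse FFT -> quantize."""
    noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    spectrum = fft(noise)
    for k in slot:
        spectrum[k] = 0j
    frame = [v / n for v in fft(spectrum, inverse=True)]
    # Scale so the largest component uses the full two's-complement range.
    peak = max(max(abs(v.real), abs(v.imag)) for v in frame)
    scale = (2 ** (bits - 1) - 1) / peak
    return [complex(round(v.real * scale), round(v.imag * scale)) for v in frame]


def slot_depth_db(frame, slot):
    """Mean power outside the slot relative to mean power inside it, in dB."""
    power = [abs(v) ** 2 for v in fft(frame)]
    inside = [power[k] for k in slot]
    outside = [p for k, p in enumerate(power) if k not in slot]
    ratio = (sum(outside) / len(outside)) / (sum(inside) / len(inside))
    return 10 * math.log10(ratio)


rng = random.Random(0)
slot = range(100, 141)  # hypothetical slot position within a 256-point frame
for bits in (8, 16):
    frame = slot_noise_frame(256, slot, bits, rng)
    print(bits, "bits: slot depth", round(slot_depth_db(frame, slot), 1), "dB")
```

Wider input data gives a deeper slot, which is the trend the Input Data figures illustrate. Replacing the reference FFT in `slot_depth_db` with a fixed-point model would show the additional loss contributed by the transform itself.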
[Figures: Input Data and Core Results slot noise plots, one pair of plots per input data width; x-axis: Bin Number]
There are several options available that also affect the dynamic range. Consider the arithmetic type used.
The following three figures display the results of using unscaled, scaled (scaling of 1/1024), and block floating point arithmetic. All three FFTs are 1024-point, Radix-4, Burst I/O transforms with 16-bit input, 16-bit phase factors, and convergent rounding.
Figure: Full-Precision Unscaled Arithmetic
Figure: Scaled (scaling of 1/N) Arithmetic
Figure: Block Floating Point Arithmetic
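For the scaled case, the overall factor of 1/1024 on a 1024-point Radix-4 transform is the product of the scaling applied at each stage. The sketch below assumes that a scaling schedule is a list of per-stage right shifts in bits, and it picks a shift of 2 bits at each of the five Radix-4 stages as one schedule that yields 1/N. The source states only the total scaling, so the schedule itself and the function name are illustrative.

```python
def total_scaling(shifts):
    """Overall scale factor for a list of per-stage right shifts (in bits)."""
    return 2.0 ** -sum(shifts)


radix4_stages = 5                # log4(1024) = 5 stages for a 1024-point transform
schedule = [2] * radix4_stages   # assumed: divide by 4 at every stage
print(total_scaling(schedule))   # 0.0009765625, that is 1/1024
```

A schedule with smaller shifts in some stages would keep more precision but risks overflow, which is the trade-off the scaled and block floating point plots compare.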
After the butterfly computation, the LSBs of the data path can be truncated or rounded. The effects of these options are shown in the two figures below. Both transforms are 1024 points with 16-bit data and phase factors using block floating-point arithmetic.
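The difference between the two options can be sketched on plain integers. This is an illustrative model, assuming two's-complement values and convergent rounding in the usual round-half-to-even sense; the function names are hypothetical and the code does not describe the internal implementation of the core.

```python
def truncate(value: int, drop_bits: int) -> int:
    """Discard the LSBs; for two's complement this rounds toward minus infinity."""
    return value >> drop_bits


def convergent_round(value: int, drop_bits: int) -> int:
    """Discard the LSBs, rounding to nearest with ties going to the even result."""
    if drop_bits == 0:
        return value
    half = 1 << (drop_bits - 1)
    remainder = value & ((1 << drop_bits) - 1)
    result = value >> drop_bits
    if remainder > half or (remainder == half and result & 1):
        result += 1
    return result


# Average error over one full period of inputs when dropping 2 LSBs.
values = range(16)
trunc_bias = sum(truncate(v, 2) - v / 4 for v in values) / len(values)
conv_bias = sum(convergent_round(v, 2) - v / 4 for v in values) / len(values)
print(trunc_bias, conv_bias)  # -0.375 0.0
```

Truncation introduces a consistent negative bias that accumulates across butterfly stages, while convergent rounding averages out to zero, which is consistent with the lower noise floor expected from the rounded case.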
Figure: Convergent Rounding
Figure: Truncation
For illustration purposes, the effect of point size on dynamic range is displayed in the following three figures. The FFTs in these figures use 16-bit input and phase factors along with convergent rounding and block floating-point arithmetic.
Figure: 64-point Transform
Figure: 2048-point Transform
Figure: 8192-point Transform
The preceding dynamic range plots show the results for the Radix-4, Burst I/O architecture. The following two figures show plots for the Radix-2, Burst I/O architecture. Both use 16-bit input and phase factors along with convergent rounding and block floating point.
Figure: 64-point Radix-2 Transform
Figure: 1024-point Radix-2 Transform
References
1. Knight and Kaiser, A Simple Fixed-Point Error Bound for the Fast Fourier Transform, IEEE Trans. Acoustics, Speech and Signal Proc., Vol. 27, pp. 615-620, December 1979.
2. Rabiner and Gold, Theory and Application of Digital Signal Processing, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1975.
3. Quang Hung Nguyen and Istvan Kollar, Limited Dynamic Range of Spectrum Analysis Due to Round-off Errors of the FFT.
4. Szolik, Kovac, and Smiesko, Influence of Digital Signal Processing on Precision of Power Quality Parameters Measurement.
5. Xilinx, Inc., XtremeDSP DSP48A for Spartan-3A DSP FPGAs User Guide, UG431.
6. Cooley and Tukey, An Algorithm for the Machine Computation of Complex Fourier Series, Mathematics of Computation, Vol. 19, pp. 297-301, April 1965.
7. Proakis and Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Second Edition, Maxwell Macmillan International, New York, 1992.
Support
Xilinx provides technical support at www.xilinx.com/support for this LogiCORE product when used as described in the product documentation. Xilinx cannot guarantee timing, functionality, or support of the product if it is implemented in devices that are not defined in the documentation, if it is customized beyond that allowed in the product documentation, or if changes are made to any section of the design labeled DO NOT MODIFY. Refer to the Release Notes Guide (XTP025) for further information on this core. It provides a link to the relevant core being designed with. For each core, there is a master Answer Record that contains the Release Notes and Known Issues list for the core being used. The following information is listed for each version of the core:
Features
Fixes
Known Issues
Ordering Information
This core may be downloaded from the Xilinx IP Center for use with the Xilinx® CORE Generator software v11.2 and higher. The Xilinx CORE Generator software is bundled with the ISE® Foundation Software packages at no additional charge. Information about additional Xilinx® LogiCORE modules is available at the Xilinx IP Center. To order Xilinx software, contact your local Xilinx sales representative.
Revision History
Date        Revision
03/28/03    Xilinx release template.
07/14/03    Modified a range of figures, inclusive.
12/11/03    Updated for v2.1 release.
05/21/04    Updated for v3.0 release.
11/11/04    Updated document to support core v3.1 release and updated performance and resource utilization tables for Virtex-II and Virtex-II Pro FPGAs. Also added performance and resource utilization tables for Virtex-4 FPGAs.
8/31/05     Updated documentation for v3.2 core release; updated performance and resource utilization tables; updated for v7.1i software.
1/11/06     Corrected table for XtremeDSP Slices.
11/30/06    Updated for v4.0 release.
02/15/07    Updated for v4.1 release.
10/10/07    Updated for v5.0 release.
09/19/08    Updated for v6.0 release.
06/24/09    Updated for v7.0 release.
Notice Disclaimer
Xilinx is providing this design, code, or information (collectively, the "Information") to you "AS-IS" with no warranty of any kind, express or implied. Xilinx makes no representation that the Information, or any particular implementation thereof, is free from any claims of infringement. You are responsible for obtaining any rights you may require for any implementation based on the Information. XILINX EXPRESSLY DISCLAIMS ANY WARRANTY WHATSOEVER WITH RESPECT TO THE ADEQUACY OF THE INFORMATION OR ANY IMPLEMENTATION BASED THEREON, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OR REPRESENTATIONS THAT THIS IMPLEMENTATION IS FREE FROM CLAIMS OF INFRINGEMENT AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Except as stated herein, none of the Information may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx.