NEW DATABASE - 350 MILLION DATASHEETS FROM 8500 MANUFACTURERS
XAPP738 XC4VFX12 KB/16 UG018 ML403 PPC405 RAMB16 - Datasheet Archive
Code Acceleration with an APU Coprocessor: a Case Study of an LPM Algorithm R XAPP738 (v1.0) February 22, 2008 Summary Contact:
Application Note: Virtex-4 FPGA Family Code Acceleration with an APU Coprocessor: a Case Study of an LPM Algorithm R XAPP738 XAPP738 (v1.0) February 22, 2008 Summary Contact: Glenn Steiner In network address routing, an IP packet is routed to a specific destination based on its IP address. Frequently, a variant of the longest prefix match (LPM) algorithm is implemented. This algorithm looks up IP addresses in a routing table and finds a forwarding path for the incoming IP address based on the longest possible match from the routing table. This application note describes both hardware and software implementations of a simple LPM algorithm, consisting of a linear search of the routing table for the best match. The linear search sequentially compares each entry in the routing table to the IP address being looked up and remembers the longest matching prefix encountered. While more sophisticated algorithms can be implemented, the linear search provides a valid comparison between the various methods of storing and accessing the routing table. The storage and retrieval of the routing table data is implemented in software and hardware in four different ways: · Processor Local Bus (PLB) attached Double Data Rate (DDR) memory · PLB-attached block RAM · On-chip memory (OCM) attached block RAM · Auxiliary Processor Unit (APU) attached block RAM This application note compares the speeds of these software implementations. In addition, to accelerate the lookup, a coprocessor is implemented in the VirtexTM-4 FPGA to carry out each lookup as an APU user-defined instruction. This hardware implementation achieves a substantial improvement over the software implementations. Introduction An LPM algorithm is an essential part of a router. This application note describes hardware and software LPM implementations that use the APU interface. It also describes how the APU interface can be used as an alternative to DDR and OCM memories. This application note compares the performance of LPM lookups performed on both DDR- and APU-attached memories as well as on a custom APU-attached coprocessor used to accelerate lookups. This application note shows the benefits of using the APU interface to communicate with memory, especially when used in conjunction with application-specific hardware. The LPM algorithm provides a typical application of what users need in their own designs. This application note defines the LPM algorithm and the APU to Block RAM Fabric Coprocessor Module (FCM). It describes the implemented reference design and provides its performance results. Key Concepts Longest Prefix Match Algorithm The longest prefix match algorithm comprises a number of algorithms used to find the closest match of an IP address in a routing table. The input IP address that is being compared to the entries in the table is known as the search key. The 32-bit IP address specifies the destination of an IP packet in internet protocol routing. There can be more than one way for an IP packet to reach its final destination in a system. Thus, routers attempt to select the best intermediate destination (next hop) from any number of possible next hops. The LPM algorithm determines this next hop by performing a lookup in a routing table. © 2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. PowerPC is a trademark of IBM Corp. and is used under license. All other trademarks are the property of their respective owners. All specifications are subject to change without notice. XAPP738 XAPP738 (v1.0) February 22, 2008 www.xilinx.com 1 R Key Concepts In the linear search version of the LPM algorithm, every entry of the table is examined in sequential order. The only input is the search key, which is compared against each entry of the table. An entry becomes the new longest match if it matches the search key and has a prefix that is at least as long as any previous matches. When every entry in the table has been searched, the index pointing to the IP address with the longest matching prefix is known and can be used to fetch the entry, when needed. This index to the longest matching prefix entry is the output of the search. Routing Table Figure 1 shows the format of an entry in the routing table. The text in parentheses indicates the code name of each 32-bit field. X-Ref Target - Figure 1 32 32 IP Address (ip) Mask (mask) Valid 1 31 Destination Port up to 31 bits (valid_port) 32 Next Hop IP Address (next_hop) Each entry takes up 4 words in memory for a total of 128 bits. X738_01_101707 Figure 1: Routing Table Entry Format Each entry in the routing table has the same format. The first word of the entry is the IP address (ip). The search key (ip_to_find), which consists of a 32-bit IP address, is compared against this first word of the entry. The mask field (mask) is a 32-bit binary string starting with 1s at the most-significant bits and terminating in 0s, if any. Bits that are set to 1 are used in the comparison. For example, if a mask is composed of 28 1s followed by 4 0s, the 28 most-significant bits of the IP address and the search key are used in the comparison. The mask can only contain sequential 1s followed by 0s, if any. The most-significant bits are 1s until the first 0, at which point the remaining bits must also be 0s. In this application note, these values have been chosen as the mask: · 0xFFFFFFFF (all bits of the IP address word and the search key must match) · 0xFFFFFFF0 (the most-significant 28 bits of the IP address word and the search key must match) · 0xFFFFFF00 (the most-significant 24 bits of the IP address word and the search key must match) · 0xFFFFF000 (the most-significant 20 bits of the IP address word and the search key must match) · 0xFFFF0000 (the most-significant 16 bits of the IP address word and the search key must match) The mask ultimately determines the longest prefix. Even if an entry has an exact match to the search key, the length of the entry's mask determines how many bits are compared. The longest possible match is a 32-bit match with a mask of 0xFFFFFFFF. The third word in the routing table entry contains the Valid bit and the destination port number (valid_port). The Valid bit must be set for the entry to be considered valid and included in a lookup. The remaining 31 bits of this field specify a destination port for the matching IP address of the entry. In typical routers, much fewer than 31 bits are needed for this field because the number of physical ports on a router is relatively small. This reference design uses 31 bits to enable word alignment. The fourth word in the routing table entry is the next hop IP address (next_hop). It is not used in the comparison but is used in forwarding the packet. This field is not read during the comparison operation because it is not needed until after the comparison is completed. XAPP738 XAPP738 (v1.0) February 22, 2008 www.xilinx.com 2 R Key Concepts The routing table in this design uses 64 KB of space, which is nearly the entire 72 KB total space available on the XC4VFX12 XC4VFX12 device. The table was designed to be as large as possible to illustrate the different methods to store and access data. Each entry is 4 words (or 16 bytes), which results in 64 KB/16 KB/16 bytes, or 4K entries. Entry 0 is used as a no match found location, so the total number of entries in the table is 4095. The 16 most-significant bits of every entry's IP address are set to 1, followed by a 16-bit pseudorandom number. A typical entry has an IP address such as 0xFFFF7AC1. The pseudorandom number has a value between 0x1 and 0xFFFF. It is generated using a software implementation of a Linear Feedback Shift Register (LFSR). Therefore, each of the 4095 entries has a unique IP address. This code segment shows the generation of the pseudorandom number: Xuint32 LPM_random(Xuint32 number){ Xuint32 tap_1=0; Xuint32 tap_2=0; Xuint32 tap_3=0; Xuint32 tap_4=0; if(number & 0x8000) tap_1 = 1; if(number & 0x4000) tap_2 = 1; if(number & 0x1000) tap_3 = 1; if(number & 0x0008) tap_4 = 1; //the next random number in the sequence is generated by shifting //the number left 1 bit XORing the taps to achieve the LSB. return (tap_1 ^ tap_2) ^ tap_3) ^ tap_4) | (number