MVTX2803
Unmanaged 8-Port 1000 Mbps Ethernet Switch
Data Sheet
February 2003

Features

· Eight Gigabit Ports with GMII and PCS interface
  · Gigabit Port can also support 100/10 Mbps MII interface
· High Performance Layer 2 Packet Forwarding (23.808M packets per second) and Filtering at Full-Wire Speed
· Maximum throughput is 8 Gbps non-blocking
· Centralized shared-memory architecture
· Consists of two Memory Domains at 133 MHz
  · Frame Buffer Domain: Two banks of ZBT-SRAM with 2M/4MB total
  · Switch Database Domain with 256K/512K SRAM
· Up to 64K MAC addresses to provide large node aggregation in wiring closet switches
· Traffic Classification
  · Classify traffic into 8 transmission priorities per port
  · Supports Delay bounded, Strict Priority and WFQ
  · Provides 2 level dropping precedence with WRED mechanism
  · User controlled thresholds for WRED
  · Classification based on layer 2, 3 markings
    · VLAN Priority field in VLAN tagged frame
    · DS/TOS field in IP packet
    · The precedence of the above two classifications is programmable
· QoS Support
  · Supports IEEE 802.1p/Q Quality of Service with 8 Priority
  · Buffer Management: reserve buffers on per class and per port basis
  · Port-based Priority: VLAN Priority with Tagged frame can be overwritten by the priority of PVID
  · QoS features can be configured on a per port basis
· Full Duplex Ethernet IEEE 802.3x Flow Control
· Provides Ethernet Multicast and Broadcast Control
· 4 Port Trunking groups, max of 3 ports per group (Trunking can be based on source MAC and/or destination MAC and source port)
· LED signals provided by a serial or parallel interface
· Synchronous Serial Interface and I2C interface in unmanaged mode
· Hardware auto-negotiation through serial management interface (MDIO) for Gigabit Ethernet ports, supports 10/100/1000 Mbps
· BIST for internal and external SRAM-ZBT
· I2C EEPROM or synchronous serial port for configuration
· Packaged in 596-pin BGA

Ordering Information

MVTX2803AG    596 Pin HSBGA    -40°C to 85°C

[Figure 1 - MVTX2803AG Block Diagram: eight GMII/PCS ports (Port 0 to Port 7), Frame Engine, Search Engine, NM Database, Schedule, Management Module, LED and Serial/I2C blocks; two 64-bit FDB interfaces to Frame Data Buffer A and Frame Data Buffer B (ZBT-SRAM, 1M/2Mb each); one 32-bit SDB interface to the 256/512K SRAM holding the switch database and MAC table]

Description

The MVTX2800 family is a group of 1000 Mbps non-blocking Ethernet switch chips with on-chip address memory. A single chip provides a maximum of eight 1000 Mbps ports and a dedicated CPU interface with a 16/8 bit bus for managed and unmanaged switch applications. The MVTX2800 family consists of the following four products:

· MVTX2804    8 Gigabit ports    Managed
· MVTX2803    8 Gigabit ports    Unmanaged
· MVTX2802    4 Gigabit ports    Managed
· MVTX2801    4 Gigabit ports    Unmanaged

The MVTX2803 supports up to 64K MAC addresses to aggregate traffic from multiple wiring closet stacks. The centralized shared-memory architecture allows a very high performance packet-forwarding rate of 11.904M packets per second at full wire speed. The chip is optimized to provide a low-cost, high-performance workgroup and wiring closet layer 2 switching solution with 8 Gigabit Ethernet ports.

Two Frame Buffer Memory domains utilize cost-effective, high-performance ZBT-SRAM with aggregated bandwidth of 16 Gbps to support full wire speed on all external ports simultaneously.

With Strict Priority, Delay Bounded, and WRR transmission scheduling, plus a WRED memory congestion scheme, the chip provides powerful QoS functions for convergent network multimedia and mission-critical applications. The chip provides 8 transmission priorities and 2-level drop precedence.
Traffic is assigned its transmission priority and dropping precedence based on the frame VLAN Tag priority.

The MVTX2803AG supports port trunking/load sharing on the 1000 Mbps ports with fail-over capability. The port trunking/load sharing can be used to group ports between interlinked switches to increase the effective network bandwidth.

In full-duplex mode, IEEE 802.3x flow control is provided. The Physical Coding Sublayer (PCS) is integrated on-chip to provide a direct 10-bit GMII interface, or the PCS can be bypassed to provide an interface to existing fiber-based Gigabit Ethernet transceivers.

The MVTX2803AG is fabricated using 0.25µm technology. Inputs, however, are 3.3V tolerant and the outputs are capable of directly interfacing to LVTTL levels. The MVTX2803AG is packaged in a 596-pin Ball Grid Array package.

Zarlink Semiconductor Inc.

Table of Contents

1.0 Block Functionality
1.1 Frame Data Buffer (FDB) Interfaces
1.2 Switch Database (SDB) Interface
1.3 GMII/PCS MAC Module (GMAC)
1.4 Frame Engine
1.5 Search Engine
1.6 LED Interface
1.7 Internal Memory
2.0 System Configuration
2.1 I2C Interface
2.1.1 Start Condition
2.1.2 Address
2.1.3 Data Direction
2.1.4 Acknowledgment
2.1.5 Data
2.1.6 Stop Condition
2.2 Synchronous Serial Interface
2.2.1 Write Command
2.2.2 Read Command
3.0 Data Forwarding Protocol
3.1 Unicast Data Frame Forwarding
3.2 Multicast Data Frame Forwarding
4.0 Memory Interface
4.1 Overview
4.2 Detailed Memory Information
5.0 Search Engine
5.1 Search Engine Overview
5.2 Basic Flow
5.3 Search, Learning, and Aging
5.3.1 MAC Search
5.3.2 Learning
5.3.3 Aging
5.3.4 Data Structure
6.0 Frame Engine
6.1 Data Forwarding Summary
6.2 Frame Engine Details
6.2.1 FCB Manager
6.2.2 Rx Interface
6.2.3 RxDMA
6.2.4 TxQ Manager
6.3 Port Control
6.4 TxDMA
7.0 Quality of Service and Flow Control
7.1 Model
7.2 Four QoS Configurations
7.3 Delay Bound
7.4 Strict Priority and Best Effort
7.5 Weighted Fair Queuing
7.6 Shaper
7.7 WRED Drop Threshold Management Support
7.8 Buffer Management
7.8.1 Dropping When Buffers Are Scarce
7.9 MVTX2803AG Flow Control Basics
7.9.1 Unicast Flow Control
7.9.2 Multicast Flow Control
7.10 Mapping to IETF Diffserv Classes
8.0 Port Trunking
8.1 Features and Restrictions
8.2 Unicast Packet Forwarding
8.3 Multicast Packet Forwarding
8.4 Preventing Multicast Packets from Looping Back to the Source Trunk
9.0 LED Interface
9.1 Introduction
9.2 Serial Mode
9.3 Parallel Mode
9.4 LED Control Registers
10.0 Register Definition
10.1 MVTX2803AG Register Description
10.2 Group 0 Address - MAC Ports Group
10.2.1 ECR1Pn: Port N Control Register
10.2.2 ECR2Pn: Port N Control Register
10.2.3 GGControl 0 Extra GIGA Port Control
10.2.4 GGControl 1 Extra GIGA Port Control
10.2.5 GGControl 2 Extra GIGA Port Control
10.2.6 GGControl 3 Extra GIGA Port Control
10.3 Group 1 Address - VLAN Group
10.3.1 AVTCL VLAN Type Code Register Low
10.3.2 AVTCH VLAN Type Code Register High
10.3.3 PVMAP00_0 Port 00 Configuration Register 0
10.3.4 PVMAP00_3 Port 00 Configuration Register 3
10.3.5 PVMODE
10.4 Group 2 Address - Port Trunking Group
10.4.1 TRUNK0_MODE Trunk group 0 and 1 mode
10.4.2 TRUNK1_MODE Trunk group 1 mode (Unmanaged Mode)
10.4.3 TX_AGE Tx Queue Aging timer
10.5 Group 4 Address - Search Engine Group
10.5.1 AGETIME_LOW MAC address aging time Low
10.5.2 AGETIME_HIGH MAC address aging time High
10.5.3 SE_OPMODE Search Engine Operation Mode
10.6 Group 5 Address - Buffer Control/QOS Group
10.6.1 FCBAT FCB Aging Timer
10.6.2 QOSC QOS Control
10.6.3 FCR Flooding Control Register
10.6.4 AVPML VLAN Priority Map
10.6.5 AVPMM VLAN Priority Map
10.6.6 AVPMH VLAN Priority Map
10.6.7 TOSPML TOS Priority Map
10.6.8 TOSPMM TOS Priority Map
10.6.9 TOSPMH TOS Priority Map
10.6.10 AVDM VLAN Discard Map
10.6.11 TOSDML TOS Discard Map
10.6.12 BMRC - Broadcast/Multicast Rate Control
10.6.13 UCC Unicast Congestion Control
10.6.14 MCC Multicast Congestion Control
10.6.15 PRG Port Reservation for Giga ports
10.6.16 SFCB Share FCB Size
10.6.17 C2RS Class 2 Reserved Size
10.6.18 C3RS Class 3 Reserved Size
10.6.19 C4RS Class 4 Reserved Size
10.6.20 C5RS Class 5 Reserved Size
10.6.21 C6RS Class 6 Reserved Size
10.6.22 C7RS Class 7 Reserved Size
10.6.23 QOSC00 BYTE_C2_G0
10.6.24 QOSC01 BYTE_C3_G0
10.6.25 QOSC02 BYTE_C4_G0
10.6.26 QOSC03 BYTE_C5_G0
10.6.27 QOSC04 BYTE_C6_G0
10.6.28 QOSC05 BYTE_C7_G0
10.6.29 QOSC06 BYTE_C2_G1
10.6.30 QOSC07 BYTE_C3_G1
10.6.31 QOSC08 BYTE_C4_G1
10.6.32 QOSC09 BYTE_C5_G1
10.6.33 QOSC0A BYTE_C6_G1
10.6.34 QOSC0B BYTE_C7_G1
10.6.35 QOSC0C BYTE_C2_G2
10.6.36 QOSC0D BYTE_C3_G2
10.6.37 QOSC0E BYTE_C4_G2
10.6.38 QOSC0F BYTE_C5_G2
10.6.39 QOSC10 BYTE_C6_G2
10.6.40 QOSC11 BYTE_C7_G2
10.6.41 QOSC12 BYTE_C2_G3
10.6.42 QOSC13 BYTE_C3_G3
10.6.43 QOSC14 BYTE_C4_G3
10.6.44 QOSC15 BYTE_C5_G3
10.6.45 QOSC16 BYTE_C6_G3
10.6.46 QOSC17 BYTE_C7_G3
10.6.47 QOSC18 BYTE_C2_G4
10.6.48 QOSC19 BYTE_C3_G4
10.6.49 QOSC1A BYTE_C4_G4
10.6.50 QOSC1B BYTE_C5_G4
10.6.51 QOSC1C BYTE_C6_G4
10.6.52 QOSC1D BYTE_C7_G4
10.6.53 QOSC1E BYTE_C2_G5
10.6.54 QOSC1F BYTE_C3_G5
10.6.55 QOSC20 BYTE_C4_G5
10.6.56 QOSC21 BYTE_C5_G5
10.6.57 QOSC22 BYTE_C6_G5
10.6.58 QOSC23 BYTE_C7_G5
10.6.59 QOSC24 BYTE_C2_G6
10.6.60 QOSC25 BYTE_C3_G6
10.6.61 QOSC26 BYTE_C4_G6
10.6.62 QOSC27 BYTE_C5_G6
10.6.63 QOSC28 BYTE_C6_G6
10.6.64 QOSC29 BYTE_C7_G6
10.6.65 QOSC2A BYTE_C2_G7
10.6.66 QOSC2B BYTE_C3_G7
10.6.67 QOSC2C BYTE_C4_G7
10.6.68 QOSC2D BYTE_C5_G7
10.6.69 QOSC2E BYTE_C6_G7
10.6.70 QOSC2F BYTE_C7_G7
10.6.71 QOSC33 CREDIT_C0_G0
10.6.72 QOSC34 CREDIT_C1_G0
10.6.73 QOSC35 CREDIT_C2_G0
10.6.74 QOSC36 CREDIT_C3_G0
10.6.75 QOSC37 CREDIT_C4_G0
10.6.76 QOSC38 CREDIT_C5_G0
10.6.77 QOSC39 CREDIT_C6_G0
10.6.78 QOSC3A CREDIT_C7_G0
10.6.79 QOSC3B CREDIT_C0_G1
10.6.80 QOSC3C CREDIT_C1_G1
10.6.81 QOSC3D CREDIT_C2_G1
10.6.82 QOSC3E CREDIT_C3_G1
10.6.83 QOSC3F CREDIT_C4_G1
10.6.84 QOSC40 CREDIT_C5_G1
10.6.85 QOSC41 CREDIT_C6_G1
10.6.86 QOSC42 CREDIT_C7_G1
10.6.87 QOSC43 CREDIT_C0_G2
10.6.88 QOSC44 CREDIT_C1_G2
10.6.89 QOSC45 CREDIT_C2_G2
10.6.90 QOSC46 CREDIT_C3_G2
10.6.91 QOSC47 CREDIT_C4_G2
10.6.92 QOSC48 CREDIT_C5_G2
10.6.93 QOSC49 CREDIT_C6_G2
10.6.94 QOSC4A CREDIT_C7_G2
10.6.95 QOSC4B CREDIT_C0_G3
10.6.96 QOSC4C CREDIT_C1_G3
10.6.97 QOSC4D CREDIT_C2_G3
10.6.98 QOSC4E CREDIT_C3_G3
10.6.99 QOSC4F CREDIT_C4_G3
10.6.100 QOSC50 CREDIT_C5_G3
10.6.101 QOSC51 CREDIT_C6_G3
10.6.102 QOSC52 CREDIT_C7_G3
10.6.103 QOSC53 CREDIT_C0_G4
10.6.104 QOSC54 CREDIT_C1_G4
10.6.105 QOSC55 CREDIT_C2_G4
10.6.106 QOSC56 CREDIT_C3_G4
10.6.107 QOSC57 CREDIT_C4_G4
10.6.108 QOSC58 CREDIT_C5_G4
10.6.109 QOSC59 CREDIT_C6_G4
10.6.110 QOSC5A CREDIT_C7_G4
10.7.114 QOSC5B CREDIT_C0_G5
10.7.115 QOSC5C CREDIT_C1_G5
10.7.116 QOSC5D CREDIT_C2_G5
10.7.117 QOSC5E CREDIT_C3_G5
10.7.118 QOSC5F CREDIT_C4_G5
10.7.119 QOSC60 CREDIT_C5_G5
10.7.120 QOSC61 CREDIT_C6_G5
10.7.121 QOSC62 CREDIT_C7_G5
10.7.122 QOSC63 CREDIT_C0_G6
10.7.123 QOSC64 CREDIT_C1_G6
10.7.124 QOSC65 CREDIT_C2_G6
10.7.125 QOSC66 CREDIT_C3_G6
10.7.126 QOSC67 CREDIT_C4_G6
10.7.127 QOSC68 CREDIT_C5_G6
10.7.128 QOSC69 CREDIT_C6_G6
10.7.129 QOSC6A CREDIT_C7_G6
10.7.130 QOSC6B CREDIT_C0_G7
10.7.131 QOSC6C CREDIT_C1_G7
10.7.132 QOSC6D CREDIT_C2_G7
10.7.133 QOSC6E CREDIT_C3_G7
10.7.134 QOSC6F CREDIT_C4_G7
10.7.135 QOSC70 CREDIT_C5_G7
10.7.136 QOSC71 CREDIT_C6_G7
10.7.137 QOSC72 CREDIT_C7_G7
10.7.138 QOSC73 TOKEN_RATE_G0
10.7.139 QOSC74 TOKEN_LIMIT_G0
10.7.140 QOSC75 TOKEN_RATE_G1
10.7.141 QOSC76 TOKEN_LIMIT_G1
10.7.142 QOSC77 TOKEN_RATE_G2
10.7.143 QOSC78 TOKEN_LIMIT_G2
10.7.144 QOSC79 TOKEN_RATE_G3
10.7.145 QOSC7A TOKEN_LIMIT_G3
10.7.146 QOSC7B TOKEN_RATE_G4
10.7.147 QOSC7C TOKEN_LIMIT_G4
10.7.148 QOSC7D TOKEN_RATE_G5
10.7.149 QOSC7E TOKEN_LIMIT_G5
10.7.150 QOSC7F TOKEN_RATE_G6
10.7.151 QOSC80 TOKEN_LIMIT_G6
10.7.152 QOSC81 TOKEN_RATE_G7
10.7.153 QOSC82 TOKEN_LIMIT_G7
10.7.154 RDRC0 WRED Rate Control 0
10.7.155 RDRC1 WRED Rate Control 1
10.8 Group 6 Address - MISC Group
10.8.1 MII_OP0 MII Register Option 0
10.8.2 MII_OP1 MII Register Option 1
10.8.3 FEN Feature Register
10.8.4 MIIC0 MII Command Register 0
10.8.5 MIIC1 MII Command Register 1
10.8.6 MIIC2 MII Command Register 2
10.8.7 MIIC3 MII Command Register 3
10.8.8 MIID0 MII Data Register 0
10.8.9 MIID1 MII Data Register 1
10.8.10 LED Mode LED Control
10.8.11 CHECKSUM - EEPROM Checksum
10.8.12 LED User
10.8.13 LEDUSER0
10.8.14 LEDUSER1
10.8.15 LEDUSER2/LEDSIG2
10.8.16 LEDUSER3/LEDSIG3
10.8.17 LEDUSER4/LEDSIG4
10.8.18 LEDUSER5/LEDSIG5
10.8.19 LEDUSER6/LEDSIG6
10.8.20 LEDUSER7/LEDSIG1_0
10.8.21 MIINP0 MII Next Page Data Register 0
10.8.22 MIINP1 MII Next Page Data Register 1
10.9 Group F Address - CPU Access Group
10.9.1 GCR-Global Control Register
10.9.2 DCR-Device Status and Signature Register
10.9.3 DCR01-Giga port status
10.9.4 DCR23-Giga port status
10.9.5 DCR45-Giga port status
10.9.6 DCR67-Giga port status
10.9.7 DPST Device Port Status Register
10.9.8 DTST Data Read Back Register
11.0 BGA and Ball Signal Description
11.1 BGA Views (Top-View)
11.2 Power and Ground Distribution
11.3 Ball-Signal Descriptions
11.4 Ball Signal Name
11.5 AC/DC Timing
11.5.1 Absolute Maximum Ratings
11.5.2 DC Electrical Characteristics
11.5.3 Recommended Operation Conditions
11.6 Local Frame Buffer ZBT SRAM Memory Interface
11.6.1 Local ZBT SRAM Memory Interface A
11.6.2 Local ZBT SRAM Memory Interface B
11.7 Local Switch Database SBRAM Memory Interface
11.7.1 Local SBRAM Memory Interface
11.8 AC Characteristics
11.8.1 Media Independent Interface
11.8.2 Gigabit Media Independent Interface
11.8.3 PCS Interface
11.8.4 LED Interface
11.8.5 MDIO Input Setup and Hold Timing
11.8.6 I2C Input Setup Timing
11.8.7 Serial Interface Setup Timing

List of Figures

Figure 1 - MVTX2803AG Block Diagram
Figure 2 - Data Transfer Format for I2C Interface
Figure 3 - MVTX2803AG SRAM Interface Block Diagram (DMAs for Gigabit Ports)
Figure 4 - Buffer Partition Scheme Used in the MVTX2803AG
Figure 5 - Timing diagram for serial mode in LED interface
Figure 6 - Local Memory Interface - Input setup and hold timing
Figure 7 - Local Memory Interface - Output valid delay timing
Figure 8 - Local Memory Interface - Input setup and hold timing
Figure 9 - Local Memory Interface - Output valid delay timing
Figure 10 - Local Memory Interface - Input setup and hold timing
Figure 11 - Local Memory Interface - Output valid delay timing
Figure 12 - AC Characteristics - Media Independent Interface
Figure 13 - AC Characteristics - Media Independent Interface
Figure 14 - AC Characteristics - GMII
Figure 15 - AC Characteristics - Gigabit Media Independent Interface
Figure 16 - AC Characteristics - PCS Interface
Figure 17 - AC Characteristics - PCS Interface
Figure 18 - AC Characteristics - LED Interface
Figure 19 - MDIO Input Setup and Hold Timing
Figure 20 - MDIO Output Delay Timing
Figure 21 - I2C Input Setup Timing
Figure 22 - I2C Output Delay Timing
Figure 23 - Serial Interface Setup Timing
Figure 24 - Serial Interface Output Delay Timing

List of Tables

Table 1 - Two-dimensional World Traffic
Table 2 - Four QoS configurations per port
Table 3 - WRED Dropping Scheme
Table 4 - Mapping between MVTX2803AG and IETF Diffserv Classes for Gigabit Ports
Table 5 - MVTX2803AG Features Enabling IETF Diffserv Standards
Table 6 - AC Characteristics - Local frame buffer ZBT-SRAM Memory Interface A
Table 7 - Local frame buffer ZBT-SRAM Memory Interface B
Table 8 - AC Characteristics - Local Switch Database SBRAM Memory Interface
Table 9 - AC Characteristics - Media Independent Interface
Table 10 - AC Characteristics - Gigabit Media Independent Interface
Table 11 - AC Characteristics - PCS Interface
Table 12 - AC Characteristics - LED Interface
Table 13 - MDIO Timing
Table 14 - I2C Timing
Table 15 - Serial Interface Timing

1.0 Block Functionality

1.1 Frame Data Buffer (FDB) Interfaces

The FDB interface supports pipelined ZBT-SRAM memory at 133 MHz. To ensure a non-blocking switch, two memory domains are required. Each domain has a 64-bit wide memory bus. At 133 MHz, the aggregate memory bandwidth is 17 Gbps, which is enough to support 8 Gigabit ports at full wire speed switching. A patent pending scheme is used to access the FDB memory. Each slot has one tick to read or write 8 bytes.
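The 17 Gbps figure follows directly from the bus width, the number of memory domains, and the clock rate. The short sketch below reproduces that arithmetic and compares it with the load from eight Gigabit ports. The function names are illustrative, and the "required" figure assumes each frame is written to the FDB once on receive and read once on transmit, which is an assumption made here rather than a statement from the data sheet.

```python
# Back-of-the-envelope check of the FDB bandwidth quoted in section 1.1.
BUS_WIDTH_BITS = 64          # per memory domain (from the text)
NUM_DOMAINS = 2              # two ZBT-SRAM domains (from the text)
CLOCK_HZ = 133_000_000       # 133 MHz (from the text)
NUM_PORTS = 8                # Gigabit ports
PORT_RATE_BPS = 1_000_000_000


def fdb_bandwidth_bps(bus_width_bits=BUS_WIDTH_BITS,
                      domains=NUM_DOMAINS,
                      clock_hz=CLOCK_HZ):
    """One bus-width transfer per domain per clock tick."""
    return bus_width_bits * domains * clock_hz


def required_bandwidth_bps(ports=NUM_PORTS, rate_bps=PORT_RATE_BPS):
    """Assumption: each frame is written once (Rx) and read once (Tx)."""
    return ports * rate_bps * 2


available = fdb_bandwidth_bps()
required = required_bandwidth_bps()
print(f"available: {available / 1e9:.3f} Gbps")
print(f"required:  {required / 1e9:.3f} Gbps")
print(f"headroom:  {(available - required) / 1e9:.3f} Gbps")
```

Under these assumptions the two 64-bit domains leave roughly 1 Gbps of headroom over the 16 Gbps needed for unicast traffic on all eight ports.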

1.2 Switch Database (SDB) Interface

A pipelined synchronous burst SRAM (SBRAM) memory is used to store the switch database information, including the MAC Table. The Search Engine accesses the switch database via the SDB interface. The SDB bus is 32 bits wide and runs at 133 MHz.

1.3 GMII/PCS MAC Module (GMAC)

The GMII/PCS Media Access Control (MAC) module provides the necessary buffers and control interface between the Frame Engine (FE) and the external physical device (PHY). The MVTX2803AG has two interfaces, GMII or PCS. The MAC of the MVTX2803AG meets the IEEE 802.3z specification and supports the MII interface. It is able to operate at 10M/100M/1G in Full Duplex mode with a back pressure/flow control mechanism. It has options to insert the Source Address, CRC, and VLAN ID into each frame. The GMII/PCS Module also supports hot plug detection.

1.4 Frame Engine

The main function of the frame engine is to forward a frame to its proper destination port or ports. When a frame arrives, the frame engine parses the frame header (64 bytes) and formulates a switching request, which is sent to the search engine to resolve the destination port. The arriving frame is moved to the FDB. After receiving a switch response from the search engine, the frame engine performs transmission scheduling based on the frame's priority. The frame engine forwards the frame to the MAC module when the frame is ready to be sent.

1.5 Search Engine

The Search Engine resolves the frame's destination port or ports according to the destination MAC address (L2) by searching the database. It also performs MAC learning, priority assignment, and trunking functions.

1.6 LED Interface

The LED interface can be operated in a serial mode or a parallel mode. In the serial mode, the LED interface uses 3 pins for carrying 8 port status signals. In the parallel mode, the interface can drive LEDs by 8 status pins. The LED port is shared with bootstrap pins.
To avoid errors when reading the bootstraps, a buffer must be used to isolate the LED circuitry from the bootstrap pins during the bootstrap cycle (the bootstrap pins are sampled at the rising edge of Reset).

1.7 Internal Memory

Several internal tables are required and are described as follows:

· Frame Control Block (FCB) - Each FCB entry contains the control information of the associated frame stored in the FDB, e.g. frame size, read/write pointer, transmission priority, etc.
· MCT Link Table - The MCT Link Table stores the linked list of MCT entries that have collisions in the external MAC Table.

2.0 System Configuration

The MVTX2803AG can be configured by EEPROM (24C02 or compatible) via an I2C interface at boot time, or via a synchronous serial interface during operation.

2.1 I2C Interface

The I2C interface uses two bus lines, a serial data line (SDA) and a serial clock line (SCL). The SCL carries the control signals that facilitate the transfer of information from the EEPROM to the switch. Data transfer is bidirectional, 8-bit serial, at a rate of 50 Kbps. Data transfer is performed between master and slave IC using a request/acknowledgment style of protocol. The master IC generates the timing signals and terminates data transfer. The figure below shows the data transfer format.

START | SLAVE ADDRESS | R/W | ACK | DATA 1 (8 bits) | ACK | DATA 2 (8 bits) | ACK | ... | DATA M (8 bits) | ACK | STOP

Figure 2 - Data Transfer Format for I2C Interface

2.1.1 Start Condition

The Start condition is generated by the master, the MVTX2803AG. The bus is considered to be busy after the Start condition is generated. The Start condition occurs if, while the SCL line is High, there is a High-to-Low transition of the SDA. Other than in the Start condition (and Stop condition), the data on the SDA line must be stable during the High period of SCL. The High or Low state of SDA can only change when SCL is Low.
In addition, when the I2C bus is free, both lines are High.

2.1.2 Address

The first byte after the Start condition determines which slave the master will select. The slave in this case is the EEPROM. The first seven bits of the first data byte make up the slave address.

2.1.3 Data Direction

The eighth bit in the first byte after the Start condition determines the direction (R/W) of the message. A master transmitter sets this bit to W; a master receiver sets this bit to R.

2.1.4 Acknowledgment

Like all clock pulses, the acknowledgment-related clock pulse is generated by the master. The transmitter releases the SDA (High) during the acknowledgment clock pulse, and the receiver must pull the SDA down during that pulse so that it remains stable Low during the High period of the clock pulse. An acknowledgment pulse follows every byte transfer.

If a slave receiver does not acknowledge after any byte, then the master generates a Stop condition and aborts the transfer. If a master receiver does not acknowledge after any byte, then the slave transmitter must release the SDA line to let the master generate the Stop condition.

2.1.5 Data

After the first byte containing the address, all bytes that follow are data bytes. Each byte must be followed by an acknowledge bit. Data is transferred MSB-first.

2.1.6 Stop Condition

The Stop condition is generated by the master, the MVTX2803AG. The bus is considered to be free after the Stop condition is generated. The Stop condition occurs if, while the SCL line is High, there is a Low-to-High transition of the SDA.

The I2C interface is used to configure the MVTX2803AG at boot time. The master is the MVTX2803AG, and the slave is the EEPROM memory.

2.2 Synchronous Serial Interface

The synchronous serial interface is used to configure the MVTX2803AG via a PC rather than at boot time. The PC serves as master and the MVTX2803AG serves as slave.
The protocol for the synchronous serial interface is nearly identical to the I2C protocol. The main difference is that there is no acknowledgment bit after each byte of data transferred.

The unmanaged MVTX2803AG uses a synchronous serial interface to program the internal registers. To reduce the number of signals required, the register address, command and data are shifted in serially through the PS_DO pin. The PS_STROBE- pin is used as the shift clock. The PS_DI pin is used as the data return path.

Each command consists of four parts:

· START pulse
· Register Address
· Read or Write command
· Data to be written or read back

Any command can be aborted in the middle by sending an ABORT pulse to the MVTX2803AG.

A START command is detected when PS_DO is sampled high at the PS_STROBE- leading edge, and PS_DO is sampled low when PS_STROBE- falls.

An ABORT command is detected when PS_DO is sampled low at the PS_STROBE- leading edge, and PS_DO is sampled high when PS_STROBE- falls.

2.2.1 Write Command

[Timing diagram. PS_STROBE-: shift clock, with 2 extra clocks after the last transfer. PS_DO: START | A0 A1 A2 ... A9 A10 A11 (ADDRESS) | W (COMMAND) | D0 D1 D2 D3 D4 D5 D6 D7 (DATA)]

2.2.2 Read Command

[Timing diagram. PS_STROBE-: shift clock. PS_DO: START | A0 A1 A2 ... A9 A10 A11 (ADDRESS) | R (COMMAND). PS_DI: D0 D1 D2 D3 D4 D5 D6 D7 (DATA)]

All registers in the MVTX2803AG can be modified through this synchronous serial interface.

3.0 Data Forwarding Protocol

3.1 Unicast Data Frame Forwarding

When a frame arrives, it is assigned a handle in memory by the Frame Control Buffer Manager (FCB Manager). An FCB handle will always be available, because of advance buffer reservations.

The memory (ZBT-SRAM) interface is two 64-bit buses, connected to two ZBT-SRAM domains, A and B. The Receive DMA (RxDMA) is responsible for multiplexing the data and the address. On a port's "turn," the RxDMA will move 8 bytes (or up to the end-of-frame) from the port's associated Receive FIFO (RxFIFO) into memory (Frame Data Buffer, or FDB).

Once an entire frame has been moved to the FDB, and a good end-of-frame (EOF) has been received, the Rx interface makes a switch request. The RxDMA arbitrates among multiple switch requests.

The switch request consists of the first 64 bytes of a frame, containing the source and destination MAC addresses of the frame. The search engine places a switch response in the switch response queue of the frame engine when done. Among other information, the search engine will have resolved the destination port of the frame and will have determined that the frame is unicast.

After processing the switch response, the Transmission Queue Manager (TxQ manager) of the frame engine is responsible for notifying the destination port that it has a frame to forward. But first, the TxQ manager has to decide whether or not to drop the frame, based on global FDB reservations and usage, as well as TxQ occupancy at the destination. If the frame is not dropped, then the TxQ manager links the frame's FCB to the correct per-port-per-class TxQ. Unicast TxQs are linked lists of transmission jobs, represented by their associated frames' FCBs. There is one linked list for each transmission class for each port. There are 8 classes for each of the 8 Gigabit ports, for a total of 64 unicast queues.

The TxQ manager is responsible for scheduling transmission among the queues representing different classes for a port. When the port control module determines that there is room in the MAC Transmission FIFO (TxFIFO) for another frame, it requests the handle of a new frame from the TxQ manager. The TxQ manager chooses among the head-of-line (HOL) frames from the per-class queues for that port, using a Zarlink Semiconductor scheduling algorithm.
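The per-port, per-class queue structure described above can be pictured with a small software model. This is only a sketch under stated assumptions: the class name `TxQManager`, the `max_depth` drop threshold, and the use of `collections.deque` in place of hardware linked lists are all hypothetical. The data sheet does not disclose the Zarlink scheduling algorithm, so plain strict priority (higher class number served first) is substituted here as a stand-in, and the drop rule looks only at queue occupancy, whereas the text says the real decision also considers global FDB reservations and usage.

```python
from collections import deque

NUM_PORTS = 8      # Gigabit ports (from the text)
NUM_CLASSES = 8    # transmission classes per port (from the text)


class TxQManager:
    """Toy model of per-port-per-class unicast transmission queues."""

    def __init__(self, max_depth=4):
        # One FIFO of FCB handles per (port, class) pair.
        self.queues = {(p, c): deque()
                       for p in range(NUM_PORTS)
                       for c in range(NUM_CLASSES)}
        self.max_depth = max_depth  # hypothetical occupancy threshold

    def enqueue(self, port, cls, fcb_handle):
        """Link a frame's FCB handle to its queue; False means dropped."""
        queue = self.queues[(port, cls)]
        if len(queue) >= self.max_depth:
            return False
        queue.append(fcb_handle)
        return True

    def next_frame(self, port):
        """Pick a head-of-line frame for a port (strict priority stand-in)."""
        for cls in range(NUM_CLASSES - 1, -1, -1):
            queue = self.queues[(port, cls)]
            if queue:
                return queue.popleft()
        return None


mgr = TxQManager()
mgr.enqueue(port=0, cls=1, fcb_handle=100)
mgr.enqueue(port=0, cls=6, fcb_handle=101)
print(mgr.next_frame(0))  # the class-6 frame is served first in this model
```

The port control module's "request a handle when the TxFIFO has room" step corresponds to calling `next_frame` for that port.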

At the transmit end, each of the 8 ports has time slots devoted solely to reading data from memory at the address calculated by port control. The Transmission DMA (TxDMA) is responsible for multiplexing the data and the address. On a port's turn, the TxDMA will move 8 bytes (or up to the EOF) from memory into the port's associated TxFIFO. After reading the EOF, the port control requests an FCB release for that frame. The TxDMA arbitrates among multiple buffer release requests.

The frame is transmitted from the TxFIFO to the line.

3.2 Multicast Data Frame Forwarding

After receiving the switch response, the TxQ manager has to make the dropping decision. A global decision to drop can be made, based on global FDB utilization and reservations. If so, then the FCB is released and the frame is dropped. In addition, a selective decision to drop can be made, based on the TxQ occupancy at some subset of the multicast packet's destinations. If so, then the frame is dropped at some destinations but not others, and the FCB is not released.

If the frame is not dropped at a particular destination port, then the TxQ manager formats an entry in the multicast queue for that port and class. Multicast queues are physical queues (unlike the linked lists for unicast frames). There are 4 multicast queues for each of the 8 Gigabit ports. There is one multicast queue for every two unicast classes.

During scheduling, the TxQ manager treats the unicast queue and the multicast queue of the same class as one logical queue.

The port control requests an FCB release only after the EOF for the multicast frame has been read by all ports to which the frame is destined.

4.0 Memory Interface

4.1 Overview

Figure 3 illustrates the first part of the ZBT-SRAM interface for the MVTX2803AG. As shown, two ZBT-SRAM banks, A and B, are used, with a 64-bit bus connected to each. Each DMA can read and write from both bank A and bank B.
During each tick, two memory operations take place in parallel: one for bank A and one for bank B. Because the clock frequency is 133 MHz, the total memory bandwidth for frame data buffer (FDB) access is 128 bits × 133 MHz = 17 Gbps.

In addition, the figure shows that the 8 Gigabit ports are actually grouped into sets of 4. If TxDMA 0 is using bank B during a given memory slot, then TxDMAs 1-3 will never be using bank A during this same slot. As a result, TxDMAs 0-3 can share the same bank selector.

[Figure 3 - MVTX2803AG SRAM Interface Block Diagram (DMAs for Gigabit Ports): ZBT-SRAM Bank A and ZBT-SRAM Bank B, connected to TxDMA 0-1, TxDMA 2-3, TxDMA 4-5, TxDMA 6-7 and RxDMA 0-1, RxDMA 2-3, RxDMA 4-5, RxDMA 6-7]

4.2 Detailed Memory Information

Because the bus for each bank is 64 bits wide, frames are broken into 8-byte granules, which are written to and read from memory. The first 8-byte granule is written to Bank A, the second 8-byte granule is written to Bank B, and so on in alternating fashion. When reading frames from memory, the same procedure is followed: first from A, then from B, and so on.

Reading and writing from alternating memory banks can be performed with minimal waste of memory bandwidth. For a port of any speed, in the worst case, a 1-byte-long EOF granule is written to Bank A. This means that a 7-byte segment of Bank A bandwidth is idle and, furthermore, the next 8-byte segment of Bank B bandwidth is idle, because the first 8 bytes of the next frame will be written to Bank A, not B. This scenario results in a maximum of 15 bytes of waste per frame, which is always acceptable because the interframe gap is 20 bytes.

Search engine data is written to both banks in parallel. In this way, a search engine read operation can be performed by either bank at any time without a problem.
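The granule alternation and the 15-byte worst case can be checked with a few lines of arithmetic. The sketch below is a model of the description in section 4.2 and not of the hardware itself: the function names are made up for illustration, and the waste rule (unused bytes in the last granule, plus one idle Bank B slot whenever a frame ends on Bank A) is an interpretation of the paragraph above.

```python
GRANULE_BYTES = 8          # 64-bit bus per bank (from the text)
INTERFRAME_GAP_BYTES = 20  # figure quoted in the text


def granule_banks(frame_len):
    """Split a frame into granules; return a list of (bank, byte_count).

    The first granule goes to bank A, then banks alternate A, B, A, ...
    """
    granules = []
    offset = 0
    index = 0
    while offset < frame_len:
        count = min(GRANULE_BYTES, frame_len - offset)
        granules.append(("A" if index % 2 == 0 else "B", count))
        offset += count
        index += 1
    return granules


def wasted_bytes(frame_len):
    """Idle memory bandwidth per frame, in bytes, under the model above."""
    last_bank, last_count = granule_banks(frame_len)[-1]
    waste = GRANULE_BYTES - last_count   # unused part of the EOF granule
    if last_bank == "A":
        waste += GRANULE_BYTES           # paired bank B slot stays idle
    return waste


worst = max(wasted_bytes(n) for n in range(64, 1519))
print(worst, worst < INTERFRAME_GAP_BYTES)
```

For example, a 65-byte frame occupies nine granules, the last of which holds a single byte in Bank A, which is exactly the worst case the text describes.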
5.0 Search Engine

5.1 Search Engine Overview

The MVTX2803AG search engine is optimized for high throughput searching, with enhanced features to support:

· Up to 64K MAC addresses
· 4 groups of port trunking
· Traffic classification into 8 transmission priorities, and 2 drop precedence levels

5.2 Basic Flow

Shortly after a frame enters the MVTX2803AG and is written to the Frame Data Buffer (FDB), the frame engine generates a Switch Request, which is sent to the search engine. The switch request consists of the first 64 bytes of the frame, which contain all the necessary information for the search engine to perform its task. When the search engine is done, it writes to the Switch Response Queue, and the frame engine uses the information provided in that queue for scheduling and forwarding.

In performing its task, the search engine extracts and compresses the useful information from the 64-byte switch request. The information extracted includes the source and destination MAC addresses, the transmission and discard priorities, and whether the frame is unicast or multicast. Requests are sent to the external SRAM Switch Database to locate the associated entries in the external MCT table.

When all the information has been collected from external SRAM, the search engine compares the MAC address in the current entry with the MAC address for which it is searching. If it is not a match, the process is repeated on the internal MCT table. All MCT entries, other than the first of each linked list, are maintained internal to the chip. If the desired MAC address is still not found, then the result is either learning (source MAC address unknown) or flooding (destination MAC address unknown).

If the destination MAC address belongs to a port trunk, then the trunk number is retrieved instead of the port number. But on which port of the trunk will the frame be transmitted?
This is easily computed using a hash of the source and destination MAC addresses. When all the information is compiled, the switch response is generated, as stated earlier.

5.3 Search, Learning, and Aging

5.3.1 MAC Search

The search block performs source MAC address and destination MAC address searching. As indicated earlier, if a match is not found, then the next entry in the linked list must be examined, and so on until a match is found or the end of the list is reached. In port-based VLAN mode, a bitmap is used to determine whether the frame should be forwarded to the outgoing port. The bitmap is not dynamic: ports cannot enter and exit groups dynamically. The MAC search block is also responsible for updating the source MAC address timestamp, which is used for aging.

5.3.2 Learning

The learning module learns new MAC addresses and performs port change operations on the MCT database. The goal of learning is to update this database as the networking environment changes over time. Learning and port change are performed based on memory slot availability only.

5.3.3 Aging

Aging time is controlled by registers 400h and 401h. The aging module scans and ages MCT entries based on a programmable "age out" time interval. As indicated earlier, the search module updates the source MAC address and VLAN port association timestamps for each frame it processes. When an entry is ready to be aged, the entry is removed from the table.

5.3.4 Data Structure

The MCT data structure is used when searching for MAC addresses. The structure is maintained by hardware in the search engine. The database is essentially a hash table, with collisions resolved by chaining. The database is partly external and partly internal, as described earlier: the first MCT entry of each linked list is always located in the external SRAM, and the subsequent MCT entries are located internally.
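The search, learning, and aging behavior described in Section 5.3 can be modeled in software as a chained hash table. The Python sketch below is a simplified illustration, not the device's implementation: the class and method names are hypothetical, the CRC-32 hash and the bucket count are placeholders (the data sheet does not specify the hash function), and the split between external and internal storage is not modeled.

```python
import zlib


class MctEntry:
    """One MAC table entry in a collision chain."""

    def __init__(self, mac, port, timestamp):
        self.mac = mac
        self.port = port
        self.timestamp = timestamp
        self.next = None


class MctTable:
    """Hash table with collisions resolved by chaining."""

    def __init__(self, num_buckets=1024):
        self.buckets = [None] * num_buckets

    def _index(self, mac):
        # Placeholder hash: the real hash function is an assumption here.
        return zlib.crc32(mac) % len(self.buckets)

    def search(self, mac, now=None):
        """Walk the chain; optionally refresh the timestamp used for aging."""
        entry = self.buckets[self._index(mac)]
        while entry is not None:
            if entry.mac == mac:
                if now is not None:
                    entry.timestamp = now
                return entry
            entry = entry.next
        return None  # caller learns (source) or floods (destination)

    def learn(self, mac, port, now):
        """Insert a new address, or record a port change for a known one."""
        entry = self.search(mac, now)
        if entry is not None:
            entry.port = port
            return entry
        new = MctEntry(mac, port, now)
        idx = self._index(mac)
        if self.buckets[idx] is None:
            self.buckets[idx] = new
        else:
            tail = self.buckets[idx]
            while tail.next is not None:
                tail = tail.next
            tail.next = new
        return new

    def age(self, now, age_out):
        """Remove entries not refreshed within the age-out interval."""
        for idx in range(len(self.buckets)):
            prev, entry = None, self.buckets[idx]
            while entry is not None:
                if now - entry.timestamp >= age_out:
                    if prev is None:
                        self.buckets[idx] = entry.next
                    else:
                        prev.next = entry.next
                else:
                    prev = entry
                entry = entry.next
```

Constructing the table with `num_buckets=1` forces every address into one chain, which is a convenient way to exercise the linked-list walk described in Section 5.3.1.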
6.0 Frame Engine

6.1 Data Forwarding Summary

Data enters the device at the RxMAC, and the RxDMA moves the data from the MAC RxFIFO to the FDB. Data is moved in 8-byte granules in conjunction with the scheme for the SRAM interface.

· A switch request is sent to the Search Engine.
· The Search Engine processes the switch request.
· A switch response is sent back to the Frame Engine and indicates whether the frame is unicast or multicast, and its destination port or ports.
· A Transmission Scheduling Request is sent in the form of a signal notifying the TxQ manager.
· Upon receiving a Transmission Scheduling Request, the device formats an entry in the appropriate Transmission Scheduling Queue (TxSch Q) or Queues. There are 8 TxSch Queues for each Gigabit port, one for each priority. Creation of a queue entry involves either linking a new job to the appropriate linked list if unicast, or adding an entry to a physical queue if multicast.

When the port is ready to accept the next frame, the TxQ manager gets the head-of-line (HOL) entry of one of the TxSch Qs, according to the transmission scheduling algorithm (so as to ensure per-class quality of service). The unicast linked list and the multicast queue for the same port-class pair are treated as one logical queue. The TxDMA pulls frame data from the memory and forwards it granule by granule to the MAC TxFIFO of the destination port.

6.2 Frame Engine Details

This section briefly describes the functions of each of the modules of the MVTX2803AG frame engine.

6.2.1 FCB Manager

The FCB manager allocates FCB handles to incoming frames, and releases FCB handles upon frame departure. The FCB manager is also responsible for enforcing buffer reservations and limits. The default values can be determined by referring to Chapter 8. In addition, the FCB manager is responsible for buffer aging, and for linking unicast forwarding jobs to their correct TxSch Q.
Buffer aging can be enabled or disabled by the bootstrap pin, and the aging time is defined in register FCBAT.

6.2.2 Rx Interface

The Rx interface is mainly responsible for communicating with the RxMAC. It keeps track of the start and end of frame and of the frame status (good or bad). Upon receiving an end of frame that is good, the Rx interface makes a switch request.

6.2.3 RxDMA

The RxDMA arbitrates among switch requests from each Rx interface. It also buffers the first 64 bytes of each frame for use by the search engine when the switch request has been made.

6.2.4 TxQ Manager

First, the TxQ manager checks the per-class queue status and the global reserved resource situation, and using this information, makes the frame dropping decision after receiving a switch response. If the decision is not to drop, the TxQ manager requests that the FCB manager link the unicast frame's FCB to the correct per-port-per-class TxQ. If the frame is multicast, the TxQ manager writes to the multicast queue for that port and class. The TxQ manager can also trigger source port flow control for the incoming frame's source if that port is flow control enabled.

Second, the TxQ manager handles transmission scheduling; it schedules transmission among the queues representing different classes for a port. Once a frame has been scheduled, the TxQ manager reads the FCB information and writes to the correct port control module.

6.3 Port Control

The port control module calculates the SRAM read address for the frame currently being transmitted. It also writes start of frame information and an end of frame flag to the MAC TxFIFO. When transmission is done, the port control module requests that the buffer be released.

6.4 TxDMA

The TxDMA multiplexes data and address from port control, and arbitrates among buffer release requests from the port control modules.
7.0 Quality of Service and Flow Control

7.1 Model

Quality of service (QoS) is an all-encompassing term for which different people have different interpretations. In this chapter, quality of service assurance means the allocation of chip resources so as to meet the latency and bandwidth requirements associated with each traffic class. Nothing is presupposed about the offered traffic pattern. If the traffic load is light, then ensuring quality of service is straightforward. But if the traffic load is heavy, the MVTX2803AG must intelligently allocate resources so as to assure quality of service for high priority data.

The network manager must assign importance to the application types, such as voice, file transfer, or web browsing. The manager can then subdivide the applications into classes and set up a service contract with each. The contract may consist of bandwidth or latency assurances per class. Sometimes it may even reflect an estimate of the traffic mix offered to the switch, though this is not required.

The table below shows examples of QoS applications with eight transmission priorities, including best effort traffic for which no bandwidth or latency assurances are provided.
Each class has a low-drop subclass (if the class is oversubscribed, these packets are the last to be dropped) and a high-drop subclass (if the class is oversubscribed, these packets are the first to be dropped).

Class                                  Latency      Example assured bandwidth   Sample applications
                                                    (user defined)
Highest transmission priorities, P7    < 200 µs     300 Mbps                    control information
Highest transmission priorities, P6    < 200 µs     200 Mbps                    low drop: phone calls, circuit emulation; high drop: training video, other multimedia
Middle transmission priorities, P5     < 400 µs     125 Mbps                    low drop: interactive activities; high drop: noncritical interactive activities
Middle transmission priorities, P4     < 800 µs     250 Mbps                    web business
Low transmission priorities, P3        < 1600 µs    80 Mbps                     file backups
Low transmission priorities, P2        < 3200 µs    45 Mbps                     email
Best effort, P1-P0                     -            -                           casual web browsing; web research
TOTAL                                               1 Gbps

Table 1 - Two-dimensional World Traffic

It is possible that a class of traffic may attempt to monopolize system resources by sending data at a rate in excess of the contractually assured bandwidth for that class. A well-behaved class offers traffic at a rate no greater than the agreed-upon rate. By contrast, a misbehaving class offers traffic that exceeds the agreed rate. A misbehaving class is formed from an aggregation of misbehaving microflows. To achieve high link utilization, a misbehaving class is allowed to use any idle bandwidth. However, the quality of service (QoS) received by well-behaved classes must never suffer.

As Table 1 illustrates, each traffic class may have its own distinct properties and applications. As shown, classes may receive bandwidth assurances or latency bounds.
In the example, P7, the highest transmission class, requires that all frames be transmitted within 0.2 ms, and receives 30% of the 1 Gbps of bandwidth at that port. Best-effort (P1-P0) traffic forms a lower tier of service that only receives bandwidth when none of the other classes have any traffic to offer.

In addition, each transmission class has two subclasses, high-drop and low-drop. Well-behaved users should not lose packets. But poorly behaved users (users who send data at too high a rate) will encounter frame loss, and the first frames to be discarded will be high-drop. Of course, if this is insufficient to resolve the congestion, eventually some low-drop frames are dropped as well.

Table 1 shows that different types of applications may be placed in different boxes in the traffic table. For example, web search may fit into the category of high-loss, high-latency-tolerant traffic, whereas VoIP fits into the category of low-loss, low-latency traffic.

7.2 Four QoS Configurations

There are four basic pieces to QoS scheduling in the MVTX2803AG: strict priority (SP), delay bound, weighted fair queuing (WFQ), and best effort (BE). Using these four pieces, there are four different modes of operation, as shown in Table 2.

Configuration    P7-P6          P5-P2          P1-P0
Op1 (default)    Delay Bound    Delay Bound    BE
Op2              SP             Delay Bound    BE
Op3              SP             WFQ            WFQ
Op4              WFQ            WFQ            WFQ

Table 2 - Four QoS configurations per port.

The default configuration is six delay-bounded queues and two best-effort queues. The delay bounds per class are 0.16 ms for P7 and P6, 0.32 ms for P5, 0.64 ms for P4, 1.28 ms for P3, and 2.56 ms for P2. Best effort traffic is only served when there is no delay-bounded traffic to be served. P1 has strict priority over P0.

There is a second configuration in which there are two strict priority queues, four delay-bounded queues, and two best effort queues. The delay bounds per class are 0.32 ms for P5, 0.64 ms for P4, 1.28 ms for P3, and 2.56 ms for P2.
If the user chooses this configuration, it is important that P7-P6 (SP) traffic be either policed or implicitly bounded (e.g. if the incoming SP traffic is very light and predictably patterned). Strict priority traffic, if not admission-controlled at a stage prior to the MVTX2803AG, can have an adverse effect on the performance of all other classes. P7 and P6 are both SP classes, and P7 has strict priority over P6.

The third configuration contains two strict priority queues and six queues receiving a bandwidth partition via WFQ. As in the second configuration, strict priority traffic needs to be carefully controlled. In the fourth configuration, all queues are served using a WFQ service discipline.

7.3 Delay Bound

In the absence of a sophisticated QoS server and signaling protocol, the MVTX2803AG may not be assured of the mix of incoming traffic ahead of time. To cope with this uncertainty, the delay assurance algorithm dynamically adjusts its scheduling and dropping criteria, guided by the queue occupancies and the due dates of their head-of-line (HOL) frames. As a result, latency bounds are assured for all admitted frames with high confidence, even in the presence of system-wide congestion. The algorithm identifies misbehaving classes and intelligently discards frames at no detriment to well-behaved classes.

The algorithm also differentiates between high-drop and low-drop traffic with a weighted random early drop (WRED) approach. Random early dropping prevents congestion by randomly dropping a percentage of high-drop frames even before the chip's buffers are completely full, while still largely sparing low-drop frames. This allows high-drop frames to be discarded early, as a sacrifice for future low-drop frames. Finally, the delay bound algorithm also achieves bandwidth partitioning among classes.
7.4 Strict Priority and Best Effort

When strict priority is part of the scheduling algorithm, if a queue has even one frame to transmit, it goes first. Two of the four QoS configurations include strict priority queues. The goal is for strict priority classes to be used for IETF expedited forwarding (EF), where performance guarantees are required. As indicated, it is important that strict priority traffic be either policed or implicitly bounded, so as to keep from harming other traffic classes.

When best effort is part of the scheduling algorithm, a queue only receives bandwidth when none of the other classes have any traffic to offer. Two of the four QoS configurations include best effort queues. The goal is for best effort classes to be used for non-essential traffic, because there are no assurances about best effort performance. However, in a typical network setting, much best effort traffic will be transmitted, and with an adequate degree of expediency.

Because there are no delay assurances for best effort traffic, latency is not enforced by dropping best effort traffic. Furthermore, because it is assumed that strict priority traffic is carefully controlled before entering the MVTX2803AG, a fair bandwidth partition is not enforced by dropping strict priority traffic. To summarize, dropping to enforce quality of service (i.e. bandwidth or delay) does not apply to strict priority or best effort queues. The MVTX2803AG drops frames from best effort and strict priority queues only when global buffer resources become scarce.

7.5 Weighted Fair Queuing

In some environments (for example, an environment in which delay assurances are not required, but precise bandwidth partitioning on small time scales is essential), WFQ may be preferable to a delay-bounded scheduling discipline. The MVTX2803AG provides the user with a WFQ option with the understanding that delay assurances cannot be provided if the incoming traffic pattern is uncontrolled.
The user sets eight WFQ "weights" such that all weights are whole numbers and sum to 64. This provides per-class bandwidth partitioning with an error within 2%.

In WFQ mode, though frame latency is not assured, the MVTX2803AG still retains a set of dropping rules that helps to prevent congestion and trigger higher level protocol end-to-end flow control.

As before, when strict priority is combined with WFQ, there are no special dropping rules for the strict priority queues, because the input traffic pattern is assumed to be carefully controlled at a prior stage. However, the device does drop frames from SP queues for global buffer management purposes. In addition, queues P1 and P0 are treated as best effort from a dropping perspective, though they are still assured a percentage of bandwidth from a WFQ scheduling perspective. This means that these particular queues are only affected by dropping when the global buffer count becomes low.

7.6 Shaper

Although traffic shaping is not a primary function of the MVTX2803AG, the chip does implement a shaper for expedited forwarding (EF). The goal in shaping is to control the peak and average rate of traffic exiting the MVTX2803AG. Shaping is limited to class P6 (the second highest priority). This means that class P6 will be the class used for EF traffic. (By contrast, assume class P7 will be used for control packets only.) If shaping is enabled for P6, then P6 traffic must be scheduled using strict priority. With reference to Table 2, only the middle two QoS configurations may be used.

Peak rate is set using a programmable whole number, no greater than 64 (register QOS-CREDIT_C6_Gn). For example, if the setting is 32, then the peak rate for shaped traffic is 32/64 × 1000 Mbps = 500 Mbps. Average rate is also a programmable whole number, no greater than 64, and no greater than the peak rate.
For example, if the setting is 16, then the average rate for shaped traffic is 16/64 × 1000 Mbps = 250 Mbps. As a consequence of the above settings in the example, shaped traffic will exit the MVTX2803AG at a rate always less than 500 Mbps, and averaging no greater than 250 Mbps.

Also, when shaping is enabled, it is possible for a P6 queue to explode in length if fed by a greedy source. The reason is that a shaper is by definition not work-conserving; that is, it may hold back from sending a packet even if the line is idle. Though there is global resource management, nothing is done to prevent this situation locally. This assumes SP traffic is policed at a stage prior to the MVTX2803AG.

7.7 WRED Drop Threshold Management Support

To avoid congestion, the Weighted Random Early Detection (WRED) logic drops packets according to specified parameters. The following table summarizes the behavior of the WRED logic.

Per-queue byte count thresholds:

Queue        |P7|    |P6|    |P5|    |P4|    |P3|    |P2|
Threshold    A KB    B KB    C KB    D KB    E KB    F KB

Drop levels:

Level      N threshold    High Drop    Low Drop
Level 1    240            X%           0%
Level 2    280            Y%           Z%
Level 3    320            100%         100%

Table 3 - WRED Dropping Scheme

In the table, |Px| is the byte count in queue Px. The WRED logic has three drop levels, depending on the value of N, which is based on the number of bytes in the priority queues. If delay bound scheduling is used, N equals 16|P7| + 16|P6| + 8|P5| + 4|P4| + 2|P3| + |P2|. If WFQ scheduling is used, N equals |P7| + |P6| + |P5| + |P4| + |P3| + |P2|. Each drop level has defined high-drop and low-drop percentages, which indicate the percentage of high-drop and low-drop packets that will be dropped at that level. The X, Y, and Z percent parameters can be programmed using the registers RDRC0 and RDRC1. Parameters A-F are the byte count thresholds for each priority queue, and are also programmable. When using delay bound scheduling, the values selected for A-F also control the approximate bandwidth partition among the traffic classes; see the application note.
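The two formulas for N in Section 7.7 can be expressed directly in code. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the level thresholds 240, 280, and 320 are taken from Table 3 and are assumed to be in the same unit as the queue byte counts passed in, and a level is assumed to apply once N reaches its threshold (the comparison operator is not legible in the table). The per-queue thresholds A-F are not modeled.

```python
# Weights applied to each queue's byte count under delay bound scheduling.
DELAY_BOUND_WEIGHTS = {"P7": 16, "P6": 16, "P5": 8, "P4": 4, "P3": 2, "P2": 1}


def wred_n(queue_bytes, scheduling):
    """Compute N from per-queue byte counts (missing queues count as 0)."""
    if scheduling == "delay_bound":
        return sum(w * queue_bytes.get(q, 0)
                   for q, w in DELAY_BOUND_WEIGHTS.items())
    if scheduling == "wfq":
        return sum(queue_bytes.get(q, 0) for q in DELAY_BOUND_WEIGHTS)
    raise ValueError("scheduling must be 'delay_bound' or 'wfq'")


def drop_level(n, thresholds=(240, 280, 320)):
    """Return 0 (no WRED dropping) or the highest level whose threshold is
    reached. Treating 'reached' as n >= threshold is an assumption."""
    level = 0
    for i, threshold in enumerate(thresholds, start=1):
        if n >= threshold:
            level = i
    return level


def drop_percent(level, high_drop, x, y, z):
    """Look up the drop percentage for a frame at the given level.

    x, y, z stand for the programmable X, Y, Z percentages of Table 3.
    """
    table = {0: (0, 0), 1: (x, 0), 2: (y, z), 3: (100, 100)}
    high, low = table[level]
    return high if high_drop else low


if __name__ == "__main__":
    occupancy = {"P7": 10, "P5": 12}           # example byte counts
    n = wred_n(occupancy, "delay_bound")        # 16*10 + 8*12
    print("N =", n, "level =", drop_level(n))
```

Because the delay bound formula weights P7 and P6 sixteen times more heavily than P2, a small backlog in a high priority queue raises N as much as a much larger backlog in a low priority queue.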
7.8 Buffer Management

Because the number of frame data buffer (FDB) slots is a scarce resource, and because it is desirable to ensure that one misbehaving source port or class cannot harm the performance of a well-behaved source port or class, the concept of buffer management was introduced into the MVTX2803AG. The buffer management scheme is designed to divide the total buffer space into numerous reserved regions and one shared pool (see Figure 4).

As shown in the figure, the FDB pool is divided into several parts. A reserved region for temporary frames stores frames prior to receiving a switch response. Such a temporary region is necessary because when the frame first enters the MVTX2803AG, its destination port and class are as yet unknown, and so the decision to drop or not needs to be temporarily postponed. This ensures that every frame can be received first, before it is subjected to the frame drop discipline after classification.

Six reserved sections, one for each of the highest six priority classes, ensure a programmable number of FDB slots per class. The lowest two classes do not receive any buffer reservation. Another segment of the FDB reserves space for each of the 8 ports. These source port buffer reservations are programmable. These 8 reserved regions make sure that no well-behaved source port can be blocked by another misbehaving source port.

In addition, there is a shared pool, which can store any type of frame.
The registers related to the Buffer Management logic are:

· PRG - Port Reservation for Gigabit Ports
· SFCB - Share FCB Size
· C2RS - Class 2 Reserved Size
· C3RS - Class 3 Reserved Size
· C4RS - Class 4 Reserved Size
· C5RS - Class 5 Reserved Size
· C6RS - Class 6 Reserved Size
· C7RS - Class 7 Reserved Size

[Figure 4 - Buffer Partition Scheme Used in the MVTX2803AG: Temporary Reservation (RTMP), Shared Pool (S), Per-Class Reservations (RP7, RP6, ... RP2), Per-Source Reservations (8-R1G)]

7.8.1 Dropping When Buffers Are Scarce

Summarizing the two examples of local dropping discussed earlier in this chapter:

· If a queue is a delay-bounded queue, a multilevel WRED drop scheme applies, designed to control delay and partition bandwidth in case of congestion.
· If a queue is a WFQ-scheduled queue, a multilevel WRED drop scheme applies, designed to prevent congestion.

In addition to these reasons for dropping, the MVTX2803AG also drops frames when global buffer space becomes scarce. The function of buffer management is to ensure that such droppings cause as little blocking as possible.

7.9 MVTX2803AG Flow Control Basics

Because frame loss is unacceptable for some applications, the MVTX2803AG provides a flow control option. When flow control is enabled, scarcity of buffer space in the switch may trigger a flow control signal; this signal tells a source port that is sending packets to this switch to temporarily hold off.

While flow control offers the clear benefit of no packet loss, it also introduces a problem for quality of service. When a source port receives an Ethernet flow control signal, all microflows originating at that port, well-behaved or not, are halted. A single packet destined for a congested output can block other packets destined for uncongested outputs.
The resulting head-of-line blocking phenomenon means that quality of service cannot be assured with high confidence when flow control is enabled.

In the MVTX2803AG, each source port can independently have flow control enabled or disabled. For flow control enabled ports, by default all frames are treated as lowest priority during transmission scheduling. This is done so that those frames are not exposed to the WRED dropping scheme. Frames from flow control enabled ports feed only one queue at the destination, the queue of lowest priority. This means that if flow control is enabled for a given source port, then the device can guarantee that no packets originating from that port will be lost, but at the possible expense of minimum bandwidth or maximum delay assurances. In addition, these "downgraded" frames may only use the shared pool or the per-source reserved pool in the FDB; frames from flow control enabled sources may not use reserved FDB slots for the highest six classes (P2-P7).

The MVTX2803AG does provide a system-wide option of permitting normal QoS scheduling (and buffer use) for frames originating from flow control enabled ports. When this programmable option is active, it is possible that some packets may be dropped, even though flow control is on. The reason is that intelligent packet dropping is a major component of the MVTX2803AG's approach to ensuring bounded delay and minimum bandwidth for high priority flows.

7.9.1 Unicast Flow Control

For unicast frames, flow control is triggered by source port resource availability. Recall that the MVTX2803AG's buffer management scheme allocates a reserved number of FDB slots for each source port. If a programmed number of a source port's reserved FDB slots have been used, then flow control Xoff is triggered. Xon is triggered when a port is currently being flow controlled, and all of that port's reserved FDB slots have been released.
Note that the MVTX2803AG's per-source-port FDB reservations assure that a source port that sends a single frame to a congested destination will not be flow controlled.

7.9.2 Multicast Flow Control

When port-based VLAN is not used, a global buffer counter (64 packets) triggers flow control for multicast frames. When the system exceeds a programmable threshold of multicast packets, Xoff is triggered. Xon is triggered when the system returns below this threshold. The MCC register programs the threshold.

When port-based VLAN is used, each VLAN has a global buffer counter.

In addition, each source port has an 8-bit port map recording which port or ports of the multicast frame's fanout were congested at the time Xoff was triggered. All ports are continuously monitored for congestion, and a port is identified as uncongested when its queue occupancy falls below a fixed threshold. When all those ports that were originally marked as congested in the port map have become uncongested, then Xon is triggered, and the 8-bit vector is reset to zero.

The MVTX2803AG also provides the option of disabling VLAN multicast flow control.

Note: If port flow control is on, QoS performance will be affected. To determine the most efficient way to program, please refer to the QoS Application Note.

7.10 Mapping to IETF Diffserv Classes

The mapping between priority classes discussed in this chapter and elsewhere is shown below.

MVTX2803AG    P7    P6    P5     P4     P3     P2     P1     P0
IETF          NM    EF    AF0    AF1    AF2    AF3    BE0    BE1

Table 4 - Mapping between MVTX2803AG and IETF Diffserv Classes for Gigabit Ports

As the table illustrates, P7 is used solely for network management (NM) frames. P6 is used for expedited forwarding service (EF). Classes P2 through P5 correspond to an assured forwarding (AF) group of size 4. Finally, P0 and P1 are two best effort (BE) classes.
Features of the MVTX2803AG that correspond to the requirements of their associated IETF classes are summarized in the table below.

Network Management (NM) and Expedited Forwarding (EF)
· Global buffer reservation for NM and EF
· Shaper for EF traffic
· Option of strict priority scheduling
· No dropping if admission controlled

Assured Forwarding (AF)
· Four AF classes
· Programmable bandwidth partition, with option of WFQ service
· Option of delay-bounded service keeps delay under fixed levels even if not admission-controlled
· Random early discard, with programmable levels
· Global buffer reservation for each AF class

Best Effort (BE)
· Two BE classes
· Service only when other queues are idle means that QoS is not adversely affected
· Random early discard, with programmable levels
· Traffic from flow control enabled ports automatically classified as BE

Table 5 - MVTX2803AG Features Enabling IETF Diffserv Standards

8.0 Port Trunking

8.1 Features and Restrictions

A port group (i.e. trunk) can include up to 8 physical ports, but all of the ports in a group must be in the same MVTX2803AG. The MVTX2803AG provides several pre-assigned trunk group options, containing as many as 4 ports per group, or alternatively, as many as 4 total groups.

Load distribution among the ports in a trunk for unicast is performed using hashing based on source MAC address and destination MAC address. The other options are source MAC address only and destination MAC address only. Load distribution for multicast is performed similarly.

If a VLAN includes any of the ports in a trunk group, all the ports in that trunk group should be in the same VLAN member map.

The MVTX2803AG also automatically provides a safe fail-over mode for port trunking. If one of the ports in the trunking group goes down, the MVTX2803AG will automatically redistribute the traffic over the remaining ports in the trunk in unmanaged mode.
In managed mode, the software can perform similar tasks.

8.2 Unicast Packet Forwarding

The search engine finds the destination MCT entry, and if the status field says that the destination address found belongs to a trunk, then the group number is retrieved instead of the port number. In addition, if the source address belongs to a trunk, then the source port's trunk membership register is checked to determine if the address has moved.

A hash key is used to determine the appropriate forwarding port, based on some combination of the source and destination MAC addresses for the current packet.

The search engine retrieves the VLAN member ports from the VLAN index table, which consists of 4K entries.

The search engine retrieves the VLAN member ports from the ingress port's VLAN map. Based on the destination MAC address, the search engine determines the egress port from the MCT database. If the egress port is a member of a trunk group, the packet can be distributed to the other members of that trunk group.

The VLAN map is used to check whether the egress port is a member of the VLAN, based on the ingress port. If it is a member, the packet is forwarded otherwis