MVTX2801AG
4-Port 1000 Mbps Ethernet Distributed Switch
Data Sheet

DS5748   Issue 1.1   May 2002   SEMICMF.019

Ordering Information
MVTX2801AG   596-pin BGA   -40°C to +85°C

Features

· 4 Gigabit ports with GMII and PCS interface
  · Gigabit port can also support 100/10 Mbps MII interface
· High-performance Layer 2 packet forwarding (11.904M packets per second) and filtering at full wire speed
· Maximum throughput is 4 Gbps non-blocking
· Centralized shared-memory architecture
· Consists of two memory domains at 133 MHz
  · Frame Buffer Domain: one bank of ZBT-SRAM with 1M/2MB total
  · Switch Database Domain with 256K/512K SRAM
· Up to 64K MAC addresses to provide large node aggregation in wiring closet switches
· Traffic Classification
  · Classify traffic into 8 transmission priorities per port
  · Supports Delay Bounded, Strict Priority, and WFQ
  · Provides 2-level dropping precedence with WRED mechanism
  · User-controlled thresholds for WRED
  · Classification based on layer 2, 3 markings
    · VLAN Priority field in VLAN tagged frame
    · DS/TOS field in IP packet
    · The precedence of the above two classifications is programmable
· QoS Support
  · Supports IEEE 802.1p/Q Quality of Service with 8 priorities
  · Buffer management: reserve buffers on a per-class and per-port basis
  · Port-based priority: the VLAN priority of a tagged frame can be overwritten by the priority of the PVID
  · QoS features can be configured on a per-port basis
· Full Duplex Ethernet IEEE 802.3x Flow Control
· Provides Ethernet multicast and broadcast control
· 2 port trunking groups, max of 3 ports per group (trunking can be based on source MAC and/or destination MAC and source port)
· LED signals provided by a serial or parallel interface
· Synchronous Serial Interface and I2C interface in unmanaged mode
· Hardware auto-negotiation through serial management interface (MDIO) for Gigabit Ethernet ports, supports 10/100/1000 Mbps
· BIST for internal and external SRAM-ZBT
· I2C EEPROM or synchronous serial port for configuration
· Packaged in 596-pin BGA

Figure 1 - Chip Block Diagram (blocks shown: 64-bit FDB Interface to Frame Data Buffer A, ZBT-SRAM 1M/2MB; 32-bit SDB Interface to SW Database/MAC Table, SRAM 256/512K; Frame Engine; Search Engine; Scheduler; NM Database; Management Module; LED; Serial/I2C; GMII/PCS Ports 0 to 3)

Description

The MVTX2800AG family is a group of 8-port 1000 Mbps non-blocking Ethernet switch chips with on-chip address memory. A single chip provides a maximum of eight 1000 Mbps ports and a dedicated CPU interface with a 16/8-bit bus for managed and unmanaged switch applications. The VTX2800 family consists of the following four products:

· VTX2804   8 Gigabit ports   Managed
· VTX2803   8 Gigabit ports   Unmanaged
· VTX2802   4 Gigabit ports   Managed
· VTX2801   4 Gigabit ports   Unmanaged

The MVTX2801AG supports up to 64K MAC addresses to aggregate traffic from multiple wiring closet stacks. The centralized shared-memory architecture allows a very high performance packet-forwarding rate of 11.904M packets per second at full wire speed. The chip is optimized to provide a low-cost, high-performance workgroup and wiring closet Layer 2 switching solution with 4 Gigabit Ethernet ports.
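The wire-speed figure quoted above can be related to ordinary Ethernet framing arithmetic. The sketch below assumes minimum-size 64-byte frames plus an 8-byte preamble and a 12-byte inter-frame gap; these constants come from IEEE 802.3 rather than from this data sheet, and the function name is illustrative only. Under those assumptions, 11.904M matches eight 1000 Mbps streams of minimum-size frames. The data sheet does not say how those streams are counted on a 4-port device.

```python
# Framing constants from IEEE 802.3 (assumed; not stated in this data sheet).
MIN_FRAME_BYTES = 64   # minimum Ethernet frame, including CRC
PREAMBLE_BYTES = 8     # preamble plus start-of-frame delimiter
IFG_BYTES = 12         # minimum inter-frame gap


def max_frames_per_second(line_rate_bps: float) -> float:
    """Maximum rate of minimum-size frames on one link at the given bit rate."""
    bits_per_frame = (MIN_FRAME_BYTES + PREAMBLE_BYTES + IFG_BYTES) * 8
    return line_rate_bps / bits_per_frame


per_stream = max_frames_per_second(1_000_000_000)
print(f"one 1000 Mbps stream: {per_stream / 1e6:.3f} M frames/s")
print(f"eight such streams:   {8 * per_stream / 1e6:.3f} M frames/s")
```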
One Frame Buffer Memory domain utilizes cost-effective, high-performance ZBT-SRAM with an aggregate bandwidth of 8.5 Gbps to support full wire speed on all external ports simultaneously.

With Strict Priority, Delay Bounded, and WRR transmission scheduling, plus a WRED memory congestion scheme, the chip provides powerful QoS functions for convergent network multimedia and mission-critical applications. The chip provides 8 transmission priorities and 2-level drop precedence. Traffic is assigned its transmission priority and dropping precedence based on the frame VLAN Tag priority.

The MVTX2801AG supports port trunking/load sharing on the 1000 Mbps ports with fail-over capability. Port trunking/load sharing can be used to group ports between interlinked switches to increase the effective network bandwidth.

In full-duplex mode, IEEE 802.3x flow control is provided.

The Physical Coding Sublayer (PCS) is integrated on-chip to provide a direct 10-bit GMII interface, or the PCS can be bypassed to provide an interface to existing fiber-based Gigabit Ethernet transceivers.

The MVTX2801AG is fabricated using 0.25 µm technology. Inputs are 3.3V tolerant, and the outputs are capable of directly interfacing to LVTTL levels. The MVTX2801AG is packaged in a 596-pin Ball Grid Array package.

Table of Contents

Features
Traffic Classification
Description
List of Figures
List of Tables
1.0 Block Functionality
1.1 Frame Data Buffer (FDB) Interfaces
1.2 Switch Database (SDB) Interface
1.3 GMII/PCS MAC Module (GMAC)
1.4 Frame Engine
1.5 Search Engine
1.6 LED Interface
1.7 Internal Memory
2.0 System Configuration
2.1 I2C Interface
2.1.1 Start Condition
2.1.2 Address
2.1.3 Data Direction
2.1.4 Acknowledgment
2.1.5 Data
2.1.6 Stop Condition
2.2 Synchronous Serial Interface
2.2.1 Write Command
2.2.2 Read Command
3.0 Data Forwarding Protocol
3.1 Unicast Data Frame Forwarding
3.2 Multicast Data Frame Forwarding
4.0 Memory Interface
4.1 Overview
4.2 Detailed Memory Information
5.0 Search Engine
5.1 Search Engine Overview
5.2 Basic Flow
5.3 Search, Learning, and Aging
5.3.1 MAC Search
5.3.2 Learning
5.3.3 Aging
5.3.4 Data Structure
6.0 Frame Engine
6.1 Data Forwarding Summary
6.2 Frame Engine Details
6.2.1 FCB Manager
6.2.2 Rx Interface
6.2.3 RxDMA
6.2.4 TxQ Manager
6.3 Port Control
6.4 TxDMA
7.0 Quality of Service and Flow Control
7.1 Model
7.2 Four QoS Configurations
7.3 Delay Bound
7.4 Strict Priority and Best Effort
7.5 Weighted Fair Queuing
7.6 Shaper
7.7 WRED Drop Threshold Management Support
7.8 Buffer Management
7.8.1 Dropping When Buffers Are Scarce
7.9 Flow Control Basics
7.9.1 Unicast Flow Control
7.9.2 Multicast Flow Control
7.10 Mapping to IETF Diffserv Classes
8.0 Port Trunking
8.1 Features and Restrictions
8.2 Unicast Packet Forwarding
8.3 Multicast Packet Forwarding
8.4 Preventing Multicast Packets from Looping Back to the Source Trunk
9.0 LED Interface
9.1 Introduction
9.2 Serial Mode
9.3 Parallel Mode
9.4 LED Control Registers
10.0 Register Definition
10.1 Register Description
10.2 Group 0 Address - MAC Ports Group
10.2.1 ECR1Pn: Port N Control Register
10.2.2 ECR2Pn: Port N Control Register
10.2.3 ECRMISC1 - CPU Port Control Register MISC1
10.2.4 ECRMISC2 - CPU Port Control Register MISC2
10.2.5 GGControl 0 - Extra GIGA Port Control
10.2.6 GGControl 1 - Extra GIGA Port Control
10.2.7 GGControl 2 - Extra GIGA Port Control
10.2.8 GGControl 3 - Extra GIGA Port Control
10.3 Group 1 Address - VLAN Group
10.3.1 AVTCL - VLAN Type Code Register Low
10.3.2 AVTCH - VLAN Type Code Register High
10.3.3 PVMAP00_0 - Port 00 Configuration Register 0
10.3.4 PVMAP00_1 - Port 00 Configuration Register 1
10.3.5 PVMAP00_2 - Port 00 Configuration Register 2
10.3.6 PVMAP00_3 - Port 00 Configuration Register 3
10.3.7 PVMODE
10.4 Group 2 Address - Port Trunking Group
10.4.1 TRUNK0 - Trunk group 0 Member (Managed Mode Only)
10.4.2 TRUNK1 - Trunk group 1 Member (Managed Mode Only)
10.4.3 TRUNK2 - Trunk group 2 Member (Managed Mode Only)
10.4.4 TRUNK3 - Trunk group 3 Member (Managed Mode Only)
10.4.5 TRUNK_HASH_MODE - Trunk hash mode
10.4.6 TRUNK0_MODE - Trunk group 0 and 1 mode
10.4.7 TRUNK0_HASH0 - Trunk group 0 hash result 0, 1, 2 destination port number
10.4.8 TRUNK0_HASH1 - Trunk group 0 hash result 2, 3, 4, 5 destination port number
10.4.9 TRUNK0_HASH2 - Trunk group 0 hash result 5, 6, 7 destination port number
10.4.10 TRUNK0_HASH3 - Trunk group 0 hash result 8, 9, 10 destination port number
10.4.11 TRUNK0_HASH4 - Trunk group 0 hash result 10, 11, 12, 13 destination port number
10.4.12 TRUNK0_HASH5 - Trunk group 0 hash result 13, 14, 15 destination port number
10.4.13 TRUNK1_MODE - Trunk group 1 mode (Unmanaged Mode)
10.4.14 TRUNK1_HASH0 - Trunk group 1 hash result 0, 1, 2 destination port number
10.4.15 TRUNK1_HASH1 - Trunk group 1 hash result 2, 3, 4, 5 destination port number
10.4.16 TRUNK1_HASH2 - Trunk group 1 hash result 5, 6, 7 destination port number
10.4.17 TRUNK1_HASH3 - Trunk group 1 hash result 8, 9, 10 destination port number
10.4.18 TRUNK1_HASH4 - Trunk group 1 hash result 11, 12, 13 destination port number
10.4.19 TRUNK1_HASH5 - Trunk group 1 hash result 13, 14, 15 destination port number
10.4.20 TRUNK2_HASH0 - Trunk group 2 hash result 0, 1, 2 destination port number
10.4.21 TRUNK2_HASH1 - Trunk group 2 hash result 2, 3, 4, 5 destination port number
10.4.22 TRUNK2_HASH2 - Trunk group 2 hash result 5, 6, 7 destination port number
10.4.23 TRUNK2_HASH3 - Trunk group 2 hash result 8, 9, 10 destination port number
10.4.24 TRUNK2_HASH4 - Trunk group 2 hash result 10, 11, 12, 13 destination port number
10.4.25 TRUNK2_HASH5 - Trunk group 2 hash result 13, 14, 15 destination port number
10.4.26 TRUNK3_HASH0 - Trunk group 3 hash result 0, 1, 2 destination port number
10.4.27 TRUNK3_HASH1 - Trunk group 3 hash result 2, 3, 4, 5 destination port number
10.4.28 TRUNK3_HASH2 - Trunk group 3 hash result 5, 6, 7 destination port number
10.4.29 TRUNK3_HASH3 - Trunk group 3 hash result 8, 9, 10 destination port number
10.4.30 TRUNK3_HASH4 - Trunk group 3 hash result 10, 11, 12, 13 destination port number
10.4.31 TRUNK3_HASH5 - Trunk group 3 hash result 13, 14, 15 destination port number
10.4.32 Multicast Hash Registers
10.4.33 Multicast_HASH00 - Multicast hash result0 mask byte [7:0]
10.4.34 Multicast_HASH01 - Multicast hash result1 mask byte [7:0]
10.4.35 Multicast_HASH02 - Multicast hash result2 mask byte [7:0]
10.4.36 Multicast_HASH03 - Multicast hash result3 mask byte [7:0]
10.4.37 Multicast_HASH04 - Multicast hash result4 mask byte [7:0]
10.4.38 Multicast_HASH05 - Multicast hash result5 mask byte [7:0]
10.4.39 Multicast_HASH06 - Multicast hash result6 mask byte [7:0]
10.4.40 Multicast_HASH07 - Multicast hash result7 mask byte [7:0]
10.4.41 Multicast_HASH08 - Multicast hash result8 mask byte [7:0]
10.4.42 Multicast_HASH09 - Multicast hash result9 mask byte [7:0]
10.4.43 Multicast_HASH10 - Multicast hash result10 mask byte [7:0]
10.4.44 Multicast_HASH11 - Multicast hash result11 mask byte [7:0]
10.4.45 Multicast_HASH12 - Multicast hash result12 mask byte [7:0]
10.4.46 Multicast_HASH13 - Multicast hash result13 mask byte [7:0]
10.4.47 Multicast_HASH14 - Multicast hash result14 mask byte [7:0]
10.4.48 Multicast_HASH15 - Multicast hash result15 mask byte [7:0]
10.4.49 Multicast_HASHML - Multicast hash bit[8] for result7-0
10.4.50 Multicast_HASHMH - Multicast hash bit[8] for result15-8
10.5 Group 3 Address - CPU Port Configuration Group
10.5.1 MAC0 - CPU MAC address byte 0
10.5.2 MAC1 - CPU MAC address byte 1
10.5.3 MAC2 - CPU MAC address byte 2
10.5.4 MAC3 - CPU MAC address byte 3
10.5.5 MAC4 - CPU MAC address byte 4
10.5.6 MAC5 - CPU MAC address byte 5
10.5.7 INT_MASK0 - Interrupt Mask 0
10.5.8 INT_MASK1 - Interrupt Mask 1
10.5.9 INT_STATUS0 - Masked Interrupt Status Register0
10.5.10 INT_STATUS1 - Masked Interrupt Status Register1
10.5.11 INTP_MASK0 - Interrupt Mask for MAC Port 0,1
10.5.12 INTP_MASK1 - Interrupt Mask for MAC Port 2,3
10.5.13 INTP_MASK4 - Interrupt Mask for MAC Port 4,5
10.5.14 INTP_MASK5 - Interrupt Mask for MAC Port 6,7
10.5.15 RQS - Receive Queue Select
10.5.16 RQSS - Receive Queue Status
10.5.17 TX_AGE - Tx Queue Aging timer
10.6 Group 4 Address - Search Engine Group
10.6.1 AGETIME_LOW - MAC address aging time Low
10.6.2 AGETIME_HIGH - MAC address aging time High
10.6.3 V_AGETIME - VLAN to Port aging time
10.6.4 SE_OPMODE - Search Engine Operation Mode
10.6.5 SCAN - SCAN Control Register
10.7 Group 5 Address - Buffer Control/QOS Group
10.7.1 FCBAT - FCB Aging Timer
10.7.2 QOSC - QOS Control
10.7.3 FCR - Flooding Control Register
10.7.4 AVPML - VLAN Priority Map
10.7.5 AVPMM - VLAN Priority Map
10.7.6 AVPMH - VLAN Priority Map
10.7.7 TOSPML - TOS Priority Map
10.7.8 TOSPMM - TOS Priority Map
10.7.9 TOSPMH - TOS Priority Map
10.7.10 AVDM - VLAN Discard Map
10.7.11 TOSDML - TOS Discard Map
10.7.12 BMRC - Broadcast/Multicast Rate Control
10.7.13 UCC - Unicast Congestion Control
10.7.14 MCC - Multicast Congestion Control
10.7.15 PRG - Port Reservation for Giga ports
10.7.16 SFCB - Share FCB Size
10.7.17 C2RS - Class 2 Reserved Size
10.7.18 C3RS - Class 3 Reserved Size
10.7.19 C4RS - Class 4 Reserved Size
10.7.20 C5RS - Class 5 Reserved Size
10.7.21 C6RS - Class 6 Reserved Size
10.7.22 C7RS - Class 7 Reserved Size
10.7.23 QOSC00 - BYTE_C2_G0
10.7.24 QOSC01 - BYTE_C3_G0
10.7.25 QOSC02 - BYTE_C4_G0
10.7.26 QOSC03 - BYTE_C5_G0
10.7.27 QOSC04 - BYTE_C6_G0
10.7.28 QOSC05 - BYTE_C7_G0
10.7.29 QOSC06 - BYTE_C2_G1
10.7.30 QOSC07 - BYTE_C3_G1
10.7.31 QOSC08 - BYTE_C4_G1
10.7.32 QOSC09 - BYTE_C5_G1
10.7.33 QOSC0A - BYTE_C6_G1
10.7.34 QOSC0B - BYTE_C7_G1
10.7.35 QOSC0C - BYTE_C2_G2
10.7.36 QOSC0D - BYTE_C3_G2
10.7.37 QOSC0E - BYTE_C4_G2
10.7.38 QOSC0F - BYTE_C5_G2
10.7.39 QOSC10 - BYTE_C6_G2
10.7.40 QOSC11 - BYTE_C7_G2
10.7.41 QOSC12 - BYTE_C2_G3
10.7.42 QOSC13 - BYTE_C3_G3
10.7.43 QOSC14 - BYTE_C4_G3
10.7.44 QOSC15 - BYTE_C5_G3
10.7.45 QOSC16 - BYTE_C6_G3
10.7.46 QOSC17 - BYTE_C7_G3
10.7.47 QOSC18 - BYTE_C2_G4
10.7.48 QOSC19 - BYTE_C3_G4
10.7.49 QOSC1A - BYTE_C4_G4
10.7.50 QOSC1B - BYTE_C5_G4
10.7.51 QOSC1C - BYTE_C6_G4
10.7.52 QOSC1D - BYTE_C7_G4
10.7.53 QOSC1E - BYTE_C2_G5
10.7.54 QOSC1F - BYTE_C3_G5
10.7.55 QOSC20 - BYTE_C4_G5
10.7.56 QOSC21 - BYTE_C5_G5
10.7.57 QOSC22 - BYTE_C6_G5
10.7.58 QOSC23 - BYTE_C7_G5
10.7.59 QOSC24 - BYTE_C2_G6
10.7.60 QOSC25 - BYTE_C3_G6
10.7.61 QOSC26 - BYTE_C4_G6
10.7.62 QOSC27 - BYTE_C5_G6
10.7.63 QOSC28 - BYTE_C6_G6
10.7.64 QOSC29 - BYTE_C7_G6
10.7.65 QOSC2A - BYTE_C2_G7
10.7.66 QOSC2B - BYTE_C3_G7
10.7.67 QOSC2C - BYTE_C4_G7
10.7.68 QOSC2D - BYTE_C5_G7
10.7.69 QOSC2E - BYTE_C6_G7
10.7.70 QOSC2F - BYTE_C7_G7
10.7.71 QOSC30 - BYTE_C01
10.7.72 QOSC31 - BYTE_C02
10.7.73 QOSC32 - BYTE_C03
10.7.74 QOSC33 - CREDIT_C0_G0
10.7.75 QOSC34 - CREDIT_C1_G0
10.7.76 QOSC35 - CREDIT_C2_G0
10.7.77 QOSC36 - CREDIT_C3_G0
10.7.78 QOSC37 - CREDIT_C4_G0
10.7.79 QOSC38 - CREDIT_C5_G0
10.7.80 QOSC39 - CREDIT_C6_G0
10.7.81 QOSC3A - CREDIT_C7_G0
10.7.82 QOSC3B - CREDIT_C0_G1
10.7.83 QOSC3C - CREDIT_C1_G1
10.7.84 QOSC3D - CREDIT_C2_G1
10.7.85 QOSC3E - CREDIT_C3_G1
10.7.86 QOSC3F - CREDIT_C4_G1
10.7.87 QOSC40 - CREDIT_C5_G1
10.7.88 QOSC41 - CREDIT_C6_G1
10.7.89 QOSC42 - CREDIT_C7_G1
10.7.90 QOSC43 - CREDIT_C0_G2
10.7.91 QOSC44 - CREDIT_C1_G2
10.7.92 QOSC45 - CREDIT_C2_G2
10.7.93 QOSC46 - CREDIT_C3_G2
10.7.94 QOSC47 - CREDIT_C4_G2
10.7.95 QOSC48 - CREDIT_C5_G2
10.7.96 QOSC49 - CREDIT_C6_G2
10.7.97 QOSC4A - CREDIT_C7_G2
10.7.98 QOSC4B - CREDIT_C0_G3
10.7.99 QOSC4C - CREDIT_C1_G3
10.7.100 QOSC4D - CREDIT_C2_G3
10.7.101 QOSC4E - CREDIT_C3_G3
10.7.102 QOSC4F - CREDIT_C4_G3
10.7.103 QOSC50 - CREDIT_C5_G3
10.7.104 QOSC51 - CREDIT_C6_G3
10.7.105 QOSC52 - CREDIT_C7_G3
10.7.106 QOSC53 - CREDIT_C0_G4
10.7.107 QOSC54 - CREDIT_C1_G4
10.7.108 QOSC55 - CREDIT_C2_G4
10.7.109 QOSC56 - CREDIT_C3_G4
10.7.110 QOSC57 - CREDIT_C4_G4
10.7.111 QOSC58 - CREDIT_C5_G4
10.7.112 QOSC59 - CREDIT_C6_G4
10.7.113 QOSC5A - CREDIT_C7_G4
10.7.114 QOSC5B - CREDIT_C0_G5
10.7.115 QOSC5C - CREDIT_C1_G5
10.7.116 QOSC5D - CREDIT_C2_G5
10.7.117 QOSC5E - CREDIT_C3_G5
10.7.118 QOSC5F - CREDIT_C4_G5
10.7.119 QOSC60 - CREDIT_C5_G5
10.7.120 QOSC61 - CREDIT_C6_G5
10.7.121 QOSC62 - CREDIT_C7_G5
10.7.122 QOSC63 - CREDIT_C0_G6
10.7.123 QOSC64 - CREDIT_C1_G6
10.7.124 QOSC65 - CREDIT_C2_G6
10.7.125 QOSC66 - CREDIT_C3_G6
10.7.126 QOSC67 - CREDIT_C4_G6
10.7.127 QOSC68 - CREDIT_C5_G6
10.7.128 QOSC69 - CREDIT_C6_G6
10.7.129 QOSC6A - CREDIT_C7_G6
10.7.130 QOSC6B - CREDIT_C0_G7
10.7.131 QOSC6C - CREDIT_C1_G7
10.7.132 QOSC6D - CREDIT_C2_G7
10.7.133 QOSC6E - CREDIT_C3_G7
10.7.134 QOSC6F - CREDIT_C4_G7
10.7.135 QOSC70 - CREDIT_C5_G7
10.7.136 QOSC71 - CREDIT_C6_G7
10.7.137 QOSC72 - CREDIT_C7_G7
10.7.138 QOSC73 - TOKEN_RATE_G0
10.7.139 QOSC74 - TOKEN_LIMIT_G0
10.7.140 QOSC75 - TOKEN_RATE_G1
10.7.141 QOSC76 - TOKEN_LIMIT_G1
10.7.142 QOSC77 - TOKEN_RATE_G2
10.7.143 QOSC78 - TOKEN_LIMIT_G2
10.7.144 QOSC79 - TOKEN_RATE_G3
10.7.145 QOSC7A - TOKEN_LIMIT_G3
10.7.146 QOSC7B - TOKEN_RATE_G4
10.7.147 QOSC7C - TOKEN_LIMIT_G4
10.7.148 QOSC7D - TOKEN_RATE_G5
10.7.149 QOSC7E - TOKEN_LIMIT_G5
10.7.150 QOSC7F - TOKEN_RATE_G6
10.7.151 QOSC80 - TOKEN_LIMIT_G6
10.7.152 QOSC81 - TOKEN_RATE_G7
10.7.153 QOSC82 - TOKEN_LIMIT_G7
10.7.154 RDRC0 - WRED Rate Control 0
10.7.155 RDRC1 - WRED Rate Control 1
10.8 Group 6 Address - MISC Group
10.8.1 MII_OP0 - MII Register Option 0
10.8.2 MII_OP1 - MII Register Option 1
10.8.3 FEN - Feature Register
10.8.4 MIIC0 - MII Command Register 0
10.8.5 MIIC1 - MII Command Register 1
10.8.6 MIIC2 - MII Command Register 2
10.8.7 MIIC3 - MII Command Register 3
10.8.8 MIID0 - MII Data Register 0
10.8.9 MIID1 - MII Data Register 0
10.8.10 LED Mode - LED Control
10.8.11 DEVICE Mode
10.8.12 CHECKSUM - EEPROM Checksum
10.8.13 LED User
10.8.14 LEDUSER0
10.8.15 LEDUSER1
10.8.16 LEDUSER2/LEDSIG2
10.8.17 LEDUSER3/LEDSIG3
10.8.18 LEDUSER4/LEDSIG4
10.8.19 LEDUSER5/LEDSIG5
10.8.20 LEDUSER6/LEDSIG6
10.8.21 LEDUSER7/LEDSIG1_0
10.8.22 MIINP0 - MII Next Page Data Register 0
10.8.23 MIINP1 - MII Next Page Data Register 1
10.9 Group F Address - CPU Access Group
10.9.1 GCR - Global Control Register
10.9.2 DCR - Device Status and Signature Register
10.9.3 DCR01 - Giga port status
10.9.4 DCR23 - Giga port status
10.9.5 DCR45 - Giga port status (not used in MVTX2801)
10.9.6 DCR67 - Giga port status (not used in MVTX2801)
10.9.7 DPST - Device Port Status Register
10.9.8 DTST - Data Read Back Register
11.0 BGA and Ball Signal Description
11.1 BGA Views
11.2 Power and Ground Distribution
11.3 Ball-Signal Descriptions
11.4 Ball Signal Name
11.5 AC/DC Timing
11.5.1 Absolute Maximum Ratings
11.5.2 DC Electrical Characteristics
11.5.3 Recommended Operation Conditions
11.6 Local Frame Buffer ZBT SRAM Memory Interface
11.6.1 Local ZBT SRAM Memory Interface A
11.7 Local Switch Database SBRAM Memory Interface
11.7.1 Local SBRAM Memory Interface
11.8 AC Characteristics
11.8.1 Media Independent Interface
11.8.2 Gigabit Media Independent Interface
11.8.3 PCS Interface
11.8.4 LED Interface
11.8.5 MDIO Input Setup and Hold Timing
11.8.6 I2C Input Setup Timing
11.8.7 Serial Interface Setup Timing
12.0 Mechanical Data
12.1 Packaging Information

List of Figures

Figure 1 - Chip Block Diagram
Figure 2 - Data Transfer Format for I2C Interface
Figure 3 - Write Command
Figure 4 - Read Command
Figure 5 - SRAM Interface Block Diagram (DMAs for Gigabit Ports)
Figure 6 - Buffer Partition Scheme Used in the MVTX2801AG
Figure 7 - Timing diagram for serial mode in LED interface
Figure 8 - Local Memory Interface - Input setup and hold timing
Figure 9 - Local Memory Interface - Output valid delay timing
Figure 10 - Local Memory Interface - Input setup and hold timing
Figure 11 - Local Memory Interface - Output valid delay timing
Figure 12 - AC Characteristics - Media Independent Interface
Figure 13 - AC Characteristics - Media Independent Interface
Figure 14 - AC Characteristics - GMII
Figure 15 - AC Characteristics - Gigabit Media Independent Interface
Figure 16 - AC Characteristics - PCS Interface
Figure 17 - AC Characteristics - PCS Interface
Figure 18 - AC Characteristics - LED Interface
Figure 19 - MDIO Input Setup and Hold Timing
Figure 20 - MDIO Output Delay Timing
Figure 21 - I2C Input Setup Timing
Figure 22 - I2C Output Delay Timing
Figure 23 - Serial Interface Setup Timing
Figure 24 - Serial Interface Output Delay Timing

List of Tables

Table 1 - Two-dimensional World Traffic
Table 2 - Four QoS configurations per port
Table 3 - WRED Dropping Scheme
Table 4 - Mapping between MVTX2801AG and IETF Diffserv Classes for Gigabit Ports
Table 5 - MVTX2801AG Features Enabling IETF Diffserv Standards
Table 6 - MVTX2801AG Register Description
Table 7 - Ball-Signal Descriptions
Table 8 - Ball Signal Name
Table 9 - Recommended Operation Conditions
Table 10 - AC Characteristics - Local frame buffer ZBT-SRAM Memory Interface A
Table 11 - AC Characteristics - Local Switch Database SBRAM Memory Interface
Table 12 - AC Characteristics - Media Independent Interface
Table 13 - AC Characteristics - Gigabit Media Independent Interface
Table 14 - AC Characteristics - PCS Interface
Table 15 - AC Characteristics - LED Interface
Table 16 - MDIO Timing
Table 17 - I2C Timing
Table 18 - Serial Interface Timing

1.0 Block Functionality

1.1 Frame Data Buffer (FDB) Interfaces

The FDB interface supports pipelined ZBT-SRAM memory at 133 MHz. To ensure a non-blocking switch, one memory domain is required. Each domain has a 64-bit wide memory bus. At 133 MHz, the aggregate memory bandwidth is 8.5 Gbps, which is enough to support 4 Gigabit ports at full wire speed switching. A patent-pending scheme is used to access the FDB memory. Each slot has one tick to read or write 8 bytes.
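The bandwidth claim in section 1.1 can be checked with a few lines of arithmetic. The sketch below assumes that every frame is written to the FDB once on receive and read once on transmit, so each port needs twice its line rate in memory bandwidth. That accounting and the variable names are assumptions made for illustration, not statements from the data sheet.

```python
BUS_WIDTH_BITS = 64            # FDB memory bus width (section 1.1)
CLOCK_HZ = 133_000_000         # ZBT-SRAM clock (section 1.1)
NUM_PORTS = 4                  # Gigabit ports on the MVTX2801AG
PORT_RATE_BPS = 1_000_000_000  # line rate of one Gigabit port

# One 8-byte (64-bit) transfer per clock tick.
fdb_bandwidth_bps = BUS_WIDTH_BITS * CLOCK_HZ

# Assumption: one write (receive) plus one read (transmit) per frame.
required_bps = NUM_PORTS * PORT_RATE_BPS * 2

headroom_bps = fdb_bandwidth_bps - required_bps

print(f"FDB bandwidth: {fdb_bandwidth_bps / 1e9:.3f} Gbps")
print(f"Required:      {required_bps / 1e9:.3f} Gbps")
print(f"Headroom:      {headroom_bps / 1e6:.0f} Mbps")
```

Under that assumption the raw figure of 8.512 Gbps (rounded to 8.5 Gbps in the text) leaves a small margin over the 8 Gbps needed for four full-duplex Gigabit ports.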
1.2 Switch Database (SDB) Interface

A pipelined synchronous burst SRAM (SBRAM) memory is used to store the switch database information, including the MAC Table. The Search Engine accesses the switch database via the SDB interface. The SDB bus is 32 bits wide and runs at 133 MHz.

1.3 GMII/PCS MAC Module (GMAC)

The GMII/PCS Media Access Control (MAC) module provides the necessary buffers and control interface between the Frame Engine (FE) and the external physical device (PHY). The MVTX2801AG has two interfaces, GMII or PCS. The MAC of the MVTX2801AG meets the IEEE 802.3z specification and supports the MII interface. It can operate at 10M/100M/1G in full-duplex mode with a back pressure/flow control mechanism. It has options to insert the Source Address/CRC/VLAN ID into each frame. The GMII/PCS module also supports hot plug detection.

1.4 Frame Engine

The main function of the frame engine is to forward a frame to its proper destination port or ports. When a frame arrives, the frame engine parses the frame header (64 bytes) and formulates a switching request, which is sent to the search engine to resolve the destination port. The arriving frame is moved to the FDB. After receiving a switch response from the search engine, the frame engine performs transmission scheduling based on the frame's priority. The frame engine forwards the frame to the MAC module when the frame is ready to be sent.

1.5 Search Engine

The Search Engine resolves the frame's destination port or ports according to the destination MAC address (L2) by searching the database. It also performs MAC learning, priority assignment, and trunking functions.

1.6 LED Interface

The LED interface can be operated in a serial mode or a parallel mode. In serial mode, the LED interface uses 3 pins to carry 4 port status signals. In parallel mode, the interface can drive LEDs with 8 status pins. The LED port is shared with bootstrap pins.
To avoid errors when reading the bootstraps, a buffer must be used to isolate the LED circuitry from the bootstrap pins during the bootstrap cycle (the bootstrap pins are sampled at the rising edge of Reset).

1.7 Internal Memory

Several internal tables are required and are described as follows:

· Frame Control Block (FCB) - Each FCB entry contains the control information of the associated frame stored in the FDB, e.g. frame size, read/write pointer, transmission priority, etc.
· MCT Link Table - The MCT Link Table stores the linked list of MCT entries that have collisions in the external MAC Table.

2.0 System Configuration

The MVTX2801AG can be configured by EEPROM (24C02 or compatible) via an I2C interface at boot time, or via a synchronous serial interface during operation.

2.1 I2C Interface

The I2C interface uses two bus lines, a serial data line (SDA) and a serial clock line (SCL). The SCL line carries the control signals that facilitate the transfer of information from EEPROM to the switch. Data transfer is 8-bit serial and bidirectional, at 50 Kbps. Data transfer is performed between master and slave IC using a request/acknowledgment style of protocol. The master IC generates the timing signals and terminates data transfer. The figure below shows the data transfer format.

START | SLAVE ADDRESS | R/W | ACK | DATA 1 (8 bits) | ACK | DATA 2 | ACK | ... | DATA M | ACK | STOP

Figure 2 - Data Transfer Format for I2C Interface

2.1.1 Start Condition

The Start condition is generated by the master, the MVTX2801AG. The bus is considered to be busy after the Start condition is generated. The Start condition occurs if, while the SCL line is High, there is a High-to-Low transition of the SDA line. Other than in the Start condition (and Stop condition), the data on the SDA line must be stable during the High period of SCL. The High or Low state of SDA can only change when SCL is Low.
In addition, when the I2C bus is free, both lines are High.

2.1.2 Address

The first byte after the Start condition determines which slave the master will select. The slave in our case is the EEPROM. The first seven bits of the first data byte make up the slave address.

2.1.3 Data Direction

The eighth bit in the first byte after the Start condition determines the direction (R/W) of the message. A master transmitter sets this bit to W; a master receiver sets this bit to R.

2.1.4 Acknowledgment

Like all clock pulses, the acknowledgment-related clock pulse is generated by the master. However, the transmitter releases the SDA line (High) during the acknowledgment clock pulse. Furthermore, the receiver must pull down the SDA line during the acknowledge pulse so that it remains stable Low during the High period of this clock pulse. An acknowledgment pulse follows every byte transfer.

If a slave receiver does not acknowledge after any byte, then the master generates a Stop condition and aborts the transfer.

If a master receiver does not acknowledge after any byte, then the slave transmitter must release the SDA line to let the master generate the Stop condition.

2.1.5 Data

After the first byte containing the address, all bytes that follow are data bytes. Each byte must be followed by an acknowledge bit. Data is transferred MSB-first.

2.1.6 Stop Condition

The Stop condition is generated by the master. The bus is considered to be free after the Stop condition is generated. The Stop condition occurs if, while the SCL line is High, there is a Low-to-High transition of the SDA line.

The I2C interface serves the function of configuring the MVTX2801AG at boot time. The master is the MVTX2801AG, and the slave is the EEPROM memory.

2.2 Synchronous Serial Interface

The synchronous serial interface serves the function of configuring the MVTX2801AG not at boot time but via a PC.
The PC serves as master and the MVTX2801AG serves as slave. The protocol for the synchronous serial interface is nearly identical to the I2C protocol. The main difference is that there is no acknowledgment bit after each byte of data transferred.

The unmanaged MVTX2801AG uses a synchronous serial interface to program the internal registers. To reduce the number of signals required, the register address, command, and data are shifted in serially through the PS_DO pin. The PS_STROBE- pin is used as the shift clock. The PS_DI- pin is used as the data return path.

Each command consists of four parts:

· START pulse
· Register Address
· Read or Write command
· Data to be written or read back

Any command can be aborted in the middle by sending an ABORT pulse to the MVTX2801AG.

A START command is detected when PS_DO is sampled high at the PS_STROBE- leading edge, and PS_DO is sampled low when PS_STROBE- falls.

An ABORT command is detected when PS_DO is sampled low at the PS_STROBE- leading edge, and PS_DO is sampled high when PS_STROBE- falls.

2.2.1 Write Command

PS_STROBE-: shift clock, with 2 extra clocks after the last transfer
PS_DO: START | ADDRESS (A0 ... A11) | COMMAND (W) | DATA (D0 ... D7)

Figure 3 - Write Command

2.2.2 Read Command

PS_STROBE-: shift clock
PS_DO: START | ADDRESS (A0 ... A11) | COMMAND (R)
PS_DI: DATA (D0 ... D7)

Figure 4 - Read Command

All registers in the MVTX2801AG can be modified through this synchronous serial interface.

3.0 Data Forwarding Protocol

3.1 Unicast Data Frame Forwarding

When a frame arrives, it is assigned a handle in memory by the Frame Control Buffer Manager (FCB Manager). An FCB handle will always be available, because of advance buffer reservations.

The memory (ZBT-SRAM) interface is a 64-bit bus, connected to a ZBT-SRAM domain. The Receive DMA (RxDMA) is responsible for multiplexing the data and the address.
On a port's "turn," the RxDMA will move 8 bytes (or up to the end-of-frame) from the port's associated RxFIFO into memory (Frame Data Buffer, or FDB).

Once an entire frame has been moved to the FDB, and a good end-of-frame (EOF) has been received, the Rx interface makes a switch request. The RxDMA arbitrates among multiple switch requests.

The switch request consists of the first 64 bytes of a frame, containing, among other things, the source and destination MAC addresses of the frame. The search engine places a switch response in the switch response queue of the frame engine when done. Among other information, the search engine will have resolved the destination port of the frame and will have determined that the frame is unicast.

After processing the switch response, the Transmission Queue Manager (TxQ manager) of the frame engine is responsible for notifying the destination port that there is a frame to be forwarded to it. But first, the TxQ manager has to decide whether or not to drop the frame, based on global FDB reservations and usage, as well as TxQ occupancy at the destination. If the frame is not dropped, then the TxQ manager links the frame's FCB to the correct per-port-per-class TxQ. Unicast TxQs are linked lists of transmission jobs, represented by their associated frames' FCBs. There is one linked list for each transmission class for each port. There are 8 classes for each of the 4 Gigabit ports - a total of 32 unicast queues.

The TxQ manager is responsible for scheduling transmission among the queues representing different classes for a port. When the port control module determines that there is room in the MAC Transmission FIFO (TxFIFO) for another frame, it requests the handle of a new frame from the TxQ manager. The TxQ manager chooses among the head-of-line (HOL) frames from the per-class queues for that port, using a Zarlink Semiconductor scheduling algorithm.
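The per-port-per-class queue layout can be pictured with a small software model. This is only a sketch: the names `make_unicast_queues`, `link_fcb`, and `hol_candidates` are hypothetical, a Python `deque` stands in for the hardware linked list of FCBs, and the scheduling algorithm that picks among the head-of-line candidates is deliberately left out because the data sheet does not describe it here.

```python
from collections import deque

NUM_PORTS = 4      # Gigabit ports on the MVTX2801AG
NUM_CLASSES = 8    # transmission classes per port

def make_unicast_queues():
    """Create one FIFO of FCB handles per (port, class) pair."""
    return {(port, cls): deque()
            for port in range(NUM_PORTS)
            for cls in range(NUM_CLASSES)}

def link_fcb(queues, port, cls, fcb_handle):
    """Append a transmission job (its FCB handle) to the tail of a queue."""
    queues[(port, cls)].append(fcb_handle)

def hol_candidates(queues, port):
    """Return {class: head-of-line FCB handle} for one port's non-empty queues."""
    return {cls: queues[(port, cls)][0]
            for cls in range(NUM_CLASSES)
            if queues[(port, cls)]}

queues = make_unicast_queues()
link_fcb(queues, port=2, cls=7, fcb_handle=0x1A)
link_fcb(queues, port=2, cls=3, fcb_handle=0x2B)
print(len(queues))                # 32 unicast queues in total
print(hol_candidates(queues, 2))  # the scheduler would choose among these
```

The count of 32 follows directly from 4 ports times 8 classes; the scheduler only ever has to compare the heads of the 8 queues belonging to the port that asked for a frame.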
At the transmit end, each of the 4 ports has time slots devoted solely to reading data from memory at the address calculated by port control. The Transmission DMA (TxDMA) is responsible for multiplexing the data and the address. On a port's turn, the TxDMA will move 8 bytes (or up to the EOF) from memory into the port's associated TxFIFO. After reading the EOF, the port control requests an FCB release for that frame. The TxDMA arbitrates among multiple buffer release requests.

The frame is transmitted from the TxFIFO to the line.

3.2 Multicast Data Frame Forwarding

After receiving the switch response, the TxQ manager has to make the dropping decision. A global decision to drop can be made, based on global FDB utilization and reservations. If so, then the FCB is released and the frame is dropped. In addition, a selective decision to drop can be made, based on the TxQ occupancy at some subset of the multicast packet's destinations. If so, then the frame is dropped at some destinations but not others, and the FCB is not released.

If the frame is not dropped at a particular destination port, then the TxQ manager formats an entry in the multicast queue for that port and class. Multicast queues are physical queues (unlike the linked lists for unicast frames). There are 4 multicast queues for each of the 4 Gigabit ports.

During scheduling, the TxQ manager treats the unicast queue and the multicast queue of the same class as one logical queue.

The port control requests an FCB release only after the EOF for the multicast frame has been read by all ports to which the frame is destined.

4.0 Memory Interface

4.1 Overview

The figure below illustrates the first part of the ZBT-SRAM interface for the MVTX2801AG. As shown, a 64-bit bus to ZBT-SRAM bank A is used for Tx/RxDMA access. Because the clock frequency is 133 MHz, the total memory bandwidth is 64 bits x 133 MHz = 8.5 Gbps, for frame data buffer (FDB) access.
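The bandwidth figure can be checked with a few lines of arithmetic. The sketch below reproduces the 64 bits x 133 MHz product and, as an assumption that is not stated in this section, compares it with a worst-case demand in which every frame is written to the FDB once and read back once on all 4 Gigabit ports.

```python
BUS_WIDTH_BITS = 64
CLOCK_HZ = 133_000_000
NUM_PORTS = 4
PORT_RATE_BPS = 1_000_000_000

def fdb_bandwidth_bps(bus_width_bits=BUS_WIDTH_BITS, clock_hz=CLOCK_HZ):
    """Raw memory bandwidth: one bus-wide transfer per clock cycle."""
    return bus_width_bits * clock_hz

def worst_case_demand_bps(num_ports=NUM_PORTS, port_rate_bps=PORT_RATE_BPS):
    """Assumed demand: each port writes (Rx) and reads (Tx) at line rate."""
    return 2 * num_ports * port_rate_bps

print(fdb_bandwidth_bps() / 1e9)                       # 8.512 (quoted as 8.5 Gbps)
print(fdb_bandwidth_bps() >= worst_case_demand_bps())  # True under this assumption
```

Under that assumption the demand is 8 Gbps, which leaves roughly 0.5 Gbps of headroom on the FDB bus.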
Figure 5 - SRAM Interface Block Diagram (DMAs for Gigabit Ports): ZBT-SRAM Bank A connected to TX DMA 0-1, TX DMA 2-3, RX DMA 0-1, and RX DMA 2-3

4.2 Detailed Memory Information

Because the memory bus is 64 bits wide, frames are broken into 8-byte granules, one of which is written or read in each memory access. In the worst case, a 1-byte-long EOF granule gets written to the memory bank. This means that a 7-byte segment of the memory bus is idle. The scenario results in a maximum of 7 bytes of waste per frame, which is always acceptable because the interframe gap is 20 bytes.

5.0 Search Engine

5.1 Search Engine Overview

The MVTX2801AG search engine is optimized for high throughput searching, with enhanced features to support:

· Up to 64K MAC addresses
· 4 groups of port trunking
· Traffic classification into 8 transmission priorities, and 2 drop precedence levels

5.2 Basic Flow

Shortly after a frame enters the MVTX2801AG and is written to the Frame Data Buffer (FDB), the frame engine generates a Switch Request, which is sent to the search engine. The switch request consists of the first 64 bytes of the frame, which contain all the necessary information for the search engine to perform its task. When the search engine is done, it writes to the Switch Response Queue, and the frame engine uses the information provided in that queue for scheduling and forwarding.

In performing its task, the search engine extracts and compresses the useful information from the 64-byte switch request. Among the information extracted are the source and destination MAC addresses, the transmission and discard priorities, and whether the frame is unicast or multicast. Requests are sent to the external SRAM Switch Database to locate the associated entries in the external MCT table.

When all the information has been collected from external SRAM, the search engine has to compare the MAC address on the current entry with the MAC address for which it is searching.
If it is not a match, the process is repeated on the internal MCT Table. All MCT entries other than the first of each linked list are maintained internal to the chip. If the desired MAC address is still not found, then the result is either learning (source MAC address unknown) or flooding (destination MAC address unknown).

If the destination MAC address belongs to a port trunk, then the trunk number is retrieved instead of the port number. But on which port of the trunk will the frame be transmitted? This is easily computed using a hash of the source and destination MAC addresses.

When all the information is compiled, the switch response is generated, as stated earlier.

5.3 Search, Learning, and Aging

5.3.1 MAC Search

The search block performs source MAC address and destination MAC address searching. As we indicated earlier, if a match is not found, then the next entry in the linked list must be examined, and so on until a match is found or the end of the list is reached.

In port-based VLAN mode, a bitmap is used to determine whether the frame should be forwarded to the outgoing port. The bitmap is not dynamic. Ports cannot enter and exit groups dynamically.

The MAC search block is also responsible for updating the source MAC address timestamp, used for aging.

5.3.2 Learning

The learning module learns new MAC addresses and performs port change operations on the MCT database. The goal of learning is to update this database as the networking environment changes over time. Learning and port change will be performed based on memory slot availability only.

5.3.3 Aging

Aging time is controlled by registers 400h and 401h.

The aging module scans and ages MCT entries based on a programmable "age out" time interval. As we indicated earlier, the search module updates the source MAC address and VLAN port association timestamps for each frame it processes. When an entry is ready to be aged, the entry is removed from the table.
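The interplay between timestamp refresh and aging can be illustrated with a toy software model. This is a sketch under stated assumptions, not the hardware's method: a Python dictionary stands in for the MCT, the function names `refresh` and `age_out` are hypothetical, time is an abstract counter, and the encoding of the age-out interval in registers 400h and 401h is not modeled.

```python
def refresh(mct, mac, port, now):
    """Search side: record the source MAC, its port, and the current time."""
    mct[mac] = {"port": port, "timestamp": now}

def age_out(mct, now, age_out_interval):
    """Aging side: remove entries not refreshed within the interval."""
    expired = [mac for mac, entry in mct.items()
               if now - entry["timestamp"] > age_out_interval]
    for mac in expired:
        del mct[mac]
    return expired

mct = {}
refresh(mct, "00:11:22:33:44:55", port=0, now=0)
refresh(mct, "66:77:88:99:aa:bb", port=3, now=250)
removed = age_out(mct, now=400, age_out_interval=300)
print(removed)     # ['00:11:22:33:44:55']
print(list(mct))   # ['66:77:88:99:aa:bb']
```

An address that keeps sending traffic is refreshed on every frame and therefore never expires; only silent stations are removed.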
5.3.4 Data Structure

The MCT data structure is used for searching for MAC addresses. The structure is maintained by hardware in the search engine. The database is essentially a hash table, with collisions resolved by chaining. The database is partially external, and partially internal, as described earlier: the first MCT entry of each linked list is always located in the external SRAM, and the subsequent MCTs are located internally.

6.0 Frame Engine

6.1 Data Forwarding Summary

· A frame enters the device at the RxMAC, and the RxDMA moves the data from the MAC RxFIFO to the FDB. Data is moved in 8-byte granules in conjunction with the scheme for the SRAM interface.
· A switch request is sent to the Search Engine. The Search Engine processes the switch request.
· A switch response is sent back to the Frame Engine and indicates whether the frame is unicast or multicast, and its destination port or ports.
· A Transmission Scheduling Request is sent in the form of a signal notifying the TxQ manager. Upon receiving a Transmission Scheduling Request, the device will format an entry in the appropriate Transmission Scheduling Queue (TxSch Q) or Queues. There are 8 TxSch Queues for each Gigabit port, one for each priority. Creation of a queue entry involves either linking a new job to the appropriate linked list if unicast, or adding an entry to a physical queue if multicast.
· When the port is ready to accept the next frame, the TxQ manager will get the head-of-line (HOL) entry of one of the TxSch Qs, according to the transmission scheduling algorithm (so as to ensure per-class quality of service). The unicast linked list and the multicast queue for the same port-class pair are treated as one logical queue.
· The TxDMA will pull frame data from the memory and forward it granule-by-granule to the MAC TxFIFO of the destination port.
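Since both the RxDMA and the TxDMA move frames in 8-byte granules, the number of memory accesses per frame and the unused bytes in the final granule follow from simple integer arithmetic. The helper names below are hypothetical and the snippet is only an illustration of that arithmetic.

```python
GRANULE_BYTES = 8  # 64-bit memory bus

def granule_count(frame_len):
    """Number of 8-byte memory accesses needed for a frame of frame_len bytes."""
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    return -(-frame_len // GRANULE_BYTES)  # ceiling division

def idle_bytes(frame_len):
    """Unused bytes in the last (EOF) granule; at most 7."""
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    return (-frame_len) % GRANULE_BYTES

for length in (64, 65, 1518):
    print(length, granule_count(length), idle_bytes(length))
# 64 8 0
# 65 9 7
# 1518 190 2
```

A 65-byte frame is the worst case described for the memory interface: its EOF granule carries a single byte and leaves 7 bytes of the bus idle.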
6.2 Frame Engine Details

This section briefly describes the functions of each of the modules of the MVTX2801AG frame engine.

6.2.1 FCB Manager

The FCB manager allocates FCB handles to incoming frames, and releases FCB handles upon frame departure. The FCB manager is also responsible for enforcing buffer reservations and limits. The default values can be determined by referring to Chapter 8. In addition, the FCB manager is responsible for buffer aging, and for linking unicast forwarding jobs to their correct TxSch Q. The buffer aging can be enabled or disabled by the bootstrap pin and the aging time is defined in register FCBAT.

6.2.2 Rx Interface

The Rx interface is mainly responsible for communicating with the RxMAC. It keeps track of the start and end of frame and frame status (good or bad). Upon receiving an end of frame that is good, the Rx interface makes a switch request.

6.2.3 RxDMA

The RxDMA arbitrates among switch requests from each Rx interface. It also buffers the first 64 bytes of each frame for use by the search engine when the switch request has been made.

6.2.4 TxQ Manager

First, the TxQ manager checks the per-class queue status and global reserved resource situation, and using this information, makes the frame dropping decision after receiving a switch response. If the decision is not to drop, the TxQ manager requests that the FCB manager link the unicast frame's FCB to the correct per-port-per-class TxQ. If multicast, the TxQ manager writes to the multicast queue for that port and class. The TxQ manager can also trigger source port flow control for the incoming frame's source if that port is flow control enabled.

Second, the TxQ manager handles transmission scheduling; it schedules transmission among the queues representing different classes for a port. Once a frame has been scheduled, the TxQ manager reads the FCB information and writes to the correct port control module.
6.3 Port Control

The port control module calculates the SRAM read address for the frame currently being transmitted. It also writes start of frame information and an end of frame flag to the MAC TxFIFO. When transmission is done, the port control module requests that the buffer be released.

6.4 TxDMA

The TxDMA multiplexes data and address from port control, and arbitrates among buffer release requests from the port control modules.

7.0 Quality of Service and Flow Control

7.1 Model

Quality of service (QoS) is an all-encompassing term for which different people have different interpretations. In this chapter, by quality of service assurances, we mean the allocation of chip resources so as to meet the latency and bandwidth requirements associated with each traffic class. We do not presuppose anything about the offered traffic pattern. If the traffic load is light, then ensuring quality of service is straightforward. But if the traffic load is heavy, the MVTX2801AG must intelligently allocate resources so as to assure quality of service for high priority data.

We assume that the network manager knows his applications, such as voice, file transfer, or web browsing, and their relative importance. The manager can then subdivide the applications into classes and set up a service contract with each. The contract may consist of bandwidth or latency assurances per class. Sometimes it may even reflect an estimate of the traffic mix offered to the switch, though this is not required.

The table below shows examples of QoS applications with eight transmission priorities, including best effort traffic for which we provide no bandwidth or latency assurances.

Class                                  Latency      Example Assured    Sample applications
                                                    Bandwidth
                                                    (user defined)
Highest transmission priorities, P7    < 200 µs     300 Mbps           control information
Highest transmission priorities, P6    < 200 µs     200 Mbps           Low drop: phone calls; circuit emulation
                                                                       High drop: training video; other multimedia
Middle transmission priorities, P5     < 400 µs     125 Mbps           Low drop: interactive activities
                                                                       High drop: non-critical interactive activities
Middle transmission priorities, P4     < 800 µs     250 Mbps           web business
Low transmission priorities, P3        < 1600 µs    80 Mbps            file backups
Low transmission priorities, P2        < 3200 µs    45 Mbps            email
Best effort, P1-P0                     -            -                  Low drop: web research
                                                                       High drop: casual web browsing
TOTAL                                               1 Gbps

Low Drop Subclass: if the class is oversubscribed, these packets are the last to be dropped.
High Drop Subclass: if the class is oversubscribed, these packets are the first to be dropped.

Table 1 - Two-dimensional World Traffic

It is possible that a class of traffic may attempt to monopolize system resources by sending data at a rate in excess of the contractually assured bandwidth for that class. A well-behaved class offers traffic at a rate no greater than the agreed-upon rate. By contrast, a misbehaving class offers traffic that exceeds the agreed-upon rate. A misbehaving class is formed from an aggregation of misbehaving microflows. To achieve high link utilization, a misbehaving class is allowed to use any idle bandwidth. However, the quality of service (QoS) received by well-behaved classes must never suffer.

As Table 1 illustrates, each traffic class may have its own distinct properties and applications. As shown, classes may receive bandwidth assurances or latency bounds. In the example, P7, the highest transmission class, requires that all frames be transmitted within 0.2 ms, and receives 30% of the 1 Gbps of bandwidth at that port.
Best-effort (P1-P0) traffic forms a lower tier of service that only receives bandwidth when none of the other classes have any traffic to offer.

In addition, each transmission class has two subclasses, high-drop and low-drop. Well-behaved users should not lose packets. But poorly behaved users, users who send data at too high a rate, will encounter frame loss, and the first to be discarded will be high-drop. Of course, if this is insufficient to resolve the congestion, eventually some low-drop frames are dropped as well.

Table 1 shows that different types of applications may be placed in different boxes in the traffic table. For example, web search may fit into the category of high-loss, high-latency-tolerant traffic, whereas VoIP fits into the category of low-loss, low-latency traffic.

7.2 Four QoS Configurations

There are four basic pieces to QoS scheduling in the MVTX2801AG: strict priority (SP), delay bound, weighted fair queuing (WFQ), and best effort (BE). Using these four pieces, there are four different modes of operation, as shown in Table 2.

Mode             P7-P6          P5-P2          P1-P0
Op1 (default)    Delay Bound    Delay Bound    BE
Op2              SP             Delay Bound    BE
Op3              SP             WFQ            WFQ
Op4              WFQ            WFQ            WFQ

Table 2 - Four QoS configurations per port

The default configuration is six delay-bounded queues and two best-effort queues. The delay bounds per class are 0.16 ms for P7 and P6, 0.32 ms for P5, 0.64 ms for P4, 1.28 ms for P3, and 2.56 ms for P2. Best effort traffic is only served when there is no delay-bounded traffic to be served. P1 has strict priority over P0.

We have a second configuration in which there are two strict priority queues, four delay bounded queues, and two best effort queues. The delay bounds per class are 0.32 ms for P5, 0.64 ms for P4, 1.28 ms for P3, and 2.56 ms for P2. If the user is to choose this configuration, it is important that P7-P6 (SP) traffic be either policed or implicitly bounded (e.g. if the incoming SP traffic is very light and predictably patterned).
Strict priority traffic, if not admission-controlled at a prior stage to the MVTX2801AG, can have an adverse effect on all other classes' performance. P7 and P6 are both SP classes, and P7 has strict priority over P6.

The third configuration contains two strict priority queues and six queues receiving a bandwidth partition via WFQ. As in the second configuration, strict priority traffic needs to be carefully controlled.

In the fourth configuration, all queues are served using a WFQ service discipline.

7.3 Delay Bound

In the absence of a sophisticated QoS server and signaling protocol, the MVTX2801AG may not be assured of the mix of incoming traffic ahead of time. To cope with this uncertainty, our delay assurance algorithm dynamically adjusts its scheduling and dropping criteria, guided by the queue occupancies and the due dates of their head-of-line (HOL) frames. As a result, we assure latency bounds for all admitted frames with high confidence, even in the presence of system-wide congestion. Our algorithm identifies misbehaving classes and intelligently discards frames at no detriment to well-behaved classes. Our algorithm also differentiates between high-drop and low-drop traffic with a weighted random early drop (WRED) approach. Random early dropping prevents congestion by randomly dropping a percentage of high-drop frames even before the chip's buffers are completely full, while still largely sparing low-drop frames. This allows high-drop frames to be discarded early, as a sacrifice for future low-drop frames. Finally, the delay bound algorithm also achieves bandwidth partitioning among classes.

7.4 Strict Priority and Best Effort

When strict priority is part of the scheduling algorithm, if a queue has even one frame to transmit, it goes first. Two of our four QoS configurations include strict priority queues.
The goal is for strict priority classes to be used for IETF expedited forwarding (EF), where performance guarantees are required. As we have indicated, it is important that strict priority traffic be either policed or implicitly bounded, so as to keep from harming other traffic classes.

When best effort is part of the scheduling algorithm, a queue only receives bandwidth when none of the other classes have any traffic to offer. Two of our four QoS configurations include best effort queues. The goal is for best effort classes to be used for non-essential traffic, because we provide no assurances about best effort performance. However, in a typical network setting, much best effort traffic will indeed be transmitted, and with an adequate degree of expediency.

Because we do not provide any delay assurances for best effort traffic, we do not enforce latency by dropping best effort traffic. Furthermore, because we assume that strict priority traffic is carefully controlled before entering the MVTX2801AG, we do not enforce a fair bandwidth partition by dropping strict priority traffic. To summarize, dropping to enforce quality of service (i.e. bandwidth or delay) does not apply to strict priority or best effort queues. We only drop frames from best effort and strict priority queues when global buffer resources become scarce.

7.5 Weighted Fair Queuing

In some environments (for example, an environment in which delay assurances are not required, but precise bandwidth partitioning on small time scales is essential), WFQ may be preferable to a delay-bounded scheduling discipline. The MVTX2801AG provides the user with a WFQ option with the understanding that delay assurances cannot be provided if the incoming traffic pattern is uncontrolled. The user sets eight WFQ "weights" such that all weights are whole numbers and sum to 64. This provides per-class bandwidth partitioning with error within 2%.
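A configuration tool might validate a set of WFQ weights and preview the resulting nominal shares as sketched below. The function name `wfq_shares` is hypothetical, the ordering of the list (P7 first, P0 last) is an assumption, and the proportional share weight/64 is the usual WFQ interpretation rather than a formula quoted from this data sheet. Whether a weight of zero is permitted is also not stated; the sketch accepts it.

```python
WEIGHT_SUM = 64
NUM_CLASSES = 8

def wfq_shares(weights, link_mbps=1000):
    """Validate eight whole-number weights and return nominal Mbps per class."""
    if len(weights) != NUM_CLASSES:
        raise ValueError("exactly eight weights are required")
    if any(not isinstance(w, int) or w < 0 for w in weights):
        raise ValueError("weights must be non-negative whole numbers")
    if sum(weights) != WEIGHT_SUM:
        raise ValueError("weights must sum to 64")
    return [w * link_mbps / WEIGHT_SUM for w in weights]

# Assumed ordering: index 0 is P7, index 7 is P0.
shares = wfq_shares([16, 16, 8, 8, 4, 4, 4, 4])
print(shares)       # [250.0, 250.0, 125.0, 125.0, 62.5, 62.5, 62.5, 62.5]
print(sum(shares))  # 1000.0
```

Each weight unit corresponds to 1/64 of the link, or 15.625 Mbps on a 1000 Mbps port.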
In WFQ mode, though we do not assure frame latency, the MVTX2801AG still retains a set of dropping rules that helps to prevent congestion and trigger higher level protocol end-to-end flow control.

As before, when strict priority is combined with WFQ, we do not have special dropping rules for the strict priority queues, because the input traffic pattern is assumed to be carefully controlled at a prior stage. However, we do indeed drop frames from SP queues for global buffer management purposes. In addition, queues P1 and P0 are treated as best effort from a dropping perspective, though they still are assured a percentage of bandwidth from a WFQ scheduling perspective. What this means is that these particular queues are only affected by dropping when the global buffer count becomes low.

7.6 Shaper

Although traffic shaping is not a primary function of the MVTX2801AG, the chip does implement a shaper for expedited forwarding (EF). Our goal in shaping is to control the peak and average rate of traffic exiting the MVTX2801AG. Shaping is limited to class P6 (the second highest priority). This means that class P6 will be the class used for EF traffic. (By contrast, we assume class P7 will be used for control packets only.) If shaping is enabled for P6, then P6 traffic must be scheduled using strict priority. With reference to Table 2, only the middle two QoS configurations may be used.

Peak rate is set using a programmable whole number, no greater than 64 (register QOS-CREDIT_C6_Gn). For example, if the setting is 32, then the peak rate for shaped traffic is (32/64) x 1000 Mbps = 500 Mbps. Average rate is also a programmable whole number, no greater than 64, and no greater than the peak rate. For example, if the setting is 16, then the average rate for shaped traffic is (16/64) x 1000 Mbps = 250 Mbps.
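The two worked examples can be reproduced with a small helper. The function names are hypothetical and the code only models the arithmetic and the stated constraints (each setting is a whole number no greater than 64, and the average setting is no greater than the peak setting); it says nothing about how the registers are actually written.

```python
LINK_MBPS = 1000
SETTING_SCALE = 64

def shaper_rate_mbps(setting):
    """Convert a whole-number setting (0..64) to a rate in Mbps."""
    if not isinstance(setting, int) or not 0 <= setting <= SETTING_SCALE:
        raise ValueError("setting must be a whole number no greater than 64")
    return setting * LINK_MBPS / SETTING_SCALE

def shaper_rates(peak_setting, average_setting):
    """Return (peak, average) rates in Mbps after checking average <= peak."""
    if average_setting > peak_setting:
        raise ValueError("average setting must not exceed peak setting")
    return shaper_rate_mbps(peak_setting), shaper_rate_mbps(average_setting)

print(shaper_rates(32, 16))  # (500.0, 250.0)
```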
As a consequence of the above settings in our example, shaped traffic will exit the MVTX2801AG at a rate always less than 500 Mbps, and averaging no greater than 250 Mbps.

Also, when shaping is enabled, it is possible for a P6 queue to explode in length if fed by a greedy source. The reason is that a shaper is by definition not work-conserving; that is, it may hold back from sending a packet even if the line is idle. Though we do have global resource management, we do nothing to prevent this situation locally. We assume SP traffic is policed at a prior stage to the MVTX2801AG.

7.7 WRED Drop Threshold Management Support

To avoid congestion, the Weighted Random Early Detection (WRED) logic drops packets according to specified parameters. The following table summarizes the behavior of the WRED logic.

Level      Condition on N    High Drop    Low Drop
Level 1    N > 240           X%           0%
Level 2    N > 280           Y%           Z%
Level 3    N > 320           100%         100%

Per-queue conditions (P7 to P2): |P7| > A KB, |P6| > B KB, |P5| > C KB, |P4| > D KB, |P3| > E KB, |P2| > F KB

Table 3 - WRED Dropping Scheme

In the table, |Px| is the byte count in queue Px. The WRED logic has three drop levels, depending on the value of N, which is based on the number of bytes in the priority queues. If delay bound scheduling is used, N equals 16|P7| + 16|P6| + 8|P5| + 4|P4| + 2|P3| + |P2|. If WFQ scheduling is used, N equals |P7| + |P6| + |P5| + |P4| + |P3| + |P2|. Each drop level has defined high-drop and low-drop percentages, which indicate the percentage of high-drop and low-drop packets that will be dropped at that level. The X, Y, and Z percent parameters can be programmed using the registers RDRC0 and RDRC1. Parameters A-F are the byte count thresholds for each priority queue, and are also programmable. When using delay bound scheduling, the values selected for A-F also control the approximate bandwidth partition among the traffic classes; see application note.
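The two formulas for N and the three level thresholds translate directly into code. In the sketch below the function names are hypothetical, queue occupancies are assumed to be expressed in the same unit as the thresholds 240, 280, and 320 (the text gives the per-queue thresholds in KB, so KB is assumed), and the per-queue conditions A-F and the programmable percentages X, Y, and Z are not modeled.

```python
DELAY_BOUND_WEIGHTS = {"P7": 16, "P6": 16, "P5": 8, "P4": 4, "P3": 2, "P2": 1}
LEVEL_THRESHOLDS = ((3, 320), (2, 280), (1, 240))  # checked highest first

def wred_n(occupancy, scheduling):
    """Compute N from per-queue occupancies for 'delay_bound' or 'wfq'."""
    if scheduling == "delay_bound":
        return sum(DELAY_BOUND_WEIGHTS[q] * occupancy[q] for q in DELAY_BOUND_WEIGHTS)
    if scheduling == "wfq":
        return sum(occupancy[q] for q in DELAY_BOUND_WEIGHTS)
    raise ValueError("scheduling must be 'delay_bound' or 'wfq'")

def wred_level(n):
    """Return the drop level (1-3) reached by N, or 0 if below Level 1."""
    for level, threshold in LEVEL_THRESHOLDS:
        if n > threshold:
            return level
    return 0

occupancy = {"P7": 10, "P6": 5, "P5": 5, "P4": 5, "P3": 5, "P2": 5}
n_db = wred_n(occupancy, "delay_bound")
n_wfq = wred_n(occupancy, "wfq")
print(n_db, wred_level(n_db))    # 315 2
print(n_wfq, wred_level(n_wfq))  # 35 0
```

The same occupancy reaches Level 2 under delay bound scheduling but stays below Level 1 under WFQ, because the delay bound formula weights the high-priority queues much more heavily.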
7.8 Buffer Management

Because the number of frame data buffer (FDB) slots is a scarce resource, and because we want to ensure that one misbehaving source port or class cannot harm the performance of a well-behaved source port or class, we introduce the concept of buffer management into the MVTX2801AG. Our buffer management scheme is designed to divide the total buffer space into numerous reserved regions and one shared pool (see Figure 6).

As shown in the figure, the FDB pool is divided into several parts. A reserved region for temporary frames stores frames prior to receiving a switch response. Such a temporary region is necessary, because when the frame first enters the MVTX2801AG, its destination port and class are as yet unknown, and so the decision to drop or not needs to be temporarily postponed. This ensures that every frame can be received first before subjecting it to the frame drop discipline after classifying.

Six reserved sections, one for each of the highest six priority classes, ensure a programmable number of FDB slots per class. The lowest two classes do not receive any buffer reservation.

Another segment of the FDB reserves space for each of the 4 ports. These source port buffer reservations are programmable. These 8 reserved regions make sure that no well-behaved source port can be blocked by another misbehaving source port.

In addition, there is a shared pool, which can store any type of frame. The registers related to the Buffer Management logic are:

· PRG - Port Reservation for Gigabit Ports
· SFCB - Share FCB Size
· C2RS - Class 2 Reserved Size
· C3RS - Class 3 Reserved Size
· C4RS - Class 4 Reserved Size
· C5RS - Class 5 Reserved Size
· C6RS - Class 6 Reserved Size
· C7RS - Class 7 Reserved Size

Figure 6 - Buffer Partition Scheme Used in the MVTX2801AG (regions shown: Temporary Reservation RTMP; Per-Class Reservations Rp7, Rp6 ... Rp2; Shared Pool S; Per-Source Reservations R1G)

7.8.1 Dropping When Buffers Are Scarce

Summarizing the two examples of local dropping discussed earlier in this chapter:

· If a queue is a delay-bounded queue, we have a multilevel WRED drop scheme, designed to control delay and partition bandwidth in case of congestion.
· If a queue is a WFQ-scheduled queue, we have a multilevel WRED drop scheme, designed to prevent congestion.

In addition to these reasons for dropping, the MVTX2801AG also drops frames when global buffer space becomes scarce. The function of buffer management is to ensure that such droppings cause as little blocking as possible.

7.9 Flow Control Basics

Because frame loss is unacceptable for some applications, the MVTX2801AG provides a flow control option. When flow control is en