SiS5596 Pentium PCI Chipset
Silicon Integrated Systems Corporation
Preliminary V2.1, March 26, 1996

1. Overview

SiS5596 PCI, Memory & VGA Controller
SiS5513 PCI System I/O

The SiS5596/5513, with its built-in VGA controller, is a two-chip solution for Pentium PCI/ISA systems. A portion of the on-board DRAM is shared with the built-in VGA controller, which substantially reduces system cost.

The SiS5596/5513 implements the shared memory architecture by allowing both the GUI/VGA controller and the system DRAM controller to control system memory. In a shared memory application, the chipset always acts as the arbiter of the memory bus masters. Whenever the GUI wants to access the memory bus, it first requests the bus from the chipset. The chipset grants the memory bus to the GUI only if the chipset itself does not need it. The chipset also supports a two-priority scheme. Other key features, such as direct access frame buffer and memory access latency, are also supported. The system block diagram is shown in Figure 1.1.

[Figure 1.1 System Block Diagram. Components shown: Pentium CPU, SRAM, DRAM, SiS 5596, SiS 5513, SiS6204 with video decoder and standard feature connector (FC), ROM, KBC, PnP port, clock generator, 74F04, 74F244, LS245, 7406. Buses and signals shown: host address bus, host data bus, MA[11:0], MD[63:0], DDC1/2B, PCI address/data bus, IDE bus, XD bus, LA[23:17], SA[19:0], SD[15:8], SD[7:0], ISA address bus, ISA data bus.]

1.1 General Features

· Supports Intel Pentium CPU and other compatible CPUs at 66/60/50 MHz (external clock speed)
· Supports VGA Shared Memory Architecture
  - Direct Memory Accesses
  - Shared Memory Area 0.5M, 1M, 1.5M, 2M, 2.5M, 3M, 3.5M, 4M
  - Built-in 2-Priority Scheme
· Supports the Pipelined Address Mode of Pentium CPU
· Integrated Second Level (L2) Cache Controller
  - Write Through and Write Back Cache Modes
  - 8-bit or 7-bit Tag with Direct Mapped Cache Organization
  - Supports Pipelined Burst SRAM
  - Supports 256 KBytes to 1 MByte Cache Sizes
  - Cache Read/Write Cycle of 3-1-1-1 with Pipelined Burst SRAM at 66 MHz, and 3-1-1-1-1-1-1-1 for back-to-back read cycles
· Integrated DRAM Controller
  - Supports 4 RAS lines; memory size from 4 MBytes up to 512 MBytes
  - Supports 256K/512K/1M/2M/4M/16M x N 70ns FP/EDO DRAM
  - Supports 4K Refresh DRAM
  - Supports 3V or 5V DRAM
  - Supports Symmetrical and Asymmetrical DRAM
  - Supports 32-bit/64-bit mixed mode configuration
  - Supports Concurrent Write Back
  - Table-free DRAM Configuration; Auto-detect DRAM size, Bank Density, Single/Double sided DRAM, EDO/FP DRAM for each bank
  - Supports CAS before RAS "Intelligent Refresh"
  - Supports Relocation of System Management Memory
  - Programmable CAS# Driving Current
  - Fully Configurable for the Characteristic of Shadow RAM (640 KBytes to 1 MByte)
· Supports EDO/FP 5/6-2-2-2/-3-3-3 Burst Read Cycles
· Two Programmable Non-Cacheable Regions
· Option to Disable Local Memory in Non-Cacheable Regions
· Shadow RAM in Increments of 16 KBytes
· Supports SMM Mode of CPU
· Supports CPU Stop Clock
· Supports Break Switch
· Provides High Performance PCI Arbiter
  - Supports 4 PCI Masters
  - Supports Rotating Priority Mechanism
  - Hidden Arbitration Scheme Minimizes Arbitration Overhead
  - Supports Concurrency between CPU to Memory and PCI to PCI
· Integrated PCI Bridge
  - Supports Asynchronous PCI Clock
  - Translates the CPU Cycles into the PCI Bus Cycles
  - Provides CPU-to-PCI Read Assembly and Write Disassembly Mechanism
  - Translates Sequential CPU-to-PCI Memory Write Cycles into PCI Burst Cycles
  - Zero Wait State Burst Cycles
  - Supports Advance Snooping for PCI Master Bursting
  - Maximum PCI Burst Transfer from 256 Bytes to 4 KBytes
· 388-Pin BGA Package
· 0.5µm CMOS Technology

1.2 Features for Integrated VGA Controller

1.2.1 PCI Bus Interface

· PCI 2.1 Compliance
  - Supports subsystem vendor ID and subsystem ID, two write-once registers, in PCI configuration space
· Built-in memory-mapped I/O base registers in configuration space
· Supports 32-bit PCI local bus standard Revision 2.1
· Supports PCI burst write
· Supports PCI multimedia design guide Rev. 1.0

1.2.2 Performance

· Supports Turbo Queue (Software Command Queue in off-screen memory) architecture to achieve extra-high performance (SiS patent pending)
· Built-in enhanced 64-bit BITBLT graphics engine with the following functions:
  - 256 raster operation functions
  - Rectangle fill
  - Color/Font expansion
  - Line-drawing with styled pattern
  - Built-in 8x8 pattern registers
  - Built-in 8x8 mask registers
  - 32 doublewords hardware Command Queue
· Built-in 64x64x2 bit-mapped hardware cursor
· Built-in 6-stage CPU write buffer and 128-bit read-ahead cache to minimize CPU wait states
· Built-in 2-stage engine write buffer and 320-bit read buffer to minimize engine wait states
· Built-in 64x32 CRT FIFOs to support super high resolution graphic modes and reduce CPU wait states
· Memory-mapped I/O to reduce I/O trapping overhead under protected mode
· Supports linear addressing mode up to 4 MBytes to speed up graphics performance
· Built-in two line buffers (90x64) with bilinear interpolation logic to improve video quality and video playback frame rate
· Supports Direct Draw
  - Built-in transparent Blt logic with source/destination color key
  - Built-in 4-bit blending logic for video overlay
  - Built-in 2-bit blending logic for Blt logic
  - Built-in color key and chroma key for video overlay
  - Supports logic to read back the current scan line of refresh
  - Supports fast page flipping function

1.2.3 Integration

· Built-in programmable 24-bit true-color RAMDAC with reference-voltage generator
· Built-in dual-clock generator
· Built-in monitor-sense circuit
· Built-in graphics accelerator and VGA controller
· Built-in video accelerator
· Built-in Philips SAA7110/SAA7111, Brooktree Bt815/817/819A, SONY CXA1790q video decoder interface
· Built-in PCI multimedia interface
· Built-in Standard feature connector logic support

1.2.4 Resolution, Color & Frame Rate

· Supports 135 MHz pixel clock
· Supports super high resolution graphic modes
  - 640x480 256/32K/64K/16M colors NI
  - 800x600 16/256/32K/64K/16M colors NI
  - 1024x768 16/256/32K/64K/16M colors NI
  - 1280x1024 16/256 colors NI, 32K/64K colors interlace only
· Supports virtual screen up to 2048x2048
· Supports 80/132 columns text mode in 25, 30, 44 or 60 rows and other modes
· Supports 75Hz vertical refresh rate

1.2.5 Video Functions

· Supports full motion picture with only 1 MByte of DRAM, at up to 1024x768x256 mode
· Uses SiS defined 8-bit feature connector connecting directly to SiS 6204 for video overlay
· Supports single frame buffer architecture to save DRAM cost
· Supports graphics/video overlay function by color-key and chroma-key operations
· Supports multi-format Video For Windows such as YUV422, RGB565, and RGB555
· Supports YUV-to-RGB color space conversion
· Supports video scaling in integer increments of 1/64
· Supports horizontal 2-tap, 8-tap DDA interpolation
· Supports vertical 2-tap, 8-tap DDA interpolation for better quality of video window expansion
· Built-in 64x16 video capture FIFOs to support video capture
· Built-in 64x32 video playback FIFOs to support video playback
· Supports Microsoft Video For Windows
· Real-Magic MPEG API compatible for interactive titles
· Supports DCI Drivers and Direct Draw Drivers
· Built-in brightness adjustment and contrast enhancement logic to support high quality video playback
· Supports video overlay for any graphic mode
· Built-in genlock circuit for video capture

1.2.6 Power Management

· Dynamic power management to reduce internal SRAM, DAC and line-buffer power consumption
· Supports VESA Display Power Management Signaling (DPMS) compliant VGA monitor for power management
· Supports direct I/O command to force graphics controller into standby/suspend/off state
· Powers down internal SRAM in direct color mode

1.2.7 Multimedia Application

· Supports DDC1 and DDC2B specifications
· Follows the plug & play specification for display controller
· Supports RAMDAC snoop for multimedia applications

1.2.8 Misc.

· Supports Signature Analysis for automatic test
· Supports 32/64-bit display memory path

1.3 Functional Block Diagram

1.3.1 System Block Diagram

[Figure 1.2 System block diagram (5596bd.drw). Functional blocks shown: Host Interface, Cache Controller, DRAM Controller, PCI Host Bridge, VGA Controller, Reset Interface, PMU & Misc., PCI Bus Arbiter. Signals shown: ADS#, M/IO#, D/C#, W/R#, CACHE#, HA[31:3], HBE[7:0]#, HD[63:0], BRDY#, NA#, KEN#, EADS#, HITM#, BOFF#, SMIACT#, CPUCLK, CPUHLDA, CPUHOLD, ADSC#, ADSV#, KRE#, KWE7#/GWE#, KWE6#/BWE#, KWE5#/KCE#, KWE4#, KWE[3:0]#/SRAS[1:0]#,SCAS[1:0]#, TA[7:0], TAGWE#, PWRGD, PCIRST#, CPURST, INIT, A20M#, AD[31:0], C/BE[3:0]#, FRAME#, IRDY#, TRDY#, STOP#, DEVSEL#, PLOCK#, PAR, PCICLK, HSYNC, VSYNC, BLANK, PCLK, VIDEO[7:0], DDCDAT, DDCCLK, ENSYNC, ENVIDEO, ENDCLK, REFCLK, ROUT, GOUT, BOUT, COMP, RSET, VREF, MFILTER, VFILTER, SMOUT0/INTA#, SMI#, STOPCLK#, FLUSH#/SMOUT1, BREAK#/KBRST#/LLC1, TURBO/WAKEUP0, WAKEUP1, RAS[3:0]#, CAS[7:0]#, RAMWE#, MA[11:0], MD[63:0], PREQ[3:0]#, PGNT[3:0]#, SIOREQ#, SIOGNT#.]

1.3.2 Integrated VGA Controller Block Diagram

[Figure 1.3 Integrated VGA controller block diagram. Blocks shown: PCI Bus Interface, Graphic Controller, Read-ahead Cache, CPU Write Buffer, Command Queue, Engine Read Buffer, Engine Write Buffer, Graphic Engine, Display Memory Controller, Attribute Controller, DDC Controller, Video Accelerator, RAMDAC, CRT Controller, CRT FIFO, Sequencer, Dual-Clock Synthesizer (14.318 MHz input, MCLK and VCLK outputs), DPMS, DRAM (MA, MD, RAS*, CAS*, WE#), Video Decoder (FC or VAFC). Outputs shown: R, G, B, CRT timing, DDC clock, DDC data.]

1.3.3 Integrated VGA Controller Video Accelerator Block Diagram

[Figure 1.4 Video accelerator block diagram (205bd.drw). Blocks shown: CPU or PCI Video Adapter, Video Decoder, SiS 6204, PCI Bus, Feature Connector, Video Input Interface, Video Capture Down Scaling, Video Capture FIFO, Display Memory (DRAM), Video Playback FIFO, Video Playback Interpolation, Video Playback Up Scaling, Color Format & Space Conversion, Attribute Controller & RAM LUT, CRT FIFO Controller, Color Key, Graphics/Video Overlay, FC Interface & Format Conversion, DAC, FC Video Adapter.]

2. Functional Description

The SiS5596 integrates the VGA controller, memory controller, PCI bridge, and power management unit into a single 388-pin chip in a BGA package. The memory controller supports Fast Page and EDO DRAMs, and Pipelined Burst SRAMs for the Level 2 cache. The memory controller contains a one-line-deep posted write buffer, which reduces latency between the CPU and DRAM.

2.1 CPU Interface

The SiS5596 is designed to support the Pentium CPU host interface at 66.667/60/50 MHz. The host data bus and the DRAM bus are 64 bits wide. The SiS5596 supports the pipelined addressing mode of the Pentium CPU by issuing the next address signal, NA#. NA# is only generated in the following cases: (a) burst read of L2 cache or DRAM, (b) single read of DRAM.

The SiS5596 supports the CPU L1 write back (WB) or write through (WT) cache policies and the 5596 L2 WB or WT cache policies. The L1 cache is snooped by the assertion of EADS# when the CPU is put in the HOLD state.

The SiS5596 issues CPUHOLD to the Pentium CPU in response to the assertion of PCI master requests (REQ[3:0]# and PHOLD#). After receiving CPUHLDA from the CPU, it does not assert GNT[3:0]# or PHLDA# until both the CPU-to-PCI posted write buffer and the memory write buffer are empty. During inquire cycles, CPUHOLD may be negated temporarily to allow the CPU to write back the inquired hit-modified line to L2 or DRAM.

2.2 Cache Controller

The built-in L2 Cache Controller uses a direct-mapped scheme and can be configured in either write through or write back mode. Pipelined burst SRAMs are supported. The SiS5596 supports auto-detection of SRAM type and auto-sizing.

Table 2-1 shows the cache sizes that are supported by the SiS5596 when using synchronous SRAM, with the corresponding TAG RAM sizes, data RAM sizes, and cacheable memory sizes. Table 2-3 summarizes the recommended speed settings when pipelined burst SRAMs are used.

Table 2-1 Cache Size with 8-bit tag

Cache Size   Data RAM    Tag RAM   Cacheable Size
256K         32Kx32x2    8Kx8      64M
512K         32Kx32x4    16Kx8     128M
1M           32Kx32x8    32Kx8     256M

The SiS5596 also provides an alternative to save the dirty SRAM chip.
This is accomplished by sharing the alter (dirty) bit with the tag address bits in the same 8-bit wide TAG RAM. A system using this implementation has 7 tag address bits and 1 dirty bit. As a result, the cacheable local memory sizes are reduced to half of the original sizes, as indicated in Table 2-2.

Table 2-2 Cache Size with 7-bit tag and 1 dirty bit

Cache Size   Data RAM    Tag RAM   Cacheable Size
256K         32Kx32x2    8Kx8      32M
512K         32Kx32x4    16Kx8     64M
1M           32Kx32x8    32Kx8     128M

In practice, the L2 cacheable DRAM size is determined by:
1) The maximum L2 cacheable size as described in Table 2-1 and Table 2-2.
2) The non-cacheable areas defined in registers 56h, 57h, 58h and 59h.
3) The C, D, E, F segment cacheability defined in registers 80h~86h.
4) The non-cacheable SMRAM area.

The L1 cacheable size, however, is determined only by 2), 3), 4) and the maximum DRAM size, i.e., 512 MBytes. Thus, cycles whose addresses lie above the L2 cacheable size but within 512 MBytes can still be cacheable in L1. The behavior of KEN# is ruled by the L1 cacheability. Note that only the code in the C, D, E, F segments is cacheable in L1/L2; the data portion of the C, D, E, F segments is not cacheable in L1/L2.

Table 2-3 Synchronous SRAM Speed Settings

                 Data RAM Speed           Tag RAM Speed            Read          Write
                 66 MHz  60 MHz  50 MHz   66 MHz  60 MHz  50 MHz   Performance   Performance
Pipelined SRAM   15ns    15ns    20ns     12ns    12ns    20ns     3-1-1-1       3-1-1-1
                                          15ns    15ns    20ns     4-1-1-1       4-1-1-1

NOTE:
(1) The SRAM parameters of the data RAMs shown in the table above are "cycle time".
(2) Use asynchronous SRAM for Tag RAM.
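
The cacheable sizes in Tables 2-1 and 2-2 follow from the direct-mapped organization: each distinct tag value selects one cache-sized window of memory, so the reachable range is the cache size multiplied by 2 to the power of the number of tag bits. The following is a minimal sketch of that arithmetic, not vendor code; the function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB

def max_l2_cacheable_bytes(cache_bytes: int, tag_bits: int) -> int:
    """Upper bound on L2-cacheable memory for a direct-mapped cache.

    Hypothetical helper: each of the 2**tag_bits tag values maps one
    cache-sized window of memory onto the cache.
    """
    return cache_bytes << tag_bits

if __name__ == "__main__":
    for cache in (256 * KB, 512 * KB, 1 * MB):
        with_8_bit_tag = max_l2_cacheable_bytes(cache, 8) // MB
        with_7_bit_tag = max_l2_cacheable_bytes(cache, 7) // MB
        print(f"{cache // KB:5d}K cache: {with_8_bit_tag}M (8-bit tag), "
              f"{with_7_bit_tag}M (7-bit tag + dirty bit)")
```

Giving up one tag bit for the dirty bit halves the cacheable range, which is the trade-off described for Table 2-2.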

SRAM Address Mapping

Table 2-4 TAG=8-bit

       256K   512K   1M
TA7    HA23   HA23   HA23
TA6    HA22   HA22   HA22
TA5    HA21   HA21   HA21
TA4    HA20   HA20   HA20
TA3    HA19   HA19   HA27
TA2    HA18   HA26   HA26
TA1    HA25   HA25   HA25
TA0    HA24   HA24   HA24

Table 2-5 TAG=7-bit

       256K   512K   1M
TA6    HA22   HA22   HA22
TA5    HA21   HA21   HA21
TA4    HA20   HA20   HA20
TA3    HA19   HA19   HA23
TA2    HA18   HA23   HA26
TA1    HA23   HA25   HA25
TA0    HA24   HA24   HA24

NOTE: TA7 acts as ALT.

2.3 DRAM Controller

The 5596 can support up to 512 MBytes of DRAM. Single or double sided 64/72-bit (with/without parity) FP (Fast Page mode) DRAM or EDO (Extended Data Output) DRAM can be used.

The installed DRAM can be 256K, 512K, 1M, 2M, 4M or 16M deep by N bits wide, and both symmetrical and asymmetrical DRAM types are supported. EDO DRAM and FP DRAM may be mixed bank by bank; the corresponding DRAM timing is switched automatically according to the register setting. However, if FP and EDO DRAM are mixed within the same bank, they will be recognized as FP DRAM.

2.3.1 DRAM Configuration

The SiS5596 has four RAS# (RAS[3:0]#) and eight CAS# (CAS[7:0]#) output signals. The DRAM configurations it can support are described below. Suppose there are four DRAM banks, i.e. bank0~3.

1. If just two banks are used, i.e. bank0 and bank1, single and double sided DRAM modules can be used in arbitrary order.
2. If three banks are used, single sided DRAM modules can be used in all banks. A double sided DRAM module, however, must be plugged into bank1. In other words, the SiS5596 supports double sided DRAM only in bank1 when three banks are used concurrently.
3. If four banks are used, single sided DRAM modules can still be used in all banks, but double sided DRAM modules are not supported.
4. Each bank can be half populated, but the DRAM module must be plugged into the even SIMM of the bank.

[Figure 2.1 Two Banks: BANK0 and BANK1, each with CAS lines 7~4 and 3~0; RAS0 through RAS3 shown.]

[Figure 2.2 Three Banks: BANK0 on RAS0, BANK1 on RAS2 and RAS3, BANK2 on RAS1; each bank with CAS lines 7~4 and 3~0.]

[Figure 2.3 Four Banks: BANK0 on RAS0, BANK1 on RAS2, BANK2 on RAS1, BANK3 on RAS3; each bank with CAS lines 7~4 and 3~0.]

The DRAM address MA[11:0] and CAS[7:0]# are connected to each bank. There are several DBRs (DRAM Bank Registers). The DRAM type is recorded in the DBR, which includes the FP/EDO, Half/Full populated and Symmetrical/Asymmetrical status of each bank. If the DRAM types of the even and odd SIMM differ, the type of the smaller one is recognized. The accumulated DRAM density is programmed into the DBRs as described below. Note that DBRx-0 has the same value as DBRx-1.

DBR0-0 = DBR0-1 = Amount of DRAM corresponding to RAS0#
DBR1-0 = DBR1-1 = DBR0-0 + Amount of DRAM corresponding to RAS2#
DBR2-0 = DBR2-1 = DBR1-0 + Amount of DRAM corresponding to RAS1#
DBR3-0 = DBR3-1 = DBR2-0 + Amount of DRAM corresponding to RAS3#

By the definition of the DBRs above, double sided DRAM in Bank0 affects the amount of DRAM corresponding to both RAS0# and RAS1#. In addition, if the even SIMM of bank0 holds single sided DRAM and the odd SIMM holds double sided DRAM, then half of the double sided DRAM appears in the amount of DRAM corresponding to RAS1# and is recognized as half populated. These rules also apply to Bank1. Some situations are listed below for reference.
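
The DBR accumulation rule can be sketched in a few lines. This is an illustration under stated assumptions rather than BIOS code: the function and constant names are hypothetical, and the per-RAS# amounts are taken as already known (in any consistent unit). The only part taken from the text is the accumulation order RAS0#, RAS2#, RAS1#, RAS3#.

```python
from itertools import accumulate

# Order in which the RAS# lines are summed into DBR0..DBR3, per the DBR definitions.
DBR_RAS_ORDER = (0, 2, 1, 3)

def dbr_values(ras_amounts):
    """Return [DBR0, DBR1, DBR2, DBR3] for the DRAM behind RAS0#..RAS3#.

    ras_amounts[i] is the amount of DRAM selected by RASi# (hypothetical
    input; on real hardware this comes from DRAM auto-detection).
    """
    return list(accumulate(ras_amounts[i] for i in DBR_RAS_ORDER))

if __name__ == "__main__":
    # Bank0 holds two double sided SIMMs of size e = 16, so RAS0# and RAS1#
    # each select 16. Bank1 holds two single sided SIMMs of size a = 4,
    # so RAS2# selects 8 and RAS3# selects nothing.
    print(dbr_values([16, 16, 8, 0]))
```

With e = 16 and a = 4 this yields e, e+2a, 2a+2e, 2a+2e, which is the De,De / Sa,Sa case among the reference configurations in this section.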

                 Bank0    Bank1    Bank2    Bank3
                 DBR0-x   DBR1-x   DBR2-x     DBR3-x
Configuration    Sa,Sa    Sb,Sb    -          -
value of DBR     2a       2a+2b    2a+2b      2a+2b
Configuration    De,De    Sa,Sa    -          -
value of DBR     e        e+2a     2a+2e      2a+2e
Configuration    Sa,Sa    De,De    -          -
value of DBR     2a       2a+e     2a+e       2a+2e
Configuration    Sa,D2a   Sb,D2b   -          -
value of DBR     2a       2a+2b    3a+2b      3a+3b
Configuration    De,De    Df,Df    -          -
value of DBR     e        e+f      2e+f       2e+2f
Configuration    Sa,Sa    De,De    Sb,Sb      -
value of DBR     2a       e+2a     2a+e+2b    2a+2b+2e
Configuration    Sa,Sa    Sb,Sb    Sc,Sc      Sd,Sd
value of DBR     2a       2a+2b    2a+2b+2c   2a+2b+2c+2d

where Sx / Dx indicate single-sided or double-sided DRAM with size equal to x.

2.3.2 DRAM Address Mapping

The following tables show the address mapping for the different DRAM configurations.

Table 2-6 Non-Interleave 64-bit (FP, EDO)

      256K Sym.   1M Sym.     4M Sym.     16M Sym.
MA    CAS  RAS    CAS  RAS    CAS  RAS    CAS  RAS
0     4    12     4    22     4    22     4    22
1     11   13     11   13     11   24     11   24
2     3    14     3    14     3    14     3    26
3     5    15     5    15     5    15     5    15
4     6    16     6    16     6    16     6    16
5     7    17     7    17     7    17     7    17
6     8    18     8    18     8    18     8    18
7     9    19     9    19     9    19     9    19
8     10   20     10   20     10   20     10   20
9     NA   NA     12   21     12   21     12   21
10    NA   NA     NA   NA     13   23     13   23
11    NA   NA     NA   NA     NA   NA     14   25

      512K Asym.  1M Asym.    2M Asym.    4M Asym.
MA    CAS  RAS    CAS  RAS    CAS  RAS    CAS  RAS
0     4    12     4    12     4    22     4    22
1     11   13     11   13     11   13     11   13
2     3    14     3    14     3    14     3    14
3     5    15     5    15     5    15     5    15
4     6    16     6    16     6    16     6    16
5     7    17     7    17     7    17     7    17
6     8    18     8    18     8    18     8    18
7     9    19     9    19     9    19     9    19
8     10   20     10   20     10   20     10   20
9     NA   21     NA   21     12   21     12   21
10    NA   NA     NA   22     NA   23     NA   23
11    NA   NA     NA   NA     NA   NA     NA   24

      12x8 Asym.  12x9 Asym.
MA    CAS  RAS    CAS  RAS
0     4    22     4    22
1     10   13     11   13
2     3    14     3    14
3     5    15     5    15
4     6    16     6    16
5     7    17     7    17
6     8    18     8    18
7     9    19     9    19
8     NA   20     10   20
9     NA   21     NA   21
10    NA   12     NA   23
11    NA   11     NA   12

Table 2-7 Non-Interleave 32-bit

      256K Sym.   1M Sym.     4M Sym.     16M Sym.
MA    CAS  RAS    CAS  RAS    CAS  RAS    CAS  RAS
0     4    12     4    12     4    22     4    22
1     2    13     2    13     2    13     2    24
2     3    14     3    14     3    14     3    14
3     5    15     5    15     5    15     5    15
4     6    16     6    16     6    16     6    16
5     7    17     7    17     7    17     7    17
6     8    18     8    18     8    18     8    18
7     9    19     9    19     9    19     9    19
8     10   11     10   20     10   20     10   20
9     NA   NA     11   21     11   21     11   21
10    NA   NA     NA   NA     12   23     12   23
11    NA   NA     NA   NA     NA   NA     13   25

      512K Asym.  1M Asym.    2M Asym.    4M Asym.
MA    CAS  RAS    CAS  RAS    CAS  RAS    CAS  RAS
0     4    12     4    12     4    22     4    22
1     2    13     2    13     2    13     2    13
2     3    14     3    14     3    14     3    14
3     5    15     5    15     5    15     5    15
4     6    16     6    16     6    16     6    16
5     7    17     7    17     7    17     7    17
6     8    18     8    18     8    18     8    18
7     9    19     9    19     9    19     9    19
8     10   20     10   20     10   20     10   20
9     NA   11     NA   21     11   21     11   21
10    NA   NA     NA   11     NA   12     NA   23
11    NA   NA     NA   NA     NA   NA     NA   NA

      12x8 Asym.  12x9 Asym.
MA    CAS  RAS    CAS  RAS
0     4    12     4    22
1     2    13     2    13
2     3    14     3    14
3     5    15     5    15
4     6    16     6    16
5     7    17     7    17
6     8    18     8    18
7     9    19     9    19
8     NA   20     10   20
9     NA   21     NA   21
10    NA   10     NA   12
11    NA   11     NA   11

2.3.3 DRAM Performance

All DRAM cycles are synchronous with the CPU clock. The following table shows the possible speed settings, which depend on the DRAM type, RAS# setting, CAS# setting, and so forth.

Table 2-8 DRAM Performance

Cycle Type          DRAM type   66/60 MHz   50 MHz
Read Page hit       EDO         5-2-2-2     5-2-2-2
                    FP          5-3-3-3     5-2-2-2
Read Row start      EDO/FP      7-3-3-3     7-3-3-3
Read Page miss      EDO/FP      11-3-3-3    11-3-3-3
Post Write          EDO/FP      3-1-1-1     3-1-1-1
Write Retire        EDO         2/3         2/3
(Buffer to DRAM)    FP          3/4         3/4

There is a built-in one-level CPU-to-memory posted write buffer (CTMFF), 4 quadwords deep. All write accesses to DRAM are buffered. For a CPU read miss / line fill cycle, the write-back data from the second level cache is buffered first, and the SiS5596 starts to read data from DRAM at the same time. The buffered data are written to DRAM right after the read cycle.
With this concurrent write back policy, many wait states are eliminated. However, any other cycle targeting DRAM remains pending until the CTMFF is empty.

A read access to DRAM is either a single or a burst read cycle, depending on the cacheability of the cycle. If the current DRAM configuration is a half-populated bank, the SiS5596 asserts 8 consecutive cycles to access DRAM for a burst cycle. For a single cycle that accesses DRAM within one Dword, the SiS5596 issues only one cycle to access DRAM. For a single cycle that accesses one Qword or crosses a Dword boundary, the SiS5596 issues two consecutive cycles to access DRAM.

2.3.4 Refresh cycle

A refresh cycle occurs every 15.6 us. It is timed by a counter driven by the 14 MHz input. The CAS[7:0]# signals are asserted at the same time, and the RAS[3:0]# signals are asserted sequentially. To reduce the performance impact, "Intelligent Refresh" refreshes only the populated banks.

2.3.5 Characteristics of Shadow RAM

The SiS5596 defines the characteristics of each 16K memory block in the 640 KByte to 1 MByte address range through registers 80h to 86h. Through these registers, the memory blocks can be programmed to be directly accessible by the CPU or a PCI bus master (combined with another enable bit for PCI master access), and their cacheability attributes can be set. Each register contains three bits per block, Read Enable, Write Enable, and Cache Enable, which define whether the corresponding memory block functions as normal read/write DRAM. These bits also specify the cacheability of the block in the first/second level cache. Table 2-9 shows the attributes of these enable bits, Table 2-10 gives the attribute bit assignments and the attribute definitions, and Table 2-11 lists the registers and their corresponding memory segments.

Table 2-9 Attributes of Enable Bits

Read Enable    When this bit is set to 1, CPU read cycles that access the corresponding memory block are regarded as normal DRAM read cycles. Otherwise, the read cycles are directed to the PCI bus.
Write Enable   When this bit is set to 1, CPU write cycles that access the corresponding memory block are regarded as normal DRAM write cycles. Otherwise, the write cycles are directed to the PCI bus.
Cache Enable   When this bit is set to 1, the corresponding memory block is programmed to be L1/L2 cacheable. Note that the cacheable function is for the code portion only, and cacheability works only if the Read Enable bit is also set.

Table 2-10 Attribute Bit Assignments and Attribute Definitions

Read    Cache   Write
Enable  Enable  Enable  Attribute              Definition
0       0       0       Disable                Cycles are transferred to the PCI bus.
0       0       1       Write Only             Write cycles are conducted to DRAM in the normal manner, and read cycles are passed to the PCI bus for termination.
0       1       0       Disable                Cycles are transferred to the PCI bus.
0       1       1       Write Only             Write cycles are conducted to DRAM in the normal manner, and read cycles are passed to the PCI bus for termination.
1       0       0       Read Only              Read cycles are conducted to DRAM in the normal manner, and write cycles are passed to the PCI bus for termination.
1       0       1       Read/Write             Normal DRAM read/write cycles.
1       1       0       Read/Cacheable         Normal DRAM read cycles, and the code portion is cacheable in L1/L2.
1       1       1       Read/Write/Cacheable   Normal DRAM read/write cycles, and the code portion is cacheable in L1/L2.

NOTE: When the PCI master access enable bit is set, PCI master read/write cycles are served in the same way as described in Table 2-9. The relation between the registers and the corresponding memory blocks is described in Table 2-11.
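
The eight rows of Table 2-10 reduce to a small rule: Read Enable and Write Enable independently route reads and writes to DRAM or to the PCI bus, and Cache Enable only takes effect when Read Enable is set. The sketch below encodes that rule for illustration; the function names and the returned strings are hypothetical and not part of any SiS software interface.

```python
def shadow_attribute(read_en: int, cache_en: int, write_en: int) -> str:
    """Return the Table 2-10 attribute name for one 16K shadow RAM block."""
    if not read_en:
        # Cache Enable has no effect unless Read Enable is set.
        return "Write Only" if write_en else "Disable"
    parts = ["Read"]
    if write_en:
        parts.append("Write")
    if cache_en:
        parts.append("Cacheable")
    return "Read Only" if parts == ["Read"] else "/".join(parts)

def cycle_target(read_en: int, write_en: int, is_write: bool) -> str:
    """Return where a CPU cycle to the block is routed: 'DRAM' or 'PCI'."""
    enabled = write_en if is_write else read_en
    return "DRAM" if enabled else "PCI"

if __name__ == "__main__":
    for bits in range(8):
        re, ce, we = (bits >> 2) & 1, (bits >> 1) & 1, bits & 1
        print(re, ce, we, shadow_attribute(re, ce, we))
```

A common shadowing sequence fits this model: set a block to Write Only while copying ROM contents into DRAM (reads still come from the PCI/ISA side), then switch it to Read Only or Read/Cacheable.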

Table 2-11 Registers and Corresponding Memory Blocks

Each 4-bit field holds, in the order listed in the source: Read Enable, Cache Enable, Write Enable, Reserved.

Reg   Bits   Memory Block
80h   7:4    0C0000-0C3FFFh
80h   3:0    0C4000-0C7FFFh
81h   7:4    0C8000-0CBFFFh
81h   3:0    0CC000-0CFFFFh
82h   7:4    0D0000-0D3FFFh
82h   3:0    0D4000-0D7FFFh
83h   7:4    0D8000-0DBFFFh
83h   3:0    0DC000-0DFFFFh
84h   7:4    0E0000-0E3FFFh
84h   3:0    0E4000-0E7FFFh
85h   7:4    0E8000-0EBFFFh
85h   3:0    0EC000-0EFFFFh
86h   7:4    0F0000-0FFFFFh

2.3.6 SMRAM Area Re-mapping

The SMRAM area is 64K or 32K. This area can be re-mapped to the A or B segment. Table 2-12 shows the types of re-mapping and the corresponding register settings.

Table 2-12

Reg 65h
Bit 7   Bit 6   Logical Address   Physical Address   Size
0       0       E0000~E7FFFh      E0000~E7FFFh       32K
0       1       E0000~E7FFFh      B0000~B7FFFh       32K
1       0       E0000~E7FFFh      A0000~A7FFFh       32K
1       1       A0000~AFFFFh      A0000~AFFFFh       64K

2.3.7 Others

RAMW# can be asserted at the end of each memory read cycle when EDO DRAM is accessed. When the power saving mode is enabled, the RAMW# pulse is at least 1.5 CPU clocks wide to reduce power consumption.

The DRAM always-page-miss modes, code always-page-miss and data always-page-miss, are also supported. Once one of these modes is programmed, the DRAM cycle will be a page-start cycle.

The CAS current can be programmed as 8 mA or 4 mA by register 5Dh bits 3 and 4.

2.4 PCI Arbiter

The SiS5596 contains a high performance hidden arbitration scheme that allows efficient bus sharing among five PCI masters and the CPU. Note that one PCI master is reserved for the PSIO chip. The SiS5596 employs a priority rotation scheme that operates at two layers. The first layer is shared between PSIO and the four PCI masters as a group. The second layer consists of the four PCI masters with equal priority. Arbitration is done at both layers: the winner of the arbitration among the four PCI masters arbitrates for the PCI bus against PSIO. The fair rotation scheme applies only within each layer.

The arbitration scheme ensures that ISA masters or DMA channels (represented by PSIO) can access the bus with the short bus latency required by traditional ISA masters and DMA devices. This implementation, together with the PCI Programmable Bursting Address Counter, guarantees that an ISA device will not be starved during a long PCI master burst. For example, when the maximum burst length is 512 bytes, the maximum arbitration latency is about 12 us for PSIO and about 40 us for a PCI master. The following two figures detail the rotation arbitration structure and its corresponding timing diagram.

Rotation Arbitration Scheme:

[Figure 2.4 Rotation arbitration tree (busgrant.drw): BUS GRANT PRIORITY at the root, switches SW1 to SW4, intermediate nodes G0123, G01 and G23, and request nodes G4, G0, G1, G2, G3.]

Notation:
SW1: the switch for the path from node G4 or G0123 to BUS GRANT PRIORITY
SW2: the switch for the path from node G01 or G23 to node G0123
SW3: the switch for the path from node G0 or G1 to node G01
SW4: the switch for the path from node G2 or G3 to node G23
G01, G23, G0123: intermediate nodes
G4: the bus request from PSIO
G0, G1, G2, G3: the bus requests from PCI device 0, device 1, device 2, device 3 respectively
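
The switch tree of Figure 2.4 can be modelled in a few lines. The sketch below is one interpretation, not SiS logic: it assumes each switch prefers the side it currently points to, falls through to the other side when the preferred side has no pending request, and after a grant points away from the side that was just used. The class and method names are hypothetical. Under these assumptions the model reproduces the grant order of the worked example in this section (PSIO, then devices 0, 2, 1, 3).

```python
class RotatingArbiter:
    """Illustrative model of the two-layer rotating priority tree."""

    # Each switch selects between two children; leaves are request nodes.
    TREE = {
        "SW1": ("G4", "SW2"),    # PSIO versus the group of four PCI masters
        "SW2": ("SW3", "SW4"),   # G01 versus G23
        "SW3": ("G0", "G1"),
        "SW4": ("G2", "G3"),
    }

    def __init__(self):
        # Initial path parking: SW1->G4, SW2->G01, SW3->G0, SW4->G2.
        self.points_to = {switch: 0 for switch in self.TREE}

    def _pending(self, node, requests):
        """True if any request is pending at or below this node."""
        if node in self.TREE:
            return any(self._pending(child, requests) for child in self.TREE[node])
        return node in requests

    def grant(self, requests):
        """Return the granted request node, or None if nothing is pending."""
        requests = set(requests)
        node = "SW1"
        if not self._pending(node, requests):
            return None
        while node in self.TREE:
            side = self.points_to[node]
            if not self._pending(self.TREE[node][side], requests):
                side ^= 1                        # preferred side idle: take the other
            self.points_to[node] = side ^ 1      # rotate away from the side just used
            node = self.TREE[node][side]
        return node

if __name__ == "__main__":
    arbiter = RotatingArbiter()
    pci_masters = ["G0", "G1", "G2", "G3"]
    for pending in (["G4"], ["G4"] + pci_masters,
                    pci_masters, pci_masters, pci_masters):
        print(sorted(pending), "->", arbiter.grant(pending))
```

Because SW1 returns to G4 after any PCI master grant, a PSIO request arriving between PCI master grants is served next, which matches the short ISA/DMA latency goal stated above.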

Initial Path Parking:
SW1: BUS GRANT PRIORITY-G4
SW2: G0123-G01
SW3: G01-G0
SW4: G23-G2

Rule of Rotating Priority for Bus Arbitration:
· BUS GRANT PRIORITY will choose a path whenever it encounters an optional path.
· The PCI bus will be granted as a daisy chain.
· Path switches on the path from BUS GRANT PRIORITY to any request node (G4, G0, G1, G2, G3) will be toggled if they have been utilized.

Example:
Initial Priority: G4,G01,G0,G2
1. PSIO (G4) requests the bus.
   PHLDA# is asserted.
   SW1 is toggled to G0123 (since it has been utilized).
   Priority changes to G0,G1,G2,G3,G4.
2. PSIO, REQ3, REQ2, REQ1, REQ0 are requesting the bus.
   GNT0# is asserted.
   SW1, SW2 and SW3 are toggled to G4, G23 and G1 respectively (since they have been utilized).
   Priority changes to G4,G2,G3,G1,G0.
3. REQ3, REQ2, REQ1, REQ0 are active.
   GNT2# is asserted.
   SW2 and SW4 are toggled to G01 and G3 respectively (since they have been utilized).
   Priority changes to G4,G1,G0,G3,G2.
4. REQ3, REQ2, REQ1, REQ0 are active.
   GNT1# is asserted.
   SW2 and SW3 are toggled to G23 and G0 respectively (since they have been utilized).
   Priority changes to G4,G3,G2,G0,G1.
5. REQ3, REQ2, REQ1, REQ0 are active.
   GNT3# is asserted.
   SW2 and SW4 are toggled to G01 and G2 respectively (since they have been utilized).
   Priority changes to G4,G0,G1,G2,G3.
6. During steps 3-5, if a request comes from PSIO, the arbiter will grant the bus to PSIO.

[Figure 2.5 PCI Arbiter, rotation arbitration scheme timing diagram (501arbi). Signals shown: CPUCLK, PCICLK, REQ[3:0]#, PHOLD#, GNT[3:0]#, PHLDA#, HOLD, CPUHOLD, CPUHLDA, HLDA, FRAME#, IRDY#. Note: HOLD and HLDA are internal signals.]

A PCI master can burst as long as the PCI target can source/sink the data and no other agent requests the bus.
To keep bus acquisition latency predictable, PCI specifies two mechanisms that cap a master's tenure in the presence of other requests. One is the Master Latency Timer (LT), which is not implemented in the SiS5596; the other is Target Initiated Termination. In the SiS5596, a Programmable Bursting Address Counter (PBAC) is implemented to disconnect the PCI master during a long bursting cycle. In this way, high throughput is maintained while the bus latency is kept reasonably small. Note that the bursting length applies to PCI master accesses to local memory. When a PCI master accesses a non-local memory target, both the master and the target are responsible for maintaining reasonable latency.

The PCI arbiter asserts only one GNT# at any time. The SiS5596 also implements a time-out counter to prevent a faulty device from hogging the bus. If the PCI bus is granted to a PCI device while the bus is idle, the device must assert FRAME# within 16 PCI clocks. If the time-out occurs, the arbiter masks the request line and therefore deasserts GNT#. When this happens, all PCI devices start arbitration again. Note that PSIO is exempt from this constraint.

The SiS5596 releases the host bus to the CPU when a PCI master is not targeting main memory. The arbiter keeps GNT# asserted to that PCI master until the PCI bus is idle, even when another PCI master has asserted REQ# to the SiS5596.

2.5 PCI Bridge

2.5.1 PCI Master Controller

The PCI Master Controller forwards the CPU cycles not targeting the local memory to the PCI bus. In the case of a 64-bit CPU request or a misaligned 32-bit CPU request, the SiS5596 assumes the read assembly and write disassembly control.
A 4 level posted write buffer (CTPFF) is implemented to improve the CPU to PCI memory write performance. Except for on-board memory write cycles, any cycle forwarded to the PCI bus is suspended until the CTPFF is empty. For PCI bus memory write cycles, the CPU data are pushed into the CTPFF if it is not full. The push rate for a DW is 3 CPUCLKs. The pushed data are written to the PCI bus at a later time. If consecutive written data are in DW incremental sequence, they are transferred to the PCI bus as a burst.

The SiS5596 provides a mechanism for converting standard I/O cycles on the CPU bus to Configuration cycles on the PCI bus. Configuration Mechanism #1 in PCI Specification 2.0 (page 61) is used for the cycle conversion.

The SiS5596 always intercepts the first interrupt acknowledge cycle from the CPU bus, and forwards the second interrupt acknowledge cycle onto the PCI bus.

The general timing required for CPU read from/write to PCI bus is shown in the following table.

Table 2-13 CPU forwards to PCI cycle
                          CPUCLK=50/60/66MHz
CPU read                  12~14
CPU write (non-posted)    14~16
CPU posted write          3

2.5.2 PCI Slave Controller

The SiS5596 operates as a slave on the PCI bus whenever a PCI master requests access to a SiS5596 resource such as Cache, DRAM and the SiS5596 internal registers. Note that the internal registers can only be accessed by the SiS5596 itself when in a CPU cycle. In the SiS5596 PCI/ISA system, the CPU is placed in the HOLD state before the PCI bus is granted to a PCI master. The following figure shows the behavior of CPUHOLD/CPUHLDA in response to PCI master requests. Only linear ordered PCI cycles are supported by the SiS5596 PCI slave interface.
[Figure 2.6: CPUHOLD/CPUHLDA response to a PCI master request. Signals shown: CPUCLK, PCICLK, REQ#, HOLD, CPUHOLD, CPUHLDA, HLDA, GNT#, FRAME#, IRDY#, CIP#, HA (driven by the 5596), AD (parked by the 5596, then driven by the PCI master, then parked by the 5596). Note: HOLD, HLDA# and CIP# (current in progress) are internal signals.]

A PCI master access to the local memory is not conducted until the snoop cycle has completed. The snoop cycle is used to inquire the first level cache to maintain coherency between the first level and second level caches and main memory. Snoop cycles are performed by driving the PCI master address onto the CPU bus and asserting EADS#. Depending on the status of HITM# two clocks after the assertion of EADS#, the SiS5596 conducts the PCI master cycles as Table 2-14 outlines.

Table 2-14

PCI Master Read Cycle
L1                     L2                           Data Transfer
Miss (or Unmodified)   Miss                         Data transfer from DRAM to PCI
Miss (or Unmodified)   Hit (Dirty or !Dirty)(*1)    Data transfer from L2 to PCI
HitM                   Miss                         Data is first written back from L1 to DRAM. Then, PCI master gets data from DRAM.(*3)
HitM                   Hit (Dirty or !Dirty)(*1)    Data is first written back from L1 to L2. Then, PCI master gets data from L2. The line is marked dirty in the L2.(*3)

PCI Master Write Cycle
L1                     L2                           Data Transfer
Miss (or Unmodified)   Miss                         Data transfer from PCI to DRAM
Miss (or Unmodified)   Hit (Dirty or !Dirty)(*2)    Data transfer from PCI to DRAM and L2. The Dirty bit is not changed.
HitM                   Miss                         Data is first written back from L1 to DRAM. Then, PCI master writes data to DRAM.(*3)
HitM                   Hit (Dirty or !Dirty)(*2)    Data is first written back from L1 to L2. Then, PCI master writes data to L2 and DRAM. The line is marked dirty in the L2.(*3)

NOTE:
(*1) For burst or pipeline SRAM, the rule changes as described below. If L2 is in WT mode, data transfer is always from DRAM to the PCI side.
If L2 is in WB mode, data transfer is from DRAM to the PCI side if the line is not dirty. If the line is dirty, data transfer is from L2 to the PCI side, and the PCI transfer is disconnected after the completion of reading this line.
(*2) The rule changes when burst or pipeline SRAM is used. Whether the line is dirty or not, data transfer is conducted from PCI to the L2 side, and the PCI transfer is disconnected after the completion of writing this line.
(*3) This case applies only to the initial line (line 0). The PCI transfer is disconnected after the completion of line n if line n+1 is a Modified one in L1, where n >= 0. The snooping write back cycle is deferred until line n is completely transferred.

In the SiS5596, the INV signal of the CPU should be connected to W/R#, which is driven by the SiS5596 in the PCI master cycle. In this way, the SiS5596 can invalidate the line that is currently inquired via the assertion of EADS# in PCI master write cycles.

The SiS5596 slave interface supports PCI burst transfers. The bursting length can be 256 bytes, 512 bytes, 1K bytes, 2K bytes, or 4K bytes. A burst transfer is disconnected (retry) if the transfer goes across the bursting length. In this way, at most 128 cache lines can be transferred without interruption if they are in I, S, or E state in the L1 cache. Another reason for the constraint is that a page miss may occur only once during the entire bursting transaction, since the maximum bursting length is always within the page size of any of the DRAMs used.

There is a 4 QW deep FIFO to prefetch data when a PCI master reads from the local memory. To achieve the utmost data transfer speed, the SiS5596 implements an advanced prefetch algorithm and snoop ahead function. This lets the PCI burst transfer proceed at a pace of X-1-1-1.
The SiS5596 always prefetches one or two QW from L2/DRAM in advance of asserting TRDY#. This can be programmed by writing bit 2 of configuration register 5Bh. The snoop ahead mechanism ensures that the hit modify status of the next prefetching line (line n+1) is acquired before the prefetching of line n is completed. If n+1 is not a Modified line in L1, prefetching of n+1 can be conducted right after the completion of prefetching line n. In such a case, the SiS5596 keeps piping data into the FIFO on the L2/DRAM side, and it also keeps piping data out of the FIFO on the PCI side with 0 wait states. If n+1 is a Modified line in L1, the SiS5596 issues STOP# to disconnect the burst transfer after line n has been consumed. This function also applies to the PCI master write cycle.

The PCI master writes are buffered in the 4 QW deep PCI to memory posted write buffer (PTHFF). The SiS5596 always posts an aligned QW of PCI write data into the write buffer and then retires it into the DRAM array or the L2 cache. The PCI write performance is X-1-1-1.

The PCI bus data transfer rate can be calculated from the following formula.

DATA TRANSFER RATE = NB / ( { X + (W + 1) * [ (NB / 4) - 1 ] } * (1 / f) )

where
NB: total number of bytes transferred, or the Bursting Length which is defined in bits 6-4 of configuration register 5Bh
X: number of PCI clocks for the first data transfer, or leadoff cycle time
W: number of wait states for PCI burst transfer
f: frequency of PCI clock

Since the SiS5596 PCI bridge is designed as asynchronous to the CPU clock, the PCI clock always runs at 33MHz to gain the fast transfer rate. The leadoff cycle is in general determined by: 1) the relative clock phase between CPUCLK and PCICLK, and 2) the L1 cache policy. Specifically, in the PCI master read cycle, the leadoff cycle is determined by the logic of bit 2 of register 5Bh. Moreover, whether the initial line hits L2 and whether it is a page hit or miss cycle also affect the leadoff cycle time.
It is estimated that the leadoff cycle is 4 to 5 PCICLKs for a PCI master write cycle and 6 to 10 PCICLKs for a PCI master read cycle. If the initial line hits a modified line in L1, ten more PCICLKs are required for the leadoff cycle. The following table illustrates the PCI master performance at different bursting lengths when the leadoff cycle is 5 for write and 7 for read.

Table 2-15 Data Transfer Rate
Bursting length    PCI master read (7-1-1-1)    PCI master write (5-1-1-1)
512 bytes          127MB/s                      129MB/s
1K bytes           130MB/s                      131MB/s
2K bytes           131MB/s                      132MB/s
4K bytes           132MB/s                      133MB/s

An important factor in sustaining 0 wait state PCI transfers is the prefetching and retiring rate that the system controller can perform. The following table outlines the rates that the SiS5596 can keep. The rate is given in CPUCLKs per QW. For a 32-bit DRAM organization, it takes twice the parameters cited below.

Table 2-16
                    EDO    FP     PBSRAM
Prefetching Rate    2/3    2/3    2
Retiring Rate       2/3    3      2

Concurrent refresh is still performed when the CPU is put into the Hold state. If the DRAM is idle, refresh can be conducted at any time. If a refresh request occurs at the same time that a PCI master wants to access DRAM, an arbitration scheme is employed to resolve the conflict. The refresh request may thus get service while the PCI master access is suspended until the refresh cycle is completed. Although refresh may win the DRAM bus, at most one refresh cycle may be conducted for each individual PCI transaction, i.e. for each FRAME# initiation. On the other hand, refresh may also be deferred until the DRAM is idle. In a SiS5596 system, the refresh may be postponed for no more than 33 us in the worst case, when a PCI master is reading the whole 128 lines through one burst transaction.
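As a worked check of the data transfer rate formula, the sketch below evaluates it for the burst lengths of Table 2-15. It is only an illustration: the function name is hypothetical, "33MHz" is assumed to mean 33.33 MHz (100/3 MHz), and 1 MB is taken as 10^6 bytes. Under those assumptions each result lands within about 1 MB/s of the tabulated figure.

```python
def pci_burst_rate_mb_s(nb_bytes, leadoff_clocks, wait_states=0,
                        pci_clock_hz=100e6 / 3):
    """Rate in MB/s (1 MB = 10**6 bytes) for one burst of nb_bytes.

    nb_bytes       -> NB in the formula
    leadoff_clocks -> X
    wait_states    -> W
    pci_clock_hz   -> f (assumed 33.33 MHz here)
    """
    data_phases = nb_bytes // 4   # one 32-bit DW per PCI data phase
    clocks = leadoff_clocks + (wait_states + 1) * (data_phases - 1)
    seconds = clocks / pci_clock_hz
    return nb_bytes / seconds / 1e6


for length in (512, 1024, 2048, 4096):
    read = pci_burst_rate_mb_s(length, leadoff_clocks=7)    # 7-1-1-1 read
    write = pci_burst_rate_mb_s(length, leadoff_clocks=5)   # 5-1-1-1 write
    print(f"{length:5d} bytes  read {read:6.1f} MB/s  write {write:6.1f} MB/s")
```

The leadoff cost is paid once per burst, so its share shrinks as the bursting length grows, which is why the tabulated rates approach the 4-bytes-per-clock ceiling of the bus.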
2.6 Green PC Function

The following paragraphs describe the features of the PMU (Power Management Unit).

2.6.1 Power States

The PMU provides different power management states, which are described in the following sections.

(i) Monitor Standby State

The monitor is blanked and the external devices are turned off through SMOUT when the Monitor standby timer expires. Monitor Standby monitors the following events:
  IRQ 1-15
  HOLD
  NMI
Each IRQ has two sets of mask bits, one for the wake-up mask and the other for the standby mask. HOLD includes the PCI local master and ISA master requests. Each event is maskable. If no event happens during the monitored period and the timer expires, an SMI is generated and the monitor enters the standby state. Once the monitor is in the standby state, any event from IRQ1-15, NMI or HOLD causes an SMI which brings the monitor back to the normal state. The time slot of the Monitor standby timer is programmable to 6.6sec, 0.84sec, 13.3ms, or 1.6ms.

(ii) System Standby State

If the system standby timer expires, an SMI is generated for the system to enter the system standby state. The following events happen:
  STPCLK# is asserted to stop the CPU clock
  The hard disk drive spindle motors can be turned off
  The serial, parallel ports or the programmable I/O port can be turned off
Once STPCLK# is asserted, any event from IRQ1-15, NMI, HOLD, or INIT causes STPCLK# to be de-asserted. If any of the hard disk motors, serial, parallel or programmable I/O ports were turned off, they return to the normal state only when they are accessed.
System Standby monitored events (each event is maskable):
  Programmable I/O ports (one is a 10-bit I/O port, the other is a 16-bit I/O port)
  IRQ 1-15 (each has 2 sets of mask bits, as for the Monitor Standby State)
  HOLD
  NMI
  Hard Disk ports (1F0-1F7h, 3F6-3F7h, 170-17Fh, 320-32Fh)
  Serial ports (2F8-2FFh, 3F8-3FFh, 2E8-2EFh, 3E8-3EFh)
  Parallel ports (278-27Fh, 378-37Fh, 3BC-3BEh)
  A0000-AFFFFh or B0000-BFFFFh Address trap (Video RAM)
  C0000-C7FFFh Address trap (Video BIOS)
  3Bx-3Dxh (Video I/O port)
The time slot of the System standby timer is programmable to 9 sec, 1.1 sec, 70ms, or 8.85ms.

(iii) Throttling State

In the throttling state, STPCLK# is asserted and de-asserted periodically. This function is maskable. The throttling timer (Registers 61h and 62h) is programmable and the time slot is 35us.

2.6.2 Break Switch SMI

Whenever the break switch is pressed, it causes an SMI to enter or leave the power saving state. The signal from the break switch is a level-triggered signal which lasts for more than 3 CPU clocks.

2.6.3 Software SMI

If the software SMI enable bit is set and a '1' is written to bit 1 of Register 60h, an SMI# is generated and the software SMI service routine is invoked. Bit 1 of Register 60h should be cleared at the end of the SMI handler.

2.7 Internal Data Buffer

The Internal Data Buffer provides bi-directional data buffering among the 64-bit Host Data Bus, the 64/32-bit Memory Data Bus, and the 32-bit PCI Address/Data bus. The Internal Data Buffer incorporates three FIFOs and one read buffer among the bridges of the CPU, PCI, and memory buses. This buffering scheme smooths the differences in access latencies and bandwidths among the three buses, and therefore improves the overall system performance.
During bus operation between the Host, PCI, and Memory, the Internal Data Buffer performs functions such as latching data, forwarding data to the destination bus, and data assembly and disassembly. The main features of the integrated data buffer are listed below:
- 1 level CPU-to-Memory Posted Write Buffer (CTMFF) with 4 QW Deep
- 4 level CPU-to-PCI Posted Write Buffer (CTPFF) with 4 DW Deep
- 1 level CPU-to-PCI IDE Read Prefetch Buffer (PTHFF) with 1 DW Deep
- 1 level PCI-to-Memory Posted Write Buffer (PTHFF) with 4 QW Deep
- 1 level PCI-to-Memory Read Prefetch Buffer (CTPFF) with 4 QW Deep
In a CPU read DRAM cycle, a one QW read buffer (CTMRB) is used to latch the DRAM data onto the host bus.

2.8 Integrated VGA Controller

The Integrated VGA Controller is a high performance 3-in-1 PCI true-color graphics accelerator with video acceleration functions. Its video accelerator can work in four different modes: standard FC (Feature Connector) mode, SiS FC (SiS Proprietary Defined Feature Connector) mode, direct video interface mode, and PCI multimedia mode. Furthermore, the Integrated VGA Controller can work with software MPEG player programs through a DCI driver or a Direct Draw driver to provide high performance software MPEG playback.

In SiS FC mode, after receiving the video data from the SiS 6204, the Integrated VGA Controller performs scaling and stores the scaled video data to the display memory. It then performs color-space conversion, interpolation, and scaling on the stored video data before overlaying it with graphics data for final display.

In direct video mode, the Integrated VGA Controller can work with the Philips SAA7110 / SAA7111, Sony CXA1790Q, and Brooktree Bt815/817/819A (8-bit SPI mode 1, 2) to provide the PC-Video solution with the flexible overlaying ability mentioned above.
In PCI multimedia mode, the Integrated VGA Controller supports the PCI multimedia design guide Rev. 1.0 specification.

2.8.1 Attribute Controller

The Attribute Controller formats the display for the screen. Display color selection, text blinking, alternate font selection, and underlining are performed by the Attribute Controller.

2.8.2 CRT Controller

The CRT Controller generates the HSYNC and VSYNC signals required for the monitor, as well as the BLANK* signals required by the Attribute Controller.

2.8.3 CRT FIFO

The 64x32 CRT FIFO allows the Display Memory Controller to access the display memory for screen refresh at maximum memory speed rather than at the screen refresh rate. It provides 3 programmable thresholds: CRT/CPU Threshold Low, CRT/CPU Threshold High, and CRT/Engine Threshold High. With adequate programming of these three thresholds, the CPU wait time is reduced, which improves the graphics performance.

2.8.4 DDC Controller

The DDC Controller provides two channels to communicate with a monitor that supports DDC level 1 or DDC level 2B. One is the DDC CLK channel, which is bidirectional and provides the clock for DDC. The other is the DDC DATA channel, which is bidirectional and can query information from the monitor. With DDC, the VGA BIOS can determine the capability of the connected monitor and take adequate action (such as programming the parameters for a higher frame rate) to make end users more comfortable.

2.8.5 Display Memory Controller

The Display Memory Controller generates timing for display memory. This includes RAS*, CAS*, and multiplexed-address timing, as well as WE*.

2.8.6 DPMS

It provides registers to control the CRT timing so as to be compatible with the VESA DPMS specification.
2.8.7 Dual-Clock Synthesizer

The Dual-Clock Synthesizer generates MCLK and VCLK from a single external reference clock. With this feature, MCLK can be set to the maximum speed at which the display memory works normally, which takes advantage of the real peak memory bandwidth and improves the graphics performance.

2.8.8 Graphics Controller

It performs text manipulation, data rotation, color mapping, and miscellaneous operations.

2.8.9 Graphics Engine

It is an enhanced 64-bit BitBlt Graphics Engine.

For enhanced 256-color graphics mode, the engine supports the following functions:
* 256 Raster Operation Functions
* Rectangle Fill
* Color/Font Expansion
* Enhanced Color expansion
* Enhanced Font expansion
* Line Drawing
* Built-in 8x8 Pattern Registers
* Built-in 8x8 Mask Registers
* Direct Draw

For 32K or 64K high-color graphics mode, the engine supports the following functions:
* 256 Raster Operation Functions
* Rectangle Fill
* Color/Font Expansion
* Enhanced Color expansion
* Enhanced Font expansion
* Line Drawing
* Built-in 8x8 Mask Registers
* Direct Draw

For 16M-color graphics mode, due to different graphics process methods, the engine supports the following functions:
* Source/Destination BitBlt
* Pattern/Destination BitBlt
* Color/Font Expansion
* Enhanced Font expansion

Descriptions of the graphics engine functions are summarized as follows:

Bit Block Transfer (BitBlt)

BitBlt moves a block of data from one location (source) to another location (destination). It is a ternary operation. The operands can be the source data, the destination data, and the brush pattern. There are three different kinds of BitBlt: from the host memory to the display memory, from the display memory to the host memory, and from one location of the display memory to another location of the display memory.
In the first two cases, the operation simply uses the "move string instruction" (REP MOVS) to move the source data to the destination to accomplish the BitBlt operation. This is called "CPU-driven BitBlt". In the case of moving from the display memory to the display memory, the Integrated VGA Controller uses its engine design to solve the problem of memory overlapping during block transfers. The only effort required is to program the adequate parameters.

BitBlt with Mask

When the BitBlt operation deals with a hatched brush pattern, the programmer just needs to set the monochrome mask into the Mask Registers and program an adequate BG Rop and Background Color; the engine then handles the complicated process.

Color/Font Expansion

Color/font expansion expands monochrome data (one bit per pixel) into a second color format of n bits per pixel during a move operation. The foreground color and background color are addressed respectively from I/O address 8290h to 8292h and from I/O address 8294h to 8296h. The font patterns are stored either in the pattern registers (I/O address 82ACh to 82EBh) or in the off-screen memory; the latter case is called Enhanced Color/Font Expansion. The pattern registers store the monochrome bitmap. The BitBlt engine can expand 512 pixels at a time. Thus font drawing and monochrome bitmap expansion can be easily accomplished.

Enhanced Color Expansion

If the size of a monochrome bitmap is larger than 512 pixels, there is not enough space in the pattern registers to store the bitmap. In this case, the bitmap should be stored in the off-screen display memory instead of the pattern registers. The operation is called Enhanced Color Expansion or Enhanced Font Expansion depending on the data format. The format written into the off-screen memory for the Enhanced Color Expansion operation is m x n.
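The expansion step itself, turning one bit per pixel into foreground or background color values, can be sketched in software as follows. This models only the data transformation, not the register programming of the engine; the function name is hypothetical, and the bit order (most significant bit is the leftmost pixel) is an assumption that the text does not state.

```python
def expand_mono(bitmap_bytes, width, fg, bg):
    """Expand a 1-bit-per-pixel row into a list of n-bit color values.

    A set bit selects the foreground color, a clear bit the background color.
    """
    pixels = []
    for x in range(width):
        byte = bitmap_bytes[x // 8]
        bit = (byte >> (7 - x % 8)) & 1   # assumption: MSB is the leftmost pixel
        pixels.append(fg if bit else bg)
    return pixels


# One 8-pixel row of a glyph expanded to 8-bit color indices.
row = expand_mono(bytes([0b10100000]), 8, fg=0x0F, bg=0x01)
print(row)  # [15, 1, 15, 1, 1, 1, 1, 1]
```

The 64 bytes of pattern registers hold 512 such bits, which matches the 512-pixel limit quoted above for the non-enhanced mode.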
When the Command 1 Register D[5] (Enhanced Color Expansion Enable Bit, I/O address 82ABh) is set to 1, the Enhanced Color Expansion mode is enabled. The SRC Start Linear Address (I/O address 8280h to 8282h) is used to specify the starting address of the off-screen memory. The Integrated Graphics Controller stores the monochrome bitmap into the assigned off-screen memory. The BitBlt engine can therefore expand more pixels using the Enhanced Color Expansion.

Enhanced Font Expansion

The Enhanced Font Expansion is very similar to the Enhanced Color Expansion. The major difference is the format stored in the off-screen memory. The format written into the off-screen memory for the Enhanced Font Expansion operation is 8 x n. When the Command 1 Register D[4] (Enhanced Font Expansion Enable Bit, I/O address 82ABh) is set to 1, the Enhanced Font Expansion mode is enabled. The SRC Start Linear Address (I/O address 8280h to 8282h) is used to specify the start address of the off-screen memory. The Integrated Graphics Controller stores the monochrome bitmap into off-screen memory byte by byte successively. The BitBlt engine then expands these pixels using the Enhanced Font Expansion.

Line Drawing

Bresenham's Line Algorithm is a popular graphics algorithm used to draw a line. The line can be either solid or dashed. A solid line uses one solid foreground color. A dashed line uses two colors, specified by the foreground and background color registers. Several registers control the starting location, pixel count, error term, line style, etc.

Rectangle Fill

A rectangle area fill is a function to fill a specified rectangle area by using either a solid color (rectangle fill) or a pattern (pattern fill).
Rectangle Fill simply fills the destination rectangle with a solid color. The solid color is specified in the foreground color register. Pattern Fill repeats a source pattern into a destination rectangle. Therefore the pattern registers (I/O address 82ACh to 82EBh) must be specified. The pattern often consists of a background and a foreground color because color expansion is used in conjunction with the pattern fill.

Raster Operations (Raster Ops or ROPs)

Raster Ops perform logical or arithmetic operations on the graphics data. There are 256 raster ops defined by Microsoft. Each raster op code is a Boolean operation with three operands: the source, the selected pattern, and the destination.

Direct Draw

The Windows 95 Game SDK enables the creation of world class computer games. Direct Draw is a component of that SDK that allows direct manipulation of video display memory. In order to enhance the performance of games, the Integrated VGA Controller provides some Direct Draw functions.

Since the former engine functions support only part of the Direct Draw capabilities, three new functions are added to the graphics accelerator to cover the other Direct Draw functions. They are color key range comparison, alpha blending, and the Direct Draw raster operation. The register format for Direct Draw is different from that of the engine functions listed above. To enable Direct Draw, the Direct Draw enable bits must be set to "11".

Once Direct Draw is enabled, all of the engine operations are under the "Read-Modify-Write" mode. That is, the destination data have to be read from memory for processing before being written back. After receiving the destination data, the source and destination data are sent to the color key range comparators to determine whether they are between the high and low color key values.
If they are in the color key range, the Direct Draw raster operation (D_Rop) determines whether the data after alpha blending or the original destination is written back to memory. There are two control bits for alpha blending: the S_Alpha bit and the D_Alpha bit. The table below shows the relationship between these two control bits and the data after alpha blending.

S_Alpha    D_Alpha    Data after Alpha Blending
0          0          Source
0          1          Destination
1          0          Source
1          1          (Source+Destination)/2

2.8.10 RAMDAC

The RAMDAC contains the color palette and a 24-bit true color DAC. The color palette, with 256 18-bit entries, converts a color code that specifies the color of a pixel into three 6-bit values, one each for red, green, and blue. The 24-bit true color DAC is designed for direct color graphics mode. It converts each digital color value to three analog voltages for red, green, and blue.

2.8.11 Read-ahead Cache

It is a 128-bit cache. With this cache, the number of display memory read operations is reduced, which increases performance.

2.8.12 Write FIFO

The Write FIFO contains a queue of CPU write accesses to display memory that have not been executed because of memory arbitration. With this queue, the Integrated VGA Controller releases the CPU as soon as it records the address and data, and then writes into display memory when the display memory is available. Thus CPU performance is increased.

2.8.13 Bus Interface

The Integrated VGA Controller supports the 32-bit PCI Local Bus Standard Revision 2.1. Furthermore, it supports PCI burst write to take advantage of this PCI bus feature and further improve performance. PCI burst read is not supported, since it has very little impact on performance in graphics applications.
2.8.14 DRAM Support

The Integrated VGA Controller supports 0.5 MB, 1 MB, 1.5 MB, 2 MB, 2.5 MB, 3 MB, 3.5 MB, and 4 MB FP DRAM and EDO DRAM configurations.

2.8.15 Video Memory Data Bus Architecture

The Integrated VGA Controller uses the 64-bit DRAM data bus with a peak video memory bandwidth of 220 MByte/sec for FP DRAM with a 55MHz MCLK. In a 2MByte DRAM configuration, the Integrated VGA Controller can support 1024x768x32K color, 1024x768x64K color, and 800x600x16M color resolutions with no degradation in graphics performance. In a 4MByte DRAM configuration, it can support 1024x768x16M color, 1280x1024x32K color, and 1280x1024x64K color resolutions. These resolutions are not easily implemented by the regular Graphics Controller architecture.

2.8.16 Internal Dual-Clock Synthesizer

The Integrated VGA Controller has a built-in dual-clock synthesizer to generate MCLK and VCLK. This clock synthesizer can generate several variable frequencies, which provides flexibility in selecting the working frequency. The following block diagram is for the clock synthesizer.

[Block diagram: clock synthesizer. Blocks shown: DeNumerator, PD, CP, GAIN, VCO, Post Scaler, Numerator, Divider; input fr, output fd.]

where
PD is phase detection,
CP is charge pump,
VCO is voltage controlled oscillator,
fr is reference frequency, and
fd is desired frequency.

The operation of the clock synthesizer is described as follows. When the synthesizer outputs a steady frequency,

fr / DeNumerator = fd * Post Scaler / (Divider * Numerator)

i.e.

fd = fr * (Numerator / DeNumerator) * (Divider / Post Scaler)

With this formula, adequate values for Numerator, DeNumerator, Divider, and Post Scaler can be selected to obtain the desired frequency.
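The synthesizer relation above can be evaluated directly, and a small brute-force search shows how values might be chosen for a target clock. This is a sketch under stated assumptions: the function names are hypothetical, the 14.318 MHz reference is assumed (this section does not give the reference frequency), and the search ranges for the four parameters are arbitrary illustrative limits, not the real register widths.

```python
from fractions import Fraction

def synth_output_mhz(fr_mhz, numerator, denumerator, divider, post_scaler):
    """fd = fr * (Numerator / DeNumerator) * (Divider / Post Scaler)."""
    return fr_mhz * Fraction(numerator, denumerator) * Fraction(divider, post_scaler)

def closest_setting(target_mhz, fr_mhz=Fraction(14318, 1000),
                    max_numerator=128, max_denumerator=32):
    """Return (error, numerator, denumerator, divider, post_scaler).

    The parameter ranges are hypothetical; a real driver would use the
    limits of the actual clock registers.
    """
    best = None
    for num in range(1, max_numerator + 1):
        for den in range(1, max_denumerator + 1):
            for divider in (1, 2):
                for post in (1, 2, 4, 8):
                    fd = synth_output_mhz(fr_mhz, num, den, divider, post)
                    err = abs(fd - target_mhz)
                    if best is None or err < best[0]:
                        best = (err, num, den, divider, post)
    return best


err, num, den, divider, post = closest_setting(Fraction(25175, 1000))
fd = synth_output_mhz(Fraction(14318, 1000), num, den, divider, post)
print(f"N={num} D={den} divider={divider} post={post} -> {float(fd):.3f} MHz")
```

Exact rational arithmetic is used so that the comparison of candidate settings is not affected by floating-point rounding.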
The planned Video Clocks (VCLK) are as follows (units: MHz):

25.175   28.322   40.000   50.000   77.000
36.000   44.889   135.000  120.000  80.000
31.500   110.000  65.000   75.000   94.500

These frequencies are compatible with the ICS2494-275 or -280. Other video clocks will be added to the scheme after verification. The planned Memory Clocks (MCLK) range from 50 MHz to 80 MHz in 2 MHz steps. Higher memory clocks will be added after verification.

2.8.17 Power Management

To satisfy the power saving requirements of Green PC, the Integrated VGA Controller supports the DPMS (Display Power Management Signaling) control protocol proposed by the VESA Monitor Committee. This protocol can reduce the power consumption of VGA monitors.

The Integrated VGA Controller has two built-in timers for stand-by and suspend modes that can be programmed from 2 minutes to 30 minutes (in 2-minute increments) with the extended registers. It also supports forcing the video subsystem into stand-by, suspend, or off modes with the extended registers. Power saving is done by blocking the HSYNC and/or VSYNC signals to the VGA monitor. The sources of activation come from monitoring the keyboard, hardware cursor, and/or video memory read/write.
An overview of the signal blocking requirements is as follows:

POWER MANAGEMENT STATE    HORIZONTAL SYNC    VERTICAL SYNC    VIDEO DISPLAY
ON                        Pulses             Pulses           Yes
Stand-By                  No Pulses          Pulses           No
Suspend                   Pulses             No Pulses        No
OFF                       No Pulses          No Pulses        No

2.8.18 Resolutions Supported

[Table: supported resolutions versus display memory size (0.5 MB, 1 MB, 1.5 MB, 2 MB, 2.5 MB, 3 MB, 3.5 MB, 4 MB). Modes listed: 640x480x8, 640x480x16, 640x480x24, 800x600x4, 800x600x8, 800x600x16, 800x600x24, 1024x768x4, 1024x768x8, 1024x768x16, 1024x768x24, 1280x1024x4, 1280x1024x8, 1280x1024x16.]

In addition to these real resolution modes, the Integrated VGA Controller has a built-in virtual screen mode which supports up to 2048x2048 resolution.

2.8.19 Turbo Queue

In the Integrated VGA Controller, the graphics engine performs the acceleration functions via the acceleration commands stored in the command queue. The command queue is a FIFO (First In First Out) with a ring structure, i.e. if an acceleration command is filled into the last stage of the command queue, the following acceleration command is filled into the first stage of the command queue. Once this command queue is congested, the CPU's request is held pending until the command queue has free space to accept more acceleration commands. This would severely degrade the graphics system performance. Thus the length of the command queue dominates the performance of the graphics engine.

To lengthen the command queue as required, the Integrated VGA Controller provides two different kinds of command queue. The first one is built into the Integrated VGA Controller and is called the Hardware Command Queue.
The other, built in the off-screen display memory, is called the Turbo Queue. The Hardware Command Queue is a 32-doubleword queue placed in front of the graphics engine. Since the average length of an engine command is 8 doublewords, it can be regarded as a 5-stage command queue in which the first stage is in the active state and the last four are in wait states. The Turbo Queue is a special structure developed by SiS Corp. (patent pending).

The system configuration of the two command queues and the graphics engine is shown in Figure 2.7.

[Figure 2.7: PCI Bus, 32-doubleword Hardware Command Queue, MUX, Graphic Engine, Control Logic, Display Memory, Turbo Queue in off-screen display memory]

Figure 2.7 Turbo Queue Architecture

Like the Hardware Command Queue, the Turbo Queue is a FIFO with a ring structure. The size of the Turbo Queue in the Integrated VGA Controller is 32K bytes, so the number of command stages can be regarded as effectively unlimited. This removes the CPU waiting caused by the limited length of the command queue and yields higher graphics performance. Programming the extended register SR2C (Turbo Queue Base Address Register) automatically allocates the Turbo Queue in the off-screen region of the display memory. Once the commands in the Hardware Command Queue have been moved into the Turbo Queue, the freed space in the Hardware Command Queue can store the next acceleration command, and CPU waiting is avoided. If both command queues are non-empty, the graphics engine executes the commands in the Turbo Queue first, until the Turbo Queue is empty.

2.8.20 Video Accelerator

Video Password/Identification Register

Video register protection is implemented at index 80h of the CRT index register 3D4.
To disable the protection, software must first supply the matching protection key value, 86h. If the key does not match, reads and writes to all video-related registers are denied.

Video Capture and Playback

The Integrated VGA Controller video accelerator can work in four different modes: standard FC (feature connector) mode, SiS FC (SiS Proprietary Defined Feature Connector) mode, direct video mode, and PCI multimedia mode. In standard FC mode, the Integrated VGA Controller supports standard FC operation. In SiS FC mode, the Integrated VGA Controller cooperates with the SiS 6204 MPEG and/or video adapter. After receiving video data from the SiS 6204, the Integrated VGA Controller scales the data and stores it in display memory. It then performs color-space conversion, interpolation, and scaling on the stored video data before overlaying it with graphics data for final display. The SiS proprietary feature connector signals are described in the following table:

Symbol       FC Pin No.   Description
VIDEO[7:0]   1-8          Video Data. The 8-bit video data format can be RGB 555, RGB 565, YUYV 422, YVYU 422, UYVY 422, VYUY 422, or Brooktree ByteStream(TM) format.
VDDE         10           Video Data Valid. Active high. When VDDE is high, the video data is captured by the Integrated VGA Controller.
PCLK         9            Video pixel clock. The video data output is based on PCLK. The frequency should be under 30 MHz.
VDVSYNC      18           Video Data Vertical Sync Signal. This signal is active when the frame changes. The positive edge is detected.
VDFIELD      19           Video Data Field Signal. This signal indicates whether the current frame is an odd or even frame.
EVIDEO       17           Enable Video Data Input. Active low. When this pin is low and the video controller is programmed to video capture mode, the video data can be transferred from the Feature Connector or direct input using the same signal definition.
In direct video mode, the Integrated VGA Controller works with the Philips SAA7110 / SAA7111, Sony CXA1790Q, and Brooktree Bt815/817/819A (8-bit SPI mode 1, 2) to provide a PC-Video solution with the flexible overlay capability mentioned above. In PCI multimedia mode, the Integrated VGA Controller supports the PCI multimedia design guide Rev. 1.0 specification to accommodate future designs. In addition to the SiS proprietary video solution, the Integrated VGA Controller also supports the industry-standard FC specification, providing a standard video link to third-party video adapters.

Feature Connector Interface

In standard feature connector mode, the Integrated VGA Controller transfers graphics data to the connected video adapter for overlay and can accept video data from that adapter. In SiS feature connector mode, however, SiS redefined the pins of the feature connector to allow the SiS 6204 to pass video data to the Integrated VGA Controller. The passed video data format is RGB565 and the maximum data rate is 30 MByte/sec. RGB565 data is 16 bits wide, so the SiS 6204 transfers each 16-bit value in two successive byte cycles, and the Integrated VGA Controller reassembles the data into RGB565 format. The data input/output direction of the Integrated VGA Controller is controlled by the ESYNC, EVDCLK, and EVIDEO pins and is set automatically by the BIOS.

Video Capture Window

[Figure: video capture window, showing BLANK*, ESYNC, the input video, and the captured video region bounded by VDHES*, VDHEE*, VDVES*, and VDVEE*]

The Integrated VGA Controller provides video capture windowing to select the part of the input video that is captured into the video frame buffer.
This capture window is defined by four parameters: video data horizontal start (VDHES), video data horizontal end (VDHEE), video data vertical start (VDVES), and video data vertical end (VDVEE). The Integrated VGA Controller contains a video data horizontal counter and a video data vertical counter. The horizontal counter is reset at the positive edge of BLANK* and counts up on PCLK or LLC1. The vertical counter is reset at the positive edge of ESYNC and counts up on the positive edge of BLANK*. Video capture starts or continues when the horizontal counter is equal to or greater than VDHES and the vertical counter is equal to or greater than VDVES. Video capture ends once the horizontal counter is equal to or greater than VDHEE or the vertical counter is equal to or greater than VDVEE.

Video Captured Down Scaling

The Integrated VGA Controller provides independent X-Y down scaling of the captured video image in increments of 1/64. Images may be scaled down to n/64 (n = 1 ~ 64) of the original image size, to support video icons for graphical user interfaces or to reduce the memory bandwidth. The scaling factor is controlled by HDSF and VDSF, each ranging from 0 to 63; the scaling factors are (64-HDSF)/64 horizontally and (64-VDSF)/64 vertically.

Video Capture FIFO

The scaled-down video data is fed into the video capture FIFO before being stored in display memory. The 64x16 video capture FIFOs act as buffers between the video capture mechanism and the display memory, and are provided to fit the bandwidth limitation of the display memory during video image capture.
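The capture-window test and the down-scaling factors described above can be sketched in a few lines of Python. This is an illustrative model only, not the hardware implementation: the function names are hypothetical, the counters are treated as plain integers, and truncating the scaled size to whole pixels is an assumption, since the source does not state how the hardware rounds.

```python
from fractions import Fraction


def in_capture_window(h_count, v_count, vdhes, vdhee, vdves, vdvee):
    """Return True while the sample at (h_count, v_count) is captured.

    Capture starts or continues once both counters have reached their
    start values, and ends once either counter has reached its end value.
    """
    started = h_count >= vdhes and v_count >= vdves
    ended = h_count >= vdhee or v_count >= vdvee
    return started and not ended


def downscale_factor(dsf):
    """Scaling factor (64 - DSF) / 64 for HDSF or VDSF in the range 0..63."""
    if not 0 <= dsf <= 63:
        raise ValueError("HDSF/VDSF must be in the range 0..63")
    return Fraction(64 - dsf, 64)


def scaled_size(width, height, hdsf, vdsf):
    """Captured image size after independent X-Y down scaling.

    Truncation to whole pixels is an assumption of this sketch.
    """
    return (int(width * downscale_factor(hdsf)),
            int(height * downscale_factor(vdsf)))
```

For example, `scaled_size(640, 480, 32, 0)` halves the width and leaves the height unchanged, and `downscale_factor(63)` gives the smallest factor, 1/64.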
Multi-format Video Frame Buffer

The video frame buffer of the Integrated VGA Controller is shared with the graphics frame buffer and is a multi-format frame buffer. It accepts 16-bpp YUV422, RGB555, and RGB565 color formats. A decompression codec, hardware or software, can write valid decompressed video frame data into the off-screen video frame buffer through the PCI local bus. Another PCI motion video card or the CPU can also transfer video data through the PCI local bus directly into the video frame buffer. The Integrated VGA Controller can then overlay the video on the screen.

Video Playback Line Buffers

When the CRT refreshes the screen, the video data must be overlaid with graphics data. The video data is therefore first read from the off-screen video frame buffer into the video playback line buffers for further handling. The video playback line buffers act as buffers between the display memory and the playback mechanism, and are provided to fit the limitation of the display memory during video playback.

Color Space Conversion & Color Format Conversion

If the data read from the video frame buffer is in YUV422 format, the real-time YUV-to-RGB converter is turned on and the video data is converted to RGB888 format for subsequent processing. YUV422 data is converted according to the CCIR601-2 standard. If the data read from the video frame buffer is in RGB format, the YUV-to-RGB converter is bypassed. Both RGB565 and RGB555 formats are supported and are converted to RGB888 format.
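As an illustration of the conversions described above, the following Python sketch converts one YUV sample to RGB888 using the commonly cited CCIR 601 studio-range coefficients, and expands RGB565 and RGB555 pixels to RGB888 by bit replication. The source does not give the coefficients, rounding, or expansion method used by the hardware, so all three are assumptions, and the function names are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb888(y, u, v):
    """Convert one CCIR 601 studio-range YUV (Y, Cb, Cr) sample to RGB888.

    Assumes Y in 16..235 and U/V centered on 128 (an assumption of this
    sketch; the chip's exact arithmetic is not documented in the source).
    """
    c = 1.164 * (y - 16)
    d = u - 128
    e = v - 128
    r = clamp8(c + 1.596 * e)
    g = clamp8(c - 0.392 * d - 0.813 * e)
    b = clamp8(c + 2.017 * d)
    return r, g, b


def rgb565_to_rgb888(pixel):
    """Expand a 16-bit RGB565 pixel to RGB888 by bit replication (assumed)."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def rgb555_to_rgb888(pixel):
    """Expand a 15-bit RGB555 pixel to RGB888 by bit replication (assumed)."""
    r5 = (pixel >> 10) & 0x1F
    g5 = (pixel >> 5) & 0x1F
    b5 = pixel & 0x1F
    return ((r5 << 3) | (r5 >> 2), (g5 << 3) | (g5 >> 2), (b5 << 3) | (b5 >> 2))
```

Bit replication is chosen here so that full-scale 5-bit and 6-bit components map to 255 rather than 248 or 252; a simple left shift would be an equally plausible hardware choice.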
Horizontal Interpolation DDA

The DDA (Digital Differential Accumulator) uses the following calculation, with 2 taps, N phases, and an up-scaling factor UFACT (scaling J points up to J * UFACT points):

Destination[i] = (1 - Weight) * Source[j] + Weight * Source[j+1]
j = TRUNC(i / UFACT)
Weight" = (i / UFACT) - j

Since Weight" is an arbitrary fraction rather than an integer, the multiplication is hard to implement, so the following quantized Weight is used in the calculation instead:

Weight = TRUNC(Weight" * N) / N

The Integrated VGA Controller has a built-in X-interpolation DDA mechanism for better video stretching quality. The interpolation accuracy of the DDA mechanism is 2-tap, 8-phase.

Vertical Interpolation DDA

The Integrated VGA Controller has a built-in Y-interpolation DDA mechanism and a two-line-buffer mechanism for better video stretching quality. The interpolation accuracy of the DDA mechanism is 2-tap, 8-phase.

Video Playback Horizontal Zooming

The playback video data can be zoomed in horizontally by a factor of 64/n (n = 1 ~ 64) and zoomed out by a factor of about m/16 (m = 1 ~ 16). The zooming factor (HPFACT) has a 4-bit integer part and a 6-bit fraction part. The horizontal video size is zoomed to 1/HPFACT. If HPFACT < 1, the video is scaled up horizontally; if HPFACT > 1, it is scaled down horizontally.

Video Playback Vertical Zooming

The playback video data can be zoomed in vertically by a factor of 64/n (n = 1 ~ 64) and zoomed out by an arbitrary factor. The zooming factor (VPFACT) has a 6-bit fraction part. The video size is zoomed to 1/VPFACT. Since VPFACT is always less than 1, this factor can only perform vertical up scaling. Vertical down scaling is done by multiplying the Video Frame Buffer Offset by an integer I, after which the vertical video size is zoomed to 1/(I*VPFACT).
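The 2-tap, N-phase interpolation described above can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the hardware design: it takes the fractional position as (i / UFACT) - j, quantizes it to N phases, and repeats the last source sample at the right edge (the edge handling is not specified in the source). The function name is hypothetical.

```python
import math


def interpolate_row(source, ufact, phases=8):
    """Scale a row of samples up by `ufact` with 2-tap, N-phase weights.

    source: list of numeric samples (J points)
    ufact:  up-scaling factor; the result has int(J * ufact) points
    phases: number of weight phases N (8 for the 2-tap, 8-phase DDA)
    """
    out = []
    for i in range(int(len(source) * ufact)):
        position = i / ufact
        j = math.floor(position)                       # j = TRUNC(i / UFACT)
        fraction = position - j                        # unquantized weight
        weight = math.floor(fraction * phases) / phases  # TRUNC(w * N) / N
        # Assumed edge handling: reuse the last sample when j+1 is out of range.
        right = source[min(j + 1, len(source) - 1)]
        out.append((1 - weight) * source[j] + weight * right)
    return out
```

With `ufact=3` and 8 phases, the ideal weights 1/3 and 2/3 are quantized down to 2/8 and 5/8, which shows the precision lost by restricting the weight to multiples of 1/N.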
Video Data Blending

Graphics pixels are weighted by the graphics data alpha value and video pixels are weighted by the video data alpha value; the two weighted results are added to generate the blended data. Each alpha value has an accuracy of 4 bits, taken from the 4 MSBs of the Graphic data alpha value register and the Video data alpha value register respectively.

Color Keying

A control signal is generated by comparing the 24-bit graphics data with the 24-bit color key low value and the 24-bit color key high value. The number of bits compared depends on the color depth in use. If the graphics data value lies between the two color key values (in all three RGB components), the color key is detected. This comparison can be disabled by setting the video window size to zero, i.e. X-start=0, X-end=0, Y-start=0, and Y-end=0.

Chroma Keying

A control signal is generated by comparing the 24-bit video data with the 24-bit chroma key low value and the 24-bit chroma key high value. The chroma key can be in YUV or RGB format. If the video data value lies between the two chroma key values (in all three RGB or YUV components), the chroma key is detected.

Graphics & Video Overlay

The overlay of graphics data and video data is performed by color keying and chroma keying. The overlay operation is set by the Key Overlay Operation Mode Register.
The operation is defined below:

Operation Mode   Operation
0000             always select graphics data
0001             select blended data when color key and chroma key, otherwise select graphics data
0010             select blended data when color key and not chroma key, otherwise select graphics data
0011             select blended data when color key, otherwise select graphics data
0100             select blended data when not color key and chroma key, otherwise select graphics data
0101             select blended data when chroma key, otherwise select graphics data
0110             select blended data when color key xor chroma key, otherwise select graphics data
0111             select blended data when color key or chroma key, otherwise select graphics data
1000             select blended data when not color key and not chroma key, otherwise select graphics data
1001             select blended data when color key xnor chroma key, otherwise select graphics data
1010             select blended data when not chroma key, otherwise select graphics data
1011             select blended data when color key or not chroma key, otherwise select graphics data
1100             select blended data when not color key, otherwise select graphics data
1101             select blended data when not color key or chroma key, otherwise select graphics data
1110             select blended data when not color key or not chroma key, otherwise select graphics data
1111             always select blended data

Video Window Control Registers

The video window area is defined by six registers that specify a rectangular region by X-start, X-end, Y-start, and Y-end (X: horizontal, Y: vertical). The location of the video window is referenced to the VGA sync signals. The size of the video window is defined in VGA pixels and lines.

Video Panning

The displayed video image can be panned around the captured video image by setting the video display starting address, so that any part of the captured video image can be selectively displayed.
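The Key Overlay Operation Mode table above can be read as a 4-bit truth table over the two key-detect signals: each bit of the mode says whether blended data is selected for one combination of (color key, chroma key). The Python sketch below models that reading together with the range comparison used for key detection. It is an interpretation that reproduces rows such as 0011, 0110, and 1001, not a documented hardware description; the function names, the bit ordering, and the inclusive range comparison are assumptions.

```python
def key_detected(pixel, key_low, key_high):
    """True if every component of `pixel` lies within [key_low, key_high].

    pixel, key_low, key_high: 3-tuples of components (RGB or YUV).
    Treating the bounds as inclusive is an assumption of this sketch.
    """
    return all(low <= value <= high
               for value, low, high in zip(pixel, key_low, key_high))


def select_blended(mode, color_key, chroma_key):
    """Return True to select blended data, False to select graphics data.

    Assumed bit layout of the 4-bit mode:
      bit 0: color key and chroma key
      bit 1: color key and not chroma key
      bit 2: not color key and chroma key
      bit 3: not color key and not chroma key
    """
    index = (0 if color_key else 2) + (0 if chroma_key else 1)
    return bool((mode >> index) & 1)
```

For instance, mode 0011 sets the two bits in which the color key is detected, so `select_blended(0b0011, True, False)` selects blended data regardless of the chroma key.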
The video display starting address equals the video frame buffer starting address plus the panning offset.

Overlay Memory Data

The display memory is divided into two areas: the graphics area (the actual screen display area), which stores graphics pixel data, and the video area (also called the off-screen area), which stores video pixel data. In the graphics area, the region corresponding to the video window is filled with the color key value. During the CRT scan period, the graphics data is compared with the color key data. When a match occurs, the CRT output path is switched from the graphics path to the video path to display the video data. When the shared-memory architecture is used, the video frame buffer can be located anywhere in system memory, independent of the location of the graphics frame buffer. This gives video control application programs more flexibility. The video frame buffer should be set to non-cacheable and non-swappable.

2.8.21 Video Playback Contrast Enhancement and Brightness Control

To achieve higher video quality, the SiS 6205 has built-in contrast enhancement and brightness control mechanisms. For contrast enhancement, the mean brightness value is first calculated over a number of pixels and frames; the number of sampled pixels and frames is programmable through registers. The contrast enhancement mechanism then increases the difference between the video data and the mean value. The rate of increase is set by a programmable gain, whose value ranges from 1.0 to 1.4375. The brightness of the video data can also be controlled. The brightness is a 2's complement value from -128 to +127, which is added to the video data to increase or decrease the brightness of the video.

2.8.22 Signature Analysis

Signature analysis is provided to automatically test the graphics data at the input of the DAC.
This technique is based on the concept of cyclic redundancy checking (CRC) and is realized in hardware using linear feedback shift registers (LFSRs). It consists of a 16-bit signature generator register, called a multiple-input signature register (MISR, shown in Figure 2.8), which is used to ensure a unique signature for different patterns. For a given test image, signature analysis yields a unique, correct signature. If an error occurs in the controller or in the data manipulation, the resulting signature differs from the pre-calculated signature value. A test technician can therefore sort good and bad chips more quickly and accurately, without visually inspecting the screen for errors in a mass-production environment. This saves significant testing time. If the display screen includes blinking attributes or a blinking cursor, the signature differs between blink-off and blink-on frames. Assuming all error patterns are equally likely, the probability that the MISR fails to detect an error is approximately 0.000015 (about 2^-16 for a 16-bit register). To match the inputs of the MISR, the 24-bit graphics data (i.e. the input of the DAC of the RAMDAC) is first converted into 16-bit data. The transfer function of the MISR in Figure 2.8 is

$p(x) = 1 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots + c_{16} x^{16}$

where each coefficient $c_i$ can be either 0 or 1. The Integrated VGA Controller sets the parameters of the signature register as

$p(x) = 1 + x + x^7 + x^{10} + x^{16}$

Once software enables the signature analysis function, the Integrated VGA Controller can test itself automatically. The function can also be disabled through the extended control register to save power.

[Figure 2.8: data inputs D1 through D16, D flip-flops (D.F.F.), and feedback coefficients C1 through C16]

Figure 2.8 Multi-Input Signature Register (MISR)

2.8.23 Compatibility

The Integrated VGA Controller is fully compatible with all standard IBM VGA modes and with EGA, CGA, MDA, and Hercules modes.

2.8.24 Software Support

To fully utilize and support the Integrated VGA Controller hardware features, SiS has developed a high-performance VESA extension compliant BIOS. Extended graphics and text modes are supported by software application drivers developed by SiS. The following applications are currently supported:

* 3D Studio Ver. 3.0 & 4.0
* AutoCAD/386 Release 11, 12, 13
* Auto Shade/386 Ver. 2.0
* GEM 3.0/Ventura 2.0
* Lotus 1-2-3/Symphony Ver. 3.x
* Microsoft Windows 3.1
* Microsoft Windows 95
* Microsoft Windows NT Ver. 3.1 & 3.5
* OrCad (SDT/VST/PCB) Rel 4
* OS/2 Presentation Manager 2.1 & 3.0
* P-CAD Ver. 6.06
* VersaCAD/386 Ver. 2.1
* Word Perfect 5.x & 6.0

Video operations are supported by software application drivers developed by SiS. The following applications are currently supported:

* Microsoft Video For Windows
* DCI driver
* Direct Draw driver

2.9 Mode Table

2.9.1 Standard VGA Modes

MODE  TYPE  DISPLAY SIZE  COLORS/SHADES  ALPHA FORMAT  BUFFER START  BOX SIZE  MAX. PAGES
0     A/N   320x200       16             40x25         B800          8x8       8
0*    A/N   320x350       16             40x25         B800          8x14      8
0+    A/N   360x400       16             40x25         B800          9x16      8
1     A/N   320x200       16             40x25         B800          8x8       8
1*    A/N   320x350       16             40x25         B800          8x14      8
1+    A/N   360x400       16             40x25         B800          9x16      8
2     A/N   640x200       16             80x25         B800          8x8       8
2*    A/N   640x350       16             80x25         B800          8x14      8
2+    A/N   720x400       16             80x25         B800          9x16      8
3     A/N   640x200       16             80x25         B800          8x8       8
3*    A/N   640x350       16             80x25         B800          8x14      8
3+    A/N   720x400       16             80x25         B800          9x16      8
4     APA   320x200       4              40x25         B800          8x8       1
5     APA   320x200       4              40x25         B800          8x8       1
6     APA   640x200       2              80x25         B800          8x8       1
7     A/N   720x350       4              80x25         B000          9x14      8
7+    A/N   720x400       4              80x25         B000          9x16      8
0D    APA   320x200       16             40x25         A000          8x8       8
0E    APA   640x200       16             80x25         A000          8x8       4
0F    APA   640x350       2              80x25         B000          8x14      2
10    APA   640x350       16             80x25         A000          8x14      2
11    APA   640x480       2              80x30         A000          8x16      1
12    APA   640x480       16             80x30         A000          8x16      1
13    APA   320x200       256            40x25         A000          8x8       1

NOTE:
1. A/N: Alpha/Numeric
2. APA: All Point Addressable (Graphics)

MODE  DISPLAY SIZE  COLORS/SHADES  FRAME RATE  H-SYNC  VIDEO FREQ.
0     320x200       16             70          31.5 K  25.1 M
0*    320x350       16             70          31.5 K  25.1 M
0+    360x400       16             70          31.5 K  28.3 M
1     320x200       16             70          31.5 K  25.1 M
1*    320x350       16             70          31.5 K  25.1 M
1+    360x400       16             70          31.5 K  28.3 M
2     640x200       16             70          31.5 K  25.1 M
2*    640x350       16             70          31.5 K  25.1 M
2+    720x400       16             70          31.5 K  28.3 M
3     640x200       16             70          31.5 K  25.1 M
3*    640x350       16             70          31.5 K  25.1 M
3+    720x400       16             70          31.5 K  28.3 M
4     320x200       4              70          31.5 K  25.1 M
5     320x200       4              70          31.5 K  25.1 M
6     640x200       2              70          31.5 K  25.1 M
7*    720x350       4              70          31.5 K  28.3 M
7+    720x400       4              70          31.5 K  28.3 M
0D    320x200       16             70          31.5 K  25.1 M
0E    640x200       16             70          31.5 K  25.1 M
0F    640x350       2              70          31.5 K  25.1 M
10    640x350       16             70          31.5 K  25.1 M
11    640x480       2              60          31.5 K  25.1 M
12    640x480       16             60          31.5 K  25.1 M
13    320x200       256            70          31.5 K  25.1 M

NOTE:
i - interlaced mode
n - noninterlaced mode

2.9.2 Enhanced Video Modes

MODE  TYPE  DISPLAY SIZE  COLORS/SHADES  ALPHA FORMAT  BUFFER START  BOX SIZE  MAX. PAGES
22    A/N   1056x352      16             132x44        B800          8x8       2
23    A/N   1056x350      16             132x25        B800          8x14      4
24    A/N   1056x364      16             132x28        B800          8x13      4
25    APA   640x480       16             80x60         A000          8x8       1
26    A/N   720x480       16             80x60         B800          9x8       3
29    APA   800x600       16             100x37        A000          8x16      1
2A    A/N   800x600       16             100x40        B800          8x15      4
2D    APA   640x350       256            80x25         A000          8x14      1
2E    APA   640x480       256            80x30         A000          8x16      1
2F    APA   640x400       256            80x25         A000          8x16      1
30    APA   800x600       256            100x37        A000          8x16      1
37    APA   1024x768      16             128x48        A000          8x16      1
38    APA   1024x768      256            128x48        A000          8x16      1
39    APA   1280x1024     16             160x64        A000          8x16      1
3A    APA   1280x1024     256            160x64        A000          8x16      1
40    APA   320x200       32K            40x25         A000          8x8       1
41    APA   320x200       64K            40x25         A000          8x8       1
42    APA   320x200       16.8M          40x25         A000          8x8       1
43    APA   640x480       32K            80x30         A000          8x16      1
44    APA   640x480       64K            80x30         A000          8x16      1
45    APA   640x480       16.8M          80x30         A000          8x16      1
46    APA   800x600       32K            100x37        A000          8x16      1
47    APA   800x600       64K            100x37        A000          8x16      1
48    APA   800x600       16.8M          100x37        A000          8x16      1
49    APA   1024x768      32K            128x48        A000          8x16      1
4A    APA   1024x768      64K            128x48        A000          8x16      1
4B    APA   1024x768      16.8M          128x48        A000          8x16      1
4C    APA   1280x1024     32K            160x64        A000          8x16      1
4D    APA   1280x1024     64K            160x64        A000          8x16      1

NOTE:
1. A/N: Alpha/Numeric
2. APA: All Point Addressable (Graphics)

MODE   DISPLAY SIZE  COLORS/SHADES  FRAME RATE  H-SYNC  VIDEO FREQ.
22     1056x352      16             70          30.5 K  40.0 M
23     1056x350      16             70          30.5 K  40.0 M
24     1056x364      16             70          30.5 K  40.0 M
25     640x480       16             60          31.5 K  25.1 M
26     720x480       16             60          31.5 K  25.1 M
29     800x600       16             56          35.1 K  30.0 M
29*    800x600       16             60          37.9 K  40.0 M
29+    800x600       16             72          48.0 K  50.0 M
29#    800x600       16             75          46.8 K  50.0 M
29##   800x600       16             85          53.7 K  56.3 M
2A     800x600       16             56          35.1 K  36.0 M
2D     640x350       256            70          31.5 K  25.1 M
2E     640x480       256            60          31.5 K  25.1 M
2E*    640x480       256            72          37.9 K  31.5 M
2E+    640x480       256            75          37.5 K  31.5 M
2E+    640x480       256            85          43.4 K  36.0 M
2F     640x400       256            70          31.5 K  25.1 M
30     800x600       256            56          35.1 K  36.0 M
30*    800x600       256            60          37.9 K  40.0 M
30+    800x600       256            72          48.0 K  50.0 M
30#    800x600       256            75          46.8 K  50.0 M
30##   800x600       256            85          53.7 K  56.3 M
37i    1024x768      16             87          35.5 K  44.9 M
37n    1024x768      16             60          48.4 K  65.0 M
37n+   1024x768      16             70          56.5 K  75.0 M
37n#   1024x768      16             75          60.2 K  80.0 M
37n##  1024x768      16             85          68.7 K  94.5 M
38i    1024x768      256            87          35.5 K  44.9 M
38n    1024x768      256            60          48.4 K  65.0 M
38n+   1024x768      256            70          56.5 K  75.0 M
38n#   1024x768      256            75          60.2 K  80.0 M
38n##  1024x768      256            85          68.7 K  94.5 M
39i    1280x1024     16             87          48.8 K  80.0 M
39n    1280x1024     16             60          65.0 K  110.0 M
39n+   1280x1024     16             75          80.0 K  135.0 M
3Ai    1280x1024     256            87          48.8 K  80.0 M
3An    1280x1024     256            60          65.0 K  110.0 M
3An+   1280x1024     256            75          80.0 K  135.0 M
40     320x200       32K            70          31.5 K  25.1 M
41     320x200       64K            70          31.5 K  25.1 M
42     320x200       16.8M          70          31.5 K  25.1 M
43     640x480       32K            60          31.5 K  25.1 M
43*    640x480       32K            72          37.9 K  31.5 M
43+    640x480       32K            75          37.5 K  31.5 M
43+    640x480       32K            85          43.4 K  36.0 M
44     640x480       64K            60          31.5 K  25.1 M
44*    640x480       64K            72          37.9 K  31.5 M
44+    640x480       64K            75          37.5 K  31.5 M
44+    640x480       64K            85          43.4 K  36.0 M
45     640x480       16.8M          60          31.5 K  25.1 M
45*    640x480       16.8M          72          37.9 K  31.5 M
45+    640x480       16.8M          75          37.5 K  31.5 M
45+    640x480       16.8M          85          43.4 K  36.0 M
46     800x600       32K            56          35.1 K  36.0 M
46*    800x600       32K            60          37.9 K  40.0 M
46+    800x600       32K            72          48.0 K  50.0 M
46#    800x600       32K            75          46.8 K  50.0 M
46##   800x600       32K            85          53.7 K  56.3 M
47     800x600       64K            56          35.1 K  36.0 M
47*    800x600       64K            60          37.9 K  40.0 M
47+    800x600       64K            72          48.0 K  50.0 M
47#    800x600       64K            75          46.8 K  50.0 M
47##   800x600       64K            85          53.7 K  56.3 M
48     800x600       16.8M          56          35.1 K  36.0 M