On-chip SerDes clock distribution implementation
(01 Jul 2008)
by Dr. Satya Gupta, VP Engineering, Open-Silicon Research Pvt Ltd
Intended for the gaming and computer industry, Open-Silicon's scalable graphics switch chip acts as a hub connecting the main processor to multiple graphics processors, helping divide graphics processing work among several graphics processing units (GPUs). A prominent feature of the chip is the clocking structure for its 12-quad, 48-channel, 2.5Gbps PCIe SerDes complex. This structure reduces system complexity by distributing high-quality, synchronized reference clocks on-die to all 12 quads, so that from a board and package perspective only one reference clock input is needed for all 12 quads.
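The channel count and aggregate bandwidth of the SerDes complex follow from simple arithmetic. A minimal Python sketch, assuming four lanes per quad (48 channels / 12 quads) and the 8b/10b line coding used by 2.5Gbps PCIe; the function names are illustrative only:

```python
QUADS = 12
LANES_PER_QUAD = 4          # 48 channels / 12 quads
LINE_RATE_GBPS = 2.5        # per lane, per direction
CODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 payload bits carried in every 10 line bits


def lane_count(quads: int = QUADS, lanes_per_quad: int = LANES_PER_QUAD) -> int:
    """Total number of SerDes lanes (channels) in the complex."""
    return quads * lanes_per_quad


def raw_bandwidth_gbps(lanes: int, line_rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Aggregate line rate in one direction, in Gbps."""
    return lanes * line_rate_gbps


def payload_bandwidth_gbps(lanes: int) -> float:
    """Aggregate data rate in one direction after 8b/10b overhead, in Gbps."""
    return raw_bandwidth_gbps(lanes) * CODING_EFFICIENCY


if __name__ == "__main__":
    n = lane_count()
    print(f"{n} lanes, {raw_bandwidth_gbps(n):.1f} Gbps raw, "
          f"{payload_bandwidth_gbps(n):.1f} Gbps payload per direction")
```

With the defaults this gives 48 lanes, 120 Gbps of raw line rate and 96 Gbps of payload per direction, all of it timed from one shared reference clock.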
The chip also met its other design goals, including low package cost, adequate power distribution, and sound ESD and DFT design. Meeting these goals required repeatedly resolving constraints that arose while integrating the 12 quads.
DIE FLOOR PLAN
The size of the pad-limited die is determined by the size and placement of the 12 PCIe SerDes PHYs. Placing three PHYs on each side yields the optimal die (Figure 1); this placement is dictated by package routing constraints. A wirebond package is preferred to keep package cost low, so to minimize inductance and match delays for the differential signals from the 12 PHYs, the placement must allow straight bond-wire routing from the pads to the package balls.
The clock distribution for the PHYs is implemented through a buffer ring structure near the periphery of the chip, outside the pad region (Figure 1). This location isolates the clock from crosstalk by reducing its capacitive coupling to on-die signals.
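Why straight, equal-length bond wires matter can be seen with a rough inductance screen. This sketch assumes the common rule of thumb of about 1 nH per mm of bond wire (real values depend on wire diameter, loop height and return path); the pair names and lengths are hypothetical:

```python
from __future__ import annotations

NH_PER_MM = 1.0  # ASSUMED rule-of-thumb bond-wire self-inductance, nH per mm


def wire_inductance_nh(length_mm: float, nh_per_mm: float = NH_PER_MM) -> float:
    """Approximate self-inductance of a single bond wire, in nH."""
    return length_mm * nh_per_mm


def pair_mismatch_nh(p_len_mm: float, n_len_mm: float) -> float:
    """Inductance mismatch between the P and N legs of a differential pair."""
    return abs(wire_inductance_nh(p_len_mm) - wire_inductance_nh(n_len_mm))


def flag_pairs(pairs: dict[str, tuple[float, float]], max_mismatch_nh: float) -> list[str]:
    """Names of pairs whose P/N inductance mismatch exceeds the budget."""
    return sorted(name for name, (p, n) in pairs.items()
                  if pair_mismatch_nh(p, n) > max_mismatch_nh)


if __name__ == "__main__":
    # Hypothetical lengths (mm): a straight run versus a fanned-out, angled run.
    pairs = {"phy0_tx0": (2.0, 2.05), "phy5_rx1": (2.4, 2.9)}
    print(flag_pairs(pairs, max_mismatch_nh=0.2))
```

Straight routing keeps both legs short and nearly equal, which keeps both the absolute inductance and the P/N mismatch small.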
CLOCK RING DESIGN
The 12 PHYs are divided into three groups. Each group is an independent clock domain that requires minimal clock skew within the domain.
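A skew check scoped to clock domains might look like the sketch below. It assumes three equal groups of four consecutive PHYs (the article does not give the split) and uses hypothetical arrival times such as would come from extracted-netlist simulation:

```python
from __future__ import annotations


def split_into_domains(num_phys: int = 12, num_domains: int = 3) -> list[list[int]]:
    """Assign consecutive PHY indices to equally sized clock domains (assumed split)."""
    size = num_phys // num_domains
    return [list(range(d * size, (d + 1) * size)) for d in range(num_domains)]


def domain_skew_ps(arrivals_ps: dict[int, float], domains: list[list[int]]) -> list[float]:
    """Skew of each domain: latest minus earliest clock arrival at its PHYs."""
    return [max(arrivals_ps[p] for p in d) - min(arrivals_ps[p] for p in d)
            for d in domains]


def domains_within_budget(arrivals_ps: dict[int, float], domains: list[list[int]],
                          max_skew_ps: float) -> bool:
    """Skew is checked only inside each domain; offsets between domains are ignored."""
    return all(s <= max_skew_ps for s in domain_skew_ps(arrivals_ps, domains))


if __name__ == "__main__":
    # Hypothetical arrivals: delay grows by 5 ps per PHY along the ring.
    arrivals = {phy: 100.0 + 5.0 * phy for phy in range(12)}
    doms = split_into_domains()
    print(domain_skew_ps(arrivals, doms))                    # per-domain skew
    print(max(arrivals.values()) - min(arrivals.values()))   # chip-wide spread
```

In this toy example each domain sees 15 ps of skew while the chip-wide spread is 55 ps, which shows how partitioning into independent domains relaxes the requirement.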
At chip level, the input to this “clock ring” is a 100MHz reference clock, fed through an LVDS buffer to all the PHYs and to the internal PLL (Figure 2). This buffer has a dedicated, isolated on-die power supply with enough decap cells to suppress power-supply noise on both the core and IO rails. From the LVDS buffer, the differential clock passes through two types of differential buffer to reach the PHY clock pins: repeaters, and fast-edge-rate buffers that drive the PHY clock pins directly. The differential buffers on all four sides are powered from dedicated supplies after on-die RC filtering. All components in the ring are designed with the same cell height, and the continuity of both the clock ring and the power ring is maintained with specially designed decap and filler cells of that same height. These decap and filler cells fill the space between the repeaters, the buffers driving the PHYs, and the power supplies. All clock nets are routed in the topmost metal layer to minimize delay.
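The on-die RC filtering of the buffer supplies behaves, to first order, like a single-pole low-pass filter with corner frequency 1/(2πRC). The sketch below evaluates that corner, the noise attenuation at a given frequency, and the static drop the series resistor costs; the 10 Ω, 1 nF and 5 mA values are hypothetical, not the chip's:

```python
import math


def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB corner frequency of a first-order RC low-pass filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def rc_attenuation_db(freq_hz: float, r_ohm: float, c_farad: float) -> float:
    """Attenuation in dB (positive number) of supply noise at freq_hz.

    |H(f)| = 1 / sqrt(1 + (f/fc)^2), so attenuation = 10*log10(1 + (f/fc)^2).
    """
    ratio = freq_hz / rc_cutoff_hz(r_ohm, c_farad)
    return 10.0 * math.log10(1.0 + ratio * ratio)


def dc_drop_v(load_current_a: float, r_ohm: float) -> float:
    """Static voltage lost across the series filter resistor."""
    return load_current_a * r_ohm


if __name__ == "__main__":
    r, c = 10.0, 1e-9  # hypothetical: 10 ohm series resistor, 1 nF of decap
    print(f"corner {rc_cutoff_hz(r, c) / 1e6:.1f} MHz")
    print(f"attenuation at 100 MHz {rc_attenuation_db(100e6, r, c):.1f} dB")
    print(f"DC drop at 5 mA {dc_drop_v(5e-3, r) * 1e3:.0f} mV")
```

A larger R lowers the corner and filters more noise, but increases the DC drop, so in practice decap (C) is usually the cheaper knob, which is consistent with the decap-heavy ring described above.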
The repeaters have differential inputs and outputs (two input pins and two output pins) and include duty-cycle correction. The high-edge-rate buffers are placed close to the reference clock inputs of the PHYs. Each PHY has one buffer driving its clock pins, and each repeater drives two buffers (Figure 2).
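The buffer counts implied by this topology can be checked by modeling the fan-out as a graph. This sketch assumes the repeaters are chained one after another around the ring, each also driving two edge-rate buffers; the article does not confirm that chain arrangement, and the node names are hypothetical:

```python
from __future__ import annotations

import math


def build_fanout(num_phys: int = 12, buffers_per_repeater: int = 2) -> list[tuple[str, str]]:
    """Return (driver, load) edges: LVDS -> chained repeaters -> edge-rate buffers -> PHYs.

    ASSUMPTION: each repeater drives the next repeater plus two edge-rate buffers;
    each edge-rate buffer drives exactly one PHY clock input.
    """
    edges: list[tuple[str, str]] = []
    prev = "lvds"
    for r in range(math.ceil(num_phys / buffers_per_repeater)):
        rep = f"rep{r}"
        edges.append((prev, rep))
        first = r * buffers_per_repeater
        for phy in range(first, min(first + buffers_per_repeater, num_phys)):
            edges.append((rep, f"buf{phy}"))
            edges.append((f"buf{phy}", f"phy{phy}"))
        prev = rep
    return edges


def stages_to_phys(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Count the drivers between the LVDS buffer (inclusive) and each PHY clock input."""
    parent = {load: driver for driver, load in edges}
    stages = {}
    for node in parent:
        if node.startswith("phy"):
            count, cur = 0, node
            while cur in parent:
                cur = parent[cur]
                count += 1
            stages[node] = count
    return stages


if __name__ == "__main__":
    edges = build_fanout()
    loads = {load for _, load in edges}
    print(sum(n.startswith("rep") for n in loads), "repeaters,",
          sum(n.startswith("buf") for n in loads), "edge-rate buffers")
    print(stages_to_phys(edges))
```

Under these assumptions the 12 PHYs need 6 repeaters and 12 edge-rate buffers, and the number of stages grows along the chain, one reason a design might constrain skew only within a clock domain.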
This implementation is verified by SPICE simulation of the extracted netlist. The simulations confirm that, over the specified frequency range, the clock transition time, SerDes jitter and duty cycle at the clock input pins of all the PHYs meet their specifications (Table 1 and Figure 3).
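Duty cycle and period jitter are typically extracted from the threshold-crossing times of the simulated waveform at each PHY clock pin. The sketch below post-processes such crossing times; the spec limits from Table 1 are not reproduced here, and the helper names are hypothetical:

```python
from __future__ import annotations

import statistics


def periods_ns(rising_ns: list[float]) -> list[float]:
    """Cycle periods from consecutive rising-edge crossing times."""
    return [b - a for a, b in zip(rising_ns, rising_ns[1:])]


def duty_cycles(rising_ns: list[float], falling_ns: list[float]) -> list[float]:
    """High time / period for each complete cycle.

    Assumes falling_ns[i] lies between rising_ns[i] and rising_ns[i + 1].
    """
    return [(f - r) / (r_next - r)
            for r, f, r_next in zip(rising_ns, falling_ns, rising_ns[1:])]


def period_jitter_ps(rising_ns: list[float]) -> tuple[float, float]:
    """(peak-to-peak, RMS) period jitter in ps from rising-edge times in ns."""
    p = periods_ns(rising_ns)
    return (max(p) - min(p)) * 1e3, statistics.pstdev(p) * 1e3


if __name__ == "__main__":
    # Ideal 100MHz clock: 10 ns period, 50% duty cycle.
    rising = [0.0, 10.0, 20.0, 30.0]
    falling = [5.0, 15.0, 25.0]
    print(duty_cycles(rising, falling), period_jitter_ps(rising))
```

The resulting per-pin numbers can then be compared against whatever duty-cycle and jitter limits the PHY specifies.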
A reference clock distribution with very low jitter is thus achieved by combining duty-cycle-correcting buffers and fast-edge-rate buffers powered from carefully filtered supplies, and by routing the clock in a shielded channel along the edge of the die.
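The benefit of fast edges follows from the first-order relation between voltage noise and timing jitter at a threshold crossing, σ_t ≈ σ_v / (dV/dt): the steeper the edge, the less a given amount of supply or coupled noise shifts the crossing time. A sketch with hypothetical swing, rise-time and noise values, assuming a 20%-80% rise-time definition:

```python
def slew_v_per_ns(swing_v: float, rise_time_ns: float, fraction: float = 0.6) -> float:
    """Average slew rate through the 20%-80% region (60% of the swing in the rise time)."""
    return fraction * swing_v / rise_time_ns


def noise_to_jitter_ps(noise_rms_mv: float, slew: float) -> float:
    """First-order jitter at a threshold crossing: sigma_t = sigma_v / slew.

    Units: mV divided by V/ns gives 1e-3 ns, i.e. picoseconds.
    """
    return noise_rms_mv / slew


if __name__ == "__main__":
    noise_mv = 6.0  # hypothetical RMS noise at the buffer input
    for rise_ns in (0.4, 0.1):  # hypothetical slow and fast edges with 0.8 V swing
        s = slew_v_per_ns(0.8, rise_ns)
        print(f"rise {rise_ns} ns -> {s:.1f} V/ns -> {noise_to_jitter_ps(noise_mv, s):.2f} ps")
```

Cutting the rise time by 4x cuts the noise-induced jitter by the same factor, which is why the buffers at the PHY inputs are the fast-edge-rate type.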
The placement of the PHYs and the clock ring made it difficult to distribute the power pads properly and to achieve a low static IR drop. The current-carrying capability of the power pads is just sufficient to meet the electromigration requirements, but not the voltage-drop requirement. In addition, only two metal layers could be used for routing power over the PHYs.
To increase the current supplied to the core, power pads with more metal layers are used, and jumpers are created in the top metal layer from the bond pads to the core ring. In addition, power rings are routed over the PHYs in the upper two metal layers to sustain a higher current density. The strengthened ring around the core, together with dedicating the top metal layer to the power supply alone, delivers enough current to the core and meets the overall voltage-drop requirement for the chip (Figure 4).
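The gap between "enough pads for electromigration" and "enough pads for IR drop" can be illustrated with a first-order model that lumps each pad, bond wire and on-die path to the core ring into one resistance and assumes current splits equally between pads. All numbers below are hypothetical:

```python
import math


def pads_for_em(total_current_a: float, em_limit_per_pad_a: float) -> int:
    """Minimum pads so that no pad exceeds its electromigration current limit."""
    return math.ceil(total_current_a / em_limit_per_pad_a)


def pads_for_ir(total_current_a: float, path_resistance_ohm: float, max_drop_v: float) -> int:
    """Minimum pads so that the static drop I * R / N stays within max_drop_v."""
    return math.ceil(total_current_a * path_resistance_ohm / max_drop_v)


def pads_needed(total_current_a: float, em_limit_per_pad_a: float,
                path_resistance_ohm: float, max_drop_v: float) -> tuple[int, str]:
    """Pads required and which constraint dominates."""
    em = pads_for_em(total_current_a, em_limit_per_pad_a)
    ir = pads_for_ir(total_current_a, path_resistance_ohm, max_drop_v)
    return max(em, ir), ("IR drop" if ir > em else "electromigration")


if __name__ == "__main__":
    # Hypothetical: 2 A core current, 0.125 A EM limit per pad, 30 mV drop budget.
    print(pads_needed(2.0, 0.125, 0.5, 0.03))   # original path resistance
    print(pads_needed(2.0, 0.125, 0.25, 0.03))  # after lowering path resistance
```

In this toy model 16 pads satisfy electromigration but 34 are needed for the drop budget; halving the effective path resistance, as extra pad metal, top-metal jumpers and stronger rings aim to do, brings that down to 17 without adding pads the ring cannot hold.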
ESD AND DFT DESIGN
All the PHYs come with built-in ESD protection circuits and are isolated from the core. ESD protection for the rest of the chip is provided by adding discharge paths between the PHYs, and the pad ring is designed to provide sufficiently low-impedance ESD discharge paths.
The chip's DFT implementation allows all PHYs to be tested either simultaneously or individually. Individual testing supports debugging a single PHY, while testing all PHYs in one go reduces test time. DFT logic is also included for testing the memories, IOs, PLL, LVDS buffer and all the standard cells.
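The trade-off between individual and simultaneous PHY testing can be sketched with a per-PHY enable mask and an idealized test-time model. The mask layout (bit i selects PHY i) and the equal per-PHY test time are hypothetical; real simultaneous testing is also bounded by tester channels and test power:

```python
NUM_PHYS = 12


def phy_enable_mask(phys) -> int:
    """Build an enable mask in which bit i selects PHY i (hypothetical register layout)."""
    mask = 0
    for p in phys:
        if not 0 <= p < NUM_PHYS:
            raise ValueError(f"no PHY {p}")
        mask |= 1 << p
    return mask


def phy_test_time_us(mask: int, per_phy_us: float, simultaneous: bool) -> float:
    """Idealized time: one PHY's test time if run together, the sum if run one by one."""
    selected = bin(mask).count("1")
    if selected == 0:
        return 0.0
    return per_phy_us if simultaneous else selected * per_phy_us


if __name__ == "__main__":
    all_phys = phy_enable_mask(range(NUM_PHYS))
    print(hex(all_phys), phy_test_time_us(all_phys, 50.0, simultaneous=True),
          phy_test_time_us(all_phys, 50.0, simultaneous=False))
    print(hex(phy_enable_mask([3])))  # isolate one PHY for debug
```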
All functional and DFT modes are analyzed with static timing analysis (STA) at all relevant process corners, using the post-route extracted netlist of the full chip.
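Covering every mode at every corner means one STA run per (mode, corner) pair. A small sketch of that enumeration; the mode and corner names are hypothetical placeholders, not the chip's actual sign-off list:

```python
from itertools import product

# Hypothetical names; the real list comes from the chip's timing sign-off plan.
MODES = ["functional", "scan_shift", "scan_capture", "mbist", "phy_bist"]
CORNERS = ["ss_lowV_hot", "tt_nomV_room", "ff_highV_cold"]


def sta_views(modes, corners):
    """Every (mode, corner) pair is a separate STA run on the post-route extracted netlist."""
    return [f"{mode}@{corner}" for mode, corner in product(modes, corners)]


if __name__ == "__main__":
    views = sta_views(MODES, CORNERS)
    print(len(views), views[:3])
```

The run count grows multiplicatively, so adding one DFT mode here adds one run per corner.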
BOARD AND PACKAGE DESIGN
A compact die is achieved by using staggered bond pads for all power and signal pads, and inline bond pads for the PHYs. Staggered bond pads, however, limit the number of pads that can be bonded out. The package uses only two rings for power connections, segmented to provide separate supplies to the core and to the PHYs. No additional rings are introduced, which limits the bond-wire inductance for the differential signals from the PHYs.
As many power pads as possible are bonded out to achieve a low combined die-and-package voltage drop. Power pads that are not bonded out have their bond pads shorted to adjacent bonded-out pads, which improves the distribution of current into the core.
Using a low-voltage differential reference clock reduces jitter and noise. On the package, the differential clock signals are routed with bond wires of nearly identical length and are shielded to achieve the required signal integrity. On the board, differential clock pairs are routed point-to-point with great care to avoid crosstalk. The reference clock distribution to all the devices is matched to within 15 inches on the system board, and the phase delay between the transmitter and receiver clocks is kept below 10ns. Both signals of each clock pair are length-matched between the clock source and the connector, and spaced away from non-clock signals to avoid crosstalk.
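To first order, a length mismatch on the board translates into a clock phase offset through the trace propagation delay. This sketch assumes about 170 ps per inch (a typical FR-4 stripline figure; roughly 140 to 180 ps/in depending on stackup) and ignores buffer delays; the device names and lengths are hypothetical:

```python
from __future__ import annotations

PS_PER_INCH = 170.0  # ASSUMED FR-4 stripline propagation delay


def trace_delay_ns(length_in: float, ps_per_inch: float = PS_PER_INCH) -> float:
    """Propagation delay of a trace of the given length, in ns."""
    return length_in * ps_per_inch / 1000.0


def check_refclk_routing(lengths_in: dict[str, float], max_mismatch_in: float = 15.0,
                         max_delay_ns: float = 10.0) -> tuple[bool, float]:
    """Return (ok, mismatch delay in ns) for reference-clock routes to each device."""
    mismatch_in = max(lengths_in.values()) - min(lengths_in.values())
    mismatch_ns = trace_delay_ns(mismatch_in)
    return mismatch_in <= max_mismatch_in and mismatch_ns < max_delay_ns, mismatch_ns


if __name__ == "__main__":
    # Hypothetical route lengths (inches) from the clock source to each device.
    print(check_refclk_routing({"dev0": 4.0, "dev1": 9.5, "dev2": 12.0}))
```

Under this assumption even the full 15-inch mismatch corresponds to only about 2.6 ns of trace delay, comfortably inside the 10ns phase-delay limit.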
A new clock distribution scheme has been developed for multiple PHYs integrated on the same die, and the associated challenges in floor planning, clock ring design, power distribution, ESD/DFT and packaging were resolved as described above. The chip, designed for Open-Silicon's client Lucid (Israel), was fully operational in testing, with all features working as intended.
Illustrations: Figure 1, Figure 2, Figure 3, Figure 4, Table 1