ECN Asia
Eliminating massive clock trees in SoC designs using GALS


( 01 Feb 2007 )

by Mohit Arora, Senior Design Engineer, Transportation and Standard Products Group, Freescale Semico

As high-speed I/O buses such as SATA II and PCI Express (2.5 to 80Gbps) are implemented in SoC designs, it is becoming increasingly difficult to meet timing constraints. Combining many separately developed IP functional blocks from different vendors, each with different specifications for both local and system clocks, produces massive clock tree structures. As a result, clock skew is now comparable to the clock period and can no longer be treated as negligible.



Moreover, verifying third-party IP blocks at the system level poses another set of integration issues.

This article proposes a new methodology for IP integration that uses globally-asynchronous, locally-synchronous (GALS) clock circuits, which significantly reduce the power consumed by the SoC by providing a smaller clock tree structure. Using GALS also makes it simpler to meet timing requirements and reduces the amount of system-level verification needed for third-party IP blocks.



TIMING PROBLEM

Conventional SoC designs consist of an interconnect bus with many IP blocks combined with a multiplicity of coupled timing domains. Since chip size and speed are continually increasing (per Moore’s Law), meeting the timing requirements of a system with hundreds of IP elements has become a major challenge. Conventional design methodologies have resulted in massive clock tree structures, which substantially increase the average power consumed by the SoC.



With clock requirements being the culprit, it would appear that a simple solution would be to eliminate the clock completely. Unfortunately, making a SoC design completely asynchronous (without a system clock) is in practice quite difficult today, one reason being the lack of mature CAD tools.



Moreover, the aggressive pressure to release products to market means that the SoC industry has had to commit to delivering silicon that is right the first time, so designers are forced to re-use existing IP blocks with minimal integration design effort. The result is that there has effectively been no change in the methodology of IP integration since the first SoC designs. As the complexity of SoCs increases exponentially, full functional verification has become a big challenge. SoC integrators are often unable to spend the time to thoroughly understand the IP blocks purchased from third-party vendors and, to meet project time constraints, are interested only in integrating the blocks.



The initial design approach of SoC designers was to select the IP blocks needed to meet application requirements, place them on silicon and connect them with a standard on-chip bus. As was the case with multimillion-gate ASICs containing many connected IP blocks, today's SoC cannot be built around a single bus. Instead, complex hierarchies of buses are used, with sophisticated protocols and multiple bridges between them (Figure 1). Communication between any two IP blocks can be via several buses, which places a lot of strain on meeting timing requirements. Essentially bus-based interconnects are being stretched to the point where they cannot be scaled further.



SoC designers face a basic paradox in today's environment: rather than enjoying significant time savings by using acquired IP blocks, they spend additional time in learning the function of the blocks in order to build the logic and test vectors for these blocks. Except for the vendors of processor cores, IP vendors typically provide little of the detailed documentation designers need. Consequently, designers find they have to acquire some level of application expertise or use consulting resources to understand the IP well enough to complete these tasks. This additional design and verification burden currently adds months to SoC design projects. Besides imposing a drain on resource-strapped projects, the additional logic inevitably degrades performance and increases chip area, while the additional test requirements further complicate final test stages.



CMOS feature size continues to shrink and, following Moore's Law, is projected to be on the order of 22nm by 2016, with clock frequencies reaching around 28.7GHz. It is clear that interconnect speed is not keeping up with increases in transistor speed. This means that in future circuits, wire delay will no longer be negligible, but will play a major role in deciding the maximum frequency at which a circuit can operate. In line with the clocking trends, global clock skew becomes an increasing fraction of the clock period.
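To put the projected frequency in perspective, the short sketch below converts it into a clock period and expresses a skew value as a fraction of that period. The 10 ps skew is a hypothetical figure chosen only for illustration; it does not come from the article or from any roadmap.

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def skew_fraction(skew_ps: float, freq_ghz: float) -> float:
    """Clock skew expressed as a fraction of one clock period."""
    return skew_ps / clock_period_ps(freq_ghz)


# 28.7GHz is the projected frequency quoted in the text.
period = clock_period_ps(28.7)
print(f"period at 28.7GHz: {period:.1f} ps")

# Hypothetical skew value, for illustration only.
assumed_skew_ps = 10.0
print(f"skew share of period: {skew_fraction(assumed_skew_ps, 28.7):.1%}")
```

At this frequency the whole period is under 35 ps, so even a skew of a few picoseconds consumes a visible share of the timing budget.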



Examining all these issues makes it clear that a new interconnect strategy is required to bring design risks back under control: large high-speed integrated circuits will eventually need to be designed without global clocking. This requirement could be fulfilled by the GALS circuits that will be discussed.



Eliminating the clock completely is not likely to happen very soon, but a new GALS method has been developed in which communication between two synchronous blocks can be performed asynchronously without the need for a master clock.



As per the International Technology Roadmap for Semiconductors (ITRS, ex. SIA), 1999 edition: "With clock speed possibly exceeding GHz, and across-chip communication taking upwards of five to 20 clock cycles, an approach is needed to building a hierarchy of clock speeds with locally synchronous and globally asynchronous interconnects. Tools to handle asynchronous, multicycle interconnect as well as locally synchronous, high performance near neighbor communication are needed."



Figure 2 shows two cases: the first is a normal synchronous system with a master clock; the second is a GALS system in which two blocks talk to each other through a handshake interface. Though each block has its own local clock, the overall system works without a global clock.
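The handshake interface can be pictured with a small behavioral model. The sketch below assumes a four-phase (return-to-zero) request/acknowledge protocol, which is one common choice; the article does not say which handshake protocol is used, and the names `Channel` and `transfer` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Channel:
    """Wires shared by two locally clocked blocks: request, acknowledge, data."""
    req: bool = False
    ack: bool = False
    data: Optional[int] = None
    trace: List[Tuple[bool, bool]] = field(default_factory=list)

    def log(self) -> None:
        # Record the (req, ack) pair after every signal change.
        self.trace.append((self.req, self.ack))


def transfer(ch: Channel, value: int) -> Optional[int]:
    """Perform one four-phase handshake and return the value the receiver latched."""
    # Phase 1: sender drives the data, then raises req.
    ch.data = value
    ch.req = True
    ch.log()
    # Phase 2: receiver sees req, latches the data, raises ack.
    received = ch.data
    ch.ack = True
    ch.log()
    # Phase 3: sender sees ack and lowers req.
    ch.req = False
    ch.log()
    # Phase 4: receiver sees req low and lowers ack; the channel is idle again.
    ch.ack = False
    ch.log()
    return received


ch = Channel()
print(transfer(ch, 0xAB))
print(ch.trace)
```

No step in the model refers to a shared clock: each side advances only when it observes the other side's signal change, which is what lets the two blocks keep independent local clocks.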



In SoC designs, the GALS architecture helps to solve the increasingly difficult problem of integrating multiple clock domains into a single chip. Current synchronous solutions involve a number of inefficient design tricks, such as using Gray code on a dual-ported SRAM to act as an interface between logic blocks with different clock frequencies. With an asynchronous SoC interconnect, for example an on-chip crossbar, independent synchronous blocks can be linked together using a simple clock domain converter on each of the crossbar ports. The clock domain converters work just like synchronous FIFOs, and can operate independently of each other at the rate of the connected synchronous block. A similar idea is described in detail below.
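The Gray code trick works because successive Gray-coded values differ in exactly one bit, so a pointer sampled in another clock domain is at worst one step stale rather than corrupted. A minimal sketch of the conversion follows; the 16-entry depth is an assumed example and the function names are hypothetical.

```python
def to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Convert a reflected Gray code back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


DEPTH = 16  # assumed FIFO depth; must be a power of two for the wrap-around to hold

for i in range(DEPTH):
    nxt = (i + 1) % DEPTH
    changed_bits = bin(to_gray(i) ^ to_gray(nxt)).count("1")
    print(f"{i:2d} -> {to_gray(i):04b}  (bits changed moving to next: {changed_bits})")
```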



Figure 3 shows how the IP is connected to an interconnect bus using a GALS circuit. IP blocks are completely decoupled from the interconnect bus.



With this method, a synchronous layer interacts with the interconnect bus using the system clock and transmits the address, data and other information to the asynchronous layer. The asynchronous layer then encodes the data using a delay-insensitive encoding (for example, dual-rail or m-of-n encoding) and transmits it to the asynchronous layer on the IP side. The asynchronous bridge thus creates a boundary between the system clock and the IP clock, which effectively eliminates the need for large clock tree buffers for the system clock. A major result is lower power consumption.
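As an illustration of dual-rail encoding, the sketch below represents each data bit on two wires: (1, 0) for a logic 1, (0, 1) for a logic 0, and (0, 0) as an empty spacer. This is the conventional dual-rail scheme, assumed here because the article does not give its wire assignment; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

WirePair = Tuple[int, int]  # (true_rail, false_rail)

SPACER: WirePair = (0, 0)           # no data on this bit yet
VALID = ((1, 0), (0, 1))            # logic 1, logic 0


def encode_dual_rail(value: int, width: int) -> List[WirePair]:
    """Encode the low `width` bits of value, least significant bit first."""
    return [(1, 0) if (value >> i) & 1 else (0, 1) for i in range(width)]


def is_complete(wires: List[WirePair]) -> bool:
    """Completion detection: every bit position carries a valid pair."""
    return all(pair in VALID for pair in wires)


def decode_dual_rail(wires: List[WirePair]) -> Optional[int]:
    """Return the decoded value, or None while any bit is still a spacer."""
    if not is_complete(wires):
        return None
    return sum(true_rail << i for i, (true_rail, _) in enumerate(wires))


word = encode_dual_rail(0b1011, 4)
print(word, decode_dual_rail(word))

# A word whose last bit has not arrived yet is simply "not complete".
partial = word[:3] + [SPACER]
print(decode_dual_rail(partial))
```

Because the receiver can tell from the wires alone when a word has fully arrived, it does not need to know how long each wire takes.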



The synchronous layer acts as a target for the interconnect bus and as an initiator for the asynchronous layer, making the asynchronous communication transparent. The particular synchronous layer implementation would be specific to the properties of the interconnect bus and the targeted application (such as low- or high-performance peripherals). The asynchronous layer converts the synchronous protocol signals (address, data, etc.) to a delay-insensitive encoding such as m-of-n or dual-rail to transmit the symbols across the line.
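In an m-of-n code, a symbol is valid when exactly m of its n wires are high. The sketch below enumerates such codebooks and counts how many whole data bits one symbol can carry. The two example codes (1-of-4 and 2-of-7) are common illustrations, not parameters taken from the article, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def m_of_n_codebook(m: int, n: int) -> List[Tuple[int, ...]]:
    """All n-wire codewords that have exactly m wires high."""
    return [
        tuple(1 if i in hot else 0 for i in range(n))
        for hot in combinations(range(n), m)
    ]


def bits_per_symbol(m: int, n: int) -> int:
    """Whole data bits one symbol can carry: floor(log2(C(n, m)))."""
    return comb(n, m).bit_length() - 1


def is_valid(word: Tuple[int, ...], m: int) -> bool:
    """A received word is complete once exactly m wires are high."""
    return sum(word) == m


for m, n in [(1, 4), (2, 7)]:
    book = m_of_n_codebook(m, n)
    print(f"{m}-of-{n}: {len(book)} codewords, {bits_per_symbol(m, n)} bits per symbol")
```

Dual-rail is the special case of a 1-of-2 code per bit. Wider codes trade more complex encoding and completion-detection logic for fewer wire transitions per data bit, which is one reason the gate cost depends on the encoding chosen.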



BENEFITS

With the GALS methodology, the IP is completely decoupled from the interconnect bus. This makes it possible to integrate asynchronous communication within an existing synchronous system. Because of the delay-insensitive encoding, the wires supporting the communication at the physical level do not have to be balanced. Unlike the "legacy" technique for IP integration, GALS does not require large clock tree buffers, because the IP is decoupled from the interconnect bus. This saves a considerable amount of power, which can be extremely important for handheld devices that operate on battery power.



The GALS approach also means that the interconnect bus can run at a much higher frequency, thus increasing overall system performance.



Last but not least, the GALS approach can simplify system-level verification of the IP block. With the IP block being completely decoupled from the interconnect bus, verification can be performed at the asynchronous dividing point. In the case of third-party pre-verified IP, IP-level verification can be completely eliminated.



CHALLENGES, DRAWBACKS

Asynchronous circuits tend to implement registers using latches rather than flip-flops. In combination with the absence of a global clock, this makes it less straightforward to connect registers into scan paths. Another consequence of the distributed self-timed control (the lack of a global clock) is that it is more complicated to single-step the circuit through a sequence of well-defined states. This makes it less straightforward to steer the circuit into particular quiescent states, which is necessary for IDDQ testing, the technique used to test for the short and open faults that are typical in today's CMOS processes.



The extensive use of state-holding elements (such as the Muller C-element, a basic element in the asynchronous domain, just as a flip-flop is a basic element in any synchronous design), together with the self-timed behavior, also makes it difficult to test the feedback circuitry that implements the state-holding behavior. Delay-fault testing represents yet another challenge.
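For readers unfamiliar with the Muller C-element, a minimal behavioral model is sketched below for a two-input element: the output follows the inputs when they agree and holds its previous value when they differ. The class name and the initial output value are assumptions made for illustration.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial  # assumed reset state

    def update(self, a: int, b: int) -> int:
        # Inputs agree: the output takes their common value.
        # Inputs differ: the output holds (the state-holding behavior).
        if a == b:
            self.out = a
        return self.out


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(f"a={a} b={b} -> out={c.update(a, b)}")
```

The hold case is implemented in hardware by feedback, and that feedback path is the part the paragraph above describes as difficult to test.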



One drawback of a GALS design is increased latency due to the additional asynchronous communication layer. The cost of using GALS is approximately 8,000 to 10,000 additional gates per IP block, the number depending on the bus width and encoding used.
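The per-block figure can be scaled to a whole chip with simple arithmetic. In the sketch below, the 8,000 and 10,000 gate bounds come from the text, while the count of 100 IP blocks is a hypothetical value chosen only to show the calculation.

```python
from typing import Tuple

GATES_PER_BLOCK_LOW = 8_000    # lower bound quoted in the text
GATES_PER_BLOCK_HIGH = 10_000  # upper bound quoted in the text


def gals_gate_overhead(num_ip_blocks: int) -> Tuple[int, int]:
    """Estimated (low, high) additional gates for wrapping every IP block."""
    return (
        num_ip_blocks * GATES_PER_BLOCK_LOW,
        num_ip_blocks * GATES_PER_BLOCK_HIGH,
    )


# Hypothetical SoC with 100 IP blocks, for illustration only.
low, high = gals_gate_overhead(100)
print(f"additional gates: {low:,} to {high:,}")
```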

 

 
 
 
© 2008 Reed Business Information, a division of Reed Elsevier Inc.
All rights reserved. Use of this web site is subject to its Terms and Conditions of Use. View our Privacy Policy.