ECN Asia
Tuesday, September 9, 2008
Cover Story, June 2008 issue
FPGA architecture designed for efficiency


( 01 Jun 2008 )

by Amit Verma, Altera Corp.

FPGAs were traditionally built around a 4-input look-up table (LUT), in which SRAM bits store the digital information (1 or 0) that defines a logic function. These stored bits, known as configuration memory (CRAM), are connected through a set of multiplexers that use the four LUT inputs to select which bit drives the output, so a single 4-input LUT can implement any function of its four inputs. At the time, 4-input LUTs provided the best area-delay product. Altera’s Stratix core (130nm) was based on these 4-input logic elements (LEs), as shown in Figure 1.
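A 4-input LUT can be modeled in a few lines of Python as a 16-bit mask indexed by its inputs. This is a minimal sketch: the names `lut4` and `make_mask`, and the choice of input `a` as the least significant select bit, are illustrative assumptions rather than Altera's actual CRAM layout.

```python
def lut4(mask: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT whose 16 CRAM bits are stored in `mask`.

    Bit i of the mask is the output for the input combination whose
    binary value is i (d is the most significant select bit, a the least).
    """
    index = (d << 3) | (c << 2) | (b << 1) | a
    return (mask >> index) & 1


def make_mask(func) -> int:
    """Record func's output for all 16 input combinations as a 16-bit mask."""
    mask = 0
    for i in range(16):
        a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        mask |= func(a, b, c, d) << i
    return mask


def f(a: int, b: int, c: int, d: int) -> int:
    """An arbitrary example function: (a AND b) XOR (c OR d)."""
    return (a & b) ^ (c | d)


MASK = make_mask(f)  # "programming" the LUT = writing these 16 bits
```

Any 4-input function, however complex, costs the same 16 bits; only the mask contents change.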

As process geometries shrank to 90nm and eventually to 65nm, higher performance and greater density became available, but at the cost of higher power consumption in the core. In addition, as FPGA density increased dramatically, critical-path delays in the routing fabric grew, because routing wires did not scale as well as transistors. The traditional approach of implementing logic in 4-input LUTs therefore had to be fundamentally rethought. A new core architecture was needed that would pack more logic into each logic element, delivering higher performance at lower power and ultimately lowering overall cost.

BIGGER LOOK-UP TABLES

Technically, a k-input LUT (k-LUT), which can implement any function of k inputs, requires 2^k SRAM bits and a 2^k:1 multiplexer to route the selected CRAM bit to the output. For example, as shown on the left side of Figure 2, a 4-input LUT is implemented with 16 CRAM bits and a 16:1 multiplexer, which is itself built from fifteen 2:1 multiplexers.
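The multiplexer count generalizes: a binary tree of 2:1 multiplexers halves the candidate bits at each of its k levels, so it uses 2^(k-1) + ... + 2 + 1 = 2^k - 1 multiplexers. The sketch below, with hypothetical helper names, evaluates a k-LUT that way and reports the multiplexer count (15 for k = 4).

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """A 2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def lut_mux_tree(cram_bits, inputs):
    """Evaluate a k-LUT as a tree of 2:1 multiplexers.

    cram_bits: list of 2**k configuration bits (bit i = output for index i).
    inputs: list of k select bits, least significant first.
    Returns (output, number_of_2to1_muxes_in_the_tree).
    """
    k = len(inputs)
    if len(cram_bits) != 2 ** k:
        raise ValueError("a k-LUT needs exactly 2**k CRAM bits")
    level = list(cram_bits)
    muxes = 0
    # Each tree level halves the candidates using one input bit,
    # starting with the least significant input at the leaves.
    for sel in inputs:
        level = [mux2(sel, level[j], level[j + 1])
                 for j in range(0, len(level), 2)]
        muxes += len(level)
    return level[0], muxes
```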

Larger LUTs can be built from smaller LUTs and one or more multiplexers (shown on the right side of Figure 2). For example, a 5-LUT can be built from two 4-LUTs and a 2:1 multiplexer, and a 6-LUT from two 5-LUTs and a 2:1 multiplexer (or, equivalently, four 4-LUTs and a 4:1 multiplexer). The problem with these architectures, however, is that logic elements built from smaller LUTs are inefficient and waste resources when implementing smaller functions with fewer inputs. Further inefficiencies come from the routing that must be replicated to each of the smaller LUTs when building a larger LUT and from the extra delay between LUTs, which together result in a non-optimized logic structure.
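Building a 5-LUT from two 4-LUTs is an application of Shannon expansion on the fifth input: one 4-LUT holds the function with that input at 0, the other holds it with that input at 1, and a 2:1 multiplexer driven by the fifth input picks between them. A minimal sketch with hypothetical helper names:

```python
def lut(mask: int, inputs) -> int:
    """Evaluate a LUT with len(inputs) inputs; inputs[0] is the LSB select."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (mask >> index) & 1


def lut5_from_two_lut4(mask5: int, inputs) -> int:
    """Evaluate a 5-input function using two 4-LUTs and one 2:1 multiplexer.

    Shannon expansion on the 5th input e:  f = e ? f(e=1) : f(e=0).
    With the index convention above, the low 16 bits of mask5 form the
    cofactor for e = 0 and the high 16 bits the cofactor for e = 1.
    """
    low_lut = mask5 & 0xFFFF           # 4-LUT configured with f(e=0)
    high_lut = (mask5 >> 16) & 0xFFFF  # 4-LUT configured with f(e=1)
    a_to_d, e = inputs[:4], inputs[4]
    out0 = lut(low_lut, a_to_d)        # both 4-LUTs need inputs a..d
    out1 = lut(high_lut, a_to_d)
    return out1 if e else out0         # the extra 2:1 multiplexer
```

Applying the same step again yields a 6-LUT from two 5-LUTs. Note that both 4-LUT halves need their own copy of inputs a through d, which is the routing replication the text describes.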

DEVELOPING ALM

With the acquisition of Right Track CAD and the creation of the Toronto Technology Centre (TTC) in 2000, Altera brought together a senior team of FPGA architecture researchers from academia. The TTC, in conjunction with software and IC design engineers at the company’s San Jose site, created the Altera FPGA Modeling Toolkit (FMT), which allows complete "virtual prototyping" of different FPGA architecture ideas and gives an in-depth understanding of how each aspect of an FPGA architecture affects the result. Figure 3 illustrates the tradeoff between cost and delay for various LUT sizes, as modeled with FMT.

Figure 3 shows that increasing the LUT size significantly lowers logic delay, but at a considerable increase in cost. Research has consistently shown that, among the LUT sizes studied, k = 4 inputs provides the best area-delay product with minimal wasted resources (inputs, CRAM, and multiplexers). A basic 6-LUT configuration can increase performance by 15 percent, but at the expense of area, which can increase by 17 percent. It is therefore essential to analyze the logic block, its various resources, the targeted software, and the overall impact on cost.
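The area-delay tradeoff can be checked with simple arithmetic. The text does not say whether "15 percent" refers to higher fmax or to shorter delay, so the sketch below, with figures normalized to a 4-LUT baseline of 1.0, computes both readings. The variable names are illustrative.

```python
# Relative figures from the text for a basic 6-LUT vs. a 4-LUT baseline (1.0).
AREA_6LUT = 1.17   # area up 17 percent
PERF_GAIN = 0.15   # performance up 15 percent


def area_delay_product(area: float, delay: float) -> float:
    """Area-delay product, normalized so the 4-LUT baseline is 1.0."""
    return area * delay


# Reading 1: 15 percent higher fmax, so delay scales by 1 / 1.15.
adp_fmax = area_delay_product(AREA_6LUT, 1 / (1 + PERF_GAIN))

# Reading 2: critical-path delay drops by 15 percent.
adp_delay = area_delay_product(AREA_6LUT, 1 - PERF_GAIN)

print(f"6-LUT ADP vs. 4-LUT: {adp_fmax:.3f} (fmax reading), "
      f"{adp_delay:.3f} (delay reading)")
```

Under either reading the 6-LUT's area-delay product lands within about 2 percent of the baseline (roughly 1.02 or 0.99), so the speed gain is largely paid for in area.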

Simply increasing the LUT size from a 4-LUT or 5-LUT to a 6-LUT is highly inefficient. For Figure 4, a synthesis modeling tool was used to synthesize a target logic block from its HDL and pack it into LUTs of various sizes, giving a clear picture of the effect on die area (cost).

The spread in Figure 4 shows that even when a synthesis tool optimally packs functions into 6-LUTs, many of the resulting functions are smaller and need only part of the LUT. While 6-LUTs can improve performance, only a relatively small number of them use all six inputs. The unused CRAM bits and multiplexers occupy costly silicon real estate, and the associated logic and routing resources are wasted, increasing cost.
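The CRAM cost of an underused 6-LUT follows directly from the 2^k rule: a j-input function needs only 2^j of the 64 bits, so a 4-input function placed in a 6-LUT uses 16 bits and leaves 75 percent idle. A short sketch that tabulates this; it counts CRAM bits only and ignores the multiplexer and routing overhead the text also mentions.

```python
LUT_SIZE = 6
CRAM_BITS = 2 ** LUT_SIZE  # 64 configuration bits in a 6-LUT


def cram_utilization(function_inputs: int, lut_size: int = LUT_SIZE) -> float:
    """Fraction of a k-LUT's CRAM bits that a j-input function needs."""
    if not 0 <= function_inputs <= lut_size:
        raise ValueError("function must fit in the LUT")
    return 2 ** function_inputs / 2 ** lut_size


for j in range(2, LUT_SIZE + 1):
    used = cram_utilization(j)
    print(f"{j}-input function in a 6-LUT: {2 ** j:2d}/{CRAM_BITS} bits "
          f"used ({used:.1%}), {1 - used:.1%} idle")
```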

Over an exhaustive, iterative process of more than 150,000 experiments, in which the requirement was to reduce levels of logic for higher performance without suffering these inefficiencies, a new criterion for designing a larger LUT became apparent: the larger LUT had to be divisible into smaller LUTs when required, to reduce cost. Such a LUT would deliver the performance benefits of a large LUT without wasting silicon, because it could be divided into smaller LUTs wherever appropriate. In 2002, an adaptive LUT was optimized to share LUT masks between functions, resulting in the final design of the adaptive logic module (ALM), shown in Figure 5.

Figure 5 shows a representation of the ALM built from 4-input and 3-input LUTs and multiplexers, and illustrates how the LUT mask can be divided and shared between two different logic functions. The ALM consists of an 8-input adaptive LUT, two registers, two adders, and multiplexers, giving a highly efficient core that adapts to a wide range of logic designs.
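The idea of dividing one LUT mask between two functions can be illustrated with a simplified model. The class below is hypothetical: it treats a 64-bit mask, fed from eight input pins, as either one 6-input function or two independent 5-input functions stored in its two halves. It does not reproduce the ALM's actual 4-input/3-input LUT structure or the modes listed in Table 1.

```python
from typing import List, Sequence, Tuple


def eval_lut(mask: int, bits: Sequence[int]) -> int:
    """Evaluate a LUT mask; bits[0] is the least significant select bit."""
    index = sum(b << i for i, b in enumerate(bits))
    return (mask >> index) & 1


class FracturableLut:
    """Hypothetical fracturable 6-LUT with 8 input pins (simplified model).

    The 64-bit mask is read either as one 6-input function or as two
    independent 5-input functions in its low and high 32-bit halves.
    """

    NUM_PINS = 8

    def __init__(self, mask: int, pins_a: Sequence[int],
                 pins_b: Sequence[int] = ()) -> None:
        if any(not 0 <= p < self.NUM_PINS for p in (*pins_a, *pins_b)):
            raise ValueError("pin index out of range")
        if pins_b and (len(pins_a) != 5 or len(pins_b) != 5):
            raise ValueError("fractured mode needs two 5-input functions")
        if not pins_b and len(pins_a) != 6:
            raise ValueError("unfractured mode needs one 6-input function")
        self.mask = mask & (2 ** 64 - 1)
        self.pins_a = list(pins_a)
        self.pins_b = list(pins_b)

    def _select(self, pins: Sequence[int], pin_values: Sequence[int]) -> List[int]:
        # Route the chosen pins to this function's select lines.
        return [pin_values[p] for p in pins]

    def evaluate(self, pin_values: Sequence[int]) -> Tuple[int, ...]:
        if not self.pins_b:
            # Unfractured: all 64 CRAM bits serve one 6-input function.
            return (eval_lut(self.mask, self._select(self.pins_a, pin_values)),)
        # Fractured: each 32-bit half of the mask serves its own function.
        low = self.mask & 0xFFFF_FFFF
        high = self.mask >> 32
        return (eval_lut(low, self._select(self.pins_a, pin_values)),
                eval_lut(high, self._select(self.pins_b, pin_values)))
```

Because the model has only eight pins, two fractured functions that each use five distinct pins must share at least two of them (5 + 5 = 10 > 8), a simple illustration of why input sharing appears among the ALM's configurations.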

ADVANTAGES OF ALM

Altera’s patented LUT technology was designed into the adaptive logic module of the 90nm Stratix II FPGA, where the LUT’s adaptability is used to pack the most logic into the least area while delivering maximum performance in the FPGA core. An alternative view of the ALM is shown in Figure 6.

The ALM can implement one 6-LUT or selected 7-input functions, or it can be fractured into smaller LUTs that implement two independent functions. It also contains two registers, which provide a 2:1 register-to-logic ratio suited to register-rich designs, and two adders, each capable of performing 2-bit additions, which can alternatively be configured as a single ternary adder for increased arithmetic capability.
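A ternary adder sums three operands at once. A common way to build one, sketched here as a generic illustration rather than a description of the ALM's internal wiring, is a carry-save (3:2 compressor) stage followed by a single ripple-carry adder. The function names are hypothetical.

```python
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """One bit of addition: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def ternary_add(x: int, y: int, z: int, width: int) -> int:
    """Add three unsigned width-bit operands with a carry-save stage.

    Stage 1 (3:2 compression): each bit column of x, y, z becomes a sum
    bit and a carry bit, with no carry propagation between columns.
    Stage 2: one ordinary ripple-carry addition of the two vectors.
    The result is width + 2 bits wide, enough for 3 * (2**width - 1).
    """
    sum_vec = 0
    carry_vec = 0
    for i in range(width):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        sum_vec |= s << i
        carry_vec |= c << (i + 1)  # carries move one column to the left

    result, carry = 0, 0
    for i in range(width + 2):
        s, carry = full_adder((sum_vec >> i) & 1, (carry_vec >> i) & 1, carry)
        result |= s << i
    return result
```

The carry-save stage has no carry chain, so adding a third operand costs roughly one extra bit-level stage rather than a second full-width carry-propagate adder.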

Unique to the ALM is the patented LUT, which supports a large number of modes and implements logic efficiently. A single ALM can implement the equivalent of 2.5 LEs of a classic 4-LUT architecture. On average, it holds 1.6x more combinational logic than competing basic 6-LUT architectures, a factor that rises to 1.8x when the two registers per ALM are counted in the densest packing mode. Figure 7 shows the different LUT configurations that a single ALM can support with minimal input sharing, and Table 1 describes each ALM configuration.

The TTC and San Jose software teams were involved in the design of the ALM, which enabled Quartus II software to use all of the ALM’s features automatically. Taking advantage of the ALM’s flexibility, the synthesis tool can adjust the distribution of LUT sizes to produce the right mix of large and small LUTs for the optimization goal, for example fitting a design into the fewest ALMs. Figure 8 shows the distribution of LUTs generated when optimizing for speed, for area, and for a balance of both.

Depending on the optimization goal and the design, the synthesis tool generates a different mix of LUTs. When optimizing for performance, it generates the largest number of 6-LUTs. When optimizing for area, it generates a mix of smaller LUTs that map functions efficiently into the fewest ALMs.
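Why a mix of smaller LUTs can need fewer ALMs can be shown with a toy packing model. The rule here is an assumption for illustration only, not the Quartus II algorithm or the ALM's full mode list: a function with more than five inputs occupies a whole ALM, and two smaller functions share an ALM when their inputs total at most eight (input sharing is not modeled). Under that rule, pairing the largest remaining function with the smallest one that fits is the classic two-pointer greedy, which is optimal for this kind of pair packing. The LUT-size mixes in the example are made up.

```python
from typing import Sequence


def alms_needed(lut_sizes: Sequence[int], max_inputs: int = 8,
                max_shared_lut: int = 5) -> int:
    """Count ALMs under a simplified, hypothetical packing rule.

    Functions wider than max_shared_lut inputs take a whole ALM; two
    narrower functions share one ALM if their inputs total <= max_inputs.
    """
    alone = [k for k in lut_sizes if k > max_shared_lut]
    small = sorted(k for k in lut_sizes if k <= max_shared_lut)
    count = len(alone)
    lo, hi = 0, len(small) - 1
    while lo <= hi:
        # Place the largest remaining function; pair it with the smallest if it fits.
        if lo < hi and small[lo] + small[hi] <= max_inputs:
            lo += 1
        hi -= 1
        count += 1
    return count


# Hypothetical LUT-size mixes, loosely echoing speed- vs. area-oriented synthesis.
speed_mix = [6, 6, 6, 5, 4]
area_mix = [4, 4, 3, 3, 5, 3]
print(alms_needed(speed_mix), alms_needed(area_mix))
```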

About the author

Amit Verma is senior staff for high-end technical analysis at Altera, responsible for technical product analysis, FPGA architecture, and technology solutions for Altera’s high-end FPGA product lines. He holds a BSEE from Rochester Institute of Technology in New York.

Illustrations: Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Table 1