#### CS152 – Computer Architecture and Engineering

Lecture 3 - Field Programmable Gate Arrays

2003-09-02

Dave Patterson (www.cs.berkeley.edu/~patterson)

www-inst.eecs.berkeley.edu/~cs152/



Patterson Fall 2003 © UCB

#### Review: Verilog

- Verilog allows both structural and behavioral descriptions, helpful in testing
- Some special features found only in Hardware Description Languages
  - `#` time delay, nonblocking assignments, initial vs. always, forever loops
- Syntax is a mixture of C (operators, for, while, if, print) and Ada (begin...end, case...endcase, module...endmodule)
- Verilog can describe everything from a single gate to a full computer system; you get to design a simple processor


#### Multiply Review

- Multiply: successive refinement to see final design
  - 1st iteration:
    - 64-bit Adder,
    - 64-bit Multiplicand shift register,
    - 32-bit Multiplier shift register,
    - 64-bit Product register
  - 3rd iteration:
    - 32-bit Adder,
    - 64-bit Product/Multiplier shift register,
    - 32-bit Multiplicand register
  - There are algorithms that calculate many bits of multiply per cycle (see exercises 4.36 to 4.39 in COD)
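The 3rd-iteration datapath can be sketched in software. This is a minimal Python model, assuming unsigned operands and one multiplier bit examined per step; the function name `multiply_v3` and the mask constants are illustrative and do not come from the lecture.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def multiply_v3(multiplicand: int, multiplier: int) -> int:
    """Unsigned 32 x 32 -> 64-bit shift-add multiply, one bit per step.

    Models a 32-bit multiplicand register, a 32-bit adder, and a single
    64-bit register that holds the multiplier in its low half and
    accumulates the product in its high half.
    """
    multiplicand &= MASK32
    product = multiplier & MASK32  # multiplier starts in the low 32 bits
    for _ in range(32):
        if product & 1:
            # Add the multiplicand into the upper half only. The sum may
            # need 33 bits; the carry is kept until the shift below.
            upper = (product >> 32) + multiplicand
            product = (upper << 32) | (product & MASK32)
        # Shift right: drops the multiplier bit just used and moves the
        # partial product one position toward its final place.
        product >>= 1
    return product & MASK64
```

Sharing one register works because each step consumes one multiplier bit and frees one bit of space for the growing product.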




#### Outline

- FPGAs Overview
- Why use FPGAs? (a short history lesson)
- FPGA variations
- Internal logic blocks
- Designing with FPGAs
- Specifics of Xilinx Virtex-E series




#### **FPGA Overview**

- Basic idea: 2D array of combinational logic blocks (CL) and flip-flops (FF) with a means for the user to configure both:
  - 1. the interconnection between the logic blocks,
  - 2. the function of each block.
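The two kinds of configuration can be illustrated with a toy software model. This is only a sketch: it assumes 2-input blocks, leaves out the flip-flops, and uses made-up names (`LogicBlock`, `run_fabric`) that do not correspond to any real FPGA tool.

```python
from typing import Dict, List, Tuple


class LogicBlock:
    """A 2-input block whose function is set by a 4-entry truth table."""

    def __init__(self, truth_table: List[int]):
        # truth_table[i] is the output for input pattern i = (a << 1) | b
        self.truth_table = truth_table

    def evaluate(self, a: int, b: int) -> int:
        return self.truth_table[(a << 1) | b]


def run_fabric(
    blocks: Dict[str, LogicBlock],
    routing: Dict[str, Tuple[str, str]],
    inputs: Dict[str, int],
) -> Dict[str, int]:
    """Evaluate every block once.

    routing maps each block name to the two signals wired to its inputs
    and must list blocks in an order where sources are computed first.
    """
    signals = dict(inputs)
    for name, (src_a, src_b) in routing.items():
        signals[name] = blocks[name].evaluate(signals[src_a], signals[src_b])
    return signals


# Configuration 2 (function): choose each block's truth table.
half_adder_blocks = {
    "sum": LogicBlock([0, 1, 1, 0]),    # XOR
    "carry": LogicBlock([0, 0, 0, 1]),  # AND
}
# Configuration 1 (interconnection): choose what feeds each block.
half_adder_routing = {"sum": ("a", "b"), "carry": ("a", "b")}
```

Loading different truth tables and routing into the same two blocks yields a different circuit, which is the sense in which the array is "programmable".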



#### Why FPGAs? (1 / 5)

- By the early 1980s most of the logic circuits in typical systems had been absorbed by a handful of standard large scale integrated circuits (LSI ICs).
  - Microprocessors, bus/IO controllers, system timers, ...
- Every system still needed random small "glue logic" ICs to help connect the large ICs:
  - generating global control signals (for resets etc.)
  - data formatting (serial to parallel, multiplexing, etc.)
- Systems had a few LSI components and lots of small low density SSI (small scale IC) and MSI (medium scale IC) components.



Printed Circuit (PC) board with many small SSI and MSI ICs and a few LSI ICs


#### Why FPGAs? (2 / 5)

- Custom ICs sometimes designed to replace glue logic:
  - reduced complexity/manufacturing cost, improved performance
  - But custom ICs expensive to develop, and delay introduction of product ("time to market") because of increased design time
- Note: need to worry about two kinds of costs:
  - 1. cost of development, "Non-Recurring Engineering (NRE)", fixed
  - 2. cost of manufacture per unit, variable
- Usually tradeoff between NRE cost and manufacturing costs

Plot axis: Units manufactured
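The tradeoff between fixed NRE cost and per-unit manufacturing cost can be worked through numerically. The sketch below assumes a simple linear cost model; the dollar figures are hypothetical placeholders chosen only to make the arithmetic visible, not data from the lecture.

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """Fixed development cost plus variable manufacturing cost."""
    return nre + unit_cost * units


def break_even_units(nre_a: float, unit_a: float,
                     nre_b: float, unit_b: float) -> float:
    """Volume at which option A (higher NRE, lower unit cost) costs the
    same in total as option B (lower NRE, higher unit cost)."""
    return (nre_a - nre_b) / (unit_b - unit_a)


# Hypothetical figures, for illustration only.
CUSTOM_NRE, CUSTOM_UNIT = 1_000_000, 5   # custom IC
OTHER_NRE, OTHER_UNIT = 50_000, 30       # lower-NRE alternative

volume = break_even_units(CUSTOM_NRE, CUSTOM_UNIT, OTHER_NRE, OTHER_UNIT)
```

Under these made-up numbers the custom IC only wins on total cost above the break-even volume; below it, the high NRE is never amortized.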

#### Why FPGAs? (3 / 5)

- Therefore the custom IC approach was only viable for products with very high volume (where NRE could be amortized) that were not sensitive to time to market (TTM)
- FPGAs were introduced as an alternative to custom ICs for implementing glue logic:
  - improved PC board density vs. discrete SSI/MSI components (within around 10x of custom ICs)
  - computer aided design (CAD) tools meant circuits could be implemented quickly relative to Application Specific ICs (ASICs): no physical layout process, no mask making, no IC manufacturing (3-6 months for these steps for a custom IC)
    - lowers NREs (Non Recurring Engineering)
    - shortens TTM (Time To Market)



#### Why FPGAs? (4 / 5)

- FPGAs continue to compete with custom ICs for special processing functions (and glue logic), but now also try to compete with microprocessors in dedicated and embedded applications
  - Performance advantage over microprocessors because circuits can be customized for the task at hand. Microprocessors must provide special functions in software (many cycles)
- MICRO: highest NRE; SW: fastest TTM
- ASIC: highest performance, worst TTM



#### Why FPGAs? (5 / 5)

- As Moore's Law continues, FPGAs work for more applications, because one chip can both hold more logic and run faster
- Can easily be "patched" vs. ASICs
- Perfect for courses:
  - Can change design repeatedly
  - Low TTM yet reasonable speed
- With Moore's Law, now can do full CS 152 project easily inside 1 FPGA



#### Administrivia

- Prerequisite Quiz Results
- Lab 1 due tomorrow
- How many bought the \$37 PRS Transmitter from behind the ASUC textbook desk? (Chem 1A, CS 61ABC, 160)
  - Can sell back to bookstore













#### More functionality for "free"?

- Given basic idea:
  - LUT built from RAM
  - Latches connected as shift register
- What other functions could be provided at very little extra cost?
  - Using CLB latches as little RAM vs. logic
  - Using CLB latches as shift register vs. logic
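Why these extra modes come nearly for free can be seen in a small software model: the same 16 stored bits serve as a truth table, a tiny RAM, or a shift register, depending only on how they are accessed. This is a sketch under that assumption; the class name `Lut4` and its methods are illustrative and do not describe actual Xilinx circuitry.

```python
class Lut4:
    """16 configuration bits behind a 4-bit address.

    The same storage can be viewed three ways:
      - logic: the bits are a truth table and the address is the 4 inputs
      - RAM: the bits are a 16 x 1 memory that can also be written
      - shift register: bits move one position per clock
    """

    def __init__(self, bits=None):
        self.bits = list(bits) if bits is not None else [0] * 16

    def read(self, addr: int) -> int:
        # Logic lookup and RAM read are the same operation.
        return self.bits[addr & 0xF]

    def write(self, addr: int, value: int) -> None:
        # RAM mode: overwrite one stored bit.
        self.bits[addr & 0xF] = value & 1

    def shift_in(self, value: int) -> int:
        # Shift-register mode: new bit enters at position 0,
        # the bit at position 15 falls out and is returned.
        out = self.bits[15]
        self.bits = [value & 1] + self.bits[:15]
        return out


# Logic mode: a 4-input AND is a table with a single 1 at address 0b1111.
and4 = Lut4([0] * 15 + [1])
```

Only the write path and the shift path are new; the storage and the read path are already needed for ordinary logic.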















#### Peer Instruction

- How would you place ASIC, FPGA, and Microprocessors+software from best to worst?
  - Performance?
  - Non Recurring Engineering?
  - Unit cost?
  - Time To Market?
  - 1. ASIC, FPGA, MICRO
  - 2. ASIC, MICRO, FPGA
  - 3. FPGA, ASIC, MICRO
  - 4. FPGA, MICRO, ASIC
  - 5. MICRO, ASIC, FPGA
  - 6. MICRO, FPGA, ASIC












#### Virtex-E Block RAM

- Flexible 4096-bit block with variable aspect ratio
  - 4096 x 1
  - 2048 x 2
  - 1024 x 4
  - 512 x 8
  - 256 x 16
- Increase memory depth or width by cascading blocks
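The aspect ratios above all describe the same 4096 bits, addressed with different word widths. The sketch below models that idea and estimates how many blocks a larger memory would need when cascaded. The names `BlockRam` and `blocks_needed` are illustrative, and the simple tiling rule is an assumption rather than a description of how the CAD tools actually allocate blocks.

```python
import math

BLOCK_BITS = 4096
# Allowed configurations from the slide, as width -> depth.
ASPECTS = {1: 4096, 2: 2048, 4: 1024, 8: 512, 16: 256}


class BlockRam:
    """One 4096-bit block viewed as depth x width words."""

    def __init__(self, width: int):
        if width not in ASPECTS:
            raise ValueError(f"unsupported width {width}")
        self.width = width
        self.depth = ASPECTS[width]
        self.bits = [0] * BLOCK_BITS

    def _base(self, addr: int) -> int:
        if not 0 <= addr < self.depth:
            raise IndexError(f"address {addr} out of range")
        return addr * self.width

    def write(self, addr: int, value: int) -> None:
        base = self._base(addr)
        for i in range(self.width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, addr: int) -> int:
        base = self._base(addr)
        return sum(self.bits[base + i] << i for i in range(self.width))


def blocks_needed(depth: int, width: int) -> int:
    """Fewest blocks for a depth x width memory, assuming blocks of one
    chosen aspect ratio are tiled side by side (for width) and stacked
    (for depth)."""
    return min(
        math.ceil(width / w) * math.ceil(depth / d)
        for w, d in ASPECTS.items()
    )
```

For example, a 1024 x 8 memory holds 8192 bits, so under this tiling rule it fits in two blocks, either as two 1024 x 4 blocks side by side or as two 512 x 8 blocks stacked.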






#### Clock Multiplication with DLLs

- Have faster internal clock relative to external clock source
  - Use 1 DLL for 2x multiplication
  - Combine 2 DLLs for 4x multiplication
- Reduce board EMI
  - Low-frequency clock externally, multiply clock on-chip



#### Clock Management Summary

- All-digital DLL implementation
  - Input noise rejection
  - 50/50 duty cycle correction
- Clock mirror provides system clock distribution
- Multiply input clock by 2x or 4x
- Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
- De-skew clock for fast setup, hold, or clock-to-out times
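The multiply and divide options above translate directly into a set of available on-chip frequencies. The following sketch lists them for a given input clock; the function name `derived_clocks` and the 50 MHz example input are assumptions for illustration, and whether a multiplier and a divider can be combined is not covered here.

```python
MULTIPLIERS = (2, 4)
DIVIDERS = (1.5, 2, 2.5, 3, 4, 5, 8, 16)


def derived_clocks(input_mhz: float) -> dict:
    """Frequencies (MHz) reachable from one input clock using a single
    multiply or a single divide setting from the summary list."""
    clocks = {f"x{m}": input_mhz * m for m in MULTIPLIERS}
    clocks.update({f"/{d}": input_mhz / d for d in DIVIDERS})
    return clocks


# Hypothetical 50 MHz board clock.
example = derived_clocks(50)
```

With this example input, the board only has to carry 50 MHz while the chip can run internally at 100 or 200 MHz, which is the EMI argument for multiplying on-chip.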




#### Virtex-E Family of Parts

Table 1: Virtex-E Field-Programmable Gate Array Family Members

| Device | System Gates | Logic Gates | CLB Array | Logic Cells | Differential I/O Pairs | User I/O | BlockRAM Bits | Distributed RAM Bits |
|---|---|---|---|---|---|---|---|---|
| XCV50E | 71,693 | 20,736 | 16 x 24 | 1,728 | | | 65,536 | 24,576 |
| XCV100E | 128,236 | 32,400 | 20 x 30 | 2,700 | 83 | 196 | 81,920 | 38,400 |
| XCV200E | 306,393 | 63,504 | | | | 284 | 114,688 | |
| XCV300E | 411,955 | 82,944 | 32 x 48 | 6,912 | | 316 | 131,072 | 98,304 |

#### Summary: Xilinx FPGAs

- How they differ from idealized array:
  - In addition to their use as general logic "gates", LUTs can alternatively be used as general purpose RAM or shift register
    - Each 4-LUT can become a 16x1-bit RAM array
  - Special circuitry to speed up "ripple carry" in adders and counters
    - Therefore adders assembled by the CAD tools operate much faster than adders built from gates and LUTs alone.
  - Many more wires, including tri-state capabilities.












