EE 552 2003w 2003-1-31
Lab 4: Fast Arithmetic, Pipelining, Packages, Interfacing
Please read the requirements for lab assignments.
In this lab, you will obtain greater speed in arithmetic through
the use of alternative number representations and pipelining.
Exercise
Area Measurement
Compile any design, such as the adder you used in the exercise of Lab 3.
To find out how many of the chip's logic cells this adder used, double-click
the "rpt" icon below the fitter, then search (^F) for "Total logic". You
will find a result that looks like:
Total logic cells used: 7/1152 ( 0%)
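If you collect this figure for several designs, a small script can save some searching. The following is only a sketch in Python: the function name parse_logic_cells is made up for illustration, and it assumes the report line looks like the sample above (possibly with different spacing).

```python
import re


def parse_logic_cells(report_text):
    """Return (used, total) from a 'Total logic cells used' line, or None.

    Assumes the 'used/total' pair follows the label, as in the sample above.
    """
    match = re.search(r"Total logic cells used:\s*(\d+)\s*/\s*(\d+)", report_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The sample line from this handout; a real report would be read from the .rpt file.
sample = "Total logic cells used: 7/1152 ( 0%)"
used, total = parse_logic_cells(sample)
print(f"{used} of {total} logic cells ({100 * used / total:.2f}%)")
```

The report rounds the percentage to a whole number, so computing it from the two counts gives a more useful figure for small designs.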
Lab
Part A: Growth of Delay
- Measure the minimum clock period for at least four different sizes of counters
(e.g. 32, 64, 128, 256, 512 bits) and use these measurements to estimate the
clock period as a function of the number of bits in the counter. To conserve IO
pins, direct only the 8 most significant bits to the output. (Hand
in only the calculation and the resulting function.)
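One way to turn the measurements into a function is a least-squares line fit. The sketch below is in Python and assumes the period grows roughly linearly with the number of bits, which is what a ripple carry chain would suggest; your data may call for a different form. The function name fit_line is hypothetical, and the numbers are placeholders, not real measurements.

```python
def fit_line(widths, periods):
    """Least-squares fit of period = slope * width + intercept."""
    n = len(widths)
    mean_x = sum(widths) / n
    mean_y = sum(periods) / n
    sxx = sum((x - mean_x) ** 2 for x in widths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(widths, periods))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder values only, NOT measured data: substitute your own results.
widths = [32, 64, 128, 256]             # counter sizes in bits
periods_ns = [10.0, 18.0, 34.0, 66.0]   # minimum clock periods in ns

slope, intercept = fit_line(widths, periods_ns)
print(f"period(n) ~ {slope:.3f} * n + {intercept:.3f} ns")
```

If the fitted line misses some of your points badly, that is worth a comment in the report rather than something to hide.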
Part B: Reducing Critical Paths
Carry-save arithmetic does not fully propagate carries, but rather, stores
results in a redundant format with 2 bits representing each bit of the
result. VHDL code for a carry-save adder is provided as an example.
Design a carry-save counter. If you had used a simple binary adder,
you would tie one adder input to "00001" and connect the output through
a register to the other adder input. Using the carry-save adder,
connect sum_part1 and sum_part2 through registers to adden2 and adden3.
To avoid running out of pins, you may want to send only the most
significant bits "off-chip".
- Again, express the minimum clock period as a function of the number of
bits in the counter.
A note for the curious: this carry-save arithmetic technique is also used
in fast multipliers.
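Before writing the VHDL, it can help to check the carry-save counter idea in software. The following Python model is a sketch, not the provided adder: it assumes sum_part1 holds the bitwise sums and sum_part2 holds the carries shifted left by one position (the provided VHDL may assign them the other way around), and the function names are made up for illustration.

```python
def carry_save_add(a, b, c, width):
    """Add three width-bit numbers without propagating carries.

    Returns two words whose ordinary sum equals a + b + c modulo 2**width.
    """
    mask = (1 << width) - 1
    sum_part1 = (a ^ b ^ c) & mask                            # per-bit sum
    sum_part2 = (((a & b) | (a & c) | (b & c)) << 1) & mask   # per-bit carry, moved up one place
    return sum_part1, sum_part2


def carry_save_count(cycles, width):
    """Model a counter: the two result words are registered and fed back."""
    reg1, reg2 = 0, 0
    for _ in range(cycles):
        # One addend is the constant 1; the other two are the registered outputs.
        reg1, reg2 = carry_save_add(1, reg1, reg2, width)
    return reg1, reg2


reg1, reg2 = carry_save_count(10, 5)
print(reg1, reg2, (reg1 + reg2) % 32)
```

Each result bit depends only on three input bits, so in this model no signal has to travel along the whole word within one clock cycle. The price is that the count is available only as two words that still have to be added to obtain an ordinary binary number.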
Part C: Configuring FPGAs/CPLDs
This is the last lab period to get Part C checked off from the previous
week.
Please include the requirements in this lab report.
Part D: Pipelines
Incorporate an unsigned 8 by 8 multiplier (16-bit result) into a design
using lpm_mult; read the Maxplus2 online documentation for details. Do
not use any EABs (embedded array blocks = memory) or the "*" operator
in this section. Additional documentation, for the curious,
is available at http://www.edif.org/lpmweb/
Include the library with:
library lpm;
use lpm.lpm_components.all;
Your design should include lpm_mult and pipeline registers
(D flip-flops) before and after it. The data input and output of your
multiplier must be registered in order for registered-performance timing
analysis to provide a report for your entire circuit. Measure the
total number of logic cells in the design. Throughput is the maximum
clock frequency (MHz). We will define latency as the minimum clock period
times the number of pipeline register stages, not counting the first one
(equivalently, the number of pipeline stages in lpm_mult plus one).
- With the pipeline generic parameter set to 0 and to 4 (1 and 5 pipeline
stages), compare the size, throughput, and latency of the two variants
of the design.
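The definitions above reduce to two short formulas, sketched here in Python. The function names are hypothetical and the clock periods are placeholders, not measured values; substitute the minimum periods reported by your own timing analysis.

```python
def throughput_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum clock period in ns."""
    return 1000.0 / min_period_ns


def latency_ns(pipeline_generic, min_period_ns):
    """Latency as defined in this lab: (lpm_mult pipeline stages + 1) * period."""
    return (pipeline_generic + 1) * min_period_ns


# Placeholder periods only, NOT real results: replace with your measurements.
variants = {0: 40.0, 4: 10.0}   # pipeline generic -> minimum clock period (ns)

for pipeline_generic, period in variants.items():
    print(
        f"pipeline={pipeline_generic}: "
        f"throughput {throughput_mhz(period):.1f} MHz, "
        f"latency {latency_ns(pipeline_generic, period):.1f} ns"
    )
```

Keep in mind that a deeper pipeline can improve throughput while making latency worse, so report both numbers along with the logic cell count.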
Part E: Runtimes for Behavioural vs. Post-layout Simulation
Using a stopwatch or the Unix "time" command, compare how long it takes
in Mentor Graphics vs. Maxplus2 to:
- compile the following code
- simulate the following code for 1000 cycles (or some smaller number if
you have a slow workstation)
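For steps that can be launched from a command line, the timing can also be scripted. This Python sketch measures wall-clock time around an arbitrary command; the helper name time_command is made up, and the command shown is only a stand-in so that the sketch runs anywhere. This handout does not give command lines for either tool, so you would have to supply your own, and a stopwatch remains the fallback for steps run from a graphical interface.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# Stand-in command (starts Python and exits); replace it with your own
# compile or simulate command line.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s (exit code {code})")
```

Run each measurement more than once if you can, since the first run may include one-time start-up costs.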
If you're not convinced that behavioural simulations can save time, try
increasing counterWidth.
If it runs too slowly on your workstation, decrease counterWidth.
Either way, if you change counterWidth, also change the
second parameter to conv_unsigned().
squares.vhd