Implement the following unsigned adders, each with a carry-in and a carry-out, using synthesizable VHDL:
Phase 1 (up to 3 bonus points if a progress report is submitted by Tuesday, March 3, 11:59 PM)
A. 256-bit Carry-Lookahead Adder
B. 256-bit Multilevel Carry-Select Adder based on 32-bit Ripple-Carry Adders (with the 32-bit adders implemented using the "+" operator in VHDL)
C. k-bit default adder obtained using the "+" operator in VHDL (referred to in the slides as a Ripple-Carry Adder or Carry-Chain Adder)
Phase 2 (up to 3 bonus points if a progress report is submitted by Wednesday, March 11, 11:59 PM)
D. k-bit Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Network Adder (working at least for k = 2^n)
OR
k-bit Conditional-Sum Adder (working at least for k = 2^n)
E. k-bit Carry-Skip Adder with a fixed block size b (where b is a divisor of k, chosen to minimize the product of latency and area)
Phase 3
F. Pipelined version of adder D with m pipeline stages, where m is chosen to maximize the throughput-to-area ratio.
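Before writing VHDL, a bit-level software reference model can be useful for checking a prefix adder's carries against the "+" result. The Python sketch below models a plain Kogge-Stone parallel prefix carry network, not the hybrid Brent-Kung/Kogge-Stone network required for adder D. The function name `kogge_stone_add` and the choice of treating the carry-in as an extra generate signal at prefix position 0 are illustrative assumptions, not part of the assignment.

```python
def kogge_stone_add(a, b, cin, k):
    """Bit-level model of a k-bit Kogge-Stone adder.

    Returns (sum, carry_out, number_of_prefix_levels).
    Prefix position 0 holds the carry-in (g = cin, p = 0);
    positions 1..k hold operand bits 0..k-1.
    """
    n = k + 1
    g = [cin & 1] + [(a >> i) & (b >> i) & 1 for i in range(k)]
    p = [0] + [((a >> i) ^ (b >> i)) & 1 for i in range(k)]
    G, P = g[:], p[:]
    d = 1
    levels = 0
    while d < n:
        # Combine (G, P)[j] with (G, P)[j - d]; positions j < d already cover
        # every lower position and pass through unchanged.
        G = [G[j] | (P[j] & G[j - d]) if j >= d else G[j] for j in range(n)]
        P = [P[j] & P[j - d] if j >= d else P[j] for j in range(n)]
        d *= 2
        levels += 1
    # After the network, G[i] is the carry into operand bit i (G[0] = cin).
    s = 0
    for i in range(k):
        s |= (p[i + 1] ^ G[i]) << i
    return s, G[k], levels
```

Because the carry-in occupies an extra prefix position, this model uses ceil(log2(k+1)) levels (5 for k=16); a design that injects the carry-in differently may need one level fewer.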
Verify designs C-F first for k=16, and then synthesize and implement them for k=256.
Optimize all designs as follows:
A, B, C and D - for minimum latency
E - for the minimum product of latency (in ns) times area (in CLB slices)
F - for the maximum throughput (in additions per second) to area (in CLB slices) ratio.
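For adder E, the candidate block sizes are the divisors of k, and a rough analytical model can narrow the list before running place and route on a few of them. The Python sketch below is only such a pre-screen. `skip_latency` assumes one unit of delay per ripple position and per skipped block (worst case: ripple through the first block, skip the middle blocks, ripple through the last block), and `skip_area` is a crude proxy of one cell per bit plus one skip multiplexer per block; both models and all function names are assumptions. FPGA carry chains make rippling much cheaper than this model assumes, so the measured post-place-and-route optimum may well differ.

```python
def divisors(k):
    """All block sizes b that divide k evenly."""
    return [b for b in range(1, k + 1) if k % b == 0]


def skip_latency(k, b):
    """Assumed worst-case delay in unit delays (valid for k/b >= 2):
    ripple through b-1 positions of the first block, skip k/b-2 middle
    blocks, ripple through b-1 positions of the last block."""
    return 2 * (b - 1) + (k // b - 2)


def skip_area(k, b):
    """Assumed area proxy: one cell per bit plus one skip mux per block."""
    return k + k // b


def best_block_size(k):
    """Divisor b (with at least two blocks) minimizing modeled latency * area."""
    candidates = [b for b in divisors(k) if k // b >= 2]
    return min(candidates, key=lambda b: skip_latency(k, b) * skip_area(k, b))
```

Under these assumptions `best_block_size(256)` returns 16, which only suggests where to start measuring.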
All performance measures (latency, throughput, area) should be calculated after placing and routing.
Bonus tasks
Task 1 (2 bonus points)
Using binary search, find the minimum value of k for which adder D has a smaller latency than adder C.
Task 2 (2 bonus points)
Using binary search, find the minimum value of k (and the optimum value of the block size b) for which adder E has a smaller product of latency times area than adder C.
Task 3 (2 bonus points)
Using binary search, find the minimum value of k (and the optimum number of pipeline stages m) for which adder F has a greater throughput-to-area ratio than adder C.
In Tasks 1-3, document all intermediate results obtained using binary search.
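The search in Tasks 1-3 can be organized as a standard lower-bound binary search that records every probe, so that all intermediate results are available for documentation. The Python sketch below assumes the comparison is monotone in k (false below the crossover, true from it upward); measured post-place-and-route data may not be perfectly monotone, in which case the result is a crossover point found by the search rather than a guaranteed minimum. The name `find_min_k` is hypothetical. In practice `predicate` would wrap one synthesis and place-and-route run per value of k (or a lookup of its recorded result), for example comparing the latencies of adders D and C.

```python
def find_min_k(predicate, lo, hi):
    """Smallest k in [lo, hi] for which predicate(k) is True.

    Assumes predicate is monotone in k. Returns (k_min or None, log), where
    log lists every (k, result) pair in the order it was evaluated.
    """
    log = []

    def check(k):
        result = predicate(k)
        log.append((k, result))  # keep every intermediate result
        return result

    if not check(hi):
        return None, log  # no crossover inside the searched range
    while lo < hi:
        mid = (lo + hi) // 2
        if check(mid):
            hi = mid  # answer is mid or below
        else:
            lo = mid + 1  # answer is above mid
    return lo, log
```

If adder D only supports k = 2^n, the same function can search over the exponent n instead, with the predicate evaluating the adders at k = 2**n.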
Design Requirements
Your VHDL code for EACH adder should consist of three levels of the design hierarchy:
I. synthesizable code of the adder itself, with a clearly defined adder boundary,
II. synthesizable test circuit with ALL inputs and outputs of the adder stored in registers, in order to facilitate static timing analysis of your circuit during implementation,
III. non-synthesizable testbench.
All adder types should:
- have the same entity declaration at level I,
- share the test circuit at level II,
- share the testbench at level III,
- use different test vector files at level III.
The total number of inputs and outputs of your circuit at level II should not exceed the number of I/O pins available in the smallest Xilinx Spartan 3 device capable of holding the adder (Hint: you can use, for example, a 32-bit input data bus to load data into the operand registers and a 32-bit output data bus to read out the contents of the output register).
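With a 32-bit input bus, each 256-bit operand has to be loaded as several words, and the host side (or the testbench) must split and reassemble values consistently. The Python sketch below assumes a hypothetical protocol in which words are transferred least significant first, one word per clock cycle; for k=256 that is 8 words per operand, so 16 load cycles for both operands and 8 read cycles for the sum, with the carry-in and carry-out assumed to use separate single-bit pins. The function names are illustrative.

```python
def to_words(x, k, w=32):
    """Split a k-bit value into ceil(k / w) w-bit words, least significant first."""
    n_words = -(-k // w)  # ceiling division
    word_mask = (1 << w) - 1
    return [(x >> (w * i)) & word_mask for i in range(n_words)]


def from_words(words, w=32):
    """Reassemble w-bit words (least significant first) into one integer."""
    x = 0
    for i, word in enumerate(words):
        x |= word << (w * i)
    return x
```

The word order is a design choice; whatever order is picked must match the register-loading logic at level II.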
Dataflow description is the preferred design style for the synthesizable portions of your code. Use behavioral description only where necessary (e.g., to describe flip-flops and registers).
Behavioral description is the preferred design style for your testbench. Your testbench should stimulate the circuit inputs using multiple representative test vectors (triggering the most critical path of the respective adder), read from a file specific to the given adder.
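Test vector files can be produced in software together with the expected results. The Python sketch below writes one vector per line in a hypothetical text format (`A B CIN S COUT`, operands and sum in fixed-width hexadecimal); the VHDL testbench would need to parse the same layout. The first vectors stress long carry propagation, which targets the critical path of ripple-style and carry-skip adders; for other structures (carry-select, prefix, conditional-sum) the vectors that actually excite the critical path should be derived from that adder's structure. Function names and the file format are assumptions.

```python
import random


def make_vectors(k, n_random=4, seed=1):
    """(a, b, cin) triples: carry-propagation stress cases, then random ones."""
    mask = (1 << k) - 1
    vectors = [
        (mask, 0, 1),     # carry-in ripples through every bit position
        (mask, 1, 0),     # carry generated at bit 0 and propagated to the MSB
        (mask, mask, 1),  # every position generates; all-ones sum, cout = 1
        (0, 0, 0),
    ]
    rng = random.Random(seed)  # fixed seed keeps the vector file reproducible
    for _ in range(n_random):
        vectors.append((rng.getrandbits(k), rng.getrandbits(k), rng.getrandbits(1)))
    return vectors


def format_line(a, b, cin, k):
    """One line: A B CIN S COUT, with A, B and S in fixed-width hex."""
    total = a + b + cin
    s, cout = total & ((1 << k) - 1), total >> k
    digits = (k + 3) // 4
    return f"{a:0{digits}X} {b:0{digits}X} {cin} {s:0{digits}X} {cout}"


def write_vectors(path, k):
    """Write all vectors for a k-bit adder to a text file."""
    with open(path, "w") as f:
        for a, b, cin in make_vectors(k):
            f.write(format_line(a, b, cin, k) + "\n")
```

For example, `write_vectors("vectors_C_k16.txt", 16)` would create a file for adder C at k=16 (the file name is only an example).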
Synthesize and implement all adders (levels I and II) for k=256, targeting Xilinx Spartan 3 FPGAs.
In each case, use the smallest device from the Spartan 3 family for which the number of CLB slices used does not exceed 80% of the total number of CLB slices. Perform static timing analysis after placing and routing, and determine the minimum clock period and the critical path for all circuits.
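The 80% rule amounts to a simple search over the family ordered by size. The Python sketch below encodes that rule; the slice counts in `SPARTAN3_SLICES` are the totals recalled from the Xilinx Spartan-3 family data sheet and should be double-checked there before use. The check covers slices only; the I/O-pin limit also matters and depends on the package, which this sketch does not model. Names are illustrative.

```python
# (device, total CLB slices), smallest first; verify against the Spartan-3 data sheet.
SPARTAN3_SLICES = [
    ("XC3S50", 768),
    ("XC3S200", 1920),
    ("XC3S400", 3584),
    ("XC3S1000", 7680),
    ("XC3S1500", 13312),
    ("XC3S2000", 20480),
    ("XC3S4000", 27648),
    ("XC3S5000", 33280),
]


def smallest_device(slices_used, max_percent=80):
    """Smallest device whose slice utilization stays at or below max_percent."""
    for device, total in SPARTAN3_SLICES:
        # Integer comparison avoids floating-point rounding at the boundary.
        if slices_used * 100 <= max_percent * total:
            return device
    return None  # design does not fit under the limit in any listed device
```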
Calculate the area of each adder by taking the area of the circuit at levels I and II together and subtracting the approximate area of the level II circuitry alone (the area of the surrounding registers).
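The area rule and the optimization metrics can be computed from post-place-and-route figures in one place. The Python sketch below assumes that latency equals the minimum clock period for a non-pipelined adder and m clock periods for an adder with m pipeline stages, and that throughput is one addition per clock period in both cases; these interpretations, and the class name `AdderResult`, are assumptions rather than definitions from the assignment.

```python
from dataclasses import dataclass


@dataclass
class AdderResult:
    name: str
    t_clk_ns: float      # minimum clock period after place and route (ns)
    slices_total: int    # CLB slices used by levels I and II together
    slices_wrapper: int  # approximate CLB slices of the level II registers alone
    stages: int = 1      # pipeline stages m (1 for non-pipelined adders)

    @property
    def area(self):
        """Adder area in slices: (levels I + II) minus the level II registers."""
        return self.slices_total - self.slices_wrapper

    @property
    def latency_ns(self):
        # Assumed: one clock period per pipeline stage.
        return self.stages * self.t_clk_ns

    @property
    def throughput(self):
        # Additions per second, assuming one new addition every clock period.
        return 1e9 / self.t_clk_ns

    @property
    def latency_times_area(self):
        return self.latency_ns * self.area

    @property
    def throughput_per_area(self):
        return self.throughput / self.area
```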
Deliverables (submitted using Blackboard):
1. ALL source files you have developed as part of the project (in a separate directory for each adder)
2. test vectors, and a short description of how these test vectors were generated. Hint: You may use software (your own or public domain) to generate your test vectors. Your test vectors should be chosen so as to trigger the most critical paths of the respective adder.
3. waveforms demonstrating the correct operation of each circuit for test vectors triggering the most critical path of a given adder
4. full reports from static timing analysis, and a textual description of the critical path using the notation from the lecture slides
5. a table summarizing the relative performance of each of the implemented adders (for k=256) in terms of:
   - minimum latency
   - speedup in latency compared to the adder implemented using the "+" operator
   - area (in CLB slices)
   - area increase compared to the adder implemented using the "+" operator
   - maximum throughput (in additions per second)
   - product of latency and area
   - throughput-to-area ratio
   - number of lines of VHDL code at level I
6. two-dimensional graphs showing the performance of all implemented adders (for k=256) in terms of:
   - latency vs. area
   - throughput vs. area
   Hint: Use area as the X coordinate, and latency or throughput as the Y coordinate.
7. graphs showing the results of your binary search for Tasks 1-3 (bonus):
   - Task 1: latency as a function of k for adders D and C
   - Task 2: product of latency and area as a function of k for adders E and C
   - Task 3: throughput-to-area ratio as a function of k for adders F and C
8. conclusions summarizing your recommendations regarding the choice of the best adder for the given optimization goal and operand size.
9. a list of problems and difficulties encountered, and any unexplained behavior of your designs or design tools.
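The relative columns of the summary table in deliverable 5 (speedup and area increase versus the "+" adder) and the line count of the level I code can be computed mechanically. In the Python sketch below, "area increase" is interpreted as a ratio to the baseline area (a difference in slices would be an equally valid reading), and a line of VHDL is counted if it is neither blank nor a comment-only `--` line; both conventions are assumptions, and the function names are illustrative.

```python
def relative_summary(results, baseline="C"):
    """results maps adder name -> (latency_ns, area_slices).

    Returns, per adder, the speedup and area ratio relative to the baseline
    adder built with the "+" operator.
    """
    base_latency, base_area = results[baseline]
    return {
        name: {"speedup": base_latency / latency, "area_ratio": area / base_area}
        for name, (latency, area) in results.items()
    }


def count_vhdl_lines(text):
    """Count lines that are neither blank nor comment-only ("--")."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("--")
    )
```

Whichever conventions are used for these columns should be stated alongside the table so the numbers can be compared across adders.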