Project 1

Final Specification

final report due: Wednesday, March 18, 2009, 11:59 PM (1/3 of the points off for each week of delay)
progress report for Phase 2 due: Wednesday, March 11, 2009, 11:59 PM [optional, up to 3 bonus points for working code]

 

Implement using synthesizable VHDL the following unsigned adders with carry in and carry out:

Phase 1 (up to 3 bonus points if progress report submitted by Tuesday, March 3, 11:59 PM)

A. 256-bit Carry-Lookahead Adder

B. 256-bit Multilevel Carry Select Adder based on 32-bit Ripple Carry Adders (with 32-bit adders implemented using the "+" sign in VHDL)

C. k-bit default adder obtained by using the "+" sign in VHDL (referred to in the slides as the Ripple Carry Adder or Carry-Chain Adder)

Phase 2 (up to 3 bonus points if progress report submitted by Wednesday, March 11, 11:59 PM)

        D. k-bit Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Network Adder (working at least for k=2^n)

            OR

          k-bit Conditional-Sum Adder (working at least for k=2^n)

        E. k-bit Carry-Skip Adder with the fixed block size b (where b is a divisor of k; b optimized for the minimum product of latency times area)

Phase 3

        F. pipelined version of adder D (with m pipeline stages, where m is optimized for the maximum throughput to area ratio).

 

Verify designs C-F first for k=16, and then synthesize and implement them for k=256.
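
When verifying at k=16, a bit-accurate software model of the adder can serve as a reference for the expected outputs. The following is a minimal sketch in Python, not part of the required VHDL; the function names `add_ref` and `ripple_model` are illustrative assumptions.

```python
def add_ref(a, b, cin, k):
    """Reference model: k-bit unsigned addition with carry in.

    Returns (sum, cout), where sum is the low k bits and cout is the carry out.
    """
    mask = (1 << k) - 1
    if not (0 <= a <= mask and 0 <= b <= mask and cin in (0, 1)):
        raise ValueError("operand out of range for a k-bit adder")
    total = a + b + cin
    return total & mask, total >> k


def ripple_model(a, b, cin, k):
    """Bit-by-bit ripple-carry model, useful as a cross-check of add_ref."""
    carry = cin
    result = 0
    for i in range(k):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i             # sum bit i
        carry = (x & y) | (carry & (x ^ y))        # carry into bit i+1
    return result, carry


# Example for k=16: 0xFFFF + 0x0001 wraps to 0 with carry out 1.
print(add_ref(0xFFFF, 0x0001, 0, 16))  # (0, 1)
```

The two models should agree for every input, so comparing them on random operands is a cheap sanity check before generating expected values for the testbench.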

Optimize all designs as follows:

        A, B, C and D - for minimum latency

        E - for the minimum product of latency (in ns) times area (in CLB slices)

        F - for the maximum throughput (in additions per second) to area (in CLB slices) ratio.

All performance measures (latency, throughput, area) should be calculated after placing and routing.
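
The optimization criteria above reduce to simple arithmetic on post-place-and-route numbers. The sketch below is one hedged way to compute them in Python. It assumes that a pipelined adder accepts one addition per clock cycle, so throughput is the reciprocal of the minimum clock period; the helper names and the sample measurements are hypothetical, not real results.

```python
def latency_area_product(latency_ns, slices):
    """Criterion for adder E: latency (ns) times area (CLB slices)."""
    return latency_ns * slices


def throughput_per_area(clock_period_ns, slices, adds_per_cycle=1):
    """Criterion for adder F: additions per second per CLB slice.

    Assumption: the pipeline accepts adds_per_cycle new additions each cycle.
    """
    throughput = adds_per_cycle * 1e9 / clock_period_ns  # additions per second
    return throughput / slices


def divisors(k):
    """Candidate carry-skip block sizes b: all divisors of k."""
    return [b for b in range(1, k + 1) if k % b == 0]


def best_block_size(measurements):
    """Pick b with the smallest latency*area product.

    measurements maps b -> (latency_ns, slices) taken from implementation reports.
    """
    return min(measurements, key=lambda b: latency_area_product(*measurements[b]))


# Hypothetical placeholder measurements, for illustration only.
sample = {4: (20.0, 300), 8: (15.0, 350), 16: (14.0, 500)}
print(divisors(16))             # [1, 2, 4, 8, 16]
print(best_block_size(sample))  # 8
```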
 

Bonus tasks

Task 1 (2 bonus points)

Using binary search, find the minimum value of k for which

   adder D has smaller latency than adder C.

Task 2 (2 bonus points)

Using binary search, find the minimum value of k (and the optimum value of the block size b) for which

   adder E has a smaller product of latency times area than adder C.

Task 3 (2 bonus points)

Using binary search, find the minimum value of k (and the optimum value of the number of pipeline stages m) for which

   adder F has a greater throughput to area ratio than adder C.

In Tasks 1-3, document all intermediate results obtained using binary search.
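
The search in Tasks 1-3 can be organized as in the following Python sketch, which also records every intermediate result for documentation. It rests on two assumptions: the comparison is monotonic in k (once the candidate adder wins, it keeps winning for larger k), and `is_better(k)` is a hypothetical callback standing in for a full synthesis and implementation run of both adders at size k. The threshold used in the example is a made-up stand-in, not a measured value.

```python
def min_k_binary_search(candidates, is_better, log=None):
    """Return the smallest k in candidates for which is_better(k) is True.

    candidates must be sorted in ascending order. Each probe is appended to
    log as (k, result) so that all intermediate results can be reported.
    Returns None if is_better is False for every candidate.
    """
    lo, hi = 0, len(candidates) - 1
    answer = None
    while lo <= hi:
        mid = (lo + hi) // 2
        k = candidates[mid]
        result = is_better(k)
        if log is not None:
            log.append((k, result))
        if result:
            answer = k      # k works; look for a smaller one
            hi = mid - 1
        else:
            lo = mid + 1    # k fails; the answer must be larger
    return answer


# Illustration with a fake predicate (NOT a real measurement).
trace = []
k_min = min_k_binary_search(list(range(8, 257, 8)), lambda k: k >= 48, trace)
print(k_min)  # 48
print(trace)  # [(128, True), (64, True), (32, False), (48, True), (40, False)]
```

If measured results turn out not to be monotonic in k, the search result should be confirmed by checking a few neighboring values of k.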

 

Design Requirements

  1. Your VHDL code for EACH adder should consist of three levels of the design hierarchy:
       I. synthesizable code of the adder itself, with a clearly defined adder boundary,
      II. synthesizable test circuit with ALL inputs and outputs of the adder stored in registers, in order to facilitate static timing analysis of your circuit during implementation,
     III. non-synthesizable testbench.

  2. All adder types
     - should have the same entity declaration at level I
     - share the test circuit at level II,
     - share the testbench at level III
     - use different test vector files at level III.

  3. The total number of inputs and outputs of your circuit at level II should be limited by the total number of I/O pins available in the smallest Xilinx Spartan 3 device capable of holding the adder (Hint: You can use, for example, a 32-bit input data bus to load data into the operand registers and a 32-bit output data bus to read out the contents of the output register).

  4. Dataflow description is a preferred design style for synthesizable portions of your code. Use behavioral description only if necessary (e.g., for description of flip-flops and registers).

  5. Behavioral description is a preferred design style for your testbench. Your testbench should stimulate circuit inputs using multiple representative test vectors (triggering the most critical path of a respective adder) read from a file specific to a given adder.

  6. Synthesize and implement all adders (levels I and II) for k=256 targeting

    In each case, use the smallest device from the Spartan family, for which the number of CLB slices does not exceed 80% of the total number of CLB slices. Perform static timing analysis after placing and routing, and determine the minimum clock period and critical path for all circuits.

  7. The area of each adder should be calculated by taking the combined area of the circuit at levels I and II, and subtracting the approximate area of the circuit at level II (the area of the surrounding registers).
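
Requirements 6 and 7 can be captured in two small helpers, sketched here in Python. The device names and slice counts in the example are hypothetical placeholders; actual values should be taken from the Xilinx data sheet of the targeted family. The wrapper area passed to `adder_area` is assumed to be your own estimate of the level II registers.

```python
def smallest_fitting_device(devices, used_slices, max_percent=80):
    """Return the name of the smallest device whose utilization stays within the limit.

    devices is a list of (name, total_slices). Integer arithmetic is used so
    that a design at exactly the limit is accepted without rounding issues.
    Returns None if no device is large enough.
    """
    for name, total in sorted(devices, key=lambda d: d[1]):
        if used_slices * 100 <= max_percent * total:
            return name
    return None


def adder_area(total_slices, wrapper_slices):
    """Adder area: area of levels I and II minus the approximate level II area."""
    return total_slices - wrapper_slices


# Placeholder device table (hypothetical names and sizes).
devices = [("dev_a", 1000), ("dev_b", 2000), ("dev_c", 4000)]
print(smallest_fitting_device(devices, 900))  # dev_b (900 > 80% of 1000)
print(adder_area(900, 390))                   # 510
```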


Deliverables (submitted using Blackboard):

1. ALL source files you have developed as a part of the project (in a separate directory for each adder)

2. test vectors, and a short description of how these test vectors were generated. Hint: You may use software (your own or public domain) to generate your test vectors. Your test vectors should be chosen in such a way as to trigger the most critical paths of the respective adder.

3. waveforms demonstrating the correct operation of each circuit for test vectors triggering the most critical path of a given adder

4. full reports from static timing analysis and a textual description of the critical path in terms of the notation used in the lecture slides

5. table summarizing the relative performance of each of the implemented adders (for k=256) in terms of

6. two-dimensional graph showing the performance of all implemented adders (for k=256) in terms of

Hint: Use area as your X-coordinate, and latency/throughput as your Y coordinate.

7. graphs showing the results of your binary search for Tasks 1-3 (bonus)

8. conclusions summarizing your recommendations regarding the choice of the best adder for the given optimization goal and operand size.

9. list of encountered problems & difficulties, and unexplained behavior of your designs or design tools.
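
For the test vectors requested in deliverable 2, a small script is usually enough. The Python sketch below generates a few vectors with long carry propagation plus some random ones, and computes the expected sum and carry out. The one-line hex format (`a b cin sum cout`) and the function names are assumptions, to be adapted to whatever your testbench reads. Which vectors actually exercise the critical path depends on the adder architecture and should be confirmed against the static timing report.

```python
import random


def make_vectors(k, n_random=4, seed=0):
    """Return a list of (a, b, cin, sum, cout) tuples for a k-bit adder."""
    mask = (1 << k) - 1
    cases = [
        (mask, 0, 1),     # carry in propagates through every bit position
        (mask, 1, 0),     # carry generated at bit 0 propagates to carry out
        (0, 0, 0),        # no carries at all
        (mask, mask, 1),  # every position generates a carry
    ]
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    for _ in range(n_random):
        cases.append((rng.getrandbits(k), rng.getrandbits(k), rng.getrandbits(1)))
    vectors = []
    for a, b, cin in cases:
        total = a + b + cin
        vectors.append((a, b, cin, total & mask, total >> k))
    return vectors


def format_vector(vec, k):
    """Format one vector as zero-padded uppercase hex fields."""
    a, b, cin, s, cout = vec
    width = (k + 3) // 4  # hex digits needed for k bits
    return f"{a:0{width}X} {b:0{width}X} {cin} {s:0{width}X} {cout}"


def write_vectors(path, k, **kwargs):
    """Write one formatted vector per line to the given file."""
    with open(path, "w") as f:
        for vec in make_vectors(k, **kwargs):
            f.write(format_vector(vec, k) + "\n")


print(format_vector(make_vectors(16)[0], 16))  # FFFF 0000 1 0000 1
```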