Project 1

Due Friday, March 28, 12:00 noon (extended from Tuesday, March 25, 7:20 pm). NO LATE PROJECTS ACCEPTED.

Submission through WebCT

 

REFERENCE MATERIAL

Two papers may be helpful for understanding conditional-sum adders: the original paper on conditional-sum adders, and a more recent paper with helpful explanatory figures (Fig. 1 and Fig. 2). The papers are:

Sklansky: Conditional-Sum
Cho: Conditional-Sum

In addition, two web resources may be helpful:

Arithmetic Simulator by Koren.
Click on "Conditional-Sum Adder". You can run an 8-bit example by setting A=10100110, B=00111001, Total Bits=8, Individual Adder Size = 1 1 1 1 1 1 1 1 (include the spaces), and leaving the remaining settings at their defaults. Press "Done" and wait a few seconds.

8-bit Conditional-Sum Demonstrator
Enter input data into Ai and Bi, then press "GO" only once and watch the program run. The simulation time-steps through the stages of the conditional-sum adder.

A student in class also pointed out another paper on conditional-sum adders:

Han: Conditional-Sum

TIP: When writing the VHDL code for the conditional-sum adder, you should be able to use FOR ... GENERATE statements to help with the implementation.
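The conditional-sum scheme can be captured in a short bit-level Python reference model, which may be useful as a golden model while debugging the VHDL. This is a sketch under stated assumptions: k is a power of two, each block keeps a (sum, carry-out) pair for an incoming carry of 0 and of 1, and each level merges adjacent blocks by letting the lower block's carry select the upper block's pair (the multiplexer stage of the conditional-sum method). The outer `while` loop (one iteration per level) and the inner loop over block pairs mirror the nested FOR ... GENERATE structure suggested in the tip. The name `conditional_sum_add` is hypothetical.

```python
def conditional_sum_add(x: int, y: int, cin: int, k: int) -> tuple[int, int]:
    """Bit-level conditional-sum addition of two k-bit operands.

    Returns (sum mod 2**k, carry-out). k must be a power of two.
    """
    assert k >= 1 and k & (k - 1) == 0, "k must be a power of two"

    # Level 0: for each bit, precompute (sum, carry) for carry-in 0 and 1.
    blocks = []
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        if_c0 = (xi ^ yi, xi & yi)        # incoming carry = 0
        if_c1 = (1 ^ xi ^ yi, xi | yi)    # incoming carry = 1
        blocks.append((if_c0, if_c1))

    width = 1  # bits per block at the current level
    while len(blocks) > 1:
        merged = []
        # Merge adjacent (lower, upper) block pairs.
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            pair = []
            for c in (0, 1):
                lo_sum, lo_carry = lo[c]
                hi_sum, hi_carry = hi[lo_carry]   # mux: lower carry selects upper pair
                pair.append((lo_sum | (hi_sum << width), hi_carry))
            merged.append(tuple(pair))
        blocks = merged
        width *= 2

    # The actual carry-in makes the final selection.
    return blocks[0][cin]
```

On the 8-bit operands from the Koren simulator example, `conditional_sum_add(0b10100110, 0b00111001, 0, 8)` returns `(0b11011111, 0)`, i.e. 166 + 57 = 223 with no carry-out. A k-bit adder needs log2(k) merge levels, which is also the number of multiplexer stages on the critical path.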

PART 1 - Minimum Latency Adders (Combinational Logic Only)

Implement, compare and contrast the FIVE adders below:

A. Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Network Adder

B. Conditional-Sum Adder

C. Ripple-Carry Adder (using sum = x xor y xor cin, cout = y.cin + x.cin + x.y for full-adders)

D. Carry-Lookahead Adder (using 4-bit CLA modules)

E. Default Adder using "+" sign
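For adder A, one common form of Brent-Kung/Kogge-Stone hybrid is assumed in the sketch below (your lecture notes may place the levels differently): m Brent-Kung levels reduce the k bit-level (g, p) pairs to k/2^m group pairs, a Kogge-Stone network computes the prefixes of those group pairs, and m Brent-Kung levels fill in the remaining prefixes. For m < log2(k) this prefix graph has log2(k) + m levels; m = 0 gives pure Kogge-Stone and m = log2(k) gives pure Brent-Kung. The function name and the `bk_levels` parameter are hypothetical, and the model checks the network's logic, not its timing.

```python
def hybrid_prefix_add(x: int, y: int, cin: int, k: int, bk_levels: int = 1) -> tuple[int, int]:
    """Parallel-prefix addition with a Brent-Kung / Kogge-Stone hybrid network.

    Returns (sum mod 2**k, carry-out). k must be a power of two.
    """
    assert k >= 1 and k & (k - 1) == 0, "k must be a power of two"
    assert 0 <= bk_levels <= k.bit_length() - 1

    def dot(hi, lo):
        # Prefix operator (g_hi, p_hi) o (g_lo, p_lo); lo is the adjacent lower span.
        return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

    bits = [((x >> i) & 1, (y >> i) & 1) for i in range(k)]
    p = [a ^ b for a, b in bits]                    # bit propagate signals
    gp = [(a & b, a ^ b) for a, b in bits]          # (generate, propagate) per bit

    # Brent-Kung reduction: after level l, node i with (i+1) % 2**(l+1) == 0
    # holds the group (G, P) of the 2**(l+1) bits ending at bit i.
    for l in range(bk_levels):
        step = 1 << l
        for i in range(2 * step - 1, k, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])

    # Kogge-Stone on the k / 2**bk_levels group nodes.
    sparse = list(range((1 << bk_levels) - 1, k, 1 << bk_levels))
    d = 1
    while d < len(sparse):
        nxt = gp[:]                                 # all nodes of a level update in parallel
        for n in range(d, len(sparse)):
            nxt[sparse[n]] = dot(gp[sparse[n]], gp[sparse[n - d]])
        gp = nxt
        d *= 2

    # Brent-Kung expansion: complete the prefixes of the remaining nodes.
    for l in reversed(range(bk_levels)):
        step = 1 << l
        for i in range(3 * step - 1, k, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])

    # gp[i] now covers bits 0..i, so the carry into bit i+1 is G | (P & cin).
    carries = [cin] + [g | (pp & cin) for g, pp in gp]
    s = 0
    for i in range(k):
        s |= (p[i] ^ carries[i]) << i
    return s, carries[k]
```

For k = 256, `bk_levels=1` models a 9-level prefix graph. Every value of `bk_levels` from 0 to log2(k) should give the same result as `x + y + cin`, which makes the model handy for checking a generic VHDL description across different hybrid splits.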

For each adder type, develop a combinational carry-propagate adder optimized for minimum latency (i.e., no pipeline registers, only combinational logic).

Make an effort to write the code and perform the analysis in a generic way, independent of the operand size k (for values of k that "make sense" for the above architectures).
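Reference models parameterized by k can also cover adders C and D. The full adder below uses exactly the equations given for adder C. For adder D, the sketch assumes 4-bit CLA modules whose carry-outs are chained from module to module; each module also returns its group generate and propagate signals, which a second lookahead level could use instead (not shown). All function names are hypothetical.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    # sum = x xor y xor cin, cout = y.cin + x.cin + x.y
    return x ^ y ^ cin, (y & cin) | (x & cin) | (x & y)


def ripple_carry_add(x: int, y: int, cin: int, k: int) -> tuple[int, int]:
    s, c = 0, cin
    for i in range(k):
        si, c = full_adder((x >> i) & 1, (y >> i) & 1, c)
        s |= si << i
    return s, c


def cla4(x4: int, y4: int, cin: int) -> tuple[int, int, int, int]:
    """One 4-bit CLA module: returns (sum, cout, group_g, group_p)."""
    g = [((x4 >> i) & 1) & ((y4 >> i) & 1) for i in range(4)]
    p = [((x4 >> i) & 1) ^ ((y4 >> i) & 1) for i in range(4)]
    # Each carry is a two-level sum of products of g, p and cin (no rippling inside).
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    c4 = group_g | (group_p & cin)
    s = 0
    for i, ci in enumerate((cin, c1, c2, c3)):
        s |= (p[i] ^ ci) << i
    return s, c4, group_g, group_p


def cla_add(x: int, y: int, cin: int, k: int) -> tuple[int, int]:
    """k-bit adder built from k/4 CLA modules, carry chained between modules."""
    assert k % 4 == 0
    s, c = 0, cin
    for blk in range(k // 4):
        s4, c, _, _ = cla4((x >> 4 * blk) & 0xF, (y >> 4 * blk) & 0xF, c)
        s |= s4 << 4 * blk
    return s, c
```

Both functions take k as a parameter, so the same model can be run for k = 256 and for smaller sizes while developing the generic VHDL.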

Nevertheless, make sure that your code supports, and your conclusions hold for, at least the following operand size:

  • k = 256 bits

Assume that each adder is part of a bigger system located on the same chip. As a result, all operands are generated on the chip and all results are consumed on the chip, without crossing the boundary of the integrated circuit.

Implement your adders using synthesizable RTL VHDL code. Perform synthesis, implementation, and static timing analysis for Xilinx Spartan 3 FPGAs.
 

Design Requirements

  1. Your VHDL code for EACH adder should consist of three levels of design hierarchy:
      I. synthesizable code of an adder itself with a clearly defined adder boundary,
     II.  synthesizable test circuit with ALL inputs and outputs of an adder stored in registers in order to facilitate static timing analysis of your circuit during implementation,
     III. non-synthesizable testbench.

  2. All adder types should:
     - have the same entity declaration at level I
     - use the same test circuit at level II
     - use the same testbench at level III
     - use the same test vector files at level III.

  3. The total number of inputs and outputs of your circuit at level II should be limited by the total number of I/O pins available in the smallest Xilinx Spartan 3 device capable of holding the adder (Hint: at level II you can use, for example, a 32-bit input data bus to load data into the operand registers and a 32-bit output data bus to read out the contents of the output register).
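The 32-bit bus in the hint means each operand is loaded, and the result read back, in several bus transfers. The sketch below shows the word slicing a level II test circuit would perform, assuming the least-significant word is transferred first (the ordering and the function names are assumptions). For k = 256, each operand takes 8 transfers and the 257-bit result (sum plus carry-out) takes 9.

```python
def to_words(value: int, nbits: int, bus: int = 32) -> list[int]:
    """Split an nbits-wide value into ceil(nbits / bus) words, least-significant first."""
    nwords = -(-nbits // bus)              # ceiling division
    mask = (1 << bus) - 1
    return [(value >> (bus * w)) & mask for w in range(nwords)]


def from_words(words: list[int], bus: int = 32) -> int:
    """Reassemble words produced by to_words."""
    value = 0
    for w, word in enumerate(words):
        value |= word << (bus * w)
    return value
```

With a 32-bit input bus, a 32-bit output bus, and a few control and address lines, the level II circuit needs far fewer pins than the 2k + 1 inputs and k + 1 outputs of the bare adder.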

  4. Dataflow description is a preferred design style for synthesizable portions of your code. Use behavioral description only if necessary (e.g., for description of flip-flops and registers).

  5. Behavioral description is a preferred design style for your testbenches. Your testbenches should stimulate circuit inputs using multiple representative test vectors read from a file common to all adder types.

 

Deliverables (submitted through WebCT):

Create a folder called "part_1" to submit through WebCT. In this folder, create five subfolders: "adder_a", "adder_b", ..., "adder_e". In each subfolder, include the following:

1. ALL source files for that adder architecture. You can copy the shared test circuit and testbench separately into each subfolder.

2. Input test vector file: adder_input.txt. Each line should list an input x, an input y, a carry-in cin, the expected output sum, and the expected carry-out cout. The file should have at least 32 lines of (somewhat) random test vectors, one test vector per line. Hint: You may use software (your own or public domain) to generate your test vectors. Choose your test vectors so that they exercise the critical paths of all implemented adders.
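The hint about generating vectors in software could be followed with a short Python script like the one below. The file format (space-separated hexadecimal fields in the order x, y, cin, sum, cout) is an assumption, chosen because a VHDL testbench can read hexadecimal fields with `hread`; adjust it to whatever your testbench parses. The first vectors are hand-picked to exercise the longest carry-propagation paths; the rest are seeded random values. All function names are hypothetical.

```python
import random


def make_vector(x: int, y: int, cin: int, k: int) -> tuple[int, int, int, int, int]:
    """Return (x, y, cin, expected sum, expected cout) for a k-bit addition."""
    total = x + y + cin
    return x, y, cin, total & ((1 << k) - 1), total >> k


def corner_cases(k: int) -> list[tuple[int, int, int]]:
    ones = (1 << k) - 1
    alt = sum(1 << i for i in range(0, k, 2))      # ...0101 bit pattern
    return [
        (ones, 0, 1),                      # carry-in propagates through all k bits
        (ones, 1, 0),                      # carry generated at bit 0 reaches cout
        (alt, ones ^ alt, 1),              # every bit propagates, mixed operand bits
        (1 << (k - 1), 1 << (k - 1), 0),   # carry generated only at the MSB
        (ones, ones, 1),                   # every bit generates
        (0, 0, 0),                         # no activity
    ]


def generate_vectors(k: int = 256, n_random: int = 26, seed: int = 1):
    rng = random.Random(seed)                      # fixed seed: reproducible file
    ops = corner_cases(k)
    ops += [(rng.getrandbits(k), rng.getrandbits(k), rng.getrandbits(1))
            for _ in range(n_random)]
    return [make_vector(x, y, c, k) for x, y, c in ops]


def format_line(vec, k: int) -> str:
    digits = -(-k // 4)                            # hex digits per k-bit field
    x, y, cin, s, cout = vec
    return f"{x:0{digits}X} {y:0{digits}X} {cin} {s:0{digits}X} {cout}"


def write_vectors(path: str, k: int = 256) -> None:
    with open(path, "w") as f:
        for vec in generate_vectors(k):
            f.write(format_line(vec, k) + "\n")
```

Calling `write_vectors("adder_input.txt")` produces 32 lines (6 corner cases plus 26 random vectors); raise `n_random` for more coverage.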

3. Output test vector file: adder_output.txt. Each line should list the expected output sum, the expected carry-out cout, the actual output sum, and the actual carry-out, with one line per input test vector (at least 32 lines). This file should be produced by post-place-and-route simulation.
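Checking the output file by hand is error-prone; a small script can flag every line whose actual result differs from the expected one. This sketch assumes the four fields are written as space-separated hexadecimal values in the order listed above, and it treats unparseable lines (for example, X or U values from a timing simulation) as failures. `check_output` is a hypothetical name.

```python
def check_output(lines) -> list[tuple[int, str]]:
    """Return (line number, line) for every line whose actual result is wrong."""
    mismatches = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue                                  # skip blank lines
        try:
            exp_sum, exp_cout, act_sum, act_cout = (int(f, 16) for f in fields)
        except ValueError:                            # wrong field count or X/U/Z values
            mismatches.append((n, line.strip()))
            continue
        if (exp_sum, exp_cout) != (act_sum, act_cout):
            mismatches.append((n, line.strip()))
    return mismatches
```

For example, `with open("adder_output.txt") as f: print(check_output(f))` prints an empty list when every vector passed.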

4. Synthesis report.

5. Implementation report.

6. Static timing analysis report.

7. Waveforms demonstrating correct operation of the circuit pre-synthesis. These should be in .awf or .xlf format.

8. Waveforms demonstrating correct operation of the circuit post-place-and-route. These should be in .awf or .xlf format.


In the "part_1" folder include a project report. The report should contain:

1. A table summarizing the relative performance of each of the implemented adders in terms of:
  • device used
  • minimum latency in nanoseconds
  • maximum clock frequency (i.e., 1/minimum latency)
  • speedup in latency compared to the ripple-carry adder
  • area (in CLB slices) after place-and-route
  • area increase compared to the ripple-carry adder
  • maximum throughput (number of additions per second)
  • lines of VHDL code at level I
  • product: latency * area
  • ratio: throughput / area
2. A two-dimensional graph showing the performance of all adders in terms of area versus latency. Put area as the X-coordinate and latency as the Y-coordinate.

3. Conclusions summarizing your recommendations regarding the choice of the best adder.

4. Any problems or issues you encountered with your designs or design tools.
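Most columns of the table are derived from a few measured numbers per adder: latency, slice count, and level I line count. The sketch below computes them as the list defines them, under two assumptions: a non-pipelined adder completes one addition per minimum latency, and "area increase" is reported as a ratio to the ripple-carry adder (a difference would be equally valid). The class and function names are hypothetical, and all inputs must come from your own reports.

```python
from dataclasses import dataclass


@dataclass
class AdderResult:
    name: str
    device: str
    latency_ns: float      # from the static timing report
    slices: int            # from the place-and-route report
    vhdl_lines: int        # level I source only


def part1_table(results: list[AdderResult], baseline: str = "ripple-carry") -> list[dict]:
    base = next(r for r in results if r.name == baseline)
    rows = []
    for r in results:
        throughput = 1e9 / r.latency_ns           # additions per second (latency in ns)
        rows.append({
            "adder": r.name,
            "device": r.device,
            "latency_ns": r.latency_ns,
            "fmax_MHz": 1e3 / r.latency_ns,       # 1 / latency, expressed in MHz
            "speedup_vs_rca": base.latency_ns / r.latency_ns,
            "slices": r.slices,
            "area_ratio_vs_rca": r.slices / base.slices,
            "throughput_per_s": throughput,
            "vhdl_lines": r.vhdl_lines,
            "latency_x_area": r.latency_ns * r.slices,
            "throughput_per_slice": throughput / r.slices,
        })
    return rows
```

The (slices, latency_ns) pair of each row is one point of the area-versus-latency graph.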



PART 2 - Pipelined Adders (Combinational Logic and Pipeline Registers)

Now we will pipeline a few of the adders to increase throughput (at the expense of extra area). Design pipelined architectures for the following adders, placing a pipeline register at each natural "stage" of the adder. Feel free to use between 4 and 10 pipeline stages, whichever is most efficient for your design. Implement, compare and contrast the TWO adders below:

A. Hybrid Brent-Kung/Kogge-Stone Parallel Prefix Network Adder

B. Conditional-Sum Adder

C. Ripple-Carry Adder (using sum = x xor y xor cin, cout = y.cin + x.cin + x.y for full-adders). Update: you do not need to implement the pipelined ripple-carry adder (you still need to implement the non-pipelined version in Part 1).

For each adder type, develop a pipelined carry-propagate adder optimized for maximum throughput.
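A cycle-level Python model can help check latency-in-cycles and throughput reasoning before writing the pipelined VHDL. The sketch below assumes a conditional-sum adder with one register stage after the bit-level precomputation and one after each merge level, with the final carry-in selection folded into the last stage; for k = 256 that gives 1 + 8 = 9 stages, within the suggested 4 to 10. This stage placement, and every name in the code, is an assumption, not a required partition.

```python
def cs_init(op):
    """Stage 0: per-bit (sum, carry) pairs for carry-in 0 and 1."""
    x, y, cin, k = op
    blocks = []
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        blocks.append(((xi ^ yi, xi & yi), (1 ^ xi ^ yi, xi | yi)))
    return blocks, 1, cin


def cs_merge(state):
    """One conditional-sum level: the lower block's carry selects the upper block's pair."""
    blocks, width, cin = state
    merged = []
    for lo, hi in zip(blocks[0::2], blocks[1::2]):
        merged.append(tuple(
            (lo[c][0] | (hi[lo[c][1]][0] << width), hi[lo[c][1]][1]) for c in (0, 1)))
    return merged, 2 * width, cin


def cs_merge_and_select(state):
    blocks, _, cin = cs_merge(state)
    return blocks[0][cin]                          # (sum, carry-out)


class Pipeline:
    """Stage functions separated by registers; clock() models one rising edge."""

    def __init__(self, stages):
        self.stages = stages
        self.regs = [None] * len(stages)           # register at the output of each stage

    def clock(self, operands):
        # Update from the last stage backwards so each stage reads last cycle's value.
        for s in range(len(self.stages) - 1, 0, -1):
            prev = self.regs[s - 1]
            self.regs[s] = None if prev is None else self.stages[s](prev)
        self.regs[0] = None if operands is None else self.stages[0](operands)
        return self.regs[-1]


def conditional_sum_pipeline(k: int) -> Pipeline:
    levels = k.bit_length() - 1                    # log2(k) merge levels
    assert levels >= 1 and k == 1 << levels, "k must be a power of two >= 2"
    return Pipeline([cs_init] + [cs_merge] * (levels - 1) + [cs_merge_and_select])


def run(pipe: Pipeline, operand_list):
    """Apply one operand set per cycle, then flush; returns the output of every cycle."""
    flush = [None] * (len(pipe.stages) - 1)
    return [pipe.clock(op) for op in list(operand_list) + flush]
```

With S stages, the first S - 1 outputs of `run` are `None` and the result for the i-th operand set appears at index S - 1 + i; after the pipeline fills, one sum is produced every clock cycle, which is the throughput gain pipelining buys.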

Make an effort to write the code and perform the analysis in a generic way, independent of the operand size k (for values of k that "make sense" for the above architectures).

Nevertheless, make sure that your code supports, and your conclusions hold for, at least the following operand size:

  • k = 256 bits

Assume that each adder is part of a bigger system located on the same chip. As a result, all operands are generated on the chip and all results are consumed on the chip, without crossing the boundary of the integrated circuit.

Implement your adders using synthesizable RTL VHDL code. Perform synthesis, implementation, and static timing analysis for Xilinx Spartan 3 FPGAs.
 

Design Requirements

  1. Your VHDL code for EACH adder should consist of three levels of design hierarchy:
      I. synthesizable code of an adder itself with a clearly defined adder boundary,
     II.  synthesizable test circuit with ALL inputs and outputs of an adder stored in registers in order to facilitate static timing analysis of your circuit during implementation,
     III. non-synthesizable testbench.

  2. All adder types should:
     - have the same entity declaration at level I
     - use the same test circuit at level II
     - use the same testbench at level III (with slight modifications to account for the pipeline latency in clock cycles)
     - use the same test vector files at level III.

  3. The total number of inputs and outputs of your circuit at level II should be limited by the total number of I/O pins available in the smallest Xilinx Spartan 3 device capable of holding the adder (Hint: at level II you can use, for example, a 32-bit input data bus to load data into the operand registers and a 32-bit output data bus to read out the contents of the output register).

  4. Dataflow description is a preferred design style for synthesizable portions of your code. Use behavioral description only if necessary (e.g., for description of flip-flops and registers).

  5. Behavioral description is a preferred design style for your testbenches. Your testbenches should stimulate circuit inputs using multiple representative test vectors read from a file common to all adder types.

 

Deliverables (submitted through WebCT):

Create a folder called "part_2" to submit through WebCT. In this folder, create two subfolders: "adder_a" and "adder_b". In each subfolder, include the following:

1. ALL source files for that adder architecture. You can copy the shared test circuit and testbench separately into each subfolder.

2. Input test vector file: adder_input.txt. Each line should list an input x, an input y, a carry-in cin, the expected output sum, and the expected carry-out cout. The file should have at least 32 lines of (somewhat) random test vectors, one test vector per line. Hint: You may use software (your own or public domain) to generate your test vectors. Choose your test vectors so that they exercise the critical paths of all implemented adders. This file can be the same as the one used in Part 1.

3. Output test vector file: adder_output.txt. Each line should list the expected output sum, the expected carry-out cout, the actual output sum, and the actual carry-out, with one line per input test vector (at least 32 lines). This file should be produced by post-place-and-route simulation.

4. Synthesis report.

5. Implementation report.

6. Static timing analysis report.

7. Waveforms demonstrating correct operation of the circuit pre-synthesis. These should be in .awf or .xlf format.

8. Waveforms demonstrating correct operation of the circuit post-place-and-route. These should be in .awf or .xlf format.


In the "part_2" folder include a project report. The report should contain:

1. A table summarizing the relative performance of each of the implemented adders in terms of:
  • device used
  • minimum latency in clock cycles
  • minimum latency in nanoseconds
  • maximum clock frequency (i.e., 1/minimum clock period)
  • maximum throughput (number of additions per second)
  • speedup in maximum throughput compared to the pipelined ripple-carry adder
  • area (in CLB slices) after place-and-route
  • area increase compared to the pipelined ripple-carry adder
  • lines of VHDL code at level I
  • product: latency * area
  • ratio: throughput / area
2. A two-dimensional graph showing the performance of all adders in terms of area versus throughput. Put area as the X-coordinate and throughput as the Y-coordinate.

3. Conclusions summarizing your recommendations regarding the choice of the best pipelined adder.

4. Any problems or issues you encountered with your designs or design tools.
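For pipelined adders the latency and throughput columns decouple: latency in nanoseconds is the number of cycles times the clock period, while the maximum clock frequency and the throughput depend on the clock period alone, assuming one new addition is accepted every cycle once the pipeline is full. A minimal sketch with a hypothetical function name; the inputs must come from your own timing and place-and-route reports.

```python
def pipelined_row(name: str, device: str, cycles: int, period_ns: float,
                  slices: int, vhdl_lines: int) -> dict:
    """One table row for a pipelined adder; period_ns is the minimum clock period."""
    latency_ns = cycles * period_ns               # time for one addition to pass through
    throughput = 1e9 / period_ns                  # one addition completes per cycle
    return {
        "adder": name,
        "device": device,
        "latency_cycles": cycles,
        "latency_ns": latency_ns,
        "fmax_MHz": 1e3 / period_ns,
        "throughput_per_s": throughput,
        "slices": slices,
        "vhdl_lines": vhdl_lines,
        "latency_x_area": latency_ns * slices,
        "throughput_per_slice": throughput / slices,
    }
```

The speedup column is then the ratio of two `throughput_per_s` entries, and each row's (slices, throughput_per_s) pair is one point of the area-versus-throughput graph.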