Coding Pipeline in VHDL – Part 2

In this article, I will show you how to design and implement a pipelined multiplier.

Suppose we want to multiply two numbers, A*B, where A is 4 bits wide and B is 2 bits wide:
A=a3a2a1a0, B=b1b0 (a0 and b0 are the least significant bits).

Breaking the problem into parts:
First, let’s take a look at what the multiplication process looks like:

[Figure: the multiplication process]
image source: web.mit.edu/6.111/www/f2008/handouts/L09.pdf

Before we even start to think about how to build that awful thing above, let’s first build a basic unit that knows how to multiply 2 bits (i.e. A0*B0).
Moreover, we have to multiply pairs of bits many times and add up the partial products in order to calculate the total product A*B.
Therefore, we should include some sort of adder in the basic multiply unit.

Taking a quick look at the truth table of multiplying 2 bits, it is quite obvious that the process could be achieved using a single AND gate:

image source: http://www.circuitstoday.com
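
To make the truth-table argument concrete, here is a minimal self-checking sketch. The testbench name and_mul_tb is hypothetical and not part of this article's design; it only checks that AND gives the same result as the arithmetic product for all four input pairs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: and_mul_tb is an illustration, not part of the design.
entity and_mul_tb is
end entity;

architecture sim of and_mul_tb is
begin
  process
    -- vals(0) = '0' and vals(1) = '1', so the loop index is the bit value
    constant vals : std_logic_vector(0 to 1) := "01";
    variable expected : std_logic;
  begin
    for i in 0 to 1 loop
      for j in 0 to 1 loop
        -- the arithmetic product of two bits is 1 only when both bits are 1
        if i * j = 1 then
          expected := '1';
        else
          expected := '0';
        end if;
        assert (vals(i) and vals(j)) = expected
          report "AND differs from the 1-bit product" severity failure;
      end loop;
    end loop;
    report "AND matches the 1-bit product for all four input pairs";
    wait; -- stop the process so the simulation can end
  end process;
end architecture;
```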

Next, we complete the block by adding a full adder:

X and Y are the 2 bits to multiply, and Sum in is a partial product coming from another pair of bits. The result of the full adder is a new partial product, which becomes the Sum in input of the next MUL1X1.

Since we want to multiply a 4-bit number by a 2-bit number, we have 8 partial products, so we will use 8 of the MUL1X1 units above.
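
Before wiring up the 8 units, it can help to have a plain, non-pipelined reference to compare against. The following is only a sketch: the entity RefMul and its output port P are hypothetical names, and it relies on the standard numeric_std package instead of the MUL1X1 structure. The largest possible product is 15*3 = 45, which fits in 6 bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical reference model: RefMul is not part of the article's design.
entity RefMul is
  port (
    A : in  std_logic_vector(3 downto 0); -- 4-bit operand
    B : in  std_logic_vector(1 downto 0); -- 2-bit operand
    P : out std_logic_vector(5 downto 0)  -- 6-bit product
  );
end entity;

architecture behavior of RefMul is
begin
  -- numeric_std "*" on unsigned operands returns a result whose length is
  -- the sum of the operand lengths, here 4 + 2 = 6 bits
  P <= std_logic_vector(unsigned(A) * unsigned(B));
end architecture;
```

The 6-bit result width is the same as the width of the output of the pipelined version we are about to build.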

The final circuit and the connections should look something like this:

Like in Part 1, the distribution of the registers is determined by the red lines. Note that the carry out of the last MUL1X1 is the output S5. Also see how the output of the upper row of MUL1X1 units is the Sum_in input of the MUL1X1 row below.

Writing the code:
Using the same method as in Part 1, we start by determining the signals we have. No doubt we will have a bunch more here than we did in Part 1!
To simplify the code, we will start by coding the MUL1X1:

MUL1X1 – code:
*Note that in the code below I used the full adder that we already implemented in Part 1!

library ieee;
use ieee.std_logic_1164.all;

entity Mul is
 port (
 A : in std_logic;
 B : in std_logic;
 Sum_in : in std_logic;
 Ci : in std_logic; -- Carry in
 Co : out std_logic;-- Carry out
 Z : out std_logic;
 
 Ao : out std_logic; -- This output is equal to input A 
 Bo : out std_logic -- This output is equal to input B


 );
end entity;

architecture behavior of Mul is

component FA
 port (
 A : in std_logic;-- first number
 B : in std_logic;-- Second number
 Cin : in std_logic;-- Carry in
 Cout : out std_logic;-- Carry out
 S : out std_logic -- Sum

 );
end component;
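signal result: std_logic; -- 1-bit partial product A*B
begin
 result <= A and B; -- multiplying two bits is a logical AND
 -- add the partial product to the incoming sum and carry
 FA1 : FA port map (A=>result,B=>Sum_in,Cin=> Ci,Cout=>Co,S=>Z);
 Ao <= A; -- pass A on unchanged
 Bo <= B; -- pass B on unchanged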
end architecture;
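
For readers who do not have Part 1 at hand, here is a minimal sketch of a full adder that would fit the FA component declared above. The port names are taken from that component declaration; the body is an assumption (the textbook full-adder equations), not necessarily the exact code from Part 1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of a full adder matching the FA component declaration.
-- Assumption: the implementation in Part 1 may be written differently.
entity FA is
  port (
    A    : in  std_logic; -- first number
    B    : in  std_logic; -- second number
    Cin  : in  std_logic; -- carry in
    Cout : out std_logic; -- carry out
    S    : out std_logic  -- sum
  );
end entity;

architecture behavior of FA is
begin
  -- the sum bit is 1 when an odd number of the inputs are 1
  S <= A xor B xor Cin;
  -- a carry is produced when at least two of the inputs are 1
  Cout <= (A and B) or (Cin and (A xor B));
end architecture;
```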

All the pieces of the code:
Since we use registers for the pipeline, we need to add the Dff component (named reg in the code below) that we already designed and coded in Part 1.
As promised, the code includes way more signals than the example in Part 1, which makes it quite long. I am going to put it all in a single code block below, but please pay close attention to the following things:
1) the signal types and lengths; compare them with the circuit diagram
2) the use of components
3) the port maps of the MUL1X1 units: the inputs and outputs
4) how the data passes between the registers until we see it as an output (the output of a register is the input of the next register on the same signal).

library ieee;
use ieee.std_logic_1164.all;

-- defining a register(normal DFF - 1 bit):
entity reg is
 port(
 D : in std_logic;
 CLK : in std_logic;
 RSTn : in std_logic;
 Q : out std_logic
 );
end entity;
architecture behavior of reg is
begin
 process(CLK, RSTn)
 begin
 if RSTn = '0' then Q <= '0';
 elsif rising_edge(CLK)then Q <= D;
 end if; 
 end process;
end architecture;
---------------------end of register--------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;
entity PipedMul is
 port (
 A : in std_logic_vector (3 downto 0); --First number
 B : in std_logic_vector (1 downto 0); --Second number
 CLK : in std_logic; -- Clock (Active high)
 RSTn : in std_logic; -- Reset (Active low)
 Cin : in std_logic; -- Carry in
 Cout : out std_logic;-- Carry out
 S : out std_logic_vector (5 downto 0) -- Sum
 );
end entity;

architecture behavior of PipedMul is
component Mul port(A,B,Sum_in,Ci : in std_logic;
 Co,Z,Ao,Bo : out std_logic);
end component;
component reg port(D,CLK,RSTn : in std_logic;
 Q : out std_logic);
end component;
--defining a lot of signals:
signal tmp_cout : std_logic_vector(3 downto 0);
signal tmp_cin : std_logic_vector(2 downto 0);
--MUL1:
signal out1: std_logic;
signal bo_1,ao_1:std_logic;
--MUL2:
signal tmp_A2 : std_logic;
signal tmp_B2 : std_logic;
signal sum_in_2 : std_logic;
signal out2:std_logic;
signal ao_2,bo_2:std_logic;
--MUL3:
signal tmp_A3 : std_logic_vector(1 downto 0);
signal tmp_B3 : std_logic_vector(1 downto 0);
signal sum_in_3 : std_logic_vector(1 downto 0);
signal out3: std_logic;
signal ao_3,bo_3:std_logic;
--MUL4:
signal tmp_A4 : std_logic_vector(2 downto 0);
signal tmp_B4 : std_logic_vector(2 downto 0);
signal sum_in_4 : std_logic_vector(2 downto 0);
signal out4 : std_logic;
signal ao_4,bo_4:std_logic;
--MUL5:
signal tmp_A5 : std_logic_vector(3 downto 0);
signal tmp_B5 : std_logic_vector(2 downto 0);
signal sum_in_5 : std_logic_vector(2 downto 0);
signal cout_5 :std_logic;
signal out5:std_logic;
signal ao_5,bo_5 : std_logic;
--MUL6:
signal tmp_A6 : std_logic_vector(2 downto 0);
signal tmp_B6 : std_logic_vector(1 downto 0);
signal sum_in_6 : std_logic_vector(1 downto 0);
signal cout_6 :std_logic;
signal out6: std_logic;
signal ao_6,bo_6 : std_logic;
--MUL7:
signal tmp_A7 : std_logic_vector(1 downto 0);
signal tmp_B7 : std_logic;
signal sum_in_7 : std_logic;
signal cout_7 :std_logic;
signal out7:std_logic;
signal ao_7,bo_7 : std_logic;
--MUL8:
signal tmp_A8 : std_logic;
signal tmp_B8 : std_logic;
--signal sum_in_8 : std_logic;
signal cout_8 :std_logic;
signal out8:std_logic;
signal ao_8,bo_8 : std_logic;
--output bits signals:
signal tmp_z0 : std_logic_vector(3 downto 0);
------------------------------------end of signal definitions-------------------------------
begin
-- defining the 8 multiplier units used.
Mul_1: Mul port map (A=>A(0),B=>B(0),Sum_in=>'0',Ci=>Cin,Co=>tmp_cout(0),Z=>out1,Ao=>ao_1,Bo=>bo_1); 
Mul_2: Mul port map (A=>tmp_A2,B=>tmp_B2,Sum_in=>sum_in_2,Ci=>tmp_cin(0),Co=>tmp_cout(1),Z=>out2,Ao=>ao_2,Bo=>bo_2); 
Mul_3: Mul port map (A=>tmp_A3(1),B=>tmp_B3(1),Sum_in=>sum_in_3(1),Ci=>tmp_cin(1),Co=>tmp_cout(2),Z=>out3,Ao=>ao_3,Bo=>bo_3); 
Mul_4: Mul port map (A=>tmp_A4(2),B=>tmp_B4(2),Sum_in=>sum_in_4(2),Ci=>tmp_cin(2),Co=>tmp_cout(3),Z=>out4,Ao=>ao_4,Bo=>bo_4); 
Mul_5: Mul port map (A=>tmp_A5(3),B=>tmp_B5(2),Sum_in=>sum_in_5(2),Ci=>'0',Co=>cout_5,Z=>out5,Ao=>ao_5,Bo=>bo_5); 
Mul_6: Mul port map (A=>tmp_A6(2),B=>tmp_B6(1),Sum_in=>sum_in_6(1),Ci=>cout_5,Co=>cout_6,Z=>out6,Ao=>ao_6,Bo=>bo_6); 
Mul_7: Mul port map (A=>tmp_A7(1),B=>tmp_B7,Sum_in=>sum_in_7,Ci=>cout_6,Co=>cout_7,Z=>out7,Ao=>ao_7,Bo=>bo_7); 
Mul_8: Mul port map (A=>tmp_A8,B=>tmp_B8,Sum_in=>'0',Ci=>cout_7,Co=>cout_8,Z=>out8,Ao=>ao_8,Bo=>bo_8); 

--first bit output registers:
reg_z0_0: reg port map(out1,CLK,RSTn,tmp_z0(0));
reg_z0_1: reg port map(tmp_z0(0),CLK,RSTn,tmp_z0(1));
reg_z0_2: reg port map(tmp_z0(1),CLK,RSTn,tmp_z0(2));
reg_z0_3: reg port map(tmp_z0(2),CLK,RSTn,tmp_z0(3));
reg_z0_4: reg port map(tmp_z0(3),CLK,RSTn,S(0));
--other bit outputs
reg_z1_1: reg port map(out5,CLK,RSTn,S(1));
reg_z2_1: reg port map(out6,CLK,RSTn,S(2));
reg_z3_1: reg port map(out7,CLK,RSTn,S(3));
reg_z4_1: reg port map(out8,CLK,RSTn,S(4));
reg_z5_1: reg port map(cout_8,CLK,RSTn,S(5));
--Cin/Cout:
reg_cout0: reg port map(tmp_cout(0),CLK,RSTn,tmp_cin(0));
reg_cout1: reg port map(tmp_cout(1),CLK,RSTn,tmp_cin(1));
reg_cout2: reg port map(tmp_cout(2),CLK,RSTn,tmp_cin(2));
reg_cout3: reg port map(tmp_cout(3),CLK,RSTn,Cout);

--A2,B2,sum_in_2 registers:
reg_A2: reg port map(A(1),CLK,RSTn,tmp_A2);
reg_B2: reg port map(B(0),CLK,RSTn,tmp_B2);
reg_sum_in_2: reg port map('0',CLK,RSTn,sum_in_2);
--A3,B3,sum_in_3 registers:
reg_A3_0: reg port map(A(2),CLK,RSTn,tmp_A3(0));
reg_A3_1: reg port map(tmp_A3(0),CLK,RSTn,tmp_A3(1));
reg_B3_0: reg port map(B(0),CLK,RSTn,tmp_B3(0));
reg_B3_1: reg port map(tmp_B3(0),CLK,RSTn,tmp_B3(1));
reg_sum_in_3_0: reg port map('0',CLK,RSTn,sum_in_3(0));
reg_sum_in_3_1: reg port map(sum_in_3(0),CLK,RSTn,sum_in_3(1));
--A4,B4,sum_in_4 registers:
reg_A4_0: reg port map(A(3),CLK,RSTn,tmp_A4(0));
reg_A4_1: reg port map(tmp_A4(0),CLK,RSTn,tmp_A4(1));
reg_A4_2: reg port map(tmp_A4(1),CLK,RSTn,tmp_A4(2));

reg_B4_0: reg port map(B(0),CLK,RSTn,tmp_B4(0));
reg_B4_1: reg port map(tmp_B4(0),CLK,RSTn,tmp_B4(1));
reg_B4_2: reg port map(tmp_B4(1),CLK,RSTn,tmp_B4(2));

reg_sum_in_4_0: reg port map('0',CLK,RSTn,sum_in_4(0));
reg_sum_in_4_1: reg port map(sum_in_4(0),CLK,RSTn,sum_in_4(1));
reg_sum_in_4_2: reg port map(sum_in_4(1),CLK,RSTn,sum_in_4(2));
--A5,B5,sum_in_5 registers:
reg_A5_0: reg port map(A(0),CLK,RSTn,tmp_A5(0));
reg_A5_1: reg port map(tmp_A5(0),CLK,RSTn,tmp_A5(1));
reg_A5_2: reg port map(tmp_A5(1),CLK,RSTn,tmp_A5(2));
reg_A5_3: reg port map(tmp_A5(2),CLK,RSTn,tmp_A5(3));

-- B(1) feeds every unit of the second row. It must arrive 4 clock cycles late
-- to line up with the A and Sum_in operands of Mul_5..Mul_8, so it goes through
-- three shared registers (tmp_B5(0), tmp_B5(1), tmp_B6(0)) and then a fourth
-- register per unit.
reg_B5_0: reg port map(B(1),CLK,RSTn,tmp_B5(0));
reg_B5_1: reg port map(tmp_B5(0),CLK,RSTn,tmp_B5(1));
reg_B6_0: reg port map(tmp_B5(1),CLK,RSTn,tmp_B6(0));
reg_B5_2: reg port map(tmp_B6(0),CLK,RSTn,tmp_B5(2));

reg_sum_in_5_0: reg port map(out2,CLK,RSTn,sum_in_5(0));
reg_sum_in_5_1: reg port map(sum_in_5(0),CLK,RSTn,sum_in_5(1));
reg_sum_in_5_2: reg port map(sum_in_5(1),CLK,RSTn,sum_in_5(2));
--A6,B6 registers:
reg_A6_0: reg port map(ao_2,CLK,RSTn,tmp_A6(0));
reg_A6_1: reg port map(tmp_A6(0),CLK,RSTn,tmp_A6(1));
reg_A6_2: reg port map(tmp_A6(1),CLK,RSTn,tmp_A6(2));
reg_B6_1: reg port map(tmp_B6(0),CLK,RSTn,tmp_B6(1));
reg_sum_in_6_0: reg port map(out3,CLK,RSTn,sum_in_6(0));
reg_sum_in_6_1: reg port map(sum_in_6(0),CLK,RSTn,sum_in_6(1));
--A7,B7
-- ao_3 is A(2) already delayed by 2 cycles, so tmp_A7(1) is delayed by 4
reg_A7_0: reg port map(ao_3,CLK,RSTn,tmp_A7(0));
reg_A7_1: reg port map(tmp_A7(0),CLK,RSTn,tmp_A7(1));
reg_B7_0: reg port map(tmp_B6(0),CLK,RSTn,tmp_B7);
reg_sum_in_7_0: reg port map(out4,CLK,RSTn,sum_in_7);
--A8,B8
reg_A8_0: reg port map(ao_4,CLK,RSTn,tmp_A8);
reg_B8_0: reg port map(tmp_B6(0),CLK,RSTn,tmp_B8);

end architecture;
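
The register chains in the code above (tmp_A5, sum_in_4 and so on) all follow the same pattern: the output of one register is the input of the next. As a sketch of that pattern only, the chain can also be written once as a generic shift register. The entity delay_line and its generic DEPTH are hypothetical names that do not appear in the design above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: delays a 1-bit signal by DEPTH clock cycles.
entity delay_line is
  generic (DEPTH : positive := 4); -- number of registers in the chain
  port (
    D    : in  std_logic;
    CLK  : in  std_logic;
    RSTn : in  std_logic; -- asynchronous reset, active low (same as reg)
    Q    : out std_logic
  );
end entity;

architecture behavior of delay_line is
  signal taps : std_logic_vector(0 to DEPTH - 1);
begin
  process (CLK, RSTn)
  begin
    if RSTn = '0' then
      taps <= (others => '0');
    elsif rising_edge(CLK) then
      taps(0) <= D; -- first register samples the input
      for i in 1 to DEPTH - 1 loop
        taps(i) <= taps(i - 1); -- each register samples the previous one
      end loop;
    end if;
  end process;

  Q <= taps(DEPTH - 1); -- output of the last register
end architecture;
```

I kept the explicit one-register-per-line style in the article because it is easier to compare with the circuit diagram, but a helper like this keeps the signal count down in larger designs.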

Waveform:

The values of the signals appear on the left side; index (0) is the LSB. Also notice the red lines on the output signals: because of the pipelining, the output is not available immediately but only after a few clock cycles.
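
If you want to reproduce a waveform like this yourself, a small self-checking testbench can help. The following is a sketch under a few assumptions: the name PipedMul_tb, the 10 ns clock period and the 6-cycle wait are my own choices, and Mul, FA, reg and PipedMul are assumed to be compiled into the same work library. Each input pair is held constant for 6 clock cycles, which is enough for the result to pass through all the register stages, and the output S is then compared with the product computed by numeric_std.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench for PipedMul (names and timing are assumptions).
entity PipedMul_tb is
end entity;

architecture sim of PipedMul_tb is
  component PipedMul
    port (
      A    : in  std_logic_vector(3 downto 0);
      B    : in  std_logic_vector(1 downto 0);
      CLK  : in  std_logic;
      RSTn : in  std_logic;
      Cin  : in  std_logic;
      Cout : out std_logic;
      S    : out std_logic_vector(5 downto 0)
    );
  end component;

  signal A    : std_logic_vector(3 downto 0) := (others => '0');
  signal B    : std_logic_vector(1 downto 0) := (others => '0');
  signal CLK  : std_logic := '0';
  signal RSTn : std_logic := '0'; -- start in reset
  signal Cout : std_logic;
  signal S    : std_logic_vector(5 downto 0);
  signal done : boolean := false;
begin
  -- Cin is tied to '0' because we only want the plain product A*B
  dut : PipedMul port map (A => A, B => B, CLK => CLK, RSTn => RSTn,
                           Cin => '0', Cout => Cout, S => S);

  -- 10 ns clock that stops toggling once the stimulus process is done
  CLK <= not CLK after 5 ns when not done else '0';

  stimulus : process
  begin
    wait until rising_edge(CLK);
    RSTn <= '1'; -- release the reset
    for a_val in 0 to 15 loop
      for b_val in 0 to 3 loop
        A <= std_logic_vector(to_unsigned(a_val, 4));
        B <= std_logic_vector(to_unsigned(b_val, 2));
        -- hold the inputs while the data moves through the pipeline
        for i in 1 to 6 loop
          wait until rising_edge(CLK);
        end loop;
        assert unsigned(S) = to_unsigned(a_val * b_val, 6)
          report "wrong product for A=" & integer'image(a_val) &
                 ", B=" & integer'image(b_val)
          severity error;
      end loop;
    end loop;
    report "all 64 input pairs checked";
    done <= true;
    wait;
  end process;
end architecture;
```

Note that this test holds every input pair constant while waiting, so it checks the arithmetic but not the throughput: it does not feed a new pair of operands on every clock cycle.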

 
