Date: Sat, 31 Jul 2004 12:51:38 -0700
From: Navtej Sadhal <cs61c-td@torus.CS.Berkeley.EDU>
Newsgroups: ucb.class.cs61c
Subject: IMPORTANT PROJ UPDATE!! (READ!)
                                                                                     
Please read this entire message. This is very important!
Read every word! Don't skim! READ IT!
                                                                                     
The data memory module supplied to you in blocks.v was originally designed
for the single cycle processor, therefore it has a synchronous write and
an asynchronous read. This is not ideal for the pipelined processor as
you may already have found out. Two generally accepted solutions for data
memory for pipelined processors are either to have writes occur on the
negative edge of the clock or to have both reads AND writes be
synchronous. Rather than set up the project for either of these solutions,
we opted to use the existing data memory with its strange behavior in
order to get you to understand asynchronous vs. synchronous state
elements and how they behave relative to the clock.
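
To make the distinction concrete, here is a minimal sketch of a memory with
a synchronous write and an asynchronous read. This is NOT the mem module
from blocks.v; the module name, port names, and sizes are made up purely
for illustration.

```verilog
// Hypothetical illustration only; not the mem module from blocks.v.
module sketch_mem (
    input  wire        clk,
    input  wire        wr,      // write enable
    input  wire [7:0]  addr,    // word address (width chosen arbitrarily)
    input  wire [31:0] writeD,  // data to store
    output wire [31:0] readD    // data at addr
);
    reg [31:0] cells [0:255];

    // Asynchronous read: readD follows addr combinationally.
    // The clock is not involved at all.
    assign readD = cells[addr];

    // Synchronous write: a cell only changes on the rising clock edge,
    // and only if wr is asserted at that edge.
    always @(posedge clk) begin
        if (wr)
            cells[addr] <= writeD;
    end
endmodule
```

Note that the read sees whatever address is on the wire right now, while the
write only takes effect at the next rising edge.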
                                                                               
As a result, the solution we had you pursue was to bypass the pipeline
register in the case of a write but not bypass the register in the
case of a read. This solution works for the most part and was done to avoid
having to forward memory data. That is, if you did not bypass the EX_MEM
pipeline register on a write, then your write to memory would happen one cycle
later. This is not an issue except in the case that you read from the
address you wrote to immediately. That is, you do the following:
        sw      $t0, 0($t1)
        lw      $t2, 0($t1)
The read and write will be happening at the same time, so the data
written will not be accessible to the lw instruction. Thus we bypass the
register so that the sw will complete in time for the lw to access the data. An
alternate solution is to do forwarding of the data memory. This is
different from the forwarding of register data described in the
book. What you would have to do is detect if you had a sw followed by a lw and
check if their addresses were the same with a comparator. Then you would
forward the sw's write data to the lw's read data bypassing the data
memory. This solution should work in all cases and is valid.
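
A rough sketch of what that comparator and forwarding mux might look like
is below. Every name here is an assumption made for illustration; how you
detect the sw and the lw, and which pipeline registers the signals come
from, depends on your own design.

```verilog
// Hypothetical sketch of data memory forwarding; all names are made up.
module sketch_mem_forward (
    input  wire        prev_is_sw,  // older instruction is a sw whose write
                                    // has not reached memory yet
    input  wire        cur_is_lw,   // younger instruction is a lw
    input  wire [31:0] sw_addr,     // address the sw writes to
    input  wire [31:0] lw_addr,     // address the lw reads from
    input  wire [31:0] sw_writeD,   // data the sw is about to write
    input  wire [31:0] mem_readD,   // what the data memory returns for lw_addr
    output wire [31:0] lw_data      // value the lw should actually receive
);
    // Comparator: forward only for a sw followed by a lw to the same address.
    wire forward = prev_is_sw & cur_is_lw & (sw_addr == lw_addr);

    // Bypass the data memory when forwarding, otherwise use its output.
    assign lw_data = forward ? sw_writeD : mem_readD;
endmodule
```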
                                                                              
The problem with the bypassing via mux solution is that any lw followed by
a sw will not work because the load and store will be trying to set up
the address at the same time:
        lw      $t0, 0($t1)
        sw      $t1, 4($t2)
Since the store is trying to jump the gun and set up the address before
the clock edge while the load sets up its address after the clock edge,
both want the memory's address input during the same cycle. Which one wins
out depends on the select bit to your mux. If you used MemWrite_EX as the
select bit to your mux, then the store will always work while the load
will not. Worse, the behavior of the memory when both read and write bits
are asserted may be undefined. This is not good.
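
The conflict is easier to see when the bypass mux is written out. The
sketch below is only an assumption about what such a mux might look like;
MemWrite_EX is the select signal named above, and every other name is made
up.

```verilog
// Hypothetical sketch of the address bypass mux. Only MemWrite_EX comes
// from the text; the other names are invented for illustration.
module sketch_addr_bypass (
    input  wire        MemWrite_EX,     // 1 when the instruction in EX is a sw
    input  wire [31:0] alu_out_EX,      // address computed by the sw in EX
    input  wire [31:0] alu_out_EX_MEM,  // address held in pr_EX_MEM for the
                                        // lw that is in MEM
    output wire [31:0] mem_addr         // the memory's single address input
);
    // For "lw then sw", the lw in MEM needs alu_out_EX_MEM in the same
    // cycle that the sw in EX needs alu_out_EX. With this select the sw
    // wins, so the lw reads from the sw's address instead of its own.
    assign mem_addr = MemWrite_EX ? alu_out_EX : alu_out_EX_MEM;
endmodule
```

There is only one address input, so no choice of select bit can satisfy
both instructions in that cycle.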
                                                                                     
In general, it doesn't make sense to use a memory with this behavior
for a pipelined processor. If reads were synchronous, you would bypass
the pipeline register for both reads and writes, and everything should
work fine.
                                                                              
Because of this issue, we have decided to allow you to do your writes
on the negative edge of the CLK in hopes of making things simpler. I had
initially said that you may not put any gates on the CLK because this
does not work in hardware. Normally this would introduce awful delays and
clock skew into your system. Since this is simulation, however, negating the
CLK has no ill effects. It is important to understand, however, that
negating the CLK in this case is meant to be symbolic of having a data memory
that does writes on the negative edge. You should never do this in real
life (remember that when taking cs150 or 152). We decided to have you make
the change in your code rather than changing the blocks.v file so as not
to surprise people by breaking their processors because they have not yet
read this message.
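
For reference, a data memory that really does its writes on the negative
edge might be sketched as follows. This is an illustration under made-up
names and sizes, not the contents of blocks.v.

```verilog
// Hypothetical illustration only; not the mem module from blocks.v.
module sketch_mem_negedge (
    input  wire        clk,
    input  wire        wr,      // write enable
    input  wire [7:0]  addr,    // word address (width chosen arbitrarily)
    input  wire [31:0] writeD,  // data to store
    output wire [31:0] readD    // data at addr
);
    reg [31:0] cells [0:255];

    // Reads stay asynchronous, so they are unaffected by the edge choice.
    assign readD = cells[addr];

    // The write happens on the falling edge, i.e. in the middle of the
    // cycle, after values clocked into a pipeline register on the rising
    // edge have had half a cycle to settle.
    always @(negedge clk) begin
        if (wr)
            cells[addr] <= writeD;
    end
endmodule
```

In simulation, feeding an inverted clock to a memory that writes on the
positive edge gives the same write timing as this sketch.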
                                                                               
The way you should do this is to change the input to CLK in your
instantiation of the data memory to be ~CLK:
mem memBlock (.CLK(~CLK), .RST(RST),.... etc
Since reads are asynchronous, they will be unaffected. Now you must
make sure you are not bypassing the EX_MEM pipeline register ANYWHERE. All
of the inputs to the data memory should be coming out of the EX_MEM
pipeline register (w/ the exception of CLK, RST, and DMP).
If you had already successfully implemented the bypassing solution
with a multiplexor that I had described previously, then the easiest way to
implement this fix is to do the following:
1. hardwire the select bit on your address bypass mux (the one that
chooses between the address coming out of the ALU or the one coming
out of pr_EX_MEM) to always choose the value coming out of pr_EX_MEM. This
means you just need to do .select(0) or .select(1) depending on which input
is the one you want. This is a slightly hacky fix, but it's a good way to
get it working first before you start hacking up your wires. You should
eliminate the mux later once you get things cleaned up.

2. Change the .WR input to be .WR(MemWrite_EX_MEM) instead of
.WR(MemWrite_ID_EX). Or whatever your wires are named...
3. Change .writeD to take the value from pr_EX_MEM instead of the
value coming from the EX stage. You may need to add this to your EX_MEM
pipeline register. Make sure that you are using a forwarded value (if
you have already implemented forwarding).
                                                                               
We have provided a test case for you to check if this works in
s61c/lib/proj3/tests called pipeline.lw.sw.2. This should work with
both Stage 2 and Stage 3 as I have inserted sufficient noop instructions.
                                                                               
We apologize for making this change now, but we feel that it should be
simple enough and not consume very much of your time or set you back
very far. As today is Saturday, you still have plenty of time to talk to
your TAs about any problems you are still having. This change should
simplify your testing and your code.
                                                                               
Feel free to email me and/or post to the newsgroup with any issues you
are having with this change.
                                                                                     
Good Luck.
                                                                                     
Navtej