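Answer from cs61c-ed (Cory Benavides 14101530) for Question 2

This won't run as efficiently as possible because of a load-use hazard. When pipelined, add reads its source register in the decode (ID) stage, but lw does not write the loaded value back to the register file until its write-back (WB) stage. If add immediately follows lw, that write happens two clock cycles after add tries to read the register, so the pipeline has to stall add for two cycles. To minimize the performance penalty, forwarding hardware can be added that carries the value lw reads from memory directly to the input of add's ALU (EX) stage, instead of waiting for write-back. Forwarding can't hide the hazard completely: the loaded data only exists at the end of lw's memory-access (MEM) stage, which is the same cycle add would otherwise spend in EX. The penalty therefore drops to a one-clock-cycle stall.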
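The cycle counting above can be checked with a small timing model. This is a minimal sketch, not course-provided code, and it assumes the classic 5-stage pipeline (IF, ID, EX, MEM, WB) with no other hazards. It also assumes the register file writes in the first half of a cycle and reads in the second half, so a consumer's ID can share a cycle with the producer's WB. The function name `load_use_stalls` and its `distance` parameter (how many instructions after lw the dependent instruction sits) are hypothetical.

```python
# Stage indices in a classic 5-stage pipeline (assumed model).
IF, ID, EX, MEM, WB = range(5)


def stage_cycle(issue_cycle: int, stage: int) -> int:
    """Cycle in which an instruction fetched at issue_cycle is in `stage` (no stalls)."""
    return issue_cycle + stage


def load_use_stalls(distance: int, forwarding: bool) -> int:
    """Hypothetical helper: stall cycles for an instruction `distance` slots after a lw
    that uses the lw's destination register.

    Without forwarding, the consumer's ID (register read) must not come before lw's WB
    (write-first-half / read-second-half register file assumed).
    With forwarding, lw's loaded value is ready at the end of MEM and can feed the
    consumer's EX (ALU) stage in the following cycle.
    """
    producer_issue = 0
    consumer_issue = distance
    if forwarding:
        ready = stage_cycle(producer_issue, MEM) + 1  # first cycle EX can use the data
        needed = stage_cycle(consumer_issue, EX)
    else:
        ready = stage_cycle(producer_issue, WB)  # value readable during the WB cycle
        needed = stage_cycle(consumer_issue, ID)
    return max(0, ready - needed)


for fwd in (False, True):
    print(f"forwarding={fwd}: {load_use_stalls(1, fwd)} stall cycle(s)")
# forwarding=False: 2 stall cycle(s)
# forwarding=True: 1 stall cycle(s)
```

With add directly after lw (`distance=1`), the model reproduces the two-cycle penalty without forwarding and the one-cycle penalty with it. Placing an independent instruction between them (`distance=2`) removes the remaining stall when forwarding is present.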