The following sequence of instructions cannot run
at full speed on the pipelined datapath without
special hardware:

lw  $t0, 0($s1)
add $t1, $t0, $t0

Why not? What sort of hardware (or software) schemes
can we use to reduce the performance penalty?
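
Hint: the dependence between the two instructions can be checked
mechanically. The sketch below is only an illustration, assuming a
classic five-stage pipeline (IF, ID, EX, MEM, WB) with full forwarding,
in which a loaded value becomes available only at the end of MEM. None
of that is stated in the exercise, and the function name
load_use_stalls and the tuple encoding of instructions are hypothetical.

```python
# Hypothetical encoding: (opcode, destination register, list of source registers).
def load_use_stalls(program):
    """Count the bubbles needed for load-use hazards.

    Assumes a five-stage pipeline with full forwarding (an assumption
    made for this sketch, not something the exercise specifies).
    """
    stalls = 0
    # Only adjacent pairs matter: with forwarding, a load's result reaches
    # an instruction two slots behind it without any stall.
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        _, _, sources = curr
        # The load's data is ready after MEM, one cycle too late for the
        # EX stage of the instruction immediately behind it.
        if op == "lw" and dest in sources:
            stalls += 1
    return stalls


program = [
    ("lw", "$t0", ["$s1"]),
    ("add", "$t1", ["$t0", "$t0"]),
]
print(load_use_stalls(program))
```

Under the same assumptions, placing an independent instruction between
the lw and the add makes the function report no stalls, which is the
idea behind the software (compiler scheduling) approach.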