The following sequence of instructions cannot run at full speed on the pipelined datapath without special hardware:

    lw  $t0, 0($s1)
    add $t1, $t0, $t0

Why not? What sort of hardware (or software) schemes can we use to reduce the performance penalty?
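
One way to explore the question is to count pipeline bubbles in a simplified model. The sketch below is only illustrative: it assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) whose register file writes in the first half of a cycle and reads in the second half, and it models only loads and ALU operations. The names `Instr` and `count_stalls`, the latency values, and the independent `sub` instruction are assumptions made for the example and are not part of the original problem.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                # mnemonic, e.g. "lw" or "add"
    dest: str              # register written by the instruction
    srcs: Tuple[str, ...]  # registers read by the instruction


def count_stalls(program: List[Instr], forwarding: bool) -> int:
    """Count bubble cycles in an assumed 5-stage in-order pipeline.

    ready[r] is the earliest cycle in which a consumer of register r
    may occupy the EX stage.
    """
    ready: Dict[str, int] = {}
    prev_ex = 0   # cycle in which the previous instruction was in EX
    stalls = 0
    for ins in program:
        earliest = prev_ex + 1  # EX slot if there were no hazard
        ex = max([earliest] + [ready.get(r, 0) for r in ins.srcs])
        stalls += ex - earliest
        if forwarding:
            # ALU result is forwarded from the end of EX; load data only
            # exists at the end of MEM, one cycle later.
            latency = 2 if ins.op == "lw" else 1
        else:
            # Consumer must read the register file in ID during or after
            # the producer's WB stage.
            latency = 3
        ready[ins.dest] = ex + latency
        prev_ex = ex
    return stalls


# The pair from the question: add needs $t0 right after lw produces it.
pair = [
    Instr("lw", "$t0", ("$s1",)),
    Instr("add", "$t1", ("$t0", "$t0")),
]

# Software scheme: place a hypothetical independent instruction between them.
reordered = [
    Instr("lw", "$t0", ("$s1",)),
    Instr("sub", "$t2", ("$s2", "$s3")),  # hypothetical, unrelated to $t0
    Instr("add", "$t1", ("$t0", "$t0")),
]

print(count_stalls(pair, forwarding=False))      # 2
print(count_stalls(pair, forwarding=True))       # 1
print(count_stalls(reordered, forwarding=True))  # 0
```

Under these assumptions, forwarding alone still leaves one bubble after the load, because the loaded value is not available until the end of MEM. Filling that slot with an independent instruction removes the remaining bubble in this model.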