The Official Project 3 Grading Information Page

<Testing | Grading | Regrading | Statistics >

Version 1.1 -- Last updated on 5/8/2001 10:45PM

Section 0: Recent Updates

[5/8/2001] A better policy for people who had non-standard implementations of 'nop' or 'move'.
[5/8/2001] Improved wording on the 'cascading errors' section.

Section 1: The Testing Procedure

The platform for our testing was quasar, however it is believed that the tests should work just as well on any machine (of course, the solution binaries will need to be recompiled for a non-Sun machine).

The testing for this project consisted in creating a large assembly file (bigtest-source.s) that contains every instruction name (i.e. opcode/funct pair), every possible register value, and many different immediate values.

(An attempt was made to isolate names, registers and immediates to prevent 'cascading errors'. Specifically, most immediate tests were conducted with instructions that had the zero register in each of the register fields. (e.g. addi $0 $0 0xFFFF). To my knowledge, all of the possible cascading errors (where you got 'penalized 2X for one mistake') are cascading by design. That is, your implementation did a particularly bad thing that screwed up multiple sections of the project -- as opposed to simpler errors which might have only affected one.)

The bigtest-source assembly file was loaded into spim, and dumped to a binary machine-language file (viz. bigtest-dump.bin. Of course, this file is unprintable, so it will look like garbage on your screen.) This dump file served as the raw material for each of the disassemblers.

Via an automated testing script, your submission was unpacked into a 'sandbox' testing directory. Your disassembler.c was compiled using the command 'gcc -g -Wall -lm -o disasm-student disassembler.c'. If your disassembler required a special command to build (e.g. 'make'), or used additional .c files, or called the dissasembler file something besides 'disassembler.c', the script would exit with an abnormal-compile error code. If this happened to you, please see the section on regrading.

Once your disassembler was compiled, the disassembler was executed with the binary dump file as the input stream and output redirected to a temporary output file (i.e. 'disasm-student < bigtest-dump.bin > bigtest-student-solution.s'). The same input was piped into two more disassemblers: 1. a pre-compiled TA-written solution disassembler with the output redirected to 'bigtest-normal-solution.s' and 2. a pre-compiled, TA-written extra-credit solution with the output going to 'bigtest-extra-solution.s'. Your output was compared with the output of the solution disassemblers through a script that amounted to an enhanced version of diff. proj3-compare.pl is a perl script that has the following pseudocoded form:

1. Attempt to synchronize the files via the __start signal or first label. Made more difficult by the fact that many people forgot to put a __start, or mis-spelled it, or mis-formatted their labels.

2. For each line of the solution file after __start:

1. Read one line of the student's file.

2. If a label exists in the solution line, look for it at the beginning of the student's line. If exists, OK, if not then:

1. If the student had the right label, but the wrong format (e.g. no colon), output a bad-format code.

2. If the student had the label wrong, output a (more serious) bad-label code.

3. Attempt to strip the (possibly non-existent) label off of both files.

4. Attempt to match up to three register fields. If the solution has a field while the student doesn't, or if the student has the wrong value in the field, then output a bad-reg code.

5. After stripping all of the register fields, attempt to match up the immediate field. If solution has one, but the student doesn't, then output a bad-imm code.

The output of this script was stored to two files ('bigtest-stu-nor.codes' and 'bigtest-stu-xtr.codes' for the normal and extra-credit solution, respectively. These ouput were analyzed by the main testing script (test-proj3.pl), which, on a purely size-based criterion, decided which solution your disassembler did better against. (In this way, formatting errors, which were slightly longer, were de-emphasized in comparison to ins, reg and imm codes, which were shorter.) test-proj3.pl would then count up all of the unique instances of the error codes, and then assign your disassembler into an 'error bucket'. E.g., if your disassembler had one or two different imm-type codes (say j-imm and addiu-imm) you were assigned into the imm1 error code bucket. Similar remarks apply to the other error buckets (ins, reg, imm, lab, and form).

Finally, these codes were collected and placed into glookup/gexplain.

Testing your disassembler yourself

If you want to run the tests yourself, follow this procedure:

Please read the rest of this document before delving into testing! You'll save yourself (and me) a lot of time.
create a new, testing directory (using mkdir) and 'cd' into it.
copy your disassembler.c file into the directory
copy the entire contents of ~cs61c-tm/proj3-tests/ to that directory using 'cp ~cs61c-tm/proj3-tests/* .'
invoke 'test-proj3.pl'. It will create several files which are explained above. The 1st-order codes are perhaps the most important
if that isn't enough, you can invoke 'proj3-compare.pl' using the command line arguements given by test-proj3.pl.
Please do not send me the output of 'diff' -- I don't care about diff; I care only about the outputs of my testing programs!

The most enlightening of the tests will be the ouptut of the test-proj3.pl script. The 1st-codes are the original per-instance things that your disassembler did wrong, while the 2nd-order codes are summary values of the 1st-order codes. Note that if you got a perfect score, then the 1st-order and 2nd-order sets will be empty.

Once you have the codes, you should compare the output of the solution disassembler to your disassembler at the instuctions indicated. The solution output resides at 'bigtest-normal-solution.s' while your output will be in 'bigtest-student-solution'. Again, please use my proj3-compare.pl (with the final arguemnt as 1) instead of diff.

Section 2: The Grading Standards

In general:

The grading for project 3 is based entirely on correctness. I know that it is harsh, but there is no other way to ensure fairness without having the readers and TAs re-grade 300 projects.

As such, unless you can show that your solution is correct at some point where it was marked incorrect, your grade will stand. Exceptions are made only for students who received some sort of 'abnomal' code.

In particular:

If your disassembler failed to compile, or failed to complete our tests, it was given an automatic 0. This problem could be due to one of four things:

A problem with our testing platform. If this is the case, please see the section on regrading. This could apply to the abnormal-compile codes and the abnormal-runtime codes.
A minor problem (e.g. at the last minute, you added your TA's name at the top, but forgot to put it in a valid C comment. In this case, you may also want to take a look at the regrading procedures. This will most-likely be indicated by an 'abnormal-compile' code.
A major bug in your project (e.g. serious pointer errors). In this it is probably not even worth your effort to submit a regrade.
You used a Makefile, or had additional .c or .h files that my gcc command failed to recognize.

In any case, if your disassembler received an abnormal-compile or abnormal-runtime code, please read the regrading section to see whether you should regrade. We are being lenient with compile and runtime errors; please bear in mind that, in your future classes at Berkeley, a compile or runtime error is grounds for an instant and un-appealable 0.

If your disassembler did compile, but failed several tests, it was thrown into error-categories depending on the prevalence of the bugs. The five types of categories are: ins, reg, imm, lab, and form. The severity levels are 1 (for least severe), 2 (for moderate) and 3 (for very severe). Thus, the error code reg2 means that you had quite a few errors regarding the placement and/or values of several registers. An attempt was made to prevent 'cascading errors', i.e. errors that spilled over into another category, however we were uanble to prevent all such occurences and will not regrade such cases. A more detailed explanation of each category follows.

i. ins errors

These errors arose from decoding the name of an instruction incorrectly. You got one error for each unique opcode/funct pair that you decoded incorrectly, i.e. if you mislabeled 'addi' as 'addiu' in four different places, it would count as only one error. This was the easiest part of the project. The buckets were as follows:

1-2 Errors: -5 points. We didn't penalize too heavily for these few errors on the assumption that they were mainly due to an oversight. (But that you still had the right idea.)
3-7 Errors: -15 points. This was a more serious error, since it usually meant a systematic misunderstanding of the opcode/funct decoding mechanism.
>7 Errors: -25 points. Usually meant a demonstration of a complete lack of understanding of opcode/funct decoding.

ii. reg errors

These errors accrued in a few main places: Firstly, many people confused the order of the destination-source registers of several instructions (notably the logical shifts). Secondly, many people decoded the register mnemonics incorrectly or only partially. However, we did give full credit for the $0 register (instead of $zero) and the $s8 register (instead of $fp). Register errors could also cascade over from a misidentification of the instruction format. In this way, reg-codes and ins-codes served to effectively penalize errors in the instruction format. The buckets were as follows:

1-2 Errors: -5 points. Again, usually due to a single problem (like switching reigster on two instructions.
2-7 Errors: -15 points. Usually caused by at least two different types of errors (e.g. register switching and misnaming, although it could apply to one one type of error if the error was particularly egregious.)
>7 Errors: -25 points. Usually a very serious mangling of the register orders and/or names.

iii. imm errors

These were by far the most prevalent of the errors. At last count, nearly 2/3 of the class missed some part or another of the immediate testing. The biggest problems were related to 1. using unsigned when you should have used signed, 2. using signed when you should have used unsigned, 3. using hex instead of decimal, 4. using decimal instead of hex, 5. using too many bits in the hex fields (the spec said exactly 4!), or simply miscalculating the immediate. We were quite generous in some areas, giving full credit for both signed and unsigned comparison and logical-shift immediates. Otherwise, this was the most difficult section of the project. The buckets were as follows:

1-2 Errors: -5 Points. Usually, a simple mistake.
2-7 Errors: -15 Points. Most often pair smaller types of mistakes, but some single mistakes could reach this range if they were particularly egregious.
>7 Errors: -25 Points. Usually two or three serious mistakes. This penalty was incurred quite often.

iv. lab errors

Simply put, these errors were due to messing up the labels of the program. Actual formatting errors are handled in the form codes; lab codes apply only to having the wrong 8-digit value in the label. The buckets were as follows:

1-2 Errors: -5 Points. Usually something simple.
2-7 Errors: -15 Points. Usually indicated that there was some sort of serious error with the referenceing or definition of labels.
>8 Errors: -25 Points. Usually a complete misuse of most labels.

v. form errors

This category is reserved for people who failed to follow the spec. A formatting error is due to something like 1. forgetting/misspelling the __start symbol, 2. forgetting the 'L' in front of labels, 3. forgetting the ':' after the labels, 5. putting extraneous symbols in the labels, etc. In general, however, there were quite a few cases where we simply let rather egregious formatting errors (most of which were in blatant contradiction with the spec) pass by. The buckets had an unusual spacing, due to the inter-dependence of format errors. I.e. if you missed one, you were very likely to miss many more.

1-7 Errors: -5 Points. Usually something simple.
7-99 Errors: -15 Points. The maximum bucket. Usually indicated some systematic mis-formatting errors.
>99 Errors: -25 Points. Unused

Extra credit

Several solutions attempted the extra credit, and, as noted above, your disassembler was compared with both the normal and the extra-credit solution and received the best score of the two.

However, it was decided that the bonus points for the extra credit would be awarded only if you received a perfect score on the other tests. I.e. everyone who got the extra-credit has a final score of 110, and no-one who missed any points received the extra-credit. This policy re-iterates the statement on the specification that you should have attempted the extra-credit only when you were positive that everything else was working.

Notes about Testing

Before you send an email to me with a question or complaint, please see if your problem occurs in the following list.

The testing scripts incorrectly take off points for people who implemented different versions of 'nop' and 'move' according to the various versions of SPIM. If this may apply to you, please read the relevent piece of the regrading section.
If you are having trouble finding out out why you were penalized for something, try running 'test-proj3.pl' as described above and looking at the 1st-order codes. Simply running 'diff' on our output and yours will not be a reliable indicator of what is not working. You must use my testing scripts! FYI: this is becuase there are many things that diff would treat as an error but that we consider OK.
Your results from 'test-proj3.pl' are only valid if there are no compiling errors and no sh: errors. This means that if you see 'sh: ./disasm-extra-credit: not found', then your results are invalid! Fix the shell error first, then test your project.
There is tension between the spec and standard mips as far as the immediate field for logical operations. Therefore, we gave full credit for both signed or unsigned shift amounts, even though our solution calculates unsigned.
Due to the confusion about 'done', 'end', and 'ori/syscall', my scripts will give you full credit for any of these constructions. Our solution, however, decodes the terminator as 'end'.
In a similar vein, we gave full credit for both '$0' and '$zero' as mnemonics for the zero-valued register.

Section 3: The Regrading Policy

In general:

Due to the relatively small amount of time left in the semester and to the large number of superfluous, ill-reasoned regrade-requests that have been received in other assignments, Project 3 will have a unique regrading policy. It is important that you understand this policy before you ask for a regrade. Firstly, there will be a penalty for submitting regrade requests that have no merit. If you want a regrade, it is your responsibility to 1. test your disassembler against our testing scripts (see below), 2. make sure that you have valid reasons for a request, and 3. email me with specific documentation. The following are valid reasons for a regrade:

You an inconsistency between the solution and the specification that resulted in a material point loss to your disassembler. The 'bug-in-the-solution' approach.
You believe that your project was diagnosed with an incorrect error code, and you are unable to reproduce the error codes using our tests. The important point here is that it is not enough for you to say 'well, it worked for my test cases...'. Rather, you must test your disassembler using our tests.

The following are invalid reasons for a regrade (there are many more):

You think that the standard was too harsh -- e.g. you lost too many points for a small error. Let me reiterate that the standard will be considered inviolate.
You just feel like re-submitting to see if you'll get lucky.

If you do decide to submit a regrade request, you must email me with (1) the output of test-proj3.pl (including the stdout of test-proj3.pl and the 'bigtest-student-solution.s'), and a specific description of the problem. I will personally look over your project and determine whether your reasoning is valid. If it is valid, then I will give you back the points that you deserve. However, if I determine that your reasoning is invalid, then I will subtract 20 points from your project score, using the superfluous-regrade code.

The whole point of this is to cut down on meritless regrades so that I can concentrate my time on people who truly do deserve more points. Please be absolutely certain that you will get points back before you submit a request. Unless you find a bug in our solution, you must show me that when you run the tests as described in the testing section (i.e. proj3-test.pl), you get a better score than was originally recorded.

Note for people who had an 'abnormal' code:

You have two choices. You can either keep your zero (It's only 1 out 6 projects -- it won't hurt that much.) or you can attempt to get your project working and resubmit it to me. If you decide to resubmit it, you should do the absolute minimum possible to get your program to compile or not segfault on out tests. Once you have fixed your compile or runtime bugs, then you should use our testing scripts to determine you score -- if you would have done reasonably well (say above 75 or so, depending on the magnitude of the changes that you had to make), then it is worth your time to resubmit.

If you do decide to submit a regrade request, then you must email me with (1) the output of test-proj3.pl on your original project, (2) the output of test-proj3.pl on your 'fixed' project, and (3) a line-by-line specific explanation of what you had to change to get it to compile or not exit abnormally. Please do not forget to include both the stdouts of each run of test-proj3.pl and the 'bigtest-student-solution.s' from your fixed disassembler.

I will compare your new submission to your old one, and based on the severity of the changes that you had to make to get it to work and the performance on the standard tests, I will assign you a new grade. The smaller the amount of changes that you had to make, the more likely you are to get points back. The maximum number of points back in such a situation will be around 50.

If you received an abnormal-compile error because you used a Makefile or had extra/different .c or .h files, please email me with the command line that you used to compile -- I will recompile it and give you points back. Since the spec said nothing about the compilation options, you won't be penalized for this.

Please note that students who do not have an abnormal code may not change their code and resubmit it.

Note for people who lost points on sll and/or move:

Recently, it has been pointed out that different versions of SPIM use different machine formats for the 'nop' and 'move' pseudoinstructions. It has also been pointed out that the spec does not specifically say that we must decode the 'nop's and 'move's of the version of SPIM installed on quasar. Therefore, it seems unfair to penalize those persons who 'went above and beyond the call of duty' (as one such person described himself) by implementing the 'nop's and 'move's of different versions of SPIM. So here is the new policy:

In order for anyone to recieve full credit for the cases that test 'nop' both of the following conditions must obtain:

You decode at least one instruction as a 'nop'.
You can provide evidence that every instruction that your solution decodes as a nop is also decoded as a nop by at least one version of SPIM.

For example, if you did as the spec said, you would get full credit (this is the case that applies to most). Also, if your home version of SPIM decodes nop as 'sll $0 $0 0' (and this was the only thing that you decoded as nop), you would get full credit. And, if you wanted to go the extra mile and decode the nops of two or three or ALL versions of SPIM, then you would also get full credit. However, if you decoded some instruction as a 'nop' but are unable to find a version of SPIM that does likewise, you are out of luck and are stuck with the score that you originally received.

Similar reasoning also applies to the move pseudoinstructions. I.e., both of the following must obtain:

You decode at least one instruction as a 'move'
Every 'move' that you decode is also decoded as a 'move' by some version of SPIM.

If one or the other fails, you are again subject to the original grade you received.

The regrade procedure for cases like this will be as follows: Email me with the following:

The output from 'test-proj3.pl' run on your project.
A list of each of the nops/moves decoded, and the version/platforms of SPIM that decodes them as such.

This is the only way to receive credit for these types of nop/move problems. Please note that I will not patch this into the testings scripts and re-grade everyone's projects -- you are responsible for detecting the situation and dealing with it appropriately.

A final note about resubmitting: Since regrading an entire project is a time-consuming prospect, my scheduled deadline for completion of all project 3 regrades is the beginning of the final exam. Since I will be spending quite a bit of time on each project, there will be no regrade-regrade requests. Furthmore, please do email me about 'how's the progress of my regrade' until after the final exam. Regraded projects will hae the special code 'regraded' added into gexplain.

Section 4: Project Statistics

These statistics were generated automatically on 5/3/2001. They may change slightly as regrades are processed.


Analyzing proj3-auto-grade-roster.txt...done.

-- SUMMARY DATA -------------------------------------------------

    #total valid logins : 343
  #submissions received : 298
          average score : 69.4630872483222 (including abnormals)
          average score : 78.4090909090909 (excluding abnormals)
 
-- HISTOGRAM ----------------------------------------------------

Score : #People
  100 : 53
   95 : 44
   90 : 15
   85 : 38
   80 : 14
   75 : 23
   70 : 15
   65 : 16
   60 : 2
   55 : 10
   50 : 6
   45 : 7
   40 : 2
   35 : 6
   30 : 
   25 : 2
   20 : 
   15 : 1
   10 : 6
    5 : 1
    0 : 37

-- CODE OCCURENCES ----------------------------------------------

           Code : #Occurences   
           ins1 : 85
           ins2 : 9
           ins3 : 16
  no ins errors : 188

           reg1 : 64
           reg2 : 34
           reg3 : 33
  no reg errors : 167

           imm1 : 61
           imm2 : 72
           imm3 : 51
  no imm errors : 114

           lab1 : 8
           lab2 : 8
           lab3 : 3
  no lab errors : 279

          form1 : 5
          form2 : 11
          form3 : 0
 no form errors : 282

   any abnormal : 34
   extra-credit : 15