Department of Electrical Engineering and Computer Science
EECS
61C, Summer 2004
Lab 6-2: Caching
To determine the parameters of a computer's cache by observing the amount of time it takes to access the elements of an array.
P&H, 7.1-3 and 7.5
The program ~cs61c/labs/lab6-2/cache.c (updated at 1 pm today) produces timing data for accesses of elements in an array. Here is its main loop:
for (index = 0; index < limit; index = index + stride) {
x[index] = x[index] + 1;
}
The rest of the code sets the variables limit and stride, and determines the time it takes to execute the loop.
The program runs a series of tests (varying the values of limit and stride from test to test). Each test is repeated many times, and the results (time it takes to execute the loop) are averaged. The same loops are run without memory accesses to measure the time taken for loop overhead. The program subtracts the overhead time from the loop time so the final result should be an estimate on the time spent on memory accesses only. Unfortunately, some of our compilers don't comple the real loop and dummy loop in the same way, so your times may be a exaggerated. You don't have to fix this.
* The size value refers to the size, in bytes, of the array you are testing with, not the size of the cache. (One of your goals is to figure out the cache size for yourself.)
* The stride value says how many bytes are being skipped for each memory access. For example, if the stride is 4, then bytes 0, 4, 8, 12, ... in the array are being accessed while bytes 1, 2, 3, 5, 6, 7, 9, 10, 11, ... are being skipped.
* The read+write value is the amount of time needed to access
memory, in nanoseconds. You'll have to figure out for yourself whether these
times denote hits or misses. As explained above, the calculated times may be
a exaggerated if the compiler doesn't make equivalent machine code for the two
loops that process the array.
Exercise 1: Gather Timing Data
Start by gathering some data. Log in to a machine of your choice. Note the
machine name and any information you can find about its processor. uname -a
or uname -X may work on some of the instructional machines. Copy
~cs61c/labs/lab6-2/cache.c and compile it without any optimizations. Then compile using
gcc -o cache cache.c, and run the program. The program tests reading
and writing with arrays from 4 Kbytes to 1 Mbytes at strides varying from 4 to 512 bytes.
You will want to capture the output into a file, so use the command
./cache > out (which puts the result in a file named out). This may take awhile as each test in cache.c takes a second or two to complete (because each test is repeated multiple times).
Take a look at the file out. Then graph the data by running gnuplot. If it doesn't work on the machine you
connected to, run it from your local lab machine. Give gnuplot the command:
plot "out" using 0:8 with linespoints
That command means "plot the values in the file out as points, where each data value is a new point on the x-coordinate with the read+write time as its y-coordinate". gnuplot breaks the graph line where there's a blank line in the
input. You should see the results of each array size experiment as separate
graph segments. Each segment shows various strides, increasing from 4 bytes at
its leftmost point to 512 bytes at its rightmost. (To learn more about
gnuplot, try its excellent help system: help plot, help plot with,
help set, help, etc.)
Before you start number-crunching, try to understand the qualitative features
of the graph. You should be able to explain what each of the jumps in access
time means, and why some are much bigger than others. Do all your graph
segments start at the same access time? Do they all end at the same access time?
Exercise
2: Determining Cache Parameters
From what you see in the graph and the numbers in your table, you should be
able to determine most of the parameters of the test computer's cache. Here
are the questions you should answer:
* Approximately how long does a cache hit take? Why?
* Approximately how long does a cache miss take? Why?
* What is the size in bytes of the first-level cache? Why?
* What is the block size in bytes of the first-level cache? Why?
Exercise
3: Determining Associativity
Answer one more question:
* What is the set associativity on the cache you're analyzing? Why?
You may not be able to figure this out from the data given by the original
cache.c. Changing the stride_min, stride_max, cache_min, and cache_max
settings at the top of the file will help. When you're calculating the set
associativity, you ll want a large stride and large cache sizes to eliminate
the effects of the block size and cache size. With a sufficiently large stride
length along with arrays big enough to contain 1, 2, 4, ... elements to access,
you should be able to observe the effects of putting several elements into the
same cache set (row). Normally, retrieving elements into the same set would
cause repeated misses and no benefit from the cache. But if the cache is
associative at all, you ll see benefits even when multiple blocks are loaded
into the same set. cache.c's testing style of increasing the array size should
give you results that show how many blocks can be loaded into one set, and
that's the cache's associativity.
A piece of information not required for this lab but worth exploring is the
following:
* Does the machine have more than one level of cache? Why or why not? If so, what the parameters of the non-primary cache(s)?
You will probably not be able to conclude very much about multiple levels of
cache because the smaller caches tend to hide the effects of the larger ones.
Do not expect to be able to calculate all the parameters of non-primary
caches.
What You Should Have Learned =)
As a result of this lab, you ought to be able to specify what data and
features of the cache program let you determine a given cache parameter, and
to answer questions like the following (drawn from an old exam):
Running the cache program on the Wazcog Mark IV computer, you get the following results:
stride -> 4 8 16 32 64 128 256
size
64 109 126 122 121 101
128 120 118 116 121 117 105
256 119 116 102 117 113 113 110
512 466 966 949 949 983 308 311
1024 458 949 983 966 949 983 307
2048 449 992 954 966 979 979 979
(Stride and size are in bytes, times in nanoseconds. Yes, this is an artificially small example, to make the numbers fit easily.)
* What is the cache size in bytes?
* What is the block size in bytes?
* What is the allocation policy? (Direct mapped, fully associative, or N-way set associative? If N-way set associative, what is N?)