We will use a software cache simulator to compare several hardware design features. The input is a trace file that records every memory reference made while executing a 32x32 matrix multiply program. The simulator reports the cache behavior for the design parameters set for that simulation run. The simulator is named dinero. The parameters defining the cache to be simulated are given as command-line arguments.
Before you run the simulator, stretch your terminal window very wide. cd to ~cs320 and run the simulator there.
Here is an example of running the simulation, with the meaning of the parameters defined below it:
p2d -4 < mm32 | dinero -l1-usize 8k -l1-ubsize 4 -l1-uassoc 1
p2d reads the memory reference trace file mm32 and pipes it to dinero with the following parameters:
-l1-usize 8k -- indicates a level-one unified cache of 8K bytes -- the other possible cache types are instruction and data
-l1-ubsize 4 -- indicates that the block size for a cache move is 4 bytes (just one word) -- a block size of 16 would mean that each cache move copies 4 words
-l1-uassoc 1 -- indicates an associativity of 1, that is, one block per set -- this means the cache is direct-mapped, with no associativity -- the other possibilities are 2, 4, 8 or more blocks per set (2-way, 4-way, 8-way set associative)
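The three parameters above together fix the shape of the cache. As a minimal sketch (the helper name `cache_geometry` is hypothetical and not part of dinero), the number of blocks, the number of sets, and the address bits used for the block offset and the set index can be worked out as follows, assuming all sizes are powers of two:

```python
def cache_geometry(size_bytes, block_bytes, assoc):
    """Return (blocks, sets, offset_bits, index_bits) for a cache.

    Hypothetical helper for illustration; assumes power-of-two sizes.
    """
    blocks = size_bytes // block_bytes          # total blocks in the cache
    sets = blocks // assoc                      # blocks are grouped into sets
    offset_bits = block_bytes.bit_length() - 1  # log2(block size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return blocks, sets, offset_bits, index_bits

# The example run: 8K bytes, 4-byte blocks, direct-mapped (associativity 1)
geometry = cache_geometry(8 * 1024, 4, 1)
print(geometry)  # (2048, 2048, 2, 11)
```

With an associativity of 1 every set holds exactly one block, so the number of sets equals the number of blocks.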
If you want the results printed, you can pipe to enscript with the landscape mode indicated, as follows:
p2d -4 < mm32 | dinero -l1-usize 8k -l1-ubsize 4 -l1-uassoc 1 | enscript -r
You can get more information on the parameters with:
dinero -help
Run the simulation for the first example shown above and record the results in the table. For the miss rate, use the Total Demand miss rate, expressed as a percent.
Continue to collect statistics in the table provided below to answer the following questions.
1. What is the effect of increasing the block size to 16?
What principle is this based on?
2. What is the effect of doubling the size of the unified
cache (to 16k) while keeping the block size at 4?
3. What is the effect of doubling the size of the unified
cache (to 16k) and changing the block size to 16?
4. What is the effect of a cache size of 32K, 64K, 128K,
256K, 512K with a block size of 16? Look at results other than Total Demand
miss rate and comment on the changes.
| Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate |
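A sweep such as the one in question 4 repeats the same command with one parameter changed each time. The sketch below only builds and prints the command lines; it does not run the simulator. The helper name `dinero_command` is hypothetical, and the size strings are assumed to follow the same lowercase `k` spelling as the `8k` example above.

```python
def dinero_command(size, block, assoc=1, trace="mm32"):
    """Build one simulator command line for a unified level-one cache.

    Hypothetical helper; it reuses only the flags shown in the example run.
    """
    return (f"p2d -4 < {trace} | dinero "
            f"-l1-usize {size} -l1-ubsize {block} -l1-uassoc {assoc}")

# Cache sizes from question 4, all with a block size of 16
sizes = ["32k", "64k", "128k", "256k", "512k"]
commands = [dinero_command(size, 16) for size in sizes]

for command in commands:
    print(command)
```

Changing the second argument from 16 to 128 gives the corresponding command lines for a larger block size.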
5. Repeat the simulation described in question 4 but use a block
size of 128 for all sizes of the cache. Save your measurements in
the table below. How does this increase in the block size change your
results?
| Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate |
Now we will look at associativity. Record your results in the table below.
6. What is the effect of setting 2-way and 4-way
set associativity for the original unified direct-mapped 8k cache with
a block size of 4?
7. What is the effect of setting 2-way and 4-way
set associativity for a unified 128K cache with a block size of 16? Compare to your question 4 results.
These examples have all used a unified cache. Now we will split the cache.
8. Run the simulator for a 16K instruction cache and
a 16K data cache and compare your results to a 32K unified cache (from question
4). Use direct-mapped caches and a block size of 16.
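For a split cache, each half needs its own size, block size and associativity. The sketch below assumes that the instruction and data flags follow the same pattern as the unified ones, with `i` and `d` in place of `u` (for example `-l1-isize` and `-l1-dsize`). That naming is an assumption here and should be checked against the output of dinero -help. The helper name `split_l1_command` is hypothetical, and the code only builds the command string without running it.

```python
def split_l1_command(size, block, assoc=1, trace="mm32"):
    """Build a command line for split level-one instruction and data caches.

    Hypothetical helper; the -l1-i* and -l1-d* flag names are assumed by
    analogy with the -l1-u* flags and should be verified with dinero -help.
    """
    parts = ["dinero"]
    for kind in ("i", "d"):  # instruction cache first, then data cache
        parts.append(f"-l1-{kind}size {size}")
        parts.append(f"-l1-{kind}bsize {block}")
        parts.append(f"-l1-{kind}assoc {assoc}")
    return f"p2d -4 < {trace} | " + " ".join(parts)

split_cmd = split_l1_command("16k", 16)
print(split_cmd)
```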
Now look at adding a level-2 cache.
9. Add a 256K level 2 cache to a 32K unified cache
using a block size of 16 for both. Compare its behavior to the 32K
without level 2. Assume direct mapped.
| Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate | Miss rate |
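When comparing a two-level hierarchy with a single cache, it helps to combine the two miss rates into one figure. The sketch below assumes that the level-2 miss rate is a local rate, meaning level-2 misses divided by the references that actually reach level 2 (the level-1 misses). Under that assumption, the fraction of all references that go to main memory is the product of the two rates. The numbers used are illustrative only, not measurements from the simulator.

```python
def global_miss_rate(l1_miss_rate, l2_local_miss_rate):
    """Fraction of all references that miss in both cache levels.

    Assumes l2_local_miss_rate is measured relative to the references
    that reach level 2, i.e. the level-1 misses.
    """
    return l1_miss_rate * l2_local_miss_rate

# Illustrative values only: 5% of references miss in L1,
# and 20% of those also miss in L2.
example = global_miss_rate(0.05, 0.20)
print(f"{example:.2%}")  # 1.00%
```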
11. Devise your own experiment that uses level 1 and level 2 caches. Run the simulator and record statistics. Describe your experiment and comment on the results you observe.
| Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate | Miss rate | Miss rate |