CSCI320 Lab #9
Cache Simulation
November 6, 2007

This lab begins as a repeat of an exercise that you did in CSCI206.  It serves as a reminder of some cache concepts and asks you to think more deeply about the meaning of the simulation runs.  You will need to run additional simulations to answer the questions.

Use ssh to log in to a Sun machine to run this simulation.  The files are found in ~cs320.

A Cache Simulator

Because the level-one cache is part of the processor chip itself, collecting statistics on its performance (hits, misses, etc.) is possible only with specialized hardware. An alternative that lets computer designers experiment with different sizes, policies, etc. is a software simulation of the cache. Such simulations are typically run on a processor design, including its cache, before the hardware is available, or to test a particular design decision.

We will use a cache software simulator to compare several hardware design features. We will use an input file which traces all the memory references while executing a 32x32 matrix multiply program. The cache behavior is reported, based on the design parameters set for the simulation run. The name of the simulator is dinero. The parameters defining the cache to be simulated are specified as arguments on the command line.
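To make the idea of a software cache simulation concrete, here is a minimal sketch in Python. It is not how dinero is implemented; the function name `simulate`, the LRU replacement policy, and the toy address trace are all assumptions chosen for illustration. It takes a list of byte addresses and counts hits and misses for a given cache size, block size, and associativity.

```python
from collections import OrderedDict

def simulate(addresses, cache_size, block_size, assoc=1):
    """Count (hits, misses) for a trace of byte addresses.

    Hypothetical sketch: models a set-associative cache with LRU
    replacement (an assumption); assoc=1 gives a direct-mapped cache.
    """
    num_sets = cache_size // (block_size * assoc)
    # One ordered dict per set; key order tracks least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size      # which memory block holds this byte
        index = block % num_sets        # which set the block maps to
        tag = block // num_sets         # identifies the block within the set
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)      # mark as most recently used
        else:
            misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)   # evict least recently used
            lines[tag] = True
    return hits, misses

# Toy trace (made up for illustration, not taken from mm32).
trace = [0, 4, 8, 12, 0, 4, 8, 12]
print(simulate(trace, cache_size=64, block_size=4))
```

A real simulator such as dinero also distinguishes instruction fetches from data reads and writes, which this sketch ignores.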

Before you run the simulator, stretch your terminal window very wide.  cd to ~cs320 and run the simulator there.

Here is an example of running the simulation, with the meaning of the parameters defined below it:

p2d -4 < mm32 | dinero -l1-usize 8k -l1-ubsize 4 -l1-uassoc 1

p2d reads the memory reference trace file mm32 and pipes it to dinero with the following parameters:

-l1-usize 8k -- indicates a level-one unified cache of 8k bytes -- the other possibilities for the type of cache are instruction and data

-l1-ubsize 4 -- indicates that the block size for a cache move is 4 bytes (just one word) -- a block size of 16 would mean that each cache move would copy 4 words

-l1-uassoc 1 -- indicates an associativity of 1, that is, one block per set -- means it is direct-mapped, no associativity -- the other possibilities would be 2/4/8 or more blocks per set
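The three parameters above together fix the shape of the cache. As a rough sketch, the following Python helper works out the number of blocks, the number of sets, and how an address divides into tag, index, and offset bits. The function name `cache_geometry` is hypothetical, and the sketch assumes 32-bit addresses and power-of-two sizes; neither assumption comes from dinero itself.

```python
def cache_geometry(cache_size, block_size, assoc, address_bits=32):
    """Derive block/set counts and address field widths.

    Assumes all sizes are powers of two and addresses are
    address_bits wide (32 is an assumption for illustration).
    """
    num_blocks = cache_size // block_size        # total blocks in the cache
    num_sets = num_blocks // assoc               # blocks are grouped into sets
    offset_bits = block_size.bit_length() - 1    # log2(block_size)
    index_bits = num_sets.bit_length() - 1       # log2(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "num_blocks": num_blocks,
        "num_sets": num_sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }

# The example run: 8k bytes, 4-byte blocks, direct-mapped.
print(cache_geometry(8 * 1024, 4, 1))
```

For the example run this gives 2048 blocks in 2048 sets, since a direct-mapped cache has exactly one block per set.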

If you want the results printed, you can pipe to enscript with the landscape mode indicated, as follows:

p2d -4 < mm32 | dinero -l1-usize 8k -l1-ubsize 4 -l1-uassoc 1 | enscript -r

You can get more information on the parameters by running dinero -help

Run the simulation for the first example shown above and record the results in the table. For the miss rate, use the Total Demand miss rate, expressed as a percent.
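If your output shows the miss rate as a fraction, or you want to check it against the raw counts, the conversion to a percent is simple. The helper below is a hypothetical sketch, and the counts in the example are made up for illustration; they are not results from the mm32 trace.

```python
def miss_rate_percent(misses, fetches):
    """Convert raw miss and fetch counts into a miss rate in percent."""
    if fetches == 0:
        raise ValueError("no fetches recorded")
    return 100.0 * misses / fetches

# Made-up counts, for illustration only (not output from mm32).
print(f"{miss_rate_percent(25, 1000):.2f}%")
```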

Continue to collect statistics in the table provided below to answer the following questions.

1. What is the effect of increasing the block size to 16? What principle is this based on?
 
 
 

2. What is the effect of doubling the size of the unified cache (to 16k) while keeping the block size at 4?
 
 
 

3. What is the effect of doubling the size of the unified cache (to 16k) and changing the block size to 16?
 
 
 

4. What is the effect of a cache size of 32K, 64K, 128K, 256K, 512K with a block size of 16? Look at results other than Total Demand miss rate and comment on the changes.
 
 
 
 

 
 
 
Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate
           
           
           
           
           

























5.  Repeat the simulation described in question 4 but use a block size of 128 for all sizes of the cache.  Save your measurements in the table below.  How does this increase in the block size change your results?





Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate
           
           
           
           
           

Now we will look at associativity. Record your results in the table below.

6.  What is the effect of using 2-way and 4-way set associativity in place of direct mapping for the original unified 8k cache with a block size of 4?
 
 
 


7.  What is the effect of setting 2-way and 4-way set associativity for a unified 128K cache with a block size of 16?  Compare to your results from question 4.
 
 
 

 
 

These examples have all used a unified cache.  Now we will split the cache.

8.  Run the simulator for a 16K instruction cache and a 16K data cache and compare your results to those for a 32K unified cache (from question 4).  Use direct-mapped caches and a block size of 16.
 
 
 
 

Now look at adding a level-2 cache.

9.  Add a 256K level-2 cache to a 32K unified level-1 cache, using a block size of 16 for both.  Compare its behavior to that of the 32K cache without level 2. Assume both caches are direct-mapped.
 
 
 
 
 

 
 
Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate | Miss rate
             
             
             
             
           
             
             
             
             

10.  Add a 1024K level-3 cache with a block size of 128 to the memory hierarchy described in question 9.  Record your observations in the table below.  Compare its behavior to that of question 9.





11.  Devise your own experiment that uses level 1 and level 2 caches. Run the simulator and record statistics.  Describe your experiment and comment on the results you observe.









 
Level of Cache | Type of Cache | Cache size | Block size | Set associativity | Miss rate | Miss rate | Miss rate
               
               
               
               
               
               
               
               
               







Page maintained by Dan Hyde, hyde at bucknell dot edu. Last update November 3, 2007