Lab 7

Introduction to the MIPS VM, arrays, and structures in assembly and C


  • Learn how to access the mips machine, write, compile, and execute MIPS programs
  • Learn how to write programs using both C and assembly code and work with complex data in both realms (C/assembly)
  • Practice MIPS coding and debugging

Notes: One suggestion while doing this lab is that you should edit your files on your regular Linux machine using your favorite editor, but compile and run the programs on the host mips. The reason is that the mips machine is very slow. In addition, please pay attention that in the mips machine, the Linux commands are plain, for example, if you use the rm command, it will no longer ask you to confirm, the file you remove will be immediately gone. In our regular lab machines, the commands such as rm have been revised to provide more protections. We also recommend you do git on your regular lab machines. If you do it on the mips you will need to edit the .gitconfig in your mips home directory unixspace and remove the line default = simple after the symbol [push] as the git on mips is slightly different from what is in the lab machines.

Exercise 1: Working with C and assembly

In the prelab you wrote a MIPS assembly program that used a syscall to set the return value. To perform more complex actions we have to find the right syscalls. Looking through the Linux syscalls in the prelab (on the mips machine!) (defined in /usr/include/mips-linux-gnu/asm/unistd.h) you might have noticed there are no read/print string/int/etc syscalls like there are in MARS. Remember back to when we said in Linux everything is a file. Linux handles I/O through syscalls to open/read/write/close and there is no built in type conversion (everything is just raw bytes). To write to the terminal, you open the special file /dev/tty and send your data to that file descriptor. Input is handled similarly. In short, if you think I/O in MARS is a pain, it is far more complex using Linux syscalls and there really is no good reason to be doing I/O in assembly using syscalls anyway. The syscalls man page even starts with the warning: System  calls  are generally not invoked directly, but rather via wrapper functions in glibc. So, while we could continue down this road, we will follow the suggested approach and rely on glibc (aka the C library). To accomplish this we will write C code to take care of I/O but do the important work (processing) in assembly.

In prelab, you wrote the is_prime function in C. The goal in this exercise is to implement is_prime in assembly to see if we can improve the run time. To get started, copy your primes.c file to primes2.c and create the file is_prime.s. In primes2.c, remove the is_prime function. Add a function signature for is_prime() before the C main() function as extern int is_prime(int);. Doing so allows the C compiler to look for the actual implementation of the function is_prime() somewhere else, not necessarily in this file (primes2.c). In is_prime.s, write in MIPS assembly, a new implementation of the function is_prime. Your C main function in file primes2.c will call your assembly implementation defined in is_prime.s. In order for this to work, your assembly function must have a label with the same name as the function name in C (i.e., is_prime) – you also need to add the line .global is_prime to tell the compiler to expose this function to files other than is_prime.s, similar to what you did in prelab.

The MIPS procedure calling conventions you learned in the zyBook chapter 8.2 and labs are essential for you to integrate your C and assembly code. This will work only as long as your assembly function follows these conventions, which are also used by the C compiler.

animated-gifs-alarm-07Read this section carefullyanimated-gifs-alarm-07

There are two potential problems writing for “real” MIPS hardware instead of MARS:

  1. When writing MIPS assembly on the real machine, you need to add a nop instruction immediately after every branch statement or jump statement. MIPS uses what are called branch delay slots (we will learn more about these later). Essentially this means the instruction after a branch or jump statement will always be executed (even if the branch is taken!). An easy solution is to always make sure the instruction after a branch has no effect (hence the nop instruction).
  2. The div instruction is assembled to include code to trap on divide by zero. This code unexpectedly overwrites the first register passed to the div instruction. To disable this divide by zero check, use the three argument div instruction with the first register as $zero. For example: div $zero, $rs, $rt is equivalent to div $rs, $rt. You can access the result result using the mflo and mfhi instructions. Remember that the div instruction leaves the quotient in register lo and the remainder in register hi.

When you believe you have a working is_prime function, change the #define in primes2.c to print only the first 10 primes for testing. You can compile both files in one line with the command below. Create a Makefile rule to build your program using this command.

The C compiler knows that a .s file contains assembly code, so it will compile your C file, assemble your .s file, and invoke the linker (ld) to put them together into an executable. It is important that you can get through this invocation of gcc without any warnings or errors. If gcc reports anything wrong, make sure to fix all warnings and errors before proceeding.

Finally, after you verify that the results are correct, change the #define to again print the first 10,020 primes. Again, use time to benchmark your program. Hopefully you saw a decrease in the execution time. Note these times (real, user, and sys) in another comment in your primes2.c file. Include a few sentences that speculate why your assembly code is faster (or slower) than the code generated by the C compiler.

When all of this is completed, make sure to add primes2.c, is_prime.s, and your Makefile to git and push to gitlab.

Exercise 2: Revisiting 1-dimensional Arrays

In this exercise you will continue to work with arrays in assembly. To prepare we start with a little more background on arrays in C and assembly.

An array is a data structure in which every element has the same type and, therefore, each element occupies up the same number of bytes in memory. This characteristic allows us to calculate easily the address of each individual array element; all we need to know is the size of each element in bytes.

For a one-dimensional array of an arbitrary element type T, we calculate the memory address of the i-th element using the formula:

(base address + sizeof(T)*i),

where base address is the address of the first element. The term sizeof(T)*i can be viewed as the offset from the base address in bytes. C gives you the ability to do arithmetic with pointers and gives you the address of the base of the array (a pointer) in the base of the array, which means that you can implement that formula without much effort.

Consider an array of unsigned integers, as in the example below.

You can access an arbitrary element between 0 and 99 by indexing with square brackets ([ ]), as always, but you also have an alternative. For instance, to address element i, you could simply do:

The C compiler knows that since you’re working with a pointer to unsigned int, when you add i to the base address of the array (number), you actually mean to consider an offset from the base in the total bytes occupied by i elements. The number of bytes in each individual element, sizeof(unsigned int)is implicit in C pointer arithmetic. If you code a similar address calculation in MIPS assembly, you have to be explicit with the byte size of individual elements.

Consider a C string, as in the example below. Remember that in C, strings are simply null terminated arrays of characters (one byte each).

The array occupies a total of six bytes of memory, the last one being the sentinel value \0 (zero) indicating the end of the string. You can access the fourth character of the C string by indexing and also by pointer arithmetic:

In MIPS assembly, you would have to remember that each char is one byte long and compute the byte address of your target element using the formula given earlier:

To practice this, write a C program in the file counte.c. This program should prompt the user to enter a string and then pass the string to the function counte which you will write in MIPS assembly in another file called counte.s. Your assembly procedure will accept one argument: a pointer to a C string containing the text entered by the user. The function will traverse the string it receives while keeping count of the number of e’s encountered and returns this value to the caller (i.e., the C main function). When the C program receives this return value from counte, it should print it out. Your output should look like the output below. Add a rule to build counte in your Makefile. (In the above MIPS assembly code, multu is multiplication with unsigned values.)

You need to compile the counte.c in conjunction with counte.s to make the executable counte in a similar way to what you did for primes2. When you are done with this problem, make sure to add counte.c, counte.s and your updated Makefile to git and push to gitlab.

Exercise 3: 2-Dimensional Arrays

Say you want to create a two-dimensional array of three (x, y)-coordinates, such that x and y are integers. You might declare this in C as:

You can think of this as the matrix below with 3 rows and 2 columns:

Element points[0][0] holds the x-coordinate of point 0 and element points[0][1] holds the y-coordinate of point 0. Similarly, points[1][0] holds the x-coordinate of point 1 and points[1][1] holds the y-coordinate of point 1, and so on. This is the view of the memory organization you have from the perspective of the high-level language.

From the perspective the machine, however,  the points array is laid out in memory as a one-dimensional array of words. C uses row-major order, which means the rows are laid out in memory one after another. So, in memory, we simply see: 0, 0, 3, 8, 5, 9. In assembly, the equivalent declaration is:

When you work with the array points in assembly, you need to remember it is two-dimensional and consider that each element in the array specifies a “point,” which occupies two words of memory (i.e., 8 bytes). If you wish to read or modify the x or y coordinate of a particular point, you have to compute the effective address of this point by adding an additional offset to the base address of the array.

In general, to address a particular row and column in a 2-D array the offset can be calculated from the equation below where sizeof(t) returns the size of one element and NUMCOLS is the number of columns in the array.

offset = (row * NUMCOLS * sizeof(T)) + column * sizeof(T)

To practice using matrices in assembly, copy the code below into the file matrix-sum.c (or copy it from the ~cs206/Labs/Lab07 directory). This program declares three 2×3 matrices: A, B, and C. The intent is to add them together. Matrix addition simply requires summing each corresponding element as shown below.

You will notice that the function matrix_sum is missing. Implement this using MIPS assembly in a file called matrix-sum.s. Add a rule to compile matrix-sum to your Makefile and then debug your program until you have a functioning matrix_sum for 2×3 matrices in MIPS assembly.

When you are done with this problem, make sure to add matrix-sum.c, matrix-sum.s, and your updated Makefile to git and push to gitlab.

Exercise 4: Structures

High-level languages provide the array construct to as a mechanism to build homogeneous aggregates of elements. While an array may have one or of multiple dimensions, all its elements have the same data type.

When one needs to create aggregates of elements of arbitrary types, a different data structure is needed. C allows one to define structs, which are user-defined types that allow the programmer to put together elements of arbitrary (possibly heterogeneous) types.

You can combine the two ideas to make arrays of structures: each element of the array can be an aggregate of elements of arbitrary data types. When we look at the array, however, we see that all its elements have the same type definition. For example, a pair of Cartesian coordinates (x, y) can be defined as a C struct using the syntax below.

Note that the syntax above only defines the type, it does not reserve any memory for variables. If you want to declare a variable separately from the type definition (later in the same file or possibly in a different file), you must write:

You can refer to the components of a struct point with the “dot notation;” for instance: my_point1.x and my_point2.y.

You can also declare an array where each element is a struct point:

Additionally, if you want to define the type struct point and use it to declare two variables called my_point1 and my_point2 plus an array of points called my_point_array, all in one fell swoop, you could write:

Although this isn’t the most readable, a better coding practice would be to declare one variable at a time as in:

Note that to define the storage of an array of struct point in memory, the C compiler uses an organization similar to that of a two-dimensional vector of points.

For a concrete example of working with structures, we will use the struct tm defined in the system include /usr/include/time.h and shown below:

This structure is one way to represent a date/time value in C. The other is to count the number of elapsed seconds since January 1st 1970 (a.k.a. Epoch). The C standard libraries also provide functions to get the local time and move between the two representations (see man ctime). The C functions we will use are:

  • time – returns the time as the number of seconds since the Epoch (as a time_t type — a system-specific 1-word sized integer)
  • localtime – converts a pointer to a time_t integer to a statically allocated (elsewhere) and struct tm structure using the local timezone information.
  • asctime – takes a pointer to a struct tm and returns a nicely formatted null terminated string like Fri Feb 19 22:04:16 2016.

Copy and paste (or copy from ~cs206/Labs/Lab7/time.c) the basic C program below. When executed, this program will print the current date/time to the terminal.

After the printf add a loop to call the new function add_second(tm) 2,345 times which is 39 minutes and 5 seconds. Then call the printf again to print the time after your loop. The function add_second adds one second to the struct tm (passed via a pointer) each time it is called, rolling over from the seconds member to minutes and then hours as necessary. At 23:59 the time should reset to 00:00. You do not need to increment the date. Of course you will implement add_second in assembly in the file time.s, add the appropriate test code in time.c, and add an entry in your Makefile to compile the binary program time.

When you are done with this problem, make sure to add time.c, time.s, and your updated Makefile to git and push to gitlab.

Files to submit: You are to submit the following files to your git repo.

  • is_prime.s
  • is_prime2.c
  • counte.s
  • counte.c
  • matrix-sum.s
  • matrix-sum.c
  • time.s
  • time.c
  • Makefile


[prelab: 25 points total]:

  1. [10 points] Exercise 0: zyBook activities >80% complete.
  2. [5 points] Exercise 1; successfully modified exit.s.
  3. [10 points] Exercise 2: primes C created with correct is_prime function [5 points]. Benchmark results from mips machine for 10,020 primes added to primes.c as a comment [5 points].

[Lab: 75 points total]:

  1. [15 points] Exercise 1: is_prime.s works and benchmarked in primes2.c and Makefile builds primes2.
  2. [20 points] Exercise 2: counte implemented in C/asm and compiles with Makefile.
  3. [20 points] Exercise 3: matrix-sum.c/s correct and compiles with Makefile.
  4. [20 points] Exercise 4: time.c/s correct and compiles with Makefile.


Print Friendly
Posted in Lab Tagged with: , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *


This blog is kept spam free by WP-SpamFree.