Lab 10

Floating point operations in assembly


  • Apply your understanding of how floating point numbers are represented
  • Practice writing MIPS programs with floating point operations


When we program with high-level languages, it’s very easy to overlook the fact that some complex operations might be happening at the machine code level. By hiding that complexity, abstractions afford us a good measure of comfort. It is important, however, to experience what goes on at the lower level so that we can have a thorough understanding of how the computer works.

This lab will illustrate how our programs can mix single-precision and double-precision types in arithmetical operations. You will learn to convert from single to double precision and to compare floating point values properly. You will also be reminded of some details of the IEEE 754 standard for floating point number representation.

To this end, you will continue to use the machine. Remember that this machine is slower and that it will be shared with multiple people. You are better off editing code in a text editor that runs on your lab machine. Again, you will write a main program in C which calls functions coded in assembly language. It is easier to use C to write any I/O operations needed in your program. In general, to compile, use:

$ gcc program.c program.s -o program

Within the same chip, floating point in MIPS is handled by a separate coprocessor called coprocessor 1, or CP1 for short. (This lab refers to the main processor, which works exclusively with integer arithmetic and logic, as CP0.) The MIPS Architecture specification goes into all the gory details. You may want to read the document at some point, but this lab will walk you through the important points.

Because CP1 is a separate processor, it has its own registers. Just like the main processor, CP1 has 32 registers of 32 bits each. These are named $f0 through $f31. To store a double-precision float (64 bits) you use two consecutive 32-bit registers. Double precision values must begin with an even register. Effectively this means that there are 16 double precision registers: $f0, $f2, $f4 … $f30.

Thinking about how we encode the identifiers of these registers in machine code, you will remember that MIPS instructions contain 5-bit wide fields to identify registers. It is not possible to use those 5-bit register ids to refer to more than 32 registers, so MIPS defines an entire set of floating point instructions that implicitly operate on the $f registers. The mnemonics for these instructions typically look like an integer instruction with a .s or .d suffix to indicate whether they apply to single or double precision numbers. The instruction suffixes available to you in MIPS assembly are:

  • .s (single precision float)
  • .d (double precision float)
  • .w (32-bit integer “word”)
  • .l (64-bit integer)

Here is an example of how suffixes are used. If you want to load a single precision floating point value from memory address data into $f0 and add it to the value in $f1, leaving the sum in $f0, you would write:

l.s $f0, data
add.s $f0, $f0, $f1

One last thing we need to know is the register usage conventions, which are specified in the MIPS32 ABI specification and summarized as follows:

  • $f0..$f3 – Return values. Single precision uses $f0, $f1. Double precision uses register pairs beginning at $f0 and $f2.
  • $f4..$f11 – Temporary registers (not preserved across function calls).
  • $f12..$f15 – Argument registers. Two single or double precision arguments in $f12 and $f14.
  • $f16..$f19 – Temporary registers (not preserved across function calls).
  • $f20..$f30 – Saved registers (preserved across function calls).
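  • $f31 – Upper half of the $f30 register pair, so it is also preserved across function calls. The floating point control/status register (used for comparisons, rounding, and exceptions) is a separate CP1 control register, not $f31. (Do not modify it unless you know what you’re doing!)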

Exercise 1: Half full or half empty?

To get started, we are going to write a simple program to print the value of one-half (i.e., 0.5). Below is the C driver program that performs the required I/O, followed by the assembly code for the one_half functions. These files are also in ~cs206/Labs/Lab10/floats.c and ~cs206/Labs/Lab10/floats.s. Copy these files into your Lab10 folder.

Next, compile and run this program with the commands:

$ gcc floats.c floats.s -o floats
$ ./floats
0.5 (single) = 0.000000
0.5 (double) = 0.000000

You should not get any compiler errors or warnings. If you do, ask for help before proceeding. You should notice that the output is incorrect (0.000000 != 0.5). Modify the .data segment declarations in floats.s to store the value of 0.5 in the memory positions labeled by ohs and ohd using the encoding for the proper type (single or double). Remember that your encoding for sign, exponent, and fraction must follow the IEEE 754 standard, which is shown in your MIPS reference sheet. Note that you must write the value for 0.5 in hexadecimal (do not use the .float or .double macros to do the conversion for you). You will want to compute the proper values by hand before modifying the program. Compile and run the program so that it prints 0.5 for both cases.
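As a worked illustration of the hand computation, using a different value than the one the exercise asks for: 6.5 is 110.1 in binary, or 1.101 × 2^2. The sign is 0, the biased exponent is 127 + 2 = 129 (0x81) for single precision and 1023 + 2 = 1025 (0x401) for double precision, and the fraction is 101 followed by zeros. The sketch below packs those fields into bit patterns; compose_single and compose_double are hypothetical helper names, not part of the lab files.

```c
#include <stdint.h>

/* Hypothetical helper: pack sign (1 bit), biased exponent (8 bits), and
   fraction (23 bits) into a single-precision bit pattern. */
uint32_t compose_single(uint32_t sign, uint32_t exponent, uint32_t fraction)
{
    return ((sign & 0x1u) << 31)
         | ((exponent & 0xFFu) << 23)
         | (fraction & 0x7FFFFFu);
}

/* Hypothetical helper: the same idea for double precision, with
   1 sign bit, 11 exponent bits, and 52 fraction bits. */
uint64_t compose_double(uint64_t sign, uint64_t exponent, uint64_t fraction)
{
    return ((sign & 0x1u) << 63)
         | ((exponent & 0x7FFu) << 52)
         | (fraction & 0xFFFFFFFFFFFFFull);
}
```

With these helpers, compose_single(0, 0x81, 0x500000) gives 0x40D00000 and compose_double(0, 0x401, 0xA000000000000) gives 0x401A000000000000, the two encodings of 6.5.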

When you are confident that your program is working, add floats.c and floats.s to git and push to gitlab.

Exercise 2: Floating-point inspection

As you saw in Exercise 1, converting to/from floating point by hand takes some work. However, it isn’t too hard to write a program to extract the various fields from a float.

In your floats.c file, create the C functions inspect_float and inspect_double. These functions take one argument and print the sign, exponent, and fraction components of the encoded number. To test these functions, call them with one_half_single() or one_half_double() as the argument. We want to generate output like what is shown below.

0.5 (single) = 0.500000
sign = 0, exponent = 0x7e, fraction = 0x000000
0.5 (double) = 0.500000
sign = 0, exponent = 0x3fe, fraction = 0x0000000000000

We can accomplish most of this with some bit masking and bit shifts as in the prelab. Before we can mask and shift, though, we need the bits of the floating point value in an integer, which is trickier. Your first instinct may be to type cast (e.g., int a = (int) d, where d is a double), but this won’t work because C converts the value of the double to an integer by truncating the fractional part. We want to create an integer with the same underlying binary value.
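A minimal sketch of both points, assuming the raw single-precision pattern is already available as a 32-bit unsigned integer. The helper names sign_of, exponent_of, fraction_of, and cast_to_int are hypothetical and only illustrate the idea; the pattern 0x40D00000 used in the note below encodes 6.5.

```c
#include <stdint.h>

/* Hypothetical helpers: split a raw single-precision bit pattern
   into its three fields with shifts and masks. */
uint32_t sign_of(uint32_t bits)     { return bits >> 31; }
uint32_t exponent_of(uint32_t bits) { return (bits >> 23) & 0xFFu; }
uint32_t fraction_of(uint32_t bits) { return bits & 0x7FFFFFu; }

/* A type cast converts the numeric value and drops the fractional
   part; it does not expose the underlying bits. */
int cast_to_int(float value)
{
    return (int)value;
}
```

For the pattern 0x40D00000 the helpers return sign 0, exponent 0x81, and fraction 0x500000, while cast_to_int(6.5f) simply returns 6.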

In C, we could accomplish this with a union, as shown below:
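One way such a union might look, as a minimal sketch that assumes float is 32 bits wide. The names float_bits, f2u_union, and u2f_union are hypothetical, chosen so they do not clash with the assembly functions you will write.

```c
#include <stdint.h>

/* Overlay a float and a 32-bit unsigned integer on the same storage. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Hypothetical C counterpart of f2u: write the float member, then
   read the same bytes back through the integer member. */
uint32_t f2u_union(float value)
{
    union float_bits fb;
    fb.f = value;
    return fb.u;
}

/* Hypothetical C counterpart of u2f: the reverse direction. */
float u2f_union(uint32_t bits)
{
    union float_bits fb;
    fb.u = bits;
    return fb.f;
}
```

For example, f2u_union(1.0f) returns 0x3F800000, and u2f_union(0x40D00000) returns 6.5.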

The union data structure overlays two representations of data over each other. Although this looks somewhat complex in C, it is easier to write in assembly. The entire function f2u can be implemented with 2 assembly instructions. When f2u is called, the floating point value is in register $f12 (following the floating point calling conventions) and the result is expected to be returned in $v0. All our function has to do is move the value from $f12 into $v0 and return!

To accomplish this we need to move data between CP1 registers and CP0 registers, which requires MIPS instructions that are not on your green sheet. They are mfc1 and mtc1:

  • mfc1 rt, fs – move from CP1 to CP0. Regs[rt] <= Regs[fs]
  • mtc1 rt, fs – move to CP1 from CP0. Regs[fs] <= Regs[rt]

Notice that these instructions are a bit odd. The first register is always the CP0 register and the second is always the CP1 register. So, when you use mtc1, the destination is the second argument.

Implement the functions f2u, u2f, d2u, and u2d in floats.s. In floats.c, add inspect_float and inspect_double, and call them with one_half_single() or one_half_double() as the argument.

Some things to watch out for:

  • In C on the MIPS machine, an unsigned 64-bit integer is represented as unsigned long long.
  • Don’t forget to create function prototypes in C for all of your assembly language functions so gcc will use the proper calling conventions (float vs. integer).
  • The printf format for a 64-bit unsigned integer is %llx (hexadecimal) or %llu (decimal).
  • The printf format for a double is %lf.
  • On this machine, doubles are stored with the least significant word first (at the lower memory address), so your d2u function needs to perform a word swap.
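The word swap and the 64-bit printf formats from the list above can be sketched in C as follows. This is only an illustration: swap_words and format_bits are hypothetical helper names, and your d2u performs the equivalent rearrangement with registers in assembly.

```c
#include <stdio.h>

/* Hypothetical helper: exchange the upper and lower 32-bit words of a
   64-bit value, the same rearrangement a word swap performs. */
unsigned long long swap_words(unsigned long long x)
{
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: format a 64-bit pattern in hexadecimal and in
   decimal. Both %llx and %llu expect an unsigned long long argument.
   Returns the number of characters written, as snprintf does. */
int format_bits(char *buf, size_t size, unsigned long long bits)
{
    return snprintf(buf, size, "0x%016llx = %llu", bits, bits);
}
```

For instance, swap_words(0x000000003FF00000ULL) returns 0x3FF0000000000000, which is the bit pattern of the double 1.0.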

When you are confident that your program is working, add floats.c and floats.s to git and push to gitlab.

Exercise 3: Floating point precision

Floating point numbers are a way to represent real numbers on the computer. Since real numbers have infinite precision, we can’t do this exactly without an infinite number of bits in our representation. In the case of integers, the range of the numbers is bounded by a minimum and maximum value. This creates a finite set of numbers which we can uniquely map to binary values. In floating point, this doesn’t work because even if we pick a small minimum and maximum, say -1.0 and 1.0, there are still an infinite number of values within this range. As a result, we can only represent a subset of the real numbers within the range defined by the minimum and the maximum values.

To explore this, add the function precision to your floats.c file. Call this function at the end of your main program. Inside this function, define a single precision float initialized with the value 1.0 (which can be exactly represented as a single precision float with exp = 0, fraction = 0). The next consecutive float has exp = 0 and fraction = 00000000000000000000001. It can be found by incrementing the binary representation of the floating point number by one. Using our f2u and u2f functions to access the raw data, you could write:

u2f(f2u(1.0) + 1)

Use this technique to find the next single precision float after 1.0. Between these numbers there is no other single precision float. Print out this value using your inspect_float function. Document this value with a comment inside the precision function. From what you know about the floating point representation, do you think the difference between all consecutive floating point numbers is the same? Add a few more lines of comments describing how you think floating point precision changes with the value being represented (i.e., small numbers vs. large numbers). Add code to precision that demonstrates your conclusions.
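The same increment trick can be sketched in portable C, shown here at 8.0 rather than 1.0. The helpers bits_of, float_of, and next_float are hypothetical stand-ins for f2u and u2f that use memcpy, and the sketch assumes a positive, finite input.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for f2u: copy the float's bytes into an integer. */
uint32_t bits_of(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical stand-in for u2f: copy the integer's bytes into a float. */
float float_of(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Next representable float above a positive, finite value: add one to
   the raw bit pattern and reinterpret the result as a float. */
float next_float(float value)
{
    return float_of(bits_of(value) + 1);
}
```

Since 8.0 has the pattern 0x41000000, next_float(8.0f) has the pattern 0x41000001, and the gap between the two values is 2^-20.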

When you are done, push your updated floats.c to gitlab.

Exercise 4: Sum

In your floats.c file, add the function sum for which the code is given below.

The procedure loops 1,000 times, adding the value 0.1 to sum. The result should be 100, but when you run the program you will observe that this is not the case. In the sum function, add comments to explain why not. Add calls to inspect_float to help explain the problem.

This demonstrates why not to use == (equals) when comparing floating point numbers. Remember that floating point values are approximations of real numbers and with approximations, there is error. Arithmetic can cause the approximation errors to accumulate. To help avoid this problem, examine the difference between two floating point numbers to determine if they are close or not. How close you might consider equal is dependent on the expected inputs.
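The accumulation effect can be reproduced with a small sketch. It assumes single precision arithmetic throughout; sum_tenths and abs_error are hypothetical names and are not the lab's sum function.

```c
#include <math.h>

/* Hypothetical helper: add 0.1f to a float accumulator count times.
   0.1 has no exact binary representation, so every addition carries
   a small rounding error that builds up over the loop. */
float sum_tenths(int count)
{
    float sum = 0.0f;
    for (int i = 0; i < count; i++) {
        sum += 0.1f;
    }
    return sum;
}

/* Hypothetical helper: absolute difference between a computed value
   and the value we expected. */
float abs_error(float value, float expected)
{
    return fabsf(value - expected);
}
```

With 1,000 iterations, sum_tenths(1000) == 100.0f is false, yet abs_error(sum_tenths(1000), 100.0f) is well below 0.01, which is why comparing the difference against a threshold is the more useful test.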

In floats.c, you will create a new function called is_near to replace the == comparison in the sum function. The is_near function takes three arguments: the first two are the numbers to compare, and the last is the equality threshold, epsilon. The function returns true if the absolute value of the difference is less than epsilon. Adjust the value of the constant epsilon so that the desired result (the summation is considered equal to 100) is achieved. Note that in C, the integer value zero (0) represents false and any other integer value represents true. Use this convention in your is_near function. Also note that C has separate functions for absolute values of integers and floats. Find out how to use these functions from the manual pages for abs and fabs. When complete, the output should end with:

…sum is_near 100 ==> TRUE

Define epsilon as a constant near the top of the program where all function prototypes are defined. Vary the value of epsilon to find the smallest value that makes the comparison above true.
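A minimal sketch of the comparison described above, assuming double arguments (float values are promoted when passed). The threshold is left as a parameter on purpose, since choosing the constant epsilon is part of the exercise.

```c
#include <math.h>

/* Returns 1 (true) when a and b differ by less than epsilon, and
   0 (false) otherwise, following the C convention for truth values.
   fabs is the floating point absolute value; abs is for integers. */
int is_near(double a, double b, double epsilon)
{
    return fabs(a - b) < epsilon;
}
```

For example, is_near(0.1 + 0.2, 0.3, 1e-9) is true even though 0.1 + 0.2 == 0.3 is false in double precision, while is_near(1.0, 1.5, 0.25) is false.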

When you are done, push your updated floats.c to gitlab.


Make sure you submit the following files to your git repo.

  • bits.c (should’ve submitted already in prelab)
  • floats.c
  • floats.s


[Lab: 75 points total]:

  • [10 points] Exercise 1: Modified floats.s to define the correct values for ohs and ohd.
  • [25 points] Exercise 2: inspect_float and inspect_double work correctly. Implemented f2u, u2f, d2u, and u2d in floats.s.
  • [25 points] Exercise 3: precision implemented. Comments have the correct value for the next float after 1.0. Comments discuss how precision changes for large/small numbers.
  • [15 points] Exercise 4: sum added with comments describing why it doesn’t work as is. Implemented the is_near function and modified sum to use it with an appropriate epsilon.