## Floating point operations in assembly

## Objectives

- Apply your understanding of how floating point numbers are represented
- Practice writing MIPS programs with floating point operations

## Introduction

When we program with high-level languages, it’s very easy to overlook the fact that some complex operations might be happening at the machine code level. By hiding that complexity, abstractions afford us a good measure of comfort. It is important, however, to experience what goes on at the lower level so that we can have a thorough understanding of how the computer works.

This lab will illustrate how our programs can mix single-precision and double-precision types in arithmetical operations. You will learn to convert from single to double precision and to compare floating point values properly. You will also be reminded of some details of the IEEE 754 standard for floating point number representation.

To this end, you will continue to use the **mips.bucknell.edu** machine. Remember that this machine is slower and that it is shared among many people. **You are better off editing code in a text editor that runs on your lab machine.** Again, you will write a main program in C which calls functions coded in assembly language. It is easier to write any I/O operations your program needs in C. In general, to compile, use:

**$** **gcc program.c program.s -o program**

Within the same chip, floating point in MIPS is handled by a separate **co-processor** called **coprocessor 1**, or **CP1** for short. (This lab refers to the main processor, which works exclusively with integer arithmetic and logic, as CP0. Strictly speaking, the MIPS architecture reserves the name CP0 for the *system control co-processor*, which handles exceptions and memory management alongside the integer CPU.) The MIPS Architecture specification goes into all the ~~gory~~ lovely details. You may want to read the document at some point, but this lab will walk you through the important points.

Because floating point operations are handled by a separate processor, they use a separate set of registers. Just like the main processor, CP1 has 32 registers of 32 bits each. These are named **$f0** through **$f31**. To store a double-precision float (64 bits) you use two consecutive 32-bit registers. **Double precision values must begin with an even register.** Effectively this means that there are 16 double precision registers: **$f0, $f2, $f4 … $f30**.

Thinking about how we encode the identifiers of these registers in machine code, you will remember that MIPS instructions contain 5-bit wide fields to identify registers. It is not possible to use those 5-bit register ids to refer to more than 32 registers, so MIPS defines an entire set of **floating point instructions** that implicitly operate on the **$f** registers. The mnemonics for these instructions typically look like an integer instruction with a **.s** or **.d** suffix to indicate whether they apply to single or double precision numbers. The instruction suffixes available to you in MIPS assembly are:

- **.s** (single precision float)
- **.d** (double precision float)
- **.w** (32-bit integer “word”)
- **.l** (64-bit integer)

Here is an example of how suffixes are used. If you want to load a single precision floating point value from memory address **data** into **$f0** and add it to **$f1**, you would write:

```asm
l.s   $f0, data
add.s $f0, $f0, $f1
```

One last thing we need to know is the register usage conventions, which are specified in the MIPS32 ABI specification and summarized as follows:

- **$f0..$f3** – **Return values**. Single precision uses $f0, $f1. Double precision uses register pairs beginning at $f0 and $f2.
- **$f4..$f11** – **Temporary registers** (not preserved across function calls).
- **$f12..$f15** – **Argument registers**. Two single or double precision arguments in $f12 and $f14.
- **$f16..$f19** – **Temporary registers** (not preserved across function calls).
- **$f20..$f30** – **Saved registers** (preserved across function calls).
- **$f31** – listed here as the control/status register, used for comparisons, rounding, and exceptions. (In the MIPS architecture that role belongs to FPU control register 31, the FCSR, which is accessed with **cfc1**/**ctc1** and is distinct from the data register $f31.) Either way, **do not use** it unless you know what you’re doing!

## Exercise 1: Half full or half empty?

To get started, we are going to write a simple program to print the value of one-half (i.e., 0.5). Below is the C driver program to perform the required IO followed by the assembly code for the one_half functions. These files are also in **~cs206/Labs/Lab10/floats.c** and **~cs206/Labs/Lab10/floats.s**. **Copy these files into your Lab10 folder.**

```c
/*
 * CSCI 206 Computer Organization & Programming
 * Author: Alan Marchiori
 * Date: 2014-03-01
 * Copyright (c) 2014 Bucknell University
 *
 * Permission is hereby granted, free of charge, to any individual or
 * institution obtaining a copy of this software and associated
 * documentation files (the "Software"), to use, copy, modify, and
 * distribute without restriction, provided that this copyright and
 * permission notice is maintained, intact, in all copies and supporting
 * documentation.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL BUCKNELL UNIVERSITY BE LIABLE FOR ANY CLAIM, DAMAGES
 * OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

#include <stdio.h>

/* Forward declaration for our assembly functions
 * so gcc knows the return type (the return type defaults
 * to int, so it would look in the $v0 register w/o this).
 * Knowing the return type is a float/double, gcc will
 * look in the $f0 (float) or $f0-$f1 (double) registers.
 */
float one_half_single(void);
double one_half_double(void);

int main()
{
    printf ("0.5 (single) = %f\n", one_half_single());
    printf ("0.5 (double) = %lf\n", one_half_double());
    return 0;
}
```

```asm
    .text
    .global one_half_single
    .global one_half_double
    .align 2                # word align

# one_half_single is a function
# that loads a single precision float
# from memory and returns the loaded
# value to the caller.
one_half_single:
    la      $t0, ohs
    l.s     $f0, 0($t0)
    jr      $ra
    nop

# one_half_double is a function
# that loads a double precision float
# from memory and returns the loaded
# value to the caller.
one_half_double:
    la      $t0, ohd
    l.d     $f0, 0($t0)
    jr      $ra
    nop

    .data
    .align 2                # word align
# the single precision float
ohs:    .word 0x00000000

    .align 3                # double word align
# the double precision float
ohd:    .word 0x00000000
        .word 0x00000000
```

Next, compile and run this program with the commands:

```
$ gcc floats.c floats.s -o floats
$ ./floats
0.5 (single) = 0.000000
0.5 (double) = 0.000000
```

You should not get any compiler errors or warnings. **If you do, ask for help before proceeding**. You should notice that the output is incorrect (0.000000 != 0.5). **Modify the .data segment declarations in floats.s** to store the value of 0.5 in the memory positions labeled by **ohs** and **ohd**, using the encoding for the proper type (single or double). Remember that your encoding for sign, exponent, and fraction must follow the IEEE 754 standard, which is shown in your MIPS reference sheet. Note that you must write the value for 0.5 in **hexadecimal** (**do not use the .float or .double macros** to do the conversion for you). You will want to compute the proper values by hand before modifying the program. Compile and run the program to confirm that it prints 0.5 in both cases.

When you are confident that your program is working, add floats.c and floats.s to git and push to gitlab.

## Exercise 2: Floating-point inspection

As you saw in Exercise 1, converting to/from floating point by hand takes some work. However, it isn’t too hard to write a program to extract the various fields from a float.

In your **floats.c** file, create the C functions **inspect_float** and **inspect_double**. These functions take one argument and print the **sign, exponent, and fraction** components of the encoded number. To test these functions, call them with **one_half_single()** or **one_half_double()** as the argument. We want to generate output like what is shown below.

```
0.5 (single) = 0.500000
sign = 0, exponent = 0x7e, fraction = 0x000000
0.5 (double) = 0.500000
sign = 0, exponent = 0x3fe, fraction = 0x0000000000000
```

We can accomplish most of this with some bit masking and bit shifts, as in the prelab. Before we can mask and shift, though, we need the floating point value's bits in an integer variable, which is trickier. Your first instinct may be to type cast (i.e., **int a = (int) d** for a double **d**), but this won’t work because C converts the *value* of the double to an integer, truncating the fractional part. We want to create an integer with the same underlying binary value.

In C, we could accomplish this with a **union**, as shown below:

```c
unsigned f2u(float f)
{
    union {
        unsigned u;
        float f;
    } v;
    v.u = 0;
    v.f = f;
    return v.u;
}
```

The union data structure overlays two representations of data over each other. Although this looks somewhat complex in C, it is easier to write in assembly. The entire function **f2u** can be implemented with **2 assembly instructions**. When **f2u** is called, the floating point value is in register **$f12** (following the floating point calling conventions) and the result is expected to be returned in **$v0**. All our function has to do is move the value from **$f12** into **$v0** and **return**!

To accomplish this we need to move data between **CP1** registers and **CP0** registers, which requires MIPS instructions that are not on your green sheet. They are **mfc1** and **mtc1:**

- **mfc1 rt, fs** – move from CP1 to CP0. Regs[rt] <= Regs[fs]
- **mtc1 rt, fs** – move to CP1 from CP0. Regs[fs] <= Regs[rt]

**Notice** that these instructions are a bit odd. **The first register is always the CP0 register and the second is always the CP1 register**. So, when you use **mtc1**, *the destination is the 2nd argument*.

**Implement the functions** **f2u, u2f, d2u**, and **u2d** in **floats.s**. In **floats.c**, add **inspect_float** and **inspect_double**, and call them with **one_half_single()** or **one_half_double()** as the argument.

Some things to watch out for:

- In C on the MIPS machine, an unsigned 64-bit integer is represented as **unsigned long long**.
- Don’t forget to create function prototypes in C for all of your assembly language functions so gcc will use the proper calling conventions (float vs. integer).
- The **printf** format for a 64-bit unsigned integer is **%llx** (hexadecimal) or **%llu** (decimal); **%lld** prints a signed 64-bit integer.
- The **printf** format for a double is **%lf**.
- IEEE 754 doubles are stored with the least significant word first (at the lower memory address) on this machine, so your **d2u** function needs to perform a word-swap.

When you are confident that your program is working, add floats.c and floats.s to git and push to gitlab.

## Exercise 3: Floating point precision

Floating point numbers are a way to represent *real numbers* on the computer. Since real numbers have infinite precision, we can’t do this exactly without an infinite number of bits in our representation. In the case of integers, the range of the numbers is bounded by a minimum and maximum value. This creates a finite set of numbers which we can uniquely map to binary values. In floating point, this doesn’t work because even if we pick a **small** minimum and maximum, say -1.0 and 1.0, there are still an infinite number of values within this range. As a result, we can only represent a subset of the real numbers within the range defined by the minimum and maximum values.

To explore this, add the function **precision** to your **floats.c** file. Call this function at the end of your main program. Inside this function, define a single precision float initialized with the value **1.0** (which can be exactly represented as a single precision float with exp = 0, fraction = 0). The next consecutive float has exp = 0 and fraction = 00000000000000000000001. This can be found by incrementing the binary representation of the floating point number by one. Using our **f2u** and **u2f** functions to access the raw data, you could write:

**u2f(f2u(1.0) + 1)**

Use this technique to find the next single precision float after 1.0. **Between these numbers there is no other single precision float**. Print out this value using your **inspect_float** function. **Document this value with a comment inside the precision function**. From what you know about the floating point representation, do you think the difference between all consecutive floating point numbers is the same? **Add a few more lines of comments** describing how you think floating point precision changes with the value being represented (i.e., small numbers vs. large numbers). **Add code to precision that demonstrates your conclusions**.

When you are done, push your updated floats.c to gitlab.

## Exercise 4: Sum

In your **floats.c** file, add the function **sum** for which the code is given below.

```c
void sum()
{
    float a = 0.1;
    float sum = 0;
    int i;

    for (i = 0; i < 1000; i++) {
        sum += a;
    }

    printf ("a = %1.28f, sum = %1.28f, sum == 100 ==> %s\n",
        a, sum, sum == 100 ? "TRUE":"FALSE");

    inspect_float(a);
    inspect_float(sum);
    inspect_float(100-sum);
}
```

The procedure loops 1,000 times, adding the value 0.1 to sum each time. The result should be **100**, but when you run the program you will observe that this is not the case. **In the sum function, add comments to explain why not.** Add calls to **inspect_float** to help explain the problem.

This demonstrates why **not to use == (equals) when comparing floating point numbers**. Remember that floating point values are **approximations of real numbers** and with approximations, there is error. Arithmetic can cause the approximation errors to accumulate. To help avoid this problem, examine the difference between two floating point numbers to determine if they are close or not. How close you might consider equal is dependent on the expected inputs.

In **floats.c**, you will create a new function called **is_near** to replace the **==** comparison in the **sum** function. The **is_near** function takes three arguments: the first two are the numbers to compare, and the last is the equality threshold, **epsilon**. The function returns true if the absolute value of the difference is less than epsilon. Adjust the value of the constant **epsilon** so that the desired result (the summation is considered equal to 100) is achieved. Note that in C, the integer value zero (0) represents *false* and any other integer value represents *true*. Use this convention in your **is_near** function. Also note that C has separate functions for absolute values of integers and floats. Find out how to use these functions from the manual pages for **abs** and **fabs**. When complete, the output should end with:

**…sum is_near 100 ==> TRUE**

Define **epsilon** as a constant near the top of the program, where the function prototypes are declared. Vary the value of **epsilon** to find the smallest value that makes the above comparison true.

When you are done, push your updated floats.c to gitlab.

## Submission

Make sure you submit the following files to your git repo.

- bits.c (should’ve submitted already in prelab)
- floats.c
- floats.s

## Grading

[Lab: 75 points total]:

- **[10 points]** Exercise 1: Modified floats.s to define the correct values for **ohs** and **ohd**.
- **[25 points]** Exercise 2: **inspect_float** and **inspect_double** work correctly. Implemented **f2u**, **u2f**, **d2u**, and **u2d** in floats.s.
- **[25 points]** Exercise 3: precision implemented. Comments have the correct value for the next float after 1.0. Comments discuss how precision changes for large/small numbers.
- **[15 points]** Exercise 4: **sum** added with comments describing why it doesn’t work as expected. Implemented the **is_near** function and modified sum to use it with an appropriate epsilon.
