Lab 4 | CSCI 206 Computer Organization & Programming

Lab 4

Intermediate C Programming

In this lab you will practice writing and debugging complete C programs with arrays, strings, functions, and file IO. You will learn how to compile C programs with multiple files using a Makefile.

Goals

After completing this lab you should be able to:

  • Use proper variable types and convert between types when needed.
  • Write correct format strings (printf) for various types.
  • Use multidimensional arrays.
  • Compile multiple source files into a single executable using a Makefile and appropriate preprocessor commands.
  • Access command line arguments at runtime.
  • Define and call functions including string, array, and pass-by-reference arguments.
  • Use the functions in string.h to process text data.

Exercise 1: Calculator arguments

Before starting on the actual programs, go to your ~/csci206/Labs directory and create a Lab04 sub-directory. Put all your program files under this directory.

Command line arguments are an important way to pass information to a program. Most system tools rely on them (for example, gcc, make, and cp). To practice using command line arguments and converting data types, we’re going to make a simple calculator program. In the file calc.c, develop a calculator that works from the command line. The first command line argument is the operation to perform. The supported operations are listed below. The second and all following arguments are the values to operate on. These may be integers or floats for the arithmetic operations or strings for the length operation. If the requested operation is not supported, the program should display a suitable error message.

  • add – adds numbers (int or float)
  • mult – multiplies numbers (int or float)
  • div – divides numbers (int or float)
  • len – prints the total length of all strings

For example, several outputs are shown below.

zyBook section 6.15 describes how to access command line arguments. Section 6.10 describes how to compare strings. Also read the man page for atof to convert an ASCII string to a floating point number.

You may assume that the values are in proper order, that is, your program doesn’t have to check errors such as division by zero.

Be sure your program compiles without warnings when using the -Wall flag (gcc) or CFLAGS=-Wall (make).

Finally, add calc.c to your git repo before continuing.

Exercise 2: C File IO

Files are just an abstraction for storing and accessing data on our computer systems. Through some possibly mysterious process, our file data is translated to bits on a disk and tagged with a convenient path and filename so that we can reverse the process and retrieve the data later. In the olden days (i.e., the early ’90s), we relied on IDE (Integrated Drive Electronics) drives with rotating platters holding a few tens of megabytes of data. Modern disks might use rotating platters or flash memory and store terabytes of data, typically using the SATA interface. Thankfully, as programmers we don’t have to be aware of the physical details because they are hidden behind APIs, which create a comfortable abstraction. This allows us to work in exactly the same way with a variety of types of disk.

The most basic access to files is provided directly by operating system functions named system calls (or syscalls). The operating system hides the complexity of handling each different type of physical disk and presents the programmer with a consistent API to access files. In Unix-like operating systems, this API corresponds to the functions creat(2), open(2), read(2), write(2), and close(2). Other operating systems (Windows) have similar (but not identical) mechanisms.

Read the man page for each of these five functions to get a good grasp of what they do and how they are used. Be sure to get the right man page: the number in parentheses after each function, as written above, indicates that they are all in section 2, so type, man 2 <function>. Create a text file called notes.txt and write a few sentences to describe each function (creat, open, read, write, close) in your own words. Use this file as a place to keep your organized notes, so that you can quickly remember how to use these functions later.

Now, we will practice using these functions. Our goal is to create a program with functionality similar to the system program head(1) – notice that this is in section 1 of the Linux manual, which is reserved for system commands. You might want to take a moment to read this man page. Note that the real head program reads a number of lines, whereas the version we’re asking you to write reads a given number of bytes (or characters) from the file.

Create a file called head.c to store your C source code. The main program is given above; you do not have to modify main, just implement the read_file_bytes function. A few things to notice in the main program are how the constant MAXBYTES is used and how the arguments are checked. If any argument is invalid, a useful error message is printed and the program exits with a negative return value. Most system calls and libraries use this convention. Once you complete this program (compile with either make or gcc directly), we will want to run it as in the example below (this will use the default value of 10 bytes to print):

$ ./head notes.txt

Hint: read does not automatically add a null terminator ( \0 ) to the data you will read from the file. To pass data from read to a function expecting a null terminated string, you must ensure there is a null terminator.

Make sure your program compiles without warning.

When you are done with this exercise, add notes.txt and head.c to your git repo. 

Exercise 3: Compiling multiple source files

As our programs get longer and more complicated, we will want to separate our source code into multiple files. Copy the file head.c to head2.c. Cut the function read_file_bytes from head2.c and paste it in the file fileio.c. In fileio.c you will need to copy the appropriate  #include<> lines from head2.c.

The process to compile multiple files with gcc can be accomplished by listing each .c file on the command line as shown below. When you type this command you should get the same warning message.

This warning is caused by the call to read_file_bytes in head2.c. When gcc is compiling, it compiles each file independently of the other input files. In the file head2.c there is no declaration of the read_file_bytes function. This can be solved by creating a header file (a file with a .h extension) containing the function declaration (terminated with a semicolon), as shown below. Notice we added an include guard (ifndef/define/endif statements) to protect against multiple inclusions. Essentially, if a header ends up being included more than once while compiling a single source file (for example, directly and again through another header), that file would see multiple copies of the header’s declarations. These statements use the preprocessor to ensure this won’t happen. It is good practice to always add include guards in header files. Copy the following lines into a header file named fileio.h.

Now in head2.c, we can add #include "fileio.h"  to the list of includes at the top of our program. Notice we’re using double quotes because this file is in the local path (not the system path). Now, when gcc is processing your head2.c file, it will include the declaration of read_file_bytes. Your head2 executable should now function exactly the same as the head program from the previous exercise.

We could continue to use gcc to compile multiple input files, but a better approach is to use make. We have previously seen how to use the implicit make rules to compile single-file programs, but this won’t work for multiple source files. To compile multiple source files with make we have to create a Makefile. A Makefile is just a text file with the name “Makefile”, with no extension. The make program automatically looks for a Makefile in the current directory containing build rules to override the default (implicit) rules. The zyBook section 7.18 has a short overview of Makefile rules.

A basic Makefile for our program is below. This shows the target (head2) depends on three files: head2.c, fileio.c, and fileio.h. When the target needs to be rebuilt it executes the gcc line we used before. Write the following lines into a file named Makefile. Note the command must be indented with a tab (spaces do not work)!

This Makefile works but we can do better. For one thing this compiles both .c files when anything changes. If these were large files, this might result in a lot of duplicated work. To avoid this we can compile our source files to object files (with an .o extension) and link the objects to an executable separately. This means if only one source file changes only the changed object file would need to be rebuilt. While we’re making improvements, we’ll also use the standard Makefile variables for flags and compiler options.

Below is a Makefile that uses the standard variables for the compiler and flags and splits the compilation into an object file for each .c file. The target for head2 now depends on the object files (.o) but otherwise looks very familiar. This works because implicit make rules exist to compile a single object file from a single .c file.

You will notice the output from this will be:

The first two gcc calls were generated by the implicit rules. Our makefile rule depends on head2.o and fileio.o. The implicit rule is:

This says to build an .o file from a matching .c file, run  $(COMPILE.c) $(OUTPUT_OPTION) $< . The COMPILE.c variable is defined as  COMPILE.c = $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c . This explains the first two lines of gcc output. Notice the -c  flag passed to gcc. According to the man page, this option means “Compile or assemble the source files, but do not link.” Once all of the objects are compiled we will link them separately (using the first custom Makefile rule).

However, there is now a slight problem. Nothing in these rules captures the dependency on the fileio.h header. This means if we changed only the header file, make would not recompile anything. This could cause hard-to-find bugs later. To capture this dependency we need to write our own rules for head2.o and fileio.o that depend on the matching .c file and fileio.h.

To receive full credit for this section, modify the Makefile to have an explicit rule to build head2.o and fileio.o. Both of these rules should depend on fileio.h (and their matching .c file). Make sure your Makefile can generate proper object files and executable files before moving on.

Don’t forget to add head2.c, fileio.c, fileio.h, and Makefile to your git repo.

Exercise 4: C File IO with the standard library

The API we used in the previous exercises is rather low-level: it doesn’t support reading lines, formatted data, or buffering. As you have used printf to create nicely formatted output from your programs to the screen, you might be wondering if there isn’t something like that to write (or read) files. In fact there is: the C Standard Library function fprintf(3) fits that bill perfectly. (Did you notice that ‘3’ after the function name? It indicates that fprintf is in section 3 of the Linux manual, which aggregates library functions.) If you take a moment to read about this function, you will notice that the first parameter we must pass to it is a FILE * data type. Whereas syscall file IO works with file descriptors, the C library functions for file IO all work with a pointer to this ‘special’ data type called FILE.

Take some time to read the man pages of the functions fopen(3), fgets(3), fprintf(3), fseek(3), and fclose(3) and, again, write a few sentences in your own words describing each one of them in your notes.txt file.

The best way to learn to work with new syscalls or library functions is to experiment with them in the context of a small program. If you’re expecting to be asked to do exactly that, you’re right! We will keep it simple, though: make a copy of head2.c to another file called head3.c, which you will modify to work exclusively with C library file IO functions.

Differently from read(2), fgets(3) allows you to read complete lines from a file. With this function, you will implement in head3.c exactly the same functionality of head(1): in this case now, your program must print the first 10 lines of the file (instead of the first 10 bytes) whose name is passed to the program as a command line argument.

Since the actual file IO work is done in your fileio module, add a new function, read_file_lines, with the appropriate arguments. One argument should be an array to store the lines. You must allocate an array of 10 lines. Your program should assume the maximum line length is 1024 bytes. If a line is longer than this, your program should truncate the line (not crash!). Also be sure your program works correctly if there are fewer than 10 lines in the specified file! With these assumptions, head3.c is even simpler than head.c!

Because we haven’t talked about dynamic memory allocation, we’ll keep it simple and not include the optional argument for the number of lines. When you run your program as below:

$ ./head3  notes.txt

the output produced should be the first 10 lines in your notes.txt file (or the entire file if there are fewer than 10 lines).

Be sure to add rules in your Makefile to build head3!

When you are done with this exercise, add head3.c to your git repo.

Exercise 5: CSV, arrays, and tokens

Now that we’ve mastered file IO, we can use files to bring real-world data into our programs. The file 2015-10-01-data-raw is a comma-separated values (CSV) file generated by one of our campus weather stations (pictured below) for a single day. If you import this into a spreadsheet you will notice there are a number of columns, and each row is a particular reading of various weather parameters. The first row defines what each of the columns is. If you open this file with your favorite text editor (yes, CSV files are text files) you will see it has the same information and, faithful to the name, each column is separated by a comma (‘,’) character. Rows are separated by a newline.

In this exercise we are going to begin analyzing this data. The first thing we might want is the average of each parameter. We will eventually create a program that takes as a command-line argument the filename of the CSV file to process. The program will read the CSV file to an array of floats and then compute the average of each column. Don’t worry, we’re going to walk you through this process.

Since we haven’t discussed dynamic memory allocation we will have to allocate memory to hold the CSV file data at compile time. To help out, in fileio.h add the following lines (inside the include guard).

This will make it easy to adjust the maximum CSV file size later. The value MAXROWS is the maximum number of rows our program will process. Similarly, MAXCOLS is the maximum number of columns to consider. The last value, MAXLEN is the maximum length of a single entry (cell) in the CSV file. These values are enough to process the input file given above.

We give you most of the main program below (see the TODO). Save this to the file csv_avg.c. Read through it to understand the implementation. Save for the function marked TODO, this is complete and, as far as we know, free from bugs.

You will notice this program calls two functions: read_csv_row and read_csv_cols. These functions are up to you to implement in fileio.c. The function signatures are (put this in your fileio.h):

Both of these functions accept a FILE* from the standard library. read_csv_row reads a single row (line) from the file (fd). The line is split using the comma character to separate each element. The elements are copied into the provided array of strings called row_strings. The return value from this function is the number of columns found (i.e., number of extracted tokens) or -1 if there are no more lines to read in the file. read_csv_cols will read all remaining data from the CSV file. This function should call read_csv_row using a temporary array of buffer strings (e.g., char tmp_strings[MAXCOLS][MAXLEN]) until the end of file is reached. For every row, the function will iterate through the extracted column string and convert it to a floating point value using the atof function. If you read csv_avg.c, you’ll notice that the FILE* fd variable passed into both functions is already associated with an open file, so you should not open or close the file within the read_csv_cols and read_csv_row functions.

When reading a row from a CSV file, you’d read the entire row first. You then need to tokenize the line, i.e., separate the line into a collection of tokens. You will need to use the library function strtok(3). Read the manual page to find out how this function can be used. Another note is that the very first column in the given CSV file is the time stamp when the data entry was recorded. In general, it doesn’t make much sense to average the time stamp values. But in this exercise, we will average the time stamps anyway. What the function atof will do to a time stamp value is something you can find out, either by reading the manual page for atof carefully, or try a small program to see what atof will do to a time stamp value.

Advanced option [+5 bonus points]: Because some data may be missing, replace any missing data with the special value NAN. NAN stands for Not A Number and is defined in math.h. The proper way to check if a number is NAN is to use the isnan function (macro). Here is a datafile 2016-01-29-data-raw with missing values (the given file above has no missing data).

Hints

  • in our solution, both of these functions are fewer than 20 lines each. If your solution is getting long, think about a better way.
  • add debugging statements (both to the main program and your function) to validate your assumptions.
  • use fgets to read one line at a time, check the return value to see if you hit the end of file!
  • every line will end with a newline character, you probably don’t want this, replace it with a null terminator like this:  buf[strlen(buf)-1] = 0; The string will now have two null terminators, but that’s OK!
  • split the string into tokens using strtok (example).
  • your tokens may have leading spaces, remove them with  while (tok[0] == ' ') tok++; . We’ll talk more about this when we cover pointers.
  • copy strings with strncpy (you must copy each string token into your array of strings).
  • convert tokens to floats using atof.
  • add appropriate lines in your makefile to compile this program.
  • Advanced option: when computing the average, you have to ignore NAN values, use the isnan function.

Pseudocode for read_csv_row:

Pseudocode for read_csv_cols:

Advanced option [+5 bonus points]: When computing the average, you might consider an iterative algorithm. This avoids the possible loss of precision that comes from summing all values and then dividing by N.

With the given input file, the correct output is as follows. (Because there are different ways of computing the average, the exact values your program generates may differ slightly from these.)

Here is another datafile for the entire month of October 2015-October-data-raw. You will need to increase MAXROWS to process this file! The results should be:

Be sure to add rules in your Makefile to build csv_avg!

When complete, be sure you add csv_avg.c to your git repo and push to gitlab.

Grading Rubric

  • [25 points] Prelab zyBook activities.
  • [10 points] Exercise 1: calc.c program works as described and follows good coding conventions. -5 if it does not compile or has warnings. -1 for each minor error. -2 for each significant error.
  • [10 points] Exercise 2: head.c program works as described and follows good coding conventions. -5 if it does not compile or has warnings. -1 for each minor error. -2 for each significant error.
  • [10 points] Exercise 2: notes.txt created and correctly describes creat, open, read, write, close (2 points each).
  • [5 points] Exercise 3: Makefile created to compile head2.c using head2.o and fileio.o. -5 points for: gcc -Wall head2.c fileio.c -o head2.
  • [5 points] Exercise 3: read_file_bytes removed from head2.c and added to fileio.c. fileio.h has correct declaration with include guard.
  • [10 points] Exercise 4: notes.txt correctly describes fopen(3), fgets(3), fprintf(3), fseek(3), and fclose(3).
  • [10 points] Exercise 4: head3.c created and uses read_file_lines in fileio.c/.h to read lines using the standard libraries. -5 if it does not compile or has warnings. -1 for each minor error. -2 for each significant error.
  • [15 points] Exercise 5: csv_avg.c created; average, read_csv_row, and read_csv_cols implemented as described; program works and follows good coding conventions. -5 if it does not compile with the Makefile or has warnings. -1 for each minor error. -2 for each significant error. +5 bonus points for implementing an iterative average (mean) function. +5 bonus points for handling missing values with NAN.