Lab 10

The Unix File System Interface

Goals

  • Gain more practice in working with the Unix file system interface: This lab will have you work with functions to inspect various attributes of files, as you modify a program for traversing a directory tree.
  • Practice working with concurrency, again. This lab introduces you to a different kind of semaphore, one that can be accessed by processes that are unrelated by birth. You will learn to work with named semaphores, which use a couple of concepts related to files.

Credits

This lab assignment was adapted by Prof. L. Felipe Perrone based on previous work by other CSCI 315 instructors. Permission to reuse this material in parts or in its entirety is granted provided that this credits note is not removed. Additional student files associated with this lab, as well as any existing solutions, can be provided upon request by e-mail to perrone[at]bucknell[dot]edu.


Set Up

Copy to your lab directory all the files in:

~cs315/Labs/Lab10/

You should now have the following four files: file_stat.c, read_dir.c, read_dir.h, and Makefile.

  • file_stat.c will be used to build a program which takes a file name as a command-line parameter and reports various attributes of the given file (or directory) and the file system where the given file resides.
  • read_dir.c will be used to build a program which takes a directory as a command-line parameter. It will traverse the directory tree rooted at the given directory and print out the names of the files encountered.

Problem 1 (20 points)

Study the C source code in file_stat.c. The program is based on two calls: fstat(2) and fstatvfs(3); read the manual pages for these functions to learn what they do and how to use them.

The manual page for fstatvfs(3) shows that it returns various pieces of information on the underlying file system in which the given file resides. Using this call, you can learn the block size for the file system, the type of the file system, and the maximum length for a file name. The call returns all these data and more in an instance of struct statvfs (a pre-defined type), which you must have allocated previously. fstatvfs receives a pointer to your instance of struct statvfs and fills in its fields with file system information. As a self-check exercise, try to figure out whether this is a thread-safe call.
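The allocate-then-fill pattern described above can be sketched as follows. This is a minimal illustration and not the code in file_stat.c: the helper name query_fs is hypothetical, and f_bsize and f_namemax are the struct statvfs fields listed in the fstatvfs(3) manual page.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Hypothetical helper (not part of file_stat.c): opens path and asks
 * fstatvfs(3) to fill in *out. Returns 0 on success, -1 on error. */
int query_fs(const char *path, struct statvfs *out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* The caller allocated *out; the call only fills in its fields. */
    int rc = fstatvfs(fd, out);
    if (rc == -1)
        perror("fstatvfs");

    close(fd);
    return rc;
}
```

A caller would declare struct statvfs sv; call query_fs(".", &sv); and, on success, print sv.f_bsize and sv.f_namemax (cast to unsigned long for printf).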

Reading about fstat(2), you will see that it fills in a struct stat, which is another pre-defined type. This struct contains information such as the user ID and the group ID of the owner of the file, its protection bits (for user, group, and other), the file size (in bytes and in number of blocks allocated), and the times when the file was last accessed, last modified, and last had its status changed. You must pass to fstat a pointer to an instance of struct stat that you allocated previously; the system call will fill in the various fields with information on the specific file.
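As a rough sketch of how the fields of struct stat can be read, the code below calls fstat(2) on an open file and turns the nine permission bits of st_mode into an rwx-style string. The helper names query_file and mode_to_string are hypothetical and are not taken from file_stat.c; the S_I* constants are the ones documented for st_mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens path and lets fstat(2) fill in *out.
 * Returns 0 on success, -1 on error. */
int query_file(const char *path, struct stat *out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    int rc = fstat(fd, out);
    close(fd);
    return rc;
}

/* Hypothetical helper: writes a string such as "rw-r--r--" into buf,
 * which must hold at least 10 bytes (9 characters plus the terminator). */
void mode_to_string(mode_t mode, char buf[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* user  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group */
        S_IROTH, S_IWOTH, S_IXOTH    /* other */
    };
    static const char letters[] = "rwxrwxrwx";

    for (int i = 0; i < 9; i++)
        buf[i] = (mode & bits[i]) ? letters[i] : '-';
    buf[9] = '\0';
}
```

After a successful query_file(path, &st), calling mode_to_string(st.st_mode, buf) gives the protection string, and macros such as S_ISREG(st.st_mode) or S_ISDIR(st.st_mode) identify the file type.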

Reading the source code given to you, you will notice that you must open the file before you call either fstatvfs or fstat. The code contains missing portions that you will construct in this lab. Look for them where you find comments containing the string TO-DO. Once you have filled in the missing pieces, the output produced by your file_stat executable on the file file_stat.c should be similar to what is presented below (minus the “For you to do” annotations, shown in red):

$ file_stat file_stat.c
== FILE SYSTEM INFO ============================
file system fstatvfs() call successful
file system block size: 65536 <-- For you to do
max. file name length: 255 <-- For you to do

== FILE INFO ============================
file fstat() call successful
file protection bits = 0644
file protection string = rw-r--r-- <-- For you to do
file protection mode (u:g:o) = 6:4:4 <-- For you to do
owner user name = perrone <-- For you to do
owner group name = cs <-- For you to do
mode = x <-- For you to do (x may be file, link, directory, socket, etc.)
time of last modification: Thu Nov 14 15:04:21 2014
time of last access: <-- For you to do
time of status change: <-- For you to do

First, work on file_stat.c and instrument it to print the information marked “For you to do” in red (that is, tackle the TO-DOs in the code). Next, augment the program to print the time of last access and the time of status change for the file given as command-line parameter. Note that you will need to read the man pages for getpwuid(3) and getgrgid(3) to learn how to translate a numeric user ID and group ID, respectively, into strings. Make sure to use the versions of these calls that are thread-safe. Why, you ask? Just to continue the practice of using functions that are thread-safe.
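One way the ID-to-name translation might look with the reentrant variants getpwuid_r and getgrgid_r is sketched below. The helper names uid_to_name and gid_to_name are hypothetical, and the fixed 4096-byte scratch buffer is an assumption made for brevity; sysconf(_SC_GETPW_R_SIZE_MAX) and sysconf(_SC_GETGR_R_SIZE_MAX) can suggest a size instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helpers, not part of file_stat.c. Each copies the name into
 * the caller's buffer and returns 0, or returns -1 if the lookup fails or
 * no matching entry exists. */
int uid_to_name(uid_t uid, char *name, size_t len)
{
    struct passwd pw, *found = NULL;
    char scratch[4096];   /* assumed large enough for one passwd entry */

    if (getpwuid_r(uid, &pw, scratch, sizeof scratch, &found) != 0 || found == NULL)
        return -1;
    snprintf(name, len, "%s", found->pw_name);
    return 0;
}

int gid_to_name(gid_t gid, char *name, size_t len)
{
    struct group gr, *found = NULL;
    char scratch[4096];   /* assumed large enough for one group entry */

    if (getgrgid_r(gid, &gr, scratch, sizeof scratch, &found) != 0 || found == NULL)
        return -1;
    snprintf(name, len, "%s", found->gr_name);
    return 0;
}
```

Unlike getpwuid and getgrgid, which return pointers to static storage, these variants write only into buffers supplied by the caller, which is what makes them safe to call from several threads. The st_uid and st_gid fields of struct stat supply the IDs to translate.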

When you are done with this, you need to:

  • git add file_stat.c
  • git add Makefile
  • git commit -m "Lab 10.1 completed"
  • git push

Problem 2 (20 points)

(2.1) Combine the programs file_stat.c and read_dir.c to create a new program called traverse.c, which will traverse a given directory tree, printing to the standard output the following information:

  1. The value of the smallest, the largest, and the average file size.
  2. Total number of directories.
  3. Total number of regular files, that is, those which are not directories, symbolic links, devices, sockets, or fifos.
  4. The name of the file that was most recently modified, and the one that was least recently modified in the directory tree.

Note that the size of a file can be read from the struct stat filled in by a call to fstat(2). Read the program file_stat.c and the manual pages for fstat(2) and lstat(2) for more information. Make sure to modify the Makefile given to you so that it will build the newly created traverse program.
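The smallest, largest, and average file sizes can be tracked in a single pass over the tree with a small accumulator, as in the sketch below. The struct and function names are hypothetical; the size passed in is assumed to come from the st_size field of struct stat.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>

/* Hypothetical accumulator for traverse.c; initialize it to all zeros. */
struct size_stats {
    off_t smallest;   /* valid only when count > 0 */
    off_t largest;    /* valid only when count > 0 */
    off_t total;      /* sum of all sizes seen so far */
    long  count;      /* number of files recorded */
};

/* Record the size of one more file. */
void size_stats_add(struct size_stats *s, off_t size)
{
    if (s->count == 0 || size < s->smallest)
        s->smallest = size;
    if (s->count == 0 || size > s->largest)
        s->largest = size;
    s->total += size;
    s->count++;
}

/* Average size, or 0.0 if no file has been recorded yet. */
double size_stats_average(const struct size_stats *s)
{
    return s->count > 0 ? (double) s->total / (double) s->count : 0.0;
}
```

The same pattern (compare, then keep the extreme value) also works for remembering the most and least recently modified files, using st_mtime in place of the size.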

(2.2) When you complete the previous step, run your program in a directory tree that contains no symbolic links and observe its behavior. Next, in a directory of your own, create a symbolic link that points to its parent directory. Note that you are creating a loop in the directory graph. Run the program again and note what happens.

(2.3) To fix this problem, for each directory encountered, your program should figure out whether it is a symbolic link and display the appropriate data for the link without traversing it as it would do for a real directory. The manual page for lstat(2) contains information on how to identify whether a file is a symbolic link. Once this issue has been addressed, test the program using a directory with a symbolic link to a parent directory.
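The check described in (2.3) can be sketched with lstat(2), which reports on the link itself rather than on the file it points to. The helper name should_descend is hypothetical; S_ISLNK and S_ISDIR are the standard file type macros applied to st_mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper for traverse.c: returns 1 if path is a real directory
 * that can be descended into, 0 for anything else (including a symbolic
 * link to a directory), and -1 if lstat(2) fails. */
int should_descend(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return -1;
    if (S_ISLNK(st.st_mode))
        return 0;                 /* report the link, but do not follow it */
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Note that stat(2) and fstat(2) on an opened link would describe the link's target, so a link to a directory would look like a directory; that is why this sketch uses lstat.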

When you are done with this, you need to:

  • git add traverse.c
  • git add Makefile
  • git commit -m "Lab 10.2 completed"
  • git push

Problem 3 (8 points)

Create a file called answers.txt where you will respond to the items below. In three of the four items, you are asked for “an example of an operation” that accomplishes a certain goal. This means that you should provide a complete command line in Unix that causes the specified event to happen.

(3.1) Provide an example of an operation on a regular file which changes only its time of last status change.

(3.2) Provide an example of an operation on a regular file which changes its time of modification.

(3.3) Provide an example of an operation on a regular file which changes its time of last access.

(3.4) Provide an example of an application where knowing the maximum file name length is helpful. In this item, you can simply describe a scenario in which a given program needs to have the maximum length of file names.

When you are done with this, you need to:

  • git add answers.txt
  • git commit -m "Lab 10.3 completed"
  • git push

Problem 4 (22 points)

Most often, we work with files as a mechanism for data storage. As you know, we can also use files for interprocess communication. In this problem, you will use a file to store the communication between two types of processes (or rather, programs): sender.c and receiver.c. The file will work very much like a unidirectional channel (a pipe), but one where there can be multiple writers and multiple readers. There is no skeleton code for this problem.

The sender will be a program that takes a message from the command line as a string delimited by double quotes. The sender appends this string to a file called channel.txt, which is shared with other senders and a receiver. Multiple instances of sender may run concurrently at any time. In writing your sender and receiver, you may assume that the file channel.txt already exists: you can create an empty file with the touch(1) command, with cat, or with a text editor. The command-line invocation of the sender should look like:

> sender "this is my message"

The receiver will be a program that runs in an infinite loop, always printing to the screen the last message that was appended to the shared file. You will have only one instance of receiver running at any point in time, but it will run concurrently with senders. Once a sender puts a new message in the shared channel.txt file, the receiver should print it out to standard output, as in:

receiver [msg arrival]: "this is my message"

Your sender and receiver processes need to guarantee that the messages written to and read from the file are recorded exactly as they are generated. That is, if your solution allows two or more messages to be mixed up in the file, it is not a correct solution. By now, you might have realized that there are synchronization issues hidden in this problem statement. You should remember that the way out of this pickle is to use a synchronization construct like a semaphore.

Now, you used semaphores for processes related by birth earlier in the semester, but this is a different scenario. Your senders and receiver will need a semaphore that they all can see, that is, a named semaphore. Something that is like a file and that various independent processes can access. To create one of these structures, you will want to take a look at sem_open(3).
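As a rough illustration of how unrelated processes can share a lock through sem_open(3), the sketch below guards an append to a file with a named semaphore used as a mutex. It is one possible design and not a prescribed solution: the semaphore name /lab10_channel and the helper append_locked are hypothetical, the file is assumed to exist already, and error handling (for example, retrying sem_wait after EINTR) is kept minimal. On older glibc versions the program may need to be linked with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <semaphore.h>
#include <string.h>
#include <unistd.h>

#define SEM_NAME "/lab10_channel"   /* hypothetical semaphore name */

/* Hypothetical helper: appends msg plus a newline to the existing file at
 * path while holding the named semaphore. Returns 0 on success, -1 on error. */
int append_locked(const char *path, const char *msg)
{
    /* Opens the semaphore, creating it with initial value 1 if it does not
     * exist yet; every process that uses the same name gets the same one. */
    sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0600, 1);
    if (sem == SEM_FAILED)
        return -1;

    int rc = -1;
    if (sem_wait(sem) == 0) {                    /* enter critical section */
        int fd = open(path, O_WRONLY | O_APPEND);
        if (fd != -1) {
            size_t len = strlen(msg);
            if (write(fd, msg, len) == (ssize_t) len && write(fd, "\n", 1) == 1)
                rc = 0;
            close(fd);
        }
        sem_post(sem);                           /* leave critical section */
    }

    sem_close(sem);   /* closes this process's handle; the name persists */
    return rc;
}
```

A receiver built along the same lines would open the semaphore with the same name and hold it while reading, so that it never observes a half-written message.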

Feel empowered to make your own design decisions on how to solve this problem. Choose the file I/O API that makes things easier for your work (file descriptors or FILE *). Use comments to document your decisions in each of the programs you write.

Note: Named semaphores behave much like files. On Linux, once you run the program, named semaphores are created in the directory /dev/shm/, and their names are prefixed with “sem.”. For example, if you use a named semaphore called count in your program, a file named /dev/shm/sem.count will be created. These files are owned by you; no one else can read or write them. Yet they reside in shared file space, so you must delete these semaphores when you are done so as not to pollute the environment.
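Cleanup can be done from a program with sem_unlink(3), as in the minimal sketch below. The helper name remove_semaphore is hypothetical; it treats a semaphore that is already gone as success so that it can be called safely more than once.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: removes the named semaphore. Returns 0 if it was
 * removed or did not exist, -1 on any other error. */
int remove_semaphore(const char *name)
{
    if (sem_unlink(name) == 0 || errno == ENOENT)
        return 0;
    return -1;
}
```

The name passed in is the one given to sem_open, with its leading slash (for instance "/count"), not the path under /dev/shm/. Processes that still have the semaphore open can keep using it until they close it; the name simply stops being available to new callers.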

When you are done with this, you need to:

  • git add Makefile
  • git add sender.c
  • git add receiver.c
  • git commit -m "Lab 10.4 completed"
  • git push

Hand In

Before turning in your work for grading, create a text file in your Lab 10 directory called submission.txt. In this file, provide a list to indicate to the grader, problem by problem, if you completed the problem and whether it works to specification. Wrap everything up by turning in this file:

  • git add ~/csci315/Lab10/submission.txt
  • git commit -m "Lab 10 completed"
  • git push
