Lab 03: Recursion

0. Credits

The lab content is the result of the collective work of the CSCI 204 instructors.

1. Objectives

In this lab, you will learn the following,

2. Preparation

Begin by opening a terminal window and creating a directory for this lab in your course directory.

cd ~/
cd csci204/
cd labs
mkdir lab03
cd lab03

Then copy all files, including the skeleton program dirs.py, from the course Linux directory using the following command. You can use the scp command to copy the files from Linux server to your own computer if you so choose.

cp -rp ~csci204/student-labs/lab03/* .

The command-line option -r copies recursively, since the lab03 directory contains another directory with files in it. The option -p preserves the attributes of the files and directories, including their time stamps.

Besides the Python program, the directory contains a few text files that you can use to practice the Linux diff command described below.

Linux command diff

Linux provides many useful commands for programmers, and diff is one of them. The diff command compares two text files and prints the differences between them. It can be used in many ways. For example, you can use it to check whether the output of your program is the same as, or very similar to, the required output. Please read the following Stack Exchange post to understand the general meaning of diff output.

https://unix.stackexchange.com/questions/81998/understanding-of-diff-output

After reading the post, try the diff command on the four text files you just copied. For example:

diff file1.txt file2.txt
diff file1.txt file3.txt
diff file2.txt file4.txt

Make sure you can explain the output of these comparisons.

3. Recursively Listing Files

Recursive algorithms are natural for solutions to problems with hierarchical structure. An example problem is listing all the files in a directory and all of its sub-directories. Since the Linux file system is hierarchical, we should immediately think of using a recursive approach. You can see this in action with the following command.

ls -R ~csci204/student-labs/lab03/testpages/

The command lists all files and directories recursively. As you can probably guess, the option -R tells ls to descend recursively through the directory tree.

For this part of the lab, you will write a Python program to list all of the files in a directory and, recursively, in all of its sub-directories, in a fashion similar to the ls -R command.
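To make the recursive idea concrete before you open dirs.py, here is a minimal sketch that counts files instead of printing them. The function name count_files is hypothetical and is not part of dirs.py. The sketch relies only on os.listdir(), os.path.join(), os.path.isdir(), and os.path.islink(), and it treats a symbolic link to a directory as a plain entry so that it cannot loop forever.

```python
import os


def count_files(path):
    """Return the number of non-directory entries under path, recursively."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)   # listdir() returns bare names only
        if os.path.isdir(full) and not os.path.islink(full):
            total += count_files(full)    # recursive case: descend one level
        else:
            total += 1                    # base case: count a single entry
    return total


if __name__ == "__main__":
    print(count_files("."))
```

The shape is the same one your listing program needs: handle each entry, and call the same function again whenever the entry is itself a directory.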

Read the code in dirs.py, which you copied at the beginning of the lab, to get an idea of what is involved in the program. Then try the following commands with the program in a Linux terminal. If you are running your program from IDLE, Spyder, or another IDE, read the comments at the end of the program and revise the program accordingly. Note that IDLE or Spyder may not work with paths that contain the tilde character ~. In these cases, you have to use the full path. To find the full path of a file or directory, use the command pwd, for example:

cd ~/
pwd

You should see the full path to your home directory.

The following description assumes the use of command-line Python. For Spyder or IDLE, the operations are similar, except that the tilde character has to be replaced by a full path.

python dirs.py dirs.py
python dirs.py ../
python dirs.py ~/csci204

Observe the behavior of the program as it stands before you make any modifications. The given version of the program dirs.py lists the names of the files and sub-directories in a directory, but does not recursively list the files under any sub-directories. You'll notice that it checks whether any command-line arguments were passed to it. If no argument is given, the program prints a usage message asking the user to supply one.
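An argument check of the kind described above might look roughly like the following. This is a sketch only, not the actual contents of dirs.py; the function name get_target and the wording of the usage message are assumptions made for illustration.

```python
import sys


def get_target(argv):
    """Return the directory named on the command line, or None if absent."""
    if len(argv) < 2:                 # argv[0] is the script name itself
        print("usage: python", argv[0], "<directory>")
        return None
    return argv[1]


if __name__ == "__main__":
    target = get_target(sys.argv)
    if target is not None:
        print("would list:", target)
```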

You can also run the Linux ls command to list the above directories recursively, as a comparison to what the program dirs.py does. Try the following commands:

ls -R ../
ls -R ~/csci204

You are to modify the dirs.py program to recursively print the names of all the files in all sub-directories.

Here are some details.

  1. You must use a recursive solution.
  2. File names should be printed one per line.
  3. Just before any recursive calls, print out a message "-- Entering [dir-name]" where your program should fill in the directory name "dir-name". The method os.path.basename() will extract the directory name from the path argument. Note that if the path ends with a '/' character, this method will return an empty string, so it is worth putting in a check to remove the trailing '/'.
  4. Just after any recursive calls, print out "-- Leaving [dir-name]" where your program should fill in the directory name.
  5. Python provides a few useful methods and functions in the os module and the os.path module that you need to use (read the relevant Python online documents for details).
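The trailing '/' issue mentioned in item 3 is easy to see in a short experiment. The helper name dir_name below is hypothetical; it is one possible way to strip the trailing separator before calling os.path.basename(), not the required approach.

```python
import os


def dir_name(path):
    """Return the last component of path, ignoring any trailing '/'."""
    trimmed = path.rstrip("/")
    # rstrip turns "/" into "", so fall back to the original path in that case
    return os.path.basename(trimmed) if trimmed else path


print(os.path.basename("lab03/testpages/"))  # prints an empty line
print(dir_name("lab03/testpages/"))          # testpages
print(dir_name("lab03/testpages"))           # testpages
```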

Run your program using at least two more different directories each of which has files and a sub-directory (or sub-directories). You must test your program using ~csci204/student-labs/lab03/testpages/. Remember to deal with the trailing '/'. Make sure your program is well commented. Save and upload your program to Moodle.

4. Get File Statistics When Traversing Directories

If you list files on Linux using commands such as ls -l or ls -lt, you will find that other file attributes are printed, including the size of each file and the date when it was last modified. Try these commands in your own home directory.

ls -l ~/
ls -lt ~/

For now, we will concentrate on just two pieces of information: the size of a file and the date when the file was last modified. These are the two middle columns in the above listing. For example, the first file grading.html in the testpages directory was last modified on September 11, 2019 and has a size of 667 bytes, or 667 characters, since this is a text file in which each character is one byte long.

Your tasks

Create a new program, based on the dirs.py program you just finished, that counts the total number of bytes used by all the files and reports the maximum and minimum file sizes, as well as the oldest and newest time stamps, of the files in the directories your program visits.

The basic logic of the program is to update the maximum and minimum sizes, as well as the oldest and newest time stamps, as each file is visited. Once you have the size and the last-modified time of a file, you can compare them with the extreme values seen so far.

Make a copy of your existing program and name the new program dirattrib.py (for directory attributes). Modify dirattrib.py so that it accomplishes the following tasks.

4.1 Develop a FileStats class

  1. Since you are going to be creating a series of statistics, create a new FileStats Python class with the following data attributes: max_size, min_size, oldest_time, and newest_time.
  2. You should define three methods within FileStats.
    1. The constructor where you define your data attributes;
    2. The print_results(self) method which prints the statistics in the following format.

      maximum size of files : 667
      minimum size of files : 53
      oldest time : Thu Aug 29 15:56:41 2019
      newest time : Thu Aug 29 16:07:47 2019

    3. An update(self, filename) method to carry out the task of retrieving the file statistics and updating the values held in your FileStats object.
  3. To retrieve file statistics, Python's os module provides a function called lstat(). The function lstat() returns an object. Among other pieces of information, the returned object contains the size of the file and the time stamp of when the file was last modified. These two data members are called st_size and st_mtime. You can use them to collect the required statistics. Read the relevant Python documentation to make sure you understand how to access these values.
  4. The number of bytes a file takes is an integer, so you can print the maximum size and minimum size directly when the directory traversal is completed. However, the time when a file was last modified is stored as the number of seconds since the epoch (January 1, 1970), which is a huge value that is hard for a human being to interpret.

    For example, a time stamp for March 16, 2011, at about 1:30 in the afternoon would read something like 1300296729.762571 seconds. The time module in Python provides a function called ctime() that converts a value in seconds to a human-readable time such as Wed Mar 16 13:32:09 2011. While you should use the raw time value to find the minimum and maximum (oldest and newest time), you must use the ctime() function to print these values in a human-readable format. Remember to import the time module for these tasks.
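As a rough illustration of the two calls described in items 3 and 4, the sketch below creates a throwaway file, reads its size and last-modified time with os.lstat(), and formats the time with time.ctime(). The temporary file and the variable names are assumptions made only for this example; in your program the path would come from the directory traversal.

```python
import os
import tempfile
import time

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello\n")             # exactly 6 bytes
    sample_path = tmp.name

info = os.lstat(sample_path)          # does not follow symbolic links
print("size in bytes :", info.st_size)
print("raw mtime     :", info.st_mtime)
print("readable mtime:", time.ctime(info.st_mtime))

os.remove(sample_path)                # clean up the throwaway file
```

Note that the comparison for oldest and newest should be done on the raw st_mtime numbers; the string returned by ctime() is only for display and depends on the local time zone.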

4.2 Other Modifications

Complete the following tasks to make the program work.

  1. Modify the list_dir() method so that it takes a FileStats object as a parameter. We ask that the list_dir() method recursively call itself, passing your updated FileStats object each time.
  2. Initialize a FileStats object in the main() method before calling the list_dir() method. During the creation of your FileStats object, you'll be initializing its data attributes. What should those initial values be?

    Here are two hints.

    1. Python provides a sys.maxsize as a reasonable maximum integer value. (Note that there is really no limit on the values of numbers in Python.)
    2. On Linux systems, file size is measured by bytes and the time is measured as the number of seconds since January 1, 1970. Both are non-negative values.
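The pattern of passing one accumulator object through every recursive call, and of seeding running extremes with sentinel values, can be sketched on a nested list of numbers instead of a directory tree. The names RangeTracker and visit are hypothetical, and the sketch assumes every value is non-negative and smaller than sys.maxsize; adapting the idea to FileStats and list_dir() is left to you.

```python
import sys


class RangeTracker:
    """Hypothetical accumulator that remembers the smallest and largest value seen."""

    def __init__(self):
        # Start each extreme at the opposite end of the range so that the
        # first real value always replaces it.
        self.smallest = sys.maxsize
        self.largest = 0

    def update(self, value):
        self.smallest = min(self.smallest, value)
        self.largest = max(self.largest, value)


def visit(items, tracker):
    """Walk a nested list, feeding every number into the same tracker."""
    for item in items:
        if isinstance(item, list):
            visit(item, tracker)      # the same object is passed down
        else:
            tracker.update(item)


tracker = RangeTracker()
visit([667, [53, 120], [[400]]], tracker)
print(tracker.smallest, tracker.largest)   # 53 667
```

Because the tracker is a mutable object, updates made deep in the recursion are visible to the caller once the outermost call returns, so nothing needs to be returned from visit().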

The Final Product

Show your program works correctly in at least two directory listings, each of which must have multiple levels of directories. The first must be the directory

~csci204/student-labs/lab03/testpages/

The test run should show the following values:

maximum size of files : 667
minimum size of files : 53
oldest time : Wed Sep 11 20:51:49 2019
newest time : Thu Sep 12 09:02:40 2019

Save and upload to Moodle your newly completed program dirattrib.py as well as your modified dirs.py file.

Congratulations! You just completed the lab exercises!