Lab 01 : HTTP Protocol and C Programming Tools of GDB and Valgrind

Goals

Setup

  1. We assume that by now you have set up your Gitlab account and have created your csci363 project (directory) and shared it with the instructor. If you haven't, please do so before continuing. You can visit this file for some brief guidelines.
  2. Open a terminal window and run the following sequence of commands (the directory "csci363" has been created when setting up Gitlab):
    • cd csci363
    • mkdir labs
    • mkdir labs/lab01
    • git add labs
    • git commit -m "lab01 created"
    • git push
    • cd labs/lab01
  3. Now, in your csci363/labs/lab01 directory, create a file to contain answers to the questions in this assignment using the command below.
  4. After running this command in your shell, open the file in a text editor such as vi or emacs and write down the lab information including lab number, your name, and the date of the lab. This should be a standard heading that you are required to use in all lab reports.

  5. Copy the files for this lab to your lab01 directory

Problem 1: Using GDB to find where segmentation faults are

The programs you just copied consist of an implementation of a simple doubly linked list and a test program. You are asked to compile and run the test program, then use GDB to find and fix the bugs in the implementation.

Compile and run the program.

Run the program multiple times. You will see that the program behaves normally. If you encounter an error (e.g., a segmentation fault), ignore it; running the program again will most likely result in normal output.

Now use an editor such as vi or emacs to examine the program. You will find that the program reads from standard input one word at a time and inserts each word into a doubly linked list. After it finishes reading and building the list, the program traverses the list and prints all the words it has read. The number printed before each word can be considered an order, or a count; its meaning isn't significant in this program. The program then removes a word at a random position in the list and prints the list again.

To make a good test, modify the program test_dlist.c so that it removes the very first node and the very last node in the list, in addition to removing a node at a random spot in the list. In this program, the very first node stores word[0] and the very last node stores word[NUM_WORD - 1].

You will now encounter a segmentation fault when removing the first or last node from the list. You can certainly use printf to find out where the errors are, but a debugging tool such as gdb is much more flexible and easier to use.

Debug programs with GDB

GDB is a very powerful debugging tool. Programmers can use gdb to examine the state of a program, such as the value of a variable, the address held in a pointer, and the elements of a structure, among others. One can also set the value of a variable and thus alter the execution sequence of the program. You have been using gdb since taking CSCI 206, so consider this exercise a review, and learn to use gdb within the emacs editor if you have not used it this way. General information about GDB and emacs is widely available on the web. We will work with a subset of commonly used gdb features. This website has a list of these commands. This file contains an abbreviated list that is easier to use.

In order to use gdb, you need to compile your program with the -g option, e.g.,

gcc -c -g myprog.c
gcc -o myprog myprog.o

Then you can run gdb either as an independent program

gdb myprog

or run it within the emacs editor as in the following exercise.

Now that you have reviewed the commonly used gdb commands, let's put the knowledge in use.

Your work

Do the following.

  1. Edit the program test_dlist.c so that, in addition to deleting a node at a random spot in the list, it also deletes the first and last nodes in the list. After all, it is critical to test the boundary conditions. The first and last nodes can be specified by deleting the nodes with the first word, word[0], and the last word, word[NUM_WORD - 1]. (You have done this step above when discussing GDB.)
  2. Compile and run the program again. Your program will now end with the infamous segmentation fault. Although the segmentation faults here are simple and you could use other schemes to find them, you are asked to use gdb to locate the problem(s). After you identify the location of the segmentation fault using the gdb command where, copy and paste the gdb message you see into answers.txt. Explain briefly what causes the segmentation faults and how you fix the problem(s). Label this part of the answer as Problem 1.1.
  3. Revise the program dlist.c so that the segmentation faults mentioned above won't happen again. Make sure the driver program test_dlist.c prints proper information when it removes the first node and the last node. Copy and paste the now correct output into answers.txt. Label this part of the answer as Problem 1.2.
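
As a rough illustration of the boundary conditions discussed above, here is a minimal sketch of unlinking a node from a doubly linked list. The lab's dlist.c and dlist.h are not reproduced in this handout, so the struct layout and the function names list_append and list_unlink are hypothetical; your code will use different names and may store a count or other fields. The point is only that the first node has no predecessor and the last node has no successor, so neither neighbor may be dereferenced without a check.

```c
#include <stddef.h>

/* Hypothetical types; the real dlist.h may use different names and fields. */
struct node {
    struct node *prev;
    struct node *next;
    const char  *word;
};

struct dlist {
    struct node *front;
    struct node *back;
};

/* Append node n at the back of list l. */
void list_append(struct dlist *l, struct node *n)
{
    n->next = NULL;
    n->prev = l->back;
    if (l->back != NULL)
        l->back->next = n;
    else
        l->front = n;         /* the list was empty */
    l->back = n;
}

/* Unlink n from l without dereferencing a NULL neighbor. */
void list_unlink(struct dlist *l, struct node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        l->front = n->next;   /* n was the first node */

    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        l->back = n->prev;    /* n was the last node */

    n->prev = NULL;
    n->next = NULL;
}
```

Removing the first node, the last node, and the only remaining node all go through the same function; only the two else branches differ from the middle-of-list case.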

Problem 2: Using Valgrind to eliminate memory leaks

You have corrected the segmentation faults in your program, and it now runs and does what it is supposed to do. However, the program still leaks memory: it does not release memory that is no longer needed. If you inspect the program, you will see that it has a number of calls to malloc() but no calls to free(). Again, in this relatively simple program you could probably fix the problem without other tools by simply adding free() calls in the proper locations. But we will use a tool, valgrind, to help identify the memory leaks. You can then fix them.

Your work

Compile the program. (No special C flags are needed to use valgrind.) Run the program with valgrind.

% make
% valgrind --leak-check=full ./test_dlist < /usr/share/dict/words

You will see the report from valgrind, something similar to the following.

==13203== 
==13203== HEAP SUMMARY:
==13203==     in use at exit: 771 bytes in 41 blocks
==13203==   total heap usage: 41 allocs, 0 frees, 771 bytes allocated
==13203== 
==13203== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 1 of 4
==13203==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==13203==    by 0x40093A: make_node (dlist.c:126)
==13203==    by 0x400A2F: main (test_dlist.c:33)
==13203== 
==13203== 735 (16 direct, 719 indirect) bytes in 1 blocks are definitely lost in loss record 4 of 4
==13203==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==13203==    by 0x400775: dlist_create (dlist.c:28)
==13203==    by 0x4009C5: main (test_dlist.c:23)
==13203== 
==13203== LEAK SUMMARY:
==13203==    definitely lost: 48 bytes in 2 blocks
==13203==    indirectly lost: 723 bytes in 39 blocks
==13203==      possibly lost: 0 bytes in 0 blocks
==13203==    still reachable: 0 bytes in 0 blocks
==13203==         suppressed: 0 bytes in 0 blocks
==13203== 
==13203== For counts of detected and suppressed errors, rerun with: -v
==13203== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6)

Some heading information from valgrind and the normal output of your program are not included here. This report clearly tells you that while calls to malloc() or the like are made, no memory is freed, which results in memory leaks. Your work now is to put calls to free() in the proper places to release the allocated memory blocks that are no longer needed.

You should do the following.

  1. Read manual pages of library calls malloc() and free() to review what they do.
  2. Put calls to free() in the proper places so that the program still works correctly and no memory leaks remain.
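
The pattern behind step 2 can be sketched as follows. This is not the lab's code: the type wnode and the functions wnode_new and wnode_free_all are hypothetical, and the sketch assumes each node owns a separately allocated copy of its word (the "direct" and "indirect" bytes in the valgrind report suggest a similar nesting, but check your own dlist.c). The idea is that every malloc() needs a matching free(), that inner allocations are released before the block that points to them, and that the next pointer is saved before its node is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: each node owns a heap copy of its word. */
struct wnode {
    struct wnode *next;
    char *word;
};

/* Allocate a node and a private copy of word; returns NULL on failure. */
struct wnode *wnode_new(const char *word, struct wnode *next)
{
    struct wnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);                          /* do not leak the node itself */
        return NULL;
    }
    strcpy(n->word, word);
    n->next = next;
    return n;
}

/* Free every node and its word; returns the number of nodes released. */
size_t wnode_free_all(struct wnode *head)
{
    size_t count = 0;
    while (head != NULL) {
        struct wnode *next = head->next;  /* save before freeing head */
        free(head->word);                 /* inner allocation first */
        free(head);
        head = next;
        count++;
    }
    return count;
}
```

The same rule applies when a single node is removed from the list: once it is unlinked and no longer used, its word and then the node itself should be freed.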

When done properly, you should see something similar to the following from valgrind.

==13410== 
==13410== HEAP SUMMARY:
==13410==     in use at exit: 0 bytes in 0 blocks
==13410==   total heap usage: 201 allocs, 201 frees, 3,761 bytes allocated
==13410== 
==13410== All heap blocks were freed -- no leaks are possible
==13410== 
==13410== For counts of detected and suppressed errors, rerun with: -v
==13410== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

Include in your answers.txt, as Problem 2, a screen capture (text copied and pasted from the screen, not a real image) of executing the following two commands.

% make
% valgrind --leak-check=full ./test_dlist < /usr/share/dict/words

Clear and commit your work

Clear the directory by running make realclean, then add, commit, and push the files dlist.c, dlist.h, Makefile, and test_dlist.c to your git repository.

Problem 3: Working with HTTP (Hypertext Transfer Protocol)

HTTP is the application protocol that underlies the web. It uses a collection of simple, text-based commands to send and receive files such as web pages between a web server such as www.google.com and a client such as a browser. All web browsers follow the HTTP protocol to communicate with a web server and retrieve web pages. You can consult many web resources (e.g., http://www.w3.org/Protocols/ or https://www.jmarshall.com/easy/http/) for the details of the protocol. In this exercise, you are asked to experiment with the protocol by sending text-based HTTP commands to a simple web server and observing the behavior of the protocol. Then you are asked to augment the server program so that the HEAD request can be served.

The two most frequently used commands (methods) in HTTP are GET and POST. Method names are case-sensitive in the HTTP specification and are written in uppercase. The GET command requests a file from the web server, and the POST command sends information to the server for processing, e.g., submitting a form.
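
To make the text-based nature of these commands concrete, the following sketch formats a minimal request into a buffer. The function name build_request is hypothetical and is not taken from the lab's webclient.c; the sketch also assumes an HTTP/1.0 request line with a Host header, which may differ from what the provided client actually sends. A request is a request line, zero or more header lines, and an empty line, each terminated by a carriage return and a line feed.

```c
#include <stdio.h>

/*
 * Format a minimal HTTP/1.0 request into buf.
 * Returns the request length, or -1 if buf is too small.
 */
int build_request(char *buf, size_t size, const char *method,
                  const char *path, const char *host)
{
    int n = snprintf(buf, size,
                     "%s %s HTTP/1.0\r\n"   /* request line */
                     "Host: %s\r\n"         /* one header line */
                     "\r\n",                /* empty line ends the request */
                     method, path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Calling build_request(buf, sizeof buf, "HEAD", "/default.html", "dana132-lnx-3") fills buf with the three lines a client would write to the socket; a HEAD request has the same shape as a GET request and differs only in the method name.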

The basic flow of work is described as follows.

Your work

Compile, run, and experiment with the programs in the directory of simple-web-server-client which is a part of the programs you copied at the beginning of the lab. Specifically, do the following.

  1. Compile and run the server,

    % cd simple-web-server-client
    % make
    % ./webserver <port-number>

    where <port-number> should be one of your assigned port numbers. Then in a separate terminal window (it could be in the same terminal window if you run your server in the background) run the client program.

  2. Run the client to access your own server,

    % ./webclient <server-name> <GET|HEAD> <path> [port-number]

    where <server-name> is the name of the computer on which the server is running, e.g., dana132-lnx-3, <GET|HEAD> is whichever of the two commands you want to run, <path> is the file path to the document you'd like to retrieve from the server, and [port-number] is the port at which the server program is running. The following example shows the case of running the server on computer dana132-lnx-3 at port 6789 and running the client to retrieve a web page called default.html from the server.

    dana132-lnx-3 % ./webserver 6789

    dana132-lnx-3 % ./webclient dana132-lnx-3 GET /default.html 6789

    dana132-lnx-3 % ./webclient dana132-lnx-3 HEAD /default.html 6789

    If the server is running at the standard HTTP port, 80, then the client doesn't have to supply the port number argument. By convention, a pair of brackets '[' and ']' means the argument inside is optional.

  3. Run the client to access other public servers,

    % ./webclient www.bucknell.edu GET /
    % ./webclient www.eg.bucknell.edu GET /
    % ./webclient www.example.org GET /

  4. Run a web browser against your own server. Assume your server is running on dana132-lnx-3 at port 6789. Use a web browser to access your server by the URL

    dana132-lnx-3:6789/
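
The optional [port-number] argument described in step 2 could be handled along the following lines. This is a sketch only: the function name port_from_args and the constant DEFAULT_HTTP_PORT are hypothetical, and the provided webclient.c may parse its arguments differently. It assumes the argument order shown in the usage line above, so the port, when present, is argv[4].

```c
#include <stdlib.h>
#include <errno.h>

#define DEFAULT_HTTP_PORT 80

/*
 * Return the port given as the optional fourth argument,
 * DEFAULT_HTTP_PORT if it is absent, or -1 if it is not a valid port.
 */
int port_from_args(int argc, char *argv[])
{
    char *end;
    long port;

    if (argc < 5)
        return DEFAULT_HTTP_PORT;   /* no port argument supplied */

    errno = 0;
    port = strtol(argv[4], &end, 10);
    if (errno != 0 || end == argv[4] || *end != '\0')
        return -1;                  /* not a plain decimal number */
    if (port < 1 || port > 65535)
        return -1;                  /* outside the TCP port range */
    return (int)port;
}
```

With this convention, ./webclient www.example.org GET / uses port 80, while ./webclient dana132-lnx-3 GET /default.html 6789 uses port 6789.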

Include in your answers.txt, as Problem 3.1, a summary of what you saw in the above exercises in a couple of paragraphs. In particular, describe the relation between the program webclient.c and a browser, as well as the relation between the program webserver.c and a general web server such as www.bucknell.edu or www.example.org. You will note that the HEAD function has not been implemented yet.

  1. Examine both programs, webclient.c and webserver.c, and get a general idea of how they work. You are then asked to implement the function process_head() in webserver.c, following the pattern of process_get(). Currently, process_head() is only a skeleton; you need to complete it. In addition, the function process_get() works only for a set of specific files. Your process_head(), however, should work for files of any name. You can limit the file types to text (html) and image (jpeg, png). Read the information about the HEAD method from sources such as HTTP Method Definitions.

    Include in your answers.txt, as Problem 3.2, a screen capture of running the client program webclient against your webserver with the following requests. Before running the commands, change the file protection of default.html so that it is not readable by others, that is, chmod 640 default.html.

    % ./webclient dana132-lnx-3 HEAD / 5678
    % ./webclient dana132-lnx-3 HEAD /JLH.jpg 5678
    % ./webclient dana132-lnx-3 HEAD /default.html 5678
    % ./webclient dana132-lnx-3 HEAD /none.html 5678
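
The four requests above exercise a default page, an image, a protected file, and a missing file. One possible shape for the header logic behind a HEAD response is sketched below. It is not the lab's process_head(): the names content_type_for and build_head_response are hypothetical, the sketch takes a local file path rather than a socket, and it assumes the server should answer 403 for a file that is not readable by others and 404 for a file that does not exist. Mapping the request path "/" to a default file is left out. A HEAD response carries the same status line and headers as the corresponding GET response but no body.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Map a file extension to a Content-Type value (html, jpeg, png only). */
const char *content_type_for(const char *path)
{
    const char *ext = strrchr(path, '.');
    if (ext == NULL)
        return "application/octet-stream";
    if (strcmp(ext, ".html") == 0 || strcmp(ext, ".htm") == 0)
        return "text/html";
    if (strcmp(ext, ".jpg") == 0 || strcmp(ext, ".jpeg") == 0)
        return "image/jpeg";
    if (strcmp(ext, ".png") == 0)
        return "image/png";
    return "application/octet-stream";
}

/* Report -1 if snprintf truncated the output, otherwise the status code. */
static int finish(int n, size_t size, int status)
{
    return (n < 0 || (size_t)n >= size) ? -1 : status;
}

/*
 * Write HEAD response headers for the local file at path into buf.
 * Returns the status code used (200, 403, or 404), or -1 if buf is too small.
 */
int build_head_response(char *buf, size_t size, const char *path)
{
    struct stat st;

    /* Missing file, or something that is not a regular file. */
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return finish(snprintf(buf, size, "HTTP/1.0 404 Not Found\r\n\r\n"),
                      size, 404);

    /* Assumption: files without read permission for others are refused. */
    if ((st.st_mode & S_IROTH) == 0)
        return finish(snprintf(buf, size, "HTTP/1.0 403 Forbidden\r\n\r\n"),
                      size, 403);

    /* Headers only: no file contents follow the empty line. */
    return finish(snprintf(buf, size,
                           "HTTP/1.0 200 OK\r\n"
                           "Content-Type: %s\r\n"
                           "Content-Length: %lld\r\n"
                           "\r\n",
                           content_type_for(path), (long long)st.st_size),
                  size, 200);
}
```

Using stat() gives the file size for Content-Length without reading the file, which fits HEAD well since the body is never sent.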

Clear and commit your work

Clear the directory simple-web-server-client by running make realclean inside the directory, then go up to your lab01 directory and add, commit, and push the directory simple-web-server-client to your git repository.

Lastly, add and commit your answers.txt to your git repository.

Deliverables: You should have added and committed the following files to your git repository.
  1. The answers.txt file which contains answers to the three sets of problems.
  2. The complete collection of files dlist.c, dlist.h, Makefile, test_dlist.c, the directory simple-web-server-client and all the files in that directory.

Congratulations! You have just completed this lab exercise!

Extra credit work

If you have time and are interested in exploring further, consider implementing the following as extra credit work. These pieces of work are not dependent on each other, so you can pick any to try.

If you complete any extra credit work, please indicate so in the answers.txt and tell the instructor how to test your extra credit work.