The material for this lab was developed from various sources. We acknowledge the direct and indirect contributions by Prof. Felipe Perrone (Bucknell University), Prof. Xiannong Meng (Bucknell University), and Prof. Phil Kearns (The College of William & Mary). Also, some of this lab is based on material from Unix Network Programming Volume 1, by W. Richard Stevens, Bill Fenner, and Andrew M. Rudoff (Addison Wesley, 2004).
Permission to reuse this material in parts or in its entirety is granted provided that the credits note is not removed. Additional student files associated with this lab, as well as any existing solutions, can be provided upon request by e-mail to perrone[at]bucknell[dot]edu.
    int retval;
    int fd[2];

    retval = pipe(fd);
    if (-1 == retval) {
        perror("error creating pipe");
        exit(-1);   // to indicate abnormal termination
    }

In fact, you could do something like that for every system or library call you ever invoke, and that would turn out to be a pain. It would be much easier if you wrote "wrappers" for these calls to do the dirty work for you. For instance, if you had a wrapper for pipe called wp_pipe, your code could be as simple as:
    #include "wrappers.h"

    int retval;
    int fd[2];

    wp_pipe(fd);
    ...
Assuming that wrappers.h and wrappers.c have the code for your programmer-friendly function wrappers, you would never have to write all that error checking code again. Your programs would be safer (because they would always check for error conditions) and easier to read.
You have learned that the Unix pipe is a construct for interconnecting two processes that execute on the same machine. Unix pipes follow the byte stream service model, meaning that you work with them by pushing bytes in on the write end and pulling bytes out from the read end. Since access to pipes is provided via Unix file descriptors, the programmer can use the same read(2) and write(2) system calls used for file I/O to operate on pipes.
The concept of a TCP socket is very similar to that of a pipe. The most fundamental difference is that TCP sockets serve to interconnect two processes that execute on arbitrary machines. Whether the two processes execute on the same host or on networked hosts across the world from each other, the setup and operations on the sockets are the same.
You should think of a socket as a communication endpoint. If a socket interconnects processes on arbitrary hosts on the Internet, the first thing that should occur to you is that sockets must be related to Internet addresses. When we say Internet address, it might occur to you that we're somehow referring to IP addresses, which we use to pinpoint hosts on the Internet. An IP address, however, can only identify a host, not an application process within that particular host. If you need to pinpoint a specific application process within a host, you need to extend this concept of address to the pair <IP address, port number>, where the port number serves to identify an application within a host. This mapping of application to port number doesn't happen by magic, of course. An application must bind to a port number within a given host, and it must choose a port number that is not used by the system for a standard service. Take a look at the text file /etc/services to find a large number of well-known port numbers that are used by standard applications. The port numbers you use in your own application programs should never conflict with these. In fact, you should be using port numbers in user space, so the instructor will assign each student unique port numbers to use in their programs for this class.
In this lab you will work with a pair of programs which implement the client/server paradigm. The server will be a type of program known as a daemon, which performs the following tasks in an infinite loop: wait for a request to arrive, process the request, and send back a response. The client will be a program that crafts a request, sends it to the server, receives and processes the response, and then terminates.
The basic design pattern for client/server applications based on TCP sockets is illustrated in the figure below. This figure shows the sequence of calls to functions in the socket library that are appropriate for the client process and for the server process. Take the time to read the manual pages for the library calls that are new to you (that is, connect, listen, and accept). TCP sockets implement a high-level abstraction that gives the programmer a communication channel across networked hosts that is reliable and order-preserving, following the same byte-stream service model as pipes.
Next, look at the client/server pair given to you in files getfile.c and fileserver.c and do your best to understand the source code. In these files you will notice that we read from and write to sockets with the same read and write system calls that we would use for file I/O and pipes. Therein lies a danger, however. You will be asked to execute these programs to fetch a file, and you will observe that when you call read or write you may, respectively, input or output fewer bytes than were requested. This behavior occurs because buffer limits for the socket may be reached inside the kernel when these calls are made. Whenever this happens, you will need to invoke read(2) or write(2) repeatedly to finish reading or writing the remaining bytes.
When you know the exact number of bytes expected in a reading or writing call, we recommend that instead of calling read or write directly, you call the functions readn or writen provided to you, in files readn.c and writen.c, respectively. They guarantee that your requests for input or output never return with a short count. Read the implementations of these functions and understand how they work. If you don't know the number of bytes expected, for example, when the server reads a file name of unknown length from a client, you can have the writing party (e.g., the client sending the file name to the server) first send an integer telling the receiving party how many bytes to expect, and then send the actual content, using the following pattern.
    /* Code for the sender: */
    /* The sender first sends the expected number of bytes to the receiver */
    char *filename = "hello.txt";
    int expected_length = strlen(filename);
    int bytes_sent;

    bytes_sent = writen(sockID, &expected_length, sizeof(int));

    /* The sender then sends the actual filename */
    bytes_sent = writen(sockID, filename, expected_length);
    ...

    /* Code for the receiver: */
    /* The receiver first reads how long the incoming message will be */
    int expected_length;
    int bytes_read;

    bytes_read = readn(sockID, &expected_length, sizeof(int));
    char filename[expected_length + 1];

    /* The receiver then reads the actual file name */
    bytes_read = readn(sockID, filename, expected_length);
    filename[expected_length] = 0;   // terminate the string
The above pattern works well with small amounts of data. If the amount of data passed between the sender and the receiver is large, it is best to call read or readn (and write or writen on the sending side) repeatedly in a loop until the correct number of bytes has been processed.
The code you are getting for this lab contains a naive design for a client/server application that imposes a severe restriction: all requests to the server are processed sequentially, in the order of their arrival. This means that a long request which arrives first will block the execution of a short request that would take a much smaller fraction of time to complete. In order to guarantee that requests are not forced to wait indefinitely for the completion of very long requests, and to keep the response time of your server down, you need to resort to concurrency.
The first concurrent server design we propose is based on operating system processes created with the fork() system call. The basic design pattern for such a server is outlined below:
    #define LISTENQ 8

    pid_t pid;
    int listenfd, connfd;

    listenfd = socket( ... );
    // fill in sockaddr_in with server's well-known port
    bind(listenfd, ... );
    listen(listenfd, LISTENQ);

    for (;;) {
        connfd = accept(listenfd, ... );   // probably blocks here

        if ((pid = fork()) == 0) {
            close(listenfd);       // child closes listening socket descriptor
            process_request(connfd);
            close(connfd);         // done serving this client
            exit(0);               // child terminates successfully
        } else {
            close(connfd);         // parent process closes 'connfd'
        }
    }
Note that in order to use this design pattern, you need to encapsulate the entire handling of a service request in a function, which must be defined as:
void process_request(int fd);
The general idea of this pattern is that the server process listens for client requests on a socket bound to a well-known port. Every time a client attempts to connect to your server, the server spawns off a child process to handle the client's request while it continues listening on its well-known port for other potential client requests. The maximum number of connection attempts that can be queued in the listening socket is defined as LISTENQ, but the pattern above does not limit the number of children that can be created to execute concurrently.
An easy way to overwhelm the machine running the server would be to bombard the server with numerous requests that take a long time to process. If the number of concurrent processes rises beyond a tolerable level, the machine will spend most of its time scheduling processes without getting much work done. Future legitimate requests to this server would effectively be denied processing. In order to protect against this form of denial-of-service attack, the server should limit the number of child processes executing concurrently.
All the files you will need to start your work on this lab can be found at:

~cs363/Spring13/student/labs/lab02/

Copy all the files in that directory to your lab02 directory.
Among the files you copied, one is called port-assignment.txt, which specifies the port numbers to be used in this class. Take a look at that file and identify the port numbers assigned to you. From now on, you should use these two assigned port numbers for your lab exercises and programming exercises.
So far, we have archived all files into our Git repository. Many of the files do not need to be saved, for example, object files or backup files left by an editor. Git can ignore these files when archiving your project. All you need to do is to set up a .gitignore file in your repository.
In this lab, we have created a .gitignore file for you. First copy the suggested source file, name it .gitignore, then revise it as you see fit. Assume you are in your ~/csci363-s13/labs/lab02 directory.
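For reference, a .gitignore for a small C project like this one often contains entries along these lines (hypothetical contents; the file provided for the course may differ, and the executable names below assume the programs from this lab):

```
*.o
*~
getfile
fileserver
```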
Go over the code for fileserver.c and getfile.c given to you and identify all invocations to system calls and library functions. Understand the purpose of each system or library call and how they work. You should be referring to resources such as the UNIX manual pages, Stevens/Rago text, or the web to understand how these programs work.
Except for the read, write, send, and recv calls, write a "wrapper" for every other call by building a header file called wrappers.h and an implementation file called wrappers.c. The wrapper for each call must check the return value of the actual system or library call to determine whether an error has occurred; if there has been an error, the wrapper must call perror(3) and exit(3) with the appropriate parameters, and otherwise do nothing extra. (Note that certain calls produce a return value used by the application; their wrappers must return that value as well.) Revise the Makefile given to you so that these files become part of the dependency lists in the rules to build the programs that use your wrappers. Experiment with the make command to verify that when any of your wrapper files change, all executables that depend on the changed wrapper(s) are indeed rebuilt.
In addition to writing the wrappers, make sure to edit the file lab2.h to reflect the port number you have been assigned. Use a file like this to define any port number for the sockets your applications create.
When you're done with this problem, you need to do:
The code given to you in files getfile.c and fileserver.c outlines an application for file transfer using TCP sockets. The server understands two kinds of commands:
Compile these programs and experiment with them: run a server on your own assigned port number on a remote machine and connect to it with the client, requesting the transfer of any "long" text file of your choosing (/etc/services, for instance). Observe what happens: it is highly likely that the client will not display the entire contents of the requested file on the standard output (your shell). Observe also the strings sent to standard output by the server: is there anything that needs fixing?
Your first task is to modify this pair of programs to add the functionality below:
Hint: You can sort the listing of files by size using the command
ls -lS
that is, a long listing ('l') sorted by file size ('S').
When you're done with this problem, you need to do:
Using the design pattern for a concurrent server provided to you earlier in this lab, modify your server to spawn off a new child process every time it receives a request from a client. Add appropriate wrapper functions. Make sure to limit the maximum number of concurrent processes you spawn off so that your server is not easy prey to denial-of-service attacks.
Start your concurrent server program. Then run both client programs (getfile and getstats) multiple times with various file and stats requests. Observe the usage stats received by the client. Do you see any problems? Why? How would you fix such problem(s)? Write what you see and what you think in your lab02-answer.txt.
When you're done with this problem, you need to do:
A note on byte manipulation functions
There are two groups of functions for setting bytes in memory, copying them around, or comparing them. One corresponds to Berkeley-derived functions:
    #include <strings.h>

    void bzero(void *dest, size_t nbytes);
    void bcopy(const void *src, void *dest, size_t nbytes);
    int  bcmp(const void *ptr1, const void *ptr2, size_t nbytes);
The other set comes from the ANSI C standard and the functions are provided with any system that supports the ANSI C library:
    #include <string.h>

    void *memset(void *dest, int c, size_t len);
    void *memcpy(void *dest, const void *src, size_t nbytes);
    int   memcmp(const void *ptr1, const void *ptr2, size_t nbytes);
It can be argued that the ANSI C standard functions are more portable. (The choice you make for one set or the other shouldn't affect the performance of your applications.) If you happen to switch from one set of functions to the other, however, be extra careful in observing the type and order of parameters that each of these functions takes, and also the data type each function returns. Comparing the functions for copying bytes, for instance, you will notice that the source and destination pointers are swapped, which will make a big difference in your code if you mistakenly assume that the order and the semantics of the parameters are the same in both cases.
IMPORTANT: At the end of today's lab, and after any further programming/testing sessions with programs that use fork(), make sure to leave no processes running in the background before you log off. When you are done, make sure to terminate your server process.