Buffer overflow and stack smashing
- Copy to your ~/csci206/Labs/Lab11 directory all the files in ~cs206/Labs/Lab11.
- Create a file called answers.txt, in which you will write answers to the questions in this lab. The header of this file should have your name and lab section. Make sure to label each answer with the number of the corresponding problem and question.
- Log in to mips.bucknell.edu and change into your Lab11 directory.
Use gcc to compile the following C programs into executables: over1.c, over2.c, over3.c, and over4.c. Enter the answers to questions (1.1)-(1.4) in your answers.txt file.
(1.1) Run over1 and enter an input string with more than 10 characters and note what happens. Run over2, enter the same input string you used before, and note any difference in behavior in comparison with the first program. Explain how the buffer overflow risk was eliminated in over2.c.
(1.2) Now, run over3 and enter your same input string. Read the source code in over3.c and explain what causes the risk of buffer overflow in this program.
(1.3) Consider the flaws you identified in over1.c and over3.c, then generalize from your observations to write a rule of thumb that allows anyone to identify when a system call or library function call will give a program the risk of buffer overflows.
(1.4) Comparing over3.c and over4.c, you learn a lesson on how to handle user input to avoid the risk of buffer overflows. Try to state this lesson as clearly as possible.
(1.5) Construct a proof-of-concept program called over5.c to show how a certain function can introduce the risk of buffer overflow. The function you choose must be different from those that appeared earlier in this problem. The list that follows offers a few candidates for you to consider: strcat, strcpy, sprintf. You can try it on different machines with different compilers, e.g., cc on mips, or cc on the lab Linux machines. You might get different results. Try to explain what you see.
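As a point of comparison for the lesson in (1.4), here is a minimal sketch of bounded input handling. The source of over4.c is not reproduced in this handout, so this is not necessarily what that program does; the helper name read_line_bounded is hypothetical. The idea is that the caller always states the size of the destination, and the function never writes past it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not taken from over4.c): read one line from `in`
 * into `buf`, writing at most `size` bytes including the terminator.
 * Returns the number of characters stored, or -1 on EOF or error. */
long read_line_bounded(FILE *in, char *buf, size_t size)
{
    size_t len;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';          /* the whole line fit: drop the newline */
    } else {
        int c;                      /* line was too long: discard the rest */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return (long)len;
}
```

A caller with char name[11] would call read_line_bounded(stdin, name, sizeof name); an input longer than 10 characters is cut to 10 rather than spilling past the end of the array.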
When you are done with this problem, add your files (answers.txt and over5.c) to git and push to gitlab.
Run gcc to compile decode.c into the executable decode. Read the source code for this program carefully so that you develop a good understanding of each line. Next, run the program redirecting its output to a file with the command below:
$ ./decode > decode-run.txt
and observe the addresses mapped to the following symbols: main, i, j, k, and buf (note that the program will give you the address of each element of the array).
(2.1) Reading the source code and decode-run.txt, determine in which segment of memory each of these five addresses appears.
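If it helps to calibrate what addresses in each segment look like on a given machine, a small experiment along these lines can be compiled next to decode. This is a sketch that is independent of decode.c; every name in it is hypothetical, and the actual addresses will differ from machine to machine and from run to run.

```c
#include <stdio.h>
#include <stdlib.h>

int global_counter = 7;              /* initialized global: data segment */

/* Hypothetical helper (not part of decode.c). Prints one address per
 * line and returns the number of lines printed. */
int print_segment_addresses(FILE *out)
{
    static int static_local = 1;     /* static storage: data segment */
    int automatic_local = 2;         /* automatic storage: stack */
    int *heap_block = malloc(sizeof *heap_block);   /* heap */
    int lines = 0;

    /* Casting a function pointer to void * is a common compiler
     * extension rather than strict ISO C. */
    fprintf(out, "function:        %p\n", (void *)print_segment_addresses);
    lines++;
    fprintf(out, "global:          %p\n", (void *)&global_counter);
    lines++;
    fprintf(out, "static local:    %p\n", (void *)&static_local);
    lines++;
    fprintf(out, "automatic local: %p\n", (void *)&automatic_local);
    lines++;
    fprintf(out, "heap block:      %p\n", (void *)heap_block);
    lines++;

    free(heap_block);
    return lines;
}
```

Calling print_segment_addresses(stdout) from main and grouping the printed addresses by how close they are to one another gives a reference against which to compare the addresses in decode-run.txt.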
(2.2) In a text editor, open the file decode.s provided to you, which contains the compilation from C to assembly. Look at the code in main to identify the point at which it sets up the parameters to pass in the invocation to function test. The function prototype of this C function has 6 formal arguments as shown below:
void test(int a, int b, int c, int d, int i, int j)
Knowing that MIPS can pass at most 4 arguments in registers ($a0-$a3), you realize that the two additional arguments i and j have to be passed to the function via the stack. The question is in which order these parameters are pushed onto the stack: from left to right or from right to left? Referring to the assembly code in main, specify how each of the 6 parameters is passed to function test and note the order in which the stack-passed parameters are pushed.
(2.3) Function test has a stack frame associated with it. This stack frame will contain at least: the formal parameters i and j, the automatic variables in the scope of the function (the array buf and the integer k), and the return address to the caller of test. If you know where this return address is located in the stack, you can craft a buffer overflow to overwrite it and get just about any piece of code to execute. The question is how to find the location of this item on the stack!
The program gives you the address of these three items in the stack frame and also the size of an int type and the size of a pointer type in bytes. Using this information, and inspecting the memory dump created when you run decode, do your best to determine the address of the memory location (within the stack frame of function test) that contains the return address for this function.
This can be a bit tricky, but there are at least two ways to figure out the value of the return address from test back to main. The first method consists of counting how many instructions you will have from the start of main up until right after the call to test. Multiply the number of instructions by 4 to get the number of bytes from the start of main. Now, add this number to the starting address of main and you will have a good estimate of the value of the return address. The second method uses gdb. On the MIPS machine, you can run gdb to open your executable with:
$ gdb decode
Up until this lab, you have used gdb to debug only C programs, but you can also use it to debug and to disassemble an executable. Once you have entered the debugger, do the following:
(gdb) disassemble main
The disassembled code starts from the address where main starts. Scroll down to estimate the distance to the call to test; this gives you a decent idea of the value of the return address back to main, which you can then look for in the memory dump. This page contains a few hints on gdb usage that may help you in this problem.
Your answer to this question should report your best estimate of the address of the first byte containing the return address of test and the method you used to find this information.
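The arithmetic of the first method (instruction count times 4, added to the start of main) can be written down as a one-line function. This is only a sketch: the function name is hypothetical, and the numbers in the comment are made up for illustration rather than taken from decode.

```c
#include <stdint.h>

/* Hypothetical helper: estimate the return address from test back to
 * main, given the address where main starts and the number of
 * instructions from the start of main up to the return point.
 * Every MIPS instruction is 4 bytes long.
 *
 * On classic MIPS, jal is followed by a branch delay slot and links to
 * the instruction after that slot, so the delay-slot instruction must
 * be included in the count.
 *
 * Illustration with made-up values: main at 0x00400690 and a count of
 * 30 instructions gives 0x00400690 + 30 * 4 = 0x00400708. */
uint32_t estimate_return_address(uint32_t main_start,
                                 uint32_t instruction_count)
{
    return main_start + 4u * instruction_count;
}
```

The value this produces is what to search for in the memory dump; the address at which that value is stored is the answer to the question.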
(2.4) Look at the address that the program reports for variable i. Knowing that it is a 32-bit integer, give the hexadecimal value of each of the 4 individual bytes that make up this value (you know that it represents 11, the actual parameter passed from main to test).
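To cross-check a reading of the memory dump, the bytes of any object can be listed in address order with a helper like the one below. It is a sketch with a hypothetical name, separate from whatever dump routine decode.c uses.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format the n bytes starting at p as two-digit
 * hex values separated by spaces, lowest address first, into out.
 * Returns the number of bytes formatted; stops early if out is full. */
size_t format_bytes(const void *p, size_t n, char *out, size_t out_size)
{
    const unsigned char *bytes = p;
    size_t used = 0;
    size_t i;

    if (out_size > 0)
        out[0] = '\0';

    for (i = 0; i < n; i++) {
        /* each byte needs up to 3 characters plus the terminator */
        if (out_size - used < 4)
            break;
        used += (size_t)snprintf(out + used, out_size - used,
                                 i ? " %02x" : "%02x", bytes[i]);
    }
    return i;
}
```

For an int holding 11, the string contains one 0b and three 00 entries; the position of the 0b is what question (2.5) asks you to interpret.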
(2.5) Looking at how the individual bytes that constitute the value of i are stored in memory, determine whether you are in a little endian or in a big endian processor.
(2.6) Thinking of how you determined the endian convention of your processor above, describe a strategy you might use in the future to discover what is the endian convention of an arbitrary processor. Your description can be a mixture of C statements and plain English to explain it step-by-step.
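One possible shape for such a strategy, given as a sketch rather than the expected answer: store a value whose bytes are all different, then look at the byte at the lowest address. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the least significant byte of a
 * multi-byte integer is stored at the lowest address (little endian),
 * and 0 otherwise. */
int is_little_endian(void)
{
    uint32_t probe = 0x01020304u;   /* four distinct byte values */
    unsigned char first;

    memcpy(&first, &probe, 1);      /* copy the byte at the lowest address */
    return first == 0x04;
}
```

Using memcpy (or an unsigned char pointer) to inspect the object avoids any pointer-aliasing problems, since C allows every object to be examined as a sequence of bytes.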
(2.7) Using the information you have gathered so far, draw the stack frame for function test. You must show the address of each component of the stack frame for test, including each of the local variables in the function individually.
(2.8) Look at the stack frame diagram you constructed above. Stack smashing uses the lack of array bounds checking in C to force a program to make a jump to an arbitrary address – that is, the address of a piece of code injected into a running program. What data item in the stack frame must be overwritten to make this jump possible?
(2.9) For the scenario in this problem up to now: if one injects code into your running program via buffer overflow, into which segment of the program will this end up: text, data, stack, or heap segment?
(2.10) You have learned about programs being structured into these four regions or “compartments.” The open question is how the processor deals with their boundaries. Would the processor possibly execute code that is not in the text segment? Would the processor be able to manipulate data stored in the text segment?
(2.11) If you precede the declaration of buf with the keyword static, this array will be visible in the scope of function test but it will reside on the data segment of your program rather than on the stack. Make this modification in the program, save the revised program in decode_static.c, compile, and run the program. Save the output into a file named decode-static-run.txt. Describe what evidence you find to confirm that buf is now in the data segment. Please do not overwrite your previously saved decode-run.txt file!
(2.12) Consider what you observed when you place buf in the data segment of the program. Explain whether it is possible to use a buffer overflow in buf in this situation to force the program to jump to an arbitrary address.
Once you finish this problem, remember to add/update all files (answers.txt and decode-run.txt) to git and push to gitlab.
Run gcc to compile concat.c into the executable concat. Read the source code for this program carefully and try to understand what each line of code is trying to accomplish. Next, run the source code of this program through the “Rough Auditing Tool for Security” which is installed in our mips system (the executable is called rats):
$ rats concat.c
(3.1) Analyze the output of this program. In your answers.txt, explain in your own words what makes concat.c an insecure program.
(3.2) Now, also in your answers.txt, try to generalize from the experiences you’ve had in this lab to explain what type of programming “mistake” makes a program vulnerable to stack smashing attacks.
(3.3) Adapt concat.c to eliminate, or at least minimize, the risks of stack smashing. Save the revised program in concat_safe.c. Once you have tried your best, make sure to run it through rats again to verify that you managed to improve it.
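As one possible direction for (3.3), the sketch below joins two strings while respecting the size of the destination. The contents of concat.c are not shown in this handout, so the function name and signature here are hypothetical and would need to be adapted to the actual program.

```c
#include <stdio.h>

/* Hypothetical helper (not taken from concat.c): write first followed
 * by second into dst, never writing more than dst_size bytes.
 * Returns 0 if the full result fit, -1 if it was truncated or the
 * arguments were unusable. dst is always NUL-terminated when
 * dst_size > 0. */
int safe_concat(char *dst, size_t dst_size,
                const char *first, const char *second)
{
    int needed;

    if (dst == NULL || dst_size == 0)
        return -1;

    /* snprintf returns the length the full result would have had */
    needed = snprintf(dst, dst_size, "%s%s", first, second);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;                  /* result did not fit: truncated */

    return 0;
}
```

The design choice is that the caller passes sizeof of the destination array and checks the return value, so an oversized input becomes a reported error instead of a write past the end of the buffer. Whether rats still reports anything for the revised program is worth checking rather than assuming.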
Once you finish this problem, remember to add your updated concat.c to git and push to gitlab.
[Prelab: 25 points]
-  for each question
[Lab: 75 points]
- [15 points] Exercise 1: [3 points] for each question
- [36 points] Exercise 2: [3 points] for each question
- [24 points] Exercise 3: [8 points] for each question