Project 2 – Pipelined CPU

Project Description

In this project you will develop a behavioral Verilog model of a pipelined MIPS CPU. You will implement the standard 5-stage pipeline (fetch, decode, execute, memory, writeback) with EX-EX, MEM-EX, and MEM-MEM forwarding and hazard detection logic. If a hazard cannot be resolved by forwarding, the pipeline should stall (execute NOPs) until the hazard is cleared. Branches are resolved in the decode stage. If a branch's operands are not yet available, the pipeline should stall in decode until they are. Your pipeline must be able to execute at least three programs generated by gcc. Two are given below (hello world and Fibonacci), and the third can be any program you want to use to test your CPU. You do not need to model timing delays, only functional behavior.
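The forwarding requirements reduce to a few register-number comparisons. The sketch below uses Python rather than Verilog so it can run on its own, and it mirrors the mux-select signals a forwarding unit would drive. The names (`StageWrite`, `ex_operand_source`, `store_data_source`) and the string labels are hypothetical and are not part of the provided files.

```python
from dataclasses import dataclass


@dataclass
class StageWrite:
    """Destination info carried in a pipeline register (hypothetical fields)."""
    reg_write: bool  # does the instruction in this stage write a register?
    dest: int        # destination register number, 0..31


def ex_operand_source(src: int, ex_mem: StageWrite, mem_wb: StageWrite) -> str:
    """Choose where an ALU operand in EX comes from.

    'EX/MEM' -> EX-EX forwarding (producer is one instruction ahead)
    'MEM/WB' -> MEM-EX forwarding (producer is two instructions ahead)
    'REG'    -> no forwarding; use the value read in decode
    """
    if src == 0:
        return "REG"  # $zero is hard-wired to 0 and is never forwarded
    # Check EX/MEM first: it holds the most recent value of the register.
    if ex_mem.reg_write and ex_mem.dest == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.dest == src:
        return "MEM/WB"
    return "REG"


def store_data_source(store_rt: int, mem_wb: StageWrite) -> str:
    """MEM-MEM forwarding for the data operand of a store in MEM.

    Covers e.g. `lw $t0, 0($a0)` followed by `sw $t0, 0($a1)`: when the
    store reaches MEM, the load's data is sitting in MEM/WB.
    """
    if store_rt != 0 and mem_wb.reg_write and mem_wb.dest == store_rt:
        return "MEM/WB"
    return "LATCHED"  # keep the store data already latched in EX/MEM
```

Forwarding cannot hide a load followed immediately by an instruction that needs the loaded value in EX; that case still requires a one-cycle stall from the hazard logic.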

When you run a program in your simulator, it should execute the program until the exit syscall is called (or some fatal error/exception occurs). This will change registers and memory values in the simulator. You should also implement the required syscalls (using $display) to allow your program to output results to stdout.

Your simulator should output statistics, including the clock cycle in which each instruction completes each pipeline stage. This information is sufficient to generate a pipeline execution diagram. Once the program finishes, your simulator should also output general statistics, including total clock cycles, instructions executed, and IPC (instructions per cycle). For extra credit, create another program (in any language) to visualize the output generated by your simulator (i.e., draw a nice pipeline execution diagram).
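Given per-instruction stage-completion cycles, both the diagram and the summary statistics are straightforward to produce. The Python sketch below assumes a hypothetical trace format, a list of `(label, {stage: cycle})` pairs in program order. The two-instruction trace in the usage example is made up for illustration and is not real simulator output. A stalled instruction simply shows a gap between the cycles in which it completes consecutive stages.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipeline_diagram(trace):
    """trace: list of (label, {stage: cycle}) in program order, where cycle
    is the clock cycle in which the instruction completed that stage."""
    last = max(cycle for _, stages in trace for cycle in stages.values())
    width = max(len(label) for label, _ in trace)
    rows = [" " * width + " | " + " ".join(f"{c:>3}" for c in range(1, last + 1))]
    for label, stages in trace:
        cells = ["   "] * last
        for name in STAGES:
            if name in stages:
                cells[stages[name] - 1] = f"{name:>3}"
        rows.append(f"{label:<{width}} | " + " ".join(cells))
    return "\n".join(rows)


def summary(cycles, instructions):
    """General statistics printed when the program finishes."""
    return {"cycles": cycles, "instructions": instructions,
            "IPC": instructions / cycles}


if __name__ == "__main__":
    example = [  # illustrative trace only
        ("addiu", {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}),
        ("lw", {"IF": 2, "ID": 3, "EX": 4, "MEM": 5, "WB": 6}),
    ]
    print(pipeline_diagram(example))
    print(summary(cycles=6, instructions=2))
```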

The Harris & Harris MIPS implementation shown below is a good starting point. It is mostly the same as the Hennessy & Patterson implementation, but its diagram clearly shows wire names and how the hazard unit operates between pipeline stages. Just as with H&P, this diagram omits important features, including all jumps (J, JR, JAL).
[Figure: Harris & Harris pipelined MIPS processor with hazard unit (harris_pipeline_mips)]
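The stall logic of the Harris & Harris hazard unit comes down to two Boolean conditions, lwstall and branchstall, which together drive StallF, StallD, and FlushE. The Python function below is a sketch of those conditions. The argument names follow the figure's signal naming but are otherwise hypothetical, and the `!= 0` checks (never stall on $zero) are an added refinement. The branch condition assumes, as Harris & Harris do, that an ALU result already in MEM can be forwarded into decode for the comparison; without that path you would need to stall in that case too. Treating JR like a branch here (it also reads rs in decode) is an assumption.

```python
def hazard_unit(rs_d, rt_d, branch_d,
                rt_e, write_reg_e, reg_write_e, mem_to_reg_e,
                write_reg_m, mem_to_reg_m):
    """Return (stall_f, stall_d, flush_e).

    *_d: instruction in decode, *_e: in execute, *_m: in memory.
    """
    # Load-use hazard: a load in EX (its destination is rt) produces a
    # register that the instruction in decode reads. Comparing rt_d is
    # conservative: it may stall an I-type instruction that does not read rt.
    lw_stall = bool(mem_to_reg_e and rt_e != 0 and rt_e in (rs_d, rt_d))

    # Branches compare registers in decode, so they must wait for an ALU
    # result still in EX, or for load data still in MEM.
    branch_stall = bool(branch_d and (
        (reg_write_e and write_reg_e != 0 and write_reg_e in (rs_d, rt_d))
        or (mem_to_reg_m and write_reg_m != 0 and write_reg_m in (rs_d, rt_d))
    ))

    stall = lw_stall or branch_stall
    # Freeze the PC and IF/ID, and turn the instruction entering EX into a NOP.
    return stall, stall, stall
```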

Learning Goal

The learning goal of this project is to develop a deep understanding of a pipelined CPU.

Groups

This is a 4-week group project. Your group will be assigned at the beginning of the project. You are expected to work together outside of class to complete the project and to help each other achieve the learning goal. In lab on the 4th week, you will present your solution.

Organization

Your group should select one member to host a shared GitLab repository named mips_cpu (also add your instructor as a member). You are required to have a readme.md file in the root of your project that lists everyone’s name and gives a short overview of your file structure, your documentation, and how to compile and run your test benches and the sample programs. To keep things organized, I suggest at a minimum putting documentation in a folder called docs and Verilog modules in a folder called src. Create other folders as needed to organize your project (test would be a good one to add). In the root of your project, you should have a Makefile or other build script. This file (submitted by a previous student team) has an improved Makefile and some organization already completed for you; you may choose to use it or not.

Milestone: Team contract and work plan [9/28/2017]

To get things started, a team contract and work plan are due after the first week. You are responsible for managing your own team (feel free to elect a project manager and assign other roles!). You could implement scrum by electing a scrum master, generating a product backlog, and planning a few sprints. If you go this route, keep in mind you only have 4 weeks, so you might want to do weekly sprints. If you like agile methods but don’t think scrum is a good fit, check out Kanban, a leaner form of agile.

The team contract should define the goals, expectations, policies, procedures, and consequences for all team members. Here is a sample team contract. You may use it or make your own that better suits your team. It should be a relatively short, to-the-point document (1 page) that everyone can agree to. Include your goal/vision for the project: it could be to do the minimum needed to pass, or to exceed the given standard. Also cover other important things, such as when and where you will meet outside of class. What happens if someone is late or doesn’t show up? What is your policy on using revision control? Daily check-ins? One check-in per change? Do people work on their own branches and merge, or does everyone work on the master branch (not recommended)? What about QA/QC/testing? Will you use an issue tracker? Put the important parts into your contract; the minute details go into your work plan (next).

The work plan should define how you plan to attack the problem. Here is a very complete template; you don’t have to go to this level of detail. Your document should describe your organization, roles, schedule, and testing plan. Any team policies you set forth in your contract should be explained in detail in your work plan. Put your complete team contract and work plan into your project’s docs folder.

Weekly Status Reports [weekly]:

Each group will report its status weekly during the scheduled lab time, and everyone should contribute to the status update. It should be similar to a daily scrum meeting: each member presents what they did last week and what they have yet to do. The team leader/manager should also prepare a brief overview of the overall project status and completion schedule.

Pipelined CPU Project Presentation: In lab on [10/19/2017]

The project is due in lab on Thursday 10/19/2017. Each team will have 15-20 minutes to present an overview of their design, implementation, testing, and reflections on the project. Focus on the problems you ran into, how you solved them, and what you learned about pipelines/MIPS in the project. Did you develop and use module-level unit tests? How can the rest of the class benefit from what you learned?

Required Program 1: Hello World

A good place to start is ensuring your CPU can run the standard hello world program. In this case, we will use this implementation:

Notice we are declaring puts as an extern. This is because we will not be including all of stdlib. Why? We are not going to run an OS on our processor (that would entail far too much work). Instead, we will compile our code and run it on the bare-metal processor. You can also think of this as writing a very simple operating system that runs one and only one program at a time and supports very few system calls. Below is a Makefile to generate Verilog-compatible code from a C file. If you get "missing separator" errors, make sure each recipe line begins with a tab character (not spaces). Also, you’ll want to add /usr/remote/mipsel/bin to your PATH (e.g., in bash: export PATH=$PATH:/usr/remote/mipsel/bin). This is the location of the MIPS cross-compiler and other utilities.

You might notice the Makefile includes two extra files, start.s and puts.s. These are MIPS assembly implementations of what is normally in stdlib; since we aren’t using stdlib, we need to add this logic manually. Our implementation, given below, is very simple. All it does is set the stack pointer to the top of memory and jump to main. When main returns, it signals the process to terminate with syscall 10 (exit). Your simulator should trap this syscall and call $finish.

start.s:

puts.s:

In our Verilog simulator, we will trap the syscall instruction, examine $v0, and perform the requested action (exit = $finish, puts = $display). You may need to add extra wires/ports to your register file and/or memory modules to make these work correctly. The given puts function relies on the syscall to do all of the hard work of reading the null-terminated string out of memory. You could instead rewrite puts as a simple loop that reads the string from memory and calls putc on each character, which is much easier to implement in Verilog.
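The syscall trap boils down to a dispatch on $v0 plus a byte-by-byte walk of a NUL-terminated string in little-endian (mipsel) memory. The Python sketch below models memory as a dict from word index (addr >> 2) to 32-bit word. Only exit = 10 comes from this handout; `SYS_PUTS = 4` is an assumption borrowed from the SPIM print_string convention, so check puts.s for the number it actually uses. One option (an assumption, not a requirement) is to act on the syscall when it reaches writeback, so every older instruction has already written $v0 and $a0. In Verilog, the per-character loop could use `$write("%c", ...)`.

```python
SYS_PUTS = 4    # ASSUMPTION: SPIM-style print_string number; check puts.s
SYS_EXIT = 10   # exit, as used by start.s


def load_byte(mem, addr):
    """Read one byte from word-organized, little-endian (mipsel) memory.

    mem maps word index (addr >> 2) to a 32-bit word.
    """
    return (mem[addr >> 2] >> (8 * (addr & 3))) & 0xFF


def read_cstring(mem, addr):
    """Collect bytes starting at addr until the terminating NUL."""
    chars = []
    while (b := load_byte(mem, addr)) != 0:
        chars.append(chr(b))
        addr += 1
    return "".join(chars)


def do_syscall(regs, mem, out):
    """Handle a trapped syscall; returning True means call $finish."""
    v0, a0 = regs[2], regs[4]          # $v0 = service number, $a0 = argument
    if v0 == SYS_EXIT:
        return True
    if v0 == SYS_PUTS:
        out.append(read_cstring(mem, a0))  # puts adds a newline; $display does too
        return False
    raise ValueError(f"unsupported syscall {v0}")
```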

If you compile hello.c using this Makefile, you should now have three outputs: a MIPS executable hello, the SRecords hello.s, and the Verilog-compatible hello.v. The executable is useful for disassembly (e.g., mipsel-linux-objdump -d hello). The .v file is what we will read in our Verilog CPU simulation. Below are the contents of hello.v. Notice that the first line, at 0x400000, is a header, and the actual program begins at address 0x400020 (this is where you should initialize your PC). Verify this address using objdump or by examining the generated binary instructions.
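If you load hello.v into a word array whose index 0 corresponds to address 0x400000, the 0x20-byte header occupies the first 8 words and the entry point 0x400020 lands at index 8. The Python sketch below shows that address arithmetic. A single array based at 0x400000 is one possible memory layout, not a requirement, and the Verilog line in the docstring uses hypothetical signal names.

```python
TEXT_BASE = 0x00400000  # address of the first (header) line of hello.v
ENTRY_PC = 0x00400020   # first real instruction; initial PC


def word_index(addr, base=TEXT_BASE):
    """Map a byte address to an index into a word array that starts at base.

    Verilog equivalent (hypothetical signal names):
        instr = imem[(pc - 32'h0040_0000) >> 2];
    """
    if addr < base or addr % 4 != 0:
        raise ValueError(f"bad word address {addr:#010x}")
    return (addr - base) >> 2
```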

When compiled, this program uses only 9 MIPS instructions (li and move are assembler pseudo-instructions). Once your CPU supports them, it should work! Your CPU need not support more than these to receive full credit on this part. They are:

  • addiu
  • jal
  • jr
  • li
  • lui
  • lw
  • move (pseudo-add)
  • ori
  • sw
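Decoding these instructions only requires splitting the 32-bit word into its standard fields and looking up the opcode (or the funct field for R-type). The Python sketch below covers the list above plus syscall, and sll so that the all-zero NOP word decodes cleanly. It also reports the li, move, and nop aliases that objdump prints. The table and function names are hypothetical.

```python
OPCODES = {0x03: "jal", 0x09: "addiu", 0x0D: "ori", 0x0F: "lui",
           0x23: "lw", 0x2B: "sw"}
FUNCTS = {0x00: "sll", 0x08: "jr", 0x0C: "syscall", 0x21: "addu"}


def decode(word):
    """Split a 32-bit MIPS instruction into fields and name it."""
    op = (word >> 26) & 0x3F
    fields = {
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": word & 0xFFFF,          # raw 16-bit immediate
        "target": word & 0x03FFFFFF,   # 26-bit jump target
    }
    if op == 0:
        name = FUNCTS.get(fields["funct"], "unknown")
    else:
        name = OPCODES.get(op, "unknown")
    fields["name"] = name
    # Pseudo-instruction names that objdump shows for these encodings.
    if name == "addu" and fields["rt"] == 0:
        fields["alias"] = "move"      # addu rd, rs, $zero
    elif name in ("addiu", "ori") and fields["rs"] == 0:
        fields["alias"] = "li"        # addiu/ori rt, $zero, imm
    elif word == 0:
        fields["alias"] = "nop"       # sll $zero, $zero, 0
    return fields
```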

Don’t forget that MIPS has a branch delay slot: the instruction after a branch or jump is always executed. GCC has scheduled instructions in the branch delay slots, so your CPU must execute them properly!
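The delay slot pairs naturally with resolving branches in decode. While a branch is in decode, the next sequential instruction (the delay slot) is already being fetched, so nothing needs to be flushed, and the target is fetched on the following cycle. The Python sketch below shows the address arithmetic this implies. Note that jal links to PC + 8 so the return skips the delay slot. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def jal_effect(pc, target26):
    """jal at address pc -> (value written to $ra, jump destination)."""
    link = (pc + 8) & MASK32                     # return past the delay slot
    dest = ((pc + 4) & 0xF0000000) | (target26 << 2)
    return link, dest


def branch_target(pc, imm16):
    """Taken-branch destination: PC + 4 plus the sign-extended word offset."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + 4 + (offset << 2)) & MASK32


def fetch_order(pc, dest):
    """Addresses fetched around a taken branch/jump at pc: the delay slot
    (pc + 4) always executes before control reaches dest."""
    return [pc, pc + 4, dest]
```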

Now you just have to implement the pipelined CPU logic. Initialize your PC to 0x00400020 and fetch a new instruction every cycle (unless the pipeline is stalled), letting each instruction flow through the pipeline. Repeat until you hit the syscall for exit().

Required Program 2: Fibonacci

One of the first recursive functions we learn generates the Fibonacci sequence, since it is intuitive to implement recursively. This makes it a good test of our processor’s ability to properly handle branches and the stack.

The Fibonacci code can be written as:

The main program and all of the other files you need to get started are provided here. To print the output, we borrowed a few functions from known-good sources (these are actually from the stdlib source).

When compiled for MIPS, this program uses 23 instructions:

addiu, addu, andi, b, bltz
bne, bnez, break, div, jal
jr, li, lui, lw, mfhi
mflo, move, ori, sb, sll
sra, subu, sw
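Several of these instructions differ only in how the 16-bit immediate is extended or how a shift treats the sign bit, which is a common source of bugs. addiu sign-extends its immediate while andi and ori zero-extend theirs, and sra copies the sign bit while sll shifts in zeros. The Python sketch below models register values as unsigned 32-bit integers; the function and parameter names are hypothetical. In Verilog, the same distinctions map to `{{16{imm[15]}}, imm}` for sign extension and `$signed(rt) >>> shamt` for sra.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def alu(name, rs_val=0, rt_val=0, imm=0, shamt=0):
    """Result of one ALU instruction; all values are unsigned 32-bit ints."""
    if name == "addu":
        return (rs_val + rt_val) & MASK32        # wraps, never traps
    if name == "subu":
        return (rs_val - rt_val) & MASK32
    if name == "addiu":
        return (rs_val + sext16(imm)) & MASK32   # sign-extended immediate
    if name == "andi":
        return rs_val & (imm & 0xFFFF)           # zero-extended immediate
    if name == "ori":
        return rs_val | (imm & 0xFFFF)           # zero-extended immediate
    if name == "lui":
        return (imm & 0xFFFF) << 16
    if name == "sll":
        return (rt_val << shamt) & MASK32        # shifts in zeros
    if name == "sra":
        return (to_signed(rt_val) >> shamt) & MASK32  # copies the sign bit
    raise ValueError(f"not modeled here: {name}")
```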

The break instruction is only used for a divide-by-zero exception, so it shouldn’t occur in this program. However, if it does, your simulator should print a message from Verilog and terminate the simulation.
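div writes the quotient to LO and the remainder to HI, and mfhi/mflo copy those values into a general register. MIPS rounds the quotient toward zero, whereas Python's `//` (like a naive model) rounds toward negative infinity, so the Python sketch below computes the magnitude separately. The zero-divisor case returns None and leaves HI/LO untouched. In this program that case is only reached through the compiler's divide-by-zero check, and the following `break` is where your simulator would print its message and call $finish. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def div32(rs_val, rt_val):
    """MIPS div: returns (hi, lo) = (remainder, quotient), or None if rt is 0."""
    a, b = to_signed(rs_val), to_signed(rt_val)
    if b == 0:
        return None  # architecturally undefined; the program hits `break`
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q       # round toward zero, not toward negative infinity
    r = a - q * b    # remainder takes the sign of the dividend
    return r & MASK32, q & MASK32


def execute_hilo(name, rs_val, rt_val, hilo):
    """Tiny model of div/mfhi/mflo; hilo is a dict with 'hi' and 'lo'."""
    if name == "div":
        result = div32(rs_val, rt_val)
        if result is not None:
            hilo["hi"], hilo["lo"] = result
        return None
    if name == "mfhi":
        return hilo["hi"]
    if name == "mflo":
        return hilo["lo"]
    raise ValueError(f"not modeled here: {name}")
```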

Required Program 3: Your own program

You can come up with anything you want for the third program. The only requirement is that it must use at least 3 more instructions than Fibonacci (at least 26 instructions in all). You should document these extra instructions.

If you’re up for a challenge, here are the files for the Dhrystone benchmark. It uses 38 instructions:

addiu, addu, b, beq, beqz
blez, bne, bnez, break, div
jal, jr, lb, lbu, lhu
li, lui, lw, lwl, lwr
mfhi, mflo, move, mult, or
ori, sb, sh, sll, slt
slti, sltiu, sltu, subu, sw
swl, swr, xori
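Two groups of these instructions are easy to get subtly wrong: sub-word loads and stores on little-endian memory, and signed versus unsigned comparisons. Note in particular that sltiu sign-extends its immediate and then compares unsigned. The Python sketch below covers lb, lbu, lhu, sb, slt, sltu, slti, and sltiu, using a hypothetical dict from word index (addr >> 2) to 32-bit word as memory. The unaligned-access instructions (lwl, lwr, swl, swr) and sh need more bookkeeping and are not modeled here.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def load(name, mem, addr):
    """lb/lbu/lhu from little-endian memory; mem maps addr >> 2 to a word."""
    word = mem.get(addr >> 2, 0)
    shift = 8 * (addr & 3)
    if name in ("lb", "lbu"):
        byte = (word >> shift) & 0xFF
        if name == "lb" and byte & 0x80:
            return byte | 0xFFFFFF00   # sign-extend
        return byte                    # zero-extend
    if name == "lhu":
        if addr & 1:
            raise ValueError(f"unaligned halfword at {addr:#x}")
        return (word >> shift) & 0xFFFF
    raise ValueError(f"not modeled here: {name}")


def store_byte(mem, addr, value):
    """sb: read-modify-write of the containing word."""
    shift = 8 * (addr & 3)
    old = mem.get(addr >> 2, 0)
    mem[addr >> 2] = (old & ~(0xFF << shift) & MASK32) | ((value & 0xFF) << shift)


def set_less_than(name, rs_val, rt_val=0, imm=0):
    """slt/sltu/slti/sltiu; returns 1 or 0."""
    if name == "slt":
        return int(to_signed(rs_val) < to_signed(rt_val))
    if name == "sltu":
        return int(rs_val < rt_val)
    if name == "slti":
        return int(to_signed(rs_val) < sext16(imm))
    if name == "sltiu":
        # The immediate is sign-extended first, then compared as unsigned.
        return int(rs_val < (sext16(imm) & MASK32))
    raise ValueError(f"not modeled here: {name}")
```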

Guides

Some suggestions made by previous CSCI320 students https://drive.google.com/drive/folders/0B3AZ5TCYAqmwQWtwWFpsWWFMT2s?usp=sharing

A new and improved build system is available here: https://drive.google.com/a/bucknell.edu/file/d/0B3AZ5TCYAqmwX01tclFRT015azQ/view?usp=sharing

Grading:

team contract and work plan: 10 points.

weekly team status reports: 10 points (total). All members attend the regular meeting time and contribute to the progress report.

design: 10 points. Modular, logical, concise, module parameters used, etc.

documentation: 10 points. Readme sufficiently explains the design (including needed module diagrams), compilation, execution, and testing methodology. Code contains module-level comments and in-line comments where appropriate.

functionality: 30 points. 10 points for hello world, 10 points for Fibonacci, 10 points for Dhrystone/your own program. You get points if your CPU can execute the program (and you demonstrate the output is correct!).

forwarding: 10 points. The CPU implements EX-EX, MEM-EX, and MEM-MEM forwarding.

hazards: 10 points. The CPU stalls appropriately to avoid hazards when forwarding is not possible.

presentation: 10 points. The group clearly presents their design, including design choices, architecture, and testing. Use this presentation rubric to guide your preparation.


CC BY-NC-SA 4.0
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
