Exercise 6 - Multicore Platform

Motivation

Objectives

This exercise will be divided into parts, and at the end of it, you should be able to:

Contextualization

Attention: As always, all the files you need for this activity are available at /home/staff/rodolfo/mc723/download.

Part 1 - Creating a multicore platform

The description of this part is very simple: take the platform from the previous exercise and make it work with two processors. For this, create a new platform with the name dual_mips and assemble the rest of the necessary structure, according to the steps below.

Use the results of the previous exercise as a basis, with the Lock peripheral but without the peripherals for floating point arithmetic operations. Since you already tested this code in the previous exercise, you probably won't find any problems up to this point (this is the code for the first execution of the matrix multiplication program; just rename the files as needed to create another platform, as indicated above).

Instantiate two processors in the platform's main.cpp file. Connect these two processors to the router. You do not need to create another port similar to target_export. In this part of the exercise, do not add the caches. This should be enough for your simulator to start running, but it will try to run exactly the same program twice, which is not the goal of this exercise. Three steps are needed to separate the execution flow: getting all processors to start the same way, providing a different stack for each processor, and separating the execution flow itself.

To start all processors in the same way, the only care you should take is in calling the init method inside main.cpp. This method takes the parameters from the command line but, because of the way it was implemented, removes the first parameter from the list. Thus, if you pass the same variables to all processors, each one will receive one parameter fewer than the previous one. Tip: build the command lines in main.cpp correctly, or duplicate all parameters before calling the init method.
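The second tip (duplicating the parameters) can be sketched as follows. This is only an illustration under assumptions, not the platform's code: ArgCopy and ConsumeFirst are hypothetical names, and ConsumeFirst merely imitates the behavior described above (dropping the first parameter in place) so that the effect can be observed outside the simulator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: owns a private copy of the command line so that one
// processor's init cannot disturb the arguments seen by the next processor.
struct ArgCopy {
    std::vector<std::string> storage;  // owns the characters
    std::vector<char*> argv;           // argv-style view, null-terminated

    ArgCopy(int argc, char** av) : storage(av, av + argc) {
        for (std::string& s : storage) argv.push_back(&s[0]);
        argv.push_back(nullptr);
    }
    // Copying would leave argv pointing into the other object's storage.
    ArgCopy(const ArgCopy&) = delete;
    ArgCopy& operator=(const ArgCopy&) = delete;

    int argc() const { return static_cast<int>(argv.size()) - 1; }
};

// Stand-in for the behavior described for init (an assumption about how it
// works): the first parameter is removed by shifting the array in place.
void ConsumeFirst(int argc, char** av) {
    for (int i = 0; i + 1 < argc; ++i) av[i] = av[i + 1];
    if (argc > 0) av[argc - 1] = nullptr;
}
```

In main.cpp, the idea would be to build one ArgCopy per processor from the original ac and av and to pass each processor's init its own copy (argc() and argv.data()), so that every processor starts from the full, unmodified command line.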

Each thread of execution needs a separate stack; otherwise the behavior of the running program will be inconsistent. There are two simple ways to provide a different stack for each thread. The first is to insert inline assembly code at the beginning of the program, making sure that each processor gets a different stack address (I recommend 32 KB or 64 KB per stack for now). If you use inline assembly, separate the execution flow first, as described in the following paragraph. The second is to edit the mips1_syscall.cpp file, at the point where it initializes the $sp register, and set a different initial value each time this line of code is executed. This guarantees complete scalability, with no need to worry about the number of processors or about inline assembly code (I recommend the second option).
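The arithmetic behind the second option can be sketched like this. All names and constants here are assumptions for illustration (kRamSize in particular is a made-up memory size; use the value your platform actually defines), and the sketch does not reproduce the contents of mips1_syscall.cpp.

```cpp
#include <cstdint>

// Assumed values for illustration only.
constexpr std::uint32_t kRamSize   = 5u * 1024u * 1024u;  // hypothetical memory size in bytes
constexpr std::uint32_t kStackSize = 64u * 1024u;         // 64 KB per processor

// Initial stack pointer for the processor with the given index.
// Processor 0 starts at the top of memory and each following processor
// starts kStackSize bytes lower. Stacks grow downward on MIPS, so the
// regions stay disjoint as long as no stack grows beyond kStackSize.
constexpr std::uint32_t StackTop(std::uint32_t index) {
    return kRamSize - index * kStackSize;
}

// Hands out a new stack top on every call, mirroring the idea of using a
// different initial value each time the $sp initialization is executed.
std::uint32_t NextStackTop() {
    static std::uint32_t next_index = 0;
    return StackTop(next_index++);
}
```

Because the value is derived from a counter rather than from a fixed list, the same code serves 2, 4 or 8 processors without changes.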

Separate the execution of each thread. Use the lock peripheral, defined and implemented in the previous exercise, to control concurrent access to a memory location and perform a different activity each time that section is executed. Remember the code examples from the previous exercise. I suggest that you implement the functions AcquireGlobalLock, ReleaseGlobalLock, AcquireLock and ReleaseLock. The first two work with the global hardware lock and the other two with locks based on local variables. To implement these last two, you must obtain the global lock first and only then change these variables. So with just one global lock you can have as many local locks as needed.
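One possible shape for these four functions is sketched below. It rests on assumptions that should be checked against your own peripheral: the hardware lock is assumed to return its old value and become 1 when read, and to be released by writing 0. Here the peripheral is replaced by an ordinary variable (g_hw_lock, with the hypothetical helpers HwLockTestAndSet and HwLockClear) so that the logic can be followed on a host machine; this stand-in is not atomic, and on the platform it would be replaced by accesses to the lock's address. ClaimProcessorId is also a hypothetical name.

```cpp
// Stand-in for the lock peripheral (assumed semantics, not atomic on a host).
static volatile int g_hw_lock = 0;
int HwLockTestAndSet() { int old = g_hw_lock; g_hw_lock = 1; return old; }
void HwLockClear() { g_hw_lock = 0; }

// Spin until the read returns 0, meaning this caller took the lock.
void AcquireGlobalLock() { while (HwLockTestAndSet() != 0) {} }
void ReleaseGlobalLock() { HwLockClear(); }

// Local locks are plain variables that are only touched while the global
// lock is held. The global lock is released between attempts so that the
// current owner of the local lock is able to release it.
void AcquireLock(volatile int* lock) {
    for (;;) {
        AcquireGlobalLock();
        if (*lock == 0) {
            *lock = 1;
            ReleaseGlobalLock();
            return;
        }
        ReleaseGlobalLock();
    }
}

void ReleaseLock(volatile int* lock) {
    AcquireGlobalLock();
    *lock = 0;
    ReleaseGlobalLock();
}

// Separating the execution flow: each processor that passes through here
// receives a different number (0, 1, 2, ...) and can branch on it.
static volatile int g_next_id = 0;
int ClaimProcessorId() {
    AcquireGlobalLock();
    int id = g_next_id;
    g_next_id = id + 1;
    ReleaseGlobalLock();
    return id;
}
```

The global lock is held only for the few instructions that inspect or change a variable, which keeps contention on the single hardware lock low even when many local locks are in use.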

Describe your design decisions. I will be available in the lab to discuss possible design decisions and how to implement them.

Create a program, however simple, that takes advantage of your parallel platform. What is the simulator's performance?
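As one possible starting point, and only as a suggestion, the matrix multiplication from the previous exercise could be divided by rows, with each processor computing its own block. The helper below is a hypothetical sketch of that division; RowRange and RowsFor are illustrative names, and id is assumed to be a distinct number from 0 to n_procs - 1 obtained when the execution flow is separated.

```cpp
// Half-open range of rows: [begin, end).
struct RowRange {
    int begin;
    int end;
};

// Splits n_rows as evenly as possible among n_procs processors.
// The first (n_rows % n_procs) processors receive one extra row.
RowRange RowsFor(int id, int n_procs, int n_rows) {
    int base = n_rows / n_procs;
    int extra = n_rows % n_procs;
    int begin = id * base + (id < extra ? id : extra);
    int end = begin + base + (id < extra ? 1 : 0);
    return {begin, end};
}
```

Since each processor writes only to its own rows of the result, the computation itself needs no locking; a lock is needed only to decide which processor gets which id and to detect when all of them have finished.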

Part 2 - Scalability

How scalable is your platform? Can you make a 4-core version? 8 cores? Build them and run your program. Evaluate the simulator's performance.

Prepare the 8-core version for delivery.

Bonus - for extra credit

When using a multicore system, your cache can cause coherence problems, which you did not address in exercise 4. To earn a bonus, implement a coherence protocol in your cache and demonstrate its operation using shared variables in the code. You must verify that your code is working.

Improved thread management API. The mechanism for separating the program code is very complicated at this point: you need to take care of each processor, watch the global lock, and so on. Implement an initial function to be executed by all cores and, along with it, a reduced PThread-compatible API to launch threads on your system. To earn another bonus, verify that this code works as well.

Delivery

The code for this activity must be delivered. Send the report through Susy and keep the code in your area until I request it. Submit a report of only 1 page describing the activity performed and the results obtained.