MC723 - Exercise 4

General Information

Deadline: 26/04/2010, by 8:00
Delivery format: one-page report in PDF format
Delivery method: via Susy

Exercise Tips

Preferably, use the versions of SystemC, TLM, and ArchC that are installed in my home directory, so that you do not have to recompile everything. Use the ssh.students.ic.unicamp.br machine if you are not in IC3.
If you are going to recompile, recompile everything. A major source of problems is having some of these packages compiled with a different version of gcc than the others.
This exercise is meant to familiarize you with the tools that will be used in the course assignments. Do not leave any doubts unresolved. The activities are not complicated, but try to understand exactly what is being requested.

Motivation

Transform the simple platform of exercise 3 into something more useful.
Migrate software components to hardware, as is done with various peripherals in the real world (for example, video cards).

Objectives

This exercise will be divided into parts and, at the end of them, you should be able to:

Include a router in the exercise 3 platform
Include an extra peripheral on this platform and exercise its behavior through software
Understand where endianness problems occur and what they are
Improve performance through specialized peripherals

Contextualization

Attention: All the files you need for this activity are available at /home/staff/rodolfo/mc723/download.

In each part, read the entire statement once before carrying out its activities.

Part 1 - Including a router

Based on the exercise 3 platform (part 1), you must include a router between the processor and the memory. The code of this router must be placed in the is folder of the platform. Use one of the programs from exercise 3.

Before the recommendations for your code, here are a few more details about the platform.

All the platform code is separated into components, as you may have noticed. Each component is in its own directory, and the components are grouped into directories according to their classification. This directory organization only makes things easier to locate and understand; it is not required to build the platform (although you should always follow it).

Your first platform had only two modules: a processor and a memory. Now it's time to add a third module, the router, which will act as the system bus. It is called a router because it will not implement the traffic contention handling that a bus does, nor some other interesting features. So you will work with something simplified at this point.

The communication between the processor and the memory uses the TLM standard, which provides the connection between the system components. TLM is point-to-point and requires a port on each side of each communication channel. TLM also includes the concept of master and slave: the master always makes requests and the slave only answers them. In the original platform, the processor is the master and the memory is the slave. The big problem with this initial configuration is that including a new peripheral is difficult, because the processor would have to be modified to communicate with two different devices. The solution is to place a component in the middle of the path whose only role is routing: this is the router that you will build.

From the point of view of TLM, connections are always made between ports: the processor has a Master port and the memory has a Slave port. In the main program (main.cpp of the platform), you can see the connection between the processor and the memory in the line:

mips1_proc1.DM_port (mem.target_export);

This line connects the processor's DM_port port to the memory's target_export port. A little earlier in the same code, you can see the declarations of the processor, mips1_proc1, and of the memory, mem. Your task is to create a new component, called router, that is connected to both mips1_proc1 and mem. To do this, it must be a slave in its connection with the processor and a master in its connection with the memory. For now, all requests from the processor must be forwarded to the memory.

The simplest way to implement this router is to start from the memory code, remove the part related to data storage, and add the Master connection, for which you can follow the example in the processor code (look for the mips1 class declaration). Most of your work will be in and around the implementation of the transport method.
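The forwarding idea can be sketched without SystemC at all. The example below is only an illustration in plain C++: the `Request`, `Response`, and `TransportIf` types and the `forwarded()` counter are hypothetical stand-ins, not the real TLM or ArchC classes, and the real router must use the port types found in the memory and processor sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the TLM request/response packets.
struct Request  { bool is_write; uint32_t addr; uint32_t data; };
struct Response { bool ok; uint32_t data; };

// Slave-side interface: whoever implements it answers transport() calls.
struct TransportIf {
    virtual Response transport(const Request& req) = 0;
    virtual ~TransportIf() = default;
};

// Simplified memory: 32-bit word storage behind transport().
class Memory : public TransportIf {
public:
    explicit Memory(std::size_t bytes) : words_(bytes / 4, 0) {
        assert(bytes % 4 == 0);  // whole 32-bit words only
    }
    Response transport(const Request& req) override {
        const std::size_t index = req.addr / 4;
        if (index >= words_.size()) return {false, 0};
        if (req.is_write) { words_[index] = req.data; return {true, 0}; }
        return {true, words_[index]};
    }
private:
    std::vector<uint32_t> words_;
};

// Router: a slave to the processor (it implements TransportIf) and a
// master to the memory (it holds a reference it calls transport() on).
class Router : public TransportIf {
public:
    explicit Router(TransportIf& memory) : memory_(memory) {}
    Response transport(const Request& req) override {
        ++forwarded_;                    // illustration only: count requests
        return memory_.transport(req);   // forward everything unchanged
    }
    unsigned forwarded() const { return forwarded_; }
private:
    TransportIf& memory_;
    unsigned forwarded_ = 0;
};
```

In this sketch the router keeps no storage of its own; it only receives a request on its slave side and repeats it on its master side, which is the structure described above.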

It is expected, in this activity, that you locate and assemble the correct code for your router. All source code examples are already given and you have recently used them (exercise 3).

Measure the slowdown factor and put it in your report. Compare it with exercise 3. Since you added another component in the path, you should see this number increase. Also note the number of instructions per second that the simulator can execute.

Part 2 - Inclusion of a basic peripheral

An overview of the platform so far: 3 components (processor, router, and memory) that run on your computer (x86), and a program that was compiled for MIPS and runs (simulated) within the platform.

The next step is to create a new peripheral to include on the platform. This peripheral will have code very similar to the memory, except for the storage part, and will be used heavily in your assignments. The specification is quite simple: it stores a single 32-bit value (initially zero); every read returns the stored value and then sets the value to 1; every write stores the requested value. This behavior simulates a load-and-increment instruction, whose main purpose is concurrency control. Consider the example below, and suppose two processors are running exactly the same code at almost the same time:

volatile int *lock = (volatile int *) ENDERECO_LOCK;

// Wait for the value to be 0
while (*lock);
// Perform something in the critical region
...
// Release the critical region
*lock = 0;

One of the processors will reach the while first and read the value 0, passing directly into the critical region. At this point, it can execute whatever code it wants without the other processor getting in the way. In the meantime, the other processor will keep executing the while (remember that the first read of the device returns 0 and changes the value to 1, so every later read returns 1). When the first processor finishes the critical region, it releases the lock, allowing the second processor to enter the critical region. This works for as many processors as needed. You can even define an AcquireLock macro with the while and a ReleaseLock macro with the last line of the example.

To include the peripheral, start from the memory code, implement the new peripheral, add a new master port to the router, and change the router code to send reads and writes at the Lock address to the Lock peripheral and all others to memory. Which address should the Lock use? Looking at the source code of the platform and of the MIPS model, note that they use 5 MB of memory, so you can use the next word as the base address.

Write a program that reads the device several times and prints the values read. In a future activity you will have a multicore platform and will be able to test code like the example above. Measure the slowdown factor and the number of instructions per second again.
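The read and write rules of the peripheral can be modeled in a few lines. This is a sketch of the behavior only, in plain C++ with no TLM ports; the class name `LockPeripheral` and the helpers `try_acquire`, `release`, and `two_processor_example` are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model: one 32-bit value, initially zero.
class LockPeripheral {
public:
    // Every read returns the stored value and then sets it to 1.
    uint32_t read() {
        const uint32_t previous = value_;
        value_ = 1;
        return previous;
    }
    // Every write stores the requested value.
    void write(uint32_t value) { value_ = value; }
private:
    uint32_t value_ = 0;
};

// One iteration of the `while (*lock);` loop: true means the caller got in.
inline bool try_acquire(LockPeripheral& lock) { return lock.read() == 0; }

// Equivalent of `*lock = 0;`.
inline void release(LockPeripheral& lock) { lock.write(0); }

// Two "processors" competing, interleaved by hand.
inline void two_processor_example() {
    LockPeripheral lock;
    assert(try_acquire(lock));   // processor A reads 0 and enters
    assert(!try_acquire(lock));  // processor B reads 1 and keeps spinning
    release(lock);               // A leaves the critical region
    assert(try_acquire(lock));   // B now reads 0 and enters
}
```

Because the read and the update happen inside a single access to the device, two processors can never both observe 0 without a release in between.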

Part 3 - Redefining the base address of the peripheral

The simplest way to assign an address to a peripheral is through the router, which checks the value of the address field and forwards the request to the right peripheral. The big problem here is that each peripheral can have multiple addresses allocated to it; consider the memory, which has 5M addresses. Whenever a peripheral has more than one address, its internal addresses are always considered to start at zero. Thus, in the previous part, the 5M address must be treated as the base address of the Lock and sent to it as 0. Implement this on your platform.

Working with different base addresses and keeping internal addresses always starting at zero allows you to move a peripheral from one address to another without having to worry about anything else, as everything will continue to work.
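One possible shape for this decoding is a small table of base addresses and sizes that the router consults on every request. The sketch below is an assumption about how such a table could look, not the required implementation: `Region`, `Route`, and `AddressMap` are hypothetical names, and the 5 MB memory size is taken to mean 5 * 1024 * 1024 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry per peripheral: where it starts and how many bytes it covers.
struct Region { std::string name; uint32_t base; uint32_t size; };

// Result of decoding: which peripheral, and the address as that peripheral sees it.
struct Route { std::string name; uint32_t offset; };

class AddressMap {
public:
    void add(const std::string& name, uint32_t base, uint32_t size) {
        assert(size > 0);
        regions_.push_back(Region{name, base, size});
    }

    // Finds the region containing addr. On success, fills `out` with the
    // target name and the address relative to that target's base.
    bool resolve(uint32_t addr, Route& out) const {
        for (const Region& region : regions_) {
            if (addr >= region.base && addr - region.base < region.size) {
                out = Route{region.name, addr - region.base};
                return true;
            }
        }
        return false;  // unmapped address
    }
private:
    std::vector<Region> regions_;
};
```

With the memory registered at base 0 with size 0x500000 and the Lock at base 0x500000 with size 4, address 0x500000 resolves to the Lock with offset 0. Moving the Lock to another base only changes its table entry; the peripheral itself still sees offset 0.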

Measure the slowdown factor and the number of instructions per second again.

Part 4 - Inclusion of a more elaborate peripheral

Now it's time to use a peripheral to speed up the platform's performance. First, we define a program to serve as the metric. Implement a program that performs matrix multiplication. To make it easier, use square matrices generated with the following property: a[i, j] = i + j.

The first version of your program should work with integers; choose a matrix size so that the code runs for about 5 s. Now change your matrices from integer to double. How long does your program take to run? The time difference arises because the simulator uses a floating-point emulation library rather than a native implementation. Therefore, you must now create a peripheral that multiplies two numbers of type double. Here is what you need:
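A minimal sketch of such a benchmark kernel follows. It is written as a template so the same code can be instantiated with int and with double; the names `Matrix`, `make_matrix`, and `multiply` are hypothetical, and the matrix size that gives the desired run time still has to be found by experiment on the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
using Matrix = std::vector<std::vector<T>>;

// Builds an n x n matrix with a[i][j] = i + j.
template <typename T>
Matrix<T> make_matrix(std::size_t n) {
    Matrix<T> a(n, std::vector<T>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            a[i][j] = static_cast<T>(i + j);
    return a;
}

// Classic triple-loop product of two square matrices of the same size.
template <typename T>
Matrix<T> multiply(const Matrix<T>& a, const Matrix<T>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    Matrix<T> c(n, std::vector<T>(n, T(0)));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}
```

For n = 2 the generated matrix is [[0, 1], [1, 2]] and its square is [[1, 2], [2, 5]], which makes a convenient sanity check before timing larger sizes.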

Define the input and output parameters. A multiplication needs 2 inputs and 1 output
Define an address for each of these parameters. How big is a double?
Define a communication protocol with this device. My suggestion is to use the simplest of all: send the first number, then the second, then read the answer. The execution of the peripheral functionality can be done at the time of requesting the answer.
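The three steps above could be laid out as follows. This is a sketch under stated assumptions, not the required design: the register offsets (`kOperandA`, `kOperandB`, `kResult`), the choice to put the low 32-bit word at the lower offset, and the decision to compute when the first result word is read are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "a double spans two 32-bit words");

// Hypothetical register offsets inside the peripheral (internal addresses).
constexpr uint32_t kOperandA = 0x00;  // 2 words
constexpr uint32_t kOperandB = 0x08;  // 2 words
constexpr uint32_t kResult   = 0x10;  // 2 words
constexpr uint32_t kEnd      = 0x18;

// Assumed layout for this sketch: low word first, high word second.
inline double join_double(uint32_t low, uint32_t high) {
    const uint64_t bits = (static_cast<uint64_t>(high) << 32) | low;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

inline void split_double(double value, uint32_t& low, uint32_t& high) {
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    low = static_cast<uint32_t>(bits);
    high = static_cast<uint32_t>(bits >> 32);
}

class DoubleMultiplier {
public:
    // Operands arrive one 32-bit word at a time.
    void write(uint32_t offset, uint32_t word) {
        assert(offset % 4 == 0 && offset < kResult);
        regs_[offset / 4] = word;
    }
    // The product is computed when the first result word is requested.
    uint32_t read(uint32_t offset) {
        assert(offset % 4 == 0 && offset < kEnd);
        if (offset == kResult) {
            const double product = join_double(regs_[0], regs_[1]) *
                                   join_double(regs_[2], regs_[3]);
            split_double(product, regs_[4], regs_[5]);
        }
        return regs_[offset / 4];
    }
private:
    uint32_t regs_[6] = {};
};
```

Since a double is 8 bytes and each access carries 32 bits, every parameter takes two word accesses, so one multiplication costs four writes and two reads in this layout.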

This peripheral will have to deal with the endianness difference between the local architecture (x86) and MIPS. You have not had to worry about this until now because the simulator itself handles endianness for 32-bit accesses, but a double is larger than that. How can you fix the problem?

Measure the time again and see whether your program is faster with the peripheral.

Create another peripheral that adds two double numbers and see how much the performance improves.
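One way to think about the problem: if each 32-bit word already arrives in host byte order, what remains to be decided is which of the two words holds the most significant half of the double. The sketch below makes that order an explicit parameter. It is an illustration that assumes an IEEE 754 64-bit double on the host; `WordOrder`, `words_to_double`, and `word_order_example` are hypothetical names, and which order your platform actually delivers is something you have to verify.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "assumes a 64-bit double on the host");

// Which half of the double sits at the lower address.
enum class WordOrder { HighFirst, LowFirst };

// Rebuilds a double from two 32-bit words given in increasing address order.
inline double words_to_double(uint32_t first, uint32_t second, WordOrder order) {
    const uint64_t high = (order == WordOrder::HighFirst) ? first : second;
    const uint64_t low  = (order == WordOrder::HighFirst) ? second : first;
    const uint64_t bits = (high << 32) | low;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

inline void word_order_example() {
    // 1.0 is 0x3FF0000000000000 in IEEE 754 binary64.
    assert(words_to_double(0x3FF00000u, 0u, WordOrder::HighFirst) == 1.0);
    assert(words_to_double(0u, 0x3FF00000u, WordOrder::LowFirst) == 1.0);
    // Picking the wrong order silently produces a different number.
    assert(words_to_double(0x3FF00000u, 0u, WordOrder::LowFirst) != 1.0);
}
```

A quick test on the platform is to send a known value such as 1.0 from the MIPS program and print, on the host side, the two words the peripheral receives.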

Delivery

The code for this activity will also have to be delivered. Send the report via Susy and keep the code in your area until I request it.
Submit a one-page report describing the activity performed and the results obtained.

Deadline: 26/04/2010 at 8:00
Format: PDF. See the example of names in the delivery list.
Delivery method: via Susy


Exercise 4 - Expansion of the platform, including router and peripherals