Exercise 5 - Including a peripheral on your platform

Motivation

Objectives

Activity

Attention: All the files you need for this activity are available at /home/staff/rodolfo/mc723/download.

In each part, read the entire statement once before carrying out your activities.

Part 1 - Inclusion of a basic peripheral

An overview of the platform so far: it has several components (processor, caches, router and memory), which are code executed natively on your computer (x86), plus a program compiled for MIPS that runs (simulated) within the platform.

The next step is to create a new peripheral and include it on the platform. Its code will be very similar to the memory's, except for the storage part, and it will be used heavily in your work. The specification is very simple: the peripheral stores a single 32-bit value (initially zero); every read returns the stored value and then sets it to 1, and every write stores the requested value. This behavior lets the platform simulate an atomic load-and-set (test-and-set) instruction, whose main purpose is concurrency control. Consider the following example, in which two processors run exactly the same code at almost the same time:

volatile int *lock = (volatile int *) ENDERECO_LOCK;

// Wait for the value to be 0 (each read also sets it to 1)
while (*lock);
// Do something in the critical region
/* ... */
// Release the critical region
*lock = 0;

One of the processors will reach the while first and read the value 0, going straight into the critical region. From this point on, it can execute whatever code it needs without interference from the other processor. Meanwhile, the other processor keeps spinning in the while loop: the first read of the peripheral returned 0 and changed the stored value to 1, so every later read returns 1. When the first processor finishes the critical region, it releases the lock by writing 0, allowing the second processor to enter. This works for as many processors as needed. You can even define an AcquireLock macro containing the while loop and a ReleaseLock macro containing the last line of the example.

To include the peripheral, start from the memory code and implement the new peripheral, add a new master port to the router, and change the router code so that requests to the Lock address go to the Lock peripheral and all other requests go to memory. What address should Lock use? The platform and MIPS source code show that they use 5 MB of memory, so you can use the next word after it as the base address.

Write a program that reads the peripheral several times and prints the values read. In a future activity you will have a multicore platform and will be able to test code like the example above. You will probably notice something strange: did the values change as expected? The problem is that the cache is holding the data from an earlier read, faithfully doing exactly what you expect a cache to do.

Part 2 - Limiting the cache's address range

To solve the problem from the previous part, you must change the code of your cache from the previous exercise so that it only stores values whose addresses fall within a certain range. Following the address choice above, the cache must not store values outside the initial 5 MB of memory.

Restricting the cacheable address range in this way is common practice in processors, so that accesses to peripherals always bypass the cache.

Part 3 - Zeroing the peripheral's base address

The simplest way to map a peripheral to an address is in the router, which must check the address field of each request and forward the request to the corresponding peripheral. The complication is that a peripheral can have several addresses allocated to it; the memory, for example, occupies 5M addresses. Whenever a peripheral has more than one address, its internal addresses are always considered to start at zero. Thus, for the Lock from Part 1, address 5M should be treated as its base address and forwarded to Lock as 0. Implement this on your platform.

Using different base addresses while keeping internal addresses starting at zero lets you move a peripheral to another address by changing only the router; everything else keeps working.

Part 4 - Inclusion of a more elaborate peripheral

Now it is time to use a peripheral to improve the platform's performance. First, define a benchmark program to serve as the metric: implement a program that performs matrix multiplication. To keep it simple, use square matrices generated with the property a[i][j] = i + j.

The first version of your program should use integers; choose a matrix size that makes the code run for about 5 s. Now change the matrices from int to double. How long did the program take this time? The difference arises because floating-point operations on the platform have no native implementation and are handled by a software emulation library instead. For this reason, you must now create a peripheral that multiplies two numbers of type double. Here is what you need to do:

  1. Define the input and output parameters. The multiplication needs 2 inputs and 1 output.
  2. Define an address for each of these parameters. What is the size of a double?
  3. Define a communication protocol with this peripheral. My suggestion is to use the simplest one: send the first number, then the second, then read the result. The peripheral can perform the multiplication at the moment the result is requested.

This peripheral will have to deal with the difference in endianness between the host architecture (x86) and MIPS. You have not had to worry about this so far because the simulator itself handles endianness for 32-bit values, but a double is larger than 32 bits. How can this be fixed?

Measure the execution time again and check whether the version using your peripheral is faster.

Create another peripheral that adds two double numbers and measure how much the performance improves.

Delivery

The code for this activity will have to be delivered: keep it in your area until I request it. Submit, through Susy, a report of at most one page describing the activity performed and the results obtained.