Exercise 5 - Assessing the effect of a cache

General Information

Exercise Tips

Motivation

Objectives

This exercise will be divided into parts and, at the end of them, you should be able to:

Contextualization

Attention: All the files you need for this activity are available at /home/staff/rodolfo/mc723/download.

In each part, read the entire statement once before carrying out your activities.

Part 1 - Preparing for the inclusion of caches

So far, you have not had to change the processor simulator code. Changing it is one of the most important parts of this exercise. Although the change is simple, it must be kept in sync with the rest of your code.

Start with the router platform from the previous exercise, which already has part of the infrastructure you will need. Put this code under version control (git) in a separate branch or repository and continue developing from it.

Changing the processor code: At the beginning of the processor code, there is a declaration of the form ac_tlm_mem DM: 5M;. This line declares an external port named DM and sets the valid address range (for programs) to 5 MB. You have probably already connected the name DM to the DM_port that you used so much in the previous exercise. The suffix _port is added to every port declared this way (whatever its name, outside the processor it will carry the suffix). ArchC treats the last declared port as the place where the program read from disk will be loaded (don't worry about this) and, consequently, as the place from which instructions will be fetched (this part is important). So if you declare two ac_tlm_port, the program will be read from disk and placed on the last declared port (this is not the best way to define a language, but it is how it is implemented today).

Your goal in Part 1 is to add a second port (leaving the processor with two output ports) and keep the simulator running. The two distinct ports will be used to separate accesses to the instruction and data caches. Pay attention to the following details:

Part 2 - Inclusion of caches

Now is the time to design a cache. At a high level, a cache stores copies of data that is guaranteed to also be in memory and that has been used frequently.

Your cache must be configurable. It must be possible to define, at compile time:

The block size, as you may already know, has to be a multiple of the processor's word size. Both the number of ways and the number of lines must be greater than zero.
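One way to enforce these constraints at compile time is sketched below. The names CacheConfig and kWordBytes are hypothetical, the 4-byte word is an assumption about the processor, and "lines" is assumed to mean the number of rows, each holding one block per way. This is only an illustration, not the required design.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 32-bit processor word. Adjust to the simulated architecture.
constexpr std::size_t kWordBytes = 4;

// Hypothetical compile-time cache configuration.
template <std::size_t BlockBytes, std::size_t Ways, std::size_t Lines>
struct CacheConfig {
    static_assert(BlockBytes > 0 && BlockBytes % kWordBytes == 0,
                  "block size must be a non-zero multiple of the word size");
    static_assert(Ways > 0, "number of ways must be greater than zero");
    static_assert(Lines > 0, "number of lines must be greater than zero");

    static constexpr std::size_t block_bytes = BlockBytes;
    static constexpr std::size_t ways = Ways;
    static constexpr std::size_t lines = Lines;
    static constexpr std::size_t total_bytes = BlockBytes * Ways * Lines;

    // Line selected by an address: block number modulo the number of lines.
    static constexpr std::size_t line_of(std::uint32_t addr) {
        return (addr / BlockBytes) % Lines;
    }

    // Tag kept with the block: the block number with the line index removed.
    static constexpr std::uint32_t tag_of(std::uint32_t addr) {
        return static_cast<std::uint32_t>(addr / BlockBytes / Lines);
    }
};
```

With this sketch, CacheConfig<16, 2, 64> describes a 2 KB two-way cache, while CacheConfig<6, 2, 64> or CacheConfig<16, 0, 64> fails to compile.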

Your first cache version must use a write-through policy because it is easier to implement. Also for ease of implementation, the replacement policy can be round robin with a single global counter per cache, used only when there are no invalid blocks left (first look for an invalid block; if none is found, replace the one chosen by round robin).
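The replacement rule described above can be sketched as follows. TagStore and Block are hypothetical names, and only tags and valid bits are modelled. Because the cache is write-through, every write is also forwarded to memory, so the sketch assumes no dirty bit is needed and an evicted block is simply overwritten.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical block metadata; write-through means no dirty bit is kept.
struct Block {
    bool valid = false;
    std::uint32_t tag = 0;
};

template <std::size_t Ways, std::size_t Lines>
class TagStore {
public:
    // Returns the way holding `tag` in `line`, or -1 on a miss.
    int find(std::size_t line, std::uint32_t tag) const {
        for (std::size_t w = 0; w < Ways; ++w)
            if (blocks_[line][w].valid && blocks_[line][w].tag == tag)
                return static_cast<int>(w);
        return -1;
    }

    // Picks the way to fill: the first invalid block if there is one,
    // otherwise the way indicated by the single round-robin counter.
    std::size_t choose_victim(std::size_t line) {
        for (std::size_t w = 0; w < Ways; ++w)
            if (!blocks_[line][w].valid) return w;
        std::size_t victim = next_victim_;
        next_victim_ = (next_victim_ + 1) % Ways;  // one counter for the whole cache
        return victim;
    }

    // Installs `tag` in `line` and returns the way that was used.
    std::size_t fill(std::size_t line, std::uint32_t tag) {
        std::size_t w = choose_victim(line);
        blocks_[line][w].valid = true;
        blocks_[line][w].tag = tag;
        return w;
    }

private:
    std::array<std::array<Block, Ways>, Lines> blocks_{};
    std::size_t next_victim_ = 0;
};
```

Note that the counter advances only when a valid block is actually evicted, so filling invalid blocks does not disturb the round-robin order.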

A good starting point for your cache is the router from exercise 4, in a version with only one input and one output. Include a data cache and an instruction cache on your platform, and test with the hello world program using several different parameters to make sure everything works. Your cache must be in the ARP IP directory.

Program your cache to collect hit and miss statistics. Print these statistics before the program exits. Use the cache class destructor to do the printing.

How did the simulator's performance change? Does that make sense?

Part 3 - Evaluating the caches

How do you choose the best cache? For L1 caches, miss rates on the order of 2% to 5% are expected.

You must find the best cache configuration for the program you chose in exercise 3. To do this, run the program several times with different configurations of the instruction and data caches (they can be different from each other!) and choose the best one. List every configuration you tried in your report, along with its miss rate, indicating the direction of the search. Attention: At the beginning of the execution, the simulator reads the program from disk and writes it to memory; you must not count these first accesses.
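One possible way to leave the initial load out of the statistics is sketched below. It rests on an assumption that is not stated in the exercise: that the loader only issues writes, so the first read seen by the cache marks the start of real execution. GatedMissCounter is a hypothetical name; check the assumption against your own platform before relying on it.

```cpp
#include <cstdint>

// Hypothetical counter that ignores accesses until execution starts.
class GatedMissCounter {
public:
    // Call once per access. Writes seen before the first read are treated
    // as part of the program load (assumption) and are not counted.
    void observe(bool is_write, bool hit) {
        if (!counting_) {
            if (is_write) return;  // still in the assumed load phase
            counting_ = true;      // first read: assume execution has begun
        }
        if (hit) ++hits_; else ++misses_;
    }

    std::uint64_t accesses() const { return hits_ + misses_; }

    // Miss rate as a fraction in [0, 1]; 0 when nothing has been counted.
    double miss_rate() const {
        return accesses() ? static_cast<double>(misses_) /
                                static_cast<double>(accesses())
                          : 0.0;
    }

private:
    bool counting_ = false;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

An alternative under the same caveat is to reset the counters explicitly from the platform code once loading has finished.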

Tip: Use the concept of the 3Cs of the caches to assist you in the search for the best configuration.

Part 4 - Evaluating an L2 cache (optional for bonuses)

With the best configuration you found for the L1 caches, now include an L2 cache and find the best configuration for it. At this point, your miss rate should be between 1% and 2%. The L2 cache is unified: it makes no distinction between instructions and data. Use the same program as in Part 3.

Delivery

The code for this activity will have to be handed in. Send the report via Susy and keep the code in your area until I request it. Submit a report of only 1 page describing the activity performed and the results obtained.