Exercise 4 - Using parallelism to increase performance

General Information

Motivation

Objectives

Contextualization

To parallelize this activity, we will use the OpenMP library, which relies on the #pragma directives of C compilers to generate parallel code using threads. Three pragma options will be used. The first is #pragma omp parallel, which parallelizes a code block. See the example below:

#include <stdio.h>

int main()
{
    #pragma omp parallel
    {
        puts("Hello World");
    }
    return 0;
}

This code compiles correctly on any C compiler but, when compiled with a compiler that supports OpenMP, the generated code will use threads to run in parallel. Since gcc does not support OpenMP, the Intel C compiler (icc) will have to be used in this exercise. To use icc, add the directory /home/staff/rodolfo/intel/cc/9.0/bin to your PATH and set the environment variable LD_LIBRARY_PATH to /home/staff/rodolfo/intel/cc/9.0/lib. Save the code above to the file hello.c and compile it twice, without and with OpenMP:

icc hello.c -o hello
icc hello.c -o hello_openmp -openmp
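The PATH and LD_LIBRARY_PATH settings described above can be made in the shell before invoking icc; a sketch, assuming a Bourne-style shell such as bash, with the paths exactly as given in the exercise:

```shell
# Add the Intel compiler to the command search path (path from the exercise statement)
export PATH=/home/staff/rodolfo/intel/cc/9.0/bin:$PATH

# Tell the dynamic linker where to find the Intel runtime libraries
export LD_LIBRARY_PATH=/home/staff/rodolfo/intel/cc/9.0/lib
```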

When running both programs, you will see that the OpenMP version prints one message per processor (real or virtual) on the computer. The number of threads generated depends on the number of processors available, but it can also be set with the environment variable OMP_NUM_THREADS. Test a few values for this variable and observe the different outputs. In general, however, one does not want to merely replicate code; the goal is to divide the workload among the generated threads. The simplest way to do this is to parallelize a for loop with #pragma omp for, placed immediately before the for (there is an example further below in this activity). The last pragma is #pragma omp critical (name), which marks the statement that follows it as a named critical region, so that only one thread executes it at a time. There is also an example of this construct in this activity.

Activity

The activity must be carried out individually in its entirety. Compile the program below and run it on a computer in room 302 or 303, using the Intel compiler.

#include <stdio.h>

#define MAXIMUM 500
#define REPEATS 10000

float A[MAXIMUM][MAXIMUM], B[MAXIMUM][MAXIMUM], C[MAXIMUM][MAXIMUM];

int main()
{
    int i, j, k;

    for (i = 0; i < MAXIMUM; i++)
        for (j = 0; j < MAXIMUM; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
            C[i][j] = 0;
        }

    #pragma omp parallel
    {
        #pragma omp for
        for (k = 0; k < REPEATS; k++)
            for (i = 0; i < MAXIMUM; i++)
                for (j = 0; j < MAXIMUM; j++) {
                    A[i][j] += 1;
                    C[i][j] += A[i][j] + B[i][j];
                }
    }
    return 0;
}

Compile once with each of the compilation option sets:

  1. -O0
  2. -O1
  3. -O2
  4. -O3
  5. -O3 -xN
  6. -O3 -openmp
  7. -O3 -parallel
  8. -O3 -xN -openmp
  9. -O3 -xN -parallel

The option -xN generates code exclusively for the Pentium 4, the option -openmp uses the OpenMP pragmas to generate parallel code, and the option -parallel tries to parallelize the code automatically (without using the OpenMP pragmas). Run the generated programs, record the execution times, and comment on them in the report, indicating what influenced the execution in each case.

#include <stdio.h>
#include <math.h>

#define LIMIT 5000000

int prime(int number)
{
    int root, factor;

    root = (int) sqrt((double) number);

    for (factor = 2; factor <= root; factor++)
        if (number % factor == 0)
            return 0;

    return 1;
}


int main()
{
    int quantity = 0, number;

    #pragma omp parallel for schedule(static, 8)
    for (number = 2; number < LIMIT; number++) {
        int p = prime(number);
        #pragma omp critical (quantity)
        quantity += p;
    }

    printf("Total prime numbers up to %d: %d\n", LIMIT, quantity);
    return 0;
}

Measure the performance when compiling with and without the option -openmp. Also create a variant of the code by removing the schedule(static, 8) clause, and compare the performance of the three runs. What does schedule(static, 8) do? Refer to the OpenMP manual.

General tip: If the run times come out too close to distinguish, vary the configuration constants (MAXIMUM, REPEATS, LIMIT) to change the run times. Indicate this in your report and justify your choices.

Delivery

Submit a report of at most 2 pages, describing the activity performed and the results obtained. Analyze and comment on the result.