Discrepancy in execution times in parallel programming

17

I wrote a parallel program in C to measure its execution time. It involves:

  • Threads;
  • Mutex;
  • False Sharing;
  • Synchronization.

When I ran the program under the Linux time command, I usually got the following result:

Resultado da soma no modo concorrente = 36028797153181696  

real    0m0.340s  
user    0m1.300s  
sys     0m0.000s  

However, roughly one in every 5 runs took a dramatically different time:

Resultado da soma no modo concorrente = 36028797153181696  

real    0m1.863s
user    0m5.584s
sys     0m0.000s  

What am I missing? What do the real, user, and sys times mean?

The code, which computes the sum of the integers from 1 to N:

#include <stdio.h>
#include <pthread.h>
#define QTD 268435456 //1024//16384 //8192
#define QTD_N 4

unsigned long long int soma = 0;
unsigned long int porcao = QTD/QTD_N;

struct padding{
    unsigned long long int s; // partial sum
    unsigned int i,start,end;
    unsigned int m; // chunk identifier
    char p[40];
};

pthread_mutex_t mutex_lock;


void *paralelo(void *region_ptr){
    struct padding *soma_t;
    soma_t = region_ptr;
    soma_t->s = 0;    
    soma_t->start = soma_t->m * porcao + 1;
    soma_t->end = (soma_t->m + 1) * porcao; 
    for(soma_t->i = soma_t->start; soma_t->i <= soma_t->end ; soma_t->i++){
        soma_t->s += soma_t->i;
    }

    pthread_mutex_lock(&mutex_lock);
    soma += soma_t->s;
    pthread_mutex_unlock(&mutex_lock);
    pthread_exit(NULL);
}

int main(void){ 
    pthread_t thread[QTD_N];
    struct padding soma_t[QTD_N];
    int i;
    void *status;
    pthread_mutex_init(&mutex_lock,NULL);
    for(i = 0 ; i < QTD_N ; i++){
        soma_t[i].m = i;
        pthread_create(&thread[i], NULL, paralelo, &soma_t[i]);
    }

    for(i = 0 ; i < QTD_N ; i++){
        pthread_join(thread[i],&status);
    }
    pthread_mutex_destroy(&mutex_lock);

    printf("Resultado da soma no modo concorrente = %llu\n",soma);
    return 0;
}
    
asked by anonymous 22.02.2014 / 15:16

1 answer

14

Regarding time, the result is reported as three values:

  • real: This is simply the time you would measure on a clock (often called wall-clock time). It is the difference between the start time and the end time.
  • user: This is the total amount of time your process spent running on some core. Note that if it runs on more than one core at the same time, that time is counted once per core, so this value can be greater than real.
  • sys: This is the time the system spent doing work requested by your process, such as printing to the screen or reading files. It is indirect processing done on the process's behalf.
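
The three values above can also be observed from inside a program. The following is a minimal sketch, assuming a Linux/POSIX system: it reads wall-clock time with clock_gettime and the process's user and system CPU time with getrusage. The names struct timing, measure, and sample_work are hypothetical helpers introduced only for this illustration; they are not part of the question's code.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper type: the three values that time reports, in seconds. */
struct timing {
    double real; /* wall-clock time */
    double user; /* CPU time spent in user mode, summed over all threads */
    double sys;  /* CPU time spent in the kernel on behalf of the process */
};

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv) {
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Run work() once and return the real, user and sys time it consumed. */
struct timing measure(void (*work)(void)) {
    struct timespec t0, t1;
    struct rusage r0, r1;
    struct timing t;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    getrusage(RUSAGE_SELF, &r0);
    work();
    getrusage(RUSAGE_SELF, &r1);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    t.real = (double)(t1.tv_sec - t0.tv_sec)
           + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    t.user = tv_seconds(r1.ru_utime) - tv_seconds(r0.ru_utime);
    t.sys  = tv_seconds(r1.ru_stime) - tv_seconds(r0.ru_stime);
    return t;
}

/* Hypothetical single-threaded workload, used only to exercise measure(). */
void sample_work(void) {
    volatile unsigned long long s = 0;
    unsigned int i;
    for (i = 1; i <= 1000000u; i++) {
        s += i;
    }
}
```

Calling measure(sample_work) from main and printing the three fields gives numbers comparable to what time prints. With a single-threaded workload like this one, user stays at or below real; if work() created and joined four busy threads, as the question's program does, user could approach four times real.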

As for the difference you measured between runs: one thread finishes its processing quickly while the other three take longer, for whatever reason. This is clear here:

This phenomenon seems to disappear when the compiler optimizes the code so that soma_t->s and soma_t->i are kept in registers. The loop then no longer reads and writes memory (i.e., the problem may have something to do with the processor cache).
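
The same idea can be written by hand in the source instead of being left to the optimizer. This is only a sketch under assumptions, not a confirmed fix: sum_range is a hypothetical replacement for the loop inside paralelo, and struct padding is copied from the question. The loop works on local variables and writes the shared structure once at the end. With optimization enabled the compiler can keep the locals in registers; without optimization they still live in memory, but on the thread's own stack rather than in the array shared between threads.

```c
/* Same layout as the question's struct padding. */
struct padding {
    unsigned long long int s; /* partial sum */
    unsigned int i, start, end;
    unsigned int m;           /* chunk identifier */
    char p[40];
};

/* Hypothetical helper: sum the integers start..end (inclusive) of one chunk.
 * The loop touches only local variables; the structure is read before the
 * loop and written exactly once after it.
 * Assumes end < UINT_MAX, which holds for the sizes used in the question. */
void sum_range(struct padding *soma_t) {
    unsigned long long int s = 0;
    const unsigned int end = soma_t->end;
    unsigned int i;

    for (i = soma_t->start; i <= end; i++) {
        s += i;
    }
    soma_t->s = s; /* single store to the per-thread structure */
}
```

In the question's program, paralelo would fill in start and end as before, call sum_range(soma_t), and then add soma_t->s to soma under the mutex. Whether this removes the occasional slow run on a given machine is something to measure rather than assume.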

Note that it is the operating system that decides when and for how long each thread runs. Why this happened in your specific case escapes me, but it is common. The behavior of this kind of code is not deterministic; do not expect it to be.
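
Since a single run is not representative, one practical consequence is to time the workload several times and look at the spread. A minimal sketch, assuming a POSIX system with clock_gettime; struct run_summary, summarize_runs, and sample_load are hypothetical names introduced for this illustration only.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <time.h>

/* Hypothetical summary of several timed runs, in seconds of wall-clock time. */
struct run_summary {
    double min, median, max;
};

/* qsort comparator for doubles, ascending. */
static int compare_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Wall-clock duration of one call to work(). */
static double time_once(void (*work)(void)) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Time n runs of work() and store min, median and max in *out.
 * Returns 0 on success, -1 on invalid arguments or allocation failure. */
int summarize_runs(void (*work)(void), size_t n, struct run_summary *out) {
    double *t;
    size_t k;

    if (n == 0 || out == NULL) {
        return -1;
    }
    t = malloc(n * sizeof *t);
    if (t == NULL) {
        return -1;
    }
    for (k = 0; k < n; k++) {
        t[k] = time_once(work);
    }
    qsort(t, n, sizeof *t, compare_double);
    out->min = t[0];
    out->median = (n % 2) ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2.0;
    out->max = t[n - 1];
    free(t);
    return 0;
}

/* Hypothetical workload, used only to exercise summarize_runs(). */
void sample_load(void) {
    volatile unsigned long long s = 0;
    unsigned int i;
    for (i = 1; i <= 1000000u; i++) {
        s += i;
    }
}
```

If the threaded sum from the question were wrapped in a function and passed as work, a max far above the median would correspond to the occasional slow run described in the question, while the median would reflect the typical case.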

    
22.02.2014 / 16:48