Parallelizing n queens problem with OpenMP

1

I'm having trouble paralleling my C code with OpenMP and getting good performance on the n queens problem.

The results are always considerably slower with parallelization enabled and I do not know why.

The code is this below:

    #include <stdio.h>
#include <stdlib.h>
#include "omp.h"

int checkQueen(int **queens, int linha, int col, int n);
void printTabuleiro(int **queens, int n);
void play(int **queens, int col, int n, int *sol, int *maxQueens, int *count);


//MAIN
int main(){
    int n = 10, sol = 0, maxQueens = 0, count = 0;

    //alocando rainhas
    int **queens = (int**) malloc(n  * sizeof(int *));
    for(int i = 0; i < n; ++i)
        queens[i] = (int*) malloc(n  * sizeof(int));

    #pragma omp parallel for
    for(int i = 0; i < n; ++i){
        for(int j = 0; j < n; ++j){
            queens[i][j] = 0;
        }
    }

    play(queens, 0, n, &sol, &maxQueens, &count);

    //imprime resultados
        printf("Maximo de rainhas: %d \n", maxQueens);
        printf("Numero de solucoes: %d \n", sol);


    return 0;
}

//testando se é possivel colocar rainha em determinada posição

int checkQueen(int **queens, int linha, int col, int n){

    int a = 0;

    if(queens[linha][col] == 1)
        return 1;
    #pragma omp parallel num_threads(n)
    {
        for(int i = linha, j = col; j >= 0 && i >= 0; --i, --j){
            if(queens[i][j] == 1){
                #pragma atomic
                a = 1;
            }
        }


        if(!a){
            for(int i = linha, j = col; j >= 0 && i < n ; ++i, --j){
                if(queens[i][j] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha, j = col; j < n && i >= 0 ; --i, ++j){
                if(queens[i][j] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha, j = col; j < n && i < n ; ++i, ++j){
                if(queens[i][j] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha; i < n ; ++i){
                if(queens[linha][i] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha; i >= 0 ; --i){
                if(queens[linha][i] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = col; i < n ; ++i){
                if(queens[i][col] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = col; i >= 0 ; --i){
                if(queens[i][col] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }

    }

    return a;
}

//Imprime tabuleiro
void printTabuleiro(int **queens, int n){
    printf("//////////////////////////////////////////////");
    printf("\n//// 1 - rainhas / 0 - posicoes vazias ////\n");
    printf("//////////////////////////////////////////////");
    for(int i = 0; i < n; ++i){
        printf("\n");
        for(int j = 0; j < n; ++j){
            printf("%d - ", queens[i][j]);
        }
    }
    printf("\n");
}

//executa testes
void play(int **queens, int col, int n, int *sol, int *maxQueens, int *count){
    if (col == n){
        printf("\nSolucao #%d\n", ++*sol);
        printTabuleiro(queens, n);
        return;
    }

    for(int i = 0; i < n; ++i){
        if(!checkQueen(queens, i, col, n)){
            queens[i][col] = 1;
            ++*count;
            play(queens, col+1, n, sol, maxQueens, count);

            if(*count > *maxQueens){
                *maxQueens = *count;
            }

            --*count;
            queens[i][col] = 0;
        }
    }


}
    
asked by anonymous 02.03.2018 / 20:01

1 answer

4

Your parallelization strategy is wrong. The problem is the block delimited with this within the function checkQueen

    #pragma omp parallel num_threads(n)

The content code for this block will check the entire board. Since you are casting n threads there, each of these n threads will check the entire board, which means a lot of redundant work. This block will only end when the last of these threads ends. Add that with the complexity and performance impact to maintain the atomic assignments and with that the result will be much slower than a version with a single thread.

In other words, the problem is that instead of you dividing the work into n threads, you multiplied with%

Worse, the n function is called multiple times within the checkQueen recursive function. This means that this overhead of creating and synchronizing threads will be multiplied by the number of times play is called, making everything even slower.

I strongly suggest you to parallelize the checkQueen function, not the play function. To benefit from parallelism, it is important that you depend on as few sequential parts as possible < in> single-thread , and that's exactly how the checkQueen function is.

In addition, to have a good performance gain when paralleling, it's important to avoid threads having to interact with each other. It is perfectly possible to let them interact and is often unavoidable, but in doing so, you will probably have some performance cost, and so this is something to be avoided / minimized. The way you have implemented play has a lot of thread interactions through the checkQueen variable.

The a within for to initialize everything with zero also does not have an appreciative gain of performance when being parallelized. On the contrary, the overhead that thread management should impose will probably be greater than the gain. Incidentally, for performance to gain, the amount of workload that each thread has to earn has to be large enough for it to bypass that overhead, and none of the parts you have paralleled have that feature.

I also have two observations:

  • Make main return 1 when it is not possible to put a queen and 0 when it sounds a bit strange, since it implies that 1 = no and 0 = yes, the opposite of the convention. By moving this, you can remove the checkQueen operator that is in the ! of the if function. Better yet, you can rename this function to play to be clearer.

  • You can allocate a single memory area with queen_ok cells and use n * n to access the elements. That way, you only have to allocate memory once to mount the board and also ensures that it is contiguous, which offers better performance.

  • To prevent one thread from interfering with another while changing the board, create one board per thread and use a function to copy the boards.

  • To avoid having to work with queen[linha * n + coluna] and int *queen in all places, I suggest you create a int n for this.

  • To prevent threads from competing with struct when displaying solutions, use locking. The solution below uses the concept of SOen .

  • Perhaps when using printf , you will end up having a good performance gain considering that not all partially filled trays have the same weight to be analyzed.

  • These variables schedule(dynamic,1) , sol and count will get in the way of parallelization. But it is not very difficult to eliminate them. The maxQueens can be placed in the count I mentioned. struct is only used in a context where you have lock due to the need to use sol . printf can be updated lazily at this point only.

  • Do not forget to give maxQueens to everything you give free .

  • In C, most functions use the notation malloc , not the separada_por_underscores notation.

I suggest that your code looks like this, paralleling camelCase :

#include <stdio.h>
#include <stdlib.h>
#include "omp.h"

typedef struct Tabuleiro {
    int n;
    int *casas;
    int queens_count;
} Tabuleiro;

// Funções básicas sobre tabuleiro.
Tabuleiro *criar_tabuleiro(int n);
Tabuleiro *copiar_tabuleiro(Tabuleiro *original);
void destruir_tabuleiro(Tabuleiro *tabuleiro);
void print_tabuleiro(Tabuleiro *tabuleiro);

// Implementação do n queens.
int queen_ok(Tabuleiro *tabuleiro, int linha, int coluna);
void play(Tabuleiro *tabuleiro, int *solucoes, int *max_queens, omp_lock_t *lock);
void play_in(Tabuleiro *tabuleiro, int coluna, int *solucoes, int *max_queens, omp_lock_t *lock);

// Função main.
int main() {
    int n = 10, solucoes = 0, max_queens = 0;

    omp_lock_t lock;
    omp_init_lock(&lock);

    Tabuleiro *tabuleiro = criar_tabuleiro(n);

    play(tabuleiro, &solucoes, &max_queens, lock);

    printf("Maximo de rainhas: %d \n", max_queens);
    printf("Numero de solucoes: %d \n", solucoes);

    omp_destroy_lock(&lock);
    destruir_tabuleiro(tabuleiro);
    return 0;
}

Tabuleiro *criar_tabuleiro(int n) {
    Tabuleiro *tabuleiro = malloc(sizeof(Tabuleiro));
    tabuleiro->n = n;
    tabuleiro->queens_count = 0;
    tabuleiro->casas = malloc(n * n * sizeof(int));

    for (int i = 0; i < n * n; n++) {
        tabuleiro->casas[i] = 0;
    }
    return tabuleiro;
}

Tabuleiro *copiar_tabuleiro(Tabuleiro *original) {
    int n = original->n;
    Tabuleiro *tabuleiro = malloc(sizeof(Tabuleiro));
    tabuleiro->n = n;
    tabuleiro->queens_count = original->queens_count;
    tabuleiro->casas = malloc(n * n * sizeof(int));

    for (int i = 0; i < n * n; n++) {
        tabuleiro->casas[i] = original->casas[i];
    }
    return tabuleiro;
}

void destruir_tabuleiro(Tabuleiro *tabuleiro) {
    free(tabuleiro->casas);
    free(tabuleiro);
}

void print_tabuleiro(Tabuleiro *tabuleiro) {
    int n = tabuleiro->n;
    for (int i = 0; i < n; ++i) {
        printf("\n");
        for (int j = 0; j < n; ++j) {
            printf("%c", tabuleiro->casas[i * n + j] ? 'X' : '.');
        }
    }
    printf("\n\n");
}

int queen_ok(Tabuleiro *tabuleiro, int linha, int coluna) {
    int n = tabuleiro->n;
    int *queens = tabuleiro->casas;

    if (queens[linha * n + coluna] == 1) return 0;

    // Diagonal para cima e para a esquerda.
    for (int i = linha, j = coluna; j >= 0 && i >= 0; --i, --j) {
        if (queens[i * n + j]) return 0;
    }

    // Diagonal para baixo e para a esquerda.
    for (int i = linha, j = coluna; j >= 0 && i < n; ++i, --j) {
        if (queens[i * n + j]) return 0;
    }

    // Diagonal para cima e para a direita.
    for (int i = linha, j = coluna; j < n && i >= 0; --i, ++j) {
        if (queens[i * n + j]) return 0;
    }

    // Diagonal para baixo e para a direita.
    for (int i = linha, j = coluna; j < n && i < n; ++i, ++j) {
        if (queens[i * n + j]) return 0;
    }

    // Para baixo.
    for (int i = linha; i < n ; ++i) {
        if (queens[i * n + coluna]) return 0;
    }

    // Para cima.
    for (int i = linha; i >= 0; --i) {
        if (queens[i * n + coluna]) return 0;
    }

    // Para a direita.
    for (int j = coluna; j < n; ++j) {
        if (queens[linha * n + j]) return 0;
    }

    // Para a esquerda.
    for (int j = coluna; j >= 0; --j) {
        if (queens[linha * n + j]) return 0;
    }

    return 1;
}

void play(Tabuleiro *tabuleiro, int *solucoes, int *max_queens, omp_lock_t *lock) {
    int n = tabuleiro->n;

    #pragma omp parallel num_threads(n) schedule(dynamic,1)
    for (int i = 0; i < n; ++i) {
        Tabuleiro *copia = copiar_tabuleiro(tabuleiro);
        copia->casas[i * n] = 1;
        copia->queens_count++;

        play_in(copia, 0, solucoes, max_queens, lock);

        destruir_tabuleiro(copia);
    }
}

void play_in(Tabuleiro *tabuleiro, int coluna, int *solucoes, int *max_queens, omp_lock_t *lock) {
    int n = tabuleiro->n;

    if (coluna == n) {
        omp_set_lock(lock);
        printf("\nSolucao #%d\n", ++*solucoes);
        print_tabuleiro(tabuleiro);
        if (*max_queens < tabuleiro->queens_count) {
            *max_queens = tabuleiro->queens_count;
        }
        omp_unset_lock(lock);
        return;
    }

    for (int i = 0; i < n; ++i) {
        if (queen_ok(tabuleiro, i, coluna)) {
            Tabuleiro *copia = copiar_tabuleiro(tabuleiro);
            copia->casas[i * n + coluna] = 1;
            copia->queens_count++;

            play_in(copia, coluna + 1, solucoes, max_queens, lock);

            destruir_tabuleiro(copia);
        }
    }
}

Note: I have not tested this code, so I do not know if it will work, maybe there is some silly mistake somewhere. However, even if something goes wrong, the correct result is certainly something close to that.

You could still make an improvement by trying to avoid repeated allocations and deallocations by pre-allocating a pool of trays and getting them / returning them to that pool when needed. However, I'm not sure what the pool size is, but an estimate of the upper limit size of this pool would be something of the order of n² / 4 . Note also that this pool would be a stack.

    
02.03.2018 / 23:23