Loop to perform calculation in C Pagerank

4

Based on starting notes 0.15 for everyone and then add note as explained in link

If we consider pages 1, 2 and 3 to be:

1: link1
2: link2
3: link3

The paginas.txt file for this example would be:

1 link1 
2 link2
3 link3

While the links.txt file would be:

1 2 
2 1
2 3

Your program should then calculate the PageRank of each of the pages contained in the paginas.txt file (based on the links.txt file), and generate as output, a file called PageRanks.txt, where the pages are sorted from highest to lowest PageRank. If two pages have the same PageRank, the file with the lowest id must first come in the file.

Would anyone please let me know why this loop is giving me the wrong results? My link.txt looks like this:

1 3
2 1
3 1
3 2
3 4

My paginas.txt looks like this:

1 link1
2 link2
3 link3
4 link4

My Loop:

void calculoPageRank(struct page *pageRank, int numLinhas){
    int h=1,g=1;
    int j=1,i=1;
    float soma = 0.0, difer= 0.2;

    while (difer>=0.05) {
        while (h<=numLinhas) {
            soma= 0.0;
            i=pageRank[h].numeroDeLinksRecebe;
            while (i!=0) {
                g = pageRank[h].quaisLinks[i];
                soma += 0.85 * (pageRank[g].rank/pageRank[g].numeroDeLinksEnviados);
                i--;
                difer = (soma+0.15) - pageRank[h].rank;
            }

            pageRank[h].rank=soma+0.15;
            printf("%f ",difer);
            h++;
        }
        j++;
    }
}

And the complete algorithm:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct page{
    char pagina[20];
    int numeroDeLinksRecebe;
    int numeroDeLinksEnviados;
    int quaisLinks[30];
    float rank;
}page;

int numeroLinhas(){
    char caracter = '
1: link1
2: link2
3: link3
'; int numLinhas = 0; FILE *arq; arq = fopen("paginas.txt", "r"); while (!feof(arq)) { fread(&caracter, 1, 1, arq); if (caracter=='\n') { numLinhas++; } } fclose(arq); return numLinhas; } void calculoPageRank(struct page *pageRank, int numLinhas){ int h=1,g=1; int j=1,i=1; float soma = 0.0, difer= 0.2; while (difer>=0.05) { while (h<=numLinhas) { soma= 0.0; i=pageRank[h].numeroDeLinksRecebe; while (i!=0) { g = pageRank[h].quaisLinks[i]; soma += 0.85 * (pageRank[g].rank/pageRank[g].numeroDeLinksEnviados); i--; difer = (soma+0.15) - pageRank[h].rank; } pageRank[h].rank=soma+0.15; printf("%f ",difer); h++; } j++; } } void leitura(struct page *pageRank, int numLinhas){ int i=1,j=1; FILE *arq; arq = fopen("paginas.txt", "r"); while (fscanf(arq, "%d %s", &j, pageRank[i].pagina)!=EOF) { i++; } fclose(arq); int g = 1; for (i=1; i<=numLinhas; i++) { pageRank[i].numeroDeLinksRecebe = 0; pageRank[i].numeroDeLinksEnviados=0; pageRank[i].rank = 0.15; } int aux; FILE *arq1; arq1=fopen("links.txt", "r"); while (fscanf(arq1, "%d %d", &i, &aux)!=EOF) { if (i==j) { g++; pageRank[i].numeroDeLinksRecebe++; pageRank[aux].numeroDeLinksEnviados++; } else{ j=i; g=1; pageRank[i].numeroDeLinksRecebe++; pageRank[aux].numeroDeLinksEnviados++; } pageRank[i].quaisLinks[g] = aux; } fclose(arq1); } int main(){ int numLinhas = numeroLinhas()+1; page pageRank[80]; int i=1; leitura(pageRank, numLinhas); for (i=1; i<=numLinhas; i++) { printf("i=%d recebe=%d envia=%d rank=%f\n", i, pageRank[i].numeroDeLinksRecebe, pageRank[i].numeroDeLinksEnviados,pageRank[i].rank); } calculoPageRank(pageRank, numLinhas); for (i=1; i<=numLinhas; i++) { printf("\n %d rank: %f\n", i, pageRank[i].rank); } }

Within this context the loop is returning an output that I did not expect. The correct return would be: PR1 = 1.4901 PR2 = 0.7833 PR3 = 1.5766 PR4 = 0.1500 and is delivering R1 = 0.277500 PR2 = 0.267938 PR3 = 0.623184 PR4 = 0.150000.

Does anyone know why my loop is delivering inconsistent results?

    
asked by anonymous 30.01.2014 / 13:13

2 answers

2

I confess that I can not, at first reading, understand how PageRank works and express it through code. But looking at your attempt to solve, I have identified three potential problems that may be contributing to an incorrect result:

  • Potential infinite loop in% of external%

    When you exit% internal% - which indicates that your condition is no longer satisfied - and simply makes while and continues the outer loop, the condition of while internal has not changed >. This means that if after the first iteration j++ continues to be greater than while , it will enter an infinite loop. For your luck, this did not happen, otherwise instead of an incorrect output you would have a program crashed ...

    I suggest to solve this before moving on to item 2. If what you want is to go through all the lines again, what is missing is to return difer to zero. By the way ...

  • Why does 0.05 start with h , and h to zero?

    Is this intentional? Or you're not aware that in% with%, the indexes start at zero (ie the 1st element is 1 , 2nd i , 3rd C , and so on) ? Sorry if I said one basic thing, but I did not see reason looking at the code to assume that you actually wanted to "skip" the first element. (and if this is really wrong, I'm surprised you do not find a segfault ...)

  • External%% only takes into account the last visited link.

    Each time you parse a link, you assign array[0] several times (this is correct, but it's not really necessary - you could do it once, just after exiting the array[1] more internal; is detail). However, when you are going to parse the next link you are discarding the values of the previous links, so that in the end only the last link visited contributes to array[2] - and therefore defines if the outer loop goes continue or not.

    I suppose this is not what you intended, or am I mistaken? Maybe you just want to continue the loop if% max is is greater than while .

  • Putting it all together, I suggest changing your code to:

    int h=1,g=1; // (Estou mantendo essas linhas iguais,
    int j=1,i=1; //  pois elas não importam agora...)
    float soma = 0.0, difer= 0.2, max_difer = 0.2; // Variável extra para a diferença máxima
    
    while (max_difer>=0.05) { // Agora compara a diferença máxima
        max_difer = 0.2; // ... resetando ela no início de cada loop
        h = 0; // h também tem de ser resetado, para zero, não para 1
    
        while (h<numLinhas) { // h vai de 0 a numLinhas-1, então é < e não <=
            soma= 0.0;
            i=pageRank[h].numeroDeLinksRecebe-1; // idem para o i
            while (i>=0) { // i pode ir até zero, então é >= em vez de !=
                g = pageRank[h].quaisLinks[i];
                soma += 0.85 * (pageRank[g].rank/pageRank[g].numeroDeLinksEnviados);
                i--;
            }
            difer = (soma+0.15) - pageRank[h].rank; // Movi pra fora do loop interno
    
            if ( difer < max_difer ) // Atualiza a diferença máxima
                max_difer = difer;
    
            pageRank[h].rank=soma+0.15;
            printf("%f ",difer);
            h++;
        }
        j++;
    }
    

    I do not know if it will lead to the expected result, but I hope it will be of some help!

        
    30.01.2014 / 21:43
    2

    Some notes about style:

    • Use for when the number of iterations is known in advance, and leave while for when you must reach a condition in an unknown number of steps.

    • Declare the variables at the point closest to the point of use, and in the narrowest scope required. Some teachers believe that one must declare the variables in a block at the beginning of the function, a habit that must be abandoned. For example, i is not used outside the h loop, so it's worth declaring it just inside the loop.

    • If possible, use C99 or C11, which allows you to for example use the declaration of variables within for .

    • Constants in code lose meaning fast. What does 0.85 mean? Of course, you scribbled on a paper to get that value, but that information was not in the code. Consider declaring p = 0.15 within the function, and replacing the places where that value is used.

    • By saying a little more about this: there is a joke that is immoral to write a destruir_bagdah() function: the correct thing is to do a destruir(cidade) function and then pass Baghdad as a parameter. In your case, consider that PageRank might work with other values of p desired, so that it should come from outside.

    Quite possibly, the problem with your code is that you're changing the array of ranks as you go through it. You need to make a copy of the structure in order to change the new one.

    This is evident if you see the calculation of PageRank as an iteration of matrix multiplication. If A is the matrix that gives the links, where A[i][j] = 1 if the page i has a link to page j , and v is the ranks vector, then PageRank is obtained iterating

    v_novo = A * v_prev + v_aleatório
    

    until v_prev ~ v_novo . I have not tested the following program, but I hope it is correct. Next time, consider easing the work of those who respond, many times only in the effort to create a minimal example you solve the problem.

    // Nao precisa colocar o tipo de pageRank como struct page, vc ja
    // fez um typedef
    void calculoPageRank(page *pageRank, int numLinhas){    
        // Se nao tem motivo pra usar float, use double, que eh mais preciso
        double p = 0.15;
        double soma = 0.0, difer= 0.2; 
    
        int contador = 0; // j nao parecia um bom nome pra essa variavel
        while (difer >= 0.05) {
            page copia[numLinhas]; //C99 - se nao funcionar, use copia[80]
    
            // memcpy copia pedacos de memoria. Necessita do header <string.h>
            memcpy(copia, pageRank, numLinhas * sizeof(page));
    
            for (int h = 1; h <= numLinhas; h++) {
                soma= 0.0;
                int i = copia[h].numeroDeLinksRecebe;
                for (; i > 0; i--) {
                    int g = copia[h].quaisLinks[i];
                    soma += (1 - p) * (copia[g].rank / copia[g].numeroDeLinksEnviados);
                    difer = (soma + p) - copia[h].rank;
                }
    
                pageRank[h].rank = soma + p;
                printf("%f ",difer);
            }
            contador++; // Pra que voce esta usando isso?
        }
    }
    
        
    23.02.2014 / 19:32