Data Comparison in Files [C]

1

I want to write an algorithm that takes a name from a file X and checks whether that name exists in another file Y; if it does not, the name is written to a third file Z. (Basically, an algorithm that points out who is missing from an attendance list.) I tried to do this in the code below, but it did not work. Could anyone point out the errors in the logical structure that prevent it from working?

#include <stdio.h>
#include <string.h>

typedef struct tipo_nome{
    char nomePessoa[50];
} nome;

void main(){
    int i, k=0;
    nome nome1, nome2;
    FILE *file1, *file2, *file3;
    file1 = fopen("listaCompleta.txt", "r");
    file2 = fopen("listaPresenca.txt", "r");
    file3 = fopen("listaFaltantes.txt", "a+");

    do{
        fread(&nome1, sizeof(nome), 1, file1);
        do{
            fread(&nome2, sizeof(nome), 1, file2);
            if(strcmp(nome1.nomePessoa, nome2.nomePessoa)==0){
                k=1;
            }
        }while(!feof(file2) && k==0);

        if(k==0){
            fwrite(&nome1, sizeof(nome), 1, file3);
        }
        k=0;    
    }while(!feof(file1));

}
    
asked by anonymous 18.08.2017 / 05:33

2 answers

1

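You want to build the set C of every element e that belongs to A but not to B, that is, C = { e | e ∈ A and e ∉ B }, where A is the complete list and B is the attendance list.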

To do this, you need to answer two questions:

  • How to identify that e belongs to A?
  • How to identify that e does not belong to B?

    Since in our case A is a file, e belongs to A if it is the result of reading that file. That answers the first question.

    Now, how do you know that e does not belong to B? Non-membership means that e differs from every element of B: e ∉ B if and only if e ≠ b for all b ∈ B.

    If B is a plain, unordered set, this means I always need to compare e against all of its elements to be sure that e does not belong to B. But we do not have to stick to plain sets: B could be a sorted collection, a hash map, or a tree of names, and all of these alternatives allow the number of comparisons to be reduced. An efficient search data structure will reduce the number of operations performed.
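As an illustration of one such optimization (not the approach this answer goes on to follow), the sketch below keeps B in a sorted array and looks names up with binary search. It assumes B is already in memory as an array of the `nome` structure from the question; the function names `compara_nome`, `ordena_conjunto` and `pertence_ordenado` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Same record layout as in the question. */
typedef struct tipo_nome {
    char nomePessoa[50];
} nome;

/* Ordering callback shared by qsort and bsearch (hypothetical name). */
static int compara_nome(const void *a, const void *b)
{
    const nome *na = a;
    const nome *nb = b;
    return strcmp(na->nomePessoa, nb->nomePessoa);
}

/* Sorts set B once, so that later lookups can use binary search. */
void ordena_conjunto(nome *b, size_t n)
{
    qsort(b, n, sizeof *b, compara_nome);
}

/* Returns 1 if e is found in the sorted array b, 0 otherwise. */
int pertence_ordenado(const nome *e, const nome *b, size_t n)
{
    return bsearch(e, b, n, sizeof *b, compara_nome) != NULL;
}
```

After the one-time sort, each lookup needs on the order of log2(n) comparisons instead of up to n.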

    For the scope of this answer, I am not optimizing the number of comparisons. I am also assuming that the file containing set B is relatively small, a few megabytes at most.

    Since set B is always queried as a whole, and set A only matters for producing the next element, my algorithm starts by loading B entirely into memory. Initializing set A, in this case, is simply opening the file.

    The general idea is more or less as follows:

    initialize set A
    initialize set B
    do:
        take _e_, the next element of A
        if _e_ does not belong to B:
            add _e_ to C
    while the end of set A has not been reached

    Initializing A only requires opening the file. If we were to optimize the comparisons between A and B, we could use some data structure in A, such as a sorted array.

    initialize set A:
        open file "listaCompleta"
    

    Initializing set B means reading it completely. Since we do not know its total size in advance, we can use a linked list whose node holds a nome structure and a pointer to the next element of the list:

    initialize set B:
        nodo_lista *conjuntoB = NULL
        open file "listaPresenca"
        do:
            nodo_lista *novoElemento = malloc(sizeof(nodo_lista))
            novoElemento->next = conjuntoB
            read from file "listaPresenca" into address &(novoElemento->valor)
            if the read succeeded:
                conjuntoB = novoElemento
            otherwise:
                free(novoElemento)
        while the end of file "listaPresenca" has not been reached
        close file "listaPresenca"
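A minimal C sketch of this initialization follows. It assumes the file stores fixed-size `nome` records, as the fread calls in the question imply; the `nodo_lista` layout follows the description above, while the function names `carrega_conjunto` and `libera_conjunto` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct tipo_nome {
    char nomePessoa[50];
} nome;

typedef struct nodo_lista {
    nome valor;
    struct nodo_lista *next;
} nodo_lista;

/* Reads every record of the file into a linked list.
   Returns the head of the list, or NULL if the file is empty
   or cannot be opened. */
nodo_lista *carrega_conjunto(const char *caminho)
{
    nodo_lista *conjuntoB = NULL;
    FILE *f = fopen(caminho, "rb");

    if (f == NULL)
        return NULL;

    for (;;) {
        nodo_lista *novoElemento = malloc(sizeof *novoElemento);
        if (novoElemento == NULL)
            break;                          /* out of memory: keep what was read */
        if (fread(&novoElemento->valor, sizeof(nome), 1, f) != 1) {
            free(novoElemento);             /* end of file or incomplete record */
            break;
        }
        /* make sure the name is terminated even if the data is not */
        novoElemento->valor.nomePessoa[sizeof(novoElemento->valor.nomePessoa) - 1] = '\0';
        novoElemento->next = conjuntoB;     /* push at the head of the list */
        conjuntoB = novoElemento;
    }

    fclose(f);
    return conjuntoB;
}

/* Releases every node of the list. */
void libera_conjunto(nodo_lista *lista)
{
    while (lista != NULL) {
        nodo_lista *proximo = lista->next;
        free(lista);
        lista = proximo;
    }
}
```

Because each node is pushed at the head, the list ends up in reverse file order, which does not matter for a membership test.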
    

    Getting the next element is just a call to fread, the way you already did it:

    take _e_, the next element of A:
        nome _e_
        read from file "listaCompleta" into address &_e_
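In C, this step could look like the sketch below, which also checks the value returned by fread. The function name `proximo_elemento` is hypothetical, and the file is again assumed to hold fixed-size `nome` records.

```c
#include <stdio.h>

typedef struct tipo_nome {
    char nomePessoa[50];
} nome;

/* Reads the next record of A into *e.
   Returns 1 on success, 0 at end of file or on a read error. */
int proximo_elemento(FILE *arquivoA, nome *e)
{
    if (fread(e, sizeof *e, 1, arquivoA) != 1)
        return 0;
    /* make sure the name is terminated even if the data is not */
    e->nomePessoa[sizeof e->nomePessoa - 1] = '\0';
    return 1;
}
```

Looping with `while (proximo_elemento(arquivoA, &e))` stops as soon as a read fails. A `do { ... } while (!feof(file))` loop, by contrast, runs its body one extra time, because feof only becomes true after a read has already failed.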
    

    Whether e belongs to B is decided by looking at the entire set B, so we need a complete iteration:

    does _e_ belong to B?
        nodo_lista *elementoB = conjuntoB
        while elementoB != NULL:
            if elementoB->valor is equal to _e_:
                return "_e_ belongs to B"
            elementoB = elementoB->next
        return "_e_ does not belong to B"
    

    Two elements of type nome are compared by comparing their nomePessoa fields with strcmp:

    is _a_ equal to _b_?
        return strcmp(_a_.nomePessoa, _b_.nomePessoa) == 0
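The membership test and the comparison it relies on can be sketched in C as follows. The function names `nome_igual` and `pertence` are hypothetical; the types follow the structures described in this answer.

```c
#include <stddef.h>
#include <string.h>

typedef struct tipo_nome {
    char nomePessoa[50];
} nome;

typedef struct nodo_lista {
    nome valor;
    struct nodo_lista *next;
} nodo_lista;

/* Returns 1 if both records hold the same name, 0 otherwise. */
int nome_igual(const nome *a, const nome *b)
{
    return strcmp(a->nomePessoa, b->nomePessoa) == 0;
}

/* Returns 1 if e is found in the list conjuntoB, 0 otherwise.
   In the worst case every node is visited once. */
int pertence(const nome *e, const nodo_lista *conjuntoB)
{
    const nodo_lista *elementoB;

    for (elementoB = conjuntoB; elementoB != NULL; elementoB = elementoB->next) {
        if (nome_igual(&elementoB->valor, e))
            return 1;
    }
    return 0;
}
```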
    

    Adding element e to set C can be done either by putting the new element in a list or by writing it directly to the file with fwrite. If you choose the list, the items must be written to the file after all items of A have been analyzed.
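Putting the steps of this answer together with the direct-write alternative, a sketch of the main loop might look like this. The function name `grava_faltantes` is hypothetical, both files are assumed to hold fixed-size `nome` records, and set B is assumed to be already loaded into a linked list.

```c
#include <stdio.h>
#include <string.h>

typedef struct tipo_nome {
    char nomePessoa[50];
} nome;

typedef struct nodo_lista {
    nome valor;
    struct nodo_lista *next;
} nodo_lista;

/* Writes to arquivoC every record of arquivoA that is not in conjuntoB.
   Returns the number of records written, or -1 on a write error. */
long grava_faltantes(FILE *arquivoA, const nodo_lista *conjuntoB, FILE *arquivoC)
{
    nome e;
    long gravados = 0;

    /* take e, the next element of A, until the file ends */
    while (fread(&e, sizeof e, 1, arquivoA) == 1) {
        const nodo_lista *b = conjuntoB;

        e.nomePessoa[sizeof e.nomePessoa - 1] = '\0';

        /* walk B until a match is found or the list ends */
        while (b != NULL && strcmp(b->valor.nomePessoa, e.nomePessoa) != 0)
            b = b->next;

        if (b == NULL) {                    /* e is not in B: add it to C */
            if (fwrite(&e, sizeof e, 1, arquivoC) != 1)
                return -1;
            gravados++;
        }
    }
    return gravados;
}
```

Writing each missing name immediately avoids holding set C in memory. Collecting the names in a list first would only be worthwhile if C needed further processing before being saved.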

    Differences between our approaches

    Basically, I load set B into working memory, while you keep reading it from external storage (such as a hard disk). The problem with your approach is that, before examining the next item of set A, you would have to reposition the file that represents set B back to its start. A simple fseek right after reading from the first file, forcing the second file back to the beginning, would be enough to restart the iteration over set B.
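For comparison, the file-based variant with the repositioning step could be sketched as below. The function name `pertence_arquivo` is hypothetical, and the file for B is assumed to be open for reading and to hold fixed-size `nome` records.

```c
#include <stdio.h>
#include <string.h>

typedef struct tipo_nome {
    char nomePessoa[50];
} nome;

/* Returns 1 if e occurs in arquivoB, 0 otherwise.
   The file is scanned from its start on every call; a failed
   seek is reported as "not found". */
int pertence_arquivo(const nome *e, FILE *arquivoB)
{
    nome b;

    /* go back to the first record; a successful fseek also clears
       the end-of-file indicator left by the previous scan */
    if (fseek(arquivoB, 0L, SEEK_SET) != 0)
        return 0;

    while (fread(&b, sizeof b, 1, arquivoB) == 1) {
        b.nomePessoa[sizeof b.nomePessoa - 1] = '\0';
        if (strcmp(e->nomePessoa, b.nomePessoa) == 0)
            return 1;
    }
    return 0;
}
```

Calling this once per element of A rereads file B each time, so the number of reads grows with the product of the sizes of A and B.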

    Since your alternative involves a quadratic number of reads from external storage, I did not consider it a practical solution. By loading the whole of set B up front, I need only a linear number of reads from external storage.

        
    18.08.2017 / 06:44
    1

    Here's a possible solution to your problem:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    
    #define LINHA_MAX_TAM  (100)
    #define NOME_MAX_TAM   (50)
    
    
    typedef struct pessoa_s
    {
        char nome[ NOME_MAX_TAM + 1 ];
    } pessoa_t;
    
    
    int carregar_pessoas( const char * arq, pessoa_t ** pp, int * qtd )
    {
        FILE * pf = NULL;
        char linha[ LINHA_MAX_TAM + 1 ] = {0};
        pessoa_t * p = NULL;
        int n = 0;
    
        pf = fopen( arq, "r" );
    
        if(!pf)
            return -1;
    
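        while( fgets( linha, sizeof(linha), pf ) )
        {
            pessoa_t * tmp = NULL;
    
            /* remove the line terminator left by fgets */
            linha[ strcspn(linha, "\r\n") ] = 0;
    
            /* grow the array by one element; on failure release what we have */
            tmp = realloc( p, (n + 1) * sizeof(pessoa_t) );
            if(!tmp)
            {
                free(p);
                fclose(pf);
                return -1;
            }
            p = tmp;
            n++;
    
            /* strncpy does not terminate a truncated copy, so do it here */
            strncpy( p[n-1].nome, linha, NOME_MAX_TAM );
            p[n-1].nome[ NOME_MAX_TAM ] = 0;
        }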
    
        fclose(pf);
    
        *qtd = n;
        *pp = p;
    
        return 0;
    }
    
    
    int gravar_pessoas( const char * arq, pessoa_t * p, int qtd )
    {
        FILE * pf = NULL;
        int i = 0;
    
        pf = fopen( arq, "w" );
    
        if(!pf)
            return -1;
    
        for( i = 0; i < qtd; i++ )
            fprintf( pf, "%s\n", p[i].nome );
    
        fclose( pf );
    
        return 0;
    }
    
    
    int processar_ausencias( pessoa_t * pc, int qtdc, pessoa_t * pp, int qtdp, pessoa_t ** pa, int * qtda )
    {
        int i = 0;
        int j = 0;
        int presente = 0;
        pessoa_t * p = NULL;
        int n = 0;
    
        for( i = 0; i < qtdc; i++ )
        {
            presente = 0;
    
            for( j = 0; j < qtdp; j++ )
            {
                if( !strcmp( pc[i].nome, pp[j].nome ) )
                {
                    presente = 1;
                    break;
                }
            }
    
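            if(!presente)
            {
                /* absent: append a copy of this person to the result */
                pessoa_t * tmp = realloc( p, (n + 1) * sizeof(pessoa_t) );
                if(!tmp)
                {
                    free(p);
                    return -1;
                }
                p = tmp;
                n++;
                strncpy( p[n-1].nome, pc[i].nome, NOME_MAX_TAM );
                p[n-1].nome[ NOME_MAX_TAM ] = 0;
            }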
        }
    
        *pa = p;
        *qtda = n;
    
        return 0;
    }
    
    
    int main( void )
    {
        pessoa_t * lst_completa = NULL;
        int qtd_completa = 0;
    
        pessoa_t * lst_presenca = NULL;
        int qtd_presenca = 0;
    
        pessoa_t * lst_ausencia = NULL;
        int qtd_ausencia = 0;
    
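        /* stop early if either input list cannot be loaded */
        if( carregar_pessoas( "listaCompleta.txt", &lst_completa, &qtd_completa ) != 0 ||
            carregar_pessoas( "listaPresenca.txt", &lst_presenca, &qtd_presenca ) != 0 )
        {
            fprintf( stderr, "could not load the input lists\n" );
            free(lst_completa);
            return EXIT_FAILURE;
        }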
    
        processar_ausencias( lst_completa, qtd_completa, lst_presenca, qtd_presenca, &lst_ausencia, &qtd_ausencia );
    
        gravar_pessoas( "listaFaltantes.txt", lst_ausencia, qtd_ausencia );
    
        free(lst_ausencia);
        free(lst_completa);
        free(lst_presenca);
    
        return 0;
    }
    

    listaCompleta.txt

    JOSE
    JOAO
    ANTONIO
    FRANCISCO
    LUIZ
    CARLOS
    PEDRO
    PAULO
    MANOEL
    LUCAS
    

    listaPresenca.txt

    JOSE
    JOAO
    LUIZ
    PEDRO
    PAULO
    MANOEL
    LUCAS
    

    listaFaltantes.txt

    ANTONIO
    FRANCISCO
    CARLOS
    
        
    18.08.2017 / 21:53