Problems
There are some flaws in the logic that is implemented:
- You are not removing repeated multiple successively because you only use
a
at the end of the second for
- You can not
free(p)
and then use for
to advance to the next with p = p -> prox
-
agenda = p -> prox
is not necessary if you remove the duplicates from the first instead of the first one itself. This causes your statement to be agenda->prox = p->prox
or p->prox =q->prox
since to a
be NULL
p
must be exactly in the next element
Correcting problems
Using the logic and variables I have suggested I do this:
Elemento* remove_duplicados(Elemento* agenda){
Elemento* a;
Elemento* p;
Elemento* q;
for(p = agenda; p != NULL; p = p -> prox ){
a = NULL; //a cada varrimento a começa a NULL
for(q = p -> prox; q != NULL; ){ //sem incremento
if(0 == strcmp(p -> info.nome, q -> info.nome)){ //teste de igual
if(a == NULL){ //remove do inicio
p -> prox = q -> prox;
}
else{ //ou do meio/fim
a -> prox = q -> prox;
}
Elemento* remover = q;
q = q -> prox;
free(remover);
}
else { //so atualiza o anterior quando não é igual
a = q;
q = q->prox;
}
}
}
return agenda;
}
I removed the comments that I had to show the ones I put, and with the explanations associated with the changes I made.
See this code working on Ideone
Refactoring for better readability
The variable names you have are very uninspiring and difficult to read. You should always try to give names representative of the content that the variables have. In addition, the first element is never changed since duplicates are removed to the front, so the return type is also not required, and must be void
.
Taking this into account we can make the function much clearer:
void remove_duplicados(Elemento* agenda){
Elemento *anterior, *corrente, *seguinte;
for(corrente = agenda; corrente != NULL; corrente = corrente -> prox ){
anterior = NULL;
for(seguinte = corrente -> prox; seguinte != NULL; ){
if(0 == strcmp(corrente -> info.nome, seguinte -> info.nome)){
if(anterior == NULL)
corrente -> prox = seguinte -> prox;
else
anterior -> prox = seguinte -> prox;
Elemento* remover = seguinte;
seguinte = seguinte -> prox;
free(remover);
}
else {
anterior = seguinte;
seguinte = seguinte->prox;
}
}
}
}
Efficiency with Hash Table
If there is concern about efficiency, the solution you presented yourself, which I picked up on, can cause problems if the amount of data is large, since it is a quadratic solution, with time complexity in the order of O (n²).
There are several ways to improve the solution but with a hash table and ensuring few collisions we get a solution in the order of O (n).
We begin with the structure of each hash table entry and the corresponding array that represents it:
typedef struct entrada {
struct info *dados;
struct entrada *prox;
} entrada;
#define TAMANHO_HASH 100
entrada *hash_nomes[TAMANHO_HASH];
In order to find the entry for a given value, the hash function is required, which in this case we can use as strings:
unsigned long hash(char *str){
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c;
return hash;
}
Creating a good hash function is by itself a science, and will reflect on the performance of the algorithm because it will create many or few collisions depending on how it is built. For simplicity I used
an already known .
The function that remove_duplicados
is now simplified since it is based entirely on the hash table to know if the current value already exists:
void remove_duplicados(Elemento* agenda){
Elemento *corrente = agenda, *anterior = NULL;
while (corrente != NULL){
if (adiciona_se_inexistente(&corrente->info) == 0){ //0 não adicionou por já existir
anterior->prox = corrente->prox;
Elemento* remover = corrente;
corrente = corrente -> prox;
free(remover);
}
else {
anterior = corrente;
corrente = corrente -> prox;
}
}
}
Here, we used the adiciona_se_inexistente
function to add to the hash table if it does not exist:
int adiciona_se_inexistente(struct info* dados){
unsigned long key = hash(dados->nome) % TAMANHO_HASH; //achar a entrada
entrada *corrente = hash_nomes[key];
while (corrente != NULL){ //verificar se já existe
if (strcmp(corrente->dados->nome, dados->nome) == 0){
return 0;
}
corrente = corrente->prox;
}
//não existe adiciona a nova entrada
entrada *nova = malloc(sizeof(entrada));
nova->prox = hash_nomes[key];
nova->dados = dados;
hash_nomes[key] = nova;
return 1;
}
The hash table has to be initialized with all the
NULL
entries, which I did using the
memset
.
Comments:
- Implemented the hash table with list of internal collisions instead of open addressing, which ends up using more space, but is potentially simpler to implement.
- This solution is much more performance-efficient if we ensure that we have few collisions, depending on the hash function and the size of the table. However it will use more memory than its original solution. Note that the size of
100
I chose for the table is good for the number of names I have used ( 5
), but if you have more names you can already generate many collisions, so it is something that has to adapt to your needs .
- It is also worth remembering that if you only need this function at a point in the program and you are sure that later it will not be necessary, you must free up the space of all the filled entries in the hash table, with
free
.
Example of this Ideone implementation