Dealing with collisions in dictionary C #

3

I need to map an array of strings to a dictionary so that I can later test whether an informed string is valid by checking that it is part of dict , and would like to retrieve some other information stored in dict .

For this I am using the following structure:

struct reservedWord
    {
        public String command;
        public type token;      //token é um enumerador
    };

When trying to store the sequence of strings by means of the code (modifications trying to follow the recommendations of @mgibsonbr):

Dictionary<string, string> palavras = new Dictionary<string, string>();
for (int i = 0; i < reservedWords.Length; i++)
        (...)
        palavras.Add(reservedWords[i], word);

At run time an exception is generated by the duplicate key insertion. Even when I replace this insertion code with

palavras.Add(reservedWords[i].getHashCode(), word);    

I get the same error message.

    
asked by anonymous 23.03.2014 / 15:40

4 answers

2

Techniques to deal with collisions

The% class of .Net does not allow null or duplicate keys. You will have to deal with these occurrences in the most appropriate way for your use. There are several techniques that depend on the behavior you desire. In case of using duplicate keys, I highlight the following:

  • Replace: If you want to add an item to the dictionary, replacing the key if it already exists:

    dict[chave] = valor;
    
  • Add if it does not exist:

    if (!dict.ContainsKey(chave))
         dict.Add(chave, valor);
    
  • Recover if there is or add: If you want to add a key / value, only if the key is not already occupied, and if it is busy then retrieve the existing value:

    TValor original;
    if (!dict.TryGet(chave, out original))
        dict.Add(chave, valor);
    
  • Adding multiple items to the same key: If you want to add multiple elements to the same key, you would actually have to use a list dictionary, along with add /

    List lista;
    if (!dict.TryGet(chave, out lista))
        dict[chave] = lista = new List();
    lista.Add(valor);
    

Some potential problems that I noticed in your code

  • Do not use the Dictionary<TKey, TValue> method as the dictionary key. This method will already be used internally by the class GetHashCode() on all keys, and serves to speed up the process of locating elements within the dictionary.

  • Your Dictionary<TChave, TValor> does not implement struct , nor is a IEquatable<T> passed to the dictionary. Because of this, the hash will be extracted from the first field in your struct , which is the IEqualityComparer<T> field ... is that what you really want? That sounds pretty good to me, but it's still good to notice.

  • Your command has fields that can be changed after entering the dictionary. A type that is used as a key for a dictionary, can not have its hash changed, and it is recommended that struct be immutable.

24.03.2014 / 14:29
1

Well, one way to check for collisions before including them in the dictionary is to use the ContainsKey " of Dictionary. This method allows you to check if there is such a key for each dictionary item. Example with an extension method:

public static bool AddSafe<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, TKey key, TValue value)
{
    if (!dictionary.ContainsKey(key))  
    {
        dictionary.Add(key, value);
        return true;
    }  
    return false;
}
    
23.03.2014 / 18:03
0

There are some misconceptions here, we need to solve them before thinking about the most appropriate solution:

  • Is there really a "collision", or is your problem "duplicate keys"?

    If your data mass has, for example, key "a" mapping both 1 and 2 , then the problem is not a hash collision. Collision would be if we had "a" mapping to 1 and "b" mapping to 2 , but hash(a) == hash(b) .

    3 collisions in 59 entries does not seem to me "very small", but an excessive number, especially if the hash function is well done (as I believe to be the case with getHashCode ). So I suspect your case is duplicate keys, but that's only you can confirm. If so, and a normal key in your application is pointing to the same value - then in fact a solution like your proposal (map key to a list) would be correct.

  • If there is a collision, who is responsible for handling it?

    Collisions can occur in any hash function, and the libraries that make use of them are (or should be) prepared to handle them. In the example above, if you insert "a" and "b" in a Dictionary and there is a collision between your hashes, it is the obligation of the Dictionary to do something about it, not yours. Unless this library is very broken I have no practical experience with C #, but I doubt that is the case. This will be handled transparently for the programmer-user [of the library, ie you].

    Your example above leads me to wonder if it was even a% w_that you wanted. If your keys are strings, would not it be the case to use Dictionary<int, int> ? Because if you call the method Dictionary<string, int> of the string manually, and use its result as the key (and not the original string) then you are transferring to you a library liability (handle collisions). As far as I know, there's no reason to do this ...

  • If after reading the above you still need to deal with collisions manually, please update your question in more detail and I'll try to give you a better direction on your options (chaining, rehashing ...).

        
    23.03.2014 / 18:14
    0

    Another suggestion was to construct the Dictionary around the reservedWord struct itself, after it is done with IEquatable<reservedWord>

    public struct reservedWord : IEquatable<reservedWord>
    {
      public string command;
      public MyType token;
    
      public override int GetHashCode(){ return command.GetHashCode() ^ (int)token; }
      public bool Equals(reservedWord other)
      {
        // seja qual for o teu criteria de egualdade...'case-sensitive'?
        return String.Equals(command, other.command) && token == other.token;
      }
    }
    

    and then

    var palavras = new Dictionary<reservedWord, string>();
    var reservedWords = new reservedWord[] { .... }; // lista de struct 'reservedWord'
    for (int i = 0; i < reservedWords.Length; i++)
    {
        (...)
        if(!palavras.ContainsKey(reservedWords[i])
          palavras.Add(reservedWords[i], word);
    }
    
        
    24.03.2014 / 05:43