Is using hashCode as an id a good practice?


I have a short list of strings (the strings will never repeat) and would like to use their hashCode as an id. Is this a good practice?

    
asked by anonymous 11.02.2016 / 22:45

2 answers


It depends.

Unique value

It is not good practice if the goal is to create a unique ID.

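A hash is a simplification of the content, and any simplification loses information, because there is no way to represent the whole with only a part. In Java, for example, String.hashCode() returns an int, so an unlimited number of possible strings is mapped onto just 2^32 values.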
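To make the loss of information concrete: the Javadoc of String.hashCode() documents the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic (the empty string hashes to 0). The sketch below reimplements that formula, with class and method names made up for illustration, and compares it with the built-in method. Whatever the length of the input, the result is always one of the 2^32 int values.

```java
// Sketch: recomputes String.hashCode() by hand to show that any string,
// however long, is folded into a single 32-bit int.
class HashCodeFormula {

    // Documented formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1],
    // evaluated with int arithmetic, so it silently wraps around on overflow.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "string1", "a much longer string than the others"};
        for (String s : samples) {
            // The manual value and the built-in value should always match.
            System.out.printf("%-40s manual=%d builtin=%d%n",
                    "\"" + s + "\"", manualHash(s), s.hashCode());
        }
    }
}
```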

This means that it is unlikely, but not impossible, for two different strings or objects to have the same hash value. When this happens, we say there has been a collision. A serious implementation has to handle this situation; otherwise it will compromise the integrity of the data or, when secure hashes are used, the privacy of its users.
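A well-known pair of colliding strings in Java is "Aa" and "BB". This pair is not from the question; it is simply easy to verify by hand: 65*31 + 97 = 2112 and 66*31 + 66 = 2112. The minimal sketch below checks it with a hypothetical helper, collide.

```java
// Minimal sketch: two different strings that share the same String.hashCode().
class CollisionDemo {

    // Hypothetical helper: true when x and y are different strings with equal hashes.
    static boolean collide(String x, String y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(collide("Aa", "BB"));  // true
    }
}
```

If these hashes were used as ids, both strings would get id 2112, and the data for one would be confused with the other.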

The most you could do to shorten a String without losing information is to compress it, but that would not be very practical as an id.

If you need to create keys from strings, you basically have two options:

  • Use the Strings themselves as keys. Example:

        mapa.put("string1", valor1);
        mapa.put("string2", valor2);

  • Assign a number to each String, as if it were an id in a table:

        mapeamento.put("string1", 1);
        mapeamento.put("string2", 2);
        ...
        mapa.put(1, valor1);
        mapa.put(2, valor2);
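The second option can be wrapped in a small class that hands out ids sequentially, so two different strings can never share an id, unlike hashCode. This is a minimal sketch; the class StringIds and its methods are hypothetical names, and ids start at 1 to match the example above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "assign a number to each String" option: ids are handed out
// sequentially, so two different strings never get the same id.
class StringIds {
    private final Map<String, Integer> idByString = new HashMap<>();
    private final List<String> stringById = new ArrayList<>();

    // Returns the existing id for s, or assigns the next free one (starting at 1).
    int idFor(String s) {
        Integer id = idByString.get(s);
        if (id == null) {
            id = stringById.size() + 1;
            idByString.put(s, id);
            stringById.add(s);
        }
        return id;
    }

    // Reverse lookup: id -> original string.
    String stringFor(int id) {
        return stringById.get(id - 1);
    }

    public static void main(String[] args) {
        StringIds ids = new StringIds();
        System.out.println(ids.idFor("string1")); // 1
        System.out.println(ids.idFor("string2")); // 2
        System.out.println(ids.idFor("string1")); // 1 (same string, same id)
        System.out.println(ids.stringFor(2));     // string2
    }
}
```

Note that "Aa" and "BB", which collide under hashCode, would simply receive two different ids here.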
    

Classification

On the other hand, hashes are good for classifying content into groups, which means you can quickly narrow down where to look for something, though not in a unique way.

I'll use HashMap as an example to illustrate this.

Imagine you have a bucket full of colored, numbered balls, and someone asks you to pick out the blue ball with number one. You could spend a long time searching and still not find it. This is the equivalent of searching for an item in a list, for example.

Now suppose you have several buckets, each holding balls of a single color. You just go to the bucket of blue balls and look inside. You may still have to search a bit for the right number, but it will be much more efficient.

That's pretty much how HashMap works. If you look at the implementation of HashMap.put(), you will see that it uses the hash to compute the index into its internal array of buckets.
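The bucket idea can be sketched in a few lines. The sketch below assumes a table whose length is a power of two and picks a bucket with (length - 1) & spread(hashCode), which mirrors the approach OpenJDK's HashMap has used since Java 8. The class BucketSet and its methods are hypothetical, and this is a simplification, not the JDK's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of hash-based bucketing (not the JDK's HashMap code).
// The table length is assumed to be a power of two.
class BucketSet {

    // Mix the high 16 bits into the low 16 bits, since the mask below keeps only low bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // With a power-of-two length, (length - 1) works as a bit mask selecting a bucket.
    static int bucketFor(Object key, int tableLength) {
        return spread(key.hashCode()) & (tableLength - 1);
    }

    private final List<List<String>> buckets = new ArrayList<>();

    BucketSet(int tableLength) {
        for (int i = 0; i < tableLength; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    void add(String s) {
        List<String> bucket = buckets.get(bucketFor(s, buckets.size()));
        if (!bucket.contains(s)) {
            bucket.add(s);
        }
    }

    // Only one bucket is scanned; equals() (via List.contains) still decides the match,
    // because different strings can land in the same bucket.
    boolean contains(String s) {
        return buckets.get(bucketFor(s, buckets.size())).contains(s);
    }

    public static void main(String[] args) {
        BucketSet set = new BucketSet(16);
        for (String s : new String[] {"blue", "red", "green", "Aa"}) {
            set.add(s);
        }
        System.out.println(set.contains("Aa"));  // true
        System.out.println(set.contains("BB"));  // false: same bucket as "Aa", but not equal
    }
}
```

The hash only tells the table which bucket to look in; the final decision is always made with equals(), which is exactly why a hash alone is not a reliable id.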

11.02.2016 / 23:40

Note that hash functions give no guarantee against collisions: different strings can produce the same integer (this is a collision). An example is the pair of strings "[email protected]" and "[email protected]": both produce the same integer with the hashCode implementation of Java's String class.
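Collisions in String.hashCode() are not only possible but easy to construct. Assuming the well-known pair "Aa" and "BB" (same length, same hash 2112; not the pair from this answer), any string built by concatenating k such blocks has the same hash, because appending a two-character block maps a hash h to h*31^2 + 2112 no matter which block is appended. The sketch below, with a hypothetical class name, builds all 2^k such strings.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: "Aa" and "BB" have the same length and the same hashCode, so every
// string made of k such blocks shares one hashCode: 2^k distinct strings, one hash.
class CollisionFamily {

    static List<String> build(int k) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < k; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> family = build(3);  // 8 distinct strings of length 6
        for (String s : family) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```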

So it all depends on the context in which it is applied. To solve a specific case, I would first try the approach described here, because it is very simple to implement. Checking for collisions is also trivial. If that does not meet your needs, implementing other hash functions is simple, and simpler still is to use ready-made hash algorithms.
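For a fixed, short list like the one in the question, the collision check can be a single pass over the list. This is a minimal sketch with hypothetical names; it only shows that the current list is collision-free, so it would have to be rerun whenever a string is added.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: before trusting hashCode() as an id for a fixed, short list of strings,
// check that no two distinct strings in that list share a hash.
class HashIdCheck {

    // Returns true if every distinct string in the list has a distinct hashCode().
    static boolean hashCodesAreUnique(List<String> strings) {
        Map<Integer, String> seen = new HashMap<>();
        for (String s : strings) {
            String previous = seen.putIfAbsent(s.hashCode(), s);
            if (previous != null && !previous.equals(s)) {
                System.out.println("Collision: \"" + previous + "\" and \"" + s + "\"");
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hashCodesAreUnique(Arrays.asList("red", "green", "blue"))); // true
        // Prints the collision between "Aa" and "BB", then false.
        System.out.println(hashCodesAreUnique(Arrays.asList("red", "Aa", "BB")));
    }
}
```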

See this C++ implementation of an algorithm that seems decent for hashing Strings (easily adapted to Java): link

Related answers on Stack Overflow in English:

link

link

        
11.02.2016 / 23:14