What is the most efficient way to calculate the HashCode of an object in JavaScript?

Question


4

I'm currently using:

Object.prototype.GetHashCode = function () {
    var s = this instanceof Object ? JSON.stringify(this) : this.toString();

    var hash = 0;
    if (s.length === 0) return hash;
    for (var i = 0; i < s.length; ++i) {
        hash = ((hash << 5) - hash) + s.charCodeAt(i);
    }
    return hash;
};
Number.prototype.GetHashCode = function () { return this.valueOf(); };

For numbers it's pretty fast, but for complex objects it's very inefficient mainly because of the conversion to JSON.

Is there a more efficient way to calculate the hash code?

javascript hashcode

asked by anonymous 17.12.2013 / 16:09

2 answers

0

If the goal is to compare two objects, you can try an implementation similar to isEqual() from underscore.js.

17.12.2013 / 16:53


score 5 · Accepted Answer

The main requirements of a hashCode are:

1. If A == B, then hashCode(A) == hashCode(B).

   (The converse is not true: two different objects can have the same hashCode. This is called a collision, and in practice it is usually unavoidable.)

2. The hashes must be distributed homogeneously across all objects in the domain (to reduce the chance of collisions).

3. It must be fast to compute; otherwise the performance gain from using a hash table is negated by the cost of computing the hashes.
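To make the remark about collisions concrete, here is a minimal sketch that reuses the string loop from the question (hash = 31 * hash + char code). The name hashString and the "| 0" truncation to a 32-bit integer are assumptions added for illustration; they are not part of the original code.

```javascript
// String hash in the style of the question's loop: hash = 31 * hash + charCode.
// Assumption: the name hashString and the "| 0" (keep a 32-bit integer).
function hashString(s) {
    var hash = 0;
    for (var i = 0; i < s.length; ++i) {
        hash = (((hash << 5) - hash) + s.charCodeAt(i)) | 0;
    }
    return hash;
}

// Equal values always produce equal hashes.
console.log(hashString("abc") === hashString("abc")); // true

// The converse does not hold: two different strings, same hash (a collision).
console.log(hashString("Aa"), hashString("BB")); // 2112 2112
```

A hash table copes with this by falling back to an equality check whenever two keys land on the same hash, which is why collisions cost performance but not correctness.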

To ensure efficiency, it is usually necessary to write a specific hashCode for each situation (your sample code is a good hash for strings in general, as long as they are not too long). Also, when using a specific equality criterion, it is important that the hashCode is consistent with that criterion (see requirement 1 above).
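As an illustration of a situation-specific hashCode that stays consistent with its equality criterion, consider the sketch below. Everything in it is hypothetical: the point objects, the rule that the label is ignored when comparing, and the names pointEquals and pointHashCode are assumptions made up for the example.

```javascript
// Hypothetical equality criterion: two points are equal when x and y match.
// The label field is deliberately ignored.
function pointEquals(a, b) {
    return a.x === b.x && a.y === b.y;
}

// Hash built only from the fields that pointEquals compares, so equal points
// always get the same hash. Assumes integer coordinates.
function pointHashCode(p) {
    return (((p.x | 0) * 31) + (p.y | 0)) | 0;
}

var p1 = { x: 3, y: 4, label: "first" };
var p2 = { x: 3, y: 4, label: "second" };

console.log(pointEquals(p1, p2));                    // true
console.log(pointHashCode(p1), pointHashCode(p2));   // 97 97
```

Had the hash included the label, p1 and p2 would compare equal and still hash differently, which breaks the first requirement. It is also far cheaper than serializing the whole object.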

Absent more specific requirements, some ways to compute a hashCode are (in increasing order of homogeneity, but decreasing order of performance):

// constant (fast, but useless)
function hashCode1(obj) { return 0; }

// counts the number of properties
function hashCode2(obj) {
    var contagem = 0;
    for ( var p in obj )
        contagem++;
    return contagem;
}

// takes only the property keys into account
// (hashString is assumed to be a string hash, such as the loop in the question's code)
function hashCode3(obj) {
    var hash = 0;
    for ( var p in obj )
        hash += hashString(p);
    return hash;
}

// takes the keys and the values into account, down to the given depth (profundidade)
function hashCode4(obj, profundidade) {
    if ( !profundidade ) profundidade = 0;
    var hash = 0;
    for ( var p in obj )
        hash += hashString(p) +
                (profundidade > 0 ? hashCode4(obj[p], profundidade-1) : 0);
    return hash;
}

And so on... The deeper you go into the object, the lower the chance of collisions (i.e. different objects with the same hash), but the longer the hash takes to compute (i.e. the lower the performance). It is up to you to determine, for your particular data set, the best balance between precision and computation speed, so that the algorithm that uses the hash performs as well as possible.

(Note: hashCode4 was simplified. In practice, the recursive call would also need to check the type of each value and call the most appropriate hash method for that type.)
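One possible shape for that type check is sketched below. It is an illustration under assumptions, not the answer's actual implementation: the name hashValue, the per-type rules (numbers truncated to 32-bit integers, booleans mapped to 0 or 1, null, undefined and functions mapped to 0) and the "| 0" truncation are all choices made for this example.

```javascript
// String hash in the style of the question's loop (assumed helper).
function hashString(s) {
    var hash = 0;
    for (var i = 0; i < s.length; ++i) {
        hash = (((hash << 5) - hash) + s.charCodeAt(i)) | 0;
    }
    return hash;
}

// Hypothetical type-aware variant of hashCode4: picks a hash per value type
// and only descends into property values while depth > 0.
function hashValue(value, depth) {
    switch (typeof value) {
        case "number":
            return value | 0;              // assumption: integer part is enough
        case "boolean":
            return value ? 1 : 0;
        case "string":
            return hashString(value);
        case "object":
            if (value === null) return 0;
            var hash = 0;
            for (var p in value) {
                // Keys always count; values count only while depth remains.
                hash = (hash + hashString(p) +
                        (depth > 0 ? hashValue(value[p], depth - 1) : 0)) | 0;
            }
            return hash;
        default:
            return 0;                      // undefined, functions, symbols: ignored
    }
}

console.log(hashValue({ a: 1 }, 0), hashValue({ a: 2 }, 0)); // 97 97
console.log(hashValue({ a: 1 }, 1), hashValue({ a: 2 }, 1)); // 98 99
```

With depth 0 only the keys contribute, so the two objects collide; with depth 1 the values are included and the hashes differ. Because the per-property hashes are summed, the result does not depend on property order.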

Update: in a comment on your answer to the related question, I stated that making a hash from the JSON of an object was "terribly inefficient". I would like to clarify this statement, based on the above analysis:

In practice, it is rarely possible to use a hash with 100% accuracy (i.e. one that takes the complete structure of the object into account) and still get acceptable performance when computing it. If the objects are too large, whether because they have many properties (nested or not) or because they contain a large amount of data, the overall performance will be poor regardless of the method you choose.

What makes "serializing an object to JSON and hashing the string" inefficient is not the serialization itself, but the fact that you are aiming for 100% accuracy. If that is really necessary, stick with your implementation, because (as your tests showed) it is actually more efficient than my methods 3 and 4 above.

I will leave just one more example, in case this is still unclear. How would you put the following strings into a hash table?

"123blábláblábláblá...10 MB later...bláblá"
"345blábláblábláblá...10 MB later...bláblá"
"678blábláblábláblá...10 MB later...bláblá"

Using the full string content in the hash?

Using only the first 256 bytes and ignoring the rest?

(And if your only options were to use the full content, or to use no hash at all and rely on a tree-based dictionary/mapping instead, which one would you choose?)
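The second option can be sketched as follows. This is only an illustration: the name hashPrefix and its limit parameter are assumptions, and the limit counts UTF-16 code units (what charCodeAt reads) rather than bytes.

```javascript
// Hypothetical prefix hash: same loop as the question's string hash, but it
// stops after `limit` characters, so the cost no longer grows with the string.
function hashPrefix(s, limit) {
    var n = Math.min(s.length, limit);
    var hash = 0;
    for (var i = 0; i < n; ++i) {
        hash = (((hash << 5) - hash) + s.charCodeAt(i)) | 0;
    }
    return hash;
}

// Stand-ins for the large strings above (much shorter, same idea).
var filler = "blá".repeat(1000);
var s1 = "123" + filler;
var s2 = "345" + filler;
var s3 = "123" + filler + "!"; // differs from s1 only after the prefix

console.log(hashPrefix(s1, 256) === hashPrefix(s2, 256)); // false
console.log(hashPrefix(s1, 256) === hashPrefix(s3, 256)); // true (collision)
```

Strings that differ early are told apart at a fixed cost, while strings that differ only past the limit collide and have to be separated by the equality check. Whether that trade is acceptable depends on where your data actually varies.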