Why does String's hashCode() in Java use 31 as a multiplier?

5

In Java, the hash code of a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using integer arithmetic, where s[i] is the i-th character of the string, n is the length of the string, and ^ indicates exponentiation.
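To make the formula concrete, here is a minimal sketch that evaluates it with Horner's rule and prints the result next to the built-in value. The class name PolyHash and the sample strings are made up for illustration; the only JDK method used is String.hashCode().

```java
// Hypothetical helper class; the name PolyHash is not part of the JDK.
class PolyHash {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using Horner's rule.
    // int overflow silently wraps around, which is the "integer arithmetic"
    // mentioned in the question.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "ab", "hello"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + hash(s)
                    + " (built-in: " + s.hashCode() + ")");
        }
    }
}
```

For every string the two numbers printed on a line should agree, since the loop computes the same polynomial as the formula above.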

Why is 31 used as a multiplier?

I understand that the multiplier should be a relatively large prime. So why not 29, or 37, or even 97?

asked by anonymous 02.01.2019 / 17:54

1 answer

4

Generally, the hash code is used as a key in hash tables, the structure behind so-called dictionaries. Hash codes are commonly stored in 32 bits, so it makes sense to start from 32, and the nearest prime below it is 31. Not every possible code has to be used: from the hash code you derive an index that fits the number of buckets in that particular hash table, which gives a good distribution.
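As a rough illustration of deriving a bucket index from a hash code, the sketch below reduces the code modulo the number of buckets. The class BucketIndex and the method bucketFor are hypothetical names, and this is only one simple scheme; real hash table implementations may map codes to buckets differently.

```java
// Hypothetical example class; not how any particular library does it.
class BucketIndex {
    // Maps a key to an index in [0, bucketCount).
    // Math.floorMod is used instead of % because hashCode() can be negative
    // and % would then produce a negative index.
    static int bucketFor(String key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        int buckets = 16; // assumed table size, chosen only for the demo
        for (String key : new String[] {"a", "hello", "polygenelubricants"}) {
            System.out.println(key + " -> bucket " + bucketFor(key, buckets));
        }
    }
}
```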

According to the comments, larger numbers are now considered better, but as far as I know that was the original reason for the choice. A smaller number could produce hash code collisions much more easily. A larger one really is better, but the gain is modest, whereas a smaller one is much worse.
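The effect of a small multiplier can be seen with a toy experiment: hash every two-letter lowercase string with a given multiplier and count how many distinct codes come out. Everything here (the class MultiplierCollisions, its methods, and the choice of input set) is an assumption made for the demonstration, and a toy input like this says nothing definitive about real-world data.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical experiment class; names and input set are illustrative only.
class MultiplierCollisions {
    // Same polynomial hash as String.hashCode(), but with a configurable multiplier.
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    // Counts distinct hash codes over all 26 * 26 = 676 strings "aa" .. "zz".
    static int distinctHashes(int multiplier) {
        Set<Integer> seen = new HashSet<>();
        for (char a = 'a'; a <= 'z'; a++) {
            for (char b = 'a'; b <= 'z'; b++) {
                seen.add(hash("" + a + b, multiplier));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int m : new int[] {2, 7, 31, 37}) {
            System.out.println("multiplier " + m + ": "
                    + distinctHashes(m) + " distinct codes out of 676");
        }
    }
}
```

The fewer distinct codes a multiplier yields, the more of the 676 strings collide with each other.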

On some platforms a shift is cheap for certain numbers and not for others, and in some cases certain numbers can be optimized. That is the case for 31: multiplying by it can be done with one shift and a simple subtraction, since 31 * i == (i << 5) - i.
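The shift-and-subtract trick can be checked directly. This is a small sketch with a made-up class name, ShiftMultiply; whether a given compiler or JIT actually applies this rewrite is not something the code demonstrates.

```java
// Hypothetical example class showing the identity 31 * h == (h << 5) - h.
class ShiftMultiply {
    // h << 5 is h * 32 (with wraparound), so subtracting h once leaves h * 31.
    static int times31(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, -1, 97, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : values) {
            System.out.println(h + ": 31*h = " + (31 * h)
                    + ", (h << 5) - h = " + times31(h));
        }
    }
}
```

Because Java int arithmetic wraps around on overflow, the two expressions agree even for values where h * 32 does not fit in 32 bits.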

It is not a carefully thought-out number. No thorough evaluation was done, and there is no sensational justification behind it :)

A comparison was made on the English-language Stack Overflow. It suggests that certain numbers perform about the same, but other factors need to be considered: the analysis cannot be taken in isolation, and it does not show the other problems each number has.

    
02.01.2019 / 17:59