MessageDigest class and hashing with MD5 in Java

4

I'm testing hash creation with this class. On many sites, including the English Stack Overflow (SOen), I have seen snippets like the one below used to create MD5 hashes, but without much explanation of how they work (in Portuguese, at least):

    String s = "teste1234";
    MessageDigest m = MessageDigest.getInstance("MD5");
    m.update(s.getBytes("UTF-8"), 0, s.length());
    System.out.println("MD5: " + new BigInteger(1, m.digest()).toString(16));

My question is: what happens in this code up to the point where the hash is created?

    
asked by anonymous 03.03.2016 / 15:28

1 answer

5

The MessageDigest class provides hashing.

The term digest refers to a kind of "summary" of the data, which is nothing more than a hash: a relatively small byte sequence whose size does not depend on the size of the original data.

The line:

    MessageDigest m = MessageDigest.getInstance("MD5");

This retrieves an instance that will use the MD5 algorithm through the factory method getInstance. It is analogous to other APIs such as Calendar.getInstance(), for example, where different types of calendars can be returned.

The algorithms that are supported by Java on all platforms are:

  • MD5
  • SHA-1
  • SHA-256
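
As a minimal sketch of how the factory method behaves, the class below asks getInstance for each of the three names and reports the size of the digest it produces. The class name DigestAlgorithms, the helper digestSize and the made-up name "NO-SUCH-ALGORITHM" are only illustrative and are not part of the original code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, used only for illustration.
class DigestAlgorithms {

    // Returns the digest size in bytes, or -1 if the algorithm is not available.
    static int digestSize(String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            // Even an empty input produces a full-size digest.
            return md.digest(new byte[0]).length;
        } catch (NoSuchAlgorithmException e) {
            // getInstance throws this when no installed provider knows the name.
            return -1;
        }
    }

    public static void main(String[] args) {
        String[] names = {"MD5", "SHA-1", "SHA-256", "NO-SUCH-ALGORITHM"};
        for (String name : names) {
            System.out.println(name + " -> " + digestSize(name));
        }
    }
}
```

The sizes come out as 16, 20 and 32 bytes for MD5, SHA-1 and SHA-256, whatever the size of the input.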

Now that we have the algorithm set, let's go to the next line:

    m.update(s.getBytes("UTF-8"), 0, s.length());

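For this string, which contains only ASCII characters, s.length() equals the number of UTF-8 bytes, so the call is the same as the one below (with accented characters the byte array is longer than s.length(), and the three-argument form would hash only part of it):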

    m.update(s.getBytes("UTF-8"));

Here, the update method defines the message to be summarized, i.e. the content to which the hash will be applied.
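
One detail of the three-argument call is worth checking: its last argument is a number of bytes, while s.length() counts characters. The sketch below compares both forms for an ASCII string and for an accented one. The class and method names are hypothetical, and StandardCharsets.UTF_8 is used instead of the "UTF-8" string only to avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, used only for illustration.
class UpdateLengthPitfall {

    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Mirrors the question: the length argument is the number of chars, not bytes.
    static byte[] hashWithCharCount(String s) {
        MessageDigest m = md5();
        m.update(s.getBytes(StandardCharsets.UTF_8), 0, s.length());
        return m.digest();
    }

    // Hashes the whole UTF-8 byte array.
    static byte[] hashAllBytes(String s) {
        MessageDigest m = md5();
        m.update(s.getBytes(StandardCharsets.UTF_8));
        return m.digest();
    }

    static boolean sameHash(String s) {
        return Arrays.equals(hashWithCharCount(s), hashAllBytes(s));
    }

    public static void main(String[] args) {
        // ASCII only: 9 chars and 9 bytes, so both forms agree.
        System.out.println(sameHash("teste1234"));      // true
        // "ação" written with escapes: 4 chars but 6 UTF-8 bytes.
        System.out.println(sameHash("a\u00e7\u00e3o")); // false
    }
}
```

For text that may contain non-ASCII characters, the single-argument update (or passing the byte array's own length) avoids the mismatch.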

You can call this method multiple times to compose a longer message, so you can process content larger than the available memory.
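
A small sketch of this incremental use, assuming the data is already in a byte array for simplicity (with a real file or stream, each chunk would be read into a buffer instead). The names IncrementalUpdate, hashInChunks and hashAtOnce are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, used only for illustration.
class IncrementalUpdate {

    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Feeds the data to the digest in pieces of at most chunkSize bytes.
    // chunkSize must be greater than zero.
    static byte[] hashInChunks(byte[] data, int chunkSize) {
        MessageDigest m = md5();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset);
            m.update(data, offset, length);
        }
        return m.digest();
    }

    // Hashes everything in a single call, for comparison.
    static byte[] hashAtOnce(byte[] data) {
        return md5().digest(data);
    }

    public static void main(String[] args) {
        byte[] data = "teste1234".getBytes(StandardCharsets.UTF_8);
        // Several update calls give the same hash as one call with all the data.
        System.out.println(Arrays.equals(hashInChunks(data, 4), hashAtOnce(data))); // true
    }
}
```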

Now that we have the algorithm and the content to be processed, let's go to an excerpt from the last line:

    m.digest()

The digest method completes the processing and, in this case, returns the MD5 hash of the message. The instance is then reset to its initial state, ready to receive new content and generate a new hash.
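
The reset can be observed with a short sketch: one instance hashes two messages in a row, and the second result matches what a brand-new instance would produce. The class and method names here are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, used only for illustration.
class DigestReset {

    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Uses ONE instance for two messages and returns both digests.
    static byte[][] hashBothWithOneInstance(byte[] first, byte[] second) {
        MessageDigest m = md5();
        m.update(first);
        byte[] a = m.digest(); // completes the hash and resets m
        m.update(second);
        byte[] b = m.digest(); // depends only on 'second'
        return new byte[][] {a, b};
    }

    // Hashes the data with a new instance, for comparison.
    static byte[] hashWithFreshInstance(byte[] data) {
        return md5().digest(data);
    }

    public static void main(String[] args) {
        byte[] first = {1, 2, 3};
        byte[] second = {4, 5};
        byte[][] both = hashBothWithOneInstance(first, second);
        System.out.println(Arrays.equals(both[1], hashWithFreshInstance(second))); // true
    }
}
```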

digest returns a sequence of bytes that represents a large number, so the implementation above uses a BigInteger constructor to convert the bytes to a number. This constructor takes two parameters:

  • signum : the sign of the number, that is, whether it is positive or negative. The value 1 treats the number as positive.
  • magnitude : the number itself. When we talk about bytes, it is easy to forget that everything in computing is represented numerically in binary. This constructor reads the byte sequence as a raw big-endian value, ignoring any signed representation format such as two's complement, which is why the sign has to be passed as a separate parameter.

Next, the toString(radix) method is called with the value 16, which converts the number to hexadecimal (base 16) text.
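
To see why the signum argument matters, this sketch converts the same single byte with and without it. The class name SignumDemo and its two helpers are illustrative only.

```java
import java.math.BigInteger;

// Hypothetical demo class, used only for illustration.
class SignumDemo {

    // One-argument constructor: the bytes are read as a two's-complement number.
    static String hexSigned(byte[] bytes) {
        return new BigInteger(bytes).toString(16);
    }

    // Two-argument constructor with signum 1: the bytes are read as a positive magnitude.
    static String hexUnsigned(byte[] bytes) {
        return new BigInteger(1, bytes).toString(16);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xff};
        System.out.println(hexSigned(bytes));   // -1
        System.out.println(hexUnsigned(bytes)); // ff
    }
}
```

Without signum, any hash whose first byte has its highest bit set would be printed as a negative number.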

The original snippet could be rewritten in a way that, in my opinion, is clearer:

    String message = "teste1234";
    byte[] hash = MessageDigest.getInstance("MD5").digest(message.getBytes("UTF-8"));
    System.out.println("MD5: " + new BigInteger(1, hash).toString(16));
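
One caveat with the BigInteger conversion: toString(16) does not print leading zeros, so a hash that starts with a zero digit comes out shorter than 32 characters. A possible variation, shown here only as a sketch with hypothetical names (Md5Hex, unpadded, padded), pads the result with String.format.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, used only for illustration.
class Md5Hex {

    static byte[] md5(String message) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Same conversion as in the snippet above: leading zeros are not printed.
    static String unpadded(String message) {
        return new BigInteger(1, md5(message)).toString(16);
    }

    // Pads with zeros on the left to the full 32 hex digits of an MD5 hash.
    static String padded(String message) {
        return String.format("%032x", new BigInteger(1, md5(message)));
    }

    public static void main(String[] args) {
        System.out.println(unpadded("a")); // cc175b9c0f1b6a831c399e269772661 (31 digits)
        System.out.println(padded("a"));   // 0cc175b9c0f1b6a831c399e269772661
    }
}
```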
    

Working code on IdeOne

        
04.03.2016 / 01:05