What is the problem with these hash algorithms?


I searched for hash algorithms and found some examples on the English Stack Overflow (SOen), but they return different hashes for the same file:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ObterHash {

    private static String algoritmo = "SHA-256";

    public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
        File arq = new File("C:\\img.jpg");
        System.out.println(toHex(gerarHash1(arq)));
        System.out.println(toHex(gerarHash2(arq)));
        System.out.println(toHex(gerarHash3(arq)));
    }

    // Adapted from: https://stackoverflow.com/a/19304310/7711016
    public static byte[] gerarHash1(File arq) throws NoSuchAlgorithmException, IOException {
        DigestInputStream shaStream = new DigestInputStream(new FileInputStream(arq),
                MessageDigest.getInstance(algoritmo));
        // VERY IMPORTANT: read from final stream since it's FilterInputStream
        byte[] shaDigest = shaStream.getMessageDigest().digest();
        shaStream.close();
        return shaDigest;
    }

    // Adapted from: https://stackoverflow.com/a/26231444/7711016
    public static byte[] gerarHash2(File arq) throws IOException, NoSuchAlgorithmException {
        byte[] b = Files.readAllBytes(arq.toPath());
        byte[] hash = MessageDigest.getInstance(algoritmo).digest(b);
        return hash;
    }

    // Adapted from: https://stackoverflow.com/a/304275/7711016
    public static byte[] gerarHash3(File arq) throws NoSuchAlgorithmException, IOException {
        InputStream fis = new FileInputStream(arq);

        byte[] buffer = new byte[1024];
        MessageDigest complete = MessageDigest.getInstance(algoritmo);
        int numRead;

        do {
            numRead = fis.read(buffer);
            if (numRead > 0) {
                complete.update(buffer, 0, numRead);
            }
        } while (numRead != -1);

        fis.close();
        return complete.digest();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder ret = new StringBuilder();
        for (int i = 0; i < bytes.length; ++i) {
            ret.append(String.format("%02X", (bytes[i] & 0xFF)));
        }
        return ret.toString();
    }
}

When I ran it, I got this output (one hash per line):

  

E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
010F60D2927A35D0235490136EF9F4953B7EE453073794BCAF153D20A64544EA
010F60D2927A35D0235490136EF9F4953B7EE453073794BCAF153D20A64544EA

Note that the hashes generated by gerarHash2() and gerarHash3() are equal to each other but different from the one generated by gerarHash1(). Why? Is the gerarHash1() algorithm wrong? If so, what is the error in it?

    
asked by anonymous 02.02.2018 / 21:56

1 answer


Let's look at the constructor code for DigestInputStream:

    /**
     * Creates a digest input stream, using the specified input stream
     * and message digest.
     *
     * @param stream the input stream.
     *
     * @param digest the message digest to associate with this stream.
     */
    public DigestInputStream(InputStream stream, MessageDigest digest) {
        super(stream);
        setMessageDigest(digest);
    }

It calls the superclass constructor:

    /**
     * Creates a <code>FilterInputStream</code>
     * by assigning the  argument <code>in</code>
     * to the field <code>this.in</code> so as
     * to remember it for later use.
     *
     * @param   in   the underlying input stream, or <code>null</code> if
     *          this instance is to be created without an underlying stream.
     */
    protected FilterInputStream(InputStream in) {
        this.in = in;
    }

And also calls the setter:

    /**
     * Associates the specified message digest with this stream.
     *
     * @param digest the message digest to be associated with this stream.
     * @see #getMessageDigest()
     */
    public void setMessageDigest(MessageDigest digest) {
        this.digest = digest;
    }

Then you call the getMessageDigest() method:

    /**
     * Returns the message digest associated with this stream.
     *
     * @return the message digest associated with this stream.
     * @see #setMessageDigest(java.security.MessageDigest)
     */
    public MessageDigest getMessageDigest() {
        return digest;
    }

Notice that nowhere is anything read from the DigestInputStream, so it never reads the bytes of the FileInputStream passed to it. When you call the digest() method, the MessageDigest has not seen any of the file's contents and is still empty. The hash generated is therefore the hash of empty input, the same one this line produces:

System.out.println(toHex(MessageDigest.getInstance(algoritmo).digest()));
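
The effect can be reproduced without a file. The following is only a minimal sketch: it assumes in-memory data (a ByteArrayInputStream) in place of the FileInputStream, and the class and method names (EmptyDigestDemo, hashWithoutReading, hashAfterReading) are hypothetical, chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EmptyDigestDemo {

    // Wraps the data in a DigestInputStream but never reads from it,
    // so the MessageDigest is never updated with any byte.
    public static byte[] hashWithoutReading(byte[] data)
            throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream in = new DigestInputStream(
                new ByteArrayInputStream(data), MessageDigest.getInstance("SHA-256"))) {
            return in.getMessageDigest().digest();
        }
    }

    // Drains the stream first, so every byte passes through the digest.
    public static byte[] hashAfterReading(byte[] data)
            throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream in = new DigestInputStream(
                new ByteArrayInputStream(data), MessageDigest.getInstance("SHA-256"))) {
            byte[] buffer = new byte[1024];
            while (in.read(buffer) != -1) {
                // Nothing to do: reading is what feeds the digest.
            }
            return in.getMessageDigest().digest();
        }
    }

    public static String toHex(byte[] bytes) {
        StringBuilder ret = new StringBuilder();
        for (byte b : bytes) {
            ret.append(String.format("%02X", b & 0xFF));
        }
        return ret.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "some file content".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(MessageDigest.getInstance("SHA-256").digest()));
        System.out.println(toHex(hashWithoutReading(data)));
        System.out.println(toHex(hashAfterReading(data)));
    }
}
```

The first two lines printed should both be E3B0C442...7852B855, the value gerarHash1() returned in the question, while the third depends on the data.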

What went wrong? Note this comment:

        // VERY IMPORTANT: read from final stream since it's FilterInputStream

The comment only says that you must read from the last stream produced. That made sense in the SOen answer, where the streams are wrapped in one another consecutively, which is not your case. So just replace this comment with:

shaStream.readAllBytes();

With this change, the generated hash matches that of the other two methods. Your gerarHash1 method can then be rewritten like this:

    // Adapted from: https://stackoverflow.com/a/19304310/7711016
    public static byte[] gerarHash1(File arq) throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream shaStream = new DigestInputStream(new FileInputStream(arq),
                MessageDigest.getInstance(algoritmo))) {
            shaStream.readAllBytes();
            return shaStream.getMessageDigest().digest();
        }
    }

Note that I am using try-with-resources here. I recommend using try-with-resources in the gerarHash3 method as well. I also recommend adding the final modifier to the algoritmo field and renaming it to ALGORITMO, to follow the language's conventions.
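
Applied to gerarHash3, those recommendations might look like the sketch below. It is one possible shape, not the only one; the class name ObterHash3 and the temporary file used in main are assumptions made so the example runs on its own.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ObterHash3 {

    // Constant: final and upper case, per the Java naming conventions.
    private static final String ALGORITMO = "SHA-256";

    public static byte[] gerarHash3(File arq) throws NoSuchAlgorithmException, IOException {
        MessageDigest complete = MessageDigest.getInstance(ALGORITMO);
        // try-with-resources closes the stream even if read() throws.
        try (InputStream fis = new FileInputStream(arq)) {
            byte[] buffer = new byte[1024];
            int numRead;
            while ((numRead = fis.read(buffer)) != -1) {
                complete.update(buffer, 0, numRead);
            }
        }
        return complete.digest();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical temporary file, used only to make the sketch runnable.
        File arq = File.createTempFile("hash-demo", ".bin");
        try {
            Files.write(arq.toPath(), new byte[] {1, 2, 3, 4, 5});
            byte[] esperado = MessageDigest.getInstance(ALGORITMO)
                    .digest(Files.readAllBytes(arq.toPath()));
            System.out.println(Arrays.equals(esperado, gerarHash3(arq)));
        } finally {
            arq.delete();
        }
    }
}
```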

Note: The readAllBytes() method is only available from Java 9 onward. In earlier versions, you will need to use something else in its place (probably a loop) to reproduce its behavior.
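
For Java 8 and earlier, one way to do this is to drain the DigestInputStream with a read loop and discard the bytes. The following is a sketch under that assumption; the class name ObterHashJava8, the 8192-byte buffer size and the temporary file in main are illustrative choices, not part of the original code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ObterHashJava8 {

    private static final String ALGORITMO = "SHA-256";

    public static byte[] gerarHash1(File arq) throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream shaStream = new DigestInputStream(new FileInputStream(arq),
                MessageDigest.getInstance(ALGORITMO))) {
            byte[] buffer = new byte[8192];
            // Stands in for readAllBytes(): each read() updates the digest,
            // and the bytes themselves are simply discarded.
            while (shaStream.read(buffer) != -1) {
                // Intentionally empty.
            }
            return shaStream.getMessageDigest().digest();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical temporary file, used only to make the sketch runnable.
        File arq = File.createTempFile("hash-demo", ".bin");
        try {
            Files.write(arq.toPath(), new byte[] {1, 2, 3, 4, 5});
            byte[] esperado = MessageDigest.getInstance(ALGORITMO)
                    .digest(Files.readAllBytes(arq.toPath()));
            System.out.println(Arrays.equals(esperado, gerarHash1(arq)));
        } finally {
            arq.delete();
        }
    }
}
```

Unlike readAllBytes(), this loop never holds the whole file in memory, which may matter for large files.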

    
02.02.2018 / 22:37