Compare the hashes of two files in Python

1

I need to compare the hashes of the files a.txt and b.txt using a native Python 3 library.

I tried to do this:

import hashlib

file_0 = 'a.txt'
file_1 = 'b.txt'

hash_0 = hashlib.new('ripemd160')
hash_0.update(file_0)
hash_1 = hashlib.new('ripemd160')
hash_1.update(file_1)

assert hash_0 != hash_1, f'The file {file_0} is different from the file {file_1}'

But the following error occurred:

TypeError: Unicode-objects must be encoded before hashing

Note: before this edit, I had mistakenly named both variables file_0. In the actual project the names were correct.

    
asked by anonymous 04.01.2019 / 14:45

2 answers

7

First, you are hashing the file names, not the files. Calling update on a hashlib object does not open a file by itself; it expects a bytes object, which is the reason for the error message, since you are passing text. Even if you add the b prefix to the name strings, or call .encode() on them, you will still only get the hash of the file name. (Another error: you used the same variable name twice, so even if you had been opening the files, you would have been comparing "b.txt" with itself.)
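As a minimal sketch of that point, the snippet below shows both behaviours: passing a str raises TypeError, and encoding the name only hashes the name. It uses sha256 instead of the question's ripemd160 purely as an assumption for portability, since sha256 is always present in hashlib while ripemd160 depends on the OpenSSL build.

```python
import hashlib

name = 'a.txt'

# Passing a str raises TypeError: hashlib only accepts bytes-like objects.
try:
    hashlib.sha256().update(name)
    str_accepted = True
except TypeError:
    str_accepted = False

# Encoding the name makes the call succeed, but the digest is that of the
# five characters "a.txt", not of whatever the file contains.
name_digest = hashlib.sha256(name.encode('utf-8')).hexdigest()
same_as_literal = name_digest == hashlib.sha256(b'a.txt').hexdigest()
```

No file is opened anywhere in this snippet, which is exactly why the result says nothing about the file's contents.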

To hash the contents of the files, do:

import hashlib

file_0 = 'a.txt'
file_1 = 'b.txt'

# Open in binary read mode ('rb') so update() receives the file's bytes.
hash_0 = hashlib.new('ripemd160')
with open(file_0, 'rb') as f:
    hash_0.update(f.read())

hash_1 = hashlib.new('ripemd160')
with open(file_1, 'rb') as f:
    hash_1.update(f.read())

if hash_0.digest() != hash_1.digest():
    print(f'The file {file_0} is different from the file {file_1}')
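For files too large to read into memory at once, the same idea can be sketched with chunked reads. The helper names file_digest and same_content, the 64 KiB chunk size, and the sha256 default are all assumptions made for this illustration, not part of the original answer.

```python
import hashlib

def file_digest(path, algorithm='sha256', chunk_size=65536):
    """Return the hex digest of a file's contents, read in binary chunks."""
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # iter() with a sentinel stops when read() returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b):
    """Return True when both files hash to the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

With these helpers, same_content('a.txt', 'b.txt') would replace the two explicit update calls.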

(You were also misusing assert. Avoid assert in application code and reserve it for tests. Although in some places it looks like shorthand for an if followed by a raise, it is a check that is disabled depending on the options the Python runtime is started with, for example -O. Many developers put assert in production code and later run into hard-to-diagnose problems, because the check is silently skipped after a seemingly innocuous configuration change made elsewhere.)
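A minimal sketch of the explicit alternative follows. The exception class FilesDifferError and the function require_equal_digests are hypothetical names introduced only for this example.

```python
class FilesDifferError(Exception):
    """Hypothetical exception type, introduced only for this sketch."""

def require_equal_digests(digest_a, digest_b, name_a, name_b):
    # An explicit if/raise still runs under `python -O`,
    # whereas an assert statement would be stripped.
    if digest_a != digest_b:
        raise FilesDifferError(
            f'The file {name_a} is different from the file {name_b}'
        )
```

Called as require_equal_digests(hash_0.digest(), hash_1.digest(), file_0, file_1), it returns silently when the digests match and raises otherwise, regardless of interpreter flags.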

    
04.01.2019 / 15:10
5

A comparison using hashlib can be done like this:

import hashlib

def open_txt(file):
    # Read the whole file as text; f.read() already returns a single string.
    with open(file) as f:
        return f.read()

file_1 = 'a.txt'
file_2 = 'b.txt'
text_1 = open_txt(file_1)
text_2 = open_txt(file_2)

def compare(text_1, text_2):
    hash_1 = hashlib.md5()
    hash_1.update(text_1.encode('utf-8'))
    hash_1 = hash_1.hexdigest()

    hash_2 = hashlib.md5()
    hash_2.update(text_2.encode('utf-8'))
    hash_2 = hash_2.hexdigest()

    assert hash_1 == hash_2, f'The file {file_1} is different from the file {file_2}'


compare(text_1, text_2)

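But I find it more practical and faster to use filecmp, which can compare the files byte by byte.

import filecmp
# shallow=False compares the contents, not just the os.stat() metadata
filecmp.cmp('a.txt', 'b.txt', shallow=False)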
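To see what filecmp.cmp returns, here is a small self-contained sketch that builds throwaway files in a temporary directory. The file names and contents are made up for the example. Note that with the default shallow=True, files whose os.stat() signatures (type, size, modification time) match are taken as equal without their contents being read, so shallow=False is passed here.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, 'a.txt')
    path_b = os.path.join(tmp, 'b.txt')
    path_c = os.path.join(tmp, 'c.txt')
    # a.txt and b.txt share contents; c.txt has the same size but differs.
    for path, data in ((path_a, b'same'), (path_b, b'same'), (path_c, b'diff')):
        with open(path, 'wb') as f:
            f.write(data)

    # shallow=False forces a comparison of the contents rather than
    # trusting matching os.stat() signatures.
    equal_ab = filecmp.cmp(path_a, path_b, shallow=False)
    equal_ac = filecmp.cmp(path_a, path_c, shallow=False)

print(equal_ab, equal_ac)  # prints: True False
```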
    
04.01.2019 / 15:45