The encode() function and creating hashes

I'm using Python 3.6 to write a program in which the user types an MD5 hash. The program stores the hash in a variable, reads a txt file, and puts its contents into a list, where each name separated by a comma is one item of the list.

After that, the program enters a loop in which it encrypts each list item (from the txt file) and compares the result with the hash that was entered. If the comparison is True, it has found the word behind the hash.

Here is the code:

passmd5 = input("Enter the MD5 hash: ")  # read the desired hash

lista = open('worldlist.txt', "r")  # open the txt file
worldlist = lista.read()  # read the entire content as one string
worldlist = worldlist.split(", ")  # split the string into words separated by ', '
descripto = hashlib.md5()  # variable that will be used to encrypt each list item

for item in worldlist:  # loop over each item of the list
    descripto.update(item.encode('utf-8'))  # without encode, Python raises the following error: Unicode-objects must be encoded before hashing
    if descripto.hexdigest() == passmd5:  # check whether the encrypted item equals the given hash; if so, the word has been found
        print("-----------------------------------")
        print("Your MD5 hash: ", passmd5)
        print("Decrypted hash: ", item)
    print(descripto.hexdigest())
    print(item)

I use the two prints at the end to inspect the output, since the if comparison is not working.

I noticed that print(item) outputs the worldlist item correctly, but print(item.encode("utf-8")) adds a b in front of the item, like this: b'fulano'. So I guess that's why the comparison never works: it compares fulano with b'fulano'. (Encrypted, of course!)

I would like to know if anyone can help me make it work and also give me some tips on the code, because I am still learning.

    
asked by anonymous 29.12.2018 / 01:15

2 answers

Judging by your code, I'm assuming you're using the hashlib module.

The problem is that the update method, according to the documentation, is cumulative: calling update(a) and then update(b) is equivalent to calling update(a+b). For example:

import hashlib

md5 = hashlib.md5()
md5.update(b'a')
print(md5.hexdigest())  # hash of 'a'
md5.update(b'b')
print(md5.hexdigest())  # hash of 'ab'

First I call update with a, and then with b. Since update calls are cumulative, the end result is the hash of ab. The output of this code is:

0cc175b9c0f1b6a831c399e269772661
187ef4436122d1cc2f40dc2b92f0eba0

The first is the hash of a, and the second is the hash of ab. Calling update with a and then with b is the same as making a single call with ab:

md5 = hashlib.md5()
md5.update(b'ab')
print(md5.hexdigest())

This code also prints 187ef4436122d1cc2f40dc2b92f0eba0.

Just for comparison, here is the hash of b alone:

md5 = hashlib.md5()
md5.update(b'b')
print(md5.hexdigest())

The result is 92eb5ffee6ae2fec3ad71c777531578f.

Because you created your descripto variable outside the loop, the update calls accumulate, so the hash being calculated is not that of each word, but of all the words seen so far concatenated together.

In the first iteration of the loop, update is called with the first word. In the second iteration, update is called with the second word, but since this method is cumulative, the resulting hash is that of the first word concatenated with the second one. And so on.

The solution is to rebuild the object at each iteration:

for item in worldlist:
    descripto = hashlib.md5()  # create a new md5
    descripto.update(item.encode('utf-8'))
    ...

You can see the difference in this example:

words = ['teste', 'teste', 'teste']
# create the md5 outside the loop
md5 = hashlib.md5()
for item in words:
    md5.update(item.encode('utf-8'))
    print(md5.hexdigest())

I've created a list that contains the same word three times. So the result should be the same hash printed three times, right? Wrong:

698dc19d489c4e4db73e28a713eab07b
f6fd1939bdf31481d27ac4344a2aab58
1ceae7af21732ab80f454144a414f2fa

The first hash is that of teste. The second is the hash of testeteste, since update calls are cumulative. And the third is the hash of testetesteteste.

Creating a new md5 at each iteration of the loop yields the correct result:

for item in words:
    md5 = hashlib.md5()  # create a new md5 at each iteration
    md5.update(item.encode('utf-8'))
    print(md5.hexdigest())

Since I'm creating a new md5 at every iteration of the for loop, the update calls do not accumulate, and the result is the hash of teste printed three times:

698dc19d489c4e4db73e28a713eab07b
698dc19d489c4e4db73e28a713eab07b
698dc19d489c4e4db73e28a713eab07b
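
Putting this together, the lookup from the question could be sketched roughly as below. This is only a minimal sketch, not the asker's final program: the helper names md5_hex, parse_words and find_word are made up for illustration, and the word list is given inline instead of being read from worldlist.txt. It assumes the words are separated by commas, possibly with stray spaces or line breaks around them.

```python
import hashlib


def md5_hex(word):
    """Return the MD5 hex digest of a single word, using a fresh hash object."""
    # Passing the data to the constructor is equivalent to md5() followed by update().
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def parse_words(text):
    """Split comma-separated text into words, dropping surrounding whitespace."""
    return [w.strip() for w in text.split(",") if w.strip()]


def find_word(target_hash, words):
    """Return the first word whose MD5 matches target_hash, or None."""
    target = target_hash.strip().lower()  # hexdigest() always returns lowercase hex
    for word in words:
        if md5_hex(word) == target:
            return word
    return None


# Hypothetical inline word list standing in for the file contents.
words = parse_words("fulano, teste,\nbeltrano")
print(find_word("698dc19d489c4e4db73e28a713eab07b", words))  # teste
```

Stripping each word matters because a trailing space or line break left over from the file changes the hash, which would make the comparison fail even with a fresh md5 per word.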

As for the b'etc' syntax, @Sidon's answer already explains what it is.

Also note that hashing is not the same as encryption, and that MD5 is already considered an "obsolete" algorithm.
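
One way to see how a hash differs from encryption is that the digest always has the same length, no matter how large the input is. The small sketch below (the variable names are just for illustration) hashes one byte and one million bytes and gets 32 hexadecimal characters in both cases, so the original data cannot simply be "decrypted" back out of the digest. It can only be found by guessing candidates and comparing hashes, which is what the program in the question does.

```python
import hashlib

short = hashlib.md5(b"a").hexdigest()
long_ = hashlib.md5(b"a" * 1_000_000).hexdigest()

# Both digests are 32 hex characters, regardless of the input size.
print(len(short), len(long_))  # 32 32
```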

    
29.12.2018 / 17:31

Asking for "tips on the code" is vague, so I'll try to address your specific problem:

The b prefix indicates that the object is of type bytes; see the example below:

my_obj = b"abc123"
print(my_obj)
b'abc123'

print(type(my_obj))
<class 'bytes'>

To "convert" to string, you need to decode it, like this:

my_str = my_obj.decode("utf-8")
print(my_str)
abc123
print(type(my_str))
<class 'str'>

Since I don't have the full context (you would need to show the imports in your example), I believe you would have to decode after the decryption.

Perhaps the following examples will "clear up" even more:

item = "item1234"
print(item)
item1234

item = "item1234".encode("utf-8")
print(item)
b'item1234'

item = "item1234".encode("utf-8").decode("utf-8")
print(item)
item1234
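
To tie this back to the hashing in the question, here is a small sketch assuming the hashlib module is the one in use. It shows that the b is only part of how bytes are displayed, and that hexdigest() returns an ordinary str, so the comparison with the typed hash is between two strings and the b prefix never takes part in it.

```python
import hashlib

word = "fulano"
encoded = word.encode("utf-8")

print(repr(encoded))         # b'fulano' (the b is only how bytes are displayed)
print(encoded == b"fulano")  # True

digest = hashlib.md5(encoded).hexdigest()
print(type(digest))          # <class 'str'>
print(digest == hashlib.md5(b"fulano").hexdigest())  # True
```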

Tip: always try to ask the question with code that can be copied and reproduced as simply as possible by those who will try to help. For example, try copying the code you posted and running it in a Python terminal.

    
29.12.2018 / 16:19