Answering the question of why the number of attempts to decrypt is half the number of possibilities, suppose you assemble a giant deck, where one decryption key is written on one side of each card, and the other is written "Yes" or "no" depending on whether the key is the right key or not.
If you have N cards and you know that the opponent has chosen the key fairly randomly, the winning card will have equal probability in each of the N positions (i.e. 1 / N). If you number cards from the top to bottom of the deck from 1 to N, the number of attempts you have to make to find the right key is the number of that card: if the key is the 17th card in the pack, you will make 17 attempts until you find the right key.
But we do not know what the right letter is; we want to know what the average number of attempts we have to make to find the right letter. With probability 1 / N this number is 1 (when the right key is the first card in the pack); with probability 1 / N this number is 2 (when the right key is the second card in the deck); ...; with probability 1 / N this number is N (when the right key is the last card in the pack).
So the average number of attempts you have to make is
1/N * 1 + 1/N * 2 + 1/N * 3 + … + 1/N * (N-1) + 1/N * N =
= 1/N (1 + 2 + 3 + … + (N-1) + N) =
= 1/N * N * (N+1) / 2 =
= (N+1) / 2
For these algorithms that we are talking about, N is huge; this +1 is one in billions. So it is a very good approximation to say that the average number of attempts is N / 2.
The whole discussion above assumes that the password is indeed random, but the fact is that in practice things do not work that way - people choose birthday dates, phone numbers, ... as passwords.
To help protect users in these cases, it's popular to use a key stretching function , such as bcrypt , scrypt or PBKDF2: these functions are complicated hash functions (like any hash function) and faces (unlike MD5 and SHA - *).
The idea is that you generate a random string s
and use f(s, x)
to encrypt
the file (where x
is the password that the user will decorate); you save% with% open, along with the encrypted file. The idea is that if you know eg that the user has chosen s
as an eight-digit number, if you encrypt the file naively, your opponent only needs to make, on average, 5 * 10 ^ 7 decrypts to open the file your file.
When you have a key stretching function , you place your opponent against the wall: calculating x
is very costly; he continues to open his file after 5 * 10 ^ 7 attempts, but each of these attempts is much more costly (usually this cost is adjustable, but for "civil" applications the crowd usually adjusts to f(s, x)
to take a second to be calculated on a modern computer).
The other option is to ignore the existence of f(s, x)
and try to find the encryption key directly, but this is impractical if you are using an algorithm with a 128-bit key, for example .