I would say the correct answer is Wrong.
Two points to consider:
- Probability is not certainty, and
- Lossless compression is a process closely linked to two factors: noise and the ability of the compression algorithm to encode a pattern dictionary.
Let's assume that the 4-character alphabet is composed of the following symbols:
A (binary representation 00)
B (binary representation 01)
C (binary representation 10)
D (binary representation 11)
Simulation 1: A file containing 1024 [A] characters
The probability of occurrence of a file containing only one distinct character repeated N times is low (1/2^1024 in this simulation, since P(A) = 1/2), but it is not zero.
Its total size is 1024 × 2 = 2048 bits (256 bytes).
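Just to make the arithmetic above concrete, here is a quick sketch (the probability is far too small for a float, so exact fractions are used):

```python
from fractions import Fraction

# Simulation 1 file size: 1024 characters at 2 bits each.
size_bits = 1024 * 2                      # 2048 bits, i.e. 256 bytes
print(size_bits)

# Probability of the all-A file under the statement's distribution, P(A) = 1/2.
p_all_a = Fraction(1, 2) ** 1024          # exactly 1/2**1024: tiny, but not zero
print(p_all_a.denominator.bit_length())   # 1025 bits: the denominator is 2**1024
```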
Assume a compression algorithm that uses the following rule as an escape code:
Whenever the AD pair is found, check the next 2 characters:
- If they are AD: the decoded content is just a literal AD pair.
- If they are different from AD: repeat the first character after the escape code as many times as the decimal value expressed in the following 12 bits.
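In code, the rule might look like the minimal sketch below (the names `encode_run` and `decode`, and the bit-string representation, are mine; the answer only states the rule itself):

```python
# Toy codec for the escape-code rule above; a sketch, not a real compressor.
SYM = {'A': '00', 'B': '01', 'C': '10', 'D': '11'}
CHR = {v: k for k, v in SYM.items()}
ESCAPE = SYM['A'] + SYM['D']                     # the AD pair: '0011'

def encode_run(char, count):
    """Escape AD + 2-bit character to repeat + 12-bit repetition count."""
    assert 0 < count < 4096, "the count must fit in 12 bits"
    # NB: as stated, the rule is ambiguous for char 'A' with a count whose
    # top two bits are '11', since that collides with the literal-AD case.
    return ESCAPE + SYM[char] + format(count, '012b')

def decode(bits):
    out, i = [], 0
    while i < len(bits):
        if bits[i:i+4] == ESCAPE:
            if bits[i+4:i+8] == ESCAPE:          # AD followed by AD: literal AD
                out.append('AD'); i += 8
            else:                                # a run: character + 12-bit count
                out.append(CHR[bits[i+4:i+6]] * int(bits[i+6:i+18], 2))
                i += 18
        else:                                    # any other symbol is literal
            out.append(CHR[bits[i:i+2]]); i += 2
    return ''.join(out)
```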
The compressed representation of the Simulation 1 file would be as follows:
A  D  A  B  A  A  A  A  A
00 11 00 01 00 00 00 00 00
|  |  |  |_______________|_ Hex value &H400, or 1024 decimal
|  |  |____________________ Character to repeat (**A**)
|___|______________________ Escape code
Result of parsing this stream: the character A repeated 1024 times, a lossless reconstruction of the original file.
That is, you can represent the file with only 18 bits, or about 0.88% of the original size.
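Using the sketch above, the numbers check out:

```python
original = 'A' * 1024
packed = encode_run('A', 1024)                # '0011' + '00' + '010000000000'
assert decode(packed) == original             # lossless round trip
print(len(packed))                            # 18 bits
print(len(packed) / (2 * len(original)))      # ~0.0088, i.e. about 0.88%
```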
But suppose that, instead of:
[...] Consider, also, that the probability of occurrence of each
element is equal to 1/2, 1/4, 1/8 and 1/8 [...]
the statement read as follows:
[...] Consider, also, that the occurrence of each
element is equal to 1/2, 1/4, 1/8 and 1/8 [...]
That is no longer a probability; it is a fact that all the characters of the alphabet appear in the file. Let's move on to the next simulation:
Simulation 2: A file containing, in order: 512 [A], 256 [B], 128 [C], 128 [D]
Let's use the same algorithm as in the previous example:
A  D  A  A  C  A  A  A  A
00 11 00 00 10 00 00 00 00 = 512 A characters (&H200)
A  D  B  A  B  A  A  A  A
00 11 01 00 01 00 00 00 00 = 256 B characters (&H100)
A  D  C  A  A  C  A  A  A
00 11 10 00 00 10 00 00 00 = 128 C characters (&H080)
A  D  D  A  A  C  A  A  A
00 11 11 00 00 10 00 00 00 = 128 D characters (&H080)
Final representation:
ADAACAAAAADBABAAAAADCAACAAAADDAACAAA
Total: 36 characters, or 72 bits. Compression ratio: about 3.5% of the original size.
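Checking Simulation 2 with the same sketch:

```python
runs = [('A', 512), ('B', 256), ('C', 128), ('D', 128)]
packed = ''.join(encode_run(c, n) for c, n in runs)
assert decode(packed) == ''.join(c * n for c, n in runs)   # lossless round trip
print(len(packed))               # 72 bits (36 characters of 2 bits each)
print(len(packed) / 2048)        # ~0.035, i.e. about 3.5% of the original
```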
Considerations
In both cases I used an algorithm with a trivial dictionary of length N = 1 (repetition of a single character), and files with the least possible noise (pattern variation). For example, a file containing the ABCD sequence repeated 256 times would generate an output nine times larger than the original file, since every 2-bit character would cost an 18-bit run.
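The expansion in the noisy case is easy to reproduce with the sketch above, since each character becomes a run of length 1:

```python
noisy = 'ABCD' * 256                               # maximum noise for this codec
packed = ''.join(encode_run(c, 1) for c in noisy)  # one 18-bit run per character
assert decode(packed) == noisy
print(len(packed) / (2 * len(noisy)))              # 9.0: nine times the original
```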
It is a terrible algorithm for any real application, but it serves as a proof of concept that, given the characteristics of the problem, in ideal situations it is possible to compress the file to well below 80% of its original size.
-x -
Note the end of the statement:
[...] you will not be able to compress this file without loss with a
compression rate of 80%. [...]
I'm assuming the statement actually means 'with a compression ratio of less than 80%'. If the rate must be exactly 80%, the algorithm can be modified to generate padding at the end of the content, for example.
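One hypothetical way to do that padding, reusing `encode_run` from the sketch above (the 16-bit length header is my own addition; the statement specifies no mechanism): prefix the payload with its real bit length, then zero-pad to the target size.

```python
def pad_to(payload, target_bits):
    """Prefix a 16-bit length header, then zero-pad to exactly target_bits."""
    assert 16 + len(payload) <= target_bits
    header = format(len(payload), '016b')
    return header + payload + '0' * (target_bits - 16 - len(payload))

def unpad(padded):
    """Recover the payload, discarding the padding."""
    return padded[16:16 + int(padded[:16], 2)]

# Example: inflate Simulation 1's 18-bit payload to 80% of 2048 bits
# (1638 bits, rounding down, since 80% is not a whole number of bits).
padded = pad_to(encode_run('A', 1024), 1638)
assert unpad(padded) == encode_run('A', 1024)
```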