This has little to do with C or C++ or any other programming language.
Internally, the computer only knows numbers. To represent letters, an encoding is used. Anyone can define their own custom encoding.
There are several common encodings.
Many of these encodings use only the numbers 0 to 255 (or -128 to 127), meaning they are 8-bit encodings.
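As a quick illustration (a minimal sketch, assuming an ASCII-compatible encoding, which is the usual case), a C `char` is just a small number that we choose to interpret as a character:

```c
#include <stdio.h>

int main(void)
{
    char c = 'A';
    /* The same value can be viewed as a character or as a number. */
    printf("%c has code %d\n", c, (int)c);   /* prints: A has code 65 */
    return 0;
}
```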
In the ASCII encoding (which uses only 7 bits) there is no representation for, e.g., ã.
As the use of computers spread, it became necessary to extend the encodings in order to represent more than 128 characters.
One of the new encodings created is called ISO-8859-1. In this encoding, ã has the code 227. In the ISO-8859-8 encoding, however, the same code 227 represents the character ד (Dalet).
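You can see this with a short sketch (assuming a POSIX system with iconv available and a UTF-8 terminal; the helper name `decode` is just an illustrative choice): the same byte 227 produces a different character depending on which encoding we tell iconv to assume.

```c
#include <stdio.h>
#include <iconv.h>

static void decode(const char *from_encoding)
{
    char in[] = { (char)0xE3, 0 };          /* the byte 227 */
    char out[8] = { 0 };
    char *inp = in, *outp = out;
    size_t inleft = 1, outleft = sizeof out - 1;

    iconv_t cd = iconv_open("UTF-8", from_encoding);
    if (cd == (iconv_t)-1) { perror("iconv_open"); return; }
    iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    printf("byte 227 interpreted as %-10s -> %s\n", from_encoding, out);
}

int main(void)
{
    decode("ISO-8859-1");   /* prints ã */
    decode("ISO-8859-8");   /* prints ד (Dalet) */
    return 0;
}
```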
So far so good. All coded numbers fit into 8 bits.
Obviously there is the problem of always having to know which encoding was originally used, in order to convert the numbers back into characters. This problem was common in the early days of the internet, when people from different countries exchanged emails, each using a different encoding.
To solve this problem of multiple encodings, Unicode was invented: a single scheme, suitable for all countries, that encodes far more than 256 characters.
But Unicode codes are too large to fit into 8 bits. In addition, there is more than one way to represent a Unicode code as bytes (for example UTF-8, UTF-16 and UTF-32, with little-endian and big-endian variants, ...).
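As a minimal sketch of what that means in practice (the function name `utf8_encode` is just an illustrative choice, not a library API), here is how a Unicode code point is turned into UTF-8 bytes; note that ã, which was a single byte in ISO-8859-1, needs two bytes in UTF-8:

```c
#include <stdio.h>

/* Encode a Unicode code point into UTF-8 bytes; returns the byte count. */
static int utf8_encode(unsigned int cp, unsigned char out[4])
{
    if (cp < 0x80)    { out[0] = cp; return 1; }          /* ASCII: 1 byte  */
    if (cp < 0x800)   {                                    /* 2 bytes        */
        out[0] = 0xC0 | (cp >> 6);
        out[1] = 0x80 | (cp & 0x3F);
        return 2;
    }
    if (cp < 0x10000) {                                    /* 3 bytes        */
        out[0] = 0xE0 | (cp >> 12);
        out[1] = 0x80 | ((cp >> 6) & 0x3F);
        out[2] = 0x80 | (cp & 0x3F);
        return 3;
    }
    out[0] = 0xF0 | (cp >> 18);                            /* 4 bytes        */
    out[1] = 0x80 | ((cp >> 12) & 0x3F);
    out[2] = 0x80 | ((cp >> 6) & 0x3F);
    out[3] = 0x80 | (cp & 0x3F);
    return 4;
}

int main(void)
{
    unsigned char buf[4];
    int n = utf8_encode(0x00E3, buf);   /* U+00E3 is ã */
    printf("U+00E3 (ã) needs %d bytes in UTF-8:", n);
    for (int i = 0; i < n; i++) printf(" 0x%02X", buf[i]);
    printf("\n");                       /* expected: 0xC3 0xA3 */
    return 0;
}
```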