What are the main differences between Unicode, UTF, ASCII, ANSI?

15

What are the main differences between the "encodings" Unicode, UTF, ASCII, and ANSI?

Are all of them really encodings or are some just sub-categories of others?

I do not want to know all the details of each, just a brief summary of each and, if possible, how they differ.

    
asked by anonymous 05.10.2016 / 21:59

2 answers

14

ASCII

American Standard Code for Information Interchange. As the name says, it is a standard that caters well to Americans. It ranges from 0 to 127; the first 32 codes (and the last one) are considered control characters, while the others represent "printable characters", that is, characters recognizable by humans. It is fairly universal. It can be represented with only 7 bits, although usually a whole byte is used.
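A minimal sketch in Python (my addition, not part of the original answer) showing that every ASCII code fits in 7 bits and that anything outside the range is rejected:

    text = "Hello"
    data = text.encode("ascii")
    print(list(data))        # [72, 101, 108, 108, 111] - one byte per character
    print(max(data) <= 127)  # True: every ASCII code fits in 7 bits

    try:
        "ção".encode("ascii")  # accented letters are not in ASCII
    except UnicodeEncodeError as e:
        print(e)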

ANSI

This encoding does not exist. The term stands for American National Standards Institute, the equivalent of Brazil's ABNT.

Since ANSI established some standards for the use of characters to meet various demands, many encodings (actually code pages) ended up being generically called "ANSI", partly as a counterpoint to Unicode, which is another entity with another type of encoding. Usually these code pages are considered extensions of ASCII, though nothing guarantees that a specific encoding is 100% compatible.

Again, it was an American solution to deal with international characters, since ASCII did not serve them well.

Depending on the context, and even on the era, the term means something different. Today it is often used to mean Windows-1252, since much of Microsoft's documentation refers to this encoding as ANSI. ISO 8859-1, also known as Latin-1, is also widely used.

All the encodings called ANSI that I know of represent each character with 1 byte.

So it depends on what you are talking about.
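A minimal sketch (my addition, using Python's standard codec names) showing that these one-byte code pages agree with ASCII below 128 but diverge above it:

    print("A".encode("cp1252") == "A".encode("ascii"))  # True in the ASCII range
    ch = "ç"
    print(ch.encode("cp1252"))   # b'\xe7' - 1 byte in Windows-1252
    print(ch.encode("latin-1"))  # b'\xe7' - the same single byte in ISO 8859-1
    print(ch.encode("utf-8"))    # b'\xc3\xa7' - 2 bytes in UTF-8, for contrast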

UTF

By itself it does not mean much. It stands for Unicode Transformation Format. There are a few encodings that use this acronym: UTF-8, UTF-16, and UTF-32 are the best-known ones.

The Wikipedia articles give several details. These encodings are very complex, and almost nobody knows how to use them correctly in their fullness, myself included. Many implementations are wrong and/or non-compliant, especially of UTF-8.

UTF-8 is ASCII-compatible (any valid ASCII text is also valid UTF-8), but it is not compatible with any other character encoding system. It is the most complete and complex encoding there is. Some people are in love with it (and that is the best term I have found) and others hate it, even while recognizing its usefulness. It is complex for the human (programmer) to understand and for the computer to deal with.

UTF-8 and UTF-16 have variable length: the first uses 1 to 4 bytes per character (up to 6 in older versions of the specification), and the second uses 2 or 4 bytes. UTF-32 always uses 4 bytes.
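A minimal sketch (my addition, using Python's built-in codecs) illustrating those lengths; the "-le" suffix just fixes a byte order so that no BOM is prepended:

    print("Hi".encode("ascii") == "Hi".encode("utf-8"))  # True: UTF-8 is ASCII-compatible
    for ch in ["A", "ç", "中", "🙂"]:
        print(
            ch,
            len(ch.encode("utf-8")),      # 1, 2, 3 and 4 bytes respectively
            len(ch.encode("utf-16-le")),  # 2, 2, 2 and 4 bytes (4 = surrogate pair)
            len(ch.encode("utf-32-le")),  # always 4 bytes
        )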

There is a comparison between them. I do not know how accurate it is; it is certainly not complete.

Unicode

It is a standard for text representation established by a consortium. Among the norms it establishes are some encodings, but it actually covers much more than that. It originated from the Universal Coded Character Set, or UCS, which was much simpler and solved almost everything that was needed.

There is an article that everyone should read, even if they do not agree with everything in it.

The supported characters are separated into planes. You can get an overview of them in the Wikipedia article. Plane 0, the BMP (Basic Multilingual Plane), is by far the most used.
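A minimal sketch (my addition): a character's plane is just its code point divided by 0x10000, since each plane holds 65,536 code points:

    for ch in ["A", "中", "🙂"]:
        cp = ord(ch)
        print(f"{ch} = U+{cp:04X}, plane {cp // 0x10000}")
    # A and 中 are in plane 0 (the BMP); 🙂 is in plane 1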

All these standards are made official by ISO, the international body that regulates technical standards.

It is related to UTF.

    
05.10.2016 / 22:18
8

Based on the answer that @randrade linked, I did a quick translation. I removed some things about specific programming details and some opinionated parts, and I tried not to translate it to the letter (my English is so-so; I will revise it).

  • " Unicode " is not a specific encoding, it refers to any encoding that uses Unicode codes to form a character.

  • UTF-16 : In this format each character uses 2 or 4 bytes.

  • UTF-8 : In this format each character ranges from 1 to 4 bytes. ASCII values use 1 byte each.

  • UTF-32 : This format uses 4 bytes per "code point" (probably what forms a character).

  • ASCII : Uses a single byte per character, and only 7 bits to represent all of its characters (Unicode uses the same values in the 0-127 range); it does not include accented letters and several special characters.

  • ANSI : There is no fixed standard for this encoding; there are actually several types. A very common example is Windows-1252 (see the sketch after this list).
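A minimal sketch (my addition) comparing the encodings listed above on the same short text:

    text = "Olá"  # 'á' is outside the ASCII range
    for enc in ["utf-8", "utf-16-le", "utf-32-le", "cp1252"]:
        print(enc, len(text.encode(enc)), text.encode(enc))
    # text.encode("ascii") would raise UnicodeEncodeError because of 'á'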

You can also find information at unicode.org, and this link to the code charts may be useful to you.

  

Detail: 1 byte equals 8 bits.

    
05.10.2016 / 22:18