Which encoding should I use for accented characters?


There are so many encodings... I've searched a lot on Google and still can't tell when to use each of these:

ISO-8859-1
UTF-8
LATIN1
cp...
I tried UTF-8 first, because I've always thought that anything with accents should be UTF-8, but it did not work. After a lot of searching I tried ISO-8859-1 and, guess what, it worked. Why?

    
asked by anonymous 12.12.2017 / 00:12

2 answers


In general the most widely used encoding is UTF-8, since it supports the whole alphabet and all accents, plus a huge range of special characters. As for your problem with accents, there are several possible causes, ranging from a missing setting somewhere in the code to a wrong setting in the text editor / IDE (the most common cause).

As far as I know, UTF-8 is generally used when text must be internationalized (although it is commonly used for single-language content as well), while ISO-8859-1 is generally used for text in a single Western European language such as Portuguese.

The difference lies in how each one encodes text: UTF-8 can represent every Unicode character, while ISO-8859-1 covers only the first 256 code points (if I'm not mistaken).

This answer here on Stack Overflow explains it better than I can: What are the main differences between Unicode, UTF, ASCII, ANSI?

    
12.12.2017 / 01:54
  

ISO-8859-1 is a Latin alphabet encoding, but as far as I know ISO has discontinued it; still, if it works for you, keep using it. It is hard to say more with so little information.

Before we start, one clarification about Rodrigo's comment, since many people think the same thing: it is not that ISO-8859-1 has been "discontinued". It simply has nowhere left to evolve. UTF-8 already fills that role, and the necessary evolution has been concentrated there. The fact that ISO-8859-1 will no longer evolve does not mean it is unusable. Saying it is discontinued sounds as if it were "not recommended" or about to stop "working", which can lead to false interpretations.

ISO-8859-1 can be used without worry if your app will be in Portuguese and you do not need emojis or multiple languages.

ISO-8859-1 vs. UTF-8

ISO-8859-1 is a single-byte (8-bit) encoding whose graphic character set consists of 191 characters.

UTF-8 is a variable-length encoding that uses between 1 and 4 bytes per character; it covers all of Unicode and is one of the most widely used encodings on websites.
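To make the "1 to 4 bytes" point concrete, here is a minimal Java sketch. The class and method names (`Utf8Lengths`, `utf8Length`, `fitsInLatin1`) are made up for this illustration; only the standard `java.nio.charset` API is used. It counts the UTF-8 bytes of a few sample characters and checks whether ISO-8859-1 can represent them at all.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes UTF-8 needs to store the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when every character of the text has a byte in ISO-8859-1.
    static boolean fitsInLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this example independent of the encoding
        // the source file itself happens to be saved in.
        String[] samples = {
            "a",            // plain ASCII letter
            "\u00e0",       // a with grave accent
            "\u20ac",       // euro sign
            "\uD83D\uDE00"  // an emoji (U+1F600, written as a surrogate pair)
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " byte(s) in UTF-8, fits in ISO-8859-1: "
                    + fitsInLatin1(s));
        }
    }
}
```

The accented letter fits in both encodings, while the euro sign and the emoji exist only on the UTF-8 side, which is why ISO-8859-1 is fine only as long as the text stays within its limited repertoire.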

In other words, they are two different encodings. For example, the letter à (a with a grave accent) has a different code in each:

  • UTF-8: à has the code c3 a0
  • ISO-8859-1: à has the code E0
  

(if the tables I searched for are correct)

Note that in UTF-8 the result was c3 a0 and in ISO-8859-1 it was E0. In other words, what looks identical to us humans is not identical in storage: UTF-8 used two bytes where ISO-8859-1 used a single one.
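The byte values quoted above can be checked with a few lines of Java. This is only a sketch: `EncodingBytes` and `hex` are names invented for the example, and the charset is looked up by name with the standard `Charset.forName`.

```java
import java.nio.charset.Charset;

public class EncodingBytes {
    // Encodes the text with the named charset and returns its bytes as lowercase hex.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Mask to 0..255 so bytes above 0x7f are not printed as negative numbers.
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String aGrave = "\u00e0"; // the letter a with grave accent, written as an escape
        System.out.println("UTF-8:      " + hex(aGrave, "UTF-8"));      // c3 a0
        System.out.println("ISO-8859-1: " + hex(aGrave, "ISO-8859-1")); // e0
    }
}
```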

In short, both support accents. The point is that, depending on the application, framework or the like, the project might default to UTF-8 or to windows-1252 (which is "compatible" with ISO-8859-1). If you are reading a text file and the accents do not display correctly in your application, it is probably because the file's contents are in a format compatible with ISO-8859-1, so setting the project to UTF-8 will not work.

The same goes for a database: if the charset defined for the table is latin1 (compatible with ISO-8859-1), then simply using UTF-8 in the project will not work (although, depending on the database, it is possible to make adjustments and get a usable result). So, to answer the question in the title:
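What such a mismatch looks like can be reproduced without any file at all. The sketch below (the `Mojibake` class and `reinterpret` method are hypothetical names for this illustration) encodes a string with one charset and decodes the resulting bytes with another, which is what happens when the file and the project disagree.

```java
import java.nio.charset.Charset;

public class Mojibake {
    // Encodes text with one charset, then decodes the same bytes with another,
    // imitating a file saved in one encoding and opened in a different one.
    static String reinterpret(String text, String writtenAs, String readAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        String aGrave = "\u00e0"; // a with grave accent

        // Saved as UTF-8, read as ISO-8859-1: each of the two bytes becomes its
        // own character (U+00C3 followed by U+00A0), the classic garbled accent.
        String garbled = reinterpret(aGrave, "UTF-8", "ISO-8859-1");
        System.out.println(garbled.length() + " characters instead of 1");

        // Saved as ISO-8859-1, read as UTF-8: the lone byte E0 is not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        String broken = reinterpret(aGrave, "ISO-8859-1", "UTF-8");
        System.out.println("contains U+FFFD: " + (broken.indexOf('\uFFFD') >= 0));
    }
}
```

When both names passed to `reinterpret` are the same, the text comes back unchanged, which is the whole argument for keeping one encoding end to end.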

  

Which encoding should I use for accented characters?

Both ISO-8859-1 and UTF-8 support accents, so the choice depends on where the data comes from, such as a database or a text file. If all the files or database tables use UTF-8, then the project will have to be in UTF-8; if they are in ISO-8859-1 (database or text files), you will have to use ISO-8859-1 in your project.

Saving a file

Assuming you are using a text editor such as Notepad++ (which is better for this than the native Windows editor), before saving you can convert your text document (if that is your case) to ANSI (which will be compatible with ISO-8859-1) or to UTF-8.

If you save the text document as UTF-8 and then try to use ISO-8859-1 in your Java project, the two will conflict and characters will be garbled when displayed. The same goes in reverse: if you save the document as ANSI and set the project to UTF-8, it will fail. In other words, always use the same encoding everywhere if possible; otherwise you will have to keep encoding and decoding inside the application.
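If the encodings cannot be made to match, the "encoders and decoders within the application" can be as small as passing the charset explicitly whenever bytes become text. The following is a rough sketch under a few assumptions: the names `Transcode`, `read`, `convert` and `roundTrip` are invented for the example, the ANSI file is treated as ISO-8859-1, and temporary files stand in for a real document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Transcode {
    // Reads the whole file, decoding it with the charset it was saved in.
    static String read(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    // Rewrites a file saved in one charset into another charset.
    static void convert(Path source, Charset from, Path target, Charset to) throws IOException {
        Files.write(target, read(source, from).getBytes(to));
    }

    // Demo helper: saves text as ISO-8859-1, converts the file to UTF-8,
    // then reads the converted file back as UTF-8.
    static String roundTrip(String text) {
        try {
            Path ansi = Files.createTempFile("ansi", ".txt");
            Path utf8 = Files.createTempFile("utf8", ".txt");
            try {
                Files.write(ansi, text.getBytes(StandardCharsets.ISO_8859_1));
                convert(ansi, StandardCharsets.ISO_8859_1, utf8, StandardCharsets.UTF_8);
                return read(utf8, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(ansi);
                Files.deleteIfExists(utf8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "acentuação" written with escapes so the source file encoding does not matter.
        String original = "acentua\u00e7\u00e3o";
        System.out.println("survived conversion: " + original.equals(roundTrip(original)));
    }
}
```

The key design choice is never relying on the platform default charset: every read and write names its encoding, so the result does not change between machines.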

    
12.12.2017 / 01:58