Which UTF-8 collation is most appropriate for the Web (multi-language)?

16

I usually use utf8_general_ci by default in my projects, but I recently noticed that other developers tend to use utf8_unicode_ci:

  • utf8_general_ci : Unicode (multi-language), case insensitive
  • utf8_unicode_ci : Unicode (multi-language), case insensitive

Which of these is the most appropriate for the Web, or is there another UTF-8 collation better suited to it?

    
asked by anonymous 08.12.2014 / 19:22

1 answer

13

The main difference between utf8_general_ci and utf8_unicode_ci is how they compare certain characters, especially those that correspond to more than one letter in some languages.

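For example, in German the character "ß" is equivalent to "ss". utf8_unicode_ci honors this and treats "ß" as equal to "ss", while utf8_general_ci compares one character at a time and treats "ß" as equal to "s". Because utf8_unicode_ci has to handle comparisons in which one character maps to several, it is slower than utf8_general_ci.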
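The difference between a one-character-at-a-time comparison and one that allows a character to expand into several can be sketched in plain Python. This is only an illustration of the idea: the function names are made up for this example, and the code does not reproduce MySQL's actual collation weight tables.

```python
# Illustration only: plain Python string handling, not MySQL's collation code.
# Both function names are hypothetical and exist only for this sketch.


def equal_one_to_one(a: str, b: str) -> bool:
    """Compare the strings pairwise, one character against one character.

    Strings of different lengths can never be equal here, which is
    roughly the idea behind a simple collation such as utf8_general_ci.
    """
    return len(a) == len(b) and all(
        x.lower() == y.lower() for x, y in zip(a, b)
    )


def equal_with_expansion(a: str, b: str) -> bool:
    """Compare after full case folding.

    Case folding may expand one character into several ("ß" folds to
    "ss"), which is roughly the idea behind utf8_unicode_ci.
    """
    return a.casefold() == b.casefold()


print(equal_one_to_one("Straße", "STRASSE"))      # False: lengths differ
print(equal_with_expansion("Straße", "STRASSE"))  # True: "ß" expands to "ss"
print(equal_one_to_one("Web", "WEB"))             # True: only the case differs
```

The expansion-aware version has to transform both strings before it can compare them, which hints at why the more accurate collation costs more.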

That is, if your application does not need accurate comparisons across languages, go with utf8_general_ci.

But for systems that operate globally and must handle multiple languages, such as WordPress or Wikimedia, utf8_unicode_ci is a good choice.

Another interesting collation to mention is utf8_bin. It compares the binary values of the characters, which makes comparisons case-sensitive, unlike the other two collations.
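As a rough illustration of what a binary comparison implies, the Python sketch below compares and sorts strings by their UTF-8 bytes and contrasts that with a case-insensitive comparison. The helper names are hypothetical, and this is plain Python rather than MySQL itself.

```python
# Illustration only: byte-wise versus case-insensitive comparison in Python.
# The helper names are hypothetical and exist only for this sketch.


def equal_binary(a: str, b: str) -> bool:
    """Compare the raw UTF-8 bytes, so upper and lower case differ."""
    return a.encode("utf-8") == b.encode("utf-8")


def equal_case_insensitive(a: str, b: str) -> bool:
    """Compare after case folding, so upper and lower case match."""
    return a.casefold() == b.casefold()


words = ["banana", "Zebra", "apple"]

# Byte order puts every uppercase ASCII letter before the lowercase ones.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))
# Case-insensitive order ignores the difference between "Z" and "z".
folded_order = sorted(words, key=str.casefold)

print(equal_binary("web", "WEB"))            # False
print(equal_case_insensitive("web", "WEB"))  # True
print(binary_order)                          # ['Zebra', 'apple', 'banana']
print(folded_order)                          # ['apple', 'banana', 'Zebra']
```

A binary collation therefore affects ordering as well as equality, which matters for any query that sorts or deduplicates on the column.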

Conclusion

The choice of collation depends heavily on the nature of the application. Besides utf8, there are other charsets that serve the needs of a specific region (latin1, for example), and since the scope of each project varies a lot, I do not think it is possible to name one that is the most appropriate for all cases.

In most cases utf8_general_ci will do; as the name suggests, it is meant for general, common use. Still, it is worth knowing that other collations, such as utf8_unicode_ci and utf8_bin, can meet more specific needs.

Source: MySQL Documentation

    
08.12.2014 / 19:34