The main difference between utf8_general_ci and utf8_unicode_ci is how they compare characters that a language treats as equivalent to another character or sequence of characters.
For example, in German the character "ß" is equivalent to "ss". Because utf8_unicode_ci has to handle this kind of comparison, where one character maps to more than one character, it is slower than utf8_general_ci.
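To make the "ß" case concrete, here is a small Python sketch of the two styles of comparison. It is only a toy model and not MySQL's collation code: the function names and the ONE_TO_ONE table are made up for illustration, and Python's casefold() merely stands in for a comparison that can expand one character into several. The table entry assumes the behaviour described in the MySQL manual, where utf8_general_ci treats "ß" as equal to a single "s".

```python
# Toy model only: this is NOT how MySQL implements its collations.
# It contrasts a fold that looks at one character at a time with a
# fold that is allowed to expand one character into several.

# Hypothetical per-character table, loosely modeled on utf8_general_ci,
# where "ß" compares equal to a single "s".
ONE_TO_ONE = {"ß": "s"}


def general_like_key(text: str) -> str:
    """Fold each character on its own, never looking at its neighbours."""
    return "".join(ONE_TO_ONE.get(ch, ch.lower()) for ch in text)


def unicode_like_key(text: str) -> str:
    """Use Python's full case folding, which expands "ß" to "ss"."""
    return text.casefold()


def equal_under(key, left: str, right: str) -> bool:
    """Two strings are 'equal' when their comparison keys are identical."""
    return key(left) == key(right)


print(equal_under(general_like_key, "Straße", "Strasse"))  # False
print(equal_under(unicode_like_key, "Straße", "Strasse"))  # True
```

The expanding comparison has more work to do per character, which is the intuition behind the speed difference mentioned above.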
That is, if your application does not need cross-language comparisons, go with utf8_general_ci.
But for systems that operate globally and have to work with multiple languages, such as WordPress or Wikimedia, utf8_unicode_ci is a good choice.
Another interesting collation to mention is utf8_bin. It compares the binary value of each character, which results in a case-sensitive comparison, unlike the other two collations.
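As a rough illustration of what a binary comparison means, the Python sketch below compares the UTF-8 bytes of two strings and contrasts that with a case-insensitive comparison. Again, this is an analogy under my own assumptions and not MySQL code; bin_like_equal and ci_like_equal are hypothetical names.

```python
# Analogy only: these helpers imitate the idea behind a binary collation
# and a case-insensitive collation; they are not MySQL's implementation.


def bin_like_equal(left: str, right: str) -> bool:
    """Compare the UTF-8 encoded bytes, so any difference in case matters."""
    return left.encode("utf-8") == right.encode("utf-8")


def ci_like_equal(left: str, right: str) -> bool:
    """Compare after case folding, so "MySQL" and "mysql" match."""
    return left.casefold() == right.casefold()


print(bin_like_equal("MySQL", "mysql"))  # False
print(ci_like_equal("MySQL", "mysql"))   # True
print(bin_like_equal("mysql", "mysql"))  # True
```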
Conclusion
The choice of collation depends a lot on the nature of our application. In addition to utf8, there are other charsets that meet the needs of a specific region (latin1, for example), and since each situation varies a lot, I do not think it is possible to point to one that is the most appropriate for all cases.
In most cases utf8_general_ci will do since, as the name suggests, it is meant for general, more common use. However, it is worth knowing that there are other collations that can meet a more specific need, such as utf8_unicode_ci and utf8_bin.
Source: MySQL Documentation in