How to determine the ordering of accented characters in MySQL?

7

My table uses "utf8-general-ci" so all accents are OK. But when I have for example Aa, Ac and Ab and I want to filter "alphabetically", the result is:

  • Aa
  • Ac
  • Ab

On Excel, the result is:

  • Aa
  • Ab
  • Ac

Which is more logical.

How do I get the same result using MySQL? I tried to change the collation to latin_general_ci or others, but nothing changes ...

    
asked by anonymous 19.02.2015 / 19:45

3 answers

4

Well, I'm going to ask a question because from Guilherme Nascimento's answer I found the answer. The problem is not a COLLATE problem but a problem between Mysql and PHP. We know that PHP does not use utf8. It was intended for PHP6 but will 'theoretically' happen only in PHP7. Let's then see how to make a site in utf8 to understand the difficulty. At first, I'm going to create my BDD and then the tables. All in utf8. When I'm going to test, using Guilherme Nascimento's test, it's all right. Cool!

Then I'll create my PHP code and create the HTML page where I'll put it:

   <meta content="text/html; charset=UTF-8" http-equiv="content-type" />

Then I will create a php document, where I will put for example:

   define ("TITULO","Direção");

I will save in UTF8 and send using FTP software, which will not change the code.

I will create a page with a form and an INPUT field using the title defined in my define. Nice! I will see "Direction" as title, I will type for example "national direction" in the field. In the submit, the content will be sent in the second PHP page and I'll save it using a SELECT. In reading, I'll simply do a SELECT * FROM TAB. I'll do an echo by putting title and then the containing of my table and I will have:

    Direção: direção nacional

Right now, I'm going to make sure everything is right. But it is not. Really, inside the table, I do not have "national direction". I have "national direction". But as in reading, it has automatic data conversion, which gives the illusion that the data is correct. The problem is that when I'm going to need to do an ORDER BY, MySQL will do it using "national" and the result will be wrong.

In his example Guilherme Nascimento uses a fiddle then, a closed system. Which explains why it works perfectly.

Solution

The solution is simple. Immediately after mysqli_connect, you need to put mysqli_set_charset.

   $handle = mysqli_connect($sql_host,$sql_user,$sql_password,$sql_database);
   mysqli_set_charset($handle,'utf8');

From this, when you type "direction", the table will have "direction" and ORDER BY will be a bandstand.

But for old data ???

Unfortunately, exporting to re-importing will change anything. Because we are going to export "national direction" and reimport "national direction". Actually, you need to read the data without doing the mysqli_set_charset, then do the mysqli_set_charset and do an INSERT.

Then:

    1 - Conectar usando mysqli_connect (sem fazer o mysqli_set_charset)
    2 - Ler os dados da tabela e salvar para preparar o query para INSERT elas
    3 - Fazer um TRUNCATE na tabela
    4 - Fazer o mysqli_set_charset($handle,'utf8');
    5 - Fazer o INSERT dos dados

So, I'm going to read it in the old way and then I'll insert it in the new way.

Now, everything is right !!

    
20.02.2015 / 13:39
3

Apparently the problem occurs specifically with latin1_general_ci (as per this link ):

  • latin1_general_ci : There is no distinction between uppercase and lowercase letters. Searching for teste , records such as Teste or TESTE will be returned.

  • latin1_swedish_ci : Does not distinguish upper and lower case letters or accented characters with cedilla, that is, the record containing the word Intuição will be returned when there is a search for the word intúicao .

And it is because of the distinction that this different ordering occurs.

An example:

CREATE TABLE IF NOT EXISTS 'products' (
  'id' int(11) NOT NULL AUTO_INCREMENT,
  'name' varchar(45) DEFAULT NULL,
  PRIMARY KEY ('id')
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 COLLATE latin1_general_ci;

INSERT INTO 'products' ('name') VALUES ('Aa'), ('Ac'), ('Áb');

If I use the following query:

SELECT * FROM products ORDER by name ASC;

Will return this:

+----+------+
| ID | NAME |
+----+------+
| 1  | Aa   |
| 2  | Ac   |
| 3  | Áb   |
+----+------+

Online: link

If I use with utf8_general_ci (or utf8_unicode_ci ):

CREATE TABLE IF NOT EXISTS 'products' (
  'id' int(11) NOT NULL AUTO_INCREMENT,
  'name' varchar(45) DEFAULT NULL,
  PRIMARY KEY ('id')
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 COLLATE latin1_swedish_ci;

Will return this:

+----+------+
| ID | NAME |
+----+------+
| 1  | Aa   |
| 3  | Áb   |
| 2  | Ac   |
+----+------+
  

Note: With utf8_unicode_ci , utf8_general_ci and latin1_swedish_ci I had the expected results, only latin1_general_ci

Another problem that can occur is with changes of COLLATE at the time of export and import (in a possible backup restore) the data was converted to another format (this happens a lot because of software like PhpMyAdmin ).

How to make backups without problems with coding

There are several methods to back up, but in my view the most practical (if you are an online server mainly) is to use SSH with mysqldump

At the terminal (on the server via SSH) you should use the following command to export a table:

mysqldump banco_de_dados tabela > <caminho completo>/table_name.sql

Note: If you want to download via FTP instead of table_name.sql enter a path accessible by FTP to be able to download later (it is very useful for a backup routine)

Import a table:

mysql -u <usuario> -p banco_de_dados
mysql> tabela <caminho completo>/table_name.sql

Possible Solutions

No I recommend trying to set COLLATE in the query, as this would be more of a "gambiarra" and probably if you forget to add to some query there may be differences in results.

I recommend that I re-create the tables using a% sup_port to support characters with accents equal to those without accents, you can use COLLATE or latin1_swedish_ci

Read this, maybe use utf8:

  

Note: As the answer to the question [ Which collate "UTF-8 is the most appropriate for web (multi-language) ] utf8* considers "equal" accents the letters without accents, but not all characters are considered equal, for example utf8_general_ci will only have the same result as ß if you use ss (read in the same answer about advantages and disadvantages), what is a "problem" similar to utf8_unicode_ci

    
19.02.2015 / 22:17
0

You can override the default collation by another through the COLLATE clause. For example:

SELECT * FROM tabela ORDER BY campo COLLATE utf8_general_ci;

Also, note that the field is also the same as% of the table.

    
19.02.2015 / 20:41