=, BINARY, LIKE, LIKE BINARY, REGEXP, SQL binary collation

14

Reformulation of the question from @GabrielHenrique's answer and search:

What is binary collation in a string column in the table?

BINARY serves to make a case-sensitive search and LIKE case-insensitive, then LIKE BINARY would be a case-sensitive comparison using % and _ to complete the text with any value ?

Is there a difference and what is more performative, BINARY or just = ? Is this significant difference in large tables? And in small tables?

Which is more performative, REGEXP or LIKE ? Is this significant difference in large tables? And in small tables?

Which is more performative, REGEXP or LIKE BINARY ? Is this significant difference in large tables? And in small tables?

    
asked by anonymous 03.03.2018 / 14:48

2 answers

5

According to the MySQL documentation ( string comparation ), the comparisons are not case-sensitive, unless one of the be case-sensitive , or the collation field is.

So, using = or LIKE can also be a case-sensitive search depending on collation , for example:

 LIKE _latin1 'ABC%' COLLATE latin1_bin
Using LIKE BINARY forces a binary collation independent of collation , ensuring case-sensitive comparison, although it may degrade performance depending on the size of the table.

>

On the use of REGEX vs LIKE , some considerations should be made:
REGEX does not use an index, even if present, which will impact performance; an index, if any, as long as you do not start the comparison by LIKE or % ;

Here's an excellent answer in the English SO about it: link

EDIT : By suggestion I'm copying an interesting part of the above link response:

SELECT * FROM t WHERE a = 'abc'      <<-- (case insensitive ) pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'abc'   <<-- (case insensitive como "=") pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'abc%'  <<-- pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'a%'    <<-- pode usar o índice se existir, dependendo da cardinalidade (1)
SELECT * FROM t WHERE a LIKE '%a%'   <<-- não vai usar um índice
SELECT * FROM t WHERE a LIKE '_agf'  <<-- não vai usar um índice

(1) According to the author of the response, depending on the cardinality MySQL may or may not use an index: If more than +/- 20% of the rows match the criteria, MySQL will not use an index, because, in that case, doing a full search of the table is faster

>

Therefore, from the performance point of view, it is interesting to consider the presence of an index in the column, collation and the search that will be done (started or not by '_' or % ) to choose the best way to compare.

Depending on the frequent comparison in the field, starting with _ and data volume, another option to consider is the use of a fulltext index, which is suitable for optimizing this type of research. More information here: MySQL Fulltext Index

EDIT : Clearing a question about % : In the example above, from the MySQL documentation, it is specifying the collation you want to use in the comparison, regardless of the original COLLATION . As an example, a table with a field that was created so collation , that is the column name is with the German collation, but you want to compare using Spanish, you could do this: texto VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_german1_ci

I've put an example here: sqlfiddle

    
03.04.2018 / 14:42
9

In MySQL, BINARY has the function of forcing an exact comparison, that is, a case-sensitive comparison, byte to byte.

I think the use of this grouping is for search optimization when searching for a specific value.

    
03.03.2018 / 19:42