Index limit with LIMIT


I've never been an SQL expert, but I always get by; nothing that a quick Google search hasn't solved. But now I have a question I couldn't find any information about.

Recently I ran some tests to find out why a query was slow, and I came across the following situation:

select campo from tabela limit 0,150

The table had more than 10 thousand records. I used EXPLAIN to find out why it is sometimes slow and sometimes not, and I found that, depending on the value in LIMIT, the SELECT used the index; past a certain value (I can't remember now what it was), the index was no longer used.

Can anyone explain why, and whether there is some limit to using LIMIT?

explain SELECT id, nome, email FROM tabela ORDER BY id LIMIT 0,20

EXPLAIN output:

id  select_type  table   type   possible_keys  key      key_len  ref   rows   Extra
1   SIMPLE       tabela  index  NULL           PRIMARY  3        NULL  20

explain SELECT id, nome, email FROM tabela ORDER BY id LIMIT 0,820

EXPLAIN output:

id  select_type  table   type   possible_keys  key      key_len  ref   rows   Extra
1   SIMPLE       tabela  ALL    NULL           NULL     NULL     NULL  14132  Using filesort
    
asked by anonymous 11.12.2015 / 17:43

1 answer


Often the query planner realizes that a sequential read from disk is "cheaper" for the system than going through the index and doing random reads.

Without the index, the DB can sweep through the records (even out of order), keeping only what matters (and, if necessary, sorting it on output):

SELECT * FROM x WHERE n BETWEEN 3 AND 10

scan:   0 > 1 > 8 > 5 > 128 > 2 > 882 > 9 > 7 > 11 > 92 > 6 > 4 > 10 > 3 ...
                |   |                   |   |             |   |   |    |
result: 3 4 5 6 7 8 9 10 ...

In the case above, the query "looks at" the values one by one, compares them, and accumulates the ones that match.

When you use a low limit (4, in the example below), it pays to use the index, despite the query overhead and the random reads:

SELECT * FROM x WHERE n BETWEEN 3 AND 10 LIMIT 4

INDEX:  0, 1, 2, 3 (start), 4, 5, 6 (stop), 7, 8, 9, 11, 92 ...
LOOKUP:          ->         -> <-  ->
DB:     0 > 1 > 8 > 5 > 128 > 2 > 882 > 9 > 7 > 11 > 92 > 6 > 4 > 10 > 3 ...

In the case above, since there are few rows, it is worth starting from the index: take the first entry, find where that row is on the disk, read that piece of the data (remembering that if only the value of the indexed column were needed, the table itself would not have to be read at all), move to the next index entry, locate the row in the DB, read it, go back to the index again for the next one, read the DB data... and so on.
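
As a side note, the parenthesis above is the "covering index" case: if the query asks only for the indexed column, MySQL can answer it from the index alone, never touching the table data. A minimal sketch against the question's own table (assuming id is the primary key, as the EXPLAIN output suggests):

-- Only the indexed column is selected, so MySQL can serve the query
-- straight from the index; EXPLAIN typically reports "Using index" in Extra.
EXPLAIN SELECT id FROM tabela ORDER BY id LIMIT 0, 820;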

Note that in the first case there is none of this "coming and going" through the index: with that much data it does not pay to read a thousand separate pieces of the file that holds the rows, since reading them sequentially is usually faster.

In the second case there is little information, so the "back and forth" pays off: even with random reads, far less data is read than in a full sweep of the table.

I'm assuming the most common databases, running on ordinary HDs; there are DBs specialized in in-memory access or in SSDs. In practice, the query planner makes these decisions based on a series of statistics, which may not give the best result in every situation, but in most cases the gain justifies it. The important thing to understand is that someone deliberately programmed it this way and, in all likelihood, did an enormous amount of testing to tune how it works.
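
As an illustration of the statistics point (not something from the original answer): in MySQL those estimates come from the table's key distribution statistics, and if they are stale you can recompute them, which sometimes changes the plan EXPLAIN reports:

-- Recompute the key distribution statistics that the optimizer uses to
-- estimate how much of the table a given plan would have to touch.
ANALYZE TABLE tabela;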


From the MySQL manual

See, for example, these links:
link (en)
link (en)

Note this line in particular:

You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster.

And this:

You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it will probably do many key lookups and that a table scan would be faster.

Although these are not the exact LIMIT scenario, the same logic applies.
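
If you want to test the planner's decision yourself, MySQL's index hints let you override it. A sketch using the question's query (FORCE INDEX is standard MySQL syntax; whether forcing the index actually wins depends on your data):

-- Force the optimizer to walk the primary key even for the larger LIMIT;
-- drop the EXPLAIN and time both variants to check whether the
-- full scan + filesort the planner chose was really cheaper.
EXPLAIN SELECT id, nome, email FROM tabela FORCE INDEX (PRIMARY) ORDER BY id LIMIT 0, 820;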

    
answered 11.12.2015 / 19:16