Query indexes in MySQL

18

Using indexes in mysql queries actually makes the result come quickly for some large queries or tables.

For example in the query below:

SELECT * FROM table WHERE status = 1;

For a table with 1 million records if I give the command below:

DELETE FROM table WHERE status = 1;

This query takes a long time. Then create the index for the status column and I was able to remove 700,000 lines quickly (0.8s thing)

In the sample select I did not put use index xyz but the database was smart in using the index I created for the column.

But what about for multiple WHERE queries? Something like

SELECT * FROM table WHERE status = 1 AND account = 12 AND data > '2014-09-09 00:00:00';

If I create an index with the status, account, and date columns will MYSQL use it in the above query automatically or will I have to explicitly put use index xpto ?

UPDATE

After implementing only one index on a table that had a complicated query. Look what resulted in CPU usage in my RDS on Amazon.

I'll be able to narrow down the instance type: D

Example index creation for the query example above:

ALTER TABLE 'table' ADD INDEX 'status_idx' ('status');
    
asked by anonymous 10.09.2014 / 15:33

1 answer

17

For this particular query you will use the index if there is a suitable index for it. This index should include the columns status , account and data . Preferably in this order. There is no reason why MySQL does not use the index automatically. Anyway to make sure create and see what happens. Not always what we deduce is correct.

So the index will sort status incrementally, so it can find fast rows with 1 . Then it can quickly find the rows that have the value 12 to account , among the rows already selected that satisfy the previous condition . And finally, it will select all dates above the value indicated in query , again between the lines already selected previously in the previous conditions. Obviously this selection will not consider other data ranges that do not contain the previous values.

Note that there must be a single selection range to fully index the query . Therefore the order can be important to determine whether or not to use the index. So I said that in this specific query the index with the indicated composition would be used. Another query may not be able to use the index or may partially use it to select a range through it and the rest of the selection has to be sweeping the database line by line between the lines already selected in the index.

Normally the key is to have indexes that meet the main queries and avoid queries that can not benefit from indexes.

The indexes work well when you can mount sub-tracks with them. In a very complex query , which makes it difficult to get these sub-strips, there is no index that solves it. Then the ideal is to try to at least greatly reduce the lines that were obtained through the index and leave little to the final selection.

To try to ease understanding, think about how this search works. When you have something sorted according to what you want to search it is easy to find through a binary search that has a logarithmic complexity O (logN), that is, it divide and conquer . Even with billions of lines, in just over 30 steps he finds what you want. When there is no order, the way is to look one by one, linear complexity O (N). If the query looks at several columns it will try to use the first method and make a very quick selection. When it is no longer possible, just between the lines you have already managed to select fast it will apply the individually determined condition line by line. To understand a little more about Big Read this answer .

When the optimizer does not use an existing index, it is unable to figure out a way to "understand" its query properly for the index. This may be the fault of the way the index is mounted or it may be because the optimizer is not so good. Almost always the optimizer does its job well.

It may actually be that the index is not used even though it has an appropriate key. There are small data volumes that direct fetch in the data has a lower cost. The optimizer can identify that even though it can select sub-stripes, subsequent analysis can be so costly that dropping the query to the index can yield a better result. But in these cases the optimizer has relinquished its use in a "conscious" way for its benefit. You can not think bad. Although there are cases he misses.

But be careful not to create too many indexes . Updates to them can slow the software down.

What the AP did in editing the question is exactly what needs to be done. Try and see what actually happens. With a simple key using only the most relevant column it achieved a huge gain. It used the status column. I do not know if it tried, but I could experiment only with the column account , only with the column data , and make the possible combinations between the three columns and see how each result is. As the gain seems to be fundamental, it's worth the little effort.

    
10.09.2014 / 16:45