When and in which columns should you use indexes?

14

Reading an article that has nothing to do with database I came across the information that using indexes can bring great improvements to the performance of the database .

I have two doubts about this:

  • When should indexes be created and in what columns should they be created?
  • Does creating indexes have any cost to the bank?
asked by anonymous 22.07.2015 / 19:04

2 answers

9

Importance of indexes

Imagine searching for a word in a dictionary that is not ordered. Worse than that, imagine researching in a dictionary where words are randomly arranged.

In the worst case of this scenario, it would be necessary to go through every dictionary to find a particular word. The worst case is one in which the searched word is the last of the last page of the dictionary. Obviously, nothing that a brute-force algorithm does not solve. But now imagine millions / billions of records in a table or file. It would be extremely costly to find, do you agree?

What are indexes

Indexes are mechanisms that allow data queries to happen faster compared to the same non-indexed queries. They can be used in database, isolated files, RAM.

This happens because the index is stored in a data structure that has as premise to keep the fields that make up the index ordered, thus allowing the application of search algorithms built to operate when the data are properly ordered. An example of such algorithms is binary search. Binary search is a search algorithm that operates in RAM and which requires that the search key be ordered for its operation. It is much more efficient than a brute force search algorithm, because it reduces the search space, consequently reducing search time. While a raw-force algorithm has complexity O (n), the binary search algorithm has complexity O (log (n)), the log in this case is in base 2.

Example:

If there are 10000 keys to query, a worst-case algorithm, in the worst case, would make 10000 comparisons to find what you want. But the binary search algorithm would, in the worst case, make 14 comparisons (actually 13.29, however, rounded to 14).

See how binary search is much more efficient than the brute-force search algorithm.

Note, however, that this is only possible because the search key is sorted. If it were not, it would not be possible to use the binary search algorithm.

When to use

Indices are very useful and you can really reduce the time of many queries by using this functionality in virtually all database engines. However, there is no free lunch in computing and you need to be aware of two issues:

  • There is a computational cost to maintain the ordered index. Returning to the dictionary example, imagine that a new word is incorporated into Portuguese and added to the dictionary. There is a cost to know where to enter this word and another cost to move (reorganize) the other words for the insertion of the new one. So if a given table suffers many insertions every second, an index could hurt rather than help.

  • As the goal is to increase performance, that is, to reduce the amount of time spent consulting, you will need to spend more space. In other words, in the case of the database, disk space is used to store an index. This is the traditional compromise of computing: if you want faster execution of an algorithm, you should pay with memory. If you want to use less memory then you should accept a not-so-efficient algorithm.

  • See the image below, showing the space spent for the data, the number of lines and the space spent by the index (real example):

    In the case of databases, indexes are created in one or more fields of a given table. If the index consists of more than one field, then it is called a compound index. Generally, you include indexes for those fields that are always present in common system where clauses or that should be executed quickly.

    For the primary keys, indexes are automatically defined, since they must always be present in an INSERT besides being present in JOINS.

    The above image is from an audit table with more than 1 billion records. The index is composed using two fields of type varchar. It would be impractical to query this table if there was no such index.

    Someone will be able to tell you if using the help index or harms without analyzing in detail the context of the situation. This is part of more detailed analysis of database tuning.

        
    22.07.2015 / 20:19
    3

    Creating indexes has cost, yes. Each time you enter or change a column, the index needs to be rebuilt.

    The index serves to record the elements of the indexed column in another location. When you search for, say, Robert, instead of looking at the USERS table, row by row and see where the searched term appears, Oracle searches the index, which is sorted, and find the line or the lines where the term appears.

    Think of it as an index in a book. The performance impact exists: When you change a page, you need to update the index. But this impact is distributed among the changes, which are usually small. Without indexes, the impact of the search would be greater, and would occur in each search (think of having to search the book every time you want to find where a word appears).

    Some details:

    • Some scenarios are dramatically improved with indexing. Some not so much.
    • Sometimes the impact of creating indexes can have measurable consequences. It is recommended to avoid unnecessary indexes.
    • In addition to the impact on time, indexes take up space.
    • There are other optimizations in a database that are important (normalization), and deserve even more attention than the index issue.

    At the end of the day, the subject is complex. But the basic tip is to create indexes in fields that will be searched frequently.

        
    22.07.2015 / 19:30