Importance of indexes
Imagine searching for a word in a dictionary that is not ordered. Worse than that, imagine researching in a dictionary where words are randomly arranged.
In the worst case of this scenario, it would be necessary to go through every dictionary to find a particular word. The worst case is one in which the searched word is the last of the last page of the dictionary. Obviously, nothing that a brute-force algorithm does not solve. But now imagine millions / billions of records in a table or file. It would be extremely costly to find, do you agree?
What are indexes
Indexes are mechanisms that allow data queries to happen faster compared to the same non-indexed queries. They can be used in database, isolated files, RAM.
This happens because the index is stored in a data structure that has as premise to keep the fields that make up the index ordered, thus allowing the application of search algorithms built to operate when the data are properly ordered. An example of such algorithms is binary search. Binary search is a search algorithm that operates in RAM and which requires that the search key be ordered for its operation. It is much more efficient than a brute force search algorithm, because it reduces the search space, consequently reducing search time. While a raw-force algorithm has complexity O (n), the binary search algorithm has complexity O (log (n)), the log in this case is in base 2.
Example:
If there are 10000 keys to query, a worst-case algorithm, in the worst case, would make 10000 comparisons to find what you want. But the binary search algorithm would, in the worst case, make 14 comparisons (actually 13.29, however, rounded to 14).
See how binary search is much more efficient than the brute-force search algorithm.
Note, however, that this is only possible because the search key is sorted. If it were not, it would not be possible to use the binary search algorithm.
When to use
Indices are very useful and you can really reduce the time of many queries by using this functionality in virtually all database engines. However, there is no free lunch in computing and you need to be aware of two issues:
There is a computational cost to maintain the ordered index. Returning to the dictionary example, imagine that a new word is incorporated into Portuguese and added to the dictionary. There is a cost to know where to enter this word and another cost to move (reorganize) the other words for the insertion of the new one. So if a given table suffers many insertions every second, an index could hurt rather than help.
As the goal is to increase performance, that is, to reduce the amount of time spent consulting, you will need to spend more space. In other words, in the case of the database, disk space is used to store an index. This is the traditional compromise of computing: if you want faster execution of an algorithm, you should pay with memory. If you want to use less memory then you should accept a not-so-efficient algorithm.
See the image below, showing the space spent for the data, the number of lines and the space spent by the index (real example):
In the case of databases, indexes are created in one or more fields of a given table. If the index consists of more than one field, then it is called a compound index. Generally, you include indexes for those fields that are always present in common system where clauses or that should be executed quickly.
For the primary keys, indexes are automatically defined, since they must always be present in an INSERT besides being present in JOINS.
The above image is from an audit table with more than 1 billion records. The index is composed using two fields of type varchar. It would be impractical to query this table if there was no such index.
Someone will be able to tell you if using the help index or harms without analyzing in detail the context of the situation. This is part of more detailed analysis of database tuning.