Why does bitmap indexing work well for low cardinality domains?

3

I have just read Chapter 29 of the Data Warehousing and OLAP overview

On page 725, the author talks about indexing techniques to support high-performance access, this technique is called bitmap .

That is how the question arose and consequently the question.

Source: Database Systems 6th edition, authors: Elmasri, Ramez Navathe, Shamkant B. Year: 2011 Chapter 29, page 725.

    
asked by anonymous 09.11.2017 / 15:40

1 answer

4

This really depends on the implementation and purpose of the index.

Generally, the goal is to reduce the space occupied in the index and thus give better performance, as well as maintaining a similar structure to an array for key access, which also increases performance. > Ideally such an index should be just two possible values, so a single bit is sufficient for each key entered in the index. This is very efficient.

If you have 4 possible values you already need 2 bits. For 256 possible values you need 8 bits.

As the B-tree index increases and approaches the key size, the efficiency gain becomes smaller. At some point the key will be equal to or greater than if you used a B-Tree.

But it has implementations that need 1 bit for every possible value. This is because each key has a complete 1-bit structure for each table entry, so if you have 256 possible values there will be 256 keys, each with a size equivalent to the number of rows in the table divided by 8, that is 32 bytes for each entry is clearly worse since an entry in the B-tree index will hardly be more than 25 bytes long even in a complex implementation and serving an absurd amount of lines (usually much smaller than this).

There are ways to compress this, but the gain will depend on the distribution of the values. But you lose positional readability as if it were an array .

Depending on the implementation and the distribution it can still compensate even at very high cardinality, for example having all the unique values.

Another gain is when you need to confront two or more indexes mapped per bit. Simply applying simple Boolean algebra across the index and getting the expected result.

As the query is always done according to the line position in the table it is difficult to use for other classifications and may not have direct gain. There are optimizations for this by combining with B-Tree that minimize this problem.

The greatest gain of this type of index is when it saves the result of a Boolean expression of a condition based on the columns. This is rarely needed for application use. But JOIN and other database operations can benefit from an index created internally to control what it should take in that query based on results from two or more selections made earlier. Having such an index can optimize the query because it does not have to generate the index in time. On the other hand, writing is worsening to having to update another index.

A good database will know when to use a bitmap index even though it already exists and when you prefer another path.

    
09.11.2017 / 16:51