Why do DBMSs use their own paging if the operating system already has one?

8

I'm studying databases and came across page replacement policies such as LRU and MRU. The operating system already does this as a matter of course, so why does the DB need to do its own paging?

    
asked by anonymous 15.11.2015 / 06:55

1 answer

10

Because the goals are different. The operating system creates pages of raw bytes. The database creates purpose-specific data pages. It knows best what is inside them, and its pages have a specific format. So much so that it is common for not all pages to be the same: each page type has its own data structure. The DB needs more control over how pages are manipulated, usually to optimize for performance.

The OS usually manages its pages with a linked list that is not very efficient for the database access pattern.
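To make the point about access patterns concrete, here is a minimal sketch, assuming the generic policy behaves like plain LRU (an assumption, real kernels use more elaborate variants). The `simulate` function and the workload are hypothetical. It replays a table scan that is repeated a few times and is slightly larger than the cache, which is a common database pattern:

```python
from collections import OrderedDict

def simulate(policy, capacity, accesses):
    """Return the number of cache hits for 'lru' or 'mru' eviction."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            # LRU evicts the oldest entry, MRU evicts the newest one
            cache.popitem(last=(policy == "mru"))
        cache[page] = True
    return hits

# Hypothetical workload: a 4-page table scanned 3 times, 3 frames of cache
scan = [0, 1, 2, 3] * 3
lru_hits = simulate("lru", 3, scan)
mru_hits = simulate("mru", 3, scan)
print(lru_hits, mru_hits)  # prints: 0 6
```

With LRU every access misses, because each page is evicted just before the scan comes back to it. A DB that knows it is running a repeated scan can pick MRU for that operation, something a general-purpose policy cannot know.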

  • Of course, unless the DB bypasses the filesystem (this used to be common, but no longer is), it also uses OS paging at a lower level. A DB page can span several OS pages, or an OS page can contain several DB pages. This depends on what is best for each situation.

    There are databases that rely primarily on OS paging. Those that chose to have their own paging needed an extra level of control over what is in memory. They use a more appropriate replacement algorithm (MRU, LRU, LRU-K, etc.). The database's algorithm knows what must stay resident and can prioritize what matters most in each case (indexes, especially primary ones, need to be available with priority). Keep in mind that OS paging continues to be used underneath to back the DB pages.

    In general, in a DB the pages on disk match the pages in memory. Because they can be a different size from the ones the OS uses (to fit the access pattern), it makes sense for the DB to have its own control.

    There are databases that prefer to leave the page cache entirely to the OS. Others use both levels. This can make it harder to optimize memory consumption, since the data sits in two different places, but it can give more safety and flexibility.

  • Depending on the DB implementation, it is not possible to manipulate data directly in the OS pages. It is not always appropriate to work on data that the OS may write out to disk on its own. Each implementation has its own particularities and will handle this as best it can. The DB needs to decide for itself what should go to disk and what should not.

  • With a more specialized algorithm you can get more performance and scalability, as well as more flexibility and reliability.

  • With its own system, the DB can abstract away the differences between the various operating systems. It can also adapt when a condition no longer meets its expectations and needs.
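As a rough illustration of the kind of control described above, here is a toy buffer pool. It is only a sketch, not how any particular DBMS does it: the `BufferPool` class, its methods, and the page names are all hypothetical, and a dict stands in for the data file. It shows two decisions the OS cannot make for the DB: keeping important pages (such as an index root) pinned in memory, and choosing when a modified page is written back.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: LRU eviction that skips pinned pages."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict standing in for the data file
        self.frames = OrderedDict()   # page_id -> bytearray, in LRU order
        self.pinned = set()           # pages that must stay resident
        self.dirty = set()            # pages modified since they were loaded

    def fetch(self, page_id, pin=False):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = bytearray(self.disk[page_id])
        if pin:
            self.pinned.add(page_id)
        return self.frames[page_id]

    def mark_dirty(self, page_id):
        self.dirty.add(page_id)

    def _evict(self):
        for victim in self.frames:             # oldest first
            if victim not in self.pinned:
                break
        else:
            raise RuntimeError("all frames are pinned")
        if victim in self.dirty:               # the DB decides when to write
            self.disk[victim] = bytes(self.frames[victim])
            self.dirty.discard(victim)
        del self.frames[victim]

disk = {"index_root": b"idx", "row_1": b"aaa", "row_2": b"bbb"}
pool = BufferPool(capacity=2, disk=disk)
pool.fetch("index_root", pin=True)   # the index stays in memory
page = pool.fetch("row_1")
page[0:3] = b"zzz"
pool.mark_dirty("row_1")
pool.fetch("row_2")                  # evicts row_1 and writes it back first
resident = sorted(pool.frames)
```

After the last call, `index_root` is still resident even though it is the oldest page, and the change to `row_1` reached the "disk" only at the moment the pool chose to evict it.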

15.11.2015 / 08:46