External and internal memory ordering algorithms

10

I was researching the difference between external and internal memory sorting algorithms and found the following answer in

  

"In cases where we have to sort more data than can fit into the   main memory, we need an external classification algorithm.   Instead of doing all sorting in memory, we sort   pieces of the data in memory each time, dump the results   to a file, and so on until we have a file   fully ordered. "

Until then I understood the concept, I'm having difficulty understanding which sort algorithm is used in each type, there are several types but how do I know if this sorting algorithm is about primary or secondary memory?

    
asked by anonymous 18.11.2016 / 12:34

2 answers

9

In fact it can be said that they are the same both for external and internal use. It is likely that some algorithm is somewhat better in one case than other algorithms. I will not analyze it one by one to come to that conclusion.

Choosing the most appropriate algorithm depends on a number of factors. It is often said that Quicksort is best suited for the general case. Which is not to say that he is the best ever. It was created for use in RAM. But nothing prevents it from being used in secondary storage.

When going to secondary storage it is extremely important to access the disk as little as possible. So in theory, you would need to find an algorithm that minimizes this, even if it has other higher costs.

But this is naive. In practice any ordering algorithm that needs a lot of performance will combine different techniques to achieve the goal. It's obvious that one of these techniques is to put as much data as possible into memory and sort out what's going on there and organize all the portions. If you can put everything in the secondary memory into the primary, the best algorithm for that case will work well in memory. If you can not put everything, you need to manage the load as smartly as possible. This has little to do with the sort algorithm itself.

Note that the amount of memory that computers have today and the advent of SSD has greatly improved the condition of a not-so-optimized algorithm to achieve good results. If you have multiple processors the choice may tend more towards a specific algorithm. In fact the right hardware will make a lot of difference, probably more than the most suitable algorithm, as long as you do not use a very bad one anyway.

You can do this transparently. You can use a memory mapped file and make in the memory everything that is expected do on file. The operating system manages what must be in memory and what it needs is on disk. In this way you can focus on the algorithm and the memory management is by the operating system. Reading and recording is something that you have to do even. Of course, it will be very efficient if everything is in memory or the data allows for a few page exchanges. Of course, this ease of having a cost, with the right algorithm can better control how you want to do the paging, or can do much worse than the operating system would . MMF is more used in computing than you think. All your executables are loaded into memory as well. All virtual memory works like this. In fact any memory allocation in one way or another ends up calling an MMF.

If the hardware is really restricted I remember that MergeSort is usually good in cases like this. I've been browsing and it looks like BucketSort is even better, and there are some variants that can help each case better. You have a study that seems to confirm this .

    
18.11.2016 / 13:02
8

The points exposed by @bigown are true. An image might help you visualize how such a process would work.

Any sort algorithm can be used in both primary and secondary memory if you segment your data. In the following example we have a computer that can load and manipulate 6 main memory addresses, and a secondary memory structure that supports 12 addresses.

  • MP - Main memory
  • MS - Secondary memory
  • RO - Sort Result

Inthefirstcycle,6addresses(1-6)areloadedintomainmemory,sortedandreturnedtosecondarymemory.(Intheaboveimagetheaddressesthathavebeenchangedaremarkedwithalightbluebackground.)

Inthesecondcycle,the3finaladdressesofthepreviousoperation(%with_%)plus3(%with_%)undergothesameload,sortandstoreoperation.

Thishappenssuccessivelyuntilallsecondarymemoryaddressesareloaded,sortedandwrittenback.Afterallsecondarymemoryissorted,theprocessevaluateswhetherthecycleshouldberunagain;iftherewasatleastonechange,thentheprocesswillberepeated.

Secondarymemorycanbeconsideredorderedifnosortoperationinprimarymemoryresultsinaddresschangesduringacompletecycle:

    
18.11.2016 / 15:12