What is unrolling?

In this question I asked about the performance optimizations that the compiler carries out.

Among the items highlighted, users mentioned that the compiler performs an optimization called loop unrolling, or simply unrolling.

  • What is this unrolling?
  • How does it work in practice?
  • Do I need to do something specific in my code so that the compiler can use this optimization?
asked by anonymous 23.02.2017 / 14:33

1 answer

This is an optimization that attempts to speed up the code by eliminating or reducing the number of loop iterations.

The optimizer usually tries to keep the final generated code about the same size, but this is not always possible, so in many cases the code gets a little bigger.

The ideal would be to eliminate the loop completely, turning

for (int i = 0; i < 4; i++) {
    soma += dados[i];
}

into

soma += dados[0] + dados[1] + dados[2] + dados[3];

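But when the size is not known in advance, the best that can be done is for this:

for (int i = 0; i < n; i++) {
    soma += dados[i];
}

to be transformed into something like this:

// soma0, soma1, soma2 and soma3 start at zero
int i = 0;
for (; i + 3 < n; i += 4) {   // main loop: four elements per iteration
    soma0 += dados[i + 0];
    soma1 += dados[i + 1];
    soma2 += dados[i + 2];
    soma3 += dados[i + 3];
}
for (; i < n; i++) {          // leftover elements when n is not a multiple of 4
    soma0 += dados[i];
}
soma += soma0 + soma1 + soma2 + soma3;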

I put it on GitHub for future reference.
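For anyone who wants to experiment, here is a minimal, self-contained sketch of the two forms side by side. The function names `soma_simples` and `soma_desenrolada`, the `long` accumulators, and the `size_t` length are assumptions made for illustration; they are not what any particular compiler emits. The point is only to show that the unrolled form needs a remainder loop to stay correct for any `n`.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward version: one addition and one loop test per element. */
long soma_simples(const int *dados, size_t n) {
    assert(dados != NULL || n == 0);
    long soma = 0;
    for (size_t i = 0; i < n; i++) {
        soma += dados[i];
    }
    return soma;
}

/* Unrolled by 4, written by hand only to show what an optimizer might generate. */
long soma_desenrolada(const int *dados, size_t n) {
    assert(dados != NULL || n == 0);
    long soma0 = 0, soma1 = 0, soma2 = 0, soma3 = 0;
    size_t i = 0;

    /* Main loop: four independent accumulators, one loop test per four elements. */
    for (; i + 3 < n; i += 4) {
        soma0 += dados[i + 0];
        soma1 += dados[i + 1];
        soma2 += dados[i + 2];
        soma3 += dados[i + 3];
    }

    /* Remainder loop: the 0 to 3 elements left when n is not a multiple of 4. */
    for (; i < n; i++) {
        soma0 += dados[i];
    }

    return soma0 + soma1 + soma2 + soma3;
}
```

Both functions should return the same value for any integer input; whether the second one is actually faster depends on the compiler, the flags, and the processor, so it has to be measured.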

So some gain is possible, not only because it removes some loop-control operations, but also because it can reduce the processor's cache misses and decrease the number of branches (conditional jumps), which are expensive.

But note that more modern processors have so many optimizations of their own that the gain may not materialize. In fact, there are cases where performance gets worse, because while some instructions are removed, others become necessary, at least in the second example.

In the first case there may still be a gain because the loop is eliminated altogether. What's more, this can enable other optimizations, such as inlining a function, although some compilers can inline even without unrolling. But inlining a function may make unrolling impractical, since the code may become too large to repeat "manually." The compiler has to analyze which of the two is more worthwhile in each situation.

If the size increases too much, instruction cache misses may occur, so the compiler has to be very smart about the platform it is generating code for. It may also end up using more registers, forcing some maneuvers that would have been unnecessary before.

In some rare cases this can make it easier to parallelize the operations, since there is no longer a single chain of dependent additions but four independent ones.

A JITter can have an advantage here because it has information that a normal compiler does not: it knows the value of n. This can help it decide whether or not to unroll.

Four or five are the unrolling factors most often adopted as advantageous, but that is an implementation detail.

Do not try to do this optimization manually: the code loses readability and may end up performing worse.
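If the goal is to nudge the compiler rather than rewrite the loop by hand, some compilers accept hints. The sketch below assumes GCC 8 or later, where `#pragma GCC unroll N` may be placed immediately before a loop; other compilers spell this differently or do not support it, which is why the pragma is guarded. The function name `soma_com_dica` is hypothetical. GCC also has a `-funroll-loops` command-line option. Whether any of this helps still has to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* The loop stays readable; the unrolling request is only a compiler hint. */
long soma_com_dica(const int *dados, size_t n) {
    assert(dados != NULL || n == 0);
    long soma = 0;
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
    /* Assumption: GCC 8+ syntax, asking for an unrolling factor of 4. */
    #pragma GCC unroll 4
#endif
    for (size_t i = 0; i < n; i++) {
        soma += dados[i];
    }
    return soma;
}
```

The source code keeps its original shape, so the result is the same with or without the hint; only the generated machine code may differ.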

Wikipedia has more complete information. Of course, more specific questions can be asked if the interest goes beyond basic curiosity.

    
23.02.2017 / 14:57