This advice is old and dates from a time when compilers were not good at optimizing this kind of code, which produced executables with questionable performance (the variable is copied, and copies can be expensive). Nowadays, returning a local object may even be better than using output parameters. Of course, never rely on popular optimization folklore: always measure your program's performance before drawing any conclusion.
We have two names for the possible optimization types in this case:
- RVO: Return Value Optimization, and
- NRVO: Named Return Value Optimization, which is basically a variation of RVO for cases where the returned value has a name (i.e., is a variable).
These two techniques are forms of copy elision (that is, omitting the copy). Since C++17, elision is part of the standard's guarantees: when a function returns a prvalue (the RVO case), no copy or move takes place at all. Before that, the standard merely permitted compilers to omit the copy in certain situations. NRVO remains an optional optimization.
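One way to observe elision of a returned temporary is to count constructor calls. The following is a minimal sketch, not a definitive test: the type Counted, the counters g_copies and g_moves, and both functions are hypothetical names introduced only for this illustration.

```cpp
// Hypothetical counters, incremented by the special member functions below.
int g_copies = 0;
int g_moves = 0;

// Hypothetical type that records every copy and move construction.
struct Counted {
    Counted() = default;
    Counted(const Counted&) { ++g_copies; }
    Counted(Counted&&) noexcept { ++g_moves; }
};

// Returns a prvalue: under C++17 rules the caller's object is
// initialized directly, without any copy or move construction.
Counted make_counted() { return Counted{}; }

// Returns how many copy/move constructions one call to make_counted() caused.
int constructions_for_prvalue_return() {
    g_copies = 0;
    g_moves = 0;
    Counted c = make_counted();
    (void)c;  // silence unused-variable warnings
    return g_copies + g_moves;
}
```

When compiled as C++17, constructions_for_prvalue_return() is expected to report zero. Older language modes usually give the same result, but only as a permitted optimization.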
With all that said, we can now see the effects of RVO and NRVO:
When RVO is applied, the copy that would otherwise be made of the object just created and returned by the function is omitted: the object's storage becomes the same as that of the object receiving the return value. To make this clearer, the following code:
#include <string>
std::string foo() { return "teste"; }
auto s = foo();
is conceptually transformed into the following:
#include <string>
std::string s;
void foo() { s = "teste"; }
int main() { foo(); }
Notice how the optimized version uses the storage of the outer variable s to receive the string literal "teste", instead of creating a new std::string and copying that object into s. Compiling with GCC 7.3 at optimization level 3, we get the following body for the function std::string foo():
foo[abi:cxx11]():
lea rdx, [rdi+16] # Compute the address of the buffer inside 's'
mov rax, rdi
mov DWORD PTR [rdi+16], 1953719668 # Write "teste" into the buffer of 's'.
mov BYTE PTR [rdi+20], 101
mov QWORD PTR [rdi+8], 5 # Write the string's size.
mov QWORD PTR [rdi], rdx
mov BYTE PTR [rdi+21], 0 # Write the string's null terminator.
ret
Instead of creating a new std::string object, the function foo simply assumes that the object's storage already exists (that is, the caller has already allocated space for the object) and uses it.
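The caller-allocates convention described above can be imitated by hand. This is only a rough sketch of the idea, not what the compiler literally emits: foo_lowered, call_foo_lowered, and result_slot are hypothetical names, and the explicit pointer parameter stands in for the hidden one that the assembly receives in rdi.

```cpp
#include <new>
#include <string>

// Hypothetical hand-written counterpart of std::string foo():
// the caller passes the address of uninitialized storage, the function
// constructs the string directly there and returns that same address.
std::string* foo_lowered(void* result_slot) {
    return ::new (result_slot) std::string("teste");
}

// Plays the role of the caller: reserves storage, lets foo_lowered fill it,
// reads the value, and destroys the string manually.
std::string call_foo_lowered() {
    alignas(std::string) unsigned char storage[sizeof(std::string)];
    std::string* s = foo_lowered(storage);
    std::string result = *s;  // copied out only so this sketch can return a value
    using String = std::string;
    s->~String();             // the caller owns the object's lifetime
    return result;
}
```

No temporary std::string is created inside foo_lowered: the only object it constructs lives in the storage supplied by the caller, which mirrors the writes through rdi in the listing.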
The NRVO variation does exactly the same thing, except that it is extended to named local variables. If we had the following code:
#include <string>
std::string foo()
{
std::string s_local = "teste";
s_local[0] = 'T';
return s_local;
}
auto s = foo();
We would have exactly the same optimized output, with the sole addition of a mov BYTE PTR [rdi+16], 84 at the end, which changes the first character of the string to a capital T. That is, s_local and s will share the same storage location after optimization.
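Whether the local variable and the caller's variable really share storage can be checked by comparing addresses. The sketch below assumes nothing beyond the example above: foo_named, g_local_address, and nrvo_was_applied are hypothetical names. Because NRVO is optional, the result depends on the compiler and its flags.

```cpp
#include <cstdint>
#include <string>

// Hypothetical global used only to remember where the local variable lived.
std::uintptr_t g_local_address = 0;

std::string foo_named() {
    std::string s_local = "teste";
    s_local[0] = 'T';
    // Record the local's address (as an integer) before returning.
    g_local_address = reinterpret_cast<std::uintptr_t>(&s_local);
    return s_local;  // NRVO candidate: the expression is the name of a local
}

// True when the local inside foo_named() and the caller's variable
// occupied the same storage, i.e. when NRVO was applied.
bool nrvo_was_applied() {
    std::string s = foo_named();
    return g_local_address == reinterpret_cast<std::uintptr_t>(&s);
}
```

Compilers such as GCC and Clang typically apply NRVO here, but the standard does not require it, so nrvo_was_applied() should be read as a diagnostic rather than a guarantee.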
There are cases where NRVO cannot be applied easily. If every path returns the same local variable, applying NRVO is trivial. If, on the other hand, the function may return different objects, NRVO becomes difficult and the optimization will probably not take place. For example:
#include <string>
std::string foo(bool b)
{
std::string s1 = "abc";
std::string s2 = "def";
return b ? s1 : s2;
}
Here the compiler may still manage to produce equivalent code (writing "abc" or "def" directly into the result string, depending on the value of b). Note, however, that the returned expression is a conditional, not the name of a single local variable, so the standard's NRVO rule does not cover it, and as soon as the code becomes more complex, the chances of the copy being optimized away decrease. In contrast, if every return statement returns the same local variable, the function can be as complex as you like and NRVO will still be trivial to apply.
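The cost of returning through a conditional expression can be made visible by counting constructor calls. This is a hedged sketch: Tracked, g_copy_count, g_move_count, pick_conditional, and pick_branches are hypothetical names, and a small counting type replaces std::string so that copies and moves can be observed.

```cpp
// Hypothetical counters, incremented by the special member functions below.
int g_copy_count = 0;
int g_move_count = 0;

// Hypothetical type that records every copy and move construction.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copy_count; }
    Tracked(Tracked&&) noexcept { ++g_move_count; }
};

// The returned expression is a conditional (an lvalue), not the name of
// a local variable, so the return value is copy-constructed.
Tracked pick_conditional(bool b) {
    Tracked t1;
    Tracked t2;
    return b ? t1 : t2;
}

// Each return statement names a local variable, so the return value is
// at worst move-constructed, and the compiler is allowed to elide even that.
Tracked pick_branches(bool b) {
    Tracked t1;
    Tracked t2;
    if (b) return t1;
    return t2;
}
```

Splitting the conditional into separate return statements does not guarantee NRVO when two different variables are involved, but it does let the language fall back to a move instead of a copy, which for a type like std::string is usually much cheaper.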
Finally, here is the output for your function (slightly modified) from a few compilers, all compiling as C++17.
#include <string>
#include <algorithm>
std::string foo()
{
std::string s = "teste";
std::transform(begin(s), end(s), begin(s),
[](char c) { return c - 32; });
return s;
}
With GCC 7.3 and optimization level 3:
foo[abi:cxx11]():
lea rdx, [rdi+16] # Compute the start of the string that already exists outside the function
mov DWORD PTR [rdi+16], 1953719668 # Write "teste"
mov BYTE PTR [rdi+20], 101
mov rax, rdi
mov QWORD PTR [rdi+8], 5
mov BYTE PTR [rdi+21], 0
mov QWORD PTR [rdi], rdx
sub BYTE PTR [rdi+16], 32 # Sequence of subtractions (to convert to uppercase)
sub BYTE PTR [rdi+17], 32 # that was unrolled from 'std::transform'
sub BYTE PTR [rdi+18], 32
sub BYTE PTR [rdi+19], 32
sub BYTE PTR [rdi+20], 32
ret
With Clang 6.0.0, optimization level 3, also compiling against libstdc++:
foo[abi:cxx11](): # @foo[abi:cxx11]()
lea rax, [rdi + 16]
mov qword ptr [rdi], rax
mov qword ptr [rdi + 8], 5
mov dword ptr [rdi + 16], 1414743380 # Clang managed to eliminate the 'std::transform' call
mov word ptr [rdi + 20], 69 # and writes the string already in uppercase
mov rax, rdi
ret
You can experiment with compiler output yourself on Compiler Explorer (Godbolt).