In functional programming languages, it is said that it is possible to avoid stack overflows using "tail recursion". But what is the difference between normal recursion and tail recursion?
In functional programming languages, it is said that it is possible to avoid stack overflows using "tail recursion". But what is the difference between normal recursion and tail recursion?
The difference is just where the recursive call is called: If it is called in the "tail" of the function, it is a recursive tail call.
The "tail" function is your last call. It is the last computation / calculation made by the function, and soon after it, no processing is done before returning its value.
For example, considering this function to calculate the factorial of a number, in F #:
let rec fatorial n : int64 = if n <= 1L then 1L else n * fatorial (n - 1L)
This function calculates the factorial of any number you place. Unless this number is too large. Because in this function, with each recursive call, the stack increases, and a very large number can cause a stack overflow. Imagine its execution:
fatorial(5) -> 5 * fatorial(5 - 1) ->
5 * fatorial(4) -> 5 * 4 * fatorial(4 - 1) ->
5 * 4 * fatorial(3) -> 5 * 4 * 3 * fatorial(3 - 1) ->
5 * 4 * 3 * fatorial(2) -> 5 * 4 * 3 * 2 * fatorial(2 - 1) ->
5 * 4 * 3 * 2 * fatorial(1) -> 5 * 4 * 3 * 2 * 1 ->
120
If you observe, with each recursive call, the number of functions being called increases, because the program can only calculate the result of the last called function, to then calculate the result of those that called it. So the stack bursts.
How to avoid this? Using recursive tail calls. Functional language compilers often transform tail recursive calls into loops, because this is perfectly possible. Why not make ties directly? Because this would lose the qualities and advantages of functional programming.
I will illustrate a factorial function in F # using recursive tail calls:
let fatorial n =
let rec _fatorial n acc : int64 =
if n <= 1L then acc else _fatorial (n - 1L) (acc * r)
_fatorial n 1L
Note that in this case, the recursive function is NOT fatorial
, but _ fatorial
. I declare _ fatorial
within fatorial
so we can call it with just one argument, and the recursive function uses an accumulator.
The main difference is that in the recursive tail function, the tail call is the recursive call, not *
as in the first case. If you look at the flow of the call, it runs like this:
fatorial(5) ->
_fatorial(5, 1) -> _fatorial(5 - 1, 1 * 5) ->
_fatorial(4, 5) -> _fatorial(4 - 1, 5 * 4) ->
_fatorial(3, 20) -> _fatorial(3 - 1, 20 * 3) ->
_fatorial(2, 60) -> _fatorial(2 - 1, 60 * 2) ->
_fatorial(1, 120) -> 120
As you can see, with each step, the amount of calls neither increases nor decreases. From the moment the recursive function is called, only it is called at the end, without needing further calculations.
When a compiler ready for this sees a recursive call on the tail, it automatically turns it into a loop during optimizations. With this, you do not miss the advantages or the elegance of functional programming, but you also do not risk going through a stack overflow.
Using a reflector, I can see that the recursive function's code would look like this, imperative (in C #):
internal static long _fatorial@8(long n, long acc)
{
while (n > 1L)
{
long arg_1F_0 = n - 1L;
acc *= n;
n = arg_1F_0;
}
return acc;
}
public static long fatorial(long n)
{
return Fatorial._fatorial(n, 1L);
}
The compiler actually transforms its recursive function into a loop. On the other hand, the function that does not use recursion of cause remains intact.
A good way to tell if your function uses recursion in the tail or is not to try to simulate it in Clojure. Because Clojure does not have tail recursion natively, you should use the recur
function, which will throw an exception if it is not used in the tail.
; Causa uma exceção pois a chamada de cauda é *
(defn fatorial [n]
(if (<= n 1)
1
(* n (recur (dec n)))))
; Funciona pois a chamada de cauda é recur
(defn fatorial
([n] (fatorial n 1))
([n acc] (if (<= n 1)
acc
(recur (dec n) (* acc n)))))
Conceptually the execution of a program (process) occurs as follows: The program is loaded into memory. Each program instruction corresponds to an (memory) address. During the function call, the following procedure is performed:
This means that with each function call, the execution stack receives all the information that is in the local scope, and a number of commands to stack and unhide occur. If many calls occur, chances are a stack will pop up. By the way, the name given to these operations that allow the exchange of local variables between functions is context switch.
Whenever you opt for recursion all problems related to context switching can occur (therefore: more operations to manipulate the execution stack and eventually stack overflow). But, noting that recursive algorithms could be transformed into iterative algorithms, compiler designers have created optimizations for some function calls. One such optimization is Tail Call Optimization (TCO).
It works exactly as explained by @ andre-leria with the following comments:
The sequence of calls from A -> B - > C - > A is also recursive, and would also be subject to tail call optimization. Some compilers / interpreters can perform optimization on recursions involving more than one function: Ex: Erlang. Scala, for example, can only guarantee tail optimizations for the same function ( link ):
In principle, tail calls can always re-use the stack of the calling function. However, some run-time environments (such as the Java VM) lack the primitives to make stack frame re-use for tail calls ef fi cient. A production quality Scala implementation is therefore only required to re-use the stack of a directly tail-recursive function whose last action is a call to itself. Other tail calls could be optimized also, but one should not rely on this across implementations.
Some compilers use a structure called a trampoline that transforms a call into a tail, in an iteration with accumulators. This happens for example in Scheme that converts the source code to C that does not have support for TCO.