Excellent questions. Below are my two cents on them.
1. Why are loops slow in R?
Loops are slow in R because of an intrinsic feature of interpreted languages. All code written in R (an interpreted language, like Python or Ruby) is read and translated into machine language while the program runs, and only then executed.
C, on the other hand, is a compiled language. All code written in the C language is compiled, made into an executable in the native language of the machine's operating system and processor, and only then will it run.
If we run a loop in an interpreted language like R, the step of translating the R code into machine language happens at every iteration of the loop. This adds several extra steps to the execution of the program, steps that do not exist in a compiled language, and each of them adds to the program's total execution time.
I understand that this may not answer your question directly. Let me rephrase it as follows:
Why are loops in R slower than vectorized code?
Although it may not look like it, the answer to this question is in the description above. Many of R's native functions, such as the sum of your example, were written in C, C++, or FORTRAN. Note the output that appears at the prompt when typing sum:
sum
function (..., na.rm = FALSE) .Primitive("sum")
This function was not written in R. The .Primitive marker shows that it is implemented internally in compiled code (C, in the case of R's primitives), which makes it much faster. After all, compiled code does not pay the interpretation cost on every operation. Hence the run-time difference between the sum-based code and com_loop in your question's example.
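To make the difference concrete, here is a minimal sketch. The name sum_with_loop is hypothetical and only stands in for the com_loop of the question, whose body I am not reproducing. Timings will vary from machine to machine, so none are shown.

```r
# Hypothetical loop-based sum, standing in for the com_loop of the question.
sum_with_loop <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value  # each iteration is evaluated by the interpreter
  }
  total
}

x <- as.numeric(1:1e6)

# Both approaches compute the same result...
all.equal(sum_with_loop(x), sum(x))

# ...but the loop pays the interpreter overhead on every iteration,
# while sum() hands the whole vector to a single compiled routine.
system.time(sum_with_loop(x))
system.time(sum(x))
```

On most machines the second timing should be noticeably smaller, although the exact ratio depends on the R version and the hardware.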
2. What alternatives are there? (packages, strategies, etc.)
Basically, there are three strategies for optimizing code in R. They will not always work, however, since every case is different.
Use vectorized code
For example, the functions of the apply family have an advantage over loops. Often (though not always), using functions of this family will make your code faster. After all, R is a language that works best with vectors. The functions of the apply family make good use of this feature of R, and can therefore end up faster than for (or while, etc.).
In addition, in my opinion, they leave the code cleaner and easier to audit later.
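As an illustration, here is the same computation written three ways. This is only a sketch with made-up data (values) and made-up variable names; which version is fastest depends on the case, but the last two are usually shorter and easier to read.

```r
values <- 1:10

# Loop version: preallocate the result, then fill it element by element.
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# apply-family version: vapply() applies the function to each element and
# checks that every result is a single numeric value.
squares_vapply <- vapply(values, function(v) v^2, numeric(1))

# Fully vectorized version: the arithmetic operator works on the whole vector.
squares_vectorized <- values^2

# All three give the same answer.
identical(squares_loop, squares_vapply)
all.equal(squares_vapply, squares_vectorized)
```

When an operation is already vectorized, as ^ is here, calling it directly on the vector is usually the simplest choice; the apply family is most useful when the function only works on one element at a time.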
Parallelize the code
Use your computer's parallel processing power. Instead of using a single core to do the job, distribute it across several cores. The best-known packages for this are foreach, parallel and doMC.
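Unfortunately, I have tried this in the past and was never able to make it work on Windows. The likely reason is that doMC and the fork-based functions of parallel (such as mclapply) depend on process forking, which Windows does not support; socket-based clusters created with parallel's makeCluster do not have this limitation. On macOS and Linux, all of these packages are easy to use.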
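A minimal sketch of the idea, using only the parallel package that ships with R. The function slow_square and the choice of 2 workers are assumptions made for illustration. A socket cluster is used here because it does not depend on process forking.

```r
library(parallel)

# Hypothetical slow task, used only for illustration.
slow_square <- function(v) {
  Sys.sleep(0.1)  # pretend each element takes a while to process
  v^2
}

# Start a socket cluster with 2 workers (an arbitrary choice for this sketch).
cluster <- makeCluster(2)

# parLapply() splits the input across the workers and returns a list.
results <- parLapply(cluster, 1:8, slow_square)

# Always release the workers when done.
stopCluster(cluster)

unlist(results)
```

Starting the workers has a cost of its own, so for very small jobs the parallel version can be slower than a plain lapply. It pays off when each piece of work is reasonably expensive.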
Read the book The R Inferno
It presents many strategies beyond the two I mentioned above. The book opened my eyes in the past, showing what I was doing wrong when writing my code. It covers nine strategies, in more detail than I give in this summary, and I'm sure it will answer many of your questions.