Why is multiplication faster than division?

31

Bit brushing question, but I was reading an article on javascript that says split is slower than doing multiplication.

And for example, I would recommend changing the code below:

var resultado = 4/2;

by that would be faster.

var resultado = 4 * .5;

Does this happen in any language? And why multiplying is more performative than dividing?

Here's the link to the article I mentioned in the Math section: link

    
asked by anonymous 28.03.2014 / 13:38

5 answers

12

As far as I know, this applies to all languages, after all the operation happens in the processor. And if the processor does not have the splitting function, it needs to be emulated using other operations. Here's an explanation:

At multiplication, for each position, you are multiplying only by 0 or 1, which is really easy to do: you either get all the zeros, or you get the original input back. As a result, each partial bit of the product can be calculated with a AND port. Once you have all the partial products, just add them together. There are some optimizations you can do to speed up logic, but either way you can see that this process can be done very quickly.

The division, on the other hand, is as ugly in binary as it is in decimal. Simpler algorithms are done the same way they were in school: The old algorithm to "throb and correct, if necessary" , meanwhile, faster algorithms tend to use occupy chip area, so spend more) .

Wikipedia about division algorithms: Note that even fast algorithms are still interactive - while multiplication can be done all at once, division requires repeated interactions.

Font

    
28.03.2014 / 13:44
13

Tests performed

According to the results of the jsperf submitted by Renan, other tests that I created differentiating integer and floating point and tests recently created by Marcelo the claim "multiplication is faster than division in Javascript" is not always true.

Results obtained

For example, I just ran the new Marcelo test in IE and Chrome. See the resulting chart as I write:

TheitemreferringtoIE10.0wastheoneIjustran.Thebrowserseemstoabstractthecalculationssothattheresultisuniformacrossalltests.Itbecomesstrange.

Chromeresultshavealreadyshownthatintegermultiplicationhasbeenexceptionallyfaster.However,floatingpointmultiplicationwasworsethansimpledivision.

Theotherbrowsersdidnotshowsignificantdifferences.

WiththeexceptionofSafariversion4,inmosttests,thepremiseofthearticlequotedinthequestionwasproventobefalse,thatis,the4*.5operationdidnotrunfasterthan4/2.

Analysis

Asforthedifferencesbetweenbrowsers,thiscertainlydependsonhowthedivisionisimplementedineachengine.Wewouldhavetoanalyzethetypeusedintheoriginallanguageandtheconversionsperformedinternallyinthatlanguage,tothenanalyzetheCPUconsumption.

Inlower-levellanguages,divisioncanbeamoreexpensiveoperation,butinalinearexecutionwewouldprobablyseenodifference,sincemathematicalprocessorscanterminatetheoperationinjustonecycle.TheremaybeadifferenceinHyper-threadingprocessors,whichaccumulatemorethanoneoperationpercyclewhenthereare"lighter" operations such as integer multiplication, but could not do so with more "complex" . Note I do not have deep knowledge about processors to assert with absolute certainty , moreover this depends on the CPU structure itself and may be true only in specific cases.

Conclusion

The statement that "multiplication is faster than division" proved to be false in many cases. It was true for integer multiplication in some browsers, but not for floating point as a way of replace the division.

Unless you're working on an embedded device with extremely limited memory and processing, do not waste time with this kind of micro-optimization, especially in a language like Javascript that runs on different engines , operating systems and processors.

    
02.04.2014 / 17:07
7

There are several efficient algorithms to perform - in binary - multiplication (eg Dadda multiplier , Wallace tree ) and the division (eg: Newton-Raphson , Goldschmidt ). Often they [multiplication at least] are done directly in the CPU , which results pretty fast.

  

Source: this question on Yahoo! Answers (from 4 years ago)

As for the division, I find it hard to find concrete and up-to-date data, but I can safely say that:

  • It is possible to implement the division in hardware - and this has been done for a long time;
  • However, it is more expensive (i.e. requiring more complex logic).
  • I say this based on the Bug from Pentium , where an optimization attempt at floating point division (FDIV) caused a error in several operations that - although unusual for day-to-day usage - have significantly affected more specialized uses (such as Astronomy, which routinely deals with very large numbers). The root cause of the error was the attempt to simplify a lookup table intended to speed up the splitting process - which otherwise would actually be slower.

    Note: Unlike the Wikipedia article in Portuguese , the error was not in multiplication, but in division. a href="http://en.wikipedia.org/wiki/Pentium_FDIV_bug"> the article in English )

    A lot of time has passed since the Pentium here, and the Moore's Law has made mathematical operations on hardware faster and cheap. That is, I can not say whether one is faster than the other today without doing benchmarks - in different languages and circumstances. As for JavaScript, this test in jsPerf (adapted from Luiz Ricardo's answer ) showed inconclusive results - ranging from browser to browser. However, the multiplication between integers appears to be a bit faster than other operations (or much faster in Chrome): integer division, floating-point multiplication, and floating-point division. >

    P.S. This test refers to division by arbitrary numbers. When trying to divide by a constant, it is common for the compiler (or interpreter, or JIT compiler) to replace the division by a simpler operation (shift, inverse multiplication, or something more sophisticated - see here comment ). Thus, I do not believe that the two example codes presented have a significant difference in performance ... but to say with certainty only by checking via experimentation.

        
    24.04.2014 / 06:10
    5

    The following is a table with the clock cycle latency of the multiplication and division operations for comparison.

    Note that if your program does actually perform such operations, this is going to happen at the hardware level and there is no escaping it.

    | Intel Core i7                     |
    |-----------------------------------|
    | Instruction | Operand   | Latency |
    |-------------|-----------|---------|
    | MUL/IMUL    | r8        | 3       |
    | MUL/IMUL    | r16       | 5       |
    | MUL/IMUL    | r32       | 5       |
    | MUL/IMUL    | r64       | 3       |
    | IMUL        | r16,r16   | 3       |
    | IMUL        | r32,r32   | 3       |
    | IMUL        | r64,r64   | 3       |
    | IMUL        | r16,r16,i | 3       |
    | IMUL        | r32,r32,i | 3       |
    | IMUL        | r64,r64,i | 3       |
    | MUL/IMUL    | m8        | 3       |
    | MUL/IMUL    | m16       | 5       |
    | MUL/IMUL    | m32       | 5       |
    | MUL/IMUL    | m64       | 3       |
    | IMUL        | r16,m16   | 3       |
    | IMUL        | r32,m32   | 3       |
    | IMUL        | r64,m64   | 3       |
    | IMUL        | r16,m16,i |         |
    | IMUL        | r32,m32,i |         |
    | IMUL        | r64,m64,i |         |
    | DIV         | r8        | 11-21   |
    | DIV         | r16       | 17-22   |
    | DIV         | r32       | 17-28   |
    | DIV         | r64       | 28-90   |
    | IDIV        | r8        | 10-22   |
    | IDIV        | r16       | 18-23   |
    | IDIV        | r32       | 17-28   |
    | IDIV        | r64       | 37-100  |
    | FMUL        | r         | 5       |
    | FMUL        | m         |         |
    | FDIV        | r         | 7-27    |
    | FDIV        | m         | 7-27    |
    | FIMUL       | m         | 5       |
    | FIDIV       | m         | 7-27    |
    
    | AMD Steamroller                     |
    |-------------------------------------|
    | Instruction | Operand     | Latency |
    |-------------|-------------|---------|
    | MUL/IMUL    | r8/m8       | 4       |
    | MUL/IMUL    | r16/m16     | 4       |
    | MUL/IMUL    | r32/m32     | 4       |
    | MUL/IMUL    | r64/m64     | 6       |
    | IMUL        | r16,r16/m16 | 4       |
    | IMUL        | r32,r32/m32 | 4       |
    | IMUL        | r64,r64/m64 | 6       |
    | IMUL        | r16,(r16),i | 5       |
    | IMUL        | r32,(r32),i | 4       |
    | IMUL        | r64,(r64),i | 6       |
    | IMUL        | r16,m16,i   |         |
    | IMUL        | r32,m32,i   |         |
    | IMUL        | r64,m64,i   |         |
    | DIV         | r8/m8       | 17-22   |
    | DIV         | r16/m16     | 15-25   |
    | DIV         | r32/m32     | 13-39   |
    | DIV         | r64/m64     | 13-70   |
    | IDIV        | r8/m8       | 17-22   |
    | IDIV        | r16/m16     | 14-25   |
    | IDIV        | r32/m32     | 13-39   |
    | IDIV        | r64/m64     | 13-70   |
    | FMUL        | r/m         | 5       |
    | FIMUL       | m           |         |
    | FDIV        | r           | 9-37    |
    | FDIV        | m           |         |
    | FIDIV       | m           |         |
    
    | VIA Nano L3050                    |
    |-----------------------------------|
    | Instruction | Operand   | Latency |
    |-------------|-----------|---------|
    | MUL/IMUL    | r8        | 2       |
    | MUL/IMUL    | r16       | 3       |
    | MUL/IMUL    | r32       | 3       |
    | MUL/IMUL    | r64       | 8       |
    | IMUL        | r16,r16   | 2       |
    | IMUL        | r32,r32   | 2       |
    | IMUL        | r64,r64   | 5       |
    | IMUL        | r16,r16,i | 2       |
    | IMUL        | r32,r32,i | 2       |
    | IMUL        | r64,r64,i | 5       |
    | DIV         | r8        | 22-24   |
    | DIV         | r16       | 24-28   |
    | DIV         | r32       | 22-30   |
    | DIV         | r64       | 145-162 |
    | IDIV        | r8        | 21-24   |
    | IDIV        | r16       | 24-28   |
    | IDIV        | r32       | 18-26   |
    | IDIV        | r64       | 182-200 |
    | FMUL        | r/m       | 4       |
    | FDIV        | r/m       | 14-23   |
    

    Source: Instruction tables: Lists of instruction latencies, breakthroughs and micro-operation breakdowns for Intel, AMD and VIA CPUs

    This shows that in fact your processor will perform divisions slower than multiplications. This is due to the hardware implementation of these operations suffer from the same algorithmic complexity issues already discussed in other answers. Regardless, your program in one form or another will make use of these instructions to perform such operations efficiently, and will be limited to the performance they provide.

    As for the benchmarks and dubious tips that are supposed to be based on this, beware. Each language will treat the expressions as it defines, the expression 1/2 in the source in C and C ++ will end up becoming a constant 0 in the executable, and no division will be performed during the execution of the program, in a language dynamics such as javascript for example, the cost of interpreting the two operations can make the cost of the arithmetic instructions discussed irrelevant to the analysis. This is just an example, and the wise programmer will know when multiplication vs division makes sense or not, there are cases that actually does, algorithms (come to mind Bresenham now ...) and languages where this makes the difference. When the programmer has no idea of this, and practices religiously, it falls in the case of micro optimization that is the root of all evil .

        
    24.04.2014 / 07:39
    3

    Good luck, the reason the division is slower, is not in the language but in the processor. Since a multiplication operation the processor will perform a binary sum, as an example in the decimal system where 5 x 5 = 25 being 5 + 5 + 5 + 5 + 5 = 25. In the division we will have 2 different operations, which is the multiplication and subtraction In Example 25: 5, you will first perform a multiplication, then perform the subtraction. In short the processor will have to do more processes involving binary arithmetic.

        
    02.04.2014 / 03:12