The following is a table with the clock cycle latency of the multiplication and division operations for comparison.
Note that if your program does actually perform such operations, this is going to happen at the hardware level and there is no escaping it.
| Intel Core i7 |
|-----------------------------------|
| Instruction | Operand | Latency |
|-------------|-----------|---------|
| MUL/IMUL | r8 | 3 |
| MUL/IMUL | r16 | 5 |
| MUL/IMUL | r32 | 5 |
| MUL/IMUL | r64 | 3 |
| IMUL | r16,r16 | 3 |
| IMUL | r32,r32 | 3 |
| IMUL | r64,r64 | 3 |
| IMUL | r16,r16,i | 3 |
| IMUL | r32,r32,i | 3 |
| IMUL | r64,r64,i | 3 |
| MUL/IMUL | m8 | 3 |
| MUL/IMUL | m16 | 5 |
| MUL/IMUL | m32 | 5 |
| MUL/IMUL | m64 | 3 |
| IMUL | r16,m16 | 3 |
| IMUL | r32,m32 | 3 |
| IMUL | r64,m64 | 3 |
| IMUL | r16,m16,i | |
| IMUL | r32,m32,i | |
| IMUL | r64,m64,i | |
| DIV | r8 | 11-21 |
| DIV | r16 | 17-22 |
| DIV | r32 | 17-28 |
| DIV | r64 | 28-90 |
| IDIV | r8 | 10-22 |
| IDIV | r16 | 18-23 |
| IDIV | r32 | 17-28 |
| IDIV | r64 | 37-100 |
| FMUL | r | 5 |
| FMUL | m | |
| FDIV | r | 7-27 |
| FDIV | m | 7-27 |
| FIMUL | m | 5 |
| FIDIV | m | 7-27 |
| AMD Steamroller |
|-------------------------------------|
| Instruction | Operand | Latency |
|-------------|-------------|---------|
| MUL/IMUL | r8/m8 | 4 |
| MUL/IMUL | r16/m16 | 4 |
| MUL/IMUL | r32/m32 | 4 |
| MUL/IMUL | r64/m64 | 6 |
| IMUL | r16,r16/m16 | 4 |
| IMUL | r32,r32/m32 | 4 |
| IMUL | r64,r64/m64 | 6 |
| IMUL | r16,(r16),i | 5 |
| IMUL | r32,(r32),i | 4 |
| IMUL | r64,(r64),i | 6 |
| IMUL | r16,m16,i | |
| IMUL | r32,m32,i | |
| IMUL | r64,m64,i | |
| DIV | r8/m8 | 17-22 |
| DIV | r16/m16 | 15-25 |
| DIV | r32/m32 | 13-39 |
| DIV | r64/m64 | 13-70 |
| IDIV | r8/m8 | 17-22 |
| IDIV | r16/m16 | 14-25 |
| IDIV | r32/m32 | 13-39 |
| IDIV | r64/m64 | 13-70 |
| FMUL | r/m | 5 |
| FIMUL | m | |
| FDIV | r | 9-37 |
| FDIV | m | |
| FIDIV | m | |
| VIA Nano L3050 |
|-----------------------------------|
| Instruction | Operand | Latency |
|-------------|-----------|---------|
| MUL/IMUL | r8 | 2 |
| MUL/IMUL | r16 | 3 |
| MUL/IMUL | r32 | 3 |
| MUL/IMUL | r64 | 8 |
| IMUL | r16,r16 | 2 |
| IMUL | r32,r32 | 2 |
| IMUL | r64,r64 | 5 |
| IMUL | r16,r16,i | 2 |
| IMUL | r32,r32,i | 2 |
| IMUL | r64,r64,i | 5 |
| DIV | r8 | 22-24 |
| DIV | r16 | 24-28 |
| DIV | r32 | 22-30 |
| DIV | r64 | 145-162 |
| IDIV | r8 | 21-24 |
| IDIV | r16 | 24-28 |
| IDIV | r32 | 18-26 |
| IDIV | r64 | 182-200 |
| FMUL | r/m | 4 |
| FDIV | r/m | 14-23 |
Source: Instruction tables: Lists of instruction latencies, breakthroughs and micro-operation breakdowns for Intel, AMD and VIA CPUs
This shows that in fact your processor will perform divisions slower than multiplications. This is due to the hardware implementation of these operations suffer from the same algorithmic complexity issues already discussed in other answers. Regardless, your program in one form or another will make use of these instructions to perform such operations efficiently, and will be limited to the performance they provide.
As for the benchmarks and dubious tips that are supposed to be based on this, beware. Each language will treat the expressions as it defines, the expression 1/2
in the source in C and C ++ will end up becoming a constant 0
in the executable, and no division will be performed during the execution of the program, in a language dynamics such as javascript for example, the cost of interpreting the two operations can make the cost of the arithmetic instructions discussed irrelevant to the analysis. This is just an example, and the wise programmer will know when multiplication vs division makes sense or not, there are cases that actually does, algorithms (come to mind Bresenham now ...) and languages where this makes the difference. When the programmer has no idea of this, and practices religiously, it falls in the case of micro optimization that is the root of all evil .