How do I measure the run time of code in Visual Studio 2017?

I'm trying to measure the run time of some code, but I always get inconsistent values: whichever test runs first always has the worst time, and most of the time the second test reports 0.

#include <iostream>
#include <math.h>
#include <intrin.h>
#include <chrono>

using namespace std;

#define MAX_LOOP                100000
#define NUM                     10000.f

auto sse_sqrt( float n )
{
    __m128 reg = _mm_load_ss( &n );
    return _mm_mul_ss( reg, _mm_rsqrt_ss( reg ) ).m128_f32[ 0 ];
}

auto stl_sqrt_timer()
{
    auto start = std::chrono::high_resolution_clock::now();

    for ( auto i = 0; i < MAX_LOOP; i++ )
    {
        auto v = std::sqrt( NUM );
    }

    auto end = std::chrono::high_resolution_clock::now();

    return ( end - start ).count();
}

auto sse_sqrt_timer()
{
    auto start = std::chrono::high_resolution_clock::now();

    for ( auto i = 0; i < MAX_LOOP; i++ )
    {
        auto v = sse_sqrt( NUM );
    }

    auto end = std::chrono::high_resolution_clock::now();

    return ( end - start ).count();
}

int main()
{
    cout << "sse_sqrt: " << sse_sqrt_timer() << "\n";
    cout << "stl_sqrt: " << stl_sqrt_timer() << "\n";

    cin.ignore();

    return 0;
}

First Run: sse_sqrt: 12461 stl_sqrt: 0

Second run: sse_sqrt: 2643 stl_sqrt: 378

Reversing the order of tests:

stl_sqrt: 23032 sse_sqrt: 378

stl_sqrt: 2265 sse_sqrt: 0

I'm compiling in Release x86 with /Ox optimization.

    
asked by anonymous 06.08.2018 / 20:01

1 answer

Keep in mind that optimizations can affect a measurement far more than you might expect. For example, if a variable is defined, initialized, and modified, but its value is never actually used, the compiler can remove it along with the instructions that compute its value. This is one of the most common causes of wrong performance measurements.

So a calculation whose result is unused may not exist in the executable at all, and even an entire loop can be simplified away so that it never runs. To measure correctly with optimizations enabled, you have to write code that takes this into account. For example, accumulate the result of the calculation so that every operation contributes to the value of a variable, and then use that variable in a way that guarantees it is needed, such as printing it (or pretending to print it, by passing it as an argument to printf without including it in the format string). Make sense? See the example below.

  unsigned index , sum = 0 ;                         // Unsigned, so wrap-around on overflow is well defined.
  int chrono ;
  chrono = (int)time(0) ;                            // Record the starting point (one-second resolution).
  for( index=0 ; index<999999 ; index++ ){
      sum += index ;                                 // The instruction you want to measure.
  }
  chrono = (int)time(0)-chrono ;                     // Compute the elapsed interval.
//printf( "Finished in %d seconds.\n" , chrono ) ;
  printf( "Finished in %d seconds.\n" , chrono , sum ) ;   // Extra argument keeps sum in use.

Also keep in mind that a program does not necessarily run on the CPU 100% of the time: the operating system can suspend it, which inflates the time measured for some sections. There are several ways around this, but none is perfect. The measured time also includes the instructions that drive the loop (condition, increment) on top of the calculation you actually want to measure; you can account for this by timing those extra instructions separately to find out how much they add. Finally, if the first measurement still gives strange results, run a dummy measurement first and discard it.

Any questions?

    
07.08.2018 / 02:11