In what situations should I allocate a vector dynamically in C++?


I'm working on framework code at my job. One of its functions dynamically allocates a std::vector, copies every node the object holds into it, and returns it to the user:

std::vector<Node> *Tree::getAllNodes() {
    std::vector<Node> *n, *aux;
    n = new std::vector<Node>();
    mut.lock();
    for (std::size_t i = 0; i < subnodes.size(); ++i) {
        aux = subnodes[i]->GetNodes();
        n->insert(n->end(), aux->begin(), aux->end());
    }
    mut.unlock();
    return n;
}

In other words, it is up to the user to free that memory later.

But I am not sure this memory really needs to be allocated dynamically, since std::vector already manages its own memory under the hood, right?

One reason I can think of is that returning just a pointer is cheaper than returning a copy of the vector when there is a lot of data. If we did not allocate it dynamically, we would have to return a copy, which would be costly for that much data.

Questions:

  • Is this really a case where we should allocate the vector dynamically?
  • In other cases, when there is little data and/or the function is called rarely, is such dynamic allocation unnecessary? After all, memory management becomes simpler.

Asked by anonymous, 14.12.2013 / 21:31

    2 answers


    I cannot see any real advantage in allocating a std::vector dynamically. But care must be taken when returning a vector from a function: even though the vector object itself is small (typically 12 bytes on 32-bit systems), its copy constructor is slow, because it allocates a new buffer and copies every element.
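
A small sketch of why that copy is expensive, assuming C++11 and a hypothetical helper named `copy_is_deep`: `sizeof(std::vector<int>)` is the same whatever the element count, because the object only holds pointers to a heap buffer, but copying allocates a separate buffer and copies every element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a vector of n elements and reports whether the
// copy got its own storage. n must be > 0 (empty vectors may both have a null data()).
bool copy_is_deep(std::size_t n) {
    std::vector<int> original(n, 7);
    std::vector<int> copy = original;        // copy constructor: O(n) allocation + element copies
    assert(copy == original);                // same contents...
    return copy.data() != original.data();  // ...in a different buffer
}
```

Calling `copy_is_deep(1000000)` does a million element copies, which is the cost the question wants to avoid; the fixed-size vector object itself is never the problem.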

    • If possible, let the compiler apply the return value optimizations, in which the object being returned is constructed directly in the destination variable of the function call. There are two forms (C++11 §12.8/31):

      • NRVO (Named Return Value Optimization): the function declares a local variable of the function's return type and returns that same variable at every return point. Example:

        // Named return value optimization:
        //  every return must return the same local variable.
        std::vector<int> func_nrvo() {
            std::vector<int> result;
            // do some work here and add elements to the vector
            return result;
        }
        
        std::vector<int> result = func_nrvo();
        
      • RVO (Return Value Optimization): the function returns a temporary object of exactly the function's return type, created in the return statement itself. Example:

        // Unnamed return value optimization:
        //  every return must be a temporary of the return type.
        std::vector<int> func_rvo() {
            return std::vector<int>(5, 0);
        }
        
        std::vector<int> result = func_rvo();
        
    • If these optimizations cannot be applied (I suggest rewriting the function so that it looks like one of these examples), you have two options: move or copy, the first being quite cheap and the second very costly. Unfortunately C++03 has no concept of moving, so if you cannot use C++11 you will have to avoid the copy by other means, such as returning the result through a reference argument:

      void func_ref(std::vector<int>& vec) {
          vec.clear();
          vec.push_back(1);
      }
      
      std::vector<int> vec;
      func_ref(vec);
      
    • If you use a compiler that supports C++11:

      std::vector has a move constructor, so returning a vector from a function is cheap and you do not need to worry about it. When return value optimization does not apply but you are returning a local variable, the result is moved automatically. In other situations, you can force a move with std::move.

      #include <cstdlib>  // rand
      #include <utility>  // std::move
      #include <vector>

      std::vector<int> func1() {
          return std::vector<int>({1, 2, 3, 4, 5});
      }
      
      std::vector<int> func2() {
          std::vector<int> a, b;
          if (rand() % 2)
              return a;
          else
              return b;
      
          // The optimization does not apply: we return different variables.
          // But we are still returning variables, so the move is implicit.
      }
      
      std::vector<int> func3() {
          std::vector<int> a, b;
          return (rand() % 2) ? std::move(a) : std::move(b);
          // The optimization does not apply: we return different variables.
          // Note that we are not returning a variable but a more complex
          // expression (the ternary conditional). Here the move must be forced.
      }
      
      std::vector<int> vec1 = func1(); // Result constructed directly in vec1
      std::vector<int> vec2 = func2(); // Result moved into vec2
      std::vector<int> vec3 = func3(); // Result moved into vec3
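
A minimal check of the "returning by value is cheap" claim, assuming C++11 or later and a hypothetical helper named `make_vector`: whether NRVO or the implicit move applies, the caller ends up owning the very buffer that was filled inside the function, which can be detected by comparing `data()` addresses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: fills a large local vector, records where its heap
// buffer lives, and returns the vector by value.
std::vector<int> make_vector(const int** buffer_out) {
    std::vector<int> v(100000, 1);
    *buffer_out = v.data();
    return v;  // NRVO, or an implicit move in C++11: the elements are never copied
}
```

If the caller's `data()` equals the recorded address, no element was copied on the way out; no `new`/`delete` was needed to get that.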
      

    Note:

    In your function's code I see that you use a mutex. Note that the call to the vector's insert function may fail and throw a std::bad_alloc exception if memory runs out. If this happens, your function will exit without releasing the mutex: a deadlock waiting to happen!

    Ideally, you should use a class whose constructor locks the mutex and whose destructor unlocks it, such as std::lock_guard. That way the mutex is released even when an exception is thrown, because the destructors of local variables are always called during stack unwinding.
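
Putting both pieces of advice together, the question's `getAllNodes` could lock with `std::lock_guard` and return by value. This is only a sketch: `Node`, `Subtree` and `addSubtree` are hypothetical stand-ins for the framework's real types, subtrees are stored by value instead of through pointers, and `GetNodes()` is assumed to return a reference to an existing vector rather than a heap-allocated one.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for the framework types in the question.
struct Node { int id; };

struct Subtree {
    std::vector<Node> nodes;
    // Assumption: hands out a reference, so the caller owns nothing new.
    const std::vector<Node>& GetNodes() const { return nodes; }
};

class Tree {
public:
    void addSubtree(const Subtree& s) {
        std::lock_guard<std::mutex> lock(mut);
        subnodes.push_back(s);
    }

    // Returned by value: NRVO or an implicit move, no delete needed by the caller.
    std::vector<Node> getAllNodes() const {
        std::vector<Node> n;
        std::lock_guard<std::mutex> lock(mut);  // unlocked on return *and* on exception
        for (std::size_t i = 0; i < subnodes.size(); ++i) {
            const std::vector<Node>& aux = subnodes[i].GetNodes();
            n.insert(n.end(), aux.begin(), aux.end());  // may throw std::bad_alloc
        }
        return n;
    }

private:
    mutable std::mutex mut;  // mutable so the const getter can lock it
    std::vector<Subtree> subnodes;
};
```

If `insert` throws, `lock`'s destructor still runs, so a later `addSubtree` call does not deadlock.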

    When in doubt...

    The rules governing exactly which constructor is called, when, and which optimizations can be applied in each case are quite complex, and analyzing code based only on your "instincts" can be risky. In situations like this, one approach is to let the compiler show you what is happening. Instead of a vector, use a "guinea pig" class (Cobaia, in Portuguese) to trace through the code. Example:

    #include <iostream>
    #include <utility>
    using namespace std;
    
    struct Cobaia {
        Cobaia() {cout << "  Cobaia()" << endl;}
        ~Cobaia() {cout << "  ~Cobaia()" << endl;}
        Cobaia(const Cobaia&) {cout << "  Cobaia(const Cobaia&)" << endl;}
        Cobaia(Cobaia&&) {cout << "  Cobaia(Cobaia&&)" << endl;} // C++11 only
    };
    
    volatile bool cond = true; // volatile so the compiler cannot optimize the condition away
    
    Cobaia func1() { Cobaia r; return r; }
    Cobaia func2() { return Cobaia(); }
    Cobaia func3() { Cobaia a, b; if (cond) return a; else return b; }
    Cobaia func4() { Cobaia a, b; return cond ? a : b; }
    Cobaia func5() { Cobaia a, b; return std::move(cond ? a : b); } // C++11 only
    
    int main() {
        cout << "func1:" << endl; Cobaia c1 = func1();
        cout << "func2:" << endl; Cobaia c2 = func2();
        cout << "func3:" << endl; Cobaia c3 = func3();
        cout << "func4:" << endl; Cobaia c4 = func4();
        cout << "func5:" << endl; Cobaia c5 = func5(); // C++11 only
        cout << "fim:" << endl;
    }
    

    Here is the output of this program (compiled with GCC 4.8.1 in C++11 mode; the comments are mine):

    func1:
      Cobaia() // Optimization applied: everything happens as if 'c1' were inside func1
    func2:
      Cobaia() // Optimization applied: everything happens as if 'c2' were inside func2
    func3:
      Cobaia() // Construct local a
      Cobaia() // Construct local b
      Cobaia(Cobaia&&) // Move a or b into c3
      ~Cobaia() // Destroy local b
      ~Cobaia() // Destroy local a
    func4:
      Cobaia() // Construct local a
      Cobaia() // Construct local b
      Cobaia(const Cobaia&) // Copy a or b into c4
      ~Cobaia() // Destroy local b
      ~Cobaia() // Destroy local a
    func5:
      Cobaia() // Construct local a
      Cobaia() // Construct local b
      Cobaia(Cobaia&&) // Move a or b into c5
      ~Cobaia() // Destroy local b
      ~Cobaia() // Destroy local a
    fim:
      ~Cobaia() // Destroy c5
      ~Cobaia() // Destroy c4
      ~Cobaia() // Destroy c3
      ~Cobaia() // Destroy c2
      ~Cobaia() // Destroy c1
    

    Notice the difference between func3, func4 and func5. It is a clear example of how obscure these rules can be. In func3 the return expression is simply the name of a local variable that is about to be destroyed, so the compiler treats it as an rvalue and moves from it. In func4 the return expression is a conditional expression, not a plain variable name; it yields an lvalue referring to a or b, so the implicit move does not apply and the compiler has to copy the result into the return value before the locals are destroyed. In func5 I use std::move to convert the expression into a Cobaia&&, which can be moved.
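
The same experiment can be turned into a self-checking sketch by counting constructor calls instead of printing them. `Counter`, `copies_caused` and the three `ret_*` functions are hypothetical names mirroring func3, func4 and func5; the expected counts assume C++11 or later.

```cpp
#include <cassert>
#include <utility>

// Counts copy and move constructions instead of printing them.
struct Counter {
    static int copies;
    static int moves;
    Counter() {}
    Counter(const Counter&) { ++copies; }
    Counter(Counter&&) { ++moves; }
};
int Counter::copies = 0;
int Counter::moves = 0;

volatile bool cond = true;  // runtime condition, as in the Cobaia example

Counter ret_named()   { Counter a, b; if (cond) return a; else return b; }  // like func3
Counter ret_ternary() { Counter a, b; return cond ? a : b; }                // like func4
Counter ret_moved()   { Counter a, b; return std::move(cond ? a : b); }     // like func5

// Resets the counters, calls f once and returns how many copies it caused.
template <typename F>
int copies_caused(F f) {
    Counter::copies = 0;
    Counter::moves = 0;
    Counter result = f();
    (void)result;
    return Counter::copies;
}
```

Only the ternary version without std::move pays for a copy; the other two get away with a move.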

    If you compile the same code as C++03 (without the parts marked "C++11 only"), you will see that the last cases make copies, since C++03 has no concept of moving an object.

    For an extra dose of fun, add the -fno-elide-constructors flag to GCC. It turns off all these optimizations, so you can see every single copy and move as it happens. As an exercise, work out the reason for each one.

        
    14.12.2013 / 21:44

    In some cases it may be advantageous or even necessary to create the vector dynamically:

    Situation 1: You have a multithreaded process with a separate heap for each thread (for performance reasons). In that case you create the vector in the corresponding thread's heap.

    Situation 2: You use a custom allocator. If the allocator is not static, you cannot declare something like:

    vector<unsigned int, my_allocator<unsigned int>> vVector;
    

    because your my_allocator instance does not exist yet at that point, and declaring the vector requires an allocator to initialize it.
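
A sketch of Situation 2, assuming C++11: `CountingAllocator` is a hypothetical stateful allocator (not the answer's `my_allocator`) that needs a pointer to a counter which only exists at run time. Because it has no default constructor, `CountedVector v;` would not compile, so the vector is created only after the allocator instance exists, here on the heap through `std::unique_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stateful allocator: every byte it hands out is added to an
// external counter, so each instance needs that counter when it is constructed.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    std::size_t* bytes;  // shared state: all copies point to the same counter

    explicit CountingAllocator(std::size_t* counter) : bytes(counter) {}

    // Rebinding constructor required by the allocator requirements.
    template <typename U>
    CountingAllocator(const CountingAllocator<U>& other) : bytes(other.bytes) {}

    T* allocate(std::size_t n) {
        *bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return a.bytes == b.bytes;
}

template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
    return !(a == b);
}

using CountedVector = std::vector<unsigned int, CountingAllocator<unsigned int>>;

// Creates the vector only after the allocator (and its counter) exist.
std::size_t bytes_after_push(unsigned int value) {
    std::size_t counter = 0;
    CountingAllocator<unsigned int> alloc(&counter);
    std::unique_ptr<CountedVector> v(new CountedVector(alloc));
    v->push_back(value);
    assert(v->back() == value);
    return counter;  // bytes requested through the allocator so far
}
```

Passing the allocator to the constructor of a local or member vector (`CountedVector v(alloc);`) works just as well once the instance exists; allocating the vector dynamically is one way to postpone its construction until that point.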

        
    29.01.2014 / 17:02