Doubts binary search

3

Based on the principle that binary search only works with array of ordered integers , if I have to fetch an integer from an ordered vector the search speed would be much more faster than a sequential search, but my question is, if my vector is not ordered how much would it influence the speed in relation to the sequential search if I have to sort it? and if we can go deeper think about a binary search in a String vector the process would be:

  • Convert each vector item to a decimal
  • Order the vector
  • Perform binary search
  • In these two examples how much would it influence the speed in relation to the sequential search processes before the binary search?

        
    asked by anonymous 26.03.2014 / 18:11

    3 answers

    4
      

    If my vector is not ordered how much would it influence the speed in relation to the sequential search if I have to sort it?

    If you have to sort a vector and then search for sure the process will be slower, it will only make sense to sort if you have to do multiple searches.

      

    a binary search in a String vector

    The String (in Java) class implements the Comparable interface which allows you to use the binarySearch() method directly on your Strings vector, without the need to transform into integers (whatever you plan to do this).

    That is, you can sort the vector simply by doing so:

    String[] sa = {"um", "dois", "tres", "quatro"};
    Arrays.sort(sa);
    System.out.println(Arrays.binarySearch(sa, "um")); //imprime "3" que é a posição do
                                                       //elemento "um" no vetor ordenado
    
      

    How much would it influence the speed in relation to the sequential search processes before the binary search?

    It depends on the size of your vector and how many times you will search it after it has been sorted.

    Just know that the sort() method has O(n log(n)) complexity.

    References:

    Arrays.sort method ( ) - Java SE7

    String Class - Java SE7

    Comparable Interface - Java SE7

        
    26.03.2014 / 18:37
    3

    The efficiencies would be:

    • n.log (n) to sort
    • log (n) to fetch when ordered binarysearch
    • n to search cluttered

    Let's say we have 10K elements

    To order and search :

    10K.log(10K)+log(10K) = 44K de eficiência
    

    To look for messy would just be:

    10K = 10k eficiência
    

    Unordered search is faster than ordering and fetching. The problem with sorting the list is that its cost is very high. So it would never be more efficient to sort before searching for only 1 search .

    If you had to search more then the cost of this initial ordering would disappear.

    For example for this search, if done 10 times we would have:

    Ordenado: 44+40 = 88
    Desordenado: 10*10 = 100
    

    Remembering that the efficiency (1) is the most efficient, then 88 is better than 100.

        
    26.03.2014 / 19:48
    3

    Well, the speed at which the processes required to make binary search viable are performed is directly associated with the number of times they run.

    In case, if the vector is not ordered and you do not know a "clue" about a way to sort it in a faster and more effective way (based on the vector values themselves), a sequential search will always be the best alternative.

    In short, it is only interesting to do a binary search instead of the sequential search when:

  • The vector in question is sorted (as already pointed out in your question);
  • The number of tasks performed to sort a vector from an already known algorithm is less than the number of value comparisons per value.
  • Think of the following situation (I'll use C # to demonstrate):

    • You want to search by a number (38 for example) within a vector with 40 positions that is not ordered:

      int valorDesejado = 38;
      int[] vetor = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
                        22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 39};
      

      You have a clue or, in this case, you know explicitly that your vector has the last two inverted values.

    • By comparing the two types of searches (considering processing cost in addition, division, disjunction, and comparison operations in an equivalent manner), in sequential you would perform 38 comparison tests loop) and increment its counter variable i the same number of times:

      for (int i = 0; i < vetor.Length; i++)
          if (valorDesejado == vetor[i])
              Console.WriteLine("Valor encontrado na posicao " + i);
      
      /* Total:
       * 38 Operações
       * 38 Comparações
       */
      

      On the other hand, before performing the binary search , you only need three operations to change the position values (for this I will use XOR algorithm of exchange ), adding to the total two operations and a comparison per cycle:

      // Usando XOR Swap: 3 operações
      vetor[38] ^= vetor[39];
      vetor[39] ^= vetor[38];
      vetor[38] ^= vetor[39];
      
      // Busca binária
      int esq, meio, dir;
      esq = -1;
      dir = vetor.Length;
      
      while (esq < dir - 1) // Para buscar pelo 38, o ciclo só se repetirá 6 vezes
      {
          meio = (esq + dir) / 2; // Duas operações (soma e divisão)
          if (vetor[meio] < valorDesejado) // Uma comparação
              esq = meio;
          else
              dir = meio;
      }
      Console.WriteLine("Valor encontrado" + dir);
      
      /* Total:
       * 6 * 2 + 3 = 15 Operações
       * 6 * 1 = 6 Comparações
       */
      
    26.03.2014 / 18:33