Efficient method to compare lists

0

I have a problem where I need to compare strings and define their proximity.

The problem is that I have a list with 21,000 records and I need to compare all of them with each other, which would give a quadratic complexity and in total would be 21,000 * 21,000 comparisons, which does not take very little time. I need to do this faster and more easily.

I'm inserting these records into a database for ease of manipulation but I make some changes and filters it through programming in Java before inserting. Here is an excerpt from the code:

for (int i = 0; i < limite; i++) {
            for (int j = 0; j < limite; j++) {

               if (searchForItem(listaoComp, listao.get(j).id, listao.get(i).id)) {

                    naoInsere++;
                } else {

                    insere++;

                   listaoComp.add(new Comparacao(listao.get(i).id, listao.get(j).id));
               }
                total++;
            }

        }

Basically, I'm doing the quadratic check. As this happens, for example, I'm comparing the records (0,1) and (1,0) and inserting them in the list. In my case, records thus mean the same thing and do not need to be both in the list then searchForItem checks before inserting (1,0) if it already exists (0,1) in the list. The searchForItem does this. This method is getting worse throughout code execution because the list goes up considerably.

To contextualize them, I have a list of active drug principles (id and name) that make up the Principle class, which the list is a list of this object.   List<Principio> listao = new ArrayList<Principio>();

Already theComp list is a Comparison list.

In the Comparison class, I keep the ids of the active principles compared and their proximity, for example

new Comparacao(1, 2, 98.5);

Ah, and the border variable within For's indicates the size of the list.

int limite = listao.size()

As I said earlier, I'm comparing all the ids twice, for example 0 - 0 / 0 - 1 / 0 - 2 / 1 - 0 / 1 - 1 / 1 - 2 / 2 - 0 / 2 - 1 / 2 - 2

The searchForItem looks up the listComp if the 0 - 1 record already exists before entering the 1 - 0, understand?

I was trying to use Stream 8's Java Parallels but I do not know how to use the tool right and I could not mount this logic in the filters. (I do not even know if I should mount the conditions.)

My question is if someone has already gone through something similar and has some algorithm that is not quadratic to do this and if they could point me a direction.

    
asked by anonymous 20.03.2018 / 15:18

1 answer

0

Without the pretense of being the most performative solution for your case. But if you do not want to compare the equal items (1: 1) or their inverse (1: 0 and 0: 1) you could simply change your inner loop to start at position i + 1

for(int i = 0; i < lista.size(); i++) {
    for(int j = (i + 1) ; j < lista.size(); j++) {
      // Como aqui você já ignora os indices iguais ou inversos..
      // seria simplesmente adicionar as comparações.
      listaComp.add(new Comparacao(...));
    }
}

In this way you will compare the indices .. [0,1; 0.2; ... 1,2; 1.3 ... 2.3 ... n, n + 1 ...]

By analyzing an example of 10 items, a quadratic (n²) runtime algorithm like yours would have 100 iterations. Already, with this simple change, this would fall to 45 iterations. So, straining for your case of 21,000 records, from 441,000,000, goes to 198,450,000 220,489,500 iterations.

    
20.03.2018 / 18:16