I have a problem where I need to compare strings and define their proximity.
The problem is that I have a list with 21,000 records and I need to compare all of them with each other, which would give a quadratic complexity and in total would be 21,000 * 21,000 comparisons, which does not take very little time. I need to do this faster and more easily.
I'm inserting these records into a database for ease of manipulation but I make some changes and filters it through programming in Java before inserting. Here is an excerpt from the code:
for (int i = 0; i < limite; i++) {
for (int j = 0; j < limite; j++) {
if (searchForItem(listaoComp, listao.get(j).id, listao.get(i).id)) {
naoInsere++;
} else {
insere++;
listaoComp.add(new Comparacao(listao.get(i).id, listao.get(j).id));
}
total++;
}
}
Basically, I'm doing the quadratic check. As this happens, for example, I'm comparing the records (0,1) and (1,0) and inserting them in the list. In my case, records thus mean the same thing and do not need to be both in the list then searchForItem checks before inserting (1,0) if it already exists (0,1) in the list. The searchForItem does this. This method is getting worse throughout code execution because the list goes up considerably.
To contextualize them, I have a list of active drug principles (id and name) that make up the Principle class, which the list is a list of this object.
List<Principio> listao = new ArrayList<Principio>();
Already theComp list is a Comparison list.
In the Comparison class, I keep the ids of the active principles compared and their proximity, for example
new Comparacao(1, 2, 98.5);
Ah, and the border variable within For's indicates the size of the list.
int limite = listao.size()
As I said earlier, I'm comparing all the ids twice, for example 0 - 0 / 0 - 1 / 0 - 2 / 1 - 0 / 1 - 1 / 1 - 2 / 2 - 0 / 2 - 1 / 2 - 2
The searchForItem looks up the listComp if the 0 - 1 record already exists before entering the 1 - 0, understand?
I was trying to use Stream 8's Java Parallels but I do not know how to use the tool right and I could not mount this logic in the filters. (I do not even know if I should mount the conditions.)
My question is if someone has already gone through something similar and has some algorithm that is not quadratic to do this and if they could point me a direction.