What is the problem of queries N + 1?

13

Whenever we work with some ORM, it is common to fall into the queries N + 1 problem. It's something about performance, called up to antipattern .

But what is really this problem, why it happens, what are its main causes and how, in theory, solve them?

I've also heard that to solve, just practice the eager loading . But to what extent is it beneficial and able to solve this problem?

    
asked by anonymous 15.06.2018 / 05:00

2 answers

12

The problem is not unique to ORMs, although many find it because it is a common problem, but not inherent in them. And do not believe in people's ability to do it wrong manually:).

The ORM appears more because a naive implementation will force the problem to always occur.

Alias is not only a problem of the ORM itself, but of modeling objects with related data. Either use a database with a non-relational model that has its problems there, or adopt the relational model in the application.

Still, you can do something when you mix both models.

The problem is common when there is an object and other N related, hence the name N + 1 which is the "father" of these N. The problem becomes clear when the query takes the main data, let's say it is a fiscal note , and then will pick up the lines of items that make up the note. Getting to fetch data individually in the database can become very costly, especially in the poorly designed architecture that many people do (some out of necessity).

Generally at least 1 + 1 is required, which is a failure of relational model communication (not the model, but rather how they communicate in the current implementations, which I consider to be an error, and instead of fixing this, they created another model worse, it is the history of our area, they solve a problem with another problem, but nothing that another problem can not solve this too).

Alias, this is why some people like to use a NoSQL DB as the proxy of the relational. Again, the complexity of the solution increases because the tool has problems that are easy to solve, but nobody does.

But in this case if the size is large will not be a big problem.

The problem with eager loading is that it can bring information you will not even use. But it depends a lot on the problem, there are cases that this is rare, there are others that even if it happens does not even tickle, and in many cases the fact that it comes more than accurate generates an overhead so small that a simple extra query will already be worse, ie a 1 + 2 may already be worse. Imagine reading a single invoice and it will bring all the lines of all invoices to avoid the N + 1, total waste.

This is the problem of automated solutions or programmers who do not understand what they are doing and adopt solutions automatically. The real solution is to understand what will happen in that case and decide what is most interesting. Even manually it is complicated to meet all cases, it depends on the query. The ORM may have a mechanism that attempts to "guess" what the best strategy is.

In many cases there is a lot of repetition of information by the way in which it is agreed to work with tabular data, generally using JOIN .

Most of the time bringing everything at once is often more interesting than bringing one by one.

For lack of a better solution would be something like:

SELECT * FROM Nf
SELECT * FROM NfItem

If you have 1000 notes and an average of exactly 10 items per note, there will be 11,000 lines in all, with 2 queries, one large and one huge.

In opposition to the form N + 1:

SELECT * FROM Nf
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
.
.
.
Tantos quantos forem a quantidade de notas fiscais existentes.

Here you will also have 11 thousand lines, but with 1000 small queries and 1 large.

The code is very abstract, just to illustrate.

Try frying one potato fillet at a time and a bunch of fillet at a time. The first one ends quickly individually, but the whole is tragic, the second takes longer, but when it's finished, everything is ready. It's only a problem if you find out you sold only 3 fillets, and fried the package.

    
15.06.2018 / 17:50
9
  

But what this problem really is

The best way to explain this problem is with an example.

Imagine that you have a Pessoa table and a Endereco table. Each person has multiple addresses, consolidating a one to many ( 1-N ) relationship.

And now you want to get the addresses of several people. Normally, we see the following query using the ORM of your preference (I will use the JPA JPQL notation):

public List<Pessoa> consultarPessoas() {
    String jpql = "select * from Pessoa";
    return em.createQuery(jpql).getResultList();
}

And then you iterate through each Person to get their addresses:

List<Pessoa> pessoas = consultarPessoas():
for (Pessoa pessoa : pessoas) {
    List<Endereco> enderecos = pessoa.getEnderecos();
}

By imagining a LAZY between person and addresses, we will have the following SQL for each person when calling the pessoa.getEnderecos() method:

SELECT * from Endereco where pessoa_id = :id;
  

, why it happens,

The problem is that to get people's addresses you first get the person and then search each person's addresses. Imagine that the previous query returned us 5 people, the amount of SQLs generated will look something like this:

 SELECT * from pessoa
 SELECT * from endereco where pessoa_id = 1;
 SELECT * from endereco where pessoa_id = 2;
 SELECT * from endereco where pessoa_id = 3;
 SELECT * from endereco where pessoa_id = 4;
 SELECT * from endereco where pessoa_id = 5;

That is, 1 select from person with N select for addresses, the famous N + 1 .

  

What are your main causes?

Normally it is caused by improper use of ORMs. You need to understand what the ORM does behind the scenes. Although they are there to make our lives easier, they need to be used wisely. Because they are very forgiving in general, unexpected results can be caused by misuse of the tool.

  

And how, in theory, do you solve them?

In the example I gave earlier, your goal was to get the addresses of several people. If it's the addresses of all people in the database, you just need to do:

 SELECT * from Endereco

But if you want to apply a filter to bring those 5 people, this can be done by avoiding those various queries with a different JPQL, starting from the Address table too:

 SELECT * from Endereco where pessoa_id IN (1,2,3,4,5);
  

I've also heard that to solve, just practice eager loading. But to what extent is it beneficial and able to solve this problem?

EAGER loading is an alternative, as the generated SQL would look something like this:

select p.id, p.nome, end.id, end.rua, end.pessoa_id from pessoa p JOIN endereco end ON end.pessoa_id = p.id

However, EAGER is a problem if added between the Person and Address relationship in your ORM because every time you search for a person, the addresses will also come together. Believe, you do not want this as the default behavior of your system. The main performance issues I saw in applications involving the use of some ORMs were caused by this.

Some ORMs have the option of optionally using FETCH optionally in a query, so you can "turn on" EAGER when you want. In JPQL, it would look like this:

SELECT * from pessoa JOIN FETCH pessoa.enderecos

Resulting in the same SQL I mentioned earlier.

However, there is a serious limitation with FETCH and EAGER if you try to apply some kind of pagination in the query. Using EAGER or FETCH this can not be applied to the generated SQL itself, and to have a query with the same effect (bring people and addresses in the same query) you need to call a native query and / or use functions (such as DENSE_RANK) of the database.

    
15.06.2018 / 18:33