Best approach is to filter data in the database or application?

1

Suppose we have a 'Vendas' table with the following 'estado', 'cidade', 'vendedor', 'valor' columns.

'estado' and 'cidade' would be where the sale was made and 'vendedor' would be what made that sale. 'valor' would be the value of the sale.

What is the best, cheapest, smarter approach to accumulating these values by 'estado' or 'cidade' or 'vendedor' ?

Make 3 selects:

sql = "SELECT estado, sum(valor) 
       FROM vendas
       GROUP BY estado";

sql = "SELECT cidade, sum(valor) 
       FROM vendas
       GROUP BY cidade";

sql = "SELECT vendedor, sum(valor) 
       FROM vendas
       GROUP BY vendedor";

or

Make 1 select only and do the calculations with the programming language

sql = "SELECT * FROM vendas";

Leave it in the programming language (in this case I work with Java) to do these calculations, and the programmer's manual work would be heavier?

    
asked by anonymous 07.06.2017 / 19:45

1 answer

8

The question could be considered unclear because it would need more information. But the answer will almost always be:

Let the database work for you.

Not that this is an absolute truth. But bringing a lot of data to the application filtering almost always is a mistake. I say almost because I can always have a scenario that I do not know that can be useful or the only viable way. But I do not know which one I could remember like this.

Bringing too much data has a cost for the database, for infrastructure and for pli- cation, not to mention you'll have an extra code to deal with.

This case is clearly better to use full SQL and bring only what you need. See if you can do all three queries in a query only. Probably can not, because the most used technologies make this difficult, but it depends a bit on how everything is organized.

There may be some very specific and weird case that would be better to bring everything, but it would still be a little gain and I find something so off the curve that I doubt it would really happen.

Of course you need to architect the application to work well, you need to see if the database is properly configured to support this query .

There is controversy over what the threshold should pass to the database. Many scalability issues occur because the developer overloads the database with things that would look better in the application. There will always be the discussion about using Stored Procedures .

I say and repeat, many of the techniques and tools that exist today were created to solve a problem created by another technique that was unnecessary or worse than simple. Often people ignore the Occam Razor . Of course there will always be the argument that the more complex is doing something more. There are cases that are true, but there are cases that it is only complication for not evaluating the whole, or for liking it, or just know how to do it in a way.

The "good" part is that you created a huge industry because of so much new problem. We look hamsters on the wheel.

Related: Why is "SELECT * FROM table" bad? .

    
07.06.2017 / 20:11