Is a subquery in the SELECT calculated for each of the results or only once?

17

Based on this example, where the result will be used to calculate the percentage of occurrence of each 'type' , which approach is more efficient / faster?

Use a subquery in SELECT to calculate total records:

SELECT tipo, COUNT(*) AS Parcial, (SELECT COUNT(*) FROM tabela) AS Total FROM tabela
GROUP BY tipo;  

Or use two querys , one to calculate the total number of records:

SELECT COUNT(*) AS Total FROM tabela;  

and another to calculate the total for 'type'

SELECT tipo, COUNT(*) AS Parcial FROM tabela
GROUP BY tipo;

Are there other more efficient ways to do this?

    
asked by anonymous 08.05.2015 / 17:30

3 answers

9

I think this question needs to be better explored. There is no general answer, as this depends on the implementation of each database engine. The SQL language is declarative. You say what wants and not how to get. The how is left over by the engine. In some cases it is possible to give a hint to the engine, but not to radically change the way it works.

So, what I'm going to show here are tests I did in SQL Server 2005.

My tests were based on two queries. The first one is in the question. The second (cross-join) is in this response. See below:

Query 1

SELECT
    NUMBER,
    (SELECT COUNT(*) FROM NUMBERS)
FROM
    NUMBERS

Query 2

SELECT
    NUMBER,
    TOTAL.T
FROM
    (SELECT COUNT(*) T FROM NUMBERS) TOTAL,
    NUMBERS

Numbers Table

Creating and populating the Numbers table (999999 records) can be seen below.

CREATE TABLE [dbo].[Numbers](
    [Number] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
    [Number] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

--1 milhão de registros são adicionados
insert into Numbers(Number)
select top 1000000 row_number() over(order by t1.number) as N
from   master..spt_values t1
       cross join master..spt_values t2

Hypothesis

The hypothesis is that a SELECT COUNT (*) is not made for each record in the case of Query 1 .

It's a simple optimization that SQL Server programmers would not let go of. Note that (SELECT COUNT (*) FROM NUMBERS) is completely independent of the query. The value of it can be calculated once, stored and just put in the SQL return (as if it were a constant).

Analysis

The image below shows the Query 1 execution plan:

ThisimageshowstheQuery2executionplan:

TheonlydifferenceisanoperatorcalledComputerScalar.Therestoftheoperatorsisexactlythesameinthepositionwithinthetreeandinthevaluescomputed/estimatedbytheSQLServerplanner.ComputerScalarcostsanestimated2%forthiscase.

IwentalittlefurtheranddidananalysisusingtheSQLServerProfile.Seethereturn:

ThemostimportantthingtonotehereisthatthenumberofReads(readings)ofthetwoqueriesisthesame.TheQuery1timeisabitlongerduetotheCPUconsumption(alsoslightlylarger).CertainlyitisComputerScalaroperatorshownabove.

Stillintheprofile,Icheckedthecosttoexecuteonlythestatementbelow:

SELECTCOUNT(*)FROMNUMBERS

Theresultcanbeseenbelow:

Conclusion

In the foregoing, you can see that the assumption raised is correct and that SQL Server 2005 Planner in the context , did not perform a count operation to each line returned. Quite possibly the other bank engines also optimize queries like this one, in order to avoid unnecessary processing.

Note that Query 2 still performs slightly better and may be indicated by this. However, it should be made clear that the purpose here was not to compare the performance of the two queries, but to show that a query count (*) is not made in SQL Query 1 .

    
13.08.2015 / 04:36
8

I found Cantoni's considerations important, and I believe that the performance issue goes beyond the structure of how to set up the query, of course it influences a lot, but it reaches a point where the DBMS itself resolves, so I put an answer based on mysql.

I tried with the sql of Ramal and Rafael Guerreiro (adjusted because of the way the error is posted) explain below

Below query based on the Ramaral technique.

    mysql> explain  SELECT comp_pago, COUNT(*) AS Parcial, (SELECT COUNT(*) FROM tab_controle_compras_item) AS Total FROM tab_controle_compras_item GROUP
 BY comp_pago;
+----+-------------+---------------------------+-------+---------------+----------------+---------+------+------+---------------------------------+
| id | select_type | table                     | type  | possible_keys | key            | key_len | ref  | rows | Extra                           |
+----+-------------+---------------------------+-------+---------------+----------------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | tab_controle_compras_item | ALL   | NULL          | NULL           | NULL    | NULL | 8780 | Using temporary; Using filesort |
|  2 | SUBQUERY    | tab_controle_compras_item | index | NULL          | fk_comp_id_idx | 4       | NULL | 8780 | Using index                     |
+----+-------------+---------------------------+-------+---------------+----------------+---------+------+------+---------------------------------+

Below consultation based on the technique of Rafael Guerreiro

mysql> explain  SELECT tab.comp_pago, COUNT(1) AS Parcial,tot.total     FROM tab_controle_compras_item tab,             (SELECT COUNT(*) as total FRO
M tab_controle_compras_item) as tot     GROUP BY tab.comp_pago;
+----+-------------+---------------------------+--------+---------------+----------------+---------+------+------+---------------------------------+
| id | select_type | table                     | type   | possible_keys | key            | key_len | ref  | rows | Extra                           |
+----+-------------+---------------------------+--------+---------------+----------------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>                | system | NULL          | NULL           | NULL    | NULL |    1 | Using temporary; Using filesort |
|  1 | PRIMARY     | tab                       | ALL    | NULL          | NULL           | NULL    | NULL | 8780 | NULL                            |
|  2 | DERIVED     | tab_controle_compras_item | index  | NULL          | fk_comp_id_idx | 4       | NULL | 8780 | Using index                     |
+----+-------------+---------------------------+--------+---------------+----------------+---------+------+------+---------------------------------+

In my view the queries so far presented are just other ways to achieve the same performance, I will follow because I am also interested in this subject, I hope someone can demonstrate something more efficient if it is possible.

I have reached a satisfactory result because only one full scan has been done below:

    mysql> explain SELECT COUNT(IF(comp_pago=1,1, NULL)) 'pagas', COUNT(IF(comp_pago=0,1, NULL)) 'nao pagas' FROM tab_controle_compras_item;
+----+-------------+---------------------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table                     | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+---------------------------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | tab_controle_compras_item | ALL  | NULL          | NULL | NULL    | NULL | 8780 | NULL  |
+----+-------------+---------------------------+------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)

Below is the reference source:

  

link

    
17.11.2015 / 12:17
7

As it was done, the count will be executed for each row of the select. The same would be the case if you used the subquery in the where clause.

To execute this second count only once, just make a Cartesian plan:

SELECT tab.tipo, COUNT(1) AS Parcial, tot.total
FROM tabela tab,
    (SELECT COUNT(*) total FROM tabela) tot
GROUP BY tab.tipo;
    
08.05.2015 / 17:54