Could someone help me understand why PostgreSQL missed several cost estimates in an experiment I ran?
I'm running an experiment with the 22 queries of the TPC-H benchmark [1] to check how indexes affect query performance.
Of the 22 queries, the optimizer used secondary indexes in only 5. In a second experiment, those 5 queries were executed against a database without any indexes, to see whether execution time would increase in their absence.
To my surprise, the runs without indexes were faster than the runs with them.
I would like to understand why the total cost estimate was wrong in every case: each query that took longer reported a lower cost, in all 5 cases, which seems incorrect to me.
See below: the first line refers to query 6 using an index, whose cost was 3335809, lower than the cost of 5255959 for the same query 6 without an index.
Also note the time spent: the query that used the index took almost 8 minutes, while the one without indexes took 55 seconds. The same pattern appears in the other cases.
My question is: why does the total cost in the execution plan underestimate the cost in every case where indexes are used?
Indexes | Query | Time_Spent | Total_Cost
=========================================
Yes     |   6   | 00:07:56   |  3335809.61
No      |   6   | 00:00:55   |  5255959.00
Yes     |   7   | 00:09:16   |  5847359.97
No      |   7   | 00:02:08   |  6793148.45
Yes     |  10   | 00:07:04   | 40743017.17
No      |  10   | 00:02:14   | 41341406.62
Yes     |  15   | 00:10:03   |  6431359.90
No      |  15   | 00:01:56   |  9608659.87
Yes     |  20   | 00:12:48   |  8412159.69
No      |  20   | 00:05:49   |  9537835.93
=========================================
Because my research project depends on a specific patch, I'm using the old version 9.0.1 (released in 2010).
I have not modified the default parameters, except for random_page_cost, which I set to 1 because I am using an SSD, where random access is cheaper than on an HDD. Here is the relevant part of my postgresql.conf:
max_connections = 100
effective_io_concurrency = 5
#seq_page_cost = 1.0
random_page_cost = 1.0
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.005
#cpu_operator_cost = 0.0025
#effective_cache_size = 128MB
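In case it helps anyone reproduce this, the planner cost parameters above can also be overridden per session, so the same query can be re-planned under different assumptions without editing postgresql.conf and restarting. This is only a sketch; the values shown are examples, not recommendations:

```sql
-- Override planner cost constants for the current session only.
SET random_page_cost = 4.0;        -- the stock default, tuned for spinning disks
SET effective_cache_size = '4GB';  -- example value; should reflect the real cache size

-- Re-plan the query under the new assumptions without executing it.
EXPLAIN
SELECT sum(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= date '1995-01-01'
  AND l_shipdate < date '1995-01-01' + interval '1' year
  AND l_discount BETWEEN 0.09 - 0.01 AND 0.09 + 0.01
  AND l_quantity < 24;

-- Revert to the values from postgresql.conf.
RESET random_page_cost;
RESET effective_cache_size;
```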
Could any of these parameters have interfered and caused this error in the cost estimates?
Below is query 6 and its EXPLAIN ANALYZE output, with and without indexes.
------------- QUERY 6 ---------------------------------------------
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1995-01-01'
and l_shipdate < date '1995-01-01' + interval '1' year
and l_discount between 0.09 - 0.01 and 0.09 + 0.01
and l_quantity < 24;
---- WITH INDEX (idx_l_shipdatelineitem000) -------------------
Execution plan:
Aggregate  (cost=3335809.59..3335809.61 rows=1 width=16) (actual time=476033.847..476033.847 rows=1 loops=1)
  ->  Bitmap Heap Scan on lineitem  (cost=376416.20..3330284.29 rows=2210122 width=16) (actual time=375293.183..471695.391 rows=2282333 loops=1)
        Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone))
        Filter: ((l_discount >= 0.08) AND (l_discount <= 0.10) AND (l_quantity < 24::numeric))
        ->  Bitmap Index Scan on idx_l_shipdatelineitem000  (cost=0.00..375863.67 rows=17925026 width=0) (actual time=375289.456..375289.456 rows=18211743 loops=1)
              Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone))
Total runtime: 476034.306 ms
------------------ WITHOUT INDEX ----------------------------
Execution plan:
Aggregate  (cost=5255958.99..5255959.00 rows=1 width=16) (actual time=55051.051..55051.051 rows=1 loops=1)
  ->  Seq Scan on lineitem  (cost=0.00..5250433.68 rows=2210122 width=16) (actual time=0.394..52236.276 rows=2282333 loops=1)
        Filter: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.08) AND (l_discount <= 0.10) AND (l_quantity < 24::numeric))
Total runtime: 55051.380 ms
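For comparing the two plans on equal footing, EXPLAIN in 9.0 and later also accepts the BUFFERS option, and the planner can be steered toward either plan shape within one session on the indexed database, so both plans run against identical data and cache state. A sketch, assuming the same query 6:

```sql
-- Show buffer hits/reads alongside timings (the BUFFERS option
-- exists from PostgreSQL 9.0 onward).
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= date '1995-01-01'
  AND l_shipdate < date '1995-01-01' + interval '1' year
  AND l_discount BETWEEN 0.09 - 0.01 AND 0.09 + 0.01
  AND l_quantity < 24;

-- Discourage the index-based plans in this session, so the same
-- indexed database produces the sequential-scan plan for comparison.
SET enable_bitmapscan = off;
SET enable_indexscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= date '1995-01-01'
  AND l_shipdate < date '1995-01-01' + interval '1' year
  AND l_discount BETWEEN 0.09 - 0.01 AND 0.09 + 0.01
  AND l_quantity < 24;

RESET enable_bitmapscan;
RESET enable_indexscan;
```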
Any tip is welcome!