Slow query using ORDER BY

I have a query like this:

select ae_id from app_event
join trip_item on ti_id = ae_trip_item_id
where ae_driver_id = 63 order by ae_date desc limit 10;

The app_event table has about 3 million rows, and the trip_item table has about 1 million.

The query returns about 40,000 rows, and I believe the problem lies in sorting all of this data, on top of the join between two large tables. I already created an index on ae_date (the ORDER BY column), but it did not help much. How can I make the query faster?
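
For reference, an index only on the ORDER BY column would have been created with something like the following (the exact command and index name are not in the question, so this is an assumption):

-- Hypothetical: the index described in the question, on the ORDER BY column only
CREATE INDEX idx_ae_date ON app_event (ae_date);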

asked by anonymous 28.06.2017 / 14:42

1 answer

First I created a scenario similar to the one you described: identical structure and a random mass of data:

--
-- app_event TABLE MODEL
--
create table app_event
(
  ae_id serial primary key,
  ae_driver_id integer,
  ae_trip_item_id integer,
  ae_date timestamp not null
);

--
-- trip_item TABLE MODEL
--
create table trip_item
(
  ti_id serial primary key
);

--
-- INSERT 3 MILLION ROWS INTO app_event
--
INSERT INTO app_event( ae_id, ae_date, ae_driver_id, ae_trip_item_id )
SELECT
    generate_series( 1, 3000000 ),
    now() - ((random() * 3650) * '1 day'::interval),
    (random() * 100)::int,
    (random() * 100)::int;
--
-- INSERT 1 MILLION ROWS INTO trip_item
--
INSERT INTO trip_item( ti_id )
SELECT
    generate_series( 1, 1000000 );
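
After a bulk load like this, it can help to refresh the planner statistics so the plans below are based on up-to-date estimates; this step is not part of the original setup, just a small optional addition:

-- Optional: refresh planner statistics after the bulk inserts
ANALYZE app_event;
ANALYZE trip_item;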

Query under review:

SELECT
    ae_id
FROM
    app_event
JOIN
    trip_item ON (ti_id = ae_trip_item_id)
WHERE
    ae_driver_id = 63
ORDER BY
    ae_date DESC
LIMIT
    10;

The plan of your query looks something like this:

Limit  (cost=87050.55..87050.58 rows=10 width=12)
  ->  Sort  (cost=87050.55..87089.49 rows=15574 width=12)
        Sort Key: app_event.ae_date
        ->  Hash Join  (cost=28320.00..86714.00 rows=15574 width=12)
              Hash Cond: (app_event.ae_trip_item_id = trip_item.ti_id)
              ->  Seq Scan on app_event  (cost=0.00..58043.59 rows=15574 width=16)
                    Filter: (ae_driver_id = 63)
              ->  Hash  (cost=15045.00..15045.00 rows=1062000 width=4)
                    ->  Seq Scan on trip_item  (cost=0.00..15045.00 rows=1062000 width=4)
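
A plan like this one (cost estimates only, no actual execution) comes from running the query under a plain EXPLAIN; the original answer does not show the command, so this is just a sketch:

-- Show the planner's estimated costs without executing the query
EXPLAIN
SELECT ae_id
FROM app_event
JOIN trip_item ON (ti_id = ae_trip_item_id)
WHERE ae_driver_id = 63
ORDER BY ae_date DESC
LIMIT 10;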

Notice that the planner points out two areas of concern:

1) A Sequential Scan with a fairly high cost on the app_event table, caused by the filter on the ae_driver_id field.

2) Another Sequential Scan with a fairly high cost on the trip_item table, caused by the join condition on the ae_trip_item_id field.

A sequential scan over a table with a large volume of data tends to be slow for read operations. That's why indexing is crucial for good performance.

Creating Indexes:

CREATE INDEX idx_ae_trip_item_id ON app_event (ae_trip_item_id);
CREATE INDEX idx_ae_driver_id ON app_event (ae_driver_id);

New query plan:

Limit  (cost=24050.07..24050.09 rows=10 width=12)
  ->  Sort  (cost=24050.07..24120.51 rows=28178 width=12)
        Sort Key: app_event.ae_date
        ->  Merge Join  (cost=23021.55..23441.15 rows=28178 width=12)
              Merge Cond: (trip_item.ti_id = app_event.ae_trip_item_id)
              ->  Index Only Scan using trip_item_pkey on trip_item  (cost=0.42..30408.42 rows=1000000 width=4)
              ->  Sort  (cost=23014.70..23085.14 rows=28178 width=16)
                    Sort Key: app_event.ae_trip_item_id
                    ->  Bitmap Heap Scan on app_event  (cost=530.81..20932.02 rows=28178 width=16)
                          Recheck Cond: (ae_driver_id = 63)
                          ->  Bitmap Index Scan on idx_ae_driver_id  (cost=0.00..523.76 rows=28178 width=0)
                                Index Cond: (ae_driver_id = 63)

The effect of creating the indexes is the complete absence of Sequential Scans in the query plan and a drastic reduction in the cost of the operations.

In practice, in my tests, the query took about 450 ms to execute before the indexes were created; after creating them, the execution time is around 12 ms.
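
The answer does not say how the times were measured; one way to reproduce this kind of measurement is EXPLAIN ANALYZE, which actually executes the query and reports the real execution time (psql's \timing on is another option):

-- Execute the query and report the actual run time
EXPLAIN ANALYZE
SELECT ae_id
FROM app_event
JOIN trip_item ON (ti_id = ae_trip_item_id)
WHERE ae_driver_id = 63
ORDER BY ae_date DESC
LIMIT 10;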

I hope this helps!

28.06.2017 / 16:39