Improve search performance when there is no record

1

I have a database with 9551011 rows. It contains addresses from all over the country.

When I make a query for a certain place, and that address exists in the database, I get a return in a maximum of 2 seconds. The problem is that this bank is used in an application where the person has to upload a file with multiple addresses, and many of these addresses do not exist in the bank. When an address does not exist, it takes a LOT in that search ... Is there any way to improve this? I have tried with some classic performance forms in the Postgresql world like the Vacuum, for example. But I have not been successful so far.

Columns in the table: id , gid , nome , num_inicio_esquerda , num_fim_esquerda , num_inicio_direita , num_fim_direita , cep_esquerda , cep_direita , bairro_esquerda , bairro_direita , nivel_detalhamento , estado_nome , estado_sigla , cidade , latitude_inicio , latitude_fim , longitude_inicio .

  • Index in column name.

Is it possible to make the bank not delay to give the answer when a record is not found?

Example of a query in the address table:

select
      gid,
      bairro_esquerda,
      cep_direita,
      cep_esquerda,
      cidade,
      estado_nome,
      estado_sigla,
      latitude_fim,
      latitude_inicio,
      longitude_fim,
      longitude_inicio,
      nivel_detalhamento,
      nome,
      num_fim_direita,
      num_fim_esquerda,
      num_inicio_direita,
      num_inicio_esquerda 
  from
      enderecos
  where
      nome like 'avenida treze de maio'
      and (
          estado_sigla like 'CE'
      ) 
      and (
          lower(unaccent(cidade))='fortaleza'
      ) 
      and (
          cast(num_inicio_esquerda as integer)<=1116 
          and cast(num_fim_esquerda as integer)>=1116 
          or cast(num_fim_esquerda as integer)<=1116 
          and cast(num_inicio_esquerda as integer)>=1116 
          or cast(num_inicio_direita as integer)<=1116 
          and cast(num_fim_direita as integer)>=1116 
          or cast(num_fim_direita as integer)<=1116 
          and cast(num_inicio_direita as integer)>=1116
      ) limit 1
    
asked by anonymous 01.09.2016 / 18:38

1 answer

2

It is not recommended to use the LIKE clause for textual queries on tables with a large volume of data.

It is recommended to use a Postgres feature called FTS (Full Text Search) .

Here's a step-by-step guide on how you can use it to improve your query performance:

1 - Create an auxiliary column sv_nome of type tsvector in table enderecos :

ALTER TABLE enderecos ADD COLUMN sv_nome tsvector;

2 - Create a TRIGGER that is fired every INSERT and UPDATE in the enderecos table, which will be able to keep the auxiliary column always cohesive:

CREATE FUNCTION fc_enderecos() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        new.sv_nome = to_tsvector('portuguese', COALESCE(NEW.nome, ''));
    END IF;

    IF TG_OP = 'UPDATE' THEN
        IF NEW.nome <> OLD.nome THEN
            new.sv_nome = to_tsvector('portuguese', COALESCE(NEW.nome, ''));
        END IF;
    END IF;

    RETURN NEW;
END
$$
LANGUAGE 'plpgsql';

CREATE TRIGGER trg_enderecos BEFORE INSERT OR UPDATE ON enderecos FOR EACH ROW EXECUTE PROCEDURE fc_enderecos();

3 - Creation of INDEX in the helper column created in step 1:

CREATE INDEX idx_enderecos_nome ON enderecos USING gin( to_tsvector('portuguese'::regconfig, COALESCE((sv_nome)::text, ''::text)));

4 - Update the data already in the table:

UPDATE enderecos SET sv_nome = to_tsvector('portuguese', COALESCE(nome, ''));

5 - And finally, your query can be done as follows:

SELECT
    gid,
    bairro_esquerda,
    cep_direita,
    cep_esquerda,
    cidade,
    estado_nome,
    estado_sigla,
    latitude_fim,
    latitude_inicio,
    longitude_fim,
    longitude_inicio,
    nivel_detalhamento,
    nome,
    num_fim_direita,
    num_fim_esquerda,
    num_inicio_direita,
    num_inicio_esquerda 
FROM
    enderecos
WHERE
    sv_nome @@ public.to_tsquery('portuguese', 'avenida&treze&de&maio' ) AND
    and (
          estado_sigla = 'CE'
      ) 
      and (
          lower(unaccent(cidade))='fortaleza'
      ) 
      and (
          cast(num_inicio_esquerda as integer)<=1116 
          and cast(num_fim_esquerda as integer)>=1116 
          or cast(num_fim_esquerda as integer)<=1116 
          and cast(num_inicio_esquerda as integer)>=1116 
          or cast(num_inicio_direita as integer)<=1116 
          and cast(num_fim_direita as integer)>=1116 
          or cast(num_fim_direita as integer)<=1116 
          and cast(num_inicio_direita as integer)>=1116
      ) limit 1

I hope I have helped!

    
06.09.2016 / 20:37