How to sort by results more similar to the last parameter?

1

I have a table in my system that saves all airports and in a select autocomplete I have this list where I can search.

The problem is that I can search for " nome ", " cidade ", " país " and " IATA " and require retornar em ordem de mais similar found airports.

For example:

My parameter is " GRU " and the function returns this:

{
    "airports": [
        {
            "id": 1059,
            "iata": "BPA",
            "full_name": "Grumman, Bethpage, Bethpage, (BPA)"
        },
        {
            "id": 3410,
            "iata": "GUU",
            "full_name": "Bakki, Grundarfjordur, Grundarfjordur, (GUU)"
        },
        {
            "id": 8083,
            "iata": "GRU",
            "full_name": "Guarulhos, São Paulo, São Paulo, (GRU)"
        }
    ],
}

As you can see, the return is right, but I, as a user, researched IATA and the function returned everything that looked like what I researched.

The code is as follows:

$str_where = "CITEXT(
                            TRANSLATE(
                                TRIM(
                                    name || ', ' || city || ', ' || country || ', ' || '(' || iata || ')' 
                                ) , 
                                'áàâãäåaaaÁÂÃÄÅAAAÀéèêëeeeeeEEEÉEEÈìíîïìiiiÌÍÎÏÌIIIóôõöoooòÒÓÔÕÖOOOùúûüuuuuÙÚÛÜUUUUçÇñÑýÝ',
                                'aaaaaaaaaAAAAAAAAAeeeeeeeeeEEEEEEEiiiiiiiiIIIIIIIIooooooooOOOOOOOOuuuuuuuuUUUUUUUUcCnNyY'
                            )
                        )";

            $airports = $connection
                    ->newQuery()
                    ->select(['id', 'iata'])
                    ->select(["CITEXT(name || ', ' || city || ', ' || country || ', ' || '(' || iata || ')' ) AS full_name"])
                    ->from('white_label_airports')
                    ->where([$str_where . ' LIKE' => '%' . \Cake\Utility\Text::transliterate($this->request->data('term')) . '%'])
                    ->limit(20)
                    ->execute()
                    ->fetchAll('assoc');

I would like to know if I can sort by the most similar of the answer of the server and if somebody has any better idea.

Thank you in advance!

    
asked by anonymous 03.10.2017 / 20:19

1 answer

0

To sort this array you need a similarity index that tells you that Guarulhos is the "best answer" of the three, something that has to be returned by the server who told you that these are the most similar results ...

Using pg_trgm

With PostgreSQL I would use the pg_trgm module, which returns you such an index and allows you to sort with ORDER BY direct in the query, saving any extra work in PHP. For example, I created an "airports" table with the three records you showed:

create table aeroportos (id int not null, iata text not null, full_name text not null);
insert into aeroportos values 
(1059, 'BPA', 'Grumman, Bethpager, Bethpage, (BPA)'),
(3419, 'GUU', 'Bakki, Grundarfjorfur, Grundarfjordur, (GUU)'),
(8083, 'GRU', 'Guarulhos, São Paulo, (GRU)');

Here we can use the word_similarity() function, which returns the degree of similarity between a string and the word most like it in another string. The greater the value in the column grau (from 0 to 1), the greater the similarity between the strings. Note that I concatenated the columns iata and full_name to compare to "GRU" as one thing.

select *, word_similarity(iata||full_name, 'GRU') as grau 
from aeroportos 
order by grau desc;

  id  | iata |                  full_name                   |   grau   
------+------+----------------------------------------------+----------
 8083 | GRU  | Guarulhos, São Paulo, (GRU)                  | 0.166667
 3419 | GUU  | Bakki, Grundarfjordur, Grundarfjordur, (GUU) |    0.125
 1059 | BPA  | Grumman, Bethpager, Bethpage, (BPA)          | 0.047619
(3 rows)

Detailing a little more ...

As your table certainly has more than three records, it is also interesting to use similarity operators to filter only those that look good enough with your search, otherwise every query will return the entire list of airports, most with similarity '0.0'. You can simply use LIMIT and cut in the amount of records that are most convenient to you, but you can also use other PostgreSQL features to filter by similarity, which will give you much more performance for large volumes of data in addition to using GIST indexes.

For this we can use the operator (which is also part of the package pg_trgm) <% in the WHERE clause. This operator returns true if the first parameter is a string containing word similar to the second parameter, as long as the degree of word similarity is greater than the minimum limit specified by the pg_trgm.word_similarity_threshold option. As we have already noted in the previous query, the general degree of similarity of our comparisons is generally low, since the "right answer" to GRU has a score of only 0.16. So we must lower the minimum limiter so our WHERE does not return zero records. The smaller the pg_trgm.word_similarity_threshold , the more records will be returned:

set pg_trgm.word_similarity_threshold to 0.1;

select *, word_similarity(iata||full_name, 'GRU') as grau 
from aeroportos 
where (iata||full_name) <% 'GRU' 
order by grau desc;

  id  | iata |                  full_name                   |   grau   
------+------+----------------------------------------------+----------
 8083 | GRU  | Guarulhos, São Paulo, (GRU)                  | 0.166667
 3419 | GUU  | Bakki, Grundarfjordur, Grundarfjordur, (GUU) |    0.125
(2 rows)

Defined all this course of action, in PHP the thing is simple. Just send SET to the minimum similarity limit along with the query:

$query = "set pg_trgm.word_similarity_threshold to 0.1; select *, word_similarity(iata||full_name, 'GRU') as grau from aeroportos where (iata||full_name) <% 'GRU' order by grau desc;";
$res = pg_query($conn, $query);

Or, if you do not want to spend the limiter every time, you can even fix it as the property of the user of your application:

alter user nunks set pg_trgm.word_similarity_threshold = 0.1;
select usename, useconfig from pg_user where usename = 'nunks';
 usename |                useconfig                
---------+-----------------------------------------
 nunks   | {pg_trgm.word_similarity_threshold=0.1}
(1 row)

Installing the pg_trgm extension

If you do not have pg_trgm installed, use the command CREATE EXTENSION :

> create extension pg_trgm;
CREATE EXTENSION

If the command fails you probably need to install the postgresql-contrib extension package, which can be either compiled or installed from the package manager of your choice. In Windows, the graphical installer gives you option of which modules to contrib install.

In addition to the word_similarity() function we use here, the pg_trgm extension offers other associated features, such as GIN indexes and GIST capable of greatly streamlining the search response by similarity in large volumes of data, special operators and methods for fine-tuning similarity limit values.

    
05.10.2017 / 17:24