To sort this array you need a similarity index that tells you that Guarulhos is the "best answer" of the three, something that has to be returned by the server who told you that these are the most similar results ...
Using pg_trgm
With PostgreSQL I would use the pg_trgm module, which returns you such an index and allows you to sort with ORDER BY
direct in the query, saving any extra work in PHP. For example, I created an "airports" table with the three records you showed:
create table aeroportos (id int not null, iata text not null, full_name text not null);
insert into aeroportos values
(1059, 'BPA', 'Grumman, Bethpager, Bethpage, (BPA)'),
(3419, 'GUU', 'Bakki, Grundarfjorfur, Grundarfjordur, (GUU)'),
(8083, 'GRU', 'Guarulhos, São Paulo, (GRU)');
Here we can use the word_similarity()
function, which returns the degree of similarity between a string and the word most like it in another string. The greater the value in the column grau
(from 0 to 1), the greater the similarity between the strings.
Note that I concatenated the columns iata
and full_name
to compare to "GRU" as one thing.
select *, word_similarity(iata||full_name, 'GRU') as grau
from aeroportos
order by grau desc;
id | iata | full_name | grau
------+------+----------------------------------------------+----------
8083 | GRU | Guarulhos, São Paulo, (GRU) | 0.166667
3419 | GUU | Bakki, Grundarfjordur, Grundarfjordur, (GUU) | 0.125
1059 | BPA | Grumman, Bethpager, Bethpage, (BPA) | 0.047619
(3 rows)
Detailing a little more ...
As your table certainly has more than three records, it is also interesting to use similarity operators to filter only those that look good enough with your search, otherwise every query will return the entire list of airports, most with similarity '0.0'. You can simply use LIMIT and cut in the amount of records that are most convenient to you, but you can also use other PostgreSQL features to filter by similarity, which will give you much more performance for large volumes of data in addition to using GIST indexes.
For this we can use the operator (which is also part of the package pg_trgm) <%
in the WHERE
clause. This operator returns true
if the first parameter is a string containing word similar to the second parameter, as long as the degree of word similarity is greater than the minimum limit specified by the pg_trgm.word_similarity_threshold
option. As we have already noted in the previous query, the general degree of similarity of our comparisons is generally low, since the "right answer" to GRU has a score of only 0.16. So we must lower the minimum limiter so our WHERE
does not return zero records. The smaller the pg_trgm.word_similarity_threshold
, the more records will be returned:
set pg_trgm.word_similarity_threshold to 0.1;
select *, word_similarity(iata||full_name, 'GRU') as grau
from aeroportos
where (iata||full_name) <% 'GRU'
order by grau desc;
id | iata | full_name | grau
------+------+----------------------------------------------+----------
8083 | GRU | Guarulhos, São Paulo, (GRU) | 0.166667
3419 | GUU | Bakki, Grundarfjordur, Grundarfjordur, (GUU) | 0.125
(2 rows)
Defined all this course of action, in PHP the thing is simple. Just send SET
to the minimum similarity limit along with the query:
$query = "set pg_trgm.word_similarity_threshold to 0.1; select *, word_similarity(iata||full_name, 'GRU') as grau from aeroportos where (iata||full_name) <% 'GRU' order by grau desc;";
$res = pg_query($conn, $query);
Or, if you do not want to spend the limiter every time, you can even fix it as the property of the user of your application:
alter user nunks set pg_trgm.word_similarity_threshold = 0.1;
select usename, useconfig from pg_user where usename = 'nunks';
usename | useconfig
---------+-----------------------------------------
nunks | {pg_trgm.word_similarity_threshold=0.1}
(1 row)
Installing the pg_trgm extension
If you do not have pg_trgm installed, use the command CREATE EXTENSION
:
> create extension pg_trgm;
CREATE EXTENSION
If the command fails you probably need to install the postgresql-contrib extension package, which can be either compiled or installed from the package manager of your choice. In Windows, the graphical installer gives you option of which modules to contrib install.
In addition to the word_similarity()
function we use here, the pg_trgm
extension offers other associated features, such as GIN indexes and GIST capable of greatly streamlining the search response by similarity in large volumes of data, special operators and methods for fine-tuning similarity limit values.