Adding some nuances to the good answer by @gmsantos...
Metaphone for Portuguese names
In this question, the phonetic algorithm for Portuguese has been widely discussed. For names it is more effective than the percentage-based mathematical similarity mentioned there, or than distances such as Hamming, # and others, which measure the similarity between arbitrary strings (they are even used in genetics).
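For contrast, here is a minimal Python sketch of two generic string measures: Hamming distance (mentioned above) and Levenshtein edit distance, included here as a common example of the "other" distances. Both only count character differences and know nothing about pronunciation.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]
```

hamming("sylvia", "silvia") gives 1, but Hamming cannot compare "sylvia" with "sillvya" at all because their lengths differ, and Levenshtein puts that pair 3 edits apart even though both spellings sound the same.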
The question points to a more practical and already classic problem: grouping or matching proper names (street names, people's names, etc.). For example, "Joao"="joao", "Sylvia"="silvia", "Luíz"="luis", etc.
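Case and accent folding alone solve only the easiest of these examples. A minimal Python sketch using the standard unicodedata module (normalize is a hypothetical helper name):

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase and strip diacritics, e.g. 'Luíz' -> 'luiz'."""
    # NFD splits 'í' into 'i' plus a combining accent (Unicode category 'Mn'),
    # which is then dropped.
    decomposed = unicodedata.normalize("NFD", name.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

With it, "Joao", "joao" and "João" all become "joao", but "Luíz" becomes "luiz", still different from "luis", and "Sylvia" still differs from "silvia": those differences are phonetic, not typographic.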
The experience of those who have worked on this (documented in this article) shows that the most frequent errors in the spelling of names originate in the mistakes we make when we transcribe only what we hear. Hence the focus on phonetics.
And the phonetics of Portuguese speakers is not the phonetics of English speakers... So the best solution is the best phonetic algorithm adapted to Portuguese... and it exists!
It is MetaphonePtBr.
(If you cannot install external functions on your server, the generic default difference is also still better than Metaphone.)
In PostgreSQL (8.x or 9.x), once it is installed, just run:
SELECT metaphone_ptbr('Sylvia')=metaphone_ptbr('sillvya');
-- returns TRUE ('SV'=='SV')
SELECT metaphone_ptbr('Sylveira')=metaphone_ptbr('sillvya');
-- returns FALSE ('SVR'!='SV')
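To make the idea of a phonetic key concrete, here is a deliberately tiny Python sketch. toy_ptbr_key is a hypothetical helper with a handful of made-up rules; it is NOT MetaphonePtBr and ignores most of Portuguese phonology, but on the two queries above it happens to produce the same keys ('SV' and 'SVR').

```python
import re
import unicodedata


def toy_ptbr_key(name: str) -> str:
    """Toy phonetic key for Portuguese names (NOT the MetaphonePtBr rules)."""
    # 1. Lowercase, strip accents and keep letters only: 'Luíz' -> 'luiz'.
    s = unicodedata.normalize("NFD", name.casefold())
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Mn")
    s = re.sub(r"[^a-z]", "", s)
    # 2. 'y' is pronounced like 'i' in these names: 'sylvia' -> 'silvia'.
    s = s.replace("y", "i")
    # 3. Collapse doubled letters: 'sillvia' -> 'silvia'.
    s = re.sub(r"(.)\1+", r"\1", s)
    # 4. Final 'z' sounds like 's': 'luiz' -> 'luis'.
    s = re.sub(r"z$", "s", s)
    # 5. 'l' before a consonant is vocalized in Brazilian Portuguese: 'silvia' -> 'sivia'.
    s = re.sub(r"l(?=[^aeiou])", "", s)
    if not s:
        return ""
    # 6. Keep the first letter, drop the remaining vowels, uppercase.
    return (s[0] + re.sub(r"[aeiou]", "", s[1:])).upper()
```

With these rules "Sylvia" and "sillvya" both map to "SV", "Sylveira" maps to "SVR", and "Luíz" and "luis" both map to "LS". A real algorithm such as MetaphonePtBr covers far more cases (digraphs like "ch", "lh", "nh", soft and hard "c"/"g", and so on).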
The great advantage of this method is that the comparison can be "cached": part of the work can be done in advance and stored in the database (the metaphone key of every name), so that searching for a given name, or grouping similar ones, is much faster than pairwise evaluation with string-similarity functions.
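In PostgreSQL this would typically be a column filled with metaphone_ptbr(name), with an ordinary index on it. The class below is only a hypothetical in-memory Python sketch of the same idea; key stands for whatever phonetic-key function you use.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PhoneticIndex:
    """In-memory analogue of a precomputed, indexed key column."""

    def __init__(self, key: Callable[[str], str]) -> None:
        self.key = key
        self._by_key: Dict[str, List[str]] = defaultdict(list)

    def add(self, name: str) -> None:
        # The key is computed once, when the name is stored, and cached.
        self._by_key[self.key(name)].append(name)

    def search(self, query: str) -> List[str]:
        # A search costs one key computation plus a dictionary lookup,
        # instead of comparing the query against every stored name.
        return list(self._by_key.get(self.key(query), []))
```

For example, with a placeholder key that casefolds and replaces "y" with "i", adding "Sylvia", "Silvia" and "Sylveira" and then searching for "SYLVIA" returns ["Sylvia", "Silvia"] without ever comparing strings pairwise.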
Since it allows grouping, in a database with 1000 names, for example, one can narrow the analysis down to a group of 10 or 20 names and apply the more sophisticated (more CPU-intensive) string-similarity functions only to them.
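A minimal Python sketch of that two-stage approach, assuming the key is any cheap phonetic key (the output of metaphone_ptbr, for example) and using the standard library's difflib.SequenceMatcher.ratio() as a stand-in for the "more sophisticated" similarity function. The function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def block_by_key(names: List[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group names by a cheap, precomputable key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return dict(groups)


def similar_pairs(names: List[str], key: Callable[[str], str],
                  threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Run the costly similarity only on pairs that share a key."""
    results = []
    for members in block_by_key(names, key).values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
            if score >= threshold:
                results.append((a, b, score))
    return results


def comparisons_needed(names: List[str], key: Callable[[str], str]) -> Tuple[int, int]:
    """(comparisons for all pairs, comparisons after grouping by key)."""
    n = len(names)
    grouped = sum(len(m) * (len(m) - 1) // 2
                  for m in block_by_key(names, key).values())
    return n * (n - 1) // 2, grouped
```

comparisons_needed shows why this pays off: 1000 names split into 50 groups of 20 need 9500 pairwise comparisons instead of 499500.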