I had a similar problem here in my company. Basically, here we use two ways to identify names with typos.
The first is the levenshtein distance and the second is the soundex () function.
LEVENSHTEIN
Paraphrasing Wikipedia "Levenshtein distance or edit distance between two strings is given by the minimum number of operations required to transform a string into the other."
For example, the distance between "Guilherme Silva" and "Guilherme da Silva" is 3. The distance between "Maria Dores" and "Maria das Dores" is 4.
Here is the code to create the levenshtein () function in Mysql.
DELIMITER $$
CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
-- max strlen=255
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
RETURN 0;
ELSEIF s1_len = 0 THEN
RETURN s2_len;
ELSEIF s2_len = 0 THEN
RETURN s1_len;
ELSE
WHILE j <= s2_len DO
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
END WHILE;
WHILE i <= s1_len DO
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
WHILE j <= s2_len DO
SET c = c + 1;
IF s1_char = SUBSTRING(s2, j, 1) THEN
SET cost = 0; ELSE SET cost = 1;
END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
IF c > c_temp THEN SET c = c_temp; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
IF c > c_temp THEN
SET c = c_temp;
END IF;
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
END WHILE;
SET cv1 = cv0, i = i + 1;
END WHILE;
END IF;
RETURN c;
END$$
DELIMITER ;
To use, in your case, you can make the following query:
SELECT * FROM usuario as us WHERE levenshtein(us.nome, 'Wallace Silva') < 5 //Ou outro indice. Coloquei 5 para que ele pegue somente nomes que tenham no maximo indice 4 na distancia levenshtein.
If you have the following records in the database
--------------------
1 Wallace Silva
2 Wallace João da Silva
3 Wallace das Silva
4 Guilherme da Silva
--------------------
The query will return, in this case, only records 1 and 2.
SOUNDEX
The soundex () function is a function that can be used in database searches where you know the pronunciation but not exactly how it is written.
Soundex () is already native to Mysql and we do not need to create it manually.
In your case, the query looks something like this:
SELECT * FROM pessoa WHERE soundex(nome) = soundex('Wallace Silva')
The problem with using soundex () is that it works best with English words and does not work 100% in English.
COMBINED MODE
We can also create a query in which we combine the two forms of search so that we can have a greater probability of success.
We can use it like this:
SELECT * FROM pessoa WHERE levenshtein(soundex(nome), soundex('Wallace Silva')) < 3
Or even so:
SELECT * FROM pessoa WHERE (levenshtein(nome, 'Wallace Silva') < 5) OR soundex(nome) = soundex('Wallace Silva')
SOME CONSIDERATIONS
- When we use more functions, we need more processing on the machine, that is, if we create a query 'little monster', it may take a while to run
- The two functions have some bad points, I recommend you study them well and test VERY much before implementing some code
- So far, these were the most effective ways to find similar names directly in the SQL query that I found and that served me. It does not mean that there are no better ones, it costs nothing to give a Googleada;)
At last, I hope I have helped! Good profit!