A slightly different approach, which does not depend on similar_text() but still allows its use, is to remove the dots and spaces conditionally with regular expressions.
For this approach the ideal would be preg_replace_callback(), but with two consecutive preg_replace() calls the regular expressions stay cleaner:
$var1 = "M. D. AQUI";
$var2 = "MD AQUI";

// Step 1: letter + dot + spaces NOT followed by a word of 2+ characters
$var1 = preg_replace( '/(\w)\.\s+(?!\w{2,})/', '$1', $var1 ); // MD. AQUI
// Step 2: letter + dot + spaces followed by a word of 2+ characters
$var1 = preg_replace( '/(\w)\.\s+(?=\w{2,})/', '$1 ', $var1 ); // MD AQUI

if( $var1 !== $var2 ) {
    similar_text( $var1, $var2, $percentual );
    if( $percentual > 70 ) {
        // Similar strings, do something
    }
} else {
    // Identical strings
}
The first substitution removes the dot and the spaces after a single letter when what follows is not a word of two or more letters.
The second one handles the opposite case: when the letter and dot are followed by a longer word, it removes the dot but keeps a single space,
so that the result does not end up "all stuck together."
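For comparison, the two rules can be folded into a single pass with preg_replace_callback(), the alternative mentioned at the start. This is only a sketch and not the code used in this answer: the function name normalizeInitials is made up for illustration, and the pattern captures the following word inside a lookahead so the callback can decide whether to keep a space.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original answer).
// Single-pass variant of the two preg_replace() calls.
function normalizeInitials(string $s): string
{
    return preg_replace_callback(
        // (\w)\.\s+  : a word character, a dot, then whitespace
        // (?=(\w*))  : peek at the following word without consuming it
        '/(\w)\.\s+(?=(\w*))/',
        function (array $m): string {
            $next = $m[2] ?? '';
            // Keep one space only when a word of 2+ characters follows.
            return strlen($next) >= 2 ? $m[1] . ' ' : $m[1];
        },
        $s
    );
}

echo normalizeInitials("M. D. AQUI"), PHP_EOL; // MD AQUI
```

The two-call version is easier to read; the callback version only becomes interesting if the decision logic grows beyond "is the next word longer than one character".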
This approach has the following advantages:
- It manipulates only one of the strings, which is useful if the second one comes from a fixed source that you cannot or should not change.
- It does not require similar_text() because, at least in the scenario presented, the strings become equal. If they do not and you want to rely on similar_text() as a fallback, the normalization makes it less likely that the percentage gives a misleading result with a very low score.
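The normalize-then-fallback flow can be wrapped in a reusable function. A minimal sketch, assuming the 70% threshold from the snippet above; the names stripInitialDots and looksLikeSame are hypothetical and only the first argument is modified, so the reference string stays untouched:

```php
<?php
// Hypothetical helper names; the regular expressions are the ones from the answer.
function stripInitialDots(string $s): string
{
    $s = preg_replace('/(\w)\.\s+(?!\w{2,})/', '$1', $s);
    return preg_replace('/(\w)\.\s+(?=\w{2,})/', '$1 ', $s);
}

// Returns true when the normalized candidate equals the reference,
// or when similar_text() scores the pair above the threshold (in percent).
function looksLikeSame(string $candidate, string $reference, float $threshold = 70.0): bool
{
    $normalized = stripInitialDots($candidate);
    if ($normalized === $reference) {
        return true; // exact match after normalization, no fallback needed
    }
    similar_text($normalized, $reference, $percent);
    return $percent > $threshold;
}

var_dump(looksLikeSame("M. D. AQUI", "MD AQUI")); // bool(true)
```

The threshold is a parameter here because a suitable value depends on the data; 70 is simply the value used in the example.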