Text Comparison

2

I have a question about comparing variables.

I receive a value as a string and need to compare it with another string.

For example:

$var1 = "M. D. AQUI";
$var2 = "MD AQUI"; // WITH OR WITHOUT PUNCTUATION. WITH OR WITHOUT SPACES.

I tried a replace on the variable, swapping the dots for an empty string, but the spaces remain. I can't simply remove the spaces as well, because then the text would all run together.

$result = str_replace(". ", "", $var1); // result: MDAQUI / With this I can't do the similarity comparison.

Could anyone help with the code or point me to a learning resource?

    
asked by anonymous 21.11.2014 / 12:45

2 answers

1

What you have to do is remove all the spaces and dots from BOTH strings:

$var1 = str_replace(".", "", $var1);
$var1 = str_replace(" ", "", $var1);
$var2 = str_replace(".", "", $var2);
$var2 = str_replace(" ", "", $var2);

var_dump($var1 == $var2); // bool(true)
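
The four calls above can be folded into one helper, since str_replace() accepts an array of search strings. This is only a sketch: the function name normalizeForCompare is made up for illustration, and the strtoupper() call is an extra assumption so that letter case does not affect the comparison.

```php
<?php
// Hypothetical helper (not part of the original answer): strips dots and
// spaces and upper-cases the result, so both strings compare on letters only.
function normalizeForCompare(string $text): string
{
    // str_replace() accepts an array of needles; each one is replaced by ''.
    return strtoupper(str_replace(['.', ' '], '', $text));
}

$var1 = "M. D. AQUI";
$var2 = "MD AQUI";

var_dump(normalizeForCompare($var1) === normalizeForCompare($var2)); // bool(true)
```

Because the originals are left untouched, you can still display them with their spaces and punctuation and use the normalized form only for the comparison.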

If you want a similarity comparison, as you mentioned in your code comment, you can use similar_text:

$var1 = strtoupper("M. D. AQUI");
$var2 = strtoupper("MD AQUI");

similar_text($var1, $var2, $percentagemDeSemelhanca);
echo $percentagemDeSemelhanca;

// result => 82.3529411765

This gives you the percentage of similarity between the two strings. I used strtoupper so that differences in letter case do not lower the score when the strings are not already in uppercase.
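
As a rough illustration of where that number comes from: similar_text() returns the count of matching characters and writes the percentage into its third argument, which is passed by reference. The variable names $matching, $percent and $expected below are illustrative only.

```php
<?php
$var1 = strtoupper("M. D. AQUI");
$var2 = strtoupper("MD AQUI");

// The return value is the number of matching characters; the percentage
// is written into $percent, which is passed by reference.
$matching = similar_text($var1, $var2, $percent);

// The percentage is the matching count doubled, divided by the combined length.
$expected = $matching * 2 / (strlen($var1) + strlen($var2)) * 100;

printf("%d matching characters, %.2f%% similar\n", $matching, $percent);
var_dump(abs($percent - $expected) < 1e-9); // bool(true)
```

Here 7 characters match out of a combined length of 17, which is where the 82.35% comes from.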


    
21.11.2014 / 12:51
0

A slightly different approach, which does not depend on similar_text() but still allows its use, is to remove the dots and spaces conditionally with regular expressions.

For this approach the ideal would be preg_replace_callback(), but with two consecutive preg_replace() calls the regex is cleaner:

$var1 = "M. D. AQUI";
$var2 = "MD AQUI";

$var1 = preg_replace( '/(\w)\.\s+(?!\w{2,})/', '$1', $var1 ); // MD. AQUI

$var1 = preg_replace( '/(\w)\.\s+(?=\w{2,})/', '$1 ', $var1 ); // MD AQUI

if( $var1 != $var2 ) {

    similar_text( $var1, $var2, $percentual );

    if( $percentual > 70 ) {

        // Similar strings, do something
    }

} else {

    // Strings are equal
}

The first substitution removes the dot and the following spaces after a single letter when what comes next is not a word of two or more characters.

The second does the opposite: if the letter and dot are followed by a longer word, it removes the dot but keeps a single space.

So it does not get "all stuck together."
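
For comparison, here is a sketch of the single-pass preg_replace_callback() variant mentioned earlier. It is an assumption of how that could look, not the answer's own code: the function name collapseInitials is hypothetical, and the pattern captures the following word inside a lookahead so the callback can decide whether to keep a space.

```php
<?php
// Hypothetical helper: collapses dotted initials in a single pass.
function collapseInitials(string $text): string
{
    return preg_replace_callback(
        // Group 1: the initial. Group 2 (inside the lookahead, so it is not
        // consumed): a following word of two or more characters, if any.
        '/(\w)\.\s+(?=(\w{2,})?)/',
        function (array $m): string {
            // Group 2 is absent from $m when no longer word follows.
            $wordFollows = isset($m[2]) && $m[2] !== '';
            // Keep one space before a real word, none between initials.
            return $wordFollows ? $m[1] . ' ' : $m[1];
        },
        $text
    );
}

var_dump(collapseInitials("M. D. AQUI") === "MD AQUI"); // bool(true)
```

The pattern is a little harder to read than the two separate ones, which is the trade-off the answer is pointing at.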

This approach has the following advantages:

  • It manipulates only one of the strings, which is useful if the second comes from a fixed source that you cannot or should not change.
  • It does not require similar_text() because, at least in the scenario shown, the strings become equal. If they do not, and you want to rely on similar_text() as a fallback, normalizing first reduces the chance of getting a misleadingly low percentage for strings that are really the same.
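
Putting both points together, a minimal sketch of the whole check might look like this. The function name looksLikeSameText and its parameter names are hypothetical; the 70% default simply mirrors the threshold used in the code above and should be tuned to your data.

```php
<?php
// Hypothetical helper; the name and the default threshold are illustrative.
function looksLikeSameText(string $candidate, string $reference, float $threshold = 70.0): bool
{
    // Normalize only the candidate, using the two substitutions from the answer.
    $candidate = preg_replace('/(\w)\.\s+(?!\w{2,})/', '$1', $candidate);
    $candidate = preg_replace('/(\w)\.\s+(?=\w{2,})/', '$1 ', $candidate);

    if ($candidate === $reference) {
        return true; // exact match after normalization
    }

    // Fallback: accept strings whose similarity is above the threshold.
    similar_text($candidate, $reference, $percent);
    return $percent > $threshold;
}

var_dump(looksLikeSameText("M. D. AQUI", "MD AQUI")); // bool(true)
var_dump(looksLikeSameText("M. D. AQUI", "XYZ"));     // bool(false)
```

The reference string is never modified, which matches the first advantage listed above.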
21.11.2014 / 14:14