Identifying common snippets in two PHP strings

7

I need to do a non-standard string comparison in PHP. I have two strings, as below:

$primeira = 'asdasdasdTESTEasdasdasdasd';

$segunda = 'lkijlikjTESTEilkjik';

How can I find out dynamically whether the first and second variables contain the same sequence of characters? In this example, the shared sequence is "TESTE".

    
asked by anonymous 25.09.2014 / 20:39

4 answers

7

I created a function that compares segments of the two strings and returns the matching words in an array. The $minlen parameter sets the minimum length a match must have:

function palavras_iguais($string1, $string2, $minlen = 5) {
    $strlen1 = strlen($string1);
    $strlen2 = strlen($string2);
    $palavras = array();
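    // Stop where a full $minlen window no longer fits, so that a shorter
    // trailing fragment is never reported as a match
    for($i=0; $i <= $strlen1 - $minlen; $i++) {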
        $palavra = substr($string1, $i, $minlen);
        if (strpos($string2, $palavra) !== false) {
            $j = $minlen;
            do {
                $j++;
            } while (strpos($string2, substr($string1, $i, $j)) !== false && $j < $strlen2);
            $palavra = substr($string1, $i, $j-1);
            $i += strlen($palavra)-1;
            $palavras[] = $palavra;
        }
    }
    return $palavras;   
}

Test 1:

$primeira = 'asdasdasdTESTEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjik';

print_r( palavras_iguais($primeira, $segunda) );

// Output:

Array
(
    [0] => TESTE
)

Test 2:

$primeira = 'asdFINALasdasdTESTEaTESTE2sdasdasdasdTESTENOFINAL';
$segunda = 'lkiTESTE2jlikjTESTEilkjTESikTESTENOFINALjhfdgkFINAL';

print_r( palavras_iguais($primeira, $segunda) );

// Output:

Array
(
    [0] => FINAL
    [1] => TESTE
    [2] => TESTE2
    [3] => TESTENOFINAL
)

Test 3:

$primeira = 'asdaTSCsdasdTESTEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjTSCik';

print_r( palavras_iguais($primeira, $segunda, 3) );

// Output:

Array
(
    [0] => TSC
    [1] => TESTE
)
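
If only a yes/no answer is needed, as in the question, the same sliding-window idea can stop at the first hit. This is a minimal sketch, not part of the function above; the name has_common_substring is hypothetical and $minlen is assumed to be at least 1.

```php
<?php
// Hypothetical helper: true if $a and $b share at least one
// substring of exactly $minlen characters (assumes $minlen >= 1).
function has_common_substring(string $a, string $b, int $minlen = 5): bool {
    // Slide a window of $minlen characters over $a and stop at the first hit.
    for ($i = 0; $i <= strlen($a) - $minlen; $i++) {
        if (strpos($b, substr($a, $i, $minlen)) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(has_common_substring('asdasdasdTESTEasdasdasdasd', 'lkijlikjTESTEilkjik')); // bool(true)
var_dump(has_common_substring('asdasdasd', 'lkijlikj'));                             // bool(false)
```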
    
27.09.2014 / 21:02
4

I thought of a slightly different approach from the others. I wanted to avoid nested loops, although I have not tested whether that actually improves performance. It works like this:

  • Builds an array of overlapping character groups from the first string. For example, with $minlen=2, the string "abcde" is split into ["ab", "bc", "cd", "de"].
  • Checks whether each group occurs in the second string. When consecutive groups are found, they are merged into a single word (for example, if the second string contains "abc", the first two groups are found in sequence).

I find it easier to understand in code form:

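function matchingSubstrings($str1, $str2, $minlen=2) {
    // Split $str1 into overlapping groups of exactly $minlen characters
    $grupos = [];
    for($i=0; $i<=strlen($str1)-$minlen; $i++) {
        $grupos[] = substr($str1, $i, $minlen);
    }

    $palavras = [];
    $temp = '';

    foreach($grupos as $grupo) {
        // Extend the current word by one character while it still occurs in $str2
        $candidato = $temp === '' ? $grupo : $temp . substr($grupo, -1);
        if(strpos($str2, $candidato) !== false) {
            $temp = $candidato;
        } else {
            if($temp !== '') $palavras[] = $temp;
            // The group that ended the word may itself start a new one
            $temp = strpos($str2, $grupo) !== false ? $grupo : '';
        }
    }
    // Keep a word that runs to the end of $str1
    if($temp !== '') $palavras[] = $temp;

    return $palavras;
}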
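
The first step described above, splitting a string into overlapping groups, can be looked at in isolation. The sketch below is only an illustration; the helper name overlapping_groups is hypothetical, and it assumes every group should have exactly $len characters.

```php
<?php
// Hypothetical helper, only to illustrate the group-splitting step.
function overlapping_groups(string $str, int $len): array {
    $groups = [];
    // Each group starts one character after the previous one.
    for ($i = 0; $i <= strlen($str) - $len; $i++) {
        $groups[] = substr($str, $i, $len);
    }
    return $groups;
}

print_r(overlapping_groups('abcde', 2)); // ab, bc, cd, de
print_r(overlapping_groups('abcde', 3)); // abc, bcd, cde
```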

A test with repetitions:

matchingSubstrings('nnnabcnnnabcnnn', 'kkkabckkkabc');

Return:

Array
(
    [0] => abc
    [1] => abc
)

If repeated entries are not wanted in the result, just change the last line of the function to return array_unique($palavras);.
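
One detail worth knowing: array_unique keeps the original keys, so the result can have gaps in its indexes. If sequential keys matter, wrapping it in array_values reindexes the array. A small sketch with made-up sample values:

```php
<?php
// Sample data for illustration only.
$palavras = ['abc', 'abc', 'xy'];

// array_unique() keeps the first occurrence of each value and its key.
$unique = array_unique($palavras);
print_r($unique);    // keys 0 and 2

// array_values() reindexes the result starting from 0.
$reindexed = array_values($unique);
print_r($reindexed); // keys 0 and 1
```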

This function also passed the tests from Jader's answer (the output was identical).

Demo on ideone

    
28.09.2014 / 01:32
3

I do not think there is a native PHP function that does this.

I found a solution on Google that solves what you need.

function longest_common_substring($words)
{
    $words = array_map('strtolower', array_map('trim', $words));
    // create_function() was removed in PHP 8.0, so a closure is used instead
    $sort_by_strlen = function ($a, $b) {
        if (strlen($a) == strlen($b)) { return strcmp($a, $b); }
        return (strlen($a) < strlen($b)) ? -1 : 1;
    };
    usort($words, $sort_by_strlen);

    // We have to assume that each string has something in common with the first
    // string (post sort), we just need to figure out what the longest common
    // string is. If any string DOES NOT have something in common with the first
    // string, return false.
    $longest_common_substring = array();
    $shortest_string = str_split(array_shift($words));
    while (sizeof($shortest_string)) {
        array_unshift($longest_common_substring, '');
        foreach ($shortest_string as $ci => $char) {
            foreach ($words as $wi => $word) {
                // strpos() === false avoids misreading a falsy strstr() result such as "0"
                if (strpos($word, $longest_common_substring[0] . $char) === false) {
                    // No match
                    break 2;
                }
            }

            // we found the current char in each word, so add it to the first longest_common_substring element,
            // then start checking again using the next char as well
            $longest_common_substring[0].= $char;
        }

        // We've finished looping through the entire shortest_string.
        // Remove the first char and start all over. Do this until there are no more
        // chars to search on.
        array_shift($shortest_string);
    }

    // If we made it here then we've run through everything
    usort($longest_common_substring, $sort_by_strlen);
    return array_pop($longest_common_substring);
}

This solution returns the longest substring shared by all strings in an array. Since the input is passed through strtolower, the comparison is case-insensitive and the result comes back in lowercase.

Usage is very simple:

$primeira = 'asdasdasdTEStEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjik';

echo longest_common_substring([$primeira, $segunda]);
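
When only two strings are involved, a common alternative is a dynamic-programming table. The sketch below is not the method from the function above; the name lcs_two_strings is hypothetical, and it is case-sensitive and assumes single-byte characters.

```php
<?php
// Sketch: longest common substring of two strings via dynamic programming.
// $prev[$j] holds the length of the common run ending at $a[$i-2] and $b[$j-1].
function lcs_two_strings(string $a, string $b): string {
    $best = 0; // length of the longest run found so far
    $end = 0;  // position in $a just past that run
    $prev = array_fill(0, strlen($b) + 1, 0);
    for ($i = 1; $i <= strlen($a); $i++) {
        $curr = array_fill(0, strlen($b) + 1, 0);
        for ($j = 1; $j <= strlen($b); $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the run that ended at the previous pair of characters.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                    $end = $i;
                }
            }
        }
        $prev = $curr;
    }
    return substr($a, $end - $best, $best);
}

echo lcs_two_strings('asdasdasdTESTEasdasdasdasd', 'lkijlikjTESTEilkjik'); // TESTE
```

It keeps only two rows of the table in memory, so memory use grows with the length of the second string rather than with the product of both lengths.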
    
25.09.2014 / 20:43
2

You can use the strpos function, which tells you whether one string occurs inside another. See the strpos documentation for more details.

if (strpos($primeira, $segunda) !== false)
    echo 'true';
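
A quick run with the strings from the question shows what this check detects. The variable names $whole and $snippet are only for illustration.

```php
<?php
$primeira = 'asdasdasdTESTEasdasdasdasd';
$segunda = 'lkijlikjTESTEilkjik';

// strpos() looks for the whole of $segunda inside $primeira.
$whole = strpos($primeira, $segunda) !== false;
var_dump($whole);   // bool(false)

// It does find a snippet that is already known.
$snippet = strpos($primeira, 'TESTE') !== false;
var_dump($snippet); // bool(true)
```

So this check is enough when the snippet to look for is known in advance, but it does not discover a shared snippet between two arbitrary strings.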
    
25.09.2014 / 20:45