Check words that appear in foreach

1

I have the following code:

    <?php
    $texto1 = file_get_contents('cot.txt');
    // puts each text into a position of the array
    preg_match_all('|texto.\d+(.+?)<\/body>|is', $texto1, $resultado);
    $textos = $resultado[1];
    $arrayCot = explode(" ", file_get_contents('PalavrasCot.txt'));
    $arra = $textos[$n];
    foreach ($arrayCot as $valor) {
        if (strpos($arra, $valor) !== false) {
            $contCot++;
            $ArzCot[$n] = $contCot;
        }
    }
    ?>

This code reads a text file, splits its content, and checks whether the words from a separate file (in the code, $arrayCot) appear in it. My question is: how can I show how many times each word in $arrayCot appeared? The text in PalavrasCot.txt:

parque. parque, parque brincadeiras. brincadeiras, brincadeiras mães mães, mães. filho, filho. filho acidente. acidente, acidente venda venda, venda. família família natureza, natureza. natureza carro. carro, carro crianças, crianças. crianças escola, escola. escola
    
asked by anonymous 08.11.2016 / 22:43

1 answer

2

Let's assume that the original text is in $texto and the word list is in $palavras, just to simplify reading.

A relatively simple algorithm is this:

$aTexto = explode( ' ', $texto );
$aPalavras = explode( ' ', $palavras );
$contagem = array();

foreach( $aTexto as $pTexto ) {
    if( in_array( $pTexto, $aPalavras ) ) {
        $contagem[$pTexto] = isset( $contagem[$pTexto] ) ? $contagem[$pTexto] + 1 : 1;
    }
}

See it working on IDEONE.
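
For reference, here is a self-contained version of the loop above that also prints the result. The sample strings are invented for illustration; they are not the contents of the files from the question.

```php
<?php
// Sample data, invented for illustration only.
$texto    = 'o parque fica perto da escola e a escola fica perto do parque novo';
$palavras = 'parque escola carro';

$aTexto    = explode(' ', $texto);
$aPalavras = explode(' ', $palavras);
$contagem  = array();

foreach ($aTexto as $pTexto) {
    // Count the token only if it is one of the words we are looking for.
    if (in_array($pTexto, $aPalavras)) {
        $contagem[$pTexto] = isset($contagem[$pTexto]) ? $contagem[$pTexto] + 1 : 1;
    }
}

// Print one line per word found; words that never appear have no entry.
foreach ($contagem as $palavra => $total) {
    echo $palavra, ': ', $total, PHP_EOL;
}
```

With this sample data the script prints `parque: 2` and `escola: 2`. Since `carro` never appears in the text, it has no entry in $contagem.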

To check whether a word from the text is in the array $aPalavras, we use in_array.


Considerations

This strays from the question a little, but a few things are worth pointing out. The code would need some improvements for use in real-world situations.

  • Whitespace and line breaks are not handled. Before the explode, it would probably help to normalize double spaces, tabs, and line breaks to single spaces.

  • Your word list relies on repeating each word with commas and periods, which causes two problems: one of them is that the count is split across the variants of each word. In a real situation, the list would probably contain only the bare words, and the algorithm would strip the punctuation (and any other characters that need removing) from each word before searching. This change would suffice:

    if( in_array( rtrim( $pTexto, '.,;!?' ), $aPalavras ) ) {
    

    This eliminates the need to repeat the words in the list.

  • Upper and lower case are not handled in your original approach. One solution would be, for example, to store all the words in lowercase in the search dictionary and use this function to normalize the text:

    if( in_array( mb_strtolower( $pTexto ), $aPalavras ) ) {
    

    Note that in this case PHP's character encoding must be configured correctly for the format of the files; otherwise you will have problems with accented characters.

  • Finally, in a real application you would not normally load the whole text into memory as the code does now. You could read the text in blocks and count the words as you find the spaces. That way you do not duplicate data in memory (keeping both the array and the original text around unnecessarily until you get a result).
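
The points above could be combined roughly as follows. This is a minimal sketch, not a drop-in replacement: the function name contarPalavras, the sample data, and the use of php://memory in place of a real file are all assumptions made for illustration. It also assumes UTF-8 input and a word list that contains only lowercase words without punctuation.

```php
<?php
// Hypothetical helper: counts how often each dictionary word occurs in a stream.
// Assumes UTF-8 input and a dictionary of lowercase words without punctuation.
function contarPalavras($stream, array $aPalavras)
{
    // Start every dictionary word at zero so absent words are reported too.
    $contagem = array_fill_keys($aPalavras, 0);

    // Read line by line instead of loading the whole text into memory.
    while (($linha = fgets($stream)) !== false) {
        // Split on any run of whitespace (spaces, tabs, line breaks).
        $tokens = preg_split('/\s+/u', $linha, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $pTexto) {
            // Strip trailing punctuation, then normalize case.
            $palavra = mb_strtolower(rtrim($pTexto, '.,;!?'), 'UTF-8');
            if (isset($contagem[$palavra])) {
                $contagem[$palavra]++;
            }
        }
    }
    return $contagem;
}

// Sample data, invented for illustration; a real script might fopen() a file instead.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "O Parque fica perto da escola.\nAs mães levam as crianças ao parque,\te à ESCOLA!");
rewind($stream);

$contagem = contarPalavras($stream, array('parque', 'escola', 'mães', 'carro'));
fclose($stream);

foreach ($contagem as $palavra => $total) {
    echo $palavra, ': ', $total, PHP_EOL;
}
```

With this sample data the result is parque 2, escola 2, mães 1, and carro 0. Using the dictionary words as array keys replaces the linear in_array scan with a key lookup, and initializing every key to zero means words that never appear are still listed.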

08.11.2016 / 23:07