How do I count the characters of a word read from the first line of a text file?


Below is an example of how to count the number of characters in a string:

$palavra = "coisa";
echo strlen($palavra); // returns 5

However, I'm reading this word from a text file and strlen is not working:

$f = fopen("palavras.txt", "r");
echo fgets($f); // This works: it echoes "casa".
echo strlen($f); // The first word in the text file is "casa",
                //  but this does not echo 4.

I also tried this, and it did not work either:

$f = fopen("palavras.txt", "r");
$palavra = fgets($f);
echo strlen($palavra); // Echoes 6, which does not match the 4 characters
                      // of the word "casa".

NOTE: The file currently contains 3 words, one per line, but I intend to add more.

I then tried the version below, but it still does not return 4 characters for "casa"; it always returns 3 characters more than the word on the first line of the file:

 $f = fopen("palavras.txt", "r");
 $palavra = fgets($f);
 echo strlen(trim($palavra));

ADDED ON 08/25/2014

Since a string can be indexed like an array of characters, I printed each position to check whether there was anything besides the four letters of "casa", and found that the word "casa" is at:

echo $palavra[3];
echo $palavra[4];
echo $palavra[5];
echo $palavra[6];

What's in positions 0, 1, and 2? I wrote a for loop to print them all, and the first three positions appear on screen as diamonds with a question mark.
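A quick way to see what occupies positions 0, 1, and 2 is to dump the raw bytes in hexadecimal. The sketch below does not read palavras.txt: it assumes the file was saved as UTF-8 with a BOM and Windows line endings and builds that line in memory, so it runs on its own. With the real file you would apply bin2hex() to the result of fgets() instead.

```php
<?php
// Simulated first line of palavras.txt, assuming the editor wrote a
// UTF-8 BOM (bytes EF BB BF) and Windows line endings (\r\n).
$palavra = "\xEF\xBB\xBF" . "casa" . "\r\n";

// Print every byte of the line as a two-digit hex value, space-separated.
$hex = bin2hex($palavra);
echo chunk_split($hex, 2, ' '), PHP_EOL;

// Positions 0-2 hold the BOM, positions 3-6 hold "casa",
// and the last two bytes are the carriage return and line feed.
echo strlen($palavra), PHP_EOL;
```

If a real dump starts with ef bb bf, those three bytes are the UTF-8 byte order mark (BOM). Each one is only part of a multibyte sequence, so printing them individually produces invalid UTF-8, which the browser renders as a diamond with a question mark.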

I have set both the HTML meta charset and the file's encoding when saving to UTF-8.

I've tried utf8_decode, with no effect.

I figured that always subtracting 3 from the result would solve my problem. Searching around, I found this Stack Overflow question in English: link

One answer there does exactly that, but another warns that blindly discarding the BOM this way is not a good idea: if at some point the file has no BOM, I would be cutting 3 real characters off my word. I don't want a trick; I want to understand.

Here is my final, working code:

// In this file, the first line contains only the word "casa"
$f = fopen("palavras.txt", "r");
$palavra = fgets($f);
$car = strlen(trim($palavra)) - 3;
echo $car;
// With the code above I get 4; without the (-3) it returns 7.

Is this approach appropriate?

**RESOLVED! SAVING WITHOUT THE BLESSED "BOM"! IN NOTEPAD++ YOU HAVE TO GO INTO THE ENCODING MENU, BECAUSE WINDOWS NOTEPAD DOES NOT HAVE THAT OPTION.**

Thank you all! @Jader's answer is very useful and I'm sure I'll use it, but as far as this question goes, @bfavaretto's answer has everything anyone else on the forum would need.

    
asked by anonymous 24.08.2014 / 16:56

2 answers


The variable $f is a file handle, not the file's contents. fgets($f) reads the next line of the file (in your example, the first). So it makes no sense to measure $f; you need to measure what fgets($f) returns:

$f = fopen("palavras.txt", "r");
$linha = fgets($f);
echo strlen($linha);

As pointed out by @mgibsonbr, the string returned by fgets includes the line break. In a Windows-formatted file containing only the word "casa", the line is casa\r\n: the carriage return and the line feed both count, so the length is 6. You can use trim to remove these characters (it removes whitespace from the beginning and end of the string, including line breaks and tabs):

echo strlen(trim($linha));
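The sketch below shows the lengths side by side. The string stands in for what fgets() returns for the first line of a Windows-formatted file; rtrim() with the explicit character list "\r\n" is shown as an alternative to trim() that removes only the line break and leaves any other whitespace alone.

```php
<?php
// Stand-in for fgets() output from a file with Windows line endings.
$linha = "casa\r\n";

echo strlen($linha), PHP_EOL;                // 6: "casa" plus \r and \n
echo strlen(trim($linha)), PHP_EOL;          // 4: whitespace removed from both ends
echo strlen(rtrim($linha, "\r\n")), PHP_EOL; // 4: only trailing \r and \n removed
```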

Another important detail: in UTF-8, some characters, such as accented ones, occupy more than one byte, and strlen counts bytes rather than characters, so it overstates the length of those words. To count characters, use the mb_strlen function (from the mbstring extension), preferably passing the encoding explicitly:

echo mb_strlen(trim($linha), 'UTF-8');
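The difference between bytes and characters is easy to see with an accented word. This sketch assumes the mbstring extension is enabled, since mb_strlen belongs to it; the word is written with explicit UTF-8 byte escapes so the result does not depend on how the PHP file itself is saved.

```php
<?php
// "coração" in UTF-8: "ç" is C3 A7 and "ã" is C3 A3 (2 bytes each).
$palavra = "cora\xC3\xA7\xC3\xA3o";

echo strlen($palavra), PHP_EOL;             // 9: counts bytes
echo mb_strlen($palavra, 'UTF-8'), PHP_EOL; // 7: counts characters
```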

To read all the lines, just use a loop. Putting it all together, it looks like this:

$f = fopen("palavras.txt", "r");
// Parentheses matter: assign first, then compare with false
while (($linha = fgets($f)) !== false) {
    echo $linha . ' - ' . mb_strlen(trim($linha), 'UTF-8') . '<br>';
}
fclose($f);
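For a file with one word per line, file() can also read every line into an array at once. The sketch below is one alternative to the fgets() loop, assuming the file is small enough to load into memory; the sample words and the temporary file are made up so that it runs on its own. Like the loop, it does nothing about a BOM on the first line.

```php
<?php
// Self-contained sketch: writes a temporary stand-in for palavras.txt
// (hypothetical words, one per line); with the real file, pass its path.
$caminho = tempnam(sys_get_temp_dir(), 'pal');
file_put_contents($caminho, "casa\ncoisa\nmesa\n");

// file() returns one array element per line; FILE_IGNORE_NEW_LINES drops
// the trailing newline and FILE_SKIP_EMPTY_LINES skips blank lines.
$linhas = file($caminho, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$tamanhos = array();
foreach ($linhas as $linha) {
    // trim() kept as a safeguard against other leftover whitespace
    $tamanhos[] = mb_strlen(trim($linha), 'UTF-8');
}

foreach ($linhas as $i => $linha) {
    echo $linha . ' - ' . $tamanhos[$i] . PHP_EOL;
}

unlink($caminho);
```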

Since PHP returns 3 extra characters when counting the first line, everything indicates that your TXT file is encoded as UTF-8 with a BOM (byte order mark). You need to save it as UTF-8 without BOM. How to do this depends on the editor; it is usually in the save dialog itself or in a separate encoding option.
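If you cannot control how the file is saved, a safer alternative to always subtracting 3 is to check for the BOM bytes and strip them only when they are actually there. The helper name removeBom below is hypothetical, not a PHP built-in, and the input strings stand in for the first line returned by fgets().

```php
<?php
// Hypothetical helper (not a PHP built-in): removes a leading UTF-8 BOM
// (bytes EF BB BF) only when it is present, so lines from files saved
// without a BOM are returned unchanged instead of losing 3 real characters.
function removeBom($linha)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($linha, $bom, 3) === 0) {
        return substr($linha, 3);
    }
    return $linha;
}

$withBom = "\xEF\xBB\xBFcasa\r\n"; // first line of a file saved with BOM
$withoutBom = "casa\r\n";           // same line saved without BOM

echo strlen(trim(removeBom($withBom))), PHP_EOL;    // 4
echo strlen(trim(removeBom($withoutBom))), PHP_EOL; // 4
```

The BOM can only appear at the very start of the file, so the check is needed only on the first line read.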

    
24.08.2014 / 17:00

To go through all the words, you can do something like this:

$texto = file_get_contents('teste.txt');

// Split on whitespace and punctuation; the /u modifier makes the pattern UTF-8 aware
$palavras = preg_split('/[\s[:punct:]]+/u', $texto, -1, PREG_SPLIT_NO_EMPTY);

$tamanhos = array();
foreach ($palavras as $palavra) {
    // mb_strlen counts characters, so "é" counts as 1 in UTF-8
    $tamanhos[] = mb_strlen($palavra, 'UTF-8');
}

for ($i = 0; $i < count($palavras); $i++) {
    echo $i . '.) "' . $palavras[$i] . '" - ' . $tamanhos[$i] . '<br>';
}

teste.txt

Casa grande é outra coisa!
Mas, custa caro para manter...

Result:

0.) "Casa" - 4
1.) "grande" - 6
2.) "é" - 1
3.) "outra" - 5
4.) "coisa" - 5
5.) "Mas" - 3
6.) "custa" - 5
7.) "caro" - 4
8.) "para" - 4
9.) "manter" - 6

To pick a random word, use rand() like this:

$r = rand(0, count($palavras) - 1);

echo 'Random word: ' . $palavras[$r] . ' - ' . $tamanhos[$r];

// Result (varies from run to run):
// Random word: coisa - 5
    
24.08.2014 / 18:27