I'm writing a program that extracts words from a file with a specific format and writes them to an output file. Everything works except that the accented characters come out wrong in the output file. I even added two debug lines that print each word found to the console, and there the accented characters appear correctly. I suspect an encoding problem, but I don't know what it is.
The output file looks like this (the weird parts are the accented letters):
abdala
abdel-leader
abdelcdir
Findings
Adidas
Code:
while (<$in>) {
    if ($_ =~ /& .*/) { # Test whether the line has the format of interest.
        my @linha = split (//, $_);
        my $count = 0;
        # Find the start of the word.
        while ($linha[$count] !~ /[a-zA-Záéíóúãẽĩõũâêîôûàèìòùäëïöü]/) { $count++; }
        my $inicio = $count;
        # Find the end of the word and compute its length.
        while ($linha[$count] =~ /[a-zA-ZáéíóúãẽĩõũâêîôûàèìòùäëïöüÁÉÍÓÚÃẼĨÕŨÂÊÎÔÛÀÈÌÒÙÄËÏÖÜ]/) { $count++; }
        my $tamanho = $count - $inicio;
        # Lowercase the word and write it to the file.
        my $palavra = lc (substr ($_, $inicio, $tamanho));
        print $out "$palavra\n";
        print $palavra; #DEBUG
        print "\n"; #DEBUG
    }
}
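
For reference, here is a minimal sketch of what consistent encoding handling might look like. It assumes the input file and the script itself are saved as UTF-8, and that the symptom comes from the input, output, and console handles not all using the same encoding layer (how `$in` and `$out` are opened is not shown above, so this is a guess). The sample input, the in-memory filehandles, and the use of `\p{L}` in place of the hand-written character class are illustrative assumptions, not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # accented literals in this source are characters, not bytes
use Encode qw(encode);

# Hypothetical sample input, encoded to UTF-8 bytes as it would be stored on disk.
my $bytes_in  = encode('UTF-8', "& Ação 123\n& ÁRVORE\nignored line\n");
my $bytes_out = '';

# Decode on read and encode on write, so the regex and lc() operate on characters.
# In-memory handles stand in for the real files here (assumption for the sketch).
open my $in,  '<:encoding(UTF-8)', \$bytes_in  or die "Cannot open input: $!";
open my $out, '>:encoding(UTF-8)', \$bytes_out or die "Cannot open output: $!";
binmode STDOUT, ':encoding(UTF-8)';   # give the console the same treatment

while (my $line = <$in>) {
    next unless $line =~ /& .*/;      # same format test as the original loop
    # \p{L}+ matches the first run of Unicode letters on the line.
    if ($line =~ /(\p{L}+)/) {
        my $palavra = lc $1;          # lc() handles accented letters on decoded text
        print $out "$palavra\n";
        print "$palavra\n";           # DEBUG
    }
}
close $out;
```

With real files, the two `open` calls would take file names instead of scalar references. Note that `\p{L}` also accepts a word that starts with an uppercase accented letter, which the first character class in the original loop does not.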