I have a series of files that appear to have been generated on different operating systems, since the character encoding of their names seems to vary between them.
Some names display their accents correctly on both OS X and Linux (with the terminal configured for UTF-8 in both cases), while others come out garbled. For example, one name shows up as APRESENTAÇO_MAR_2015, where the word should clearly be APRESENTAÇÃO.
Looking more carefully at the problematic fragment ÇO, I found 5 values (in hexadecimal):
0xC2
0x80
0x43
0x327
0x4F
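For reference, values like these can be obtained with a short script. This is only a minimal sketch, assuming Python 3.8+ and assuming that 0xC2 0x80 are the UTF-8 bytes of the single code point U+0080, while 0x327 is a code point (the combining cedilla) rather than a byte. The helper name describe is hypothetical.

```python
import unicodedata

def describe(name):
    """Return (hex code point, Unicode character name) for each character."""
    return [(f"0x{ord(ch):X}", unicodedata.name(ch, "<unnamed>")) for ch in name]

# Assumed reconstruction of the problematic fragment:
# U+0080, "C", COMBINING CEDILLA (U+0327), "O".
fragment = "\u0080C\u0327O"

for code, label in describe(fragment):
    print(code, label)

# The same fragment as raw UTF-8 bytes, for comparison with the list above.
print(fragment.encode("utf-8").hex(" "))
```

Read this way, the Ç is stored in decomposed form (C followed by a combining cedilla), and the character before it is an invisible control character, which would explain why nothing is displayed in place of the missing letter.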
I tried to convert the string with iconv, varying the input and output encodings, but I did not get the desired result (ÇÃO). How can I find out the original encoding of these names and correct them? I have many affected files, and I would like to solve this programmatically.
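One hypothesis that can be tested programmatically is that the names were written in a DOS code page (CP850 is assumed here), later misread as Latin-1, and then stored in decomposed (NFD) form, as OS X does. The sketch below, assuming Python 3, undoes those steps in reverse order. The function name repair and both default encodings are assumptions, not a confirmed diagnosis.

```python
import unicodedata

def repair(name, wrong="latin-1", original="cp850"):
    """Undo one assumed mis-decoding step; return None if it cannot apply."""
    # Recompose "C" + combining cedilla into a single "Ç" first.
    composed = unicodedata.normalize("NFC", name)
    try:
        # Recover the raw bytes, then decode them with the assumed original code page.
        return composed.encode(wrong).decode(original)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None

# Assumed reconstruction of the broken name, using the values listed above.
broken = "APRESENTA\u0080C\u0327O_MAR_2015"
print(repair(broken))
```

A round trip like this would also mangle names that are already correct, so it should be applied only to names that look affected, for example those containing control characters in the range U+0080 to U+009F. The repaired name could then be passed to os.rename.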