Save words with accents in Python


I have this json file:

{"certa": 1, "vez": 7, "quando": 13, "tinha": 6, "seis": 7, "anos": 6, "vi": 4, "num": 4, "livro": 3, "sobre": 6, "a": 47, "floresta": 1, "virgem": 1, "hist\u00e3\u00b3rias": 1, "vividas": 1, "uma": 31, "imponente": 1, "gravura": 1, ... }

The data above is saved to the file as follows:

    with open(nameFileJson + '.json', 'w') as arq:
        json.dump(data, arq)

Here the file name is given by the variable nameFileJson, and data comes from processing a text: the words are counted, so we end up with a dictionary of words and their frequencies. This part works correctly.

I read the json file like this:

    with open(nomeFile + '.json') as json_data:
        dicContadores = json.load(json_data)
        json_data.close()

    return dicContadores

I need the words to keep their accents. How can I fix this?

    
asked by anonymous 18.12.2018 / 23:04

1 answer


By default, Python's json module encodes text with ensure_ascii=True, which causes every accented (non-ASCII) character to be written as a "\uXXXX" escape sequence.
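A quick way to see this default in action (a small sketch using only the standard library; the sample dictionary is made up for illustration): json.dumps escapes the accented letter, and json.loads turns the escape back into the same character, so the escapes change how the file looks, not the data you get back when loading it.

```python
import json

# Hypothetical sample data: one accented word and its count
counts = {"histórias": 1}

# Default ensure_ascii=True: "ó" (U+00F3) is written as the escape \u00f3
escaped = json.dumps(counts)
print(escaped)  # {"hist\u00f3rias": 1}

# Loading decodes the escape back to the original character
print(json.loads(escaped) == counts)  # True
```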

To make the json module's functions write the characters themselves instead of these escape sequences, simply pass them the ensure_ascii=False parameter.

That is, in your code, replace

    json.dump(data, arq)

with:

    json.dump(data, arq, ensure_ascii=False)

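The text will then be written in the encoding used to open the file, so also pass encoding='utf-8' to open(), both when writing and when reading; otherwise Python uses the platform's default encoding, which on Windows is often not UTF-8. (Remember that, by default, programs in the Windows environment may try to open the text as if it were Latin-1. If the accents appear incorrectly, the best thing to do is to configure those programs to interpret the text as UTF-8, rather than changing the JSON file's UTF-8 encoding, which is the standard for this type of file.)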
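Putting the pieces together, a minimal sketch of the write/read round trip. The dictionary and file location here are hypothetical (the file goes into the system temp directory); ensure_ascii=False keeps the accents as real characters, and encoding='utf-8' on every open() call makes the file UTF-8 regardless of the platform default.

```python
import json
import os
import tempfile

# Hypothetical word-frequency dictionary, for illustration only
data = {"histórias": 1, "vez": 7, "uma": 31}
# Hypothetical file name (without extension), placed in the temp directory
nameFileJson = os.path.join(tempfile.gettempdir(), "contadores")

# Write: keep non-ASCII characters as-is and force UTF-8 on disk
with open(nameFileJson + '.json', 'w', encoding='utf-8') as arq:
    json.dump(data, arq, ensure_ascii=False)

# Read back with the same encoding; "with" closes the file automatically
with open(nameFileJson + '.json', encoding='utf-8') as json_data:
    dicContadores = json.load(json_data)

print(dicContadores["histórias"])  # 1

# The raw file text contains the accented letter, not an escape sequence
with open(nameFileJson + '.json', encoding='utf-8') as f:
    raw = f.read()
print('"histórias"' in raw)  # True
```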

    
20.12.2018 / 13:09