Problems with accentuation - Python

Question

Hello, I am having problems with accented characters in Python.

I put this at the top of the code: # -*- coding: UTF-8 -*- but the accents are still not displayed correctly in cmd.

See the screenshot for a better understanding.

Thanks in advance!

python character-encoding utf-8 cmd console

asked by anonymous 20.11.2016 / 01:09

1 answer

score 1 · Answer 1

When you have a little time, read this here. The title may scare you a little, but it's the best introduction to accented and special characters I've seen.

That said, what happens is this: until about 30 years ago, computers typically stored each character in a single byte, so at most 256 different characters were available at a time. It is easy to see that, with so many languages and characters in the world, this does not even begin to cover the communication needs we have.

As a stopgap, each country adopted a different table of 256 characters, preserving a common core of codes between 0 and 127 (this is called "ASCII") and creating new mappings for the codes from 128 to 255.
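As a rough illustration of that shared core, this Python 3 sketch decodes the same bytes with three different tables. The code pages (latin-1, cp850, cp852) are only examples picked for the sketch, and the helper name decode_with_each is made up here.

```python
# Code pages chosen for illustration only; any 256-character table would do.
CODE_PAGES = ["latin-1", "cp850", "cp852"]


def decode_with_each(raw):
    """Return {code page name: decoded text} for the same raw bytes."""
    return {name: raw.decode(name) for name in CODE_PAGES}


# Bytes 0-127 (the ASCII range) mean the same thing in every table.
low = decode_with_each(bytes(range(128)))
print(len(set(low.values())))  # number of distinct readings of the ASCII range

# A byte above 127 is read differently depending on the table.
high = decode_with_each(b"\xe7")
for name, text in high.items():
    print(name, ascii(text))
```

Only the codes from 128 to 255 change meaning from one table to another, which is why plain unaccented text rarely shows the problem.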

Incidentally, the different tables were not just one per country: several tables appeared at different times in history in several countries. Eventually the Unicode Consortium was set up. It unifies the characters from all of these different tables into a single standard and defines encodings that support far more than 256 simultaneous characters, for example "utf-8".
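A small Python 3 sketch of the difference between a one-byte table and utf-8. The sample strings are arbitrary, and latin-1 stands in here for any 256-character table.

```python
text = "maçã"

# In a 256-character table every character is exactly one byte.
as_latin1 = text.encode("latin-1")
# In utf-8, ASCII characters take one byte and these accented ones take two.
as_utf8 = text.encode("utf-8")

print(len(text), len(as_latin1), len(as_utf8))

# A character outside the 256-entry table cannot be encoded with it at all,
# while utf-8 can represent any Unicode character.
euro = "\u20ac"  # the euro sign
try:
    euro.encode("latin-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

print(fits_in_latin1, euro.encode("utf-8"))
```

The same four characters occupy a different number of bytes depending on the encoding, so the bytes alone do not say which text they represent.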

On Windows the problem is bigger, because programs in the normal Windows environment use one encoding (latin1 on Portuguese-language Windows), while programs running in CMD use a different one (cp852). Therefore, a character that appears as 'È' in a programming editor may appear as '╚' when it is printed in CMD.
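The mismatch described above can be reproduced in Python 3 without a Windows machine. This is a minimal sketch: the function name shown_in_console is hypothetical, and the two encodings are simply the ones named in the paragraph above, so the code pages on a real installation may differ.

```python
def shown_in_console(text, editor_encoding="latin-1", console_encoding="cp852"):
    """Simulate text saved in one encoding and displayed using another."""
    raw = text.encode(editor_encoding)   # bytes the editor writes to disk
    return raw.decode(console_encoding)  # what the console draws for those bytes


garbled = shown_in_console("\u00c8")  # the character 'È'
print(ascii(garbled))                 # '\u255a', the box-drawing character ╚

# Unaccented text survives, because both tables share the ASCII range.
print(shown_in_console("abc"))
```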

From version 3 onward, Python greatly improves the approach and makes correct programming simpler. In particular, it automatically treats all text in the code as "unicode text", which is independent of encoding (you must still keep your editor's encoding equal to the encoding declared in the first line of the Python code), and it automatically checks the encoding of the terminal when it reaches a print or other output. So your È character will appear correctly in cmd. I strongly recommend that you use Python 3 if you are learning or starting a new project: this text-encoding issue is the most important change between the versions. (From your screenshot, I assume you are using Python 2, just because of the characters that appear.)
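To see what that automatic step amounts to, this Python 3 sketch prints to an in-memory stand-in for a terminal and inspects the bytes that arrive. The helper bytes_sent_to_terminal is made up for the illustration, and cp850 is an assumption chosen because it contains every character of the sample string; a real console may report another code page.

```python
import io


def bytes_sent_to_terminal(text, terminal_encoding):
    """Print text to a fake terminal and return the bytes it received."""
    raw = io.BytesIO()
    # TextIOWrapper plays the role of sys.stdout bound to a given encoding.
    fake_stdout = io.TextIOWrapper(raw, encoding=terminal_encoding, newline="\n")
    print(text, file=fake_stdout, end="")
    fake_stdout.flush()
    return raw.getvalue()


text = "Eu tenho uma maçã"
for encoding in ("cp850", "utf-8"):
    # Same str object, different bytes, depending on the output's encoding.
    print(encoding, bytes_sent_to_terminal(text, encoding))
```

The program never calls .encode() itself: print converts the text using whatever encoding the output stream declares.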

For Python 2, do the following:

Configure your editor, through its menus, to actually save the file as UTF-8, in addition to having the coding declaration in the first line of your program.

Prefix all your strings with the letter u, as in: a = u"maçã". This makes them unicode objects and not just sequences of bytes. (This is the default behavior in Python 3.)

On each print, encode your text to the terminal's default encoding by calling the .encode(sys.stdout.encoding) method on your text (import the sys module into your program). Python 3 also does this by default.

Example in Python2:

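# coding: utf-8
import sys

# Encoding the terminal expects; it is None when output is redirected,
# so fall back to utf-8 in that case.
coding = sys.stdout.encoding or "utf-8"
# The u prefix makes this a unicode object rather than a byte string.
a = u"eu tenho uma maçã"
# Convert the text to the terminal's bytes before printing.
print a.encode(coding)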

Example in Python3 (as long as your editor is set to utf-8):

print("Eu tenho uma maçã")

(In Python 3, not even the coding declaration is required in the .py file itself when the file is saved as utf-8.)