What is "\x" in strings in Python?

Question

I answered a question here on the site that contained the following Python string:

"\xf7\x1a\xa6\xde\x8f\x17v\xa8\x03\x9d2\xb8\xa1V\xb2\xa9>\xddC\x9d\xc5\xdd\xceV\xd3\xb7\xa4\x05J\r\x08\xb0"

I imagine that this \x is related to some kind of encoding, but I'm not sure what it is.

What does \x mean inside the string? Does it have to do with hexadecimal?

string python

asked by anonymous 10.08.2017 / 15:38

2 answers

score 5 · Answer 1

It is an escape sequence: the two characters that follow it are interpreted as hexadecimal digits giving the character's code.

Try it in the interpreter:

>>> 0x65
101

>>> "\x65"
'e'

0xHH on its own is a numeric literal, that is, an integer written in hexadecimal. Inside a string, "\xHH" represents the character with that code.
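As a minimal sketch of that relationship (the variable names are just for illustration), the integer literal and the escape sequence can be compared directly with chr() and ord():

```python
# 0x65 is an integer literal written in hexadecimal.
code = 0x65

# "\x65" is a one-character string whose character has that code.
text = "\x65"

print(code)               # 101
print(text)               # e
print(chr(code) == text)  # True
print(hex(ord(text)))     # 0x65
```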

score 2 · Answer 2

The "\xHH" within a string indicates that the next two characters ("HH") will be interpreted as hexadecimal digits, and is thus a way to represent any arbitrary byte within a string in Python.

Thus, b"\xff" corresponds to a byte string containing a single byte with value 255 (ff in hexadecimal).

It is important to keep in mind that in Python 3 text strings, just like in unicode strings in Python 2, "\xHH" denotes a character (a Unicode codepoint), not a raw byte. The first 256 Unicode codepoints coincide with the character encoding known as "latin1" (ISO-8859-1), which is close to the one used in many versions of Windows for Brazilian Portuguese. This means that any value written as "\xHH" corresponds to a text character in Python 3, although not necessarily a printable one: control characters are still displayed as escapes.
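A small sketch of the difference between the two literal types, and of the latin1 correspondence, assuming Python 3 (the variable names are illustrative):

```python
# In a bytes literal, \xHH is one byte with that value.
raw = b"\xff"
# In a str literal, \xHH is the character with codepoint U+00HH.
text = "\xff"

print(len(raw), raw[0])      # 1 255
print(len(text), ord(text))  # 1 255

# latin1 maps every byte value 0..255 to the codepoint with the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin1")
print(all(ord(ch) == i for i, ch in enumerate(decoded)))  # True

# Not every one of those characters is printable.
print("\xff".isprintable())  # True
print("\x80".isprintable())  # False
print("\x00".isprintable())  # False
```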

An interesting experiment is to write numeric data to a binary file, read it back as text, and see how it is represented:

In [23]: f = open("teste.bin", "wb")

In [24]: f.write(bytearray((0, 0, 255, 255, 128, 128)))
Out[24]: 6

In [25]: f.close()

In [26]: open("teste.bin", encoding="latin1").read()
Out[26]: '\x00\x00ÿÿ\x80\x80'

(In this case, the character ÿ has the code 255 (0xff):)

In [30]: print("\xff")
ÿ

Similarly, in Python 3 (and in unicode strings in Python 2), the prefix \u lets you designate a Unicode character directly by its codepoint value, for codepoints of up to 16 bits (four hexadecimal digits).

So, for example, the character at codepoint 0x263A, which is the smiley face, can be placed directly in Python source code:

In [42]: a = "\u263a"

In [43]: print(a)
☺

And for more "distant" characters, the prefix \U (uppercase "U") takes 8 hexadecimal digits, to express characters with a codepoint greater than 65535 (0xffff). The semantics of "\xHH", "\uHHHH" and "\UHHHHHHHH" are the same.
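To illustrate that the three forms share the same semantics, here is a short sketch assuming Python 3.3 or later (where len() counts codepoints); the codepoint U+1F600 is just an example of a character beyond 0xffff:

```python
# The same character written with each of the three escape forms.
a = "\x41"
b = "\u0041"
c = "\U00000041"
print(a == b == c == "A")  # True

# A codepoint that needs four hex digits.
smiley = "\u263a"
print(len(smiley), hex(ord(smiley)))  # 1 0x263a

# A codepoint above 0xffff needs the eight-digit \U form.
grin = "\U0001F600"
print(len(grin), ord(grin) > 0xFFFF)  # 1 True
```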

Now, what may be interesting is that sometimes we get a "double-encoded" string: the \xHH sequence is actually stored as four separate characters (for example, if we save a .txt file containing the text \x41, it is a 4-byte file). If we want to read the single character represented by 0x41 (uppercase "A"), we have to do some maneuvering. To reproduce the situation, we can simply escape the "\" itself by typing "\\" in a Python string (always Python 3):

In [36]: a = "\x41"

In [37]: a
Out[37]: 'A'

In [38]: a = "\\x41"

In [39]: len(a)
Out[39]: 4

In [40]: a
Out[40]: '\\x41'

That is, in this case the "\" is a separate character, not one that Python combines with the "x" and the next two digits at compile time. To "compile" this into a single character we have to decode the text using the special "unicode_escape" codec. But it's not that simple: you cannot decode text in Python 3, because it is already considered "decoded"; you need a byte string to be able to call the decode method. Since our variable "a" is a string, the solution is to convert it to bytes first using the "encode" method. We use the "latin1" encoding, which does not change any value as long as every character has a code of 255 or less:

In [41]: a.encode("latin1").decode("unicode_escape")
Out[41]: 'A'
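The same round trip can be wrapped in a small function. This is only a sketch: the name unescape is hypothetical, and it inherits the restriction described above, so text containing any character above codepoint 255 fails at the latin1 encoding step.

```python
def unescape(text):
    """Interpret backslash escapes that appear literally in `text`.

    Hypothetical helper: it only works when every character of `text`
    has a codepoint of 255 or less.
    """
    return text.encode("latin1").decode("unicode_escape")


literal = "\\x41"            # four characters: \, x, 4, 1
print(len(literal))          # 4
print(unescape(literal))     # A
print(unescape("caf\\xe9"))  # café

# A character outside latin1 cannot be encoded in the first step.
try:
    unescape("\u263a \\x41")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```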