What are the statements placed before strings?

0

In Python, I noticed these "statements" placed before strings in two cases. First, on strings before they are passed through a hash algorithm, e.g.:

import hashlib
m = hashlib.sha256()
m.update(b"Nobody inspects")

Note: I am referring to the 'b' before 'Nobody inspects'.

And second, on strings before they are passed through 'encode':

plaintext = u'algorithm'
encodedtext = plaintext.encode('utf-8')

Note: This time the indicator is the 'u' before 'algorithm'.

I would like to know what they mean, because to use print or something similar in Python it is not necessary (at least as far as I know) to use these "indicators", except in cases such as when the hashlib library is used.

    
asked by anonymous 03.09.2017 / 19:32

2 answers

1

Open a Python 2.7 terminal and run:

str1 = 'teste'
str2 = u'teste'

Now look at the two types:

type(str1)
str

type(str2)
unicode

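Python 2 needs the u prefix to indicate that the string is Unicode (to work with accented characters, for example). If you run the same test in Python 3, you will see that the u prefix has become unnecessary, because every str is already Unicode.

Now go back to the terminal (use Python 3 this time; Python 2.7 accepts the b prefix but ignores it, so both values would be str there) and run: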
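As a quick check of that claim, here is a minimal sketch assuming Python 3.3 or later, where the u prefix is accepted for compatibility but changes nothing. The variable names are only illustrative.

```python
# Assumes Python 3.3+: the u prefix is allowed but has no effect.
plain = 'teste'
prefixed = u'teste'

same_type = type(plain) is type(prefixed)  # both are str
same_value = plain == prefixed

# "ação" written with escapes so the example does not depend on file encoding.
accented = 'a\u00e7\u00e3o'
char_count = len(accented)                  # counts characters
byte_count = len(accented.encode('utf-8'))  # counts bytes after encoding

print(same_type, same_value, char_count, byte_count)
```

The accented word has fewer characters than encoded bytes, which is the distinction between text (str) and its encoded form (bytes).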

str1 = 'string1'
str2 = b'string1'

Now let's look at the types of these strings:

print (type(str1))
<class 'str'>

print (type(str2))
<class 'bytes'> 

See this note from the documentation:

Note: While str objects are sequences of characters (each represented by a string of length 1), bytes and bytearray objects are sequences of integers (0 to 255), representing the byte values (the ASCII codes, for ASCII text). This means that for a bytes or bytearray object b, b[0] returns an integer.

The b prefix indicates that the object is of type bytes.
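The indexing behaviour described in the note can be seen with a short sketch, assuming Python 3:

```python
data = b'string1'

first = data[0]          # indexing a bytes object gives an integer
first_slice = data[0:1]  # slicing gives a bytes object of length 1
values = list(b'abc')    # iterating yields the integer value of each byte

print(first, first_slice, values)
```

Here `first` is the ASCII code of the letter s, while `first_slice` is still a bytes object.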

Now return to the terminal and do:

str1 = 'Linha 1\nLinha 2'
str2 = r'Linha 1\nLinha 2'

Now look at the difference of the two when running print on them:

print (str1)
Linha 1
Linha 2

print (str2)
Linha 1\nLinha 2

The r (for raw) indicates a "raw" string: escape sequences such as \n are not interpreted, so the backslash stays in the string as a literal character.
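One way to confirm this is to compare the two strings directly, reusing the values from the example above:

```python
normal = 'Linha 1\nLinha 2'
raw = r'Linha 1\nLinha 2'

# In the raw string the backslash and the letter n are two separate characters,
# so it is one character longer than the version with a real newline.
length_difference = len(raw) - len(normal)

# The same raw string written without the r prefix needs a doubled backslash.
equivalent = raw == 'Linha 1\\nLinha 2'

print(length_difference, equivalent, '\n' in raw)
```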

    
03.09.2017 / 20:10
2

They are prefixes that indicate what kind of string literal follows. Each one has a different characteristic that the compiler/interpreter understands and handles differently. This is described in the documentation on lexical analysis of the code.

The u prefix indicates that the string is Unicode text, and b indicates bytes with no specific encoding (the literal itself may only contain ASCII characters), in which case it is not treated as a str type. You can also use r to indicate raw text, in which special characters are not treated in a special way.
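This also explains why the prefixes show up around hashlib in the question: hash functions work on bytes, not on text. A minimal sketch, assuming Python 3 and reusing the string from the question:

```python
import hashlib

text = 'Nobody inspects'         # str: text
as_bytes = text.encode('utf-8')  # bytes: the text encoded as UTF-8
literal = b'Nobody inspects'     # bytes: written directly with the b prefix

# For ASCII-only text, encoding gives the same bytes as the b literal,
# so both produce the same digest.
digest_a = hashlib.sha256(as_bytes).hexdigest()
digest_b = hashlib.sha256(literal).hexdigest()
print(digest_a == digest_b)

# Passing a str directly is rejected in Python 3.
try:
    hashlib.sha256(text)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

So `b'...'` and `'...'.encode('utf-8')` are two ways of getting the bytes that hashlib expects.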

    
03.09.2017 / 20:00