How does the int function handle the \n character?

8

I've created a list:

usuarios = ['123\n','123\n4']

I tried to convert the element at index 0 to an integer using int():

int(usuarios[0])

Result:

123

But when I tried to do the same with index 1:

int(usuarios[1])

Result:

ValueError: invalid literal for int() with base 10: '123\n4'

I would like to know all the rules of int(), since I cannot find them, at least not in Portuguese.

    
asked by anonymous 02.12.2018 / 20:43

2 answers

11

The rule is simple: the argument must be a string holding a valid integer value, that is, it must not contain characters that get in the way of reading that value, not even a decimal point. A few characters are accepted before and after the digits because they are considered neutral (usually spaces, tabs, line breaks, etc.).

Whenever the string cannot be read unambiguously as a single integer, a ValueError exception is raised.

These examples work:

print(int('12\n'))
print(int('\n123'))
print(int('1234 '))
print(int(' 1235'))

These do not:

print(int('1236c'))
print(int('a1237'))
print(int('123 8'))

See it running on ideone. And on Coding Ground. I also put it on GitHub for future reference.
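The two groups above can also be checked in a single pass. The sketch below uses a hypothetical helper, try_int (not part of Python), that returns the converted value, or None when int() raises ValueError.

```python
def try_int(text):
    """Return int(text), or None if int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


accepted = ['12\n', '\n123', '1234 ', ' 1235']
rejected = ['1236c', 'a1237', '123 8']

for text in accepted + rejected:
    # repr() makes the whitespace characters visible in the output.
    print(repr(text), '->', try_int(text))
```

Every string in the first list yields a number and every string in the second yields None.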

    
02.12.2018 / 21:42
7

According to the official documentation:

  

class int([x])

     

[...] If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base. Optionally, the literal can be preceded by + or - (with no space in between) and surrounded by whitespace.

As the excerpt highlights, the value of the parameter can be surrounded by whitespace. For practical purposes, the string is trimmed before it is converted to an integer, so leading and trailing blanks are ignored.
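A quick way to see both parts of the quoted rule is to compare int() with and without an explicit strip(), and then try a sign that is separated from the digits. This is only a sketch with ASCII whitespace; is_rejected is a hypothetical helper name used for illustration.

```python
samples = ['  42', '42\n', '\t-7 \r\n', '+15 ']

for text in samples:
    # Surrounding whitespace does not change the result.
    assert int(text) == int(text.strip())


def is_rejected(text):
    """Return True when int() raises ValueError for this string."""
    try:
        int(text)
    except ValueError:
        return True
    return False


print(is_rejected(' -7 '))  # sign attached to the digits: accepted
print(is_rejected('- 7'))   # space between sign and digits: rejected
```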

TL;DR

The information below is based on the official implementation of Python, known as CPython.

To confirm this information, you can review the Python implementation in C:

/* Parses an int from a bytestring. Leading and trailing whitespace will be
 * ignored.
 *
 * If successful, a PyLong object will be returned and 'pend' will be pointing
 * to the first unused byte unless it's NULL.
 *
 * If unsuccessful, NULL will be returned.
 */
PyObject *
PyLong_FromString(const char *str, char **pend, int base);

The value that you pass as an argument to int is what the str pointer refers to. Looking at the body of the function, right at the beginning (line 2226) you will find:

while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str))) { str++; }

That is, it walks through the string and, while the current character is whitespace, increments the pointer, so that character is ignored in the later steps. Any character for which Py_ISSPACE returns true is treated as whitespace:

#define Py_ISSPACE(c)  (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_SPACE)

// pyctype.c

PY_CTF_SPACE, /* 0x9 '\t' */
PY_CTF_SPACE, /* 0xa '\n' */
PY_CTF_SPACE, /* 0xb '\v' */
PY_CTF_SPACE, /* 0xc '\f' */
PY_CTF_SPACE, /* 0xd '\r' */

>>> int('\t1')
1
>>> int('\n2')
2
>>> int('\v3')
3
>>> int('\f4')
4
>>> int('\r5')
5

That is, leading \t, \n, \v, \f and \r characters are ignored, just like ordinary spaces.
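The pointer loop can be mimicked in Python to make the idea concrete. This is a rough model, not CPython's code: skip_leading_whitespace and ASCII_WHITESPACE are names invented here, and the set holds the five characters listed above plus the ordinary space.

```python
# Characters treated as whitespace in this model (assumption: ASCII only).
ASCII_WHITESPACE = ' \t\n\v\f\r'


def skip_leading_whitespace(text):
    """Return the index of the first non-whitespace character."""
    index = 0
    # Same shape as the C loop: stop at the end or at the first non-space.
    while index < len(text) and text[index] in ASCII_WHITESPACE:
        index += 1
    return index


text = '\t\n 42'
start = skip_leading_whitespace(text)
print(start, repr(text[start:]))  # 3 '42'
```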

Continuing to review the body of the function, we see the excerpt (line 2399):

scan = str;

/* ... */

while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {
    /* ... */
}

It assigns the input pointer str to scan and advances it as long as the character is a valid digit, that is, one whose value is less than the given base, or the _ character. The first character that does not satisfy these conditions stops the scan, and unless only whitespace follows, the function ends up at goto onError and fails. Therefore, inside the number only the _ character is allowed; any other character, including whitespace, results in an error.

>>> int('1_000')
1000
>>> int('1\n000')
...
ValueError: invalid literal for int() with base 10: '1\n000'
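The underscore rule has a few edge cases that are easy to probe from Python. As far as I can tell, the underscore is accepted only between two digits, so a leading, trailing, or doubled underscore is rejected too. The helper is_valid_int below is hypothetical and only wraps int() in a try/except.

```python
def is_valid_int(text):
    """Return True when int() accepts the string, False otherwise."""
    try:
        int(text)
    except ValueError:
        return False
    return True


cases = ['1_000', '1_0_0', '_1000', '1000_', '1__000', '1 000', '1\n000']

for text in cases:
    print(repr(text), is_valid_int(text))
```

Only the first two strings are accepted; the rest raise ValueError.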

Finally, continuing the analysis of the function body, we see (line 2535):

while (*str && Py_ISSPACE(Py_CHARMASK(*str))) {
    str++;
}

if (*str != '\0') { goto onError; }

In the same way that the whitespace at the beginning of the string was skipped, the pointer is advanced here past any trailing whitespace. The final check for '\0' ensures that nothing but whitespace follows the number: if any other character remains, the conversion fails.

In short,

  • Any leading whitespace is ignored ('\t', '\n', '\v', '\f', '\r', ' ');
  • Within the number, any character other than a digit or _ causes an error;
  • Any trailing whitespace is ignored;
  • Any character that is not a digit or _ causes an error, except in the cases above (leading and trailing whitespace).
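The rules above can be put together in a small pure-Python model. This is a simplified base-10 sketch under stated assumptions, not a port of PyLong_FromString: parse_decimal, ASCII_WHITESPACE and DIGITS are names made up for the example, only ASCII input is considered, and the optional sign and the underscore placement checks are added from the documented behavior of int().

```python
ASCII_WHITESPACE = ' \t\n\v\f\r'
DIGITS = '0123456789'


def parse_decimal(text):
    """Simplified base-10 model of the rules above; raises ValueError on failure."""
    pos, end = 0, len(text)

    # 1. Skip leading whitespace.
    while pos < end and text[pos] in ASCII_WHITESPACE:
        pos += 1

    # Optional sign, which must be attached directly to the digits.
    sign = 1
    if pos < end and text[pos] in '+-':
        sign = -1 if text[pos] == '-' else 1
        pos += 1

    # 2. Consume digits; a single '_' is allowed only between two digits.
    value, seen_digit, prev_underscore = 0, False, False
    while pos < end and (text[pos] in DIGITS or text[pos] == '_'):
        if text[pos] == '_':
            if not seen_digit or prev_underscore:
                raise ValueError(f'invalid literal: {text!r}')
            prev_underscore = True
        else:
            value = value * 10 + DIGITS.index(text[pos])
            seen_digit, prev_underscore = True, False
        pos += 1
    if not seen_digit or prev_underscore:
        raise ValueError(f'invalid literal: {text!r}')

    # 3. Skip trailing whitespace; anything left over is an error.
    while pos < end and text[pos] in ASCII_WHITESPACE:
        pos += 1
    if pos != end:
        raise ValueError(f'invalid literal: {text!r}')

    return sign * value


for text in ['123\n', '123\n4', ' -1_000\t', '1__0', '- 7']:
    try:
        print(repr(text), '->', parse_decimal(text))
    except ValueError:
        print(repr(text), '->', 'ValueError')
```

With the two strings from the question, '123\n' parses to 123 because only whitespace follows the digits, while '123\n4' fails at step 3 because a 4 is left over after the trailing whitespace.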
03.12.2018 / 12:50