According to the official documentation:
class int([x])
[...] If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in radix base. Optionally, the literal can be preceded by + or - (with no space in between) and surrounded by whitespace.
As the excerpt highlights, the argument may be surrounded by whitespace. In practice, the string is effectively stripped before it is converted to an integer, so leading and trailing blanks are ignored.
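As a quick check of that equivalence, the sketch below compares int(s) with int(s.strip()) for a few inputs padded with ASCII whitespace. The sample strings are arbitrary choices made for illustration.

```python
# Arbitrary sample inputs padded with ASCII whitespace (chosen for illustration).
samples = ["  42", "42  ", "\t\n 42 \r\n", " -7 ", "+13\n"]

# Convert each sample twice: once as-is and once after an explicit strip().
results = [(int(s), int(s.strip())) for s in samples]

for raw, (direct, stripped) in zip(samples, results):
    print(repr(raw), direct, stripped)
```

Both columns should match for every sample. Whitespace between the sign and the digits is a different matter: as the documentation says, the sign must come with no space in between.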
TL;DR
The information below is based on the reference implementation of Python, known as CPython.
To confirm this, you can review the implementation in C:
/* Parses an int from a bytestring. Leading and trailing whitespace will be
* ignored.
*
* If successful, a PyLong object will be returned and 'pend' will be pointing
* to the first unused byte unless it's NULL.
*
* If unsuccessful, NULL will be returned.
*/
PyObject *
PyLong_FromString(const char *str, char **pend, int base);
The value that you pass to int as an argument is what the str pointer refers to. Looking at the body of the function, right at the beginning (line 2226) you will find:
while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str))) {
    str++;
}
That is, it walks the string and, while the current character is whitespace, increments the pointer, so that character is ignored in later steps. Any character for which Py_ISSPACE returns true is considered whitespace:
#define Py_ISSPACE(c) (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_SPACE)
// pyctype.c
PY_CTF_SPACE, /* 0x9 '\t' */
PY_CTF_SPACE, /* 0xa '\n' */
PY_CTF_SPACE, /* 0xb '\v' */
PY_CTF_SPACE, /* 0xc '\f' */
PY_CTF_SPACE, /* 0xd '\r' */
>>> int('\t1')
1
>>> int('\n2')
2
>>> int('\v3')
3
>>> int('\f4')
4
>>> int('\r5')
5
That is, the characters \t, \n, \v, \f and \r will be ignored at the start of the string.
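To see which single bytes int() actually treats as ignorable whitespace, one option is to try every byte value. The sketch below is only an experiment, not part of CPython: the function name trailing_bytes_ignored is hypothetical, it uses a bytes argument so that only single-byte characters are involved, and it places the byte after the digit because a leading '+' or '0' would also be accepted as part of the number.

```python
def trailing_bytes_ignored():
    """Return the byte values that int() ignores when they follow the digit 7."""
    ignored = []
    for code in range(256):
        candidate = b"7" + bytes([code])
        try:
            value = int(candidate)
        except ValueError:
            continue  # int() rejected this trailing byte
        if value == 7:  # the extra byte did not change the result
            ignored.append(code)
    return ignored


ignored = trailing_bytes_ignored()
print([hex(code) for code in ignored])
```

On CPython this should report 0x9 through 0xd, matching the table excerpt above, plus 0x20, the ordinary space.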
Continuing through the body of the function, we find this excerpt (line 2399):
scan = str;
/* ... */
while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {
    /* ... */
}
It assigns the input pointer str to scan and advances it as long as the character is a valid digit, that is, one whose value is less than the given base, or the _ character. The scan stops at the first character that meets neither condition; unless everything that remains is trailing whitespace, goto onError is executed and the function ends with an error. Therefore, within the number only the _ character is allowed; any other character, including whitespace, results in an error.
>>> int('1_000')
1000
>>> int('1\n000')
...
ValueError: invalid literal for int() with base 10: '1\n000'
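The underscore itself is only accepted between two digits. The sketch below probes a few inputs to show this; try_int is a hypothetical helper written for this example, and the test strings are arbitrary.

```python
def try_int(text):
    """Return int(text), or None if int() rejects it (hypothetical helper)."""
    try:
        return int(text)
    except ValueError:
        return None


# A single underscore between digits is fine; everything else here is rejected.
cases = ["1_000", "1__000", "_1000", "1000_", "1 000", "1\t000"]
outcomes = {text: try_int(text) for text in cases}

for text, value in outcomes.items():
    print(f"{text!r:>10} -> {value}")
```

Only '1_000' converts. A doubled, leading or trailing underscore is refused, and so is any whitespace inside the number.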
Finally, continuing through the function body, we again see (line 2535):
while (*str && Py_ISSPACE(Py_CHARMASK(*str))) {
    str++;
}
if (*str != '\0') {
    goto onError;
}
Just as the whitespace at the start of the string was skipped, the pointer is advanced past any trailing blanks. The final comparison with '\0' ensures that the string ends right after that whitespace and not with other characters.
In short:
- Any leading whitespace is ignored ('\t', '\n', '\v', '\f', '\r', ' ');
- Within the number, any character other than a digit or _ raises an error;
- Any trailing whitespace is ignored;
- Any other character that is not a digit or _ raises an error, apart from the cases above (leading and trailing whitespace) and the optional + or - sign mentioned in the documentation.
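The steps summarized above can be modeled in plain Python. This is a simplified sketch limited to base 10 and ASCII input, not a port of PyLong_FromString; the names parse_decimal, ASCII_WHITESPACE and DIGITS are hypothetical.

```python
ASCII_WHITESPACE = " \t\n\v\f\r"  # assumed set of characters treated as whitespace
DIGITS = "0123456789"


def parse_decimal(text):
    """Simplified base-10 model of the parsing steps (illustrative only)."""
    pos, end = 0, len(text)

    # 1. Skip leading whitespace.
    while pos < end and text[pos] in ASCII_WHITESPACE:
        pos += 1

    # 2. Optional sign, which must be followed directly by the digits.
    sign = 1
    if pos < end and text[pos] in "+-":
        sign = -1 if text[pos] == "-" else 1
        pos += 1

    # 3. Digits, optionally separated by single underscores.
    value = 0
    seen_digit = False
    prev_underscore = True  # starting as True rejects a leading underscore
    while pos < end and (text[pos] in DIGITS or text[pos] == "_"):
        if text[pos] == "_":
            if prev_underscore:
                raise ValueError(f"invalid literal: {text!r}")
            prev_underscore = True
        else:
            value = value * 10 + DIGITS.index(text[pos])
            seen_digit = True
            prev_underscore = False
        pos += 1
    if not seen_digit or prev_underscore:
        raise ValueError(f"invalid literal: {text!r}")

    # 4. Skip trailing whitespace; anything left over is an error.
    while pos < end and text[pos] in ASCII_WHITESPACE:
        pos += 1
    if pos != end:
        raise ValueError(f"invalid literal: {text!r}")

    return sign * value
```

For example, parse_decimal("\t-7\n") returns -7, while parse_decimal("1 0") raises ValueError, the same outcomes int() gives for those strings.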