The "b" prefix indicates that the object you have at hand is not a text string but a sequence of bytes.
In Python 3 the two things are fundamentally different, which is why you always need to know how the text is encoded in the bytes in order to convert them into characters. Nowadays it is increasingly common for text to be encoded as "utf-8", but some legacy systems and Windows use the "latin-1" encoding, which fits every character of the Portuguese language into a single byte.
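To make the bytes/text distinction concrete, here is a small sketch (the sample word is just an illustration) that encodes the same Portuguese text in both encodings and shows what happens when bytes are decoded with the wrong one:

```python
# The same Portuguese text encoded two ways.
text = "ação"

as_utf8 = text.encode("utf-8")      # accented letters take 2 bytes each
as_latin1 = text.encode("latin-1")  # every character takes 1 byte

print(type(text), type(as_utf8))  # <class 'str'> <class 'bytes'>
print(as_utf8)                    # b'a\xc3\xa7\xc3\xa3o'
print(as_latin1)                  # b'a\xe7\xe3o'
print(len(text), len(as_utf8), len(as_latin1))  # 4 6 4

# Decoding UTF-8 bytes as if they were latin-1 silently produces garbage:
print(as_utf8.decode("latin-1"))  # aÃ§Ã£o
```

Note that the wrong decode does not raise an error here: latin-1 maps every byte to some character, so you only notice the problem by looking at the result.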
Python's "bytes" objects have a "decode" method: just call it and the result will be the text string (which Python shows without the 'b' prefix). Besides the "decode" method, calling str(xml, 'utf-8') also performs this conversion. As for the error message: since it is not Python complaining about an invalid utf-8 sequence, chances are your XML really is in utf-8, and it is only ODBC that complains about an invalid character. utf-8 can represent any Unicode character; other encodings, such as latin-1, cannot. If the text contains Greek, Russian, or Hebrew characters, or even punctuation marks that are not defined in latin-1, an error will occur, and it may well be this one.
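A minimal sketch of both points above, using a hypothetical UTF-8 payload: `bytes.decode()` and `str(bytes, encoding)` give the same result, and latin-1 rejects characters that UTF-8 handles without complaint:

```python
xml = b"<nome>Jo\xc3\xa3o</nome>"  # hypothetical UTF-8 encoded payload

# Both calls perform the same bytes -> str conversion.
via_decode = xml.decode("utf-8")
via_str = str(xml, "utf-8")
print(via_decode == via_str)  # True
print(via_str)                # <nome>João</nome>

# UTF-8 can encode any character; latin-1 covers only 256 code points.
russian = "Привет"
russian_utf8 = russian.encode("utf-8")  # no error
rejected = ""
try:
    russian.encode("latin-1")
except UnicodeEncodeError as exc:
    # The exception records which slice of the text could not be encoded.
    rejected = exc.object[exc.start:exc.end]
print("latin-1 rejected:", rejected)  # latin-1 rejected: Привет
```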
The remedy would be to force an encoding with escaping when passing the data to the driver, but there is another problem: the function does not accept bytes (already encoded text). The upshot: you will have to mangle the text in Python, replacing every character outside "latin-1" with "?", turning it back into text, and then making your call. After that, if there is no other error in the XML, it should work.
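If the data is XML, one escaping option that still ends up as a text string (not bytes) is Python's built-in "xmlcharrefreplace" error handler: characters outside latin-1 become XML character references such as `&#1059;` instead of "?". This is only a sketch under assumptions: it is safe only for characters in element content or attribute values (not in tag names), and it assumes the receiving side actually parses the string as XML, since standard XML parsers turn the references back into the original characters:

```python
import xml.etree.ElementTree as ET

# Hypothetical UTF-8 XML payload containing Cyrillic characters.
dados = "<texto>inválido: Ут</texto>".encode("utf-8")

text = dados.decode("utf-8")
# Characters latin-1 cannot represent become &#NNNN; references;
# "á" is valid latin-1 and is kept as-is. The result is a str again.
escaped = text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")
print(escaped)  # <texto>inválido: &#1059;&#1090;</texto>

# An XML parser resolves the references back to the original text.
print(ET.fromstring(escaped).text)  # inválido: Ут
```

Unlike the "?" replacement, no information is lost, but whether the database driver or server accepts this depends on your setup.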
I would recommend contacting whoever designed the database you are feeding and asking them to accept a universal encoding.
To understand these processes better, stop what you are doing right now and read this link
To fix your problem and remove the problematic characters from the text:
Something equivalent to this is what is happening inside the ODBC code if you send text with Cyrillic characters, for example:
In [119]: a = "texto inválido: Ут пауло интерессет темпорибус пер"
In [120]: a.encode("latin-1")
UnicodeEncodeError Traceback (most recent call last)
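To see exactly which characters trigger that error, you can check each character against the target encoding. The helper name `unencodable` below is hypothetical, just a sketch for diagnosing your own data:

```python
a = "texto inválido: Ут пауло интерессет темпорибус пер"

def unencodable(text, encoding="latin-1"):
    """Hypothetical helper: return (position, char, code point) for every
    character that the given encoding cannot represent."""
    bad = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((pos, ch, f"U+{ord(ch):04X}"))
    return bad

# Show the first few offenders; "á" is valid latin-1 and does not appear.
for pos, ch, code in unencodable(a)[:3]:
    print(pos, ch, code)
```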
So you should decode your data using utf-8, encode it back to latin-1 replacing the unknown characters with "?", and decode it back to text. That gives you data that can be sent to your database:
In [122]: dados
Out[122]: b'texto inv\xc3\xa1lido: \xd0\xa3\xd1\x82'
In [123]: dados_str = dados.decode("utf-8").encode("latin1", errors="replace").decode("latin1")
In [124]: dados_str
Out[124]: 'texto inválido: ??'
(The "dados" variable in this example is equivalent to what you have at the beginning: a bytes object representing text encoded in utf-8, containing characters that are invalid in latin-1.) If you still get the same "unable to switch the encoding" error ("não é possível alternar a codificação"), try filtering out all non-ASCII characters: use "ascii" instead of "latin1" in the code above.
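The decode/encode/decode chain above, including the ASCII fallback, can be wrapped in a small function. The name `strip_unsupported` is hypothetical; this is a sketch assuming the input is always UTF-8 bytes:

```python
def strip_unsupported(raw: bytes, target: str = "latin-1") -> str:
    """Hypothetical helper: decode UTF-8 bytes, replace every character the
    target encoding cannot represent with "?", and return a str again."""
    text = raw.decode("utf-8")
    return text.encode(target, errors="replace").decode(target)

dados = b'texto inv\xc3\xa1lido: \xd0\xa3\xd1\x82'
print(strip_unsupported(dados))           # texto inválido: ??
print(strip_unsupported(dados, "ascii"))  # texto inv?lido: ??
```

With "ascii" even the Portuguese accented letters are lost, so it is worth trying latin-1 first.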