Problems with encoding in Database Query


When I run a SELECT on fields that store data as JSON, I get an error that seems to depend on the data being returned:

SELECT tbl_pf.nome,
       tbl_pf.adicionais::JSONB->'recebeTaxa' AS localTaxa
FROM   tbl_pf;

  

ERROR:  unsupported Unicode escape sequence
    DETAIL:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.

When fetching the entire JSON field, there is no error.

I run this query in the terminal, which is already configured to handle UTF8 data coming from the database. (I can run this kind of query against other tables with JSON data whose content has already been processed, including numeric data.)

I would like to know whether the problem is only in the terminal, which cannot identify the encoding of the database, or whether the database itself also has a "faulty record", since a test routine in PHP produced the same error.

asked by anonymous 26.04.2018 / 19:55

1 answer


The error message pasted in the question comes from PostgreSQL and indicates that the database in question is not encoded in UTF-8.

The documentation indicates that the JSON specification cannot be fully supported unless the database is encoded in UTF-8, since RFC 7159 determines that JSON values use this character set (in practice it also supports UTF-16 and UTF-32, but emphasizes that UTF-8 provides better interoperability between systems).

More specifically for your case, the same documentation page clarifies:

  

However, the input function for jsonb is stricter: it disallows Unicode escapes for non-ASCII characters (those above U+007F) unless the database encoding is UTF8.

That is, if your database encoding is not UTF8, you will get these conversion errors for "extended" characters every time you extract them from a jsonb field, since they cannot be converted to the database's charset for manipulation. In the query from the question, the cast adicionais::JSONB is what sends the value through that stricter input function. I believe no error occurs when querying the field as a whole because it is then treated simply as text, with no need to parse it to look up particular properties.
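To confirm this diagnosis, you can ask the server directly which encoding it uses. The statements below are a minimal sketch using standard PostgreSQL settings and catalogs; they assume nothing about your schema and should run as-is from psql or any other client.

```sql
-- Encoding of the database you are currently connected to.
-- The jsonb error appears when this is anything other than UTF8.
SHOW server_encoding;

-- Encoding of every database in the cluster, for comparison.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding
FROM   pg_database;

-- Encoding the current client session is using (separate from the server's).
SHOW client_encoding;
```

If server_encoding is not UTF8, the error originates in the database itself and not in the terminal or in PHP.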

So, whether on the command line or in any other client, Postgres should return this same error for the query shown. I suggest converting your database to UTF8 to avoid encoding problems with the json and jsonb data types.
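PostgreSQL cannot change the encoding of an existing database in place, so the conversion is usually done by creating a new UTF8 database and reloading the data into it. The following is only a sketch of that approach: the database name meubanco_utf8 is hypothetical, and depending on how your cluster was initialized you may also need to specify LC_COLLATE and LC_CTYPE values that are compatible with UTF8.

```sql
-- Hypothetical name; pick whatever suits your environment.
-- TEMPLATE template0 is needed because template1 may already carry
-- a different encoding that cannot be overridden.
CREATE DATABASE meubanco_utf8
    WITH ENCODING 'UTF8'
         TEMPLATE template0;

-- Afterwards, dump the old database with pg_dump and restore the dump
-- into meubanco_utf8 (for example with psql). The dump records the
-- encoding it was written in, so the server can convert the text to
-- UTF8 while loading it.
```

Test the restore on a copy first, since any bytes that are invalid in the old encoding will cause the load to fail.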

If you need your client to "talk" to the database in another encoding, you can set the client_encoding parameter at connection time or in user-specific settings, as described in another answer here on SOP.
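As an illustration of those two options, the statements below set client_encoding for a single session and as a default for one role. LATIN1 and the role name app_user are assumptions made for the example; replace them with the encoding and user your client actually needs.

```sql
-- Per session: applies only to the current connection.
SET client_encoding TO 'LATIN1';

-- Per user: becomes the default for new sessions of this role.
-- (app_user is a hypothetical role name.)
ALTER ROLE app_user SET client_encoding TO 'LATIN1';

-- Check what the current session ended up with.
SHOW client_encoding;
```

Note that this only changes how text is converted between server and client. It does not remove the jsonb restriction, which depends on the server encoding.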

    
28.04.2018 / 00:50