Hello,
When creating a string in a Java class (for example: String t = "Ola Java!"), it seems the compiler is choosing the 'wrong' encoding to interpret the bytes in the source file when it generates the String (the 'right' encoding would be UTF-8, which is the encoding I'm using for the sources).
To demonstrate the problem, I ran the following test:
String t = "ã";
log.debug("t: " + t);
log.debug("t.length(): " + t.length());
log.debug("t.getBytes().length: " + t.getBytes().length);
log.debug("t.getBytes(utf-8).length: " + t.getBytes("utf-8").length);
log.debug("t.getBytes(UTF-8).length: " + t.getBytes("UTF-8").length);
log.debug("t.getBytes(ISO-8859-1).length: " + t.getBytes("ISO-8859-1").length);
(The logging framework I use is commons-logging with log4j support, but the same test can be done with System.out.)
The result was as follows:
t: ã
t.length(): 2
t.getBytes().length: 4
t.getBytes(utf-8).length: 4
t.getBytes(UTF-8).length: 4
t.getBytes(ISO-8859-1).length: 2
The first line could be explained by a conversion problem when the string is written to the log file.
But the other lines make the problem clear.
The second line (t.length()) already shows that the String was created with two characters, not one: when the string was created, the two bytes that represent the character ã in UTF-8 were treated as two characters (in some other encoding, such as ISO-8859-1).
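To illustrate that reading of the numbers, here is a minimal sketch that reproduces the same lengths at runtime. It assumes the misreading is UTF-8 bytes decoded as ISO-8859-1, which is only my guess at what the compiler is doing; the class name MisdecodedLiteral and the helper misdecode are hypothetical names made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodedLiteral {

    // Encodes the text as UTF-8 and decodes those bytes as ISO-8859-1,
    // imitating a compiler that reads a UTF-8 source file with the wrong encoding.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escape for U+00E3, so this literal does not depend on the source encoding.
        String intended = "\u00e3";
        String t = misdecode(intended);

        System.out.println("intended.length(): " + intended.length());   // 1
        System.out.println("t.length(): " + t.length());                 // 2
        System.out.println("t.getBytes(UTF-8).length: "
                + t.getBytes(StandardCharsets.UTF_8).length);            // 4
        System.out.println("t.getBytes(ISO-8859-1).length: "
                + t.getBytes(StandardCharsets.ISO_8859_1).length);       // 2
    }
}
```

The lengths 2, 4 and 2 match the log output above, which is consistent with the two UTF-8 bytes of ã (0xC3 0xA3) each becoming a separate character.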
I'm looking for a way to force the encoding the compiler uses when interpreting a string literal, but I have not found a good one ... is there any way to do this? Or to tell the compiler which encoding should be used when interpreting the string literals in the sources?