What calculation converts a code point to UTF-16?


I have a 32-bit integer representing a Unicode character and would like to convert this single character to its UTF-16 representation, that is, one or more 16-bit integers.

asked by anonymous 27.06.2017 / 14:52

1 answer


The Unicode Transformation Format, 16-bit (UTF-16) is defined in section 2.5 of the Unicode standard, as well as in RFC 2781. It works as follows:

  • Let U be the code point you want to encode. If U is less than 65,536 (0x10000), emit it directly as a single 16-bit code unit.
  • If U is greater than or equal to 65,536, compute U' = U - 65536. Since the largest valid code point is 0x10FFFF, the 12 most significant bits of U' are zero, so U' fits in 20 bits.
  • Emit two 16-bit code units, in this order:
  • The first (the high surrogate) has its six most significant bits equal to 1101 10 (i.e. 0xD800) and its ten least significant bits equal to the ten most significant bits of U'.
  • The second (the low surrogate) has its six most significant bits equal to 1101 11 (i.e. 0xDC00) and its ten least significant bits equal to the ten least significant bits of U'.
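For example, the code point U+1F600 is encoded like this: U' = 0x1F600 - 0x10000 = 0xF600; its ten most significant bits are 0x03D, so the first code unit is 0xD800 | 0x03D = 0xD83D, and its ten least significant bits are 0x200, so the second code unit is 0xDC00 | 0x200 = 0xDE00. U+1F600 therefore becomes the pair 0xD83D 0xDE00.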
In C:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    
    /* Writes the UTF-16 encoding of a code point to "out":
     * one or two 16-bit code units in the machine's native byte order. */
    void
    utf_16(uint32_t codepoint, FILE *out) {
        uint32_t U;
        uint16_t W;
    
        assert(codepoint <= 0x10FFFF);
        if (codepoint < 0x10000) {
            /* BMP code point: emitted as a single code unit. */
            W = (uint16_t) codepoint;
            fwrite(&W, sizeof(W), 1, out);
        } else {
            /* Supplementary code point: emitted as a surrogate pair. */
            U = codepoint - 0x10000;      /* U' fits in 20 bits */
            W = 0xD800 | (U >> 10);       /* high surrogate: top 10 bits of U' */
            fwrite(&W, sizeof(W), 1, out);
            W = 0xDC00 | (U & 0x3FF);     /* low surrogate: bottom 10 bits of U' */
            fwrite(&W, sizeof(W), 1, out);
        }
    }
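
As a usage sketch (assuming utf_16 is defined above in the same file), encoding U+1F600 should produce the surrogate pair 0xD83D 0xDE00. Note that fwrite emits each code unit in the machine's native byte order; producing UTF-16LE or UTF-16BE explicitly would require writing the two bytes of each unit in a fixed order.

    /* Usage sketch: writes the UTF-16 code units of U+1F600 to stdout.
     * On a little-endian machine the raw output bytes are 3D D8 00 DE. */
    int
    main(void) {
        utf_16(0x1F600, stdout);    /* expected code units: 0xD83D, 0xDE00 */
        return 0;
    }
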
    27.06.2017 / 15:38