# What is the calculation to convert a code point to UTF-16?


I have a 32-bit integer representing a Unicode character and would like to convert this single character to its UTF-16 representation, that is, one or more 16-bit integers.

asked by anonymous 27.06.2017 / 14:52


The 16-bit Unicode Transformation Format (UTF-16) is defined in section 2.5 of the Unicode Standard, as well as in RFC 2781. It works as follows:

• Let `U` be the code point you want to encode. If `U` is less than 65,536 (`0x10000`), emit it as is, as a single 16-bit unit.
• If `U` is greater than or equal to 65,536, compute `U' = U - 65536`. By the Unicode rules, this `U'` has its 12 most significant bits equal to zero (since the last valid code point is `0x10FFFF`), so it fits in 20 bits.
• Emit two 16-bit words, in order:
• The first has its six most significant bits equal to `1101 10` and its ten least significant bits equal to the ten most significant bits of `U'`.
• The second has its six most significant bits equal to `1101 11` and its ten least significant bits equal to the ten least significant bits of `U'`.

In C:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

void utf_16(uint32_t codepoint, FILE *out)
{
    uint32_t U;
    uint16_t W;

    assert(codepoint <= 0x10FFFF);
    if (codepoint < 0x10000) {
        /* Below 0x10000: emit the value as a single 16-bit unit. */
        W = (uint16_t) codepoint;
        fwrite(&W, sizeof(W), 1, out);   /* written in host byte order */
    } else {
        U = codepoint - 0x10000;
        /* First word: 1101 10 followed by the top ten bits of U'. */
        W = (uint16_t) (0xD800 | (U >> 10));
        fwrite(&W, sizeof(W), 1, out);
        /* Second word: 1101 11 followed by the low ten bits of U'. */
        W = (uint16_t) (0xDC00 | (U & 0x3FF));
        fwrite(&W, sizeof(W), 1, out);
    }
}
```
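To check the arithmetic by hand, it can help to have a variant that returns the 16-bit units in memory instead of writing them to a file. The following is only a sketch under a few assumptions: the name `utf16_encode` and its return convention (number of units written, or 0 for invalid input) are hypothetical, and the rejection of the surrogate range `0xD800` to `0xDFFF` is an extra check that the answer above does not perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the original answer).
 * Encodes one code point into out[0..1] and returns the number of
 * 16-bit units written: 1 or 2, or 0 if the input is not encodable. */
size_t utf16_encode(uint32_t codepoint, uint16_t out[2])
{
    /* Assumption: reject values past the last code point and the
     * surrogate range, which is reserved for UTF-16 itself. */
    if (codepoint > 0x10FFFF ||
        (codepoint >= 0xD800 && codepoint <= 0xDFFF)) {
        return 0;
    }

    if (codepoint < 0x10000) {
        /* Fits in one unit: emit it unchanged. */
        out[0] = (uint16_t) codepoint;
        return 1;
    }

    /* Subtract 0x10000 to get a 20-bit value, then split it 10/10. */
    uint32_t u = codepoint - 0x10000;
    out[0] = (uint16_t) (0xD800 | (u >> 10));    /* 1101 10 + top 10 bits */
    out[1] = (uint16_t) (0xDC00 | (u & 0x3FF));  /* 1101 11 + low 10 bits */
    return 2;
}
```

As a worked example, for `0x1F600` the subtraction gives `0xF600`; its top ten bits are `0x03D` and its low ten bits are `0x200`, so the two units are `0xD83D` and `0xDE00`. A code point such as `0x20AC` is below `0x10000` and comes out as the single unit `0x20AC`.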
answered 27.06.2017 / 15:38
user contributions licensed under cc by-sa 3.0 with attribution required. it_qna