I have a 32-bit integer representing a Unicode code point and would like to convert this single character to its UTF-16 representation, that is, one or more 16-bit integers.

1

asked by anonymous 27.06.2017 / 14:52

3

The Unicode Transformation Format, 16-bit (UTF-16) is defined in section 2.5 of the Unicode standard, as well as in RFC 2781. It works as follows:

Take the value `U` you want to encode.

- If `U` is less than 65,536 (`0x10000`), emit it as-is in a single 16-bit unit.
- If `U` is greater than or equal to 65,536, compute `U' = U - 65536`. By the Unicode rules, this `U'` will have its 12 most significant bits equal to zero (since the last valid code point is `0x10FFFF`). Then emit two 16-bit units, in order:
  - The first has its six most significant bits equal to `1101 10` and its ten least significant bits equal to the ten most significant bits of `U'`.
  - The second has its six most significant bits equal to `1101 11` and its ten least significant bits equal to the ten least significant bits of `U'`.
In C:

```
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

void
utf_16(uint32_t codepoint, FILE *out) {
    uint32_t U;
    uint16_t W;
    assert(codepoint <= 0x10FFFF);
    if (codepoint < 0x10000) {
        /* BMP code point: a single 16-bit unit */
        W = (uint16_t) codepoint;
        fwrite(&W, sizeof(W), 1, out);
    } else {
        /* Supplementary code point: a surrogate pair */
        U = codepoint - 0x10000;
        W = 0xD800 | (U >> 10);   /* high surrogate: top 10 bits of U' */
        fwrite(&W, sizeof(W), 1, out);
        W = 0xDC00 | (U & 0x3FF); /* low surrogate: bottom 10 bits of U' */
        fwrite(&W, sizeof(W), 1, out);
    }
}
```

Note that `fwrite` needs the *address* of `W`, and that it writes each unit in the machine's native byte order; prepend a byte order mark or byte-swap explicitly if you need a specific endianness.

answered 27.06.2017 / 15:38
