首页使用C语言代码生成一个UTF-8编码函数

使用C语言代码生成一个UTF-8编码函数

时间: 2023-11-28 08:50:06 浏览: 74

以下是一个简单的 C 语言函数，用于将一个给定的 Unicode 码点编码为 UTF-8 字节序列： ```c #include <stdio.h> void encode_utf8(unsigned int codepoint) { if (codepoint <= 0x7F) { // 1字节 putchar(codepoint); } else if (codepoint <= 0x7FF) { // 2字节 putchar((codepoint >> 6) | 0xC0); putchar((codepoint & 0x3F) | 0x80); } else if (codepoint <= 0xFFFF) { // 3字节 putchar((codepoint >> 12) | 0xE0); putchar(((codepoint >> 6) & 0x3F) | 0x80); putchar((codepoint & 0x3F) | 0x80); } else if (codepoint <= 0x10FFFF) { // 4字节 putchar((codepoint >> 18) | 0xF0); putchar(((codepoint >> 12) & 0x3F) | 0x80); putchar(((codepoint >> 6) & 0x3F) | 0x80); putchar((codepoint & 0x3F) | 0x80); } else { printf("Error: Code point out of range: %u\n", codepoint); } } int main() { encode_utf8(0x24); // $ encode_utf8(0xA2); // ¢ encode_utf8(0x20AC); // € encode_utf8(0x1F49C); // 💜 return 0; } ``` 该函数使用了 UTF-8 的编码规则，将 Unicode 码点编码为 1 到 4 个字节的字节序列。使用时，只需将所需的 Unicode 码点作为参数传递给 `encode_utf8` 函数即可。在上面的示例中，我们将 `$`、`¢`、`€` 和 `💜` 四个字符的 Unicode 码点分别传递给了该函数，并输出了它们的 UTF-8 字节序列。

阅读全文