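I saw that a senior member posted code for converting between UTF-8 and Unicode, so I'd like to add a few notes on the topic. Hopefully they are of some help.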
Below are the bit widths of some encoding forms:
Examples of fixed-width encoding forms:
Type | Each character encoded as | Notes |
---|---|---|
7-bit | a single 7-bit quantity | example: ISO 646 |
8-bit G0/G1 | a single 8-bit quantity | with constraints on use of C0 and C1 spaces |
8-bit | a single 8-bit quantity | with no constraints on use of C1 space |
8-bit EBCDIC | a single 8-bit quantity | with the EBCDIC conventions rather than ASCII conventions |
16-bit (UCS-2) | a single 16-bit quantity | within a code space of 0..FFFF |
32-bit (UCS-4) | a single 32-bit quantity | within a code space of 0..7FFFFFFF |
32-bit (UTF-32) | a single 32-bit quantity | within a code space of 0..10FFFF |
16-bit DBCS process code | a single 16-bit quantity | example: UNIX widechar implementations of Asian coded character sets (CCSs) |
32-bit DBCS process code | a single 32-bit quantity | example: UNIX widechar implementations of Asian coded character sets (CCSs) |
DBCS Host | two 8-bit quantities | following IBM host conventions |
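To make the fixed-width idea concrete: every character takes exactly the same number of bytes. The following is a minimal Python sketch, not the conversion code mentioned at the top of the post. The helper names `encode_ucs2` and `encode_utf32` are hypothetical, and big-endian byte order is an assumption, since the table does not specify one. For these inputs the output should match Python's built-in `utf-16-be` and `utf-32-be` codecs.

```python
# Hypothetical helpers (not from the original post): encode text as
# fixed-width, big-endian code units using the code-space limits in the table.
UCS2_MAX = 0xFFFF     # UCS-2: code space 0..FFFF
UTF32_MAX = 0x10FFFF  # UTF-32: code space 0..10FFFF


def encode_ucs2(text: str) -> bytes:
    """Encode each character as exactly one 16-bit quantity (2 bytes)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > UCS2_MAX:
            raise ValueError(f"U+{cp:04X} is outside the UCS-2 code space")
        out += cp.to_bytes(2, "big")
    return bytes(out)


def encode_utf32(text: str) -> bytes:
    """Encode each character as exactly one 32-bit quantity (4 bytes)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Surrogate code points D800..DFFF are not valid UTF-32 code units.
        if cp > UTF32_MAX or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is not encodable in UTF-32")
        out += cp.to_bytes(4, "big")
    return bytes(out)


text = "A\u4e2d"  # 'A' (U+0041) followed by U+4E2D
print(encode_ucs2(text).hex())   # 00414e2d
print(encode_utf32(text).hex())  # 0000004100004e2d
```

A character above U+FFFF (for example U+1F600) raises `ValueError` in `encode_ucs2`, which is exactly the limitation of UCS-2 that UTF-16 surrogate pairs were introduced to work around.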
Examples of variable-width encoding forms:
Name | Characters are encoded as | Notes |
---|---|---|
UTF-8 | a mix of one to four 8-bit code units in Unicode and one to six code units in 10646 | used only with Unicode/10646 |
UTF-16 | a mix of one to two 16-bit code units | used only with Unicode/10646 |
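For the variable-width forms, the number of code units depends on the code point. Below is a rough Python sketch of UTF-8 encoding and decoding plus UTF-16 encoding for single code points. It assumes the Unicode limits (at most U+10FFFF, hence at most four UTF-8 bytes); the five- and six-byte sequences of the original ISO 10646 definition are rejected, as RFC 3629 requires. All function names are hypothetical.

```python
# Sketch: UTF-8 and UTF-16 for single code points, using Unicode limits
# (UTF-8: 1-4 bytes, UTF-16: 1-2 units, maximum U+10FFFF).


def check_scalar(cp: int) -> None:
    """Reject values outside 0..10FFFF and the surrogate range D800..DFFF."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")


def utf8_encode(cp: int) -> bytes:
    """Encode one code point as one to four 8-bit code units."""
    check_scalar(cp)
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx followed by three 10xxxxxx bytes
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def utf8_decode(data: bytes) -> list[int]:
    """Decode UTF-8 into code points, rejecting malformed or overlong input."""
    cps = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            n, cp, lo = 0, b0, 0                 # single byte
        elif 0xC2 <= b0 <= 0xDF:
            # Lead bytes 0xC0/0xC1 could only produce overlong forms.
            n, cp, lo = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            n, cp, lo = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            n, cp, lo = 3, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte {b0:#04x} at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            b = data[i + k]
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < lo:
            raise ValueError(f"overlong encoding at offset {i}")
        check_scalar(cp)  # catches encoded surrogates and values above 10FFFF
        cps.append(cp)
        i += n + 1
    return cps


def utf16_encode(cp: int) -> list[int]:
    """Encode one code point as one or two 16-bit code units."""
    check_scalar(cp)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                   # 20 bits, split into 10 + 10
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


print(utf8_encode(0x4E2D).hex())                 # e4b8ad
print([hex(u) for u in utf16_encode(0x1F600)])   # ['0xd83d', '0xde00']
```

The decoder is deliberately strict: it rejects overlong forms such as `C0 AF`, truncated sequences, and encoded surrogates, which some lenient converters accept.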