[Coco] Updated accurate memory map [forward]
Nathan Woods
npwoods at cybercom.net
Sat Jul 10 22:45:09 EDT 2004
KnudsenMJ at aol.com wrote:
>Does UTF-8 use 16 bits for extended characters, or is it just the unused
>upper 128 codes of ASCII (which you can get by holding ALT while typing 3 or 4
>digits)? I guess the "8" answers that :-)
>Thanks, Mike K.
>
UTF-8 encodes like this:
Unicode chars $00000000 - $0000007F: 0xxxxxxx
Unicode chars $00000080 - $000007FF: 110xxxxx 10xxxxxx
Unicode chars $00000800 - $0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
Unicode chars $00010000 - $001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
...and so on
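To make the bit patterns above concrete, here is a minimal sketch of an encoder in Python. The function name utf8_encode is made up for illustration, and it only covers the four rows listed in the table; it does no validation beyond the range check (for example, it does not reject surrogate values).

```python
def utf8_encode(cp):
    """Encode one Unicode char value as UTF-8 bytes, per the table above.

    Hypothetical helper for illustration; handles the 1- to 4-byte rows only.
    """
    if cp < 0:
        raise ValueError("char value must be non-negative")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x1FFFFF:
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("char value outside the ranges shown in the table")


# Compare against Python's built-in encoder for one value per row.
for cp in (0x21, 0xE9, 0x20AC, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```

Each continuation byte carries six payload bits, which is why the shifts step by 6 and the mask is $3F.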
Since nearly all characters in living languages in everyday use have a
char value of $FFFF or less, in practice extended characters will
usually get either two or three bytes (anything above $FFFF gets four).
For a programmer, the thing that is nice about UTF-8 is that apps can
process it in much the same way that one processes other 8-bit
encodings. Every byte of a multibyte UTF-8 character has its high bit
set, so in UTF-8, $21 will always be an exclamation point, whereas in
other encodings such as Shift-JIS, an ASCII-range byte such as $5C
could be the second byte in a multibyte character.
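A quick way to see that property is to use Python's built-in codecs, as in the rough sketch below. The helper name low_bytes and the sample string are made up for illustration. For the Shift-JIS side the example uses byte $5C, since Shift-JIS second bytes fall in the ranges $40-$7E and $80-$FC.

```python
def low_bytes(data):
    """Return only the bytes below 0x80 (the ASCII range) from data."""
    return bytes(b for b in data if b < 0x80)


# Sample text: ASCII, e-acute, the euro sign, and katakana SO.
text = "Hi! \u00e9 \u20ac \u30bd"

# In UTF-8, the only low bytes in the stream are the ASCII characters
# themselves; bytes of multibyte characters are all 0x80 or above.
utf8 = text.encode("utf-8")
ascii_only = "".join(c for c in text if ord(c) < 0x80).encode("ascii")
assert low_bytes(utf8) == ascii_only
assert utf8.count(b"!") == 1  # 0x21 shows up only for the real "!"

# In Shift-JIS, katakana SO is the two bytes 0x83 0x5C, so a naive
# byte-wise search for backslash (0x5C) gets a false hit.
sjis = "\u30bd".encode("shift_jis")
assert sjis == b"\x83\x5c"
assert b"\\" in sjis
```

This is why byte-oriented code that searches for ASCII delimiters (path separators, quotes, newlines) keeps working on UTF-8 without knowing about character boundaries.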