[Coco] Updated accurate memory map [forward]

Nathan Woods npwoods at cybercom.net
Sat Jul 10 22:45:09 EDT 2004


KnudsenMJ at aol.com wrote:

>Does UTF-8 use 16 bits for extended characters, or is it just the unused 
>upper 128 codes of ASCII (which you can get by holding ALT while typing 3 or 4 
>digits)?  I guess the "8" answers that :-)
>Thanks, Mike K.
>
UTF-8 encodes like this:

Unicode chars $00000000 - $0000007F: 0xxxxxxx
Unicode chars $00000080 - $000007FF: 110xxxxx 10xxxxxx
Unicode chars $00000800 - $0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
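Unicode chars $00010000 - $0010FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The original scheme went on to five- and six-byte forms, but RFC 3629 
caps UTF-8 at $10FFFF, so four bytes is now the maximum.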
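
To make the bit patterns above concrete, here is a rough Python sketch of 
an encoder for a single code point.  The function name utf8_encode is 
just something I made up for illustration, and the sketch assumes the 
modern limits (nothing above $10FFFF, no surrogates $D800 - $DFFF), which 
is stricter than the raw bit patterns would allow.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point by hand, following the bit patterns.

    Hypothetical helper for illustration; assumes the modern range limit
    of 0x10FFFF and rejects surrogates, as current UTF-8 rules require.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point: %r" % (cp,))
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One sample from each row of the table: "!", e-acute, euro sign, U+1F600.
for cp in (0x21, 0xE9, 0x20AC, 0x1F600):
    print("$%06X -> %s" % (cp, utf8_encode(cp).hex(" ")))
```

Python's built-in chr(cp).encode("utf-8") does the same job; the 
hand-rolled version is only there to show where the bits go.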

Since nearly all characters in everyday use in living languages have a 
char value of $FFFF or less, in practice extended characters will usually 
get either two or three bytes.  For a programmer, the nice thing about UTF-8 is that 
apps can process it in the same way that one processes other 8-bit 
encodings.  In UTF-8, $21 will always be an exclamation point, because 
every byte of a multibyte character has its high bit set.  In other 
encodings such as Shift-JIS, a byte in the ASCII range (for example $5C) 
can be the second byte in a multibyte character.
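
A quick way to see the difference is to look for ASCII-range bytes 
hiding inside multibyte characters.  The Python sketch below is only an 
illustration: the helper name ascii_bytes_in_multibyte is made up, and 
the sample uses the katakana character U+30BD, whose Shift-JIS encoding 
is $83 $5C.

```python
def ascii_bytes_in_multibyte(text: str, encoding: str) -> list:
    """Return the ASCII-range bytes (< 0x80) found inside multibyte chars.

    Hypothetical helper for illustration only.
    """
    found = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            # Only bytes that are part of a multibyte character count.
            found.extend(b for b in encoded if b < 0x80)
    return found


sample = "\u30bd!"  # katakana SO followed by an exclamation point

utf8_hits = ascii_bytes_in_multibyte(sample, "utf-8")
sjis_hits = ascii_bytes_in_multibyte(sample, "shift_jis")

print("UTF-8:    ", [hex(b) for b in utf8_hits])   # []
print("Shift-JIS:", [hex(b) for b in sjis_hits])   # ['0x5c']
```

So a byte-at-a-time search for a backslash is safe on UTF-8 text, but on 
Shift-JIS text it can land in the middle of a character.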




