Re: Writing to file as UTF8 with BOM ?
- Subject: Re: Writing to file as UTF8 with BOM ?
- From: "Mark J. Reed" <email@hidden>
- Date: Fri, 27 Oct 2006 14:39:07 -0400
On 10/27/06, Yvon Thoraval <email@hidden> wrote:
I'd like to go further and work with UTF-16, to be able to handle Asian
"character" sets (I read ancient Chinese).
I think you misunderstand the way UTFs work. Unicode is Unicode is
Unicode. It has exactly the same character repertoire no matter what
transformation format you use: 1,114,112 code points, numbered 0
through the nicely palindromic 1,114,111 (hex 10FFFF), though most of
those are not yet assigned. UTF-32, UTF-16, UTF-8, even the deprecated
UTF-7 for ancient email gateways; all support the full set of
characters. They differ in ease of processing, compatibility with
other character sets, and of course the amount of space taken up by
each character. So which you should use depends on how you rank those
characteristics in importance.
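To make that concrete, here is a minimal Python sketch (Python is my choice for illustration, it is not from the original thread). The sample string and the helper name round_trips are made up for this example; the codec names are the ones in Python's standard library. It encodes the same four characters in each format and checks that decoding gives back the identical string:

```python
# Illustrative sample: ASCII "A", accented Latin U+00E9,
# a BMP Han ideogram U+4E2D, and a supplementary Han ideogram U+20000.
sample = "A\u00e9\u4e2d\U00020000"

def round_trips(text, encoding):
    """Return True if text survives an encode/decode cycle unchanged."""
    return text.encode(encoding).decode(encoding) == text

# Every transformation format covers the whole code point range.
encodings = ["utf-8", "utf-16-le", "utf-32-le", "utf-7"]
results = {enc: round_trips(sample, enc) for enc in encodings}

# Code points run from 0 through 0x10FFFF inclusive.
max_code_point = 0x10FFFF
total_code_points = max_code_point + 1

for enc, ok in results.items():
    print(enc, "round-trips:", ok)
print("highest code point:", max_code_point)
print("number of code points:", total_code_points)
```

The "-le" variants are used only so that Python does not prepend a byte order mark; the point is that the choice of UTF changes the bytes, not which characters you can store.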
In UTF-32, all characters take up four bytes (32 bits) each. In
UTF-16, most of the assigned characters, which have code points in the
Basic Multilingual Plane range of 0 - 65,535, take up only two bytes
(16 bits), while characters in the mostly-empty-so-far higher planes
with code points greater than 65,535 have to be represented with
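surrogate pairs, taking up four bytes. In UTF-8, characters take up
one byte (ASCII, up to 127), two bytes (up to 2,047), three bytes (the
rest of the Basic Multilingual Plane), or four bytes (everything above
65,535).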
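The per-character sizes can be checked directly. This is a small Python sketch under my own assumptions: the helper name encoded_sizes and the four sample characters are picked for illustration only. It also pulls apart the UTF-16 surrogate pair for one character above 65,535:

```python
def encoded_sizes(ch):
    """Return the number of bytes one character needs in each format."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(ch.encode("utf-32-le")),
    }

# Hypothetical sample characters, one per UTF-8 length class.
samples = {
    "A": "U+0041 (ASCII)",
    "\u00e9": "U+00E9 (accented Latin)",
    "\u4e2d": "U+4E2D (BMP Han ideogram)",
    "\U00020000": "U+20000 (supplementary Han ideogram)",
}

for ch, label in samples.items():
    print(label, encoded_sizes(ch))

# A code point above 65,535 becomes two 16-bit units in UTF-16.
units = "\U00020000".encode("utf-16-be")
high = int.from_bytes(units[:2], "big")
low = int.from_bytes(units[2:], "big")
print("surrogate pair for U+20000:", hex(high), hex(low))
```

For U+20000 all three formats need four bytes, and the two UTF-16 units fall in the surrogate ranges D800-DBFF and DC00-DFFF.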
If you care about size, and if your doc is mostly text in the Roman
alphabet, UTF-8 will be most compact, at one byte per unaccented
letter. If your doc is mostly ancient Han ideograms, then you might be
better off with UTF-16, since ideograms in the BMP take two bytes
there but three in UTF-8. If you're using a lot of the supplementary
(non-BMP) Han characters, which require four bytes in UTF-16 anyway,
switching to UTF-32 might not be much of a space increase...
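If you want to measure rather than guess, a rough Python sketch like the following compares the three formats on sample text. The function name size_report and the three sample strings are assumptions chosen for illustration; substitute a representative chunk of your own document:

```python
def size_report(text):
    """Return the encoded size in bytes of text in each format."""
    return {
        enc: len(text.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }

# Hypothetical samples standing in for three kinds of document.
latin = "The quick brown fox" * 10
# Seven classical Chinese characters, all inside the BMP.
han = "\u5b50\u66f0\u5b78\u800c\u6642\u7fd2\u4e4b" * 10
# Two Han ideograms from the supplementary plane (U+20000, U+20001).
supplementary = "\U00020000\U00020001" * 10

for name, text in [("latin", latin), ("han", han),
                   ("supplementary", supplementary)]:
    report = size_report(text)
    smallest = min(report.values())
    print(name, report, "smallest:", smallest)
```

On these samples UTF-8 is smallest for the Latin text, UTF-16 is smallest for the BMP Han text, and all three formats come out the same size for the purely supplementary text. Real documents mix these cases (spaces, punctuation, markup), so the measured numbers for your own text are what count.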
--
Mark J. Reed <email@hidden>
_______________________________________________
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/mailman//archives/applescript-users