Re: Writing to file as UTF8 with BOM ?

  • Subject: Re: Writing to file as UTF8 with BOM ?
  • From: "Mark J. Reed" <email@hidden>
  • Date: Fri, 27 Oct 2006 14:39:07 -0400

On 10/27/06, Yvon Thoraval <email@hidden> wrote:
> I'd like to go further working with UTF-16 to enable working with Asian
> "character" sets (I'm an ancient Chinese reader).

I think you misunderstand the way the UTFs work. Unicode is Unicode is
Unicode. It has exactly the same character repertoire no matter which
transformation format you use: code points run from 0 up to the nicely
palindromic 1,114,111 (U+10FFFF), though most of those are not yet
assigned. UTF-32, UTF-16, UTF-8, even the deprecated UTF-7 for ancient
email gateways; all support the full set of characters. They differ in
ease of processing, compatibility with other character sets, and of
course the amount of space taken up by each character. So which one
you should use depends on how you rank those characteristics in
importance.
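
To make that concrete, here is a minimal Python sketch (Python is my
choice for illustration, not something from this thread). The sample
string is made up: one ASCII letter, one accented Latin letter, one BMP
Han ideogram and one non-BMP Han ideogram. It only checks that each
format can carry all four characters and give them back unchanged.

```python
# Hypothetical sample text: "A", e-acute, a BMP Han ideogram (U+4E2D),
# and a supplementary-plane Han ideogram (U+20000).
sample = "A\u00e9\u4e2d\U00020000"

# Highest code point Unicode defines (U+10FFFF).
MAX_CODE_POINT = 0x10FFFF


def round_trips(text, encoding):
    """Return True if text survives encode + decode in the given format."""
    return text.encode(encoding).decode(encoding) == text


# Every transformation format should carry the same repertoire.
results = {enc: round_trips(sample, enc) for enc in ("utf-8", "utf-16", "utf-32")}

print(MAX_CODE_POINT)
print(results)
```

If any of the formats could not represent one of the characters, the
corresponding entry in results would be False or the encode call would
raise an error.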

In UTF-32, all characters take up four bytes (32 bits) each.  In
UTF-16, most of the assigned characters, which have code points in the
Basic Multilingual Plane range of 0 - 65,535, take up only two bytes
(16 bits), while characters in the mostly-empty-so-far higher planes
with code points greater than 65,535 have to be represented with
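surrogate pairs, taking up four bytes. In UTF-8, characters take up
one byte (ASCII), two bytes (code points up to 2,047, which covers
most other alphabets), three bytes (the rest of the BMP, including
most Han ideograms), or four bytes (everything above the BMP).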
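
A quick way to see those per-character sizes is to encode one character
at a time and count the bytes. This is only a sketch in Python; the
function name encoded_sizes and the four sample characters are my own
picks. It uses the big-endian codec variants so that no BOM is counted.

```python
def encoded_sizes(ch):
    """Bytes needed for one character in each format, without a BOM."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Hypothetical samples: ASCII, Latin-1 range, BMP Han, non-BMP Han.
samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The last sample is the interesting one: above the BMP all three formats
need four bytes per character.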

If you care about size, and if your doc is mostly text in the Roman
alphabet, UTF-8 will be most compact.  If your doc is mostly ancient
Han ideograms, then you might be better off with UTF-16.  If you're
using a lot of the supplementary (non-BMP) Han characters, which
require four bytes in UTF-16 anyway, switching to UTF-32 might not be
much of a space increase...
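
If you want to measure rather than guess, something along these lines
would do it. Again a Python sketch with made-up sample texts; size_report
is a hypothetical helper name, and BOMs are left out of the counts by
using the big-endian codecs.

```python
def size_report(text):
    """Total encoded size in bytes for each format, without a BOM."""
    codecs = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-be"), ("UTF-32", "utf-32-be"))
    return {name: len(text.encode(codec)) for name, codec in codecs}


# Hypothetical documents, one per case discussed above.
roman = "The quick brown fox" * 10   # 190 Roman-alphabet characters
han_bmp = "\u4e2d\u6587" * 100       # 200 BMP Han ideograms
han_sup = "\U00020000" * 100         # 100 supplementary Han ideograms

for label, text in (("roman", roman), ("han_bmp", han_bmp), ("han_sup", han_sup)):
    report = size_report(text)
    smallest = min(report.values())
    # More than one format can tie for smallest, so list them all.
    best = [name for name, size in report.items() if size == smallest]
    print(label, report, "smallest:", best)
```

For real data you would read your own file into text instead of using
these samples; a mixed document will land somewhere between the cases.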

--
Mark J. Reed <email@hidden>