Re: Weird error
  • Subject: Re: Weird error
  • From: Christopher Nebel <email@hidden>
  • Date: Mon, 29 Sep 2003 11:59:46 -0700

On Sep 24, 2003, at 9:02 AM, Walter Ian Kaye wrote:

>> ... the code that creates the data in the first place? Is it producing
>> old fashioned ASCII text or has it started using UTF-8 or UTF16?

> That's an interesting point. I've been out of the loop for a while,
> how do the text item delimiters handle UTF-8 and UTF-16?

> Apparently they assume a BOM. When you coerce "BLOW, JOE" to Unicode text, it allocates space for a nonexistent BOM. Since Unicode text is UTF-8, there are two separate 8-bit characters for the double-byte BOM... which is nonexistent and so the two characters allocated are represented by empty strings.

> I assume this is done to prevent the loss of the BOM when converting back and forth when a BOM actually does exist, so this is apparently something that you "just have to know" when using Unicode text.

> It might have been smarter of AS to default to a non-empty BOM (FFFE?) if only to have allowed us to figure this out sooner. ;)

Ah, no. First off, how AppleScript stores Unicode internally should be of no consequence to you, since a code point is a code point, no matter which encoding you're using, and AppleScript doesn't allow (direct) access to the bytes of a Unicode string. (However, if you really want to know, it uses UTF-16.)
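
To illustrate the "a code point is a code point" remark, here is a minimal sketch in Python (not AppleScript, which gives no direct access to the bytes). The sample string and variable names are made up for the illustration. It only shows that the same code points come back from two different encodings even though the stored bytes differ.

```python
# Illustrative only: the sample text and names below are assumptions.
text = "BLOW, JOE \u00e9"  # includes one non-ASCII character (U+00E9)

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")  # big-endian, no BOM is written

# The code points of the original string.
code_points = [ord(ch) for ch in text]

# The byte sequences differ between encodings...
bytes_differ = utf8_bytes != utf16_bytes

# ...but decoding either one yields the same code points.
from_utf8 = [ord(ch) for ch in utf8_bytes.decode("utf-8")]
from_utf16 = [ord(ch) for ch in utf16_bytes.decode("utf-16-be")]
```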

Second, AppleScript does not store a BOM internally. When receiving data from the outside world, it looks for a BOM and removes it (byte-swapping the data as necessary, of course). Incidentally, a BOM in UTF-8 is three bytes, not two (0xEF 0xBB 0xBF, to be precise). It's obviously pointless for the purposes of byte-ordering, since UTF-8 is immune to that sort of thing, but it's sometimes used as a tag ("magic") to indicate that the data is UTF-8.
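
The "look for a BOM and remove it" step could be sketched roughly as follows. This is Python, not AppleScript's actual implementation, and the function name `decode_stripping_bom` and its `default` parameter are hypothetical. It handles only the UTF-8 and UTF-16 BOMs; UTF-32 is deliberately left out to keep the sketch short.

```python
import codecs

# (BOM bytes, codec to use once the BOM is stripped), checked in order.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF (three bytes)
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def decode_stripping_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode incoming bytes, dropping a leading BOM if one is present.

    Hypothetical helper: if no BOM is found, fall back to `default`.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            # The BOM only tells us how to read the data; it is not kept.
            return data[len(bom):].decode(codec)
    return data.decode(default)
```

With this sketch, the decoded string never contains the BOM, which matches the point above that nothing BOM-related is stored once the data has been read in.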

The business with the misbehaving text item delimiters is simply a bug that had to do with some bad math -- it assumed that a delimiter string would always have at least one character. Nothing to do with BOMs. This will be fixed in a future release.
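
As a rough illustration of that kind of bug (in Python, and not the actual AppleScript code), consider a hand-written splitter that advances by the delimiter's length. The function name `split_items` is hypothetical. Without the guard at the top, an empty delimiter would match at every position without ever advancing, so the loop would never finish. Returning the whole text as a single item for an empty delimiter is an arbitrary choice for this sketch, not a claim about what AppleScript does.

```python
def split_items(text: str, delimiter: str) -> list:
    """Split text on delimiter, guarding against a zero-length delimiter."""
    if len(delimiter) == 0:
        # Guard: the loop below advances by len(delimiter), so an empty
        # delimiter must be handled separately. Sketch-only behaviour.
        return [text]

    items = []
    start = 0
    while True:
        hit = text.find(delimiter, start)
        if hit == -1:
            # No more delimiters: the remainder is the last item.
            items.append(text[start:])
            return items
        items.append(text[start:hit])
        start = hit + len(delimiter)
```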


--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.

  • Follow-Ups:
    • Re: Weird error
      • From: Jean-Baptiste <email@hidden>
References: 
 >Re: Weird error (From: John Fowler <email@hidden>)
 >Re: Weird error (From: Walter Ian Kaye <email@hidden>)
