Re: Weird error
- Subject: Re: Weird error
- From: Jean-Baptiste <email@hidden>
- Date: Mon, 29 Sep 2003 21:57:12 +0200
Could someone explain to me what a 'BOM' is?
JB
On Monday, 29 Sep 2003, at 20:59 Europe/Paris, Christopher Nebel wrote:
On Sep 24, 2003, at 9:02 AM, Walter Ian Kaye wrote:
>> ... the code that creates the data in the first place? Is it producing
>> old-fashioned ASCII text or has it started using UTF-8 or UTF-16?
That's an interesting point. I've been out of the loop for a while;
how do the text item delimiters handle UTF-8 and UTF-16?
Apparently they assume a BOM. When you coerce "BLOW, JOE" to Unicode
text, it allocates space for a nonexistent BOM. Since Unicode text is
UTF-8, there are two separate 8-bit characters for the double-byte
BOM... which is nonexistent and so the two characters allocated are
represented by empty strings.
I assume this is done to prevent the loss of the BOM when converting
back and forth when a BOM actually does exist, so this is apparently
something that you "just have to know" when using Unicode text.
It might have been smarter of AS to default to a non-empty BOM
(FFFE?) if only to have allowed us to figure this out sooner. ;)
Ah, no. First off, how AppleScript stores Unicode internally should
be of no consequence to you, since a code point is a code point, no
matter which encoding you're using, and AppleScript doesn't allow
(direct) access to the bytes of a Unicode string. (However, if you
really want to know, it uses UTF-16.)
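To illustrate the point that a code point is a code point regardless of encoding, here is a minimal Python sketch (Python is used only for illustration; AppleScript itself gives no access to these bytes). The same string is encoded as UTF-8 and as UTF-16 big-endian: the bytes differ, but decoding either one gives back the identical code points.

```python
# The same string in two encodings. The byte sequences differ, but
# decoding either one yields the identical sequence of code points.
text = "Caf\u00e9"  # U+00E9 is LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = text.encode("utf-8")       # variable width: the e-acute needs 2 bytes
utf16_bytes = text.encode("utf-16-be")  # 2 bytes per BMP character, no BOM written

print(utf8_bytes.hex(" "))   # 43 61 66 c3 a9
print(utf16_bytes.hex(" "))  # 00 43 00 61 00 66 00 e9

# Same code points either way.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == text
print([f"U+{ord(c):04X}" for c in text])  # ['U+0043', 'U+0061', 'U+0066', 'U+00E9']
```

Whether the interpreter stores UTF-8 or UTF-16 internally is therefore invisible at the level of characters, which is the level a script works at.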
Second, AppleScript does not store a BOM internally. When receiving
data from the outside world, it looks for a BOM and removes it
(swapping the data as necessary, of course). Incidentally, a BOM in
UTF-8 is three bytes, not two. (0xefbbbf, to be precise.) It's
obviously pointless for the purposes of byte ordering, since UTF-8 is
immune to that sort of thing, but it's sometimes used as a tag
("magic") to indicate that the data is UTF-8.
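The "look for a BOM, remove it, swap as necessary" step can be sketched in Python as below. This is a hypothetical helper written for illustration, not AppleScript's actual implementation; the BOM byte values come from Python's standard `codecs` module, and UTF-32 BOMs are left out for brevity. Decoding with an explicit big- or little-endian codec is what performs the byte swapping.

```python
import codecs

# Known byte order marks and the codec to use for the data that follows.
# UTF-32 is omitted for brevity (its LE mark begins with the UTF-16 LE mark).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF: only a tag, no byte order
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: detect a leading BOM, drop it, decode the rest.

    The explicit -be/-le codec handles any byte swapping, so the BOM
    never ends up in the resulting string.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)  # no BOM: fall back to an assumed encoding

print(codecs.BOM_UTF8.hex())                        # efbbbf
print(decode_with_bom(b"\xef\xbb\xbfJOE"))          # JOE
print(decode_with_bom(b"\xfe\xff\x00J\x00O\x00E"))  # JOE
print(decode_with_bom(b"\xff\xfeJ\x00O\x00E\x00"))  # JOE
```

All three inputs carry different BOMs and byte orders, yet each decodes to the same three-character string with no BOM left in it.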
The business with the misbehaving text item delimiters is simply a bug
that had to do with some bad math -- it assumed that a delimiter
string would always have at least one character. Nothing to do with
BOMs. This will be fixed in a future release.
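As a rough illustration of the kind of bug described (arithmetic that assumes a delimiter is at least one character long), here is a hypothetical Python sketch of splitting text on a delimiter. It is not AppleScript's code. In the general loop, the scan advances by `len(delimiter)`; with an empty delimiter that advance would be zero, so the empty case is handled explicitly, assumed here to produce one item per character.

```python
from typing import List

def text_items(s: str, delimiter: str) -> List[str]:
    """Hypothetical sketch of splitting a string on a text item delimiter.

    The empty delimiter is special-cased (one item per character); without
    the guard, find() would keep matching at `start` and the loop below
    would never advance.
    """
    if delimiter == "":
        return list(s)
    items = []
    start = 0
    while True:
        i = s.find(delimiter, start)
        if i == -1:
            items.append(s[start:])  # remainder after the last delimiter
            return items
        items.append(s[start:i])
        start = i + len(delimiter)   # only advances because len(delimiter) >= 1

print(text_items("BLOW, JOE", ", "))  # ['BLOW', 'JOE']
print(text_items("JOE", ""))          # ['J', 'O', 'E']
```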
--Chris Nebel
AppleScript Engineering
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.