Re: Weird error


  • Subject: Re: Weird error
  • From: Jean-Baptiste <email@hidden>
  • Date: Tue, 30 Sep 2003 23:09:12 +0200

I was aware of big-endian vs. little-endian, but not of the 'BOM' (now I am). Thanks for the explanation.

@+ JB

On Tuesday, 30 Sep 2003, at 20:03 Europe/Paris, Christopher Nebel wrote:

On Sep 29, 2003, at 1:57 PM, Steve Mills wrote:

On Monday, Sep 29, 2003, at 14:57 US/Central, Jean-Baptiste wrote:

Could someone explain to me what a 'BOM' is ?

Byte Order Mark. It indicates the order of the bytes in each Unicode character, and it's at the front of most Unicode files. It's either 0xfffe or 0xfeff, which means the bytes are in DOS order or Mac order respectively. As Unicode, the ASCII character 'A' would look like 0x0041 on a Mac and 0x4100 on DOS.

Steve's explanation obscures a few details, but is essentially correct. For the terminally fussy:

"BOM" does indeed stand for "Byte Order Mark". The problem is that (a) all Unicode encodings other than UTF-8 use multi-byte integers for their code units -- for instance, UTF-16 uses 16-bit integers, and (b) different computers use different byte orders for multi-byte integers. [1] Therefore, if you try to send data from one part of the "world" to another, the receiver may interpret the bytes the other way around and get gibberish.
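To make the "gibberish" concrete, here is a minimal Python sketch (Python is my choice for illustration; nothing in this thread uses it). It encodes 'A' as UTF-16 in both byte orders, then decodes the big-endian bytes as if they were little-endian. The variable names are made up for the example.

```python
# Encode the same character in both UTF-16 byte orders (no BOM is written
# when the byte order is named explicitly in the codec).
text = "A"                          # code point U+0041
big = text.encode("utf-16-be")      # bytes 0x00 0x41
little = text.encode("utf-16-le")   # bytes 0x41 0x00

print(big.hex())     # 0041
print(little.hex())  # 4100

# A receiver that assumes the wrong byte order reads 0x4100 instead of 0x0041
# and ends up with a completely different character.
wrong = big.decode("utf-16-le")
print(hex(ord(wrong)))  # 0x4100
```

With a single 'A' the damage is one wrong character; with a whole file, every code unit is swapped the same way.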

The solution is the BOM: it's a special "character" at the front of the text. Its code point is 0xfeff, so if you see 0xfffe (which is defined to be illegal, so it'll never show up under any other circumstances), then you know you've got things the wrong way around.
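As a rough sketch of that check, again in Python and assuming the data is already known to be UTF-16: look at the first two bytes and compare them against the two possible BOM byte sequences. The function name detect_utf16_order is hypothetical; the BOM constants come from Python's standard codecs module.

```python
import codecs

def detect_utf16_order(data: bytes) -> str:
    """Return 'big', 'little', or 'unknown' based on a leading UTF-16 BOM.

    Assumes the data is UTF-16; other encodings are not considered here.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes 0xfe 0xff
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes 0xff 0xfe
        return "little"
    return "unknown"

# Build a little-endian sample by hand: BOM first, then the text.
sample = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
print(detect_utf16_order(sample))   # little

# Python's generic "utf-16" codec does the same check itself and strips the BOM.
print(sample.decode("utf-16"))      # A
```

Text with no BOM comes back as 'unknown', in which case the receiver has to learn the byte order some other way.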


--Chris Nebel
AppleScript Engineering

[1] There are only two orders anyone uses: big-endian and little-endian. (See <http://info.astrian.net/jargon/terms/b/big-endian.html>.) Motorola and PowerPC use the former, Intel uses the latter. Say you've got a 2-byte integer, 0x1234. That's two bytes, 0x12 and 0x34. On a big-endian system, the bytes show up in memory most-significant (i.e., the big end) first, so you get 0x12 0x34. On a little-endian system, it's the other way around: 0x34 0x12.
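
The same 0x1234 example can be tried in Python with the built-in int.to_bytes and int.from_bytes. This is only an illustration; the variable names are invented.

```python
value = 0x1234

# Lay the 16-bit integer out in memory in each byte order.
big = value.to_bytes(2, byteorder="big")        # 0x12 first, then 0x34
little = value.to_bytes(2, byteorder="little")  # 0x34 first, then 0x12

print(big.hex())     # 1234
print(little.hex())  # 3412

# Reading big-endian bytes with the little-endian rule swaps the two halves.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))  # 0x3412
```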
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.
