Re: String Encoding Detection (Revisited)

Subject: Re: String Encoding Detection (Revisited)
From: David Elliott <email@hidden>
Date: Fri, 8 Aug 2003 15:17:08 -0400

On Friday, August 8, 2003, at 08:27 AM, Clark S. Cox III wrote:

On Thursday, August 07, 2003, at 23:39, Francisco Tolmasky wrote:

How do I determine if the data is big endian or little endian? Or is just checking for both FEFF and FFFE enough? Also, is there no big/little endian difference in UTF-8 mode? (Do I check for any of "EF BB BF", or for all of those one after the other?)

Yes, checking the BOM (if it exists) will tell you which byte order the data is in; that's all you need. And yes, there is no endian difference in UTF-8, as it is a byte-oriented encoding with 8-bit code units: its BOM is always the three bytes EF BB BF, in that order.
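
To make the check concrete, here is a minimal sketch in Python (not Cocoa code, and not from the original thread). The function name `sniff_bom` and its return shape are made up for illustration. It looks only at the leading bytes and deliberately ignores UTF-32, whose little-endian BOM (FF FE 00 00) also begins with FF FE.

```python
import codecs

def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length) for a leading BOM, or (None, 0).

    Hypothetical helper: a BOM is only a hint, not proof of the encoding.
    """
    # UTF-8: all three bytes, in this fixed order (EF BB BF).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    # UTF-16 big endian: FE FF.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    # UTF-16 little endian: FF FE (same code point, bytes swapped).
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    return None, 0
```

The returned length tells the caller how many bytes to skip before decoding the rest with the named encoding.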

Furthermore, you should NOT assume that a file beginning with EF BB BF is necessarily UTF-8. Those three bytes are a valid three-character string in any normal 8-bit encoding. If you do determine that the file is UTF-8, then you can go ahead and remove the BOM if you wish. And because certain badly behaved editors add a BOM to UTF-8, you must be prepared to find one there. As someone else mentioned, the Unicode people consider it a really bad idea to use a BOM in UTF-8, for this and other reasons.
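
One way to act on this, sketched in Python rather than Cocoa: ignore the BOM when deciding, confirm the guess by strictly validating the whole buffer as UTF-8, and only then drop a leading BOM. The name `decode_if_utf8` is hypothetical. Passing validation is still only strong evidence, since a short 8-bit text can happen to be well-formed UTF-8.

```python
def decode_if_utf8(data: bytes):
    """Return the decoded text if data is well-formed UTF-8, else None.

    Hypothetical helper: a leading BOM is stripped only after validation.
    """
    try:
        # Strict decoding rejects any byte sequence that is not valid UTF-8.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
    # The plain "utf-8" codec keeps the BOM as U+FEFF, so remove it here.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

For example, the bytes EF BB BF E9 74 E9 (the Latin-1 text "ï»¿été") start with the UTF-8 BOM but fail validation, so the function returns None and the caller can fall back to an 8-bit encoding.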

-Dave
_______________________________________________
cocoa-dev mailing list | email@hidden
