Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)

Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
From: Chris Hanson <email@hidden>
Date: Wed, 4 Sep 2002 02:44:05 -0500

At 8:45 AM +0200 9/4/02, Allan Odgaard wrote:

> As I said then, I (only) use it when I need a "char *", e.g. for sscanf(), regex functions, and similar.
>
> I think you'd be pretty screwed here with a UTF-8 string consisting of Japanese characters, because these functions look for control codes and similar and are not multi-byte aware. They might easily mistake a multi-byte sequence for one or more control sequences, or part of a multi-byte sequence for the "argument" of a control code, etc.

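No, it won't. Control codes are low ASCII values. Every byte in a multi-byte UTF-8 sequence has its high bit set and is thus 128 or greater, so no software should mistake parts of a multi-byte sequence for control codes or for any other ASCII character. (This is part of why UTF-8 can take up to 3 bytes to represent a single 2-byte Unicode character, and 4 bytes for characters outside the Basic Multilingual Plane.) UTF-8 also never produces a zero byte except for NUL (U+0000) itself, so UTF-8 strings are safe to use with a NUL (ASCII 0) terminator and with the byte-oriented Standard C string functions. Just keep in mind that those functions count and compare bytes, not characters.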
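
The claim about high bits can be checked with a few lines of C. This is only a sketch: the helper names (count_high_bit_bytes, count_code_points, parse_value) and the SAMPLE string are made up for illustration, and the Japanese text is written as escaped bytes so the example does not depend on the source file's encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "日本語=42" as escaped UTF-8 bytes: three 3-byte characters, then ASCII.
   (Hypothetical sample data for this sketch.) */
const char SAMPLE[] = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E=42";

/* Counts the bytes whose high bit is set (values 128..255). In valid UTF-8
   these are exactly the bytes that belong to multi-byte sequences. */
size_t count_high_bit_bytes(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0x80) != 0) {
            n++;
        }
    }
    return n;
}

/* Counts characters (code points) by skipping continuation bytes, which
   always have the bit pattern 10xxxxxx. Assumes s is valid UTF-8. */
size_t count_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Locates the ASCII '=' with strchr() and parses the integer after it with
   sscanf(). Returns 1 on success, 0 on failure. No byte of a multi-byte
   sequence can be mistaken for '=' because all of them are >= 0x80. */
int parse_value(const char *s, int *out)
{
    const char *eq;
    assert(s != NULL && out != NULL);
    eq = strchr(s, '=');
    if (eq == NULL) {
        return 0;
    }
    return sscanf(eq + 1, "%d", out) == 1;
}
```

With these definitions, strlen(SAMPLE) reports 12 (bytes) while count_code_points(SAMPLE) reports 6 (characters), and parse_value(SAMPLE, &n) stores 42 in n. The byte-oriented functions work; they just measure bytes rather than characters.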

The people that designed UTF-8 put quite a bit of thought into it. Unless you have a *very* good reason *not* to use it in a *specific* case, you should use it.

> However, my statement was not meant to criticize the Mac; I was merely stating that for stuff that may cross platforms (e.g. some network protocols don't allow you to specify an encoding scheme), iso-8859-1 is a rather safe bet.

As time goes on, UTF-8 is becoming the encoding of choice. I believe all new Internet protocols, for instance, are required not only to be 8-bit clean but also to support UTF-8 (see RFC 2277 for the IETF's character set policy). And I believe all modern platforms either use Unicode strings natively (and thus support UTF-8 encoding) or have the ability to translate UTF-8 strings into their native encoding.
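
Since this thread is about iso-8859-1 versus UTF-8, it may help to see how small the translation between the two is. The following is a minimal sketch, not a library routine: latin1_to_utf8 is a hypothetical name, and it relies on the fact that ISO-8859-1 byte values map one-to-one onto Unicode code points U+0000 through U+00FF, so every byte above 127 becomes exactly two UTF-8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to UTF-8.
   dst must have room for 2 * strlen(src) + 1 bytes (the worst case).
   Returns the number of bytes written, not counting the terminator. */
size_t latin1_to_utf8(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    assert(dst_size >= 2 * strlen(src) + 1);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            dst[n++] = (char)c;
        } else {
            /* U+0080..U+00FF: lead byte 110xxxxx, continuation 10xxxxxx. */
            dst[n++] = (char)(0xC0 | (c >> 6));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}
```

For example, the 4-byte iso-8859-1 string "caf\xE9" becomes the 5-byte UTF-8 string "caf\xC3\xA9". The reverse direction is only possible when every character in the UTF-8 string is at or below U+00FF, which is one reason to prefer UTF-8 as the interchange form.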

> Sorry, I really meant many of the tools accompanying the OS -- not the kernel itself.

The tools tend not to care about encoding. However, Terminal.app defaults to UTF-8 as of Jaguar (Mac OS X 10.2). So your best bet is to use UTF-8 everywhere except in specific cases where you know you *must* use another encoding.

-- Chris

--
Chris Hanson | Email: email@hidden
bDistributed.com, Inc. | Phone: +1-847-372-3955
Making Business Distributed | Fax: +1-847-589-3738
http://bdistributed.com/ | Personal Email: email@hidden