Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
From: "Clark S. Cox III" <email@hidden>
Date: Wed, 04 Sep 2002 09:09:10 -0400

On 09/04/2002 02:45, "Allan Odgaard" <email@hidden> wrote:

> On Tuesday, Sep 3, 2002, at 06:17 Europe/Copenhagen, Andrew Pinski
> wrote:
>
>>> However, most of Europe makes heavy use of accented letters and other
>>> characters, which iso-8859-1 places in the range 160-255.
>> So what if I knew Japanese? What would happen then when you try to
>> convert the string to iso-8859-1, aka Latin 1?
>
> As I said, I (only) use it when I need a "char *", e.g. for
> sscanf(), regex functions and similar.
>
> I think you'd be pretty screwed here with a UTF-8 string consisting of
> Japanese characters, because the functions look for control codes and
> similar and are not multi-byte aware; thus they might easily mistake a
> multi-byte sequence for one or more control sequences, or part of a
> multi-byte sequence for the "argument" of a control code, etc.

No, you wouldn't. No byte in a multi-byte UTF-8 character can be confused
with an ASCII character, because every one of those bytes has the high bit
set (0x80-0xFF), while ASCII uses only 0x00-0x7F. For instance, there is no
way to make a multi-byte UTF-8 character that looks like "%d".

>>> [...] this has been the de facto standard on all platforms other than
>>> the Mac for the last 10-20 years, and I also believe it is promoted
>>> somewhere as the internet standard (whatever that means).
>> The Mac used its own encoding because, if I remember correctly, there
>> was no standard when Apple made accents available in 1984.
>
> I don't know the exact date of iso-8859-1, though the first Amiga
> shipped in 1985, and I believe it used that encoding.
>
> However, my statement was not meant to criticize the Mac, but merely to
> state that for data that may cross platforms (e.g. some network
> protocols don't allow you to specify an encoding), iso-8859-1 is
> a rather safe bet.
>
>> Also, the entire BSD layer in OS X is not geared towards iso-8859-1; it
>> is geared towards ASCII, which is 7 bits. Also, the BSD layer does not
>> care about what encoding you
>
> Sorry, I really meant many of the tools accompanying the OS -- not the
> kernel itself.

Most of the tools are encoding-agnostic. In fact, Terminal.app defaults
to UTF-8.

--
http://homepage.mac.com/clarkcox3/
Clark S. Cox III
_______________________________________________
cocoa-dev mailing list | email@hidden
