Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
- Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
- From: "Clark S. Cox III" <email@hidden>
- Date: Wed, 04 Sep 2002 09:09:10 -0400
On 09/04/2002 02:45, "Allan Odgaard" <email@hidden> wrote:
> On Tuesday, Sep 3, 2002, at 06:17 Europe/Copenhagen, Andrew Pinski
> wrote:
>
>>> However, most of Europe makes heavy use of accented letters and other
>>> characters, which in iso-8859-1 are placed in the range 160-255.
>> So what if I knew Japanese? What would happen then when you try to
>> convert the string to iso-8859-1, aka Latin 1?
>
> As I said, I only use it when I need a "char *", e.g. for
> sscanf(), regex functions and similar.
>
> I think you'd be pretty screwed here with a UTF-8 string consisting of
> Japanese characters, because the functions look for control codes and
> the like and are not multi-byte aware. Thus they might easily mistake a
> multi-byte sequence for one or more control sequences, or part of a
> multi-byte sequence for the "argument" of a control code, etc.

No, you wouldn't. No byte of a multi-byte UTF-8 character can be confused
with an ASCII character, because every such byte has the high bit set
(ASCII only uses 0x00-0x7F). For instance, there is no way to build a
multi-byte UTF-8 character that looks like "%d".
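The claim above can be checked mechanically. This is a minimal sketch in Python 3 (an assumption; the thread itself is about C `char *` APIs), treating a string as the raw UTF-8 bytes a `char *` function would see. The helper name `utf8_multibyte_is_high_bit_only` and the sample text are hypothetical:

```python
import re


def utf8_multibyte_is_high_bit_only() -> bool:
    """True if every byte of every multi-byte UTF-8 character is >= 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates are not encodable in UTF-8
            continue
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


# Illustrative input: Japanese text around one genuine "%d" directive.
data = "日本語 %d 漢字".encode("utf-8")  # the bytes a char * API would see

# A byte-oriented search (like a scanner that is not multi-byte aware)
# finds only the real directive; no byte inside the Japanese text matches.
directive_offsets = [m.start() for m in re.finditer(rb"%d", data)]

print(utf8_multibyte_is_high_bit_only())  # True
print(directive_offsets)                  # [10]
```

The loop covers every encodable code point above 0x7F, so it checks the property directly rather than relying on a few samples; it may take a second or so to run.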

>>> [...] this has been the de facto standard on all platforms other than
>>> the Mac for the last 10-20 years, and I also believe it to be promoted
>>> somewhere as the internet standard (whatever that means).
>> The Mac used its own encoding because, if I remember correctly, there
>> was no standard when Apple made accents available in 1984.
>
> I don't know the exact date of iso-8859-1, though the first Amiga
> shipped in 1985, and I believe it used that encoding.
>
> However, my statement was not meant to criticize the Mac, merely to
> state that for data that may cross platforms (e.g. some network
> protocols don't allow you to specify an encoding), iso-8859-1 is a
> rather safe bet.
>
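The trade-off in this point can be sketched with Python 3's built-in `latin-1` codec (an illustrative assumption, not a Cocoa API): any byte sequence decodes as iso-8859-1 and round-trips unchanged, which is what makes it a safe fallback when a protocol cannot declare an encoding, but characters outside it, such as the Japanese in the earlier question, cannot be encoded at all. The helper names are hypothetical:

```python
from typing import Optional


def latin1_round_trips_all_bytes() -> bool:
    """True if every byte 0x00-0xFF decodes to a Latin-1 character and back."""
    raw = bytes(range(256))
    return raw.decode("latin-1").encode("latin-1") == raw


def to_latin1(text: str) -> Optional[bytes]:
    """Encode text as iso-8859-1, or return None if it cannot be represented."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


print(latin1_round_trips_all_bytes())                # True
print(to_latin1("café"))                             # b'caf\xe9' (0xE9 = 233)
print(to_latin1("日本語"))                            # None
print("日本語".encode("latin-1", errors="replace"))  # b'???' (lossy)
```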
>> Also, the entire BSD layer in OS X is not geared towards iso-8859-1; it
>> is geared towards ASCII, which is 7-bit, and the BSD layer does not
>> care about what encoding you
>
> Sorry, I really meant many of the tools accompanying the OS, not the
> kernel itself.

Most of the tools are encoding-agnostic. In fact, Terminal.app defaults
to UTF-8.
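A minimal Python 3 sketch of what "encoding-agnostic" means for a byte-oriented tool: the hypothetical `split_fields` helper splits raw bytes on an ASCII separator without decoding them, much as a classic Unix filter would, and each resulting field is still valid UTF-8 because an ASCII byte never appears inside a multi-byte character. The sample record is made up:

```python
def split_fields(line: bytes, sep: bytes = b":") -> list:
    """Split raw bytes on an ASCII separator without decoding anything."""
    return line.split(sep)


# Illustrative record mixing ASCII and Japanese, stored as UTF-8 bytes.
record = "tanaka:田中 太郎:東京".encode("utf-8")
fields = split_fields(record)

# Each field decodes cleanly, because the ':' byte (0x3A) can never occur
# inside a multi-byte UTF-8 character.
decoded = [f.decode("utf-8") for f in fields]
print(decoded)  # ['tanaka', '田中 太郎', '東京']
```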
--
http://homepage.mac.com/clarkcox3/
Clark S. Cox III
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.