Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)

Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
From: Malte Tancred <email@hidden>
Date: Thu, 5 Sep 2002 09:59:12 +0200

On wednesday, sep 4, 2002, at 15:09 Europe/Stockholm, Clark S. Cox III wrote:

> No you wouldn't. There is no way that any byte in a multi-byte UTF-8
> character could be confused for an ASCII character, because they always have
> the high bit set. For instance, there is no way that you can make a
> multi-byte UTF-8 character that looks like "%d".

I believe there is something called "overlong representation". For example, a slash (/, 0x2F) can be represented by a 2-byte UTF-8 sequence (0xC0 0xAF) or a 3-byte one (0xE0 0x80 0xAF), even though a single byte is enough. The encoding algorithm per se allows this.

This behavior is forbidden, though; the restriction was published in an amendment to the original spec, I think.

From what I've read, that's why it's important that any UTF-8 decoder disallows overlong sequences.
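
To make the overlong case concrete, here is a minimal sketch in Python. The function name decode_sequence and its strict flag are hypothetical, made up for illustration; it only handles sequences of one to three bytes and does not check for surrogates, so it is not a complete decoder. With strict=False it follows the bit-packing algorithm naively; with strict=True it also enforces the shortest-form rule.

```python
def decode_sequence(data: bytes, strict: bool = True) -> int:
    """Decode a single UTF-8 sequence (1 to 3 bytes) to a code point.

    Hypothetical helper for illustration only: no 4-byte sequences,
    no surrogate checks.
    """
    first = data[0]
    if first < 0x80:
        # 0xxxxxxx: plain ASCII
        length, value, minimum = 1, first, 0x00
    elif (first & 0xE0) == 0xC0:
        # 110xxxxx 10xxxxxx: shortest form covers U+0080 and up
        length, value, minimum = 2, first & 0x1F, 0x80
    elif (first & 0xF0) == 0xE0:
        # 1110xxxx 10xxxxxx 10xxxxxx: shortest form covers U+0800 and up
        length, value, minimum = 3, first & 0x0F, 0x800
    else:
        raise ValueError("unsupported or invalid lead byte")

    if len(data) != length:
        raise ValueError("wrong sequence length")

    for byte in data[1:]:
        if (byte & 0xC0) != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)

    # The shortest-form rule: reject a value that fits in fewer bytes.
    if strict and value < minimum:
        raise ValueError("overlong sequence")
    return value


# A naive decoder turns the 3-byte form into a slash ...
print(hex(decode_sequence(b"\xe0\x80\xaf", strict=False)))
# ... while a strict one refuses it.
try:
    decode_sequence(b"\xe0\x80\xaf")
except ValueError as exc:
    print("rejected:", exc)
```

Note that every byte of the overlong forms still has the high bit set, so a scan of the raw bytes never sees 0x2F. The slash only appears after a lenient decoder has run, which is why a check done on the bytes before decoding can be bypassed.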

A document I've found interesting is http://www.cl.cam.ac.uk/~mgk25/unicode.html .

Anyway, I don't know if this applies to the discussion, but it seems to me that if a tool knows nothing about UTF-8, and there is a chance it will handle data that someone else has encoded as UTF-8, the caller should take some precautions.

Perhaps not in all cases, but for internet server programs that could compromise the security of the server if taken over by a malicious user, it might be a good idea?
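
One such precaution could look like the sketch below: the caller validates untrusted bytes with a strict decoder before handing them to the UTF-8-unaware tool. The function name checked_utf8 is hypothetical; the only assumption about the platform is that Python 3's built-in "utf-8" codec rejects malformed input, including overlong sequences.

```python
def checked_utf8(data: bytes) -> bytes:
    """Return data unchanged if it is well-formed UTF-8, else raise ValueError.

    Hypothetical helper: relies on Python 3's strict built-in decoder.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"rejecting malformed UTF-8: {exc.reason}") from exc
    return data


# Well-formed input passes through untouched.
print(checked_utf8("/tmp/caf\u00e9".encode("utf-8")))

# An overlong slash is stopped before it reaches the byte-oriented tool.
try:
    checked_utf8(b"..\xe0\x80\xafetc")
except ValueError as exc:
    print(exc)
```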

Cheerio,
Malte
