Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
- Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
- From: Malte Tancred <email@hidden>
- Date: Thu, 5 Sep 2002 09:59:12 +0200
On Wednesday, Sep 4, 2002, at 15:09 Europe/Stockholm, Clark S. Cox III wrote:
> No you wouldn't. There is no way that any byte in a multi-byte UTF-8
> character could be confused for an ASCII character, because they always
> have the high bit set. For instance, there is no way that you can make a
> multi-byte UTF-8 character that looks like "%d".
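The quoted claim can be checked with a few lines of Python. This is only an illustrative sketch: the function name is made up for this example, and it relies on Python's built-in UTF-8 encoder producing well-formed output.

```python
def multibyte_bytes_are_non_ascii(text: str) -> bool:
    """Return True if every byte of every multi-byte character has the high bit set."""
    for ch in text:
        encoded = ch.encode("utf-8")
        # Single-byte characters are plain ASCII; only multi-byte ones matter here.
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


# U+00E9 (2 bytes), U+20AC (3 bytes) and U+1F600 (4 bytes), mixed with ASCII "%d".
sample = "\u00e9 \u20ac \U0001F600 %d"
print(multibyte_bytes_are_non_ascii(sample))
```

Note that this only covers well-formed UTF-8 as produced by a conforming encoder.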
I believe there is something called an "overlong representation". For
example, a slash (/), normally the single byte 0x2F, can also be written
as a 3-byte UTF-8 sequence. The encoding algorithm per se allows this.
Such sequences are forbidden, though, by an amendment to the original
spec, I think.
From what I've read, that's why it's important that any UTF-8 decoder
disallows overlong sequences.
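To make the overlong case concrete, here is a minimal sketch in Python. The names naive_decode, is_overlong and MIN_CODE_POINT are invented for this example. The naive decoder applies only the bit-shuffling part of UTF-8, so it happily turns the 3-byte sequence E0 80 AF into a slash; the check then compares the result with the smallest code point that really needs that many bytes.

```python
# Smallest code point that legitimately needs a sequence of each length.
MIN_CODE_POINT = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def naive_decode(seq: bytes) -> int:
    """Decode one UTF-8-shaped sequence without any overlong check."""
    first = seq[0]
    if first < 0x80:
        length, value = 1, first
    elif first & 0xE0 == 0xC0:
        length, value = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:
        length, value = 3, first & 0x0F
    elif first & 0xF8 == 0xF0:
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != length:
        raise ValueError("wrong sequence length")
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        # Each continuation byte contributes its low six bits.
        value = (value << 6) | (b & 0x3F)
    return value


def is_overlong(seq: bytes) -> bool:
    """True if the sequence uses more bytes than its code point requires."""
    return naive_decode(seq) < MIN_CODE_POINT[len(seq)]


overlong_slash = b"\xe0\x80\xaf"
print(hex(naive_decode(overlong_slash)))  # 0x2f, the code point of "/"
print(is_overlong(overlong_slash))        # True
print(is_overlong(b"/"))                  # False
```

Every byte of the overlong form still has the high bit set, so a tool that only compares raw bytes will not see a slash there. The danger is a lenient decoder that runs after such a comparison and produces the slash anyway. Python's own strict decoder rejects the sequence: b"\xe0\x80\xaf".decode("utf-8") raises UnicodeDecodeError.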
A document I've found interesting is
http://www.cl.cam.ac.uk/~mgk25/unicode.html .
Anyway, I don't know if this applies to the discussion, but it seems to
me that if a tool knows nothing about UTF-8, and there is a chance that
it will handle UTF-8 data produced by someone else, the caller should
take some precautions.
Not in all cases perhaps, but for internet server programs that could
compromise the security of the server if taken over by a malicious user,
it might be a good idea?
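One possible form of such a precaution, sketched in Python: validate the bytes strictly before passing them on, and refuse anything malformed. The helper names is_strict_utf8 and hand_off are hypothetical, and "tool" stands for whatever UTF-8-unaware routine would receive the data.

```python
from typing import Callable


def is_strict_utf8(data: bytes) -> bool:
    """True only for well-formed UTF-8; Python's decoder rejects overlong forms."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def hand_off(data: bytes, tool: Callable[[bytes], None]) -> None:
    """Pass data to a UTF-8-unaware tool only after strict validation."""
    if not is_strict_utf8(data):
        raise ValueError("refusing malformed UTF-8 input")
    tool(data)


received = []
hand_off(b"/etc/motd", received.append)          # well-formed, passed through
try:
    hand_off(b"..\xc0\xaf..", received.append)   # overlong slash, refused
except ValueError as err:
    print(err)
```

Rejecting the input outright, rather than trying to repair it, keeps the caller from guessing what the sender meant.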
Cheerio,
Malte