Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)


  • Subject: Re: iso-8859-1 over UTF8 (was: Re: cString deprecated!)
  • From: Chris Ridd <email@hidden>
  • Date: Thu, 05 Sep 2002 11:23:19 +0100

On 5/9/02 8:59 am, Malte Tancred <email@hidden> wrote:
> On wednesday, sep 4, 2002, at 15:09 Europe/Stockholm, Clark S. Cox III
> wrote:
>> No you wouldn't. There is no way that any byte in a multi-byte UTF-8
>> character could be confused for an ASCII character, because they always
>> have the high bit set. For instance, there is no way that you can make a
>> multi-byte UTF-8 character that looks like "%d".
>
> I believe there is something called "overlong representation". For
> example, a slash (/) can be represented by a 3-byte UTF-8 sequence.
> The encoding/algorithm per se allows this.

Sort of correct: you aren't permitted to encode this, but the original spec
was slightly vague about whether a decoder had to reject it.
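
Both points can be seen side by side in a minimal Python sketch. The function `naive_decode_utf8` is hypothetical and deliberately lenient: it follows the UTF-8 bit patterns but never checks for the shortest form. Well-formed output keeps the high bit set on every byte of a multi-byte character, while padded (overlong) input still decodes to plain ASCII such as "%d" or "/".

```python
def naive_decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bit patterns WITHOUT checking for shortest form.

    Hypothetical and deliberately lenient, for illustration only: it
    accepts overlong sequences that a conforming decoder must reject.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:              # 0xxxxxxx: plain ASCII byte
            cp, n = b, 0
        elif b >> 5 == 0b110:     # 110xxxxx: lead byte of a 2-byte sequence
            cp, n = b & 0x1F, 1
        elif b >> 4 == 0b1110:    # 1110xxxx: lead byte of a 3-byte sequence
            cp, n = b & 0x0F, 2
        elif b >> 3 == 0b11110:   # 11110xxx: lead byte of a 4-byte sequence
            cp, n = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        for j in range(1, n + 1):
            # Every continuation byte must look like 10xxxxxx.
            if i + j >= len(data) or data[i + j] >> 6 != 0b10:
                raise ValueError(f"bad continuation byte at offset {i + j}")
            cp = (cp << 6) | (data[i + j] & 0x3F)
        out.append(chr(cp))
        i += n + 1
    return "".join(out)


# Clark's point: in correctly encoded text, every byte of a multi-byte
# character has the high bit set (checked here for U+0080..U+D7FF), so no
# such byte can be mistaken for ASCII '%' or 'd'.
assert all(b & 0x80 for cp in range(0x80, 0xD800) for b in chr(cp).encode("utf-8"))

# Malte's point: pad the payload bits with leading zeros and the same
# characters fit in longer sequences. None of these bytes is 0x25 ('%'),
# 0x64 ('d') or 0x2F ('/'), yet the lenient decoder produces exactly those.
sneaky_format = bytes([0xC0, 0xA5, 0xC1, 0xA4])  # 2-byte overlong '%' and 'd'
sneaky_slash = bytes([0xE0, 0x80, 0xAF])          # 3-byte overlong '/'
print(naive_decode_utf8(sneaky_format))  # %d
print(naive_decode_utf8(sneaky_slash))   # /
```

So Clark is right about what a correct encoder emits; the risk lies in a decoder that accepts what a correct encoder would never emit.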

> This behavior is forbidden though, published in an extension to the
> original spec I think.

Correct! The Unicode Consortium's Technical Report #27 includes a "UTF-8
Corrigendum" that prohibits the interpretation of non-shortest forms of BMP
characters.
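
The shortest-form rule is cheap to enforce: after assembling a code point, compare it with the smallest value that actually needs that many bytes. A minimal Python sketch under that assumption follows; `decode_one` and `MIN_CODE_POINT` are hypothetical names, not a library API, and the sketch applies the check to every sequence length.

```python
from typing import Tuple

# Smallest code point that genuinely needs each sequence length. A decoded
# value below the minimum for its length is a non-shortest (overlong) form.
MIN_CODE_POINT = {1: 0x0000, 2: 0x0080, 3: 0x0800, 4: 0x10000}


def decode_one(data: bytes, i: int) -> Tuple[int, int]:
    """Decode the UTF-8 sequence starting at data[i].

    Returns (code_point, length). Raises ValueError for malformed input,
    including non-shortest forms.
    """
    b = data[i]
    if b < 0x80:
        return b, 1
    if b >> 5 == 0b110:
        cp, length = b & 0x1F, 2
    elif b >> 4 == 0b1110:
        cp, length = b & 0x0F, 3
    elif b >> 3 == 0b11110:
        cp, length = b & 0x07, 4
    else:
        raise ValueError(f"invalid lead byte 0x{b:02X}")
    if i + length > len(data):
        raise ValueError("truncated sequence")
    for cont in data[i + 1 : i + length]:
        if cont >> 6 != 0b10:
            raise ValueError(f"invalid continuation byte 0x{cont:02X}")
        cp = (cp << 6) | (cont & 0x3F)
    # The shortest-form rule: refuse to interpret padded encodings.
    if cp < MIN_CODE_POINT[length]:
        raise ValueError(f"non-shortest form for U+{cp:04X}")
    return cp, length


for overlong in (b"\xc0\xaf", b"\xe0\x80\xaf"):
    try:
        decode_one(overlong, 0)
    except ValueError as err:
        print(err)  # non-shortest form for U+002F
```

Python's own `bytes.decode("utf-8")` also raises `UnicodeDecodeError` for both of these byte strings. The sketch implements only the shortest-form check; it does not reject surrogate code points or values above U+10FFFF.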

If you're desperately interested in this stuff and/or have copious spare
time ;-) the report is at:

<http://www.unicode.org/unicode/reports/tr27/>

I recall Microsoft falling foul of this problem in IIS.
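
The practical danger is ordering: a security check that inspects raw bytes can be bypassed if a lenient decoder later turns overlong bytes into "/". The Python sketch below is a hypothetical server routine, not a reconstruction of IIS; `lenient_decode`, `check_then_decode`, `decode_then_check` and the request path are all made up for illustration.

```python
def lenient_decode(data: bytes) -> str:
    # Hypothetical lenient decoder: handles ASCII and 2-byte sequences only,
    # with no shortest-form check, so 0xC0 0xAF decodes to '/'.
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif b >> 5 == 0b110 and i + 1 < len(data) and data[i + 1] >> 6 == 0b10:
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    return "".join(out)


def check_then_decode(raw: bytes) -> str:
    # Flawed order: the security check sees raw bytes, but whatever uses the
    # path later sees the leniently decoded text.
    if b"../" in raw:
        raise PermissionError("directory traversal")
    return lenient_decode(raw)


def decode_then_check(raw: bytes) -> str:
    # Safer order: decode strictly (Python's codec rejects overlong forms),
    # then run the check on the text that will actually be used.
    path = raw.decode("utf-8")
    if "../" in path:
        raise PermissionError("directory traversal")
    return path


request = b"scripts/..\xc0\xaf..\xc0\xafsecret.txt"
print(check_then_decode(request))  # scripts/../../secret.txt
try:
    decode_then_check(request)
except UnicodeDecodeError:
    print("rejected: malformed UTF-8")
```

Decoding strictly first and validating the decoded text means the check and the consumer agree on what the path is.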

Cheers,

Chris
_______________________________________________
cocoa-dev mailing list | email@hidden