Re: POSIX paths and UTF-8 on Mac OS X...

  • Subject: Re: POSIX paths and UTF-8 on Mac OS X...
  • From: Adam Nohejl <email@hidden>
  • Date: Wed, 23 Nov 2005 18:08:37 +0100


On 2005/11/23 at 17:37, Shawn Erickson wrote:
I am in the process of porting some software that runs on Windows to
Mac OS X, and I want to validate what I believe to be true on Mac OS X
and also understand any special requirements that exist.

I am fairly sure (and the API docs support this) that functions like
open and fopen accept UTF-8 (ASCII is a subset of UTF-8, so of course
they accept traditional ASCII as well). Is this correct?

I do see in the Core Foundation code that the new 10.4 method that
returns the file system representation of a CFString uses UTF-8, but
that it also attempts some specific type of string decomposition[1]
(I assume this deals with combining accents and the like). So my
question is: what exactly is expected and/or required in a UTF-8
string handed to something like fopen?

Also, can a BOM exist at the head of the UTF-8 string, or must I
ensure that it doesn't exist?

I tried to find a good document that calls things out fully, but the
best I could find was that UTF-8 is used.

http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html


"All BSD system functions expect their string parameters to be in UTF-8 encoding and nothing else. Code that calls BSD system routines should ensure that the contents of all const *char parameters are in canonical UTF-8 encoding. In a canonical UTF-8 string, all decomposable characters are decomposed; for example, é (0x00E9) is represented as e (0x0065) + ´ (0x0301). To put things into a canonical UTF-8 encoding, use the “file-system representation” interfaces defined in Cocoa and Carbon (including Core Foundation)."

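To make the quoted example concrete, here is a minimal C sketch of the two spellings of "é" as raw UTF-8 bytes. The names E_ACUTE_PRECOMPOSED, E_ACUTE_DECOMPOSED and same_bytes are made up for this illustration and come from no Apple API. The sketch only shows that the two forms are different byte strings; it does not perform any decomposition itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "é" as a single precomposed code point, U+00E9, encoded in UTF-8. */
const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";

/* "é" decomposed: "e" (U+0065) followed by the combining acute
 * accent (U+0301), encoded in UTF-8. */
const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";

/* Hypothetical helper: byte-wise equality, which is all that
 * strcmp() and similar C routines check. */
int same_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Comparing the two constants with same_bytes() reports them as different (2 bytes versus 3 bytes), even though both render as the same character. This is why the documentation asks for one agreed form before a path reaches the BSD layer.
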
I think you should use the mentioned APIs in the first place. Otherwise, getting rid of BOMs seems reasonable: the interfaces never return them, and a BOM serves no purpose if you work only with UTF-8 (UTF-8 has a defined byte order independent of endianness). UTF-8 BOMs are actually quite rare.
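
If you do end up handling raw strings yourself, skipping a leading BOM could look roughly like the sketch below. The function name skip_utf8_bom is hypothetical, and the sketch assumes a NUL-terminated string; it only steps over the three BOM bytes (EF BB BF) and does no normalization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer just past a leading UTF-8
 * BOM (bytes EF BB BF), or s itself if the string has no BOM.
 * strncmp() stops at the terminating NUL, so strings shorter than
 * three bytes are handled safely. */
const char *skip_utf8_bom(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "\xEF\xBB\xBF", 3) == 0) {
        return s + 3;
    }
    return s;
}
```

A call such as fopen(skip_utf8_bom(path), "r") would then never pass the BOM bytes on as part of the file name. The returned pointer aliases the original buffer, so nothing needs to be freed.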

--
Adam Nohejl
Loki Software
mailto:email@hidden
http://lokisw.com
