2005/11/23 at 17:37, Shawn Erickson:

> I am in the process of porting some software that runs on Windows to Mac OS X, and I want to validate what I believe to be true on Mac OS X and also understand any special requirements that exist.
>
> I am fairly sure (and the API docs support this) that things like open and fopen accept UTF-8 (ASCII is a subset of UTF-8, so of course they accept traditional ASCII as well). Is this correct?
>
> I do see in the Core Foundation code that the new 10.4 method that returns the file system representation of a CFString uses UTF-8, but that it also attempts to do some specific type of string decomposition[1] (I assume it deals with combining accents and the like).
>
> So my question is: what exactly is expected and/or required in a UTF-8 string handed to something like fopen? Also, can a BOM exist at the head of the UTF-8 string, or must I ensure that it doesn't exist?
>
> I tried to find a good document that calls things out fully, but the best I could find was that UTF-8 is used.

http://developer.apple.com/documentation/MacOSX/Conceptual/BPInternational/Articles/FileEncodings.html

"All BSD system functions expect their string parameters to be in UTF-8 encoding and nothing else. Code that calls BSD system routines should ensure that the contents of all const *char parameters are in canonical UTF-8 encoding. In a canonical UTF-8 string, all decomposable characters are decomposed; for example, é (0x00E9) is represented as e (0x0065) + ´ (0x0301). To put things into a canonical UTF-8 encoding, use the “file-system representation” interfaces defined in Cocoa and Carbon (including Core Foundation)."

I think that you should use the mentioned APIs in the first place. Otherwise, getting rid of BOMs seems reasonable: the interfaces never return them, and it doesn't make sense to use them if you work only with UTF-8, because UTF-8 has a defined byte order independent of endianness. UTF-8 BOMs are actually quite rare.

--
Adam Nohejl
Loki Software
mailto:adam@lokisw.com
http://lokisw.com

Darwin-dev mailing list (Darwin-dev@lists.apple.com)
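
To make the BOM advice concrete, here is a minimal sketch in plain C of dropping a leading UTF-8 BOM before a path is handed to something like fopen. The helper name skip_utf8_bom and the two constants are hypothetical, made up for this illustration. The sketch does not decompose anything; for that, the file-system representation interfaces quoted above (in Core Foundation, presumably CFStringGetFileSystemRepresentation on 10.4) remain the better route. The constants only show what the quoted é example looks like as UTF-8 bytes.

```c
#include <stddef.h>

/* The quoted example as UTF-8 bytes (names are illustrative only):
 * precomposed U+00E9 encodes as C3 A9; the decomposed form
 * U+0065 U+0301 encodes as 65 CC 81. */
const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";
const char E_ACUTE_DECOMPOSED[]  = "e\xCC\x81";

/* Hypothetical helper: if path starts with the UTF-8 encoding of
 * U+FEFF (EF BB BF), return a pointer just past it; otherwise return
 * path unchanged. No copy is made, so the result shares storage with
 * the input string. */
const char *skip_utf8_bom(const char *path)
{
    const unsigned char *p = (const unsigned char *)path;

    if (p == NULL) {
        return NULL;
    }
    /* Short-circuit evaluation stops at the terminating NUL, so strings
     * shorter than three bytes are never read past their end. */
    if (p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        return path + 3;
    }
    return path;
}
```

A caller would then write something like fopen(skip_utf8_bom(name), "r"). Returning a pointer into the original buffer keeps the helper allocation-free, at the cost that the result is only valid for as long as the input string is.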