Re: [Q] What encoding method can be automatically detected?
- Subject: Re: [Q] What encoding method can be automatically detected?
- From: Ricky Sharp <email@hidden>
- Date: Thu, 01 Feb 2007 08:39:15 -0800
On Thursday, February 01, 2007, at 08:29AM, "JongAm Park" <email@hidden> wrote:
>According to Apple's document, it says that this method can determine
>encoding method used for the file whose file path is given.
>
> stringWithContentsOfFile:usedEncoding:error:
>
>However, it can't determine whether the content of a file is in
>Mac Roman, ISO Latin1, and so on.
>However, if stringWithContentsOfFile:encoding:error: is used with the
>Mac Roman encoding, it can open a file in ISO Latin1, etc.
>
>So, it seems to me that stringWithContentsOfFile:encoding:error:
>can read a file with the given encoding successfully, even
>when the real encoding is merely compatible with the given "encoding".
>However, stringWithContentsOfFile:usedEncoding:error: doesn't.
>
>So... can anyone tell me which encodings
>stringWithContentsOfFile:usedEncoding:error: can detect?
I believe 'usedEncoding' can only detect the Unicode (UTF-xx) encodings. stringWithContentsOfFile:usedEncoding:error: is basically a replacement for stringWithContentsOfFile:, which is deprecated as of 10.4. The docs for that older method say the encoding is only recognized when the file begins with a BOM (byte-order mark). Note that a UTF-8 BOM is optional, so the older method probably can't detect UTF-8 files that lack one.
As for MacRoman vs. ISO-Latin-1, etc., each is effectively just a set of 256 characters, although there are subtle differences between CP-1252 (the default Windows charset) and ISO-8859-1 (aka Latin-1).
So if a file is MacRoman, it's just a stream of 8-bit chars. That file can be opened on any system, and only a human reading the document could tell whether things make sense. For example, opening that document in Notepad would map high-ASCII Mac characters to different high-ASCII Windows characters.
In short, if your app needs to work with plain-text documents created on machines running different operating systems, each in its own native encoding, you'll need to add a feature that lets your users specify which platform created the file.
Definitely try to migrate your users to UTF-8 for their documents; it makes life so much easier.
--
Rick Sharp
Instant Interactive(tm)
_______________________________________________
Cocoa-dev mailing list (email@hidden)