Re: Determine encoding of file

  • Subject: Re: Determine encoding of file
  • From: Michael Ash <email@hidden>
  • Date: Fri, 30 Jul 2010 23:50:46 -0400

On Fri, Jul 30, 2010 at 6:24 PM, Nick Zitzmann <email@hidden> wrote:
>
> On Jul 30, 2010, at 4:09 PM, Dave DeLong wrote:
>
>> Hi everyone,
>>
>> I have a seemingly simple question, but I haven't been able to figure it out.
>>
>> Given a file, how can I determine the NSStringEncoding of the file, without reading the entire file into memory?  (If the file isn't a text file, then defaulting to NSUTF8StringEncoding is just fine, since my code will only work properly if I'm working with text files anyway)
>>
>> I've found this: http://www.macosxguru.net/article.php?story=20030808081801868 but it seems ridiculously complex...
>
> Check the first two bytes. If they are 0xFEFF or 0xFFFE, then it is guaranteed to be in Unicode (UTF-16) format. Otherwise, it can be in pretty much any format, since pretty much every format that is not Unicode doesn't use identifiers of any sort.

A nitpick: starting with those two bytes is a *strong suggestion* that
it's UTF-16, but it could just be, say, a Latin-1 file that starts
with "þÿ", or a random binary file that happens to start with that
byte sequence.
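
To make the BOM check concrete, here is a minimal sketch in Python (chosen
for brevity; it is not Cocoa code). The function name sniff_utf16_bom is
hypothetical. It reads only the first two bytes of the file, and its result
should be treated as a hint rather than a guarantee, for the reason given
above.

```python
import codecs

def sniff_utf16_bom(path):
    """Guess a UTF-16 byte order from the first two bytes, or return None.

    Only a hint: a Latin-1 file beginning with "þÿ" yields the same answer.
    """
    with open(path, "rb") as f:
        head = f.read(2)  # the rest of the file is never read
    if head == codecs.BOM_UTF16_BE:  # bytes FE FF
        return "utf-16-be"
    if head == codecs.BOM_UTF16_LE:  # bytes FF FE
        return "utf-16-le"
    return None
```

Note that a Latin-1 file starting with "þÿ" begins with the bytes FE FF, so
the sketch reports "utf-16-be" for it as well.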

One fact that can be extremely useful for this sort of thing but
which seems to be little-known: due to the structure of UTF-8 it's
rare for a file to be valid UTF-8 by accident. Random data, or data
that isn't intended to be structured like UTF-8, is extremely unlikely
to happen to match the structure required by UTF-8 by coincidence.
Thus, if a file parses as UTF-8, you can be pretty confident that it
was supposed to be interpreted in that encoding.
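
A rough sketch of that test, again in Python rather than Cocoa: it validates
a file as UTF-8 one chunk at a time, so the whole file never has to be in
memory. The name looks_like_utf8 and the 64 KB chunk size are assumptions
made for illustration. Pure ASCII also passes, since ASCII is a subset of
UTF-8.

```python
import codecs

def looks_like_utf8(path, chunk_size=64 * 1024):
    """Return True if the whole file decodes as strict UTF-8."""
    # The incremental decoder keeps state between calls, so a multi-byte
    # sequence split across two chunks is still handled correctly.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # final=True rejects a sequence left unfinished at end of file.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

The function stops at the first invalid byte, so a file in some other
encoding is usually rejected after reading very little of it.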

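The same is, alas, not true of UTF-16: nearly any even-length byte
sequence decodes as UTF-16 (only unpaired surrogates are rejected), so
a successful parse tells you very little.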
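
The contrast between the two encodings can be seen with a small Python
sketch. The helper decodes_as and the sample string are made up for
illustration: the same Latin-1 bytes are rejected by a strict UTF-8 decoder
but accepted as UTF-16 in either byte order.

```python
def decodes_as(data, encoding):
    """Return True if the bytes decode without error in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Eight bytes of Latin-1 text, never intended as UTF-8 or UTF-16.
sample = "d\u00e9j\u00e0 vu!".encode("latin-1")

print(decodes_as(sample, "utf-8"))      # the byte for "é" breaks UTF-8 rules
print(decodes_as(sample, "utf-16-le"))  # every byte pair is a legal code unit
print(decodes_as(sample, "utf-16-be"))
```

The UTF-16 results are meaningless as text, but nothing in the decoder can
tell that.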

Mike