Re: Determine encoding of file

  • Subject: Re: Determine encoding of file
  • From: Dave DeLong <email@hidden>
  • Date: Sat, 31 Jul 2010 22:18:11 -0600

Thanks to everyone who responded with ideas on this.

John's suggestion of the TEC* functions was promising, but I ended up not using them when I discovered that they're not available on the iPhone.  Ditto for Martin's suggestion of using getxattr.

I eventually ended up using Rainer's method (which was also suggested by Nick Z).  I understand that it's not perfect, but it was good enough for what I was doing.

And what was I doing....

I've just posted an open source CSV parser to GitHub. In the past I've needed a parser that's more intelligent than -[NSString componentsSeparatedByString:@","], so I developed this one. However, I've also needed to use it on UTF-16 files, hence my desire to auto-discover file encodings. (There is, of course, an option to provide a known encoding when initializing the parser.)
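
Splitting on every comma breaks as soon as a field contains a quoted comma or an escaped quote. A short illustration in Python, using the standard `csv` module as a stand-in for a quote-aware parser (this is not how CHCSVParser works internally; the sample line is made up):

```python
import csv

line = '1,"Smith, John","say ""hi"""'

# Naive approach, analogous to -[NSString componentsSeparatedByString:@","]:
naive = line.split(",")
# -> ['1', '"Smith', ' John"', '"say ""hi"""']

# Quote-aware parsing: commas inside quotes are kept and doubled
# quotes ("") are unescaped to a single literal quote.
fields = next(csv.reader([line]))
# -> ['1', 'Smith, John', 'say "hi"']
```

The naive split yields four pieces with stray quote characters, while the quote-aware reader recovers the three intended fields.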

Please check it out, fork it, and help me make improvements to it: http://github.com/davedelong/CHCSVParser

Thanks for your help, everyone!

Dave DeLong

On Jul 30, 2010, at 9:50 PM, Michael Ash wrote:
>
> A nitpick: starting with those two bytes is a *strong suggestion* that
> it's UTF-16, but it could just be, say, a Latin-1 file that starts
> with "þÿ", or a random binary file that happens to start with that
> byte sequence.
>
> One fact that can be extremely useful for this sort of thing, but
> which seems to be little known: due to the structure of UTF-8 it's
> rare for a file to be valid UTF-8 by accident. Random data, or data
> that isn't intended to be structured like UTF-8, is extremely unlikely
> to happen to match the structure required by UTF-8 by coincidence.
> Thus, if a file parses as UTF-8, you can be pretty confident that it
> was supposed to be interpreted in that encoding.
>
> The same is, alas, not true of UTF-16.
>
> Mike
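
Mike's two observations suggest a detection order: treat a UTF-16 byte order mark as a strong hint, then test whether the bytes are structurally valid UTF-8. Below is a minimal Python sketch under stated assumptions: the only candidates are UTF-16 with a BOM, UTF-8, and a Latin-1 fallback. The function names and the fallback are illustrative only; this is not the method Dave ended up using, nor CHCSVParser's code.

```python
from typing import Optional


def utf16_bom(data: bytes) -> Optional[str]:
    """Return the UTF-16 flavor if data starts with a BOM, else None.

    Only a strong hint: a Latin-1 file starting with "þÿ" is also FE FF.
    """
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"
    if data[:2] == b"\xff\xfe":
        return "utf-16-le"
    return None


def is_valid_utf8(data: bytes) -> bool:
    """Check UTF-8 structure byte by byte (lead/continuation rules)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                # 0xxxxxxx: plain ASCII
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:       # 2-byte lead (C0/C1 would be overlong)
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:             # 3-byte lead, reject overlong forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif b == 0xED:             # reject surrogates U+D800..U+DFFF
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= b <= 0xEF:     # other 3-byte leads
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xF0:             # 4-byte lead, reject overlong forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:     # other 4-byte leads
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:             # reject code points above U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                       # 80..C1 and F5..FF never start a sequence
            return False
        if i + need >= n:           # truncated sequence at end of data
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, need + 1):
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += need + 1
    return True


def guess_encoding(data: bytes) -> str:
    bom = utf16_bom(data)
    if bom is not None:
        return bom
    if is_valid_utf8(data):
        return "utf-8"
    # Assumption: anything else is treated as a single-byte encoding.
    return "latin-1"


print(guess_encoding("café".encode("utf-8")))          # utf-8
print(guess_encoding("café".encode("latin-1")))        # latin-1
print(guess_encoding("\ufeffhi".encode("utf-16-be")))  # utf-16-be
```

There is deliberately no "is this valid UTF-16?" step, which is Mike's closing point: for example, any even-length pure-ASCII file decodes as UTF-16 without error, just into the wrong characters.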

_______________________________________________

Cocoa-dev mailing list (email@hidden)

References: 
 >Determine encoding of file (From: Dave DeLong <email@hidden>)
 >Re: Determine encoding of file (From: Nick Zitzmann <email@hidden>)
 >Re: Determine encoding of file (From: Michael Ash <email@hidden>)