Re: Really big files and encodings


  • Subject: Re: Really big files and encodings
  • From: Alastair Houghton <email@hidden>
  • Date: Wed, 22 Apr 2009 10:30:25 +0100

On 22 Apr 2009, at 06:57, Seth Willits wrote:

> In my app, I import data from potentially very large files. In the first pass, I simply mmap'd the entire file, created a string using CFStringCreateWithBytesNoCopy, and go about my business. This works great until it hits the address limit when it's running as a 32-bit process, so now in the second pass I want to rework it a bit to only mmap a chunk (128 MB) at a time.
>
> Now, if it were simply binary data, I could chop up the file however I wanted, but since the file I'm processing is actually a huge *text* file, I need to mmap an appropriate range so creating the string doesn't fail because a multi-byte character was split down the middle.
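
To make the boundary problem concrete, here is a minimal sketch in C of trimming a chunk so that it does not end inside a multi-byte character. It assumes the file is UTF-8, which the original question does not state; other multi-byte encodings need different rules. The function name utf8_safe_prefix_length is hypothetical and is not part of any Apple API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix of
 * buf[0..len) that does not end in the middle of a UTF-8 sequence.
 * Assumption: the data is UTF-8. Malformed input is returned unchanged
 * so that the real decoder can report the error. */
size_t utf8_safe_prefix_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = len;
    size_t stepped = 0;

    /* Step back over at most three trailing continuation bytes (10xxxxxx). */
    while (i > 0 && stepped < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        stepped++;
    }
    if (i == 0) {
        return len;              /* nothing but continuation bytes */
    }

    unsigned char lead = buf[i - 1];
    size_t need;                 /* total bytes the last character requires */
    if (lead < 0x80) {
        need = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;
    } else {
        return len;              /* not a valid lead byte */
    }

    size_t have = len - (i - 1); /* bytes of that character actually present */
    return (have < need) ? (i - 1) : len;
}
```

The bytes past the returned length would be carried over to the start of the next chunk. Note that mmap offsets must be page-aligned, so the next mapping generally has to start at the page boundary at or before that point.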

Hi Seth,

I think this highlights a significant deficiency in the CFString/NSString API, which is that it's impossible to get any kind of streaming encoder/decoder (and that is really what you want for this kind of task).

Have you considered using libiconv instead to convert to UTF-16, then creating your strings from that? That would give you more control and would mean that you didn't have to guess where the encoder would want to start/finish working on your data (since it will tell you).
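
As a rough illustration of what "it will tell you" means, here is a minimal sketch in C of converting one chunk with iconv(). The wrapper name convert_chunk and its parameters are hypothetical. When a chunk ends in the middle of a character, iconv() fails with EINVAL and leaves the input pointer at the start of the incomplete sequence, so the number of bytes consumed tells you where the next chunk should begin.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper around iconv(): converts as much of in[0..in_len)
 * as it can into out[0..out_cap). Returns the number of input bytes
 * consumed, or (size_t)-1 on a hard error such as EILSEQ (invalid input).
 * *out_used receives the number of output bytes written. */
size_t convert_chunk(iconv_t cd, const char *in, size_t in_len,
                     char *out, size_t out_cap, size_t *out_used)
{
    assert(out_used != NULL);

    char *in_ptr = (char *)in;   /* iconv() takes char **, but only reads */
    size_t in_left = in_len;
    char *out_ptr = out;
    size_t out_left = out_cap;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    *out_used = out_cap - out_left;

    /* EINVAL: the input ends in an incomplete multi-byte sequence.
     * E2BIG:  the output buffer is full.
     * Both are recoverable: resume from the first unconsumed byte. */
    if (rc == (size_t)-1 && errno != EINVAL && errno != E2BIG) {
        return (size_t)-1;
    }
    return in_len - in_left;
}
```

A caller would open the descriptor once, for example with iconv_open("UTF-16LE", "UTF-8") (the encoding names here are assumptions about the file and the desired output), feed each mapped chunk through convert_chunk, and start the next chunk at the first unconsumed byte. The UTF-16 output can then be handed to CFStringCreateWithCharacters or a similar call. On Mac OS X the program needs to be linked with -liconv.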

I guess ICU might also be a way around this, though iconv() et al. have the significant benefit of being documented and supported API.

Kind regards,

Alastair.

--
http://alastairs-place.net


