Re: Really big files and encodings
  • Subject: Re: Really big files and encodings
  • From: Greg Guerin <email@hidden>
  • Date: Wed, 22 Apr 2009 10:34:49 -0700

Seth Willits wrote:

In my app, I import data from potentially very large files. In the first pass, I simply mmap'd the entire file, created a string using CFStringCreateWithBytesNoCopy, and went about my business. This works great until it hits the address limit when it's running as a 32-bit process, so now in the second pass I want to rework it a bit to only mmap a chunk (128 MB) at a time.

Now, if it were simply binary data, I could chop up the file however I wanted, but since the file I'm processing is actually a huge *text* file, I need to mmap an appropriate range so creating the string doesn't fail because a multi-byte character was split down the middle.

Change the buffer management.

Add a cushion to your mmap'ed chunk, say 1 MB, so you mmap in 129 MB at a time. When parsing the first 128 MB, everything proceeds normally, and there are no worries about splitting a multi-byte character. You can parse bytes after 128 MB because they're safely represented in the cushion area.

When the get-next-string starting position moves into the cushion area, then you re-mmap the next chunk (advance by 128 MB, i.e. buffer minus cushion) and reposition your pointers in the buffer. Then you have about 128 MB of no worries again.
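
The remapping loop above can be sketched roughly as follows. This is only an illustration under a few assumptions: the names scan_windowed, record_fn, count_line and count_lines are made up for the example, a "record" here is a newline-terminated line rather than a CFString, the chunk is a single page so the sketch is easy to try on small files (a real import would pass something like 128 MB), and no single record is longer than the cushion.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback: parse one record starting at p, with `avail`
   readable bytes. Returns the bytes consumed, or 0 to stop early. */
typedef size_t (*record_fn)(const unsigned char *p, size_t avail, void *ctx);

/* Walk a file through a window of chunk + cushion bytes. `chunk` must be a
   multiple of the page size, because mmap offsets must be page aligned.
   Returns 0 on success, -1 on error. */
int scan_windowed(int fd, size_t chunk, size_t cushion, record_fn fn, void *ctx)
{
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && chunk > 0 && chunk % (size_t)page == 0);

    off_t size = st.st_size;
    off_t pos = 0;                      /* file offset of the next record */

    while (pos < size) {
        /* Advance the window by whole chunks so pos lands in its first part. */
        off_t base = (pos / (off_t)chunk) * (off_t)chunk;
        size_t len = chunk + cushion;
        if ((off_t)len > size - base) len = (size_t)(size - base);

        unsigned char *win = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, base);
        if (win == MAP_FAILED) return -1;

        /* Records may START anywhere in the first `chunk` bytes and may run
           on into the cushion. Once the start position itself reaches the
           cushion, drop this mapping and map the next window. */
        while (pos < size && (size_t)(pos - base) < chunk) {
            size_t off = (size_t)(pos - base);
            size_t used = fn(win + off, len - off, ctx);
            if (used == 0) { munmap(win, len); return 0; }
            if (used > len - off) used = len - off;   /* guard a bad callback */
            pos += (off_t)used;
        }
        munmap(win, len);
    }
    return 0;
}

/* Example callback: one record is one '\n'-terminated line. */
static size_t count_line(const unsigned char *p, size_t avail, void *ctx)
{
    const unsigned char *nl = memchr(p, '\n', avail);
    ++*(size_t *)ctx;
    return nl ? (size_t)(nl - p) + 1 : avail;
}

/* Count the lines in a file, one page-sized chunk at a time.
   Returns the line count, or -1 on error. */
long count_lines(const char *path, size_t cushion)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    size_t lines = 0;
    int rc = scan_windowed(fd, (size_t)sysconf(_SC_PAGESIZE), cushion,
                           count_line, &lines);
    close(fd);
    return rc == 0 ? (long)lines : -1;
}
```

The window always starts on a chunk boundary, so the same mmap offset arithmetic works for every pass, and the parser only ever sees pointers into the current mapping. Anything that keeps a pointer into an old window (such as a no-copy string) has to be released before the munmap.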

Choose a cushion size suitable for the maximum length of a multi-byte sequence. There's no magic to 1 MB; use something smaller if it suffices. And don't forget the combining character forms, where multiple multi-byte "characters" should remain together.
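
As a rough illustration of how small the per-character bound can be: if the file happens to be UTF-8 (an assumption; the thread doesn't say which encodings are involved), one encoded character is at most 4 bytes, so a check like the one below can tell whether a byte range ends on a complete sequence. The function name utf8_complete_prefix is made up for this sketch. It does not validate the data and it knows nothing about combining sequences, which can span several code points and still need the larger cushion.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the longest prefix of buf[0..len) that does not end
   in the middle of a UTF-8 sequence. Assumes the input is otherwise valid
   UTF-8; malformed tails are returned unchanged for the decoder to reject. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* Step back over at most 3 continuation bytes (10xxxxxx). */
    size_t i = len;
    size_t back = 0;
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) return len;             /* no lead byte in reach */

    /* The lead byte says how long the final sequence should be. */
    unsigned char lead = buf[i - 1];
    size_t need;
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len;                    /* not a valid lead byte */

    size_t have = len - (i - 1);        /* bytes present from the lead on */
    return have >= need ? len : i - 1;  /* drop a truncated tail */
}
```

With a check like this, a string can be ended a few bytes early instead of failing to decode, and the dropped tail becomes the start of the next string.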

  -- GG
