Re: Really big files and encodings
- Subject: Re: Really big files and encodings
- From: Alastair Houghton <email@hidden>
- Date: Wed, 22 Apr 2009 10:30:25 +0100
On 22 Apr 2009, at 06:57, Seth Willits wrote:
> In my app, I import data from potentially very large files. In the
> first pass, I simply mmap'd the entire file, created a string using
> CFStringCreateWithBytesNoCopy, and went about my business. This works
> great until it hits the address limit when it's running as a 32-bit
> process, so now in the second pass I want to rework it a bit to only
> mmap a chunk (128 MB) at a time.
>
> Now, if it were simply binary data, I could chop up the file however
> I wanted, but since the file I'm processing is actually a huge
> *text* file, I need to mmap an appropriate range so that creating the
> string doesn't fail because a multi-byte character was split down
> the middle.
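[For illustration: splitting a UTF-8 file at an arbitrary byte offset can land in the middle of a character. A minimal sketch of backing a chunk boundary up to a safe point, assuming the file is UTF-8 (the helper name is hypothetical), might look like:

```c
#include <assert.h>
#include <stddef.h>

/* Back up from `offset` to the start of the UTF-8 character it falls
   inside, so a chunk can end on a whole-character boundary.
   Sketch only; assumes the buffer contains valid UTF-8. */
static size_t utf8_safe_split(const unsigned char *buf, size_t offset)
{
    /* UTF-8 continuation bytes have the form 10xxxxxx; skip back
       over them until we reach a lead byte (or offset 0). */
    while (offset > 0 && (buf[offset] & 0xC0) == 0x80)
        offset--;
    return offset;
}
```

This only works for UTF-8; for other multi-byte encodings there is no such self-synchronizing property, which is exactly why a decoder that reports where it stopped is the more general answer.]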
Hi Seth,
I think this highlights a significant deficiency in the CFString/
NSString API, which is that it's impossible to get any kind of
streaming encoder/decoder (which is really what you want for this kind
of task).
Have you considered using libiconv instead to convert to UTF-16, then
creating your strings from that? That would give you more control and
would mean that you didn't have to guess where the encoder would want
to start/finish working on your data (since it will tell you).
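[For illustration: a minimal sketch of the iconv approach, assuming UTF-8 input converted to UTF-16LE (the wrapper name and its minimal error handling are illustrative, not a definitive implementation). When iconv() hits an incomplete multi-byte sequence at the end of the input, it fails with errno set to EINVAL and leaves the leftover byte count in inbytesleft, which is how it "tells you" where to resume:

```c
#include <iconv.h>
#include <errno.h>
#include <assert.h>
#include <stddef.h>

/* Convert one chunk of UTF-8 to UTF-16LE. Returns the number of
   trailing input bytes belonging to an incomplete character (to be
   carried over to the next chunk), 0 on a clean conversion, or
   (size_t)-1 on a hard error. Sketch; error handling is minimal. */
static size_t convert_chunk(iconv_t cd,
                            const char *in, size_t in_len,
                            char *out, size_t out_cap, size_t *out_len)
{
    char *inp = (char *)in;   /* iconv wants a non-const char** */
    char *outp = out;
    size_t inleft = in_len, outleft = out_cap;

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    *out_len = out_cap - outleft;

    if (r == (size_t)-1) {
        if (errno == EINVAL)   /* incomplete sequence at end of chunk */
            return inleft;     /* re-feed these bytes with the next read */
        return (size_t)-1;     /* EILSEQ or E2BIG: real trouble */
    }
    return 0;                  /* whole chunk converted cleanly */
}
```

The caller would iconv_open("UTF-16LE", "UTF-8") once, push each mmap'd window through this, prepend the returned leftover bytes to the next window, and build the CFString from the UTF-16 output.]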
I guess ICU might also be a way around this, though iconv() et al.
have the significant benefit of being documented and supported API.
Kind regards,
Alastair.
--
http://alastairs-place.net
_______________________________________________
Cocoa-dev mailing list (email@hidden)