Re: Read lines from very large text file
- Subject: Re: Read lines from very large text file
- From: Seth Willits <email@hidden>
- Date: Mon, 2 Feb 2009 21:25:51 -0800
On Feb 2, 2009, at 7:50 PM, Joar Wingfors wrote:
>> Before opening the file, either determine, guess, or be told what
>> the encoding is. With that encoding, convert your delimiter string
>> into raw bytes, then do byte-for-byte comparison on the file to
>> find occurrences of that delimiter.
>
> How do you know what delimiter string to use?
Well, the original poster said he wants to read lines, so \r, \n, or
\r\n is your delimiter. It depends on your usage. If you're reading a
binary file, then the combination of encoding/delimiter isn't an issue
since you're going to have fixed data chunk sizes or a length value
stored in the file itself telling you how big a chunk is. But if
you're reading a text file, line endings are more or less the only
logical delimiter.
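
As a rough illustration of the approach described above (encode the delimiter in the file's encoding, then scan the raw bytes chunk by chunk), here is a minimal Python sketch. The function name iter_lines, its parameters, and the chunk size are made up for this example, and it assumes an encoding such as UTF-8 or a single-byte encoding where a plain byte search for the delimiter is safe.

```python
import io


def iter_lines(stream, encoding="utf-8", delimiter="\n", chunk_size=64 * 1024):
    """Yield decoded lines from a binary stream without reading it all at once.

    Hypothetical helper: assumes the delimiter's byte pattern cannot occur
    inside another character (true for UTF-8 and single-byte encodings).
    """
    delim = delimiter.encode(encoding)  # delimiter as raw bytes in the file's encoding
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Emit every complete line currently in the buffer. A delimiter split
        # across two chunks is found once the next chunk has been appended.
        while True:
            pos = buf.find(delim)
            if pos < 0:
                break
            yield buf[:pos].decode(encoding)
            buf = buf[pos + len(delim):]
    if buf:
        # Trailing data without a final delimiter is the last line.
        yield buf.decode(encoding)


# Small in-memory stand-in for a very large file.
sample = io.BytesIO("first\nsecond\nthird".encode("utf-8"))
lines = list(iter_lines(sample, chunk_size=8))
```

With a real file you would open it in binary mode and pass the file object in place of the BytesIO. Only the current chunk plus any unfinished line is held in memory.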
> If you have an encoding where characters are not of fixed width, is
> it generally safe to assume that the byte signature of the valid
> delimiter strings for that encoding cannot also be found as a
> sub-pattern of some combination of other characters? Perhaps that
> would always be a safe assumption; I'm no expert on string encodings
> and line delimiters.
I actually thought about that as well, and honestly I'm not 100% sure.
I can see it being plausible or very likely to be a problem, and I'm
not sure which is correct. Oops. But it should be universally easy to
deal with by checking the MSB on the preceding byte if the encoding is
not fixed-width? (I think)
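
To make the question concrete, here is a small Python sketch of where a naive byte search can and cannot go wrong. In UTF-8, every byte of a multi-byte character has its MSB set, so a 0x0A byte is always a real line feed and no look-behind is needed. In UTF-16, the bytes of "\n" (0x0A 0x00 in little-endian) can turn up straddling two adjacent code units, so a match only counts if it starts on a code-unit boundary. The helper find_aligned and the sample string are invented for this illustration, and only these two encodings are covered.

```python
def find_aligned(data, pattern, width, start=0):
    """Return the first index of pattern that is a multiple of width, or -1.

    Hypothetical helper: skips matches that straddle two code units.
    """
    pos = data.find(pattern, start)
    while pos >= 0 and pos % width != 0:
        pos = data.find(pattern, pos + 1)
    return pos


newline_utf8 = "\n".encode("utf-8")       # b"\x0a"
newline_utf16 = "\n".encode("utf-16-le")  # b"\x0a\x00"

# Two characters, neither of which is a newline.
text = "\u0a41\u0100"

# UTF-8: the encoded bytes are e0 a9 81 c4 80, all with the MSB set,
# so the search for 0x0A finds nothing.
utf8_hit = text.encode("utf-8").find(newline_utf8)

# UTF-16LE: the encoded bytes are 41 0a 00 01, so the naive search
# reports a bogus match at the odd offset 1.
raw16 = text.encode("utf-16-le")
naive_hit = raw16.find(newline_utf16)

# Requiring 2-byte alignment rejects the bogus match and still finds
# a real newline appended after the two characters.
aligned_hit = find_aligned(raw16, newline_utf16, 2)
real_hit = find_aligned(raw16 + newline_utf16, newline_utf16, 2)
```

Other variable-width encodings have their own rules for lead and trail bytes, so each one would need to be checked separately before relying on a plain byte search.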
--
Seth Willits