Re: Read lines from very large text file
- Subject: Re: Read lines from very large text file
- From: Greg Parker <email@hidden>
- Date: Mon, 2 Feb 2009 20:31:44 -0800
On Feb 2, 2009, at 7:50 PM, Joar Wingfors wrote:
> On Feb 2, 2009, at 6:02 PM, Seth Willits wrote:
>> Before opening the file, either determine, guess, or be told what
>> the encoding is. With that encoding, convert your delimiter string
>> into raw bytes, then do byte-for-byte comparison on the file to
>> find occurrences of that delimiter.
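As a rough illustration of the byte-level search described above (not code from this thread), the sketch below encodes a delimiter string in a known encoding and scans a file chunk by chunk, so the whole file never has to be in memory. The name find_delimiter_offsets, its parameters, and the 64 KB chunk size are assumptions made for this example.

```python
def find_delimiter_offsets(path, delimiter="\n", encoding="utf-8", chunk_size=1 << 16):
    """Yield the byte offset of each occurrence of the encoded delimiter."""
    needle = delimiter.encode(encoding)  # delimiter string -> raw bytes
    if not needle:
        raise ValueError("delimiter must not be empty")
    overlap = len(needle) - 1
    tail = b""   # unsearched bytes carried over from the previous chunk
    base = 0     # file offset of tail[0]
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            start = 0
            while True:
                pos = buf.find(needle, start)
                if pos == -1:
                    break
                yield base + pos
                start = pos + len(needle)
            # Keep just enough trailing bytes to catch a delimiter that is
            # split across two chunks, without rescanning a reported match.
            keep_from = max(start, len(buf) - overlap)
            tail = buf[keep_from:]
            base += keep_from
```

Note that Python's plain "utf-16" codec prepends a byte-order mark when encoding, so for UTF-16 files an explicit "utf-16-le" or "utf-16-be" would be needed here. Whether a raw byte match is trustworthy at all depends on the encoding.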
> How do you know what delimiter string to use? Another thing that
> you'd have to determine, guess or be told, right? In general I would
> guess that in this case it would almost always be impossible and/or
> inappropriate to attempt to determine either of these two, and
> that you would have to simply default to something reasonable.
That's right, though heuristics work better for the guess-line-ending
problem than they do for the guess-encoding problem. If you scan the
first few KB of a sufficiently-long file and see exactly one kind of
line ending, it's a good bet that you're right.
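One way the heuristic above might look, as a minimal sketch: read the first few KB, count CRLF, lone CR, and lone LF, and accept the answer only if exactly one kind shows up. The function name guess_line_ending and the 4 KB sample size are assumptions for illustration, and the byte counting assumes an ASCII-compatible encoding such as UTF-8.

```python
def guess_line_ending(path, sample_size=4096):
    """Return b"\\r\\n", b"\\r" or b"\\n" if the sample contains exactly one
    kind of line ending, otherwise None (mixed endings or none at all)."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # If the sample was cut off right after a CR, that CR may be the first
    # half of a CRLF pair, so leave it out of the count.
    if len(sample) == sample_size and sample.endswith(b"\r"):
        sample = sample[:-1]
    crlf = sample.count(b"\r\n")
    cr = sample.count(b"\r") - crlf   # CRs that are not part of a CRLF
    lf = sample.count(b"\n") - crlf   # LFs that are not part of a CRLF
    seen = [ending for ending, n in ((b"\r\n", crlf), (b"\r", cr), (b"\n", lf)) if n]
    return seen[0] if len(seen) == 1 else None
```

Returning None for a mixed or empty sample leaves the caller free to fall back to a default rather than trusting a weak guess.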
> If you have an encoding where characters are not of fixed width, is
> it generally safe to assume that the byte signature of the valid
> delimiter strings for that encoding cannot also be found as a
> subpattern of some combination of other characters? Perhaps that would
> always be a safe assumption; I'm no expert on string encodings and
> line delimiters.
Safe in some encodings, unsafe in others. I'm pretty sure that UTF-8
is safe - that the bytes encoding one valid UTF-8 character never
appear as a subsequence of the bytes encoding any other valid UTF-8
character.
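To make the safe/unsafe distinction concrete, here is a small sketch. UTF-16 is used as the "unsafe" example; that choice is this example's, not something named in the thread. The two-character sample text and all names are assumptions for illustration. The text contains no line break, yet a naive byte search for a big-endian UTF-16 newline finds one straddling the two characters, while the same search in UTF-8 finds nothing.

```python
# Two characters (U+0100 and U+0A41) with no line break between them.
text = "\u0100\u0a41"

newline_utf16 = "\n".encode("utf-16-be")   # bytes 00 0A
newline_utf8 = "\n".encode("utf-8")        # byte 0A

# UTF-16BE bytes are 01 00 0A 41: "00 0A" appears at an odd offset,
# straddling the two characters, so a raw byte search reports a newline.
false_hit_utf16 = newline_utf16 in text.encode("utf-16-be")

# UTF-8 bytes are C4 80 E0 A9 81: every byte of a multi-byte character
# is 0x80 or above, so the byte 0A cannot appear.
false_hit_utf8 = newline_utf8 in text.encode("utf-8")


def utf8_lf_is_unambiguous():
    """Brute-force check: no code point other than U+000A has the byte
    0x0A anywhere in its UTF-8 encoding (surrogates are not encodable)."""
    for cp in range(0x110000):
        if cp == 0x0A or 0xD800 <= cp <= 0xDFFF:
            continue
        if b"\n" in chr(cp).encode("utf-8"):
            return False
    return True
```

For UTF-16, a byte search would at least need to check that each match starts on a 2-byte boundary; UTF-8 needs no such alignment check.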
--
Greg Parker email@hidden Runtime Wrangler