Re: reading in text files
- Subject: Re: reading in text files
- From: Bob Savage <email@hidden>
- Date: Sun, 03 Feb 2002 03:58:58 -0600
on 2/2/02 6:15 PM, Ondra Cada wrote:
> BS> Okay, Ondra, hesitations aside, *if* someone wanted to implement
> BS> something like this, as an addition to what Apple provides (so the
> BS> purists would be unsullied ;), would this be a reasonable approach:
>
> Well, it might be, perhaps. I would argue some points, though...
:) No problem with that. My hope is that we can figure out the "right way"
to do it, and the next time someone asks how to read a file line by line in
Cocoa, people can say "search the archives for the hideously tedious thread
called `reading in text files`", instead of "just whip something up, it
should be easy."
> Sorry, but I have not time enough to proofread the code, far less to test
> it.
Again, not a problem. I'll test it out now that I have your initial
approval.
> BS> delimiterFound:(int*)indexInDelimsOrNegative // why not a BOOL?
>
> It was meant to be the index of delimiter which was found, presuming
> delimiters can be just one-byte long, and that the NSData contains de facto
> an array of them. Therefore, 0 would mean the first byte of delims was found,
> 1 the second one, etc.
Okay, I'll fix that.
> if (indexInDelimsOrNegative) *indexInDelimsOrNegative=-1;
makes sense.
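A plain-C sketch of the out-parameter semantics Ondra describes (plain C is also valid Objective-C, so logic like this could sit underneath an NSFileHandle category). It assumes single-byte delimiters, as at this point in the thread; the name `scan_for_delimiter` and its return convention are illustrative assumptions, not an Apple API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scan buf[0..len) for the first byte that matches any
 * of the single-byte delimiters in delims[0..ndelims).
 * Returns the offset of that byte in buf, or len if no delimiter occurs.
 * If foundIndex is non-NULL it receives the position of the matching byte
 * within delims (0 = first delimiter, 1 = second, ...), or -1 if none. */
size_t scan_for_delimiter(const unsigned char *buf, size_t len,
                          const unsigned char *delims, size_t ndelims,
                          int *foundIndex)
{
    assert(buf != NULL || len == 0);
    assert(delims != NULL || ndelims == 0);

    if (foundIndex) *foundIndex = -1;          /* default: nothing found */
    for (size_t i = 0; i < len; i++) {
        for (size_t d = 0; d < ndelims; d++) {
            if (buf[i] == delims[d]) {
                if (foundIndex) *foundIndex = (int)d;  /* which delimiter */
                return i;                              /* where it is */
            }
        }
    }
    return len;
}
```

Passing `NULL` as the last argument simply skips the report, which is what the `if (indexInDelimsOrNegative)` guard is for.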
> I presumed a series of byte-long delimiters. You are right, though, that
> more complicated delimiters might be handy.
>
> As for the API, it is quite simple -- either
>
> ... delimiter:(NSArray*)delims ...
>
> with array of NSData objects, each of which would contain one delimiter of
> arbitrary length, or even (as I would like it more)
>
> ... delimiter:delims ...
>
> which would spare you of doing explicitly [NSArray arrayWithObject:foo] in
> the (presumably) most common case. It might even be handy to allow for
> strings (to be converted to data delimiters via default string encoding) in
> place of NSData objects, perhaps even NSNumbers representing directly byte
> values, or whatever ;)))
You lost me somewhere in there, but given the subsequent statement regarding
delimiters crossing chunk borders, I'd tend to say, "here is how to do the
supposedly easy thing. If you want to do the hard thing, you'll have to come
up with a new method."
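For comparison, Ondra's array-of-NSData variant can be modeled in plain C as an array of (bytes, length) pairs. This is an illustration under that assumption: `Delimiter` and `find_first_delimiter` are hypothetical names, and resolving overlapping delimiters by array order is just one possible policy, not something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one NSData delimiter of arbitrary length. */
typedef struct {
    const unsigned char *bytes;
    size_t len;
} Delimiter;

/* Find the earliest offset in buf[0..len) at which any delimiter starts.
 * When several delimiters match at the same offset, the one listed first
 * wins, so callers should list longer delimiters (e.g. "\r\n") before
 * their prefixes ("\r").
 * Returns the offset, or len if none is found; *foundIndex (if non-NULL)
 * gets the index into delims, or -1. Only complete matches inside buf
 * count: a delimiter split across a chunk border is NOT detected. */
size_t find_first_delimiter(const unsigned char *buf, size_t len,
                            const Delimiter *delims, size_t ndelims,
                            int *foundIndex)
{
    assert(buf != NULL || len == 0);
    assert(delims != NULL || ndelims == 0);

    if (foundIndex) *foundIndex = -1;
    for (size_t i = 0; i < len; i++) {
        for (size_t d = 0; d < ndelims; d++) {
            size_t dl = delims[d].len;
            /* the whole delimiter must fit in the bytes left after i */
            if (dl > 0 && dl <= len - i &&
                memcmp(buf + i, delims[d].bytes, dl) == 0) {
                if (foundIndex) *foundIndex = (int)d;
                return i;
            }
        }
    }
    return len;
}
```

With `{"\r\n", "\r", "\n"}` in that order, "ab\r\ncd" reports index 0 at offset 2. If a chunk happens to end right after the "\r", though, the same call reports a lone "\r" even when the next chunk starts with "\n", which is exactly the partial-delimiter case Ondra raises next.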
> Nevertheless, as soon as you allow more-byte delimiters, you are in for a
> number of nasty surprises. What if the delimiter happens to be just
> _partially_ in the first chunk of len bytes? What if some delimiter is longer
> than the maximum length? What if a delimiter is a "substring" (well,
> subdata) of another? (That's not as idiotic as it might seem at the first
> look: common line delimiters are "\n", "\r", and "\r\n", and one generally
> wants to distinguish them all!) Etc...
This last bit bothers me because it goes right to the heart of the problem
this code is trying to solve. Someone should be able to read a file line by
line. For the moment I am going to ignore the DOS line-ending pattern so I
can get a version running. Suggestions here are appreciated.
Worst-case scenario: DOS-formatted files end up with an extra blank line
after every real line, because a read using "\r\n" as the delimiter set
would treat either "\r" or "\n" as a hit, and would therefore return first
"some text\r" and then "\n" on its own. (See `one problem` below.)
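One possible suggestion for the line-reading case specifically: scan for either "\r" or "\n", and after a "\r" peek at one more byte so that "\r\n" counts as a single ending. The sketch below uses C stdio (`fgetc`/`ungetc`) rather than NSFileHandle, which is an assumption; because stdio handles its own buffering, a "\r\n" pair that straddles an internal buffer boundary is still recognized. `read_line_any_ending` is a hypothetical name, and its getline-like return convention is a design choice for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from fp into buf (at most cap-1 bytes
 * plus a terminating NUL), treating "\n", "\r" and "\r\n" each as ONE line
 * ending. The line ending itself is not stored.
 * Returns the number of bytes stored (0 for an empty line), or -1 at end
 * of file. Bytes beyond cap-1 are consumed but dropped (truncation). */
long read_line_any_ending(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);

    size_t n = 0;
    int c = fgetc(fp);
    if (c == EOF) return -1;                 /* no line at all */

    while (c != EOF && c != '\n' && c != '\r') {
        if (n + 1 < cap) buf[n++] = (char)c;  /* keep room for the NUL */
        c = fgetc(fp);
    }
    if (c == '\r') {
        /* Peek one byte: "\r\n" is a single DOS line ending, not two. */
        int next = fgetc(fp);
        if (next != '\n' && next != EOF) ungetc(next, fp);
    }
    buf[n] = '\0';
    return (long)n;
}
```

Because an empty line returns 0 and end of file returns -1, a caller can loop with `while (read_line_any_ending(fp, line, sizeof line) >= 0)` without mistaking a blank line for the end of the file.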
> BS> // seek to location after there
>
> I don't like this too much, unless I knew exactly how NSFileHandle gets
> buffered (IIRC, there is no API to control buffering at the NSFileHandle
> level). Without ability to control buffering, this might mean *quite*
> compromised effectivity of reading some parts of the file many times, with
> oooooh so many disk accesses...
I don't agree with you here, Ondra. If someone sends a message called
readDataOfLength:orUpToDelimiterFrom:delimiterFound:, and the delimiter
*was* found, I think it would be reasonable to expect that the file was not
"read" past the delimiter -- meaning that the file cursor is ready to read
the next sequential chunk. Again, this whole discussion comes from wanting
to read a file line by line, right? You don't want to read a line, and then
find out that when you call the method a second time the implementation of
`readline` actually tossed away some amount of data (perhaps several whole
lines!)
The method has to set the cursor so that it is ready for the next sequential
read.
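A minimal sketch of "read a chunk, then leave the cursor just past the delimiter", written against C stdio instead of NSFileHandle (an assumption; the thread's code uses NSFileHandle). `read_up_to` is a hypothetical name; it keeps the "orUpToDelimiter" behavior of stripping the delimiter and reports whether one was seen.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read at most maxLen bytes from fp into buf and stop
 * at the first byte equal to delim. fread may pull in bytes past the
 * delimiter; those are handed back with fseek so the file position ends up
 * just after the delimiter, ready for the next sequential read.
 * The delimiter is consumed but not returned.
 * Returns the number of bytes stored in buf; *found is 1 if the delimiter
 * was seen, else 0. Assumes fp is seekable (a regular file, not a pipe);
 * fseek's return value is ignored in this sketch. */
size_t read_up_to(FILE *fp, unsigned char *buf, size_t maxLen,
                  int delim, int *found)
{
    assert(fp != NULL && buf != NULL && found != NULL);

    size_t got = fread(buf, 1, maxLen, fp);
    *found = 0;
    for (size_t i = 0; i < got; i++) {
        if (buf[i] == (unsigned char)delim) {
            long unused = (long)(got - (i + 1));   /* bytes read past it */
            if (unused > 0)
                fseek(fp, -unused, SEEK_CUR);      /* give them back */
            *found = 1;
            return i;                              /* delimiter stripped */
        }
    }
    return got;   /* no delimiter in this chunk, or end of file */
}
```

Whether handing bytes back costs another disk access depends on how the stream is buffered, which is exactly the point Ondra raises; this sketch does not settle that question.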
One problem with all of this is that the method is named "orUpToDelimiter",
so I would expect that the result strips the delimiter (and this is how I
coded it in the previous email), but [a] this might not be desirable
behavior in all cases, and [b] in the discussion of DOS-formatted files,
above, we would end up returning an empty data object in between valid
lines. This seems very bad, because the user would not be able to tell if
there was additional valid data past that point (which would destroy the
ability to put the call into a while loop, for example).
For these two reasons, I propose that the behavior be that the delimiter is
not stripped, and since it would not work to leave it for the next line
(which would block all reading!) I think the method should be renamed
"orUpThroughDelimiter".
For the curious, I'll post the revised copy of the code here:
<http://homepage.mac.com/bobsavage/Programming/delimiting/index.html>, but I
won't be able to do any testing tonight. Hey, that's why we invented
Sundays!
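A minimal sketch of the proposed "orUpThroughDelimiter" behavior, again in plain C stdio rather than NSFileHandle, with hypothetical names (`read_up_through`, `count_lines`). Because the delimiter stays in the result, a blank line comes back as a one-byte "\n" and only end of file yields an empty result, so the call fits in a while loop.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical "up through delimiter" reader: copy bytes from fp into buf
 * until delim has been copied, cap-1 bytes are stored, or end of file.
 * The delimiter is KEPT in the result, so a blank line comes back as "\n"
 * (length 1) and a return value of 0 can only mean end of file. */
size_t read_up_through(FILE *fp, char *buf, size_t cap, int delim)
{
    assert(fp != NULL && buf != NULL && cap > 1);

    size_t n = 0;
    int c;
    while (n + 1 < cap && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == delim) break;          /* delimiter included, then stop */
    }
    buf[n] = '\0';
    return n;
}

/* The while loop this behavior makes possible. Assumes no line exceeds
 * 255 bytes; a longer line would be counted once per 255-byte piece. */
size_t count_lines(FILE *fp)
{
    char line[256];
    size_t count = 0;
    while (read_up_through(fp, line, sizeof line, '\n') > 0)
        count++;
    return count;
}
```

For a file containing "a\n\nb", successive calls return "a\n", "\n", "b", and then an empty result, so the blank middle line is never confused with the end of the file.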
Bob