Re: reading in text files
- Subject: Re: reading in text files
- From: Bob Savage <email@hidden>
- Date: Sun, 03 Feb 2002 23:11:18 -0600
on 2/3/02 7:57 AM, Ondra Cada wrote:
> If we came with something really useable (which I, to be honest,
> somewhat doubt
Hey! What kind of attitude is that? =)
> -- at the very least, I've never needed such a NSFileHandle method yet
> myself)
Nor have I; loading an entire file into an NSString works fine for me, but I
never work with large files, which is what the question presumed. At least
after this there will be a straightforward answer in case someone wants one.
So far my experience working with this problem is that it is not quite as
simple as some people would have me believe. :\ Consider the answers to
these questions:
Q: How do I programmatically add an item to a popup menu?
Q: How do I split up a string of comma-delimited values into an array of
strings?
Q: How do I create a string with the date 1 week from today?
These (and oh, so many more) questions are answered using one line of Cocoa
code. *THAT* to me is "easy". Apple provides a "construction kit" which can
be used to assemble more complex projects quickly and easily. I do not think
that reading a file line by line is an unreasonable task, but the answer for
how to do that in Cocoa seems to rise above the threshold of "easy" as
evinced by the above questions. 3 lines of code, okay fine ... 5 lines of
code? That seems ridiculous when Python can do: x = f.readline() -- and yet
we're up to 30 lines for the method, and the method call itself returns
NSData, which needs to be converted to an NSString by the user. I'm not
expecting ObjC to rise to the level of Python in this regard, but 30:1??
> it would be probably better to put it into MiscKit (if it is still
> maintained) or so...
Even if it just ends up archived in the list archive, it will at least be
available to people.
> NSData dd=...,ee=...;
> ...delimiters:[NSArray arrayWithObjects:dd,ee,@"\r\n",nil]...
>
> would search for three delimiters, the first one contained in the dd data,
> the second in the ee data, and third being directly "\r\n". Also,
>
> ...delimiters:dd...
>
> would search for just one delimiter contained in the dd data, and
>
> ...delimiters:@"\r"...
>
> would search just for one delimiter, the CR byte.
Okay, if someone wants to implement that and post it, they are more than
welcome to do so. :^)
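
For anyone who does want to try it, the "one delimiter, or a string, or an array of either" idea quoted above can be sketched roughly as follows. This is in Python purely to illustrate the logic; it is not Cocoa code and not what the category does. The names normalize_delimiters and find_first_delimiter are made up for this sketch, and it assumes string delimiters are plain ASCII.

```python
def normalize_delimiters(delimiters):
    """Accept bytes, a str, or a list of either; return a list of bytes delimiters."""
    if isinstance(delimiters, (bytes, bytearray, str)):
        delimiters = [delimiters]
    result = []
    for d in delimiters:
        if isinstance(d, str):
            d = d.encode("ascii")  # assumption: string delimiters are plain ASCII
        d = bytes(d)
        if not d:
            raise ValueError("empty delimiter")
        result.append(d)
    return result


def find_first_delimiter(data, delimiters):
    """Return (index, delimiter) for the earliest match in data, or None if none occurs."""
    best = None
    for d in normalize_delimiters(delimiters):
        i = data.find(d)
        if i == -1:
            continue
        # Prefer the earliest match; on a tie prefer the longer delimiter,
        # so that "\r\n" wins over a bare "\r" at the same position.
        if best is None or i < best[0] or (i == best[0] and len(d) > len(best[1])):
            best = (i, d)
    return best
```

The tie-break toward the longer delimiter is a design choice of this sketch, not something settled in the thread.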
> The simplest heuristic is to ignore all CRs which are in one stream with at
> least one LF. CRs which don't have a LF neighbour are to be interpreted as
> EOL so as to support Classic.
Well, that certainly doesn't fit with the model you provided earlier (where
the data was to be searched for one of an array of arbitrary delimiter
bytes). At that point logic about CRs and LFs can't be built in. The
appropriate thing would be to assume we were working with strings to begin
with, not ask for delimiters at all, and handle all known cases of line
ending characters (and combinations).
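
To make the string-based alternative concrete, here is a minimal sketch in Python (for illustration only, not Cocoa) that treats CRLF, a lone CR, and a lone LF each as one line ending. The function name split_lines is hypothetical, and this is a simpler rule than the "stream of CRs and LFs" heuristic quoted above: a CR immediately followed by an LF counts as a single ending, and anything else is taken one character at a time.

```python
import re

# CRLF must come first in the alternation so it is matched as one ending.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text on DOS (CRLF), Classic Mac (CR), and Unix (LF) line endings."""
    lines = _EOL.split(text)
    # A trailing line ending produces one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Under this rule two endings in a row still give an empty line, while a DOS file no longer produces a spurious empty chunk between every pair of lines.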
> I don't like the idea of trashing the already read
> buffer though. That's (one of reasons) why I don't think NSFileHandle is the
> best level for such API.
Or why it should be implemented by the good folks at Apple who have access
to the internals of NSFileHandle and can manipulate the buffer more
elegantly :>
> The problem is that the "unused" buffer should be retained in-memory, and
> when the next read comes, it should be used first, before the method tries to
> read more bytes from the file, very generally and imprecisely of pattern
But if we are managing an additional buffer we can't be providing this
method as a category on NSFileHandle. Again, if this were to be desired, it
should be implemented properly by the folks at Apple.
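
For the record, the retained-buffer approach described in the quote looks roughly like this when it lives in a separate wrapper object rather than in a category. This is a Python sketch of the idea only; the class name DelimitedReader, the method read_up_through, and the chunk_size parameter are all invented here, and nothing below reflects NSFileHandle internals.

```python
import io


class DelimitedReader:
    """Wrap a binary stream and keep unread bytes between calls."""

    def __init__(self, stream, chunk_size=4096):
        self._stream = stream
        self._chunk_size = chunk_size
        self._pending = b""  # bytes already read from the stream but not yet returned

    def read_up_through(self, delimiter):
        """Return the bytes before the next delimiter, or None at end of file."""
        while True:
            # Use the retained buffer first, before touching the stream again.
            index = self._pending.find(delimiter)
            if index != -1:
                chunk = self._pending[:index]
                self._pending = self._pending[index + len(delimiter):]
                return chunk
            more = self._stream.read(self._chunk_size)
            if not more:
                # End of file: return any trailing bytes once, then None.
                if self._pending:
                    chunk, self._pending = self._pending, b""
                    return chunk
                return None
            self._pending += more
```

Because the whole pending buffer is searched after every read, a delimiter that straddles two chunks is still found. The cost is exactly the extra state discussed above: the wrapper, not the file handle, knows where the logical read position is.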
> BS> "orUpToDelimiter" so I would expect that the result strips the delimiter
> BS> (and this is how I coded it in the previous email), but [A] this might
> BS> not be desirable behavior in all cases,
>
> We can add flag for that, if you feel like. I just presumed that it can be
> added to the data in those IMHO *VERY* rare cases you do want to have it
> there.
I agree that users are most likely to want the delimiter stripped, but I
think naming the method "orUpThroughDelimiter" is a clearer description of
what condition the file cursor is left in, so I am keeping that name, but
adding a flag, "stripping:(BOOL)strip" so people can get the desired effect.
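
The distinction is easy to show on an in-memory buffer. In this Python sketch (illustration only; take_through_delimiter is a made-up name, not the category's method) the read always consumes up through the delimiter, and the strip flag only controls whether the delimiter appears in what is returned.

```python
def take_through_delimiter(data, delimiter, strip=True):
    """Return (chunk, rest): chunk ends at the first delimiter, rest starts after it."""
    index = data.find(delimiter)
    if index == -1:
        # No delimiter: everything is the chunk and nothing remains.
        return data, b""
    end = index + len(delimiter)
    chunk = data[:index] if strip else data[:end]
    # Either way the "cursor" ends up past the delimiter.
    return chunk, data[end:]
```

With b"one\ntwo\n" and a b"\n" delimiter, stripping gives the chunk b"one" and not stripping gives b"one\n"; the remainder is b"two\n" in both cases.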
> BS> [b] in the discussion of DOS-formatted files, above, we would end up
> returning an empty data object in between valid lines.
>
> - non-empty data: bytes read in;
> - empty data (non-nil): empty chunk read in, ie., there were two delims in
> succession;
> - no data (nil): EOF;
> - no return at all (exception raised): error.
This looks good. I'll note this in the documentation.
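
One consequence of this convention for callers is that the loop has to test for "no data" explicitly rather than for "empty data", or it will stop at the first empty chunk. A small Python sketch of such a loop, with None standing in for nil (collect_chunks and read_chunk are hypothetical names used only here):

```python
def collect_chunks(read_chunk):
    """Call read_chunk() until it returns None (EOF) and return every chunk seen."""
    chunks = []
    while True:
        data = read_chunk()  # errors are raised by read_chunk and not caught here
        if data is None:     # no data: end of file
            break
        chunks.append(data)  # b"" is a valid, empty chunk and must be kept
    return chunks


# Stand-in reader for demonstration: yields two chunks around an empty one, then EOF.
fake_reader = iter([b"first", b"", b"second", None]).__next__
all_chunks = collect_chunks(fake_reader)
```

Writing the test as "if not data" instead would treat the empty chunk as end of file and silently drop everything after it.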
Later that same day:
I made the changes and did some tests. Everything looks good except that the
end of the file is adding an extra return that does not get trapped as a
delimiter (notice the hanging apostrophe in the "Second line (by reading one
line)" sample output below).
-- SAMPLE OUTPUT --
* First line (the easy way):
'This is a simple test/example of the NSFileHandle(delimiting) category,
which is enclosed here as two files (NSFileHandle(delimiting).h, and
NSFileHandle(delimiting).m).'
* First line (by reading one line):
'This is a simple test/example of the NSFileHandle(delimiting) category,
which is enclosed here as two files (NSFileHandle(delimiting).h, and
NSFileHandle(delimiting).m).'
* Second line (the easy way):
'This distribution also includes some very minimal documentation file in
HTML format, and this README.'
* Second line (by reading one line):
'This distribution also includes some very minimal documentation file in
HTML format, and this README.
'
delimit-test has exited with status 0.
In case anyone is interested, the `final` form of NSFileHandle(delimiting)
can be found here:
<http://homepage.mac.com/bobsavage/Programming/delimiting/index.html>
Bob