Re: reading in text files
- Subject: Re: reading in text files
- From: Bob Savage <email@hidden>
- Date: Mon, 04 Feb 2002 10:37:18 -0600
on 2/4/02 9:11 AM, Ondra Cada wrote:
> BS> Nor have I, loading an entire file into an NSString works fine for me,
> BS> but I never work with large files, which is what the question presumed.
>
> That's it. Since I did neither, I have no feel for potential problems and
> gotchas, and thus am somewhat reluctant to make code for it, and if we (well,
> *you* -- I'm just talking, you're programming ;) make some, I fear it would
> prove problematic in real-life situation.
Well, people can take what they get. Initial tests seem to indicate that it
works properly. It has a Cocoa feel to it instead of mucking about in C
(category on an existing class, instance method instead of a series of
function calls), and the amount of code people will need to write has been
reduced by about 15:1. I clearly state in the header that they should be
careful, and it only makes sense that if someone wants to use it in
production they do additional testing, and alter the code as they see fit.
Which is all to say that I'm satisfied that there is a Cocoa way of reading
a file line by line, available to people searching the archives:
<http://homepage.mac.com/bobsavage/Programming/delimiting/index.html>
> BS> That seems ridiculous when Python can do:
> BS> x = f.readline()
>
> What would happen if the file is 16GB long, and the first EOL character is
> three bytes before end?
>
> How would it cope with different line delimiters?
It handles the 3 standard line endings with no problems, but I don't know
about a 16GB paragraph. My guess is that you should not try to read a file
like that line by line. :) In the case of NSFileHandle(delimiting), it would
be read in smaller chunks because of the size limit built in to every call.
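For anyone skimming the archives, here is a rough sketch of the idea, written in Python rather than Objective-C to keep it short. It is not the code of the NSFileHandle(delimiting) category; the name read_line and the limit parameter are made up for illustration. It treats LF, CR and CRLF as line endings and never hands back more than limit bytes per call, so an over-long line simply comes back in pieces.

```python
import io

def read_line(f, limit=4096):
    """Read one line from a seekable binary file (hypothetical helper).

    Returns the line without its terminator, a partial line of `limit`
    bytes if no line ending was found within the limit, or None at
    end of file.
    """
    start = f.tell()
    chunk = f.read(limit + 1)        # one spare byte to spot the LF of a CRLF at the limit
    if not chunk:
        return None                  # end of file
    window = chunk[:limit]           # only this part may be returned
    hits = [i for i in (window.find(b"\r"), window.find(b"\n")) if i != -1]
    if not hits:
        f.seek(start + len(window))  # no line ending in range: return a partial line
        return window
    cut = min(hits)                  # earliest CR or LF
    end = cut + 2 if chunk[cut:cut + 2] == b"\r\n" else cut + 1
    f.seek(start + end)              # leave the file position just past the line ending
    return window[:cut]

# Usage: one file containing all three line-ending styles.
f = io.BytesIO(b"unix\nmac\rdos\r\nlast")
lines = []
line = read_line(f)
while line is not None:
    lines.append(line)
    line = read_line(f)
# lines == [b"unix", b"mac", b"dos", b"last"]
```

With a limit like this, a 16GB "line" never has to sit in memory at once; the caller just sees a long run of limit-sized pieces and has to decide what to do with them.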
> Are we *that sure* the user wants NSStrings? If so, it would be *MUCH*
> easier to do that on some upper level, eg. preparing something like
I am sure that one very good reason to try to read a file up through a
delimiting byte is to read it line by line. This is very commonly done for
many reasons (I still use awk because it makes it so dang easy). On the
other hand, the way we did it has some added flexibility.
> BS> I'm not expecting ObjC to rise to the level of Python in this regard,
>
> Actually, this is not ObjC thing, but a Foundation one: libraries, not
> language.
You are right, of course. In fact it isn't even a matter of the Python
language, because, IIRC, the `file` class is (or is in) a module that is
statically linked to the interpreter.
> Right! OTOH, it is probably the most often needed variant. So what? Should
> we clutter NSFileHandle by heuristics which is not entitled to be there
> (since it copes with EOLs, and on the NSFileHandle level there is no EOL
> defined)? Or should we go the way of enumerator designed above? Or should we
> trash the most useable behaviour?
Okay, let's make a deal. If someone decides that they don't like the example
of reading a file up to a delimiter using our little category, they are free
to code their own :)
> BS> >The problem is that the "unused" buffer should be retained in-memory,
> BS> >and when the next read comes, it should be used first, before the method
> BS> >tries to read more bytes from the file, very generally and imprecisely
> BS> >of pattern
> BS>
> BS> But if we are managing an additional buffer we can't be providing this
> BS> method as a category on NSFileHandle.
>
> Actually we can, but it would be impractical (a category _CAN_ have de-facto
> properties, accessed by a somewhat clumsy way of a static dictionary
> containing properties keyed by the object id).
I still have misgivings about keeping a second buffer that is not truly
related to the first. Without adding this second buffer a person can use an
NSFileHandle (with our category) in the following way:
(1) create filehandle
(2) read in a header knowing that it could be multiple lines long, but will
end at the first newline character that is not immediately preceded by a "/"
(3) read in a series of values separated by some delimiting character
(4) read in (or skip over) a chunk of data that is n bytes long.
(5) start reading line-by-line (or colon by colon, etc.)
If we use a second buffer just for the delimiting reads, the file cursor
(provided by NSFileHandle, not our category) won't be in the same place as
the buffer position (provided by our category) when moving from step 3 to 4
above.
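To make the point concrete, here is a minimal sketch of steps 1 through 5, in Python rather than Objective-C. The helper read_until and the sample data are invented for illustration and are not the category's code. The header in step 2 is simplified to a single line, and a multi-byte delimiter that straddles the limit is not handled. The helper over-reads one chunk and then moves the file position back to just after the delimiter, so a plain fixed-size read can follow a delimited read without losing its place.

```python
import io

def read_until(f, delim, limit=4096):
    """Return bytes up to (not including) `delim`; hypothetical helper.

    Keeps no state between calls: the only cursor is the file's own position.
    """
    start = f.tell()
    chunk = f.read(limit)
    cut = chunk.find(delim)
    if cut == -1:
        return chunk                      # delimiter not found within the limit
    f.seek(start + cut + len(delim))      # step back over the bytes we over-read
    return chunk[:cut]

data = b"title: demo\nred:green:blue:\x00\x01\x02\x03line one\nline two\n"
f = io.BytesIO(data)                                  # (1) create the file handle
header = read_until(f, b"\n")                         # (2) read a header
values = [read_until(f, b":") for _ in range(3)]      # (3) delimited values
blob = f.read(4)                                      # (4) a chunk that is n bytes long
lines = [read_until(f, b"\n"), read_until(f, b"\n")]  # (5) line by line
```

The price is an extra seek on nearly every call, which is exactly the performance trade-off mentioned here.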
Again, if people don't like the performance, due to unnecessary manipulation
of the file position, I promise to not get upset if they write their own
code that is optimized for their circumstances.
Bob
/*" Hey, it's the Internet, no one said it had to be useful! "*/