Re: Reading .doc files
- Subject: Re: Reading .doc files
- From: "Alastair J.Houghton" <email@hidden>
- Date: Mon, 29 Sep 2003 11:19:23 +0100
On Monday, September 29, 2003, at 10:58 am, Alastair J.Houghton wrote:
On Monday, September 29, 2003, at 10:40 am, Lorenzo wrote:
Hi list,
I am trying to read Microsoft Word .doc files as pure text.
I already know there are utilities that do that, but I want to do it
myself, programmatically.
When the header has a fixed size I can easily find the start of the
pure text. But when the offset of the first character of the pure text
is variable, I cannot locate that point even using the "File
Information Block" (FIB).
I tried to read "fib.fcMin" from the FIB (according to the
documentation it should hold the offset of the first character in the
file), but this UInt32 variable returns a value that is not the offset
of the pure text. Sometimes it is even greater than the whole file size.
Does anyone know what I am doing wrong?
Just to add another point that I thought of (and sent in a somewhat
longer e-mail direct to Lorenzo), the structure packing is probably
wrong as well. When using structures to represent binary data stored in
a file, you need to make sure that the offsets of the fields match the
positions in the file to which they are supposed to correspond; by
default the compiler may insert padding between fields to align them.
I suspect you also need to use
#pragma pack(push, <some value>)
before your structure and
#pragma pack(pop)
after it. You should be able to work out what <some value> should be
from the MS docs.
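To illustrate the packing point, here is a minimal sketch. The header
below is hypothetical and is NOT the real FIB layout; the names
UnpackedHeader, PackedHeader, textOffset and read_text_offset are made
up for the example, and the pack value of 1 is an assumption that you
would need to check against the MS docs. It only shows that the pragma
pins a 4-byte field to file offset 2, where default alignment would
usually move it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint16_t, uint32_t */
#include <string.h>   /* memcpy */

/* Hypothetical on-disk header: a 2-byte field followed directly by a
   4-byte field. Without packing, most compilers insert 2 bytes of
   padding so that textOffset is 4-byte aligned. */
struct UnpackedHeader {
    uint16_t ident;
    uint32_t textOffset;   /* typically lands at offset 4, not 2 */
};

/* Same fields, but with padding disabled so the in-memory layout
   matches the (assumed) layout in the file. */
#pragma pack(push, 1)
struct PackedHeader {
    uint16_t ident;        /* file offset 0 */
    uint32_t textOffset;   /* file offset 2 */
};
#pragma pack(pop)

/* Fail at compile time if the layout is not what the file expects. */
static_assert(offsetof(struct PackedHeader, textOffset) == 2,
              "textOffset must sit at file offset 2");
static_assert(sizeof(struct PackedHeader) == 6,
              "packed header must be exactly 6 bytes");

/* Copy the first bytes of the file into the packed structure and
   return the 4-byte field, interpreted in the host's byte order. */
static uint32_t read_text_offset(const unsigned char *bytes)
{
    struct PackedHeader h;
    memcpy(&h, bytes, sizeof h);
    return h.textOffset;
}
```

Given a buffer holding the first 6 bytes of such a file,
read_text_offset(buffer) returns the field stored at byte 2. Checking
offsetof() for each field against the documented file offsets is a quick
way to confirm that <some value> is right.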
Kind regards,
Alastair.