Re: Reading .doc files
- Subject: Re: Reading .doc files
- From: "Alastair J.Houghton" <email@hidden>
- Date: Mon, 29 Sep 2003 10:58:50 +0100
On Monday, September 29, 2003, at 10:40 am, Lorenzo wrote:
Hi list,
I am trying to read Microsoft Word .doc files as pure text.
I already know there are utilities that do that, but I want to do it
myself, programmatically.
When the header is a fixed size I can easily find the start of the pure
text. But when the offset of the first character of the pure text is
variable, I cannot locate that point, even using the "File Information
Block" (FIB).
I tried to read "fib.fcMin" from the FIB (according to the
documentation it should hold the offset of the first character in the
file), but this UInt32 variable returns a value that is not the offset
of the pure text. Sometimes it's even greater than the whole file size.
Does anyone know what I am doing wrong?
Yep. Microsoft Word was developed on PCs, which use x86
microprocessors. The x86 is little-endian, so Microsoft Word's file
format uses little-endian numbers. On a big-endian machine such as a
PowerPC Mac, you need to byte swap the number you get back; there is a
set of functions in the Foundation kit that will do this for you, e.g.
NSSwapLittleIntToHost().
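
As a rough illustration of why the value looks wrong: the four bytes
00 04 00 00 mean 1024 when read as a little-endian number, but a
big-endian host reading them directly sees 262144. The sketch below is
plain C rather than Foundation, and assembles the value byte by byte so
that it gives the same answer on any host. The names read_le32 and
read_le32_at are made up for this example, and the offset of any given
FIB field is left as a parameter, since it should be taken from the
format documentation rather than from this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four consecutive bytes.
   Because it works on individual bytes, the result does not depend on
   the byte order of the host CPU. (Hypothetical helper name.) */
uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read a little-endian 32-bit field at `offset` inside a buffer that
   holds `len` bytes of the file. Returns 0 on success, or -1 if the
   field would run past the end of the buffer. (Hypothetical helper.) */
int read_le32_at(const unsigned char *buf, size_t len,
                 size_t offset, uint32_t *out)
{
    if (len < 4 || offset > len - 4)
        return -1;
    *out = read_le32(buf + offset);
    return 0;
}
```

Once a field such as fib.fcMin has been decoded this way, it is worth
checking that it is smaller than the file size before seeking to it; a
value beyond the end of the file usually points to a byte-order or
offset mistake.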
By the way, the Word binary file format is quite complicated,
especially if you plan on supporting all of the different (and
incompatible) versions.
Kind regards,
Alastair.
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.