Re: am i loading this pdf data correctly or not?
- Subject: Re: am i loading this pdf data correctly or not?
- From: "Alastair J.Houghton" <email@hidden>
- Date: Thu, 7 Aug 2003 13:23:55 +0100
On Thursday, August 7, 2003, at 12:42 pm, Marcel Weiher wrote:
> > According to the PDF Reference Manual (which you can get for free
> > from Adobe's web site as a PDF), PDF is indeed 7-bit ASCII, apart
> > from:
> >  o Inside strings (things delimited by open and close brackets).
> >  o Inside streams (delimited by "stream" and "endstream").
> > So if you read it token by token, you could indeed treat it as
> > ASCII apart from those two special cases. (Be careful, though:
> > strings can contain nested brackets, which makes them a little tricky.)
>
> The only problem is that those two special cases are always present,
> and prevent you from actually parsing the rest, because you have no
> safe way of delimiting them.
I don't think that's true... as far as I can see (looking at sections
3.2.3 and 3.2.7 of the PDF 1.4 spec), the binary sections are suitably
delimited; I haven't tried writing code to parse them, so maybe I'm
missing something, but I can't see any obvious problem.
Strings aren't allowed to contain unescaped, unbalanced brackets (so
you can write "(\()", "(\))" or "(())", but not "(()" or "())"), which
makes it pretty easy to find the end with a simple scanner that skips
backslash escapes and keeps a count of the nesting depth.
Streams have a (mandatory) Length key in their stream dictionary that
tells you how many bytes of *encoded* data follow the "stream" keyword
and its end-of-line marker, so you just read that many, then check that
you are pointing at something that looks like "endstream", possibly
preceded by an end-of-line marker, and carry on parsing.
It also occurs to me that Ben said he only wanted an overview of the
file, in which case it might be best to read just the end of the file:
the "startxref" line there gives the byte offset of the cross-reference
table, which you can then use to jump straight to individual objects
without having to parse the rest.
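A small Python sketch of the first step, finding that offset. The name find_xref_offset and the 1024-byte search window are assumptions made for illustration. It takes the last "startxref" near the end of the file, which is the one that counts if the file has had incremental updates appended.

```python
def find_xref_offset(data: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset written after the last 'startxref' keyword.

    Sketch only: searches the final `tail_size` bytes (the whole file if it
    is shorter) and parses the first whitespace-separated token after it.
    """
    tail = data[-tail_size:]
    k = tail.rfind(b"startxref")
    if k < 0:
        raise ValueError("'startxref' not found near end of file")
    fields = tail[k + len(b"startxref"):].split()
    if not fields:
        raise ValueError("no offset after 'startxref'")
    return int(fields[0])
```

With a classic PDF 1.4 file, data[offset:] should then start with "xref". In PDF 1.5 and later the offset may instead point at a cross-reference stream object, which would need the stream-reading logic above.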
Kind regards,
Alastair.
_______________________________________________
cocoa-dev mailing list | email@hidden