Re: am i loading this pdf data correctly or not?
- Subject: Re: am i loading this pdf data correctly or not?
- From: Marcel Weiher <email@hidden>
- Date: Thu, 7 Aug 2003 14:20:52 +0100
[parsing pdf]
The only problem is that those two special cases are always present,
and prevent you from actually parsing the rest, because you have no
safe way of delimiting them.
I don't think that's true... as far as I can see (looking at sections
3.2.3 and 3.2.7 of the PDF 1.4 spec), the binary sections are suitably
delimited; I haven't tried writing code to parse them, so maybe I'm
missing something, but I can't see any obvious problem.
The problem with that is that you cannot guarantee in any way that the
delimiters will not appear inside the binary section.
Strings aren't allowed to contain non-escaped unbalanced sets of
brackets (so you can write "(\()", "(\))" or "(())", but not "(()" or
"())"), which makes it pretty easy to find the end using a simple FSM.
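A minimal sketch of that scan in Python, assuming `start` points at the opening `(` of a literal string (the function name `find_literal_string_end` is hypothetical, not from any library). It skips the byte after each backslash and counts nesting depth, so the string ends at the `)` that brings the depth back to zero. Strictly, the depth counter makes this slightly more than a pure FSM, but it is still a single linear pass.

```python
def find_literal_string_end(data: bytes, start: int) -> int:
    """Return the index just past the ')' that closes the PDF literal
    string whose opening '(' is at data[start]."""
    if data[start:start + 1] != b"(":
        raise ValueError("expected '(' at start offset")
    depth = 0
    i = start
    while i < len(data):
        c = data[i]
        if c == 0x5C:        # backslash: the next byte is escaped, skip it
            i += 2
            continue
        if c == 0x28:        # unescaped '(' opens a nested, balanced pair
            depth += 1
        elif c == 0x29:      # unescaped ')' closes one level
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated literal string")
```

For example, `find_literal_string_end(b"(Hello (world)) Tj", 0)` returns 15, the offset just past the outer `)`.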
PDF streams can contain (for example) arbitrary files that are embedded
in the PDF.
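A small Python illustration of why a naive delimiter scan fails, assuming a hypothetical embedded payload whose own bytes happen to contain the keyword `endstream`. Searching for the first `endstream` after the data stops inside the payload, while slicing by the dictionary's `/Length` gives the right boundary.

```python
def naive_stream_end(pdf: bytes, data_start: int) -> int:
    """Naive approach: assume the stream data ends at the first
    'endstream' keyword found after the start of the data."""
    return pdf.find(b"endstream", data_start)

# Hypothetical embedded file whose own bytes contain "endstream".
payload = b"\x00\x01binary junk endstream more junk\xff\xfe"
pdf = (b"5 0 obj\n<< /Length %d >>\nstream\n" % len(payload)
       + payload
       + b"\nendstream\nendobj\n")

data_start = pdf.index(b"stream\n") + len(b"stream\n")
true_end = data_start + len(payload)        # what /Length tells you
guessed_end = naive_stream_end(pdf, data_start)
print(guessed_end < true_end)  # True: the naive scan stops inside the payload
```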
Streams have a (mandatory) Length key in their stream dictionary.
Yes, this seems like a way out. Alas, it doesn't actually help you
when you're scanning the file sequentially, because the stream
dictionary can have an indirect reference to its length, with the
length following after the actual stream data. This is actually quite
common, because it allows the stream to be written on-the-fly.
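A rough sketch of the check a sequential parser would have to make, assuming the stream dictionary's bytes have already been isolated (the helper name and the regex approach are assumptions; a real parser would tokenize the dictionary rather than pattern-match it). A direct value such as `/Length 1234` lets you slice the data immediately; an indirect reference such as `/Length 12 0 R` cannot be resolved until you have located object 12, which may only appear after the stream.

```python
import re

# Matches "/Length 1234" (direct) or "/Length 12 0 R" (indirect reference).
# The \s+ after "Length" keeps it from matching font keys like /Length1.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)(?:\s+(\d+)\s+R)?")

def stream_length(dict_bytes: bytes):
    """Return ('direct', n), ('indirect', (obj_num, gen)), or None."""
    m = LENGTH_RE.search(dict_bytes)
    if m is None:
        return None
    if m.group(2) is not None:
        # Indirect: the actual number lives in another object, possibly
        # written after this stream, so a sequential scan can't use it yet.
        return ("indirect", (int(m.group(1)), int(m.group(2))))
    return ("direct", int(m.group(1)))
```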
It also occurs to me that Ben said he only wanted an overview of the
file. In that case it might be best to just read the very end of the
file, which tells you where the cross-reference table is; you can then
use the table to skip around easily without having to parse the rest.
Yes, you essentially have to go through the cross-reference table,
which then delivers the rest of the file structure.
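A sketch of that approach in Python, assuming a classic PDF 1.4 style `xref` table. It does not handle the cross-reference streams introduced in PDF 1.5, nor follow the trailer's `/Prev` chain left by incremental updates. `find_startxref` and `parse_xref_table` are hypothetical helper names, and the 1024-byte tail window is an arbitrary choice.

```python
import re

HEADER_RE = re.compile(rb"\s*(\d+)\s+(\d+)")              # "first count"
ENTRY_RE = re.compile(rb"\s*(\d{10})\s+(\d{5})\s+([nf])")  # one xref entry

def find_startxref(data: bytes, tail: int = 1024) -> int:
    """Return the byte offset recorded after the last 'startxref' near EOF."""
    window = data[-tail:]
    pos = window.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref found near end of file")
    m = re.match(rb"startxref\s+(\d+)", window[pos:])
    if m is None:
        raise ValueError("malformed startxref")
    return int(m.group(1))

def parse_xref_table(data: bytes, offset: int) -> dict:
    """Map object number -> (byte offset, generation) for in-use entries."""
    if not data.startswith(b"xref", offset):
        raise ValueError("no classic xref table at offset")
    pos = offset + len(b"xref")
    entries = {}
    while True:
        header = HEADER_RE.match(data, pos)
        if header is None:            # no more subsections: 'trailer' follows
            break
        first, count = int(header.group(1)), int(header.group(2))
        pos = header.end()
        for obj_num in range(first, first + count):
            entry = ENTRY_RE.match(data, pos)
            if entry is None:
                raise ValueError("malformed xref entry")
            if entry.group(3) == b"n":    # 'f' entries are free, skip them
                entries[obj_num] = (int(entry.group(1)), int(entry.group(2)))
            pos = entry.end()
    return entries
```

Calling `parse_xref_table(data, find_startxref(data))` on a file's bytes gives a map from object number to byte offset, which is what you need to jump straight to, for instance, the object holding an indirect `/Length`.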
Marcel
--
Marcel Weiher Metaobject Software Technologies
email@hidden www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.