Re: am i loading this pdf data correctly or not?
- Subject: Re: am i loading this pdf data correctly or not?
- From: Marcel Weiher <email@hidden>
- Date: Thu, 7 Aug 2003 12:42:34 +0100
I realised that the streams in their raw form were not usable as
they were, but I didn't realise they would cause outright problems.
Other than the streams, PDFs are ASCII, I think.
According to the PDF Reference Manual (which you can get for free from
Adobe's web site as a PDF), PDF is indeed 7-bit ASCII, apart from:
o Inside strings (things delimited by open and close parentheses).
o Inside streams (delimited by "stream" and "endstream").
So if you read it in token by token, you could indeed treat it as
ASCII apart from those two special cases. (Be careful though, there
is a bit of trickiness with strings and nested brackets.)
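The trickiness with nested brackets mentioned above might look roughly like
this in Python. This is only a sketch: the function name
read_literal_string is made up for illustration, and it assumes the usual
literal-string rules (balanced parentheses may appear unescaped, and a
backslash escapes the byte that follows it). It does not decode the escapes,
it only finds where the string ends.

```python
from typing import Tuple


def read_literal_string(data: bytes, start: int) -> Tuple[bytes, int]:
    """Return (raw contents, index just past the closing parenthesis).

    Hypothetical helper: data[start] must be the opening '('.
    """
    if data[start:start + 1] != b"(":
        raise ValueError("expected '(' at start")
    depth = 0
    i = start
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            # Skip the backslash and the byte it escapes, so an escaped
            # parenthesis does not change the nesting depth.
            i += 2
            continue
        if c == b"(":
            depth += 1
        elif c == b")":
            depth -= 1
            if depth == 0:
                return data[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated string")


contents, next_index = read_literal_string(b"(a (nested) b) /Next", 0)
```

Because the scan works on bytes rather than decoded text, 8-bit data inside
the string passes through untouched.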
The only problem is that those two special cases are always present,
and prevent you from actually parsing the rest, because you have no
safe way of delimiting them.
I have also noticed that some files have 8-bit characters (and even
binary data) in comments, so although my quick skim of the reference
manual just now didn't reveal any obvious statement that that was
permissible, I'd take it as read that you need to support them in
comments as well.
Those characters get put in so file transfer programs won't treat the
file as ASCII and thus munge it.
Why use regular expressions? That seems a bit strange given that PDF
is based on PostScript, which is based on Forth, and Forth/PostScript
is pretty easy to parse properly. When you encounter a token that
introduces a string or a stream, keep reading until you find the end,
then continue reading tokens again.
The problem is that you can't reliably detect the end.
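A small Python sketch of why a plain scan for the end marker is not safe for
streams. The payload here is invented for illustration, and naive_stream_end
is a hypothetical helper, not anything from a real PDF library. Stream data
is arbitrary binary, so nothing stops it from containing the bytes
"endstream" itself.

```python
def naive_stream_end(data: bytes, data_start: int) -> int:
    """Hypothetical helper: offset of the first 'endstream' at or after data_start."""
    return data.find(b"endstream", data_start)


# Invented payload that happens to contain the end marker as ordinary data.
payload = b"binary...endstream...more binary"
obj = b"stream\n" + payload + b"\nendstream"

data_start = len(b"stream\n")
true_end = data_start + len(payload) + 1   # offset of the real end marker
naive_end = naive_stream_end(obj, data_start)

# The naive scan stops inside the payload, well before the real end.
truncated = obj[data_start:naive_end]
```

Here the naive scan returns an offset inside the payload, so everything
after it would be misread as tokens.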
Marcel
--
Marcel Weiher Metaobject Software Technologies
email@hidden www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.
cocoa-dev mailing list | email@hidden