Re: am i loading this pdf data correctly or not?

  • Subject: Re: am i loading this pdf data correctly or not?
  • From: Marcel Weiher <email@hidden>
  • Date: Thu, 7 Aug 2003 12:42:34 +0100

>> I realised that the streams in their raw form were not usable as they were, but I didn't realise they would cause outright problems. Other than the streams, PDFs are ASCII, I think,

> According to the PDF Reference Manual (which you can get for free from Adobe's web site as a PDF), PDF is indeed 7-bit ASCII, apart from:
>
> o Inside strings (things delimited by open and close round brackets, i.e. parentheses).
>
> o Inside streams (delimited by "stream" and "endstream").
>
> So if you read it in token by token, you could indeed treat it as ASCII apart from those two special cases. (Be careful though, there is a bit of trickiness with strings and nested brackets.)
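
The trickiness with strings and nested brackets can be sketched roughly as follows. This is a minimal Python illustration, not code from the thread: the function name `read_literal_string` is hypothetical, and it assumes that literal strings use balanced parentheses and that a backslash escapes the byte after it, as the PDF Reference describes.

```python
from typing import Tuple


def read_literal_string(data: bytes, start: int) -> Tuple[bytes, int]:
    """Scan a PDF literal string that begins with '(' at data[start].

    Returns the raw bytes between the outer parentheses (escapes are
    left undecoded) and the index just past the closing ')'.
    """
    if data[start:start + 1] != b"(":
        raise ValueError("expected '(' at start of literal string")
    depth = 1
    i = start + 1
    while i < len(data):
        c = data[i]
        if c == 0x5C:        # backslash: the next byte is escaped, skip it
            i += 2
            continue
        if c == 0x28:        # '(' opens a nested level
            depth += 1
        elif c == 0x29:      # ')' closes a level
            depth -= 1
            if depth == 0:
                return data[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated literal string")
```

Calling `read_literal_string(b"(a (nested) \\) paren) /Next", 0)` returns the raw contents and the offset at which ordinary token reading would resume. The contents may hold arbitrary 8-bit bytes, which is why the scan works on `bytes` rather than decoded text.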

The only problem is that those two special cases are always present, and prevent you from actually parsing the rest, because you have no safe way of delimiting them.


> I have also noticed that some files have 8-bit characters (and even binary data) in comments, so although my quick skim of the reference manual just now didn't reveal any obvious statement that that was permissible, I'd take it as read that you need to support them in comments as well.

Those characters get put in so file transfer programs won't treat the file as ASCII and thus munge it.
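
As a rough illustration of that marker comment, the Python sketch below checks whether the line after the `%PDF-` header is a comment carrying high-bit bytes. The function name `has_binary_marker` is hypothetical, and the threshold of four bytes with values of 128 or greater is an assumption based on the PDF Reference's recommendation for the line following the header.

```python
def has_binary_marker(data: bytes) -> bool:
    """Return True if the line after the %PDF- header is a comment
    containing at least four bytes with the high bit set."""
    lines = data.splitlines()  # for bytes, splits on CR, LF and CRLF only
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    if not second.startswith(b"%"):
        return False
    high_bytes = sum(1 for b in second[1:] if b >= 128)
    return high_bytes >= 4
```

A reader that decodes the whole file as 7-bit ASCII before tokenising would fail on exactly this line, so comments have to be skipped at the byte level.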

> Why use regular expressions? That seems a bit strange given that PDF is based on PostScript, which is based on Forth, and Forth/PostScript is pretty easy to parse properly. When you encounter a token that introduces a string or a stream, keep reading until you find the end, then continue reading tokens again.

The problem is that you can't reliably detect the end.
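
For streams, the difficulty can be shown with a small Python sketch: binary stream data may itself contain the bytes "endstream", so scanning for the keyword can stop too early. The helper names below are hypothetical. The second function assumes the stream dictionary carries a direct integer Length entry; the PDF Reference also allows Length to be an indirect reference such as `12 0 R`, which this sketch deliberately refuses to handle because resolving it requires locating another object first.

```python
import re

# Matches "/Length 13" and also captures a trailing " 0 R" if the value
# is an indirect reference rather than a direct integer.
_LENGTH = re.compile(rb"/Length\s+(\d+)(\s+\d+\s+R)?")


def _data_start(obj: bytes, search_from: int) -> int:
    """Index of the first data byte after the 'stream' keyword and its EOL."""
    start = obj.index(b"stream", search_from) + len(b"stream")
    if obj[start:start + 2] == b"\r\n":
        return start + 2
    if obj[start:start + 1] == b"\n":
        return start + 1
    return start


def naive_stream_data(obj: bytes) -> bytes:
    """Scan forward for the first 'endstream'; wrong if the data contains it."""
    start = _data_start(obj, 0)
    return obj[start:obj.find(b"endstream", start)]


def stream_data_by_length(obj: bytes) -> bytes:
    """Use a direct /Length entry to slice out exactly that many bytes."""
    m = _LENGTH.search(obj)
    if m is None or m.group(2) is not None:
        raise ValueError("no direct /Length entry; cannot delimit the stream")
    start = _data_start(obj, m.end())
    return obj[start:start + int(m.group(1))]


# A made-up stream whose payload happens to contain the keyword bytes.
payload = b"\x00\x01endstream\xff\xfe"
obj = b"<< /Length %d >>\nstream\n" % len(payload) + payload + b"\nendstream\n"
```

With this made-up object, `naive_stream_data(obj)` returns only the first two bytes of the payload, while `stream_data_by_length(obj)` returns all of it. When Length is indirect, or the file is damaged, the reader is back to having no reliable way to find the end.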

Marcel


--
Marcel Weiher Metaobject Software Technologies
email@hidden www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.