Re: am i loading this pdf data correctly or not?

Subject: Re: am i loading this pdf data correctly or not?
From: "Alastair J.Houghton" <email@hidden>
Date: Thu, 7 Aug 2003 13:23:55 +0100

On Thursday, August 7, 2003, at 12:42 pm, Marcel Weiher wrote:

> > According to the PDF Reference Manual (which you can get for free from Adobe's web site as a PDF), PDF is indeed 7-bit ASCII, apart from:
> >
> > o Inside strings (things delimited by open and close brackets).
> >
> > o Inside streams (delimited by "stream" and "endstream").
> >
> > So if you read it in token by token, you could indeed treat it as ASCII apart from those two special cases. (Be careful though, there is a bit of trickiness with strings and nested brackets.)
>
> The only problem is that those two special cases are always present, and prevent you from actually parsing the rest, because you have no safe way of delimiting them.

I don't think that's true... as far as I can see (looking at sections 3.2.3 and 3.2.7 of the PDF 1.4 spec), the binary sections are suitably delimited; I haven't tried writing code to parse them, so maybe I'm missing something, but I can't see any obvious problem.

Strings aren't allowed to contain non-escaped unbalanced sets of brackets (so you can write "(\()", "(\))" or "(())", but not "(()" or "())"), which makes it pretty easy to find the end using a simple FSM.

Streams have a (mandatory) Length key in their stream dictionary that tells you how many bytes of *encoded* data are in the PDF, so you just read that many, then check that you are pointing at something that looks like "endstream", possibly preceded by an end-of-line marker, and carry on parsing.
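As a rough illustration of those two rules, here is a minimal Python sketch; it has not been run against real-world PDFs. The names find_string_end and read_stream are made up for this example. It assumes the whole file is already in memory as bytes, and that the Length value has already been resolved to a plain integer (in a real file it may be an indirect reference that has to be looked up first).

```python
def find_string_end(data: bytes, start: int) -> int:
    """Return the index just past the ')' closing the literal string
    whose opening '(' is at data[start]."""
    if data[start:start + 1] != b"(":
        raise ValueError("expected '(' at start of string")
    depth = 0
    i = start
    while i < len(data):
        c = data[i]
        if c == 0x5C:          # backslash: the next byte is escaped, skip it
            i += 2
            continue
        if c == 0x28:          # '(' opens a nested level
            depth += 1
        elif c == 0x29:        # ')' closes one level
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated string")


def read_stream(data: bytes, pos: int, length: int):
    """pos points at the 'stream' keyword; length is the (already
    resolved) Length value. Return (encoded_bytes, index past 'endstream')."""
    if not data.startswith(b"stream", pos):
        raise ValueError("expected 'stream' keyword")
    pos += len(b"stream")
    # Assumption: the keyword is followed by CRLF or a lone LF.
    if data.startswith(b"\r\n", pos):
        pos += 2
    elif data.startswith(b"\n", pos):
        pos += 1
    else:
        raise ValueError("expected end-of-line after 'stream'")
    end = pos + length
    payload = data[pos:end]    # raw, still-encoded bytes; never interpreted
    # An end-of-line marker may precede 'endstream'.
    for eol in (b"\r\n", b"\n", b"\r"):
        if data.startswith(eol, end):
            end += len(eol)
            break
    if not data.startswith(b"endstream", end):
        raise ValueError("Length does not lead to 'endstream'")
    return payload, end + len(b"endstream")
```

Neither function ever looks inside the binary data; the string scanner only counts brackets and skips escapes, and the stream reader just jumps over Length bytes and sanity-checks what follows.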

It also occurs to me that Ben said he only wanted an overview of the file, in which case it might be best to just read the very end of the file, which will tell you where the cross-reference table is, then you can use that to skip about easily without having to parse the rest.
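A hedged sketch of that approach in Python, assuming a classic layout where the file ends with "startxref", a byte offset, and "%%EOF". The function name find_xref_offset and the 1024-byte tail size are arbitrary choices for this example, not anything taken from the spec.

```python
import re


def find_xref_offset(data: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset of the cross-reference table, as recorded
    after the last 'startxref' keyword near the end of the file."""
    tail = data[-tail_size:]   # only the very end of the file is examined
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref/%%EOF trailer found")
    # Use the last match in case the tail holds more than one trailer.
    return int(matches[-1])
```

With the offset in hand you can seek straight to the table and look objects up from there, rather than tokenising the file from the top.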

Kind regards,

Alastair.