Re: am i loading this pdf data correctly or not?
- Subject: Re: am i loading this pdf data correctly or not?
- From: Marcel Weiher <email@hidden>
- Date: Wed, 6 Aug 2003 15:46:58 +0100
On Wednesday, Aug 6, 2003, at 15:26 Europe/London, Ben Dougall wrote:
> so pdfs are made up of both text data and non-text data? and non-text
> data should not be put in an NSString (that makes sense i suppose :) ).
No. The entire PDF file is a sequence of bytes, data. None of those
byte-sequences can be regarded as text. There may (or may not) be text
that is encoded in the PDF, but not in any way that you can segment it
on a purely syntactic level. Instead, you have to parse/interpret the
PDF (as data/bytes).
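To illustrate the point about bytes versus text, here is a minimal sketch in Python (not Cocoa). The `sample` bytes are a hand-made, hypothetical fragment shaped like the start of a PDF file, and `decodes_as_utf8` is a helper name invented for this example. It only shows that such data does not survive being treated as a string.

```python
# Hypothetical fragment: a PDF-style header, a binary comment line,
# and one object carrying a short binary stream.
sample = (
    b"%PDF-1.4\n"
    b"%\xe2\xe3\xcf\xd3\n"
    b"1 0 obj\n<< /Length 4 >>\nstream\n"
    b"\x00\xff\x10\x80"
    b"\nendstream\nendobj\n"
)


def decodes_as_utf8(data: bytes) -> bool:
    """Return True only if every byte sequence in data is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # The ASCII header alone decodes, the file as a whole does not.
    print(decodes_as_utf8(sample[:9]))
    print(decodes_as_utf8(sample))
```

The same reasoning applies to NSData versus NSString: keep the file as data and only decode the pieces that the PDF structure says are text.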
> so parsing pdf data straight from the file with regular expressions is
> not on, full stop.
Yes. You need to at least take into account the binary streams that
are embedded in the PDF document structure, in order to ignore those.
For that, you have to parse the PDF document structure (via the xref
table and the objects). Once you have that, you have PDF objects and
binary streams. The PDF objects then tell you how you can parse the
binary data streams to get at the actual contents of the PDF file.
(Virtually all relevant data is in those streams).
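The first step described above (finding the xref table and, through it, the objects) could be sketched roughly as follows, in Python rather than Cocoa. This is an illustration under strong assumptions, not a PDF parser: it handles only a classic xref table with a single subsection, ignores cross-reference streams and incremental updates, and the function names and `build_sample` document are invented for the example.

```python
def find_startxref(data: bytes) -> int:
    """Return the byte offset written after the last 'startxref' keyword."""
    tail = data[-1024:]  # the keyword sits near the end of the file
    pos = tail.rfind(b"startxref")
    if pos == -1:
        raise ValueError("no startxref keyword found")
    tokens = tail[pos + len(b"startxref"):].split()
    if not tokens:
        raise ValueError("startxref is not followed by an offset")
    return int(tokens[0])


def read_xref_section(data: bytes, offset: int) -> dict:
    """Map object number -> byte offset for in-use entries.

    Assumes one classic xref subsection starting at offset.
    """
    lines = data[offset:].splitlines()
    if len(lines) < 2 or lines[0].strip() != b"xref":
        raise ValueError("no classic xref table at this offset")
    first, count = (int(x) for x in lines[1].split())
    offsets = {}
    for i in range(count):
        entry_offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # 'n' = in use, 'f' = free
            offsets[first + i] = int(entry_offset)
    return offsets


def build_sample() -> bytes:
    """Build a tiny, hypothetical PDF-shaped byte string for the demo."""
    body = b"%PDF-1.4\n"
    obj_offset = len(body)
    body += b"1 0 obj\n<< /Length 4 >>\nstream\n\x00\xff\x10\x80\nendstream\nendobj\n"
    xref_offset = len(body)
    body += b"xref\n0 2\n0000000000 65535 f \n"
    body += b"%010d 00000 n \n" % obj_offset
    body += b"trailer\n<< /Size 2 >>\n"
    body += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return body


if __name__ == "__main__":
    data = build_sample()
    table = read_xref_section(data, find_startxref(data))
    for number, where in table.items():
        print(number, data[where:where + 7])
```

Note that the lookups work on known offsets in the raw bytes and never scan across the stream contents, which is the part a whole-file regular expression cannot guarantee.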
> in order to do that i'd have to first extract or block out or
> something the data (non-text data that is) so as to make sure that i
> do not give data (non-text) to the regular expression methods? i need
> to somehow parse the NSData first before regexing.
You need code that fully understands PDF, unless you only need some
very specialized data that may be residing in the objects.
Marcel
--
Marcel Weiher Metaobject Software Technologies
email@hidden www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.