Re: am i loading this pdf data correctly or not?
  • Subject: Re: am i loading this pdf data correctly or not?
  • From: Marcel Weiher <email@hidden>
  • Date: Wed, 6 Aug 2003 15:46:58 +0100

On Wednesday, Aug 6, 2003, at 15:26 Europe/London, Ben Dougall wrote:

> So PDFs are made up of both text data and non-text data? And non-text data should not be put in an NSString (that makes sense, I suppose :) ).

No. The entire PDF file is a sequence of bytes, that is, data. None of those byte sequences can be regarded as text. There may (or may not) be text encoded in the PDF, but not in any form that you can segment on a purely syntactic level. Instead, you have to parse and interpret the PDF as data (bytes).
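To make this concrete, here is a minimal Python sketch (the byte-level logic carries over to an NSData buffer). In many PDFs the text drawn on a page sits in a content stream compressed with the `/FlateDecode` filter, which is zlib. The content stream and the helper names `encode_flate` and `contains` are made up for the illustration, not taken from a real file:

```python
import zlib

# Hypothetical page content stream: PDF operators that draw the word "Hello".
CONTENT = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"

def encode_flate(content: bytes) -> bytes:
    """What a writer would store between 'stream' and 'endstream' for /Filter /FlateDecode."""
    return zlib.compress(content)

def contains(needle: bytes, haystack: bytes) -> bool:
    """A purely syntactic search, like running a regex over the raw file."""
    return needle in haystack

stored = encode_flate(CONTENT)
print(contains(b"Hello", stored))                   # False: the compressed bytes do not spell the word
print(contains(b"Hello", zlib.decompress(stored)))  # True: only after undoing the filter
```

Even after decompression, the string operands are character codes interpreted through the font's encoding, so readable text is still not guaranteed; that is part of why real PDF parsing is needed.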

> So parsing PDF data straight from the file with regular expressions is not on, full stop.

Yes. At the very least, you need to take into account the binary streams embedded in the PDF document structure, so that you can skip over them. To do that, you have to parse the PDF document structure (via the xref table and the objects). Once you have done that, you have PDF objects and binary streams. The PDF objects then tell you how to parse the binary data streams to get at the actual contents of the PDF file (virtually all relevant data is in those streams).
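A minimal Python sketch of the first step, locating objects through the xref table, might look like this. It assumes a classic cross-reference table as described in the PDF specification: the file ends with `startxref`, the byte offset of the `xref` keyword, and `%%EOF`, and every table entry is exactly 20 bytes (10-digit offset, 5-digit generation number, `n` or `f`, two-byte end of line). It deliberately ignores cross-reference streams (PDF 1.5 and later), `/Prev` chains left by incremental updates, and damaged files. `build_sample_pdf`, `read_xref` and the sample objects are hypothetical names made up for the illustration:

```python
import re
from typing import Dict

def build_sample_pdf() -> bytes:
    """Assemble a tiny, hypothetical PDF in memory so no file is needed."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f\r\n"  # entry 0 is always the head of the free list
    for off in offsets:
        out += b"%010d 00000 n\r\n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

SUBSECTION = re.compile(rb"\s*(\d+)\s+(\d+)\s*")  # "first_object_number count"

def read_xref(pdf: bytes) -> Dict[int, int]:
    """Map in-use object numbers to byte offsets via a classic xref table."""
    tail = pdf.rfind(b"startxref")  # the last startxref points to the newest xref section
    m = re.match(rb"startxref\s+(\d+)", pdf[tail:]) if tail >= 0 else None
    if m is None:
        raise ValueError("no startxref found")
    cursor = int(m.group(1))
    if not pdf.startswith(b"xref", cursor):
        raise ValueError("not a classic xref table (possibly an xref stream)")
    cursor += len(b"xref")
    table = {}
    while True:
        sub = SUBSECTION.match(pdf, cursor)
        if sub is None:
            break  # reached the "trailer" keyword
        first, count = int(sub.group(1)), int(sub.group(2))
        cursor = sub.end()
        for i in range(count):
            entry = pdf[cursor:cursor + 20]
            if entry[17:18] == b"n":  # 'n' = in use, 'f' = free
                table[first + i] = int(entry[0:10])
            cursor += 20
    return table

pdf = build_sample_pdf()
for num, offset in sorted(read_xref(pdf).items()):
    print(num, offset, pdf[offset:offset + 7])  # each offset lands on "N 0 obj"
```

With the offsets in hand, a parser can jump straight to each `N 0 obj`, read its dictionary, and only then decide how many bytes of stream data follow.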

> In order to do that, I'd have to first extract or block out (or something) the non-text data, so as to make sure that I don't give non-text data to the regular expression methods? I need to somehow parse the NSData first before running regexes.

You need code that fully understands PDF, unless you only need some very specialized data that happens to reside in the objects themselves.
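For that special case, where the data you want sits in the objects rather than in the streams, one possible approach (a sketch, not something proposed in the thread) is to cut the stream data out before running any regular expression. It assumes each stream's `/Length` is a direct integer (an indirect `/Length 7 0 R` would need the xref table to resolve), that the keyword `stream` does not appear inside a string in the dictionary, that the file is not encrypted, and that `/Title` is a plain literal string without escapes or nested parentheses. `strip_streams`, `sample_pdf` and the regexes are hypothetical names, and the sample is a fragment, not a complete PDF:

```python
import re

# The "stream" keyword plus its mandatory end-of-line; "endstream" is excluded.
STREAM_KW = re.compile(rb"(?<!end)stream(?:\r\n|\n)")
# A direct "/Length 123"; group 2 is set for an indirect "/Length 7 0 R".
LENGTH = re.compile(rb"/Length\s+(\d+)(\s+\d+\s+R)?")
# Simplified /Title pattern: a literal string without escapes or nested parentheses.
TITLE = re.compile(rb"/Title\s*\(([^()\\]*)\)")

def strip_streams(pdf: bytes) -> bytes:
    """Return the file with every stream's data removed, keeping the object syntax."""
    out = bytearray()
    cursor = 0
    while True:
        kw = STREAM_KW.search(pdf, cursor)
        if kw is None:
            out += pdf[cursor:]
            return bytes(out)
        lengths = list(LENGTH.finditer(pdf, cursor, kw.start()))
        if not lengths:
            raise ValueError("stream without a /Length entry")
        last = lengths[-1]  # the /Length nearest the keyword belongs to this stream
        if last.group(2):
            raise ValueError("indirect /Length: resolve it through the xref table first")
        out += pdf[cursor:kw.end()]             # keep the dictionary and the keyword
        cursor = kw.end() + int(last.group(1))  # jump over the binary data unread

def sample_pdf() -> bytes:
    """A hypothetical fragment: one stream whose binary data mimics PDF syntax."""
    data = b"\x89\x00/Title (bogus)\xff\xfe"
    return (
        b"1 0 obj\n<< /Length %d >>\nstream\n" % len(data)
        + data
        + b"\nendstream\nendobj\n"
        + b"2 0 obj\n<< /Title (Quarterly Report) >>\nendobj\n"
    )

raw = sample_pdf()
print(TITLE.findall(raw))                 # [b'bogus', b'Quarterly Report']
print(TITLE.findall(strip_streams(raw)))  # [b'Quarterly Report']
```

The first search is fooled by stream bytes that happen to look like PDF syntax; the second sees only the object syntax. Anything that lives inside the streams, such as page text, still needs the decoding and interpretation Marcel describes.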

Marcel


--
Marcel Weiher Metaobject Software Technologies
email@hidden www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.
_______________________________________________
cocoa-dev mailing list | email@hidden
