Re: am i loading this pdf data correctly or not?

Subject: Re: am i loading this pdf data correctly or not?
From: Marcel Weiher <email@hidden>
Date: Thu, 7 Aug 2003 14:23:34 +0100

Since the streams are random binary junk, you can't ignore them by parsing through them. After all, it is perfectly permissible for them to contain the character sequence "endstream".

At the moment I'm not in any way attempting to parse it with any PDF semantics in mind.

Yes. That is the problem. PDF is very difficult to parse without keeping the semantics in mind. Really.
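To illustrate why the semantics matter, here is a minimal sketch (in Python rather than Cocoa, purely for illustration) of skipping a stream by using the /Length entry of its dictionary instead of searching for "endstream". The sample data and the helper name read_stream are hypothetical, and the sketch assumes /Length is a direct integer in the dictionary immediately before the stream keyword; in real files it may be an indirect reference, which this does not handle.

```python
import re

# Hypothetical sample: a stream whose payload happens to contain b"endstream".
payload = b"\x00\x01endstream\xff\xfe junk"
pdf = (b"4 0 obj\n<< /Length " + str(len(payload)).encode("ascii") + b" >>\n"
       b"stream\n" + payload + b"\nendstream\nendobj\n")

# Assumption: /Length is a direct integer in the dictionary right before "stream".
HEADER = re.compile(rb"/Length\s+(\d+)[^>]*>>\s*stream(\r\n|\n)")
TRAILER = re.compile(rb"(\r\n|\n|\r)?endstream")

def read_stream(buf, start=0):
    """Return (payload, offset just past the 'endstream' keyword)."""
    header = HEADER.search(buf, start)
    if header is None:
        raise ValueError("no stream with a direct /Length found")
    begin = header.end()                  # first byte of the payload
    end = begin + int(header.group(1))    # trust /Length, not the payload contents
    trailer = TRAILER.match(buf, end)     # the keyword must follow the payload
    if trailer is None:
        raise ValueError("/Length does not lead to 'endstream'")
    return buf[begin:end], trailer.end()

data, next_offset = read_stream(pdf)
naive_hit = pdf.find(b"endstream")        # lands inside the payload
```

A plain search for "endstream" stops inside the payload here, while the length-based read returns the whole payload and leaves next_offset just before "endobj".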

(The data could have 100 'endstream's in it and it wouldn't make any difference, because I'm not looking for endstream at all yet. I'm just starting on this, so these are initial steps.) Something in the data between stream and endstream breaks the regular expression's operation: it doesn't just defeat my matching pattern, the whole operation stops. Obviously the regex doesn't care about PDF semantics, so I think there may well be a bug in the regex I'm using. I've described this to the person who wrote the regex Cocoa wrapper that I'm using. They were perplexed that the regex stops in the data part and asked me to send the code and the file I'm parsing, which I did yesterday, so I'm waiting for the outcome of that.

My code did get all the PDF data into an NSString (maybe incorrectly, as the data between stream and endstream looked like ... \\001\\u03a98Vv\\u25ca^{\\371\\u220f\\2... after import, which is very different from how it looks in the original PDF data). Even so, I don't think the regex should be stopped by data like that. It may be incorrect data, but that shouldn't make a jot of difference to the regex operation or implementation itself; it should carry on through and past it.

The problem is that NSString (and any Unicode-compatible regex based on NSString) will attach semantics to character sequences.
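The same effect can be shown outside Cocoa. The following is a rough sketch in Python, not the NSString API: the sample bytes are made up, and Python's bytes and str types stand in for raw data and a Unicode string respectively. It shows that decoding binary data as text changes it, while a pattern applied to the raw bytes sees only byte values.

```python
import re

# Hypothetical stream contents: arbitrary bytes that are not valid UTF-8.
raw = b"stream\n\x01\xa98Vv\xd7^{\xf9\xb8\x02\nendstream"

# Treating the bytes as text attaches character semantics to them:
# invalid sequences are replaced with U+FFFD, so the original data is lost.
text = raw.decode("utf-8", errors="replace")
roundtrip = text.encode("utf-8")

# A bytes pattern compares byte values only, with no text interpretation.
match = re.search(rb"stream\n(.*?)\nendstream", raw, re.DOTALL)
binary = match.group(1)
```

After this runs, roundtrip no longer equals raw, whereas binary holds the payload bytes unchanged. The analogous approach in Cocoa would be to scan the raw data rather than a string built from it, though which API to use for that is left open here.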

Marcel

--
Marcel Weiher Metaobject Software Technologies
email@hidden www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.