Re: am i loading this pdf data correctly or not?


  • Subject: Re: am i loading this pdf data correctly or not?
  • From: Tom Sutcliffe <email@hidden>
  • Date: Wed, 6 Aug 2003 01:00:33 +0100

Sorry if I'm being thick but you shouldn't be dumping arbitrary binary data into a string anyway should you?

I can't help with the regex framework, but at a guess I'd say the problem is one of encoding: you don't specify an encoding, so the init method is guessing Unicode of some sort. Choosing a single-byte encoding like Latin-1 will probably solve the problem, as Unicode contains some control codes which could well be confusing the framework, not to mention the fact that you don't want it to try and run bytes together. Say the last byte of the binary section (when interpreted as Unicode) says "I'm the first byte of a 3-byte character" but it's followed by a byte "b" (when interpreted as ASCII). The regex won't match against the "b" because it thinks it's part of the 3-byte Unicode character.
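As a rough sketch of the single-byte suggestion: read the file into an NSData first, then build the string with an explicit encoding instead of letting the init method guess. The helper name CreateLatin1StringFromFile is made up for illustration, and whether AGRegex is happy with the result is an assumption I haven't tested.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): load a file's raw bytes
// and wrap them in an NSString using an explicit single-byte encoding.
// Under manual reference counting the caller owns the returned string and
// must release it; under ARC no extra work is needed.
static NSString *CreateLatin1StringFromFile(NSString *path)
{
    // Read the bytes untouched; no encoding is guessed at this stage.
    NSData *rawData = [NSData dataWithContentsOfFile:path];
    if (rawData == nil) {
        return nil; // file missing or unreadable
    }

    // With ISO Latin-1 every byte should become exactly one character,
    // so no byte can be absorbed into a multi-byte sequence.
    return [[NSString alloc] initWithData:rawData
                                 encoding:NSISOLatin1StringEncoding];
}
```

In the open-panel code quoted below, this would stand in for the initWithContentsOfFile: call, e.g. pdfData = CreateLatin1StringFromFile([[openPanel filenames] objectAtIndex:0]);. Keeping the NSData around is probably the safer choice for the binary stream sections anyway.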

Regards,

Tom

On Wednesday, August 6, 2003, at 12:04 am, Ben Dougall wrote:

i'm trying to parse the contents of a pdf file using a regex framework called AGRegex. it works fine until unicode type characters appear, then from then on it fails to get any of the expected matches. so the regex stops dead as soon as some unicode characters appear in the pdf data (which in actual fact was binary data rather than the unicode representations shown below).

the regex framework is supposed to work fine with unicode (based on pcre 4.0 - unicode compliant (if built correctly)). i think i've either incorrectly built the regex framework, without unicode support, or i'm incorrectly creating or setting up the string that gets passed to the regex methods. here's the code that sets up the string from the pdf:


NSArray *fileTypes = [NSArray arrayWithObject:@"pdf"];
NSOpenPanel *openPanel = [NSOpenPanel openPanel];
if ([openPanel runModalForDirectory:NSHomeDirectory() file:nil types:fileTypes] == NSOKButton) {
    // load pdf file
    pdfData = [[NSString alloc] initWithContentsOfFile:[[openPanel filenames] objectAtIndex:0]];


i print out the pdf data output using NSLog(@"%@", pdfData); and while it's like...:

<<
/Type /Font
/Subtype /Type1
/Name /F0
/BaseFont /Times-Roman
/Encoding /MacRomanEncoding
>>
endobj
15 0 obj

...everything's fine (the regex gets the expected matches). as soon as binary data occurs, which looks like this after it's been through my above code...:

x\\u2044\\u2260W\\u20acr\\u2030\\u2202\\021\\u02dd\\307\\u02d8\\007<%\\ 372\\u2018\\016\\345;\\001\\u03a98Vv\\u25ca^{\\371\\u220f\\250\\2518UR\ \036\\256!\\247\\260\\u2248!\\253C\\351\\02

...it goes wrong (fails to get matches thereafter, even once the binary/unicode data stops and returns to resembling the first snippet).

just to make clear: the above second snippet of data is (was) binary data and has been converted to that unicode style by my code that sets the string up.

so have i done the pdfData NSString incorrectly?
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.
