Re: am i loading this pdf data correctly or not?
- Subject: Re: am i loading this pdf data correctly or not?
- From: Ben Dougall <email@hidden>
- Date: Wed, 6 Aug 2003 02:35:04 +0100
Tom, thanks for the reply.
On Wednesday, August 6, 2003, at 01:00 am, Tom Sutcliffe wrote:
Sorry if I'm being thick but you shouldn't be dumping arbitrary binary
data into a string anyway should you?
no not at all, i'm sure you're right. i'm very inexperienced at this
sort of thing. how should i specify which encoding? i can't see how to
do that. this is the line that needs changing or adding to obviously:
    pdfData = [[NSString alloc] initWithContentsOfFile:
                  [[openPanel filenames] objectAtIndex:0]];
should i somehow do it in two steps maybe? any pointers would be much
appreciated.
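
A minimal sketch of the two-step approach asked about here, assuming a modern Foundation with ARC. The helper name LoadFileAsLatin1 is hypothetical and not from the thread; dataWithContentsOfFile: and initWithData:encoding: are standard NSData/NSString methods. The idea is to read the raw bytes first, then decode them with an explicitly chosen single-byte encoding rather than letting the string initializer guess:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): load a file's raw bytes, then
// decode them with an explicit single-byte encoding so that every byte
// becomes exactly one character in the resulting string.
static NSString *LoadFileAsLatin1(NSString *path) {
    // Step 1: read the file as raw bytes. No encoding is guessed here.
    NSData *rawData = [NSData dataWithContentsOfFile:path];
    if (rawData == nil) {
        return nil; // file missing or unreadable
    }

    // Step 2: decode with ISO Latin-1, which maps bytes 0x00-0xFF
    // one-to-one onto U+0000-U+00FF, so binary sections should not
    // make the conversion fail or merge neighbouring bytes.
    return [[NSString alloc] initWithData:rawData
                                 encoding:NSISOLatin1StringEncoding];
}
```

With this in place, the line above could become something like pdfData = LoadFileAsLatin1([[openPanel filenames] objectAtIndex:0]);. Whether the regex framework then behaves is a separate question, since that depends on how it was built.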
I can't help with the regex framework but at a guess I'd say the
problem is one of encoding - you don't specify an encoding so the init
method is guessing unicode of some sort. Choosing a single-byte encoding
like latin1 or whatever will probably solve the problem, as unicode
contains some control-codes which could well be confusing the
framework - not to mention the fact that you don't want it to try and
run bytes together. Say the last byte of the binary section (when
interpreted as unicode) says "I'm the first byte of a 3 byte
character" but it's followed by a byte "b" (when interpreted in
ASCII). The regex won't match against the "b" because it thinks it's
part of the 3 byte unicode character.
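
The byte sequence described above can be tried out directly. This is only an illustrative sketch, assuming a modern Foundation with ARC; the three-byte buffer is made up for the demonstration. Note that Foundation's UTF-8 decoder is strict, so rather than swallowing the "b" it is expected to reject the whole input, whereas Latin-1 decodes one character per byte:

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // 0xE2 is a UTF-8 lead byte announcing a 3-byte character, but here
        // it is followed by the plain ASCII byte 'b' (0x62) instead of the
        // two continuation bytes a UTF-8 decoder requires.
        const unsigned char bytes[] = { 'a', 0xE2, 'b' };
        NSData *data = [NSData dataWithBytes:bytes length:sizeof bytes];

        // Strict UTF-8 decoding: the malformed sequence makes the
        // initializer fail, so asUTF8 is expected to be nil.
        NSString *asUTF8 =
            [[NSString alloc] initWithData:data
                                  encoding:NSUTF8StringEncoding];

        // Latin-1 decoding: every byte becomes exactly one character,
        // so the 'b' stays a separate, matchable character.
        NSString *asLatin1 =
            [[NSString alloc] initWithData:data
                                  encoding:NSISOLatin1StringEncoding];

        NSLog(@"UTF-8 decode succeeded: %@", asUTF8 != nil ? @"yes" : @"no");
        NSLog(@"Latin-1 length: %lu", (unsigned long)[asLatin1 length]);
    }
    return 0;
}
```

A more lenient decoder, or a regex engine running in UTF-8 mode over such bytes, may behave differently from Foundation here; the sketch only shows why a single-byte encoding avoids the ambiguity.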
what i find odd though is that it stops entirely - doesn't just miss
stuff, once it comes across this binary / unicode data it stops -
that's it, it doesn't go any further - that's what makes me think it
may be something to do with unicode and the regex's build concerning
unicode. - i'm guessing without knowing at all though. i hope that's
not the case.
thanks, ben.
Regards,
Tom
On Wednesday, August 6, 2003, at 12:04 am, Ben Dougall wrote:
i'm trying to parse the contents of a pdf file using a regex
framework called AGRegex. it works fine until unicode type characters
appear, then from then on it fails to get any of the expected
matches. so the regex stops dead as soon as some unicode characters
appear in the pdf data (which in actual fact was binary data rather
than the unicode representations shown below).
the regex framework is supposed to work fine with unicode (based on
pcre 4.0 - unicode compliant (if built correctly)). i think i've
either incorrectly built the regex framework, without unicode
support, or i'm incorrectly creating or setting up the string that
gets passed to the regex methods. here's the code that sets up the
string from the pdf:
    NSArray *fileTypes = [NSArray arrayWithObject:@"pdf"];
    NSOpenPanel *openPanel = [NSOpenPanel openPanel];
    if ([openPanel runModalForDirectory:NSHomeDirectory() file:nil
                                  types:fileTypes] == NSOKButton) {
        // load pdf file
        pdfData = [[NSString alloc] initWithContentsOfFile:
                      [[openPanel filenames] objectAtIndex:0]];
i print out the pdf data output using NSLog(@"%@", pdfData); and
while it's like...:
<<
/Type /Font
/Subtype /Type1
/Name /F0
/BaseFont /Times-Roman
/Encoding /MacRomanEncoding
>>
endobj
15 0 obj
...everything's fine (the regex gets the expected matches). as soon
as binary data occurs, which looks like this after it's been through
my above code...:
x\\u2044\\u2260W\\u20acr\\u2030\\u2202\\021\\u02dd\\307\\u02d8\\007<%\
\372\\u2018\\016\\345;\\001\\u03a98Vv\\u25ca^{\\371\\u220f\\250\\2518U
R\\036\\256!\\247\\260\\u2248!\\253C\\351\\02
...it goes wrong (fails to get matches thereafter, even once the
binary/unicode data stops and returns to resembling the first
snippet).
just to make clear: the above second snippet of data is (was) binary
data and has been converted to that unicode style by my code that
sets the string up.
so have i done the pdfData NSString incorrectly?
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.