Re: Help with find text command
- Subject: Re: Help with find text command
- From: has <email@hidden>
- Date: Wed, 1 Aug 2007 20:44:16 +0100
William Wallace wrote:
I'm using the find text command from satimage.osax to search a block of
text to find a string that fits a pattern defined as a regular
expression. Seems to work fine up to a point. However, it occurred to me
that the regexp could match this string: "0-0-0-0". Which is not at all
what I want. I'm looking for 10 digit ISBNs in the block of text (which
should always be 13 characters--10 digits divided into 4 substrings by 3
hyphens). Is there a way that I can maintain the flexibility in the
number of digits within each substring, but insist that the total number
of characters in the matched string remain constant at 13?

I suppose I could just check the length of each match and ignore those
matches that don't fit the bill, but we're talking about hundreds of
ISBNs in dozens of InDesign layouts and I'd prefer, for the sake of
speed, to filter out the red herrings to begin with.
I don't think doing as you describe would be all that slow in
practice. Regexps are a great tool, but they're not intended for or
suited to every kind of parsing task. There are times when it's more
appropriate to use a combination of regexps and regular code or even
go for a full-blown parser than try to do everything with regexps
alone. You can have too much of a good thing, you know. ;)
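As a rough illustration of the "regexps plus regular code" route, here is
a minimal sketch in plain AppleScript. The candidates list is hypothetical
sample data standing in for whatever your existing find text call returns
with string result; the only real logic is the length check.

```applescript
-- Hypothetical sample data standing in for the regexp matches
set candidates to {"0-0-0-0", "0-306-40615-2", "1-84356-028-3"}
set isbns to {}
repeat with aCandidate in candidates
	set aString to contents of aCandidate
	-- a hyphenated 10-digit ISBN is always 13 characters long
	if (count of aString) is 13 then set end of isbns to aString
end repeat
return isbns
```

Since the regexp has already checked the digit/hyphen layout, the loop
only has to discard matches of the wrong length.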
However, if you want to do it all with regexps, I'd suggest a two-
pass solution: first match all possible candidates of the desired
length, then extract those that are of the correct format:
-- match all 13-character substrings that begin with a digit, end with a digit
-- or "X", and have exactly 11 digits and/or hyphens in between
set text item delimiters to return
set theText to (find text "\\<[[:digit:]][[:digit:]-]{11}[[:digit:]X]\\>" ¬
	in theText with regexp, all occurrences and string result) as string
-- extract those containing valid proportions of hyphens to digits
find text "^[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]$" ¬
	in theText with regexp, all occurrences and string result
This assumes, of course, that you only want the matching strings and
don't need the original matchPos indexes as well.
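If the positions do matter, one possible approach is to run only the first
pass without string result and do the format check in plain code. The
sketch below is an assumption-laden illustration: the two records are
hypothetical stand-ins for first-pass results, and the matchLen and
matchResult field names are assumed rather than taken from this thread.
The check is also looser than the second regexp (it does not limit the
length of each group or reject empty groups).

```applescript
-- Hypothetical records standing in for first-pass results; matchLen and
-- matchResult are assumed field names
set matchA to {matchPos:12, matchLen:13, matchResult:"0-306-40615-2"}
set matchB to {matchPos:57, matchLen:13, matchResult:"030640-615-27"}
set keepers to {}
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to "-"
repeat with aMatch in {matchA, matchB}
	set aRecord to contents of aMatch
	set theParts to text items of (matchResult of aRecord)
	-- keep only four hyphen-separated groups ending in a single check character
	if (count of theParts) is 4 and (count of item 4 of theParts) is 1 then
		set end of keepers to aRecord
	end if
end repeat
set AppleScript's text item delimiters to oldDelims
return keepers
```

Here matchA survives and matchB is dropped, because matchB has only two
hyphens; each surviving record still carries its matchPos.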
HTH
has
--
http://appscript.sourceforge.net
http://rb-appscript.rubyforge.org
http://appscript.sourceforge.net/objc-appscript.html