Re: Help with find text command

  • Subject: Re: Help with find text command
  • From: has <email@hidden>
  • Date: Wed, 1 Aug 2007 20:44:16 +0100

William Wallace wrote:

> I'm using the find text command from satimage.osax to search a block of text
> to find a string that fits a pattern defined as a regular expression.
>
> Seems to work fine up to a point. However, it occurred to me that the regexp
> could match this string: "0-0-0-0". Which is not at all what I want. I'm
> looking for 10 digit ISBNs in the block of text (which should always be 13
> characters--10 digits divided into 4 substrings by 3 hyphens). Is there a
> way that I can maintain the flexibility in the number of digits within each
> substring, but insist that the total number of characters in the matched
> string remain constant at 13?
>
> I suppose I could just check the length of each match and ignore those
> matches that don't fit the bill, but we're talking about hundreds of ISBNs
> in dozens of InDesign layouts and I'd prefer, for the sake of speed, to
> filter out the red herrings to begin with.

I don't think doing as you describe would be all that slow in practice. Regexps are a great tool, but they're not intended for or suited to every kind of parsing task. There are times when it's more appropriate to use a combination of regexps and regular code or even go for a full-blown parser than try to do everything with regexps alone. You can have too much of a good thing, you know. ;)
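
As a rough illustration of the check-the-length approach, the sketch below runs one loose regexp and then discards any match that isn't exactly 13 characters long. It assumes theText already holds the text to search. The pattern and the names candidates, matchString and isbnList are made up for the example and are not the original poster's.

```applescript
-- Sketch only: theText is assumed to already hold the text to search.
-- The pattern is an illustrative guess at a "loose" ISBN regexp; it will
-- happily match "0-0-0-0", which the length check below then rejects.
set candidates to find text "\\<[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]\\>" ¬
	in theText with regexp, all occurrences and string result

-- Keep only the candidates that are exactly 13 characters long.
set isbnList to {}
repeat with aMatch in candidates
	set matchString to contents of aMatch
	if (length of matchString) is 13 then set end of isbnList to matchString
end repeat
```

The loop only does a length comparison per match, so for a few hundred ISBNs its cost should be small next to the regexp search itself.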


However, if you want to do it all with regexps, I'd suggest a two-pass solution: first match all possible candidates of the desired length, then extract those that are of the correct format:

-- match all 13-character substrings that begin with a digit, end with a digit
-- or "X", and have exactly 11 digits and/or hyphens in between
set text item delimiters to return
set theText to (find text "\\<[[:digit:]][[:digit:]-]{11}[[:digit:]X]\\>" ¬
in theText with regexp, all occurrences and string result) as string


-- extract those containing valid proportions of hyphens to digits
find text "^[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]$" ¬
in theText with regexp, all occurrences and string result



This assumes, of course, that you only want the matching strings and don't need the original matchPos indexes as well.
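
If the positions are needed, one option is to leave out the string result parameter and do the length filtering in ordinary code. The sketch below assumes that find text then returns one record per match with matchPos, matchLen and matchResult fields; that is worth checking against the osax's dictionary, as is how matchPos is counted. The pattern and the names matchRecords, aRecord and isbnRecords are hypothetical.

```applescript
-- Sketch only: without "string result", each match is assumed to come back
-- as a record of the form {matchPos:..., matchLen:..., matchResult:...}.
set matchRecords to find text "\\<[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]\\>" ¬
	in theText with regexp and all occurrences

-- Keep only the records whose match is exactly 13 characters long,
-- so that the position of each surviving match stays available.
set isbnRecords to {}
repeat with aRecord in matchRecords
	if (matchLen of aRecord) is 13 then set end of isbnRecords to contents of aRecord
end repeat
```

This is a single pass rather than two, so the positions refer to the original text and not to the joined list of candidates.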


HTH

has
--
http://appscript.sourceforge.net
http://rb-appscript.rubyforge.org
http://appscript.sourceforge.net/objc-appscript.html

AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
