On 2007-08-01, at 10:32:30, Wallace, William wrote:
[…]
Seems to work fine up to a point. However, it occurred to me that the regexp could match this string: "0-0-0-0". Which is not at all what I want. I'm looking for 10 digit ISBNs in the block of text (which should always be 13 characters--10 digits divided into 4 substrings by 3 hyphens). Is there a way that I can maintain the flexibility in the number of digits within each substring, but insist that the total number of characters in the matched string remain constant at 13?
I slipped two ISBNs into some of the text from your email as a test. In this form, Tcl's regexp command returns a space-separated list of the matched hyphenated numbers. It can also return match offsets instead (the -indices option), but returning the matched text itself seems most useful here. You could probably lift the pattern inside the braces for use with many other regex engines. However, -inline and -all are options of Tcl's regexp command, not part of the pattern, so other languages will have their own way of doing the same thing (and these options are very effective for this kind of search).
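As a rough illustration of what -all, -inline and -indices correspond to in another language, here is a small Python sketch. It is only an analogy, not something the AppleScript setup depends on. `re.findall` returns every matched string, and `re.finditer` with `span()` gives offsets. Python's `re` has no POSIX classes such as `[[:digit:]]`, so the pattern needs a small change when it moves over. The sample text is a shortened excerpt of the test text.

```python
import re

# Python's re module has no POSIX classes such as [[:digit:]],
# so the Tcl pattern [[:digit:]X-]{13} is written with [0-9] instead.
PATTERN = re.compile(r"[0-9X-]{13}")

# Shortened excerpt of the test text, with the two inserted ISBNs.
text = "the basic regexp ISBN: 05-961-8253-7 working ... divided ISBN: 0-596-00053-7 into"

# Like regexp -all -inline: every non-overlapping match, as text.
matches = PATTERN.findall(text)

# Like regexp -all -inline -indices: (start, end) offsets instead of text.
# Tcl reports the index of the last matched character, while Python's
# span() end is one past it (exclusive).
offsets = [m.span() for m in PATTERN.finditer(text)]

print(matches)
print(offsets)
```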
set t to "I'm using the find text command from satimage.osax to search a block of text
to find a string that fits a pattern defined as a regular expression. I have
the basic regexp ISBN: 05-961-8253-7 working but I'm looking to refine it a little and, being a
regexp newb, I'm wondering if what I want to do is even possible. The
string(s) I'm looking for are in the following format:
[1-5 digits][hyphen][1-7 digits][hyphen][1-7 digits][hyphen][1 digit (which
may actually be an \"X\")]
This is the command that I have so far to match this:
--
find text
\"[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]{1}\" in
theText with regexp and all occurrences
--
Seems to work fine up to a point. However, it occurred to me that the regexp
could match this string: \"0-0-0-0\". Which is not at all what I want. I'm
looking for 10 digit ISBNs in the block of text (which should always be 13
characters--10 digits divided ISBN: 0-596-00053-7 into 4 substrings by 3 hyphens). Is there a
way that I can maintain the flexibility in the number of digits within each
substring, but insist that the total number of characters in the matched
string remain constant at 13?"
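-- Feed a one-line Tcl script to tclsh via a here-string (<<<); -all finds every match and -inline
-- returns the matched text. The text is wrapped in Tcl braces, so it must not contain unbalanced { or }.
do shell script "tclsh <<< 'puts [regexp -inline -all -- {[[:digit:]X-]{13}} {'" & quoted form of t & "}]"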
--> "05-961-8253-7 0-596-00053-7"
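One caveat: [[:digit:]X-]{13} checks only the character set and the length, not the layout. It would also accept thirteen digits in a row, or a first group of six digits. If that matters, the length check can go into a lookahead and be combined with your original 4-group pattern. Below is a minimal sketch of that idea in Python, chosen only because it is easy to test. The names LOOSE and STRICT are made up for illustration. Tcl's AREs do support lookahead constraints, but I have not tried this pattern in tclsh. Pieces such as the lookbehind may need rework there, and Tcl spells word boundaries \m and \M rather than \b.

```python
import re

# The one-liner's pattern: any 13 characters drawn from digits, X and hyphen.
LOOSE = re.compile(r"[0-9X-]{13}")

# Hypothetical stricter pattern (a sketch, not from the original post).
STRICT = re.compile(
    r"(?<![0-9X-])"                              # no digit, X or hyphen right before
    r"(?=[0-9X-]{13}(?![0-9X-]))"                # the run is exactly 13 characters long
    r"[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,7}-[0-9X]"   # the 4-group layout from the question
    r"(?![0-9X-])"                               # and the match ends where the run ends
)

samples = [
    "05-961-8253-7",   # valid layout, 13 characters
    "0-596-00053-X",   # check digit may be X
    "0-0-0-0",         # right layout, but too short
    "123456-12-1-1",   # 13 characters, but the first group has 6 digits
    "1234567890123",   # 13 characters, no hyphens
]

for s in samples:
    print(s, bool(LOOSE.search(s)), bool(STRICT.search(s)))

# The stricter pattern has no capture groups, so findall returns whole matches.
print(STRICT.findall("ISBN: 05-961-8253-7 and 0-596-00053-77."))
```

In the last line, the second number has 14 characters and is skipped. The looser pattern would return its first 13 characters instead.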