Re: Help with find text command
- Subject: Re: Help with find text command
- From: "Wallace, William" <email@hidden>
- Date: Wed, 01 Aug 2007 14:44:21 -0500
- Thread-topic: Help with find text command
From: Philip Aker <email@hidden>
Date: Wed, 01 Aug 2007 12:24:06 -0700
Subject: Re: Help with find text command
I slipped two ISBNs into some of the text from your email for a test. In
this form, the Tcl regexp will return a space-separated list of the
hyphenated digits. There are other options such as returning offsets but I
think returning the actual found items would be best. You could probably
grab the regexp inside the braces to use with most other languages but I
can't say how they would deal with the -inline and -all options (which are
very effective for this kind of search).
[...]
--> "05-961-8253-7 0-596-00053-7"
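For readers who want to try the "return the found items" idea in another language, here is a minimal sketch in Python. It is only an illustration: the pattern is the one quoted from the original question (rewritten with [0-9] classes), not the elided Tcl expression, and the names ISBN_LOOSE and found_items are made up for this example. Python's re.findall plays roughly the role that -inline and -all play in Tcl.

```python
import re

# Pattern from the original question, rewritten with [0-9] classes.
# This is NOT the elided Tcl expression; it is used here only as a stand-in.
ISBN_LOOSE = re.compile(r"[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,7}-[0-9X]")

def found_items(text):
    """Return every match as a plain string (no offsets)."""
    return ISBN_LOOSE.findall(text)

sample = "regexp ISBN: 05-961-8253-7 working ... ISBN: 0-596-00053-7 into"
print(" ".join(found_items(sample)))  # prints: 05-961-8253-7 0-596-00053-7
```

Note that this loose pattern still accepts "0-0-0-0", which is the problem raised in the original question.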
Philip Aker
email@hidden
---------------------
Hi Philip,
I will need the offset as well as the matched string. Satimage returns a
record with the offset, the length, and the matched string:
--
set t to "I'm using the find text command from satimage.osax to search a
block of text
to find a string that fits a pattern defined as a regular expression. I have
the basic regexp ISBN: 05-961-8253-7 working but I'm looking to refine it a
little and, being a
regexp newb, I'm wondering if what I want to do is even possible. The
string(s) I'm looking for are in the following format:
[1-5 digits][hyphen][1-7 digits][hyphen][1-7 digits][hyphen][1 digit (which
may actually be an \"X\")]
This is the command that I have so far to match this:
find text
\"[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]{1}\" in
theText with regexp and all occurrences
Seems to work fine up to a point. However, it occurred to me that the regexp
could match this string: \"0-0-0-0\". Which is not at all what I want. I'm
looking for 10 digit ISBNs in the block of text (which should always be 13
characters--10 digits divided ISBN: 0-596-00053-7 into 4 substrings by 3
hyphens). Is there a
way that I can maintain the flexibility in the number of digits within each
substring, but insist that the total number of characters in the matched
string remain constant at 13?"
set foundText to find text "[0-9]{5}-[0-9]{3}-[0-9]-([0-9]|X)|[0-9]{5}-[0-9]{2}-[0-9]{2}-([0-9]|X)|[0-9]{5}-[0-9]-[0-9]{3}-([0-9]|X)|[0-9]{4}-[0-9]{4}-[0-9]-([0-9]|X)|[0-9]{4}-[0-9]{3}-[0-9]{2}-([0-9]|X)|[0-9]{4}-[0-9]{2}-[0-9]{3}-([0-9]|X)|[0-9]{4}-[0-9]-[0-9]{4}-([0-9]|X)|[0-9]{3}-[0-9]{5}-[0-9]-([0-9]|X)|[0-9]{3}-[0-9]{4}-[0-9]{2}-([0-9]|X)|[0-9]{3}-[0-9]{3}-[0-9]{3}-([0-9]|X)|[0-9]{3}-[0-9]{2}-[0-9]{4}-([0-9]|X)|[0-9]{3}-[0-9]-[0-9]{5}-([0-9]|X)|[0-9]{2}-[0-9]{6}-[0-9]-([0-9]|X)|[0-9]{2}-[0-9]{5}-[0-9]{2}-([0-9]|X)|[0-9]{2}-[0-9]{4}-[0-9]{3}-([0-9]|X)|[0-9]{2}-[0-9]{3}-[0-9]{4}-([0-9]|X)|[0-9]{2}-[0-9]{2}-[0-9]{5}-([0-9]|X)|[0-9]{2}-[0-9]-[0-9]{6}-([0-9]|X)|[0-9]-[0-9]{7}-[0-9]-([0-9]|X)|[0-9]-[0-9]{6}-[0-9]{2}-([0-9]|X)|[0-9]-[0-9]{5}-[0-9]{3}-([0-9]|X)|[0-9]-[0-9]{4}-[0-9]{4}-([0-9]|X)|[0-9]-[0-9]{3}-[0-9]{5}-([0-9]|X)|[0-9]-[0-9]{2}-[0-9]{6}-([0-9]|X)|[0-9]-[0-9]-[0-9]{7}-([0-9]|X)" in t with regexp and all occurrences
--> {{matchPos:177, matchLen:13, matchResult:"05-961-8253-7"},
{matchPos:923, matchLen:13, matchResult:"0-596-00053-7"}}
--
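As a point of comparison, the same "flexible groups, fixed total of 13 characters" rule can be written much more compactly where the regexp flavor supports lookaround. The sketch below uses Python's re module. It is an assumption that this helps with the osax: whether Satimage's regexp flavor accepts lookahead and lookbehind is not verified here. The function name find_isbn10 is hypothetical, and the dictionary keys simply mirror the record field names shown above. Offsets are Python's 0-based character offsets and may not be counted the same way as matchPos.

```python
import re

# A run of exactly 13 characters drawn from digits, "X" and "-",
# which must also have the group structure digits-digits-digits-check.
ISBN10 = re.compile(
    r"(?<![0-9X-])"                  # not preceded by an ISBN-like character
    r"(?=[0-9X-]{13}(?![0-9X-]))"    # the run is exactly 13 characters long
    r"[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,7}-[0-9X]"
    r"(?![0-9X-])"                   # the match ends where the run ends
)

def find_isbn10(text):
    """Return offset, length and matched string for every hyphenated ISBN-10."""
    return [
        {"matchPos": m.start(), "matchLen": len(m.group()), "matchResult": m.group()}
        for m in ISBN10.finditer(text)
    ]

sample = "ISBN: 05-961-8253-7 and 0-0-0-0 and ISBN: 0-596-00053-7."
for record in find_isbn10(sample):
    print(record)
```

With this sample, "0-0-0-0" is skipped because it is only 7 characters long, while both 13-character strings are reported with their offsets.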
If I find that performance is a problem with the osax route, however, I will
be investigating your shell solution.
Thanks.
--
B!ll