On 2007-08-01, at 17:01:02, Stockly, Ed wrote:
No need for shell scripting or RegEx on this one, William. Here's a pure AppleScript solution that could get you started.
set OhMyWord to every word of myText
set newNumbers to {}
repeat with thisWord in OhMyWord
    if the (count of thisWord) = 10 then
        try
            set thisWord to thisWord as integer
            --do routine to convert 10 digit number to 13 digit number
            set the end of newNumbers to thirteenDigitNumber
        end try
    end if
end repeat
return newNumbers
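The conversion step left as a comment above could be sketched roughly as follows. This is only an illustration, not part of either poster's script: the handler name isbn13FromISBN10 is hypothetical, and it assumes the usual ISBN-13 rule (prefix 978, drop the old check digit, recompute the check digit with alternating weights of 1 and 3).

```applescript
-- Hypothetical handler: convert a 10-digit ISBN (hyphens allowed) to 13 digits.
on isbn13FromISBN10(isbn10)
	-- Keep only the digits and a possible X check character.
	set bare to ""
	repeat with ch in (characters of isbn10)
		set c to contents of ch
		if c is in "0123456789X" then set bare to bare & c
	end repeat
	if (count of bare) is not 10 then error "Expected a 10-digit ISBN"
	-- Drop the old check digit and add the 978 prefix.
	set body to "978" & (text 1 thru 9 of bare)
	-- Weights alternate 1, 3 across the first 12 digits.
	set total to 0
	repeat with i from 1 to 12
		set d to (character i of body) as integer
		if i mod 2 = 0 then
			set total to total + d * 3
		else
			set total to total + d
		end if
	end repeat
	set checkDigit to (10 - (total mod 10)) mod 10
	return body & checkDigit
end isbn13FromISBN10
```

With this in place, isbn13FromISBN10("0-596-00053-7") returns "9780596000530". The handler takes the ISBN as text rather than as an integer so that leading zeros survive.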
Here's a challenge for you, Ed. The script below takes some text paragraphs containing two legal old-style ISBN numbers, inserts the new-style numbers (as HTML links) with the price from a trumped-up conversion map, and then writes the result to an HTML file on the user's desktop. It skips the two illegal ISBNs. Let's see your "pure AppleScript" solution. Let me know when it's done so we can compare speeds.
set t to "I'm using the find text command from satimage.osax to search a block of text to find a string that fits a pattern defined as a regular expression. I have the basic regexp ISBN: 05-961-8253-7 working but I'm looking to refine it a little and, being a regexp newb, I'm wondering if what I want to do is even possible. The string(s) I'm looking for are in the following format:
[1-5 digits][hyphen][1-7 digits][hyphen][1-7 digits][hyphen][1 digit (which may actually be an \"X\")]
This is the command that I have so far to match this:
--
find text \"[[:digit:]]{1,5}-[[:digit:]]{1,7}-[[:digit:]]{1,7}-[[:digit:]X]{1}\" in theText with regexp and all occurrences
--
Seems to work fine up to a point. nestled within it: fsdfh123@8X452P340-07-294509-5zzzzzz999999.
However, it occurred to me that the regexp could match this string: \"0-0-0-0\". Which is not at all what I want.
I'm looking for 10 digit ISBNs in the block of text (which should always be 13 characters--10 digits divided ISBN: 0-596-00053-7z into 4 substrings by 3 hyphens). Is there a way that I can 0-596-00053-7 maintain the flexibility in the number of digits within each substring, but insist that the total number of characters in the matched string remain constant at 13?"
set isbns to (do shell script "tclsh <<< 'set contents {'" & quoted form of t & "'};
array set imap {05-961-8253-7 {978-05-961-8253-7 $49.99} 0-596-00053-7 {978-0-596-00053-7 $65.00}}
foreach found [regexp -inline -all -- {[[:<:]][[:digit:]X-]{13}[[:>:]]} $contents] {
    set new [lindex $imap($found) 0];
    set price [lindex $imap($found) 1];
    regsub -- \\[\\[:<:]]$found\\[\\[:>:]] $contents \"$found (New ISBN: <a href=''>$new</a>) <b>$price</b> (with membership discount)\" contents
}
set html {<html><head><title>New Listings</title><style type='text/css'>p {font-family:Trebuchet MS;}</style></head><body>
}
regsub -all -- [format %c 0xA] $contents </p><p> contents
append html {<h1>New Listings</h1>}
append html <p>$contents</p>
append html {</body></html>}
set f [open ~/Desktop/isbns.html w]
puts $f $html
close $f
'")