Re: http: grep syntax for use in Tex-Edit
- Subject: Re: http: grep syntax for use in Tex-Edit
- From: Matt Petrowsky <email@hidden>
- Date: Thu, 8 May 2003 00:10:10 -0700
Well, I guess I'll answer my own question and post my results.
If anyone is using Tex-Edit and would like to auto-format URLs within a
document, here is the code needed.
tell window 1 of application "Tex-Edit Plus"
	replace looking for "([^;:/?# ^c<\"]+://[^ ^c>]*)" replacing with "<a href=\"^0\">^0</a>" with grep
end tell
This assumes you have a trailing space or return after the URL.
If you have URLs that may end with a period, a slash and a period, or any
other variation, you can use the code below. The trick is to create a space
after the URL, format it, then clean up the space (the cleanup part is
missing here).
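For the missing cleanup step, something along these lines might do. It is only a sketch and not part of the script below: it assumes the links have already been formatted, and the two search strings are guesses derived from the goodEnd list (the helper space followed by "." or ">"). Note that it would also close up any "</a> ." that was legitimately spaced that way to begin with.

```applescript
tell window 1 of application "Tex-Edit Plus"
	-- Assumption: formatURLS() has already run, so each affected link now ends
	-- in "</a>" followed by the helper space that swapIt inserted.
	replace looking for "</a> ." replacing with "</a>."
	replace looking for "</a> >" replacing with "</a>>"
end tell
```

Both lines use the plain (non-grep) form of the replace command, so the period is matched literally.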
ENJOY - P.S. I'm writing a major overhaul of a script to format a
Tex-Edit document. It accounts for all styles. If you would like to
collaborate please email for a copy of the script. - I would like peer
review most of all.
--- cut code from here ---
------------------------------------------------------------------------
--------------------------------------- DECLARATIONS
global rootDomains, countryDomains, pageTypes, badEnd, goodEnd
set rootDomains to {".aero", ".biz", ".com", ".coop", ".edu", ".gov", ¬
	".info", ".int", ".mil", ".museum", ".name", ".net", ".org", ".pro"}
set pageTypes to {".html", ".htm", ".shtml", ".php", ".php3", ".php4", ¬
	".asp", ".cfm", ".mysql"}
set countryDomains to {".ac", ".ad", ".ae", ".af", ".ag", ".ai", ".al", ¬
	".am", ".an", ".ao", ".aq", ".ar", ".as", ".at", ".au", ".aw", ".az", ¬
	".ba", ".bb", ".bd", ".be", ".bf", ".bg", ".bh", ".bi", ".bj", ".bm", ¬
	".bn", ".bo", ".br", ".bs", ".bt", ".bv", ".bw", ".by", ".bz", ".ca", ¬
	".cc", ".cd", ".cf", ".cg", ".ch", ".ci", ".ck", ".cl", ".cm", ".cn", ¬
	".co", ".cr", ".cu", ".cv", ".cx", ".cy", ".cz", ".de", ".dj", ".dk", ¬
	".dm", ".do", ".dz", ".ec", ".ee", ".eg", ".eh", ".er", ".es", ".et", ¬
	".fi", ".fj", ".fk", ".fm", ".fo", ".fr", ".ga", ".gd", ".ge", ".gf", ¬
	".gg", ".gh", ".gi", ".gl", ".gm", ".gn", ".gp", ".gq", ".gr", ".gs", ¬
	".gt", ".gu", ".gw", ".gy", ".hk", ".hm", ".hn", ".hr", ".ht", ".hu", ¬
	".id", ".ie", ".il", ".im", ".in", ".io", ".iq", ".ir", ".is", ".it", ¬
	".je", ".jm", ".jo", ".jp", ".ke", ".kg", ".kh", ".ki", ".km", ".kn", ¬
	".kp", ".kr", ".kw", ".ky", ".kz", ".la", ".lb", ".lc", ".li", ".lk", ¬
	".lr", ".ls", ".lt", ".lu", ".lv", ".ly", ".ma", ".mc", ".md", ".mg", ¬
	".mh", ".mk", ".ml", ".mm", ".mn", ".mo", ".mp", ".mq", ".mr", ".ms", ¬
	".mt", ".mu", ".mv", ".mw", ".mx", ".my", ".mz", ".na", ".nc", ".ne", ¬
	".nf", ".ng", ".ni", ".nl", ".no", ".np", ".nr", ".nu", ".nz", ".om", ¬
	".pa", ".pe", ".pf", ".pg", ".ph", ".pk", ".pl", ".pm", ".pn", ".pr", ¬
	".ps", ".pt", ".pw", ".py", ".qa", ".re", ".ro", ".ru", ".rw", ".sa", ¬
	".sb", ".sc", ".sd", ".se", ".sg", ".sh", ".si", ".sj", ".sk", ".sl", ¬
	".sm", ".sn", ".so", ".sr", ".st", ".sv", ".sy", ".sz", ".tc", ".td", ¬
	".tf", ".tg", ".th", ".tj", ".tk", ".tm", ".tn", ".to", ".tp", ".tr", ¬
	".tt", ".tv", ".tw", ".tz", ".ua", ".ug", ".uk", ".um", ".us", ".uy", ¬
	".uz", ".va", ".vc", ".ve", ".vg", ".vi", ".vn", ".vu", ".wf", ".ws", ¬
	".ye", ".yt", ".yu", ".za", ".zm", ".zw"}
set badEnd to {".", "/.", "/>"}
set goodEnd to {" .", "/ .", "/ >"}
------------------------------------------------------------------------
--------------------------------------- MAIN SCRIPT
formatURLS()
------------------------------------------------------------------------
--------------------------------------- FUNCTION
-- This is a general routine that takes two lists of items that you wish to swap.
-- The lists must have a matching count of items. When buildit is true, the first
-- part of the routine builds both an input and an output list, which helps when
-- you have a long list of possible items with varied tail endings.
on swapIt(input, output, oldtail, newtail, buildit)
	if buildit then -- buildit is a boolean: true means build the input/output lists first
		set newinput to {}
		set newoutput to {}
		-- Build the input list: every item combined with every old tail
		repeat with i from 1 to count of input
			set theVal to item i of input
			repeat with j from 1 to count of oldtail
				set newinput to newinput & (theVal & item j of oldtail)
			end repeat
		end repeat
		-- Build the output list: every item combined with every new tail
		repeat with i from 1 to count of input
			set theVal to item i of input
			repeat with j from 1 to count of newtail
				set newoutput to newoutput & (theVal & item j of newtail)
			end repeat
		end repeat
		-- Having built the new input and output lists, the routine calls itself
		-- to make the changes in Tex-Edit
		my swapIt(newinput, newoutput, {}, {}, false)
	else
		-- Make sure that the routine is getting lists with matching item counts
		if (count of input) is not equal to (count of output) then
			display dialog "Both input and output must have the same number of items!"
		else
			repeat with i from 1 to count of input
				tell window 1 of application "Tex-Edit Plus"
					replace looking for item i of input replacing with item i of output with cases matching
				end tell
			end repeat
		end if
	end if
end swapIt
------------------------------------------------------------------------
--------------------------------------- FUNCTION
on formatURLS()
	tell window 1 of application "Tex-Edit Plus"
		-- format all urls
		-- if you use a lot of foreign sites then include the countryDomains here.
		set domainsToChange to rootDomains & pageTypes
		-- there is a known issue where .com.au will result in ".com .au ."
		-- which will not format properly
		my swapIt(domainsToChange, {}, badEnd, goodEnd, true)
		replace looking for "([^;:/?# ^c<\"]+://[^ ^c>]*)" replacing with "<a href=\"^0\">^0</a>" with grep
	end tell
end formatURLS
--- end code ---
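To see what the build step of swapIt produces, and why .com.au trips it up, here is a small sketch in plain AppleScript that runs without Tex-Edit. The handlers crossJoin and replaceText are hypothetical helpers written for this illustration, not part of the script above, and the example.com.au address is made up. replaceText uses text item delimiters as a stand-in for Tex-Edit's literal replace.

```applescript
-- Hypothetical helper (not in the script above): combines every item of
-- baseItems with every tail, in the same order as swapIt's build step.
on crossJoin(baseItems, tails)
	set joined to {}
	repeat with i from 1 to count of baseItems
		repeat with j from 1 to count of tails
			set end of joined to (item i of baseItems) & (item j of tails)
		end repeat
	end repeat
	return joined
end crossJoin

-- Hypothetical helper: plain-AppleScript stand-in for a literal replace.
on replaceText(theText, searchString, replacementString)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set parts to text items of theText
	set AppleScript's text item delimiters to replacementString
	set newText to parts as text
	set AppleScript's text item delimiters to savedDelimiters
	return newText
end replaceText

set searchList to crossJoin({".com", ".html"}, {".", "/.", "/>"})
-- {".com.", ".com/.", ".com/>", ".html.", ".html/.", ".html/>"}
set replaceList to crossJoin({".com", ".html"}, {" .", "/ .", "/ >"})
-- {".com .", ".com/ .", ".com/ >", ".html .", ".html/ .", ".html/ >"}

-- The first pair already splits an Australian host name in two:
replaceText("http://www.example.com.au/", item 1 of searchList, item 1 of replaceList)
-- "http://www.example.com .au/"
```

Because the grep pattern stops at the first space, only "http://www.example.com" would end up inside the link, which matches the known issue noted in formatURLS.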
On Tuesday, May 6, 2003, at 10:28 PM, Matt Petrowsky wrote:
Just wondering if anyone has any regex code for pulling out http://
links (or ftp://, etc.) and formatting them as href tags within
Tex-Edit Plus.
I know there's code out there somewhere, and the grep for breaking down
a URI,
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
provided in http://www.ietf.org/rfc/rfc2396.txt, is a great starting
point, but it doesn't seem to work natively.
Just thought I would ask before I start heading into the long haul
trying to account for urls ending with "." or "\r".
Here's my far-from-complete code so far. It uses Tex-Edit's wildcard
matching right now, but I would like to switch to grep since it would be
more precise.
--- snip ---
-- Basic routine for swapping characters or words out.
on swapIt(input, output)
	repeat with i from 1 to count of input
		tell window 1 of application "Tex-Edit Plus"
			replace looking for item i of input replacing with item i of output with cases matching
		end tell
	end repeat
end swapIt
tell window 1 of application "Tex-Edit Plus"
	-- format all urls
	set urlEnd to {".html.", ".htm.", ".php.", ".shtml.", ".html" & return, ¬
		".htm" & return, ".php" & return, ".shtml" & return}
	set urlNice to {".html .", ".htm .", ".php .", ".shtml .", ".html " & return, ¬
		".htm " & return, ".php " & return, ".shtml " & return}
	my swapIt(urlEnd, urlNice)
	replace looking for "http://^* " replacing with "<a href=\"^*\">^*</a>"
	replace looking for " \">" replacing with "\">"
	replace looking for " </a>" replacing with "</a>"
	replace looking for "< <a" replacing with "<<a"
end tell
--- snip ---
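For URLs that end with "." or a return, as mentioned above, one alternative is to trim the URL string in AppleScript itself rather than in the document. This is only a sketch of a different technique from the in-document replace used in the snippet: stripTrailing is a hypothetical handler and the address is made up.

```applescript
-- Hypothetical helper (not from the original post): drops unwanted
-- characters from the end of a URL, one at a time.
on stripTrailing(theURL, unwanted)
	repeat while ((length of theURL) > 1) and ((character -1 of theURL) is in unwanted)
		set theURL to text 1 thru -2 of theURL
	end repeat
	return theURL
end stripTrailing

stripTrailing("http://www.example.com/page.html." & return, {".", return})
-- "http://www.example.com/page.html"
```

The list passed as unwanted decides what counts as trailing junk, so other endings such as ">" can be added without touching the handler.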
Matt Petrowsky
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.