Re: http: grep syntax for use in Tex-Edit


Subject: Re: http: grep syntax for use in Tex-Edit
From: Matt Petrowsky <email@hidden>
Date: Thu, 8 May 2003 00:10:10 -0700

Well, I guess I'll answer my own question and post my results.

If anyone is using Tex-Edit and would like to auto-format URLs within a document, here is the code needed.

tell window 1 of application "Tex-Edit Plus"
replace looking for "([^;:/?# ^c<\"]+://[^ ^c>]*)" replacing with "<a href=\"^0\">^0</a>" with grep
end tell

This assumes you have a trailing space or return after the URL.

If you have URLs that may end with a period, a slash and period, or any other variation, you can use the code below. The trick is to create a space after the URL, format it, then clean up the space (the cleanup step is missing here).

ENJOY - P.S. I'm writing a major overhaul of a script to format a Tex-Edit document. It accounts for all styles. If you would like to collaborate please email for a copy of the script. - I would like peer review most of all.

--- cut code from here ---

------------------------------------------------------------------------ --------------------------------------- DECLARATIONS

global rootDomains, countryDomains, pageTypes, badEnd, goodEnd

set rootDomains to {".aero", ".biz", ".com", ".coop", ".edu", ".gov", ".info", ".int", ".mil", ".museum", ".name", ".net", ".org", ".pro"}

set pageTypes to {".html", ".htm", ".shtml", ".php", ".php3", ".php4", ".asp", ".cfm", ".mysql"}

set countryDomains to {".ac", ".ad", ".ae", ".af", ".ag", ".ai", ".al", ".am", ".an", ".ao", ".aq", ".ar", ".as", ".at", ".au", ".aw", ".az", ".ba", ".bb", ".bd", ".be", ".bf", ".bg", ".bh", ".bi", ".bj", ".bm", ".bn", ".bo", ".br", ".bs", ".bt", ".bv", ".bw", ".by", ".bz", ".ca", ".cc", ".cd", ".cf", ".cg", ".ch", ".ci", ".ck", ".cl", ".cm", ".cn", ".co", ".cr", ".cu", ".cv", ".cx", ".cy", ".cz", ".de", ".dj", ".dk", ".dm", ".do", ".dz", ".ec", ".ee", ".eg", ".eh", ".er", ".es", ".et", ".fi", ".fj", ".fk", ".fm", ".fo", ".fr", ".ga", ".gd", ".ge", ".gf", ".gg", ".gh", ".gi", ".gl", ".gm", ".gn", ".gp", ".gq", ".gr", ".gs", ".gt", ".gu", ".gw", ".gy", ".hk", ".hm", ".hn", ".hr", ".ht", ".hu", ".id", ".ie", ".il", ".im", ".in", ".io", ".iq", ".ir", ".is", ".it", ".je", ".jm", ".jo", ".jp", ".ke", ".kg", ".kh", ".ki", ".km", ".kn", ".kp", ".kr", ".kw", ".ky", ".kz", ".la", ".lb", ".lc", ".li", ".lk", ".lr", ".ls", ".lt", ".lu", ".lv", ".ly", ".ma", ".mc", ".md", ".mg", ".mh", ".mk", ".ml", ".mm", ".mn", ".mo", ".mp", ".mq", ".mr", ".ms", ".mt", ".mu", ".mv", ".mw", ".mx", ".my", ".mz", ".na", ".nc", ".ne", ".nf", ".ng", ".ni", ".nl", ".no", ".np", ".nr", ".nu", ".nz", ".om", ".pa", ".pe", ".pf", ".pg", ".ph", ".pk", ".pl", ".pm", ".pn", ".pr", ".ps", ".pt", ".pw", ".py", ".qa", ".re", ".ro", ".ru", ".rw", ".sa", ".sb", ".sc", ".sd", ".se", ".sg", ".sh", ".si", ".sj", ".sk", ".sl", ".sm", ".sn", ".so", ".sr", ".st", ".sv", ".sy", ".sz", ".tc", ".td", ".tf", ".tg", ".th", ".tj", ".tk", ".tm", ".tn", ".to", ".tp", ".tr", ".tt", ".tv", ".tw", ".tz", ".ua", ".ug", ".uk", ".um", ".us", ".uy", ".uz", ".va", ".vc", ".ve", ".vg", ".vi", ".vn", ".vu", ".wf", ".ws", ".ye", ".yt", ".yu", ".za", ".zm", ".zw"}

set badEnd to {".", "/.", "/>"}
set goodEnd to {" .", "/ .", "/ >"}

------------------------------------------------------------------------ --------------------------------------- MAIN SCRIPT

formatURLS()

------------------------------------------------------------------------ --------------------------------------- FUNCTION

-- This is a general routine used to pass in 2 arrays of items that you wish to swap.
-- the arrays must have a matching count of items. The first part of the routine
-- will build both an input and output string if you have a long list of possible items with varied
-- tail endings

on swapIt(input, output, oldtail, newtail, buildit)
if buildit then -- the switch buildit is boolean meaning you want to build input/output strings
set newinput to {}
set newoutput to {}
-- Build the input string
repeat with i from 1 to count of input
set theVal to item i of input
repeat with j from 1 to count of oldtail -- separate counter so the outer loop's i is not reused
set newinput to newinput & (theVal & item j of oldtail)
end repeat
end repeat
-- Build the output string
repeat with i from 1 to count of input
set theVal to item i of input
repeat with j from 1 to count of newtail -- separate counter so the outer loop's i is not reused
set newoutput to newoutput & (theVal & item j of newtail)
end repeat
end repeat
-- after this routine builds the new input and output it uses itself to make the changes in Tex-Edit
my swapIt(newinput, newoutput, {}, {}, false)
else
if (count of input) is not equal to (count of output) then -- Make sure that the routine is getting lists with matching item counts
display dialog "Both input and output must have the same number of items!"
else
repeat with i from 1 to count of input
tell window 1 of application "Tex-Edit Plus"
replace looking for item i of input replacing with item i of output with cases matching
end tell
end repeat
end if
end if
end swapIt

------------------------------------------------------------------------ --------------------------------------- FUNCTION

on formatURLS()
tell window 1 of application "Tex-Edit Plus"
-- format all urls
set domainsToChange to rootDomains & pageTypes -- if you use a lot of foreign sites then include the countryDomains here.
-- there is a known issue where .com.au will result in .com .au . which will not format properly
my swapIt(domainsToChange, {}, badEnd, goodEnd, true)
replace looking for "([^;:/?# ^c<\"]+://[^ ^c>]*)" replacing with "<a href=\"^0\">^0</a>" with grep
end tell
end formatURLS

--- end code ---
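
For anyone who wants to see what the build step of swapIt produces without running it against a document, here is a minimal sketch in plain AppleScript (no Tex-Edit needed). The handler names buildPairs and replaceText and the example.com addresses are made up for illustration and are not part of the script above. It pairs each ending with each tail the same way swapIt does, then applies the swaps to a sample string so the ".com.au" issue mentioned in formatURLS shows up.

```applescript
-- Hypothetical helper: pair every stem with every tail, like the build step of swapIt.
on buildPairs(stems, tails)
	set pairs to {}
	repeat with stem in stems
		repeat with tail in tails
			set end of pairs to (contents of stem) & (contents of tail)
		end repeat
	end repeat
	return pairs
end buildPairs

-- Hypothetical helper: plain-text replace using text item delimiters.
on replaceText(theText, findText, replaceWith)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set parts to text items of theText
	set AppleScript's text item delimiters to replaceWith
	set newText to parts as string
	set AppleScript's text item delimiters to savedDelims
	return newText
end replaceText

set searchFor to buildPairs({".com", ".html"}, {".", "/.", "/>"})
set replaceWith to buildPairs({".com", ".html"}, {" .", "/ .", "/ >"})
-- searchFor is {".com.", ".com/.", ".com/>", ".html.", ".html/.", ".html/>"}

-- Assumed sample text; the second address triggers the known issue.
set sample to "See http://www.example.com. Also http://www.example.com.au today."
repeat with i from 1 to count of searchFor
	set sample to replaceText(sample, item i of searchFor, item i of replaceWith)
end repeat
return sample
-- "See http://www.example.com . Also http://www.example.com .au today."
```

The first address gets the padding space the grep pattern needs, while the second is split at ".com ." and would only be linked up to ".com". The padding space could presumably be removed afterwards with the same kind of plain replace used in the earlier draft quoted below (for example, looking for "</a> ." and replacing with "</a>."), but that is untested here.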

On Tuesday, May 6, 2003, at 10:28 PM, Matt Petrowsky wrote:

Just wondering if anyone has any regex code for pulling out http:// links (or ftp://, etc.) and formatting them as href tags within Tex-Edit Plus.

I know there's code out there somewhere and parsing through the grep for breaking down a URI

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

provided from http://www.ietf.org/rfc/rfc2396.txt is a great starting point but doesn't seem to work natively.

Just thought I would ask before I start heading into the long haul trying to account for urls ending with "." or "\r".

Here's my far-from-complete code so far. It uses Tex-Edit's wildcard search right now, but I would like to switch to grep since it would be more precise.

--- snip ---

-- Basic routine for swapping characters or words out.
on swapIt(input, output)
repeat with i from 1 to count of input
tell window 1 of application "Tex-Edit Plus"
replace looking for item i of input replacing with item i of output with cases matching
end tell
end repeat
end swapIt

tell window 1 of application "Tex-Edit Plus"
-- format all urls
set urlEnd to {".html.", ".htm.", ".php.", ".shtml.", ".html" & return, ".htm" & return, ".php" & return, ".shtml" & return}
set urlNice to {".html .", ".htm .", ".php .", ".shtml .", ".html " & return, ".htm " & return, ".php " & return, ".shtml " & return}
my swapIt(urlEnd, urlNice)
replace looking for "http://^* " replacing with "<a href=\"^*\">^*</a>"
replace looking for " \">" replacing with "\">"
replace looking for " </a>" replacing with "</a>"
replace looking for "< <a" replacing with "<<a"
end tell

--- snip ---
Matt Petrowsky
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.
