Re: String searching
- Subject: Re: String searching
- From: Emmanuel <email@hidden>
- Date: Mon, 19 Jan 2004 19:15:43 +0100
At 12:29 PM -0500 19/01/04, Steve Suranie wrote:
> the word pakistan appears at the 17,224 character in the string theCode - it's actually in this line
>
> <a href=\"/2004/WORLD/asiapcf/01/18/pakistan.scientists.nukes/index.html\">Pakistan widens nuke probe</a>
>
> What I would like to be able to do is search backwards from itemLocale of theCode to find the first instance of "<a href" and then to search forward from that point to find the first instance of "</a>" Basically I want to be able to pull the hyperlink associated with the word I am searching for.
>
> Or if there is a simplier way of doing it that would be appreciated as well.
You've really got two options.
- "plain vanilla", no 3rd party stuff, slow, long to write.
For instance, once you've got your offset, find the previous CR "manually" by testing "character i of theCode" in a loop, with i = itemLocale, itemLocale-1, etc. Find the next CR the same way, counting upwards to j. Then extract the line with "text i thru j of theCode". Finally, use "offset" on that line to find the anchor tags.
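The loop described above could be sketched roughly as follows. This is only a sketch: the handler name linkAround is made up for illustration, it assumes the whole anchor sits on a single line, and it returns the first "<a href" ... "</a>" span on that line, which is not necessarily the one around your word if the line holds several links. It also treats LF as a line break (the "linefeed" constant needs AppleScript 2.0 or later).

```applescript
-- Hypothetical handler: returns the first "<a href ... </a>" span found on
-- the line of theCode that contains character number itemLocale.
on linkAround(theCode, itemLocale)
	-- Walk backwards until the preceding character is a line break.
	set i to itemLocale
	repeat while i > 1
		set c to character (i - 1) of theCode
		if c is return or c is linefeed then exit repeat
		set i to i - 1
	end repeat
	-- Walk forwards until the following character is a line break.
	set n to length of theCode
	set j to itemLocale
	repeat while j < n
		set c to character (j + 1) of theCode
		if c is return or c is linefeed then exit repeat
		set j to j + 1
	end repeat
	set theLine to text i thru j of theCode
	-- Use "offset" to locate the opening and closing anchor tags.
	set startPos to offset of "<a href" in theLine
	if startPos is 0 then return ""
	set theRest to text startPos thru -1 of theLine
	set endPos to offset of "</a>" in theRest
	if endPos is 0 then return ""
	-- "</a>" is 4 characters long, so the span ends at endPos + 3.
	return text 1 thru (endPos + 3) of theRest
end linkAround
```

You would call it as "linkAround(theCode, itemLocale)"; an empty string means no complete anchor was found on that line.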
- using 3rd party (free) stuff [1], fast, shorter code.
Use a regular expression. Though I would not generally recommend using regular expressions to search tagged text, here it is fairly easy to search for the pattern you're interested in. You would search with:
------------------------- tested
find text "<a href[^>]+>[^<]*Pakistan[^<]*</a>" in theCode with regexp
-------------------------
This means: "<a href", followed by one or more characters other than ">", followed by ">", followed by any number of characters other than "<", followed by "Pakistan", followed by any number of characters other than "<", followed by the closing tag.
Depending on what you expect to find in the source, you may need a more complex pattern. For instance, here I assume there is exactly one space in "a href".
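As an illustration of a slightly more tolerant pattern, here is an untested sketch that accepts one or more spaces between "<a" and "href". It assumes the Satimage osax is installed, that its "string result" parameter returns the matched text directly, and that "find text" raises an error when nothing matches; the sample value of theCode is invented for the example.

```applescript
-- Assumes the Satimage osax is installed. The sample source is made up.
set theCode to "<p>News</p><a  href=\"/x/pakistan/index.html\">Pakistan widens nuke probe</a>"
-- " +" accepts one or more spaces where the original pattern required exactly one.
set thePattern to "<a +href[^>]+>[^<]*Pakistan[^<]*</a>"
try
	set theLink to find text thePattern in theCode with regexp and string result
on error
	-- No match (or the osax is missing): fall back to an empty string.
	set theLink to ""
end try
```

This still expects "href" to be the first attribute of the tag; other layouts would need a further change to the pattern.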
Best regards,
Emmanuel
[1] Namely, the Satimage osax by Satimage-software <http://www.satimage-software.com>