Re: Parsing comments from HTML...

  • Subject: Re: Parsing comments from HTML...
  • From: "Marc K. Myers" <email@hidden>
  • Date: Fri, 1 Nov 2002 13:51:30 -0500

Subject: Parsing comments from HTML...
Date: Fri, 1 Nov 2002 10:28:07 -0600
From: Peter Bunn <email@hidden>
To: "AppleScript Users Mailing List" <email@hidden>

Hello:

I'm writing a script that tries to retrieve text which has been commented
out in HTML. (The comment text is automatically generated by another
script so the formatting is controllable/predictable.) Below is a script
snippet adapted from an AS Guidebook example. Using two 'unique'
characters (common as shown here; non-ASCII in the actual script), the
script does, in effect, read and return the items listed between the
symbols... but as the amount of HTML grows, everything slows to a crawl
(roughly 2 minutes to retrieve 100 items from an HTML page of 100K).

I've tried other methods - involving 'read to the offset of' and tid's,
but haven't had much luck... mostly just shots in the dark, owing to my
inexperience.

I wonder if there's a way to speed up the process?

As an added bonus, if there's a way to sort the final list
alphabetically, that would be of great interest also.

Any/all suggestions are most welcome.

Thanks.

Peter B.

OS 8.6 / AS 1.3.7

(I've left the HTML comment symbols out in case the list server wouldn't
handle them properly...)

-----

set the_read to "
*Cow
Chicken
Pig
$

*Duck
Goose
$"

set _copy_ to false
set the_list to ""

repeat with this_character in the_read
    set this_character to the contents of this_character
    if this_character is "*" then
        set _copy_ to true
    else if this_character is "$" then
        set _copy_ to false
    else if _copy_ is true then
        set the_list to the_list & this_character
    end if
end repeat

set _Priors_ to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set the_reindex_list to (every text item of the_list) as list
set AppleScript's text item delimiters to _Priors_

set the_reindex_list to text items 1 through -2 of the_reindex_list as list

-->{"Cow", "Chicken", "Pig", "Duck", "Goose"}

This can be done quickly and efficiently using the inexpensive and highly scriptable text editor Tex-Edit Plus:

tell application "Tex-Edit Plus"
    make new window at beginning
    set contents of window 1 to "*Cow
Chicken
Pig
$

*Duck
Goose
$"
    set outList to {}
    set fndIt to (search window 1 looking for "*^*$")
    repeat while fndIt is true
        set theHit to the selection
        -- drop the leading "*" and the trailing "$"
        set theHit to text 2 thru -2 of theHit
        set outList to outList & (words of theHit)
        set fndIt to (search window 1 looking for "*^*$" finding next)
    end repeat
end tell
outList
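
For anyone who would rather stay in plain AppleScript, here is a minimal sketch using text item delimiters, which the original post mentions having tried. It splits the text at the opening marker, trims each piece at the closing marker, and collects the non-empty lines, so it avoids looping over every character and growing a string one character at a time (which is probably where the time goes). The handler names extractItems and sortList are made up for this example, the sort is a simple insertion sort rather than anything built in, and none of this has been timed on a 100K page.

```applescript
-- Hypothetical handler: return every non-empty line found between openMark and closeMark.
on extractItems(theText, openMark, closeMark)
    set priorTIDs to AppleScript's text item delimiters
    set foundItems to {}
    try
        -- Split at every opening marker; chunk 1 is whatever precedes the first marker.
        set AppleScript's text item delimiters to openMark
        set theChunks to text items of theText
        -- Everything before the closing marker is the block body.
        set AppleScript's text item delimiters to closeMark
        repeat with i from 2 to (count theChunks)
            set thisChunk to item i of theChunks
            if thisChunk is not "" then
                -- Assumption: a block with no closing marker runs to the end of its chunk.
                set theBody to text item 1 of thisChunk
                repeat with thisLine in paragraphs of theBody
                    set thisLine to contents of thisLine
                    if thisLine is not "" then set end of foundItems to thisLine
                end repeat
            end if
        end repeat
    on error errMsg number errNum
        -- Restore the delimiters before passing the error along.
        set AppleScript's text item delimiters to priorTIDs
        error errMsg number errNum
    end try
    set AppleScript's text item delimiters to priorTIDs
    return foundItems
end extractItems

-- Hypothetical handler: insertion sort, using AppleScript's default text comparison.
on sortList(theList)
    set sortedList to {}
    repeat with thisItem in theList
        set thisItem to contents of thisItem
        -- Find the first position whose item sorts after thisItem.
        set insertAt to (count sortedList) + 1
        repeat with j from 1 to (count sortedList)
            if thisItem < (item j of sortedList) then
                set insertAt to j
                exit repeat
            end if
        end repeat
        if insertAt is 1 then
            set sortedList to {thisItem} & sortedList
        else if insertAt > (count sortedList) then
            set end of sortedList to thisItem
        else
            set sortedList to (items 1 thru (insertAt - 1) of sortedList) & {thisItem} & (items insertAt thru -1 of sortedList)
        end if
    end repeat
    return sortedList
end sortList

set the_read to "
*Cow
Chicken
Pig
$

*Duck
Goose
$"

set the_list to extractItems(the_read, "*", "$")
-->{"Cow", "Chicken", "Pig", "Duck", "Goose"}

sortList(the_list)
-->{"Chicken", "Cow", "Duck", "Goose", "Pig"}
```

Unlike the "words of" approach above, this keeps multi-word lines together as single items. The insertion sort is fine for a hundred or so items; a much longer list would want a faster sort.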

Marc K. Myers <email@hidden>
http://AppleScriptsToGo.com
4020 W.220th St.
Fairview Park, OH 44126
(440) 331-1074

[11/01/02 1:51:11 PM]
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.
