Re: theSIMS -- Applescript copy paste from Explorer
- Subject: Re: theSIMS -- Applescript copy paste from Explorer
- From: Kai Edwards <email@hidden>
- Date: Tue, 19 Mar 2002 05:48:17 +0000
on 17/3/02 11:58 pm, Applescript User Lewis at email@hidden wrote:

> At 16:04 +0000 17/03/02, Kai Edwards wrote:
>
>> on 17/3/02 7:31 am, Applescript User Lewis at email@hidden wrote:
>>
>>> This is very nice _IF_ the page is short. Otherwise, there is a stack
>>> overflow
>>>
>>>> set html to html's text items
>>>
>>> there.
>>
>> OMM, I'm afraid that just gives me a list of all text characters returned by
>> OE's 'GetSource' command - and it still includes all html tags and codes.
>
> Well, the script was set text item delims to "<" then set html to html's text
> items. This was then repeated with the tids set to ">".
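The two-pass delimiter trick described above (split on "<", then split again on ">") can be illustrated outside AppleScript. The Python below is only a sketch of the same idea; the name `strip_tags` is hypothetical and not part of the script. Splitting on "<" and rejoining with ">" makes both tag delimiters the same character, so a second split on ">" leaves page text at every other position, starting with the first. Like the AppleScript version, it assumes no stray "<" or ">" characters appear outside tags.

```python
def strip_tags(html: str) -> str:
    """Remove <...> tags with the split/rejoin/split trick (illustrative sketch)."""
    # Pass 1: split on "<" so every tag start becomes a boundary.
    pieces = html.split("<")
    # Rejoin with ">" so both tag delimiters are now the same character.
    unified = ">".join(pieces)
    # Pass 2: split on ">"; text runs sit at 0-based indices 0, 2, 4, ...
    parts = unified.split(">")
    # Keep every other item (AppleScript's items 1, 3, 5, ...).
    return "".join(parts[0::2])


print(strip_tags("<p>Hello <b>world</b></p>"))  # prints: Hello world
```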
Yeah - it seems I completely misinterpreted your original comment. My
confusion was partly due to an inability to replicate the problem on my
system. I'm sorry for being so obtuse! :-(
I started to address the problem by modifying the list handler I posted
recently. However, I took a look at some of has's excellent work on the
subject - and realised that his approach was much more efficient (thanks for
that, has)! :-)
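The fallback strategy used in the script's getTextItems and chunkTextItems handlers (ask for every text item at once; on a stack overflow, fetch them in fixed-size slices; if a slice still overflows, shrink the slice to three quarters and start again) can be sketched in Python. Python has no equivalent of AppleScript's stack-overflow error (number -2706), so this sketch simulates it: `fetch_range`, `SimulatedStackOverflow` and `MAX_ITEMS_PER_CALL` are hypothetical stand-ins for a `text items n thru m` request that fails above some unknown size.

```python
MAX_ITEMS_PER_CALL = 1000  # hypothetical limit; the real AppleScript limit is unknown


class SimulatedStackOverflow(Exception):
    """Stands in for AppleScript error -2706 (stack overflow)."""


def fetch_range(items, start, stop):
    """Hypothetical stand-in for 'text items n thru m': fails if the range is too big."""
    if stop - start > MAX_ITEMS_PER_CALL:
        raise SimulatedStackOverflow()
    return items[start:stop]


def chunk_items(items, chunk_size=3600):
    """Fetch items in slices, shrinking chunk_size to 3/4 after each overflow."""
    while True:
        try:
            result = []
            for start in range(0, len(items), chunk_size):
                stop = min(start + chunk_size, len(items))
                result.extend(fetch_range(items, start, stop))
            return result
        except SimulatedStackOverflow:
            chunk_size = (chunk_size // 4) * 3  # same shrink rule as chunkTextItems
            if chunk_size < 1:
                raise RuntimeError("chunk size shrank to zero")


def get_items(items):
    """Try one request for everything; fall back to chunking, like getTextItems."""
    try:
        return fetch_range(items, 0, len(items))
    except SimulatedStackOverflow:
        return chunk_items(items)
```

The fast path costs nothing for short pages; only long pages pay for the retries, and each retry reuses the smaller slice size for the rest of the run.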
I still haven't fallen foul of the problem, so the incorporated fix is
untested here. But for the benefit of any newcomers (and with apologies to
the older hands), here's the script again with the mods:
----------------------------------------
property chunkSize : 3600
property cleanCharList : {¬
{"“", ASCII character 210}, ¬
{"”", ASCII character 211}, ¬
{"—", ASCII character 208}, ¬
{"’", ASCII character 213}, ¬
{" ", ASCII character 32}, ¬
{space & space, space}, ¬
{tab & tab, tab}, ¬
{return & space, return}, ¬
{space & return, return}, ¬
{return & tab, return}, ¬
{tab & return, return}, ¬
{return & return & return, return & return} ¬
} -- modify list as required
on run
tell application "Internet Explorer" to set html to GetSource
set txt to stripTags(html)
set txt to cleanChars(txt) -- comment out for faster (dirtier) results
set txt to trimTxt(txt) -- ditto
set the clipboard to txt -- or do something else with it
end run
on stripTags(html)
-- split on "<", rejoin with ">", then split on ">":
-- odd-numbered items are page text, even-numbered items are tag contents
set text item delimiters to "<"
set html to getTextItems(html)
set text item delimiters to ">"
set html to getTextItems(html as string)
set text item delimiters to ""
set txt to ""
repeat with n from 1 to (count html) by 2
set txt to txt & item n of html
end repeat
txt
end stripTags
on getTextItems(txt)
-- fast path: get all text items at once; long texts overflow the stack (error -2706)
try
txt's text items
on error number -2706
chunkTextItems(txt)
end try
end getTextItems
on chunkTextItems(txt)
-- collect txt's text items in slices of chunkSize items and concatenate them;
-- if a slice still overflows the stack, cut chunkSize to three quarters and retry
set itemCount to count txt's text items
try
if itemCount < chunkSize then return txt's text items
set theList to {}
set lastChunk to itemCount mod chunkSize
repeat with n from 1 to (itemCount - lastChunk) by chunkSize
set theList to theList & txt's text items n thru (n + chunkSize - 1)
end repeat
if lastChunk = 0 then return theList
theList & txt's text items -lastChunk thru -1
on error number -2706 -- stack overflow: retry with smaller chunks
set chunkSize to ((chunkSize) div 4) * 3
chunkTextItems(txt)
end try
end chunkTextItems
on cleanChars(txt)
-- apply each {old, new} pair until old no longer occurs, so long runs collapse fully
repeat with oldNew in cleanCharList
set {oldChar, newChar} to {oldNew's item 1, oldNew's item 2}
repeat while oldChar is in txt
set text item delimiters to oldChar
set txt to txt's text items
set text item delimiters to newChar
set txt to (txt as string)
end repeat
end repeat
set text item delimiters to ""
txt
end cleanChars
on trimTxt(txt)
-- strip leading and trailing returns
repeat while txt starts with return
if (count txt) = 1 then return ""
set txt to txt's text 2 thru -1
end repeat
repeat while txt ends with return
set txt to txt's text 1 thru -2
end repeat
txt
end trimTxt
----------------------------------------
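The two optional clean-up steps in the run handler amount to a repeated find-and-replace followed by a trim: cleanChars applies each old/new pair until the old string no longer occurs (so a run of three spaces ends up as one), and trimTxt removes leading and trailing returns. A rough Python equivalent follows, using "\r" for AppleScript's `return`; the table `CLEAN_PAIRS` is a shortened, hypothetical subset of cleanCharList, and the function names are illustrative only.

```python
# Hypothetical subset of cleanCharList; order matters, as in the AppleScript.
CLEAN_PAIRS = [
    ("  ", " "),         # double space -> single space
    ("\t\t", "\t"),      # double tab -> single tab
    ("\r ", "\r"),       # return + space -> return
    (" \r", "\r"),       # space + return -> return
    ("\r\r\r", "\r\r"),  # three returns -> two
]


def clean_chars(txt: str) -> str:
    """Apply each (old, new) pair until old no longer occurs, like cleanChars."""
    for old, new in CLEAN_PAIRS:
        # Each new string is shorter than its old one, so this loop terminates.
        while old in txt:
            txt = txt.replace(old, new)
    return txt


def trim_txt(txt: str) -> str:
    """Drop leading and trailing returns, like trimTxt."""
    return txt.strip("\r")
```

For example, `trim_txt(clean_chars("\r  Hello   world \r\r\r"))` reduces the messy input to `"Hello world"`.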
Best wishes.
Kai
--
**********************************
Kai Edwards Creative Resources
1 Compton Avenue Brighton UK
Telephone +44 (0)1273 326810
**********************************