Re: script to remove html
- Subject: Re: script to remove html
- From: has <email@hidden>
- Date: Tue, 16 Oct 2001 00:56:09 +0100
> Date: Mon, 15 Oct 2001 23:24:58 +0200
> To: email@hidden
> From: Miro Lucassen <email@hidden>
> Subject: script to remove html
>
> A friend advised me this list for an applescript question. I have a
> datafile from an online bank. It contains some html-codes that i replace
> with tabs to use the data in an AppleWorks database. After this
> replacement-script i have a file with data and an html-header of 25 to 30
> lines. Now i want to write a subroutine that strips all remaining
> html-code plus the other text in this 'header' - this information is not
> needed. What i cannot find out is how to write a script that removes lines
> like the following:
It's not entirely clear what you want to keep and what you want to remove.
If it's simple, predictable stuff then you can use AppleScript's text item
delimiters to do basic find & replace, but for complex text processing you
probably want to use regular expressions (aka grep), which are like find &
replace wildcards on steroids.
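
As a rough sketch of the text item delimiters approach: the handler below
splits the text on a search string and rejoins the pieces with a
replacement. The handler name replaceText, its parameter names and the
sample "<td>" data are made up for illustration; the real bank file will
look different.

```applescript
-- Replace every occurrence of findText in sourceText with newText.
-- (replaceText and its parameter names are hypothetical.)
on replaceText(findText, newText, sourceText)
	-- Remember the current delimiters so they can be restored afterwards
	set savedDelims to AppleScript's text item delimiters
	-- Split the source wherever findText occurs
	set AppleScript's text item delimiters to findText
	set chunks to text items of sourceText
	-- Rejoin the pieces with newText between them
	set AppleScript's text item delimiters to newText
	set resultText to chunks as string
	set AppleScript's text item delimiters to savedDelims
	return resultText
end replaceText

-- Example: turn each "<td>" into a tab
replaceText("<td>", tab, "<td>15-10-2001<td>-25.00")
```

This only handles one fixed search string per call, which is why it suits
the simple, predictable cases and not much else.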
AS doesn't come with regex as standard (tsk), but there are plenty of ways
to add it, either through a 3rd-party addition such as RegEx Commands**
(which also comes with some very good documentation),
http://www.lazerware.com/software.html, or by using a scriptable app such as
BBEdit (full version, not Lite), which has a grep facility.
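
To give a flavour of what a regular expression buys you, here is a minimal
sketch that deletes anything that looks like an HTML tag from one line of
text. Note the assumptions: it does not use RegEx Commands or BBEdit, but
hands the pattern to the Unix sed tool via do shell script, so it would
only work on Mac OS X. The handler name stripTags and the sample line are
hypothetical.

```applescript
-- Remove every "<...>" run from one line of text.
-- Assumption: Mac OS X, where `do shell script` and sed are available.
-- (stripTags is a hypothetical name, not part of any osax.)
on stripTags(htmlLine)
	-- <[^>]*> matches "<", then any run of characters other than ">", then ">"
	set cmd to "printf %s " & quoted form of htmlLine & " | sed 's/<[^>]*>//g'"
	return do shell script cmd
end stripTags

-- Example call with made-up data
stripTags("<b>Saldo</b> 100,00")
```

The pattern itself, <[^>]*>, is the interesting part; it should carry over
to other grep-style tools more or less unchanged, though the commands
wrapped around it will differ.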
It'll take a wee bit of practice to get used to using regular expressions,
but stick with it cos they're wonderful things.
has
**I just sent Leonard Rosenthol an email re one typo and one omission in
the manual, and a bug in the osax. Hopefully he'll fix these and then I can
make like a complete and shameless RegEx sycophant with a completely clear
conscience. Not that these (relatively) minor points should stop you diving
into RegEx Commands or a similar osax.