Re: Convert MS Word to HTML
- Subject: Re: Convert MS Word to HTML
- From: Arthur Knapp <email@hidden>
- Date: Fri, 14 Nov 2003 11:20:23 -0500
From: Mats-Olof Liljegren <email@hidden>
Subject: Re: Convert MS Word to HTML
Date: Thu, 13 Nov 2003 14:39:54 +0100
And I would like to get some preformatting like bold, underline etc and
not just clean text.
I have an old solution for getting "clean" text out of Word while
preserving bold and italic. I'm sure it was a very silly way to go
about it (and I'm prepared for any criticism to that effect), but the
scripts served me well for a long time:
Our problem was that we needed to flow these Word documents in Quark,
applying a variety of complicated "full" styles to the imported text.
In Quark, this means selecting text and then clicking a style sheet
while holding down the option key. The problem was that we always lost
bold and italic in doing this. My solution was to use two scripts: one
that used Word to "mark up" the bold and italic, and another script
that told Quark to "style" text that contained "marked up" data.
My original solution for doing the mark up was this:
tell application "Microsoft Word"
  tell document 1
    set wordCount to count words
    repeat with i from wordCount to 1 by -1
      if (bold of word i) then
        if (italic of word i) then
          if (underline of word i is not none) then
            set word i to "`biu`" & (contents of word i) & "`biu`"
          else
            set word i to "`bi`" & (contents of word i) & "`bi`"
          end if
        else if (underline of word i is not none) then
          set word i to "`bu`" & (contents of word i) & "`bu`"
        else
          set word i to "`b`" & (contents of word i) & "`b`"
        end if
      else if (italic of word i) then
        if (underline of word i is not none) then
          set word i to "`iu`" & (contents of word i) & "`iu`"
        else
          set word i to "`i`" & (contents of word i) & "`i`"
        end if
      else if (underline of word i is not none) then
        set word i to "`u`" & (contents of word i) & "`u`"
...
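
The nested tests above pick one of seven tags (biu, bi, bu, b, iu, i, u)
from three yes/no properties. A more compact way to express the same
choice might look like the sketch below. It is plain AppleScript with no
Word calls, and the handler names tagFor and markUp are made up for
illustration; they were not part of the original scripts.

```applescript
-- Hypothetical helper: build the tag letters in the same b, i, u order
-- that the nested if statements use.
on tagFor(isBold, isItalic, isUnderlined)
	set theTag to ""
	if isBold then set theTag to theTag & "b"
	if isItalic then set theTag to theTag & "i"
	if isUnderlined then set theTag to theTag & "u"
	return theTag
end tagFor

-- Hypothetical helper: wrap a word in backtick-delimited tags,
-- or return it unchanged when no formatting applies.
on markUp(theWord, isBold, isItalic, isUnderlined)
	set theTag to tagFor(isBold, isItalic, isUnderlined)
	if theTag is "" then return theWord
	return "`" & theTag & "`" & theWord & "`" & theTag & "`"
end markUp

markUp("hello", true, false, true)
-- result: "`bu`hello`bu`"
```

Inside the Word loop, the three arguments would presumably be (bold of
word i), (italic of word i) and (underline of word i is not none), as in
the script above.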
Later, I switched to modifying recorded Word macros:
tell application "Microsoft Word"
activate
do Visual Basic " Selection.Find.ClearFormatting"
do Visual Basic " Selection.Find.Font.Bold = True"
do Visual Basic " Selection.Find.Replacement.ClearFormatting"
do Visual Basic " With Selection.Find
.Text = \"\"
.Replacement.Text = \"``b^&``b\"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With"
do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
do Visual Basic " Selection.Find.ClearFormatting"
do Visual Basic " Selection.Find.Font.Italic = True"
do Visual Basic " Selection.Find.Replacement.ClearFormatting"
do Visual Basic " With Selection.Find
.Text = \"\"
.Replacement.Text = \"``i^&``i\"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With"
do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
do Visual Basic " Selection.Find.ClearFormatting"
do Visual Basic " Selection.Find.Font.Underline = wdUnderlineSingle"
do Visual Basic " Selection.Find.Replacement.ClearFormatting"
do Visual Basic " With Selection.Find
.Text = \"\"
.Replacement.Text = \"``u^&``u\"
.Forward = True
.Wrap = wdFindContinue
.Format = True
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With"
do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
end tell
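
The three passes above differ only in the Find.Font setting and the tag
letter, so the repetition could be folded into a handler. The following
is only a sketch: findBlockFor and markUpFormat are invented names, the
Visual Basic text is copied from the recorded macro above, and it
assumes a version of Word that still accepts the do Visual Basic command.

```applescript
-- Hypothetical helper: build the "With Selection.Find" block for one tag
-- letter. Only the replacement text varies between the three passes.
on findBlockFor(tagText)
	return " With Selection.Find
 .Text = \"\"
 .Replacement.Text = \"``" & tagText & "^&``" & tagText & "\"
 .Forward = True
 .Wrap = wdFindContinue
 .Format = True
 .MatchCase = False
 .MatchWholeWord = False
 .MatchWildcards = False
 .MatchSoundsLike = False
 .MatchAllWordForms = False
 End With"
end findBlockFor

-- Hypothetical helper: run one find-and-replace pass in Word.
-- fontSetting is the text that follows "Selection.Find.Font." in the macro.
on markUpFormat(fontSetting, tagText)
	set findBlock to findBlockFor(tagText)
	tell application "Microsoft Word"
		do Visual Basic " Selection.Find.ClearFormatting"
		do Visual Basic (" Selection.Find.Font." & fontSetting)
		do Visual Basic " Selection.Find.Replacement.ClearFormatting"
		do Visual Basic findBlock
		do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
	end tell
end markUpFormat
```

The three passes would then become markUpFormat("Bold = True", "b"),
markUpFormat("Italic = True", "i") and
markUpFormat("Underline = wdUnderlineSingle", "u").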
In Quark, then (after applying styles), it was just a question of
styling the marked-up words:
tell application "QuarkXPress 4.11"
  activate
  tell document 1
    set styleTags to {"``b", "``i", "``u"}
    repeat with storyIndex from 1 to count every story
      set theStory to (a reference to story storyIndex)
      repeat with tagIndex from 1 to count styleTags
        set theTag to item tagIndex of styleTags
        try
          set offsetPairs to (offset of every text of theStory where (it = theTag))
        on error
          set offsetPairs to {}
        end try
        repeat with offsetIndex from 1 to count offsetPairs by 2
          set firstOffset to (item offsetIndex of offsetPairs)
          set secondOffset to (item (1 + offsetIndex) of offsetPairs)
          set offsetStart to firstOffset + 1
          set offsetEnd to secondOffset + (count theTag)
          if (theTag contains "b") then ¬
            set style of text from character offsetStart to ¬
              character offsetEnd of theStory to bold
          if (theTag contains "i") then ¬
            set style of text from character offsetStart to ¬
              character offsetEnd of theStory to italic
          if (theTag contains "u") then ¬
            set style of text from character offsetStart to ¬
              character offsetEnd of theStory to underline
        end repeat
        try
          delete (every text of theStory where it = theTag)
        end try
      end repeat
    end repeat
  end tell
end tell
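
The offset arithmetic in the inner loop can be tried on its own, without
Quark. The sketch below mirrors that arithmetic and assumes, as the
script does, that the offsets come back as one flat list in which
opening and closing tags alternate. The handler name rangesFromOffsets
and the sample offsets are invented for illustration.

```applescript
-- Hypothetical helper: turn a flat list of tag offsets into
-- {start, end} character ranges, using the same arithmetic as the script.
-- An odd number of offsets (an unmatched tag) would raise an error here.
on rangesFromOffsets(offsetPairs, tagLength)
	set theRanges to {}
	repeat with offsetIndex from 1 to (count offsetPairs) by 2
		set firstOffset to item offsetIndex of offsetPairs
		set secondOffset to item (offsetIndex + 1) of offsetPairs
		set end of theRanges to {firstOffset + 1, secondOffset + tagLength}
	end repeat
	return theRanges
end rangesFromOffsets

-- Two tagged runs, with a three-character tag such as "``b".
rangesFromOffsets({4, 12, 30, 41}, 3)
-- result: {{5, 15}, {31, 44}}
```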
{ Arthur J. Knapp;
<mailto:email@hidden>;
What...? Oh...!
}