Re: Convert MS Word to HTML

  • Subject: Re: Convert MS Word to HTML
  • From: Arthur Knapp <email@hidden>
  • Date: Fri, 14 Nov 2003 11:20:23 -0500

> From: Mats-Olof Liljegren <email@hidden>
> Subject: Re: Convert MS Word to HTML
> Date: Thu, 13 Nov 2003 14:39:54 +0100
>
> And I would like to get some preformatting like bold, underline etc and
> not just clean text.

I have an old solution for getting "clean" text out of Word while preserving bold and italic. I'm sure it was a very silly way to go about it (and I'm prepared for any criticism to that effect), but the scripts served me well for a long time:

Our problem was that we needed to flow these Word documents in Quark, applying a variety of complicated "full" styles to the imported text. In Quark, this means selecting text and then clicking a style sheet while holding down the option key. The problem was that we always lost bold and italic in doing this. My solution was to use two scripts: one that used Word to "mark up" the bold and italic, and another that told Quark to "style" text that contained "marked up" data.

My original solution for doing the mark up was this:

tell application "Microsoft Word"
    tell document 1
        set wordCount to count words
        repeat with i from wordCount to 1 by -1
            if (bold of word i) then
                if (italic of word i) then
                    if (underline of word i is not none) then
                        set word i to "`biu`" & (contents of word i) & "`biu`"
                    else
                        set word i to "`bi`" & (contents of word i) & "`bi`"
                    end if
                else if (underline of word i is not none) then
                    set word i to "`bu`" & (contents of word i) & "`bu`"
                else
                    set word i to "`b`" & (contents of word i) & "`b`"
                end if
            else if (italic of word i) then
                if (underline of word i is not none) then
                    set word i to "`iu`" & (contents of word i) & "`iu`"
                else
                    set word i to "`i`" & (contents of word i) & "`i`"
                end if
            else if (underline of word i is not none) then
                set word i to "`u`" & (contents of word i) & "`u`"
...
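
The nested tests above boil down to building one tag from three yes/no flags. Here is a minimal sketch of that idea in Python rather than AppleScript, only to show the logic. The name wrap_word and its parameters are made up for illustration, and since the listing above is cut off, leaving unstyled words untouched is an assumption.

```python
def wrap_word(text, bold=False, italic=False, underline=False):
    """Wrap text in a backtick tag such as `bi`, like the tags in the script above."""
    # One letter per active style, always in b, i, u order.
    letters = ("b" if bold else "") + ("i" if italic else "") + ("u" if underline else "")
    if not letters:
        # Assumption: unstyled words are left alone (the original listing is truncated).
        return text
    tag = "`" + letters + "`"
    return tag + text + tag
```

For example, wrap_word("Quark", bold=True, underline=True) gives the same `bu` wrapping as the matching branch of the AppleScript.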

Later, I switched to modifying a recorded Word macro:

tell application "Microsoft Word"
    activate
    do Visual Basic " Selection.Find.ClearFormatting"
    do Visual Basic " Selection.Find.Font.Bold = True"
    do Visual Basic " Selection.Find.Replacement.ClearFormatting"
    do Visual Basic " With Selection.Find
        .Text = \"\"
        .Replacement.Text = \"``b^&``b\"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With"
    do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
    do Visual Basic " Selection.Find.ClearFormatting"
    do Visual Basic " Selection.Find.Font.Italic = True"
    do Visual Basic " Selection.Find.Replacement.ClearFormatting"
    do Visual Basic " With Selection.Find
        .Text = \"\"
        .Replacement.Text = \"``i^&``i\"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With"
    do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
    do Visual Basic " Selection.Find.ClearFormatting"
    do Visual Basic " Selection.Find.Font.Underline = wdUnderlineSingle"
    do Visual Basic " Selection.Find.Replacement.ClearFormatting"
    do Visual Basic " With Selection.Find
        .Text = \"\"
        .Replacement.Text = \"``u^&``u\"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With"
    do Visual Basic " Selection.Find.Execute Replace:=wdReplaceAll"
end tell

Then in Quark, after applying styles, it was just a question of styling the marked-up words:

tell application "QuarkXPress 4.11"
    activate
    tell document 1
        set styleTags to {"``b", "``i", "``u"}
        repeat with storyIndex from 1 to count every story
            set theStory to (a reference to story storyIndex)
            repeat with tagIndex from 1 to count styleTags
                set theTag to item tagIndex of styleTags
                try
                    set offsetPairs to (offset of every text of theStory where (it = theTag))
                on error
                    set offsetPairs to {}
                end try
                repeat with offsetIndex from 1 to count offsetPairs by 2
                    set firstOffset to (item offsetIndex of offsetPairs)
                    set secondOffset to (item (1 + offsetIndex) of offsetPairs)
                    set offsetStart to firstOffset + 1
                    set offsetEnd to secondOffset + (count theTag)
                    if (theTag contains "b") then ¬
                        set style of text from character offsetStart to ¬
                            character offsetEnd of theStory to bold
                    if (theTag contains "i") then ¬
                        set style of text from character offsetStart to ¬
                            character offsetEnd of theStory to italic
                    if (theTag contains "u") then ¬
                        set style of text from character offsetStart to ¬
                            character offsetEnd of theStory to underline
                end repeat
                try
                    delete (every text of theStory where it = theTag)
                end try
            end repeat
        end repeat
    end tell
end tell
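
Since the original question was about HTML, the same pairing idea (find every occurrence of a tag, then treat the occurrences two at a time as open and close) could be pointed at HTML instead of Quark. The following is only a rough Python sketch of that, not something I ran against real Word output. The names tagged_spans and to_html are made up for illustration, and it does no HTML escaping.

```python
def tagged_spans(text, tag):
    """Return (start, end) index pairs for the text between successive pairs of tag."""
    offsets = []
    pos = text.find(tag)
    while pos != -1:
        offsets.append(pos)
        pos = text.find(tag, pos + len(tag))
    # Pair the offsets two at a time: (open, close), (open, close), ...
    # A trailing unmatched tag is ignored.
    spans = []
    for k in range(0, len(offsets) - 1, 2):
        spans.append((offsets[k] + len(tag), offsets[k + 1]))
    return spans


def to_html(text, tag, element):
    """Replace each tag-delimited run with <element>...</element> and drop the tags."""
    out = []
    last = 0
    for start, end in tagged_spans(text, tag):
        out.append(text[last:start - len(tag)])  # text before the opening tag
        out.append("<%s>%s</%s>" % (element, text[start:end], element))
        last = end + len(tag)  # skip past the closing tag
    out.append(text[last:])
    return "".join(out)
```

Calling to_html once per tag, for example with "``b" and "b", then "``i" and "i", mirrors the loop over styleTags in the Quark script.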

{ Arthur J. Knapp;
<mailto:email@hidden>;

What...? Oh...!
}