Extracting Table Data from Word 14.6.5

  • Subject: Extracting Table Data from Word 14.6.5
  • From: Christopher Stone <email@hidden>
  • Date: Mon, 04 Jul 2016 01:53:25 -0500

Hey Folks,

I had a really large 3 column table in a Word document I needed to process into differently formatted text for import into another application.

In the past I've played the copy and paste game, tried saving as text, printed to PDF and extracted the text from it with `pdftotext`...

Word can be so hard to script that I didn't really give it much consideration – until today.

When I went looking I found this Apple Support Communities item right away, and it gave me enough hints to move forward.

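It turns out that getting data from a table is quite easy, although Word puts control characters into the cell text (BELL characters, plus LINE TABULATION characters for manual line breaks) that must be removed or converted.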
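The script further down does this cleanup with the regex `change` command from Satimage.osax. If that scripting addition isn't installed, a comparable cleanup can be sketched in plain AppleScript with text item delimiters. This is a swapped-in technique, not Chris's method: `replaceText` and `cleanCell` are hypothetical handler names, the sketch cleans one cell's text at a time, and its trailing-whitespace step strips only spaces, tabs, returns, and linefeeds, a narrower set than the regex `\s`.

```applescript
-- Sketch: clean one Word cell's text without Satimage.osax.
-- Hypothetical handlers; they use only AppleScript's text item delimiters.

on replaceText(findText, replaceWith, sourceText)
   set savedTIDs to AppleScript's text item delimiters
   -- Split on the text to find...
   set AppleScript's text item delimiters to findText
   set textPieces to text items of sourceText
   -- ...then rejoin the pieces with the replacement.
   set AppleScript's text item delimiters to replaceWith
   set resultText to textPieces as text
   set AppleScript's text item delimiters to savedTIDs
   return resultText
end replaceText

on cleanCell(cellText)
   -- Drop BELL (ASCII 7) characters.
   set cellText to replaceText(character id 7, "", cellText)
   -- Turn LINE TABULATION (ASCII 11) into a linefeed.
   set cellText to replaceText(character id 11, linefeed, cellText)
   -- Strip trailing space, tab, return, and linefeed characters.
   set trailingChars to {character id 32, character id 9, character id 13, character id 10}
   repeat while cellText is not "" and (last character of cellText) is in trailingChars
      if length of cellText = 1 then
         set cellText to ""
      else
         set cellText to text 1 thru -2 of cellText
      end if
   end repeat
   return cellText
end cleanCell
```

For example, `cleanCell("Line one" & (character id 11) & "Line two" & return & (character id 7))` should give "Line one", a linefeed, and "Line two".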

The appended script creates a list-of-lists and writes it to a file for later processing.

My 403-row by 3-column table took a little less than 30 seconds to export.

Reading it back for processing takes a fraction of a second.
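Reading the data back needs only the Standard Additions `read` command, because the file was written `as list`. A minimal round-trip sketch, assuming a hypothetical file name in the temporary items folder and made-up sample rows in place of real table data:

```applescript
-- Sketch: round-trip a list-of-lists through a file.
-- The file name and sample rows are hypothetical placeholders.
set sampleRows to {{"a1", "b1", "c1"}, {"a2", "b2", "c2"}}
set testFile to ((path to temporary items as text) & "roundTripTest.lst")

-- Write the nested list in AppleScript's binary list format.
set fileRef to open for access file testFile with write permission
try
   set eof of fileRef to 0
   write sampleRows to fileRef as list
   close access fileRef
on error errMsg number errNum
   close access fileRef
   error errMsg number errNum
end try

-- Reading it back restores the nested list directly; no text parsing needed.
set tableData to read file testFile as list
set rowTotal to count of tableData -- one item per table row
set secondRowThirdCell to item 3 of item 2 of tableData
```

Since `read ... as list` hands back the same structure that was written, the processing script can loop over rows and cells immediately.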

So I learned some useful stuff today.

--
Best Regards,
Chris

-------------------------------------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2016/07/03 23:35 
# dMod: 2016/07/04 01:43
# Appl: Microsoft Word 14.6.5
# Task: Extract Table Data from a Word Document
# Libs: None
# Osax: Satimage.osax { http://tinyurl.com/dc3soh }
# Tags: @Applescript, @Script, @Microsoft_Word, @Satimage.osax, @Extract, @Table, @Data
# Note: Tested only with Word 14.6.5 from Office 2011 on OSX 10.11.5.
-------------------------------------------------------------------------------------------

set dataCollator to {}

tell application "Microsoft Word"
   tell active document
      tell table 1
         set rowCount to count of rows
         set columnCount to count of columns

         repeat with theRow from 1 to rowCount
            set tempData to {}

            # Collect the text of every cell in this row.
            tell row theRow
               repeat with theCellNum from 1 to columnCount
                  set end of tempData to content of text object of cell theCellNum
               end repeat
            end tell

            --------------------------------------------------------------------------------
            # Remove or convert characters introduced by Word:
            --------------------------------------------------------------------------------
            # Remove BELL characters (part of Word's end-of-cell marker).
            set tempData to cng("\\x{07}", "", tempData) of me
            # Convert LINE TABULATION characters (manual line breaks) to newlines.
            set tempData to cng("\\x{0B}", "\\n", tempData) of me
            # Remove trailing whitespace.
            set tempData to cng("\\s+\\Z", "", tempData) of me
            --------------------------------------------------------------------------------

            set end of dataCollator to tempData
         end repeat
      end tell
   end tell
end tell

set targetFile to ((path to desktop folder as text) & "subtitleDataList.lst")
writeFile(dataCollator, targetFile, 0, list)

-------------------------------------------------------------------------------------------
--» HANDLERS
-------------------------------------------------------------------------------------------
on cng(_find, _replace, _data)
   change _find into _replace in _data with regexp without case sensitive
end cng
-------------------------------------------------------------------------------------------
on writeFile(srcData, targetFile, startPosition, dataType)
   try
      set fileRef to open for access targetFile with write permission
      if startPosition = 0 then
         set eof of fileRef to 0
      end if
      write srcData to fileRef as dataType starting at startPosition
      close access fileRef
   on error errMsg number errNum
      try
         close access fileRef
      end try
      error "Error in handler: writeFile of library: gen.lib" & return & return & errMsg
   end try
end writeFile

-------------------------------------------------------------------------------------------
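The post doesn't show the later reformatting step. As one hypothetical example of it, the handler below (`rowsToDelimitedText` is an illustrative name, not part of Chris's script) joins each row's cells with tabs and the rows with linefeeds, producing tab-delimited text. The delimiters are assumptions; use whatever the importing application expects.

```applescript
-- Sketch: format a list-of-lists as tab-delimited text, one row per line.
-- Hypothetical handler; not part of the original script.
on rowsToDelimitedText(tableRows)
   set savedTIDs to AppleScript's text item delimiters
   set outputLines to {}
   -- Join the cells of each row with tabs.
   set AppleScript's text item delimiters to tab
   repeat with aRow in tableRows
      set end of outputLines to (contents of aRow) as text
   end repeat
   -- Join the rows with linefeeds.
   set AppleScript's text item delimiters to linefeed
   set outputText to outputLines as text
   set AppleScript's text item delimiters to savedTIDs
   return outputText
end rowsToDelimitedText

set sampleRows to {{"a1", "b1", "c1"}, {"a2", "b2", "c2"}}
set delimitedText to rowsToDelimitedText(sampleRows)
```

A cell whose LINE TABULATION was converted to a newline will split its row across two lines in this output, so such cells may need a different replacement before formatting.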

 _______________________________________________
AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
