Re: Reading a pdf text file

  • Subject: Re: Reading a pdf text file
  • From: Gil Dawson <email@hidden>
  • Date: Mon, 10 Jan 2005 13:41:32 -0800

Several techniques were tried for converting a mostly-text pdf file to text using AppleScript. This is a report on the results of these techniques.

1. Sending the pdf file to Adobe Acrobat Elements Server <email@hidden>. Attached to the return email is a .txt file that has an extraneous space character at the end of every line. I did not test the translation of special characters.

2. Executing an AppleScript that uses UI Scripting, run from Script Editor 1.8.3 (Classic), to control Adobe Reader 6.0: Open, Select All, and Next Page, copying the text page by page to another file. This script uses System Events and so works only in Panther, even though Script Editor 1.8.3 runs in Classic mode. The result is the same as #1, above, without the extraneous line-end space. One difficulty with this script is that we have not yet come up with an acceptable test for the end of the document. I did not test the translation of special characters.

3. Executing the same script as #2, above, but using Script Editor 2.0 (X), instead of Script Editor 1.8.3 (Classic). An unusual effect was that the System Events commands (e.g., "keystroke") reverted to their "<<class xxxx>>" form after compiling. However, the result is identical to #2, above.

4. Executing a variation of #2, above, in a machine booted with 9.2.2, but using Sändi's Additions instead of System Events and Acrobat Reader 4.0 instead of Adobe Reader 6.0. Sändi's Additions, and thus this script, work only in 9.2.2 and earlier. The result is identical to #2, above.

5. Executing pdftotext, the Open Source command-line converter that ships with the Xpdf pdf viewer, in a shell script called from within an AppleScript to produce a .txt file in the same folder. The resultant file is usable, but contains numerous, seemingly sporadic, space characters, which make parsing a bit more difficult. I did not test the translation of special characters.

6. Executing an AppleScript to control Preview (instead of Adobe Reader) was suggested but not implemented, because I could not figure out how to select text with UI commands to Preview.
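
For anyone who wants to tidy the output of #1 or #5 before parsing, here is a rough sketch in Python, not something that was tested as part of this report. It assumes the stray spaces show up as runs of two or more blanks between words (spaces inserted inside words cannot be repaired this way), and that the pdftotext executable is installed and on the PATH. The function names pdf_to_text and clean_text are made up for illustration.

```python
import re
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path: str) -> Path:
    """Run pdftotext on pdf_path, writing a .txt file in the same folder."""
    txt_path = Path(pdf_path).with_suffix(".txt")
    # Assumption: pdftotext is installed and reachable on PATH.
    subprocess.run(["pdftotext", str(pdf_path), str(txt_path)], check=True)
    return txt_path


def clean_text(text: str) -> str:
    """Strip line-end blanks and squeeze runs of blanks to a single space."""
    cleaned = []
    for line in text.split("\n"):
        line = line.rstrip(" \t")                # the extra space at line end (#1)
        line = re.sub(r"[ \t]{2,}", " ", line)   # runs of spaces or tabs (#5)
        cleaned.append(line)
    return "\n".join(cleaned)
```

Calling clean_text on the contents of the .txt file returns the tidied text. Note that squeezing runs of blanks also flattens any leading indentation and column alignment, so this only suits text where layout does not matter.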

Thanks for your help, folks.  I've learned a lot in the past few days.

--Gil

