Re: Reading a pdf text file
- Subject: Re: Reading a pdf text file
- From: Gil Dawson <email@hidden>
- Date: Mon, 10 Jan 2005 13:41:32 -0800
Several techniques were tried for converting a mostly-text pdf file
to text using AppleScript. This is a report on the results of these
techniques.
1. Sending the pdf file to Adobe Acrobat Elements Server
<email@hidden>. Attached to the return email is a .txt file
that has an extraneous space character at the end of every line. I
did not test the translation of special characters.
2. Executing an AppleScript that uses UI Scripting with Script
Editor 1.8.3 (Classic) to control Adobe Reader 6.0 to Open, Select
All, and Next Page to copy the text page by page to another file.
This script uses System Events and so works only in Panther, even
though Script Editor 1.8.3 runs in Classic mode. The result is the
same as #1, above, without the extraneous line-end space. A
difficulty with this script is that we have not yet come up with an
acceptable test for the end of the document. I did not test the
translation of special characters.
3. Executing the same script as #2, above, but using Script Editor
2.0 (X), instead of Script Editor 1.8.3 (Classic). An unusual effect
was that the System Events commands (e.g., "keystroke") reverted to
their "<<class xxxx>>" form after compiling. However, the result is
identical to #2, above.
4. Executing a variation of #2, above, in a machine booted with
9.2.2, but using Sändi's Additions instead of System Events and
Acrobat Reader 4.0 instead of Adobe Reader 6.0. Sändi's Additions,
and thus this script, work only in 9.2.2 and earlier. The result is
identical to #2, above.
5. Executing pdftotext, an Open Source command-line converter for pdf
files, in a shell script called from within an AppleScript to
produce a .txt file in the same folder. The resultant file is
useable, but contains numerous, seemingly sporadic, space characters
which make parsing a bit more difficult. I did not test the
translation of special characters.
6. Executing an AppleScript to control Preview (instead of Adobe
Reader) was suggested but not implemented, because I could not figure
out how to select text with UI commands to Preview.
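The stray spaces reported in #1 and #5 could probably be cleaned up in a
post-processing pass before parsing. The following is only a minimal
sketch in Python, not part of any of the scripts described above. The
function name clean_extracted_text is hypothetical, and it assumes the
unwanted spaces show up either at the end of a line or as runs of two or
more spaces or tabs inside a line.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Tidy text extracted from a pdf (hypothetical helper, for illustration only)."""
    cleaned_lines = []
    for line in text.splitlines():
        # Drop trailing whitespace, such as the extra space at each line end in #1.
        line = line.rstrip()
        # Collapse runs of two or more spaces or tabs into a single space.
        # Assumption: the sporadic spaces in #5 appear as such runs.
        line = re.sub(r"[ \t]{2,}", " ", line)
        cleaned_lines.append(line)
    return "\n".join(cleaned_lines)
```

For example, clean_extracted_text("Dear  Sir, \n") returns "Dear Sir,".
Note that this also flattens indentation and any column alignment, so it
suits running text better than tables, and a single stray space inside a
word would not be caught.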
Thanks for your help, folks. I've learned a lot in the past few days.
--Gil