
Re: Reading a pdf text file


  • Subject: Re: Reading a pdf text file
  • From: Richard Smykla <email@hidden>
  • Date: Sun, 9 Jan 2005 10:44:21 -0500

Gil,

After installing the xpdf-tools package, you will find the command-line tool pdftotext in the /sw/bin/ directory. (At least that is where it installed by default on my machine.) You can then use a simple AppleScript 'do shell script' command to create a text version of your pdf. If all goes well, and the source pdf is not protected, the text file will be located in the same folder as the original pdf:

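-- Ask for the pdf and convert the chosen alias to a POSIX path
set response to POSIX path of (choose file)

-- -layout keeps the original physical layout of the text
set commando to "/sw/bin/pdftotext -layout " & quoted form of response

-- do shell script is a Standard Additions command, so no Terminal tell block is needed
do shell script commando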

I haven't included any error checking here, and there are various command line options that you can invoke if you're so inclined. Here's the short version of help for pdftotext:
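
For anyone who wants the error checking, here is a minimal sketch of one way to add it. It assumes pdftotext lives in /sw/bin/ as described above; the property name pdftotextPath and the dialog wording are my own illustration. It relies on 'do shell script' raising an AppleScript error whenever the command exits with a non-zero status, which pdftotext does when it cannot open or convert the file:

```applescript
-- Assumed install location (from the post above); adjust if yours differs
property pdftotextPath : "/sw/bin/pdftotext"

set pdfPath to POSIX path of (choose file with prompt "Choose a pdf to convert:")

try
	-- A non-zero exit status from pdftotext becomes an AppleScript error here
	do shell script (quoted form of pdftotextPath) & " -layout " & (quoted form of pdfPath)
on error errMsg number errNum
	-- Report the failure (missing tool, protected pdf, unreadable file, ...) and stop
	display dialog "pdftotext failed (" & errNum & "): " & errMsg buttons {"OK"} default button 1
	return
end try
```

The error message shown is whatever pdftotext printed before exiting, so a protected or damaged pdf should be distinguishable from a wrong path to the tool.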

pdftotext version 3.00
Copyright 1996-2004 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -layout           : maintain original physical layout
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information
  -enc <string>     : output text encoding name
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -nopgbrk          : don't insert page breaks between pages
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
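
As a rough sketch of how a few of these options combine, the script below pulls the text of the first page straight into an AppleScript variable instead of writing a file. One thing it uses is not in the short help above: passing "-" as the text-file name, which, according to the pdftotext man page, sends the text to standard output. The variable names are only illustrative, and the /sw/bin/ path is again an assumption about where the tool is installed:

```applescript
set pdfPath to POSIX path of (choose file with prompt "Choose a pdf to read:")

-- -f 1 -l 1   : convert only the first page
-- -enc UTF-8  : ask for UTF-8 output
-- -q          : suppress messages
-- trailing "-": write the text to standard output (per the man page, not the help above)
set commando to "/sw/bin/pdftotext -f 1 -l 1 -enc UTF-8 -q " & (quoted form of pdfPath) & " -"

-- do shell script returns whatever the command wrote to standard output
set pageText to do shell script commando
```

Dropping the -f and -l options should return the text of the whole document in one string.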

Hope this helps. . .

Rick

On Jan 7, 2005, at 2:44 PM, Gil Dawson wrote:
Hello--

Does anyone happen to know how to read with AppleScript the text from a pdf file that contains mostly text?

Adobe provides a translation service...

http://www.adobe.com/products/acrobat/access_onlinetools.html

...that does exactly what I need, only it's cumbersome to script sending the pdf file to them and waiting for it to come back.

I was hoping maybe someone had figured out how to read the text out of a pdf file directly.

You can do exactly what I want manually in the Adobe Reader with Select All, Copy, and Paste.  This procedure works only one page at a time, but it would be OK for my purposes in a script.  However, the Reader doesn't seem to be scriptable, and neither does Preview.  (Can this be true?)

I looked at Acrobat V7.0 for $300 on the Adobe website, which would be OK, but I didn't see any promises about it being scriptable or being able to do what I want.  (What is an enterprise solution, anyway?)

I looked at GhostScript, but it seems to be made for converting into, not out of, pdf.

Has anyone found a way to read a pdf in either Panther or 9.2.2 or both?

Check out the xpdf package. It is open source and free, and contains the pdftotext tool. Here's a Mac OS X-specific package, though you may (or may not) prefer to build it from source yourself (in which case Google for "xpdf source"):

http://users.phg-online.de/tk/MOSXS/

-R

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Applescript-users mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:

This email sent to email@hidden


--
Rick Smykla
email@hidden