Re: Reading a pdf text file
- Subject: Re: Reading a pdf text file
- From: Richard Smykla <email@hidden>
- Date: Sun, 9 Jan 2005 10:44:21 -0500
Gil,
After installing the xpdf-tools package, the command-line tool
pdftotext can be found in the /sw/bin/ directory. (At least that is
where it installed by default on my machine.) You can then use a
simple AppleScript 'do shell script' command to create a text version
of your pdf. If all goes well, and if the source pdf file is not
protected, the text file will be located in the same folder as your
original source pdf:
set response to POSIX path of ((choose file) as alias)
tell application "Terminal"
    activate
    set commando to "/sw/bin/pdftotext -layout " & quoted form of response
    do shell script commando
end tell
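If the goal is to get the text into an AppleScript variable rather than into a file, one possibility is to pass "-" as the text-file argument. According to the pdftotext man page, that sends the text to stdout, so 'do shell script' hands it back as its result. A minimal sketch, assuming pdftotext is in /sw/bin/ as above; the variable names pdfPath and pdfText are just for illustration:

```applescript
-- Sketch only: assumes pdftotext is installed in /sw/bin/ as in the script above.
set pdfPath to POSIX path of (choose file with prompt "Choose a PDF:")
-- A text-file argument of "-" asks pdftotext to write to stdout instead of a .txt file.
set commando to "/sw/bin/pdftotext -layout " & quoted form of pdfPath & " -"
-- do shell script returns whatever the command prints to stdout.
set pdfText to do shell script commando
```

Note that 'do shell script' by default converts the line endings in its result to Mac-style returns, so the text in pdfText may not match the line endings of a file that pdftotext writes itself.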
I haven't included any error checking here, and there are various
command line options that you can invoke if you're so inclined. Here's
the short version of help for pdftotext:
pdftotext version 3.00
Copyright 1996-2004 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>       : first page to convert
  -l <int>       : last page to convert
  -layout        : maintain original physical layout
  -raw           : keep strings in content stream order
  -htmlmeta      : generate a simple HTML file, including the meta information
  -enc <string>  : output text encoding name
  -eol <string>  : output end-of-line convention (unix, dos, or mac)
  -nopgbrk       : don't insert page breaks between pages
  -opw <string>  : owner password (for encrypted files)
  -upw <string>  : user password (for encrypted files)
  -q             : don't print any messages or errors
  -cfg <string>  : configuration file to use in place of .xpdfrc
  -v             : print copyright and version info
  -h             : print usage information
  -help          : print usage information
  --help         : print usage information
  -?             : print usage information
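As an illustration of how a few of these options and some basic error checking might fit together, here is a rough sketch. It assumes pdftotext is in /sw/bin/; the page range (1 to 3), the output name (the pdf path with ".txt" appended) and the variable names are arbitrary choices for the example, not anything pdftotext requires:

```applescript
-- Sketch only: the page range and the .txt naming are illustrative choices.
set pdfPath to POSIX path of (choose file with prompt "Choose a PDF:")
-- Name the output file explicitly instead of relying on the default location.
set txtPath to pdfPath & ".txt"
set commando to "/sw/bin/pdftotext -layout -f 1 -l 3 " & quoted form of pdfPath & " " & quoted form of txtPath
try
	do shell script commando
on error errMsg number errNum
	-- do shell script raises an error when the command exits with a nonzero status.
	display dialog "pdftotext failed (" & errNum & "): " & errMsg buttons {"OK"} default button 1
end try
```

For a protected pdf, the -upw or -opw option from the list above could be added to the command string in the same way.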
Hope this helps. . .
Rick
On Jan 7, 2005, at 2:44 PM, Gil Dawson wrote:
Hello--
Does anyone happen to know how to read with AppleScript the text from
a pdf file that contains mostly text?
Adobe provides a translation service...
http://www.adobe.com/products/acrobat/access_onlinetools.html
...that does exactly what I need, only it's cumbersome to script
sending the pdf file to them and waiting for it to come back.
I was hoping maybe someone had figured out how to read the text out of
a pdf file directly.
You can do exactly what I want manually in the Adobe Reader with
Select All, Copy, and Paste. This procedure works only one page
at a time, but it would be OK for my purposes in a script.
However, the Reader doesn't seem to be scriptable. Neither is
Preview. (Can this be true?)
I looked at Acrobat V7.0 for $300 on the Adobe website, which would
be OK, but I didn't see any promises about it being scriptable or
being able to do what I want. (What is an enterprise solution,
anyway?)
I looked at GhostScript, but it seems to be made for converting into,
not out of, pdf.
Has anyone found a way to read a pdf in either Panther or 9.2.2 or
both?
Check out the xpdf package - open source and free, and contains the
pdftotext tool.... here's a Mac OSX specific package, though you may
(or may not) prefer to build it from source yourself (in which case
Google for "xpdf source"):
http://users.phg-online.de/tk/MOSXS/
-R
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Applescript-users mailing list
(email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
--
Rick Smykla
email@hidden