Re: Extracting Text from PDF
- Subject: Re: Extracting Text from PDF
- From: Bill Janssen <email@hidden>
- Date: Wed, 29 Oct 2008 10:57:28 PDT
- Comments: In-reply-to Philip Aker <email@hidden> message dated "Wed, 29 Oct 2008 10:46:17 -0700."
To extract text, pdftotext is just the right tool.
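
For anyone who wants to drive this from a script, here is a minimal sketch, assuming pdftotext (from xpdf) is installed and on the PATH. The function names build_pdftotext_command and extract_text are made up for illustration; the -layout, -f and -l options, and "-" as the output file to write to stdout, are standard pdftotext options.

```python
import subprocess


def build_pdftotext_command(pdf_path, first_page=None, last_page=None, layout=False):
    """Build the argument list for a pdftotext call (hypothetical helper).

    Passing "-" as the output file asks pdftotext to write to stdout.
    """
    cmd = ["pdftotext"]
    if layout:
        # Try to keep the physical layout of the page in the text output.
        cmd.append("-layout")
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, **options):
    """Run pdftotext and return the extracted text as a string.

    Assumes the pdftotext binary is on the PATH; raises
    subprocess.CalledProcessError if it exits with an error.
    """
    result = subprocess.run(
        build_pdftotext_command(pdf_path, **options),
        capture_output=True,
        check=True,
    )
    # Decoding as UTF-8 is an assumption; the output encoding depends on
    # how pdftotext is configured.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("report.pdf", first_page=1, last_page=2) would return the text of the first two pages, where "report.pdf" is a placeholder file name.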
Bill
Philip Aker <email@hidden> wrote:
> On 2008-10-29, at 10:28:33, Bill Janssen wrote:
>
> > Dan Doughtie <email@hidden> wrote:
> >
> >> Scripting Acrobat is pretty messy. Are there any command line apps
> >> that can extract text from a PDF similar to extracting text from a
> >> Word document with Textutil?
>
> > I use xpdf to do that; it includes a command-line app 'pdftotext'
> > which will spit out the text of the document. I've written patches
> > for it, as well, to spit out wordbox info and to spit out links in
> > the doc. Those patches are included in the doceng-toolkit project
> > on SourceForge.
>
> Hey Bill,
>
> I compiled xpdf3.0.2 out of the box just fine. However I'm not able to
> get the extra pdftohtml-0.40a to install because of some problem which
> I can't deduce easily because I'm not real familiar with the ins and
> outs of autoconfig, make, etc.
>
> Do you know how to get that module to integrate with xpdf on Mac OS X?
> Sounds like it would be a solution with better options than pdftotext.
>
>
> Philip Aker
> echo email@hidden@nl | tr a-z@. p-za-o.@
>
> Democracy: Two wolves and a sheep voting on lunch.
>
_______________________________________________
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users