Re: pdftotext

  • Subject: Re: pdftotext
  • From: "koenig.yvan" <email@hidden>
  • Date: Sun, 22 Dec 2013 15:48:07 +0100


On 22/12/2013 at 14:17, Thomas Fischer <email@hidden> wrote:

Hello,

if you want to do any serious conversion from PDF to text, I would advise trying PDFBox (http://pdfbox.apache.org/). It is much better than pdftotext, Skim, or the Apple built-in tools at recognising non-ASCII characters and spaces. I use a script that contains something like

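-- Java classpath that makes the PDFBox classes available
set classpath to "…"
-- Path of the PDF file to convert
set theFile to "…"
-- theFormat is assumed to be set earlier in the script ("Html" or anything else for plain text)
set theCall to "java -Xmx1G -classpath " & classpath & " org.apache.pdfbox.ExtractText -encoding UTF-8 -sort -nonSeq "
if theFormat is "Html" then
	set theCall to theCall & "-html "
	set theSuffix to ".html"
else
	set theSuffix to ".txt"
end if
-- Drop the file extension, then append "-1" and the new suffix
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"."}
set newPath to (text items 1 thru -2 of theFile as text) & "-1" & theSuffix
set AppleScript's text item delimiters to savedDelimiters
do shell script theCall & quoted form of theFile & space & quoted form of newPath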

Best
Thomas


Hello Thomas

Could you explain to an ass like me what the actual value of classpath is supposed to be?

I just downloaded pdfbox-app-1.8.3.jar

I assume that it's the quoted form of the POSIX path of the jar file, but I wish to check before running it.

No problem for theFile.
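
To make the assumption above concrete, here is a minimal sketch of how classpath might be filled in if it really is the quoted POSIX path of the jar. This has not been confirmed by Thomas. The Downloads location and the variable name jarPath are hypothetical, chosen only for illustration. The sketch only builds the command string and returns it for inspection; it does not run anything.

```applescript
-- Hypothetical location: assumes the jar was saved to the Downloads folder
set jarPath to (POSIX path of (path to downloads folder)) & "pdfbox-app-1.8.3.jar"

-- Assumption under test: classpath is the quoted POSIX path of the jar,
-- quoted so that spaces in the path survive the shell
set classpath to quoted form of jarPath

-- Same command prefix as in the quoted script, built but not executed
set theCall to "java -Xmx1G -classpath " & classpath & " org.apache.pdfbox.ExtractText -encoding UTF-8 -sort -nonSeq "

-- Return the string so it can be checked in the Script Editor result pane
return theCall
```

Reading the returned string before passing it to do shell script is a cheap way to confirm that the quoting looks right.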

Yvan KOENIG (VALLAURIS, France) Sunday 22 December 2013 15:48:02


