Re: pdftotext

  • Subject: Re: pdftotext
  • From: Thomas Fischer <email@hidden>
  • Date: Sun, 22 Dec 2013 14:17:58 +0100

Hello,

if you want to do any serious conversion from PDF to text, I would advise trying PDFBox (http://pdfbox.apache.org/). It is much better than pdftotext, Skim, or Apple's built-in tools at recognising non-ASCII characters and spaces. I use a script that contains something like:

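set classpath to "…" -- classpath containing the PDFBox jar (elided in the original)
set theFile to "…" -- POSIX path of the PDF to convert (elided in the original)
set theFormat to "Text" -- assumption: not defined in this excerpt; use "Html" for HTML output
set theCall to "java -Xmx1G -classpath " & classpath & " org.apache.pdfbox.ExtractText -encoding UTF-8 -sort -nonSeq "
if theFormat is "Html" then
	set theCall to theCall & "-html "
	set theSuffix to ".html"
else
	set theSuffix to ".txt"
end if
-- Drop the file extension, then restore the previous delimiters
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"."}
set newPath to (text items 1 thru -2 of theFile as text) & "-1" & theSuffix
set AppleScript's text item delimiters to oldDelims
do shell script theCall & quoted form of theFile & space & quoted form of newPath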
