Hello Yvan,
Sorry if I've been unclear. I was in a bit of a hurry, and this was just a snippet from a larger script I used to compare different options for converting PDF to text, in which PDFBox was the clear winner.
-Xmx1G is a parameter for the java call that lets the process use up to 1 GB of memory (heap space). Leaving it out will usually make no difference, unless the process runs out of memory, that is.
I used this to convert hundreds of (mathematical) PDF files, so for the general user it is probably more than needed. The java man page documents the related option -Xms (initial heap size); -Xmx takes the same size syntax but sets the maximum size instead:
-Xmsn Specifies the initial size of the memory allocation pool. This value must be a multiple of 1024 greater than 1 MB. Append the letter k or K to indicate kilobytes, the letter m or M to indicate megabytes, the letter g or G to indicate gigabytes, or the letter t or T to indicate terabytes. The default value is 2MB. Examples: -Xms6291456 -Xms6144k -Xms6m
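To see what the option actually changes, a small Java program can print the heap limits the JVM reports through java.lang.Runtime. The class name HeapInfo is hypothetical, chosen only for this illustration. Only the memory ceiling moves with -Xmx; the work a program does (for example the text PDFBox extracts) stays the same as long as enough memory is available, which would match the identical output Yvan observed.

```java
// HeapInfo.java (hypothetical example): prints the heap limits of the running JVM.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory(): the most the heap may grow to; this is what -Xmx sets.
        // Returns Long.MAX_VALUE if the JVM has no inherent limit.
        System.out.println("max heap (MB):     " + rt.maxMemory() / mb);
        // totalMemory(): heap currently reserved; it starts near the -Xms value.
        System.out.println("current heap (MB): " + rt.totalMemory() / mb);
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free in heap (MB): " + rt.freeMemory() / mb);
    }
}
```

Compile it with javac HeapInfo.java, then compare java -Xmx1G HeapInfo with plain java HeapInfo. With -Xmx1G the reported maximum should be close to 1024 MB (some garbage collectors report a little less); without it, the JVM chooses a default based on the machine's configuration.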
And no, I don't think that you can use PDFBox without the Java runtime. But the source code is freely available, if you want to try…
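-- Output format: "txt" or "Html"
set theFormat to "txt"
-- PDFBox jar in ~/Applications, source PDF on the Desktop
set classpath to quoted form of POSIX path of ((path to applications folder from user domain as text) & "pdfbox-app-1.8.3.jar")
set theFile to POSIX path of ((path to desktop folder as text) & "prix Gerbino.numbers - copie.pdf")

-- -Xmx1G caps the Java heap at 1 GB; ExtractText writes the text of theFile to newPath
set theCall to "java -Xmx1G -classpath " & classpath & " org.apache.pdfbox.ExtractText -encoding UTF-8 -sort -nonSeq "
if theFormat is "Html" then
	set theCall to theCall & "-html "
	set theSuffix to ".html"
else
	set theSuffix to ".txt"
end if
-- Build the output path: drop the last extension, then add "-1" and the new suffix
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"."}
set newPath to (text items 1 thru -2 of theFile as text) & "-1" & theSuffix
set AppleScript's text item delimiters to oldDelims
do shell script theCall & quoted form of theFile & space & quoted form of newPath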
Yvan KOENIG (VALLAURIS, France) Sunday 22 December 2013 16:08:36
Could someone explain the parameter -Xmx1G used in the code posted by Thomas Fisher? (1) It isn't described on the Command Line Tools web page dedicated to PDFBox. (2) Out of curiosity, I removed it, and the resulting text file is exactly the same with and without it.
Yvan KOENIG (VALLAURIS, France) Monday 23 December 2013 11:58:24