Re: Reading a pdf text file
  • Subject: Re: Reading a pdf text file
  • From: Richard Smykla <email@hidden>
  • Date: Tue, 25 Jan 2005 07:15:10 -0500

Rob,

Does the text file that's being output handle the umlauts correctly, or is it just the final 'cat' statement that's failing? BTW, your outputPosixPath is not wrapped in 'quoted form of' in the 'cat' statement, so the 'cat' statement will fail if the filename contains any characters that are 'special' to the shell (a space, for instance).
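A minimal sketch of that quoting fix, reusing the variable names and the /usr/local/pdftotext path from your script (the path is taken as given, not verified). The options are trimmed to -eol mac -raw for brevity, and swapping ';' for '&&' is my own choice, so that 'cat' only runs if pdftotext succeeds:

```applescript
set pdfPosixPath to POSIX path of (choose file with prompt "Please find the PDF")
set outputPosixPath to (pdfPosixPath & ".txt")

-- quote every path handed to the shell, including the one given to cat
set theCommand to "/usr/local/pdftotext -eol mac -raw " & quoted form of pdfPosixPath & " " & quoted form of outputPosixPath & " && cat " & quoted form of outputPosixPath

set theResult to do shell script theCommand
```

With the last path quoted, a PDF called something like "My Report.pdf" should no longer trip up the 'cat' step.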

I don't have any German or encrypted files to test this on, so I had to remove those options to get it to work; BUT pdftotext does not seem to like your '-enc utf-8' construction. I'm not sure what the correct syntax is for that one, as the usage information is rather terse. You might want to drop the developer a line, unless someone else here knows?
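One thing that might be worth a try, untested here: as far as I recall, the xpdf man page says the encoding name given to -enc is case-sensitive and lists the built-in Unicode encoding as UTF-8 in capitals. Treat that as an assumption about your build, not something I have confirmed. The file path below is a hypothetical placeholder, and reading the file back as UTF-8 instead of using 'cat' is my own substitution:

```applescript
-- hypothetical path, for illustration only
set pdfPosixPath to "/Users/rob/Documents/sample.pdf"
set outputPosixPath to pdfPosixPath & ".txt"

-- same call as before, but with the encoding name in capitals
-- (assumption: pdftotext treats encoding names as case-sensitive)
do shell script "/usr/local/pdftotext -eol mac -raw -enc UTF-8 " & quoted form of pdfPosixPath & " " & quoted form of outputPosixPath

-- read the output file back as UTF-8 text instead of piping it through 'cat'
set theResult to read (POSIX file outputPosixPath) as «class utf8»
```

If the umlauts look right in the .txt file itself but not in theResult, the problem is more likely in how the text gets back into AppleScript than in pdftotext.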

Rick

On 9 Jan 2005, at 15:44, Richard Smykla wrote:

After installing the xpdf-tools package, the command-line tool pdftotext can be found in the /sw/bin/ directory. (At least that is where it was installed by default on my machine.) You can then use a simple AppleScript 'do shell script' command to create a text version of your PDF.


Hi all,

I've been following this thread with interest. I have the following chunk of AppleScript...

set pdfPosixPath to POSIX path of (choose file with prompt "Please find the PDF")
set outputPosixPath to (pdfPosixPath & ".txt")

set theResult to do shell script ("/usr/local/pdftotext -eol mac -opw 1234 -raw " & quoted form of pdfPosixPath & " " & quoted form of outputPosixPath & " ; cat " & outputPosixPath)

...this works exactly as I wanted for my English files, but certain 'foreign' characters come out strangely, for example the German character "ü". Apparently adding the option "-enc utf-8" should work, so we end up with...

set theResult to do shell script ("/usr/local/pdftotext -eol mac -opw 1234 -raw -enc utf-8 " & quoted form of pdfPosixPath & " " & quoted form of outputPosixPath & " ; cat " & outputPosixPath)


...but for some reason this doesn't seem to work at all - am I doing something daft? Does anyone have any suggestions?

Any hints would be appreciated,

Thanks in advance
Rob

--
Rick Smykla
email@hidden