pdftotext


  • Subject: pdftotext
  • From: Christopher Stone <email@hidden>
  • Date: Fri, 20 Dec 2013 15:08:23 -0600

Hey Folks,

There's a nice little Unix command-line executable called pdftotext.

http://www.bluem.net/en/mac/packages/

I needed to rename a bunch of receipts in my web-receipts folder, and the job lent itself to automation.

Fifteen minutes of scripting just saved me roughly 1 1/2 hours of work, so I'm going to build upon that foundation and write a smart-receipt-renamer script.

A snippet with the text-extraction code and the date-conversion code:

-------------------------------------------------------------------------------------------

tell application "Finder"
    set sel to selection as alias list
end tell

if sel ≠ {} then
    set _file to first item of sel
    set _filePx to quoted form of (POSIX path of _file)

    # Extract the PDF's text in content-stream order; the trailing "-" sends it to stdout.
    set _data to do shell script "/usr/local/bin/pdftotext -raw " & _filePx & " -"

    # Search for desired information and extract if found.
    # ...

    # Reformat the date (stored in _date by the search step above) using the shell.
    # Input format: 20-Dec-13; output: 2013-12-20
    set _cmd to "date -j -f '%d-%b-%y' " & quoted form of _date & " '+%Y-%m-%d'"
    set _date to do shell script _cmd
end if

-------------------------------------------------------------------------------------------

Of course, I'm using the Satimage.osax regex engine to do the heavy lifting in the search step.
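
A rough sketch of what such a search step might look like without Satimage.osax, written in plain shell rather than the regex code used here. The function name first_receipt_date is hypothetical. It assumes the receipt text contains a date shaped like 20-Dec-13 and that two-digit years fall in the 2000s. It swaps in awk for the conversion instead of date -j, so it is not tied to the BSD date command.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): read text on stdin,
# find the first date shaped like 20-Dec-13, and print it as 2013-12-20.
# Assumption: two-digit years belong to the 2000s.
first_receipt_date() {
    # Pull out every DD-Mon-YY candidate, keep only the first one.
    grep -Eo '[0-9]{1,2}-[A-Z][a-z]{2}-[0-9]{2}' | head -n 1 |
    awk -F- '
        BEGIN { months = "JanFebMarAprMayJunJulAugSepOctNovDec" }
        {
            # Position of the abbreviation in the month string gives the month number.
            m = index(months, $2)
            if (m == 0) exit 1   # not a recognised month abbreviation
            printf "20%02d-%02d-%02d\n", $3, (m + 2) / 3, $1
        }'
}
```

It could be fed straight from the extractor, for example with pdftotext -raw receipt.pdf - | first_receipt_date, where receipt.pdf is a placeholder file name. If no matching date is found, it prints nothing.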

My question is:

Is there a better way to script the extraction of text from an unlocked PDF?

Thanks.

--
Best Regards,
Chris

 _______________________________________________
AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
