pdftotext
- Subject: pdftotext
- From: Christopher Stone <email@hidden>
- Date: Fri, 20 Dec 2013 15:08:23 -0600
Hey Folks,
There's a nice little Unix executable called pdftotext.
I needed to rename a bunch of receipts in my web-receipts folder that were conducive to automation.
Fifteen minutes of scripting just saved me roughly 1 1/2 hours of work, so I'm going to build upon that foundation and write a smart-receipt-renamer script.
Here's a snippet with the text-extraction code and the date-conversion code:
-------------------------------------------------------------------------------------------
tell application "Finder"
    set sel to selection as alias list
end tell

if sel ≠ {} then
    set _file to first item of sel
    set _filePx to quoted form of (POSIX path of _file)

    # Extract the PDF's text to stdout ("-"), keeping content-stream order (-raw).
    set _data to do shell script "/usr/local/bin/pdftotext -raw " & _filePx & " -"

    # Search for desired information and extract if found.
    # ...

    # Reformat date using the shell with input format: Jan 01, 2014 output: 2014-01-01
    set _cmd to "date -j -f \"%b %d, %Y\" \"" & _date & "\" \"+%Y-%m-%d\""
    set _date to do shell script _cmd
end if
-------------------------------------------------------------------------------------------
Of course, I'm using Satimage.osax's regex engine to do the heavy lifting.
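For anyone without the osax, the search step elided in the snippet could probably be done in the shell as well. What follows is only a minimal sketch, not the code I actually ran: the function name first_receipt_date is hypothetical, the sample text is made up, and the "Mon DD, YYYY" pattern is an assumption about how the receipts print their dates.

```shell
#!/bin/sh
# Hypothetical helper: print the first "Mon DD, YYYY" date found on stdin.
# The pattern is an assumption about the receipt layout; adjust it to taste.
first_receipt_date() {
    grep -Eo '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [0-9]{1,2}, [0-9]{4}' | head -n 1
}

# Made-up sample text standing in for the output of pdftotext.
sample='Order #12345
Order Date: Dec 20, 2013
Total: $19.99'

# Feed the sample through the helper; it prints the matched date, if any.
printf '%s\n' "$sample" | first_receipt_date
```

In the AppleScript, the same grep pipeline could be appended to the pdftotext command inside the do shell script string, so the date comes back directly instead of the whole text.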
My question is:
Is there a better way to script the extraction of text from an unlocked PDF?
Thanks.
--
Best Regards,
Chris
_______________________________________________
AppleScript-Users mailing list (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users