Re: pdftotext


  • Subject: Re: pdftotext
  • From: Shane Stanley <email@hidden>
  • Date: Sat, 21 Dec 2013 09:28:46 +1100

On 21 Dec 2013, at 8:08 AM, Christopher Stone <email@hidden> wrote:

> Is there a better way to script the extraction of text from an unlocked pdf?

That depends on what you mean by "better". If it means doing without third-party software, you could use AppleScript:

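-- put in ASObjC-based script library
use framework "Foundation"
use framework "Quartz" -- Quartz includes PDFKit (PDFDocument)

-- thePath is the POSIX path of the PDF; returns the text of all pages
on textInPDF:thePath
    set theText to current application's NSMutableString's |string|()
    set anNSURL to current application's NSURL's fileURLWithPath:thePath
    set theDoc to current application's PDFDocument's alloc()'s initWithURL:anNSURL
    -- initWithURL: returns missing value if the file cannot be read as a PDF
    if theDoc is missing value then error ("Could not open PDF at " & thePath)
    set theCount to theDoc's pageCount() as integer
    repeat with i from 1 to theCount
        -- PDFKit page indexes are zero-based
        set thePage to (theDoc's pageAtIndex:(i - 1))
        set pageText to thePage's |string|()
        -- skip a page if no string comes back for it
        if pageText is not missing value then (theText's appendString:pageText)
    end repeat
    return theText as text
end textInPDF: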
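
For illustration, here is a minimal sketch of calling the handler from an ordinary script. It assumes the code above has been saved as a script library in ~/Library/Script Libraries. The library name "PDFLib" and the use of choose file to pick the PDF are placeholders, not part of the original post.

```applescript
-- Assumption: the handler above is saved as a script library named "PDFLib"
-- (a hypothetical name) in ~/Library/Script Libraries
use scripting additions
use theLib : script "PDFLib"

-- the handler expects a POSIX path, so convert the chosen file first
set thePath to POSIX path of (choose file of type {"com.adobe.pdf"})
set theText to theLib's textInPDF:thePath
```

Keeping the ASObjC code in a library and calling it this way means the calling script itself needs no framework declarations.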

> Of course I'm using the Satimage.osax's regex engine to do the heavy lifting.

<scratched_record> Or you could use AppleScript...

-- 
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>

AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
