Re: collectdata


  • Subject: Re: collectdata
  • From: Thomas Fischer <email@hidden>
  • Date: Tue, 07 Mar 2017 19:35:22 +0100

Hello Shane,

My doubts concerned whether a PDF file could be recreated from the extracted text. If you build the results into a new PDF it might work, but I haven't tested that.

I tried to stick with the original PDF file and modify it using Skim.
Here is my suggestion. It starts by taking the text of the file in question, which must be open as the front document in Skim, but you can get the text any other way you like. However, I don't know of any other way to add highlights to an existing PDF through AppleScript. Depending on the number of hits, this might take a while.
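Since the text can come from anywhere, one possibility is to read it directly with PDFKit through AppleScriptObjC instead of asking Skim. This is a minimal sketch assuming macOS 10.10 or later; the handler name `textOfPDF` is hypothetical, and Skim would still be needed afterwards to add the highlights.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "Quartz" -- for PDFKit
use scripting additions

-- Hypothetical helper: return the whole text of the PDF at a POSIX path,
-- without needing Skim to have the file open
on textOfPDF(thePath)
	set anNSURL to current application's |NSURL|'s fileURLWithPath:thePath
	set theDoc to current application's PDFDocument's alloc()'s initWithURL:anNSURL
	if theDoc is missing value then error ("Could not open " & thePath)
	-- PDFDocument's string property holds the text of all pages
	set theText to theDoc's |string|()
	if theText is missing value then return ""
	return theText as text
end textOfPDF

-- Example (interactive): set myText to textOfPDF(POSIX path of (choose file of type {"pdf"}))
```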

@Julien: I didn't understand the purpose of the second search, but it could obviously be done in a similar way. I changed the regular expression to "\d{3} \d{4}", which I find more readable; it is equivalent to "[0-9][0-9][0-9][ ][0-9][0-9][0-9][0-9]".
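To check that the two patterns agree, a small AppleScriptObjC handler can collect every match of a pattern in a string. This is a sketch: `matchedStrings` and the sample text are made up for illustration, and it uses ICU regular expressions (NSRegularExpression) rather than BBEdit's grep engine, where \d may also match non-ASCII digits from other scripts. On plain ASCII text, both patterns should return the same list.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: return every substring of theText matched by thePattern
on matchedStrings(thePattern, theText)
	set theString to current application's NSString's stringWithString:theText
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set theMatches to theRegex's matchesInString:theString options:0 range:{0, theString's |length|()}
	set theResults to {}
	repeat with aMatch in theMatches
		-- convert each matched range back to AppleScript text
		set end of theResults to (theString's substringWithRange:(aMatch's range())) as text
	end repeat
	return theResults
end matchedStrings

-- made-up sample: "12 3456" has only two digits before the space, so it should not match
set sampleText to "Call 555 1234 or 123 45678; not 12 3456."
set shortForm to matchedStrings("\\d{3} \\d{4}", sampleText)
set longForm to matchedStrings("[0-9][0-9][0-9][ ][0-9][0-9][0-9][0-9]", sampleText)
set patternsAgree to (shortForm = longForm)
log {shortForm, longForm, patternsAgree}
```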

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

tell application "Skim"
	set myText to text of front document
end tell

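tell application "BBEdit"
	make new text document
	set text of text document 1 to myText
	set thequery1 to "\\d{3} \\d{4}"
	try
		set myFind to found matches of (find thequery1 searching in text document 1 options {search mode:grep, extend selection:false, returning results:true})
	on error
		-- don't leave the scratch document behind when nothing matches
		close text document 1 saving no
		display dialog "Nothing found."
		return
	end try
	set myFindRef to a reference to myFind
	set myHits to {}
	set myHitsRef to a reference to myHits
	repeat with theFound in myFindRef
		-- keep each distinct string once; Skim's find below visits every occurrence anyway,
		-- so duplicates would only produce duplicate highlights
		set theString to match_string of theFound
		if theString is not in myHits then copy theString to the end of myHitsRef
	end repeat
	close text document 1 saving no
end tell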

tell application "Skim"
	tell document 1
		repeat with theItem in myHitsRef
			set theSel to find it text (contents of theItem)
			repeat while theSel is not {}
				--make new note with properties {type:highlight note, selection:theSel} # old Skim, ≤ 1.4.9
				make new note with data theSel with properties {type:highlight note} # new Skim, ≥ 1.2.26; for versions in between, try both
				set theSel to find it text (contents of theItem) from theSel
			end repeat
		end repeat
	end tell
end tell

Cheers
Thomas


On 7 Mar 2017, at 13:50, Shane Stanley <email@hidden> wrote:

On 7 Mar 2017, at 7:58 pm, Julien Battist <email@hidden> wrote:

It seems not to be possible; is that the conclusion?

Don't you become a doubting Thomas too!

OK, this is a proof of concept. Choose a file and it will search for all runs of digits, highlight them, save the result as a new PDF, and return a list of records in the form:

{{pageNumber:1, foundStrings:{"21", "06", "2016", "4", "54", "21"}}, {pageNumber:2, foundStrings:{"21", "06", "2016", "4", "54", "21"}}, ...}

The code needs cleaning up, modifying to deal with your multiple searches, and ideally packaging into a handler. But it should give you something to work with:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "Quartz" -- for PDF stuff
use scripting additions

set thePath to POSIX path of (choose file of type {"pdf"})
-- define search pattern
set {theRegex, theError} to current application's NSRegularExpression's regularExpressionWithPattern:"\\d+" options:0 |error|:(reference)
-- make URL
set anNSURL to current application's |NSURL|'s fileURLWithPath:thePath
-- make destination URL
set oldName to anNSURL's lastPathComponent()'s stringByDeletingPathExtension()
set newName to oldName's stringByAppendingString:"-2.pdf"
set destURL to anNSURL's URLByDeletingLastPathComponent()'s URLByAppendingPathComponent:newName
-- open doc and count pages
set theDoc to current application's PDFDocument's alloc()'s initWithURL:anNSURL
set theCount to theDoc's pageCount() as integer
-- create list to hold results
set findInfo to {}
-- loop through pages
repeat with i from 1 to theCount
	-- get page and its text
	set thePage to (theDoc's pageAtIndex:(i - 1))
	set theText to thePage's |string|()
	-- do search
	set theRanges to (theRegex's matchesInString:theText options:0 range:{0, theText's |length|()})
	-- create list to hold contents of matches
	set pageFinds to {}
	-- loop through matches found
	repeat with aFind in theRanges
		-- get the range and start and finish indexes
		set foundRange to aFind's range()
		set theStart to foundRange's location
		set theEnd to theStart + (foundRange's |length|) - 1
		-- get the matched text
		set end of pageFinds to (theText's substringWithRange:foundRange) as text
		-- get bounds of the matched text
		set startBounds to (thePage's characterBoundsAtIndex:theStart)
		set endBounds to (thePage's characterBoundsAtIndex:theEnd)
		-- combine to get word bounds; assuming they're on a single line
		set wordBounds to current application's NSUnionRect(startBounds, endBounds)
		-- make highlight annotation
		set theHighlight to (current application's PDFAnnotationMarkup's alloc()'s initWithBounds:wordBounds)
		(theHighlight's setMarkupType:(current application's kPDFMarkupTypeHighlight))
		-- add to page
		(thePage's addAnnotation:theHighlight)
	end repeat
	-- update result
	set end of findInfo to {pageNumber:i, foundStrings:pageFinds}
end repeat
-- save new PDF
theDoc's writeToURL:destURL
-- return result
return findInfo
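The script above handles a single pattern. One way to package it into a handler that accepts several searches might look like the following sketch. `highlightMatchesInPDF` is a hypothetical name; it reuses the same PDFKit calls as the script above and keeps its assumption that each match sits on a single line. Overlapping patterns (for example "\d+" together with "\d{3} \d{4}") would produce overlapping highlights.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "Quartz" -- for PDF stuff
use scripting additions

-- Hypothetical handler: highlight every match of each pattern in thePatterns,
-- save a copy named "<name>-2.pdf", and return {pageNumber:..., foundStrings:...} records
on highlightMatchesInPDF(thePath, thePatterns)
	-- compile each pattern once; an invalid pattern raises an error
	set theRegexes to {}
	repeat with k from 1 to count of thePatterns
		set aPattern to item k of thePatterns
		set {theRegex, theError} to (current application's NSRegularExpression's regularExpressionWithPattern:aPattern options:0 |error|:(reference))
		if theRegex is missing value then error (theError's localizedDescription() as text)
		set end of theRegexes to theRegex
	end repeat
	-- make source and destination URLs
	set anNSURL to current application's |NSURL|'s fileURLWithPath:thePath
	set oldName to anNSURL's lastPathComponent()'s stringByDeletingPathExtension()
	set newName to oldName's stringByAppendingString:"-2.pdf"
	set destURL to anNSURL's URLByDeletingLastPathComponent()'s URLByAppendingPathComponent:newName
	set theDoc to current application's PDFDocument's alloc()'s initWithURL:anNSURL
	if theDoc is missing value then error ("Could not open " & thePath)
	set findInfo to {}
	repeat with i from 1 to (theDoc's pageCount() as integer)
		set thePage to (theDoc's pageAtIndex:(i - 1))
		set theText to thePage's |string|()
		set pageFinds to {}
		if theText is not missing value then
			repeat with j from 1 to count of theRegexes
				set theRegex to item j of theRegexes
				set theMatches to (theRegex's matchesInString:theText options:0 range:{0, theText's |length|()})
				repeat with aMatch in theMatches
					set foundRange to aMatch's range()
					set theStart to foundRange's location
					set theEnd to theStart + (foundRange's |length|) - 1
					set end of pageFinds to (theText's substringWithRange:foundRange) as text
					-- same single-line assumption as the original script
					set startBounds to (thePage's characterBoundsAtIndex:theStart)
					set endBounds to (thePage's characterBoundsAtIndex:theEnd)
					set wordBounds to current application's NSUnionRect(startBounds, endBounds)
					set theHighlight to (current application's PDFAnnotationMarkup's alloc()'s initWithBounds:wordBounds)
					(theHighlight's setMarkupType:(current application's kPDFMarkupTypeHighlight))
					(thePage's addAnnotation:theHighlight)
				end repeat
			end repeat
		end if
		set end of findInfo to {pageNumber:i, foundStrings:pageFinds}
	end repeat
	theDoc's writeToURL:destURL
	return findInfo
end highlightMatchesInPDF

-- Example (interactive, hypothetical patterns):
-- highlightMatchesInPDF(POSIX path of (choose file of type {"pdf"}), {"\\d{3} \\d{4}", "\\d{2}/\\d{2}/\\d{4}"})
```

Compiling all patterns before opening the file means a typo in one search fails immediately rather than after part of the document has been annotated.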


-- 
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>, <latenightsw.com>
