Re: collectdata

  • Subject: Re: collectdata
  • From: Julien Battist <email@hidden>
  • Date: Wed, 08 Mar 2017 11:57:49 +0100
  • Importance: normal
  • Sensitivity: Normal

Hi Shane,
 
Regarding your response "Don't you become a doubting Thomas too!":
I meant to ask whether this can be done in plain AppleScript, not in AppleScriptObjC... :)
 
Anyway, @Thomas, that looks pretty close to what I want to accomplish...
The second query (query2) is there to prevent overlapping hits.
 
Say the document contains the numbers 1234 and 12345.
For example, I search for "1234" (as a regular expression),
and I also search for "12345" (as a regular expression).
The output will be:
hit 1234
then 1234 for the 2nd time
then 12345
 
The numbers are in cells (the PDF is created from InDesign) or in text boxes, so the end delimiter can be nothing, a space, or a tab.
Maybe that's worth another thread focused on regular expressions :)
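 
The overlap described above (a search for 1234 also hitting inside 12345) can probably be avoided in the pattern itself, by requiring that no digit comes directly before or after the match. Below is a minimal sketch, assuming the ICU regular expressions of NSRegularExpression, so it is AppleScriptObjC rather than plain AppleScript. The handler name matchesOf and the sample text are made up for illustration.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical sample text: numbers delimited by a space, a tab, or nothing
set sampleText to "1234 12345" & tab & "1234"

-- The bare pattern also matches the first four digits of 12345
set looseHits to matchesOf("1234", sampleText)
-- The lookarounds reject a match that has a digit on either side
set strictHits to matchesOf("(?<!\\d)1234(?!\\d)", sampleText)
return {looseHits:looseHits, strictHits:strictHits}

-- Hypothetical helper: return every match of thePattern in theText as a list of strings
on matchesOf(thePattern, theText)
	set theString to current application's NSString's stringWithString:theText
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set theMatches to theRegex's matchesInString:theString options:0 range:{0, theString's |length|()}
	set theHits to {}
	repeat with aMatch in theMatches
		set end of theHits to (theString's substringWithRange:(aMatch's range())) as text
	end repeat
	return theHits
end matchesOf
```

Because the lookarounds only exclude digits, a following space, tab, or end of text is accepted, which should cover the "nothing or a space or a tab" delimiters. The same wrapper could be put around "12345".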
 
The main target is the code below, with its focus on visibly tracking the hits; I believe I can use it to fine-tune what I need to do.
 
Many thanks, Thomas and Shane, for your efforts!
Julien
 
 
 
 
Sent: Tuesday, March 07, 2017 at 7:35 PM
From: "Thomas Fischer" <email@hidden>
To: "AS users" <email@hidden>
Subject: Re: collectdata
Hello Shane,
 
my doubts refer to the possibility of recreating a PDF file from the extracted text. If you build it into a new one it might work, but I didn’t test it.
 
I tried to stick with the original PDF file, modifying it using Skim.
Here is my suggestion. It starts by opening the file in question in Skim, but you can get the text any other way if you like. I don't know of another way to add highlights to a given PDF through AppleScript, though. Depending on the number of hits, this might take a while.
 
@Julien: I didn't understand what the meaning of the second search was, but obviously it could be done in a similar way. I changed the regular expression to the (for me) more readable "\d{3} \d{4}", the same as "[0-9][0-9][0-9][ ][0-9][0-9][0-9][0-9]".
 
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
 
tell application "Skim"
    set myText to text of front document
end tell
 
tell application "BBEdit"
    make new text document
    set text of text document 1 to myText
    set thequery1 to "\\d{3} \\d{4}"
    try
        set myFind to found matches of (find thequery1 searching in text document 1 options {search mode:grep, extend selection:false, returning results:true})
    on error
        display dialog "Nothing found."
        return
    end try
    set myFindRef to a reference to myFind
    set myHits to {}
    set myHitsRef to a reference to myHits
    repeat with theFound in myFindRef
        copy the match_string of theFound to the end of myHitsRef
    end repeat
    close text document 1 saving no
end tell
 
tell application "Skim"
    tell document 1
        repeat with theItem in myHitsRef
            set theSel to find it text theItem
            repeat while theSel is not {}
                --make new note with properties {type:highlight note, selection:theSel} # old Skim, ≤ 1.4.9
                make new note with data theSel with properties {type:highlight note} # new Skim, ≥ 1.2.26, for in between, try
                set theSel to find it text theItem from theSel
            end repeat
        end repeat
    end tell
end tell
 
Cheers
Thomas
 
 
On 07.03.2017 at 13:50, Shane Stanley <email@hidden> wrote:
 
On 7 Mar 2017, at 7:58 pm, Julien Battist <email@hidden> wrote:
 
It seems not possible, is that the conclusion?
Don't you become a doubting Thomas too!
 
OK, this is proof-of-concept. Choose a file and it will search for all runs of numbers, highlight them, save as a new PDF, and return a list of records in the form:
 
{{pageNumber:1, foundStrings:{"21", "06", "2016", "4", "54", "21"}}, {pageNumber:2, foundStrings:{"21", "06", "2016", "4", "54", "21"}} ,...}
 
The code needs cleaning up and modifying to deal with your multiple searches, and hopefully packaged into a handler. But it should give you something to work with:
 
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "Quartz" -- for PDF stuff
use scripting additions
 
set thePath to POSIX path of (choose file of type {"pdf"})
-- define search pattern
set {theRegex, theError} to current application's NSRegularExpression's regularExpressionWithPattern:"\\d+" options:0 |error|:(reference)
-- make URL
set anNSURL to current application's |NSURL|'s fileURLWithPath:thePath
-- make destination URL
set oldName to anNSURL's lastPathComponent()'s stringByDeletingPathExtension()
set newName to oldName's stringByAppendingString:"-2.pdf"
set destURL to anNSURL's URLByDeletingLastPathComponent()'s URLByAppendingPathComponent:newName
-- open doc and count pages
set theDoc to current application's PDFDocument's alloc()'s initWithURL:anNSURL
set theCount to theDoc's pageCount() as integer
-- create list to hold results
set findInfo to {}
-- loop through pages
repeat with i from 1 to theCount
    -- get page and its text
    set thePage to (theDoc's pageAtIndex:(i - 1))
    set theText to thePage's |string|()
    -- do search
    set theRanges to (theRegex's matchesInString:theText options:0 range:{0, theText's |length|()})
    -- create list to hold contents of matches
    set pageFinds to {}
    -- loop through matches found
    repeat with aFind in theRanges
        -- get the range and start and finish indexes
        set foundRange to aFind's range()
        set theStart to foundRange's location
        set theEnd to theStart + (foundRange's |length|) - 1
        -- get the matched text
        set end of pageFinds to (theText's substringWithRange:foundRange) as text
        -- get bounds of the matched text
        set startBounds to (thePage's characterBoundsAtIndex:theStart)
        set endBounds to (thePage's characterBoundsAtIndex:theEnd)
        -- combine to get word bounds; assuming they're on a single line
        set wordBounds to current application's NSUnionRect(startBounds, endBounds)
        -- make highlight annotation
        set theHighlight to (current application's PDFAnnotationMarkup's alloc()'s initWithBounds:wordBounds)
        (theHighlight's setMarkupType:(current application's kPDFMarkupTypeHighlight))
        -- add to page
        (thePage's addAnnotation:theHighlight)
    end repeat
    -- update result
    set end of findInfo to {pageNumber:i, foundStrings:pageFinds}
end repeat
-- save new PDF
theDoc's writeToURL:destURL
-- return result
return findInfo
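 
One possible way to do the packaging and the multiple searches mentioned above is to move the per-page work into a handler that takes a list of patterns. This is only a sketch built from the same calls as the script above. The handler name highlightPatterns and its parameters are hypothetical, and it keeps the assumption that each match sits on a single line.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "Quartz" -- for PDF stuff
use scripting additions

-- Hypothetical handler: highlight every match of each pattern on one PDF page
-- and return the matched strings. Assumes each match is on a single line.
on highlightPatterns(thePatterns, thePage)
	set pageFinds to {}
	set theText to thePage's |string|()
	-- a page without text has nothing to search
	if theText is missing value then return pageFinds
	repeat with aPattern in thePatterns
		set theRegex to (current application's NSRegularExpression's regularExpressionWithPattern:(contents of aPattern) options:0 |error|:(missing value))
		set theMatches to (theRegex's matchesInString:theText options:0 range:{0, theText's |length|()})
		repeat with aFind in theMatches
			-- range of the match, plus its first and last character index
			set foundRange to aFind's range()
			set theStart to foundRange's location
			set theEnd to theStart + (foundRange's |length|) - 1
			set end of pageFinds to (theText's substringWithRange:foundRange) as text
			-- union of the first and last character bounds gives the highlight area
			set startBounds to (thePage's characterBoundsAtIndex:theStart)
			set endBounds to (thePage's characterBoundsAtIndex:theEnd)
			set wordBounds to current application's NSUnionRect(startBounds, endBounds)
			set theHighlight to (current application's PDFAnnotationMarkup's alloc()'s initWithBounds:wordBounds)
			(theHighlight's setMarkupType:(current application's kPDFMarkupTypeHighlight))
			(thePage's addAnnotation:theHighlight)
		end repeat
	end repeat
	return pageFinds
end highlightPatterns
```

Inside the page loop, the search and highlight steps would then reduce to something like `set pageFinds to my highlightPatterns({"\\d+"}, thePage)`, with one list entry per search.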
 
 
-- 
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>, <latenightsw.com>
 
_______________________________________________
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users

This email sent to email@hidden