Ray, "Page 1 of ?" really is a text string. I can readily identify the pages that have it and put each page number in a list. In fact, I do have a working script, but it seems to be a bit of a kludge. As Christian mentioned, it also involves Acrobat. Here are the basic steps for the working script (tested on a small 500-page document):
1. Open the PDF in Skim.
2. Get the text of each page and look for "Page 1 of 1".
   If found, add the page number to single_pg_list.
   If not, add the page number to multiple_pg_list.
3. Close the PDF and open it in Acrobat.
4. Delete all pages in multiple_pg_list (in reverse order, to avoid changing the following page numbers).
5. Save the document as single.pdf.
6. Do the same with single_pg_list to get multiple.pdf.
Again, this does work; however, I need to address a couple of things. Some of these PDFs could be as long as 50,000 pages, so the page list is very long. Is there a limit to the size of an AppleScript list?
Also, this is very slow. The Acrobat command to delete pages is "delete pages reference first integer last integer". Since my list is of individual pages, it deletes pages one at a time, which is very time consuming. In my testing, deleting 250 individual pages from a 500-page document takes about 17 seconds on my machine, whereas deleting a single 250-page range takes less than a second. So my next problem is: how do I take a list of page numbers and group them into start and end page numbers?
If I have the following list
set thelist to {1, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 30}
and I could sort it into two lists, one of starting pages and one of ending pages, I could delete ranges when possible. So I would end up with:
the_starting_pages {1, 4, 12, 22}
the_ending_pages {1, 10, 19, 30}
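One possible way to build those two lists is a single pass that extends the current run while each page is exactly one more than the previous one, and closes the run otherwise. This is only a sketch in plain AppleScript: the handler name collapseRanges and its local variable names are made up for illustration, and it assumes the page list is already in ascending order with no duplicates (which is how a page-by-page scan would build it).

```applescript
-- Hypothetical helper: collapse an ascending list of page numbers
-- into parallel lists of range starts and range ends.
on collapseRanges(pageList)
	set startPages to {}
	set endPages to {}
	-- Nothing to do for an empty list.
	if (count of pageList) is 0 then return {startPages, endPages}
	
	-- Begin the first run at the first page.
	set runStart to item 1 of pageList
	set runEnd to runStart
	
	repeat with i from 2 to (count of pageList)
		set p to item i of pageList
		if p = (runEnd + 1) then
			-- Page continues the current run.
			set runEnd to p
		else
			-- Gap found: record the finished run and start a new one.
			set end of startPages to runStart
			set end of endPages to runEnd
			set runStart to p
			set runEnd to p
		end if
	end repeat
	
	-- Record the final run.
	set end of startPages to runStart
	set end of endPages to runEnd
	return {startPages, endPages}
end collapseRanges

set thelist to {1, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 30}
set {the_starting_pages, the_ending_pages} to collapseRanges(thelist)
-- the_starting_pages is {1, 4, 12, 22}
-- the_ending_pages is {1, 10, 19, 30}
```

The ranges could then be handed to the Acrobat delete pages command one pair at a time, working from the last pair back to the first so that the earlier page numbers stay valid. A lone page such as 1 simply becomes a range whose start and end are the same.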
I'm not sure how to do this, whether I should do this, or even whether I am barking up the wrong tree. Thanks for the help, guys.
Paul
On Mar 21, 2011, at 6:57 PM, Ray Gonzalez wrote:
Paul;
Can we look at this problem from a different view? I'm not sure that "Page 1 of ?" really represents text or strings in your document.
Aren't they really dynamic underlying code fragments which are constantly responding to changes the User makes in the total number of pages and which page has the focus?
It would seem that any attempt you make to grab certain pages will immediately activate the paging code... and it will begin trying to adjust the very numbers your script depends on.
I'm not familiar with Adobe coding, but there must be some way to first convert those 4,000 pages of paging code to hard text. Then, and only then, whatever Search and Copy routine you desire should be straightforward.