Re: How to speed up execution time of this script
- Subject: Re: How to speed up execution time of this script
- From: "Mark J. Reed" <email@hidden>
- Date: Sun, 15 Aug 2010 13:10:43 -0400
On Sat, Aug 14, 2010 at 6:11 PM, Bert Groeneveld
<email@hidden> wrote:
>
> 1) That "pattern" search you use, is that the same as the so called "grep" searching?
Not quite the same, no. The grep command accepts patterns in a syntax
based on formal computer science "regular expressions". I could go
into a lot of history there, but suffice it to say they're related to but
distinct from (and more powerful than) filename wildcards, which is
what the -name option to the find command expects.
However, the nice thing about UNIX commands is that you can chain them
together with "pipes" (the | character; "command1 | command2" sends
the output of command1 as the input to command2) to combine their
powers. So if you want to search for a pattern that's more complex
than what -name lets you specify, you can just ask find to print out
all the files, no matter their name, and then use grep itself to
winnow the list down to the lines you want:
to findMatchingFilesViaGrep(parentFolder, namePattern)
	-- list everything under parentFolder, then let grep filter the lines
	return paragraphs of (do shell script ¬
		"find " & (quoted form of POSIX path of parentFolder) & " -print | grep " ¬
		& (quoted form of namePattern))
end
There are some subtleties here. First, you're now matching the
pattern against the entire pathname, not just the filename. Second,
grep patterns are not anchored - "blah" matches any string containing
blah, not just strings exactly equal to blah. We could manipulate the
pattern inside the handler before passing it to grep to change the
defaults, but it's more flexible to leave it alone and give the caller
the full power of grep instead of restricting them.
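To see both subtleties outside AppleScript, here is a small shell sketch. The sample paths are made up and just stand in for what find would print; nothing here is specific to the handler above.

```shell
# Hypothetical paths standing in for the output of "find ... -print".
sample_paths() {
  printf '%s\n' \
    '/tmp/blah/report.txt' \
    '/tmp/docs/blah.txt' \
    '/tmp/docs/notes.txt'
}

# Unanchored match against the whole path: this also returns the file
# whose *folder* happens to be named "blah".
sample_paths | grep 'blah'

# Confine the match to the last path component by requiring that no
# slash follows "blah" before the end of the line.
sample_paths | grep 'blah[^/]*$'
```

The first command prints the first two paths; the second prints only /tmp/docs/blah.txt.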
> As an exercise I tried this without success: I want to find all the files in a folder whose file name starts
> with at least 20 characters or digits (except spaces) followed by the article number. I called the
> handler with the namePattern argument "[^ ]{20}" & articleNumber & "*"
That's close, but because we're matching against a whole
pathname, you want to add slash to your list of excluded characters.
Also, because grep requires backslashes before the curly-brace
quantifiers, the pattern looks like this:
"[^ /]\\{20\\}" & articleNumber & "[^/]*$"
That is, any sequence of 20 characters that aren't a space or a slash,
followed by the article number, followed by any sequence of any number
of characters (including none) not containing a slash, and then the
end of the string.
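If you want to try that pattern directly in Terminal, something like the following should do. The article number and the sample paths are invented for illustration (the number is borrowed from the example path later in this message), and note that inside single quotes in the shell you only need single backslashes.

```shell
article_number='3148333'   # hypothetical article number

# Made-up paths standing in for the output of find.
sample_paths() {
  printf '%s\n' \
    '/img/ABCDEFGHIJKLMNOPQRST3148333_G.psd' \
    '/img/short3148333_G.psd' \
    '/averyveryverylongfoldername/3148333_G.psd'
}

# 20 characters that are neither space nor slash, then the article
# number, then non-slash characters up to the end of the line.
match_article() {
  grep '[^ /]\{20\}'"$article_number"'[^/]*$'
}

sample_paths | match_article
```

Only the first path comes back. The third one is the case the slash exclusion guards against: without it, the long folder name plus the slash would supply the 20 characters.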
> Here's what I actually want: I want to get a list (full paths) of all the files in a certain folder (including
> those files in subfolders) whose file name (including the extension) contains one or more contiguous
> substrings of 20 characters or digits without spaces and dashes. Is that possible?
That would look like this:
"[^ /-]\\{20\\}[^/]*$"
And that's why people accuse regular expressions of being line noise.
:) Although the need to double the backslashes for AppleScript
doesn't help. Alternatively, if you replace the "grep" command in the
do shell script line with "egrep" (equivalently, "grep -E"), you can
drop those, and then it looks like this:
"[^ /-]{20}[^/]*$"
which is at least a little better.
That reads as: any sequence of 20 or more non-space, non-slash,
non-hyphen characters followed by any sequence of non-slash characters
at the end of the string. The [...] syntax describes a set of
characters to match; a caret (^) as the first thing inside the [...]
inverts the meaning so it matches everything *but* those characters.
A hyphen (-) between two characters creates a range, so [a-z] matches
any lowercase English letter. So to match a literal hyphen (there's no
backslash-escaping in character classes) it has to be the first
character (after any ^) or the last character inside the [...].
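Here is a rough shell check of the extended-syntax version, using grep -E (the standard spelling of egrep). The function name and the sample paths are my own inventions for the demo.

```shell
# Filter: keep lines whose last path component contains a run of 20
# characters that are not space, slash, or hyphen. The hyphen sits
# last inside the brackets so it is taken literally, not as a range.
long_run_files() {
  grep -E '[^ /-]{20}[^/]*$'
}

printf '%s\n' \
  '/pics/12345678901234567890.psd' \
  '/pics/1234-5678-9012-3456-7890.psd' \
  '/abcdefghijklmnopqrstuvwxyz/a.psd' | long_run_files
```

Only the first path is printed: the second has its digits broken up by hyphens, and in the third the long run is in the folder name rather than the file name.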
You might try http://www.regular-expressions.info/tutorial.html to
learn more about constructing regexes.
> 2) I really would like to know in plain english ;) what you're telling here: | sed -e 's,.*/,,' "
OK, the pipe character | between two Unix commands says "take the
output of the first command and feed it into the second command as
input". That's what we're using in the handler above. This command:
find /some/folder -print
gives you a list of all the files and directories under the starting
folder. This command:
grep "blah"
returns a list of all lines in its input matching "blah". Put them
together and you get a list of all files and directories under the
starting folder whose name contains "blah".
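For a hands-on version of that pipeline, the sketch below builds a throwaway folder and runs find and grep over it. The function name and the file names are made up; it changes into the folder first so that the temporary folder's own name can't affect the match, and sorts the result because find makes no promise about order.

```shell
# List everything under folder $1 whose relative path contains "blah".
list_blah() {
  (cd "$1" && find . -print | grep 'blah' | sort)
}

demo_dir=$(mktemp -d)                 # scratch folder for the demo
mkdir -p "$demo_dir/sub"
touch "$demo_dir/blah.txt" "$demo_dir/sub/other.txt" "$demo_dir/sub/more blah.txt"

list_blah "$demo_dir"                 # prints ./blah.txt and ./sub/more blah.txt
rm -rf "$demo_dir"
```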
sed, the "stream editor", is a sort of in-line text editor. It takes
its input, applies a set of editing commands, and generates the
transformed text as output. The -e option tells it what edit commands
to apply, and s,old,new, is the search-and-replace operation (you can
replace the commas with any other character; most of the time people
use /, but since the pattern I'm working with contains a slash, I
switched to commas).
The pattern .*/ matches any run of characters followed by a /. I'm
taking advantage of the fact that in sed, patterns match the longest
string possible. So given "/foo/bar/baz/zoo/wicky.txt", a match for
.*/ could end at any of the five slashes, and for each of those the
match could start anywhere from the slash itself back to the beginning
of the line. But the longest possible match is the one that starts at
the beginning of the line and continues all the way to the last slash,
that is, "/foo/bar/baz/zoo/".
It replaces that with nothing, leaving only "wicky.txt". So, given a
list of pathnames, it returns a list of just the filenames, with the
leading directory paths removed.
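You can watch the longest-match behavior in Terminal with a sketch like this. The function name strip_dirs is just something I made up for the demo; the second command is there for contrast and is not part of the original handler.

```shell
# Remove everything up to and including the last slash on each line.
strip_dirs() {
  sed -e 's,.*/,,'
}

printf '%s\n' '/foo/bar/baz/zoo/wicky.txt' | strip_dirs

# For contrast: [^/]* cannot cross a slash, so this removes only up to
# the first slash and leaves the rest of the path in place.
printf '%s\n' '/foo/bar/baz/zoo/wicky.txt' | sed -e 's,[^/]*/,,'
```

The first command prints wicky.txt; the second prints foo/bar/baz/zoo/wicky.txt.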
> 3) Is it also possible to return the paths as Applescript references instead of posix paths?
> If not: Is this the smartest way to convert a posix path into an Applescript file reference?
> set myResult to "/Applescript/Giving/SpreadTool/HiRes_Images/3148333_G.psd" as POSIX file as string
You can just do
POSIX file "/Applescript/Giving/SpreadTool/HiRes_Images/3148333_G.psd"
--
Mark J. Reed <email@hidden>