Re: Optimizing Character Matching
- Subject: Re: Optimizing Character Matching
- From: "Gary (Lists)" <email@hidden>
- Date: Fri, 03 Nov 2006 21:49:54 -0500
"Rick Gordon" wrote:
> I've created an InDesign script to parse paragraph styles for the purpose of
> creating XML-compatible tag names, deleting illegal characters or unneeded
> styles along the way.
>
> My question concerns how I might optimize the script...
Perhaps you're using the word 'optimize' to mean something more like
'redesign'. [1]
> ...(the red-highlighted areas)...
Er? Do not rely on other readers having the same formatting as you. I
receive this as plain text and therefore am only confused by
"red-highlighted areas".
> ... so that I have more flexibility in modifying my search elements -- like
> eliminating any spaces or punctuation.
This is really the redesign issue that I think you are asking about, since
you have lots of hard-coded bits in your script. More on that below, after
the rest of your question.
> It would seem as though I could create some kind of match to the contents of a
> string, like
>
> if myString contains any character in " <>()-_**~" then
> ...
> end if
>
> ... but that generates an error. Suggestions are appreciated.
Well, you've got the right idea, but you're using the wrong data type.
In human terms, what you wish to ask is:
Is this character in a list of known bad characters?
So, just use a list of known bad characters, and compare against that using
"is in".
Here is a non-contextual example:
--
set bad_chars to {"<", ">", "(", ")", "-", "_", "*", "*", "~"}
set this_char to ")"
this_char is in bad_chars
--> true
--
That answers the specific question, but whether or not this is the best
choice to get you the "flexibility in modifying search elements" depends on
how you might decide to redesign/reduce your script.
I think your script is a bit bulky and perhaps hard to maintain. You
basically know this, hence your question. Perhaps the "is in [some list]"
construct will go a long way in helping to reduce the script.
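To show how far "is in" alone might take you, here is a minimal sketch in
plain AppleScript (no additions needed). The handler name stripChars and its
parameter names are made up for illustration and are not taken from your
script; the space character is included in the list because your original
test string starts with one.

```applescript
-- Hypothetical handler: returns theText with every character that
-- appears in the list badChars removed.
on stripChars(theText, badChars)
	set keptChars to {}
	repeat with aChar in (characters of theText)
		-- aChar is a reference to a list item, so take its contents
		if (contents of aChar) is not in badChars then
			set end of keptChars to contents of aChar
		end if
	end repeat
	-- Join the kept characters back into one string, restoring the
	-- text item delimiters afterwards.
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ""
	set cleanText to keptChars as text
	set AppleScript's text item delimiters to oldTIDs
	return cleanText
end stripChars

set bad_chars to {" ", "<", ">", "(", ")", "-", "_", "*", "~"}
stripChars("Body-Text (first)", bad_chars)
--> "BodyTextfirst"
```

With something like this, changing what counts as a "bad" character means
editing one list rather than several long "contains ... or contains ..."
tests.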
But, you might get even more flexibility (and optimization for speed) if you use
a tool that is designed for the job already.
Sticking with AppleScript only, I would suggest that you employ the free
Satimage.osax, which has a very robust 'change' command, which can operate
on strings or on file objects (affecting their contents on disk).
The 'change' command provides the flexibility of using regular expressions
(compiled or not) or strings for its search argument. It is very fast, OMM.
You will also get rid of this kind of mishegas:
-- from your script
set tempName to (characters 1 thru (i - 1) of tempName as text) &
(characters (i + 1) thru -1 of tempName as text)
--
and replace it with something like:
--
change "<" into "" in some_string
--
This will directly reduce all of your "pStyle" stuff below, which is where your
code gets a bit thick. The fact that the default install of AS components
doesn't provide much string manipulation (hence the wealth of "tids"
scripts) is not a problem. That's why there are scripting additions (and
some good AS libraries) which extend the language.
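As a rough sketch of how the whole character loop might collapse, assuming
Satimage.osax is installed and that 'change' takes a 'regexp' flag the way I
remember its dictionary describing it (check the dictionary on your machine
before relying on this), a single call could strip every unwanted character
in one pass. The names some_string and clean_string are placeholders.

```applescript
-- Requires Satimage.osax; 'change ... with regexp' is a Satimage command,
-- not part of standard AppleScript.
set some_string to "Body-Text (first)"
-- The character class lists the unwanted characters; the hyphen goes last
-- so it is read literally rather than as a range.
set clean_string to (change "[ <>()_*~-]" into "" in some_string with regexp)
```

The pattern string then becomes the one place you edit when the set of
disallowed characters changes.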
Using lists to hold your disallowed characters, in combination with 'change',
you could easily keep several sets in your code and edit them as needed.
If you go ahead and get Smile instead of just downloading the Satimage.osax,
then you can wrap your script up in a custom UI window and gain even more
flexibility with a full interface.
Maybe you would want to:
- select a pre-saved set of "bad chars" (for: XML, URL, Job Code, &c.)
- select a file target or a folder full
- keep a log of changed files (and even changes [*])
- and so on.
The Satimage.osax 'find text' command (which you would also then have) does
much like 'change', except it does not actually "change" anything. It
returns a record of the position of matches, as in:
Result : record -- {matchLen: length of the match, matchPos: offset of
the match, matchResult: the matching string (possibly formatted according to
the "using" parameter)}
In this way, I could envision that you could keep a record of
character-by-character changes made to a document, if that were needed for
any purpose.
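A hedged sketch of that idea follows. It assumes Satimage.osax is installed,
and the 'regexp' and 'all occurrences' parameters are written as I recall
them from the dictionary, so verify them there before use. The variable names
are placeholders.

```applescript
-- Requires Satimage.osax; 'find text' is not part of standard AppleScript.
set some_string to "Body-Text (first)"
-- Ask for every match of an unwanted character, not just the first one.
set match_list to (find text "[ <>()_*~-]" in some_string with regexp and all occurrences)
repeat with a_match in match_list
	-- Each item should be a record with matchPos, matchLen and matchResult.
	log {matchPos of a_match, matchResult of a_match}
end repeat
```

Logging the matches before running 'change' on the same string would give
you the before-and-after trail, if you ever need one.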
Anyway, you have some fine options to improve your script with either
Satimage.osax or Smile or both.
> The (almost) full script is below:
> ----
>
> tell application "Adobe InDesign CS2"
> tell active document
> set tempList to {}
> set nameList to name of every paragraph style
>
> repeat with eachItem in nameList
> tell contents of eachItem
> set pStyleName to contents of eachItem as text
> tell pStyleName to set tempName to it
>
> set pStyleName to tempName
> tell pStyleName
> if it contains " " or it contains "-" or it contains "_" or it
> contains "<" or it contains ">" or it contains "(" or it contains ")" or it
> contains "*" or it contains "*" or it contains "~" then
> set tempName to pStyleName
> set charCount to count characters in pStyleName
> repeat with i from charCount to 1 by -1
> tell pStyleName
> if character i is in " <>()-_**~" then
> try
> set tempName to (characters 1 thru (i - 1) of tempName as
> text) & (characters (i + 1) thru -1 of tempName as text)
> end try
> end if
> end tell
> end repeat
> end if
> end tell
>
> set pStyleName to tempName
> tell character 1 of pStyleName
> if it contains " " or it contains "-" or it contains "_" or it
> contains "<" or it contains ">" or it contains "(" or it contains ")" or it
> contains "*" or it contains "*" or it contains "~" then
> set tempName to (characters 2 thru -1 of pStyleName as text)
> end if
> end tell
> copy tempName to end of tempList
> end tell
> end repeat
>
> tell tempList
> set listLength to length of it
> set tagList to {}
> repeat with i from 1 to listLength
> if item i does not contain "[" or does not contain "last" or item i does
> not contain "first" or item i does not contain "Normal" then
> copy item i to end of tagList
> end if
> end repeat
> end tell
>
> return tagList
> end tell
> end tell
> --
--
Gary
[1] For historical (i.e., 2 weeks ago) reference regarding 'optimization'
(in techie parlance).
> * Subject: Re: Dumb performance question: tell blocks
> * From: Christopher Nebel <email@hidden>
> * Date: Tue, 17 Oct 2006 11:20:19 -0700
> [...] And don't forget: don't bother trying to optimize something until
> you're dissatisfied with how fast it runs. [2]
>
> --Chris Nebel
> AppleScript Engineering
>
> [...]
>
> [2] See <http://en.wikipedia.org/wiki/Software_optimization#Quotes>.