Re: What's in a word


  • Subject: Re: What's in a word
  • From: Christopher Stone <email@hidden>
  • Date: Wed, 17 Oct 2012 14:04:28 -0500

On Oct 17, 2012, at 12:03, John McClintock <email@hidden> wrote:
In the following, why is word -1 of a_url "jpg" and word -1 of the_url the whole file name?
______________________________________________________________________

Hey John,

I can confirm this.  It happens on my U.S. 10.8.2 system with 'Word Break' set to 'Standard' in the Language & Text preferences.

That's really freaky...  I see no discernible differentiator in your sample.

From the AppleScript Language Guide - Words:

"A continuous series of characters, with word elements parsed according to the word-break rules set in the International preference pane.

Because the rules for parsing words are thus under user control, your scripts should not count on a deterministic text parsing of words."

Long ago I discovered that 'word' was an unpredictable quantity in AppleScript, so I stay away from it unless I know it will work for me in a certain situation (and even then I still test it).

Generally I stick with text item delimiters (TIDs) and regular expressions, where I have precise control:

set AppleScript's text item delimiters to {"_", "-", "."}
set url1 to "http://stereo-ssc.nascom.nasa.gov/browse/2009/08/02/ahead/euvi/195/1024/20090802_001530_n7euA_195.jpg"
set _suffix to last text item of url1

--> "jpg"
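
One caveat with the snippet above is that it leaves the text item delimiters changed for whatever runs next. A minimal sketch of a wrapper that puts them back, assuming Mac OS X 10.6 or later (where a list of delimiters is accepted); the handler name lastTextItem is made up for illustration:

```applescript
-- Hypothetical helper: returns the last text item of theText when it is
-- split on theDelimiters, then restores the caller's delimiters.
on lastTextItem(theText, theDelimiters)
	set savedTIDs to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to theDelimiters
		set theItem to last text item of theText
	on error errMsg number errNum
		-- Restore the delimiters before passing the error along.
		set AppleScript's text item delimiters to savedTIDs
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedTIDs
	return theItem
end lastTextItem

set url1 to "http://stereo-ssc.nascom.nasa.gov/browse/2009/08/02/ahead/euvi/195/1024/20090802_001530_n7euA_195.jpg"
lastTextItem(url1, {"_", "-", "."})

--> "jpg"
```

Unlike 'word', the result here depends only on the delimiters you pass in, not on the user's word-break setting.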

# Satimage.osax
# Note that \w (word character) also matches the underscore, so the file name stays in one piece.
set foundText to find text "\\w+" in url1 regexp true ¬
case sensitive false ¬
whole word true ¬
all occurrences true ¬
with string result

--> 
{
"http", 
"stereo", 
"ssc", 
"nascom", 
"nasa", 
"gov", 
"browse", 
"2009", 
"08", 
"02", 
"ahead", 
"euvi", 
"195", 
"1024", 
"20090802_001530_n7euA_195", 
"jpg"
}

# Satimage.osax
set foundText to find text "[[:alnum:]]+" in url1 regexp true ¬
case sensitive false ¬
whole word true ¬
all occurrences true ¬
with string result

--> 
{
"http", 
"stereo", 
"ssc", 
"nascom", 
"nasa", 
"gov", 
"browse", 
"2009", 
"08", 
"02", 
"ahead", 
"euvi", 
"195", 
"1024", 
"20090802", 
"001530", 
"n7euA", 
"195", 
"jpg"
}
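
If Satimage.osax is not installed, roughly the same split can be sketched with 'do shell script' and grep. This is an assumption-laden alternative rather than an equivalent of the 'find text' call above: it relies on the system grep supporting the -o and -E options, and the variable name shellCmd is only illustrative.

```applescript
set url1 to "http://stereo-ssc.nascom.nasa.gov/browse/2009/08/02/ahead/euvi/195/1024/20090802_001530_n7euA_195.jpg"

-- grep -o prints each match on its own line; -E enables extended regular expressions.
-- If nothing matches, grep exits non-zero and 'do shell script' raises an error.
set shellCmd to "printf %s " & quoted form of url1 & " | grep -oE '[[:alnum:]]+'"
set foundText to paragraphs of (do shell script shellCmd)
```

With the sample URL this should give the same list of alphanumeric runs as the [[:alnum:]] example, with the underscores treated as separators.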

set url1 to "http://stereo-ssc.nascom.nasa.gov/browse/2009/08/02/ahead/euvi/195/1024/20090802_001530_n7euA_195.jpg"

# Satimage.osax
set foundText to find text "\\b.+?\\b" in url1 regexp true ¬
case sensitive false ¬
all occurrences true ¬
with string result

--> 
{
"http", 
"://", 
"stereo", 
"-", 
"ssc", 
".", 
"nascom", 
".", 
"nasa", 
".", 
"gov", 
"/", 
"browse", 
"/", 
"2009", 
"/", 
"08", 
"/", 
"02", 
"/", 
"ahead", 
"/", 
"euvi", 
"/", 
"195", 
"/", 
"1024", 
"/", 
"20090802_001530_n7euA_195", 
".", 
"jpg"
}

set AppleScript's text item delimiters to ""
foundText as text

--> "http://stereo-ssc.nascom.nasa.gov/browse/2009/08/02/ahead/euvi/195/1024/20090802_001530_n7euA_195.jpg"

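Note that this last result reassembles into the original URL verbatim, because the pattern also returns the separators between the words; the other two discard them and so cannot be rejoined.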

--
Best Regards,
Chris
