Re: words in Unicode text
- Subject: Re: words in Unicode text
- From: John W Baxter <email@hidden>
- Date: Sun, 16 Mar 2003 10:46:00 -0800
- Envelope-to: email@hidden
At 10:23 +0100 3/16/2003, julifos wrote:
> I forgot it ;-)
>
> These are my results (snagged character was "is not equal to"):
>
> #############
> set x to "-rw-r--r--"
> set y to x as Unicode text
>
> words of x --> {"rw", "r", "r"}
> words of y --> {"rw-r", "r"}
> #############
>
> OS 10.2.4 (international, spanish), AS 1.9.1
I think the key to the difference in what you see and what several others
see lies here. One of the things which can differ in a non-English Mac OS
(X or before) compared with the US English version is the definition of a
"word."
One explanation of your result above would be that when working with ASCII,
the Spanish Mac OS X treats - as a word break, but when working with
Unicode, - is a normal word character but -- is a word break according to
Spanish Mac OS X.
Note that in Script Debugger, with the result window set to Source Language
and with Pretty Print checked, I see a subtle difference in the result from
the ASCII and the Unicode form:
set x to "-rw-r--r--"
set y to x as Unicode text
set wx to words of x
set wy to words of y
{wx, wy}
Result window shows:
{
{
"rw-r",
"r"
},
{
"rw-r",
"r"
}
}
The subtle difference doesn't show in this plain text message, but the wy
value is displayed in a different font than the wx value.
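One way to make that difference visible in plain text is to ask for the class of each value. This is only a sketch, and it assumes the font difference reflects string versus Unicode text; the variable name classes is made up for the example. On AppleScript 1.9.x the two classes should differ, while AppleScript 2.0 and later treat both as text, so the pair may come back identical there.

```applescript
set x to "-rw-r--r--"
set y to x as Unicode text
-- Collect the class of the plain string and of its Unicode text coercion
set classes to {class of x, class of y}
classes
```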
> I found this issue with text returned from a shell script, which returns
> Unicode text by default (which is a sh*t for me).
>
> Actually, I'm trying to retrieve the file size of a given file, and now I
> don't know if I should handle a possible error:
>
> #############
> word 6 of (do shell script "ls -l path/to/file")
> #############
>
> .... Where I should ask for "word 7" (?)
I don't think you should, unfortunately, and that conclusion isn't related
to English vs Spanish Mac OS X. That is, I don't think word is the
appropriate thing to use when parsing the whole directory entry. After
all, the flags for a given file might be
-rw-r--r-- (as above, and very common)
-rw-rw-r--
-rw-------
and so on.
One thing we do know is that the flags are 10 characters long...so let's
get rid of them, using something like (given
-rw------- 1 john staff 4068 Aug 16 2002 .tcsh_history
in my home directory)
set direntry to do shell script "ls -l ~/.tcsh_history"
set entryNoFlags to text 11 through -1 of direntry
set filesize to word 4 of entryNoFlags
display dialog filesize
(correctly displayed 4068).
Yes...that can be collapsed into one line, and I might if *all* I want is
the size...but currently I don't know that. And even if the file size is
all I want, I might not fully collapse the code.
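For reference, the collapsed form might look like the sketch below. To keep it runnable without my file, it uses the sample line from above as a literal string (an assumption for illustration); in real use the string would be replaced by the do shell script call. The word boundaries still depend on the system's definition of a word, so treat the result as something to check rather than a guarantee.

```applescript
-- Sample directory entry copied from this message; normally the result of
-- do shell script "ls -l ~/.tcsh_history"
set direntry to "-rw-------  1 john  staff  4068 Aug 16  2002 /Users/john/.tcsh_history"
-- Drop the 10 flag characters, then take the fourth word of what is left
set filesize to word 4 of (text 11 through -1 of direntry)
display dialog filesize
```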
There is still a chance of a problem, if Mac OS X allows word break
characters in the short username or the group name, so setting the text
item delimiters to {" "} and using text item rather than word might be
safer. But then you have to worry about empty text items and differences
due to different numbers of columns used by some of the data elements.
Continuing the example,
text items of direntry
produces
{
"-rw-------",
"",
"1",
"john",
"",
"staff",
"",
"4068",
"Aug",
"16",
"",
"2002",
"/Users/john/.tcsh_history"
}
but might not have the empty second item if the link count (third item)
were sufficiently large, and might not have the next two empty items if the
names were longer. So you would probably have to strip out the empty items
before grabbing what would then be the 5th text item. I think I'd rather
write the code for sensible user and group names.
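If you did want the text item route, stripping the empty items could look roughly like this. It is a minimal sketch, not tested against every ls layout: nonEmptyFields is a hypothetical handler name, and the sample line is typed in to match the listing above rather than fetched with do shell script.

```applescript
-- Hypothetical helper: split a line on spaces and drop the empty items
on nonEmptyFields(theLine)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {" "}
	set rawItems to text items of theLine
	-- Restore the delimiters so later code is not affected
	set AppleScript's text item delimiters to savedDelims
	set fields to {}
	repeat with anItem in rawItems
		if (contents of anItem) is not "" then set end of fields to (contents of anItem)
	end repeat
	return fields
end nonEmptyFields

-- Sample entry with the double spaces that produce the empty items
set direntry to "-rw-------  1 john  staff  4068 Aug 16  2002 /Users/john/.tcsh_history"
set fields to nonEmptyFields(direntry)
-- Flags, link count, user, group, then the size
set filesize to item 5 of fields
```

A file name containing spaces would add extra items at the end of the list, but the size stays at item 5 as long as the user and group names contain no spaces.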
--John
--
John W. Baxter Port Ludlow, WA USA
Why does the computer industry speak of "input/output"? Depending on
viewpoint, it should be either intake/output or input/outtake.