Re: Getting Unicode Number
- Subject: Re: Getting Unicode Number
- From: Emmanuel <email@hidden>
- Date: Sun, 6 Feb 2005 16:37:43 +0100
At 9:08 AM -0600 2/6/05, Joseph Weaks wrote:
Question 1:
Why is there a need to write a Unicode string to a temp file and then read
it when it is already in a variable?
Simply because the way AppleScript writes to a file is a known
standard, while the way it stores things internally is not: it is
opaque, it may vary from one version to the next, and even developers
should not rely on it.
Question 2:
The above routine is said to duplicate the ASCII number command for Unicode
(decimal, I'm guessing) characters, but I'm not seeing that. For instance,
when ASCII number is passed a string of more than one character, it
returns the ASCII number of the first character, but the above routine
seems to be adding up the decimal values of all the characters. So with
the above routine, I need to be sure to pass only one character at a time?
Yes, otherwise you get a (huge and) meaningless result.
Question 3:
I don't really understand what's going on with the meat of the script,
namely: "set n to (ASCII number of character 1 of ss) + 256 * n". Setting n
to zero and then setting it to the ascii number of the previous character is
not making sense to me. Could someone explain it?
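'ASCII number of character n' is really a trick to get the numerical
value of the nth byte. Multiplying the running total n by 256 shifts
it up by one byte before the next byte is added, so once the last
byte has been read n holds the whole number. Starting n at zero
simply means the first byte is taken as it is.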
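The expression quoted in the question, byte + 256 * n, builds one
number out of several bytes. Here is a minimal sketch of just that
arithmetic, assuming the byte values are already available as a list
of integers, most significant byte first. The handler name
bytesToNumber and the sample values are made up for illustration and
are not part of the routine under discussion:

```applescript
-- Hypothetical helper: combine byte values, most significant first,
-- into a single number.
on bytesToNumber(byteList)
	set n to 0
	repeat with b in byteList
		-- shift the running total up one byte, then add the next byte
		set n to (contents of b) + 256 * n
	end repeat
	return n
end bytesToNumber

bytesToNumber({0, 233}) -- 0 * 256 + 233 = 233
bytesToNumber({32, 172}) -- 32 * 256 + 172 = 8364
```

The same arithmetic shows why passing several characters at once gives
a huge result: every additional byte multiplies what came before by
256.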
Question 4:
My goal is to convert characters into the XHTML entities. So, once I have
each decimal value, I can check to see if it's over 126, right? If it's over
that, then it needs to be encoded as an entity?
"it needs", I would not say that. It is very unlikely - yet, possible
- that the client for your XHTML files does not understand any
Unicode file format. If that is the case, then yes, to be safe, turn
every high-ASCII character (over 127) into an entity. But in most cases
it's enough to make a UTF-8 file, in which case:
- you don't need entities, just write the characters as they are
(except '&', '<', and '>'),
- if you use Smile's Unicode windows, you can check both the XML syntax
(Enter key) and the conformance to the XHTML DTD (cmd-Enter combo).
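For the three characters that still have to be escaped in a UTF-8
file, here is a rough sketch in plain AppleScript using text item
delimiters. The handler names replaceAll and escapeMarkup are invented
for this example, and text item delimiters may behave differently with
Unicode text depending on the AppleScript version, so treat it as a
starting point rather than a finished routine:

```applescript
-- Hypothetical helper: replace every occurrence of findText
-- in sourceText with newText.
on replaceAll(findText, newText, sourceText)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set parts to text items of sourceText
	set AppleScript's text item delimiters to newText
	set resultText to parts as Unicode text
	-- restore the delimiters so other code is not affected
	set AppleScript's text item delimiters to savedDelims
	return resultText
end replaceAll

-- Hypothetical helper: escape '&' first, so that the ampersands
-- introduced by the other two replacements are left alone.
on escapeMarkup(s)
	set s to replaceAll("&", "&amp;", s)
	set s to replaceAll("<", "&lt;", s)
	set s to replaceAll(">", "&gt;", s)
	return s
end escapeMarkup

escapeMarkup("a < b & c")
```

The order matters: if '<' were replaced before '&', the '&' in the
resulting "&lt;" would be escaped a second time.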
Question 5:
My routine will be working on a Unicode string on the clipboard, so is it
still necessary to write to a temp file? Any suggestions on how to make it
so?
If you really want to do that, which I still find hard to believe, my
advice to make it as fast as possible would be to use Smile's "ufind
text", which lets you search for high-ASCII characters. For one
specific task I use the following:
repeat
	set x to ufind text "[^\\u0001-\\u008F]" in w with regexp and string result
	set y to x
	set n to UnicodeNumber(x)
	set y to y & " (" & n & ")"
	set text of w to (uchange x into "&#" & n & ";" in (get text of w))
end repeat
You might put "FFFF" instead of "008F", and change the references to
"w" (a window) into references to a string.
Emmanuel