Re: handling Unicode text
- Subject: Re: handling Unicode text
- From: Christopher Nebel <email@hidden>
- Date: Fri, 16 Jan 2004 00:45:44 -0800
On Jan 15, 2004, at 11:27 PM, Donald Hall wrote:
> I updated my file reading handler as you suggested and got the
> Japanese characters to show in the Script Editor result window.
> However, in a dialog box they show up as question marks. Is there some
> way to have them show up correctly in the dialog box?

Sounds like you're still using Jaguar. "display dialog" was updated to
display Unicode in Panther. (Though not to enter it in the input field
-- separate bug, I'm afraid.)

> Also, I noticed one thing that is inconsistent in how AppleScript
> handles text data (Unicode or MacOSRoman) read in as you suggested.
> (here "theText" contains a list of the paragraphs of the file):
>
> set x to every text item of item 1 of theText
>
> gives the expected result, list of the text items in the first
> paragraph. However the following gives an extra empty list member:
>
> set x to every text item of paragraph 1 of (theText as Unicode text)
> -- or 'as text' in the MacOSRoman case
>
> For example if I have a file created in TextEdit containing one line:
>
> abc<tab>def<tab>ghi<return>
>
> in the first case I get {"abc","def","ghi"}, and in the second case I
> get {"abc","def","ghi", ""}
I think it's behaving fine, and the combination of a list named
"theText" and AppleScript's handling of lf/cr/etc as a paragraph
*separator*, not terminator, has got you confused. Presumably, the
complete script looks something like this:
set AppleScript's text item delimiters to tab
set rawText to (read file "test.txt" as Unicode text)
set theText to every paragraph of rawText
set x to every text item of paragraph 1 of (theText as Unicode text)
Just to keep things simple, let's pretend we're working with your abc
file from above:
set rawText to (read file "test.txt" as Unicode text)
--> rawText is "abc<tab>def<tab>ghi<return>"
--  (Linefeed, really, but never mind that.)
set theText to every paragraph of rawText
--> theText is {"abc<tab>def<tab>ghi", ""}
See that extra item on the end, there? AppleScript treats return
(etc.) as a *separator* of paragraphs, so it actually sees two
paragraphs, not just one -- the second one simply has no text in it.
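The separator rule can be checked without the file. The following is only a minimal sketch: it builds the same content from a literal string, and the names theParagraphs, paragraphCount and lastIsEmpty are illustrative rather than part of the original script.

```applescript
-- Same content as the one-line test file: three tab-separated
-- fields followed by a single return.
set rawText to "abc" & tab & "def" & tab & "ghi" & return

set theParagraphs to every paragraph of rawText

-- The trailing return separates paragraph 1 from an empty paragraph 2.
set paragraphCount to count of theParagraphs
set lastIsEmpty to (item -1 of theParagraphs is "")

{paragraphCount, lastIsEmpty}
--> {2, true}
```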
...theText as Unicode text...
--> returns "abc<tab>def<tab>ghi<tab>"
...because the text item delimiters are still set to tab, so the two
items are glued together with a tab; the result is a trailing tab
because the second item was empty...
set x to every text item of paragraph 1 of the result
"paragraph 1" returns the same string (no returns left now), and "every
text item" of that sees an empty text item at the end thanks to that
trailing tab. Clear now?
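One way to avoid the extra item, sketched here under the assumption that only the first line of the file is wanted, is to split paragraph 1 of the raw text directly instead of coercing the paragraph list back to text. Saving and restoring the delimiters is an added precaution, and savedDelimiters is a hypothetical name.

```applescript
-- Literal stand-in for the contents of the test file.
set rawText to "abc" & tab & "def" & tab & "ghi" & return

-- Remember the current delimiters so they can be put back afterwards.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab

-- No list-to-text coercion happens here, so no trailing tab is glued on.
set x to every text item of paragraph 1 of rawText

set AppleScript's text item delimiters to savedDelimiters

x
--> {"abc", "def", "ghi"}
```

This matches the result of "every text item of item 1 of theText" from the original question.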
--Chris Nebel
AppleScript Engineering
P.S.: Periodically, I consider suppressing that empty last paragraph.
I'm not sure if that would help things or not. On the one hand, it
would probably cause fewer surprises. On the other, it makes the rule
more complicated.