Re: handling Unicode text
- Subject: Re: handling Unicode text
- From: Donald Hall <email@hidden>
- Date: Fri, 16 Jan 2004 00:27:49 -0700
Thanks for the information, Chris.
I updated my file reading handler as you suggested and got the
Japanese characters to show in the Script Editor result window.
However, in a dialog box they show up as question marks. Is there
some way to have them show up correctly in the dialog box?
Also, I noticed an inconsistency in how AppleScript handles text data
(Unicode or MacOSRoman) read in as you suggested. Here "theText"
contains a list of the paragraphs of the file:
    set x to every text item of item 1 of theText
gives the expected result, a list of the text items in the first
paragraph. However, the following gives an extra empty list member:
    set x to every text item of paragraph 1 of (theText as Unicode text)
    -- or 'as text' in the MacOSRoman case
For example, if I have a file created in TextEdit containing one line:
    abc<tab>def<tab>ghi<return>
in the first case I get {"abc", "def", "ghi"}, and in the second case I
get {"abc", "def", "ghi", ""}.
Perhaps I am missing something obvious? The last item really is empty;
it doesn't seem to contain the newline character ('every character of
the last item of x' is an empty list).
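A minimal sketch that reproduces both cases without a file, assuming the text item delimiters are set to tab. The string literal standing in for the file contents and the variable names are illustrative only:

```applescript
-- Stand-in for the file contents: one tab-delimited line ending in a return.
set theRawData to "abc" & tab & "def" & tab & "ghi" & return

-- List of paragraphs, as produced by the reading handler.
set theText to every paragraph of theRawData

-- Assumption: tab is the text item delimiter; the old value is restored below.
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab

-- Case 1: split the first list item directly.
set firstCase to every text item of item 1 of theText

-- Case 2: coerce the whole list to text first, then split paragraph 1.
-- Coercing a list to text joins its items with the current text item delimiters.
set secondCase to every text item of paragraph 1 of (theText as Unicode text)

set AppleScript's text item delimiters to savedDelimiters
{firstCase, secondCase}
```

If the paragraph list ends with an empty item, the list-to-text coercion would append one more tab before the split, which may be where the extra empty text item comes from.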
Thanks,
Don
At 10:00 PM -0800 2004/01/10, email@hidden wrote:
>On Jan 9, 2004, at 11:41 PM, Donald Hall wrote:
>
>> Does AppleScript handle Unicode text that has no ASCII equivalent
>> (e.g. Japanese characters)? I get the contents of a file using the
>> following script:
>>
>> ...
>> set theData to (read dataFileRef from 1 to eof as {Unicode text}
>> using delimiter {return, linefeed})
>> ...
>
>If you're not using Panther, that's your problem there. There's a bug
>prior to Panther where using "using delimiter", "before", or "until"
>forces the file contents to be interpreted as plain text, not
>whatever you specified. You can upgrade to Panther, or read without
>the delimiter and break it apart inside AppleScript, like this:
>
>set theRawData to (read dataFileRef as Unicode text)
>set theData to every paragraph of theRawData
>-- breaks on CR, LF, CRLF, PARASEP, and LINESEP.
>
>Notice that you don't need "from 1 to eof" -- that's the default -- and
>that the "as" type should *not* be in a list. The only reason that's
>still supported is because it appeared in an example in the old
>Scripting Additions Guide.
>
>Once you read Unicode text, you may or may not be able to see it
>correctly in the result window. AppleScript pushes everything through
>styled text internally to display it, so Unicode-only characters such
>as Arabic or Thai will be mangled. Japanese should be fine, though.
>
>--Chris Nebel
>AppleScript Engineering
--
Donald S. Hall, Ph.D.
Apps & More Software Design, Inc.
email@hidden
http://www.appsandmore.com
_______________________________________________
applescript-users mailing list | email@hidden