Re: handling Unicode text

Subject: Re: handling Unicode text
From: Donald Hall <email@hidden>
Date: Fri, 16 Jan 2004 00:27:49 -0700

Thanks for the information, Chris.

I updated my file reading handler as you suggested and got the
Japanese characters to show in the Script Editor result window.
However, in a dialog box they show up as question marks. Is there
some way to have them show up correctly in the dialog box?

Also, I noticed one inconsistency in how AppleScript handles text
data (Unicode or MacOSRoman) read in as you suggested. Here,
"theText" contains a list of the paragraphs of the file:

set x to every text item of item 1 of theText

gives the expected result, a list of the text items in the first
paragraph. However, the following gives an extra empty list member:

set x to every text item of paragraph 1 of (theText as Unicode text)
-- or 'as text' in the MacOSRoman case

For example if I have a file created in TextEdit containing one line:

abc<tab>def<tab>ghi<return>

in the first case I get {"abc","def","ghi"}, and in the second case I
get {"abc","def","ghi", ""}

Perhaps I am missing something obvious? The last item is really empty
- it doesn't seem to contain the new line character. ('every
character of the last item of x' is an empty list)

Thanks,

Don

At 10:00 PM -0800 2004/01/10, email@hidden wrote:
>On Jan 9, 2004, at 11:41 PM, Donald Hall wrote:
>
>> Does AppleScript handle Unicode text that has no ASCII equivalent
>> (e.g. Japanese characters)? I get the contents of a file using the
>> following script:
>>
>> ...
>> set theData to (read dataFileRef from 1 to eof as {Unicode text} using delimiter {return, linefeed})
>> ...
>
>If you're not using Panther, that's your problem there. There's a bug
>prior to Panther where "using delimiter", "before", or "until"
>forces the file contents to be interpreted as plain text, not
>whatever you specified. You can upgrade to Panther, or read without
>the delimiter and break it apart inside AppleScript, like this:
>
> set theRawData to (read dataFileRef as Unicode text)
> set theData to every paragraph of theRawData
> -- breaks on CR, LF, CRLF, PARASEP, and LINESEP.
>
>Notice that you don't need "from 1 to eof" (that's the default) and
>that the "as" type should *not* be in a list. The only reason that's
>still supported is that it appeared in an example in the old
>Scripting Additions Guide.
>
>Once you read Unicode text, you may or may not be able to see it
>correctly in the result window. AppleScript pushes everything through
>styled text internally to display it, so Unicode-only characters such
>as Arabic or Thai will be mangled. Japanese should be fine, though.
>
>
>--Chris Nebel
>AppleScript Engineering

--
Donald S. Hall, Ph.D.
Apps & More Software Design, Inc.
email@hidden
http://www.appsandmore.com