Re: Unicode/UTF confusion


  • Subject: Re: Unicode/UTF confusion
  • From: John Delacour <email@hidden>
  • Date: Sat, 3 May 2003 02:09:49 +0100
  • Mac-eudora-version: 6.0a16

At 3:53 pm -0700 2/5/03, Christopher Nebel wrote:

> That's proper UTF-16. (It doesn't have a byte-order mark, but that's not required.) I'm not sure who wrote that note, but the second sentence is very misleading. There's no support for storing data within AppleScript itself in UTF-8, so saying "s as «class utf8»" has no real effect. However, you can make it write UTF-8 to a file by saying this:
>
> write s to theFile as «class utf8»
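Both of Nebel's points can be checked at the byte level. The rough Python equivalent below uses Python purely for illustration, not AppleScript. It encodes the same text three ways: UTF-16 big-endian with no byte-order mark, the same data with a BOM prepended, and UTF-8. It assumes that AppleScript writes Unicode text as big-endian UTF-16, as it did on the PowerPC Macs of the time.

```python
import codecs

# Sample text: one ASCII letter and one accented letter.
text = "A\u00e9"  # "Aé"

# Big-endian UTF-16 with no byte-order mark (assumed to match what
# AppleScript writes for Unicode text on a PowerPC Mac).
utf16_no_bom = text.encode("utf-16-be")

# The same data with an explicit big-endian BOM (FE FF) prepended.
utf16_with_bom = codecs.BOM_UTF16_BE + utf16_no_bom

# UTF-8, which is what writing with «class utf8» produces.
utf8 = text.encode("utf-8")

print("UTF-16BE, no BOM:", list(utf16_no_bom))    # [0, 65, 0, 233]
print("UTF-16BE + BOM:  ", list(utf16_with_bom))  # [254, 255, 0, 65, 0, 233]
print("UTF-8:           ", list(utf8))            # [65, 195, 169]

# Without a BOM the data is still valid, as long as the byte order is known.
assert utf16_no_bom.decode("utf-16-be") == text
# The generic "utf-16" decoder detects and strips a leading BOM.
assert utf16_with_bom.decode("utf-16") == text
```

The BOM only records the byte order. Once the reader knows the data is big-endian, leaving the BOM out loses nothing.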


A good demonstration of what happens is below. If you experiment with characters in the U+00xx and U+01xx ranges, the result is confusing. Each such character is written as two bytes, and the first byte of the UCS-2 (as I prefer to call it), 0x00 or 0x01, reads back as an invisible control character. It is more confusing still because the text is converted again for display.
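The "invisible first character" effect comes from the high byte of each UTF-16 code unit. The Python sketch below, again only an illustration, splits a few characters into their big-endian UTF-16 bytes. It then checks whether the first byte would be a visible character when read back as Mac Roman. It assumes, as in the script, that the file is read without an `as` clause, which treats each byte as one Mac Roman character. The helper names are hypothetical.

```python
def utf16be_bytes(ch: str) -> tuple[int, int]:
    """Return the (high, low) bytes of a BMP character in UTF-16BE."""
    hi, lo = ch.encode("utf-16-be")
    return hi, lo


def first_byte_visible(ch: str) -> bool:
    """Mac Roman, like ASCII, uses 0x00-0x1F and 0x7F for invisible controls."""
    hi, _ = utf16be_bytes(ch)
    return hi >= 0x20 and hi != 0x7F


# A, é, Ł (U+0141) and the Apple logo (U+F8FF, Mac Roman 240)
for ch in ["A", "\u00e9", "\u0141", "\uf8ff"]:
    hi, lo = utf16be_bytes(ch)
    print(f"U+{ord(ch):04X}: {hi:#04x} {lo:#04x}  "
          f"first byte visible: {first_byte_visible(ch)}")
```

Only the Apple logo, whose high byte is 0xF8, has a visible first byte. That is why it makes a clearer example than any character in the U+00xx or U+01xx ranges.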

Here you can see exactly what happens to the apple character, which lies well outside the invisible range, in each of its transformations. The Mac lozenge (ASCII character 215) serves equally well as an example.


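-- Write the Apple logo (Mac Roman 240) to a temp file, first as
-- AppleScript's default Unicode (UTF-16, no BOM) and then as UTF-8,
-- reading each back as plain Mac Roman text to expose the raw bytes.
set fU to "/tmp/x"
set f to POSIX file fU as file specification
set _apple to (ASCII character 240) as Unicode text -- use 215 for the lozenge

-- 1. Default Unicode write: UTF-16 (UCS-2), no byte-order mark
open for access f with write permission
set eof f to 0
write _apple to f
close access f
set _UCS2 to read f -- no "as" clause: each byte comes back as one Mac Roman character
set _UCS2bytes to {}
repeat with c in characters of _UCS2
	set end of _UCS2bytes to ASCII number c
end repeat

-- 2. The same character written as UTF-8
open for access f with write permission
set eof f to 0
write _apple to f as «class utf8» -- type real chevrons here, not << >>
close access f
set _UTF8 to read f
set _UTF8_bytes to {}
repeat with c in characters of _UTF8
	set end of _UTF8_bytes to ASCII number c
end repeat

{_apple, "UCS-2=>", _UCS2, _UCS2bytes, "UTF-8=>", _UTF8, _UTF8_bytes}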
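For comparison, here is a rough Python mirror of the script above, written for illustration only. It makes two assumptions. First, the UCS-2 file holds big-endian UTF-16. Second, `read f` with no `as` clause interprets each byte as a Mac Roman character, which is what `ASCII number` reports. The `demo` function is a hypothetical name. It covers both the apple (240) and the lozenge (215).

```python
def demo(mac_roman_code: int) -> dict:
    """Mimic the AppleScript test for one Mac Roman character code."""
    ch = bytes([mac_roman_code]).decode("mac_roman")  # 240 -> U+F8FF, 215 -> U+25CA
    ucs2 = ch.encode("utf-16-be")  # assumed format of AppleScript's default write
    utf8 = ch.encode("utf-8")      # what 'write ... as «class utf8»' produces
    return {
        "char": f"U+{ord(ch):04X}",
        "UCS-2 bytes": list(ucs2),
        "UCS-2 as Mac Roman": ucs2.decode("mac_roman"),  # like 'read f'
        "UTF-8 bytes": list(utf8),
        "UTF-8 as Mac Roman": utf8.decode("mac_roman"),
    }


for code in (240, 215):  # Apple logo, lozenge
    print(demo(code))
```

For the apple this gives UCS-2 bytes [248, 255] and UTF-8 bytes [239, 163, 191]. For the lozenge it gives [37, 202] and [226, 151, 138].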


JD
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.

References: Re: Unicode/UTF confusion (From: Christopher Nebel <email@hidden>)