
Re: Unicode versus Utf8


  • Subject: Re: Unicode versus Utf8
  • From: Philip Aker <email@hidden>
  • Date: Sat, 4 Jul 2009 12:17:10 -0700

On 2009-07-04, at 10:32:38, Yvan KOENIG wrote:

Is there a way to get, with a script, the UTF-8 code of a given Unicode character?

Example:

Unicode: 2019
UTF-8: E28099

Both of them are used in the Index.xml files describing the contents of Pages documents.
So it's difficult to identify the bookmark that an internal link points to.

The bookmark descriptor uses the Unicode number (&#x2019;):
<sf:p sf:style="paragraph-style-32">
  <sf:bookmark sf:name="Pages &#x2019;06" sf:ranged="true" sf:page="5">Pages &#x2019;06</sf:bookmark>
  <sf:br/>
</sf:p>

The link descriptor uses the UTF-8 code (’):

<sf:p sf:style="paragraph-style-32"> <sf:link href=""><sf:span sf:style="SFWPCharacterStyle-7">a link</sf:span></sf:link><sf:insertion-point/><sf:br/></sf:p>
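One way to sidestep the mismatch described above is to reduce both spellings to the same Unicode string before comparing them. The following is only a minimal Python 3 sketch under stated assumptions: normalize_name is a hypothetical helper name, and the link values are invented for illustration, since the real href is not shown in the sample. It also assumes that names never contain a literal "%" followed by two hex digits.

```python
import html
from urllib.parse import unquote


def normalize_name(raw: str) -> str:
    """Decode character references (&#x2019;), then any %XX UTF-8 escapes."""
    return unquote(html.unescape(raw))


# Bookmark name as it appears in the sample above.
bookmark = normalize_name("Pages &#x2019;06")

# Hypothetical link targets: one with the literal character,
# one with percent-escaped UTF-8 bytes.
link_literal = normalize_name("Pages \u201906")
link_escaped = normalize_name("Pages%20%E2%80%9906")

# All three reduce to the same string, so they can be matched directly.
print(bookmark == link_literal == link_escaped)
```

If the file is read with a real XML parser, the character references are already decoded by the parser and only the percent-unescaping step would be needed.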

The following are not the most compact solutions; I believe shorter ones are possible in Perl or Python using built-in commands.

1. In AppleScript, you can write the Unicode character to a file (as «class utf8»), read it back one byte at a time, get each byte's value with 'id of' or 'ASCII number', and translate that to hex.

2. Use a Tcl script on disk. I have called mine "utf8chars.tcl", and it looks like this:

#####

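# Read stdin as raw bytes, with no newline or encoding translation.
fconfigure stdin -translation binary
fconfigure stdin -encoding binary
while {1} {
    # Stop once all input has been consumed.
    if {[eof stdin]} {
        return
    }
    # Read everything available, dropping the trailing newline added by echo.
    set bytes [read -nonewline stdin]
    # Convert the bytes to a string of hex digits, two per byte.
    binary scan $bytes H* res
    set length [string length $res]
    # Print each byte as an uppercase hex pair prefixed with "%".
    for {set i 0} {$i < $length - 1} {incr i 2} {
        puts -nonewline %[string toupper [string range $res $i [expr {$i + 1}]]]
    }
}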

#####

I call it from AppleScript. Here I use the character you mentioned:

set char to character id 8217
set tclfile to POSIX path of ((path to desktop folder as text) & "utf8chars.tcl")
do shell script "echo " & quoted form of char & " | tclsh " & quoted form of tclfile & " - "


Philip Aker
echo email@hidden@nl | tr a-z@. p-za-o.@

Democracy: Two wolves and a sheep voting on lunch.
