Re: Unicode versus Utf8
  • Subject: Re: Unicode versus Utf8
  • From: "Mark J. Reed" <email@hidden>
  • Date: Sat, 4 Jul 2009 18:46:44 -0400

No he didn't. There is a definite numeric difference between a Unicode
scalar value and its representation in bytes via UTF-8.  In HTML and
XML you can write U+2019 directly as the numeric character reference
&#x2019; with no conversion required (though converting to decimal is
more portable).  But in a URL you have to first convert to UTF-8 bytes
(E2 80 99 for U+2019) and then %-encode each of those byte values.

do shell script "perl -MURI::Escape=uri_escape_utf8 -le 'print uri_escape_utf8(chr(0x2019))'"
=> %E2%80%99
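
For anyone who would rather see the two steps spelled out, here is a rough Python sketch of the same idea. The function names are made up for illustration, and the href value in the last line is an assumption about what the link in Index.xml looks like once it is fully %-encoded; it is not copied from a real Pages file.

```python
import html
from urllib.parse import quote, unquote


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes of a code point as uppercase hex digits."""
    return chr(code_point).encode("utf-8").hex().upper()


def percent_encoded(code_point: int) -> str:
    """Encode the character as UTF-8, then %-encode every byte."""
    return quote(chr(code_point), safe="")


def same_target(bookmark_name: str, href: str) -> bool:
    """Compare an XML-escaped bookmark name with a %-encoded link fragment.

    Both sides are decoded to plain Unicode text before comparing.
    """
    name = html.unescape(bookmark_name)      # resolves &#x2019; and friends
    fragment = unquote(href.lstrip("#"))     # resolves %E2%80%99 and friends
    return name == fragment


print(utf8_hex(0x2019))          # E28099
print(percent_encoded(0x2019))   # %E2%80%99
# Hypothetical href, assumed to be the %-encoded form of the bookmark name.
print(same_target("Pages &#x2019;06", "#Pages%20%E2%80%9906"))  # True
```

Decoding both strings to Unicode and comparing them avoids having to convert one escaped form into the other. It should be callable from AppleScript through do shell script in the same way as the Perl one-liner.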

On 7/4/09, Matt Neuburg <email@hidden> wrote:
> On Sat, 4 Jul 2009 19:32:38 +0200, Yvan KOENIG <email@hidden> said:
>>Hello
>>
>>Is there a way to get, with a script, the Utf8 code of a given
>>Unicode character ?
>>
>>Example:
>>
>>Unicode: 2019
>>Utf8: E28099
>>
>>Both of them are used in the Index.xml files describing the contents
>>of Pages documents.
>>So, it's difficult to identify the bookmark to which an internal
>>link is pointing.
>>
>>The Bookmark descriptor uses Unicode number:  (&#x2019;)
>><sf:p sf:style="paragraph-style-32">
>>             <sf:bookmark sf:name="Pages &#x2019;06" sf:ranged="true"
>>sf:page="5">Pages &#x2019;06</sf:bookmark>
>>             <sf:br/>
>>           </sf:p>
>>
>>The link descriptor uses Utf8 code. (%E2%80%99)
>>
>><sf:p sf:style="paragraph-style-32"> <sf:link href="#Pages%20%E2%80%9906">
>><sf:span sf:style="SFWPCharacterStyle-7">a link</sf:span></sf:link>
>><sf:insertion-point/><sf:br/></sf:p>
>
> You've misrepresented the problem. There isn't any conversion between
> numbers going on here; you just want to know an equivalence between two
> string representations, one that uses XML entities and the other that uses
> URL-escaping. The freeware scriptable application UnicodeChecker knows all
> about those:
>
> tell application "UnicodeChecker"
>     get escaped representation of (deXHTMLized representation of "&#x2019;")
>     -- "%E2%80%99"
> end tell
>
> m.
>
> --
> matt neuburg, phd = email@hidden, <http://www.tidbits.com/matt/>
> A fool + a tool + an autorelease pool = cool!
> AppleScript: the Definitive Guide - Second Edition!
> http://www.tidbits.com/matt/default.html#applescriptthings
>
>
>
>  _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> AppleScript-Users mailing list      (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> Archives: http://lists.apple.com/archives/applescript-users
>
> This email sent to email@hidden
>

--
Sent from my mobile device

Mark J. Reed <email@hidden>