Re: Unicode versus Utf8


  • Subject: Re: Unicode versus Utf8
  • From: "Mark J. Reed" <email@hidden>
  • Date: Sun, 5 Jul 2009 09:32:35 -0400

To be clear: Matt is correct that we're just talking about two
different representations of Unicode text, and UnicodeChecker is a
fine solution if you're free to install extra software.  But I don't
think it was fair to say Yvan was mischaracterizing the problem, when
conversion between different numeric values is definitely involved.
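
As a rough illustration of that numeric conversion, here is a minimal
sketch in plain shell arithmetic. The function name utf8_three_byte is
made up for this example, and it only handles code points in the range
U+0800 to U+FFFF, which is the three-byte case that covers U+2019.

```shell
# Hypothetical helper: hand-encode one code point (given in hex) as
# percent-escaped UTF-8. Only valid for U+0800..U+FFFF (three bytes).
utf8_three_byte() {
    cp=$((0x$1))                            # hex scalar value -> integer
    b1=$(( 0xE0 | (cp >> 12) ))             # 1110xxxx: top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))     # 10xxxxxx: middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))            # 10xxxxxx: low 6 bits
    printf '%%%02X%%%02X%%%02X\n' "$b1" "$b2" "$b3"
}

utf8_three_byte 2019    # prints %E2%80%99
```

So the scalar value 0x2019 and the byte values E2 80 99 really are
different numbers, related by the bit-shuffling above.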

And Philip: the win here is not so much Perl but the large set of
useful modules that come preinstalled with it.

On 7/4/09, Mark J. Reed <email@hidden> wrote:
> No, he didn't. There is a definite numeric difference between a Unicode
> scalar value and its representation in bytes via UTF-8.  In HTML, and
> XML with numeric entities, you can represent U+2019 as &#x2019; - no
> conversion required (though converting to decimal is more portable).
> But in a URL you have to first convert to UTF-8 bytes and then use
> %-encoding on those byte values.
>
> do shell script "perl -MURI::Escape=uri_escape_utf8 -le 'print uri_escape_utf8(chr(0x2019))'"
> => %E2%80%99
>
> On 7/4/09, Matt Neuburg <email@hidden> wrote:
>> On Sat, 4 Jul 2009 19:32:38 +0200, Yvan KOENIG <email@hidden> said:
>>>Hello
>>>
>>>Is there a way to get, with a script, the Utf8 code of a given
>>>Unicode character?
>>>
>>>Example:
>>>
>>>Unicode: 2019
>>>Utf8: E28099
>>>
>>>Both of them are used in the Index.xml files describing the contents
>>>of Pages documents.
>>>So, it's difficult to identify the bookmark to which an internal
>>>link is pointing.
>>>
>>>The Bookmark descriptor uses Unicode number:  (&#x2019;)
>>><sf:p sf:style="paragraph-style-32">
>>>             <sf:bookmark sf:name="Pages &#x2019;06" sf:ranged="true"
>>>sf:page="5">Pages &#x2019;06</sf:bookmark>
>>>             <sf:br/>
>>>           </sf:p>
>>>
>>>The link descriptor uses Utf8 code. (%E2%80%99)
>>>
>>><sf:p sf:style="paragraph-style-32"> <sf:link href="#Pages %E2%80%9906"><sf:span
>>>sf:style="SFWPCharacterStyle-7">a link</sf:span></sf:link><sf:insertion-point/><sf:br/></sf:p>
>>
>> You've misrepresented the problem. There isn't any conversion between
>> numbers going on here; you just want to know an equivalence between two
>> string representations, one that uses XML entities and the other that
>> uses URL-escaping. The freeware scriptable application UnicodeChecker
>> knows all about those:
>>
>> tell application "UnicodeChecker"
>>     get escaped representation of (deXHTMLized representation of "&#x2019;")
>>     -- "%E2%80%99"
>> end tell
>>
>> m.
>>
>> --
>> matt neuburg, phd = email@hidden, <http://www.tidbits.com/matt/>
>> A fool + a tool + an autorelease pool = cool!
>> AppleScript: the Definitive Guide - Second Edition!
>> http://www.tidbits.com/matt/default.html#applescriptthings
>>
>>
>>
>>  _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> AppleScript-Users mailing list      (email@hidden)
>> Help/Unsubscribe/Update your Subscription:
>> Archives: http://lists.apple.com/archives/applescript-users
>>
>> This email sent to email@hidden
>>
>
> --
> Sent from my mobile device
>
> Mark J. Reed <email@hidden>
>

--
Sent from my mobile device

Mark J. Reed <email@hidden>