Re: Unicode versus Utf8
- Subject: Re: Unicode versus Utf8
- From: "Mark J. Reed" <email@hidden>
- Date: Sun, 5 Jul 2009 09:32:35 -0400
To be clear: Matt is correct that we're just talking about two
different representations of Unicode text, and UnicodeChecker is a
fine solution if you're free to install extra software. But I don't
think it was fair to say Yvan was mischaracterizing the problem, when
conversion between different numeric values is definitely involved.
And Philip: the win here is not so much Perl but the large set of
useful modules that come preinstalled with it.
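
For anyone who wants the numeric conversion spelled out, here is a minimal sketch in Python. Python is only an illustration here, not something used elsewhere in this thread, and the function names (utf8_hex, percent_encode, same_target) are hypothetical helpers made up for the example. It shows the two steps (scalar value to UTF-8 bytes, then %-encoding of those bytes) plus one possible way to check whether an entity-style bookmark name and a %-escaped href refer to the same text.

```python
from html import unescape
from urllib.parse import quote, unquote


def utf8_hex(code_point: int) -> str:
    """Hypothetical helper: UTF-8 bytes of a Unicode scalar value, as uppercase hex."""
    return chr(code_point).encode("utf-8").hex().upper()


def percent_encode(code_point: int) -> str:
    """Hypothetical helper: encode the character as UTF-8, then %-escape every byte."""
    return quote(chr(code_point), safe="")


def same_target(bookmark_name: str, href: str) -> bool:
    """Hypothetical helper: compare an entity-style name with a %-escaped '#...' href.

    Both sides are decoded to plain Unicode text before comparing.
    """
    return unescape(bookmark_name) == unquote(href.lstrip("#"))


print(utf8_hex(0x2019))        # E28099
print(percent_encode(0x2019))  # %E2%80%99
print(same_target("Pages &#x2019;06", "#Pages %E2%80%9906"))  # True
```

Decoding both strings to plain text and comparing avoids having to convert one escaped form directly into the other.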
On 7/4/09, Mark J. Reed <email@hidden> wrote:
> No he didn't. There is a definite numeric difference between a Unicode
> scalar value and its representation in bytes via UTF-8. In HTML, and
> XML with numeric entities, you can represent U+2019 as &#x2019; - no
> conversion required (though converting to decimal is more portable).
> But in a URL you have to first convert to UTF-8 bytes and then use
> %-encoding on those byte values.
>
> do shell script "perl -MURI::Escape=uri_escape_utf8 -le 'print uri_escape_utf8(chr(0x2019))'"
> => %E2%80%99
>
> On 7/4/09, Matt Neuburg <email@hidden> wrote:
>> On Sat, 4 Jul 2009 19:32:38 +0200, Yvan KOENIG <email@hidden> said:
>>>Hello
>>>
>>>Is there a way to get, with a script, the Utf8 code of a given
>>>Unicode character ?
>>>
>>>Example:
>>>
>>>Unicode: 2019
>>>Utf8: E28099
>>>
>>>Both of them are used in the Index.xml files describing the contents
>>>of Pages documents.
>>>So, it's difficult to identify the bookmark to which an internal
>>>link is pointing.
>>>
>>>The Bookmark descriptor uses Unicode number: (&#x2019;)
>>><sf:p sf:style="paragraph-style-32">
>>> <sf:bookmark sf:name="Pages &#x2019;06" sf:ranged="true"
>>>sf:page="5">Pages &#x2019;06</sf:bookmark>
>>> <sf:br/>
>>> </sf:p>
>>>
>>>The link descriptor uses Utf8 code. (%E2%80%99)
>>>
>>><sf:p sf:style="paragraph-style-32"> <sf:link href="#Pages %E2%80%9906">
>>><sf:span sf:style="SFWPCharacterStyle-7">a link</sf:span></sf:link>
>>><sf:insertion-point/><sf:br/></sf:p>
>>
>> You've misrepresented the problem. There isn't any conversion between
>> numbers going on here; you just want to know an equivalence between two
>> string representations, one that uses XML entities and the other that
>> uses URL-escaping. The freeware scriptable application UnicodeChecker
>> knows all about those:
>>
>> tell application "UnicodeChecker"
>> get escaped representation of (deXHTMLized representation of "&#x2019;")
>> -- "%E2%80%99"
>> end tell
>>
>> m.
>>
>> --
>> matt neuburg, phd = email@hidden, <http://www.tidbits.com/matt/>
>> A fool + a tool + an autorelease pool = cool!
>> AppleScript: the Definitive Guide - Second Edition!
>> http://www.tidbits.com/matt/default.html#applescriptthings
>>
>>
>>
>> _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> AppleScript-Users mailing list (email@hidden)
>> Help/Unsubscribe/Update your Subscription:
>> Archives: http://lists.apple.com/archives/applescript-users
>>
>> This email sent to email@hidden
>>
>
> --
> Sent from my mobile device
>
> Mark J. Reed <email@hidden>
>
--
Sent from my mobile device
Mark J. Reed <email@hidden>