Re: Convert unicode string into ascii
- Subject: Re: Convert unicode string into ascii
- From: Ricky Sharp <email@hidden>
- Date: Thu, 28 Aug 2008 15:58:10 -0500
On Aug 28, 2008, at 3:40 PM, Andrew Farmer wrote:
> On 28 Aug 08, at 12:08, Ricky Sharp wrote:
>> Just to point this out, the sequence of ASCII may not be useful at
>> all if the file is say Unicode. The actual bytes making up each
>> char could be "ASCII" values themselves.
> Unicode is a character set, not an encoding. I'm not sure about
> UTF-16 or other stranger encodings, but I do know that any UTF-8
> character below 0x80 corresponds directly to a single ASCII
> character. This is a design feature of the encoding.
Yeah, what I wrote wasn't clear. I meant if the file _contains_
Unicode text.
You're correct that the UTF-8 encoding preserves all 7-bit ASCII
characters as-is. When you get into the UTF-16 and UTF-32 variants,
though, the individual bytes of a character can fall anywhere in the
range [0x00..0xFF], including values that happen to be valid ASCII.
So, CJK UNIFIED IDEOGRAPH-4142 (U+4142) stored in UTF-16BE is the
byte pair 0x41 0x42, which will appear as "AB".
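
To make the byte-level point concrete, here is a minimal Python sketch (not from the original thread; the variable and helper names are only illustrative). It encodes U+4142 as UTF-16BE and as UTF-8 and compares the resulting bytes:

```python
# Illustrative sketch: how U+4142 looks at the byte level in two encodings.

def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)

ch = "\u4142"                      # CJK UNIFIED IDEOGRAPH-4142

utf16be = ch.encode("utf-16-be")   # two bytes, high byte first
utf8 = ch.encode("utf-8")          # three bytes, all >= 0x80

print(hex_bytes(utf16be))          # 41 42
print(utf16be.decode("ascii"))     # AB  (the same bytes read as ASCII)
print(hex_bytes(utf8))             # e4 85 82

# For code points below 0x80, UTF-8 and ASCII produce identical bytes.
print("AB".encode("utf-8") == "AB".encode("ascii"))   # True
```

In UTF-8 none of the three bytes can be mistaken for ASCII, while the UTF-16BE form is indistinguishable from the ASCII text "AB" unless the encoding is known.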
I know the OP mentioned this is for a hex editor. I just looked at
what my copy of "0xED" does. It too reduces the bytes to ASCII, just
as the OP described. The app also offers a detail view which shows the
current selection of bytes as common data types. One of those types is
a string, which is interpreted using an encoding of the user's choice.
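
The two views described above can be sketched roughly as follows. This is a hypothetical Python illustration, not 0xED's actual implementation; the function names `ascii_column` and `interpret` are invented for the example, and showing non-printable bytes as "." is an assumption based on the common hex-editor convention.

```python
# Hypothetical sketch of a hex editor's two text views (not 0xED's code).

def ascii_column(data: bytes) -> str:
    """Reduce raw bytes to printable ASCII, showing '.' for everything else."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)

def interpret(data: bytes, encoding: str) -> str:
    """Decode the selected bytes using an encoding chosen by the user.

    Bytes that are invalid in that encoding are replaced rather than
    raising an error, since a selection may cut a character in half.
    """
    return data.decode(encoding, errors="replace")

selection = b"\x41\x42"

print(ascii_column(selection))            # AB
print(interpret(selection, "ascii"))      # AB
print(interpret(selection, "utf-16-be"))  # the single character U+4142
print(ascii_column(b"AB\x00C"))           # AB.C
```

The same two bytes give "AB" in the ASCII column but a single ideograph once the selection is decoded as UTF-16BE, which is why the detail view needs the user to pick the encoding.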
___________________________________________________________
Ricky A. Sharp mailto:email@hidden
Instant Interactive(tm) http://www.instantinteractive.com
_______________________________________________
Cocoa-dev mailing list (email@hidden)