Re: converting text input in any encoding to unicode
- Subject: Re: converting text input in any encoding to unicode
- From: Ben Dougall <email@hidden>
- Date: Sun, 27 Apr 2003 15:17:42 +0100
On Sunday, April 27, 2003, at 01:57 pm, Clark Cox III wrote:
On Sunday, April 27, 2003, at 07:32AM, Ben Dougall
<email@hidden> wrote:
what's the best / usual way from a cocoa app to read in text that's
potentially encoded with any encoding, in order to store it internally
in your app in decomposed unicode? i'd like to be able to deal with as
many encodings as possible - and convert them to the base decomposed
unicode format in order to compare different texts confidently.
In order to do that, you'd need to have some idea of what encoding the
text is in. You can try to discern some encodings, but others will be
impossible to differentiate just from looking at the text itself.
surely most (all?) text files record not only which characters they
contain but also which encoding they're in? i'd have thought that was a
standard requirement for text?
You can usually identify Unicode text via the BOM, and you can be
pretty sure that if the text does not contain any bytes that are
greater than 127, then it can be interpreted as ASCII. Other than
that, you'd need some other hint as to the text's encoding.
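The two checks described above (look for a BOM, otherwise see whether every byte is 127 or below) can be sketched roughly as follows. This is only an illustration in Python rather than Cocoa; the function name `sniff_encoding` and its return shape are assumptions made for the example, not part of any Apple API.

```python
import codecs

# BOMs are checked longest-first so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes):
    """Hypothetical helper: return (encoding, bom_length).

    Returns (None, 0) when the bytes alone give no reliable clue.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No byte greater than 127: safe to interpret as ASCII.
    if all(b < 128 for b in data):
        return "ascii", 0
    # Anything else needs an outside hint (declared charset, user choice, ...).
    return None, 0
```

Text in a legacy 8-bit encoding such as ISO 8859-7 falls through to the last branch, which is exactly the case where the bytes by themselves are not enough.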
unicode is one char encoding out of goodness knows how many. i guess
different text systems have different methods for indicating which char
encoding? html and xml indicate within the text itself which encoding
it's in. i'd have thought all other text formats also indicate which
encoding they're in, in one way or another - i guess 'in one way or
another' is a stumbling block maybe. but there must be an already
existing method to do that to a reasonable extent?
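For the html and xml case, where the encoding is named inside the text itself, a rough sketch of pulling out that declaration might look like this. It is written in Python purely for illustration, `declared_encoding` is a made-up name, and the regular expressions are a simplification that assumes an ASCII-compatible encoding and a declaration near the start of the data; a real parser does considerably more.

```python
import re

# <?xml version="1.0" encoding="ISO-8859-7"?>
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
# <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
# <meta charset="UTF-8">
_HTML_META = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def declared_encoding(data: bytes):
    """Hypothetical helper: return the encoding name the document claims.

    Returns None if no declaration is found in the first 1024 bytes.
    """
    head = data[:1024]  # declarations are expected near the top
    match = _XML_DECL.search(head) or _HTML_META.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Plain text files with no such header return None here, which is the 'in one way or another' problem: for those, the hint has to come from outside the file.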
there's CFString stuff - is that the usual way to go about doing this?
On Sunday, April 27, 2003, at 02:26 pm, Jonathan Jackel wrote:
I'm not a unicode maven by any stretch, but NSString has methods for
creating decomposed strings.
doesn't some extra processing need to occur before that though? is that
not assuming that the input string is of some sort of unicode format in
the first place? or not, i don't know? what happens if the input string
was encoded in, say, greek iso 8859-7, or iso latin 2? can
NSString deal with ascertaining which encoding the inputted string is
in, to be able to unicodify correctly? and any input string could be
any char encoding in the world.
also there's two issues there - reading the encoding from the input
text somehow, and converting to unicode using that encoding.
obviously unicode is a great way to be able to treat all text in the
same way - a base char encoding. but what about getting text into
unicode? can NSString do that?
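Once the encoding is known, the second of the two issues (converting to decomposed unicode) is the mechanical part. In Cocoa the corresponding calls would presumably be NSString's initWithData:encoding: followed by decomposedStringWithCanonicalMapping; the sketch below shows the same two steps in Python only to make the idea concrete. `to_decomposed` is a hypothetical name, and the encoding is assumed to be supplied by the caller.

```python
import unicodedata


def to_decomposed(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode bytes, then normalize to NFD."""
    # Step 1: bytes in a known encoding -> Unicode text.
    # Raises UnicodeDecodeError if the bytes are not valid in that encoding.
    text = data.decode(encoding)
    # Step 2: canonical decomposition, so precomposed and decomposed
    # spellings of the same character compare equal.
    return unicodedata.normalize("NFD", text)


# The same Greek letter (alpha with tonos) arriving two different ways:
from_greek = to_decomposed("\u03ac".encode("iso8859-7"), "iso8859-7")
from_utf8 = to_decomposed("\u03b1\u0301".encode("utf-8"), "utf-8")
same_text = from_greek == from_utf8
```

After normalization both inputs are the sequence U+03B1 U+0301, so texts that started out in different encodings can be compared directly.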