Re: converting text input in any encoding to unicode
- Subject: Re: converting text input in any encoding to unicode
- From: Douglas Davidson <email@hidden>
- Date: Mon, 28 Apr 2003 10:05:20 -0700
On Monday, April 28, 2003, at 6:52 AM, Ben Dougall wrote:
> I think I need to look at the TextEdit app example that's in the dev tools,
> to see how it deals with finding out (or not finding out, as the case
> may be) which encoding it's opening.
The general procedure Cocoa uses is to look for a byte order mark (BOM)
of one variety or another and, if none is found, to fall back to the
default C string encoding. That encoding would be Mac Roman on an English
system, and possibly something else on an Asian-localized system; the idea
is to maximize the chance of correctly interpreting existing files
generated by previous systems, and to require the user to override the
choice for anything else.
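A minimal Python sketch of the BOM-then-fallback procedure described above, for illustration only; it is not Cocoa's implementation. The function name decode_with_bom_fallback is hypothetical, and Python's "mac_roman" codec stands in for the default C string encoding an English system would report (Cocoa exposes that value as +[NSString defaultCStringEncoding]).

```python
import codecs
from typing import Tuple

# BOMs checked in order. The UTF-32 LE mark (FF FE 00 00) begins with the
# UTF-16 LE mark (FF FE), so the UTF-32 marks must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_with_bom_fallback(data: bytes, default_encoding: str = "mac_roman") -> Tuple[str, str]:
    """Decode data using its BOM if present, else the default encoding.

    Returns (text, encoding_name). The BOM itself is stripped from text.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name), name
    # No BOM: assume the system's default 8-bit encoding (Mac Roman here,
    # standing in for an English system's default C string encoding).
    return data.decode(default_encoding), default_encoding
```

Because Mac Roman assigns a character to every byte value, the fallback step will not fail on arbitrary data; it can only produce the wrong text, which is why the user needs a way to override it.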
There are other things that could be done; for example, some formats,
such as XML and occasionally HTML, carry an internal declaration of
their encoding. A sufficiently sophisticated system might be able to make
a reasonable educated guess about a file's encoding by examining enough
of its data, but this would never be more than a guess, and would be
unlikely to cover more than a small handful of encodings.
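A rough Python sketch of both ideas from the paragraph above, under stated assumptions: it looks for an XML declaration or an HTML meta charset in the first kilobyte and otherwise applies one simple heuristic (non-ASCII data that decodes as strict UTF-8 is probably UTF-8). The name guess_encoding and the 1024-byte sniff window are illustrative choices, not anything Cocoa provides, and the regular expressions assume the declaration is written in an ASCII-compatible encoding, so a UTF-16 XML file would need its BOM handled first.

```python
import re
from typing import Optional

# XML requires the declaration at the very start, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

# HTML: <meta charset="utf-8"> or the older
# <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
_HTML_META = re.compile(rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def guess_encoding(data: bytes, sniff_len: int = 1024) -> Optional[str]:
    """Return a lowercase encoding name, or None if no confident guess is possible."""
    head = data[:sniff_len]

    # 1. An explicit declaration inside the document wins.
    match = _XML_DECL.match(head)
    if match is None:
        match = _HTML_META.search(head)
    if match is not None:
        return match.group(1).decode("ascii").lower()

    # 2. Heuristic: legacy 8-bit text rarely forms valid UTF-8 sequences,
    #    so non-ASCII data that decodes strictly as UTF-8 is probably UTF-8.
    if any(byte >= 0x80 for byte in data):
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return None
        return "utf-8"

    # Pure ASCII: every ASCII-compatible encoding reads it the same way,
    # so there is nothing to distinguish.
    return None
```

Returning None leaves the decision to the caller, which would then fall back to the default encoding or ask the user, in keeping with the point that this is never more than a guess.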
Douglas Davidson
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev