Re: Data to String: what encoding?
- Subject: Re: Data to String: what encoding?
- From: Ondra Cada <email@hidden>
- Date: Mon, 16 Sep 2002 14:50:34 +0200
On Monday, September 16, 2002, at 02:13 , Randall Crenshaw wrote:
> Um, ok - now I'm really confused. Just what is an
> 'encoding' anyway? I have been assuming that an encoding
> is something like ASCII where ('A' == 0x0101) except that
> in some other encoding, ('A' == 0x01000101 ) or something
> like that.
Right.
> (Byte values not intended to be accurate.) So,
> as a pure bytestream, there would be no internal clues, but
> if you say "this is text" there should be some inherent
> characteristics of the bytestream.
Wrong. "This is text" means "this pure bytestream is to be decoded into
text using *an* encoding". Alas, unless it's UTF-16 with its leading 0xFEFF
byte-order mark, there's no clue *which* one.
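
To make that concrete, here is a minimal sketch. It is in Python rather than Cocoa only because it is easy to run; the idea is the same for NSString. The three byte values and the helper name sniff_utf16_bom are my own illustration, not anything from Foundation. The same bytes give different text under two encodings, and the only internal hint a bytestream can carry is the UTF-16 byte-order mark.

```python
import codecs

# Three arbitrary bytes, chosen for illustration only.
raw = bytes([0xE8, 0x61, 0x6A])

# The same bytes decode to different strings depending on the encoding.
as_latin1 = raw.decode("latin-1")  # 0xE8 is U+00E8 in ISO Latin 1
as_cp1250 = raw.decode("cp1250")   # 0xE8 is U+010D in Windows-1250


def sniff_utf16_bom(data):
    """Return a UTF-16 codec name if data starts with a byte-order mark, else None.

    Hypothetical helper: it only recognises the UTF-16 BOM and says
    nothing about any other encoding.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None
```

For anything without the mark, sniff_utf16_bom returns None, which is exactly the "no clue" case.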
The approach TextEdit takes is extremely simple: "I don't care". The user
selects the encoding, and if the displayed string does not look right, he
can freely try again. In the *vast* majority of cases this is quite all
right, at worst slightly annoying.
If that is not sufficient, well, you are in for a bunch of heuristics. The
best you can do to choose the encoding programmatically is to try one and
check whether the result is proper text (which is why I've mentioned the
spellchecker). Again, although you can get very good results this way,
you'd *NEVER* get a 100% guarantee that the heuristic chose well -- which
means that you still have to provide a way for the user to override the
automatically chosen encoding.
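
A rough sketch of such a heuristic, again in Python and under loud assumptions: guess_encoding, known_word_ratio and the tiny KNOWN_WORDS set are hypothetical names of mine, and the word set is only a crude stand-in for a real spellchecker. Each candidate encoding is tried, the decoded text is scored, the best score wins, and an explicit override from the user always takes precedence.

```python
# Stand-in for a spellchecker dictionary; purely illustrative.
KNOWN_WORDS = {"čaj", "a", "káva"}


def known_word_ratio(text):
    """Score decoded text as the fraction of its words found in KNOWN_WORDS."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in KNOWN_WORDS for w in words) / len(words)


def guess_encoding(data, candidates, looks_like_text, override=None):
    """Return (encoding, text) for the best-scoring candidate encoding.

    If override is given, the user's choice is used and no guessing happens.
    """
    if override is not None:
        return override, data.decode(override)
    best = None
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        score = looks_like_text(text)
        if best is None or score > best[0]:
            best = (score, enc, text)
    if best is None:
        raise ValueError("no candidate encoding could decode the data")
    return best[1], best[2]


sample = "čaj a káva".encode("cp1250")
encoding, text = guess_encoding(sample, ["utf-8", "latin-1", "cp1250"], known_word_ratio)
```

The result is only as good as the scorer, and with a short or unusual text several encodings can score the same, so the override parameter is not optional decoration.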
> For example, if I read the file from disk into an NSString,
... you have to specify the encoding first. If you don't do that
explicitly (e.g. when using [NSString stringWithContentsOfFile:]), there is
no magic: a more or less fixed default encoding is used.
I am not entirely sure how the default encoding is specified now; it used
to be put into a plain file in Foundation.framework/Resources, but that is
not true anymore. I guess now the default encoding is somehow linked to
the script chosen in the International pref panel, but I don't know the
details, and if it is documented, I don't know where.
> I can then convert to NSData using -fastestEncoding. This
> would appear to solve the problem
It does not solve anything. If you want to read in an NSData and then make
a string from it, simulating the +stringWithContentsOfFile: behaviour,
use -initWithData:encoding: with [NSString defaultCStringEncoding] as the
encoding.
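
The same pattern as a small Python sketch, by way of analogy only. The function name decode_with_default is hypothetical, and treating the locale's preferred encoding as the counterpart of [NSString defaultCStringEncoding] is my assumption; the two are similar in spirit, not identical. The point is that raw bytes become a string only once some encoding, explicit or default, has been picked.

```python
import locale


def decode_with_default(data, encoding=None):
    """Decode bytes with an explicit encoding, or fall back to the platform default."""
    if encoding is None:
        # Assumption: the locale's preferred encoding stands in for
        # [NSString defaultCStringEncoding] in this sketch.
        encoding = locale.getpreferredencoding(False)
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return data.decode(encoding, errors="replace")
```

Passing the encoding explicitly is the reliable path; the fallback merely reproduces the "more or less fixed default" behaviour.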
> Sorry - I'm sure it's apparent I'm at the edge of my
> empirical understanding of things. Any books that cover
> this stuff?
This particular question is simple enough that it does not need a book. As
for text processing in Cocoa, well, we planned a pretty comprehensive one
with Simson Garfinkel, but O'Reilly did not want it -- and I have no
contacts with other publishers. Oops.
Incidentally, Douglas (or anyone who knows) -- is it possible in 10.2 to
add one's own encodings (without dirty hacks, of course; with a bundle
which patches all the encoding-related methods in Cocoa it has been
possible for ages)? Thanks,
---
Ondra Cada
OCSoftware: email@hidden
http://www.ocs.cz
private email@hidden
http://www.ocs.cz/oc