Re: Data to String: what encoding?

  • Subject: Re: Data to String: what encoding?
  • From: Ondra Cada <email@hidden>
  • Date: Mon, 16 Sep 2002 14:50:34 +0200

On Monday, September 16, 2002, at 02:13 , Randall Crenshaw wrote:

> Um, ok - now I'm really confused. Just what is an
> 'encoding' anyway? I have been assuming that an encoding
> is something like ASCII where ('A' == 0x0101) except that
> in some other encoding, ('A' == 0x01000101 ) or something
> like that.

Right.

> (Byte values not intended to be accurate.) So,
> as a pure bytestream, there would be no internal clues, but
> if you say "this is text" there should be some inherent
> characteristics of the bytestream.

Wrong. "This is text" means "this pure bytestream is to be decoded into text using *an* encoding". Alas, unless it's UTF-16 with its leading 0xFEFF byte-order mark,
there's no clue *which* one.
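
To make the point concrete, here is a minimal sketch in Python (not Cocoa; the function name `sniff_utf16_bom` is made up for this illustration). It shows the one reliable clue mentioned above, the UTF-16 byte-order mark, and then shows the same single byte decoding to two different characters depending on which encoding is assumed.

```python
import codecs

def sniff_utf16_bom(data: bytes):
    """Return a UTF-16 codec name if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration only. A UTF-32 LE BOM also begins
    with FF FE; this sketch deliberately ignores that case.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    return None                                # no clue at all

# Without a BOM, the bytes themselves do not say which encoding applies:
raw = b"\xe8"
as_latin1 = raw.decode("iso-8859-1")   # LATIN SMALL LETTER E WITH GRAVE
as_latin2 = raw.decode("iso-8859-2")   # LATIN SMALL LETTER C WITH CARON
```

Both decodes succeed, and both results are perfectly valid text; only a human (or a heuristic) can say which one was intended.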

The approach TextEdit takes is extremely simple: "I don't care". The user selects the encoding, and if the displayed string does not look right, he can freely try again. In the *vast* majority of cases this is quite all right, at worst slightly annoying.

If that is not sufficient, well, you are in for a bunch of heuristics. The best you can do to choose the encoding programmatically is to try one and check whether the result is proper text (which is why I mentioned the spellchecker). Again, although you can get very good results this way, you will *NEVER* get a 100% guarantee that the heuristic chose correctly -- which means that you still have to provide a way for the user to override the automatically chosen encoding.
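
A rough sketch of such a heuristic, in Python rather than Cocoa. Everything here is an assumption made for illustration: the names `guess_encoding` and `looks_like_text`, the candidate list, and above all the plausibility check, which merely rejects control characters and stands in for a real check such as a spellchecker.

```python
import unicodedata

def looks_like_text(text: str) -> bool:
    """Crude stand-in for a real plausibility check (e.g. a spellchecker):
    accept the string only if it has no control characters besides whitespace."""
    return all(ch in "\t\n\r" or unicodedata.category(ch) != "Cc" for ch in text)

def guess_encoding(data: bytes, candidates=("utf-8", "cp1250", "iso-8859-1"),
                   override=None):
    """Return (encoding_name, decoded_text).

    If the user supplied an override, it always wins. Otherwise try each
    candidate in order and keep the first that decodes and looks plausible.
    """
    if override is not None:
        return override, data.decode(override)
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue                      # not valid in this encoding
        if looks_like_text(text):
            return name, text
    # Last resort: ISO-8859-1 maps every byte to some character.
    return "iso-8859-1", data.decode("iso-8859-1")
```

Note that `guess_encoding(b"\xe8")` picks cp1250 simply because it comes earlier in the candidate list, even if the file was really ISO-8859-1; that is exactly why the `override` parameter has to exist.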

> For example, if I read the file from disk into an NSString,

... you have to specify the encoding first. If you don't do that explicitly -- e.g. when you use [NSString stringWithContentsOfFile:] -- there is no magic: a more or less fixed default encoding is used.

I am not entirely sure how the default encoding is specified now; it used to be put into a plain file in Foundation.framework/Resources, but that is not true anymore. I guess the default encoding is now somehow linked to the script chosen in the International pref panel, but I don't know the details, and if it is documented, I don't know where.

> I can then convert to NSData using -fastestEncoding. This
> would appear to solve the problem

It does not solve anything. If you want to read in an NSData and then make a string from it, simulating the +stringWithContentsOfFile: behaviour, use [[NSString alloc] initWithData:data encoding:[NSString defaultCStringEncoding]].
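
The same idea as a language-neutral sketch in Python (an analogy, not the Cocoa API; `string_from_data` is a hypothetical name, and the locale's preferred encoding plays the role of the default C-string encoding): raw bytes become a string only once an encoding is named, and "no encoding given" just means a fixed default is silently substituted.

```python
import locale

def string_from_data(data: bytes, encoding=None) -> str:
    """Decode raw bytes into text.

    When no encoding is given, fall back to the platform default, which
    here is assumed to be the locale's preferred encoding. No detection
    of the real encoding takes place.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return data.decode(encoding)
```

Calling `string_from_data(data)` and `string_from_data(data, "utf-8")` on the same bytes can give different strings, or the first can fail outright, whenever the default does not match how the bytes were written.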

> Sorry - I'm sure it's apparent I'm at the edge of my
> empirical understanding of things. Any books that cover
> this stuff?

This particular question is simple enough that it does not need a book. As for text processing in Cocoa, well, we planned a pretty comprehensive one with Simson Garfinkel, but O'Reilly did not want it -- and I have no contacts at other publishers. Oops.

Incidentally, Douglas (or anyone who knows) -- is it possible in 10.2 to add one's own encodings (without dirty hacks, of course -- with a bundle which patches all the encoding-related methods in Cocoa it has been possible for ages)? Thanks,
---
Ondra Cada
OCSoftware: email@hidden http://www.ocs.cz
private email@hidden http://www.ocs.cz/oc
_______________________________________________
cocoa-dev mailing list | email@hidden