Re: text encodings

  • Subject: Re: text encodings
  • From: Dan Wood <email@hidden>
  • Date: Fri, 22 Nov 2002 07:59:53 -0800

> we're currently writing a text-conversion plug-in for our app, and are quite unsure which encoding to choose. :)
> we got plaintext to start with and would like to export these texts to unix-, mac-, windows-, unicode-format. now... these are our options:

You'll probably be best off supporting all of the formats and letting the user decide what they want. Everybody is going to have different needs; perhaps you could take the most common ones and present them first. (If you look in the documentation and header comments, you'll probably find the official names of these encodings.)


> NSASCIIStringEncoding = 1,


This is just plain ASCII text, with no high-bit characters. If your text contains accented European characters, as in "résumé", or is in any language other than English, you don't want this unless your user specifically needs the text to be 7-bit ASCII.
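To show how strict plain ASCII is, here is a small sketch in Python, used only as a convenient stand-in (it is not Cocoa code). Python's "ascii" codec refuses any character above 127, which is roughly what you see in Cocoa when -canBeConvertedToEncoding: answers NO or -dataUsingEncoding: returns nil. The helper `can_encode` is a hypothetical name.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of text fits in encoding."""
    try:
        text.encode(encoding)  # strict mode raises on any unrepresentable character
        return True
    except UnicodeEncodeError:
        return False


print(can_encode("plain English", "ascii"))  # True
print(can_encode("résumé", "ascii"))         # False: 'é' is above 127
print(can_encode("résumé", "latin-1"))       # True: 'é' is a single byte in Latin-1
```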


> NSNEXTSTEPStringEncoding = 2,

This is an old encoding used by NeXT computers -- probably not in much demand now, but there anyhow.

> NSJapaneseEUCStringEncoding = 3,
> NSShiftJISStringEncoding = 8,

A couple of ways of encoding Japanese characters, useful if your text might be Japanese.
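Why the choice between the two Japanese encodings matters can be seen in a short Python sketch (Python's codecs stand in for Cocoa here): the same characters come out as different bytes, so a reader has to know which one was used.

```python
# Python stand-in, not Cocoa: three kanji ("Japanese language") in two encodings.
text = "日本語"

sjis = text.encode("shift_jis")
euc = text.encode("euc_jp")

print(sjis.hex(" "))  # Shift_JIS bytes
print(euc.hex(" "))   # EUC-JP bytes: same characters, different byte values

# Each round-trips through its own codec, but the byte streams are not interchangeable.
assert sjis.decode("shift_jis") == text
assert euc.decode("euc_jp") == text
assert sjis != euc
```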

> NSUTF8StringEncoding = 4,

VERY useful: this encodes each 7-bit ASCII character as a single byte, and every other Unicode character as a sequence of two to four bytes. That allows text to be fully Unicode, yet be byte-for-byte identical to ASCII when it's just plain English. You can learn about this format with a bit of Web searching.
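The variable-length scheme described above fits in a few lines. This is a hand-rolled Python sketch of the UTF-8 bit layout for a single code point (`utf8_encode` is a hypothetical name; in practice you'd let NSString or a library do this), checked against Python's built-in codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one code point as UTF-8 by hand.
    (No validation: surrogates and values above 0x10FFFF are not rejected.)"""
    if code_point < 0x80:            # 7-bit ASCII: one byte, unchanged
        return bytes([code_point])
    if code_point < 0x800:           # two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (up to U+10FFFF)
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in ["A", "é", "日", "😀"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")      # agrees with Python's built-in codec
    print(repr(ch), len(encoded), "byte(s)")  # 1, 2, 3 and 4 bytes respectively
```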

> NSISOLatin1StringEncoding = 5,

This is a common way for Western European characters to be encoded as single bytes, using the values 128-255 above the ASCII range (the printable accented letters sit at 160-255). It's an international, cross-platform standard (ISO 8859-1), and many web sites deliver their text in this format.
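One property of Latin-1 worth knowing, shown as a Python sketch rather than Cocoa code: byte value N always decodes to Unicode code point N, so every possible byte sequence decodes without error. That makes it a tempting fallback, but also means a wrong guess never fails loudly.

```python
# Python stand-in, not Cocoa. Every one of the 256 byte values is valid Latin-1 ...
decoded = bytes(range(256)).decode("latin-1")

# ... and byte value N becomes code point N.
assert [ord(ch) for ch in decoded] == list(range(256))

# The flip side: UTF-8 data misread as Latin-1 "succeeds" and yields mojibake.
mojibake = "résumé".encode("utf-8").decode("latin-1")
print(mojibake)  # rÃ©sumÃ©
```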


> NSSymbolStringEncoding = 6,

This is the encoding for the character set of the Adobe Symbol font (Greek letters, math operators and the like). Don't know many specifics beyond that.

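> NSNonLossyASCIIStringEncoding = 7,

This is sort of like 1, in that the output is pure 7-bit ASCII, but it doesn't throw anything away: any character that isn't ASCII gets written out as a backslash escape sequence, so the original Unicode text can be recovered exactly (hence "non-lossy").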
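The idea behind a non-lossy ASCII encoding can be sketched in Python with a pair of hypothetical helpers. They use \uXXXX escapes of UTF-16 code units; that is an assumption for illustration, and the exact escape format Cocoa writes may differ.

```python
# Hypothetical helpers illustrating "non-lossy ASCII"; not Cocoa's exact format.

def escape_non_ascii(text: str) -> str:
    """Return a pure-ASCII string; non-ASCII characters become \\uXXXX escapes."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")           # escape the escape character itself
        elif ord(ch) < 0x80:
            out.append(ch)               # plain ASCII passes through unchanged
        else:
            # Use UTF-16 code units so characters above U+FFFF become two escapes.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def unescape_non_ascii(escaped: str) -> str:
    """Invert escape_non_ascii, recovering the original text exactly."""
    units = []
    i = 0
    while i < len(escaped):
        if escaped.startswith("\\\\", i):
            units.append(ord("\\"))
            i += 2
        elif escaped.startswith("\\u", i):
            units.append(int(escaped[i + 2:i + 6], 16))
            i += 6
        else:
            units.append(ord(escaped[i]))
            i += 1
    return b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be")


original = "résumé 😀"
escaped = escape_non_ascii(original)
print(escaped)                               # r\u00e9sum\u00e9 \ud83d\ude00
assert escaped.isascii()                     # nothing above 7 bits
assert unescape_non_ascii(escaped) == original
```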

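> NSISOLatin2StringEncoding = 9,

Another single-byte encoding for European characters (ISO 8859-2), covering Central and Eastern European languages such as Polish, Czech and Hungarian; I haven't run across this much.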
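Single-byte encodings like Latin-1 and Latin-2 reuse the same byte values for different characters, which is one more reason to let the user pick. A quick Python illustration (not Cocoa code):

```python
# Python stand-in: one byte, two meanings.
raw = bytes([0xB1])

print(raw.decode("latin-1"))    # Latin-1 reads 0xB1 as '±'
print(raw.decode("iso8859_2"))  # Latin-2 reads the same byte as 'ą'

# A Polish word survives a round trip through Latin-2 ...
polish = "łódź"
assert polish.encode("iso8859_2").decode("iso8859_2") == polish

# ... but Latin-1 has no 'ł', so it cannot represent the word at all.
try:
    polish.encode("latin-1")
except UnicodeEncodeError as err:
    print("Latin-1 cannot encode:", polish[err.start])
```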

> NSUnicodeStringEncoding = 10,

This encodes all characters as two bytes each (for the most part: characters outside the Basic Multilingual Plane take a pair of two-byte units). There's also a special marker, the byte order mark, at the beginning of the stream/file so that the program reading the text can figure out whether it was encoded in big-endian or little-endian byte order. This will result in a smaller stream/file if you are expecting lots of characters such as Chinese or Japanese, which take three bytes each in UTF-8, and a larger one if you're mostly using English characters, since you'd be spending two bytes on each character that only needs one.
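Both points above, the byte order mark and the size trade-off against UTF-8, can be checked with a short Python sketch. Python's "utf-16" codec stands in for Cocoa here; `utf16_byte_order` is a hypothetical helper.

```python
import codecs
from typing import Optional


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Hypothetical helper: report the byte order announced by a UTF-16 BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian"
    return None                                # no BOM: the reader has to guess


for sample in ["hello", "日本語"]:
    utf8_size = len(sample.encode("utf-8"))
    utf16_size = len(sample.encode("utf-16"))  # Python's "utf-16" codec prepends a BOM
    print(sample, "UTF-8:", utf8_size, "bytes, UTF-16:", utf16_size, "bytes")
    # hello: 5 vs 12 bytes; 日本語: 9 vs 8 bytes (UTF-16 figures include the 2-byte BOM)

print(utf16_byte_order("hi".encode("utf-16-be")))                        # None
print(utf16_byte_order(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))  # big-endian
```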

> NSWindowsCP1251StringEncoding = 11,
> NSWindowsCP1252StringEncoding = 12,
> NSWindowsCP1253StringEncoding = 13,
> NSWindowsCP1254StringEncoding = 14,
> NSWindowsCP1250StringEncoding = 15,

Various Windows-specific encodings: 1250 is Central European, 1251 Cyrillic, 1252 Western European, 1253 Greek and 1254 Turkish. 12 (CP1252) is the most common one for English/European text that I've seen; most web sites hosted on a Windows machine tend to deliver their content in that format.
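CP1252 is close to ISO Latin-1 but not identical, and text labelled Latin-1 is commonly really CP1252. The difference sits in bytes 0x80-0x9F, as this Python sketch (not Cocoa code) shows.

```python
# Python stand-in: the 0x80-0x9F range is where CP1252 and Latin-1 disagree.
smart_quotes = bytes([0x93]) + b"hi" + bytes([0x94])

print(smart_quotes.decode("cp1252"))         # “hi” with curly quotes
print(repr(smart_quotes.decode("latin-1")))  # '\x93hi\x94': invisible C1 control codes

assert bytes([0x80]).decode("cp1252") == "\u20ac"  # euro sign in CP1252
assert bytes([0xC0]).decode("cp1251") == "\u0410"  # Cyrillic capital A in CP1251
```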

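> NSISO2022JPStringEncoding = 21,

This is another Japanese encoding. It keeps every byte 7-bit and switches between ASCII and the Japanese character sets with escape sequences, which is why it's the traditional choice for Japanese email.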
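The escape-sequence approach can be seen with Python's "iso2022_jp" codec standing in for Cocoa: every output byte stays below 128, and an escape sequence announces the switch into the Japanese character set.

```python
# Python stand-in, not Cocoa.
data = "日本".encode("iso2022_jp")

# ESC $ B switches into JIS X 0208 (kanji); ESC ( B switches back to ASCII.
print(data)

assert all(b < 0x80 for b in data)          # every byte is 7-bit
assert b"\x1b$B" in data                    # escape into the Japanese character set
assert data.decode("iso2022_jp") == "日本"
```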

> NSMacOSRomanStringEncoding = 30,

This is the default encoding for the Mac, to hold English/European characters. On the Mac, if you open a file and there is no way to guess the encoding, this is the encoding it will try.
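Mac OS Roman puts accented letters at different byte values than the Windows and ISO encodings, so converting a Mac file for Windows means decoding and re-encoding rather than copying bytes. A Python sketch of that step (`transcode` is a hypothetical helper, and Python's codecs stand in for Cocoa):

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical helper: decode from one encoding, re-encode in another.
    Strict both ways, so it raises if the target can't represent a character."""
    return data.decode(source).encode(target)


mac_bytes = "café".encode("mac_roman")
print(mac_bytes)      # 'é' is byte 0x8E in Mac OS Roman ...
windows_bytes = transcode(mac_bytes, "mac_roman", "cp1252")
print(windows_bytes)  # ... but byte 0xE9 in Windows-1252
```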

> NSProprietaryStringEncoding = 65536


This would be used if you had some other encoding.... haven't run across this in practical use.



----snip----

> some seem obvious, but some just don't. :)
>
> anybody care to shed some light on this?
>
> or are we heading in a completely wrong direction?

I think you're probably doing the right thing. Take a look at how TextEdit works for opening and saving files; that should give you an idea of what makes sense. If you are reading text that comes from an arbitrary source, you need to support as many encoding formats as possible. Internally, the text will be stored as Unicode. Then, if you are writing it out and the user might want it in a different format, you should give them plenty of options.
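A minimal sketch of that read-then-offer-options flow, in Python rather than Cocoa. The candidate order is my own heuristic assumption, not how TextEdit actually decides, and both helper names are hypothetical: try strict UTF-8 first, fall back to single-byte encodings, then list which encodings can hold the text when saving.

```python
from typing import List, Sequence, Tuple

# Assumed preference order (a heuristic, not TextEdit's actual logic):
# strict UTF-8 first, since non-UTF-8 bytes rarely happen to be valid UTF-8;
# Mac OS Roman last, since it accepts every byte and so never reports a bad guess.
READ_CANDIDATES = ("utf-8", "cp1252", "mac_roman")


def decode_with_fallback(data: bytes,
                         candidates: Sequence[str] = READ_CANDIDATES) -> Tuple[str, str]:
    """Hypothetical helper: decode with the first candidate that succeeds."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


def encodings_that_fit(text: str, candidates: Sequence[str]) -> List[str]:
    """Hypothetical helper: candidates that can store text without loss."""
    fits = []
    for encoding in candidates:
        try:
            text.encode(encoding)
            fits.append(encoding)
        except UnicodeEncodeError:
            pass
    return fits


print(decode_with_fallback(b"caf\xe9"))  # ('café', 'cp1252'): not valid UTF-8
print(encodings_that_fit("café", ["ascii", "latin-1", "mac_roman", "utf-8"]))
```

Putting an encoding that accepts any byte (Mac OS Roman or Latin-1) last guarantees a result, at the cost of possibly decoding to the wrong characters, so showing the user the guessed encoding is a good idea.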

Also, be sure to handle the Unix, Mac, and Windows end-of-line conventions: \n, \r, and \r\n respectively. Somebody opening a file in Mac format and saving it in Windows format is going to want your program to do the right thing.
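A small Python sketch of end-of-line conversion (the function and dictionary names are hypothetical). The key detail is normalizing \r\n before a lone \r; doing it the other way round would turn each Windows line break into two.

```python
# Hypothetical names; "mac" means the \r convention described above.
LINE_ENDINGS = {"unix": "\n", "mac": "\r", "windows": "\r\n"}


def convert_line_endings(text: str, target: str) -> str:
    # Normalize everything to \n first: \r\n must be replaced before a lone \r,
    # or each Windows line break would become two Unix ones.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", LINE_ENDINGS[target])


mac_text = "line one\rline two\r"
print(repr(convert_line_endings(mac_text, "windows")))  # 'line one\r\nline two\r\n'
```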






--
Dan Wood
Karelia Software, LLC
email@hidden
http://www.karelia.com/
Watson for Mac OS X: http://www.karelia.com/watson/
_______________________________________________
cocoa-dev mailing list | email@hidden

  • Follow-Ups:
    • Re: text encodings
      • From: m <email@hidden>
References: 
 >text encodings (From: m <email@hidden>)
