Re: Internationalized text
- Subject: Re: Internationalized text
- From: "Alastair J.Houghton" <email@hidden>
- Date: Mon, 29 Sep 2003 16:48:54 +0100
On Monday, September 29, 2003, at 03:47 pm, Darrin Cardani wrote:
> I am writing an application that will be used to produce text in
> multiple languages. It's a translation tool of sorts, so documents
> will likely have text in many different languages within them. Some of
> the text the user enters may end up in menus in the interface, too.
> For example, they may view their document in its original English
> text. Then they can choose another language that they've translated it
> into, and view it in that language.
>
> So I'm left with a couple of questions.
>
> 1) What is the best (cross-platform) way to store the data on disk?
> What information do I need to make sure that when the document is
> opened on another computer, it is still legible?

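HTML wouldn't be a bad choice, as it can represent any Unicode
character using numeric character references (e.g. &#945; for Greek
alpha) without straying outside of the simple ASCII character set.
Also, most platforms have web browsers that could be used to view the
result, and many word processors can import HTML.

UTF8, UCS2 and UTF16 all have problems of one sort or another: UTF8
and UTF16 are variable-width, UCS2 cannot represent characters outside
the 16-bit range, and the 16-bit forms depend on byte order. If your
text is mostly ASCII, though, UTF8 is a good choice.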
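
As a rough illustration of the HTML approach, here is a small Python sketch (Python is used only for illustration, and the helper name to_ascii_html is made up for this example). It assumes that escaping markup characters and then replacing every non-ASCII character with a numeric character reference is acceptable for your file format:

```python
import html


def to_ascii_html(text):
    """Return text as ASCII-only HTML content (hypothetical helper)."""
    # Escape &, < and > first so the text cannot be mistaken for markup.
    escaped = html.escape(text, quote=False)
    # Replace each non-ASCII character with a numeric reference like &#945;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# "Ellinika" written in Greek letters, spelled with escapes so this
# source file itself stays pure ASCII.
greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"

stored = to_ascii_html(greek)
print(stored)

# Reading the stored form back recovers the original characters.
assert html.unescape(stored) == greek
```

The stored form contains nothing but ASCII, so it survives transports and editors that know nothing about Unicode, at the cost of being larger and unreadable in a plain text editor.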

> 2) What is the best way to put multi-lingual data into interface
> elements? For example, if the user has English and Greek versions of
> their document, I would want my popup menu to have the word "English"
> (in Roman letters), and the word "Ellinika" in Greek letters in the
> menu, probably. Can that be done? I was planning on allowing the user
> to enter the names of the languages they will be translating to and
> from, so the popup menu could theoretically have words in dozens of
> languages and scripts in it.

You can use any character you like in the Cocoa UI AFAIK. Cocoa uses
NSStrings for just about everything, and they are Unicode strings that
can also be converted to and from a number of other coding systems.

> 3) What internal data types (again, cross-platform preferred) should
> be used for keeping around the data the user enters?

IMO, UTF8 strings are a good choice. UTF8 is very simple and can
encode the entire Unicode code space; it can also be stored in ordinary
C strings, is quite compact, and represents ASCII characters as ASCII.
The only problem with it is that one Unicode character may take more
than one octet of UTF8 (anything outside ASCII does), so you can't
find the Nth character by indexing to the Nth byte; if that is an
issue, you can either use UCS2 (16-bit characters, but cannot access
the entire code space), or UCS4 (32-bit characters, uses lots of space).
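
To make the size trade-off concrete, the following Python sketch counts how many bytes a single character takes in each encoding. The sample characters and the function name encoded_sizes are chosen only for this example; the "utf-16-le" and "utf-32-le" codecs are used so that no byte-order mark is counted:

```python
# Sample characters, written as escapes so this file stays pure ASCII.
samples = {
    "Latin A": "\u0041",
    "Greek alpha": "\u03b1",
    "Euro sign": "\u20ac",
    "Musical G clef": "\U0001d11e",  # outside the 16-bit range
}


def encoded_sizes(ch):
    """Return (UTF-8, UTF-16, UTF-32) byte counts for one character."""
    return tuple(
        len(ch.encode(codec))
        for codec in ("utf-8", "utf-16-le", "utf-32-le")
    )


for name, ch in samples.items():
    utf8, utf16, utf32 = encoded_sizes(ch)
    print(f"{name}: UTF-8={utf8} UTF-16={utf16} UTF-32={utf32} bytes")
```

The G clef needs four bytes in every form: it has no UCS2 representation at all, and in UTF16 it takes two 16-bit units, which is why a fixed 16-bit type cannot cover the whole code space.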
Unfortunately, you can't easily use the C functions (wcstombs, mbstowcs
et al.) that were supposed to support this area, because wchar_t varies
in size (most platforms use a 32-bit wchar_t, but Windows uses 16 bits),
and the ANSI C standard was defined before Unicode, so neither the
wide character nor the multibyte functions define which coding
system(s) they support.
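
If you want to see what your own platform does, one quick check (sketched in Python, whose ctypes module exposes the C compiler's wchar_t as ctypes.c_wchar; the function name wchar_size_bytes is made up here) is:

```python
import ctypes


def wchar_size_bytes():
    """Return the size in bytes of the platform's C wchar_t."""
    return ctypes.sizeof(ctypes.c_wchar)


size = wchar_size_bytes()
print(f"wchar_t is {size} bytes ({size * 8} bits) on this platform")
```

The result differs between platforms, which is exactly why wchar_t is awkward as a cross-platform storage type.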
Kind regards,
Alastair.