Re: Internationalized text

Subject: Re: Internationalized text
From: "Alastair J.Houghton" <email@hidden>
Date: Mon, 29 Sep 2003 16:48:54 +0100

On Monday, September 29, 2003, at 03:47 pm, Darrin Cardani wrote:

> I am writing an application that will be used to produce text in multiple languages. It's a translation tool of sorts, so documents will likely have text in many different languages within them. Some of the text the user enters may end up in menus in the interface, too. For example, they may view their document in its original English text. Then they can choose another language that they've translated it into, and view it in that language.
>
> So I'm left with a couple of questions.
>
> 1) What is the best (cross-platform) way to store the data on disk? What information do I need to make sure that when the document is opened on another computer, it is still legible?

HTML wouldn't be a bad choice, as it can represent any Unicode character through numeric character references (e.g. &#945; for the Greek letter alpha) without straying outside the plain ASCII character set. Also, most platforms have web browsers that could be used to view the result, and many word processors can import HTML.
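
As a rough illustration of the ASCII-only HTML idea, here is a minimal Python sketch (Python is used purely for brevity; the helper name `to_ascii_html` is made up for this example and is not from the original post). It escapes markup characters and then replaces every non-ASCII character with a numeric character reference, so the stored text contains only ASCII octets.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return an ASCII-only HTML fragment equivalent to `text` (hypothetical helper)."""
    # First escape the characters that are special in HTML (&, <, >, quotes).
    escaped = html.escape(text)
    # Then turn every non-ASCII character into a numeric reference such as &#945;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


greek = "Ελληνικά"
encoded = to_ascii_html(greek)

print(encoded)                          # ASCII only, one &#...; reference per Greek letter
print(html.unescape(encoded) == greek)  # the original text can be recovered
```

The result is larger than the raw text, which is the price paid for staying within ASCII.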

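UTF-8, UCS-2 and UTF-16 all have problems of one sort or another (UTF-8 and UTF-16 are variable-width, UCS-2 cannot represent characters beyond U+FFFF, and the 16-bit forms need an agreed byte order), although if your text is mostly ASCII, then UTF-8 is a good choice.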
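
To make the size trade-off concrete, the following Python sketch compares how many octets the same text occupies in different encodings. The sample strings and the function name `encoded_sizes` are assumptions chosen for illustration only.

```python
def encoded_sizes(text: str) -> dict:
    """Number of octets `text` occupies in each encoding (no byte-order mark)."""
    return {name: len(text.encode(name)) for name in ("utf-8", "utf-16-le", "utf-32-le")}


samples = {
    "mostly ASCII": "Hello, world! Café",
    "Greek": "Ελληνικά",
}

for label, text in samples.items():
    print(label, len(text), "characters:", encoded_sizes(text))

# The 16-bit forms also depend on byte order: the same character yields
# different octets in little-endian and big-endian UTF-16.
print("A".encode("utf-16-le"), "A".encode("utf-16-be"))
```

For the mostly-ASCII sample, UTF-8 is roughly half the size of UTF-16; for the Greek sample the two come out about the same.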

> 2) What is the best way to put multi-lingual data into interface elements? For example, if the user has English and Greek versions of their document, I would want my popup menu to have the word "English" (in Roman letters), and the word "Ellinika" in Greek letters in the menu, probably. Can that be done? I was planning on allowing the user to enter the names of the languages they will be translating to and from, so the popup menu could theoretically have words in dozens of languages and scripts in it.

You can use any character you like in the Cocoa UI, AFAIK. Cocoa uses NSStrings for just about everything, and they support Unicode as well as conversion to and from a number of other encodings.

> 3) What internal data types (again, cross-platform preferred) should be used for keeping around the data the user enters?

IMO, UTF-8 strings are a good choice. UTF-8 is very simple and can encode the entire Unicode code space; it can also be stored in ordinary C strings (no octet of a multi-byte sequence is ever zero), is quite compact, and represents ASCII characters as ASCII. The only problem with it is that one Unicode character may take more than one octet (up to four), so a byte count is not a character count. If that is an issue, you can use either UCS-2 (16-bit characters, but cannot reach code points beyond U+FFFF) or UCS-4 (32-bit characters, covers everything but uses a lot of space).
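
The "one character is not one octet" point can be seen in a short Python sketch. The sample characters and the helper name `utf8_lengths` are illustrative assumptions, not anything from the original post.

```python
def utf8_lengths(text: str) -> list:
    """Pair each code point with the number of octets it needs in UTF-8."""
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


# 'a', e-acute, the euro sign, and an emoji outside the 16-bit range.
text = "a\u00e9\u20ac\U0001F600"

print(utf8_lengths(text))
print(len(text), "characters,", len(text.encode("utf-8")), "octets")

# No zero octets appear, so the encoded form is safe in a NUL-terminated C string.
print(0 in text.encode("utf-8"))

# Code points above U+FFFF do not fit in a single 16-bit unit (the UCS-2 limit).
print([f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF])
```

Anything that indexes or truncates such a string by octet offset has to take care not to cut a character in half.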

Unfortunately, you can't easily use the C functions (wcstombs, mbstowcs et al.) that were supposed to support this area, because wchar_t varies in size (most platforms use a 32-bit wchar_t, but Windows uses 16 bits), and the ANSI C standard was defined before Unicode, so neither the wide-character nor the multibyte functions define which coding system(s) they support.
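
One quick way to check the wchar_t width on a given machine, sketched here in Python via the standard `ctypes` module (the helper `fits_in_wchar` is a made-up name for illustration):

```python
import ctypes

# Size of the platform's C wchar_t in octets: typically 4 on Linux and macOS, 2 on Windows.
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)
print(f"wchar_t is {wchar_bytes * 8} bits on this platform")


def fits_in_wchar(ch: str) -> bool:
    """True if the code point of `ch` fits in a single wchar_t here (hypothetical helper)."""
    return ord(ch) < (1 << (8 * wchar_bytes))


print(fits_in_wchar("A"))           # fits everywhere
print(fits_in_wchar("\U0001F600"))  # depends on the platform's wchar_t width
```

Code that assumes one wchar_t per character will therefore behave differently from platform to platform.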

Kind regards,

Alastair.