Re: Unicode and languages


Subject: Re: Unicode and languages
From: email@hidden
Date: Thu, 8 Apr 2004 18:49:12 +0900

> Why exactly do you want to do this? Part of the point of Unicode is that all languages are treated equally -- you just preserve everything. I suppose you could check for the presence of Japanese characters, but their presence doesn't guarantee that the text is Japanese (some strings are valid Japanese and Chinese), nor does their absence guarantee that the text is English. (Exactly how you'd do this in the first place is an interesting question, seeing as how there's no "Unicode number" command.)

If the only two possibilities for his text are Japanese and English, then any CJK characters in it are almost certainly Japanese rather than Chinese or Korean. Besides, English is written in the Roman alphabet, and ASCII, which covers it, occupies the first 128 code points of Unicode (U+0000 to U+007F). So I suppose it would not be too hard to decide that any string that can be mapped to ASCII is English, while any string that can't be is Japanese.

Another possibility would be to check the characters one by one, since all CJK characters have a defined position in Unicode (a unique code :). Any character that falls in the CJK ranges (the kana blocks and the unified ideographs) could be considered CJK. Since Japanese makes _very_ little use of alphabetic characters, it is very likely that any string containing characters from those ranges is Japanese.

Of course, implementing that in AppleScript is a different matter...
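
Outside AppleScript, the two checks described above might be sketched roughly as follows. This is only an illustration in Python, not something taken from the thread: the function names are made up, and the list of ranges is an assumed, incomplete selection of the Unicode blocks that Japanese text normally uses. The kanji range is shared with Chinese, so the result only means something when Japanese and English are the sole candidates.

```python
# Assumed, incomplete selection of Unicode blocks used by Japanese text.
JAPANESE_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji, shared with Chinese)
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
]


def is_ascii(text):
    """True if every character is in the ASCII range U+0000 to U+007F."""
    return all(ord(ch) < 0x80 for ch in text)


def has_japanese(text):
    """True if at least one character falls in one of the assumed ranges."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in JAPANESE_RANGES)


def guess_language(text):
    """Crude two-way guess; anything that fits neither rule is 'unknown'."""
    if is_ascii(text):
        return "english"
    if has_japanese(text):
        return "japanese"
    return "unknown"
```

A string such as "café" is neither pure ASCII nor in the listed ranges, so it comes back as "unknown" rather than being forced into one of the two languages.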

> If you want to know if the text can be losslessly encoded in MacRoman vs. MacJapanese, then that's a different question; in that case I'd suggest attempting to pipe the text through iconv(1) -c and seeing if it errors.

How would you segment the string before conversion? That way you only learn that, if converting a given string to MacRoman fails, it must contain something else somewhere, but you can't determine where, can you?

JC Helary

References:
	>Unicode and languages (From: Hanaan Rosenthal <email@hidden>)
	>Re: Unicode and languages (From: Christopher Nebel <email@hidden>)
