Re: AM/PM letter UNICODE issues
- Subject: Re: AM/PM letter UNICODE issues
- From: Quincey Morris <email@hidden>
- Date: Mon, 18 Oct 2010 11:47:34 -0700
On Oct 18, 2010, at 10:19, Alex Kac wrote:
> What we are trying to do:
> Shorten the AM/PM to just the first character in Western Languages so that a time is shown as "1:30a".
>
> NSDateFormatter* formatter = [[NSDateFormatter alloc] init];
> NSString* am = [[[formatter AMSymbol] substringToIndex:1] lowercaseString];
> NSString* pm = [[[formatter PMSymbol] substringToIndex:1] lowercaseString];
>
>
> This works in Western languages just fine. However, in languages like Korean it does not work, seemingly giving a random character. From reading this list over time, I believe it's because I'm getting just one part of a multi-part character (I'm no good with Unicode terms, sorry).
>
> My guess is I need to use rangeOfComposedCharacterSequenceAtIndex and then get the range and use a substring with that range. But I'm not sure since my knowledge here is pretty limited.
This description seems pretty good (and short):
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/stringsClusters.html
Basically, there are several nested levels of complexity:
1. UTF-16 units (which are the 16-bit values that are indexed by NSString's '...AtIndex:' methods)
2. Unicode code points (each of which is represented by either a single UTF-16 unit or a surrogate pair of UTF-16 units)
3. Composed characters (such as accented characters) made up of a base code point followed by one or more combining code points
4. Grapheme clusters, which are sequences of Unicode code points representing things that are written as a single unit (in some sense, depending on the language)
5. Related character sequences (I don't know if there's an official name for this) such as German 'ß' and 'SS' that figure into algorithms for sorting and case changing.
According to the above-linked page, #3 and #4 aren't really different.
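To make the gap between level 1 and levels 3/4 concrete, here is a minimal sketch (not taken from the linked page) that counts composed character sequences by hopping from one sequence to the next. The function name CountComposedCharacters is made up for illustration; the only NSString calls it relies on are 'length' and 'rangeOfComposedCharacterSequenceAtIndex:'.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name invented for this example): counts composed
// character sequences rather than UTF-16 units.
static NSUInteger CountComposedCharacters(NSString *string) {
    NSUInteger count = 0;
    NSUInteger index = 0;
    NSUInteger length = [string length];   // length is in UTF-16 units
    while (index < length) {
        // Range of the whole sequence that contains the unit at 'index'.
        NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
        index = NSMaxRange(range);          // jump past the whole sequence
        count++;
    }
    return count;
}
```

For example, @"e\u0301" (an 'e' followed by a combining acute accent) and @"\U0001D11E" (a single code point stored as a surrogate pair) each report a 'length' of 2, but this helper should count each of them as one composed character sequence.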
Also according to the above-linked page, 'rangeOfComposedCharacterSequenceAtIndex:' does sound like the method to use.
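Applied to the original snippet, that might look roughly like the following. This is only a sketch: the helper name FirstComposedCharacterLowercased is invented here, and the empty-string guard is an assumption added because asking for the sequence at index 0 of an empty string would be out of bounds.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name invented for this example): returns the first
// composed character sequence of 'symbol', lowercased, instead of the first
// UTF-16 unit.
static NSString *FirstComposedCharacterLowercased(NSString *symbol) {
    if ([symbol length] == 0) {
        return @"";   // assumption: treat a missing symbol as an empty string
    }
    // The range may cover more than one UTF-16 unit.
    NSRange range = [symbol rangeOfComposedCharacterSequenceAtIndex:0];
    return [[symbol substringWithRange:range] lowercaseString];
}

// Possible use with the formatter from the original message:
//   NSString *am = FirstComposedCharacterLowercased([formatter AMSymbol]);
//   NSString *pm = FirstComposedCharacterLowercased([formatter PMSymbol]);
```

This replaces 'substringToIndex:1', which always cuts after exactly one UTF-16 unit, with a substring whose length is whatever the first composed sequence happens to need.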
It's not obvious that taking the first grapheme is going to be semantically meaningful in every language (for example, if the English abbreviations happened to be MA and MP, taking the first grapheme wouldn't help you -- the assumption that the first character distinguishes the time range is not necessarily valid across all languages), but at least it's not going to give you an unrelated character.
_______________________________________________
Cocoa-dev mailing list (email@hidden)