Re: AM/PM letter UNICODE issues

Subject: Re: AM/PM letter UNICODE issues
From: Quincey Morris <email@hidden>
Date: Mon, 18 Oct 2010 11:47:34 -0700

On Oct 18, 2010, at 10:19, Alex Kac wrote:

> What we are trying to do:
> Shorten the AM/PM to just the first character in Western Languages so that a time is shown as "1:30a".
>
> 	NSDateFormatter* formatter = [[NSDateFormatter alloc] init];
> 	NSString* am = [[[formatter AMSymbol] substringToIndex:1] lowercaseString];
> 	NSString* pm = [[[formatter PMSymbol] substringToIndex:1] lowercaseString];
>
>
> This works fine in Western languages. However, in languages like Korean it does not work, seemingly giving a random character. From reading this list over time, I believe it's because I'm getting just one part of a multi-part character (I'm no good with Unicode terms, sorry).
>
> My guess is I need to use rangeOfComposedCharacterSequenceAtIndex and then get the range and use a substring with that range. But I'm not sure since my knowledge here is pretty limited.

This description seems pretty good (and short):

	http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/stringsClusters.html

Basically, there are several nested levels of complexity:

1. UTF-16 units (the 16-bit values that are indexed by NSString's '...AtIndex:' methods)

2. Unicode code points (which are UTF-16 units or surrogate pairs of UTF-16 units)

3. Composed character sequences (such as accented characters), which can consist of a base Unicode code point followed by one or more combining code points

4. Grapheme clusters, which are sequences of Unicode code points representing things that are written as a single unit (in some sense, depending on the language)

5. Related character sequences (I don't know if there's an official name for this) such as German 'ß' and 'SS' that figure into algorithms for sorting and case changing.

According to the above-linked page, #3 and #4 aren't really different.
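To make the first few levels concrete, here is a minimal sketch that counts the same string three ways. The sample string is an assumption chosen for illustration (an 'e' followed by a combining accent, then a character outside the Basic Multilingual Plane); it is not taken from NSDateFormatter.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Illustrative sample: "e" + U+0301 COMBINING ACUTE ACCENT,
        // then U+1D11E MUSICAL SYMBOL G CLEF (needs a surrogate pair).
        NSString *s = @"e\u0301\U0001D11E";

        // Level 1: UTF-16 units, which is what -length counts.
        NSUInteger utf16Units = [s length];                      // 4

        // Level 2: code points. Count every unit except trailing (low)
        // surrogates, so each surrogate pair is counted once.
        NSUInteger codePoints = 0;
        for (NSUInteger i = 0; i < [s length]; i++) {
            unichar c = [s characterAtIndex:i];
            if (c < 0xDC00 || c > 0xDFFF) {
                codePoints++;
            }
        }                                                        // 3

        // Levels 3/4: composed character sequences.
        __block NSUInteger composed = 0;
        [s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                              options:NSStringEnumerationByComposedCharacterSequences
                           usingBlock:^(NSString *substring, NSRange range,
                                        NSRange enclosingRange, BOOL *stop) {
            composed++;
        }];                                                      // 2

        NSLog(@"%lu UTF-16 units, %lu code points, %lu composed characters",
              (unsigned long)utf16Units, (unsigned long)codePoints,
              (unsigned long)composed);
    }
    return 0;
}
```

Any '...AtIndex:' call or 'substringToIndex:1' works at the first of these levels, which is why it can cut a user-visible character in half.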

Also according to the above-linked page, 'rangeOfComposedCharacterSequenceAtIndex:' does sound like the method to use.
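A minimal sketch of that approach, applied to the code from the original question, might look like the following. FirstComposedCharacter is a hypothetical helper name, not a Foundation function, and the sketch assumes ARC (under manual reference counting the formatter would also need to be released).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first composed character sequence of
// the string, or an empty string if there is nothing to take.
static NSString *FirstComposedCharacter(NSString *string) {
    if ([string length] == 0) {
        return @"";
    }
    // The returned range covers every UTF-16 unit that belongs to the
    // composed character sequence containing index 0.
    NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:0];
    return [string substringWithRange:range];
}

int main(void) {
    @autoreleasepool {
        NSDateFormatter *formatter = [[NSDateFormatter alloc] init];

        // Same idea as substringToIndex:1, but never splits a sequence.
        NSString *am = [FirstComposedCharacter([formatter AMSymbol]) lowercaseString];
        NSString *pm = [FirstComposedCharacter([formatter PMSymbol]) lowercaseString];

        NSLog(@"am = %@, pm = %@", am, pm);
    }
    return 0;
}
```

The only change from the original snippet is that the substring length comes from the composed-character range instead of being hard-coded to 1.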

It's not obvious that taking the first grapheme is going to be semantically meaningful in every language. For example, if the English abbreviations happened to be MA and MP, taking the first grapheme wouldn't help you, so the assumption that the first character distinguishes AM from PM is not necessarily valid across all languages. But at least it's not going to give you an unrelated character.
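One way to guard against that caveat is to check that the two shortened symbols actually differ before using them, and to fall back to the full symbols otherwise. The sketch below is only an illustration: FirstCharactersDiffer is a hypothetical helper, and the literal strings stand in for whatever the formatter returns ("오전" and "오후" being the Korean words commonly used for AM and PM).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only if both strings are non-empty and their
// first composed character sequences are different.
static BOOL FirstCharactersDiffer(NSString *a, NSString *b) {
    if ([a length] == 0 || [b length] == 0) {
        return NO;
    }
    NSString *firstA = [a substringWithRange:[a rangeOfComposedCharacterSequenceAtIndex:0]];
    NSString *firstB = [b substringWithRange:[b rangeOfComposedCharacterSequenceAtIndex:0]];
    return ![firstA isEqualToString:firstB];
}

int main(void) {
    @autoreleasepool {
        // Literal symbols used for illustration instead of a live formatter.
        NSLog(@"%d", FirstCharactersDiffer(@"AM", @"PM"));      // 1: "A" vs "P"
        NSLog(@"%d", FirstCharactersDiffer(@"오전", @"오후"));  // 0: both start with "오"
    }
    return 0;
}
```

When the check returns NO, showing the full AM/PM symbols is probably the safer choice for that locale.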

_______________________________________________

Cocoa-dev mailing list (email@hidden)
