Re: Normalize an NSAttributedString
- Subject: Re: Normalize an NSAttributedString
- From: Michael Ash <email@hidden>
- Date: Wed, 26 Aug 2009 11:43:57 -0400
On Wed, Aug 26, 2009 at 5:42 AM, Ken Thomases <email@hidden> wrote:
> On Aug 25, 2009, at 7:21 PM, Ross Carter wrote:
>
>>> I haven't tried it, but this should work:
>>>
>>> NSAttributedString* original = whatever;
>>> NSMutableAttributedString* normalized = [[original mutableCopy]
>>> autorelease];
>>> CFMutableStringRef str = (CFMutableStringRef)[normalized
>>> mutableString];
>>> CFStringNormalize(str, kCFStringNormalizationFormD);
>>>
>>> This works because -[NSMutableAttributedString mutableString] is a proxy
>>> that automatically fixes up the attribute runs held by its owner.
>>>
>>> ~Martin
>>>
>>
>> Brilliant! Works just like you said. Thanks, Martin.
>
> Hmm, this seems dangerous in the sense that the conversion may be lossy. As
> far as I can see, there's no guarantee that CFStringNormalize will perform
> minimal replacements. If it does not, then whole ranges of characters may
> have their attributes reset to that of the first replaced character.
>
> Even if testing reveals it to be non-lossy under one testing environment,
> without a guarantee, the result might differ under any other environment.
http://en.wikipedia.org/wiki/Unicode_equivalence
"Unicode provides standard normalization algorithms that produce a
unique (normal) code point sequence for all sequences that are
equivalent.... Both the composed and decomposed forms impose a
canonical ordering on the code point sequence, which is necessary for
the normal forms to be unique."
"All these forms impose the canonical order on the resulting sequence
to guarantee uniqueness of the result over the corresponding
equivalence class."
"NFD (Normalization Form Canonical Decomposition): Characters are
decomposed by canonical equivalence."
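
The uniqueness property quoted above can be checked outside Cocoa. The following is only a sketch using Python's standard unicodedata module, not CFStringNormalize itself; the helper name nfd is made up for illustration. It assumes both implementations follow the same Unicode normalization rules.

```python
import unicodedata

def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)

precomposed = "\u00e9"   # e-acute as a single code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# Canonically equivalent inputs normalize to the same code point sequence.
assert nfd(precomposed) == nfd(combining) == "e\u0301"

# Canonical ordering: combining marks are sorted by combining class, so the
# input order of acute (class 230) and dot below (class 220) does not matter.
assert nfd("a\u0301\u0323") == nfd("a\u0323\u0301") == "a\u0323\u0301"
```

This shows only that the normalized text is unique. It says nothing about how NSMutableAttributedString maps attribute runs onto the replaced ranges, which is the part Ken is asking about.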
> Also, it should be self-evident that normalizing to a precomposed form will
> obliterate attribute differences between a base character and any combining
> characters, as discussed elsewhere in this thread.
Good thing he went and normalized to a *de*composed form then, isn't it?
Mike
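
The composed versus decomposed point can be illustrated the same way. This is a sketch in Python's unicodedata module, not the CoreFoundation API; normalized_length is a hypothetical helper. Python counts code points where NSString counts UTF-16 units, but the two agree for the characters used here.

```python
import unicodedata

def normalized_length(form, s):
    """Number of code points in s after normalizing to the given form."""
    return len(unicodedata.normalize(form, s))

text = "e\u0301"  # base letter plus combining acute accent

# NFC fuses the pair into U+00E9, so there is only one character left
# to carry what used to be two possibly different attribute sets.
assert unicodedata.normalize("NFC", text) == "\u00e9"
assert normalized_length("NFC", text) == 1

# NFD leaves the base and the mark as separate code points.
assert normalized_length("NFD", text) == 2

# Going the other way, NFD splits a precomposed character into two code
# points; both come from a single original character.
assert normalized_length("NFD", "\u00e9") == 2
```

So a composed form can collapse two differently attributed characters into one, while a decomposed form only ever expands a character into several.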