Re: Normalize an NSAttributedString
- Subject: Re: Normalize an NSAttributedString
- From: Ross Carter <email@hidden>
- Date: Sat, 29 Aug 2009 12:46:58 -0400
On Aug 26, 2009, at 1:21 PM, Ken Thomases wrote:
On Aug 26, 2009, at 10:43 AM, Michael Ash wrote:
On Wed, Aug 26, 2009 at 5:42 AM, Ken Thomases <email@hidden> wrote:
On Aug 25, 2009, at 7:21 PM, Ross Carter wrote:
I haven't tried it, but this should work:
NSAttributedString* original = whatever;
NSMutableAttributedString* normalized = [[original mutableCopy] autorelease];
CFMutableStringRef str = (CFMutableStringRef)[normalized mutableString];
CFStringNormalize(str, kCFStringNormalizationFormD);
This works because -[NSMutableAttributedString mutableString] is a
proxy that automatically fixes up the attribute runs held by its owner.
Hmm, this seems dangerous in the sense that the conversion may be
lossy. As far as I can see, there's no guarantee that CFStringNormalize
will perform minimal replacements. If it does not, then whole ranges of
characters may have their attributes reset to that of the first
replaced character.
Even if testing reveals it to be non-lossy under one testing
environment, without a guarantee that might differ under any other
testing environment.
http://en.wikipedia.org/wiki/Unicode_equivalence
[... quote snipped ...]
I'm well aware of what it means. The question is, which exact
operations on the mutable string proxy does CFStringNormalize
perform. If CFStringNormalize performs the minimal replace
operations to get the result, then it will preserve the attributes
closely. It's conceivable, though, that CFStringNormalize uses a
side buffer to compute the normalized form and then does one big
replace of the whole mutable string's range. Or, anywhere in
between. Like, it might replace a series of precomposed characters
with their decompositions all with one replace operation. In that
case, the attributes of most of the characters will be lost
(replaced with the attributes of the first character in the replace
range).
So, it's clear that the _strings_ will always have a deterministic
value as a result of normalization. That's the point of
normalization. But the _attributed strings_ may not.
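
One way to avoid depending on how CFStringNormalize groups its
replacements is to do the replacements yourself, one composed character
sequence at a time. The following is only a sketch under assumptions:
the helper name DecomposedAttributedString is hypothetical (it is not
from this thread), the code assumes ARC, and it has not been checked
against CFStringNormalize for every input. Each replaced sequence still
takes the attributes of its first character, so attribute differences
inside a single sequence that needs decomposing are not preserved.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): decompose one composed
// character sequence at a time, so each replacement is as small as possible.
static NSAttributedString *DecomposedAttributedString(NSAttributedString *original) {
    NSMutableAttributedString *result = [original mutableCopy];
    NSUInteger index = [result length];

    // Walk backwards so that unvisited indexes stay valid as the string grows.
    while (index > 0) {
        NSRange seq = [[result string] rangeOfComposedCharacterSequenceAtIndex:index - 1];
        NSString *piece = [[result string] substringWithRange:seq];
        NSString *decomposed = [piece decomposedStringWithCanonicalMapping];

        // isEqualToString: is a literal comparison, so this only fires
        // when the sequence actually changes under Form D.
        if (![decomposed isEqualToString:piece]) {
            // The new characters inherit the attributes of the first
            // replaced character in seq.
            [result replaceCharactersInRange:seq withString:decomposed];
        }
        index = seq.location;
    }
    return result;
}
```

Sequences that are already in Form D are left untouched, so their
attribute runs are not disturbed at all.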
Also, it should be self-evident that normalizing to a precomposed form
will obliterate attribute differences between a base character and any
combining characters, as discussed elsewhere in this thread.
Good thing he went and normalized to a *de*composed form then, isn't it?
Martin's example used Form D, but Ross never quite said that's what
he was normalizing to. He might have been adapting Martin's example
but using a different form.
Regards,
Ken
Just to make it clear what my situation is:
Suppose an NSAttributedString comprises the string o + umlaut in
decomposed form, plus one attribute. Its length is 2, and the range of
an attribute is {0, 2}. The string and its attribute are archived
separately as xml data like this:
<string>ö</string>
<attrName>NSFontAttributeName</attrName>
<attrValue location='0' length='2'>Helvetica 12.0</attrValue>
If, during unarchiving, the string is represented by an NSString
object in precomposed form, its length will be 1, and an attempt to
apply the attribute range of {0, 2} will fail.
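
The length mismatch described above can be reproduced with a small
sketch. The function names RangeFitsString and DemonstrateMismatch are
hypothetical and used only for illustration; the code assumes ARC and
Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical check (not from the thread): does an archived range fit?
static BOOL RangeFitsString(NSRange range, NSString *string) {
    return NSMaxRange(range) <= [string length];
}

static void DemonstrateMismatch(void) {
    // "o" followed by U+0308 COMBINING DIAERESIS: two UTF-16 units.
    const unichar chars[] = { 0x006F, 0x0308 };
    NSString *decomposed = [NSString stringWithCharacters:chars length:2];

    // The same text in precomposed form is the single unit U+00F6.
    NSString *precomposed = [decomposed precomposedStringWithCanonicalMapping];

    NSRange archived = NSMakeRange(0, 2);   // range stored in the archive

    NSLog(@"decomposed:  length %lu, range fits: %d",
          (unsigned long)[decomposed length],
          RangeFitsString(archived, decomposed));
    NSLog(@"precomposed: length %lu, range fits: %d",
          (unsigned long)[precomposed length],
          RangeFitsString(archived, precomposed));
}
```

The archived range {0, 2} fits the decomposed string but runs past the
end of the precomposed one, which is why applying it would fail.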
From the discussion here it seems to me that the only safe approach
is to normalize the NSAttributedString to Form D before archiving
(using Martin's approach), and when unarchiving to normalize the
unarchived NSString to Form D before combining it with the attribute
to make an NSAttributedString.
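
A minimal sketch of the unarchiving half of that approach follows. The
helper name AttributedStringFromArchive and its parameters are
hypothetical, the archive is assumed to hold a single attribute with a
single range, and the code assumes ARC. It is an illustration of the
idea, not a tested implementation of the archive format shown above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical unarchiving helper; name and parameters are illustrative.
static NSAttributedString *AttributedStringFromArchive(NSString *unarchived,
                                                       NSString *attrName,
                                                       id attrValue,
                                                       NSRange range) {
    // Normalize to Form D first, so lengths match what was archived
    // (assuming the archived string was also in Form D).
    NSString *formD = [unarchived decomposedStringWithCanonicalMapping];

    // Refuse ranges that still do not fit instead of raising an exception.
    if (NSMaxRange(range) > [formD length]) {
        return nil;
    }

    NSMutableAttributedString *result =
        [[NSMutableAttributedString alloc] initWithString:formD];
    [result addAttribute:attrName value:attrValue range:range];
    return result;
}
```

With this, a precomposed U+00F6 read back from the archive becomes a
two-unit string again before the range {0, 2} is applied.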
Ross