Re: Reversing a String
- Subject: Re: Reversing a String
- From: "Michael Ash" <email@hidden>
- Date: Wed, 31 Dec 2008 11:52:59 -0500
On Wed, Dec 31, 2008 at 11:29 AM, Dave DeLong <email@hidden> wrote:
> Ironic... This question came up in a job interview I had a couple weeks ago.
> The following NSString category will work to reverse a string, and in my
> limited tests, it works with accents, mathematical symbols, and Korean
> characters:
>
> - (NSString *) stringByReversingSelf {
>     NSMutableString * r = [NSMutableString stringWithString:self];
>     NSUInteger len = [r length];
>     NSUInteger mid = floor(len/2);
>     for(int i = 0; i < mid; i++) {
>         NSRange fr = NSMakeRange(i,1);
>         NSRange lr = NSMakeRange(len-i-1,1);
>         NSString * f = [r substringWithRange:fr];
>         NSString * l = [r substringWithRange:lr];
>         [r replaceCharactersInRange:fr withString:l];
>         [r replaceCharactersInRange:lr withString:f];
>     }
>     return r;
> }
>
> Here's the output of my extremely limited tests: (attached as a .png
> screenshot so that the encoding doesn't get messed up)
Nope, it doesn't work. It only works on your test data because your test data
doesn't contain any characters that span more than one unichar.
The fundamental error that everyone is making here is assuming that
a unichar is a single indivisible unit that can be moved around at
will. It doesn't work that way. Sometimes several adjacent unichars
form a single grouping that must be kept together.
Try your method on the string abcde\u0301f, where \u0301 is the
Unicode code point U+0301 COMBINING ACUTE ACCENT. It starts out as
abcdef with an acute accent on the e. After passing through your
method, the combining accent ends up immediately after the f, so it
attaches to the f instead!
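To make the accent failure concrete, here is a minimal sketch in plain C (which Objective-C accepts unchanged). It assumes the string is held as a raw array of 16-bit code units, with uint16_t standing in for unichar, and the helper name reverse_units is my own, not anything from Foundation. It swaps individual units exactly the way the quoted category does, one unichar at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: naive reversal that swaps individual 16-bit code
   units (uint16_t stands in for unichar), mirroring what the quoted
   category does one unichar at a time. It knows nothing about combining
   marks, so a mark stays glued to whatever unit ends up before it. */
static void reverse_units(uint16_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len / 2; i++) {
        uint16_t tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}
```

Applied to the units {'a','b','c','d','e',0x0301,'f'}, it yields f, U+0301, e, d, c, b, a. The combining accent now follows the f, which is exactly the misplacement described above.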
Or try it with a string that contains U+1D11E MUSICAL SYMBOL G CLEF.
This is a single code point that requires two unichars (a surrogate
pair) to represent, because unichar is only 16 bits wide and so
NSString is implicitly UTF-16. After passing through your method, the
two unichars that make up this single character get swapped. That
produces an invalid sequence (a low surrogate followed by a high
surrogate), and the resulting string can't be printed.
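The surrogate-pair arithmetic behind the G clef example can be sketched in plain C as well. The helper names encode_surrogate_pair and is_well_formed_utf16 are hypothetical and illustrative only. The encoding step itself is the standard UTF-16 rule: subtract 0x10000, then split the remaining 20 bits into a high and a low 10-bit half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a supplementary-plane code point
   (U+10000..U+10FFFF) as a UTF-16 surrogate pair. */
static void encode_surrogate_pair(uint32_t cp, uint16_t out[2]) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;                  /* 20 significant bits */
    out[0] = (uint16_t)(0xD800 + (v >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));  /* low (trail) surrogate */
}

static int is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: returns 1 if every surrogate in buf belongs to a
   correctly ordered high+low pair, 0 otherwise. */
static int is_well_formed_utf16(const uint16_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (is_high(buf[i])) {
            if (i + 1 >= len || !is_low(buf[i + 1])) return 0;
            i++; /* skip the low half of a valid pair */
        } else if (is_low(buf[i])) {
            return 0; /* low surrogate without a preceding high one */
        }
    }
    return 1;
}
```

For U+1D11E this gives 0xD834 0xDD1E. Swapping those two units, as a unit-by-unit reversal does, puts the low surrogate first, and is_well_formed_utf16 rejects the result.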
Here's code that handles both of those problems correctly:
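- (NSString *)stringByReversingSelf {
    // Mutable copy that we consume from the front.
    NSMutableString *me = [NSMutableString stringWithString:self];
    NSMutableString *result = [NSMutableString string];
    while ([me length]) {
        // Range of the whole composed character sequence at index 0
        // (base letter plus combining marks, or a full surrogate pair).
        NSRange range = [me rangeOfComposedCharacterSequenceAtIndex:0];
        // Prepend the sequence intact: sequences come out in reverse
        // order, but the unichars inside each one keep their order.
        [result insertString:[me substringWithRange:range] atIndex:0];
        [me deleteCharactersInRange:range];
    }
    return result;
}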
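The key is the use of -rangeOfComposedCharacterSequenceAtIndex:, which
returns the range of the entire composed character sequence at the
given index (a base character together with its combining marks, or
both halves of a surrogate pair). Without calling this method, or doing
the equivalent of what it does, your code will suffer from the problems
I described above.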
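To illustrate what -rangeOfComposedCharacterSequenceAtIndex: buys you, here is a C sketch of the same sequence-preserving reversal over raw 16-bit units. The helper sequence_length is a hypothetical and heavily simplified stand-in for that method. It only groups surrogate pairs and marks from the Combining Diacritical Marks block (U+0300 through U+036F). Real composed-character rules cover far more than this, so treat it as a demonstration of the idea, not a replacement for the Foundation call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately simplified stand-in for
   -rangeOfComposedCharacterSequenceAtIndex:. Returns how many units
   starting at i belong together: a high surrogate keeps the low one
   after it, and a base unit keeps any following U+0300..U+036F marks. */
static size_t sequence_length(const uint16_t *s, size_t len, size_t i) {
    size_t end = i + 1;
    if (s[i] >= 0xD800 && s[i] <= 0xDBFF && end < len &&
        s[end] >= 0xDC00 && s[end] <= 0xDFFF) {
        end++; /* keep both halves of a surrogate pair together */
    }
    while (end < len && s[end] >= 0x0300 && s[end] <= 0x036F) {
        end++; /* attach combining marks to the preceding base */
    }
    return end - i;
}

/* Reverse the order of sequences while keeping the units inside each
   sequence in their original order. out must hold len units. */
static void reverse_sequences(const uint16_t *s, size_t len, uint16_t *out) {
    assert(s != NULL && out != NULL);
    size_t i = 0;
    while (i < len) {
        size_t n = sequence_length(s, len, i);
        /* A sequence occupying [i, i+n) lands at [len-i-n, len-i). */
        for (size_t k = 0; k < n; k++) {
            out[len - i - n + k] = s[i + k];
        }
        i += n;
    }
}
```

Given the units of abcde\u0301f followed by U+1D11E and g, it produces g, the G clef pair (still 0xD834 0xDD1E), f, e, U+0301, d, c, b, a. Rather than prepending to a mutable string as the Objective-C version does, the sketch writes each sequence straight to its mirrored position in one forward pass. That is only a choice for this sketch; the grouping step is the part that matters.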
I tested that code with the string @"abcdéf𝄞g" (that's an e followed by
a combining acute accent before the f, and the aforementioned G clef
just before the final g) and it worked as expected.
I don't guarantee that my code will work on everything. Unicode is
weird enough, and covers enough unusual scripts, that there are probably
situations where this will still fail. But it covers most of the
tricky bits, and at the very least it will always produce valid Unicode output.
Mike
_______________________________________________
Cocoa-dev mailing list (email@hidden)