Re: Swift: How to determine if a Character represents whitespace?
- Subject: Re: Swift: How to determine if a Character represents whitespace?
- From: Charles Jenkins <email@hidden>
- Date: Thu, 02 Apr 2015 07:54:38 -0400
I kept my original question as brief as I could, but let me tell you what problem I’m trying to solve, and maybe someone will have good advice I haven’t yet considered.
I’m trying to code in pure Swift. I have an NSAttributedString which can potentially be very large, and I want to save off the attributedSubstringFromRange: which represents the string with leading and trailing whitespace trimmed. I’m trying to avoid copying the giant string merely to determine the proper substring range for copying it again.
Foundation gives String a func stringByTrimmingCharactersInSet(set: NSCharacterSet) -> String (bridged from NSString), but it won’t help me, because using it would copy the string and discard the attributes. Even using it for length-testing wouldn’t work, because I have no way to know how many characters were trimmed off the head versus the tail of the string.
What would be nice is a way to count leading and trailing whitespace characters in place while the thing is still an NSAttributedString, without using NSAttributedString.string to convert to a Swift string in the first place. If there were no conversion to the Unicode-compliant and amazingly difficult-to-do-anything-with-it Swift string, I’d be more confident that the shrunken range I calculate would be apples to apples.
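One way to get that range without trimming a copy is to search for the first and last characters that are not in the set, using NSString's own UTF-16 indexing, so the result is in the same units attributedSubstring(from:) expects. What follows is a minimal sketch under stated assumptions: it is written against current Swift/Foundation names (CharacterSet, rangeOfCharacter(from:options:)) rather than the 2015-era ones, trimmedRange(of:trimming:) is a hypothetical helper, and it still reads the `string` property on the assumption that bridging it to NSString does not copy the text, which is worth measuring rather than trusting.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): returns the range of `text`
// with leading and trailing members of `set` excluded, or nil when the
// text is empty or consists only of members of `set`.
func trimmedRange(of text: NSAttributedString,
                  trimming set: CharacterSet = .whitespacesAndNewlines) -> NSRange? {
    // Assumption: this bridge exposes the backing store rather than copying it.
    let ns = text.string as NSString
    let keep = set.inverted

    // First character that should be kept, searching forwards.
    let first = ns.rangeOfCharacter(from: keep)
    guard first.location != NSNotFound else { return nil }

    // Last character that should be kept, searching backwards.
    let last = ns.rangeOfCharacter(from: keep, options: .backwards)

    // Add the found range's length so a character stored as a surrogate
    // pair (length 2) is included whole.
    let end = last.location + last.length
    return NSRange(location: first.location, length: end - first.location)
}

let sample = NSAttributedString(string: "  \n Hello, world \t\n")
if let range = trimmedRange(of: sample) {
    // Attributes inside the range are carried over by attributedSubstring.
    let trimmed = sample.attributedSubstring(from: range)
    print(trimmed.string)
}
```

Both searches work directly in NSString indices, so the head and tail counts fall out of the returned range (its location, and the total length minus its end) without ever building the trimmed string.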
--
Charles
On April 2, 2015 at 01:25:40, Quincey Morris (email@hidden) wrote:
On Apr 1, 2015, at 21:17 , Charles Jenkins <email@hidden> wrote:
    for ch in String(char).utf16 {
        if !set.characterIsMember(ch) { found = false }
    }
Except that this code can’t possibly be right, in general.
1. A ‘unichar’ is a UTF-16 code value, but it’s not a Unicode code point. Some UTF-16 code values have no meaning as “characters” by themselves. I think you could mitigate this problem by using ‘longCharacterIsMember’, which takes a UTF-32 code value instead (and enumerating the string as UTF-32 instead of UTF-16).
2. A Swift ‘Character’ isn’t a Unicode code point, but rather a grapheme. That is, it might be a sequence of code points (and I mean code points, not code values). It might be such a sequence either because there’s no way of representing the grapheme by a single code point, or because it’s a composed character made up of a base code point and some combining characters.
In this case, you can’t validly test the individual code points for membership of the character set.
I’m not sure, but I suspect the underlying obstacle is that NSCharacterSet is at best a set of code points, and you cannot test a grapheme for membership of a set of code points.
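The three levels described above (UTF-16 code values, code points, graphemes) can be seen by counting the same text through Swift's different string views. This is only an illustration in current Swift syntax; the two sample strings are my own choice, not from the thread.

```swift
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair for it: two code values, neither meaningful alone.
let emoji = "\u{1F600}"
print(emoji.utf16.count, emoji.unicodeScalars.count, emoji.count)

// "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points
// but a single Swift Character (one grapheme).
let accented = "e\u{301}"
print(accented.utf16.count, accented.unicodeScalars.count, accented.count)
```

In the first case a per-unichar test sees two halves of one code point; in the second, even a per-code-point test sees two things where Swift sees one Character.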
In your particular application, if it’s true that all** Unicode whitespace characters are represented as a single code point (via a single UTF-32 code value), or a single UTF-16 code value, then you can get away with one of the above solutions. Otherwise you’re going to need a more complex solution that doesn’t involve NSCharacterSet at all.
** Or at least the ones you happen to care about, but ignoring the others may be a perilous proceeding.
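If the single-code-point assumption is acceptable, one conservative reading of it is to call a Character whitespace only when every code point inside it is in the set. The sketch below takes that approach in current Swift, where CharacterSet.contains(_:) accepts a Unicode.Scalar and so plays the role suggested for longCharacterIsMember above. isWhitespace(_:in:) is a hypothetical helper, and the "every scalar must match" rule is an assumption about what you want, not something Unicode prescribes.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): true only if every Unicode
// scalar making up the Character belongs to `set`.
func isWhitespace(_ ch: Character,
                  in set: CharacterSet = .whitespacesAndNewlines) -> Bool {
    // Enumerate code points rather than UTF-16 code values, so surrogate
    // halves are never tested on their own.
    return String(ch).unicodeScalars.allSatisfy { set.contains($0) }
}

let space: Character = " "
let crlf: Character = "\r\n"              // CR + LF form one Character
let letter: Character = "a"
let markedSpace = Character(" \u{301}")   // space plus a combining accent

print(isWhitespace(space), isWhitespace(crlf),
      isWhitespace(letter), isWhitespace(markedSpace))
```

The last case is where the choice matters: a space carrying a combining mark is one grapheme whose base is whitespace, and this sketch treats it as not whitespace. Whether that is the right answer depends on the application.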