Re: Using Flex/Lex in a Cocoa project
- Subject: Re: Using Flex/Lex in a Cocoa project
- From: Ricky Sharp <email@hidden>
- Date: Mon, 18 Aug 2008 15:55:46 -0500
On Aug 18, 2008, at 3:40 PM, mm w wrote:
to avoid the splitting problem
(c < 128) ? "%c" : "\\ux", c);
I'm not sure what this solves.
Per Michael's e-mail below, this is indeed a difficult problem. UTF-8
is just one encoding scheme for storing Unicode strings. Operating on
individual bytes in such a stream will most likely not make sense,
because a single character can span several bytes.
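
To illustrate why byte-level operations go wrong, here is a minimal
sketch in C. It assumes well-formed UTF-8 input; the function names
utf8_is_continuation and utf8_safe_split are hypothetical and are not
taken from this thread. It moves a proposed split offset back so that
it never lands inside a multi-byte sequence. Note that it only respects
code point boundaries, not combining sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: given a desired split offset `want` into a
 * UTF-8 buffer of `len` bytes, back up until the offset sits on the
 * first byte of a code point. Assumes the buffer is well-formed UTF-8.
 */
size_t utf8_safe_split(const unsigned char *buf, size_t len, size_t want)
{
    assert(buf != NULL);

    if (want >= len)
        return len;   /* splitting at or past the end is always safe */

    /* Continuation bytes belong to the code point that started earlier. */
    while (want > 0 && utf8_is_continuation(buf[want]))
        want--;

    return want;
}
```

For the bytes 'a', 0xC3, 0xA9, 'z' (the string "a" + U+00E9 + "z"), a
requested split at offset 2 would cut U+00E9 in half, so the helper
moves it back to offset 1.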
What I would do is pick a single encoding, convert all input to it, and
operate on that data. For a recent feature at my day job, we converted
all input CSV files to UTF-16BE. We have been able to handle all of our
customer data so far. The final solution still isn't 100% Unicode-savvy
(e.g. it fails on surrogate pairs), but we have unit tests to expose
and document such limitations. And customer data doesn't yet contain
such things.
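
For reference, surrogate pairs in UTF-16BE can be handled along these
lines. This is only a sketch under stated assumptions, not the code
from the project described above; the function name utf16be_decode and
its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one code point from big-endian UTF-16.
 * Returns the number of bytes consumed (2 or 4), or 0 if the input is
 * truncated or contains an unpaired surrogate.
 */
size_t utf16be_decode(const unsigned char *p, size_t len, uint32_t *out)
{
    assert(p != NULL && out != NULL);

    if (len < 2)
        return 0;

    uint32_t hi = ((uint32_t)p[0] << 8) | p[1];

    /* Anything outside D800..DFFF is a complete code point by itself. */
    if (hi < 0xD800 || hi > 0xDFFF) {
        *out = hi;
        return 2;
    }

    /* A low surrogate cannot come first, and a pair needs 4 bytes. */
    if (hi > 0xDBFF || len < 4)
        return 0;

    uint32_t lo = ((uint32_t)p[2] << 8) | p[3];
    if (lo < 0xDC00 || lo > 0xDFFF)
        return 0;

    /* Combine the two 10-bit halves and add the 0x10000 offset. */
    *out = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    return 4;
}
```

With this convention, the bytes D8 34 DD 1E decode to U+1D11E and
consume 4 bytes, while 00 41 decodes to U+0041 and consumes 2.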
On Sat, Aug 16, 2008 at 7:43 AM, Michael Ash <email@hidden>
wrote:
- It's very difficult to split UTF-8 strings correctly. If you
encounter a run of non-ASCII characters, ensure that you follow that
run through the end, until you get back to ASCII. Don't have a regex
that stops in the middle of it and then expects your code to be able
to do something useful with it.
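
The advice above, to follow a run of non-ASCII bytes through to its
end, might look roughly like this outside of a regex. This is a sketch
only; skip_non_ascii_run is a hypothetical name, and the code treats
every byte >= 0x80 as part of the run without validating the UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: starting at offset `start`, return the offset
 * just past the run of non-ASCII bytes (values >= 0x80). If the byte
 * at `start` is ASCII, `start` is returned unchanged.
 */
size_t skip_non_ascii_run(const unsigned char *buf, size_t len, size_t start)
{
    assert(buf != NULL);

    size_t i = start;
    while (i < len && buf[i] >= 0x80)
        i++;   /* stay inside the run until ASCII (or the end) appears */

    return i;
}
```

The caller can then hand the whole span from start up to the returned
offset to a real UTF-8 decoder, rather than stopping in the middle of
a character.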
___________________________________________________________
Ricky A. Sharp mailto:email@hidden
Instant Interactive(tm) http://www.instantinteractive.com