Re: Using Flex/Lex in a Cocoa project
- Subject: Re: Using Flex/Lex in a Cocoa project
- From: "mm w" <email@hidden>
- Date: Mon, 18 Aug 2008 14:06:43 -0700
If you knew flex, you would understand.
On Mon, Aug 18, 2008 at 1:55 PM, Ricky Sharp <email@hidden> wrote:
>
> On Aug 18, 2008, at 3:40 PM, mm w wrote:
>
>> to avoid the splitting problem
>>
>> (c < 128) ? "%c" : "\\ux", c);
>
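One possible reading of the fragment above, as a minimal sketch: it assumes that `c` is an already-decoded Unicode code point (not a raw UTF-8 byte) and that the expression was the format argument of a printf-style call. The helper name `escape_code_point` and the `\uXXXX` / `\UXXXXXXXX` output forms are assumptions made for illustration; the thread does not spell them out.

```c
#include <stdio.h>

/* Hypothetical helper: write one decoded code point into buf.
 * ASCII is copied as-is; anything else becomes a \uXXXX escape
 * (or \UXXXXXXXX above the Basic Multilingual Plane).
 * Returns whatever snprintf returns. */
int escape_code_point(char *buf, size_t size, unsigned long c)
{
    if (c < 128)
        return snprintf(buf, size, "%c", (int)c);
    if (c <= 0xFFFF)
        return snprintf(buf, size, "\\u%04lx", c);
    return snprintf(buf, size, "\\U%08lx", c);
}
```

This only sidesteps the splitting problem if the scanner has already assembled whole code points. Applied to individual UTF-8 bytes, it would escape each byte of a multi-byte character separately.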
> I'm not sure what this solves.
>
> Per Michael's e-mail below, this is indeed a difficult problem. UTF-8 is
> just one particular encoding for storing Unicode strings. Operating on
> individual bytes in such a stream will most likely not make sense, because
> a single character can span several bytes.
>
> What I would do is pick one normalized form and operate on that data. For
> a recent feature at my day job, we normalized all input CSV files to
> UTF-16BE. So far, that has handled all of our customer data. The final
> solution still isn't 100% Unicode-savvy (e.g. it craps out on surrogate
> pairs), but we have unit tests to expose and document such limitations.
> And customer data doesn't contain such things yet.
>
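To illustrate the surrogate-pair limitation mentioned above: in UTF-16, code points above U+FFFF are stored as two 16-bit units, so code that treats every unit as a complete character breaks on them. A minimal sketch of decoding them, assuming the input has already been converted to host-order 16-bit units (the function name `utf16_next` is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from UTF-16 code units starting at s[*i],
 * advancing *i past the units consumed. The caller must ensure
 * *i < len. Unpaired surrogates are returned as U+FFFD. */
uint32_t utf16_next(const uint16_t *s, size_t len, size_t *i)
{
    uint16_t u = s[(*i)++];

    if (u >= 0xD800 && u <= 0xDBFF) {            /* high surrogate */
        if (*i < len && s[*i] >= 0xDC00 && s[*i] <= 0xDFFF) {
            uint16_t lo = s[(*i)++];
            return 0x10000u + (((uint32_t)u - 0xD800u) << 10)
                            + ((uint32_t)lo - 0xDC00u);
        }
        return 0xFFFDu;                          /* missing low half */
    }
    if (u >= 0xDC00 && u <= 0xDFFF)
        return 0xFFFDu;                          /* stray low surrogate */
    return u;                                    /* ordinary BMP unit */
}
```

A unit test feeding the pair 0xD83D 0xDE00 through a decoder like this is one way to document whether a pipeline handles characters outside the Basic Multilingual Plane.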
>
>> On Sat, Aug 16, 2008 at 7:43 AM, Michael Ash <email@hidden>
>> wrote:
>>>
>>> - It's very difficult to split UTF-8 strings correctly. If you
>>> encounter a run of non-ASCII characters, ensure that you follow that
>>> run through the end, until you get back to ASCII. Don't have a regex
>>> that stops in the middle of it and then expects your code to be able
>>> to do something useful with it.
>>>
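The advice above amounts to never cutting inside a multi-byte sequence. A minimal sketch of one way to enforce that when a buffer has to be split at an arbitrary byte offset, assuming well-formed UTF-8 input (the name `utf8_safe_split` is hypothetical): continuation bytes always have the bit pattern 10xxxxxx, so the split point can be moved back until it no longer lands on one.

```c
#include <stddef.h>

/* Return the largest offset <= pos at which buf can be cut without
 * separating a lead byte from its continuation bytes (10xxxxxx).
 * Assumes buf holds well-formed UTF-8 and pos <= len. */
size_t utf8_safe_split(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos > 0 && pos < len && (buf[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}
```

This only guarantees that each code point stays whole. Keeping an entire run of non-ASCII characters together, as suggested above, is a stricter condition than this sketch checks.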
>
> ___________________________________________________________
> Ricky A. Sharp mailto:email@hidden
> Instant Interactive(tm) http://www.instantinteractive.com
>
>
>
>
--
-mmw
_______________________________________________
Cocoa-dev mailing list (email@hidden)