Re: Most efficient character parsing.


Subject: Re: Most efficient character parsing.
From: Martin Fled <email@hidden>
Date: Thu, 16 Feb 2006 05:51:41 -0800 (PST)

"Jerry W. Walker" <email@hidden> wrote:

Hi, Chuck,

I'm having trouble parsing this last message. Can you explain?

Thanks,
Jerry

On Feb 15, 2006, at 6:22 PM, Chuck Hill wrote:

> And use an array of int (or boolean?) pre-populated with 1/0 at
> the index of the relevant character so you can do:
>
> if (allowable[bees[i]]) /*add character */
>
> instead of multiple if comparisons.
>
> And run tests to see if this is really faster.
>
> Chuck
>
>

I don't think it could be much more efficient, though it may be more elegant here. Each character still needs another memory lookup, so it probably runs the same as or a little faster than the comparisons (in my tests the gain was 10% at most).
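For reference, here is a minimal sketch of the lookup-table idea, using the character ranges from Eric's original question. The class and method names (AllowedTable, isAllowed, strip) are made up for illustration and are not from anyone's actual code in this thread.

```java
// Sketch only: AllowedTable, isAllowed and strip are hypothetical names.
final class AllowedTable {
    // One flag per char value 0..255; true means "keep this character".
    private static final boolean[] ALLOWED = new boolean[256];

    static {
        ALLOWED[9] = true;   // tab
        ALLOWED[10] = true;  // line feed
        ALLOWED[13] = true;  // carriage return
        for (int c = 32; c <= 126; c++) ALLOWED[c] = true;
        for (int c = 160; c <= 255; c++) ALLOWED[c] = true;
    }

    // A Java char can go up to 65535, so guard the index before the lookup.
    static boolean isAllowed(char c) {
        return c < ALLOWED.length && ALLOWED[c];
    }

    // Copies only the allowed characters into a new String.
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAllowed(c)) out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // BEL (7) and the euro sign (0x20AC) are outside the allowed ranges.
        System.out.println(strip("caf\u00e9\u0007 \u20ac10"));
    }
}
```

Note that indexing the table with a byte, as in the quoted snippet, would need masking with & 0xFF first, because Java bytes are signed and values above 127 would otherwise become negative indexes.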

> On Feb 15, 2006, at 2:40 PM, Arturo Pérez wrote:
>
>> Not really a WO question but here's what I'd do.
>>
>> byte [] bees = s.getBytes();
>> for (int i = 0; ....)
>>
>> I've done that in the past with great success.
>>
>> -arturo
>

You will have more success if you use toCharArray rather than getBytes. First, the getBytes(int, int, byte[], int) overload is deprecated, and the remaining overloads have to run every character through a charset encoder; second, getBytes is highly inefficient compared to toCharArray. A String internally holds a 'value' attribute of type char[], and it takes less time to clone that array than to create an array of a different type (in my tests, toCharArray is 4x faster on average). Java is not a good character cruncher at all, and other, more specialized languages may well perform better here. The main point is that Strings are immutable, so you have to clone the internal value, change the clone, and create a new String from it (this is not a problem when you work with raw char[] throughout). It is possible to make Strings mutable using reflection, like this:

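import java.lang.reflect.Field;
// Look up String's private 'value' field and bypass the access check.
final Field valueField = String.class.getDeclaredField("value");
valueField.setAccessible(true);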

and then use it like:

char[] bees = (char[]) valueField.get(s);

After the changes you can save s somewhere, but be aware that the gains will not be big (about 15-20%: you avoid creating a new String instance or a char[] clone, and cloning a char[] is very fast anyway), and there will be side effects such as a wrong cached hash. If you are only changing non-ASCII values to other values, this is not a big problem as long as you perform no other operations on the string before saving it. It does become a problem when you are throwing those values out, because the content of the array gets shorter and you must then also set another private field (count). Using reflection for such problems is tricky, and I don't like it much.
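By way of comparison, the non-reflective route described above (clone with toCharArray, change the copy, create a new String) might look like the sketch below. It uses plain range comparisons and compacts the copy in place. The names CharArrayStripper, keep and strip are assumptions for illustration only, and the ranges are the ones Eric listed.

```java
// Sketch only: CharArrayStripper, keep and strip are hypothetical names.
final class CharArrayStripper {
    // Range comparisons for decimal 9, 10, 13, 32-126 and 160-255.
    static boolean keep(char c) {
        return c == 9 || c == 10 || c == 13
                || (c >= 32 && c <= 126)
                || (c >= 160 && c <= 255);
    }

    static String strip(String s) {
        char[] chars = s.toCharArray(); // private copy; s itself is untouched
        int kept = 0;                   // number of characters retained so far
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (keep(c)) chars[kept++] = c; // slide kept chars to the front
        }
        // If nothing was dropped, return the original and skip the allocation.
        return kept == chars.length ? s : new String(chars, 0, kept);
    }
}
```

Because the copy is private, there is no stale hash or count field to worry about; the price is the one clone plus, when something was removed, one new String.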

>> On Feb 15, 2006, at 4:43 PM, Eric Stewart wrote:
>>
>>> I've got a WOApp that needs to deal with 200 strings of
>>> approximately
>>> 250-500 characters per string.
>>>
>>> I need to strip out all characters that are not ISO-8859-1 legal
>>> characters. So basically any character that is not decimal ascii
>>> character 9, 10, 13, 32-126, 160-255 need to be removed from the
>>> string. This process is happening roughly 2.5 million times a day
>>> and
>>> I'm trying to figure out what is the most efficient way to do it.
>>>
>>> Right now I'm tearing the strings apart character-by-character and
>>> checking its ASCII decimal value against the values I know are
>>> good.
>>> Is there a more efficient way to do it?
>>>
>>> Thanks everyone,
>>>
>>> - Eric
>>> email@hidden

--

Martin Fleder
