Re: Most efficient character parsing.

  • Subject: Re: Most efficient character parsing.
  • From: Anjo Krank <email@hidden>
  • Date: Mon, 20 Feb 2006 17:39:33 +0100


On 20.02.2006 at 16:58, Eric Stewart wrote:

On 2/20/06, Anjo Krank <email@hidden> wrote:
Uh, why don't you pre-compile the regex, pre-initialize the array
(next time probably with a loop?) and measure more iterations? After
all, a "0" millisecond result would suggest that the time needed is
not actually measurable, while times larger could come from - say -
the class loader, the garbage collector or whatever else? Or is this
an actual example of how you will call up your character conversion
routine?

The only point of the test was to see which was faster. I wanted to know which initialized faster. I also wanted to know which was faster on call after call once initialized.

So you need to test both separately (init and run). The way you're doing it now, the init always runs. And you need to test with more iterations, and with longer data. And you can forget about the sb.append() stuff as an optimization (if it is meant as such; there are a lot of references on the web about it). And your timer is probably much too coarse; you should use microseconds. These puny string comparisons shouldn't be *near* a millisecond with your data. Change your DA to run a loop of - say - 10k iterations and output the time for that.
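
A minimal sketch of the measurement approach described above, assuming a modern JVM (lambdas were not available in 2006): setup is timed once on its own, and the routine under test is timed over many iterations with System.nanoTime() after a warm-up pass. MicroTimer, onceNanos and averageNanos are hypothetical names, and the lambda in main is only a stand-in for the real conversion routine.

```java
import java.util.function.UnaryOperator;

// Hypothetical timing helper; none of these names come from the original post.
final class MicroTimer {

    // Times a one-off setup step (e.g. building a table or compiling a pattern).
    static long onceNanos(Runnable setup) {
        long start = System.nanoTime();
        setup.run();
        return System.nanoTime() - start;
    }

    // Average nanoseconds per call of op over the given number of iterations.
    static long averageNanos(UnaryOperator<String> op, String input, int iterations) {
        long sink = 0; // accumulate result lengths so the work has an observable effect
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += op.apply(input).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("length sum overflowed");
        }
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        String input = "fj fjau" + (char) 1000 + "fafj daf ds";
        // Stand-in for the routine under test (assumption, not the posted code).
        UnaryOperator<String> candidate = s -> s.replace((char) 1000, ' ');
        averageNanos(candidate, input, 10000); // warm-up pass, result ignored
        System.out.println("ns/call: " + averageNanos(candidate, input, 10000));
    }
}
```

The numbers from such a loop are still only indicative; class loading, JIT compilation and garbage collection can all show up in individual runs.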


Oh, while you're at it, you could pre-init your stringbuffer with the
length of your result string, so it doesn't get re-allocated every
few calls, pull chararray[i] up into a local variable, and add an
explicit if(c == \n...) check.

Both of these are very good points. They should speed up the boolean array solution even more. Not sure what the "if(c == \n...)" is in reference to.

"\n". Use an explicit character comparison.
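
One possible reading of these suggestions, sketched in Java. It assumes the input length is an acceptable upper bound for the buffer size (the result is never longer than the input), and that the explicit comparison is meant as a cheap check ahead of the table lookup. StripSketch and the valid parameter are hypothetical names.

```java
// Hypothetical sketch; not the class from the original post.
final class StripSketch {

    // The lookup table is passed in so the sketch does not depend on how it is built.
    static String strip(String value, boolean[] valid) {
        char[] chars = value.toCharArray();
        int length = chars.length;
        // Pre-sized buffer: the output can never be longer than the input.
        StringBuffer buffer = new StringBuffer(length);
        for (int i = 0; i < length; i++) {
            char c = chars[i]; // read the array element once into a local
            // Explicit character comparison first, then the table lookup.
            if (c == '\n' || (c < valid.length && valid[c])) {
                buffer.append(c);
            }
        }
        return buffer.toString();
    }
}
```

Whether any of this is measurable depends on the data; it only matters once the timing itself is trustworthy.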

After all, your test data is so short, it probably takes three times
longer to pre-init the array and compile the pattern than it takes
to look at each char.

The boolean array is a class level property and not a method level property so it should only be initialized once.

But it gets re-created every time you create an ISOLatin1CharacterUtilityArrayMap, which is with every string you test. So nothing gets cached.
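
A sketch of one way to make the table genuinely shared: a static array filled once in a static initializer, using loops over the same ranges the posted constructor sets by hand (9, 10, 13, 32-126, 160-255). Latin1Table is a hypothetical name, not the class from the post.

```java
// Hypothetical sketch; the ranges mirror the assignments in the posted constructor.
final class Latin1Table {

    // Built once when the class is loaded, shared by every caller.
    private static final boolean[] VALID = new boolean[256];

    static {
        VALID[9] = true;   // tab
        VALID[10] = true;  // line feed
        VALID[13] = true;  // carriage return
        for (int i = 32; i <= 126; i++) {
            VALID[i] = true; // printable ASCII
        }
        for (int i = 160; i <= 255; i++) {
            VALID[i] = true; // upper ISO-8859-1 range
        }
    }

    static boolean isCharValid(char c) {
        return c < 256 && VALID[c];
    }
}
```

With this layout, creating a utility object per request no longer costs 194 assignments each time.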


But I really should
store the compiled regex as a class level property and not method
property as well. That is a glaring mistake and I will fix that and
run the test again.
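
A minimal sketch of that fix, assuming the same expression as the posted regex class: the Pattern is compiled once into a static final field and reused by every call. Latin1RegexStripper is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a class-level, precompiled pattern.
final class Latin1RegexStripper {

    // Compiled once at class load; Pattern instances are immutable and thread-safe.
    private static final Pattern INVALID_CHARS =
            Pattern.compile("[^\\x09\\x0A\\x0D\\x20-\\x7E\\xA0-\\xFF]+");

    // Removes every run of characters outside the allowed ISO-8859-1 set.
    static String stripInvalidCharsFromString(String value) {
        return INVALID_CHARS.matcher(value).replaceAll("");
    }
}
```

Only the Matcher is created per call, which takes the pattern compilation out of the per-string cost.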

Heed the advice given previously: don't care about this stuff until
you actually see that you have a problem...

This is a problem, or I wouldn't have asked about it and I wouldn't have taken several hours to write two different solutions, then write a test, and then share the results. I have one application that currently takes 3 XServes to handle the total load, and I will be purchasing a fourth XServe in the next few days. So I'm re-examining all the code looking for ways to save time. This particular operation happens roughly 600 million times a day, so I'm looking to optimize it as best I can.

Thank you for pointing those things out, I'm still fairly new to Java
and I'm still learning the most efficient ways to get things done.

Cheers, Anjo

On 19.02.2006 at 23:45, Eric Stewart wrote:

Okay, I took the advice on the board and chose to write a boolean
array and regex solution and tested the two head-to-head. The bottom
of the email contains both solution classes and the testing class (So
you can see exactly what I did).


So here's a brief description of the testing process. The test file
has 5 separate strings. Each string is run through the character
checker and the time taken to run is recorded in milliseconds. For
each round I built the application and started it, then ran either the
boolean array or regex test three times in succession. Then I stopped the
application, rebuilt it, restarted it and ran the other test. 10
rounds were run.


Here is how it went.

Round 1.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 4,0,0,1,3,0,0,0,0,0,0,0,0,0,1
Total Time: 9

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,4,1,6,1,1,1,0,3,1,0,0,0,1,1
Total Time: 39


Round 2.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 5,0,1,0,3,0,1,0,0,0,0,0,0,0,0
Total Time: 10

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 18,5,2,3,0,0,1,0,2,1,1,1,0,1,0
Total Time: 35


Round 3.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 4,1,0,0,4,0,1,0,0,0,0,0,0,0,0
Total Time: 10

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,7,2,3,2,0,0,1,3,0,1,0,1,0,1
Total Time: 40


Round 4.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 4,1,0,0,4,0,0,0,0,0,0,0,0,0,0
Total Time: 9

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 20,4,1,6,1,0,1,0,3,0,1,1,0,1,0
Total Time: 39


Round 5.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 4,0,0,0,3,0,0,0,0,0,0,0,0,0,0
Total Time: 7

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,4,1,5,0,0,0,1,2,0,0,0,0,1,0
Total Time: 33


Round 6.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 5,0,0,1,3,0,0,0,0,0,0,0,0,0,1
Total Time: 10

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 36,5,2,4,0,0,0,1,2,1,1,1,0,1,0
Total Time: 54


Round 7.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 4,0,0,1,3,0,0,0,0,1,0,0,1,1,0
Total Time: 11

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,4,1,4,0,1,0,0,2,1,1,1,0,0,1
Total Time: 35


Round 8.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 5,1,0,0,4,0,0,0,0,0,0,0,0,0,0
Total Time: 10

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 21,5,1,4,1,1,0,0,3,0,1,1,0,1,0
Total Time: 39


Round 9.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 5,0,0,0,3,0,0,0,0,0,0,1,0,0,0
Total Time: 9

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 18,4,2,5,1,0,0,1,2,0,0,0,0,0,0
Total Time: 34


Round 10.

Built & ran application. Invoked boolean array direct action 3
times in a row.
Results in milliseconds: 4,0,0,0,4,0,0,0,0,0,0,0,0,0,0
Total Time: 8

Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 37,4,1,4,1,1,1,0,3,1,0,0,1,0,1
Total Time: 55


Summary: The first time in each test was obviously higher because it
was the first time the solution object was instantiated. What was
interesting was that even though I was building a boolean array with
194 elements, it was still faster to initialize than the regex
solution, and by quite a bit at that.

The boolean array solution was faster to initialize and ran faster
overall, both the first time it ran and over the majority of
subsequent runs.
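
As a rough tally, the per-round "Total Time" figures reported above can be summed as in the sketch below; they come to 93 ms for the boolean array and 403 ms for the regex across the 10 rounds. RoundTotals is a hypothetical name, the arrays simply copy the totals as posted, and with a millisecond timer the figures are only coarse indications.

```java
// Hypothetical tally of the "Total Time" values reported for rounds 1-10.
final class RoundTotals {

    static final int[] ARRAY_MS = {9, 10, 10, 9, 7, 10, 11, 10, 9, 8};
    static final int[] REGEX_MS = {39, 35, 40, 39, 33, 54, 35, 39, 34, 55};

    static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int arrayTotal = sum(ARRAY_MS);
        int regexTotal = sum(REGEX_MS);
        System.out.println("boolean array total ms: " + arrayTotal);
        System.out.println("regex total ms: " + regexTotal);
        System.out.println("regex/array ratio: " + (double) regexTotal / arrayTotal);
    }
}
```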

I hope this helps someone else.

Here are the java files I used.

DirectAction.java
-------------------------
//
// DirectAction.java
// Project Norway
//
// Created by ericstewart on 2/15/06
//

import com.webobjects.foundation.*;
import com.webobjects.appserver.*;
import com.webobjects.eocontrol.*;
import java.util.*;

public class DirectAction extends WODirectAction {

    public DirectAction(WORequest aRequest) {
        super(aRequest);
    }

    public WOActionResults defaultAction() {
        return pageWithName("Main");
    }

public WOActionResults charSpeedArrayMapAction() {
// build test string
StringBuffer testString = new StringBuffer ("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
testString.append((char)1000);
testString.append("fd0 f023 fkdls anflrwjap fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
testString.append((char)10000);
testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
testString.append((char)100000);
testString.append("o489jnrbnv8m 5tjvb6fci9 uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk f78 fm,f juy fiker fdmf");
testString.append((char)1000000);
testString.append("irmvn 984mn juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
testString.append((char)10000000);


// Strip illegal characters.
NSTimestamp time1 = new NSTimestamp();
ISOLatin1CharacterUtilityArrayMap charUtility = new
ISOLatin1CharacterUtilityArrayMap();
String resultString =
charUtility.stripInvalidCharsFromString(testString.toString());
NSTimestamp time2 = new NSTimestamp();
GregorianCalendar startCal = new GregorianCalendar();
GregorianCalendar endCal = new GregorianCalendar();
long diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Array map time to parse string: "+diffMillis);


testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
testString.append((char)1000);
testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
testString.append((char)10000);
testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
testString.append((char)100000);
testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
testString.append((char)1000000);
testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
testString.append((char)10000000);


time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Array map time to parse string: "+diffMillis);


              testString = new StringBuffer("fj fjau");
              testString.append((char)1000);
              testString.append("fafj daf ds");
              testString.append((char)10000);
              testString.append("fjw csl jw ");
              testString.append((char)100000);
              testString.append("M)_CQ)");
              testString.append((char)1000000);
              testString.append("K(@*NE");
              testString.append((char)10000000);

time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Array map time to parse string: "+diffMillis);


testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
testString.append((char)1000);
testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
testString.append((char)10000);
testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
testString.append((char)100000);
testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
testString.append((char)1000000);
testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
testString.append((char)10000000);


time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Array map time to parse string: "+diffMillis);


testString = new StringBuffer("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
testString.append((char)1000);
testString.append("fd0 f023 fkdls anflrwjap fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
testString.append((char)10000);
testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
testString.append((char)100000);
testString.append("o489jnrbnv8m 5tjvb6fci9 uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk f78 fm,f juy fiker fdmf");
testString.append((char)1000000);
testString.append("irmvn 984mn juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
testString.append((char)10000000);


time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Array map time to parse string: "+diffMillis);


              Main page = (Main)pageWithName("Main");
              page.setVersion(resultString);

              return page;
      }

public WOActionResults charSpeedRegexAction() {
// build test string
StringBuffer testString = new StringBuffer ("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
testString.append((char)1000);
testString.append("fd0 f023 fkdls anflrwjap fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
testString.append((char)10000);
testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
testString.append((char)100000);
testString.append("o489jnrbnv8m 5tjvb6fci9 uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk f78 fm,f juy fiker fdmf");
testString.append((char)1000000);
testString.append("irmvn 984mn juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
testString.append((char)10000000);


// Strip illegal characters.
NSTimestamp time1 = new NSTimestamp();
ISOLatin1CharacterUtilityRegex charUtility = new
ISOLatin1CharacterUtilityRegex();
String resultString =
charUtility.stripInvalidCharsFromString(testString.toString());
NSTimestamp time2 = new NSTimestamp();
GregorianCalendar startCal = new GregorianCalendar();
GregorianCalendar endCal = new GregorianCalendar();
long diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Regex time to parse string: "+diffMillis);


testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
testString.append((char)1000);
testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
testString.append((char)10000);
testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
testString.append((char)100000);
testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
testString.append((char)1000000);
testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
testString.append((char)10000000);


time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Regex time to parse string: "+diffMillis);


              testString = new StringBuffer("fj fjau");
              testString.append((char)1000);
              testString.append("fafj daf ds");
              testString.append((char)10000);
              testString.append("fjw csl jw ");
              testString.append((char)100000);
              testString.append("M)_CQ)");
              testString.append((char)1000000);
              testString.append("K(@*NE");
              testString.append((char)10000000);

time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Regex time to parse string: "+diffMillis);


testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
testString.append((char)1000);
testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
testString.append((char)10000);
testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
testString.append((char)100000);
testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
testString.append((char)1000000);
testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
testString.append((char)10000000);


time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Regex time to parse string: "+diffMillis);


testString = new StringBuffer("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
testString.append((char)1000);
testString.append("fd0 f023 fkdls anflrwjap fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
testString.append((char)10000);
testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
testString.append((char)100000);
testString.append("o489jnrbnv8m 5tjvb6fci9 uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk f78 fm,f juy fiker fdmf");
testString.append((char)1000000);
testString.append("irmvn 984mn juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
testString.append((char)10000000);


time1 = new NSTimestamp();
resultString = charUtility.stripInvalidCharsFromString
(testString.toString());
time2 = new NSTimestamp();
diffMillis = 0;
startCal.setTime(time1);
endCal.setTime(time2);
diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
NSLog.debug.appendln("Regex time to parse string: "+diffMillis);



              Main page = (Main)pageWithName("Main");
              page.setVersion(resultString);

              return page;
      }

}


ISOLatin1CharacterUtilityArrayMap.java
----------------------------------------------------------
//
//  ISOLatin1CharacterUtilityArrayMap.java
//  Norway
//
//  Created by Eric Stewart on 2/19/06.
//  Copyright 2006 __MyCompanyName__. All rights reserved.
//

public class ISOLatin1CharacterUtilityArrayMap {
      private boolean[] charMap = new boolean[256];

public ISOLatin1CharacterUtilityArrayMap() {
// Initialize ISO-8859-1 character map.
charMap[9] = true; charMap[10] = true; charMap [13] = true;
charMap[32] = true;
charMap[33] = true; charMap[34] = true; charMap [35] = true;
charMap[36] = true;
charMap[37] = true; charMap[38] = true; charMap [39] = true;
charMap[40] = true;
charMap[41] = true; charMap[42] = true; charMap [43] = true;
charMap[44] = true;
charMap[45] = true; charMap[46] = true; charMap [47] = true;
charMap[48] = true;
charMap[49] = true; charMap[50] = true; charMap [51] = true;
charMap[52] = true;
charMap[53] = true; charMap[54] = true; charMap [55] = true;
charMap[56] = true;
charMap[57] = true; charMap[58] = true; charMap [59] = true;
charMap[60] = true;
charMap[61] = true; charMap[62] = true; charMap [63] = true;
charMap[64] = true;
charMap[65] = true; charMap[66] = true; charMap [67] = true;
charMap[68] = true;
charMap[69] = true; charMap[70] = true; charMap [71] = true;
charMap[72] = true;
charMap[73] = true; charMap[74] = true; charMap [75] = true;
charMap[76] = true;
charMap[77] = true; charMap[78] = true; charMap [79] = true;
charMap[80] = true;
charMap[81] = true; charMap[82] = true; charMap [83] = true;
charMap[84] = true;
charMap[85] = true; charMap[86] = true; charMap [87] = true;
charMap[88] = true;
charMap[89] = true; charMap[90] = true; charMap [91] = true;
charMap[92] = true;
charMap[93] = true; charMap[94] = true; charMap [95] = true;
charMap[96] = true;
charMap[97] = true; charMap[98] = true; charMap [99] = true;
charMap[100] = true;
charMap[101] = true; charMap[102] = true; charMap [103] = true;
charMap[104] = true;
charMap[105] = true; charMap[106] = true; charMap [107] = true;
charMap[108] = true;
charMap[109] = true; charMap[110] = true; charMap [111] = true;
charMap[112] = true;
charMap[113] = true; charMap[114] = true; charMap [115] = true;
charMap[116] = true;
charMap[117] = true; charMap[118] = true; charMap [119] = true;
charMap[120] = true;
charMap[121] = true; charMap[122] = true; charMap [123] = true;
charMap[124] = true;
charMap[125] = true; charMap[126] = true; charMap [160] = true;
charMap[161] = true;
charMap[162] = true; charMap[163] = true; charMap [164] = true;
charMap[165] = true;
charMap[166] = true; charMap[167] = true; charMap [168] = true;
charMap[169] = true;
charMap[170] = true; charMap[171] = true; charMap [172] = true;
charMap[173] = true;
charMap[174] = true; charMap[175] = true; charMap [176] = true;
charMap[177] = true;
charMap[178] = true; charMap[179] = true; charMap [180] = true;
charMap[181] = true;
charMap[182] = true; charMap[183] = true; charMap [184] = true;
charMap[185] = true;
charMap[186] = true; charMap[187] = true; charMap [188] = true;
charMap[189] = true;
charMap[190] = true; charMap[191] = true; charMap [192] = true;
charMap[193] = true;
charMap[194] = true; charMap[195] = true; charMap [196] = true;
charMap[197] = true;
charMap[198] = true; charMap[199] = true; charMap [200] = true;
charMap[201] = true;
charMap[202] = true; charMap[203] = true; charMap [204] = true;
charMap[205] = true;
charMap[206] = true; charMap[207] = true; charMap [208] = true;
charMap[209] = true;
charMap[210] = true; charMap[211] = true; charMap [212] = true;
charMap[213] = true;
charMap[214] = true; charMap[215] = true; charMap [216] = true;
charMap[217] = true;
charMap[218] = true; charMap[219] = true; charMap [220] = true;
charMap[221] = true;
charMap[222] = true; charMap[223] = true; charMap [224] = true;
charMap[225] = true;
charMap[226] = true; charMap[227] = true; charMap [228] = true;
charMap[229] = true;
charMap[230] = true; charMap[231] = true; charMap [232] = true;
charMap[233] = true;
charMap[234] = true; charMap[235] = true; charMap [236] = true;
charMap[237] = true;
charMap[238] = true; charMap[239] = true; charMap [240] = true;
charMap[241] = true;
charMap[242] = true; charMap[243] = true; charMap [244] = true;
charMap[245] = true;
charMap[246] = true; charMap[247] = true; charMap [248] = true;
charMap[249] = true;
charMap[250] = true; charMap[251] = true; charMap [252] = true;
charMap[253] = true;
charMap[254] = true; charMap[255] = true;
}


      /*
       * Determines if a char is a valid ISO-8859-1 character.
       */
      public boolean isCharValid(char value) {
              if (((int)value) < 256 && charMap[(int)value]) {
                      return true;
              } else {
                      return false;
              }
      }

/*
* Returns a string clean of all invalid ISO-8859-1 characters.
*/
public String stripInvalidCharsFromString(String value) {
StringBuffer buffer = new StringBuffer();
char[] charArray = value.toCharArray();
int charArrayLength = charArray.length;
for (int i = 0; i < charArrayLength; i++) {
if (((int)charArray[i]) < 256 && charMap [(int)charArray[i]]) {
buffer.append(charArray[i]);
}
}
return buffer.toString();
}
}


ISOLatin1CharacterUtilityRegex.java
-----------------------------------------------------
//
//  ISOLatin1CharCleaner.java
//  Norway
//
//  Created by Eric Stewart on 2/15/06.
//  Copyright 2006 __MyCompanyName__. All rights reserved.
//

import java.util.regex.*;

public class ISOLatin1CharacterUtilityRegex {

      public ISOLatin1CharacterUtilityRegex() {
      }

/*
* Returns a string clean of all invalid ISO-8859-1 characters.
*/
public String stripInvalidCharsFromString(String value) {
String regExp = "[^\\x09\\x0A\\x0D\\x20-\\x7E\\xA0-\\xFF]+";
Pattern p = Pattern.compile(regExp);
String result = p.matcher(value).replaceAll("");
return result;
}
}
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
40logicunited.com


This email sent to email@hidden



  • Follow-Ups:
    • Re: Most efficient character parsing.
      • From: Jean-François Veillette <email@hidden>
References: 
 >Re: Most efficient character parsing. (From: "Eric Stewart" <email@hidden>)
 >Re: Most efficient character parsing. (From: Anjo Krank <email@hidden>)
 >Re: Most efficient character parsing. (From: "Eric Stewart" <email@hidden>)
