Re: Most efficient character parsing.


  • Subject: Re: Most efficient character parsing.
  • From: "Jerry W. Walker" <email@hidden>
  • Date: Sun, 19 Feb 2006 19:43:09 -0500

Hi, Eric,

On Feb 19, 2006, at 5:45 PM, Eric Stewart wrote:
Summary: The first time in each test was obviously higher because it
was the first time the solution object was instantiated. What was
interesting was that even though I was building a boolean array with
194 elements, it was still faster to initialize than the regex
solution, and by quite a bit at that.

Wow, thank you for sharing that!

Interesting how old biases hold over. Regular expressions used to be very fast under the original Unix systems, but that impression is suspect for a couple of reasons: first, I never timed them against the "boolean" array approach that Chuck suggested, and second, I think they may have been a bit faster against an ASCII domain than against the newer encodings.

I might add that I never timed them at all, but that's also not quite true. On an old PDP-11, you could literally feel the speeds of operations when they numbered in the thousands. Those old processors just weren't very fast. :-)

In any case, thanks again for that contribution.

Regards,
Jerry

On Feb 19, 2006, at 5:45 PM, Eric Stewart wrote:

Okay, I took the advice on the board and chose to write a boolean
array solution and a regex solution and tested the two head-to-head.
The bottom of the email contains both solution classes and the testing
class (so you can see exactly what I did).

So here's a brief description of the testing process. The test file
has 5 separate strings. Each string is run through the character
checker and the time taken is recorded in milliseconds. For each round
I built the application and started it, then ran either the boolean
array or the regex test three times in succession (5 strings x 3 runs
gives the 15 values on each results line). I then stopped the
application, rebuilt it, restarted it and ran the other test. 10
rounds were run.
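
As a rough illustration of the per-string timing described above, here is a minimal sketch. It is not the code used for these results (that is at the bottom of the email). The class name StripTimer and its methods are hypothetical, and it assumes Java 8 or later for the lambda syntax.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only; not part of the original test code.
final class StripTimer {

    // Runs the cleaner once on each input and returns the elapsed
    // wall-clock time per input, in whole milliseconds.
    static long[] timeEach(UnaryOperator<String> cleaner, String[] inputs) {
        long[] millis = new long[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            long start = System.nanoTime();
            cleaner.apply(inputs[i]);
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    // Adds up the per-input timings, like the "Total Time" lines in the results.
    static long total(long[] millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two short made-up inputs, each containing one non-Latin-1 character.
        String[] inputs = { "abc" + (char) 1000 + "def", "fj fjau" + (char) 10000 };
        long[] times = timeEach(s -> s.replaceAll("[^\\x20-\\x7E]", ""), inputs);
        System.out.println("Total Time: " + total(times));
    }
}
```

With inputs this small most timings will truncate to 0 ms, which is consistent with the many zeros in the results.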

Here is how it went.

Round 1.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 4,0,0,1,3,0,0,0,0,0,0,0,0,0,1
Total Time: 9


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,4,1,6,1,1,1,0,3,1,0,0,0,1,1
Total Time: 39

Round 2.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 5,0,1,0,3,0,1,0,0,0,0,0,0,0,0
Total Time: 10


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 18,5,2,3,0,0,1,0,2,1,1,1,0,1,0
Total Time: 35

Round 3.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 4,1,0,0,4,0,1,0,0,0,0,0,0,0,0
Total Time: 10


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,7,2,3,2,0,0,1,3,0,1,0,1,0,1
Total Time: 40

Round 4.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 4,1,0,0,4,0,0,0,0,0,0,0,0,0,0
Total Time: 9


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 20,4,1,6,1,0,1,0,3,0,1,1,0,1,0
Total Time: 39

Round 5.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 4,0,0,0,3,0,0,0,0,0,0,0,0,0,0
Total Time: 7


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,4,1,5,0,0,0,1,2,0,0,0,0,1,0
Total Time: 33

Round 6.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 5,0,0,1,3,0,0,0,0,0,0,0,0,0,1
Total Time: 10


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 36,5,2,4,0,0,0,1,2,1,1,1,0,1,0
Total Time: 54

Round 7.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 4,0,0,1,3,0,0,0,0,1,0,0,1,1,0
Total Time: 11


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 19,4,1,4,0,1,0,0,2,1,1,1,0,0,1
Total Time: 35

Round 8.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 5,1,0,0,4,0,0,0,0,0,0,0,0,0,0
Total Time: 10


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 21,5,1,4,1,1,0,0,3,0,1,1,0,1,0
Total Time: 39

Round 9.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 5,0,0,0,3,0,0,0,0,0,0,1,0,0,0
Total Time: 9


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 18,4,2,5,1,0,0,1,2,0,0,0,0,0,0
Total Time: 34

Round 10.

Built & ran application. Invoked boolean array direct action 3 times in a row.
Results in milliseconds: 4,0,0,0,4,0,0,0,0,0,0,0,0,0,0
Total Time: 8


Built & ran application. Invoked regex direct action 3 times in a row.
Results in milliseconds: 37,4,1,4,1,1,1,0,3,1,0,0,1,0,1
Total Time: 55

Summary: The first time in each test was obviously higher because it
was the first time the solution object was instantiated. What was
interesting was that even though I was building a boolean array with
194 elements, it was still faster to initialize than the regex
solution, and by quite a bit at that.

The boolean array solution was faster to initialize and ran faster
overall, both the first time it ran and over the majority of
subsequent runs.
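
To put a number on "faster overall", the ten Total Time values per approach can be summed and averaged. The sketch below does only that arithmetic on the totals reported in the rounds above. The class name RoundTotals is hypothetical.

```java
// Hypothetical helper for illustration; the arrays hold the "Total Time"
// values reported for rounds 1 to 10, in milliseconds.
final class RoundTotals {
    static final int[] BOOLEAN_ARRAY = {9, 10, 10, 9, 7, 10, 11, 10, 9, 8};
    static final int[] REGEX = {39, 35, 40, 39, 33, 54, 35, 39, 34, 55};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    static double mean(int[] values) {
        return (double) sum(values) / values.length;
    }

    public static void main(String[] args) {
        System.out.println("boolean array: sum " + sum(BOOLEAN_ARRAY) + ", mean " + mean(BOOLEAN_ARRAY));
        System.out.println("regex:         sum " + sum(REGEX) + ", mean " + mean(REGEX));
    }
}
```

The sums come to 93 ms for the boolean array and 403 ms for the regex, so over these ten rounds the regex runs took roughly four times as long in total.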

I hope this helps someone else.

Here are the Java files I used.

DirectAction.java
-------------------------
//
// DirectAction.java
// Project Norway
//
// Created by ericstewart on 2/15/06
//

import com.webobjects.foundation.*;
import com.webobjects.appserver.*;
import com.webobjects.eocontrol.*;
import java.util.*;

public class DirectAction extends WODirectAction {

    public DirectAction(WORequest aRequest) {
        super(aRequest);
    }

    public WOActionResults defaultAction() {
        return pageWithName("Main");
    }

	public WOActionResults charSpeedArrayMapAction() {
		// build test string
		StringBuffer testString = new StringBuffer("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
		testString.append((char)1000);
		testString.append("fd0 f023 fkdls anflrwjap  fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
		testString.append((char)10000);
		testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
		testString.append((char)100000);
		testString.append("o489jnrbnv8m 5tjvb6fci9   uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk  f78 fm,f juy fiker fdmf");
		testString.append((char)1000000);
		testString.append("irmvn 984mn  juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
		testString.append((char)10000000);

		// Strip illegal characters.
		NSTimestamp time1 = new NSTimestamp();
		ISOLatin1CharacterUtilityArrayMap charUtility = new ISOLatin1CharacterUtilityArrayMap();
		String resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		NSTimestamp time2 = new NSTimestamp();
		GregorianCalendar startCal = new GregorianCalendar();
		GregorianCalendar endCal = new GregorianCalendar();
		long diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Array map time to parse string: "+diffMillis);

		testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
		testString.append((char)1000);
		testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
		testString.append((char)10000);
		testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
		testString.append((char)100000);
		testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
		testString.append((char)1000000);
		testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Array map time to parse string: "+diffMillis);

		testString = new StringBuffer("fj fjau");
		testString.append((char)1000);
		testString.append("fafj daf ds");
		testString.append((char)10000);
		testString.append("fjw csl jw ");
		testString.append((char)100000);
		testString.append("M)_CQ)");
		testString.append((char)1000000);
		testString.append("K(@*NE");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Array map time to parse string: "+diffMillis);

		testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
		testString.append((char)1000);
		testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
		testString.append((char)10000);
		testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
		testString.append((char)100000);
		testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
		testString.append((char)1000000);
		testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Array map time to parse string: "+diffMillis);

		testString = new StringBuffer("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
		testString.append((char)1000);
		testString.append("fd0 f023 fkdls anflrwjap fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
		testString.append((char)10000);
		testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
		testString.append((char)100000);
		testString.append("o489jnrbnv8m 5tjvb6fci9 uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk f78 fm,f juy fiker fdmf");
		testString.append((char)1000000);
		testString.append("irmvn 984mn juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Array map time to parse string: "+diffMillis);

		Main page = (Main)pageWithName("Main");
		page.setVersion(resultString);

		return page;
	}


	public WOActionResults charSpeedRegexAction() {
		// build test string
		StringBuffer testString = new StringBuffer("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
		testString.append((char)1000);
		testString.append("fd0 f023 fkdls anflrwjap  fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
		testString.append((char)10000);
		testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
		testString.append((char)100000);
		testString.append("o489jnrbnv8m 5tjvb6fci9   uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk  f78 fm,f juy fiker fdmf");
		testString.append((char)1000000);
		testString.append("irmvn 984mn  juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
		testString.append((char)10000000);

		// Strip illegal characters.
		NSTimestamp time1 = new NSTimestamp();
		ISOLatin1CharacterUtilityRegex charUtility = new ISOLatin1CharacterUtilityRegex();
		String resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		NSTimestamp time2 = new NSTimestamp();
		GregorianCalendar startCal = new GregorianCalendar();
		GregorianCalendar endCal = new GregorianCalendar();
		long diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Regex time to parse string: "+diffMillis);

		testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^#  VM#I( )*$ JM  KOW#M$ @M< IF");
		testString.append((char)1000);
		testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI  DVO$MMV");
		testString.append((char)10000);
		testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems  duij4m");
		testString.append((char)100000);
		testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd  J893M K VEniw8923m  mdwjw8m vmskl w o290894 vw m s s94");
		testString.append((char)1000000);
		testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Regex time to parse string: "+diffMillis);

		testString = new StringBuffer("fj fjau");
		testString.append((char)1000);
		testString.append("fafj daf ds");
		testString.append((char)10000);
		testString.append("fjw csl jw ");
		testString.append((char)100000);
		testString.append("M)_CQ)");
		testString.append((char)1000000);
		testString.append("K(@*NE");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Regex time to parse string: "+diffMillis);

		testString = new StringBuffer("9km4bnchj 8jk4 v y739dfj jme89vu8 n d';v FRkfvK U*N UO (F&^# VM#I( )*$ JM KOW#M$ @M< IF");
		testString.append((char)1000);
		testString.append("4mvj930 fn89 2 no98304 nr0mj v8v87395 09vm vwlr e ;vd s,mnrv K VUYRMNVDHJ SUISVI DVO$MMV");
		testString.append((char)10000);
		testString.append("i4m *$N lfju67 K$(N kjgurn jkd7 KMN* JND^&V kf9]6l4m,d id 8f4 j md k3idd8j4m cems duij4m");
		testString.append((char)100000);
		testString.append("imn4nf8 IUj4nvjud8mner iec 883mnd J893M K VEniw8923m mdwjw8m vmskl w o290894 vw m s s94");
		testString.append((char)1000000);
		testString.append("fwjo wo fro3neqwvr03 f94fdwc J VW)RJ)VJ EQW( VNDSHVV@HPNVDSOPV)J*(J V)W)RHjiwo vhjdwj vlj");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Regex time to parse string: "+diffMillis);

		testString = new StringBuffer("kfdlas;n 0wqm dsagjnoisa fd;af[aghjr3q-tifnewna fafjpewiq nor0dafnlw;l jfh0w flw;f saofh8");
		testString.append((char)1000);
		testString.append("fd0 f023 fkdls anflrwjap fsa[w fjnw f[2- dawjv094 tn3oh9k04r3 309r3hg854mvrm3w0v5nw[0 v9");
		testString.append((char)10000);
		testString.append("qmgn vjdsop 00 89w nv3ni0vr nmv p3orm vnrv rm v fw mdndw sjuio490n v uckm4uv4n fj iivkmcj");
		testString.append((char)100000);
		testString.append("o489jnrbnv8m 5tjvb6fci9 uv77vj vu v7v 678i9ls fdgo09 i9 r98 jk f78 fm,f juy fiker fdmf");
		testString.append((char)1000000);
		testString.append("irmvn 984mn juf78 km 4 d0v76 7 m j37 67k 6mbvjk8cv56 6yjn r vcjv u7849md cx;df]c0-8 end");
		testString.append((char)10000000);

		time1 = new NSTimestamp();
		resultString = charUtility.stripInvalidCharsFromString(testString.toString());
		time2 = new NSTimestamp();
		diffMillis = 0;
		startCal.setTime(time1);
		endCal.setTime(time2);
		diffMillis = endCal.getTimeInMillis() - startCal.getTimeInMillis();
		NSLog.debug.appendln("Regex time to parse string: "+diffMillis);

		Main page = (Main)pageWithName("Main");
		page.setVersion(resultString);

		return page;
	}


}


ISOLatin1CharacterUtilityArrayMap.java
----------------------------------------------------------
//
//  ISOLatin1CharacterUtilityArrayMap.java
//  Norway
//
//  Created by Eric Stewart on 2/19/06.
//  Copyright 2006 __MyCompanyName__. All rights reserved.
//

public class ISOLatin1CharacterUtilityArrayMap {
	private boolean[] charMap = new boolean[256];

	public ISOLatin1CharacterUtilityArrayMap() {
		// Initialize ISO-8859-1 character map.
		charMap[9]   = true; charMap[10]  = true; charMap[13]  = true; charMap[32]  = true;
		charMap[33]  = true; charMap[34]  = true; charMap[35]  = true; charMap[36]  = true;
		charMap[37]  = true; charMap[38]  = true; charMap[39]  = true; charMap[40]  = true;
		charMap[41]  = true; charMap[42]  = true; charMap[43]  = true; charMap[44]  = true;
		charMap[45]  = true; charMap[46]  = true; charMap[47]  = true; charMap[48]  = true;
		charMap[49]  = true; charMap[50]  = true; charMap[51]  = true; charMap[52]  = true;
		charMap[53]  = true; charMap[54]  = true; charMap[55]  = true; charMap[56]  = true;
		charMap[57]  = true; charMap[58]  = true; charMap[59]  = true; charMap[60]  = true;
		charMap[61]  = true; charMap[62]  = true; charMap[63]  = true; charMap[64]  = true;
		charMap[65]  = true; charMap[66]  = true; charMap[67]  = true; charMap[68]  = true;
		charMap[69]  = true; charMap[70]  = true; charMap[71]  = true; charMap[72]  = true;
		charMap[73]  = true; charMap[74]  = true; charMap[75]  = true; charMap[76]  = true;
		charMap[77]  = true; charMap[78]  = true; charMap[79]  = true; charMap[80]  = true;
		charMap[81]  = true; charMap[82]  = true; charMap[83]  = true; charMap[84]  = true;
		charMap[85]  = true; charMap[86]  = true; charMap[87]  = true; charMap[88]  = true;
		charMap[89]  = true; charMap[90]  = true; charMap[91]  = true; charMap[92]  = true;
		charMap[93]  = true; charMap[94]  = true; charMap[95]  = true; charMap[96]  = true;
		charMap[97]  = true; charMap[98]  = true; charMap[99]  = true; charMap[100] = true;
		charMap[101] = true; charMap[102] = true; charMap[103] = true; charMap[104] = true;
		charMap[105] = true; charMap[106] = true; charMap[107] = true; charMap[108] = true;
		charMap[109] = true; charMap[110] = true; charMap[111] = true; charMap[112] = true;
		charMap[113] = true; charMap[114] = true; charMap[115] = true; charMap[116] = true;
		charMap[117] = true; charMap[118] = true; charMap[119] = true; charMap[120] = true;
		charMap[121] = true; charMap[122] = true; charMap[123] = true; charMap[124] = true;
		charMap[125] = true; charMap[126] = true; charMap[160] = true; charMap[161] = true;
		charMap[162] = true; charMap[163] = true; charMap[164] = true; charMap[165] = true;
		charMap[166] = true; charMap[167] = true; charMap[168] = true; charMap[169] = true;
		charMap[170] = true; charMap[171] = true; charMap[172] = true; charMap[173] = true;
		charMap[174] = true; charMap[175] = true; charMap[176] = true; charMap[177] = true;
		charMap[178] = true; charMap[179] = true; charMap[180] = true; charMap[181] = true;
		charMap[182] = true; charMap[183] = true; charMap[184] = true; charMap[185] = true;
		charMap[186] = true; charMap[187] = true; charMap[188] = true; charMap[189] = true;
		charMap[190] = true; charMap[191] = true; charMap[192] = true; charMap[193] = true;
		charMap[194] = true; charMap[195] = true; charMap[196] = true; charMap[197] = true;
		charMap[198] = true; charMap[199] = true; charMap[200] = true; charMap[201] = true;
		charMap[202] = true; charMap[203] = true; charMap[204] = true; charMap[205] = true;
		charMap[206] = true; charMap[207] = true; charMap[208] = true; charMap[209] = true;
		charMap[210] = true; charMap[211] = true; charMap[212] = true; charMap[213] = true;
		charMap[214] = true; charMap[215] = true; charMap[216] = true; charMap[217] = true;
		charMap[218] = true; charMap[219] = true; charMap[220] = true; charMap[221] = true;
		charMap[222] = true; charMap[223] = true; charMap[224] = true; charMap[225] = true;
		charMap[226] = true; charMap[227] = true; charMap[228] = true; charMap[229] = true;
		charMap[230] = true; charMap[231] = true; charMap[232] = true; charMap[233] = true;
		charMap[234] = true; charMap[235] = true; charMap[236] = true; charMap[237] = true;
		charMap[238] = true; charMap[239] = true; charMap[240] = true; charMap[241] = true;
		charMap[242] = true; charMap[243] = true; charMap[244] = true; charMap[245] = true;
		charMap[246] = true; charMap[247] = true; charMap[248] = true; charMap[249] = true;
		charMap[250] = true; charMap[251] = true; charMap[252] = true; charMap[253] = true;
		charMap[254] = true; charMap[255] = true;
	}

	/*
	 * Determines if a char is a valid ISO-8859-1 character.
	 */
	public boolean isCharValid(char value) {
		if (((int)value) < 256 && charMap[(int)value]) {
			return true;
		} else {
			return false;
		}
	}

	/*
	 * Returns a string clean of all invalid ISO-8859-1 characters.
	 */
	public String stripInvalidCharsFromString(String value) {
		StringBuffer buffer = new StringBuffer();
		char[] charArray = value.toCharArray();
		int charArrayLength = charArray.length;
		for (int i = 0; i < charArrayLength; i++) {
			if (((int)charArray[i]) < 256 && charMap[(int)charArray[i]]) {
				buffer.append(charArray[i]);
			}
		}
		return buffer.toString();
	}
}
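
For comparison, the same 194 entries (tab, line feed, carriage return, 32 to 126 and 160 to 255) could also be filled in from ranges. This is only a sketch of an alternative, not the class that was timed above. The name Latin1TableSketch is hypothetical, and its initialization cost was not measured, so it may differ from the explicit assignments.

```java
// Hypothetical alternative for illustration; not the class used in the test.
final class Latin1TableSketch {
    // One flag per code point 0..255; true means the character is kept.
    private static final boolean[] TABLE = buildTable();

    static boolean[] buildTable() {
        boolean[] table = new boolean[256];
        table[9] = true;   // tab
        table[10] = true;  // line feed
        table[13] = true;  // carriage return
        for (int c = 32; c <= 126; c++) {
            table[c] = true;
        }
        for (int c = 160; c <= 255; c++) {
            table[c] = true;
        }
        return table;
    }

    // Counts how many code points the table allows.
    static int countAllowed(boolean[] table) {
        int count = 0;
        for (boolean allowed : table) {
            if (allowed) {
                count++;
            }
        }
        return count;
    }

    // Keeps only characters below 256 that are flagged in the table.
    static String strip(String value) {
        StringBuilder buffer = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 256 && TABLE[c]) {
                buffer.append(c);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println("allowed entries: " + countAllowed(buildTable()));
        System.out.println(strip("ab" + (char) 1000 + "c"));
    }
}
```

The count works out to 3 + 95 + 96 = 194, which matches the 194 elements mentioned in the summary.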

ISOLatin1CharacterUtilityRegex.java
-----------------------------------------------------
//
//  ISOLatin1CharCleaner.java
//  Norway
//
//  Created by Eric Stewart on 2/15/06.
//  Copyright 2006 __MyCompanyName__. All rights reserved.
//

import java.util.regex.*;

public class ISOLatin1CharacterUtilityRegex {

	public ISOLatin1CharacterUtilityRegex() {
	}

	/*
	 * Returns a string clean of all invalid ISO-8859-1 characters.
	 */
	public String stripInvalidCharsFromString(String value) {
		String regExp = "[^\\x09\\x0A\\x0D\\x20-\\x7E\\xA0-\\xFF]+";
		Pattern p = Pattern.compile(regExp);
		String result = p.matcher(value).replaceAll("");
		return result;
	}
}
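
One thing worth noting about the regex class above is that it calls Pattern.compile on every invocation. A possible variant, sketched below, compiles the same expression once into a static field. The class name CachedRegexCleaner is hypothetical, and this variant was not part of the timings, so whether it narrows the gap is an open question.

```java
import java.util.regex.Pattern;

// Hypothetical variant for illustration; it was not part of the test above.
final class CachedRegexCleaner {
    // Same expression as the regex class above, compiled once at class load.
    private static final Pattern INVALID =
            Pattern.compile("[^\\x09\\x0A\\x0D\\x20-\\x7E\\xA0-\\xFF]+");

    // Returns the input with every run of disallowed characters removed.
    static String strip(String value) {
        return INVALID.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("ab" + (char) 1000 + "c"));
    }
}
```

Pattern instances are immutable and safe to share between threads, so a static field is a common place to keep one.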
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)

--
__ Jerry W. Walker,
WebObjects Developer/Instructor for High Performance Industrial Strength Internet Enabled Systems


    email@hidden
    203 278-4085        office




