I was wondering whether it is possible to split a very large XML file
(100 megabytes) into several smaller chunks and then run a separate
SAX parsing thread on each chunk. The problem I am having trouble
conceptualising is how to avoid splitting the large XML file in the
wrong place and thus leaving two of the threads with incomplete
information. I reckon I need to run a pre-processor on the file to
determine where to split it...
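
One possible shape for such a pre-processor, as a rough sketch only: a
SAX pass that tracks nesting depth and treats the start of each direct
child of the root as a safe place to cut. The class name
SplitPointFinder, the recordsPerChunk parameter, and the assumption
that the file is one root element wrapping many independent records
are all hypothetical. Reporting record indexes instead of byte offsets
is a simplification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: finds the record indexes at which a chunk could start.
class SplitPointFinder extends DefaultHandler {
    private final int recordsPerChunk;  // assumed chunk size, counted in records
    private final List<Integer> chunkStarts = new ArrayList<>();
    private int depth = 0;              // current element nesting depth
    private int recordCount = 0;        // direct children of the root seen so far

    SplitPointFinder(int recordsPerChunk) {
        this.recordsPerChunk = recordsPerChunk;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        // Depth 2 is a direct child of the root. Cutting anywhere deeper
        // would leave a record split across two chunks.
        if (depth == 2) {
            if (recordCount % recordsPerChunk == 0) {
                chunkStarts.add(recordCount);
            }
            recordCount++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    // Parses the XML text once and returns the index of the first record in each chunk.
    static List<Integer> findChunkStarts(String xml, int recordsPerChunk) {
        SplitPointFinder handler = new SplitPointFinder(recordsPerChunk);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not scan document", e);
        }
        return handler.chunkStarts;
    }
}
```

Note that this pass already reads the whole file once, so it only pays
off if the per-record work is much more expensive than the parsing.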
What is the connection to Java?
I am just attempting to learn some XML again, so my understanding here
might be off. But isn't SAX more suited to parsing document-type XML
files as opposed to data-type files? A 100 MB file would more likely
be a data-type file.
You probably could manage to split the file. I tried a similar
approach to digitally signing PDF files, using iText for PDF handling
and bouncycastle for crypto. It turns out I did it incorrectly, and
iText added the support shortly after. I mean to go back at some
point and look at what I did wrong and what they did instead. But it
seemed to me that the signature needed to be embedded into the split
PDF. So I won't dispute that splitting a file can make sense.
Still, for purposes of multi-threading I think it may be the wrong
approach. The disk thrashing you would get into by trying to read the
file in multiple places would probably more than offset any
multi-threaded performance gain. As another example, I decided a
threaded classpath search was probably a bad idea for this reason: I
would be doing disk accesses all over the place. I never did any
profiling either way, though, which I should have. That approach also
had the rare but bad side effect of occasionally finding the class in
the wrong place if there were duplicates. So I think I currently have
the multi-tasking pretty much disabled.
Possibly the disk thrashing is what is giving you a nagging feeling
that something might be wrong here?
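
If the disk access pattern is the concern, one alternative (not
something proposed earlier in this thread, just a sketch) is to keep a
single SAX parser reading the file sequentially and hand each complete
record to a worker pool. The class name SequentialReadParallelWork,
the process() placeholder, and the assumption that records are the
direct children of the root are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: one thread reads sequentially, workers process records.
class SequentialReadParallelWork extends DefaultHandler {
    private final ExecutorService pool;
    private final List<Future<Integer>> results = new ArrayList<>();
    private int depth = 0;
    private StringBuilder text;  // text of the record being read, or null

    SequentialReadParallelWork(ExecutorService pool) {
        this.pool = pool;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            text = new StringBuilder();  // a new record begins
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 2) {
            // The record is complete, so it is safe to hand to another thread.
            final String record = text.toString();
            text = null;
            Callable<Integer> task = () -> process(record);
            results.add(pool.submit(task));
        }
        depth--;
    }

    // Placeholder for the expensive per-record work (an assumption).
    static int process(String record) {
        return record.trim().length();
    }

    // Parses the XML text on the calling thread and sums the workers' results.
    static int run(String xml, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            SequentialReadParallelWork handler = new SequentialReadParallelWork(pool);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            int total = 0;
            for (Future<Integer> f : handler.results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ParserConfigurationException | SAXException | IOException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps depends on the per-record work outweighing the
parsing itself, which only profiling would show.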
Java-dev mailing list (email@hidden)