I was wondering if it is possible to split a very large XML file (100
megabytes) into several smaller chunks and then run a separate SAX
parsing thread on each chunk?
I think the issue you might have here is that you must first parse
the file to know where it can safely be split - even if you have a
clever lexer to do the splitting, you will still be reading *all* of
the data. Then, when you parse with each of the threads, you will
be reading *all* of the data a second time. So splitting is unlikely
to make the job any faster.
Of course, you will need to avoid DOM to prevent all of the data
from being loaded into memory :)
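For illustration, here is a minimal sketch of the single-pass SAX approach: the parser pushes events to a handler and no tree is ever built, so memory use stays flat no matter how big the file is. The class name `SaxRecordCounter`, the method `countElements`, and the `record` element name are all hypothetical, made up for this example; only the JDK's `javax.xml.parsers` / `org.xml.sax` classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams through an XML document once with SAX.
public class SaxRecordCounter {

    // Counts elements named `target` without building a DOM tree.
    public static long countElements(InputStream in, String target) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        long[] count = {0}; // array so the anonymous handler can update it
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The factory is not namespace-aware by default,
                // so the element name is reported in qName.
                if (qName.equals(target)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record id=\"1\"/><record id=\"2\"/><other/></records>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "record")); // prints 2
    }
}
```

For the real 100 MB file you would pass something like `new FileInputStream("big.xml")` (a placeholder path) instead of the in-memory string.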
You should probably take a look at StAX [0] as well as the normal
SAX approaches.
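As a rough comparison, a StAX sketch might look like this. StAX is a pull API: your code asks for the next event instead of receiving callbacks, which makes it easy to stop as soon as you have what you need. The class `StaxIdCollector`, the method `firstIds`, and the `record`/`id` names are hypothetical, chosen only for the example; `XMLInputFactory` and `XMLStreamReader` are the standard `javax.xml.stream` classes.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls events with StAX and stops early.
public class StaxIdCollector {

    // Returns the "id" attribute of up to `limit` elements named `target`,
    // reading no further into the document than necessary.
    public static List<String> firstIds(Reader in, String target, int limit)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> ids = new ArrayList<>();
        try {
            // The caller drives the loop, so we can simply stop pulling.
            while (ids.size() < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(target)) {
                    // null namespace URI: match the attribute by local name only
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close `in`
        }
        return ids;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<records><record id=\"a\"/><record id=\"b\"/><record id=\"c\"/></records>";
        System.out.println(firstIds(new StringReader(xml), "record", 2)); // prints [a, b]
    }
}
```

Like SAX, this only ever holds the current event in memory, so either approach avoids the DOM problem; which one reads better mostly depends on whether you prefer callbacks or an explicit loop.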