> How can I help the VM (HotSpot) optimise the code?
What exactly is it that you're doing? What is your program trying to
achieve?
> I don't use the garbage collector; all objects are reused.
Unless you need to allocate a very large number of objects in short
periods of time, this will only really result in memory savings.
Runtime savings for most Java applications in such a scenario are
minimal.
> Many methods are final.
This won't help you at all -- it's a myth that there is any
performance benefit to be seen doing this. Ref:
It really depends on what you're doing. A jump isn't a massively
expensive operation. It depends on how tight your loops are, and how
much time you're spending performing the jumps. For most uses,
unrolling the loops results in negligible runtime benefits, and only
serves to make your code significantly less maintainable.
Also note that unrolling your loops may be working against you, as you
just make HotSpot work even harder. Take, for example:
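A hypothetical loop of this kind might look like the sketch below. The
class and method names are made up for illustration, and the unrolling
factor of four is only an assumption to keep the example short:

```java
// Hypothetical illustration; none of these names come from a real project.
final class UnrollSketch {
    // Rolled form: one small loop body for HotSpot to compile.
    static long rolled(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += i;
        }
        return total;
    }

    // Hand-unrolled by four: same result, but more code to maintain
    // and more bytecode for the runtime compiler to chew through.
    static long unrolledByFour(int iterations) {
        long total = 0;
        int i = 0;
        for (; i + 3 < iterations; i += 4) {
            total += i;
            total += i + 1;
            total += i + 2;
            total += i + 3;
        }
        for (; i < iterations; i++) { // leftover iterations
            total += i;
        }
        return total;
    }

    // Taken to the extreme, fully unrolling rolled(Integer.MAX_VALUE)
    // would mean writing out Integer.MAX_VALUE separate statements.
}
```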
Now you're forcing HotSpot to optimize and convert Integer.MAX_VALUE
lines of code. Which do you think is going to take longer? Optimize
and compile once, or optimize and compile 2^31-1 times?
Remember that with HotSpot, the real compilation phase is at runtime.
That result isn't typically cached, so it occurs each time you run the
tool. So if you unroll complex loops, you just force HotSpot to work
/harder/, causing the CPU to spend less time actually processing your
program.
It typically is a good idea to convert recursive algorithms to
iterative algorithms, however.
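As a minimal sketch of that kind of conversion (the class and method
names are hypothetical, and summing an array is just a stand-in for
whatever your recursion really does):

```java
// Hypothetical illustration of turning recursion into iteration.
final class SumSketch {
    // Recursive form: one stack frame per element, so a very long
    // array can end in a StackOverflowError.
    static long sumRecursive(int[] values, int from) {
        if (from == values.length) {
            return 0;
        }
        return values[from] + sumRecursive(values, from + 1);
    }

    // Iterative form: constant stack depth and a simple loop.
    static long sumIterative(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```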
> My code is 3 times slower than the same code in C++.
A lot of people waste a whole lot of time trying to implement "tricks"
like the ones you mention above, to little or no benefit, when the
major thing you should be looking at is your algorithms in the first
place.
It's worth understanding how HotSpot works. As a broad overview,
HotSpot will expend more effort optimizing those methods which are
accessed more often. As such, you can see a better improvement in your
code if you try to find relatively simple calculations you make over
and over again, and break them out into their own methods.
I'll give you an example from the jSyncManager
(http://www.jsyncmanager.org), an Open Source Java project I work on.
We do a lot of work reading byte data from input streams and buffers of
various sorts, and need to be able to reconstitute them into other
primitive (and complex) types (and the reverse). For example, four
bytes may need to be reconstituted into an int. The code to do this is
fairly simple:
int i = ((byte1 & 0xFF) << 24) | ((byte2 & 0xFF) << 16) | ((byte3 & 0xFF) << 8) | (byte4 & 0xFF);
However, we were doing this all over the place in our code. Methods
may have to do this operation dozens of times. Why should HotSpot have
to re-optimize this same little algorithm each time it encounters it
when we can convert it into a method where it is optimized once? As
well, some of the instances of this conversion may not be run too often
-- they might run only once or twice in an execution. In such cases,
HotSpot may not optimize them very well (depending on the HotSpot
compiler profile in use -- HotSpot tends to be more aggressive about
optimizing methods which are used more frequently than those which are
run less).
Thus, by breaking these into their own methods, HotSpot can optimize
them heavily in one place, instead of to varying degrees in lots of
places.
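A minimal sketch of what such a helper could look like follows. The
class and method names here are hypothetical and are not the actual
jSyncManager API. The "& 0xFF" masks assume the inputs are Java's
signed byte type, where a value such as (byte) 0x80 would otherwise be
sign-extended when it is promoted to int:

```java
// Hypothetical helper; names are illustrative, not the jSyncManager API.
final class ByteCodecSketch {
    // Reassembles four big-endian bytes into an int. Each byte is
    // masked so that sign extension cannot corrupt the higher bits.
    static int toInt(byte b1, byte b2, byte b3, byte b4) {
        return ((b1 & 0xFF) << 24)
             | ((b2 & 0xFF) << 16)
             | ((b3 & 0xFF) << 8)
             |  (b4 & 0xFF);
    }

    // Convenience overload for reading straight out of a buffer.
    static int toInt(byte[] buffer, int offset) {
        return toInt(buffer[offset], buffer[offset + 1],
                     buffer[offset + 2], buffer[offset + 3]);
    }
}
```

Every caller then goes through the one small method, so there is a
single hot spot for the runtime to optimize rather than dozens of
inline copies.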
(As a note for other optimizers out there, I didn't actually undertake
this conversion routine for the sake of optimization, but to improve
code readability overall. We had so many classes doing binary
rotations and operations like this of all sorts of varying types that
the code was prone to developer-error and bad readability. The
performance benefit in the jSyncManager was negligible, in large
part because our code spends a lot of time starved for input anyway.
You only tend to notice a slight difference on older, slower hardware).
As well, do some profiling while running your code to see where you're
spending the bulk of your time during a typical run, and then spend
time optimizing that. Spending time unrolling loops in routines you
only spend 1% of your runtime in isn't going to be useful -- you waste
more time manually unrolling the loop than you'll ever save in runtime.
The command "java -Xrunhprof:help" is your friend here. In
particular, look at the cpu=samples parameter.
Even using this, you may not be able to do much. Using the
jSyncManager as an example again, I just ran it through a single
synchronization session with profiling enabled, using TCP/IP as the
underlying transport mechanism (RS-232 serial and USB are also
supported, and may have different results due to different transport
code and in the case of RS-232 communications, a different protocol
stack). And you know what? It isn't until the 56th ranked entry that
a jSyncManager routine finally shows up. 55 entries, all in the
standard Java 1.4.2 Runtime, use more CPU time than any of my routines.
That 56th entry used only 0.05% of CPU time during the execution. The
#1 item, which was java.net.PlainSocketImpl.socketAccept, used 54.74%
of the CPU's time during the execution. The total of the first 55
entries is 94.13% of the CPU's time.
I can't optimize java.net.PlainSocketImpl. Nor can I avoid it. It
doesn't make much sense for me to sit around optimizing
org.jSyncManager.API.Protocol.Util.DLPDatabaseListGroup.<init> -- even
if I could make it five times more efficient, I'll only save 0.04% of
CPU time. Even if I optimized _every_ routine to make it five times
faster, I'd be lucky to save a whole 1% of CPU time.
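The arithmetic behind that 0.04% figure can be sketched as follows.
The class and method names are hypothetical; the only inputs taken from
the profile above are the 0.05% share and the five-fold speedup:

```java
// Hypothetical helper showing how little a local speedup can save.
final class SavingsSketch {
    // Percentage of total runtime saved when a routine that accounts
    // for sharePercent of the run becomes speedup times faster.
    static double savedPercent(double sharePercent, double speedup) {
        return sharePercent * (1.0 - 1.0 / speedup);
    }

    public static void main(String[] args) {
        // A routine using 0.05% of CPU time, made five times faster,
        // saves roughly 0.04% of the total.
        System.out.println(savedPercent(0.05, 5.0));
    }
}
```

The same formula shows why the profile matters: a routine that held an
80% share would save about 64% of the run for the same five-fold
improvement.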
(And if you're curious, for the example I gave above that performs
byte-to-integer conversions, the method was run 378 times in the span
of a 1 minute run, and took up only 0.02% of the runtime. It's hard to
compare this to the previous situation, as there is no way to aggregate
the time that would have been spent in just those operations inside
dozens of different methods, but as you can see it's negligible).
Now your routines and algorithms may be different -- if you're doing
some serious scientific computing, you may find you are spending 80% of
your time in one of your own routines -- in which case profiling is
going to tell you which routines are most worth expending your time on.
Or you might find, like me, that aggressively pre-optimizing your code
simply isn't worth it, because the performance benefit is going to be
so negligible that you're not even going to be able to notice a
difference.
One last thing to look at: try running your program while running a
CPU activity monitor, with nothing else running, to see if you're
keeping the CPU busy at all times during your operations. If you find
you're using 100% CPU for short stretches, but then have lengthy
periods where nothing is happening, your code might be a good candidate
for parallelization. You may find there are times when your code is
just sitting around waiting on I/O, during which time you might be able
to process some other parts of your algorithm that are of a lesser
priority, but which will eventually be needed. Keep the CPU busy as
much as possible, and your runtime will decrease.
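One way to overlap the waiting with useful work is sketched below. It
assumes a JDK that ships java.util.concurrent and supports method
references, which is newer than the 1.4.2 runtime mentioned above. All
names are hypothetical, and the Thread.sleep call is only a stand-in
for a blocking read:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of keeping the CPU busy during an I/O wait.
final class OverlapSketch {
    // Stand-in for a blocking read; the sleep simulates waiting on I/O.
    static int slowRead() throws InterruptedException {
        Thread.sleep(200);
        return 42;
    }

    // Stand-in for lower-priority work that will be needed eventually.
    static long precompute(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    static long run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> read = OverlapSketch::slowRead;
            Future<Integer> pending = pool.submit(read);
            long table = precompute(1000);  // runs while the read is blocked
            return pending.get() + table;   // wait only when the input is needed
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```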
But overall, run the profiling and concentrate the most on improving
your algorithms. Also try to structure your code such that HotSpot
spends less time doing optimizations and conversions to native code
(which is typically accomplished by just writing good OO code in the
first place). I hope this helps!
Brad BARCLAY,
Lead Developer & Project Administrator,
The jSyncManager Project.
=-=-=-=-=-=-=-=-=-=
From the Mac OS X Desktop of Brad BARCLAY
E-Mail: email@hidden Web: http://www.jsyncmanager.org