Mailing Lists: Apple Mailing Lists


Re: Optimisation



On Apr 15, 2005, at 05:03, Bruno Causse wrote:

Hi Bruno:

Sorry for my English.

No problem.

How can I help the VM (HotSpot) optimise the code?

What exactly is it that you're doing? What is your program trying to achieve?


I don't use the garbage collector; all objects are reused.

Unless you need to allocate a very large number of objects in short periods of time, this will only really result in memory savings. Runtime savings for most Java applications in such a scenario are minimal.


Many methods are final.

This won't help you at all -- it's a myth that there is any performance benefit to be seen doing this. Ref:


				http://www-128.ibm.com/developerworks/java/library/j-jtp1029.html

Loops are unrolled (is this really important?)

It really depends on what you're doing. A jump isn't a massively expensive operation. It depends on how tight your loops are, and how much time you're spending performing the jumps. For most uses, unrolling the loops results in negligible runtime benefits, and only serves to make your code significantly less maintainable.


Also note that unrolling your loops may be working against you, as you just make HotSpot work even harder. Take, for example:

			for(int i=0;i<Integer.MAX_VALUE;i++) result[i]=<some complex routine>

HotSpot can optimize and compile <some complex routine> exactly once, and then loop through it. Contrast this with:

			result[0]=<some complex routine>;
			result[1]=<some complex routine>;
			result[2]=<some complex routine>;
				...
			result[Integer.MAX_VALUE-1]=<some complex routine>;

Now you're forcing HotSpot to optimize and convert Integer.MAX_VALUE lines of code. Which do you think is going to take longer? Optimize and compile once, or optimize and compile 2^31-1 times?

Remember that with HotSpot, the real compilation phase is at runtime. That result isn't typically cached, so it occurs each time you run the tool. So if you unroll complex loops, you just force HotSpot to work /harder/, causing the CPU to spend less time actually processing your program.

It typically is a good idea to convert recursive algorithms to iterative algorithms, however.
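
As a rough illustration of that kind of conversion, here is a minimal sketch that computes the same sum both ways. The class and method names are hypothetical, chosen only for this example. The recursive form uses one stack frame per step and can fail with a StackOverflowError for large inputs, while the iterative form runs at constant stack depth.

```java
// Hypothetical example: both methods compute 1 + 2 + ... + n.
final class SumExample {

    // Recursive form: one stack frame per step, so a very large n
    // can exhaust the thread's stack.
    static long sumRecursive(int n) {
        return n <= 0 ? 0 : n + sumRecursive(n - 1);
    }

    // Iterative form: the same result from a simple counted loop,
    // with constant stack depth regardless of n.
    static long sumIterative(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(1000));
        System.out.println(sumIterative(1000000));
    }
}
```

Whether the iterative form is measurably faster in a given program is something only profiling can confirm.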

My code is 3 times slower than the same code in C++.

A lot of people waste a whole lot of time trying to implement "tricks" like the ones you mention above, to little or no benefit, when the major thing you should be looking at is your algorithms in the first place.


It's worth understanding how HotSpot works. As a broad overview, HotSpot will expend more effort optimizing those methods which are accessed more often. As such, you can see a better improvement in your code if you try to find relatively simple calculations you make over and over again, and break them out into their own methods.

I'll give you an example from the jSyncManager (http://www.jsyncmanager.org), an Open Source Java project I work on. We do a lot of work reading byte data from input streams and buffers of various sorts, and need to be able to reconstitute them into other primitive (and complex) types (and the reverse). For example, four bytes may need to be reconstituted into an int. The code to do this is fairly simple:

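		int i = ((byte1 & 0xFF)<<24) | ((byte2 & 0xFF)<<16) | ((byte3 & 0xFF)<<8) | (byte4 & 0xFF);   // & 0xFF undoes sign extension if the inputs are Java's signed bytes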

However, we were doing this all over the place in our code. Methods may have to do this operation dozens of times. Why should HotSpot have to re-optimize this same little algorithm each time it encounters it when we can convert it into a method where it is optimized once? As well, some of the instances of this conversion may not be run too often -- they might run only once or twice in an execution. In such cases, HotSpot may not optimize them very well (depending on the HotSpot compiler profile in use -- HotSpot tends to be more aggressive about optimizing methods which are used more frequently than those which are run less).

Thus, by breaking these into their own methods, HotSpot can optimize them heavily in one place, instead of to varying degrees in lots of places.
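
A minimal sketch of what such a shared helper could look like follows. This is not the jSyncManager's actual code: the class name ByteCodec and its method names are hypothetical. The & 0xFF masks are an addition of this sketch, on the assumption that the inputs are Java's signed byte type, which would otherwise be sign-extended when widened to int.

```java
// Hypothetical helper: one place for byte <-> int conversions
// (big-endian, i.e. most significant byte first).
final class ByteCodec {

    private ByteCodec() {
        // static helpers only
    }

    // Reassemble four bytes into an int. Each byte is masked with 0xFF
    // so that negative byte values do not smear 1-bits into the result.
    static int toInt(byte b1, byte b2, byte b3, byte b4) {
        return ((b1 & 0xFF) << 24)
             | ((b2 & 0xFF) << 16)
             | ((b3 & 0xFF) << 8)
             |  (b4 & 0xFF);
    }

    // The reverse operation: split an int into four bytes.
    static byte[] toBytes(int value) {
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    public static void main(String[] args) {
        int value = toInt((byte) 0x12, (byte) 0x34, (byte) 0x56, (byte) 0x78);
        System.out.println(Integer.toHexString(value));
    }
}
```

Every caller then goes through ByteCodec.toInt(...) rather than repeating the shifts inline, so there is a single frequently called method and a single place to fix mistakes.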

(As a note for other optimizers out there, I didn't actually undertake this conversion routine for the sake of optimization, but to improve code readability overall. We had so many classes doing binary rotations and operations like this of all sorts of varying types that the code was prone to developer error and bad readability. The performance benefit in the jSyncManager was negligible, due in large part to the fact that our code spends a lot of time starved for input anyway. You only tend to notice a slight difference on older, slower hardware).

As well, do some profiling while running your code to see where you're spending the bulk of your time during a typical run, and then spend time optimizing that. Spending time unrolling loops in routines you only spend 1% of your runtime in isn't going to be useful -- you waste more time manually unrolling the loop than you'll ever save in runtime. The command "java -Xrunhprof:help" is your friend here. In particular, look at the cpu=samples parameter.

Even using this, you may not be able to do much. Using the jSyncManager as an example again, I just ran it through a single synchronization session with profiling enabled, using TCP/IP as the underlying transport mechanism (RS-232 serial and USB are also supported, and may have different results due to different transport code and in the case of RS-232 communications, a different protocol stack). And you know what? It isn't until the 56th ranked entry that a jSyncManager routine finally shows up. 55 entries, all in the standard Java 1.4.2 Runtime, use more CPU time than any of my routines. That 56th entry used only 0.05% of CPU time during the execution. The #1 item, which was java.net.PlainSocketImpl.socketAccept, used 54.74% of the CPU's time during the execution. The total of the first 55 entries is 94.13% of the CPU's time.

I can't optimize java.net.PlainSocketImpl. Nor can I avoid it. It doesn't make much sense for me to sit around optimizing org.jSyncManager.API.Protocol.Util.DLPDatabaseListGroup.<init> -- even if I could make it five times more efficient, I'll only save 0.04% of CPU time. Even if I optimized _every_ routine to make it five times faster, I'd be lucky to save a whole 1% of CPU time.
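
The arithmetic behind that 0.04% figure can be sketched as follows: if a routine accounts for some share of total runtime and is made N times faster, the saving is share * (1 - 1/N). The class and method names are hypothetical, and the 80% case is an invented contrast rather than a measurement.

```java
// Hypothetical helper for estimating what an optimization is worth.
final class SavingsEstimate {

    // Percentage of total runtime saved when a routine that uses
    // sharePercent of the runtime is made `speedup` times faster.
    static double savedPercent(double sharePercent, double speedup) {
        return sharePercent * (1.0 - 1.0 / speedup);
    }

    public static void main(String[] args) {
        // The 0.05% routine from the profile, made five times faster:
        // roughly 0.04% of total runtime saved.
        System.out.println(savedPercent(0.05, 5.0));

        // A made-up routine using 80% of runtime, made five times faster:
        // roughly 64% of total runtime saved.
        System.out.println(savedPercent(80.0, 5.0));
    }
}
```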

(And if you're curious, for the example I gave above that performs byte-to-integer conversions, the method was run 378 times in the span of a 1 minute run, and took up only 0.02% of the runtime. It's hard to compare this to the previous situation, as there is no way to aggregate the time that would have been spent in just those operations inside dozens of different methods, but as you can see it's negligible).

Now your routines and algorithms may be different -- if you're doing some serious scientific computing, you may find you are spending 80% of your time in one of your own routines -- in which case profiling is going to tell you which routines are most worth expending your time on. Or you might find, like me, that aggressively pre-optimizing your code simply isn't worth it, because the performance benefit is going to be so negligible that you're not even going to be able to notice a difference.

One last thing to look at: try running your program while running a CPU activity monitor, with nothing else running, to see if you're keeping the CPU busy at all times during your operations. If you find you're using 100% CPU for short stretches, but then have lengthy periods where nothing is happening, your code might be a good candidate for parallelization. You may find there are times when your code is just sitting around waiting on I/O, during which time you might be able to process some other parts of your algorithm that are of a lesser priority, but which will eventually be needed. Keep the CPU busy as much as possible, and your runtime will decrease.
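
As a minimal sketch of that idea, the following overlaps a blocking read with other computation. It assumes java.util.concurrent is available (Java 5 or later, so newer than the 1.4.2 runtime mentioned above). The class name, slowRead(), and precompute() are hypothetical stand-ins: the first simulates waiting on I/O with a sleep, the second simulates lower-priority work that will be needed eventually.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OverlapExample {

    // Stand-in for a blocking read: sleeps, then returns a value.
    static int slowRead() throws InterruptedException {
        Thread.sleep(200);
        return 42;
    }

    // Stand-in for CPU work that does not depend on the read.
    static long precompute() {
        long total = 0;
        for (int i = 0; i < 1000000; i++) {
            total += i % 7;
        }
        return total;
    }

    // Start the read on a worker thread, compute while it is blocked,
    // and wait for the read only at the point its result is needed.
    static long readAndCompute() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> pending = pool.submit(new Callable<Integer>() {
                @Override
                public Integer call() throws Exception {
                    return slowRead();
                }
            });
            long computed = precompute();
            return computed + pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(readAndCompute());
    }
}
```

This only pays off when the profile shows the CPU sitting idle during the wait; if the program is already CPU-bound, the extra thread adds overhead without saving time.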

But overall, run the profiling and concentrate the most on improving your algorithms. Also try to structure your code such that HotSpot spends less time doing optimizations and conversions to native code (which is typically accomplished by just writing good OO code in the first place). I hope this helps!

Brad BARCLAY,
Lead Developer & Project Administrator,
The jSyncManager Project.

=-=-=-=-=-=-=-=-=-=
From the Mac OS X Desktop of Brad BARCLAY
E-Mail:  email@hidden     Web:  http://www.jsyncmanager.org



References: 
 >Optimisation (From: Bruno Causse <email@hidden>)


