Re: improving numerical applications performance
- Subject: Re: improving numerical applications performance
- From: "Edward K. Chew" <email@hidden>
- Date: Mon, 17 May 2004 18:06:46 -0400
This is a complex topic, but allow me to weigh in with a number of
points:
1. The first thing to do is profile your application to see which
functions are taking the most time.
2. Of these functions, determine which can be handled in
single-precision without overly compromising the accuracy of your
results. These, then, would be prime candidates for AltiVec
optimization.
3. Check to see if Apple's AltiVec-optimized libraries can handle any
of the functionality you are looking for. Otherwise, you may need to
write some assembly-like code using the AltiVec C extensions. In most
cases, however, a careful analysis of your program will reveal that
only a few places require such tweaking, so the amount of code you
would need to change is probably small. Automatic vectorization
through compiler options, VAST, etc. may help to a degree, but you can
usually do better by hand (though NOT better than Apple's libraries).
4. Consider parallelizing your code with one pre-emptive thread per
processor. High-end Macs use dual processors nowadays, so you might as
well put them both to work.
5. Try to determine if your code is CPU or memory-bound. If you see
only a modest improvement with AltiVec and a negligible improvement
with multithreading, it is likely memory-bound. In other words, the
PowerPC chip is stuck doing nothing because it is waiting to fetch
something from RAM. This is a difficult problem to solve. Careful
management of cache memory may help to a degree, but a change to your
algorithm may be the ultimate solution. For example, if you have
pre-calculated some sort of interaction matrix, consider removing the
matrix altogether and calculating just the cells you need on the fly.
I was shocked to discover that a 30-step calculation (including square
roots) can be much faster to perform than a single array lookup (i.e.
if you are missing the cache a lot with a huge data set)!
6. If AltiVec is unsuitable for your algorithm (or even if it isn't), a
G5-class computer is your best bet for numerical work. It has a
dual FPU with hardware square root instructions and the like, and
memory access is improved through a MUCH faster data bus. Compared to
other chips of its class, its floating-point performance is noticeably
superior, while its integer performance is comparable.
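To make point 2 a little more concrete, here is a minimal C sketch of one
way to check whether a routine tolerates single precision: run the same
calculation in float and in double on the same input and compare the
results. The sum-of-squares kernel, the test input, and every name here
(sum_squares_float, single_precision_rel_error, MAX_N) are assumptions
made up for illustration; substitute your own profiled hot spot.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 4096  /* hypothetical upper bound on the test input size */

/* Hypothetical kernel in double precision: the reference result. */
static double sum_squares_double(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}

/* The same kernel in single precision: the AltiVec candidate. */
static float sum_squares_float(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}

/* Relative error of the float result against the double reference,
   using the illustrative input x[i] = 1 / (i + 1). */
double single_precision_rel_error(size_t n)
{
    static double xd[MAX_N];
    static float xf[MAX_N];

    assert(n > 0 && n <= MAX_N);

    for (size_t i = 0; i < n; i++) {
        xd[i] = 1.0 / (double)(i + 1);
        xf[i] = (float)xd[i];
    }

    double ref = sum_squares_double(xd, n);
    double approx = (double)sum_squares_float(xf, n);
    double diff = approx - ref;
    if (diff < 0.0)
        diff = -diff;
    return diff / ref;
}
```

Calling single_precision_rel_error(1000) gives the relative error for a
1000-element input. Whether that error is acceptable depends entirely on
the tolerance of your own application, so treat this only as a template
for the comparison.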
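As a rough illustration of point 5, the following C sketch shows the two
code shapes side by side: a pre-calculated n*n interaction matrix versus
computing each cell on the fly from the n source points. The Point
struct, the inverse-square interaction, and all function names are
hypothetical stand-ins, not taken from any real code base, and the sketch
assumes the points are distinct. It shows only the restructuring; which
variant is faster has to be measured on your own data set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical point record; field names are illustrative only. */
typedef struct { float x, y, z; } Point;

/* Hypothetical interaction: inverse-square falloff between two
   distinct points. */
static float interaction(const Point *a, const Point *b)
{
    float dx = a->x - b->x;
    float dy = a->y - b->y;
    float dz = a->z - b->z;
    return 1.0f / (dx * dx + dy * dy + dz * dz);
}

/* Variant A, step 1: fill an n*n table up front. The table needs
   n*n floats, which is what outgrows the cache for large n. */
void fill_table(const Point *p, size_t n, float *table)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            table[i * n + j] = (i == j) ? 0.0f : interaction(&p[i], &p[j]);
}

/* Variant A, step 2: sum one row by reading the table. */
float row_sum_lookup(const float *table, size_t n, size_t i)
{
    float sum = 0.0f;
    for (size_t j = 0; j < n; j++)
        sum += table[i * n + j];
    return sum;
}

/* Variant B: no table at all; recompute each cell from the n points,
   which touch far less memory than the n*n table. */
float row_sum_on_the_fly(const Point *p, size_t n, size_t i)
{
    float sum = 0.0f;
    for (size_t j = 0; j < n; j++)
        if (j != i)
            sum += interaction(&p[i], &p[j]);
    return sum;
}
```

Both variants should produce the same row sums up to rounding, so you can
swap one for the other and time them. Variant B trades extra arithmetic
for a working set of n points instead of n*n cells.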
-Ted
________________________________________________________________
//////////////////
// LAMONTAGNE // GEOPHYSICS LTD
////////////////// GEOPHYSIQUE LTEE
Kingston ON Canada