On Sat, 1 Jan 2005 21:18:40 +0000, Paul Sargent <email@hidden> wrote:
> I've done a little Shark profiling on my application. The main core of
> it is using LAPACK in the Accelerate framework to solve a large set of
> simultaneous equations with sgells_. There appears to be one function
> (libBLAS.dylib ATL_srot_xp0yp0aXbX) which uses 43% of my run time. I
> took a quick look at it and it seems to be a loop which calculates
> a*b-c*d and a*d-c*b where a and c are constants over the loop, and
> appears very stall heavy on my machine (a G4+, looks mainly like data
> dependencies). Switch Shark to a G5 and it appears to get worse (bigger
> stalls, which probably negate the two fp-ops per dispatch group bonus).
>
> Is there anything I can do about this? (probably not because it's in a
> framework)
> It looks like it could be unrolled reasonably easily. Could I write a
> new version and then override the dynamic linker to point at my new
> version?
> Is there any way of knowing if my code (actually, the LAPACK code) is
> calling this with large loop counts or just very often? (to know if
> unrolling would be worthwhile)
My suggestion would be to file a bug. The Velocity Engine Team is very
actively pursuing optimization on these frameworks and would be interested
in knowing if there's anything that they may have overlooked.
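
For anyone who wants to experiment while the bug is open, here is a rough
sketch of the kind of unrolling the question describes. It assumes the hot
routine is the standard BLAS srot plane rotation with unit stride, which the
symbol name suggests but which nothing in this thread confirms. The names
rot_ref and rot_unrolled4 are hypothetical and are not part of any Apple or
ATLAS API. Whether the unrolled form is actually faster would have to be
measured in Shark; the compiler may already unroll the simple loop.

```c
#include <stddef.h>

/* Reference plane rotation, unit stride assumed:
 *   x[i] <- c*x[i] + s*y[i]
 *   y[i] <- c*y[i] - s*x[i]
 * c and s are constant over the loop. */
void rot_ref(size_t n, float *x, float *y, float c, float s)
{
    for (size_t i = 0; i < n; i++) {
        float xi = x[i];
        float yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}

/* Hypothetical 4-way unrolled variant. All loads for a group of four
 * elements are issued before any arithmetic, so the four rotations are
 * independent of each other and can overlap in the FPU pipeline. */
void rot_unrolled4(size_t n, float *x, float *y, float c, float s)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        float x0 = x[i],     y0 = y[i];
        float x1 = x[i + 1], y1 = y[i + 1];
        float x2 = x[i + 2], y2 = y[i + 2];
        float x3 = x[i + 3], y3 = y[i + 3];

        x[i]     = c * x0 + s * y0;
        y[i]     = c * y0 - s * x0;
        x[i + 1] = c * x1 + s * y1;
        y[i + 1] = c * y1 - s * x1;
        x[i + 2] = c * x2 + s * y2;
        y[i + 2] = c * y2 - s * x2;
        x[i + 3] = c * x3 + s * y3;
        y[i + 3] = c * y3 - s * x3;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        float xi = x[i];
        float yi = y[i];
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}
```

Unrolling like this only helps if the routine is called with reasonably
large n. If LAPACK is calling it very often with short vectors, the call
overhead and the remainder loop dominate and the unrolled body rarely runs.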
--
Enjoy,
George Warner,
Schizophrenic Optimization Scientist
Apple Developer Technical Support (DTS)
PerfOptimization-dev mailing list (email@hidden)