I've done a little Shark profiling on my application. Its core uses
LAPACK in the Accelerate framework to solve a large set of
simultaneous equations with sgells_. One function
(libBLAS.dylib ATL_srot_xp0yp0aXbX) accounts for 43% of my run time. I
took a quick look at it, and it seems to be a loop that calculates
a*b-c*d and a*d+c*b, where a and c are constant over the loop. It
appears very stall-heavy on my machine (a G4+; the stalls look mainly
like data dependencies). Switching Shark to a G5 makes it look worse
(bigger stalls, which probably negate the bonus of two FP ops per
dispatch group).
Is there anything I can do about this? (Probably not, because it's in a
framework.)
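For reference, here is a minimal C sketch of what the loop in the listing appears to compute, read off the fmuls/fmsubs/fmadds sequence. The name rot_ref and the argument order are my own, not the library's, and I am assuming positive strides and that x and y do not overlap. The PowerPC code uses fused multiply-add, so results may differ from this in the last bit.

```c
#include <stddef.h>

/* Hypothetical reference version of the profiled loop (not the ATLAS source).
   x corresponds to the vector addressed through r4, y to the one through r6;
   c and s are the two loop-invariant constants (f1 and f2 in the listing). */
void rot_ref(int n, float *x, int incx, float *y, int incy, float c, float s)
{
    for (int i = 0; i < n; i++) {
        float xv = *x;          /* lfs f13,0(r4) */
        float yv = *y;          /* lfs f0,0(r6)  */
        *y = c * yv - s * xv;   /* fmuls + fmsubs */
        *x = c * xv + s * yv;   /* fmuls + fmadds */
        x += incx;              /* strides are in elements here */
        y += incy;
    }
}
```

Each iteration does two loads, four floating-point operations, and two stores, and every arithmetic result feeds the next instruction, which is consistent with the stalls Shark reports.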
It looks like it could be unrolled reasonably easily. Could I write a
new version and have the dynamic linker resolve to it instead of the
framework's version?
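A sketch of the kind of unrolling I have in mind, in C, unrolled by four with all loads issued before any arithmetic so that the four rotations are independent of each other. The name rot_unrolled4 is hypothetical, the unroll factor is a guess rather than a tuned value, and I am assuming positive strides and non-overlapping x and y. Whether this actually beats the library version would have to be measured, and it says nothing about how to get the framework to call it.

```c
#include <stddef.h>

/* Hypothetical 4x-unrolled rotation loop; same arithmetic as the
   profiled loop, restructured to expose independent operations. */
void rot_unrolled4(int n, float *x, int incx, float *y, int incy,
                   float c, float s)
{
    const ptrdiff_t sx = incx, sy = incy;
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Issue all eight loads first so no load waits on a store. */
        float x0 = x[0], x1 = x[sx], x2 = x[2 * sx], x3 = x[3 * sx];
        float y0 = y[0], y1 = y[sy], y2 = y[2 * sy], y3 = y[3 * sy];

        /* Four independent rotations; none depends on another's result. */
        y[0]      = c * y0 - s * x0;
        y[sy]     = c * y1 - s * x1;
        y[2 * sy] = c * y2 - s * x2;
        y[3 * sy] = c * y3 - s * x3;

        x[0]      = c * x0 + s * y0;
        x[sx]     = c * x1 + s * y1;
        x[2 * sx] = c * x2 + s * y2;
        x[3 * sx] = c * x3 + s * y3;

        x += 4 * sx;
        y += 4 * sy;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        float xv = *x, yv = *y;
        *y = c * yv - s * xv;
        *x = c * xv + s * yv;
        x += sx;
        y += sy;
    }
}
```

The unrolled body only pays off if the loop count is usually well above four; for short vectors the remainder loop does all the work.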
Is there any way of knowing if my code (actually, the LAPACK code) is
calling this with large loop counts or just very often? (to know if
unrolling would be worthwhile)
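If I could get my own code called in place of the library routine, something like the following bookkeeping would answer the question. This is only a sketch under that assumption: record_call, bucket_of, and report_calls are hypothetical names, and nothing here hooks into LAPACK by itself. It counts calls and bins the loop count n into power-of-two buckets.

```c
#include <stdio.h>

#define NBUCKETS 32

static unsigned long rot_calls;            /* number of calls seen        */
static unsigned long rot_elems;            /* sum of n over all calls     */
static unsigned long rot_hist[NBUCKETS];   /* calls per power-of-two bin  */

/* Bucket k holds calls with 2^k <= n < 2^(k+1); n <= 1 goes in bucket 0. */
int bucket_of(int n)
{
    int b = 0;
    while (n > 1) {
        n >>= 1;
        b++;
    }
    return b;
}

/* Call this at the top of the replacement routine with its loop count. */
void record_call(int n)
{
    rot_calls++;
    if (n > 0)
        rot_elems += (unsigned long)n;
    rot_hist[bucket_of(n)]++;
}

/* Print the histogram, e.g. from an atexit() handler. */
void report_calls(FILE *out)
{
    fprintf(out, "calls=%lu elements=%lu\n", rot_calls, rot_elems);
    for (int b = 0; b < NBUCKETS; b++)
        if (rot_hist[b])
            fprintf(out, "  n in [2^%d, 2^%d): %lu calls\n",
                    b, b + 1, rot_hist[b]);
}
```

Many calls in the low buckets would mean call overhead dominates and unrolling helps little; a few calls in the high buckets would mean the loop body is what matters.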
Thanks
Paul
         mr.     r3,r3            1:1   ! Inline
         beqlr                    1:1
         mtctr   r3               *2:2
  0.0%   slwi    r7,r7,2          1:1
         slwi    r5,r5,2          1:1
  2.3%   lfs     f13,0(r4)        4:1   ! Stall=2, Unaligned
loop start
 22.5%   lfs     f0,0(r6)         4:1   Stall=2
  6.0%   fmuls   f12,f2,f13       5:1   Stall=3
  6.3%   fmuls   f11,f2,f0        5:1   Stall=3
  1.6%   fmsubs  f0,f1,f0,f12     5:1   Stall=3
  2.0%   fmadds  f13,f1,f13,f11   5:1   Stall=2
  1.8%   stfs    f0,0(r6)         3:3
         add     r6,r6,r7         1:1
  0.6%   stfs    f13,0(r4)        3:3
         add     r4,r4,r5         1:1
         bdnz    $-40             1:1   Loop end[1]
_______________________________________________
PerfOptimization-dev mailing list
http://lists.apple.com/mailman/listinfo/perfoptimization-dev
End of PerfOptimization-dev Digest, Vol 5, Issue 1
**************************************************