Mailing Lists: Apple Mailing Lists


Re: L2 Cache Miss (was: Floating Point comparison G5)



Profiling with Shark using Ian's suggestion for L2 cache misses, I got the following:

44% was on one function from my program
40% was from the mach kernel library (pmap_zero_page)

  The assembly for the hot spot in my code from this profile is:

0xe9f0 cmpwi cr5,r5,-3 1:1
0xe9f4 cmpwi cr2,r5,-1 1:1
0xe9f8 ori r28,r9,0x0000 1:1
0xe9fc ori r27,r8,0x0000 1:1
0xea00 ori r26,r7,0x0000 1:1
0xea04 mtctr r30 *2:2
0xea08 add r30,r25,r17 1:1
0xea0c add r25,r25,r13 1:1
0xea10 lfsux f4,r10,r2 4:1 ! Stall=1, Loop start[20], Unroll, AltiVec
0xea14 cmpwi r5,0 1:1
0xea18 addi r13,r6,1 1:1
52.9% 52.3% 0xea1c fsub f4,f31,f4 5:1 Stall=4
0.0% 0.0% 0xea20 fmadd f4,f4,f4,f5 5:1
0xea24 beq $+900 <cards13and14_ + 9480> 1:1
0xea28 cmpw cr3,r13,r31 1:1
0xea2c beq cr3,$+724 <cards13and14_ + 9312> 1:1
0xea30 cmpw r14,r31 1:1
0xea34 beq $+716 <cards13and14_ + 9312> 1:1
0xea38 beq cr1,$+660 <cards13and14_ + 9260> 1:1
0xea3c cmpw r24,r31 1:1
0xea40 bne $+604 <cards13and14_ + 9212> 1:1
0xea44 lfs f5,0(r29) 4:1 Stall=3
0xea48 fsubs f5,f2,f5 5:1 Stall=4
0xea4c fmadds f7,f5,f5,f7 5:1
0xea50 b $+588 <cards13and14_ + 9212> 1:1
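
Read as C, the sampled instructions (lfsux, then fsub, then fmadd) look like a strided load feeding a sum of squared differences. The sketch below is only a guess at the shape of that loop, since the actual source is not shown; the function names, parameters, and the PREFETCH_AHEAD distance are all hypothetical. The second version adds a software prefetch hint with the GCC/Clang builtin __builtin_prefetch, which is one possible way to try prefetching if the L2 misses do turn out to come from the strided load. Whether it helps would have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hot loop (the real source is not shown).
   Walks `data` with a fixed stride and accumulates (ref - x)^2. */
static float sum_sq_diff_strided(const float *data, size_t count,
                                 size_t stride, float ref)
{
    assert(stride > 0);
    float acc = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float d = ref - data[i * stride];   /* strided load, then subtract */
        acc += d * d;                       /* multiply-add into the sum   */
    }
    return acc;
}

/* Assumed prefetch distance in iterations; a tuning knob, not a known value. */
#define PREFETCH_AHEAD 8

/* Same computation, with a read prefetch hint issued a few iterations ahead. */
static float sum_sq_diff_prefetch(const float *data, size_t count,
                                  size_t stride, float ref)
{
    assert(stride > 0);
    float acc = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        if (i + PREFETCH_AHEAD < count) {
            /* rw = 0 (read), locality = 0 (streaming, little reuse expected) */
            __builtin_prefetch(&data[(i + PREFETCH_AHEAD) * stride], 0, 0);
        }
        float d = ref - data[i * stride];
        acc += d * d;
    }
    return acc;
}
```

Both functions return the same result; only the memory access hints differ, so the two can be swapped and compared under the same Shark configuration.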


I'm not certain exactly what this means (I'm still looking through the references that Ian provided earlier). In this profiling mode, should I correlate this function's percentage with the amount of time the function takes up in a Time Profile? Or, by virtue of taking up 44% in the profile window, does it automatically become a candidate for prefetching?

  Any additional guidance would be greatly appreciated.

  Dave				

On Jan 9, 2005, at 6:58 PM, Ian Ollmann wrote:



Some short descriptions on the Opteron core, incl FPU:

http://www4.tomshardware.com/cpu/20030422/opteron-04.html
http://www.top500.org/ORSC/2002/opteron.html

The top 500 site has a nice comparison of a wide variety of processors.

http://www.top500.org/ORSC/2002/processors.html#ccNUMA

The G5 (aka PowerPC 970) is quite similar to a single-core Power4 with reduced cache size and VMX added on. According to the site, IBM recommends a single-core version of the Power4 for HPC to avoid resource contention in the caches and front side bus.

Ian
_______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/perfoptimization-dev/ email@hidden


This email sent to email@hidden


David W. Gohara, Ph.D.
Harvard Medical School
http://www.scianafilms.com
617-432-1216 (p)
617-432-4360 (f)



