Mailing Lists: Apple Mailing Lists


RE: Strange performance results on G4 and G5 FFT



At 1:38 PM -0700 10/13/04, Bevis, Chris wrote:
I'm afraid my knowledge of PPC optimization is primitive compared to yours.


What is dcbz, and how would I change the distance and cacheline size for read-aheads?

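dcbz is the cacheline zero instruction (Data Cache Block set to Zero): it zeroes the entire cache line containing the target address without first reading that line in from memory. If you don't know what it is, you probably aren't using it.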
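Because dcbz clears the whole block containing its address, whatever size that block is, code that hard-wires 32-byte blocks is fragile once the block grows to 128 bytes. The C sketch below only simulates that rule with memset; it does not execute dcbz, and `simulated_dcbz` is a hypothetical name. It illustrates the sizing hazard only and says nothing about how fast dcbz runs on the G5.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, plain-C model of dcbz: zero the entire cache block that
 * contains base + offset. base is treated as if it were aligned to
 * line_size, and line_size must be a power of two. This does NOT execute
 * the real dcbz instruction; it only mimics its effect on memory. */
static void simulated_dcbz(unsigned char *base, size_t offset, size_t line_size)
{
    size_t block_start = offset & ~(line_size - 1);  /* round down to the block boundary */
    memset(base + block_start, 0, line_size);        /* the whole block is cleared */
}
```

With `line_size = 32`, `simulated_dcbz(buf, 32, 32)` clears bytes 32 to 63 only; with `line_size = 128`, the same call clears bytes 0 to 127, so the 96 bytes around the intended 32 are wiped as well.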

Because the cacheline size is 128 bytes on a G5 (up from 32 bytes on all previous desktop PPC chips), you need to change the way you unroll loops and handle read aheads/cache hints.
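One way to stop hard-coding 32 bytes is to ask the OS for the line size at run time and derive loop strides from it. This is a sketch, not a definitive implementation: `cache_line_size` and `FALLBACK_LINE_SIZE` are hypothetical names, `hw.cachelinesize` is the Mac OS X sysctl, and `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)` is a glibc extension used only so the sketch also builds on Linux.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#elif defined(__linux__)
#include <unistd.h>
#endif

/* Assumed fallback when the OS cannot report a size: 128 bytes, the G5
 * line size. This is an assumption; pick whatever default suits your targets. */
#define FALLBACK_LINE_SIZE 128

/* Returns the data cache line size in bytes, queried at run time. */
static size_t cache_line_size(void)
{
    size_t line = 0;
#if defined(__APPLE__)
    /* hw.cachelinesize is the Mac OS X sysctl name; the sketch accepts
     * either a 32- or a 64-bit answer rather than assuming one width. */
    uint64_t raw = 0;
    size_t len = sizeof raw;
    if (sysctlbyname("hw.cachelinesize", &raw, &len, NULL, 0) == 0) {
        if (len == sizeof(uint32_t)) {
            uint32_t v32;
            memcpy(&v32, &raw, sizeof v32);
            line = v32;
        } else if (len == sizeof(uint64_t)) {
            line = (size_t)raw;
        }
    }
#elif defined(__linux__) && defined(_SC_LEVEL1_DCACHE_LINESIZE)
    /* glibc extension; can return 0 or -1 when the size is unknown. */
    long v = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (v > 0)
        line = (size_t)v;
#endif
    return line ? line : FALLBACK_LINE_SIZE;
}
```

Loop strides and unroll factors can then be computed from the returned value (for example `cache_line_size() / sizeof(float)` elements per line) instead of a compile-time 32.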
Because the CPU is getting faster relative to DRAM, you also need to change the distance that you read ahead/hint.
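A rough way to reason about the read-ahead distance: if a cache miss takes L ns and the loop spends t ns working on each line, the hint has to be issued at least ceil(L / t) lines ahead. Halving t (a faster core, same DRAM) doubles the required distance. `prefetch_distance_lines` is a hypothetical helper, and the 150 ns / 10 ns figures in the usage note are made-up illustration values, not G4 or G5 measurements.

```c
/* Hypothetical helper: how many cache lines ahead a read hint must be
 * issued so that a line requested now arrives before the loop needs it.
 * Both inputs are assumptions you would measure on the target machine. */
static unsigned prefetch_distance_lines(double mem_latency_ns, double ns_per_line)
{
    double d = mem_latency_ns / ns_per_line;   /* lines consumed while one miss is outstanding */
    unsigned lines = (unsigned)d;
    if ((double)lines < d)
        lines++;                               /* round up */
    return lines ? lines : 1;                  /* always look at least one line ahead */
}
```

With the illustrative numbers, `prefetch_distance_lines(150.0, 10.0)` is 15 lines, and a core twice as fast (`ns_per_line = 5.0`) needs 30. Converting to bytes shows the line-size effect too: 15 lines is 1920 bytes with 128-byte lines but only 480 bytes with 32-byte lines, so a distance hard-coded in bytes for 32-byte lines covers only a quarter as many lines on the G5.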


If you don't understand that, then you need to learn more about memory architecture and caches -- it's important for anything that uses more than a few kbytes of memory.

Chris





-----Original Message-----
From: perfoptimization-dev-bounces+chris.bevis=email@hidden [mailto:perfoptimization-dev-bounces+chris.bevis=email@hidden] On Behalf Of Chris Cox
Sent: Wednesday, October 13, 2004 1:29 PM
To: Performance optimization list
Subject: Re: Strange performance results on G4 and G5 FFT


You did look for any cache hints that might have been put in for the G4, right?

dcbz = evil on the G5

And you need to change the distance and cacheline size for read
aheads (dcbt) on the G5.
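A sketch of a read loop organized around the cache line rather than a fixed 32-byte step: one hint per line, issued `dist_lines` lines ahead, with both the line size and the distance passed in instead of hard-coded. It uses the GCC/Clang builtin `__builtin_prefetch` rather than writing dcbt by hand; on PowerPC, GCC typically emits dcbt for a read prefetch, but check the generated assembly. `sum_with_prefetch` is a hypothetical example function, and `a` is assumed to start on a cache-line boundary so each outer step covers exactly one line.

```c
#include <stddef.h>

/* Sum n floats, issuing one read hint per cache line, dist_lines lines
 * ahead. line_bytes and dist_lines are caller-supplied assumptions. */
static float sum_with_prefetch(const float *a, size_t n,
                               size_t line_bytes, size_t dist_lines)
{
    size_t per_line = line_bytes / sizeof(float);  /* floats per cache line */
    if (per_line == 0)
        per_line = 1;
    size_t ahead = per_line * dist_lines;          /* hint distance in elements */
    float sum = 0.0f;

    for (size_t i = 0; i < n; i += per_line) {
        /* Read hint (rw = 0) with no temporal locality (0): streaming data. */
        if (i + ahead < n)
            __builtin_prefetch(&a[i + ahead], 0, 0);

        /* Work on exactly one line; this inner loop is the unit to unroll. */
        size_t end = (i + per_line < n) ? i + per_line : n;
        for (size_t j = i; j < end; j++)
            sum += a[j];
    }
    return sum;
}
```

The inner per-line loop is also the natural unit to unroll: 8 floats per 32-byte line on the G4, 32 floats per 128-byte line on the G5.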

Chris


At 12:23 PM -0700 10/13/04, Bevis, Chris wrote:
I am benchmarking accelerated FFTs (512x512 real, using fft2d_zrip in vDSP) and am getting some weird results:

1 GHz single G4 (codegen and instruction scheduling set for G4): 13 msec
1 GHz single G4 (codegen and instruction scheduling set for G5): 12 msec
2 GHz dual G5   (codegen and instruction scheduling set for G5): 17 msec

When I launch two instances simultaneously on the dual G5, they both finish in 24 msec.

Does anyone have any idea why a dual G5 with twice the clock speed performs slower than a G4 for a single instance, and about the same for two instances? This one has me puzzled.

  _______________________________________________
Do not post admin requests to the list. They will be ignored.
PerfOptimization-dev mailing list
(email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/perfoptimization-dev/ccox%40adobe.com

This email sent to email@hidden

Copyright © 2007 Apple Inc. All rights reserved.