Mailing Lists: Apple Mailing Lists


Floating Point comparison G5 vs. Opteron (64-bit) question



Hi All,

I'm hoping someone can help shed some light on an issue that's cropped up during some benchmarking. I'm not looking for comments along the lines of 'that's why Opterons are better'; I'm just trying to find a solution to a problem that's come up.

First, the application is pure FORTRAN 77. It has been compiled either with XLF on a dual 1.8 GHz G5 with 2 GB RAM under Panther, or with the 64-bit Portland compilers on a dual Opteron 240 (1.43 GHz) with 8 GB RAM under 64-bit Linux (Mandrake, I believe). At no time during execution of the program does paging occur (as observed from top, at least).

Here is the issue. It appears that the scalar portions of the code, specifically the FFT calculations, are significantly faster on the Opteron system than on the G5. When the code enters a region with branching, however, the two systems even out and the G5 slightly outperforms the Opteron.
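One way to pin down which regions differ is to time them separately on each machine. The following is a minimal sketch in C, not the FORTRAN 77 application itself: `fp_kernel`, `branchy_kernel`, and `time_kernel` are hypothetical stand-ins for a straight-line floating-point region and a branch-heavy region, and `clock()` is assumed to be precise enough for the run lengths involved.

```c
#include <stddef.h>
#include <time.h>

/* Straight-line floating-point work: no data-dependent branches. */
double fp_kernel(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * 0.5 + 1.0;
    return acc;
}

/* Branch-heavy work: the path taken depends on the data. */
double branchy_kernel(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > 0.0)
            acc += x[i];
        else
            acc -= 1.0;
    }
    return acc;
}

/* Runs kernel(x, n) once, stores its value in *result, and returns
   the CPU time of that call in seconds. */
double time_kernel(double (*kernel)(const double *, size_t),
                   const double *x, size_t n, double *result)
{
    clock_t start = clock();
    *result = kernel(x, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_kernel(fp_kernel, x, n, &r)` and `time_kernel(branchy_kernel, x, n, &r)` on the same large array on both systems gives one number per region, which makes it easier to see where the two machines diverge.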

Looking through the Portland manual, I see that on a 64-bit OS with Opterons the floating point instructions default to SSE/SSE2. I interpret this to mean that it's using the FPUs on the vector units. Presumably this means that there are more registers available for processing more data (correct me if I'm wrong). This is the only thing to which I can ascribe the very fast floating point operations on the Opteron.
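For reference, SSE2 provides both scalar and packed double-precision instructions that operate on the same XMM registers. The sketch below, in C with the standard SSE2 intrinsics from `<emmintrin.h>`, shows the two forms side by side. The function names `add_scalar` and `add_pairs` are hypothetical, and the plain C fallback is only there so the code builds on non-x86 targets. Which form the Portland compiler actually emits for a given FORTRAN loop is not something this sketch can show.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar form: one double-precision add in the low lane of an XMM register. */
double add_scalar(double x, double y)
{
#if defined(__SSE2__)
    __m128d r = _mm_add_sd(_mm_set_sd(x), _mm_set_sd(y));
    return _mm_cvtsd_f64(r);
#else
    return x + y;
#endif
}

/* Packed form: two double-precision adds with a single instruction. */
void add_pairs(const double a[2], const double b[2], double out[2])
{
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);   /* unaligned load of a[0], a[1] */
    __m128d vb = _mm_loadu_pd(b);   /* unaligned load of b[0], b[1] */
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```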

Is my interpretation of what is occurring correct, or is the performance difference due to something entirely different? If my interpretation is correct, are there compiler directives or procedures that can be used to increase the floating point performance (throughput?) on the G5 via AltiVec? That is, without going through and hand-vectorizing all of the various routines that are slow.

One other thing I'll point out that led me to think it was the SSE/SSE2 usage: if I use the FFTs in vDSP, the FFT portion of the calculation is 2x faster on the G5 than on the Opteron. If I compile the application in 32-bit mode on the Opteron, the G5 FFT ends up being 4-6x faster.

I'd appreciate any insight that anyone could offer, or solutions that might help boost performance on our G5s.

  Thanks in advance,

Dave

David W. Gohara, Ph.D.
Harvard Medical School
http://www.scianafilms.com
617-432-1216 (p)
617-432-4360 (f)

PerfOptimization-dev mailing list      (email@hidden)