> Message: 6
> Date: Tue, 14 Sep 2004 10:23:50 -0500
> From: "Sean C. Garrick" <email@hidden>
> Subject: Re: altivec/velocity engine examples
> To: Todd Dailey <email@hidden>
> Cc: Apple Scitech Mailing List <email@hidden>
> Message-ID: <email@hidden>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Todd:
>
> I've looked at that many times and it's not really helped me.
> Any real-world examples? Any before/after code snippets?
>
> Thanks!
> Sean
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 14 Sep 2004 18:40:32 +0300
> From: Kyros Yakinthos <email@hidden>
> Subject: Re: altivec/velocity engine examples
> To: email@hidden
> Cc: Apple Scitech Mailing List <email@hidden>, "Discussion
> list for clustering Apple server technologies \(previously
> clusters\)." <email@hidden>
> Message-ID: <email@hidden>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> I would like to add one more question to this one posted by Sean:
>
> Has anyone found nice speedups using AltiVec in FORTRAN code,
> especially when compiling with IBM's xlf?
>
> Is it ultimately worth programming with AltiVec in FORTRAN code by
> calling C subroutines?
>
> Kyros
>
>
So I don't bore the old timers with another repeat war story, anyone
interested can search the archives for "Jet3D" (I'd search but the new list
serv requires a password and I can't remember mine at the moment). I am
fairly certain I have talked about my previous experience with FORTRAN and
AltiVec in Jet3D, which is a typical CFD postprocessor code. I probably
even included examples. I got about 5-9X speedup depending on how you count
your beans, all from replacing 10-12 lines of FORTRAN code (in my innermost
loop) with a few calls to C subroutines containing AltiVec instructions and
vecLib calls. There are probably fewer than 40 lines of C code total. At
the time I didn't know C and was new to vector programming, so if I could
figure it out, anybody can. It is not too hard to roll your own vector code
in C, but there are a few important rules to follow, most especially keeping
data aligned properly (F77 malloc and F90 allocate will take care of this
for you). In my particular case I got a huge payoff because the
vectorization implemented 4-way parallelism in the kernel of a nested loop,
and the speedup really compounded. I would heartily recommend anyone in the
same situation take a serious look at AltiVec.
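
For anyone who wants a picture of what such a routine can look like, here is
a minimal sketch. It is not the Jet3D code: the routine name vsaxpy_ and the
y = a*x + y kernel are made up for illustration, and the trailing underscore
and pass-by-reference arguments assume a common FORTRAN-to-C calling
convention, so check what your compiler actually expects. The AltiVec path
is only taken when both arrays are 16-byte aligned; otherwise, and on any
machine without AltiVec, a plain scalar loop does the work.

```c
#include <stdint.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical kernel: y(i) = a*x(i) + y(i).
 * Intended to be called from FORTRAN as  CALL VSAXPY(N, A, X, Y)
 * (assumes the compiler appends an underscore and passes by reference). */
void vsaxpy_(const int *n, const float *a, const float *x, float *y)
{
    int i = 0;

#ifdef __ALTIVEC__
    /* vec_ld/vec_st ignore the low 4 address bits, so only use them
     * when both arrays really start on a 16-byte boundary. */
    if ((((uintptr_t)x | (uintptr_t)y) & 15) == 0) {
        union { float f[4]; vector float v; } s;
        s.f[0] = s.f[1] = s.f[2] = s.f[3] = *a;   /* a in all 4 lanes */

        for (; i + 4 <= *n; i += 4) {
            vector float vx = vec_ld(0, x + i);
            vector float vy = vec_ld(0, y + i);
            vec_st(vec_madd(s.v, vx, vy), 0, y + i);  /* a*x + y, 4 at a time */
        }
    }
#endif

    /* Scalar loop: handles the leftover elements, unaligned data,
     * and builds without AltiVec. */
    for (; i < *n; i++)
        y[i] = *a * x[i] + y[i];
}
```

The FORTRAN side then just replaces the inner loop with the CALL. Note that
vec_madd is a fused multiply-add, so results can differ from the scalar loop
in the last bit.
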
Craig
--
Dr. Craig Hunter
NASA Langley Research Center
AAAC/Configuration Aerodynamics Branch
email@hidden (new!!)
(757) 864-3020
(Dual G4 - OS X)