AltiVec in Jet3D, which is a typical CFD postprocessor code.
I probably even included examples.
I may have read your posts about AltiVec on the fortran-dev list.
I got about 5-9X speedup depending on how you count your beans,
all from replacing 10-12 lines of FORTRAN code (in my innermost
loop) with a few calls to C subroutines containing AltiVec
instructions and vecLib calls. There are probably fewer than 40
lines of C code total.
About 10% of the CPU time was used by that part in the scalar computation,
as estimated from Amdahl's law.
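For reference, Amdahl's law gives the overall speedup S = 1 / ((1 - p) + p / s) when a fraction p of the original run time is accelerated by a factor s. A minimal sketch in C follows; the function name amdahl_speedup and the sample values in the note after it are illustrative assumptions, not measurements from Jet3D.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p (0..1) of the
 * original run time is accelerated by a factor s (> 0).
 * The name amdahl_speedup is hypothetical, chosen for illustration. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    /* (1 - p) is the untouched part, p / s is the accelerated part. */
    return 1.0 / ((1.0 - p) + p / s);
}
```

With an ideal 4-way vector speedup (s = 4), amdahl_speedup(0.5, 4.0) is 1.6, while amdahl_speedup(0.1, 4.0) is only about 1.08, so the payoff depends strongly on how much of the run time the vectorized loop accounts for.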
At the time I didn't know C and was new to vector programming,
so if I could figure it out, anybody can. It is not too hard to
roll your own vector code in C, but there are a few important rules
to follow, most especially keeping data aligned properly (F77 malloc
and F90 allocate will take care of this for you).
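To make the alignment rule concrete, here is a rough sketch of a small 4-wide kernel in C. It is not the Jet3D code: the routine name saxpy4 and the operation y = a*x + y are assumptions picked for illustration, and the vector initializer uses GCC-style syntax. The AltiVec path is compiled only when the compiler defines __ALTIVEC__; otherwise a scalar loop processes the same groups of four so the sketch still builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * Requirements: n is a multiple of 4, and x and y are 16-byte aligned,
 * because vec_ld and vec_st ignore the low 4 bits of the address. */
void saxpy4(size_t n, float a, const float *x, float *y)
{
    assert(n % 4 == 0);
    assert((uintptr_t)x % 16 == 0);
    assert((uintptr_t)y % 16 == 0);

#ifdef __ALTIVEC__
    /* Replicate the scalar into all four lanes (GCC-style initializer). */
    vector float va = (vector float){a, a, a, a};
    for (size_t i = 0; i < n; i += 4) {
        long off = (long)(i * sizeof(float));   /* byte offset */
        vector float vx = vec_ld(off, x);       /* aligned load of 4 floats */
        vector float vy = vec_ld(off, y);
        vec_st(vec_madd(va, vx, vy), off, y);   /* a*x + y, aligned store */
    }
#else
    /* Scalar fallback: same grouping by four, no vector unit needed. */
    for (size_t i = 0; i < n; i += 4) {
        for (size_t j = 0; j < 4; ++j)
            y[i + j] = a * x[i + j] + y[i + j];
    }
#endif
}
```

The two alignment assertions are the point of the sketch: if either array starts off a 16-byte boundary, the aligned loads and stores would silently touch the wrong addresses, so it is cheaper to check once at entry.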
It is not too hard as long as the critical loop is small. It turns out to be
quite hard if the loop is long or many loops need to be vectorized.
In the latter case one needs an autovectorizer, and so-called
real-world programs are often of the latter kind.
In my particular case I got a huge payoff because the
vectorization implemented 4-way parallelism in the kernel of a
nested loop, and the speedup really compounded. I would heartily
recommend anyone in the same situation take a serious look at AltiVec.
It would be worthwhile if AltiVec brings a 4x speedup of the code;
it might be for a 2x speedup, and might not be for a 10% gain.
That just depends on the code itself.
Professor Hideaki Tanabe, Dr. Eng.
Technical Education Department
Gunma University
_______________________________________________
Scitech mailing list (email@hidden)