Mailing Lists: Apple Mailing Lists


Re: FORTRAN and AltiVec



> Message: 1
> To: email@hidden
> From: Kyros Yakinthos <email@hidden>
> Subject: FORTRAN and AltiVec
> Date: Fri, 9 Apr 2004 10:53:38 +0300
> 
> Hello All,
> 
> I have noticed a strange result.
> I have a Fortran code with the following line:
> 
>     res= a*b-c*d
[snip]
> 
> When I compile using xlf 8.1 (evaluation) with the -g option, AltiVec
> is about 50% faster.
> When I compile using xlf 8.1 with the -O5 option for full optimization,
> AltiVec is about 40% slower.
> 
> Is xlf doing such a good job of optimizing, or is something wrong with
> the C subroutine for the AltiVec instructions? Also, could this be
> related to alignment, or something else?
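
On the alignment question: AltiVec loads and stores (lvx/stvx, or vec_ld/vec_st in C) move 16 bytes at a time and ignore the low four address bits. A C routine that assumes 16-byte-aligned arrays will therefore quietly read the wrong elements if it is handed a misaligned one. The following is a minimal sketch in portable C. It assumes the caller allocates the buffers. is_vec_aligned and alloc_vec_floats are hypothetical helper names and are not part of any AltiVec or Fortran API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* AltiVec registers are 128 bits; lvx/stvx drop the low 4 address bits. */
#define VEC_ALIGN 16

/* Return nonzero if p sits on a 16-byte boundary. */
int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_ALIGN) == 0;
}

/* Allocate n floats on a 16-byte boundary (release with free()).
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up first. */
float *alloc_vec_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
    if (bytes == 0)
        bytes = VEC_ALIGN;
    float *p = aligned_alloc(VEC_ALIGN, bytes);
    assert(p == NULL || is_vec_aligned(p));
    return p;
}
```

One conservative option is to check is_vec_aligned() on every array before taking the AltiVec path and fall back to a scalar loop otherwise.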


After communicating with Kyros, I took a look at this and found something
interesting: at higher levels of optimization, XLF can do vector-adds
(really scalar array adds in a loop) nearly as fast as, or faster than,
passing the data to AltiVec for a true vector computation.  This is on my
old 400MHz PowerBook G4, by the way, so I would expect a G5 to be
spectacular.
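
The two approaches differ in how they traverse the data. A "scalar array add" is an ordinary element-by-element loop, the kind an optimizing compiler can unroll and schedule aggressively. An AltiVec routine works on four single-precision floats per 128-bit register and needs a scalar cleanup loop when n is not a multiple of four. The sketch below is portable C with no AltiVec intrinsics, so it only mirrors the shape of a 4-wide loop. vadd_scalar and vadd_blocked4 are hypothetical names, not the interface of the vecadd.o used here.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar array add: one element per iteration. */
void vadd_scalar(float *res, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
}

/* Same result, processed in blocks of four floats to mirror one 128-bit
   AltiVec register. Plain C only: this shows the loop shape, not SIMD. */
void vadd_blocked4(float *res, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (res != NULL && a != NULL && b != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        res[i]     = a[i]     + b[i];
        res[i + 1] = a[i + 1] + b[i + 1];
        res[i + 2] = a[i + 2] + b[i + 2];
        res[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)   /* leftover 0-3 elements */
        res[i] = a[i] + b[i];
}
```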

xlf90 add.f -o addX -O0 vecadd.o -qextname=dtime
       n    scalar-time (s)    vector-time (s)    AltiVec speedup
       4        2.970000029       0.5500000119        5.400000095
      16        2.680000067       0.3400000036        7.882352829
      64        2.640000105       0.2800000012        9.428571701
     256        2.629999876       0.2099999934        12.52380943
    1024        2.690000057       0.2500000000        10.76000023
    4096        2.750000000       0.4600000083        5.978260994
   16384        2.740000010       0.5099999905        5.372549057
   65536        3.279999971        1.409999967        2.326241255
  262144        3.359999895        1.289999962        2.604651213
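
The AltiVec speedup column is scalar-time divided by vector-time. For n = 4, that is 2.970000029 / 0.5500000119 = 5.4. The long digit strings look like single-precision (REAL) timer values printed to 10 significant digits. When 2.97 and 0.55 are stored as C floats and printed with %.10g, they come out as 2.970000029 and 0.5500000119. The following C sketch shows the calculation; altivec_speedup and print_row are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* AltiVec speedup as tabulated: scalar time divided by vector time.
   Times are kept in single precision, like a REAL-valued timer result. */
float altivec_speedup(float scalar_time, float vector_time)
{
    assert(vector_time > 0.0f);
    return scalar_time / vector_time;
}

/* Print one row with 10 significant digits; %.10g exposes the binary
   rounding of float values (2.97f prints as 2.970000029). */
void print_row(long n, float scalar_time, float vector_time)
{
    printf("%8ld %18.10g %18.10g %18.10g\n", n, scalar_time, vector_time,
           altivec_speedup(scalar_time, vector_time));
}
```

For example, print_row(4, 2.97f, 0.55f) prints the digits 2.970000029, 0.5500000119 and 5.400000095, matching the first row above.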

xlf90 add.f -o addX -O3 vecadd.o -qextname=dtime
       n    scalar-time (s)    vector-time (s)    AltiVec speedup
       4       0.4199999869       0.4499999881       0.9333333373
      16       0.3000000119       0.3199999928       0.9375000596
      64       0.2700000107       0.2399999946        1.125000119
     256       0.2500000000       0.2500000000        1.000000000
    1024       0.2399999946       0.2500000000       0.9599999785
    4096       0.4000000060       0.4799999893       0.8333333731
   16384       0.4399999976       0.4899999797       0.8979592323
   65536       0.8199999928        1.419999957       0.5774648190
  262144        1.179999948        1.309999943       0.9007633328

As you can see, the vector times are about the same in both cases; it's
the scalar time that really comes down with -O3.

I also compiled with Absoft for comparison:

f90 -s add.f -o addA vecadd.o -lU77
       n    scalar-time (s)    vector-time (s)    AltiVec speedup
       4            3.80000           0.710000            5.35211
      16            3.18000           0.370000            8.59459
      64            3.13000           0.270000            11.5926
     256            3.19000           0.230000            13.8696
    1024            3.25000           0.230000            14.1304
    4096            3.46000           0.480000            7.20833
   16384            3.46000           0.460000            7.52174
   65536            3.79000           0.810000            4.67901
  262144            4.23000            1.21000            3.49587

f90 -s add.f -o addA vecadd.o -lU77 -O3
       n    scalar-time (s)    vector-time (s)    AltiVec speedup
       4            1.66000           0.480000            3.45833
      16            1.55000           0.300000            5.16667
      64            1.44000           0.280000            5.14286
     256            1.49000           0.250000            5.96000
    1024            1.47000           0.230000            6.39130
    4096            1.67000           0.490000            3.40816
   16384            1.69000           0.500000            3.38000
   65536            2.15000           0.840000            2.55952
  262144            2.45000            1.25000            1.96000

Here, the vector times are pretty consistent case to case (and roughly
consistent with the XLF compilation), but you can see that the scalar times
don't improve nearly as much with optimization: about 1.7x to 2.3x for
Absoft, versus roughly 3x to 11x for XLF.  So, AltiVec looks pretty good
here.
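
The size of the optimization gain can be checked directly from the tables. Divide each unoptimized scalar time by the corresponding -O3 scalar time. The following C sketch does this using the scalar-time columns rounded to two decimals; gain_range and the array names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define NROWS 9  /* n = 4, 16, 64, ..., 262144 */

/* Scalar-time columns from the four tables, rounded to two decimals. */
const double xlf_O0[NROWS]    = {2.97, 2.68, 2.64, 2.63, 2.69, 2.75, 2.74, 3.28, 3.36};
const double xlf_O3[NROWS]    = {0.42, 0.30, 0.27, 0.25, 0.24, 0.40, 0.44, 0.82, 1.18};
const double absoft_O0[NROWS] = {3.80, 3.18, 3.13, 3.19, 3.25, 3.46, 3.46, 3.79, 4.23};
const double absoft_O3[NROWS] = {1.66, 1.55, 1.44, 1.49, 1.47, 1.67, 1.69, 2.15, 2.45};

/* Smallest and largest before/after time ratio across all vector lengths. */
void gain_range(const double *before, const double *after, size_t rows,
                double *lo, double *hi)
{
    assert(rows > 0);
    *lo = *hi = before[0] / after[0];
    for (size_t i = 1; i < rows; i++) {
        double g = before[i] / after[i];
        if (g < *lo) *lo = g;
        if (g > *hi) *hi = g;
    }
}
```

With these numbers, the XLF gain runs from about 2.8x (n = 262144) to about 11.2x (n = 1024). The Absoft gain stays between about 1.7x and 2.3x.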

So, for anybody using XLF and considering AltiVec, be sure to run some tests
before doing your coding/porting!  At least for vector-add, XLF is doing
quite well.  If you're using Absoft, you'll still see a nice gain from
implementing AltiVec for vector-add (roughly 2x to 6x at -O3 in these
tests).
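
For anyone following that advice, a minimal timing harness might look like the C sketch below. This is an assumption, not the benchmark used above. It times a caller-supplied add routine with the standard clock() function; time_vadd and vadd_ref are hypothetical names. The fairly flat scalar times across n in the tables suggest that each row processes a fixed total number of elements. Choosing reps = total / n is one reasonable way to reproduce that.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*vadd_fn)(float *res, const float *a, const float *b, size_t n);

/* Reference scalar add to time against (hypothetical name). */
void vadd_ref(float *res, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
}

/* CPU seconds spent making reps calls of f on length-n arrays.
   For comparable rows, pick reps so that reps * n is the same for every n. */
double time_vadd(vadd_fn f, float *res, const float *a, const float *b,
                 size_t n, long reps)
{
    static volatile float sink;   /* keeps the result observable */

    assert(n > 0 && reps > 0);
    clock_t start = clock();
    for (long r = 0; r < reps; r++)
        f(res, a, b, n);
    clock_t stop = clock();
    sink = res[n - 1];
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Because the repeated calls write the same results every time, check that the optimizer has not removed them. Otherwise the -O3 timings can look better than they really are.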

Kyros noted that AltiVec/vecLib still wins over XLF for exp(x), so I
suspect more general computations may still benefit from AltiVec overall.
It would be interesting to test sqrt(x) on a G5, which has a hardware
square-root instruction.

On a final note, I was wondering if XLF might be using AltiVec for
vector-add, so I repeated the scalar benchmark with double-precision data
(AltiVec has no double-precision floating-point support, so XLF could not
use it there).  It still looked good: for most vector lengths, XLF could do
a double-precision vector-add faster than AltiVec could do a
single-precision one!
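
The reasoning behind that check can be sketched as follows. A 128-bit AltiVec register holds four single-precision floats. AltiVec defines no double-precision floating-point vector type at all, so a double-precision add can only run on the scalar FPU. The C below makes the lane arithmetic and the double-precision loop explicit; lanes_per_register and vadd_scalar_f64 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one AltiVec register. */
#define VEC_BYTES 16

/* Elements of a given size that would fit in one 128-bit register:
   4 floats, or 2 doubles. AltiVec has no double-precision float vectors,
   so the double case cannot actually be vectorized there. */
size_t lanes_per_register(size_t elem_size)
{
    assert(elem_size > 0);
    return VEC_BYTES / elem_size;
}

/* Double-precision version of the scalar array add benchmarked above. */
void vadd_scalar_f64(double *res, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[i] + b[i];
}
```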

Craig

-- 
Dr. Craig A. Hunter
NASA Langley Research Center
AAAC / Configuration Aerodynamics Branch
(757) 864-3020
email@hidden  (NEW!!)
_______________________________________________
scitech mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/scitech





Copyright © 2007 Apple Inc. All rights reserved.