RE: altivec/velocity engine examples



I think Jay's comments are very fair, but they don't tell the whole story.
So here's a slightly different perspective that is not really in
disagreement with Jay, but brings out other points.

I am a scientist/researcher/engineer at a small company.  We have one
junior-level software engineer and just hired one senior-level software
engineer.  Frankly, it is not enough to go around.  We recently
developed tools in Matlab that took days to weeks or more to run.

It was necessary to speed up the CPU-intensive parts of the Matlab code,
so naturally I wrote a compiled version in C.  Before I began, I thought
about whether and how the code could be vectorized and decided it was
worth an attempt.  Without getting into the details of the code, it
turned out to be a little tricky to vectorize, simply because the way I
thought I could vectorize it really didn't work out (I was not the
original algorithm designer, so I didn't appreciate some of its
subtleties).
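
For anyone who has not seen what hand vectorizing looks like in C, here
is a minimal sketch.  It is not the code discussed above: the function
name saxpy4 and the operation (y = a*x + y on single-precision floats)
are assumptions picked purely for illustration.  The AltiVec path uses
the vec_ld, vec_madd, and vec_st intrinsics from altivec.h and assumes
x and y are 16-byte aligned; with any compiler that does not define
__ALTIVEC__, the plain scalar loop does all the work.

```c
#include <stddef.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Illustrative only: y[i] = a * x[i] + y[i] for n floats.
 * AltiVec path assumption: x and y are 16-byte aligned, because
 * vec_ld/vec_st ignore the low four bits of the address. */
void saxpy4(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;

#ifdef __ALTIVEC__
    /* Put the scalar a into all four lanes via an aligned temporary. */
    float tmp[4] __attribute__((aligned(16))) = { a, a, a, a };
    vector float va = vec_ld(0, tmp);

    /* Four floats per iteration: one multiply-add instruction. */
    for (; i + 4 <= n; i += 4) {
        vector float vx = vec_ld(0, &x[i]);
        vector float vy = vec_ld(0, &y[i]);
        vec_st(vec_madd(va, vx, vy), 0, &y[i]);
    }
#endif

    /* Scalar loop: handles the leftover elements after the vector
     * loop, or every element when AltiVec is not available. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Even in a loop this simple, alignment and the leftover elements need
explicit handling, which is a small taste of why the real thing took a
while to get right.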

Anyway, in the end I had a 2x speed-up using the vectorized code.  This
was an important improvement in the overall timeline.  With this
speed-up, the code ran faster on our 2 GHz G5 than it does (scalar) on
our 3 GHz Xeon.  So without going to 'exotic' hardware, the 2x speed-up
could not have been matched by buying a faster machine, because there
was none.

The upshot is it was worth it to me to vectorize.  With the first
section of code, I spent about 2-3 weeks getting it all worked out (this
was my first attempt at using AltiVec, and the time includes 'standard'
debugging as well).  I also learned a lot, so the things I did later
went quicker, and I was better able to judge what was worth doing....

Roger




> -----Original Message-----
> From: hpc-bounces+rkylin=email@hidden 
> [mailto:hpc-bounces+rkylin=email@hidden] On 
> Behalf Of Jay A. Kreibich
> Sent: Tuesday, September 14, 2004 1:17 PM
> To: Kyros Yakinthos
> Cc: Apple Scitech Mailing List; Discussion list for 
> clustering Apple server technologies (previously clusters).
> Subject: Re: altivec/velocity engine examples
> 
> 
> On Tue, Sep 14, 2004 at 06:40:32PM +0300, Kyros Yakinthos 
> scratched on the wall:
> 
> > Is it finally worth programming with AltiVec in a FORTRAN code by
> > calling C subroutines?
> 
>   As a professional software engineer, my answer would be "no," but
>   there is a lot in that answer-- I might not even be answering the
>   question you're asking.
> 
>   If you are asking about using existing libraries or frameworks, such
>   as Apple's Accelerate framework (which contains vector optimized
>   versions of vDSP, vImage, BLAS, LAPACK, vMathLib, and BigNumb)
>   then I would say, "yes!!!"  If any of Apple's libraries does what
>   you need, it is likely worth the trouble to stub the libs out in C
>   to FORTRAN and/or link against them.  They have both vectorized and
>   non-vectorized versions of most of the calls, so they'll run on G3s
>   as well as G4s and G5s as required.  You really don't need to think
>   or care about if the call you are making is vectorized-- you can
>   just be reasonably confident that the library will get whatever you
>   want done as fast as it can given the current processor and
>   datatypes-- including future hardware.  No need to re-invent the
>   wheel.
> 
>   If you are looking at auto-vectorization tools, I would say a much
>   less enthusiastic "yes," or even just a "maybe." If the tools aren't
>   real expensive, they are worth giving it a shot, but you should
>   understand what you have (or don't have) to gain so you can look at
>   the cost and the result and see what works for you.  For many these
>   tools are a gift from the heavens; for others they only offer
>   disappointment.
> 
> 
>   On the other hand, if you are asking about hand-coding operations on
>   the vector unit (I assume you are), I would seriously question this
>   practice.  Vector programming is very tricky.  It is not a simple
>   "array processor"... how you pack your data into vector units is
>   very critical to performance and you have to understand a lot of the
>   low level details to wrap your mind around how the vector unit was
>   designed to be used.  Even if you're writing in C or FORTRAN you
>   need to *think* in individual assembly instructions; that also means
>   knowing your tools and systems well enough to know how and why
>   specific program statements are compiled into machine instructions.
>   Doing this kind of thing well is extremely difficult, just as
>   hand-coding instructions for the G5's dual FPUs would be extremely
>   difficult.  I would never attempt it without the PowerPC-970
>   Instruction Reference Manual on the desk next to me.  If you've
>   never looked at a processor reference manual, save yourself and
>   don't start.  Most CS undergrads have never looked at one (although
>   most CE or EE undergrads have!).
> 
>   For the bench scientist, researcher, and/or engineer, this kind of
>   very low-level mucking about is very very rarely worth the effort.
>   I assume most of the people on this list are scientists or engineers
>   first and computer programmers second.  This is a good thing
>   (actually, it isn't.  I'd rather you guys were computer programmers
>   "tenth" or some larger number, but that's a different story).  The
>   computer is simply a means to an end, not an obsession in itself.
>   Spend your time doing good research, not fighting compilers.
> 
>   Faster code may lead to faster and better research, but consider
>   this.  The ideal vector code will, at best, give you 2x the
>   performance over the ideal non-vector code on the G5 (assuming
>   single-precision floating point; double-precision can't be
>   vectorized; best-case integer performance may be higher).  One could
>   also make a strong argument that it is easier to write "good"
>   non-vector code than it is to write "good" vector code, effectively
>   making that 2x even smaller.
> 
>   If all you want is 2x performance, go buy another machine.  It is
>   likely to be much cheaper than the people-time to make the code
>   faster by hand-vectorizing it.  Even if that requires rewriting
>   sections to allow distributed computing, this is time spent that is
>   more worth the effort.  At least distributed versions typically
>   scale past two.
> 
>   OK, I'll admit that "buy more machines" isn't an option if you
>   already have a 1000 node cluster since another 1000 machines will
>   pay for a *lot* of programming time (I'll trade you!).  On those
>   kinds of scales, it is an individual call.
> 
> 
> 
> 
>   Everyone's situation is different, and there are times when
>   cost/performance is outweighed by raw performance.  Just understand
>   the high costs of this kind of work, and the rather slim results
>   even if you do a great job.  That said, there is no reason not to
>   take advantage of it if you can-- the Accelerate libs from Apple
>   make that easy and can reduce a lot of other programming work.
>   They'd be highly desirable even if they weren't vectorized.  Throw
>   in the IBM compilers, which are fairly inexpensive next to the
>   programming time they can save, and you're fairly well off.  But
>   tweaking the vector pipeline by hand is high wizardry.
> 
>    -j
> 
> -- 
>                      Jay A. Kreibich | Comm. Technologies, R&D
>                         email@hidden | Campus IT & Edu. Svcs.
>           <http://www.uiuc.edu/~jak> | University of Illinois at U/C
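
The "stub the libs out in C to FORTRAN" route Jay mentions can be
sketched roughly as follows.  This is only an illustration under stated
assumptions: the stub name vsmul_stub_ is made up, the trailing
underscore and pass-by-reference arguments follow a common (but
compiler-dependent) FORTRAN calling convention, and USE_ACCELERATE is a
hypothetical build flag (something like -DUSE_ACCELERATE -framework
Accelerate on Mac OS X).  Without the flag, a plain scalar loop stands
in for the library call, so the same source builds anywhere.

```c
#include <stddef.h>

#ifdef USE_ACCELERATE            /* hypothetical build flag, see above */
#include <Accelerate/Accelerate.h>
#endif

/* Assumed FORTRAN side:  CALL VSMUL_STUB(X, SCALE, OUT, N)
 * FORTRAN passes every argument by reference, so the stub takes
 * pointers and dereferences them before calling into the library.
 * Computes out[i] = x[i] * scale for i = 0 .. n-1. */
void vsmul_stub_(const float *x, const float *scale, float *out,
                 const int *n)
{
#ifdef USE_ACCELERATE
    /* vDSP picks the vectorized or scalar path for the current CPU. */
    vDSP_vsmul(x, 1, scale, out, 1, (vDSP_Length)*n);
#else
    /* Portable fallback with the same result, no vector unit needed. */
    int i;
    for (i = 0; i < *n; i++)
        out[i] = x[i] * *scale;
#endif
}
```

The point of the design is that the FORTRAN code never needs to know
whether the vector unit is involved; the decision lives entirely behind
the stub.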