I think Jay's comments are very fair, but they don't tell the whole story.
So here's a slightly different perspective that is not really in
disagreement with Jay, but brings out other points.
I am a scientist/researcher/engineer at a small company. We have one
junior-level software engineer and just hired one senior-level software
engineer. Frankly, that is not enough to go around. We recently
developed tools in Matlab that took days to weeks or more to run.
It was necessary to speed up the CPU-intensive parts of the Matlab code,
so naturally I wrote a compiled version in C. Before I began, I thought
about whether and how the code could be vectorized and decided it was
worth an attempt. Without getting into the details of the code, it
turned out to be a little tricky to vectorize, simply because the way I
thought I could vectorize it really didn't work out (I was not the
original algorithm designer, so I didn't appreciate some of its
subtleties).
Anyway, in the end I had a 2x speed-up using the vectorized code. This
was an important improvement in the overall timeline. With this
speed-up, the code ran faster on our 2 GHz G5 than the scalar version
does on our 3 GHz Xeon. So without going to 'exotic' hardware, the 2x
speed-up could not be replaced by a faster machine, because there was
none.
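To give a feel for the kind of restructuring involved, here is a minimal sketch in plain C. The kernel (y = a*x + y) and the function names are hypothetical and are not the code discussed above. The second version only groups the work four floats at a time, which is the shape a 128-bit vector unit such as AltiVec wants; it does not itself use the vector unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, for illustration only: y[i] = a * x[i] + y[i]. */

/* Scalar reference: one element per loop iteration. */
void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same arithmetic, grouped four floats at a time. Each group of four
 * is the amount of data one 128-bit vector register holds, so this is
 * the loop shape that a hand-vectorized version would start from.
 * The leftover 0-3 elements are handled by a scalar tail loop. */
void saxpy_blocked(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
    size_t n4 = n - (n % 4);      /* largest multiple of 4 not above n */

    for (; i < n4; i += 4) {
        y[i + 0] = a * x[i + 0] + y[i + 0];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }
    for (; i < n; i++)            /* scalar tail */
        y[i] = a * x[i] + y[i];
}
```

Real kernels are rarely this regular. Data alignment, the tail elements, and dependencies between iterations are typically where the difficulty comes from.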
The upshot is it was worth it to me to vectorize. With the first
section of code, I spent about 2-3 weeks getting it all worked out (this
was my first attempt at using AltiVec, and the time includes 'standard'
debugging as well). I also learned a lot, so the later things I did went
more quickly, and I was better able to judge what was worth doing....
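The "was it worth it" question can be put as simple break-even arithmetic. The sketch below is an illustration only: the function names are made up, and any numbers fed to it are assumptions, not measurements from this project.

```c
#include <assert.h>

/* Days saved on each run when a run that takes `run_days` is sped up
 * by a factor of `speedup` (e.g. 2.0 for a 2x speed-up). */
double days_saved_per_run(double run_days, double speedup)
{
    assert(run_days > 0.0 && speedup > 1.0);
    return run_days * (1.0 - 1.0 / speedup);
}

/* Number of runs after which `dev_days` of programming effort has been
 * paid back by the time saved. The result may be fractional. */
double break_even_runs(double dev_days, double run_days, double speedup)
{
    assert(dev_days >= 0.0);
    return dev_days / days_saved_per_run(run_days, speedup);
}
```

For example, assuming a hypothetical 7-day run, a 2x speed-up, and 21 days of development, break_even_runs(21.0, 7.0, 2.0) gives 6 runs. This ignores what was learned along the way, which makes later efforts cheaper.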
Roger
> -----Original Message-----
> From: hpc-bounces+rkylin=email@hidden
> [mailto:hpc-bounces+rkylin=email@hidden] On
> Behalf Of Jay A. Kreibich
> Sent: Tuesday, September 14, 2004 1:17 PM
> To: Kyros Yakinthos
> Cc: Apple Scitech Mailing List; Discussion list for
> clustering Apple server technologies (previously clusters).
> Subject: Re: altivec/velocity engine examples
>
>
> On Tue, Sep 14, 2004 at 06:40:32PM +0300, Kyros Yakinthos
> scratched on the wall:
>
> > Is it finally worth programming with AltiVec in FORTRAN code by
> > calling C subroutines?
>
> As a professional software engineer, my answer would be "no," but
> there is a lot in that answer-- I might not even be answering the
> question you're asking.
>
> If you are asking about using existing libraries or frameworks, such
> as Apple's Accelerate framework (which contains vector-optimized
> versions of vDSP, vImage, BLAS, LAPACK, vMathLib, and vBigNum),
> then I would say, "yes!!!" If any of Apple's libraries does what you
> need, it is likely worth the trouble to stub the libs out in C to
> FORTRAN and/or link against them. They have both vectorized and
> non-vectorized versions of most of the calls, so they'll run on G3s
> as well as G4s and G5s as required. You really don't need to think
> or care about whether the call you are making is vectorized-- you can
> just be reasonably confident that the library will get whatever you
> want done as fast as it can given the current processor and
> datatypes-- including future hardware. No need to re-invent the wheel.
>
> If you are looking at auto-vectorization tools, I would say a much
> less enthusiastic "yes," or even just a "maybe." If the tools aren't
> real expensive, they are worth a shot, but you should
> understand what you have (or don't have) to gain so you can look at
> the cost and the result and see what works for you. For many, these
> tools are a gift from the heavens; for others they only offer
> disappointment.
>
>
> On the other hand, if you are asking about hand-coding operations on
> the vector unit (I assume you are), I would seriously question this
> practice. Vector programming is very tricky. It is not a simple
> "array processor"... how you pack your data into vector units is very
> critical to performance and you have to understand a lot of the low
> level details to wrap your mind around how the vector unit was
> designed to be used. Even if you're writing in C or FORTRAN you need
> to *think* in individual assembly instructions; that also means
> knowing your tools and systems well enough to know how and why
> specific program statements are compiled into machine instructions.
> Doing this kind of thing well is extremely difficult, just as
> hand-coding instructions for the G5's dual FPUs would be extremely
> difficult. I would never attempt it without the PowerPC-970
> Instruction Reference Manual on the desk next to me. If you've never
> looked at a processor reference manual, save yourself and don't
> start. Most CS undergrads have never looked at one (although most CE
> or EE undergrads have!).
>
> For the bench scientist, researcher, and/or engineer, this kind of
> very low-level mucking about is very, very rarely worth the effort. I
> assume most of the people on this list are scientists or engineers
> first and computer programmers second. This is a good thing
> (actually, it isn't. I'd rather you guys were computer programmers
> "tenth" or some larger number, but that's a different story). The
> computer is simply a means to an end, not an obsession in itself.
> Spend your time doing good research, not fighting compilers.
>
> Faster code may lead to faster and better research, but consider
> this. The ideal vector code will, at best, give you 2x the
> performance over the ideal non-vector code on the G5 (assuming
> single-precision floating point; double-precision can't be
> vectorized; best-case integer performance may be higher). One could
> also make a strong argument that it is easier to write "good"
> non-vector code than it is to write "good" vector code, effectively
> making that 2x even smaller.
>
> If all you want is 2x performance, go buy another machine. It is
> likely to be much cheaper than the people-time to make the code
> faster by hand-vectorizing it. Even if that requires rewriting
> sections to allow distributed computing, that time is better spent.
> At least distributed versions typically scale past two.
>
> OK, I'll admit that "buy more machines" isn't an option if you
> already have a 1000-node cluster, since another 1000 machines will
> pay for a *lot* of programming time (I'll trade you!). On those
> kinds of scales, it is an individual call.
>
>
>
>
> Everyone's situation is different, and there are times when
> cost/performance is outweighed by raw performance. Just understand
> the high costs of this kind of work, and the rather slim results even
> if you do a great job. That said, there is no reason not to take
> advantage of it if you can-- the Accelerate libs from Apple make that
> easy and can reduce a lot of other programming work. They'd be
> highly desirable even if they weren't vectorized. Throw in the IBM
> compilers, which are fairly inexpensive next to the programming time
> they can save, and you're fairly well off. But tweaking the vector
> pipeline by hand is high wizardry.
>
> -j
>
> --
> Jay A. Kreibich | Comm. Technologies, R&D
> email@hidden | Campus IT & Edu. Svcs.
> <http://www.uiuc.edu/~jak> | University of Illinois at U/C
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Hpc mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/hpc/email@hidden
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Scitech mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/scitech/email@hidden
This email sent to email@hidden