> I got about 5-9X speedup depending on how you count
> your beans, all from replacing 10-12 lines of FORTRAN code
> In my particular case I got a huge payoff because the
> vectorization implemented 4-way parallelism in the kernal of a nested loop,
> and the speedup really compounded. I would heartily recommend anyone in the
> same situation take a serious look at AltiVec.
This brings up a few points that perhaps I should have made more
clear in my first "rant." I realize I came across a bit more
strongly than I really meant.
First off, if you have a simple situation, it can't hurt to try. If
all you do is spend an afternoon trying to get it to work, even a small
pay-off will justify the time. My main point was that people often
seem to have a strange obsession with performance and let it blind
them to the simple economics of the situation (open-source programmers
are some of the worst at this). They'll happily let themselves get
dragged into an optimization fight for weeks that, in the end, leads
to some fairly minor improvements. They'll come back after a month
and say, "Look, it runs 10% faster!" and I'll say, "That's great.
Moore's Law did better in the same amount of time."
These kinds of battles are just not worth the time and stress unless
you're running on a multi-million dollar machine or have some other
unusual situation that justifies the time and cost.
Giving it a basic try on a simple program (or simple loop) will,
however, occasionally produce results well worth the time and effort.
Where you get into trouble is when you have to redesign a large
existing FORTRAN program that is already stable and known to work
correctly; it often just isn't worth the effort. The time it takes
to rerun the tests (for those rare programs that *have* tests) to
verify correctness can be significant, depending on the complexity of
the application. You just have to be smart about the true cost of
getting the modifications to work and the returns those modifications
will give you.
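
To put rough numbers on that trade-off, here is a minimal sketch in C.
The function names are made up for illustration, and it makes the
simplifying assumption that an hour of developer time and an hour of
run time are worth the same, which is rarely true in practice.

```c
#include <assert.h>

/* Hours of run time saved per run at a given speedup factor
 * (speedup = old run time / new run time). */
double hours_saved_per_run(double run_hours, double speedup)
{
    assert(speedup > 0.0);
    return run_hours - run_hours / speedup;
}

/* Number of runs needed before the time saved repays the time spent
 * on the optimization. Assumes developer hours and run hours are
 * valued equally (an illustrative simplification). */
double break_even_runs(double dev_hours, double run_hours, double speedup)
{
    double saved = hours_saved_per_run(run_hours, speedup);
    assert(saved > 0.0);   /* a slowdown never pays for itself */
    return dev_hours / saved;
}
```

Under those assumptions, an afternoon (4 hours) spent getting a 5X
speedup on a one-hour run pays for itself after 5 runs, while a month
(160 hours) spent on a 10% gain needs about 1760 runs to break even.
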
Another thing that might not have been clear is that the advantages
of the AltiVec when dealing with 32-bit floating point numbers on the
G5 are not awesome -BUT- there is a much bigger difference on the G4.
If you need your code to run on G4 systems (e.g. Powerbooks), it might
be worth a little extra time to look at the vector stuff. There are
much bigger improvements available on that platform. The same is
true with integer stuff on the G4 and the G5. If you're doing DSP-like
algorithms on 8- or 16-bit integer data, the vector stuff can provide
*huge* improvements.
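
The reason narrow integer data does so well is simple arithmetic: an
AltiVec register is 128 bits wide, so the smaller the element, the
more of them each vector instruction handles. A minimal sketch in C
(the helper name is mine, purely for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AltiVec registers are 128 bits (16 bytes) wide. */
#define VECTOR_BYTES 16

/* Number of elements of the given size that fit in one vector
 * register, i.e. how many are processed per vector instruction. */
size_t lanes_per_vector(size_t element_bytes)
{
    assert(element_bytes > 0 && element_bytes <= VECTOR_BYTES);
    return VECTOR_BYTES / element_bytes;
}
```

With this, lanes_per_vector(sizeof(float)) is 4 (the 4-way parallelism
mentioned in the quoted message), sizeof(int16_t) gives 8, and
sizeof(int8_t) gives 16. These are upper bounds on the parallelism,
not measured speedups.
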
Nearly all of the work I've done in labs (mostly low-energy physics)
was strictly double-precision floating point. It is sometimes hard
to remember that people still do interesting stuff with integers. 8-)
Also, as someone that usually works on large software systems where
long-term maintenance and support are key concerns, I'm usually a bit
pessimistic about departing from the norm. That's why I like the
Accelerate libs-- they give me all the performance a specific
platform can give without any of the work or trouble. I don't need
to special case G3s or anything like that. I also like having the
higher-level APIs available, which means less work for me. I'm also
often working with improving large systems that don't have easy or
obvious bottlenecks. Applications often require reworking whole
computational sections, not just a loop here or a step there. These
are not the kind of things you want to dive into lightly and start to
tear apart. Small tight loops or single steps are prime candidates
for this kind of key-hole optimization. Unfortunately (for me,
anyways), by the time I see most code, these kinds of easy steps
have already been done.
I've also worked in enough research institutions to know that isn't
how all the world works. If you do want to take a crack at the
vector stuff, the best advice I can offer is to read everything on
Apple's site about the AltiVec unit. As others have said, memory
alignment is key to getting any kind of performance out of the
system. Also, before you start anything RUN SHARK!!! I can't stress
this enough. No matter how sure you are of the bottleneck, profile
the code so that you know EXACTLY where the problem is. Don't waste
your time fixing what isn't broken. Shark is an **amazing** tool,
and it comes free with the developer tools. Take advantage of it.
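
On the alignment point, here is a minimal sketch in C of one way to
get and check 16-byte-aligned buffers, which is what the 128-bit
vector loads and stores expect. The helper names are hypothetical,
and posix_memalign() is just one portable option, not necessarily
what Apple's documentation recommends.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#include <stdint.h>
#include <stdlib.h>

/* Vector loads and stores work on 16-byte (128-bit) boundaries. */
#define VECTOR_ALIGN 16

/* Allocate room for `count` floats on a 16-byte boundary.
 * Returns NULL on failure; release the buffer with free(). */
float *alloc_vector_floats(size_t count)
{
    void *p = NULL;
    if (posix_memalign(&p, VECTOR_ALIGN, count * sizeof(float)) != 0)
        return NULL;
    return p;
}

/* Nonzero if the pointer sits on a 16-byte boundary. */
int is_vector_aligned(const void *p)
{
    return ((uintptr_t)p % VECTOR_ALIGN) == 0;
}
```

A check like is_vector_aligned() is cheap to drop into a debug build,
for example on a pointer into the middle of an array, before handing
the data to vector code.
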
Finally, I want to apologize to anyone that got the impression that I
was putting down their 'l33t h4ck3r ski11z. While it may be safe to
assume that most of you get the most enjoyment out of your primary
profession (and computer programming isn't it), I never meant to imply
that none of you can develop software. The vector stuff *is* a bit
tricky, and you need to keep all your Is dotted and Ts crossed, but
someone that has a good grasp of how computers work beyond FORTRAN or
C itself *can* do this. It may be wizardry, but it isn't impossible.
That said, there are a lot of people in academia (well, and industry,
for that matter) that have no concept of how much skilled development
time (by anyone!) costs. No, price/performance isn't everything, but it
is a starting point. If you want to cross that line and go for a
finely-tuned, well-oiled program that just screams, go for it! Just
be aware of the decisions you're making and why you're making them.
Then it just becomes an engineering compromise.
Best of luck to everyone,
-j
--
Jay A. Kreibich | Comm. Technologies, R&D
email@hidden | Campus IT & Edu. Svcs.
<http://www.uiuc.edu/~jak> | University of Illinois at U/C