> Only thing I would say is that surely it's better to do all the work
> on one pass through the array, rather doing multiple passes, each one
> only doing a small amount of work (?)
Yes. A common technique is to split your data up into small segments,
frequently called tiles. Do all the work on segment, then start over
and do all the work to the next segment. Borrowing Richard's example,
lets say the array is much larger than the caches. You would split that
up into a bunch of small chunks, perhaps 16kB in size or smaller, so
that each chunk fits in the L1 cache. Do everything to one chunk before
you go on to the next one.
while there are chunks left
For each element in the chunk do
x[n] = r*x[n] - floor( r*x[n] );
x[n] = LERP( x[n] )
If you don't tile, then once you have finished finding the remainder of
x[n]/(2*pi), you will have flushed the other x[n]'s out of the cache --
they don't all fit in there. So when you go to apply the second line to
all of the data in the very large array, you'll have to load it into
the caches again, which is very expensive. Better to do everything to a
little chunk while it is in the cache than to
load/unload/reload/unload/etc. You can get a big speedup that way.
I'm not a very big fan of lookup tables though. I have found I can
evaluate a 9th order polynomial in hand tuned vector code in less time
than it takes to look up results from a 256 entry float table using
hand tuned scalar code. There are times when lookup tables are the best
way or only way to do things, but surprisingly often they are not the
fastest way. They also are a very good way to obfuscate your code. Very
often, they are completely undocumented how they are derived. It can be
a nightmare to get rid of them.
Ian
_______________________________________________
scitech mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/scitech
Do not post admin requests to the list. They will be ignored.