Wow, did I ever get hammered on that little "optimization".

For those watching at home, the optimization Chris is talking about is
probably this one:
tmp = alpha * red;
remainder256 = tmp & 0xFF; // get alpha * red (mod 256)
dividend = tmp >> 8; // compute alpha * red / 256
remainder255 = dividend + remainder256; // congruent to alpha * red (mod 255), in 0..509
dividend += ((remainder255 >= (255 + 128)) & 1) +
            ((remainder255 >= 128) & 1); // add 0, 1, or 2 depending on how
                                         // large the remainder (mod 255) is
This is provably correct and even rounds correctly. And yes, it
would be much, much easier in AltiVec.
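
One way to see why this rounds correctly: since 256 = 255 + 1, tmp = 256*q + r can be rewritten as 255*q + (q + r), so tmp / 255 = q + (q + r) / 255. For 8-bit inputs q + r is at most 509, so rounding adds 0, 1, or 2, with the cut-offs at 127.5 and 382.5 (hence 128 and 255 + 128). The following is a minimal sketch that wraps the snippet in a function and checks it exhaustively against a plain integer rounding formula. The function names div255_round and div255_mismatches are made up for illustration, and 8-bit alpha and red are assumed.

```c
#include <stdint.h>

/* Hypothetical wrapper around the snippet from the thread.
 * Assumes alpha and red are both in 0..255. */
uint32_t div255_round(uint32_t alpha, uint32_t red)
{
    uint32_t tmp = alpha * red;                        /* 0..65025 */
    uint32_t remainder256 = tmp & 0xFF;                /* tmp mod 256 */
    uint32_t dividend = tmp >> 8;                      /* tmp / 256 */
    uint32_t remainder255 = dividend + remainder256;   /* 0..509, congruent to tmp mod 255 */

    /* A C comparison already yields 0 or 1, so no branch is needed. */
    dividend += (uint32_t)(remainder255 >= 255 + 128)
              + (uint32_t)(remainder255 >= 128);
    return dividend;
}

/* Counts inputs where the trick disagrees with round(alpha * red / 255),
 * computed here as floor((2*x + 255) / 510). */
unsigned div255_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t a = 0; a <= 255; ++a) {
        for (uint32_t r = 0; r <= 255; ++r) {
            uint32_t expected = (2 * a * r + 255) / 510;
            if (div255_round(a, r) != expected)
                ++bad;
        }
    }
    return bad;
}
```

Calling div255_mismatches() from a test harness and checking that it returns 0 covers all 65536 input pairs.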
No, there are faster ways of doing it, without branches.

Er, that code doesn't have any branches in it that I can see...

Sorry, I misread part of it (too many things happening at once).

And if you know such optimizations, why not contribute to the thread
and enlighten us by posting them rather than just sounding superior
about your knowledge?

Because being employed by a major software company means that you
can't give anything out without forms signed by vice presidents.
But you can find one version of it in Jim Blinn's writings.
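
For readers who want something concrete: one widely circulated form that is usually attributed to Jim Blinn adds 128 to the product and then folds the high byte back in. The thread does not say whether this is the version the poster has in mind, so treat the sketch below as an assumption. The function names mul_div255_blinn and blinn_mismatches are hypothetical, and 8-bit inputs are assumed.

```c
#include <stdint.h>

/* Sketch of the form commonly attributed to Blinn (an assumption here,
 * not something the thread confirms). Assumes alpha and red in 0..255. */
uint32_t mul_div255_blinn(uint32_t alpha, uint32_t red)
{
    uint32_t t = alpha * red + 128;   /* bias so the final shift rounds to nearest */
    return (t + (t >> 8)) >> 8;       /* approximates t / 255 with two shifts and an add */
}

/* Counts inputs where the result differs from round(alpha * red / 255),
 * computed here as floor((2*x + 255) / 510). */
unsigned blinn_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t a = 0; a <= 255; ++a) {
        for (uint32_t r = 0; r <= 255; ++r) {
            uint32_t expected = (2 * a * r + 255) / 510;
            if (mul_div255_blinn(a, r) != expected)
                ++bad;
        }
    }
    return bad;
}
```

This version needs no comparisons at all, which may be the sense of "without branches" meant above.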