On the broader front, many of the discussions I see on this list (and the Cocoa list) about various coding tricks to achieve performance are simply wasted effort. Virtually all good compilers perform optimizations far beyond your imagination, and many of these “tricks” are either already handled or, even worse, confuse the optimizer so badly that performance declines.
Interesting, but how do you reconcile that with dedicated hand-coded SSE functions?
I now write three versions of each maths routine I code.
1. asm. 2. C++. 3. Intrinsics.
Although I am still finding my way around intrinsics, there are definitely times when they simply do not cover some of the shortcuts you can find by hand coding for specific cases in SSE. I am specifically referring to ways of using and re-using register contents from various positions across a 128-bit register, and spotting patterns in complex equations where non-intuitive instruction / data ordering cuts real corners in the logic.
Admittedly some of these 'tricks' are melon twisters.
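To make the register re-use point concrete, here is a minimal sketch using intrinsics rather than asm. It computes a 3-component cross product by rotating the lanes of a 128-bit register with a shuffle instead of extracting individual floats. The function names (cross_scalar, cross_shuffle, check_cross) are my own for illustration, the fourth lane is assumed to be padding, and there is a plain fallback for targets without SSE. It is only one of several known ways to arrange the shuffles, not a claim about the fastest one.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Plain reference version; lane 3 is treated as padding and set to zero.
void cross_scalar(const float a[4], const float b[4], float out[4]) {
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
    out[3] = 0.0f;
}

// Shuffle-based version: two multiplies, one subtract, three shuffles.
void cross_shuffle(const float a[4], const float b[4], float out[4]) {
#if HAVE_SSE
    const __m128 va = _mm_loadu_ps(a);  // (ax, ay, az, aw)
    const __m128 vb = _mm_loadu_ps(b);  // (bx, by, bz, bw)
    // Rotate lanes so each register holds (y, z, x, w).
    const __m128 a_yzx = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 b_yzx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));
    // c = a * b_yzx - a_yzx * b holds the result rotated as (cz, cx, cy, 0).
    const __m128 c = _mm_sub_ps(_mm_mul_ps(va, b_yzx), _mm_mul_ps(a_yzx, vb));
    // The same rotation applied once more restores (cx, cy, cz, 0).
    _mm_storeu_ps(out, _mm_shuffle_ps(c, c, _MM_SHUFFLE(3, 0, 2, 1)));
#else
    cross_scalar(a, b, out);  // no SSE on this target
#endif
}

// Sanity check: both versions should agree on a simple input.
void check_cross() {
    const float a[4] = {1, 2, 3, 0};
    const float b[4] = {4, 5, 6, 0};
    float s[4], v[4];
    cross_scalar(a, b, s);
    cross_shuffle(a, b, v);
    for (int i = 0; i < 3; ++i) assert(s[i] == v[i]);
}
```

The trick is that the same rotation is used three times, so the intermediate result never has to leave the register. Hand-written asm can sometimes go further than this, for example by keeping the rotated operands live across several calls.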
However, when I benchmark the three, the speed ranking is generally asm first, then intrinsics/C, then C.
Often the asm is an order of magnitude faster when tested inline.
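For anyone who wants to repeat that kind of comparison, a rough sketch of a timing harness is below. It is not the harness I use, just an illustration: dot_plain and best_ns_per_call are hypothetical names, and the routine under test is a stand-in. Each version (asm, intrinsics, plain C++) would be passed in as a callable and timed the same way.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Stand-in routine under test: a plain C++ dot product.
float dot_plain(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Calls fn() `iters` times per run, repeats for `runs` runs, and returns the
// best (lowest) average time per call in nanoseconds.
template <typename Fn>
double best_ns_per_call(Fn fn, int iters, int runs) {
    assert(iters > 0 && runs > 0);
    using clock = std::chrono::steady_clock;
    volatile float sink = 0.0f;  // storing the result discourages dead-code removal
    double best = 1e300;
    for (int r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        for (int i = 0; i < iters; ++i) sink = fn();
        const auto t1 = clock::now();
        const double ns =
            std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
        if (ns < best) best = ns;
    }
    (void)sink;
    return best;
}
```

Usage would be something like best_ns_per_call([&] { return dot_plain(a, b, n); }, 100000, 5) for each version. Two caveats: with constant inputs the compiler may still hoist or fold the call, so vary the data or check the generated code, and a routine timed as an out-of-line call can rank differently from the same routine tested inline.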
Yes, there are pitfalls, and you have to manage where your data is yourself, rather than letting the compiler do it.
But if I cannot get the same speed letting the compiler worry about it, why would I opt for that?
I totally agree that good code architecture is key, but isolated, special-purpose inlined functions are simply faster when coded in asm.
With the caveat that you must know what you are doing... But dismissing it out of hand and discouraging people from that pursuit is counterproductive, I think. Compilers are wonderful, wonderful things, and thanks for your efforts on them for sure. But they are not the holy grail. Sorry.
Kind regards, Stephen.