Re: __builtin_expect in clang?
- Subject: Re: __builtin_expect in clang?
- From: Wim Lewis <email@hidden>
- Date: Fri, 11 Nov 2011 19:49:58 -0800
On Nov 11, 2011, at 5:22 PM, Don Quixote de la Mancha wrote:
>> On Nov 10, 2011, at 5:51 AM, Andreas Grosam wrote:
>> So, I'm wondering whether __builtin_expect is implemented in clang?
I think it isn't. I ran clang (and llvm-gcc) with -O3 -S and compared the resulting assembly for opposite values of expect, and I couldn't come up with input for which there were any differences. gcc-4.2.1 makes some fairly obvious changes based on that hint, though (moving unlikely code out of an inner loop, for example).
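For anyone who wants to repeat that kind of comparison, here is a minimal sketch of a test input. It is not the code I actually ran; the file name, the HINT macro, and both function names are made up for illustration. The idea is to build the same file twice with opposite hints and diff the assembly.

```c
/* expect_test.c: hypothetical input for comparing generated code.
 * Build twice with opposite hints and diff the results, for example:
 *   cc -O3 -S -DHINT=1 expect_test.c -o likely.s
 *   cc -O3 -S -DHINT=0 expect_test.c -o unlikely.s
 */
#include <stddef.h>

#ifndef HINT
#define HINT 0   /* 0: branch expected not taken, 1: expected taken */
#endif

/* Made-up slow path, kept out of line so its placement is easy to spot. */
__attribute__((noinline))
static long handle_negative(long v)
{
    return -v * 3;
}

/* Sums an array, sending negative values through the slow path. */
long sum_with_hint(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (__builtin_expect(values[i] < 0, HINT))
            total += handle_negative(values[i]);
        else
            total += values[i];
    }
    return total;
}
```

If the two .s files are identical, the compiler is either ignoring the hint or has decided it makes no difference for this input; a result like that only tells you about the compiler version and flags you tested.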
> Possibly there is a branch prediction bit in more recent models of 32-bit x86 CPUs.
The more recent Intel architectures have branch history & prediction logic, but I don't think they have a hint bit. Many/most PowerPC implementations have branch history buffers as well, with the 'hint' bit in the instruction being used for branches the CPU doesn't have history for. Modern Intel/AMD CPUs have some really complicated branch-prediction schemes. For more information than you probably want, I recommend this:
http://www.agner.org/optimize/microarchitecture.pdf
High-performance ARM implementations also have dynamic branch predictors, though I'm not very familiar with them:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/Cjhchihb.html
FWIW, the "Shark" tool is really great for this kind of low-level performance measurement.
> Even if the CPU doesn't support branch prediction in its instruction
> set, instruction cache and virtual memory utilization can be greatly
> improved by moving the machine code of all the infrequently-taken
> branches to the end of the subroutines that contain them.
Or, for that matter, to entirely separate pages: gcc has the ability to partition the basic blocks of a function into 'hot' and 'cold' sections so the linker can put them far apart in memory. (I don't know if Apple's gcc can do that, though.)
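A rough sketch of how this looks from the source side, with some assumptions: the block-level partitioning is requested in mainline GCC with -freorder-blocks-and-partition (normally combined with profile feedback), and newer GCC releases (4.3 and later, so not gcc-4.2.1) also accept function-level 'hot' and 'cold' attributes. Whether a given Apple toolchain honors either is something to check rather than assume. The function names below are invented for the example.

```c
#include <stdio.h>

/* Hypothetical error handler. The 'cold' attribute marks the function as
 * rarely executed, so the compiler may optimize it for size and place it
 * away from the frequently executed code. */
__attribute__((cold, noinline))
int report_failure(const char *what)
{
    fprintf(stderr, "error: %s\n", what);
    return -1;
}

/* Hypothetical hot function, hinted as frequently executed. */
__attribute__((hot))
int checksum(const unsigned char *buf, int len)
{
    if (buf == NULL || len < 0)
        return report_failure("bad buffer");   /* rarely taken path */

    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Looking at the section names in the generated assembly (cc -O2 -S) is the quickest way to see whether the compiler actually separated the two functions.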
In general, I think, using __builtin_expect() is going to be less effective than using profile-feedback-directed optimization, which gives the compiler much more information about branch patterns. (And that's usually less effective than taking a step back and addressing your program's overall design :) ) For a few special cases like error-checking macros it's probably worthwhile, though.
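As an illustration of the error-checking case, here is a minimal sketch. UNLIKELY, CHECK_OR_RETURN, and safe_divide are names made up for this example, not anything from an existing header; the only real compiler feature used is __builtin_expect itself, with a plain fallback for compilers that don't define __GNUC__.

```c
#include <stdio.h>

/* Wrap the builtin so the code still compiles elsewhere. The !! turns any
 * nonzero value into 1 so it matches the expected-value argument. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical checking macro: if 'cond' is false, log the failed check
 * and return 'retval' from the enclosing function. The failure branch is
 * hinted as unlikely so it can be moved out of the main code path. */
#define CHECK_OR_RETURN(cond, retval)                          \
    do {                                                       \
        if (UNLIKELY(!(cond))) {                               \
            fprintf(stderr, "%s:%d: check failed: %s\n",       \
                    __FILE__, __LINE__, #cond);                \
            return (retval);                                   \
        }                                                      \
    } while (0)

/* Example caller: returns 0 on success, -1 if a check fails. */
int safe_divide(int numerator, int denominator, int *result)
{
    CHECK_OR_RETURN(result != NULL, -1);
    CHECK_OR_RETURN(denominator != 0, -1);
    *result = numerator / denominator;
    return 0;
}
```

The hint costs nothing to write here because the macro is defined once and the failure path really is rare; for ordinary branches, measured profile data is a better guide than a guess.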
> The easiest to implement in hardware is to always execute the code
> that immediately follows the conditional branch instruction.
Arranging for (forward) branches not to be taken also improves cache locality.
_______________________________________________
Xcode-users mailing list (email@hidden)