Re: __builtin_expect in clang?
- Subject: Re: __builtin_expect in clang?
- From: Don Quixote de la Mancha <email@hidden>
- Date: Fri, 11 Nov 2011 17:22:42 -0800
> On Nov 10, 2011, at 5:51 AM, Andreas Grosam wrote:
>
> I can't see any performance differences in a very hot code spot, when using
> it in branches, or omitting it, or even when giving the compiler *false*
> hints.
> So, I'm wondering whether __builtin_expect is implemented in clang?

PowerPC has a branch prediction hint bit in its conditional branch
instructions, but ARM and Thumb do not, so I would expect that
__builtin_expect would work really well on PowerPC if you got its
setting right, but really poorly on ARM.

The ARM Cortex-A8 tries to make up for that by providing a branch
history buffer that is hidden from the user. To the extent that your
conditional branches are consistent in their choices, the history
buffer will speed up your code, but __builtin_expect would have no
effect on that.
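
To illustrate why consistent branches benefit from a history buffer,
here is a minimal sketch of a two-bit saturating counter, the textbook
dynamic prediction scheme. It is an assumption for illustration only
and is not a description of how the Cortex-A8 is actually built; the
names predictor2 and count_mispredictions are hypothetical.

```c
#include <stddef.h>

/* Textbook two-bit saturating counter (illustrative only, not the
   Cortex-A8 design). States 0 and 1 predict "not taken"; states 2
   and 3 predict "taken". */
typedef struct {
    unsigned state;
} predictor2;

static int predict(const predictor2 *p)
{
    return p->state >= 2;
}

static void update(predictor2 *p, int taken)
{
    if (taken) {
        if (p->state < 3) {
            p->state++;
        }
    } else {
        if (p->state > 0) {
            p->state--;
        }
    }
}

/* Counts how often the predictor guesses wrong over a sequence of
   branch outcomes (nonzero means the branch was taken). The counter
   starts in state 0. */
size_t count_mispredictions(const int *outcomes, size_t n)
{
    predictor2 p = { 0 };
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        int taken = outcomes[i] != 0;
        if (predict(&p) != taken) {
            misses++;
        }
        update(&p, taken);
    }
    return misses;
}
```

With this model, a branch that is taken eight times in a row costs two
mispredictions while the counter warms up and none after that, whereas
a branch that alternates keeps mispredicting. A compile-time hint plays
no part in either case.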

I'm pretty sure i386 doesn't have branch prediction hints in its
instruction set. Later variants of the 32-bit Instruction Set
Architecture have had so many additional instructions added to them
that I have never been able to keep up. Possibly there is a branch
prediction bit in more recent models of 32-bit x86 CPUs.

I don't know about x86_64 at all.

Even if the CPU doesn't support branch prediction in its instruction
set, instruction cache and virtual memory utilization can be greatly
improved by moving the machine code of all the infrequently-taken
branches to the end of the subroutines that contain them.
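
As a minimal sketch of how such a hint is written in source, the
following wraps __builtin_expect in two macros. The macro names likely
and unlikely and the function sum_ints are hypothetical, chosen for
illustration; whether the compiler actually moves the unlikely block
out of the hot path depends on the compiler and the optimization level.

```c
#include <stddef.h>

/* Hypothetical convenience macros. The names are a common convention,
   not something the compiler provides. The !! normalizes any nonzero
   value to 1 so it can be compared with the expected value. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Sums n ints. A NULL buffer is the rare error case, so it is marked
   unlikely; the function returns -1 for it. */
long sum_ints(const int *buf, size_t n)
{
    if (unlikely(buf == NULL)) {
        return -1;
    }

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}
```

The hint does not change what the function returns. It only tells the
compiler which side of the test to favor when it lays out the code.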

If the linker can somehow find out about subroutine calls made from
infrequently-used machine code, it can place the implementations of
those subroutines at the end of the executable file or shared library.
That might result in none of your error handlers ever being paged in
from disk to physical memory, as well as better use being made of the
code that is resident.
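
One way to tell the toolchain about rarely-called subroutines is the
cold function attribute, which GCC documents and Clang also accepts.
The sketch below is an illustration under that assumption; the names
COLD_NOINLINE, report_failure and parse_digit are hypothetical. GCC's
documentation says cold functions are grouped into a separate part of
the text section; what Clang and your linker do with the attribute may
differ.

```c
#include <stdio.h>

/* Hypothetical macro wrapping the GCC/Clang attributes. "cold" marks
   the function as rarely executed; "noinline" keeps its body out of
   the callers so it can be placed elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define COLD_NOINLINE __attribute__((cold, noinline))
#else
#define COLD_NOINLINE
#endif

/* Hypothetical error handler: reports the problem and returns -1. */
COLD_NOINLINE int report_failure(const char *what)
{
    fprintf(stderr, "error: %s\n", what);
    return -1;
}

/* Converts an ASCII digit to its value; anything else goes through
   the cold error handler. */
int parse_digit(char c)
{
    if (c < '0' || c > '9') {
        return report_failure("not a digit");
    }
    return c - '0';
}
```

Calling parse_digit('7') returns 7 without touching the handler, while
parse_digit('x') prints a message to stderr and returns -1.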

Some processors support speculative execution: when there is a
conditional branch, the CPU does its best to guess which path will be
taken before the branch has actually been resolved. It will then
execute that path's instructions "ahead of time", but only to the
extent that the behavior of the program would still be correct if the
speculative execution unit made the wrong choice and had to drop all
the results on the floor.

The easiest thing to implement in hardware is to always execute the
code that immediately follows the conditional branch instruction. But
that would result in all your error handlers being wastefully
executed, and all the non-erroneous paths not being speculatively
executed, so modern CPUs try to do better than that, based on whatever
heuristics they can cook up from the available machine code, memory,
registers, and, with ARM, previous executions of the conditional
branch instruction.

Getting speculative execution wrong doesn't just waste your code cache
and the virtual memory for your executable code, it also wastes your
data cache and data VM.

I don't have the first clue whether Clang or LLVM implements
__builtin_expect, but if they did, and their implementation was
complete, and your use of __builtin_expect was both widespread
throughout your code and configured correctly, I would expect it to
make a measurable difference even for instruction sets that don't have
branch prediction bits.

--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
email@hidden