Re: suboptimal code-gen of decrement in GCC 4.2.1
- Subject: Re: suboptimal code-gen of decrement in GCC 4.2.1
- From: Alastair Houghton <email@hidden>
- Date: Thu, 15 Oct 2009 10:11:01 +0100
On 14 Oct 2009, at 22:16, Jens Alfke wrote:
The Intel Architecture Optimization Manual actually suggests that
INC and DEC should be replaced with ADD or SUB instructions for
this very reason.
My guess (and it is just a guess) is that GCC is generating the
TEST instruction to work around this problem in a different way by
resetting the flags register before testing it.
If I force the compiler to use a SUB instruction instead, by
changing "--m_refCount" to "m_refCount -= 1", it still generates
similar code, including the unnecessary CMP.
How odd.
Is it possible to use inline assembly to force the optimal
instructions to be generated?
I'm not sure you'd want to do that in practice, because IIRC there's
no way to tell the compiler that you mean for it to use the flags
register, so you'd have to write the body of the method in assembly
language. And then when you move to 64-bit, you'd have to rewrite
it. And again if the compiler's C++ name mangling were to change...
You could mitigate the latter problem a little by writing a C wrapper
function e.g.
    extern "C" void deleteRefCounted(RefCounted *pobj) {
        delete pobj;
    }
It might also be tricky to convince the compiler to use the same
register to point at the reference count *and* at your object (which
is what it's doing in the code *it* generated), though if you know the
layout is like that you can obviously just ask for a pointer to one of
the two and use it for both.
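If you do rely on that layout, it is probably worth having the compiler check it for you. A minimal sketch, assuming a C++11 compiler (for static_assert) and a simplified stand-in for RefCounted with no virtual functions; with a virtual destructor the vtable pointer would normally come first and the assertion would fail. refCountAddress is a made-up helper name, purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the class under discussion (an assumption:
// the real RefCounted may have virtual functions and other members).
struct RefCounted {
    std::int32_t m_refCount;   // must remain the first member
    RefCounted() : m_refCount(1) {}
};

// Fail the build if the count ever stops living at offset 0, since the
// assembly addresses the count through the object pointer itself.
static_assert(offsetof(RefCounted, m_refCount) == 0,
              "m_refCount must be at offset 0 of RefCounted");

// Hypothetical helper: one pointer value serves as both the object
// pointer and the address of the reference count.
inline std::int32_t *refCountAddress(RefCounted *obj) {
    assert(obj != 0);
    return reinterpret_cast<std::int32_t *>(obj);
}
```

If somebody later adds a member in front of the count, the build breaks instead of the assembly quietly decrementing the wrong word.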
You'd end up with something like (written in Mail.app, with not too
much thought and exactly zero testing...)
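    void deref() {
    #if defined(__x86_64__)
      // Assumes m_refCount is at offset 0 of the object.
      // "D" puts `this' in %rdi, which is also where the callee expects
      // its argument.  % must be doubled when the asm has operands.
      __asm__ volatile (
        "  subl $1,(%%rdi)\n"
        "  jne 0f\n"
        "  call _deleteRefCounted\n"
        "0: \n"
        : : "D" (this)
        : "memory", "cc",
          "rax", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
    #elif defined(__i386__)
      // The call may trash %eax, %ecx and %edx, so list them as
      // clobbers; that also keeps %0 out of those registers.
      __asm__ volatile (
        "  subl $1,(%0)\n"
        "  jne 0f\n"
        "  pushl %0\n"
        "  call _deleteRefCounted\n"
        "  popl %0\n"
        "0: \n"
        : : "r" (this)
        : "memory", "cc", "eax", "ecx", "edx");
    #else
      if (--m_refCount == 0) delete this;
    #endif
    }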
I'm not sure what the code would look like after that though, and the
push and pop in the 32-bit x86 variant are a bit of a nuisance too.
Of course, you could write the entire method in assembly language, but
then you're dependent on name mangling again.
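For comparison, GCC releases from 6 onwards can be told about the flags directly: a flag output operand such as "=@ccz" hands the zero flag back to C++ as a bool, so only the decrement itself has to be assembly and the compiler is free to branch on the result. That is no help with GCC 4.2.1. The following is a sketch only, assuming such a compiler (it defines __GCC_ASM_FLAG_OUTPUTS__ when the feature is available); decrementHitsZero is a made-up name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decrements *count and returns true if it hit zero.
inline bool decrementHitsZero(std::int32_t *count) {
    assert(count != 0);
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && \
    (defined(__x86_64__) || defined(__i386__))
    bool zero;
    // "+m" makes the count a read-write memory operand; "=@ccz" asks the
    // compiler to take `zero' from the Z flag left behind by SUB.
    __asm__ volatile ("subl $1, %0"
                      : "+m" (*count), "=@ccz" (zero)
                      :
                      : "memory");
    return zero;
#else
    // Older compilers and other architectures: plain C++.
    return --*count == 0;
#endif
}
```

With that in place deref() stays ordinary C++, i.e. "if (decrementHitsZero(&m_refCount)) delete this;", so there is no hand-written call, no name-mangling worry and nothing to rewrite for 64-bit. Whether the generated code actually ends up better than what the compiler produces unaided is something you would have to check in the disassembly.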
Kind regards,
Alastair.
--
http://alastairs-place.net
Xcode-users mailing list (email@hidden)