Re: suboptimal code-gen of decrement in GCC 4.2.1



  • Subject: Re: suboptimal code-gen of decrement in GCC 4.2.1
  • From: Alastair Houghton <email@hidden>
  • Date: Thu, 15 Oct 2009 10:11:01 +0100

On 14 Oct 2009, at 22:16, Jens Alfke wrote:

>> The Intel Architecture Optimization Manual actually suggests that INC and DEC should be replaced with ADD or SUB instructions for this very reason.
>> My guess (and it is just a guess) is that GCC is generating the TEST instruction to work around this problem in a different way by resetting the flags register before testing it.
>
> If I force the compiler to use a SUB instruction instead, by changing "--mRefCount" to "m_refCount -= 9", it still generates similar code, including the unnecessary CMP.

How odd.

> Is it possible to use inline assembly to force the optimal instructions to be generated?

I'm not sure you'd want to do that in practice, because IIRC there's no way to tell the compiler that it should use the flags set by an asm statement, so you'd have to write the body of the method in assembly language. Then when you move to 64-bit, you'd have to rewrite it, and again if the compiler's C++ name mangling were to change. You could mitigate the latter problem a little by writing a C wrapper function, e.g.:


extern "C" void deleteRefCounted(RefCounted *pobj) {
  delete pobj;
}
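To make the wrapper concrete, here is a minimal, self-contained sketch. The `RefCounted` class, its `ref()` method and the `destroyed` counter are hypothetical (they are not from the thread); the only point is that `deref()` deletes the object through the `extern "C"` function, whose symbol name does not depend on C++ name mangling.

```cpp
#include <cassert>

// Hypothetical minimal reference-counted class, for illustration only.
// m_refCount is the first data member, so it sits at offset 0.
class RefCounted {
public:
    static int destroyed;     // number of objects deleted so far (demo only)
    int m_refCount = 1;       // the creator holds the first reference

    void ref() { ++m_refCount; }
    void deref();
    ~RefCounted() { ++destroyed; }
};

int RefCounted::destroyed = 0;

// C linkage gives the wrapper an unmangled symbol name
// ("_deleteRefCounted" on Mach-O), so assembly can call it directly.
extern "C" void deleteRefCounted(RefCounted *pobj) {
    delete pobj;
}

// Portable C++ version of the decrement: delete through the wrapper
// when the last reference goes away.
void RefCounted::deref() {
    assert(m_refCount > 0);
    if (--m_refCount == 0) deleteRefCounted(this);
}
```

Calling `ref()` once and then `deref()` twice on a freshly allocated object deletes it exactly once; an assembly version of `deref()` would `call _deleteRefCounted` instead of calling the wrapper from C++.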

It might also be tricky to convince the compiler to use the same register to point at the reference count *and* at your object (which is what it's doing in the code *it* generated), though if you know the layout is like that you can obviously just ask for a pointer to one of the two and use it for both.
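A minimal sketch of that layout assumption, using hypothetical `PlainCounted` and `VirtualCounted` structs: when the count is the first member of a class with no virtual functions, the object pointer can be reused as a pointer to the count. Adding a virtual destructor changes that on common ABIs (including the Itanium C++ ABI used by GCC on x86 and x86-64), because the vtable pointer is placed at offset 0 and the count moves further in.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes, for illustration only.
struct PlainCounted {            // no virtual functions: standard layout
    int m_refCount;
    int payload;
};

struct VirtualCounted {          // virtual destructor adds a vtable pointer
    virtual ~VirtualCounted() {}
    int m_refCount = 1;
};

// Checked at compile time: the count is the first member.
static_assert(offsetof(PlainCounted, m_refCount) == 0,
              "m_refCount must be the first member");

// Because the count is first, the object pointer doubles as a pointer to
// the count, which is what hand-written assembly would rely on.
inline int *countFromObject(PlainCounted *obj) {
    int *count = reinterpret_cast<int *>(obj);
    assert(count == &obj->m_refCount);
    return count;
}

// Byte offset of m_refCount measured on a live object; unlike offsetof,
// this also works for classes that are not standard layout.
template <typename T>
std::ptrdiff_t refCountOffset(T &obj) {
    return reinterpret_cast<char *>(&obj.m_refCount) -
           reinterpret_cast<char *>(&obj);
}
```

If `refCountOffset()` returns anything other than 0 for your class, an assembly `deref()` that decrements `(%rdi)` directly would be modifying part of the vtable pointer rather than the count.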

You'd end up with something like this (written in Mail.app, with not too much thought and exactly zero testing...):

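  void deref() {
#if defined(__x86_64__)
  // Assumes m_refCount is a 32-bit int at offset 0 of the object, and that
  // _deleteRefCounted is the Mach-O name of the extern "C" wrapper.
  // The call clobbers the caller-saved registers, so they're listed; p is
  // an in/out operand ("+D" = %rdi) because the callee may change it.
  // Not handled: the red zone, 16-byte stack alignment at the call, and
  // the SSE registers.
  void *p = this;
  __asm__ volatile (
  "   subl $1,(%0)\n"
  "   jne  0f\n"
  "   call _deleteRefCounted\n"
  "0: \n"
  : "+D" (p)
  :
  : "rax", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "cc", "memory");
#elif defined(__i386__)
  // Same layout assumption; cdecl passes the argument on the stack and
  // lets the callee clobber %eax, %ecx, %edx and the argument slot (hence
  // "+r"). Stack alignment at the call is again not handled.
  void *p = this;
  __asm__ volatile (
  "   subl  $1,(%0)\n"
  "   jne   0f\n"
  "   pushl %0\n"
  "   call _deleteRefCounted\n"
  "   popl  %0\n"
  "0: \n"
  : "+r" (p)
  :
  : "eax", "ecx", "edx", "cc", "memory");
#else
  if (--m_refCount == 0) delete this;
#endif
  }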

I'm not sure what the surrounding generated code would look like after that, though, and the push and pop in the 32-bit (i386) variant are a bit of a nuisance too.

Of course, you could write the entire method in assembly language, but then you're dependent on name mangling again.

Kind regards,

Alastair.

--
http://alastairs-place.net



_______________________________________________
Xcode-users mailing list      (email@hidden)


References: 
 >suboptimal code-gen of decrement in GCC 4.2.1 (From: Jens Alfke <email@hidden>)
 >Re: suboptimal code-gen of decrement in GCC 4.2.1 (From: Alastair Houghton <email@hidden>)
 >Re: suboptimal code-gen of decrement in GCC 4.2.1 (From: Jens Alfke <email@hidden>)
