To Answer My Own Question (was Re: Odd GCC Behavior)

1 Jul 2008


ret = gimplify_expr (from_p, pre_p, post_p,
		       rhs_predicate_for (*to_p), fb_rvalue);
  if (ret == GS_ERROR)
    return ret;
-J. Aaron Pendergrass
On Jun 30, 2008 at 4:05 PM, Andy Wiese wrote:

Thank you Brian. That's the most useful thing I've learned from this
thread!

I'm not sure I can beat using -Wall, but after the lukewarm list
response I figured I'd just dive into the GCC source and find out for
myself exactly what was going on.

The short answer: this behavior is specific to Apple's GCC 4.0.1
build 5465 (well, > 5370; I don't know exactly when the change was made
because Apple doesn't allow access to their GCC source control
repository, just the current version).  It is not specific to the PPC
branch as I originally thought (the Intel compiler I had was just an
older build).

And the cause of it may actually impact the performance of completely
defined (by the C specification) code.

The long answer follows:

Here's the code responsible for the behavior, copied from
gimplify_modify_expr() in gcc/gimplify.c
(see: http://www.opensource.apple.com/darwinsource/10.5/gcc-5465/gcc/gimplify.c):

/* APPLE LOCAL begin 4228828 */
  /* For stores to a pointer, keep computation of the address close to the
     pointer.  Later optimizations should do this; the 'sink' pass in 4.2
     does it, but that's not in 4.0.  Temporary. */
  if (TREE_CODE (*to_p) != INDIRECT_REF)
    {
      ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
      if (ret == GS_ERROR)
        return ret;
    }
/* APPLE LOCAL end 4228828 */

  ret = gimplify_expr (from_p, pre_p, post_p,
                       rhs_predicate_for (*to_p), fb_rvalue);
  if (ret == GS_ERROR)
    return ret;

  /* Now see if the above changed *from_p to something we handle specially.  */
  ret = gimplify_modify_expr_rhs (expr_p, from_p, to_p, pre_p, post_p,
                                  want_value);
  if (ret != GS_UNHANDLED)
    return ret;

/* APPLE LOCAL begin 4228828 */
  if (TREE_CODE (*to_p) == INDIRECT_REF)
    {
      ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
      if (ret == GS_ERROR)
        return ret;
    }
/* APPLE LOCAL end 4228828 */

gimplify_modify_expr() is the routine responsible for converting the
language-dependent syntax tree for an assignment operation into the
language-independent GIMPLE language/tree.  (Side note: you can view the
GIMPLE using the -fdump-tree-gimple option to GCC; this was a very
helpful ability when trying to understand exactly why this was
happening.)
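
For anyone who wants to look at the dump themselves, a small test file
along these lines should be enough.  This is only a sketch: the file name
and both function names are made up for illustration, and the name of the
dump file GCC writes differs between versions.

```c
/* store_order.c: hypothetical test file, not taken from the GCC sources.
   Compile with:  gcc -c -fdump-tree-gimple store_order.c
   then compare the order of the temporaries in the two functions.  */

/* The LHS here is a pointer dereference, which the C front end
   represents as an INDIRECT_REF.  */
void store_through_pointer (int *p, int a, int b)
{
  *(p + a) = a * b + 1;
}

/* The LHS here is an element of a real array (an ARRAY_REF),
   so the INDIRECT_REF check does not apply to it.  */
int table[16];

void store_to_array (int a, int b)
{
  table[a & 15] = a * b + 1;
}
```

Both functions have fully defined behavior; the only thing of interest is
where the address computation lands relative to the computation of the
value in the dumped GIMPLE.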
As you can all see, Apple inserted a special check to see if the
left-hand side (called to_p) is an INDIRECT_REF (i.e., a pointer
dereference).  If it is, then the right-hand side is gimplified before
the left; otherwise (as is always the case in stock gcc) the left-hand
side is gimplified first.
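
The two orderings can be written out by hand as a rough model.  This is a
sketch, not GIMPLE: the helpers lhs_address() and rhs_value() and the log
buffer are hypothetical and exist only to record which side runs first.
In plain C the order in which the two sides of `*lhs_address () =
rhs_value ()` are evaluated is unspecified, which is why each ordering is
spelled out with explicit temporaries here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers that record the order in which they run.  */
char order_log[8];
size_t order_len;
int cell;

void reset_log (void)
{
  order_len = 0;
  order_log[0] = '\0';
}

void note (char tag)
{
  order_log[order_len++] = tag;
  order_log[order_len] = '\0';
}

int *lhs_address (void) { note ('L'); return &cell; }
int rhs_value (void) { note ('R'); return 42; }

/* Stock ordering: the l-value is computed first, then the r-value.  */
const char *lower_lhs_first (void)
{
  int *addr;
  int value;

  reset_log ();
  addr = lhs_address ();
  value = rhs_value ();
  *addr = value;
  return order_log;
}

/* Ordering applied when the LHS is an INDIRECT_REF: r-value first,
   so the address is computed right next to the store.  */
const char *lower_rhs_first (void)
{
  int value;
  int *addr;

  reset_log ();
  value = rhs_value ();
  addr = lhs_address ();
  *addr = value;
  return order_log;
}
```

Both functions store the same value; they differ only in the order of the
side effects recorded in order_log, which is the kind of difference that
becomes visible when the RHS modifies something the LHS depends on.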
In general, gimplification tends to proceed from left to right, so why
did Apple change it here?

The intent here, AFAICT, is to make register assignment a bit simpler
(especially on a particular register-starved architecture) by ensuring
that the storage location of the LHS does not need to be stored to
memory in order to perform the computation of the RHS.

The "sink" pass referred to in the comment is applied after the SSA
transformation (which is after gimplification) and, like the other
optimization passes, won't (or at least isn't supposed to) change the
observable effects of its input.  So mainline gcc, which does not
include this trickery, computes the l-value first because the GIMPLE
says to, and none of the other optimizations will violate that ordering
(due to the embedded assignment's effect).

A noteworthy bit about all this is that the INDIRECT_REF tree code does
not include array subscripts (unless, I believe, the array being
subscripted is a formal parameter of the current subroutine, in which
case all operations on it are converted to pointer ops by the C
front end).
(Here comes the piece that may have a performance impact.)

The effect is that the l-value of the LHS of an assignment to an array
element is likely to be computed before the r-value to be assigned.
This in turn may force that l-value to be written to memory during the
computation of the RHS and then read back in when the assignment is
ready to proceed.
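
As a rough picture of the concern, here is the array-element case written
out by hand in both orders.  This is a sketch under assumptions: the
array, busy_rhs(), and both store functions are hypothetical, and whether
a spill really happens depends on the target and on what the later
optimization passes do with the code.

```c
int samples[8];

/* Stand-in for a right-hand side that ties up registers or makes a call.  */
int busy_rhs (int x)
{
  int acc = 0;
  int i;

  for (i = 1; i <= x; i++)
    acc += i * i;
  return acc;
}

/* Address first: 'slot' is live across the call to busy_rhs(), so it must
   stay in a register that survives the call or be saved and reloaded.  */
void store_address_first (int i, int x)
{
  int *slot = &samples[i];
  int value = busy_rhs (x);

  *slot = value;
}

/* Value first: the address is formed only when the store is ready.  */
void store_value_first (int i, int x)
{
  int value = busy_rhs (x);
  int *slot = &samples[i];

  *slot = value;
}
```

The two functions produce identical results; the difference is only in
how long the address has to be kept around while the value is computed.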
I have not finished experimenting with the optimizer to try to force
this behavior (my test programs have largely been trivial enough that
the other optimizers have essentially wiped out the entire
computation), but if the above code segment is the only solution to
this problem, then it is definitely missing array element updates.

I'm glad I took the time to track this down; understanding why things
happen the way they do is always very rewarding.  I had been meaning to
tackle the GCC source code for some time, but had been somewhat daunted
and didn't have any good questions to ask it.

That said, this was clearly the wrong list to ask the question I did.
I was looking for more of a gcc-devel list, but had an inkling this was
an Apple-only phenomenon (and I was right), and the mainline folks tend
not to be thrilled about supporting Apple's bizarre forks.


J. Aaron Pendergrass
