Re: GCC stack size
- Subject: Re: GCC stack size
- From: Ryan McGann <email@hidden>
- Date: Fri, 20 Feb 2009 23:13:21 -0800
On Feb 20, 2009, at 10:07 PM, Eric Gouriou wrote:
On Feb 20, 2009, at 4:15 PM, Ryan McGann wrote:
A couple weeks ago I asked about a machine-check panic I was
getting. As it turns out, my suspicions were right about stack
corruption. I disassembled a function in our kext and saw that for
some reason, GCC's function prologue was allocating around 1660
bytes of stack (in release, debug was slightly better at 1140
bytes). On other platforms compiled with GCC (FreeBSD and many
different Linux distros) the stack usage is near 110 bytes, but for
some reason gcc on Mac OS X is allocating almost 4K.
We are currently using -O3 because the code is pretty compute-bound,
and on other platforms -O3 has a nice 5% boost compared to -O2. But
changing it to -O2 doesn't even help; we have to go all the way to
-O1 to get a usable stack of 400 bytes (still 4x larger than our
Linux driver).
What are your compile options, besides -O3 ?
There are a couple, but most are -I and -isysroot. Here's the
trimmed-down command (all -I options removed) for the function that
has the problem:
g++ -W -Wall -Wcast-qual -Wcast-align -Wpointer-arith -Wsign-compare \
    -Winline -Wunused -mmacosx-version-min=10.4 -Di386 -DDarwin \
    -D__STDC_FORMAT_MACROS -fcheck-new -mkernel -Os -DNDEBUG -c \
    -o BUILDTARGETS-RELEASE/Darwin-9.6.0-i386-Mac_OS_X-10.5.6/regexr.o \
    regexr.cpp
The only options of real significance are -mkernel and -fcheck-new,
neither of which AFAIK should cause a ton of stack usage. Also, we
have a userspace version of this library that uses the same options
(except -mkernel), and it has stack sizes in the same neighborhood, so
-mkernel is not the culprit here either.
Do you have the same issue when using -mkernel -Os ?
(-Os on Apple's gcc is mostly -O2 with a bit more emphasis on code
size)
The -Os option is better but still bad. All the options, including no
optimization, produce a pretty large stack. -mkernel -O2, -mkernel -O3
produce a stack over 1600 bytes. -mkernel -O1 is better at 400 bytes,
but that's still 4x larger than -O3 on FreeBSD. -Os produces a stack
that's 1036 bytes, which is smaller than -O2 but way bigger than -O1.
The code is vanilla C++ without anything fancy, not even virtual
functions. There are no warnings about temporaries being used, so I
have no clue what is causing the stack usage. It's a huge function
with a lot of switch statements and for loops, but not a lot of
function calls; it mostly just computes on arrays of data. My best
guess is that GCC is trying to optimize the intermediate operations
and temporary results by placing them on the stack.
You say "not a lot of function calls". Can you disable inlining or
throttle it down to check that it's not the cause of the bloat? If
so, -Os would help.
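One way to test the inlining theory without changing the optimization level is to keep a helper out of line, so that its locals stay in its own frame instead of being merged into the caller's. The following is only a sketch: `sum_block` and `process` are hypothetical names, not code from this thread, and it assumes GCC's `noinline` function attribute. Compiling the file with `-fno-inline` applies the same idea globally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: its 256-byte scratch buffer stays in this function's
// own frame as long as the compiler does not inline it into the caller.
__attribute__((noinline))
static int sum_block(const unsigned char *data, std::size_t len) {
    assert(data != NULL);
    unsigned char scratch[256];
    std::size_t n = len < sizeof(scratch) ? len : sizeof(scratch);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = data[i];
        total += scratch[i];
    }
    return total;
}

// Hypothetical caller standing in for the large function. With the helper
// kept out of line, the scratch buffer does not count toward this frame.
int process(const unsigned char *data, std::size_t len) {
    int total = 0;
    for (std::size_t off = 0; off < len; off += 256) {
        std::size_t chunk = (len - off < 256) ? (len - off) : 256;
        total += sum_block(data + off, chunk);
    }
    return total;
}
```

If the prologue of the large function shrinks noticeably after marking its helpers this way, inlining is at least part of the cause. If it does not, the space is probably going somewhere else.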
That's what I thought too—there is a lot of inlining, so I thought -Os
would help, but in fact -Os seems to be only marginally better than -O2.
This function is kind of the "heart" of our code, so it can be
called a lot, and sometimes recursively, so we are looking for
something in the 100-200 byte range. All the variables have been put
into a struct that is (OS)Malloc'd, and the function was designed to
be a leaf function (not many external function calls except for
malloc/free) to minimize stack usage. The best I can get is 400 bytes,
and that's with -O1. For something that is basically just a lot of
array/pointer manipulation, I don't understand where the space is
going. We've used this library on Linksys switches without problem.
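The design described above (working state in a heap-allocated struct, few external calls) might look roughly like the sketch below. This is an illustration only: `MatchState` and `count_distinct` are made-up names, and plain `malloc`/`free` stand in for the kernel's OSMalloc, which takes different arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical working state. Everything large lives here, on the heap,
// instead of in the function's stack frame.
struct MatchState {
    int counts[256];
};

// Counts the distinct byte values in data. The frame only needs a pointer
// and a few scalars; the 1 KB table is heap-allocated.
int count_distinct(const unsigned char *data, std::size_t len) {
    assert(data != NULL || len == 0);
    MatchState *st = static_cast<MatchState *>(std::malloc(sizeof(MatchState)));
    if (st == NULL) return -1;  // allocation failed
    std::memset(st, 0, sizeof(*st));
    int distinct = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (st->counts[data[i]]++ == 0) ++distinct;
    }
    std::free(st);
    return distinct;
}
```

With this layout, one would expect the compiler to need only a handful of stack slots for spilled registers, which is why a frame of 400 bytes or more is surprising.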
My assembly language is pretty bad for x86, and this function is
several pages of C[++] code, but taking a look through the assembly I
found something strange: although the prologue moves the stack down
1000+ bytes, and the epilogue moves it up by the same, I didn't find
any instructions that used anything beyond the 24th byte in the frame.
It looks like the first couple of words on the stack are for the
parameters to the function, and every time there's recursion, those
words are retrieved from the stack to pass to the recursive call. I
didn't see anything accessing an offset from %esp except in the
prologue and epilogue, where %esp was used for the subl/addl, and I
don't see any pushl except in the prologue. So perhaps it's an
alignment issue, but the default stack alignment should be 4 or 16
bytes, not 1600 bytes.
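To put numbers on the alignment argument: rounding a frame up to a 16-byte boundary can add at most 15 bytes, so alignment alone cannot turn roughly 24 bytes of used frame into more than 1000. A small sketch of the arithmetic, where `align_up` and `padding` are illustrative helpers and not anything GCC exposes:

```cpp
#include <cassert>
#include <cstddef>

// Round size up to the next multiple of align (align must be a power of two).
std::size_t align_up(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

// Bytes of padding added when a frame of `size` bytes is rounded to `align`.
std::size_t padding(std::size_t size, std::size_t align) {
    return align_up(size, align) - size;
}
```

For the 24 bytes observed in use, `align_up(24, 16)` is 32 and `align_up(24, 4)` is 24, so padding accounts for 8 bytes at most in this case.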
Also of note: this problem occurs on PowerPC as well, though the
(abnormally large) stack sizes are different there, and it's not as
bad. But our kext still panics on that platform too.
Thanks,
Ryan
Anybody have ideas on how to show where GCC is allocating things in
the frame, and how to reduce the stack usage? It's hard to distill
this to a single issue because the function is so large, but I am
tempted to file a bug since GCC 4 optimizes things quite nicely on
other platforms.
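Short of reading the prologue in the disassembly, the per-call stack cost of a recursive function can also be estimated at run time by recording the address of a local at each recursion depth. This does not show where in the frame the space goes, only how much each call consumes. The sketch below uses a hypothetical function `recurse` with a 512-byte local buffer and is meant for a userspace test build, not for the kext itself.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Address of a local in the shallowest and the deepest call of a run.
static uintptr_t g_top = 0;
static uintptr_t g_bottom = 0;

// Hypothetical recursive function with a 512-byte local buffer. The volatile
// accesses after the recursive call keep it from becoming a tail call.
__attribute__((noinline))
int recurse(int depth) {
    volatile unsigned char buf[512];
    buf[0] = static_cast<unsigned char>(depth);
    uintptr_t here = reinterpret_cast<uintptr_t>(&buf[0]);
    if (g_top == 0) g_top = here;
    g_bottom = here;
    if (depth == 0) return buf[0];
    int below = recurse(depth - 1);
    buf[1] = static_cast<unsigned char>(below);
    return buf[0] + buf[1];
}

// Average bytes of stack consumed per recursion level over `depth` levels.
std::size_t bytes_per_call(int depth) {
    assert(depth > 0);
    g_top = 0;
    g_bottom = 0;
    recurse(depth);
    uintptr_t span = g_top > g_bottom ? g_top - g_bottom : g_bottom - g_top;
    return static_cast<std::size_t>(span / static_cast<uintptr_t>(depth));
}
```

Comparing the result of `bytes_per_call` across optimization levels gives the same kind of figure as reading the `subl` in the prologue, plus the return address and saved registers.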
Thanks,
Ryan