Delivered-To: darwin-dev@lists.apple.com

Correction to my last post: I am learning i386 assembly as I go, and wasn't looking at the usage of the EBP register. The (negative) offsets from EBP can be quite large:

    00004038 movl 0xffffff5c(%ebp),%eax
    0000403e movl %eax,0xfffffc48(%ebp)
    00004044 movl 0x10(%ebp),%eax
    00004047 movl 0x18(%eax),%eax
    0000404a movl %eax,0xfffffc50(%ebp)
    00004050 movl 0,0xfffffc54(%ebp)
    0000405a movl 0xffffff64(%ebp),%eax
    00004060 movl (%eax),%edx
    00004062 movl 0x04(%eax),%ecx
    00004065 movl %edx,0xfffffc58(%ebp)
    0000406b movl %ecx,0xfffffc5c(%ebp)
    00004071 movl 0xfffffc5c(%ebp),%ecx
    00004077 cmpl %ecx,0xfffffc54(%ebp)
    0000407d jbl 0x00004156

So the offsets from ESP don't go beyond the 24-byte range, but I see plenty of movl instructions addressing -939(%ebp). Ouch.

Ryan

On Feb 20, 2009, at 10:07 PM, Eric Gouriou wrote:

> On Feb 20, 2009, at 4:15 PM, Ryan McGann wrote:
>
>> A couple weeks ago I asked about a machine-check panic I was getting. As it turns out, my suspicions were right about stack corruption. I disassembled a function in our kext and saw that for some reason, GCC's function prologue was allocating around 1660 bytes of stack (in release; debug was slightly better at 1140 bytes).
>>
>> On other platforms compiled with GCC (FreeBSD and many different Linux distros) the stack usage is near 110 bytes, but for some reason gcc on Mac OS X is allocating almost 4K. We are currently using -O3 because the code is pretty compute-bound, and on other platforms -O3 gives a nice 5% boost compared to -O2. But changing it to -O2 doesn't even help; we have to go all the way to -O1 to get a usable stack of 400 bytes (still 4x larger than our Linux driver).
>
> Do you have the same issue when using -mkernel -Os ? (-Os on Apple's gcc is mostly -O2 with a bit more emphasis on code size)
>
>> The code is vanilla C++ without anything fancy, no virtual functions even. There are no warnings about temporaries being used, so I have no clue what is causing the stack usage. It's a huge function with a lot of switch statements and for loops, but not a lot of function calls; it mostly just computes on arrays of data. My best guess is that GCC is trying to optimize the intermediate operations and temporary results by placing them on the stack.
>
> You say "not a lot of function calls". Can you disable inlining or throttle it down to check that it's not the cause of the bloat ? If so, -Os would help.
>
>> Anybody have ideas on how to show where GCC is allocating things in the frame, and how to reduce the stack usage? It's hard to distill this to a single issue because the function is so large, but I am tempted to file a bug since GCC 4 optimizes things quite nicely on other platforms.
>>
>> Thanks,
>> Ryan
>
> What are your compile options, besides -O3 ?
>
> Eric

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-dev mailing list (Darwin-dev@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/darwin-dev/eric.gouriou%40pobox.com

This email sent to eric.gouriou@pobox.com
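
A note on reading the disassembly in the correction: the large hexadecimal displacements such as 0xfffffc48(%ebp) are 32-bit two's-complement encodings of negative offsets from EBP, i.e. locals below the frame pointer. The following is only a minimal sketch of that arithmetic, assuming 32-bit displacements as printed in the listing; the helper name signed_disp is hypothetical and not part of any disassembler or toolchain.

```cpp
#include <cstdint>

// Hypothetical helper (an illustration, not part of any tool): turn the
// unsigned 32-bit displacement shown by the disassembler into the signed
// byte offset that is actually added to %ebp.
constexpr std::int64_t signed_disp(std::uint32_t raw) {
    // Values with the top bit set encode negative offsets: subtract 2^32.
    return raw < 0x80000000u
               ? static_cast<std::int64_t>(raw)
               : static_cast<std::int64_t>(raw) - 0x100000000LL;
}

// Displacements taken from the listing in the correction.
static_assert(signed_disp(0xffffff5cu) == -164, "local 164 bytes below %ebp");
static_assert(signed_disp(0xfffffc48u) == -952, "local 952 bytes below %ebp");
static_assert(signed_disp(0xfffffc5cu) == -932, "local 932 bytes below %ebp");
// Small positive displacements are incoming arguments above %ebp.
static_assert(signed_disp(0x00000010u) == 16, "argument 16 bytes above %ebp");
```

Read this way, the deepest slot in that excerpt sits 952 bytes below EBP, so the frame is at least that large, whatever the ESP-relative offsets suggest.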