On Feb 2, 2007, at 9:21 PM, Eric Albert wrote:
On Feb 2, 2007, at 5:46 PM, Dave Hayden wrote:
This is definitely one of the weirdest problems I've run into so far. We discovered that network performance in the latest release of our Usenet app Unison was severely degraded on Intel machines--instead of getting the usual ~400K/s transfer rate, it was pulling 1-10% of that. Shark said it's spending all its time in select(). I didn't see the same problem in debug builds either, so, figuring it might be an optimizer bug, I made a small tweak that actually "fixed" the bug. But I couldn't spot the problem in the generated code, so I poked around more.
All I can make out is that something about the segment layout seems to be the cause: if I add enough nop instructions to shift the start of the __text section of the __TEXT segment, it goes away. I've moved nops around so that the select() call is at the same address but the segment size is bigger, and it also fixes it. If I add a whole page of 4096 nops, it's slow again. I don't know enough about the system guts to imagine how segment layout could have any effect on the select() calls.
[...]
Apologies for saying this, but that fix doesn't make any sense. The OS doesn't really care about the size of your code (which is what the __TEXT,__text section is used for).
For that matter, select() is a blocking call. I wouldn't expect an app to spend any measurable amount of time using the CPU there. Instead, it'll sit there waiting for data to appear on the fds or for the timeout to be hit. Are you using a very short timeout or anything like that?
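The point that select() uses no measurable CPU while it waits can be checked with a small experiment. This is a sketch assuming a POSIX system, and `cpu_seconds_in_select()` is a hypothetical name. It waits on an empty pipe until the timeout expires and measures process CPU time with clock(); a thread blocked in select() should accumulate close to none, even though a sampler that records all thread states will attribute the whole wait to select().

```c
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical experiment: wait in select() on the read end of an empty
 * pipe for timeout_usec microseconds and report how much CPU time the
 * process consumed during the wait. Returns -1.0 if the pipe can't be
 * created; otherwise stores select()'s return value (0 means the timeout
 * expired with nothing readable) in *select_result. */
double cpu_seconds_in_select(long timeout_usec, int *select_result)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1.0;

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_usec / 1000000;
    tv.tv_usec = timeout_usec % 1000000;

    clock_t start = clock();   /* CPU time used by the process, not wall time */
    *select_result = select(fds[0] + 1, &readfds, NULL, NULL, &tv);
    clock_t end = clock();

    close(fds[0]);
    close(fds[1]);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

With a 200 ms timeout, select() returns 0 and the reported CPU time stays near zero, because the kernel parks the thread instead of letting it spin.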
The Shark sample was doing the Sampler-like trace, whatever that's called. "All Thread States"? Anyway, yeah, it says that that thread is spending 99.8% of its time blocked on select() instead of the typical ~96%. It's got a 90 second timeout, not very short. I don't recall the normal time profile showing anything very interesting.
What led you to start adding nops in the first place? That's not a common attempted fix for a performance problem. Perhaps a description of whatever led you to go down that path will help folks here figure out what you can look at to fix the problem in a more typical way.
Since it was only on Intel and optimized builds, and since Shark just showed it blocked on select(), I didn't think it was a common performance problem. I figured it was a code gen problem, and I might have just turned off optimization on that file--and it probably would have worked--but I wanted to be a good developer and report the bug.
The code around the select() looked like this:
int count = 0;
// ...
if ( ssl != NULL )
    count = SSL_read(...);
if ( count == 0 )
{
    // select, then read...
}
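The "select, then read" branch above is elided in the original message. Here is a minimal sketch of what such a branch commonly looks like, assuming a plain POSIX file descriptor (the SSL path is not modeled) and the 90-second timeout mentioned earlier. `select_then_read()` and its ETIMEDOUT convention are hypothetical, not Unison's actual code.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for the "select, then read" branch: wait up to
 * timeout_sec seconds for fd to become readable, then read() once.
 * Returns the byte count (> 0), 0 at end of file, or -1 with errno set;
 * errno is ETIMEDOUT if the timeout expired with nothing to read. */
ssize_t select_then_read(int fd, void *buf, size_t len, long timeout_sec)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;     /* FD_SET is undefined outside [0, FD_SETSIZE) */
        return -1;
    }

    for (;;) {
        fd_set readfds;
        struct timeval tv;

        /* select() may modify both the set and the timeout, so rebuild
         * them before every call. Retrying after EINTR restarts the full
         * timeout, which is a simplification. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int n = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: wait again */
            return -1;     /* real error; errno set by select() */
        }
        if (n == 0) {
            errno = ETIMEDOUT;  /* nothing arrived before the timeout */
            return -1;
        }
        return read(fd, buf, len);  /* readable: data or end of file */
    }
}
```

In the snippet's terms, the branch body would become something like `count = select_then_read(fd, buffer, sizeof buffer, 90);`, where `fd` and `buffer` are hypothetical names for the connection's descriptor and receive buffer.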
I changed the [...] and it ran at normal speed again. I looked at the generated code but didn't see anything unexpected (given what little I know about x86), so I replaced the extra code with nops: same result. I played around with that to try to sort out a cause by finding the smallest difference required to "fix" the problem, and found that when I add just enough nops to change the size of the code section (which also tends to shift the start offset), it's suddenly fast again. You don't have to convince me it doesn't make any sense, I already know. :)
When I get in on Monday I'll post a good and a bad build with a single nop difference between the two. If you'd like to look into this I'd be happy to set up a test account for you.