Message: 2
Date: Fri, 27 Jan 2006 21:54:17 +0100
Subject: Re: Track leaked Filedescriptors
Content-Type: text/plain; charset="us-ascii"
I can cause this to happen on any machine I've tested it on, plain
freshly installed XServes, Quad G5, Dual Gs, nothing installed except
the DB server we use.
We suspected HW problems; we even got a new XServe and sent the old
one for repair, but we experienced the problem on the other box too
(a different model), and then I made it happen.
The last thing in the system log usually is:
Jan 14 15:57:19 tor DirectoryService[41]: NetInfo connection failed
for server 127.0.0.1/local
then nothing, in any logs...
That's why I suspect we are running out of file descriptors, as all other
possible explanations have been ruled out,
I still don't understand why you think that the above is caused by a file descriptor leak. There is simply no evidence whatsoever to support that assumption.
or it's a driver-level bug in OS X 10.4.x (one that
doesn't exist in the 10.4.3 build for Intel (DTK), where this runs
flawlessly with the UB build).
Sadly, these sorts of problems are rarely so simple.
However, all that being said, if you had mentioned, for example, which particular "DB server" you're using, what its workload is, and how the remainder of the system is configured, it might be possible to reproduce the problem and determine what is actually going on. Since you're determined that it's a descriptor leak, of course, you haven't done that.
I've spent two months looking through every possible logfile after every single crash.
I've spent time writing code that can reproduce the crash on the non-production system.
I reported this to Apple when we experienced it on the brand-new XServe, using the 90-day Apple support care something, and installed their special software. Their conclusion was that it's a bug in the database software, outside Apple's support.
Quote:
"Thank you for all these details. I have checked with the US department and the conclusion is that you need to check with <database vendor>, this problem is outside our scope of support."
You'll get best results overall from describing the problem you're having, and including the above information so that an engineer can reproduce the actual problem on a machine in front of them, and putting it all into a bug report so that Apple can actually take a look at it.
I have put together a crashing example that makes this happen with their sample databases. The problem is that when it triggers, not even gdb will let you examine the app (so the DB vendor is out of luck debugging this on their side as well). My guess is that it has to do with Java, as I make heavy use of stored procedures written in Java (my crashing example is just two stored procedures).
Since you have a DTK system you should have an ADC account, and so getting some attention for your bug shouldn't be too hard.
I've started trying to turn this into "one question" for DTS so I can use a technical incident to solve it. The problem (I've been in touch with DTS many times) is how to "define" this bug. That's the reason I turned to this list: everybody I spoke to pointed here and told me that this is the place if you want to find out about things that are buried deep in Darwin.
What I wanted to know was how I can get an accurate per-process list of system resource usage (fds being one of them), so I can ask Apple something like "Why is Java eating all those file descriptors with this piece of code?"
If I cannot put it this simply, it won't be one question, and it won't fit into the "one question per incident" formula.
Sorry if I sound too harsh, but everybody seems to look at this as "it's impossible, it cannot happen". Just ask me: I can give you download instructions and my crash samples, and you can see for yourself how bad this issue is.
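
As a rough illustration of the kind of per-process descriptor count asked about above, here is a minimal Python sketch. It only inspects the process it runs in, by probing each possible descriptor number with fstat(); the function name open_fds and the limit_cap parameter are made up for this example and are not part of any Apple or database vendor tool. For another process (such as the DB server or its Java VM), a tool like `lsof -p <pid>` would be the usual starting point.

```python
import os
import resource


def open_fds(limit_cap=65536):
    """Return the descriptor numbers currently open in this process.

    limit_cap is a hypothetical safety bound so the probe stays cheap
    when the soft limit is very large or unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft > limit_cap:
        soft = limit_cap
    fds = []
    for fd in range(soft):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        fds.append(fd)
    return fds


if __name__ == "__main__":
    before = len(open_fds())
    r, w = os.pipe()  # deliberately open two descriptors
    after = len(open_fds())
    print("open before:", before, "open after pipe():", after)
    os.close(r)
    os.close(w)
```

Logging such a count at intervals while the reproduction case runs would show whether the number of open descriptors actually climbs toward the limit before the machine hangs.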
// Totte
As Arnie once said, "It's not a tumor!".
= Mike
==========================================================
Experience is what you get when you expected something else.