Re: Understanding cores...
- Subject: Re: Understanding cores...
- From: Michael Tuexen <email@hidden>
- Date: Thu, 11 Jan 2007 22:59:32 +0100
Hi Terry,
some additions:
- It is not OpenBSD, but FreeBSD. The code is in the FreeBSD CVS
  repository.
- The SCTP implementation supports not only the RFCs mentioned below
  but also almost all SCTP extensions currently described by Internet
  Drafts (and the intersection of the authors of the implementation
  and the authors of the RFCs/IDs is not empty). So this
  implementation supports more features than the Solaris or Linux
  ones.
- There are more users of the SCTP NKE: people use it for IPFIX, MPICH
  now supports SCTP (also on Macs using the NKE), the Reliable Server
  Pooling library rsplib uses it, and more. None of them have the
  problems Andreas has. It is not clear what is so special about his
  setup. That is why we are asking whether someone has suggestions on
  what to look at.
Best regards
Michael
On Jan 11, 2007, at 5:11 PM, Andreas Fink wrote:
Typically, you just have to design your code so that it either
fails safe, or it fails locally. If you end up corrupting memory,
or end up walking off a pointer into a nonexistent address, or an
address of something that used to be allocated, but is now freed,
you will get either memory corruption (if you happen to hit
something that's there, or if a freed area is reused for another
purpose), or you crash with a fault in kernel mode.
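[Editorial sketch] The "fail locally" idea above can be illustrated in user-space C. This is only a minimal sketch under stated assumptions: struct conn, CONN_MAGIC and the conn_* helpers are hypothetical names invented for illustration, not structures or functions from the KEXT or from xnu. The object carries a tag that is checked before use, and the free routine clears the caller's pointer, so a stale reference is rejected with an error code instead of being dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define CONN_MAGIC 0x53435450u   /* hypothetical tag value */

/* Hypothetical object; not a structure from the KEXT or from xnu. */
struct conn {
    uint32_t magic;   /* CONN_MAGIC while the object is live */
    int      state;
};

static struct conn *conn_alloc(void)
{
    struct conn *c = calloc(1, sizeof(*c));
    if (c != NULL)
        c->magic = CONN_MAGIC;
    return c;
}

/* Free through a pointer-to-pointer so the caller's reference is cleared. */
static void conn_free(struct conn **cp)
{
    if (cp == NULL || *cp == NULL)
        return;
    (*cp)->magic = 0;   /* poison the tag before releasing the memory */
    free(*cp);
    *cp = NULL;
}

/* Fail locally: reject a missing or untagged object with an error code. */
static int conn_set_state(struct conn *c, int state)
{
    if (c == NULL || c->magic != CONN_MAGIC)
        return EINVAL;
    c->state = state;
    return 0;
}

/* Small self-check: returns 0 when both calls behave as described. */
static int conn_demo(void)
{
    struct conn *c = conn_alloc();
    if (c == NULL)
        return ENOMEM;
    int live = conn_set_state(c, 1);    /* accepted: object is live */
    conn_free(&c);
    assert(c == NULL);                  /* caller's pointer was cleared */
    int stale = conn_set_state(c, 2);   /* rejected: pointer is NULL now */
    return (live == 0 && stale == EINVAL) ? 0 : -1;
}
```

The tag check is best effort only: a second alias to the freed object still points at released memory, and reading through it remains undefined behavior, which is exactly the corruption case described above.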
That's nothing new for me. Actually, I already use most of the
techniques you describe in my own code. The KEXT we are looking at,
however, was not developed by me but by a group of very experienced
kernel developers. It is well-working code written by very
experienced BSD developers (it's actually code which is part of the
OpenBSD kernel by now), and we seem to have a crash which happens
regularly on my setup but not on others. So we have some kind of
race condition or similar, which is very hard to track down. And
every time it goes down, we get 750MB of useless core dump unless
someone can give me at least a glimpse of whether there's anything
in there pointing us in the right direction. I know this is a hard
problem to track down, but any hints are welcome. In the meantime we
are continuing the hunt, improving the code along the way. We'll
find it!
We might also decide to boot with cpus=1 on an SMP system, to make
sure that any locking we forget to do against reentrancy won't
bite us.
Happens on single CPU as well. That has been tested. XServe G4
single CPU, XServe Dual G5, MacPro 4xIntel, MacPro forced to single
CPU, MacMini Dual CPU have all been seen crashing in the past.
Nowadays the PPCs crash less than the Intels, but it was the
opposite a while ago (we found a few other glitches on the way and
fixed them too).
We don't see any difference whether the calls are synchronous or
asynchronous, use poll or select, or run in a single thread or in
multiple threads concurrently. The last traceback gave us a hint
that it was somewhere in proto_delayed_inject(), trying to lock a
mutex (entry->domain->dom_mtx).
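[Editorial sketch] A lock reached through a pointer chain such as entry->domain->dom_mtx can fault at any link, so one debugging aid is to walk the chain explicitly and report the first link that is missing. The sketch below is user-space C under stated assumptions: mtx_sketch, domain_sketch, entry_sketch and check_lock_chain are hypothetical stand-ins invented here; only the field names domain and dom_mtx are taken from the traceback, and the real xnu structures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real xnu structures differ. */
struct mtx_sketch    { int unused; };
struct domain_sketch { struct mtx_sketch *dom_mtx; };
struct entry_sketch  { struct domain_sketch *domain; };

enum chain_status {
    CHAIN_OK = 0,
    CHAIN_NO_ENTRY,    /* entry itself is NULL */
    CHAIN_NO_DOMAIN,   /* entry->domain is NULL */
    CHAIN_NO_MUTEX     /* entry->domain->dom_mtx is NULL */
};

/*
 * Walk the chain one link at a time. On CHAIN_OK, *out is the mutex
 * the caller may lock; otherwise *out is NULL and the return value
 * names the first broken link.
 */
static enum chain_status
check_lock_chain(const struct entry_sketch *entry, struct mtx_sketch **out)
{
    assert(out != NULL);
    *out = NULL;
    if (entry == NULL)
        return CHAIN_NO_ENTRY;
    if (entry->domain == NULL)
        return CHAIN_NO_DOMAIN;
    if (entry->domain->dom_mtx == NULL)
        return CHAIN_NO_MUTEX;
    *out = entry->domain->dom_mtx;
    return CHAIN_OK;
}
```

This only catches NULL links. A dangling pointer to an entry or domain that has already been freed passes all three checks, so if a race frees the object between lookup and lock, the checks narrow down the failure but do not prevent it.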
Then we are off to do the same under Leopard. (Oh, by the way,
where can we check out the sources of the current Leopard beta so
we can verify our design?)
You can't get the sources. We generally do not release xnu
sources until after release, for reasons which should be obvious,
if you think about them a little bit. If you need to test the
KEXT and you are eligible for seeds (see
<http://developer.apple.com>), you can install one of the seed builds and
load your KEXT into a binary seed kernel for testing.
Well, I have had a binary of Leopard since WWDC 2006. But that
doesn't help for this KEXT, as some internal structures are required
to adapt it for Leopard. It loads fine but doesn't do its job. I've
been asking Apple to make a public API out of it so we are not
kernel version dependent, but so far everyone is so busy getting
Leopard out that it gets postponed all the time (I've been asking
for this since 2004, including personal visits to WWDC 2005 and
WWDC 2006 and WWDC 2007 and...). Unfortunately we simply don't have
any other choice for implementing this protocol. There is no other
viable way to do this except throwing out MacOS X and going for
Solaris (eiiik!)...or Linux (where is my user friendly GUI?) or
OpenBSD (never touched so far) or Windows (please please not!) or
doing some really dirty hacking like implementing a layer 4 protocol
on top of our own link layer (yes, we are so keen to implement a
second IP stack...) or implementing it on an API built for IP
filtering.
MacOS X is the _LAST_ serious operating system I know which does
not have SCTP (RFC 3286, RFC 2719, RFC 2960, RFC 3309, RFC 3257,
RFC 4460) yet... and SCTP is *MANDATORY* for any new
telecommunications protocol like M2PA (RFC 4165), M3UA (RFC 3332,
RFC 4666), SUA (RFC 3868), IUA (RFC 3057, RFC 3807, RFC 4129,
RFC 4233), which are the base of services like VoIP SS7 telephony,
SIGTRAN, GSM network infrastructure, UMTS network infrastructure,
international text messaging (SMS), Number Portability Systems and
all the big iron the telcos have in their racks. Of course none of
it today comes from Apple, and Apple seems absolutely not
interested in selling XServes into the racks of telco giants that
have loads of money and are willing to spend millions of dollars on
turnkey solutions provided by Apple developers.
Conclusion: Apple does not consider the telecommunications sector
to be _ANY_ market for them ...
...and then Steve Jobs comes to Macworld and talks for hours and
hours about a pure telecommunications device......
(For people inside Apple: for the fun of it, read radars 4605154,
4609031, 4144183, 4674730.)
Andreas Fink
Fink Consulting GmbH
Global Networks Schweiz AG
BebbiCell AG
---------------------------------------------------------------
Tel: +41-61-6666330 Fax: +41-61-6666331 Mobile: +41-79-2457333
Address: Clarastrasse 3, 4058 Basel, Switzerland
E-Mail: email@hidden
www.finkconsulting.com www.global-networks.ch www.bebbicell.ch
---------------------------------------------------------------
ICQ: 8239353 MSN: email@hidden AIM: smsrelay Skype: andreasfink
Yahoo: finkconsulting SMS: +41792457333
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
40lurchi.franken.de
This email sent to email@hidden