Typically, you just have to design your code so that it either fails safe or fails locally. If you end up corrupting memory, walking off a pointer into a nonexistent address, or following a pointer to something that used to be allocated but has since been freed, you will get either memory corruption (if you happen to hit something that's there, or if a freed area is reused for another purpose) or a crash with a fault in kernel mode.
That's nothing new to me; in fact, I already use most of those techniques in my own code. However, the KEXT we are looking at was not developed by me but by a group of very experienced BSD kernel developers. The code works very well (it is actually part of the OpenBSD kernel by now), yet it crashes regularly on my setup but not on others. So we seem to have some kind of race condition, which is very hard to track down. Every time the machine goes down we get a 750 MB core dump, which is useless to us unless someone can give me at least a hint of whether anything in it points us in the right direction. I know this is a hard problem to track down, but any hints are welcome. In the meantime we are continuing the hunt and improving the code along the way. We'll find it!
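Here is a minimal user-space sketch of the "fail locally" idea from the quoted advice. It assumes a hypothetical `struct entry` that carries a magic-number tag. Every accessor checks the tag and returns an error rather than dereferencing a bad object. The free routine poisons the tag and clears the caller's pointer, so a later use through that pointer hits NULL instead of recycled memory. The names (`ENTRY_MAGIC`, `entry_alloc`, `entry_get`, `entry_free`) are illustrative only and do not come from the KEXT.

```c
#include <stdint.h>
#include <stdlib.h>

#define ENTRY_MAGIC 0x454E5459u  /* hypothetical tag marking a live entry */
#define ENTRY_DEAD  0xDEADBEEFu  /* hypothetical tag written just before free */

struct entry {
    uint32_t magic;
    int      value;
};

struct entry *entry_alloc(int value) {
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->magic = ENTRY_MAGIC;
    e->value = value;
    return e;
}

/* Fail locally: refuse to touch anything not tagged as a live entry. */
int entry_get(const struct entry *e, int *out) {
    if (e == NULL || e->magic != ENTRY_MAGIC)
        return -1;
    *out = e->value;
    return 0;
}

/* Poison the tag, free, and clear the caller's pointer. The poison store is
   only a debugging aid; a compiler may drop it because the memory is freed
   right afterwards. */
void entry_free(struct entry **ep) {
    if (ep == NULL || *ep == NULL)
        return;
    (*ep)->magic = ENTRY_DEAD;
    free(*ep);
    *ep = NULL;
}
```

Clearing the pointer only protects that one reference. Any other alias of the freed object still dangles, and that is exactly the kind of bug that shows up as an intermittent crash.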
We might also decide to boot with cpus=1 on an SMP system, to make sure that any locking we forget to do against reentrancy won't bite us.
It happens on a single CPU as well; that has been tested. A single-CPU XServe G4, a dual G5 XServe, a 4-way Intel Mac Pro, a Mac Pro forced to a single CPU, and a dual-CPU Mac mini have all been seen crashing in the past. Nowadays the PPC machines crash less often than the Intel ones, but a while ago it was the other way around (we have since found and fixed a few other glitches along the way).
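A small pthreads sketch (user space, hypothetical names) shows why forcing a single CPU does not rule out such bugs. Several threads perform read-modify-write updates on a shared counter. With the mutex held the result is exact. Without the mutex, updates can be lost on SMP, and on a single CPU as well if a thread is preempted between the read and the write. Booting with cpus=1 therefore only narrows the window.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker performs NITERS locked increments. Removing the lock/unlock
   pair makes counter++ a racy read-modify-write that can lose updates. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, joins the ones that started, and returns the final
   count (NTHREADS * NITERS when all threads started). */
long run_workers(void) {
    pthread_t tids[NTHREADS];
    int started = 0;

    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```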
We don't see any difference whether the calls are synchronous or asynchronous, use poll or select, or run in a single thread or in multiple threads concurrently. The last backtrace gave us a hint that it crashed somewhere in proto_delayed_inject() while trying to lock a mutex (entry->domain->dom_mtx).
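The pattern around that lock can be sketched in user space. This is not the xnu code: `struct domain`, `struct input_entry` and `delayed_inject()` are simplified, hypothetical stand-ins. The pending packet list is detached under a short queue lock, and the per-domain mutex is taken only if the entry still refers to a domain. There is one important limitation. Snapshotting `entry->domain` only helps if the detach path clears that pointer under the same queue lock and waits for in-flight injections before freeing the domain, and that part is not shown. A crash while locking `dom_mtx` would be consistent with the entry or its domain having been freed underneath the caller (the use-after-free case in the quoted advice), but this sketch cannot confirm that.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct packet {
    struct packet *next;
    int            payload;
};

struct domain {
    pthread_mutex_t dom_mtx;   /* per-domain lock */
    int             reentrant; /* nonzero: domain does its own locking */
    long            delivered; /* sum of payloads handed to the domain */
};

struct input_entry {
    struct domain *domain;       /* must be set to NULL when detached */
    struct packet *inject_first; /* pending packets, guarded by inject_lock */
    struct packet *inject_last;
};

static pthread_mutex_t inject_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the number of packets delivered, or -1 if there was no domain. */
int delayed_inject(struct input_entry *entry) {
    struct packet *list;
    struct domain *dom;
    int n = 0;

    /* Step 1: detach the pending list and snapshot the domain pointer
       while holding only the short queue lock. */
    pthread_mutex_lock(&inject_lock);
    list = entry->inject_first;
    entry->inject_first = entry->inject_last = NULL;
    dom = entry->domain;
    pthread_mutex_unlock(&inject_lock);

    /* Fail locally: drop the packets rather than lock a mutex that belongs
       to a domain that is gone. */
    if (dom == NULL)
        return -1;

    /* Step 2: deliver under the domain lock unless the domain is reentrant. */
    if (!dom->reentrant)
        pthread_mutex_lock(&dom->dom_mtx);
    for (struct packet *p = list; p != NULL; p = p->next) {
        dom->delivered += p->payload;
        n++;
    }
    if (!dom->reentrant)
        pthread_mutex_unlock(&dom->dom_mtx);
    return n;
}
```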
Then we are off to do the same under Leopard. (Oh, by the way, where can we check out the sources of the current Leopard beta so we can verify our design?)
You can't get the sources. We generally do not release xnu sources until after release, for reasons which should be obvious if you think about them a little bit. If you need to test the KEXT and you are eligible for seeds (see <http://developer.apple.com>), you can install one of the seed builds and load your KEXT into a binary seed kernel for testing.
Well, I have had a binary of Leopard since WWDC 2006. But that doesn't help for this KEXT, because adapting it to Leopard requires access to some internal kernel structures: it loads fine but doesn't do its job. I've been asking Apple to make a public API out of it so we are not kernel-version dependent, but so far everyone has been too busy getting Leopard out, so it keeps getting postponed (I have been asking for this since 2004, including in person at WWDC 2005, WWDC 2006, WWDC 2007 and...). Unfortunately we simply have no other choice for implementing this protocol. There is no other viable way to do this except throwing out MacOS X and going for Solaris (eiiik!)... or Linux (where is my user-friendly GUI?) or OpenBSD (never touched so far) or Windows (please please not!), or doing some really dirty hacking like implementing a layer 4 protocol on top of our own link layer (yes, we are so keen to implement a second IP stack...) or implementing it on an API built for IP filtering.
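Until such a public API exists, a KEXT that depends on private structure layouts can at least fail safe. It can refuse to start on a kernel version it was not built for, instead of loading fine and silently not doing its job. Below is a user-space sketch of the check. It assumes the supported set is only Darwin 8 (Mac OS X 10.4 Tiger), and `layout_supported()` and `supported_majors` are hypothetical names. Inside the kernel, xnu's `<libkern/version.h>` exports `version_major`, which could take the place of the parsed release string used here.

```c
#include <stdlib.h>

/* Hypothetical: Darwin major versions whose private structure layouts this
   module was built and tested against (8 = Mac OS X 10.4 Tiger). */
static const long supported_majors[] = { 8 };

/* Returns 1 if the release string (e.g. "8.11.1") has a supported major
   version, 0 otherwise. Missing or unparsable versions fail safe. */
int layout_supported(const char *release) {
    char *end;
    long major;

    if (release == NULL)
        return 0;
    major = strtol(release, &end, 10);
    if (end == release)           /* no leading digits */
        return 0;
    for (size_t i = 0; i < sizeof supported_majors / sizeof supported_majors[0]; i++)
        if (supported_majors[i] == major)
            return 1;
    return 0;
}
```

In user space, the release string comes from the `release` field filled in by `uname()`. Leopard reports a 9.x release, so this check would return 0 there until the layouts have been adapted.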
MacOS X is the _LAST_ serious operating system I know of that does not have SCTP (RFC 3286, RFC 2719, RFC 2960, RFC 3309, RFC 3257, RFC 4460) yet... and SCTP is *MANDATORY* for new telecommunications protocols like M2PA (RFC 4165), M3UA (RFC 3332, RFC 4666), SUA (RFC 3868) and IUA (RFC 3057, RFC 3807, RFC 4129, RFC 4233), which are the basis of services like VoIP SS7 telephony, SIGTRAN, GSM network infrastructure, UMTS network infrastructure, international text messaging (SMS), number portability systems and everything the big-iron telcos have in their racks. Of course none of that equipment comes from Apple today, and Apple seems to have absolutely no interest in selling XServes into the racks of telco giants who have loads of money and are willing to spend millions of dollars on turnkey solutions provided by Apple developers.
Conclusion: Apple does not consider the telecommunications sector to be _ANY_ market for them...
...and then Steve Jobs comes to Macworld and talks for hours and hours about a pure telecommunications device...
(For Apple-internal readers, just for the fun of it: read radars 4605154, 4609031, 4144183, 4674730.)
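An application can find out whether a given kernel offers SCTP at all by asking for an SCTP socket. Below is a minimal sketch that assumes the one-to-one (`SOCK_STREAM`) style of the SCTP sockets API, with `sctp_available()` as a hypothetical helper. On a kernel without SCTP, the `socket()` call fails (typically with `EPROTONOSUPPORT`), which is what we would expect on stock MacOS X.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_SCTP
#define IPPROTO_SCTP 132   /* IANA-assigned IP protocol number for SCTP */
#endif

/* Returns 1 if the kernel accepts a one-to-one SCTP socket, 0 otherwise. */
int sctp_available(void) {
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```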
Andreas Fink
Fink Consulting GmbH Global Networks Schweiz AG BebbiCell AG
--------------------------------------------------------------- Tel: +41-61-6666330 Fax: +41-61-6666331 Mobile: +41-79-2457333 Address: Clarastrasse 3, 4058 Basel, Switzerland --------------------------------------------------------------- ICQ: 8239353 MSN: email@hidden AIM: smsrelay Skype: andreasfink Yahoo: finkconsulting SMS: +41792457333