Re: device driver crashing on G5 Tower (dual processor)

14 Sep 2005

Is there a call that will let you know how much stack you are using?
On Sep 13, 2005, at 11:00 AM, Derek Kumar wrote:
Herbert,
Does anyone have any ideas to try?
Thanks in advance.
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list      (Darwin-kernel@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/darwin-kernel/drk%40apple.com
This email sent to drk@apple.com

No call that I know of, but you can examine
current_thread()->kernel_stack (hopefully that's available in kexts; if
not, you can do the same through the debugger or, for debugging
purposes, examine SPRG1, which maintains a pointer to the current
thread, and proceed from there). Then either use inline asm to read r1
(GPR1, used as the stack pointer) or take the address of a local
(probably less accurate); the delta (SP - kernel_stack) would be the
available stack. You'd have to do this in the deepest call in your
stack, of course. Unless you have reason to believe an overflow is
likely (you're storing a lot of data on the stack, or you have lots of
recursion/deep call stacks), I wouldn't rule out other forms of stack
corruption.

Derek
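
The (SP - kernel_stack) arithmetic described above can be sketched in plain C. This is only an illustration, not kernel code: the sketch_* names are hypothetical, the 4 KiB page size is an assumption, the stack is assumed to grow downward from (base + 4 pages), and the stack base is passed in by the caller, since current_thread()->kernel_stack may not be reachable from a kext. Outside PowerPC Darwin builds it falls back to the GCC/Clang builtin __builtin_frame_address(0), which only approximates the stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and the 4-page kernel stack
 * mentioned in this thread. */
#define SKETCH_PAGE_SIZE   ((uintptr_t)4096)
#define SKETCH_STACK_SIZE  (4 * SKETCH_PAGE_SIZE)

/* Approximate the current stack pointer. On PowerPC Darwin builds, read r1
 * (GPR1) directly; elsewhere fall back to the compiler's frame address,
 * which is close to, but not exactly, the stack pointer. */
static uintptr_t sketch_current_sp(void)
{
#if defined(__ppc__) || defined(__ppc64__)
    uintptr_t sp;
    __asm__ volatile("mr %0, r1" : "=r"(sp));
    return sp;
#else
    return (uintptr_t)__builtin_frame_address(0);
#endif
}

/* Bytes left before sp reaches the base (lowest address) of a
 * downward-growing stack. Returns 0 if sp is already at or below the base. */
static size_t sketch_stack_available(uintptr_t sp, uintptr_t stack_base)
{
    return sp > stack_base ? (size_t)(sp - stack_base) : 0;
}

/* Bytes consumed so far, measured down from the top of the stack. */
static size_t sketch_stack_used(uintptr_t sp, uintptr_t stack_base)
{
    uintptr_t top = stack_base + SKETCH_STACK_SIZE;
    return sp < top ? (size_t)(top - sp) : 0;
}
```

Calling sketch_stack_available(sketch_current_sp(), base) from the deepest function in the call chain would give the tightest estimate; calling it anywhere shallower overstates the headroom.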

Herbert wrote:
You are right, my description is a little light on details. From your
info, it seems likely that I am overflowing the stack.
Did the paniclog (either the "panic.log" file, or the one displayed
with the paniclog kgmacro with the kernel crashdump or debugger
connection) contain a trace from the previous savearea? You can also
manually walk the thread savearea chain in the crashdump, to look at
the state of your thread prior to the immediate fatal exception.
The "corrupt stack" message is often (but not always, of course)
indicative of a stack overflow - the kernel stack size is 4 pages,
and if you overflow that, you will encounter the unmapped "guard
page", causing a DSI. It's often helpful to translate the PC (or
rather srr0, the extrapolated program counter at the time of the
exception) to a symbol, and to look at the contents of the DAR (the
data address register), which holds the faulting address, the access
to which caused the fatal DSI (the "data access" exception). Both of
those should be in the paniclog - does the DAR, for instance, look
like something on your thread's stack? If so, what's the offset from
the base of the stack? You should also use the debugger to examine
the faulting instruction (the PC) and look at what the immediate
cause of the exception was.
Your post is a tad light on details - what sort of synchronization
mechanism are you using? Is there a memory descriptor being passed
down from a user client, whose contents are manipulated prior to an
eventual "down call" through a command gate for serialization/queueing?

Derek
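
The DAR check suggested above (does the faulting address land on the thread's stack, and at what offset from the base?) can be sketched as a small C helper. This is a sketch under stated assumptions, not kernel code: the dar_* names are hypothetical, pages are assumed to be 4 KiB, the stack is assumed to be 4 pages growing downward, and the guard page is assumed to be the single page immediately below the stack base. The DAR and stack base values would be read by hand from the paniclog or the debugger.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages, 4-page kernel stack, one guard page directly
 * below the stack base (lowest stack address). */
#define DAR_PAGE_SIZE   ((uintptr_t)4096)
#define DAR_STACK_SIZE  (4 * DAR_PAGE_SIZE)

enum dar_location {
    DAR_ON_STACK,       /* inside [base, base + stack size)            */
    DAR_IN_GUARD_PAGE,  /* inside [base - page size, base): overflow?  */
    DAR_ELSEWHERE       /* unrelated to this stack                     */
};

/* Classify a faulting address relative to a thread's kernel stack.
 * If offset_from_base is non-NULL, it receives (dar - stack_base) as a
 * signed value; negative means the access was below the base. */
static enum dar_location dar_classify(uintptr_t dar, uintptr_t stack_base,
                                      intptr_t *offset_from_base)
{
    if (offset_from_base != NULL) {
        *offset_from_base = (intptr_t)(dar - stack_base);
    }
    if (dar >= stack_base && dar - stack_base < DAR_STACK_SIZE) {
        return DAR_ON_STACK;
    }
    if (dar < stack_base && stack_base - dar <= DAR_PAGE_SIZE) {
        return DAR_IN_GUARD_PAGE;
    }
    return DAR_ELSEWHERE;
}
```

A result of DAR_IN_GUARD_PAGE with a small negative offset would be consistent with a stack overflow; DAR_ELSEWHERE points toward some other kind of corruption.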

Herbert wrote:
I'm writing a driver that does encryption. I am testing it on
several machines, including a Mac mini, an iMac, and a G5 Tower
(two processors).
The extension is by no means perfect, BUT it dies pretty hard on
the G5.
I do a bunch of copies for testing, and on the mini and iMac I can
do 9+ gigs, but the G5 will die (with a kernel panic) after about
270 megs.
There were a couple of things I tried to narrow it down. I took
out one HD (that didn't help), and I disabled one processor (that
didn't help, either).
The kernel panic shows a data access error, and I'm corrupting the
stack - getting a kernel dump didn't help much because of that.
I suspect a threading problem of some kind, i.e. two threads
accessing my write method, but I've got locks on that. Even with
the locks I sometimes see two writes happening (I print a message
when that happens). With locks, I would have thought I'd never see
that (I see it rarely on the other machines, too).
So I'm out of ideas. Are there any docs someone has seen somewhere
that might help?
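
The "print a message when two writes overlap" check described above could be done with an atomic in-flight counter rather than a plain flag, so that the detector itself cannot race. The sketch below is user-space C11, not the original driver code: the sketch_* names are hypothetical, and a kext would need the kernel's own atomic primitives in place of <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Tracks how many callers are currently inside the guarded routine and
 * how many times an overlapping entry has been observed. */
typedef struct {
    atomic_int in_flight;
    atomic_int overlaps;
} sketch_entry_guard;

static void sketch_guard_init(sketch_entry_guard *g)
{
    atomic_init(&g->in_flight, 0);
    atomic_init(&g->overlaps, 0);
}

/* Call on entry to the write path. Returns true if this caller is the
 * only one inside, false if another caller was already there. */
static bool sketch_enter(sketch_entry_guard *g)
{
    int previous = atomic_fetch_add(&g->in_flight, 1);
    if (previous > 0) {
        atomic_fetch_add(&g->overlaps, 1);
        return false;
    }
    return true;
}

/* Call on every exit from the write path, including error paths. */
static void sketch_leave(sketch_entry_guard *g)
{
    atomic_fetch_sub(&g->in_flight, 1);
}
```

If sketch_enter() ever returns false while the lock is supposedly held, the lock is not covering every path into the write routine, or different callers are taking different lock instances.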