Re: panic(cpu 1 caller ...): vnode_put(...): iocount < 1
- Subject: Re: panic(cpu 1 caller ...): vnode_put(...): iocount < 1
- From: James Reynolds <email@hidden>
- Date: Tue, 18 Oct 2005 15:09:07 -0600
So, it turns out my problems were a little deeper than I thought.
I'm hoping someone here can help me debug launchd and kernel_task
with gdb (see why below).
First, the kernel panic is only one symptom. Another symptom is
that the computer will not reboot. It goes to the blue screen with
the spinning cogwheel and must be force restarted (holding the power
button until it powers off). If I tap the power button and go into
kernel debug mode, I can see the only processes running are launchd,
kernel_task, and the reboot command. I honestly have no idea where
to start looking to see how these are hung, since they break at the
NMI code.
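For anyone who wants to poke at the same hang: my setup is the
standard two-machine Kernel Debug Kit arrangement, roughly the
following from a second Mac once the hung machine has dropped into
the debugger (the IP address is a placeholder, and it assumes the
KDK matching 10.4.2 is mounted):

gdb /Volumes/KernelDebugKit/mach_kernel
(gdb) source /Volumes/KernelDebugKit/kgmacros
(gdb) target remote-kdp
(gdb) attach 10.0.0.2
(gdb) showallstacks

showallstacks is how I see that launchd, kernel_task, and reboot
are the only things left running.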
Second, the kernel panics (and system hangs) don't occur on the
same file, so this isn't an "Icon\r" file problem. It is different
files from different apps every time, but they are all apps in
/Applications.
Third, it also turns out mds wasn't the only process that caused
the kernel panic. In fact, the find command does it as well, and
more often. It either causes the kernel panic outright or, when I
then run /sbin/reboot, launchd and kernel_task do not quit. These
are the two commands (executed at the same time):

find / ! -type l -perm -2 &
find / \( ! -fstype local -o -fstype fdesc -o -fstype devfs \) -a -prune -o -print
The first find command is something we were running to find
world-writable files. The second is part of
/usr/libexec/locate.updatedb. I'm not sure whether the find
parameters are actually important.
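If the scans themselves turn out to matter, one workaround I may
try is folding the locate-style prune into the world-writable scan,
so a single pass never descends into the virtual filesystems
(untested, just a sketch):

find / \( ! -fstype local -o -fstype fdesc -o -fstype devfs \) \
  -prune -o \( ! -type l -perm -2 \) -print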
So I disabled these commands and our boxes don't kernel panic as
much, but we still see a few kernel panics each day and a 5-15
minute hang at reboot each day (all of our machines reboot at least
once a day), indicating that something else besides the find
command is triggering the problem.
I do know the kernel panics only occur on our dual G5 2.0's, and only
with a full load of software, running 10.4.2. This problem did occur
in 10.3, but was so rare we didn't see it as much of a problem (and
we never would have been able to really isolate what was going on
anyway).
If I delete most of our deployed software, the panics/hangs don't
occur. We have so much software that it is hard to nail down exactly
which piece causes this. I do have machines running a script that is
helping me narrow down the files (sketched below), but it takes time.
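The script is essentially a crude bisection over the deployed apps;
a stripped-down sketch of the idea (the candidate list, paths, and
trigger command here are placeholders, not our production script):

#!/bin/sh
# Bisect: remove half the candidate apps, run the trigger command,
# and see whether the machine survives. Repeat on whichever half
# still reproduces the panic/hang.
CANDIDATES=/tmp/candidate-apps.txt       # one /Applications path per line
HALF=$(( $(wc -l < "$CANDIDATES") / 2 ))

head -n "$HALF" "$CANDIDATES" | while read app; do
    rm -rf "$app"
done

# The trigger that most reliably panics a bad set:
find / ! -type l -perm -2 > /dev/null
echo "survived without the first half"   # never prints if we panicked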
And to make matters more complicated (if you have the time and
patience to keep reading), here is how we deploy software. We
install it on a test box, capture the changed files, and upload
them to a server. The lab systems then pull the software down over
the network; this is what Radmind is used for. To complicate it
further, when we set up a new computer, we make a disk image of a
lab box's hard drive and then clone it to the new computer with
/usr/sbin/asr.
While radmind shouldn't be transferring hard disk corruption, asr
could, I suppose. Except we have run multiple disk utilities on the
hard disks (DiskWarrior chief among them). And if we delete most of
the applications the problem goes away, and if we re-download them
with radmind the problem comes back, which suggests the problem is
not asr.
Radmind performs checksums on every file, and any file that doesn't
pass is replaced with a copy from the server. So I can't see how a
corrupted file could even be transferred from the server to the
clients.
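To rule out my own tooling, I can at least spot-check a suspect
file by hand on both ends; something like this (the path is just an
example, and it assumes our transcripts use sha1 checksums):

openssl sha1 "/Applications/SomeApp.app/Contents/MacOS/SomeApp"

If the digest matches on the client and on the radmind server, the
file data is byte-identical, which would point the finger at
metadata (like those icon flags) rather than the file contents.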
Perhaps something completely legal as far as the filesystem and
radmind are concerned is triggering an OS bug?
Any help is much appreciated!
--
Thanks,
James Reynolds
University of Utah
Student Computing Labs
email@hidden
801-585-9811
At 11:27 AM -0600 10/7/05, James Reynolds wrote:
Thanks for the info! It is totally what I needed. Autopsy
conclusions are below.
> You will probably want to switch to the frame number from the bt
> for the vnode_put_locked frame, and do a:
> frame 3 (or whatever the frame number is from the bt)
> print *vp
> as well; this will narrow down the issue for your bug report.
> If the mount point isn't dead:
> p vp->v_mount
> p dead_vnodeop_p
> and if v_mount is not 0 and v_op is not the same value as
> dead_vnodeop_p, then also do this:
> p *vp->v_mount
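Putting that advice together, the whole session against the
panicked kernel comes out to roughly the following (frame numbers
and values will differ per panic):

(gdb) bt
(gdb) frame 3            <- the vnode_put_locked frame from bt
(gdb) print *vp
(gdb) p vp->v_mount
(gdb) p dead_vnodeop_p
(gdb) p *vp->v_mount     <- only if v_mount is non-zero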
> If you are just interested in avoiding the problem, or want to
> avoid it until there's a fixed release (it may already be fixed by
> the next update), the showallstacks will report the process name of
> all the processes, and you will see the panic in one of them;
It is mds, Spotlight, which is what I thought.
> the print *vp will give a v_name field, which will be the name of
> the file (probably a symbolic link).
Nope. "Icon\r"
> You can look at the vp pointed to by the v_parent, and print its
> v_name, and so on, to assemble the path to the vp.
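In gdb that walk looks something like this, repeating ->v_parent
until you reach the root (field names per the 10.4 xnu vnode
structure):

(gdb) p vp->v_name
(gdb) p vp->v_parent->v_name
(gdb) p vp->v_parent->v_parent->v_name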
Specifically:
.../Library/Preferences/Painter 7 Prefs/Brushes/Ver 6/Cloners/Icon\r
This is in a student's home folder. I normally delete these folders
at logout, but I've noticed on some machines that the rm -rf
process gets stuck trying to delete the Cloners folder, and I had
no idea why. kill -9 <pid of rm> failed, and ls -l /path/to/Cloners
hung as well (and couldn't be killed either).
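If anyone else hits this: the stuck processes are presumably in
uninterruptible disk wait, which is why kill -9 can't touch them.
A quick way to check (the grep pattern is just an example):

ps axo pid,stat,command | grep 'rm -rf'

A STAT of "U" means the process is blocked in the kernel on I/O; no
signal, not even KILL, is delivered until that operation returns.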
We have also been running DiskWarrior, and it has been reporting
"Repaired: Custom Icon Flag" for that file and many, many more on
these systems. I didn't think it was an issue, but wow, I guess it
can cause a kernel panic and hangs (I didn't know the icons were
the reason the rm -rf was failing, just that it was something in
that folder).
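For what it's worth, the custom-icon bit can be inspected and
cleared from the command line if the Developer Tools are installed
(the path is just an example; the ? glob matches the trailing
carriage return in the "Icon\r" name):

GetFileInfo /path/to/Cloners/Icon?
SetFile -a c /path/to/Cloners/Icon?

GetFileInfo prints the Finder attribute string (an uppercase C
means the Custom Icon flag is set), and the lowercase c passed to
SetFile clears it.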
Anyway, I am still thinking this might be a Radmind problem. We
distributed these files with Radmind. Radmind may be messing them
up. I'll take this up with them.
I'll still file it with Apple though.
> If it's dead because of a forced unmount, the value of v_mount
> will be 0; it might also be that the vp->v_op has the same address
> in it as the symbol dead_vnodeop_p; you could check both of those.
> I really would recommend filing a bug, though, including all the
> above information:
> <http://bugreporter.apple.com>
> PS: Say "hi" to North Physics for me... I started my college career
> at the UofU majoring in high energy physics and applied math. 8-).
Yay! Go UTES!
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden