Re: panic(cpu 1 caller ...): vnode_put(...): iocount < 1
- Subject: Re: panic(cpu 1 caller ...): vnode_put(...): iocount < 1
- From: James Reynolds <email@hidden>
- Date: Tue, 18 Oct 2005 15:09:07 -0600
So, it turns out my problems were a little deeper than I thought.
I'm hoping someone here can help me debug launchd and kernel_task
with gdb (see why below).
First, the kernel panic is only one symptom. Another symptom is
that the computer will not reboot. It goes to the blue screen with
the spinning cogwheel and must be force restarted (holding the power
button until it powers off). If I tap the power button and go into
kernel debug mode, I can see the only processes running are launchd,
kernel_task, and the reboot command. I honestly have no idea where
to start looking to see how these are hung, since they break at the
NMI code.
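For anyone who wants to poke at the same hang: my setup is the
standard two-machine Kernel Debug Kit arrangement, roughly the
following from a second Mac once the hung machine has dropped into
the debugger (the IP address is a placeholder, and it assumes the
KDK matching 10.4.2 is mounted):

gdb /Volumes/KernelDebugKit/mach_kernel
(gdb) source /Volumes/KernelDebugKit/kgmacros
(gdb) target remote-kdp
(gdb) attach 10.0.0.2
(gdb) showallstacks

showallstacks is how I see that launchd, kernel_task, and reboot
are the only things left running.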
Second, the kernel panics (and system hangs) don't occur on the
same file, so this isn't an "Icon\r" file problem. It is different
files from different apps every time, but they are all apps in
/Applications.
Third, it also turns out mds wasn't the only process that caused
the kernel panic. In fact, the find command does it as well, and
more often. It either causes the kernel panic outright or, when I
then run /sbin/reboot, launchd and kernel_task do not quit. These
are the two commands (executed at the same time):

find / ! -type l -perm -2 &
find / \( ! -fstype local -o -fstype fdesc -o -fstype devfs \) -a -prune -o -print
The first find command is something we were running to find
world-writable files. The second is part of
/usr/libexec/locate.updatedb. I'm not sure whether the find
parameters are actually important.
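If the scans themselves turn out to matter, one workaround I may
try is folding the locate-style prune into the world-writable scan,
so a single pass never descends into the virtual filesystems
(untested, just a sketch):

find / \( ! -fstype local -o -fstype fdesc -o -fstype devfs \) \
  -prune -o \( ! -type l -perm -2 \) -print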
So I disabled these commands and our boxes don't kernel panic as
much, but we still see a few kernel panics each day and a 5-15
minute hang at reboot each day (all of our machines reboot at least
once a day), indicating that something else besides the find
command is triggering the problem.
I do know the kernel panics only occur on our dual G5 2.0's, and only
with a full load of software, running 10.4.2. This problem did occur
in 10.3, but was so rare we didn't see it as much of a problem (and
we never would have been able to really isolate what was going on
anyway).
If I delete most of our deployed software, the panics/hangs don't
occur. We have so much software that it is hard to nail down exactly
which piece causes this. I do have machines running a script that is
helping me narrow down the files (sketched below), but it takes time.
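The script is essentially a crude bisection over the deployed apps;
a stripped-down sketch of the idea (the candidate list, paths, and
trigger command here are placeholders, not our production script):

#!/bin/sh
# Bisect: remove half the candidate apps, run the trigger command,
# and see whether the machine survives. Repeat on whichever half
# still reproduces the panic/hang.
CANDIDATES=/tmp/candidate-apps.txt       # one /Applications path per line
HALF=$(( $(wc -l < "$CANDIDATES") / 2 ))

head -n "$HALF" "$CANDIDATES" | while read app; do
    rm -rf "$app"
done

# The trigger that most reliably panics a bad set:
find / ! -type l -perm -2 > /dev/null
echo "survived without the first half"   # never prints if we panicked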
And to make matters more complicated (if you have the time and
patience to keep reading), here is how we deploy software. We
install it on a test box, capture the changed files, and upload
them to a server. The lab systems then pull the software down over
the network; this is what Radmind is used for. To complicate it
further, when we set up a new computer, we make a disk image of a
lab box's hard drive and then clone it to the new computer with
/usr/sbin/asr.
While radmind shouldn't be transferring hard disk corruption, asr
could, I suppose. Except we have run multiple disk utilities on the
hard disks (DiskWarrior chief among them). And if we delete most of
the applications the problem goes away, and if we re-download them
with radmind the problem comes back, which suggests the problem is
not asr.
Radmind performs checksums on every file, and any file that doesn't
pass is replaced with a copy from the server. So I can't see how a
corrupted file could even be transferred from the server to the
clients.
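To rule out my own tooling, I can at least spot-check a suspect
file by hand on both ends; something like this (the path is just an
example, and it assumes our transcripts use sha1 checksums):

openssl sha1 "/Applications/SomeApp.app/Contents/MacOS/SomeApp"

If the digest matches on the client and on the radmind server, the
file data is byte-identical, which would point the finger at
metadata (like those icon flags) rather than the file contents.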
Perhaps something completely legal as far as the filesystem and
radmind are concerned is triggering an OS bug?
Any help is much appreciated!
--
Thanks,
James Reynolds
University of Utah
Student Computing Labs
email@hidden
801-585-9811
At 11:27 AM -0600 10/7/05, James Reynolds wrote:
Thanks for the info! It is totally what I needed. Autopsy
conclusions are below.
> You will probably want to switch to the frame number from the bt
> for the vnode_put_locked frame, and do a:
> frame 3 (or whatever the frame number is from the bt)
> print *vp
> as well; this will narrow down the issue for your bug report.
> If the mount point isn't dead:
> p vp->v_mount
> p dead_vnodeop_p
> and if v_mount is not 0 and v_op is not the same value as
> dead_vnodeop_p, then also do this:
> p *vp->v_mount
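Putting that advice together, the whole session against the
panicked kernel comes out to roughly the following (frame numbers
and values will differ per panic):

(gdb) bt
(gdb) frame 3            <- the vnode_put_locked frame from bt
(gdb) print *vp
(gdb) p vp->v_mount
(gdb) p dead_vnodeop_p
(gdb) p *vp->v_mount     <- only if v_mount is non-zero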
> If you are just interested in avoiding the problem, or want to
> avoid it until there's a fixed release (it may already be fixed by
> the next update), the showallstacks will report the process name of
> all the processes, and you will see the panic in one of them;
It is mds, Spotlight, which is what I thought.
> the print *vp will give a v_name field, which will be the name of
> the file (probably a symbolic link).
Nope. "Icon\r"
> You can look at the vp pointed to by the v_parent, and print its
> v_name, and so on, to assemble the path to the vp.
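In gdb that walk looks something like this, repeating ->v_parent
until you reach the root (field names per the 10.4 xnu vnode
structure):

(gdb) p vp->v_name
(gdb) p vp->v_parent->v_name
(gdb) p vp->v_parent->v_parent->v_name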
Specifically:
.../Library/Preferences/Painter 7 Prefs/Brushes/Ver 6/Cloners/Icon\r
This is in a student's home folder. I normally delete these folders
at logout, but I've noticed on some machines that the rm -rf
process gets stuck trying to delete the Cloners folder, and I had
no idea why. kill -9 <pid of rm> failed, and ls -l /path/to/Cloners
hung as well (and couldn't be killed either).
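If anyone else hits this: the stuck processes are presumably in
uninterruptible disk wait, which is why kill -9 can't touch them.
A quick way to check (the grep pattern is just an example):

ps axo pid,stat,command | grep 'rm -rf'

A STAT of "U" means the process is blocked in the kernel on I/O; no
signal, not even KILL, is delivered until that operation returns.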
We have also been running DiskWarrior, and it has been reporting
"Repaired: Custom Icon Flag" for that file and many, many more on
these systems. I didn't think it was an issue, but wow, I guess it
can cause a kernel panic and hangs (I didn't know the icons were
the reason the rm -rf was failing, just that it was something in
that folder).
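For what it's worth, the custom-icon bit can be inspected and
cleared from the command line if the Developer Tools are installed
(the path is just an example; the ? glob matches the trailing
carriage return in the "Icon\r" name):

GetFileInfo /path/to/Cloners/Icon?
SetFile -a c /path/to/Cloners/Icon?

GetFileInfo prints the Finder attribute string (an uppercase C
means the Custom Icon flag is set), and the lowercase c passed to
SetFile clears it.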
Anyway, I am still thinking this might be a Radmind problem. We
distributed these files with Radmind. Radmind may be messing them
up. I'll take this up with them.
I'll still file it with Apple though.
> If it's dead because of a forced unmount, the value of v_mount
> will be 0; it might also be that the vp->v_op has the same address
> in it as the symbol dead_vnodeop_p; you could check both of those.
> I really would recommend filing a bug, though, including all the
> above information:
> <http://bugreporter.apple.com>
> PS: Say "hi" to North Physics for me... I started my college career
> at the UofU majoring in high energy physics and applied math. 8-).
Yay! Go UTES!
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Darwin-kernel mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden