Following up: I was able to get everything working again. Long story short, I tried downgrading to Xsan 1.4.2 to stop the RPL upgrade process, but that did not work. To resolve the issue, I upgraded one of the metadata controllers to Mac OS X 10.6 and let the RPL process run. This time it completed without a panic, and once it finished, the volume mounted right away.
To be safe, I backed up all the data to some external drives (we'll have an official backup solution in July), then destroyed the Xsan volume and started from scratch.
Everything is operating smoothly so far. If anyone would like more details on all the steps I tried, please email me and I will be happy to share.
Thanks to everyone who offered their help.
On May 28, 2010, at 11:12 AM, Rhon Fitzwater wrote:
> I have an Xsan volume that crashed with a panic on Monday. The crash appeared to be related to an XATTR error. After running several variations of cvfsck (-w -X, etc.), I was able to clear the errors. When I remounted the volume, it stayed mounted for a minute and then crashed again, even though cvfsck reported no problems this time around. After digging further in the logs, I saw that the RPL_Upgrade process had been kicked off. We have been running Xsan 2.2 for some time now (2.2.1 prior to the crash). To my knowledge, this process should only run once, when upgrading from 1.4.x to 2, and it did run for us when we upgraded six months ago. I talked to Apple, and they said to let the RPL process finish and things should come back online. Unfortunately, that has not been the case. After taking two days to run the removal part of RPL_Upgrade, it started on the rebuild part, which now gets to 50% and crashes every time.
>> [0527 15:07:14] 0xa04f8720 (Info) RPL_Upgrade: 0% complete (1 of 8105 IEL blocks)
>> [0527 15:22:41] 0xa04f8720 (Info) RPL_Upgrade: 10% complete (811 of 8105 IEL blocks)
>> [0527 15:40:47] 0xa04f8720 (Info) RPL_Upgrade: 20% complete (1621 of 8105 IEL blocks)
>> [0527 15:53:23] 0xa04f8720 (Info) RPL_Upgrade: 30% complete (2432 of 8105 IEL blocks)
>> [0527 16:04:46] 0xa04f8720 (Info) RPL_Upgrade: 40% complete (3242 of 8105 IEL blocks)
>> [0527 16:23:49] 0xa04f8720 (Info) RPL_Upgrade: 50% complete (4053 of 8105 IEL blocks)
>> [0527 16:41:00] 0xa04f8720 (**FATAL**) PANIC: /Library/Filesystems/Xsan/bin/fsm ASSERT failed "pthread_mutex_lock(&ip->i_ref_lock) == 0" file inode.c, line 12288
>> [0527 16:41:00] 0xa04f8720 (**FATAL**) PANIC: wait 3 secs for journal to flush
>> [0527 16:41:03] 0xa04f8720 (**FATAL**) PANIC: aborting threads now.
>> Logger_thread: sleeps/353748 signals/0 flushes/420 writes/420 switches 0
>> Logger_thread: logged/448 clean/448 toss/0 signalled/0 toss_message/0
>> Logger_thread: waited/0 awakened/0
> So far, the only solution I have read about is to downgrade to Xsan 1.4.2. This should allow me to mount the volume, pull the data off, destroy the volume, and start fresh with Xsan 2.2.1.
> My problem is that I do NOT have a copy of the Xsan 1.4 installer, and Apple does not make it available on their website. Does anyone out there have a copy they could provide me with? Possibly just a disk image I can download?
> To those who would say it would be easier to start from scratch and just restore the data from a backup: thanks, but unfortunately we do not have a backup. Funding for one was denied when we purchased the Xsan. Conveniently, it has now been approved and is going to be installed next month.
> FYI: All servers are running Mac OS X 10.5.8 with all updates and Xsan 2.2.1. The RAID is a 16 TB Promise system with the latest firmware.
> If anyone has any other ideas, please let me know.
> P.S.: Apologies to those who hate cross-posting.
> Do not post admin requests to the list. They will be ignored.
> Xsan-Users mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> This email sent to email@hidden