Mailing Lists: Apple Mailing Lists


Re: Software RAID0 IO error




On Nov 23, 2007, at 10:05 AM, Dan Shoop wrote:


On Nov 16, 2007, at 3:28 PM, Marcus Lingl wrote:


On Nov 16, 2007, at 11:09 AM, Flynn, Daniel wrote:

Good Day List,

Xserve G4
OS X 10.4.9
Xserve RAID, 2 LUNs of 6 x 250 GB HDDs each, striped (RAID 0) via AppleRAID version 1.
Direct connected via copper SFP-SFP cables.

During periods of high I/O (I think), the RAID volume unmounts.

system.log:

Nov 14 23:53:45 eng-tvstfgfx1 kernel[0]: AppleRAID::completeRAIDRequest - error 0xe00002ca detected for set "gfx_raid01" (E1AD0375-67AE-11D8-8EAA-000A958B4568), member 64391080-FF5B-4546-9CAF-D50500000000, set byte offset = 641573285888.
Nov 14 23:53:45 eng-tvstfgfx1 kernel[0]: disk5: I/O error.
Nov 14 23:53:45 eng-tvstfgfx1 kernel[0]: AppleRAID::recover() member 64391080-FF5B-4546-9CAF-D50500000000 from set "gfx_raid01" (E1AD0375-67AE-11D8-8EAA-000A958B4568) has been marked offline.
Nov 14 23:53:45 eng-tvstfgfx1 kernel[0]: AppleRAID::restartSet - restarting set "gfx_raid01" (E1AD0375-67AE-11D8-8EAA-000A958B4568).
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: AppleRAID::completeRAIDRequest - underrun detected, expected = 0x8000, actual = 0x0, set = "gfx_raid01" (E1AD0375-67AE-11D8-8EAA-000A958B4568)
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: disk5: data underrun.
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: disk5: media is not present.
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: disk5: media is not present.
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: jnl: do_jnl_io: strategy err 0x6
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: jnl: end_transaction: only wrote 0 of 24576 bytes to the journal!
Nov 14 23:54:00 eng-tvstfgfx1 kernel[0]: jnl: close: journal 0x2c89d7c, is invalid. aborting outstanding transactions
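
For anyone tracking how often a set drops out, the AppleRAID messages above can be tallied per set with standard tools. This is only a rough sketch: the function name raid_errors is made up for illustration, and it assumes each log entry sits on a single line in the log file (the line breaks above come from mail wrapping).

```shell
#!/bin/sh
# Count AppleRAID error-type kernel messages per RAID set in a syslog-style file.
# Assumption: every log entry is on one line in the file being read.
# Usage: raid_errors /var/log/system.log
raid_errors() {
    # Keep AppleRAID lines that report an error code, an offline member,
    # or an underrun; then pull out the quoted set name and count per set.
    grep 'AppleRAID::' "$1" \
        | grep -E 'error 0x|marked offline|underrun' \
        | sed -n 's/.*set[ =]*"\([^"]*\)".*/\1/p' \
        | sort | uniq -c
}
```

Each output line is a count followed by a set name. The restartSet messages are deliberately left out of the count, since they report recovery rather than a failure.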



One of the LUNs is no longer available to the OS. After a restart, the volume
mounts as expected.


$ diskutil verifyVolume

indicates volume does not need repair.

Once volume is restored:

$ diskutil checkRAID

Name: gfx_raid01
Unique ID: E1AD0375-67AE-11D8-8EAA-000A958B4568
Type: Stripe
Status: Online
Device Node: disk5
Apple RAID Version: 1
----------------------------------------------------------------------
#  Device Node  UUID                                  Status
----------------------------------------------------------------------
1  disk3s3      411AA7C0-8186-40D8-82D1-E50000000000  Online
0  disk4s3      90191C39-E1DD-44FC-9371-DC7600000000  Online
----------------------------------------------------------------------




This happens consistently when an rsync cron job is enabled, so that job is
currently disabled. I also believe a Retrospect job triggered the behavior
and log entries above. I have only witnessed this behavior under these two
scenarios.


There are no errors with the underlying LUNs in RAID Admin.

Using /bin/dd to write to the volume results in 127.89 MB/s.
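
A write test along these lines can be repeated with a small wrapper. This is a sketch, not the exact command used above: the function name write_test and the scratch file name are made up, and the block size is given in bytes because the 1m / 1M suffix differs between BSD and GNU dd.

```shell
#!/bin/sh
# Write N MiB of zeros to a scratch file in the target directory, then remove it.
# dd reports its own transfer statistics on stderr when it finishes.
# Usage: write_test DIRECTORY [MIB]
write_test() {
    dir=$1
    mib=${2:-1024}            # default to 1024 MiB if no size is given
    scratch="$dir/ddtest.$$"  # hypothetical scratch file name, unique per shell PID
    dd if=/dev/zero of="$scratch" bs=1048576 count="$mib"
    status=$?
    rm -f "$scratch"          # always clean up, even if dd failed
    return $status
}
```

For example, write_test /Volumes/gfx_raid01 4096 would write 4 GiB, assuming the set is mounted at that path (the mount point is a guess based on the set name). Writes of zeros can be flattered by caching, so the figure is best treated as a rough upper bound.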

The HBA has been replaced and the behavior remains.

Any thoughts?

Thank you,

-dgf


I have had this exact same thing happen to me. Made me bang my head against the wall. In my case, it was also an rsync cron job that made the volume unavailable to the OS (for writing, anyway: users could still read, but could make no changes). The Finder would bring up an error -50 any time someone tried writing to the volume.


I switched to using unison instead of rsync and haven't had the problem since. It's slower, but works like a champ.

http://www.cis.upenn.edu/~bcpierce/unison/

Marcus




_______________________________________________
Do not post admin requests to the list. They will be ignored.
Macos-x-server mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/macos-x-server/shoop% 40iwiring.net


This email sent to email@hidden


Have either of you bothered to turn off Spotlight when doing high-I/O operations on your volumes?



-dhan


Yes, I made sure that Spotlight was turned off when doing my troubleshooting. And by "off" I mean that I disabled indexing for the volume by doing "mdutil -i off <volume_name>". If that is incorrect, please correct me. I would be more than happy to re-test my syncing with rsync to see if the I/O errors come back.


Going back through my troubleshooting notes, I see that the I/O errors only happened when I used the -E switch with rsync. When I removed the -E switch, the I/O errors stopped. Unfortunately, the files I am syncing absolutely require their resource forks. ACLs are not enabled on either volume being synced. Both volumes are HFS+ (Journaled). I noticed that rsync with -E copies the resource fork for every file, whether the file changed or not (in my case that meant ~150,000 files copied on each sync). Unison also copies the resource fork, but seemingly not in the I/O-pummeling manner that rsync does :)
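
To repeat the with/without -E comparison, a small helper that only prints the two command lines keeps the A/B test explicit. This is a sketch under assumptions: the function name sync_cmd and the -a flag are illustrative, not taken from the actual cron job. Note also that -E copies extended attributes and resource forks only in the Apple-patched rsync bundled with Mac OS X 10.4; in upstream rsync 3.x, -E means --executability.

```shell
#!/bin/sh
# Print (do not run) an rsync command line with or without -E, for A/B testing.
# Assumption: -a stands in for whatever base flags the real job uses.
# Usage: sync_cmd forks|noforks SRC DST
sync_cmd() {
    mode=$1 src=$2 dst=$3
    case $mode in
        forks)   flags="-aE" ;;   # Apple rsync on 10.4: also copy resource forks
        noforks) flags="-a" ;;    # data forks only
        *) echo "usage: sync_cmd forks|noforks SRC DST" >&2; return 2 ;;
    esac
    # Trailing slashes make rsync copy directory contents rather than the directory itself.
    printf 'rsync %s "%s/" "%s/"\n' "$flags" "$src" "$dst"
}
```

Running the two printed commands one after the other against the same source, while watching system.log, should show whether the I/O errors follow the -E switch.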

Marcus


References: 
 >Software RAID0 IO error (From: "Flynn, Daniel" <email@hidden>)
 >Re: Software RAID0 IO error (From: Marcus Lingl <email@hidden>)
 >Re: Software RAID0 IO error (From: Dan Shoop <email@hidden>)




Copyright © 2007 Apple Inc. All rights reserved.