Okay, first a warning: this is going to be a long post!
Overview: When my RAID is connected through my Fibre Channel switch, it
randomly dismounts.
Specifics:
Xserve G5 (10.4.7) with the LSI Dual-Channel 2Gb Fibre Channel PCI-X
HBA & Fibre Channel Utility 2.0
Xserve RAID - Each side with its own RAID 5, hot spare - no masking &
latest firmware
Exabyte Magnum 224 FC
QLogic SANbox 5200 Series (12 port) - Latest firmware (configured with
two zones (port not WWN); Zone 1: Port 1 HBA, both XRAID | Zone 2:
Port 0 HBA, Exabyte)
BRU
Background: This used to ALL work! Recently we began to experience
drastic slowdowns when writing to the RAID. Common sense not being at
work, free space, which turned out to be the problem, was the last
thing I checked. Before that I tried swapping cables, replacing the
RAID controller, redoing the zoning, and connecting straight to the
host. Obviously none of that was the issue. *Note: at this point I am
using an identical Fibre Channel card from my G4 in the Xserve; a
careless maneuver broke the Apple-installed card while it was lying on
a table.*
When I got things working (made room on the volume) I happened to be
in a configuration that circumvented the switch: RAID directly into
the host. Things, as I said, worked fine. The next night I took the
systems down to put everything on the switch again so a backup could
be run. This is when the current behavior began. Randomly and without
any recognizable pattern, both RAID volumes would dismount from the
host. RAID Admin shows the volumes as being in order and all lights
are green.
This time I feared my switch had gone bad, and after time spent with
QLogic (and zillions of different configurations) a new switch was
sent out. In the meantime I went back to a straight connection, and
things worked fine. The new switch arrived, was configured, and the
problem immediately persisted. QLogic/Apple suggested the midplane on
the RAID was bad (even though it works straight through). I happened
to have a brand new in-box XRAID (with current firmware), so I set
that up and put in the drives from the old RAID, and the problem
persisted again when plugged into the switch. Direct to host, things
work fine.
So I read some documentation on Apple and Fibre Channel and made a few
changes to the HBA port and switch port settings, which led to the
following scenario: the volumes stayed mounted through an entire work
day, but a short way into a backup they unmounted.
Here is some more specific info about my exact setup as it last failed...
ZONE 1 (zoned by port not WWN)
Switch Ports 0, 1, 2
HBA Port 1 into Switch Port 0
Raid into Switch Ports 1, 2
All ports are identifying themselves to switch as switch topology
HBA port configured as Topology: Point-to-Point, Speed: 2Gb/s
SW Port 0 configured as State: Online, Speed: auto, Type: Detect, I/O
Stream Guard: Enable, Device Scan: Disable
SW Ports 1, 2 configured as State: Online, Speed: auto, Type: Detect, I/O
Stream Guard: Auto, Device Scan: Enable
ZONE 2 (zoned by port not WWN)
Switch Ports 8, 9
HBA Port 0 into Switch Port 9
Exabyte into Switch Port 8
Port 8 identifying as Loop topology, 9 as switch
HBA port configured as Topology: Auto, Speed: Auto
SW Port 8 configured as State: Online, Speed: auto, Type: Detect, I/O
Stream Guard: Auto, Device Scan: Enable
SW Port 9 configured as State: Online, Speed: auto, Type: Detect, I/O
Stream Guard: Enable, Device Scan: Disable
I realize that some of these settings, particularly those on the HBA,
differ between ports, but we have tried virtually every combination
(I think).
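To make the port-to-port differences easier to see at a glance, here is a rough Python sketch that holds the settings listed above in dictionaries and prints only the ones that differ. The dictionary layout and the helper name `differing_settings` are made up for illustration; the values are just the ones from the two zone descriptions.

```python
# Settings transcribed from the zone descriptions above. The dictionary
# layout and the helper name are illustrative assumptions, nothing official.
hba_ports = {
    "HBA Port 1 (Zone 1, RAID)": {"Topology": "Point-to-Point", "Speed": "2Gb/s"},
    "HBA Port 0 (Zone 2, Exabyte)": {"Topology": "Auto", "Speed": "Auto"},
}

# Switch ports facing the host (0 and 9) and facing devices (1, 2 and 8).
host_facing = {"State": "Online", "Speed": "auto", "Type": "Detect",
               "I/O Stream Guard": "Enable", "Device Scan": "Disable"}
device_facing = {"State": "Online", "Speed": "auto", "Type": "Detect",
                 "I/O Stream Guard": "Auto", "Device Scan": "Enable"}
switch_ports = {
    0: dict(host_facing),
    1: dict(device_facing),
    2: dict(device_facing),
    8: dict(device_facing),
    9: dict(host_facing),
}


def differing_settings(a, b):
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}


hba_diff = differing_settings(*hba_ports.values())
host_side_diff = differing_settings(switch_ports[0], switch_ports[9])
device_side_diff = differing_settings(switch_ports[1], switch_ports[8])

print("HBA ports differ in:", hba_diff)
print("Host-facing switch ports 0 and 9 differ in:", host_side_diff)
print("Device-facing switch ports 1 and 8 differ in:", device_side_diff)
```

As listed, the only mismatch between the two zones is on the HBA side (topology and speed); the switch ports facing the host match each other, as do the ones facing the devices.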
Around the time of the backup job system.log on the host has these entries:
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 5 (External
Bus Reset) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: External Bus Reset for SCSI
Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 8 (Loop
State Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Loop Initialization Packet
for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 9 (Logout)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is active for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 6 (Rescan)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 5 (External
Bus Reset) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: External Bus Reset for SCSI
Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 8 (Loop
State Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Loop Initialization Packet
for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 9 (Logout)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is active for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 6 (Rescan)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 5 (External
Bus Reset) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: External Bus Reset for SCSI
Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 8 (Loop
State Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Loop Initialization Packet
for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 9 (Logout)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is active for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 6 (Rescan)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 5 (External
Bus Reset) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: External Bus Reset for SCSI
Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 8 (Loop
State Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Loop Initialization Packet
for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 9 (Logout)
for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is active for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionMPT: Notification = 6 (Rescan)
for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 5 (External
Bus Reset) for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionMPT: External Bus Reset for SCSI
Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 8 (Loop
State Change) for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionFC: Loop Initialization Packet
for SCSI Domain = 3.
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 9 (Logout)
for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionFC: Link is active for SCSI Domain = 3.
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 6 (Rescan)
for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: disk4s3: I/O error.
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 5 (External
Bus Reset) for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionMPT: External Bus Reset for SCSI
Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionMPT: Notification = 7 (Link
Status Change) for SCSI Domain = 3
Mar 7 17:55:42 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:50 files kernel[0]: s3: I/O error.
.
.
.
=======
Then followed by a ton of various I/O errors
=======
Mar 7 17:56:42 files kernel[0]: disk4s3: I/O error.
Mar 7 17:56:42 files kernel[0]: disk4s3: I/O error.
Mar 7 17:56:42 files kernel[0]: disk4s3: I/O error.
Mar 7 17:56:42 files kernel[0]: FusionMPT: Notification = 6 (Rescan)
for SCSI Domain = 3
Mar 7 17:56:42 files kernel[0]: disk4s3: I/O error.
Mar 7 17:56:42 files kernel[0]: disk4s3: I/O error.
.
.
.
=======
Finally by media not present errors which end the log
=======
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk5s3: media is not present.
Mar 7 17:56:42 files kernel[0]: jnl: do_jnl_io: strategy err 0x6
Mar 7 17:56:42 files kernel[0]: jnl: end_transaction: only wrote 0 of
8192 bytes to the journal!
Mar 7 17:56:42 files kernel[0]: jnl: close: journal 0x4c21bb4, is
invalid. aborting outstanding transactions
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:42 files kernel[0]: disk4s3: media is not present.
Mar 7 17:56:45 files kernel[0]: jnl: close: journal 0x4c21c98, is
invalid. aborting outstanding transactions
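For anyone who would rather count the link flaps than read through the whole excerpt, here is a minimal Python sketch that tallies the FusionFC "Link is down" / "Link is active" lines per timestamp. The function name `tally_link_events` and the regular expression are assumptions based only on the log lines shown above, not on any documented log format.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt above (an assumption, not a documented
# format): "<Mon> <day> <hh:mm:ss> <host> kernel[N]: FusionFC: Link is <state>"
LINK_EVENT = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel\[\d+\]:\s+"
    r"FusionFC: Link is (down|active)"
)


def tally_link_events(log_text):
    """Count link down/active events, keyed by (timestamp, state)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINK_EVENT.match(line.strip())
        if match:
            timestamp, state = match.groups()
            counts[(timestamp, state)] += 1
    return counts


# Three lines copied from the log above, used as a small sample.
sample = """\
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
Mar 7 17:55:41 files kernel[0]: FusionFC: Link is active for SCSI Domain = 3.
Mar 7 17:55:42 files kernel[0]: FusionFC: Link is down for SCSI Domain = 3.
"""

sample_counts = tally_link_events(sample)
for (timestamp, state), n in sorted(sample_counts.items()):
    print(f"{timestamp}  link {state}: {n}")
```

To run it against a real log, pass the contents of system.log to `tally_link_events` instead of the sample. Several down/active pairs inside the same second, as in the excerpt above, would point at the link itself bouncing rather than at a single clean disconnect.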
And that, folks, is it. I'm up for any and all suggestions, questions,
comments, etc. I'm absolutely desperate here!
Please, please, does anyone have any idea what's going on here?
Thanks,
James Nierodzik