Mailing Lists: Apple Mailing Lists


Xsan + NFS + grid engine




Hello all,

I'm setting up a mid-sized (40 node) cluster in the life sciences. I'm encountering all sorts of reboots, freezes and hangs, which seem to be related to the Xsan configuration, and I'm looking for a bit of advice:

We've got three portal Xserves, one Xraid, and Xsan. We're using Sun Grid Engine for the queuing system.

Originally, we configured the SAN with one portal machine as the metadata controller (MDC), one as the failover MDC, and the third as a client. We then re-exported the SAN volume via NFS from all three portals, for a client:server ratio of about 13:1 on the NFS side.

This configuration seemed very unresponsive, and it would fall over (all three portals reboot or hang) if I loaded the cluster with enough work to get a bunch of reads and writes going from all the compute nodes.

First we totally isolated the MDC machine (turned off most of the other system services) and reconfigured NFS to only serve from the two remaining portal machines (one of which was still configured as a failover MDC). This bumped the client:server ratio for NFS up to around 20:1. This performed better, though I could still knock the three portal machines over.
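For reference, the ratios quoted above work out as follows. This is only a rough sketch of the arithmetic: the node count of 40 is the cluster size given above, and the helper name clients_per_server is made up for illustration.

```python
def clients_per_server(compute_nodes: int, nfs_servers: int) -> float:
    """Average number of NFS clients each serving portal has to handle."""
    if nfs_servers <= 0:
        raise ValueError("need at least one NFS server")
    return compute_nodes / nfs_servers


COMPUTE_NODES = 40  # cluster size from the description above

# Original layout: all three portals re-export the SAN volume over NFS.
three_portals = clients_per_server(COMPUTE_NODES, 3)

# After isolating the MDC: only two portals serve NFS.
two_portals = clients_per_server(COMPUTE_NODES, 2)

print(f"3 NFS servers: about {three_portals:.0f}:1")
print(f"2 NFS servers: about {two_portals:.0f}:1")
```

So taking the MDC out of NFS duty raises the load on each remaining NFS server by half, even though the overall setup behaved better.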

In this configuration, I noticed that the failover MDC would occasionally reboot. The logs said something about timeouts communicating with the MDC. On a hunch, I removed it as a failover. This performed better still, but it is still not resilient to high loads. High loads, in this case, are defined as "all the compute nodes running jobs that involve reading and writing from their NFS-mounted SAN volumes."

When the systems are loaded, "top" shows me that the "fsm" process on the MDC is using ">>" threads, which means "more than 100".

Is there something else I should be trying here? Any advice is appreciated.

-Chris Dwan

Xsan-Users mailing list      (email@hidden)

Copyright © 2007 Apple Inc. All rights reserved.