One other thing to look at carefully is which network carries the NFS
traffic and which carries the Xsan metadata. If they're both on the same
network, then as the NFS load increases you'll lose connections from the
NFS servers to the Xsan MDC. This will cause a failover and other
problems.
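A quick sanity check for this, as a minimal sketch: compare the subnet used for NFS traffic against the subnet used for Xsan metadata and flag any overlap. The subnet values and the function name below are made up for illustration; substitute whatever your interfaces are actually configured with.

```python
import ipaddress


def networks_overlap(nfs_cidr, metadata_cidr):
    """Return True if the NFS subnet and the metadata subnet share addresses."""
    # strict=False lets you pass a host address with a prefix, e.g. 10.0.1.5/24
    nfs_net = ipaddress.ip_network(nfs_cidr, strict=False)
    meta_net = ipaddress.ip_network(metadata_cidr, strict=False)
    return nfs_net.overlaps(meta_net)


if __name__ == "__main__":
    # Hypothetical subnets, not taken from anyone's real configuration.
    pairs = [
        ("10.0.1.0/24", "10.0.2.0/24"),    # separate NFS and metadata networks
        ("10.0.1.0/24", "10.0.1.128/25"),  # metadata sits inside the NFS subnet
    ]
    for nfs, meta in pairs:
        status = "SHARED" if networks_overlap(nfs, meta) else "separate"
        print(f"NFS {nfs} / metadata {meta}: {status}")
```

This only compares address ranges. Two distinct subnets can still share the same switch or physical link, so treat a "separate" result as necessary but not sufficient.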
The other thing to look at is how the files are being accessed. Picture
this:

  NFS client 1 using NFS server A
  NFS client 2 using NFS server B

If NFS client 1 and NFS client 2 access the same file while one of them
is writing, you will most definitely lower performance. All NFS clients
that access the same files should go through the same NFS server.
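One way to enforce that rule, sketched here under assumptions: group the clients by the set of files they share, then pick the NFS server from the group name rather than from the client name, so every client in a group lands on the same server. The group names, client names, and server names are all hypothetical, and the CRC-based choice is just one simple deterministic option, not anything Xsan or NFS provides.

```python
import zlib


def pick_server(shared_set, servers):
    """Pick one server for a shared file set, the same way every time."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(shared_set.encode("utf-8")) % len(servers)
    return servers[index]


def assign_clients(client_sets, servers):
    """Map each client to a server based on the file set it works on."""
    return {
        client: pick_server(shared_set, servers)
        for client, shared_set in client_sets.items()
    }


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    servers = ["nfs-server-a", "nfs-server-b"]
    client_sets = {
        "client1": "project-x",
        "client2": "project-x",  # shares files with client1
        "client3": "project-y",
    }
    for client, server in sorted(assign_clients(client_sets, servers).items()):
        print(f"{client} -> {server}")
```

Clients that share a file set always get the same server. The trade-off is that load may be uneven if one group is much busier than the others.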
-----Original Message-----
From: xsan-users-bounces+ghheinle=email@hidden
[mailto:xsan-users-bounces+ghheinle=email@hidden] On Behalf Of Jason Thorpe
Sent: Friday, February 03, 2006 4:51 PM
To: Christopher Dwan
Cc: email@hidden
Subject: Re: Xsan + NFS + grid engine
On Feb 3, 2006, at 1:13 PM, Christopher Dwan wrote:
> Originally, we configured the san with one portal machine to be the
> metadata controller, one for failover, and the third as a client.
> We then re-exported the san volume via NFS from all three portals
> for a client:server ratio of 13:1 on the NFS side.
You should not be running other services on your Xsan MDCs (primary
or backup). Something like "OD master" or "OD slave" or DNS server
is probably OK in a pinch, but "NFS server", "AFP server", or "SMB
server" definitely is not. Every one of those CPU cycles you steal
from the "fsm" processes on the MDCs is going to negatively impact
your SAN's overall performance.
> This configuration seemed very unresponsive, and it would fall over
> (all three portals reboot or hang) if I loaded the cluster with
> enough work to get a bunch of reads and writes going from all the
> compute nodes.
You didn't mention what version of Xsan you are running... Have you
contacted AppleCare about this problem?
> First we totally isolated the MDC machine (turned off most of the
> other system services) and reconfigured NFS to only serve from the
> two remaining portal machines (one of which was still configured as
> a failover MDC). This bumped the client:server ratio for NFS up to
> around 20:1. This performed better, though I could still knock the
> three portal machines over.
That's because the "backup MDC" can still potentially host the SAN
volume, and anytime it is doing so you run into the same problem as
with the original configuration.
> In this case, I noticed that the failover MDC would occasionally
> reboot. The logs said something about timeouts communicating with
> the MDC. On a hunch, I decided to remove it as a failover. This
> performed better still, but it still is not resilient to high
> loads. High loads, in this case, are defined as "all the compute
> nodes running jobs that involve reading and writing from their NFS
> mounted san volumes."
At this point, what is failing? The NFS server or the MDCs? I
suggest you contact AppleCare about this issue.
> When the systems are loaded, "top" shows me that the "fsm" process
> on the MDC is using ">>" threads, which means "more than 100".
It is normal for the "fsm" processes to have > 100 threads.
-- thorpej
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Xsan-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/xsan-users/email@hidden
This email sent to email@hidden