
Re: Xsan + NFS + grid engine




On Feb 3, 2006, at 1:13 PM, Christopher Dwan wrote:

> Originally, we configured the SAN with one portal machine as the metadata controller, one for failover, and the third as a client. We then re-exported the SAN volume via NFS from all three portals, for a client:server ratio of 13:1 on the NFS side.
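The ratios in this post can be cross-checked with simple arithmetic. This sketch assumes that "client:server ratio" means NFS client nodes per exporting portal and that clients are spread evenly across portals; neither assumption is stated in the post. Under those assumptions, 13:1 across three portals implies about 39 compute nodes. Spreading those same nodes over two portals gives 19.5:1, which is consistent with the "around 20:1" reported after the reconfiguration. The `clients_per_server` helper is hypothetical.

```python
# Assumption: "client:server ratio" = NFS client nodes per NFS-exporting portal.
SERVERS_BEFORE = 3   # all three portals exporting NFS (from the post)
RATIO_BEFORE = 13    # 13:1 (from the post)

# Inferred node count; the post does not state it directly.
clients = SERVERS_BEFORE * RATIO_BEFORE


def clients_per_server(num_clients: int, num_servers: int) -> float:
    """NFS clients handled by each exporting server, assuming an even spread."""
    return num_clients / num_servers


print(clients)                         # 39
print(clients_per_server(clients, 2))  # 19.5
```

Each portal removed from NFS duty pushes roughly half again as many clients onto each remaining server, which is one reason the later configurations were still easy to overload.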

You should not be running other services on your Xsan MDCs (primary or backup). Something like "OD master" or "OD slave" or DNS server is probably OK in a pinch, but "NFS server", "AFP server", or "SMB server" definitely is not. Every one of those CPU cycles you steal from the "fsm" processes on the MDCs is going to negatively impact your SAN's overall performance.


> This configuration seemed very unresponsive, and it would fall over (all three portals would reboot or hang) if I loaded the cluster with enough work to get a lot of reads and writes going from all the compute nodes.

You didn't mention what version of Xsan you are running... Have you contacted AppleCare about this problem?


> First, we totally isolated the MDC machine (turned off most of the other system services) and reconfigured NFS to serve only from the two remaining portal machines (one of which was still configured as a failover MDC). This bumped the client:server ratio for NFS up to around 20:1. This performed better, though I could still knock the three portal machines over.

That's because the "backup MDC" can still potentially host the SAN volume, and any time it does, you run into the same problem as with the original configuration.
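The rule of thumb in these replies (no NFS, AFP, or SMB service on a primary or backup MDC) can be written as a small layout check. This is a hypothetical sketch: the role names and the `conflicting_hosts` helper are invented for illustration and are not part of Xsan or any Apple tool. The two layouts are a reading of the first two configurations described in the quoted post.

```python
from __future__ import annotations

# Hypothetical role labels; they only encode the advice from this thread.
MDC_ROLES = {"mdc", "backup_mdc"}            # a backup MDC can host the volume too
FILE_SHARING_ROLES = {"nfs", "afp", "smb"}   # services that compete with fsm for CPU


def conflicting_hosts(layout: dict[str, set[str]]) -> list[str]:
    """Return hosts that are primary or backup MDCs and also serve files."""
    return sorted(
        host
        for host, roles in layout.items()
        if roles & MDC_ROLES and roles & FILE_SHARING_ROLES
    )


# First layout: every portal re-exports the SAN volume via NFS.
original = {
    "portal1": {"mdc", "nfs"},
    "portal2": {"backup_mdc", "nfs"},
    "portal3": {"xsan_client", "nfs"},
}

# Second layout: the primary MDC is isolated, but the backup MDC still serves NFS.
second = {
    "portal1": {"mdc"},
    "portal2": {"backup_mdc", "nfs"},
    "portal3": {"xsan_client", "nfs"},
}

print(conflicting_hosts(original))  # ['portal1', 'portal2']
print(conflicting_hosts(second))    # ['portal2']
```

The second layout still flags `portal2`, which matches the point that a backup MDC serving NFS reproduces the original problem whenever it takes over the volume.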


> In this case, I noticed that the failover MDC would occasionally reboot. The logs said something about timeouts communicating with the MDC. On a hunch, I decided to remove it as a failover. This performed better still, but it is still not resilient to high loads. High loads, in this case, are defined as "all the compute nodes running jobs that involve reading and writing from their NFS-mounted SAN volumes."

At this point, what is failing? The NFS server or the MDCs? I suggest you contact AppleCare about this issue.


> When the systems are loaded, "top" shows me that the "fsm" process on the MDC is using ">>" threads, which means "more than 100".

It is normal for the "fsm" processes to have more than 100 threads.

-- thorpej

_______________________________________________
Xsan-Users mailing list      (email@hidden)
References: 
 >Xsan + NFS + grid engine (From: Christopher Dwan <email@hidden>)


