Mailing Lists: Apple Mailing Lists


Re: Xsan + NFS + grid engine




Agreed that open filehandles are one of the many roots of all evil. Also agreed that it's all about the writes.


The script I've been using to crash my Xsan environment does what you describe. Shared input datasets, as much as possible, are staged offline to the nodes. Output is written to the local disk on the nodes and is copied back to shared storage at job completion. The breakthrough this evening, which keeps the system from crashing quite so regularly, was to simply throw away stderr and stdout rather than trying to copy them back to shared storage.
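
For readers who want to try the same workaround, here is a minimal Python sketch of running a job with both streams discarded. It is an illustration only, not the script discussed in this thread, and `run_quiet` is a hypothetical helper name.

```python
import subprocess


def run_quiet(command):
    """Run `command` (an argument list) with stdout and stderr thrown away.

    Hypothetical helper: nothing the job prints is kept, so nothing has to
    be copied back to shared storage afterwards.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,  # discard normal output
        stderr=subprocess.DEVNULL,  # discard error output
    )
    return completed.returncode
```

The exit status is still returned, so a wrapper can tell a failed job from a successful one even though the output itself is gone.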

This same script is in production use on a fair number of clusters, many larger than this one. Most of those use a single NFS share from the internal disks in the portal node. I've overloaded NFS and caused the daemons to crash in some cases, but only very rarely did it force a server reboot.

In this case, when the system goes down (reboots), all three machines running Xsan go down with it.

Two more questions:
----------------------------
* Are there any guidelines or rules of thumb out there for how many concurrent reads / writes it takes to overwhelm a single MDC? I would happily sacrifice compute nodes to get the system to stay online ... but I want to push for one hardware upgrade (more fibre ports and cards) rather than sliding into the solution.


* Are there any options in Xsan that would, say, throttle performance rather than crashing all of my portal machines?

Thanks for all the help this evening.  Sorry to be so noisy on a Friday.

-Chris Dwan

On Feb 3, 2006, at 7:19 PM, Patrick Gavin wrote:

We have a similar setup, but on a larger scale.

If you are using NFS for I/O in a cluster environment, one of the key lessons you need to pound into your users' heads is that they should NEVER maintain an open NFS filehandle during a processing job for longer than is absolutely necessary. If they need to read a file, the node should copy it to local disk first. Output files should be written to local disk and then copied to the NFS server when the job is finished.

It is an extra step for your users (a wrapper script), but the payoff is huge.
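
A wrapper along these lines could look roughly like the Python sketch below. It is a sketch under assumptions, not anyone's production script: `run_staged`, the `in`/`out` directory layout, and the parameter names are all hypothetical, and the job is assumed to read from `in/` and write to `out/` relative to its working directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_staged(command, shared_inputs, shared_output_dir, scratch_root=None):
    """Run `command` against local copies of the inputs, then copy results back.

    All names are hypothetical. `command` is an argument list executed with
    a local scratch directory as its working directory.
    """
    shared_output_dir = Path(shared_output_dir)
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        scratch = Path(scratch)
        in_dir = scratch / "in"
        out_dir = scratch / "out"
        in_dir.mkdir()
        out_dir.mkdir()

        # Stage in: each shared file is open only for the duration of the copy.
        for src in shared_inputs:
            shutil.copy2(src, in_dir / Path(src).name)

        # The job itself touches only local disk.
        result = subprocess.run(command, cwd=scratch)

        # Stage out: copy results back in one pass after the job has finished.
        shared_output_dir.mkdir(parents=True, exist_ok=True)
        for produced in out_dir.iterdir():
            if produced.is_file():
                shutil.copy2(produced, shared_output_dir / produced.name)
        return result.returncode
```

The scratch directory is removed when the wrapper exits, so the only long-lived copies of the results are the ones on the shared storage.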

The problems you are experiencing are more likely to be NFS problems than Xsan problems, but I wouldn't call Xsan "rock solid" :-)

-P

On Feb 3, 2006, at 3:13 PM, Christopher Dwan wrote:


Okay. Thanks very much for this. It's food for thought.

Do "performance" and "system crash" overlap this much for other folks? I would love to get to a point where I have a performance problem rather than a stability problem.

-Chris Dwan

On Feb 3, 2006, at 6:04 PM, Glenn Heinle wrote:


It depends quite a bit on the application and data flow. Re-exporting
Xsan via NFS is great for input data. If all compute nodes are reading,
you can increase the number of NFS servers and scale performance.


When it comes to writes, it's better to have each compute node write its
own output file, or at the very least, have machines that write to the
same output file do it through the same NFS server.


It may even be possible to have the compute nodes do reads through many
servers, but all use a common NFS server for writes.
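
One way to picture that layout is the small Python sketch below. It is only an illustration: `plan_mounts` and the server names in the usage note are hypothetical, and hashing the hostname is just one simple way to spread clients over the read servers.

```python
import zlib


def plan_mounts(hostname, read_servers, write_server):
    """Return (read_server, write_server) for one compute node.

    Hypothetical helper: reads are spread across `read_servers` by a stable
    hash of the node's hostname, while every node gets the same write server.
    """
    if not read_servers:
        raise ValueError("at least one read server is required")
    # crc32 is stable across runs and machines, unlike Python's built-in hash().
    index = zlib.crc32(hostname.encode("utf-8")) % len(read_servers)
    return read_servers[index], write_server
```

For example, `plan_mounts("node01", ["nfs-a", "nfs-b", "nfs-c"], "nfs-w")` always picks the same read server for `node01` and always returns `"nfs-w"` for writes.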


If your application has to write to a file on one or more nodes, and
read the same file back in, then it sounds like you're using files for
communications. Use an appropriate communications model for the compute
cluster.


This is something that has to be tweaked for each
application/environment. It may be the case that the performance
penalty for writing through multiple NFS servers is outweighed by the
gain of reading through many.


The bottom line is that there is no one ideal setup.  I've seen some
environments get optimal performance with few NFS servers and a high
nfs-client to nfs-server ratio, and other environments be just the
opposite.



-----Original Message-----
From: xsan-users-bounces+ghheinle=email@hidden
[mailto:xsan-users-bounces+ghheinle=email@hidden] On Behalf Of Christopher Dwan
Sent: Friday, February 03, 2006 5:35 PM
To: email@hidden
Subject: Re: Xsan + NFS + grid engine



One other thing you should carefully look at is which network is being
used for NFS traffic and which is being used for the Xsan metadata.
If they're both on the same network, as the NFS load increases you'll
lose connections from the NFS servers to the Xsan MDC. This will cause a
failover and other problems.

Xsan MDC traffic is on a different network which is just for the three portal machines along with an uplink to the rest of the WAN. NFS mounts are all on a private cluster network.

The other thing to look at is how the files are being accessed. Picture
this:
	nfs client 1 using nfs server A
	nfs client 2 using nfs server B
If nfs client 1 and nfs client 2 access the same file with one of them
writing, you most definitely will lower performance. All nfs clients
that are accessing the same files should all go through the same nfs
server.
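
A rough Python sketch of that rule: choose the NFS server from the file's path rather than from the client, so that every client asking about the same file gets the same answer. `server_for_file` is a hypothetical helper, and it assumes every client is configured with the same server list in the same order.

```python
import zlib


def server_for_file(path, servers):
    """Map a file path to one NFS server, identically on every client.

    Hypothetical helper: because the choice depends only on the path and on
    the shared, identically ordered `servers` list, two clients touching the
    same file are always sent through the same server.
    """
    if not servers:
        raise ValueError("at least one server is required")
    index = zlib.crc32(path.encode("utf-8")) % len(servers)
    return servers[index]
```

Note that adding or removing a server changes the mapping for many paths, so in this simple form the server list would have to stay fixed while jobs are running.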

My earlier comments about "all Xsan machines crashing" outweighing "performance" apply here as well.

I'm confused now:  The reasons we built this setup were to:

* lower the client:server ratio on NFS (increasing reliability)
* be able to scale I/O by adding portal san machines as needed.

Reading into your comment, it sounds like this isn't the way you
would set up such a system.  Do you have a better architecture than
re-sharing this way?

An update:  I was able to increase stability by splitting up the
stderr and stdout streams from my various jobs and lowering
contention on any single file.  The portal machines still crash if I
load the cluster heavily, but it seems to have to do with file
contention while there are a large number of writes going on.
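
As a hedged illustration of what splitting the streams per job can look like, the Python sketch below gives every job its own stdout and stderr file so that no two jobs append to the same log. `run_with_private_logs` and the file naming are hypothetical. `JOB_ID` is the environment variable Grid Engine sets for a running job, and the process ID is used as a fallback when it is absent.

```python
import os
import subprocess
from pathlib import Path


def run_with_private_logs(command, log_dir, job_id=None):
    """Run `command` with stdout and stderr in files unique to this job.

    Hypothetical helper. Returns (exit_status, stdout_path, stderr_path).
    """
    # Prefer an explicit id, then the scheduler's JOB_ID, then our own PID.
    job_id = job_id or os.environ.get("JOB_ID", str(os.getpid()))
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / f"{job_id}.stdout"
    err_path = log_dir / f"{job_id}.stderr"
    # Each stream goes to its own file, so no file is shared between jobs.
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        status = subprocess.run(command, stdout=out, stderr=err).returncode
    return status, out_path, err_path
```

Pointing `log_dir` at local disk and copying the files back afterwards would combine this with the staging approach described elsewhere in the thread.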

Assuming that my goal is to build a stable, scalable san for use in
cluster computing (all the nodes pointing at a common pool of data),
is there a better way to do it than this NFS re-sharing?

-Chris Dwan
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Xsan-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/xsan-users/ghheinle%40yahoo.com


This email sent to email@hidden


References: 
 >RE: Xsan + NFS + grid engine (From: Glenn Heinle <email@hidden>)
 >Re: Xsan + NFS + grid engine (From: Christopher Dwan <email@hidden>)
 >Re: Xsan + NFS + grid engine (From: Patrick Gavin <email@hidden>)


