It depends quite a bit on the application and data flow. Re-exporting
Xsan via NFS is great for input data. If all compute nodes are reading,
you can increase the number of NFS servers and scale performance.
When it comes to writes, it's better to have each compute node write its
own output file, or at the very least, have machines that write to the
same output file do it through the same NFS server.
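
As a rough illustration of the "one output file per compute node" idea,
here is a minimal Python sketch. The function name, the directory
layout, and the file-name pattern are all hypothetical and not taken
from any particular cluster setup; the only point is that no two nodes
(or processes) ever open the same file for writing.

```python
import os
import posixpath
import socket


def per_node_output_path(output_dir, job_name, hostname=None, pid=None):
    """Build an output path that is unique to this node and process.

    hostname and pid default to the current machine and process. They are
    parameters only so the function can be tried without a cluster.
    The naming pattern is an assumption made for this example.
    """
    hostname = hostname or socket.gethostname()
    pid = os.getpid() if pid is None else pid
    # posixpath is used because NFS paths are POSIX-style paths.
    return posixpath.join(output_dir, f"{job_name}.{hostname}.{pid}.out")


if __name__ == "__main__":
    # Each node ends up with its own file under the shared directory.
    print(per_node_output_path("/nfs/out", "myjob"))
```

The per-node files can be merged after the run by a single machine, so
the merge also goes through just one NFS server.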
It may even be possible to have the compute nodes do reads through many
servers, but all use a common NFS server for writes.
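
One way to picture the "read through many, write through one" layout is
the sketch below. It is only an illustration under assumptions: the
server names are made up, and spreading clients by a CRC32 of the
hostname is just one simple, deterministic way to divide the read load.

```python
import zlib


def choose_servers(hostname, read_servers, write_server):
    """Pick NFS servers for one compute node.

    Reads are spread across read_servers by hashing the node's hostname,
    so a given node always lands on the same read server. Writes from
    every node go through the single common write_server.
    All names here are hypothetical.
    """
    # crc32 is stable across processes and machines, unlike built-in hash().
    index = zlib.crc32(hostname.encode("utf-8")) % len(read_servers)
    return {"read": read_servers[index], "write": write_server}


if __name__ == "__main__":
    portals = ["portal-a", "portal-b", "portal-c"]
    for node in ("node01", "node02", "node03", "node04"):
        print(node, choose_servers(node, portals, "portal-a"))
```

Whether the single write server becomes the bottleneck depends on the
read/write mix, which is why this has to be measured per application.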
If your application has to write to a file on one or more nodes, and
read the same file back in, then it sounds like you're using files for
communications. Use an appropriate communications model for the compute
cluster.
This is something that has to be tweaked for each
application/environment. It may be the case that the performance
penalty for writing through multiple NFS servers is outweighed by the
gain of reading through many.
The bottom line is that there is no one ideal setup. I've seen some
environments get optimal performance with few NFS servers and a high
NFS client-to-server ratio, and other environments do so with just the
opposite.
-----Original Message-----
From: xsan-users-bounces+ghheinle=email@hidden
[mailto:xsan-users-bounces+ghheinle=email@hidden] On Behalf Of
Christopher Dwan
Sent: Friday, February 03, 2006 5:35 PM
To: email@hidden
Subject: Re: Xsan + NFS + grid engine
> One other thing you should carefully look at is which network is being
> used for NFS traffic and which is being used for the Xsan metadata.
> If they're both on the same network, as the NFS load increases you'll
> lose connections from the NFS servers to the Xsan MDC. This will cause
> a failover and other problems.
Xsan MDC traffic is on a different network which is just for the
three portal machines along with an uplink to the rest of the WAN.
NFS mounts are all on a private cluster network.
> The other thing to look at is how the files are being accessed.
> Picture this:
> nfs client 1 using nfs server A
> nfs client 2 using nfs server B
> If nfs client 1 and nfs client 2 access the same file with one of them
> writing, you most definitely will lower performance. All nfs clients
> that are accessing the same files should all go through the same nfs
> server.
My earlier comments about "all Xsan machines crashing" outweighing
"performance" apply here as well.
I'm confused now: The reasons we built this setup were to:
* lower the client:server ratio on NFS (increasing reliability)
* be able to scale I/O by adding portal san machines as needed.
Reading into your comment, it sounds like this isn't the way you
would set up such a system. Do you have a better architecture than
re-sharing this way?
An update: I was able to increase stability by splitting up the
stderr and stdout streams from my various jobs and lowering
contention on any single file. The portal machines still crash if I
load the cluster heavily, but it seems to have to do with file
contention while there are a large number of writes going on.
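
The stream-splitting described above might look something like the
following sketch, which builds a separate stdout and stderr file for
every job (and array task) so that no two jobs append to the same file.
The function name and path layout are hypothetical; paths like these
would then be handed to the scheduler's stdout and stderr options at
submit time.

```python
import posixpath


def job_log_paths(log_dir, job_name, job_id, task_id=None):
    """Return (stdout_path, stderr_path) unique to one job or array task.

    job_id and task_id are whatever identifiers the scheduler assigns.
    The naming pattern is an assumption made for this example.
    """
    suffix = str(job_id) if task_id is None else f"{job_id}.{task_id}"
    base = posixpath.join(log_dir, f"{job_name}.{suffix}")
    # Separate files for the two streams, so stdout and stderr writers
    # never contend on a single file.
    return base + ".out", base + ".err"


if __name__ == "__main__":
    out_path, err_path = job_log_paths("/nfs/logs", "align", 4711, task_id=3)
    print(out_path)
    print(err_path)
```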
Assuming that my goal is to build a stable, scalable san for use in
cluster computing (all the nodes pointing at a common pool of data),
is there a better way to do it than this NFS re-sharing?
-Chris Dwan
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Xsan-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/xsan-users/email@hidden
This email sent to email@hidden