Mailing Lists: Apple Mailing Lists


Re: Directories, parallel jobs, and XSan?




Working directories contain almost nothing: a job script and a few input and output files. These are onesie-twosie jobs started through a web portal, and they are the only thing running on the cluster.


The SAN traffic is on a different network, served by a different switch, from the MPI traffic.

If the MDC was unable to satisfy requests, was losing contact with clients, or similar problems, would that generate traffic in any of the log files? Is there a "verbose" mode to get some info on this sort of thing? I hesitate to just blame XSan or wave my hands and say "network problems / MDC overloaded / throw hardware at it and hope it goes away".

There could be file concurrency problems. In your experience, would those be intermittent, like these?

I really appreciate all your advice on this.

-Chris Dwan

On Feb 6, 2006, at 1:07 PM, Patrick Gavin wrote:

Are the working directories filled with a lot of smaller fasta files?

I'm thinking that it is a performance issue with the MDC.

-P

On Feb 6, 2006, at 8:16 AM, Christopher Dwan wrote:


Is the working directory on san disk mounted via NFS?

Yup. Both the working directory and the home directories live on a SAN which is mounted on the compute nodes via NFS.


Linux clients?

All of these systems are OS X, Tiger.

-Chris Dwan

On Feb 6, 2006, at 7:25 AM, Christopher Dwan wrote:


First, thank you very much to the community for all the helpful information on Friday.


I'm now encountering an intermittent error with one of our applications (mpiblast). It's integrated with the cluster scheduler (SGE), and all of the directories involved are XSan volumes re-exported via NFS to the compute nodes.

Sometimes, jobs will fail because they cannot find their startup directory:

Can't start from current directory: No such file or directory
sh: -c: line 1: unexpected EOF while looking for matching `''
sh: -c: line 4: syntax error: unexpected end of file

This persists even if I insert "sleep" or "while (!(-e /the/appropriate/directory)) {sleep;}" in my submission script. In fact, I can pause the job and log in to check whether the directory exists and is mounted on the compute node, and it *is*.
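For anyone wanting to try a bounded version of that wait, here is a minimal sketch in plain sh. It retries the actual "cd" into the working directory by absolute path rather than a separate existence test, and gives up after a timeout instead of looping forever. The function name enter_workdir and the 30 second default are made up for illustration; nothing here is specific to SGE, mpiblast, or Xsan, and it may well not cure the underlying problem.

```sh
#!/bin/sh
# Hypothetical helper (name and defaults are assumptions, not from the thread).
# Usage: enter_workdir /absolute/path [timeout_seconds]
# Retries "cd" once per second until it succeeds or the timeout expires.
enter_workdir() {
    ew_dir=$1
    ew_timeout=${2:-30}
    ew_waited=0
    # Retry the operation that actually fails, not a separate -e/-d test.
    until cd "$ew_dir" 2>/dev/null; do
        if [ "$ew_waited" -ge "$ew_timeout" ]; then
            echo "enter_workdir: $ew_dir not reachable after ${ew_timeout}s on $(hostname)" >&2
            return 1
        fi
        sleep 1
        ew_waited=$((ew_waited + 1))
    done
}
```

In a job script this would be called near the top, for example "enter_workdir /the/appropriate/directory 60 || exit 1", so that a node which never sees the directory fails with a clear message naming the host.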

This is, however, intermittent. Sometimes jobs work fine. Occasionally, I will have a job work, but I get these in the STDERR:

shell-init: could not get current directory: getcwd: cannot access parent directories: No such file or directory.

Other jobs work fine. It's only when a single parallel job tries to start on many of the cluster nodes at the same time that it *sometimes* can't find its startup directory (or the home directory of the submitting user).
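Since the failure only shows up when many nodes start at once, one way to narrow it down might be to have every node record what it sees at startup. The sketch below prints one line per call with a UTC timestamp, the hostname, the path, and whether the directory could be listed. The name probe_dir and the output format are assumptions for illustration only.

```sh
#!/bin/sh
# Hypothetical probe (name and output format are assumptions).
# Usage: probe_dir /absolute/path
# Prints: <UTC timestamp> <hostname> <path> <ok|MISSING>
probe_dir() {
    pd_dir=$1
    # Require both that the path is a directory and that it can be listed.
    if [ -d "$pd_dir" ] && ls "$pd_dir" >/dev/null 2>&1; then
        pd_status=ok
    else
        pd_status=MISSING
    fi
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(hostname) $pd_dir $pd_status"
}
```

Appending the output to a file on node-local disk (rather than on the shared volume under suspicion) from the top of each job script would make it possible to see afterwards whether the MISSING lines cluster on particular nodes or at particular moments.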

Any advice or insight would be appreciated.

-Chris Dwan
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Xsan-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/xsan-users/wezelboy%40cse.ucsc.edu


This email sent to email@hidden



