Ok, so I work in the financial industry, and part of my job is taking
all the member accounts from our website, http://www.marketocracy.com,
and analyzing their performance. I did this in Perl because it was the
easiest thing to get up and running, and the analysis program is
perfectly suited to a "duct-tape" language like Perl.
However, I like to tweak the parameters to this script occasionally,
which means running large matrices of parameter settings to see which
set of parameters works best. The faster I can do this, the more money
our customers make, and the more money we make.
XGrid is the natural way for me to do this, since I have 3 G4s and a G5
in my office right now.
Data in:
The script iterates over about 200MB of data files.
Data out:
The output data is about 2k of summarized data.
Terminology:
A "job" is a set of tasks that has to be done. In my case, it would be
the entire matrix I'm running.
A "task" is a useful subset of that job that can be done by one or more
CPUs. In my case, it would be one element of the matrix.
A "thread" is just a typical thread.
Current API:
Right now, XGrid does setup and teardown at the task level. That's not
very efficient for me: I would need to put all the data files onto a
file server somewhere that every machine could access. By adding a job
concept, I can get more efficiency out of XGrid without having to do any
external management. Also, rather than having a cluster at my disposal,
I'm more likely to talk my co-workers into installing the screen saver,
so setup has to be automatic.
What I would like XGrid to do (a push model):
    for each job:
        for each computer:
            setup job for computer
            for each processor:
                setup job for processor
        for each task:
            assign to computer or processor
        for each computer:
            for each processor:
                tear down job for processor
            tear down job for computer
In my case, the job setup for each computer would be just scp-ing the
files from one location to /tmp, while job teardown would be rm-ing
them.
Now, from XGrid's point of view, since machines can come and go
dynamically, I would expect it to do the setup lazily rather than all at
once, so it would actually call the setup code when allocating tasks:
    def allocate_task(job, task):
        if task.kind == per_processor:
            processor = pick_processor()
            computer = processor.computer()
            # set up the computer before any of its processors
            if not computer.job_is_setup(job):
                job.setup_computer(computer)
                computers_used.append(computer)
            if not processor.job_is_setup(job):
                job.setup_processor(processor)
                processors_used.append(processor)
        elif task.kind == per_computer:
            computer = pick_computer()
            if not computer.job_is_setup(job):
                job.setup_computer(computer)
                computers_used.append(computer)

    def job_finished(job):
        # tear down in the reverse order of setup: processors, then computers
        for processor in processors_used:
            job.teardown_processor(processor)
        for computer in computers_used:
            job.teardown_computer(computer)
That is, when it went to assign a task to a computer or processor, it
would first see if the job setup had been done, and if it hadn't, it
would perform it. When the job completes, any computer or processor
used would get told to clean up.
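As a rough, runnable sketch of the lazy setup described above (Python, with made-up class names; Job, Computer, and Processor here are illustrative stand-ins, not anything XGrid actually provides, and the hooks only record that they were called):

```python
class Job:
    """Hypothetical job with per-computer and per-processor hooks."""

    def __init__(self, name):
        self.name = name
        self.log = []               # records hook calls, for illustration only
        self.computers_used = []
        self.processors_used = []

    def setup_computer(self, computer):
        self.log.append(("setup_computer", computer.name))

    def setup_processor(self, processor):
        self.log.append(("setup_processor", processor.name))

    def teardown_processor(self, processor):
        self.log.append(("teardown_processor", processor.name))

    def teardown_computer(self, computer):
        self.log.append(("teardown_computer", computer.name))


class Processor:
    def __init__(self, name, computer):
        self.name = name
        self.computer = computer


class Computer:
    def __init__(self, name, n_processors):
        self.name = name
        self.processors = [
            Processor(f"{name}/cpu{i}", self) for i in range(n_processors)
        ]


def allocate_task(job, processor):
    """Run any missing setup for `job` just before a task is assigned."""
    computer = processor.computer
    if computer not in job.computers_used:
        job.setup_computer(computer)
        job.computers_used.append(computer)
    if processor not in job.processors_used:
        job.setup_processor(processor)
        job.processors_used.append(processor)


def job_finished(job):
    """Tear down processors first, then the computers they belong to."""
    for processor in job.processors_used:
        job.teardown_processor(processor)
    for computer in job.computers_used:
        job.teardown_computer(computer)


# Three tasks land on two processors of one machine; setup runs once each.
g5 = Computer("g5", 2)
job = Job("matrix")
for cpu in (g5.processors[0], g5.processors[1], g5.processors[0]):
    allocate_task(job, cpu)
job_finished(job)
```

In this sketch the third task reuses g5/cpu0, so no setup hook fires for it; a real setup_computer would do the scp and a real teardown_computer the rm.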
Alternative (a pull model):
The above assumed that the main XGrid controlling process would push all
the files to all the CPUs. An alternative might be something more along
these lines:
    xgrid_publish_file <job_id> my_data_file.dat
This command-line tool would make "my_data_file.dat" available from any
client machine, which basically means that if a client does:
    xgrid_retrieve_file <job_id> my_data_file.dat
Xgrid would first look to see if the file had already been copied. (That
is, it would cache it, which Xgrid doesn't do now.) If it has, it's
done. If it hasn't, Xgrid will retrieve the file to the local directory.
This would make things simpler than having to set up a file server.
Perhaps you could publish a whole directory of files.
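To make the publish/retrieve idea concrete, here is a minimal Python sketch that fakes both sides with local directories. Everything in it is an assumption for illustration: publish_file, retrieve_file, and cleanup_job are hypothetical names, and a plain file copy stands in for whatever network transfer Xgrid would really use.

```python
import shutil
import tempfile
from pathlib import Path


def publish_file(store, job_id, path):
    """Controller side: make a file available under a job id."""
    dest = store / job_id
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, dest / path.name)


def retrieve_file(store, cache, job_id, name):
    """Client side: return (local_path, copied), copying only on a cache miss."""
    local = cache / job_id / name
    if local.exists():
        return local, False          # already cached, nothing to do
    local.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(store / job_id / name, local)
    return local, True


def cleanup_job(cache, job_id):
    """Optional step when the job finishes: drop the cached files."""
    shutil.rmtree(cache / job_id, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store, cache = root / "store", root / "cache"
    data = root / "my_data_file.dat"
    data.write_text("sample data\n")

    publish_file(store, "job1", data)
    first = retrieve_file(store, cache, "job1", data.name)
    second = retrieve_file(store, cache, "job1", data.name)
    copied_flags = (first[1], second[1])   # first call copies, second is a hit

    cleanup_job(cache, "job1")
    cache_left = (cache / "job1").exists()
```

Keying the cache on the job id is what lets the cleanup step stay optional: files for a job that will run again can simply be left in place.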
The nice thing about this model is that all setup is lazy, because it
would be pretty trivial to have a preflight script for command-line
processes that just did a publish for all the data files on the
controller and a retrieve on all the clients. When Xgrid finished a job,
it would delete the files (optionally, in case those files are needed
for multiple jobs).
Except:
In my case, my Perl script isn't currently set up to split things into
tasks, because it takes a certain amount of time to load all the data
files. So what I would really need to do is:
1. Rewrite the script to run in a command-line fashion, so it would
   start up, then read commands from standard input until told to quit.
   I could do this pretty easily.
2. Have setup_job() launch that script as a background task.
3. Have task_execute() send that script a message.
4. Have the job teardown send a "quit" message.
So I would still need some kind of job API; the file publishing stuff
wouldn't quite do it.
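Step 1 above could look roughly like this. It is a Python sketch rather than the Perl script itself, and the "run" command and its arguments are invented for the example; the point is only the shape of the loop: load once, then answer one command per line until "quit".

```python
import io


def serve(commands, results, handlers):
    """Read one command per line until 'quit'; write one result per command."""
    for line in commands:
        parts = line.split()
        if not parts:
            continue                      # ignore blank lines
        name, args = parts[0], parts[1:]
        if name == "quit":
            break
        handler = handlers.get(name)
        if handler is None:
            results.write(f"error unknown command {name}\n")
        else:
            results.write(f"{handler(*args)}\n")
        results.flush()                   # let the caller see each answer promptly


# Hypothetical command: analyze one element of the parameter matrix.
handlers = {"run": lambda row, col: f"ran element {row},{col}"}

# In-memory streams stand in for the pipe between task_execute() and the script.
commands = io.StringIO("run 0 1\nrun 2 3\nquit\nrun 9 9\n")
out = io.StringIO()
serve(commands, out, handlers)
output = out.getvalue()
```

In real use the loop would be called with sys.stdin and sys.stdout after the data files have been loaded, so the expensive load happens once per job instead of once per task.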
Python Plug:
PyObjC, the transparent Python/Objective-C bridge, is just a great tool.
It would be wonderful if the XGrid controller could execute plugins
built with it directly. Same thing for CamelBones and Perl.
Discussion:
The intent of this long posting was just to get everyone thinking about
how we'd like XGrid to work in the future. Pretty much all the grid APIs
have some kind of job concept; it's kind of inherent in the problem.
Some of them (Condor, for instance) have an idea about how to distribute
files and such already built in. So basically, I'd like to see:
XGridProject --- a collection of jobs; has setup/teardown. Jobs are
                 done one at a time until completed.
XGridJob ------- a collection of tasks; also has setup/teardown.
XGridTask ------ a subset of a job; the smallest divisible unit. Tasks
                 can be divided per processor or per computer by Xgrid.
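One way to picture the three proposed levels is the Python sketch below. The class names come from the list above, but every field and method on them (run, granularity, the log list) is an assumption made up for illustration, and tasks are run in-process instead of being farmed out to machines.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Granularity(Enum):
    PER_PROCESSOR = "per_processor"
    PER_COMPUTER = "per_computer"


@dataclass
class XGridTask:
    """Smallest divisible unit of work."""
    name: str
    run: Callable[[], object]
    granularity: Granularity = Granularity.PER_PROCESSOR


@dataclass
class XGridJob:
    """A collection of tasks with its own setup/teardown."""
    name: str
    tasks: List[XGridTask] = field(default_factory=list)

    def setup(self, log):
        log.append(f"setup job {self.name}")

    def teardown(self, log):
        log.append(f"teardown job {self.name}")


@dataclass
class XGridProject:
    """A collection of jobs, run one at a time, with setup/teardown."""
    name: str
    jobs: List[XGridJob] = field(default_factory=list)

    def run(self):
        log = [f"setup project {self.name}"]
        for job in self.jobs:             # jobs are done one at a time
            job.setup(log)
            for task in job.tasks:
                log.append(f"task {task.name} -> {task.run()}")
            job.teardown(log)
        log.append(f"teardown project {self.name}")
        return log


project = XGridProject(
    "tuning",
    [XGridJob("matrix", [XGridTask("cell-0", lambda: 1),
                         XGridTask("cell-1", lambda: 2)])],
)
log = project.run()
```

The granularity field is carried along but unused here; a real scheduler would consult it when deciding whether a task gets a whole computer or a single processor.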
I'd then like to see more hooks into the file distribution mechanism.
For instance, seti@home and folding@home basically run as command-line
tools that take different files. I think with the above methodology,
it's kind of obvious how it would be possible to do seti@home@home.
Finally, I'd like to be able to hook into the screensaver. One of the
key features of seti@home is the cool-looking screensaver, which
motivates people to run it. The speedometer is kind of boring.
Other thoughts:
Having a suspend()/resume() on XGridJob might make a lot of sense for
people using XGrid in screensaver mode. For instance, folding@home uses
a bunch of checkpoint files so that it can restart from a certain point.
Condor does some of this automatically, but you have to use their
special tool to relink, which doesn't work on OS X yet. It might be cool
to have a Condor plugin for XGrid, so that you could build a tool with
Condor and submit it to XGrid. Though I don't think Condor is open
source.
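For the suspend()/resume() idea, a checkpoint file is probably the simplest mechanism. The Python sketch below is only an illustration under that assumption: CheckpointedJob, its JSON checkpoint format, and the max_steps argument (which simulates the screensaver being interrupted) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class CheckpointedJob:
    """Sums a list of numbers, saving progress so it can be resumed later."""

    def __init__(self, items, checkpoint):
        self.items = list(items)
        self.checkpoint = checkpoint
        self.next_index = 0
        self.total = 0
        if checkpoint.exists():           # resume from an earlier run
            state = json.loads(checkpoint.read_text())
            self.next_index = state["next_index"]
            self.total = state["total"]

    def save(self):
        state = {"next_index": self.next_index, "total": self.total}
        self.checkpoint.write_text(json.dumps(state))

    def run(self, max_steps=None):
        """Process items; return True if finished, False if suspended early."""
        steps = 0
        while self.next_index < len(self.items):
            if max_steps is not None and steps >= max_steps:
                self.save()               # suspended: remember where we stopped
                return False
            self.total += self.items[self.next_index]
            self.next_index += 1
            steps += 1
        self.save()
        return True


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "job.ckpt"

    first = CheckpointedJob(range(1, 6), ckpt)
    finished_first = first.run(max_steps=2)    # interrupted after two items

    resumed = CheckpointedJob(range(1, 6), ckpt)
    resumed_from = resumed.next_index          # picks up where `first` stopped
    finished_second = resumed.run()
    total = resumed.total
```

Because the resumed object is built from the checkpoint alone, the same approach would work if the job were picked up by a different machine that can see the checkpoint file.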
_______________________________________________
xgrid-users mailing list | email@hidden