Snapshots
By: James Reynolds - Revised: 2006-06-06 devin
Introduction
View snapshots of Xgrid 1.0 that demonstrate how to set up the agent, the controller, and a grid, and how to run a job.
Setting up the controller
xgrid(1) BSD General Commands Manual xgrid(1)
NAME
xgrid -- submit and monitor xgrid jobs
SYNOPSIS
xgrid [-h[ostname] hostname] [-auth { Password | Kerberos }]
[-p[assword] password]
xgrid -job run [-gid grid-identifier] [-si stdin] [-in indir]
[-so stdout] [-se stderr] [-out outdir] [-email email-address] cmd
[arg1 [...]]
xgrid -job submit [-gid grid-identifier] [-si stdin] [-in indir]
[-dids jobid [, jobid]*] [-email email-address] cmd [arg1 [...]]
xgrid -job batch [-gid grid-identifier] xml-batch-file
xgrid -job results -id identifier [-so stdout] [-se stderr] [-out outdir]
xgrid -job {stop | suspend | resume | delete | specification | restart}
-id identifier
xgrid -job list [-gid grid-identifier]
xgrid -job attributes -id identifier
xgrid -grid list
xgrid -grid attributes -gid identifier
OPTIONS
The available options are as follows:
-h[ostname] hostname              the hostname or IP address of the
                                  controller -- if not present, xgrid
                                  will use the value specified in
                                  XGRID_CONTROLLER_HOSTNAME
-auth { Password | Kerberos }     Default is Password. If Kerberos,
                                  then use or obtain a kerberos ticket
                                  to authenticate to the Xgrid
                                  controller
-p[assword] password              if the controller requires a password
                                  (as is the default), the password
                                  must be specified here or in
                                  XGRID_CONTROLLER_PASSWORD
-id identifier                    the xgrid identifier for the job of
                                  interest (not used when submitting a
                                  job)
-gid grid-identifier              the xgrid identifier for logical grid
                                  of interest
-job {list | attributes}          retrieve list of, or attributes of a
                                  job
-job stop                         stop job execution but don't delete
                                  job
-job suspend                      suspend execution of the job (i.e. do
                                  not schedule any pending tasks)
-job resume                       resume execution of the job -- any
                                  pending tasks may now be scheduled
-job delete                       stop job execution (if it is running)
                                  and delete job
-job results                      retrieve results of job previously
                                  submitted
-job specification                retrieve specification used when
                                  submitting job
-job restart                      restart a running or stopped job from
                                  the beginning
-job {submit | run} cmd arg1 ...  specify a command (with arguments) to
                                  either run synchronously or submit
                                  asynchronously
-si stdin                         for submit/run, file to use for
                                  standard input
-in indir                         for submit/run, working directory to
                                  submit with job
-so stdout                        for run/results, file to write the
                                  standard output stream to
-se stderr                        for run/results, file to write the
                                  standard error stream to
-out outdir                       for run/results, directory to store
                                  job results in
-dids jobid [, jobid]*            do not schedule this job for
                                  execution until the list of dependent
                                  jobs have successfully Finished.
-email email-address              email address to send job state
                                  change notifications (i.e. "Finished"
                                  or "Cancelled")
-grid list                        list identifiers of all logical grids
-grid attributes                  retrieve attributes of the specified
                                  logical grid
RETURN VALUES
Prints the job identifier when a job is submitted.
Returns 0 if command completed successfully.
ENVIRONMENT
XGRID_CONTROLLER_HOSTNAME gives the hostname or IP address of the con-
troller
XGRID_CONTROLLER_PASSWORD gives the password of the controller if one is
required
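A wrapper script may want to confirm that these variables are set before it calls xgrid. The following is only a minimal sketch: the function name require_xgrid_env is hypothetical and is not part of Xgrid, and it assumes a POSIX-compatible shell.

```shell
# Hypothetical helper (not part of Xgrid): verify that the controller
# hostname is available in the environment before invoking xgrid.
require_xgrid_env() {
    if [ -z "${XGRID_CONTROLLER_HOSTNAME:-}" ]; then
        echo "XGRID_CONTROLLER_HOSTNAME is not set" >&2
        return 1
    fi
    # The password is only needed when the controller requires one,
    # so a missing value is reported as a warning, not an error.
    if [ -z "${XGRID_CONTROLLER_PASSWORD:-}" ]; then
        echo "warning: XGRID_CONTROLLER_PASSWORD is not set" >&2
    fi
    return 0
}
```

A script could then guard its first call with, for example, require_xgrid_env && xgrid -grid list.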
FILES
/etc/xgrid/agent/controller-password
Password that the agent may require the controller to have (only
readable by root)
/etc/xgrid/controller/agent-password
Password that the controller may be required by the agent to have
(only readable by root)
/etc/xgrid/controller/client-password
Password that the controller may require the client to have (only
readable by root)
/Library/Preferences/com.apple.xgrid.controller.plist
Controller preferences file
/etc/xgrid/controller/com.apple.xgrid.controller.plist.default
Commented sample of all controller preferences
/Library/Preferences/com.apple.xgrid.agent.plist
Agent preferences file
/etc/xgrid/agent/com.apple.xgrid.agent.plist.default
Commented sample of all agent preferences
EXAMPLES
List all grids and specify hostname and password:
$ xgrid -h mycomputer.apple.com -p pword -grid list
(0, 1)
Show information about grid 0:
$ xgrid -h mycomputer.apple.com -p pword -grid attributes -gid 0
{isDefault = YES; name = Xgrid; }
Set environment variables for the following examples (tcsh):
% setenv XGRID_CONTROLLER_HOSTNAME mycomputer.apple.com
% setenv XGRID_CONTROLLER_PASSWORD pword
Set environment variables for the following examples (bash):
$ export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
$ export XGRID_CONTROLLER_PASSWORD=pword
List jobs and then get the attributes for one of the jobs:
$ xgrid -job list
{ jobArray = ({ jobIdentifier = 26; } ); }
$ xgrid -job attributes -id 26
Run the cal program (specified using a full path so the command isn't
copied) on a single node synchronously and print the output to standard
output:
$ xgrid -job run /usr/bin/cal 2005
Submit myscript with the files in the input directory. Send email to
somebody@apple.com on every job state change. Then retrieve the results
and save the stdout and stderr streams in files instead of printing them
out to the terminal and save the output files in the specified directory.
Finally delete the job:
$ xgrid -job submit -in ~/data/working -email somebody@apple.com myscript param1 param2
{ jobIdentifier = 27; }
$ xgrid -job results -id 27 -so job.out -se job.err -out job-outdir
$ xgrid -job delete -id 27
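When the submit, results, and delete steps are scripted, the job identifier has to be extracted from the text that xgrid prints. The following is a minimal sketch, assuming the output has the "jobIdentifier = N;" form shown in the example; the function name parse_job_id is hypothetical.

```shell
# Hypothetical helper: read xgrid submit output on standard input and
# print the numeric job identifier, e.g. 27 for "{ jobIdentifier = 27; }".
parse_job_id() {
    sed -n 's/.*jobIdentifier = \([0-9][0-9]*\);.*/\1/p' | head -n 1
}

# Demonstration with the sample output from the example. In a real
# script the text would come from: xgrid -job submit ... | parse_job_id
job_id=$(echo '{ jobIdentifier = 27; }' | parse_job_id)
echo "$job_id"
```

The captured value can then be passed to -job results -id "$job_id", or to -dids when submitting a job that must wait for this one.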
Batch file examples
Complex multi-job specification may be submitted via an XML batch prop-
erty list. Batch file job specifications are submitted as follows:
$ xgrid -job batch sample_batch.xml
The following XML plist submits a simple /usr/bin/cal job:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
<dict>
<key>name</key>
<string>Cal Job</string>
<key>taskSpecifications</key>
<dict>
<key>0</key>
<dict>
<key>command</key>
<string>/usr/bin/cal</string>
<key>arguments</key>
<array>
<string>6</string>
<string>2005</string>
</array>
</dict>
</dict>
</dict>
</array>
</plist>
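A batch file like the one above can also be generated from a script. The following is a minimal sketch with a hypothetical function name, write_batch_plist; it assumes the job name, command, and arguments contain no XML special characters (&, <, >), since it performs no escaping.

```shell
# Hypothetical helper: write a one-job, one-task batch plist to
# standard output.
# Usage: write_batch_plist job-name command [arg ...]
write_batch_plist() {
    name=$1
    cmd=$2
    shift 2
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
<dict>
<key>name</key>
<string>$name</string>
<key>taskSpecifications</key>
<dict>
<key>0</key>
<dict>
<key>command</key>
<string>$cmd</string>
<key>arguments</key>
<array>
EOF
    # One <string> element per remaining argument.
    for arg in "$@"; do
        echo "<string>$arg</string>"
    done
    cat <<EOF
</array>
</dict>
</dict>
</dict>
</array>
</plist>
EOF
}
```

For example, write_batch_plist "Cal Job" /usr/bin/cal 6 2005 > sample_batch.xml produces a file with the same keys as the plist above, which can then be submitted with xgrid -job batch sample_batch.xml.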
A list of all XML batch submission properties follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
<dict>
<!-- A symbolic name of the job -->
<key>name</key>
<string>Full Job</string>
<!-- Notification of all job state changes will be sent to this email address -->
<key>notificationEmail</key>
<string>somebody@example.com</string>
<key>schedulerParameters</key>
<dict>
<!-- do all of the given tasks need to start at the same time? -->
<key>tasksMustStartSimultaneously</key>
<string>YES</string>
<!-- what's the minimum number that need to start at the same time? -->
<key>minimumTaskCount</key>
<integer>5</integer>
<!-- do not schedule this job until the following job (id's) have finished successfully -->
<key>dependsOnJobs</key>
<array>
<string>23</string>
<string>44</string>
</array>
</dict>
<key>inputFiles</key>
<dict>
<!-- the file 'textfile' will be created on agent machines in the working directory -->
<key>textfile</key>
<dict>
<!-- base64 encoded file data -->
<key>fileData</key>
<!-- 'this is a test' -->
<string>dGhpcyBpcyBhIHRlc3Q=</string>
<!-- should this file have execute permission? -->
<key>isExecutable</key>
<string>NO</string>
</dict>
<!-- create 'textfile' in the directory 'task1'-->
<key>task1/textfile</key>
<dict>
<key>fileData</key>
<string>dGhpcyBpcyBhIHRlc3Q=</string>
</dict>
</dict>
<!-- define some prototype task specifications. Here we can define sets of common parts of taskSpecifications -->
<!-- Any taskSpecifications settings are valid. -->
<key>taskPrototypes</key>
<dict>
<key>echoTask</key>
<dict>
<key>command</key>
<string>/bin/echo</string>
<key>arguments</key>
<array>
<string>echoTask Arguments</string>
<string>are here</string>
</array>
</dict>
</dict>
<!-- specifications of all tasks of this job -->
<key>taskSpecifications</key>
<dict>
<!-- key is symbolic task name -->
<key>0</key>
<dict>
<!-- command to execute -->
<key>command</key>
<string>/bin/echo</string>
<!-- environment dictionary -->
<key>environment</key>
<dict>
<key>MY_ENV_VARIABLE</key>
<string>MY_VALUE</string>
</dict>
<!-- argument array -->
<key>arguments</key>
<array>
<string>HelloWorld</string>
</array>
<!-- use the given file as <stdin> -->
<key>inputStream</key>
<string>textfile</string>
<!-- do not start this task until the following tasks (symbolic names) have finished successfully -->
<key>dependsOnTasks</key>
<array>
<string>1</string>
</array>
</dict>
<key>1</key>
<dict>
<!-- by default use the echoTask prototype settings -->
<key>taskPrototypeIdentifier</key>
<string>echoTask</string>
<!-- override the prototype setting for arguments -->
<key>arguments</key>
<array>
<string>Task 1</string>
</array>
<!-- map a subset of files in the 'inputFiles' section for this task only -->
<key>inputFileMap</key>
<dict>
<key>textfile</key>
<string>task1/textfile</string>
</dict>
</dict>
</dict>
</dict>
<!-- a completely different job -->
<dict>
<key>name</key>
<string>Calendar Job</string>
<key>taskSpecifications</key>
<dict>
<key>0</key>
<dict>
<key>command</key>
<string>/usr/bin/cal</string>
<key>arguments</key>
<array>
<string>6</string>
<string>2005</string>
</array>
</dict>
</dict>
</dict>
</array>
</plist>
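The fileData values in the inputFiles section are base64-encoded file contents. The following is a minimal sketch of producing such a value, assuming a base64 utility is installed (it is not part of Xgrid); the text encoded is the 'this is a test' sample from the plist above.

```shell
# Encode literal text the way the sample fileData value is encoded.
# printf is used instead of echo so that no trailing newline is
# encoded, and tr removes any line wrapping that base64 may add.
encoded=$(printf '%s' 'this is a test' | base64 | tr -d '\n')
echo "$encoded"
```

This prints dGhpcyBpcyBhIHRlc3Q=, matching the sample. To encode a whole file, the same pipeline can read it on standard input, for example base64 < textfile | tr -d '\n'.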
DIAGNOSTICS
xgrid with no arguments will print the usage message
ERRORS
If a job fails due to an error, the error code reported is simply the
error code that was returned by the task (or by the system). (For system
error descriptions, see /usr/include/sys/errno.h.) Common errors include:
o Error 1: The permissions of some file are set incorrectly.
o Error 2: No such file exists.
o Error 8: The executable file wasn't actually executable.
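A script that reports failed jobs might translate these codes into text. The following is a minimal sketch covering only the three errors listed above; the function name describe_xgrid_error is hypothetical, and how the error code is obtained from a job is not shown.

```shell
# Hypothetical helper: map the error codes listed above to a short
# description; any other code is referred to errno.h.
describe_xgrid_error() {
    case "$1" in
        1) echo "permissions of some file are set incorrectly" ;;
        2) echo "no such file exists" ;;
        8) echo "the executable file was not actually executable" ;;
        *) echo "see /usr/include/sys/errno.h for error $1" ;;
    esac
}

describe_xgrid_error 2
```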
SEE ALSO
xgridctl(8), ssh(1)
CONFORMING TO
The protocol used by Xgrid conforms to:
o RFC 3080 (BEEP Protocol)
o Apple XML property list (plist) specification
HISTORY
Xgrid's history can be traced back to Zilla, which was developed by NeXT
in the late 1980s and was the first clustering desktop program to make use
of the "noninterventive screen saver" motif, a motif which is now
commonplace and widely used in projects like SETI@home. Zilla won the 1991
national ComputerWorld-Smithsonian Award in the science category for this
noninterventive, community-supercomputing paradigm.
Apple acquired the rights to Zilla in 1997, and later used that as the
inspiration for the research project which became Xgrid. Xgrid was pub-
licly launched as a Technology Preview on January 6, 2004 at MacWorld San
Francisco.
BUGS
Some commonly reported issues are:
1. The controller always copies over the commands and working directory
to the agent before each individual task, rather than checking to
see if it is cached
Bug reports can be sent to bugreport.apple.com
Feedback can be sent to xgrid@apple.com
Mac OS January 25, 2005 Mac OS