Snapshots

By: James Reynolds - Revised: 2006-06-06 devin

Introduction

View snapshots of Xgrid 1.0 that demonstrate how to set up the agent, the controller, and a grid, and how to run a job.


Setting up the agent


Setting up the controller






Setting up a grid






Running a job


The man page

xgrid(1)                  BSD General Commands Manual                 xgrid(1)

NAME
     xgrid -- submit and monitor xgrid jobs

SYNOPSIS
     xgrid [-h[ostname] hostname] [-auth { Password | Kerberos }]
           [-p[assword] password]
     xgrid -job run [-gid grid-identifier] [-si stdin] [-in indir]
           [-so stdout] [-se stderr] [-out outdir] [-email email-address] cmd
           [arg1 [...]]
     xgrid -job submit [-gid grid-identifier] [-si stdin] [-in indir]
           [-dids jobid [, jobid]*] [-email email-address] cmd [arg1 [...]]
     xgrid -job batch [-gid grid-identifier] xml-batch-file
     xgrid -job results -id identifier [-so stdout] [-se stderr] [-out outdir]
     xgrid -job {stop | suspend | resume | delete | specification | restart}
           -id identifier
     xgrid -job list [-gid grid-identifier]
     xgrid -job attributes -id identifier
     xgrid -grid list
     xgrid -grid attributes -gid identifier

OPTIONS
     The available options are as follows:

     -h[ostname] hostname                the hostname or IP address of the
                                         controller -- if not present, xgrid
                                         will use the value specified in
                                         XGRID_CONTROLLER_HOSTNAME

     -auth { Password | Kerberos }       Default is Password. If Kerberos,
                                         then use or obtain a kerberos ticket
                                         to authenticate to the Xgrid con-
                                         troller

     -p[assword] password                if the controller requires a password
                                         (as is the default), the password
                                         must be specified here or in
                                         XGRID_CONTROLLER_PASSWORD

     -id identifier                      the xgrid identifier for the job of
                                         interest (not used when submitting a
                                         job)

     -gid grid-identifier                the xgrid identifier for the logical
                                         grid of interest

     -job {list | attributes}            retrieve list of, or attributes of a
                                         job

     -job stop                           stop job execution but don't delete
                                         job

     -job suspend                        suspend execution of the job (i.e. do
                                         not schedule any pending tasks)

     -job resume                         resume execution of the job -- any
                                         pending tasks may now be scheduled

     -job delete                         stop job execution (if it is running)
                                         and delete job

     -job results                        retrieve results of job previously
                                         submitted

     -job specification                  retrieve specification used when sub-
                                         mitting job

     -job restart                        restart a running or stopped job from
                                         the beginning

     -job {submit | run} cmd arg1 ...    specify a command (with arguments) to
                                         either run synchronously or submit
                                         asynchronously

     -si stdin                           for submit/run, file to use for stan-
                                         dard input

     -in indir                           for submit/run, working directory to
                                         submit with job

     -so stdout                          for run/results, file to write the
                                         standard output stream to

     -se stderr                          for run/results, file to write the
                                         standard error stream to

     -out outdir                         for run/results, directory to store
                                         job results in

     -dids jobid [, jobid]*              do not schedule this job for
                                         execution until every job in the
                                         list has finished successfully

     -email email-address                email address to send job state
                                         change notifications (e.g. "Finished"
                                         or "Cancelled")

     -grid list                          list identifiers of all logical grids

     -grid attributes                    retrieve attributes of the specified
                                         logical grid
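
     The -dids option above takes a list of job identifiers. The following is
     a minimal sketch of building that list, assuming the identifiers are
     joined with commas and no spaces so that the list stays a single shell
     word; the synopsis shows "jobid [, jobid]*" but does not say whether
     spaces are accepted. The name join_job_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of xgrid): join job identifiers with commas
# so they can be passed as one argument to -dids.
join_job_ids() {
    joined=""
    for id in "$@"; do
        if [ -z "$joined" ]; then
            joined="$id"
        else
            joined="$joined,$id"
        fi
    done
    printf '%s\n' "$joined"
}

# Example use (assumes jobs 23 and 44 already exist on the controller):
#   xgrid -job submit -dids "$(join_job_ids 23 44)" /usr/bin/cal 2005
```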

RETURN VALUES
     Prints the job identifier when a job is submitted.

     Returns 0 if command completed successfully.

ENVIRONMENT
     XGRID_CONTROLLER_HOSTNAME gives the hostname or IP address of the con-
     troller

     XGRID_CONTROLLER_PASSWORD gives the password of the controller if one is
     required
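
     Because xgrid falls back to these variables when -h and -p are omitted,
     a script may want to check them before calling xgrid. The following is a
     minimal sketch; the function name xgrid_env_ready is hypothetical, and
     treating a missing password as a warning rather than an error is an
     assumption based on the password being optional.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of xgrid): confirm the controller
# hostname is available from the environment before running xgrid without -h.
xgrid_env_ready() {
    if [ -z "${XGRID_CONTROLLER_HOSTNAME:-}" ]; then
        echo "XGRID_CONTROLLER_HOSTNAME is not set; pass -h hostname instead" >&2
        return 1
    fi
    if [ -z "${XGRID_CONTROLLER_PASSWORD:-}" ]; then
        # Not fatal: a password is only needed if the controller requires one.
        echo "note: XGRID_CONTROLLER_PASSWORD is not set" >&2
    fi
    return 0
}

# Example use:
#   xgrid_env_ready && xgrid -grid list
```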

FILES
     /etc/xgrid/agent/controller-password
             Password that the agent may require the controller to have (only
             readable by root)

     /etc/xgrid/controller/agent-password
             Password that the controller may be required by the agent to have
             (only readable by root)

     /etc/xgrid/controller/client-password
             Password that the controller may require the client to have (only
             readable by root)

     /Library/Preferences/com.apple.xgrid.controller.plist
             Controller preferences file

     /etc/xgrid/controller/com.apple.xgrid.controller.plist.default
             Commented sample of all controller preferences

     /Library/Preferences/com.apple.xgrid.agent.plist
             Agent preferences file

     /etc/xgrid/agent/com.apple.xgrid.agent.plist.default
             Commented sample of all agent preferences

EXAMPLES
     List all grids and specify hostname and password:

           $ xgrid -h mycomputer.apple.com -p pword -grid list
           (0, 1)

     Show information about grid 0:

           $ xgrid -h mycomputer.apple.com -p pword -grid attributes -gid 0
           {isDefault = YES; name = Xgrid; }

     Set environment variables for the following examples (tcsh):

           % setenv XGRID_CONTROLLER_HOSTNAME mycomputer.apple.com
           % setenv XGRID_CONTROLLER_PASSWORD pword

     Set environment variables for the following examples (bash):

           $ export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
           $ export XGRID_CONTROLLER_PASSWORD=pword

     List jobs and then get the attributes for one of the jobs:

           $ xgrid -job list
           { jobArray = ({ jobIdentifier = 26; } ); }
           $ xgrid -job attributes -id 26
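
     To act on every job in the list, the identifiers can be pulled out of
     the -job list output. The following is a minimal sketch, assuming each
     job appears as "jobIdentifier = N" as in the single-job output shown
     above; the layout for several jobs is an assumption, and the name
     extract_job_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of xgrid): read `xgrid -job list` output on
# stdin and print one job identifier per line.
extract_job_ids() {
    grep -o 'jobIdentifier = [0-9][0-9]*' | sed 's/^jobIdentifier = //'
}

# Example use:
#   xgrid -job list | extract_job_ids | while read -r id; do
#       xgrid -job attributes -id "$id"
#   done
```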

     Run the cal program (specified using a full path so the command isn't
     copied) on a single node synchronously and print the output to standard
     output:

           $ xgrid -job run /usr/bin/cal 2005

     Submit myscript with the files in the input directory.  Send email to
     somebody@apple.com on every job state change. Then retrieve the results
     and save the stdout and stderr streams in files instead of printing them
     out to the terminal and save the output files in the specified directory.
     Finally delete the job:

           $ xgrid -job submit -in ~/data/working -email somebody@apple.com \
                 myscript param1 param2
           { jobIdentifier = 27; }
           $ xgrid -job results -id 27 -so job.out -se job.err -out job-outdir
           $ xgrid -job delete -id 27
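
     The submit, results, and delete steps above can be chained in a script
     by capturing the printed job identifier. The following is a minimal
     sketch, assuming the environment variables from the earlier examples are
     set and that the job has finished by the time its results are requested;
     it does not poll for completion. The function name submit_and_collect
     and the file names job.out and job.err are illustrative choices.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of xgrid): submit a command, fetch its
# results into a directory, then delete the job.
# Usage: submit_and_collect outdir [submit-options] cmd [arg ...]
submit_and_collect() {
    outdir=$1
    shift
    # `xgrid -job submit` prints the identifier, e.g. "{ jobIdentifier = 27; }"
    reply=$(xgrid -job submit "$@") || return 1
    id=$(printf '%s\n' "$reply" |
        sed -n 's/.*jobIdentifier = \([0-9][0-9]*\).*/\1/p')
    if [ -z "$id" ]; then
        echo "could not find a job identifier in: $reply" >&2
        return 1
    fi
    # Save stdout, stderr, and output files, then remove the job.
    xgrid -job results -id "$id" -so "$outdir/job.out" \
        -se "$outdir/job.err" -out "$outdir" || return 1
    xgrid -job delete -id "$id"
}

# Example use:
#   submit_and_collect job-outdir -in ~/data/working myscript param1 param2
```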

   Batch file examples
     Complex multi-job specifications may be submitted via an XML batch
     property list. Batch file job specifications are submitted as follows:

           $ xgrid -job batch sample_batch.xml

     The following XML plist submits a simple /usr/bin/cal job:

     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
     <plist version="1.0">
     <array>
         <dict>
             <key>name</key>
             <string>Cal Job</string>
             <key>taskSpecifications</key>
             <dict>
                 <key>0</key>
                 <dict>
                     <key>command</key>
                     <string>/usr/bin/cal</string>
                     <key>arguments</key>
                     <array>
                         <string>6</string>
                         <string>2005</string>
                     </array>
                 </dict>
             </dict>
         </dict>
     </array>
     </plist>
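
     A batch file like the one above can also be generated from a script. The
     following is a minimal sketch that prints a one-job, one-task plist with
     the same keys (name, taskSpecifications, command, arguments); the
     function names xml_escape and make_batch_plist are hypothetical, and
     only the characters &, < and > are escaped.

```shell
#!/bin/sh
# Hypothetical helper (not part of xgrid): escape the XML special characters
# &, < and > so text can be placed inside a <string> element.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Hypothetical helper: print a one-job, one-task batch plist on stdout.
# Usage: make_batch_plist job-name cmd [arg ...]
make_batch_plist() {
    name=$1
    cmd=$2
    shift 2
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
    <dict>
        <key>name</key>
        <string>$(xml_escape "$name")</string>
        <key>taskSpecifications</key>
        <dict>
            <key>0</key>
            <dict>
                <key>command</key>
                <string>$(xml_escape "$cmd")</string>
                <key>arguments</key>
                <array>
EOF
    # One <string> element per remaining argument.
    for arg in "$@"; do
        printf '                    <string>%s</string>\n' "$(xml_escape "$arg")"
    done
    cat <<EOF
                </array>
            </dict>
        </dict>
    </dict>
</array>
</plist>
EOF
}

# Example use:
#   make_batch_plist "Cal Job" /usr/bin/cal 6 2005 > sample_batch.xml
#   xgrid -job batch sample_batch.xml
```

     On Mac OS X, the generated file can be checked with plutil -lint before
     it is submitted.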

     A list of all XML batch submission properties follows:

     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
     <plist version="1.0">
     <array>
         <dict>
             <!-- A symbolic name of the job -->
             <key>name</key>
             <string>Full Job</string>

             <!-- Notification of all job state changes will be sent to this email address -->
             <key>notificationEmail</key>
             <string>somebody@example.com</string>

             <key>schedulerParameters</key>
             <dict>
                 <!-- do all of the given tasks need to start at the same time? -->
                 <key>tasksMustStartSimultaneously</key>
                 <string>YES</string>

                 <!-- what's the minimum number that need to start at the same time? -->
                 <key>minimumTaskCount</key>
                 <integer>5</integer>

                 <!-- do not schedule this job until the following job (id's) have finished successfully -->
                 <key>dependsOnJobs</key>
                 <array>
                     <string>23</string>
                     <string>44</string>
                 </array>
             </dict>

             <key>inputFiles</key>
             <dict>
                 <!-- the file 'textfile' will be created on agent machines in the working directory -->
                 <key>textfile</key>
                 <dict>
                     <!-- base64 encoded file data -->
                     <key>fileData</key>
                     <!-- 'this is a test' -->
                     <string>dGhpcyBpcyBhIHRlc3Q=</string>
                     <!-- should this file have execute permission? -->
                     <key>isExecutable</key>
                     <string>NO</string>
                 </dict>

                 <!-- create 'textfile' in the directory 'task1'-->
                 <key>task1/textfile</key>
                 <dict>
                     <key>fileData</key>
                     <string>dGhpcyBpcyBhIHRlc3Q=</string>
                 </dict>
             </dict>

             <!-- define some prototype task specifications. Here we can define sets of common parts of taskSpecifications -->
             <!-- Any taskSpecifications settings are valid. -->
             <key>taskPrototypes</key>
             <dict>
                 <key>echoTask</key>
                 <dict>
                     <key>command</key>
                     <string>/bin/echo</string>
                     <key>arguments</key>
                     <array>
                         <string>echoTask Arguments</string>
                         <string>are here</string>
                     </array>
                 </dict>
             </dict>

             <!-- specifications of all tasks of this job -->
             <key>taskSpecifications</key>
             <dict>

                 <!-- key is symbolic task name -->
                 <key>0</key>
                 <dict>
                     <!-- command to execute -->
                     <key>command</key>
                     <string>/bin/echo</string>

                     <!-- environment dictionary -->
                     <key>environment</key>
                     <dict>
                         <key>MY_ENV_VARIABLE</key>
                         <string>MY_VALUE</string>
                     </dict>

                     <!-- argument array -->
                     <key>arguments</key>
                     <array>
                         <string>HelloWorld</string>
                     </array>

                     <!-- use the given file as <stdin> -->
                     <key>inputStream</key>
                     <string>textfile</string>

                     <!-- do not start this task until the following tasks (symbolic names) have finished successfully -->
                     <key>dependsOnTasks</key>
                     <array>
                         <string>1</string>
                     </array>
                 </dict>

                 <key>1</key>
                 <dict>
                     <!-- by default use the echoTask prototype settings -->
                     <key>taskPrototypeIdentifier</key>
                     <string>echoTask</string>

                     <!-- override the prototype setting for arguments -->
                     <key>arguments</key>
                     <array>
                         <string>Task 1</string>
                     </array>

                     <!-- map a subset of files in the 'inputFiles' section for this task only -->
                     <key>inputFileMap</key>
                     <dict>
                         <key>textfile</key>
                         <string>task1/textfile</string>
                     </dict>

                 </dict>
             </dict>
         </dict>

         <!-- a completely different job -->
         <dict>
             <key>name</key>
             <string>Calendar Job</string>

             <key>taskSpecifications</key>
             <dict>
                 <key>0</key>
                 <dict>
                     <key>command</key>
                     <string>/usr/bin/cal</string>
                     <key>arguments</key>
                     <array>
                         <string>6</string>
                         <string>2005</string>
                     </array>
                 </dict>
             </dict>
         </dict>

     </array>

     </plist>
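
     The fileData values in the inputFiles section are base64-encoded file
     contents. The following is a minimal sketch of producing one inputFiles
     entry from a local file, assuming a base64 command is available on the
     submitting machine; the function name input_file_entry is hypothetical,
     and the key name is written as given, without XML escaping.

```shell
#!/bin/sh
# Hypothetical helper (not part of xgrid): print an inputFiles entry for one
# file, with its contents base64-encoded on a single line.
# Usage: input_file_entry name-on-agent local-path
input_file_entry() {
    # Some base64 implementations wrap long output; strip the line breaks.
    data=$(base64 < "$2" | tr -d '\n')
    cat <<EOF
<key>$1</key>
<dict>
    <key>fileData</key>
    <string>$data</string>
</dict>
EOF
}

# Example use (paste the output inside the <dict> under inputFiles):
#   input_file_entry textfile ./textfile
```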

DIAGNOSTICS
     xgrid with no arguments will print the usage message

ERRORS
     If a job fails due to an error, the error code reported is simply the
     error code that was returned by the task (or by the system).  (For system
     error descriptions, see /usr/include/sys/errno.h.) Common errors include:

     o   Error 1: The permissions of some file are set incorrectly.

     o   Error 2: No such file exists.

     o   Error 8: The executable file wasn't actually executable.
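
     The three codes listed above correspond to the errno values EPERM,
     ENOENT, and ENOEXEC. The following is a minimal sketch of turning a
     reported code into a readable message; the function name
     describe_task_error and the message wording are illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of xgrid): describe the error codes listed
# above, using their symbolic names from sys/errno.h.
describe_task_error() {
    case "$1" in
        1) echo "EPERM: operation not permitted (check file permissions)" ;;
        2) echo "ENOENT: no such file or directory" ;;
        8) echo "ENOEXEC: exec format error (file is not a valid executable)" ;;
        *) echo "error $1: see /usr/include/sys/errno.h" ;;
    esac
}

# Example use:
#   describe_task_error 2
```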

SEE ALSO
     xgridctl(8), ssh(1)

CONFORMING TO
     The protocol used by Xgrid conforms to:

     o   RFC 3080 (BEEP Protocol)

     o   Apple XML property list (plist) specification

HISTORY
     Xgrid's history can be traced back to Zilla, which was developed by NeXT
     in the late 1980s and was the first clustering desktop program to make
     use of the "noninterventive screen saver" motif, a motif which is now
     commonplace and widely used in projects like SETI@home.  Zilla won the
     1991 national ComputerWorld-Smithsonian Award in the science category for
     this noninterventive, community-supercomputing paradigm.

     Apple acquired the rights to Zilla in 1997, and later used it as the
     inspiration for the research project which became Xgrid.  Xgrid was
     publicly launched as a Technology Preview on January 6, 2004 at Macworld
     San Francisco.

BUGS
     Some commonly reported issues are:

     1.   The controller always copies over the commands and working directory
          to the agent before each individual task, rather than checking to
          see if it is cached

     Bug reports can be sent to bugreport.apple.com

     Feedback can be sent to xgrid@apple.com

Mac OS                         January 25, 2005                         Mac OS