Remote interface for BOINC
This page describes the BOINC remote submission commands. The remote submission interface is a client-server system for researchers to submit jobs. It should not be confused with BOINC clients, which are used by volunteers to perform the computations.
For an extended description of the RemoteBoinc system, see also:
T. Giorgino, M. Harvey, and G. de Fabritiis, “Distributed computing as a virtual supercomputer: Tools to run and manage large-scale BOINC simulations,” Computer Physics Communications, vol. 181, Aug. 2010, pp. 1402-1409.
Note: the most up-to-date list of parameters for each command is obtained by calling it with the -help option. All options can be abbreviated to their unique prefixes. For GPUGRID-specific details, see acemd/RemoteBoinc.
Job basics
Each job is identified by a unique combination of GROUP and NAME. Each job keeps its own BOINC-related attributes, such as the target number of results and the priority, which can be assigned at submission time.
When two or more jobs belong to the same GROUP, their results can be retrieved and deleted together in a single operation (they also share disk space on the server).
Jobs may have arbitrary metadata files attached to them. Metadata are retrieved along with the job results (see retrieval below).
RemoteBoinc supports additional job types through the -app option. Job types differ in their input and output files, the type of processing, etc. To see the list of inputs and outputs of a specific application, use -help_parameters together with the -url and -app options.
The server URL can also be set in the $RBOINC_URL environment variable.
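A minimal sketch of setting the variable once per shell session, assuming a Bourne-compatible shell; the server address is the same placeholder used throughout this page. Once it is set, the -url option can presumably be left out of the commands below.

```shell
# Placeholder address: replace YOUR_BOINC_SERVER with the real host name.
export RBOINC_URL="http://YOUR_BOINC_SERVER:8383/rboinc_cgi"

# With RBOINC_URL exported, -url can be omitted, for example:
#   boinc_retrieve.pl -group GROUPNAME -status
```

Putting the export line in your shell startup file (e.g. ~/.bashrc) makes it persistent across sessions.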
Job submission
Run the scripts directly from their installation directory (you can add that directory to your PATH or create an alias). A typical, self-explanatory invocation is as follows:
DIR=.   # or wherever you have your submission files
/shared/lab/software/rboinc/boinc_submit.pl -group GROUPNAME -name NAME \
    -conf "$DIR/input_boinc_gpu.conf" -pdb "$DIR/grama.ionized.pdb" \
    -psf "$DIR/grama.ionized.psf" -coor "$DIR/equil.coor" \
    -vel "$DIR/equil.vel" -par "$DIR/parameters" \
    -url http://YOUR_BOINC_SERVER:8383/rboinc_cgi
Note that we have supplied arguments corresponding to all of the mandatory WU input parameters.
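To place several jobs in the same GROUP, so that their results can later be retrieved and purged together, the invocation above can be wrapped in a loop. This is a minimal sketch under a hypothetical layout in which each job's input files sit in their own subdirectory, named after the job. The function name submit_group and the PREVIEW variable are illustrative and not part of RemoteBoinc.

```shell
# Hypothetical helper: submit one job per subdirectory of BASE into GROUP.
# Each subdirectory (e.g. BASE/rep_1) holds that job's input files, and its
# name becomes the job NAME (it must be alphanumeric, underscores allowed).
# Set PREVIEW=echo to print the commands instead of running them.
submit_group() {
    base="$1"; group="$2"
    for jobdir in "$base"/*/; do
        [ -d "$jobdir" ] || continue        # glob matched nothing
        jobdir=${jobdir%/}                  # strip the trailing slash
        name=$(basename "$jobdir")          # e.g. BASE/rep_1 -> rep_1
        # $PREVIEW is unquoted on purpose: empty runs the command, echo prints it.
        $PREVIEW /shared/lab/software/rboinc/boinc_submit.pl \
            -group "$group" -name "$name" \
            -conf "$jobdir/input_boinc_gpu.conf" -pdb "$jobdir/grama.ionized.pdb" \
            -psf "$jobdir/grama.ionized.psf" -coor "$jobdir/equil.coor" \
            -vel "$jobdir/equil.vel" -par "$jobdir/parameters" \
            -url "${RBOINC_URL:-http://YOUR_BOINC_SERVER:8383/rboinc_cgi}" || return 1
    done
}
```

For example, `PREVIEW=echo submit_group ./jobs GROUPNAME` lists the commands that would be run; dropping `PREVIEW=echo` submits them.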
Possibly useful options:
-help: show all options
-dry_run: submits the files, but does not start the workunits
-verbose: show progress information
Remarks: GROUPNAME and NAME must be alphanumeric (underscores allowed). If no -index_file is supplied, a standard index file is generated (binary zero). See below for information on the URL field.
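Names can be checked locally before anything is sent to the server. This is a small sketch of the rule stated above (letters, digits and underscores only). The helper is_valid_rboinc_name is hypothetical and not part of the RemoteBoinc scripts, which perform their own checks.

```shell
# Hypothetical pre-submission check: succeeds only if $1 is non-empty and
# contains nothing but ASCII letters, digits and underscores.
is_valid_rboinc_name() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;   # empty, or has a forbidden character
        *) return 0 ;;
    esac
}
```

Usage: `is_valid_rboinc_name "$NAME" || echo "invalid job name: $NAME" >&2`.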
Warning: ALWAYS TEST your submitted jobs, including the sizes of the files they generate. Faulty jobs may crash the volunteers' BOINC clients!
Remote retrieval
A typical invocation is as follows:
boinc_retrieve.pl -group GROUPNAME [-name NAME] \
    -url http://YOUR_BOINC_SERVER:8383/rboinc_cgi
Possibly useful options:
-help: show all options
-into DIR: put retrieved files into DIR (defaults to the current directory)
-intotree DIR: put retrieved files into DIR, creating one subdirectory per retrieved NAME
If -name is specified, the results of that individual job are retrieved, along with its metadata files, if they exist. If no name is specified, all of the results in the given group are retrieved, but no metadata. The DCD file is transferred if it was generated during the run.
Files are automatically renamed on the basis of the rules in the results template of the application being retrieved. If mixed applications are used in the same GROUP (don't do that), the renaming rules of the app of the first retrieved result take precedence.
Remarks: files that already exist locally are neither overwritten nor retrieved again. Therefore, you can rename, unzip, analyze and re-zip files in your retrieval directory without generating duplicates. "Existence" is judged taking into account the equivalence classes specified in the results template.
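The "skip if it exists" rule can be illustrated with a small local check. The real equivalence classes come from each application's results template. This sketch assumes a hypothetical class in which a file and its .gz and .bz2 compressed forms count as the same file, and the helper already_present is illustrative only.

```shell
# Illustration only: succeeds if FILE, FILE.gz or FILE.bz2 exists, i.e. if a
# member of this (hypothetical) equivalence class is already present locally.
already_present() {
    for f in "$1" "$1.gz" "$1.bz2"; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Under this assumption, a result that you have already compressed to output.coor.gz would not be fetched again as output.coor.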
Administration
Halting and cleaning up completed jobs
The boinc_retrieve command is also used to stop and clean up jobs:
-status reports how far each WU has progressed.
-stop breaks a workflow. No new jobs will be generated, results can still be retrieved.
-purge completely deletes a GROUPNAME.
-gridstatus computes statistics on the grid (see below).
The -stop and -purge operations are irreversible, and confirmation is requested. There is a short "grace period" during which removals can be reversed by manual intervention on the server. After a purge, the WU name can be recycled only after all of the pending workunits have been assimilated.
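The options above are often used in sequence when a GROUP is finished. This sketch wraps that sequence in a shell function. It assumes boinc_retrieve.pl is on your PATH and RBOINC_URL is set, so -url is omitted. The name wind_down_group and the results_GROUP directory are hypothetical.

```shell
# Sketch of winding down a finished GROUP (not an official RemoteBoinc script).
wind_down_group() {
    group="$1"
    # 1. Show how far each WU got; -stop below asks for confirmation,
    #    so you can still abort there if work is pending.
    boinc_retrieve.pl -group "$group" -status || return 1
    # 2. Break the workflow: no new jobs, results remain retrievable.
    boinc_retrieve.pl -group "$group" -stop || return 1
    # 3. Fetch all results, one subdirectory per job NAME.
    mkdir -p "results_$group" || return 1
    boinc_retrieve.pl -group "$group" -intotree "results_$group" || return 1
    # 4. Irreversible: delete the GROUP on the server (asks for confirmation).
    boinc_retrieve.pl -group "$group" -purge
}
```

Retrieving before purging matters because, once the grace period has expired, purged results cannot be recovered.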
Grid status
The -gridstatus option reports per-group totals over recent months for the following values:
grp_sent: creation (submission) time of the first job in the group
grp_name: group name
cur_inprogress: number of WUs in progress
cur_unsent: number of WUs waiting to be sent
per_successful: number of WUs successfully computed
per_unsuccessful: number of WUs computed unsuccessfully (including user-initiated aborts)
per_credits: credits consumed
range_priority: priority of the pending WUs (min-max)
Note: it is mandatory to add the -group ALL option. Computing the grid status takes time: if the request fails, just retry.
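Because a failed -gridstatus request simply needs to be repeated, a generic retry wrapper can save some typing. This is a sketch. The names retry and RETRY_DELAY are hypothetical, and the wrapper assumes boinc_retrieve.pl exits with a non-zero status when the request fails, which has not been verified.

```shell
# Hypothetical helper: run a command up to MAX times, pausing RETRY_DELAY
# seconds (default 30) between attempts. Returns 0 on the first success.
retry() {
    max="$1"; shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}
```

Usage: `retry 3 boinc_retrieve.pl -group ALL -gridstatus`.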
Cron-based periodic retrieval
The boinc_retrieve command can also set up cron-based retrieval:
-cron: set up a script for cron-based retrieval
This is best invoked right after submitting all the jobs of a GROUP. It should be invoked with the -url, -group and -intotree options. It creates a file named retrieve_cron.sh in the current directory, which performs the retrieval, and prints a line for your crontab on standard output.
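The printed line still has to be installed in your crontab. This sketch appends it only if it is not already there, so that repeated runs do not schedule duplicate retrievals. The helper merge_cron_line is hypothetical. It reads a crontab on standard input and writes the merged crontab on standard output, and LINE stands for whatever -cron printed.

```shell
# Hypothetical helper: copy stdin to stdout, then append LINE unless an
# identical line was already present.
merge_cron_line() {
    line="$1"
    found=0
    # "|| [ -n ... ]" also keeps a final line that lacks a trailing newline.
    while IFS= read -r existing || [ -n "$existing" ]; do
        printf '%s\n' "$existing"
        [ "$existing" = "$line" ] && found=1
    done
    [ "$found" -eq 1 ] || printf '%s\n' "$line"
}
```

Usage: `crontab -l 2>/dev/null | merge_cron_line "$LINE" | crontab -`.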
Advanced topics
Per-user assignment
The -assign option is an early prototype for the 'safety escalation' protocol. It assigns a WU to a given user (specified by integer ID). However, according to the AssignedWork page, such workunits are not validated or assimilated automatically. For this to happen, one can set workunit.need_validate=1 in the database.
Server status and restart
The remote servers do not restart automatically after a reboot. To check whether the server is alive, open the URL http://YOUR_BOINC_SERVER:8383/rboinc_cgi/boinc_retrieve_server.pl.
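The same check can be scripted, for example from a monitoring job. This is a minimal sketch assuming curl is installed and that a live server answers the status URL with an HTTP success code. The function name rboinc_server_alive is hypothetical.

```shell
# Hypothetical check: succeeds if the status script answers with HTTP 2xx
# within 30 seconds; --fail makes curl return an error on HTTP >= 400.
rboinc_server_alive() {
    curl --fail --silent --show-error --output /dev/null --max-time 30 \
        "${1:-http://YOUR_BOINC_SERVER:8383/rboinc_cgi/boinc_retrieve_server.pl}"
}
```

Usage: `rboinc_server_alive || echo "RemoteBoinc server is down" >&2`.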
If a restart is needed, follow these instructions:
ssh boinc@YOUR_BOINC_SERVER
cd remote/apache
/usr/sbin/httpd -f "$PWD/httpd.conf" -d . -k start
Dependencies
The client depends on the Perl modules XML::Simple, HTTP::DAV, and others. These should be found automatically if the client is run from the shared directory. The currently installed configurations are Perl 5.8.8 (Fedora 8) and Perl 5.10.0 (Fedora 9).
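When running the client outside the shared directory, you can check whether the required modules are available by asking perl to load each one. This is a sketch. The helper check_perl_modules is hypothetical, and only the two modules named above are listed in the usage line, since the full dependency list is not given here.

```shell
# Hypothetical helper: try to load each Perl module; report the missing ones
# on stderr and return non-zero if any could not be loaded.
check_perl_modules() {
    missing=0
    for m in "$@"; do
        if ! perl -M"$m" -e 1 2>/dev/null; then
            echo "missing Perl module: $m" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_perl_modules XML::Simple HTTP::DAV`.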
Acknowledgments
Work partially supported by the VPH-NoE.