Submitting a job to the cluster

Usage:

lqsub -j  
      -q  
      -c  
      [-p ] 
      [-g ] 
      [-m ] 
      [-n ] 
      [-s (send email notification)]

Job scripts

Script

Jobs are executed on the cluster using "job scripts".
A job script is a script or program that takes no arguments and has the execute permission set (chmod +x).
Commonly, this is a shell script that sets up the environment needed by your program and runs the program with the appropriate arguments and input.

An example:

#!/bin/sh
export PATH=$PATH:/path/to/myprogram

myprogram -options

# EOF

Job scripts are not limited to traditional shell scripts; any shebang (#!) line is supported.
This makes it possible to write job scripts in Python, Perl or any other scripting language, as long as the appropriate interpreter is installed on the nodes.

LUCI4HPC grid engine environment variables

The LUCI4HPC grid engine offers a set of environment variables accessible within the job scripts, which reflect submission options:

$TMPDIR
The temporary directory created by the grid engine on the node where the job is running. For jobs that run on more than one node, this directory is only created on the master node.
This is /tmp/lj./
$NUMCPUS
The number of CPUs specified during the submission with the -c argument.
$NUMGPUS
The number of GPUs specified during the submission with the -g argument.
If no GPU is requested, the value defaults to 0.
$NUMMEMORY
The amount of memory specified during the submission with the -m argument.
$NUMNODES
The number of nodes specified during the submission with the -n argument.
$MACHINEFILE
The machine file written by the grid engine. This is intended to be used for MPI jobs.
This is .j.hosts, in the same directory as your job script.
$JOBID
The ID of the job.
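
As a minimal sketch of how these variables might be used, the following job script reports the resources it was given and picks a scratch directory. The ":-" fallback values are an assumption added so the script can also be tried outside the grid engine, and describe_job is a hypothetical helper name, not part of LUCI4HPC.

```shell
#!/bin/sh
# Sketch of a job script that reads the grid engine variables.
# The ":-" fallbacks are assumptions for running outside the cluster.

describe_job() {
    # Print a one-line summary of the resources visible to the job.
    echo "job=${JOBID:-unknown} cpus=${NUMCPUS:-1} gpus=${NUMGPUS:-0} mem=${NUMMEMORY:-256} nodes=${NUMNODES:-1}"
}

# Use the per-job temporary directory when the grid engine provides one.
workdir="${TMPDIR:-/tmp}"

describe_job
echo "scratch directory: $workdir"
```

Inside a real job, the fallbacks should never be needed, since the grid engine sets each variable from the submission options.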

Basic job submission

Jobs are submitted to the cluster using "lqsub".
The basic required arguments are:
- -j
- -q
- -c
Do not specify a full path for the job script name: the directory from which you submit is assumed to be the directory that holds the job script.

Example:

lqsub -j job.sh -q simulations -c 12 
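
Because the job script must be executable and must be named without a path, it can help to check both before submitting. The following is a minimal sketch of such a check; check_job_script is a hypothetical helper, not part of LUCI4HPC, and the lqsub command is only printed, not run.

```shell
#!/bin/sh
# Sketch: pre-flight checks for a job script before calling lqsub.
# check_job_script is a hypothetical helper, not part of LUCI4HPC.

check_job_script() {
    script=$1
    # Reject names that contain a directory component.
    case $script in
        */*) echo "error: give the script name without a path" >&2; return 1 ;;
    esac
    # The script must exist in the current (submission) directory.
    [ -f "$script" ] || { echo "error: $script not found in $(pwd)" >&2; return 1; }
    # The script must have the execute permission set.
    [ -x "$script" ] || { echo "error: $script is not executable (chmod +x)" >&2; return 1; }
    return 0
}

# Example: only print the submission command if the checks pass.
if check_job_script job.sh 2>/dev/null; then
    echo "lqsub -j job.sh -q simulations -c 12"
fi
```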

Advanced arguments

Using GPUs

You can request GPU resources by specifying the -g argument.
If not specified, -g is assumed to be 0.

Example:
lqsub -j job.sh -q simulations -c 12 -g 1 

Reserving memory

You can reserve a specific amount of memory on a node by specifying the -m argument.
If not specified, -m is assumed to be 256 MB.

NOTE: This does not impose an actual memory limit. It is instead used to block other users from accessing resources on a node when a job requires so much memory that the node would otherwise be overloaded.

Example:

lqsub -j job.sh -q simulations -c 12 -m 32768
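
The examples here give -m in MB (an inference from the 256 MB default and the 32768 value above), so a small helper can convert from GB when composing a submission. The sketch below is illustrative only: gb_to_mb is a hypothetical function, it assumes 1 GB = 1024 MB, and it prints the lqsub command rather than running it.

```shell
#!/bin/sh
# gb_to_mb is a hypothetical helper; assumes 1 GB = 1024 MB.
gb_to_mb() {
    echo $(( $1 * 1024 ))
}

# Print (do not run) a submission that reserves 32 GB.
echo "lqsub -j job.sh -q simulations -c 12 -m $(gb_to_mb 32)"
```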

Choosing a parallel environment

The parallel environment is selected with the -p argument.

mpi: this generates a machine file that can be used with MPI programs. The machine file is .j.hosts in the same directory as the job script, and is also accessible via the $MACHINEFILE environment variable in the job script.

omp: this sets the OMP_NUM_THREADS environment variable to the number specified with the -c argument. This is useful for programs that use OpenMP.

none: this is the default and will do none of the above.
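
For the mpi environment, a job script would typically hand $MACHINEFILE to the MPI launcher. The following is a minimal sketch under several assumptions: that a launcher named mpirun accepting -np and -machinefile is installed on the nodes (Open MPI provides both options), that one MPI rank per requested CPU is wanted, and that myprogram is a placeholder. mpi_command is a hypothetical helper, and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: build an MPI launch command from the grid engine variables.
# Assumes the job was submitted with "-p mpi"; "myprogram" is a placeholder.

mpi_command() {
    # -c is per node when -n is given, so total ranks = CPUs * nodes.
    ranks=$(( ${NUMCPUS:-1} * ${NUMNODES:-1} ))
    echo "mpirun -np $ranks -machinefile ${MACHINEFILE:-hosts} myprogram"
}

# Print the command rather than running it, so the sketch is safe to try.
mpi_command
```

For the omp environment no such step should be needed, since OMP_NUM_THREADS is already set when the job script starts.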

Using more than one node

You can spread your job across more than one node with the -n argument. The -c, -g and -m options then apply per node.
The following example uses a total of 128 CPUs, 32 GPUs and 131072 MB of memory across 16 nodes with mpi.

lqsub -j job.sh -q simulations -c 8 -g 2 -m 8192 -p mpi -n 16
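
The totals quoted for this example follow from multiplying each per-node value by the number of nodes. As a quick check, the arithmetic can be reproduced in the shell; the variable names below are illustrative only.

```shell
#!/bin/sh
# Per-node values taken from the lqsub example above.
nodes=16
cpus_per_node=8
gpus_per_node=2
mem_per_node=8192   # MB

# Totals are the per-node values multiplied by the node count.
total_cpus=$(( nodes * cpus_per_node ))
total_gpus=$(( nodes * gpus_per_node ))
total_mem=$(( nodes * mem_per_node ))

echo "CPUs: $total_cpus  GPUs: $total_gpus  memory: $total_mem MB"
# prints: CPUs: 128  GPUs: 32  memory: 131072 MB
```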

E-mail notification

You can request an email notification on job completion by specifying the -s argument.
NOTE: You cannot specify an email address while submitting; this feature has to be configured by the administrator in advance.

Example:

lqsub -j job.sh -q simulations -c 12 -s

Output files

The LUCI4HPC grid engine generates three output files for every job.
stdout: the standard output of the job will be redirected to .j.out in the same directory as the job script.
stderr: the standard error of the job will be redirected to .j.err in the same directory as the job script.
host file: .j.hosts contains the hostname(s) of the node(s) the job has been assigned to. The content of this file varies depending on the parallel environment specified during the submission.


Current version: 1.0beta1
 
