
Commit 424480d

Add launching-apps section to docs

Port the "launching-apps" section from the OMPI docs over to PRRTE
since it specifically deals with prterun usage. Add some updates about
gridengine support courtesy of open-mpi/ompi#13450.

Signed-off-by: Ralph Castain <rhc@pmix.org>

1 parent d072f27

File tree

14 files changed: +1575 −1 lines changed

docs/Makefile.am

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@
 # Copyright (c) 2022-2023 Cisco Systems, Inc. All rights reserved.
 # Copyright (c) 2023 Jeffrey M. Squyres. All rights reserved.
 #
-# Copyright (c) 2023-2024 Nanook Consulting All rights reserved.
+# Copyright (c) 2023-2025 Nanook Consulting All rights reserved.
 # $COPYRIGHT$
 #
 # Additional copyrights may follow
@@ -39,6 +39,7 @@ RST_SOURCE_FILES = \
 	$(srcdir)/prrte-rst-content/*.rst \
 	$(srcdir)/placement/*.rst \
 	$(srcdir)/hosts/*.rst \
+	$(srcdir)/launching-apps/*.rst \
 	$(srcdir)/how-things-work/*.rst \
 	$(srcdir)/how-things-work/schedulers/*.rst \
 	$(srcdir)/developers/*.rst \

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Table of contents
    how-things-work/index
    hosts/index
    placement/index
+   launching-apps/index
    notifications
    session-directory
    developers/index

docs/launching-apps/gridengine.rst

Lines changed: 293 additions & 0 deletions
@@ -0,0 +1,293 @@
Launching with Grid Engine
==========================

PRRTE supports the family of run-time schedulers including the Sun
Grid Engine (SGE), Oracle Grid Engine (OGE), Grid Engine (GE), Son of
Grid Engine, Open Cluster Scheduler (OCS), Gridware Cluster Scheduler
(GCS), and others.

This documentation will collectively refer to all of them as "Grid
Engine", unless referring to a specific flavor of the Grid Engine
family.

Verify Grid Engine support
--------------------------

.. important:: To build Grid Engine support in PRRTE, you will need
   to explicitly request the SGE support with the ``--with-sge``
   command line switch to PRRTE's ``configure`` script.
To verify that support for Grid Engine is configured into your PRRTE
installation, run ``prte_info`` as shown below and look for
``gridengine``:

.. code-block::

   shell$ prte_info | grep gridengine
                MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3)
Launching
---------

When Grid Engine support is included, PRRTE will automatically
detect when it is running inside a Grid Engine job and will just "do
the Right Thing."

Specifically, if you execute a ``prterun`` command in a Grid Engine
job, it will automatically use the Grid Engine mechanisms to launch
and kill processes. There is no need to specify what nodes to run on
|mdash| PRRTE will obtain this information directly from Grid
Engine and default to a number of processes equal to the slot count
specified. For example, this will run 4 application processes on the
nodes that were allocated by Grid Engine:

.. code-block:: sh

   # Get the environment variables for Grid Engine
   # (assuming Grid Engine is installed at /opt/sge and $SGE_CELL
   # is 'default' in your environment)
   shell$ . /opt/sge/default/common/settings.sh

   # Allocate a Grid Engine interactive job with 4 slots from a
   # parallel environment (PE) named 'foo' and run a 4-process job
   shell$ qrsh -pe foo 4 -b y prterun -n 4 mpi-hello-world
There are also other ways to submit jobs under Grid Engine:

.. code-block:: sh

   # Submit a batch job with the 'prterun' command embedded in a script
   shell$ qsub -pe foo 4 my_prterun_job.csh

   # Submit a Grid Engine job that runs prterun, all in one line
   shell$ qrsh -V -pe foo 4 prterun hostname

   # Use qstat(1) to show the status of Grid Engine jobs and queues
   shell$ qstat -f
For this to work, be sure you have a Parallel Environment (PE)
defined for submitting parallel jobs. You don't have to name your
PE "foo"; the following example shows what a PE named "foo" might
look like:

.. code-block::

   shell$ qconf -sp foo
   pe_name            foo
   slots              99999
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    NONE
   stop_proc_args     NONE
   allocation_rule    $fill_up
   control_slaves     TRUE
   job_is_first_task  FALSE
   urgency_slots      min
   accounting_summary FALSE
   qsort_args         NONE
.. note:: ``qsort_args`` is necessary with the Son of Grid Engine
   distribution, version 8.1.1 and later, and probably only applicable
   to it.

.. note:: For very old versions of Sun Grid Engine, omit
   ``accounting_summary`` too.

.. note:: For Open Cluster Scheduler / Gridware Cluster Scheduler, it is
   necessary to set ``ign_sreq_on_mhost`` (ignore slave resource requests
   on the master node) to ``FALSE``.
You may want to alter other parameters, but the important one is
``control_slaves``, specifying that the environment has "tight
integration". Note also the lack of a start or stop procedure. The
tight integration means that ``prterun`` automatically picks up the
slot count to use as a default in place of the ``-n`` argument, picks
up a host file, spawns remote processes via ``qrsh`` so that Grid
Engine can control and monitor them, and creates and destroys a
per-job temporary directory (``$TMPDIR``), in which PRRTE's directory
will be created (by default).

Be sure the queue will make use of the PE that you specified:

.. code-block::

   shell$ qconf -sq all.q
   [...snipped...]
   pe_list               make cre foo
   [...snipped...]
To determine whether the Grid Engine parallel job is successfully
launched to the remote nodes, you can pass the MCA parameter
``--prtemca plm_base_verbose 1`` to ``prterun``.

This will add a ``-verbose`` flag to the ``qrsh -inherit`` command
that is used to send parallel tasks to the remote Grid Engine
execution hosts, and will show whether the connections to the remote
hosts were established successfully or not.
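For example (a sketch, reusing the hypothetical PE named "foo" from
above):

.. code-block:: sh

   # Launch with qrsh connection verbosity enabled
   shell$ qrsh -pe foo 4 -b y prterun --prtemca plm_base_verbose 1 -n 4 mpi-hello-world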
Various Grid Engine documentation with pointers to more information
used to be available at `the Son of GridEngine site
<http://arc.liv.ac.uk/sge/>`_, and configuration instructions were
found at `the Son of GridEngine configuration how-to site
<http://arc.liv.ac.uk/SGE/howto/sge-configs.html>`_. This may no
longer be true.

An actively developed (2024, 2025) open source successor of Sun Grid
Engine is `Open Cluster Scheduler
<https://github.com/hpc-gridware/clusterscheduler>`_. It maintains
backward compatibility with SGE and provides many new features. An
MPI parallel environment setup for Open MPI is available in `the Open
Cluster Scheduler GitHub repository
<https://github.com/hpc-gridware/clusterscheduler/tree/master/source/dist/mpi/openmpi>`_.
Grid Engine tight integration support of the ``qsub -notify`` flag
------------------------------------------------------------------

If you are running SGE 6.2 Update 3 or later, then the ``-notify``
flag is supported. If you are running earlier versions, then the
``-notify`` flag will not work and using it will cause the job to be
killed.

To use ``-notify``, one has to be careful. First, let us review what
``-notify`` does. Here is an excerpt from the qsub man page for the
``-notify`` flag:
   The ``-notify`` flag, when set, causes Sun Grid Engine to send
   warning signals to a running job prior to sending the signals
   themselves. If a SIGSTOP is pending, the job will receive a SIGUSR1
   several seconds before the SIGSTOP. If a SIGKILL is pending, the
   job will receive a SIGUSR2 several seconds before the SIGKILL. The
   amount of time delay is controlled by the notify parameter in each
   queue configuration.

Let us assume the reason you want to use the ``-notify`` flag is to
get the SIGUSR1 signal prior to getting the SIGTSTP signal. PRRTE
forwards some signals by default, but others need to be specifically
requested. The following MCA parameter controls this behavior:
.. code-block::

   prte_ess_base_forward_signals: Comma-delimited list of additional
   signals (names or integers) to forward to application processes
   ["none" => forward nothing]. Signals provided by default include
   SIGTSTP, SIGUSR1, SIGUSR2, SIGABRT, SIGALRM, and SIGCONT
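Since SIGUSR1 and SIGUSR2 are already in the default forwarding set,
no extra parameter is needed for the ``-notify`` case. If you did need
an additional signal forwarded, it could be requested on the command
line; the signal name here (SIGURG) is purely a hypothetical
illustration:

.. code-block:: sh

   # Hypothetical: forward SIGURG in addition to the default signal set
   shell$ prterun --prtemca prte_ess_base_forward_signals SIGURG -n 16 mpi-hello-world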
Within that constraint, something like this batch script can be used:

.. code-block:: sh

   #! /bin/bash
   #$ -S /bin/bash
   #$ -V
   #$ -cwd
   #$ -N Job1
   #$ -pe foo 16
   #$ -j y
   #$ -l h_rt=00:20:00
   prterun -n 16 mpi-hello-world
However, one has to make one of two changes to this script for things
to work properly. By default, a SIGUSR1 signal will kill a shell
script, so we have to make sure that does not happen. Here is one way
to handle it:

.. code-block:: sh

   #! /bin/bash
   #$ -S /bin/bash
   #$ -V
   #$ -cwd
   #$ -N Job1
   #$ -pe foo 16
   #$ -j y
   #$ -l h_rt=00:20:00
   exec prterun -n 16 mpi-hello-world
Alternatively, one can catch the signals in the script instead of
doing an ``exec`` on the ``prterun``:

.. code-block:: sh

   #! /bin/bash
   #$ -S /bin/bash
   #$ -V
   #$ -cwd
   #$ -N Job1
   #$ -pe foo 16
   #$ -j y
   #$ -l h_rt=00:20:00

   function sigusr1handler()
   {
       echo "SIGUSR1 caught by shell script" 1>&2
   }

   function sigusr2handler()
   {
       echo "SIGUSR2 caught by shell script" 1>&2
   }

   trap sigusr1handler SIGUSR1
   trap sigusr2handler SIGUSR2

   prterun -n 16 mpi-hello-world
Grid Engine job suspend / resume support
----------------------------------------

To suspend the job, you send a SIGTSTP (not SIGSTOP) signal to
``prterun``. ``prterun`` will catch this signal and forward it to
``mpi-hello-world`` as a SIGSTOP signal. To resume the job, you send
a SIGCONT signal to ``prterun``, which will be caught and forwarded
to ``mpi-hello-world``.

Here is an example on Solaris:

.. code-block:: sh

   shell$ prterun -n 2 mpi-hello-world

In another window, we suspend and continue the job:
.. code-block:: sh

   shell$ prstat -p 15301,15303,15305
      PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
    15305 rolfv     158M   22M cpu1     0    0   0:00:21 5.9% mpi-hello-world/1
    15303 rolfv     158M   22M cpu2     0    0   0:00:21 5.9% mpi-hello-world/1
    15301 rolfv    8128K 5144K sleep   59    0   0:00:00 0.0% prterun/1

   shell$ kill -TSTP 15301
   shell$ prstat -p 15301,15303,15305
      PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
    15303 rolfv     158M   22M stop    30    0   0:01:44  21% mpi-hello-world/1
    15305 rolfv     158M   22M stop    20    0   0:01:44  21% mpi-hello-world/1
    15301 rolfv    8128K 5144K sleep   59    0   0:00:00 0.0% prterun/1

   shell$ kill -CONT 15301
   shell$ prstat -p 15301,15303,15305
      PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
    15305 rolfv     158M   22M cpu1     0    0   0:02:06  17% mpi-hello-world/1
    15303 rolfv     158M   22M cpu3     0    0   0:02:06  17% mpi-hello-world/1
    15301 rolfv    8128K 5144K sleep   59    0   0:00:00 0.0% prterun/1

Note that all this does is stop the ``mpi-hello-world`` processes. It
does not, for example, free any pinned memory when the job is in the
suspended state.
To get this to work under the Grid Engine environment, you have to
change the ``suspend_method`` entry in the queue: it has to be set to
SIGTSTP. Here is an example of what the queue configuration should
look like:

.. code-block:: sh

   shell$ qconf -sq all.q
   qname                 all.q
   [...snipped...]
   starter_method        NONE
   suspend_method        SIGTSTP
   resume_method         NONE

Note that if you need to suspend other types of jobs with SIGSTOP
(instead of SIGTSTP) in this queue, then you need to provide a script
that implements the correct signals for each job type.
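As a sketch of such a script (the path, the use of ``$job_pid``, and
the process-matching logic are all hypothetical illustrations, not a
tested recipe):

.. code-block:: sh

   #!/bin/sh
   # Hypothetical suspend_method wrapper, e.g. installed as
   # /opt/sge/scripts/suspend.sh and configured in the queue as:
   #   suspend_method  /opt/sge/scripts/suspend.sh $job_pid
   # Send SIGTSTP to prterun-based jobs (so PRRTE forwards SIGSTOP to
   # the application processes); send plain SIGSTOP to anything else.
   pid="$1"
   if ps -p "$pid" -o comm= | grep -q prterun; then
       kill -TSTP "$pid"
   else
       kill -STOP "$pid"
   fi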

docs/launching-apps/index.rst

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
.. _label-running-applications:

Launching applications
======================

PRRTE can launch processes in a wide variety of environments,
but they can generally be broken down into two categories:

#. Scheduled environments: these are systems where a resource manager
   and/or scheduler is used to control access to the compute nodes.
   Popular resource managers include Slurm, PBS Pro/Torque, and LSF.
#. Non-scheduled environments: these are systems where resource
   managers are not used. Launches are typically local (e.g., on a
   single laptop or workstation) or via ``ssh`` (e.g., across a small
   number of nodes).

PRRTE provides two commands for starting applications:

#. ``prun`` - submits the specified application to an existing
   persistent DVM for execution. The DVM continues running once the
   application has completed. The ``prun`` command remains active
   until the application completes; all application and error output
   flows through ``prun``.
#. ``prterun`` - starts a DVM instance and submits the specified
   application to it for execution. The DVM is terminated once the
   application completes; all application and error output flows
   through ``prterun``.

The rest of this section generally refers only to ``prterun``; the
same discussion also applies to ``prun``, as the command line syntax
is identical.
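As a sketch of the difference (assuming a working default launch
environment; application names are illustrative):

.. code-block:: sh

   # Persistent DVM: start it once, run multiple jobs, then shut it down
   shell$ prte --daemonize
   shell$ prun -n 4 mpi-hello-world
   shell$ prun -n 2 mpi-hello-world
   shell$ pterm

   # One-shot: prterun starts a DVM, runs the job, and tears it down
   shell$ prterun -n 4 mpi-hello-world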
.. toctree::
   :maxdepth: 1

   quickstart
   prerequisites
   scheduling

   localhost
   ssh
   slurm
   lsf
   tm
   gridengine

   unusual
   troubleshooting

docs/launching-apps/localhost.rst

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
Launching only on the local node
================================

It is common to develop applications on a single workstation or
laptop, and then move to a larger parallel / HPC environment once the
application is ready.

PRRTE supports running multi-process jobs on a single machine. In
such cases, you can simply omit any hostfile or list of remote hosts,
and just specify the number of processes to launch. For example:

.. code-block:: sh

   shell$ prterun -n 6 mpi-hello-world
   Hello world, I am 0 of 6 (running on my-laptop)
   Hello world, I am 1 of 6 (running on my-laptop)
   ...
   Hello world, I am 5 of 6 (running on my-laptop)

If you do not specify the ``-n`` option, ``prterun`` will default to
launching as many processes as there are processor cores (not
hyperthreads) on the machine.
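For example, on a hypothetical 4-core laptop (host name and output
shown for illustration only):

.. code-block:: sh

   shell$ prterun mpi-hello-world
   Hello world, I am 0 of 4 (running on my-laptop)
   Hello world, I am 1 of 4 (running on my-laptop)
   Hello world, I am 2 of 4 (running on my-laptop)
   Hello world, I am 3 of 4 (running on my-laptop)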
