From 15b426c9b1ffea02dc99cfd97373b74bbcd5251d Mon Sep 17 00:00:00 2001 From: hizv <18361766+hizv@users.noreply.github.com> Date: Wed, 14 Feb 2024 15:56:24 +0530 Subject: [PATCH 1/2] Use ++n instead of +p in charmrun examples --- README.md | 4 ++-- doc/ampi/02-building.rst | 4 ++-- doc/ampi/04-extensions.rst | 4 ++-- doc/ampi/05-examples.rst | 34 ++++++++++++++++----------------- doc/charisma/manual.rst | 4 ++-- doc/charm++/manual.rst | 28 +++++++++++++-------------- doc/faq/manual.rst | 2 +- doc/libraries/manual.rst | 2 +- doc/pose/manual.rst | 4 ++-- examples/ParFUM/simple2D/README | 2 +- 10 files changed, 44 insertions(+), 44 deletions(-) diff --git a/README.md b/README.md index 6ce3080e32..01b96ef151 100644 --- a/README.md +++ b/README.md @@ -272,7 +272,7 @@ executable named `nqueen`. Following the previous example, to run the program on two processors, type - $ ./charmrun +p2 ./nqueen 12 6 + $ ./charmrun ++n 2 ./nqueen 12 6 This should run for a few seconds, and print out: `There are 14200 Solutions to 12 queens. Time=0.109440 End time=0.112752` @@ -307,7 +307,7 @@ want to run program on only one machine, for example, your laptop. This can save you all the hassle of setting up ssh daemons. To use this option, just type: - $ ./charmrun ++local ./nqueen 12 100 +p2 + $ ./charmrun ++local ./nqueen 12 100 ++n 2 However, for best performance, you should launch one node program per processor. diff --git a/doc/ampi/02-building.rst b/doc/ampi/02-building.rst index 1f9374f194..d3d6b964de 100644 --- a/doc/ampi/02-building.rst +++ b/doc/ampi/02-building.rst @@ -175,7 +175,7 @@ arguments. A typical invocation of an AMPI program ``pgm`` with .. code-block:: bash - $ ./charmrun +p16 ./pgm +vp64 + $ ./charmrun ++n 16 ./pgm +vp64 Here, the AMPI program ``pgm`` is run on 16 physical processors with 64 total virtual ranks (which will be mapped 4 per processor initially). @@ -189,7 +189,7 @@ example: .. code-block:: bash - $ ./charmrun +p16 ./pgm +vp128 +tcharm_stacksize 32K +balancer RefineLB + $ ./charmrun ++n 16 ./pgm +vp128 +tcharm_stacksize 32K +balancer RefineLB Running with ampirun ~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/ampi/04-extensions.rst b/doc/ampi/04-extensions.rst index a69a38e4d7..aaae798648 100644 --- a/doc/ampi/04-extensions.rst +++ b/doc/ampi/04-extensions.rst @@ -566,7 +566,7 @@ of the AMPI program with some additional command line options. .. code-block:: bash - $ ./charmrun ./pgm +p4 +vp4 +msgLogWrite +msgLogRank 2 +msgLogFilename "msg2.log" + $ ./charmrun ./pgm ++n 4 +vp4 +msgLogWrite +msgLogRank 2 +msgLogFilename "msg2.log" In the above example, a parallel run with 4 worker threads and 4 AMPI ranks will be executed, and the changes in the MPI environment of worker @@ -574,7 +574,7 @@ thread 2 (also rank 2, starting from 0) will get logged into diskfile "msg2.log". Unlike the first run, the re-run is a sequential program, so it is not -invoked by charmrun (and omitting charmrun options like +p4 and +vp4), +invoked by charmrun (and omitting charmrun options like ++n 4 and +vp4), and additional command line options are required as well. .. code-block:: bash diff --git a/doc/ampi/05-examples.rst b/doc/ampi/05-examples.rst index 811da30a77..969bde55b7 100644 --- a/doc/ampi/05-examples.rst +++ b/doc/ampi/05-examples.rst @@ -31,7 +31,7 @@ MiniFE program. - Refer to the ``README`` file on how to run the program. 
For example: - ``./charmrun +p4 ./miniFE.x nx=30 ny=30 nz=30 +vp32`` + ``./charmrun ++n 4 ./miniFE.x nx=30 ny=30 nz=30 +vp32`` MiniMD v2.0 ~~~~~~~~~~~ @@ -44,7 +44,7 @@ MiniMD v2.0 execute ``make ampi`` to build the program. - Refer to the ``README`` file on how to run the program. For example: - ``./charmrun +p4 ./miniMD_ampi +vp32`` + ``./charmrun ++n 4 ./miniMD_ampi +vp32`` CoMD v1.1 ~~~~~~~~~ @@ -72,7 +72,7 @@ MiniXYCE v1.0 ``test/``. - Example run command: - ``./charmrun +p3 ./miniXyce.x +vp3 -circuit ../tests/cir1.net -t_start 1e-6 -pf params.txt`` + ``./charmrun ++n 3 ./miniXyce.x +vp3 -circuit ../tests/cir1.net -t_start 1e-6 -pf params.txt`` HPCCG v1.0 ~~~~~~~~~~ @@ -84,7 +84,7 @@ HPCCG v1.0 AMPI compilers. - Run with a command such as: - ``./charmrun +p2 ./test_HPCCG 20 30 10 +vp16`` + ``./charmrun ++n 2 ./test_HPCCG 20 30 10 +vp16`` MiniAMR v1.0 ~~~~~~~~~~~~ @@ -140,7 +140,7 @@ Lassen v1.0 - No changes necessary to enable AMPI virtualization. Requires some C++11 support. Set ``AMPIDIR`` in Makefile and ``make``. Run with: - ``./charmrun +p4 ./lassen_mpi +vp8 default 2 2 2 50 50 50`` + ``./charmrun ++n 4 ./lassen_mpi +vp8 default 2 2 2 50 50 50`` Kripke v1.1 ~~~~~~~~~~~ @@ -167,7 +167,7 @@ Kripke v1.1 .. code-block:: bash - $ ./charmrun +p8 ./src/tools/kripke +vp8 --zones 64,64,64 --procs 2,2,2 --nest ZDG + $ ./charmrun ++n 8 ./src/tools/kripke +vp8 --zones 64,64,64 --procs 2,2,2 --nest ZDG MCB v1.0.3 (2013) ~~~~~~~~~~~~~~~~~ @@ -181,7 +181,7 @@ MCB v1.0.3 (2013) .. code-block:: bash - $ OMP_NUM_THREADS=1 ./charmrun +p4 ./../src/MCBenchmark.exe --weakScaling + $ OMP_NUM_THREADS=1 ./charmrun ++n 4 ./../src/MCBenchmark.exe --weakScaling --distributedSource --nCores=1 --numParticles=20000 --multiSigma --nThreadCore=1 +vp16 .. _not-yet-ampi-zed-reason-1: @@ -228,7 +228,7 @@ SNAP v1.01 (C version) while the C version works out of the box on all platforms. - Edit the Makefile for AMPI compiler paths and run with: - ``./charmrun +p4 ./snap +vp4 --fi center_src/fin01 --fo center_src/fout01`` + ``./charmrun ++n 4 ./snap +vp4 --fi center_src/fin01 --fo center_src/fout01`` Sweep3D ~~~~~~~ @@ -248,7 +248,7 @@ Sweep3D - Modify file ``input`` to set the different parameters. Refer to file ``README`` on how to change those parameters. Run with: - ``./charmrun ./sweep3d.mpi +p8 +vp16`` + ``./charmrun ./sweep3d.mpi ++n 8 +vp16`` PENNANT v0.8 ~~~~~~~~~~~~ @@ -264,7 +264,7 @@ PENNANT v0.8 - For PENNANT-v0.8, point CC in Makefile to AMPICC and just ’make’. Run with the provided input files, such as: - ``./charmrun +p2 ./build/pennant +vp8 test/noh/noh.pnt`` + ``./charmrun ++n 2 ./build/pennant +vp8 test/noh/noh.pnt`` Benchmarks ---------- @@ -307,7 +307,7 @@ NAS Parallel Benchmarks (NPB 3.3) *cg.256.C* will appear in the *CG* and ``bin/`` directories. To run the particular benchmark, you must follow the standard procedure of running AMPI programs: - ``./charmrun ./cg.C.256 +p64 +vp256 ++nodelist nodelist`` + ``./charmrun ./cg.C.256 ++n 64 +vp256 ++nodelist nodelist`` NAS PB Multi-Zone Version (NPB-MZ 3.3) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -340,7 +340,7 @@ NAS PB Multi-Zone Version (NPB-MZ 3.3) directory. In the previous example, a file *bt-mz.256.C* will be created in the ``bin`` directory. To run the particular benchmark, you must follow the standard procedure of running AMPI programs: - ``./charmrun ./bt-mz.C.256 +p64 +vp256 ++nodelist nodelist`` + ``./charmrun ./bt-mz.C.256 ++n 64 +vp256 ++nodelist nodelist`` HPCG v3.0 ~~~~~~~~~ @@ -352,7 +352,7 @@ HPCG v3.0 - No AMPI-ization needed. 
To build, modify ``setup/Make.AMPI`` for compiler paths, do ``mkdir build && cd build && configure ../setup/Make.AMPI && make``. - To run, do ``./charmrun +p16 ./bin/xhpcg +vp64`` + To run, do ``./charmrun ++n 16 ./bin/xhpcg +vp64`` Intel Parallel Research Kernels (PRK) v2.16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -408,7 +408,7 @@ HYPRE-2.11.1 ``LIBFLAGS``. Then run ``make``. - To run the ``new_ij`` test, run: - ``./charmrun +p64 ./new_ij -n 128 128 128 -P 4 4 4 -intertype 6 -tol 1e-8 -CF 0 -solver 61 -agg_nl 1 27pt -Pmx 6 -ns 4 -mu 1 -hmis -rlx 13 +vp64`` + ``./charmrun ++n 64 ./new_ij -n 128 128 128 -P 4 4 4 -intertype 6 -tol 1e-8 -CF 0 -solver 61 -agg_nl 1 27pt -Pmx 6 -ns 4 -mu 1 -hmis -rlx 13 +vp64`` MFEM-3.2 ~~~~~~~~ @@ -440,7 +440,7 @@ MFEM-3.2 - ``make parallel MFEM_USE_MPI=YES MPICXX=~/charm/bin/ampicxx HYPRE_DIR=~/hypre-2.11.1/src/hypre METIS_DIR=~/metis-4.0.3`` - To run an example, do - ``./charmrun +p4 ./ex15p -m ../data/amr-quad.mesh +vp16``. You may + ``./charmrun ++n 4 ./ex15p -m ../data/amr-quad.mesh +vp16``. You may want to add the runtime options ``-no-vis`` and ``-no-visit`` to speed things up. @@ -464,10 +464,10 @@ XBraid-1.1 HYPRE in their Makefiles and ``make``. - To run an example, do - ``./charmrun +p2 ./ex-02 -pgrid 1 1 8 -ml 15 -nt 128 -nx 33 33 -mi 100 +vp8 ++local``. + ``./charmrun ++n 2 ./ex-02 -pgrid 1 1 8 -ml 15 -nt 128 -nx 33 33 -mi 100 +vp8 ++local``. - To run a driver, do - ``./charmrun +p4 ./drive-03 -pgrid 2 2 2 2 -nl 32 32 32 -nt 16 -ml 15 +vp16 ++local`` + ``./charmrun ++n 4 ./drive-03 -pgrid 2 2 2 2 -nl 32 32 32 -nt 16 -ml 15 +vp16 ++local`` Other AMPI codes ---------------- diff --git a/doc/charisma/manual.rst b/doc/charisma/manual.rst index 53bb4d3a20..0bf8ea4024 100644 --- a/doc/charisma/manual.rst +++ b/doc/charisma/manual.rst @@ -483,7 +483,7 @@ Turing Cluster, use the customized job launcher ``rjq`` or ``rj``). .. code-block:: bash - $ charmrun pgm +p4 + $ charmrun pgm ++n 4 Please refer to Charm++'s manual and tutorial for more details of building and running a Charm++ program. @@ -619,7 +619,7 @@ instance, the following command uses ``RefineLB``. .. code-block:: bash - $ ./charmrun ./pgm +p16 +balancer RefineLB + $ ./charmrun ./pgm ++n 16 +balancer RefineLB .. _secsparse: diff --git a/doc/charm++/manual.rst b/doc/charm++/manual.rst index e78eaf4194..335583c312 100644 --- a/doc/charm++/manual.rst +++ b/doc/charm++/manual.rst @@ -8452,7 +8452,7 @@ mode. For example: .. code-block:: bash - $ ./charmrun hello +p4 +restart log + $ ./charmrun hello ++n 4 +restart log Restarting is the reverse process of checkpointing. Charm++ allows restarting the old checkpoint on a different number of physical @@ -8481,7 +8481,7 @@ After a failure, the system may contain fewer or more processors. Once the failed components have been repaired, some processors may become available again. Therefore, the user may need the flexibility to restart on a different number of processors than in the checkpointing phase. -This is allowable by giving a different ``+pN`` option at runtime. One +This is allowable by giving a different ``++n N`` option at runtime. One thing to note is that the new load distribution might differ from the previous one at checkpoint time, so running a load balancer (see Section :numref:`loadbalancing`) after restart is suggested. @@ -8618,9 +8618,9 @@ it stores them in the local disk. The checkpoint files are named Users can pass the runtime option ``+ftc_disk`` to activate this mode. For example: -.. code-block:: c++ +.. 
code-block:: bash - ./charmrun hello +p8 +ftc_disk + ./charmrun hello ++n 8 +ftc_disk Building Instructions ^^^^^^^^^^^^^^^^^^^^^ @@ -8629,7 +8629,7 @@ In order to have the double local-storage checkpoint/restart functionality available, the parameter ``syncft`` must be provided at build time: -.. code-block:: c++ +.. code-block:: bash ./build charm++ netlrts-linux-x86_64 syncft @@ -8656,7 +8656,7 @@ name: .. code-block:: bash - $ ./charmrun hello +p8 +kill_file + $ ./charmrun hello ++n 8 +kill_file An example of this usage can be found in the ``syncfttest`` targets in ``tests/charm++/jacobi3d``. @@ -9967,7 +9967,7 @@ program .. code-block:: bash - $ ./charmrun pgm +p1000 +balancer RandCentLB +LBDump 2 +LBDumpSteps 4 +LBDumpFile lbsim.dat + $ ./charmrun pgm ++n 1000 +balancer RandCentLB +LBDump 2 +LBDumpSteps 4 +LBDumpFile lbsim.dat This will collect data on files lbsim.dat.2,3,4,5. We can use this data to analyze the performance of various centralized strategies using: @@ -11330,7 +11330,7 @@ used, and a port number to listen the shrink/expand commands: .. code-block:: bash - $ ./charmrun +p4 ./jacobi2d 200 20 +balancer GreedyLB ++nodelist ./mynodelistfile ++server ++server-port 1234 + $ ./charmrun ++n 4 ./jacobi2d 200 20 +balancer GreedyLB ++nodelist ./mynodelistfile ++server ++server-port 1234 The CCS client to send shrink/expand commands needs to specify the hostname, port number, the old(current) number of processor and the @@ -11988,7 +11988,7 @@ To run a Charm++ program named “pgm” on four processors, type: .. code-block:: bash - $ charmrun pgm +p4 + $ charmrun pgm ++n 4 Execution on platforms which use platform specific launchers, (i.e., **aprun**, **ibrun**), can proceed without charmrun, or charmrun can be @@ -12122,7 +12122,7 @@ advanced options are available: ``++p N`` Total number of processing elements to create. In SMP mode, this refers to worker threads (where - :math:`\texttt{n} * \texttt{ppn} = \texttt{p}`), otherwise it refers + :math:`\texttt{n} \times \texttt{ppn} = \texttt{p}`), otherwise it refers to processes (:math:`\texttt{n} = \texttt{p}`). The default is 1. Use of ``++p`` is discouraged in favor of ``++processPer*`` (and ``++oneWthPer*`` in SMP mode) where desirable, or ``++n`` (and @@ -12230,7 +12230,7 @@ The remaining options cover details of process launch and connectivity: .. code-block:: bash - $ ./charmrun +p4 ./pgm 100 2 3 ++runscript ./set_env_script + $ ./charmrun ++n 4 ./pgm 100 2 3 ++runscript ./set_env_script In this case, ``set_env_script`` is invoked on each node. **Note:** When this is provided, ``charmrun`` will not invoke the program directly, instead only @@ -12526,7 +12526,7 @@ nodes than there are hosts in the group, it will reuse hosts. Thus, .. code-block:: bash - $ charmrun pgm ++nodegroup kale-sun +p6 + $ charmrun pgm ++nodegroup kale-sun ++n 6 uses hosts (charm, dp, grace, dagger, charm, dp) respectively as nodes (0, 1, 2, 3, 4, 5). @@ -12536,7 +12536,7 @@ Thus, if one specifies .. code-block:: bash - $ charmrun pgm +p4 + $ charmrun pgm ++n 4 it will use “localhost” four times. “localhost” is a Unix trick; it always find a name for whatever machine you’re on. @@ -13237,7 +13237,7 @@ of the above incantation, for various kinds of process launchers: .. code-block:: bash - $ ./charmrun +p2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments... + $ ./charmrun ++n 2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments... 
$ aprun -n 2 `which valgrind` --log-file=VG.out.%p --trace-children=yes ./application_name ...application arguments... The first adaptation is to use :literal:`\`which valgrind\`` to obtain a diff --git a/doc/faq/manual.rst b/doc/faq/manual.rst index 8f3bd0287c..9a0f2c1013 100644 --- a/doc/faq/manual.rst +++ b/doc/faq/manual.rst @@ -204,7 +204,7 @@ following command: .. code-block:: bash - ./charmrun +p14 ./pgm ++ppn 7 +commap 0 +pemap 1-7 + ./charmrun ++n 2 ./pgm ++ppn 7 +commap 0 +pemap 1-7 See :ref:`sec-smpopts` of the Charm++ manual for more information. diff --git a/doc/libraries/manual.rst b/doc/libraries/manual.rst index 4862ad0104..5ae7dd3464 100644 --- a/doc/libraries/manual.rst +++ b/doc/libraries/manual.rst @@ -36,7 +36,7 @@ client is a small Java program. A typical use of this is: cd charm/examples/charm++/wave2d make - ./charmrun ./wave2d +p2 ++server ++server-port 1234 + ./charmrun ./wave2d ++n 2 ++server ++server-port 1234 ~/ccs_tools/bin/liveViz localhost 1234 Use git to obtain a copy of ccs_tools (prior to using liveViz) and build diff --git a/doc/pose/manual.rst b/doc/pose/manual.rst index a9c3aef887..0d14a7bb20 100644 --- a/doc/pose/manual.rst +++ b/doc/pose/manual.rst @@ -128,12 +128,12 @@ Running ------- To run the program in parallel, a ``charmrun`` executable was created by -``charmc``. The flag ``+p`` is used to specify a number of processors to +``charmc``. The flag ``++n`` is used to specify a number of processors to run the program on. For example: .. code-block:: bash - $ ./charmrun pgm +p4 + $ ./charmrun pgm ++n 4 This runs the executable ``pgm`` on 4 processors. For more information on how to use ``charmrun`` and set up your environment for parallel diff --git a/examples/ParFUM/simple2D/README b/examples/ParFUM/simple2D/README index 4070a1318f..6069859541 100644 --- a/examples/ParFUM/simple2D/README +++ b/examples/ParFUM/simple2D/README @@ -34,7 +34,7 @@ OUTPUT This program exports its solution data via NetFEM. You can run the program so NetFEM will connect to it like: - ./charmrun ./pgm ++server ++server-port 1234 +p4 + ./charmrun ./pgm ++server ++server-port 1234 ++n 4 You'd then connect the NetFEM client to yourhostname:1234. From 44476f9361cc8470dc8a80bcc80a0b9a52169c0a Mon Sep 17 00:00:00 2001 From: hizv <18361766+hizv@users.noreply.github.com> Date: Wed, 14 Feb 2024 15:58:45 +0530 Subject: [PATCH 2/2] Discuss ++n in SMP options section --- doc/charm++/manual.rst | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/doc/charm++/manual.rst b/doc/charm++/manual.rst index 335583c312..75f348efe5 100644 --- a/doc/charm++/manual.rst +++ b/doc/charm++/manual.rst @@ -12400,20 +12400,29 @@ like: $ ./charmrun ++ppn 3 +p6 +pemap 1-3,5-7 +commap 0,4 ./app -This will create two logical nodes/OS processes (2 = 6 PEs/3 PEs per -node), each with three worker threads/PEs (``++ppn 3``). The worker -threads/PEs will be mapped thusly: PE 0 to core 1, PE 1 to core 2, PE 2 -to core 3 and PE 4 to core 5, PE 5 to core 6, and PE 6 to core 7. -PEs/worker threads 0-2 compromise the first logical node and 3-5 are the -second logical node. Additionally, the communication threads will be -mapped to core 0, for the communication thread of the first logical -node, and to core 4, for the communication thread of the second logical -node. - Please keep in mind that ``+p`` always specifies the total number of PEs created by Charm++, regardless of mode (the same number as returned by -``CkNumPes()``). 
The ``+p`` option does not include the communication
-thread, there will always be exactly one of those per logical node.
+``CkNumPes()``). So this will create two logical nodes/OS processes
+(2 = 6 PEs/3 PEs per node), each with three worker threads/PEs
+(``++ppn 3``).
+
+We recommend using ``++n``, especially with ``++ppn``. Recall
+that :math:`\texttt{n} \times \texttt{ppn} = \texttt{p}`. So the example becomes:
+
+.. code-block:: bash
+
+   $ ./charmrun ++ppn 3 ++n 2 +pemap 1-3,5-7 +commap 0,4 ./app
+
+The worker threads/PEs will be mapped as follows: PE 0 to
+core 1, PE 1 to core 2, PE 2 to core 3, and PE 3 to core 5, PE 4 to
+core 6, and PE 5 to core 7 (``+pemap``). PEs/worker threads 0-2
+comprise the first logical node and 3-5 are the second logical node.
+Additionally, the communication threads will be mapped to core 0, for
+the communication thread of the first logical node, and to core 4,
+for the communication thread of the second logical node (``+commap``).
+
+Note that the ``+p`` option does not include the communication
+thread. There will always be exactly one of those per logical node.
 
 Multicore Options
 ^^^^^^^^^^^^^^^^^