Update utilities.rst #1077

Merged: 8 commits, Sep 6, 2023
6 changes: 3 additions & 3 deletions docs/advanced_installation.rst
@@ -45,7 +45,7 @@ Further recommendations for selected HPC systems are given in the

will use the ``mpicc`` compiler wrapper on your PATH to identify the MPI library.
To specify a different compiler wrapper, add the ``MPICC`` option.
You also may wish to avoid existing binary builds e.g.::
You also may wish to avoid existing binary builds; for example::

MPICC=mpiicc pip install mpi4py --no-binary mpi4py

@@ -165,10 +165,10 @@ Further recommendations for selected HPC systems are given in the
includes some example ``packages.yaml`` files (which go in ``~/.spack/``).
These files are used to specify dependencies that Spack must obtain from
the given system (rather than building from scratch). This may include
``Python`` and the packages distributed with it (e.g. ``numpy``), and will
``Python`` and the packages distributed with it (e.g., ``numpy``), and will
often include the system MPI library.
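For illustration, a minimal ``packages.yaml`` sketch registering an external system MPI (the spec and prefix below are placeholders, not recommendations for any particular machine)::

    packages:
      mpich:
        externals:
        - spec: mpich@4.0
          prefix: /opt/mpich
        buildable: false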

Optional dependencies for additional features
Optional Dependencies for Additional Features
---------------------------------------------

The following packages may be installed separately to enable additional features:
18 changes: 9 additions & 9 deletions docs/platforms/platforms_index.rst
@@ -3,7 +3,7 @@
Running on HPC Systems
======================

Central v Distributed
Central vs. Distributed
-----------------------

libEnsemble has been developed, supported, and tested on systems of highly varying
@@ -12,7 +12,7 @@ two basic modes of configuring libEnsemble to run and launch tasks (user applications)
on the available nodes.

The first mode we refer to as **central** mode, where the libEnsemble manager and worker processes
are grouped on to one or more dedicated nodes. Workers launch applications onto
are grouped onto one or more dedicated nodes. Workers launch applications onto
the remaining allocated nodes:

.. image:: ../images/centralized_new_detailed.png
@@ -52,7 +52,7 @@ If the argument ``libE_specs["dedicated_mode"]=True`` is used when initializing
that is running a libEnsemble manager or worker will be removed from the node-list available
to the workers, ensuring libEnsemble has dedicated nodes.

To run in central mode using a 5 node allocation with 4 workers. From the head node
To run in central mode using a 5-node allocation with 4 workers, run from the head node
of the allocation::

mpirun -np 5 python myscript.py
@@ -72,21 +72,21 @@ For example::
mpirun -np 5 -ppn 1 python myscript.py

would launch libEnsemble with 5 processes across 5 nodes. However, the manager would have its
own node, which is likely wasteful. More often, a machinefile is used to add the manager to
own node, which is likely wasteful. More often, a ``machinefile`` is used to add the manager to
the first node. In the :doc:`examples<example_scripts>` directory, you can find an example submission
script, configured to run libEnsemble distributed, with multiple workers per node or multiple nodes
per worker, and adding the manager onto the first node.
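As an illustrative sketch (hostnames are placeholders), a ``machinefile`` that places the manager alongside the first worker on the first node could contain::

    node0001
    node0001
    node0002
    node0003
    node0004

and be used as::

    mpirun -np 5 -machinefile machinefile python myscript.py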

HPC systems that only allow one application to be launched to a node at any one time
will not allow a distributed configuration.

Systems with Launch/MOM nodes
Systems with Launch/MOM Nodes
-----------------------------

Some large systems have a 3-tier node setup. That is, they have a separate set of launch nodes
(known as MOM nodes on Cray Systems). User batch jobs or interactive sessions run on a launch node.
Most such systems supply a special MPI runner which has some application-level scheduling
capability (eg. ``aprun``, ``jsrun``). MPI applications can only be submitted from these nodes. Examples
capability (e.g., ``aprun``, ``jsrun``). MPI applications can only be submitted from these nodes. Examples
of these systems include: Summit, Sierra and Theta.

There are two ways of running libEnsemble on these kinds of systems. The first, and simplest,
@@ -100,7 +100,7 @@ will better manage simulation and generation functions that contain considerable
computational work or I/O. Therefore the second option is to use proxy task-execution
services like Balsam_.

Balsam - Externally managed applications
Balsam - Externally Managed Applications
----------------------------------------

Running libEnsemble on the compute nodes while still submitting additional applications
@@ -132,7 +132,7 @@ Users with persistent ``gen_f`` functions may notice that the persistent workers
are still automatically assigned system resources. This can be resolved by
:ref:`fixing the number of resource sets<zero_resource_workers>`.
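As a hedged sketch (worker counts are illustrative), resources can be divided among only the simulation workers so the persistent generator holds none::

    nworkers = 5
    libE_specs["num_resource_sets"] = nworkers - 1  # persistent gen worker gets no resource set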

Overriding Auto-detection
Overriding Auto-Detection
-------------------------

libEnsemble can automatically detect system information. This includes resource information, such as
@@ -146,7 +146,7 @@ When using the MPI Executor, it is possible to override the detected information
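
As a sketch of one such override (the ``custom_info`` keys shown are assumptions; consult the Executor documentation for your version)::

    from libensemble.executors.mpi_executor import MPIExecutor

    customizer = {"mpi_runner": "mpich", "runner_name": "mpirun"}
    exctr = MPIExecutor(custom_info=customizer)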

.. _funcx_ref:

Globus Compute - Remote User functions
Globus Compute - Remote User Functions
--------------------------------------

*As an alternative to much of the above*, if libEnsemble is running on some resource with
6 changes: 3 additions & 3 deletions docs/running_libE.rst
@@ -35,7 +35,7 @@ three options are ``mpi``, ``local``, ``tcp``. The default is ``mpi``.
with MPICH_ and its derivative MPI implementations.

Don't use MPI comms when running on the **launch** nodes of three-tier
systems (e.g. Theta/Summit). In that case ``local`` mode is recommended.
systems (e.g., Theta/Summit). In that case ``local`` mode is recommended.

.. tab-item:: Local Comms

@@ -59,7 +59,7 @@ three options are ``mpi``, ``local``, ``tcp``. The default is ``mpi``.
set ``libE_specs["dedicated_mode"] = True``.

This mode is often used to run on a **launch** node of a three-tier
system (e.g. Theta/Summit), ensuring the whole compute-node allocation is available for
system (e.g., Theta/Summit), ensuring the whole compute-node allocation is available for
launching apps. Make sure there are no imports of ``mpi4py`` in your Python scripts.
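For example, assuming the calling script uses libEnsemble's ``parse_args`` convenience function, local comms with four workers can be selected on the command line::

    python myscript.py --comms local --nworkers 4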

On macOS and Windows, Python's default multiprocessing method is ``"spawn"`` instead
@@ -253,7 +253,7 @@ For example::

set in your simulation script before the Executor submit command will export the setting to your run.
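A hedged sketch of this pattern (the application name and process count are placeholders)::

    import os

    os.environ["OMP_NUM_THREADS"] = "4"  # exported to the launched run
    task = exctr.submit(app_name="my_sim", num_procs=8)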

Further run information
Further Run Information
-----------------------

For running on multi-node platforms and supercomputers, there are alternative ways to configure
2 changes: 1 addition & 1 deletion docs/utilities.rst
@@ -23,7 +23,7 @@ Convenience Tools and Functions

.. tab-item:: Allocation Helpers

These routines are used within custom allocation functions to help prepare Work
These routines are used within custom allocation functions to help prepare ``Work``
structures for workers. See the routines within ``libensemble/alloc_funcs/`` for
examples.

74 changes: 37 additions & 37 deletions libensemble/tools/alloc_support.py
@@ -34,18 +34,18 @@ def __init__(
):
"""Instantiate a new AllocSupport instance

``W`` is. They are referenced by the various methods,
but are never modified.
``W`` is passed in for convenience on init; it is referenced by the various methods,
but never modified.

By default, an ``AllocSupport`` instance uses any initiated libEnsemble resource
module and the built-in libEnsemble scheduler.

:param W: A :ref:`Worker array<funcguides-workerarray>`
:param manage_resources: Optional, boolean for if to assign resource sets when creating work units
:param persis_info: Optional, A :ref:`dictionary of persistent information.<datastruct-persis-info>`
:param scheduler_opts: Optional, A dictionary of options to pass to the resource scheduler.
:param user_resources: Optional, A user supplied ``resources`` object.
:param user_scheduler: Optional, A user supplied ``user_scheduler`` object.
:param manage_resources: (Optional) Boolean for whether to assign resource sets when creating work units.
:param persis_info: (Optional) A :ref:`dictionary of persistent information.<datastruct-persis-info>`.
:param scheduler_opts: (Optional) A dictionary of options to pass to the resource scheduler.
:param user_resources: (Optional) A user supplied ``resources`` object.
:param user_scheduler: (Optional) A user supplied ``user_scheduler`` object.
"""
self.W = W
self.persis_info = persis_info
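
# Illustrative usage sketch (not part of this diff): a custom allocation
# function would typically create AllocSupport from its own arguments.
support = AllocSupport(W, manage_resources=True, persis_info=persis_info)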
@@ -69,7 +69,7 @@ def assign_resources(self, rsets_req, use_gpus=None, user_params=[]):

:param rsets_req: Int. Number of resource sets to request.
:param use_gpus: Bool. Whether to use GPU resource sets.
:param user_params: list of Integers. User parameters num_procs, num_gpus
:param user_params: List of Integers. User parameters num_procs, num_gpus.
:returns: List of Integers. Resource set indices assigned.
"""
rset_team = None
@@ -88,10 +88,10 @@ def assign_resources(self, rsets_req, use_gpus=None, user_params=[]):
def avail_worker_ids(self, persistent=None, active_recv=False, zero_resource_workers=None):
"""Returns available workers as a list of IDs, filtered by the given options.

:param persistent: Optional int. Only return workers with given ``persis_state`` (1=sim, 2=gen).
:param active_recv: Optional boolean. Only return workers with given active_recv state.
:param zero_resource_workers: Optional boolean. Only return workers that require no resources
:returns: List of worker IDs
:param persistent: (Optional) Int. Only return workers with given ``persis_state`` (1=sim, 2=gen).
:param active_recv: (Optional) Boolean. Only return workers with given active_recv state.
:param zero_resource_workers: (Optional) Boolean. Only return workers that require no resources.
:returns: List of worker IDs.

If there are no zero resource workers defined, then the ``zero_resource_workers`` argument will
be ignored.
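
# Illustrative usage sketch (not part of this diff):
free_workers = support.avail_worker_ids()  # all available workers
gen_workers = support.avail_worker_ids(persistent=2)  # available persistent gens (2=gen)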
@@ -124,7 +124,7 @@ def fltr_recving():
if active_recv and not persistent:
raise AllocException("Cannot ask for non-persistent active receive workers")

# If there are no zero resource workers - then ignore zrw (i.e. use only if they exist)
# If there are no zero resource workers - then ignore zrw (i.e., use only if they exist)
no_zrw = not any(self.W["zero_resource_worker"])
wrks = []
for wrk in self.W:
@@ -189,11 +189,11 @@ def sim_work(self, wid, H, H_fields, H_rows, persis_info, **libE_info):

:param wid: Int. Worker ID.
:param H: :ref:`History array<funcguides-history>`. For parsing out requested resource sets.
:param H_fields: Which fields from :ref:`H<funcguides-history>` to send
:param H_fields: Which fields from :ref:`H<funcguides-history>` to send.
:param H_rows: Which rows of ``H`` to send.
:param persis_info: Worker specific :ref:`persis_info<datastruct-persis-info>` dictionary
:param persis_info: Worker specific :ref:`persis_info<datastruct-persis-info>` dictionary.

:returns: a Work entry
:returns: A Work entry.

Additional passed parameters are inserted into ``libE_info`` in the resulting work record.

@@ -223,13 +223,13 @@ def gen_work(self, wid, H_fields, H_rows, persis_info, **libE_info):
Includes evaluation of required resources if the worker is not in a
persistent state.

:param Work: :ref:`Work dictionary<funcguides-workdict>`
:param Work: :ref:`Work dictionary<funcguides-workdict>`.
:param wid: Worker ID.
:param H_fields: Which fields from :ref:`H<funcguides-history>` to send
:param H_fields: Which fields from :ref:`H<funcguides-history>` to send.
:param H_rows: Which rows of ``H`` to send.
:param persis_info: Worker specific :ref:`persis_info<datastruct-persis-info>` dictionary
:param persis_info: Worker specific :ref:`persis_info<datastruct-persis-info>` dictionary.

:returns: A Work entry
:returns: A Work entry.

Additional passed parameters are inserted into ``libE_info`` in the resulting work record.

@@ -259,8 +259,8 @@ def gen_work(self, wid, H_fields, H_rows, persis_info, **libE_info):
def _filter_points(self, H_in, pt_filter, low_bound):
"""Returns H and pt_filter filted by lower bound

:param pt_filter: Optional boolean array filtering expected returned points in ``H``.
:param low_bound: Optional lower bound for testing all returned.
:param pt_filter: (Optional) Boolean array filtering expected returned points in ``H``.
:param low_bound: (Optional) Lower bound for testing all returned.
"""
# Faster not to slice when whole array
if low_bound is not None:
@@ -278,49 +278,49 @@ def _filter_points(self, H_in, pt_filter, low_bound):
return H, pfilter

def all_sim_started(self, H, pt_filter=None, low_bound=None):
"""Returns ``True`` if all expected points have started their sim
"""Returns ``True`` if all expected points have started their sim.

Excludes cancelled points.

:param pt_filter: Optional boolean array filtering expected returned points in ``H``.
:param low_bound: Optional lower bound for testing all returned.
:returns: True if all expected points have started their sim
:param pt_filter: (Optional) Boolean array filtering expected returned points in ``H``.
:param low_bound: (Optional) Lower bound for testing all returned.
:returns: True if all expected points have started their sim.
"""
H, pfilter = self._filter_points(H, pt_filter, low_bound)
excluded_points = H["cancel_requested"]
return np.all(H["sim_started"][pfilter & ~excluded_points])

def all_sim_ended(self, H, pt_filter=None, low_bound=None):
"""Returns ``True`` if all expected points have had their sim_end
"""Returns ``True`` if all expected points have had their sim_end.

Excludes cancelled points that were not already sim_started.

:param pt_filter: Optional boolean array filtering expected returned points in ``H``.
:param low_bound: Optional lower bound for testing all returned.
:returns: True if all expected points have had their sim_end
:param pt_filter: (Optional) Boolean array filtering expected returned points in ``H``.
:param low_bound: (Optional) Lower bound for testing all returned.
:returns: True if all expected points have had their sim_end.
"""
H, pfilter = self._filter_points(H, pt_filter, low_bound)
excluded_points = H["cancel_requested"] & ~H["sim_started"]
return np.all(H["sim_ended"][pfilter & ~excluded_points])

def all_gen_informed(self, H, pt_filter=None, low_bound=None):
"""Returns ``True`` if gen has been informed of all expected points
"""Returns ``True`` if gen has been informed of all expected points.

Excludes cancelled points that were not already given out.

:param pt_filter: Optional boolean array filtering expected sim_end points in ``H``.
:param low_bound: Optional lower bound for testing all returned.
:returns: True if gen have been informed of all expected points
:param pt_filter: (Optional) Boolean array filtering expected sim_end points in ``H``.
:param low_bound: (Optional) Lower bound for testing all returned.
:returns: True if gen has been informed of all expected points.
"""
H, pfilter = self._filter_points(H, pt_filter, low_bound)
excluded_points = H["cancel_requested"] & ~H["sim_started"]
return np.all(H["gen_informed"][pfilter & ~excluded_points])

def points_by_priority(self, H, points_avail, batch=False):
"""Returns indices of points to give by priority
"""Returns indices of points to give by priority.

:param points_avail: Indices of points that are available to give
:param batch: Optional boolean. Should batches of points with the same priority be given simultaneously.
:param points_avail: Indices of points that are available to give.
:param batch: (Optional) Boolean. Whether batches of points with the same priority should be given simultaneously.
:returns: An array of point indices to give.
"""
if "priority" in H.dtype.fields:
20 changes: 10 additions & 10 deletions libensemble/tools/persistent_support.py
@@ -33,9 +33,9 @@ def send(self, output: npt.NDArray, calc_status: int = UNSET_TAG, keep_state=Fal
"""
Send message from worker to manager.

:param output: Output array to be sent to manager
:param calc_status: Optional, Provides a task status
:param keep_state: Optional, If True the manager will not modify its
:param output: Output array to be sent to manager.
:param calc_status: (Optional) Provides a task status.
:param keep_state: (Optional) If True, the manager will not modify its
record of the worker's state (usually the manager changes the
worker's state to inactive, indicating the worker is ready to receive
more work, unless using active receive mode).
@@ -63,7 +63,7 @@ def recv(self, blocking: bool = True) -> (int, dict, npt.NDArray):
"""
Receive message to worker from manager.

:param blocking: Optional, If True (default), will block until a message is received.
:param blocking: (Optional) If True (default), will block until a message is received.

:returns: message tag, Work dictionary, calc_in array

@@ -89,7 +89,7 @@

data_tag, calc_in = self.comm.recv() # Receive work rows

# Check for unexpected STOP (e.g. error between sending Work info and rows)
# Check for unexpected STOP (e.g., error between sending Work info and rows)
if data_tag in [STOP_TAG, PERSIS_STOP]:
logger.debug(
f"Persistent {self.calc_str} received signal {tag} " + "from manager while expecting work rows"
@@ -104,8 +104,8 @@ def send_recv(self, output: npt.NDArray, calc_status: int = UNSET_TAG) -> (int,
"""
Send message from worker to manager and receive response.

:param output: Output array to be sent to manager
:param calc_status: Optional, Provides a task status
:param output: Output array to be sent to manager.
:param calc_status: (Optional) Provides a task status.

:returns: message tag, Work dictionary, calc_in array

@@ -114,11 +114,11 @@ def send_recv(self, output: npt.NDArray, calc_status: int = UNSET_TAG) -> (int,
return self.recv()

def request_cancel_sim_ids(self, sim_ids: List[int]):
"""Request cancellation of sim_ids
"""Request cancellation of sim_ids.

:param sim_ids: A list of sim_ids to cancel
:param sim_ids: A list of sim_ids to cancel.

A message is sent to the manager to mark requested sim_ids as cancel_requested
A message is sent to the manager to mark requested sim_ids as cancel_requested.
"""
H_o = np.zeros(len(sim_ids), dtype=[("sim_id", int), ("cancel_requested", bool)])
H_o["sim_id"] = sim_ids
6 changes: 3 additions & 3 deletions libensemble/tools/tools.py
@@ -96,7 +96,7 @@ def save_libE_output(H, persis_info, calling_file, nworkers, dest_path=os.getcwd

persis_info: :obj:`dict`

Persistent information dictionary
Persistent information dictionary.
:doc:`(example)<data_structures/persis_info>`

calling_file : :obj:`string`
@@ -150,12 +150,12 @@ def add_unique_random_streams(persis_info, nstreams, seed=""):

persis_info: :obj:`dict`

Persistent information dictionary
Persistent information dictionary.
:ref:`(example)<datastruct-persis-info>`

nstreams: :obj:`int`

Number of independent random number streams to produce
Number of independent random number streams to produce.

seed: :obj:`int`

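
# Illustrative sketch (not part of this diff): typical calling-script usage of
# these helpers, assuming nworkers and the returned H are already defined.
#
#     from libensemble.tools import add_unique_random_streams, save_libE_output
#     persis_info = add_unique_random_streams({}, nworkers + 1)
#     # ... run libE() ...
#     save_libE_output(H, persis_info, __file__, nworkers)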