Note

This feature requires MPI and may not run on Colab or Binder.

Parallel Groups#

When systems are added to a ParallelGroup, they will be executed in parallel, assuming that the ParallelGroup is given an MPI communicator of sufficient size. Adding subsystems to a ParallelGroup is no different than adding them to a normal Group. For example:

%%px

import openmdao.api as om

prob = om.Problem()
model = prob.model

model.set_input_defaults('x', 1.)

parallel = model.add_subsystem('parallel', om.ParallelGroup(), 
                               promotes_inputs=[('c1.x', 'x'), ('c2.x', 'x'), 
                                                ('c3.x', 'x'), ('c4.x', 'x')])
parallel.add_subsystem('c1', om.ExecComp(['y=-2.0*x']))
parallel.add_subsystem('c2', om.ExecComp(['y=5.0*x']))
parallel.add_subsystem('c3', om.ExecComp(['y=-3.0*x']))
parallel.add_subsystem('c4', om.ExecComp(['y=4.0*x']))

model.add_subsystem('c5', om.ExecComp(['y=3.0*x1 + 7.0*x2 - 2.0*x3 + x4']))

model.connect("parallel.c1.y", "c5.x1")
model.connect("parallel.c2.y", "c5.x2")
model.connect("parallel.c3.y", "c5.x3")
model.connect("parallel.c4.y", "c5.x4")

prob.setup(check=False, mode='fwd')
prob.run_model()

print(prob['c5.y'])
[stdout:1] [39.]
[stdout:0] [39.]
[stdout:2] [39.]
[stdout:3] [39.]

In this example, components c1 through c4 will be executed in parallel, provided that the ParallelGroup is given four MPI processes. If the name of the python file containing our example were my_par_model.py, we could run it under MPI with four processes using the following command:

    mpirun -n 4 python my_par_model.py

Note

This will only work if you’ve installed the mpi4py and petsc4py python packages, which are not installed by default with OpenMDAO.

In the previous example, each of the four components in the ParallelGroup required just a single MPI process, but what happens if we want to add subsystems to a ParallelGroup that have other processor requirements? In OpenMDAO, we control process allocation behavior by setting the min_procs, max_procs, and/or proc_weight args when we call the add_subsystem function to add a particular subsystem to a ParallelGroup.

Group.add_subsystem(name, subsys, promotes=None, promotes_inputs=None, promotes_outputs=None, min_procs=1, max_procs=None, proc_weight=1.0, proc_group=None)[source]

Add a subsystem.

Parameters:

name : str
    Name of the subsystem being added.

subsys : <System>
    An instantiated, but not-yet-set-up system object.

promotes : iter of (str or tuple), optional
    A list of variable names specifying which subsystem variables to ‘promote’ up to this group. If an entry is a tuple of the form (old_name, new_name), this will rename the variable in the parent group.

promotes_inputs : iter of (str or tuple), optional
    A list of input variable names specifying which subsystem input variables to ‘promote’ up to this group. If an entry is a tuple of the form (old_name, new_name), this will rename the variable in the parent group.

promotes_outputs : iter of (str or tuple), optional
    A list of output variable names specifying which subsystem output variables to ‘promote’ up to this group. If an entry is a tuple of the form (old_name, new_name), this will rename the variable in the parent group.

min_procs : int
    Minimum number of MPI processes usable by the subsystem. Defaults to 1.

max_procs : int or None
    Maximum number of MPI processes usable by the subsystem. A value of None (the default) indicates there is no maximum limit.

proc_weight : float
    Weight given to the subsystem when allocating available MPI processes to all subsystems. Default is 1.0.

proc_group : str or None
    Name of a processor group such that any system with that processor group name within the same parent group will be allocated on the same MPI process(es). If this is not None, then any other systems sharing the same proc_group must have identical values of min_procs, max_procs, and proc_weight or an exception will be raised.

Returns:

<System>
    The subsystem that was passed in. This is returned to enable users to instantiate and add a subsystem at the same time, and get the reference back.

If you use both min_procs/max_procs and proc_weight, the resulting process allocation can become less obvious, so you may want to stick to just one or the other. When the number of processes is greater than or equal to the number of subsystems, the allocation algorithm starts by assigning each subsystem its min_procs. It then distributes any remaining processes to subsystems based on their weights, being careful not to exceed their specified max_procs, if any.
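The allocation rule above can be sketched in plain Python. This is not OpenMDAO's actual implementation, just an illustrative model of the behavior described: the `allocate` function and its `(min_procs, max_procs, proc_weight)` tuple format are invented for this sketch.

```python
def allocate(subs, nprocs):
    """Sketch of weighted process allocation when nprocs >= len(subs).

    subs: list of (min_procs, max_procs_or_None, proc_weight) tuples.
    Returns a list with the number of processes given to each subsystem.
    """
    assert nprocs >= len(subs)
    # Step 1: every subsystem gets its min_procs.
    counts = [mn for mn, mx, w in subs]
    remaining = nprocs - sum(counts)
    # Step 2: hand out leftover processes one at a time to the subsystem
    # furthest below its weighted share, never exceeding max_procs.
    while remaining > 0:
        best = None
        for i, (mn, mx, w) in enumerate(subs):
            if mx is not None and counts[i] >= mx:
                continue  # already at its max_procs cap
            ratio = w / counts[i]
            if best is None or ratio > best[0]:
                best = (ratio, i)
        if best is None:
            break  # every subsystem is capped
        counts[best[1]] += 1
        remaining -= 1
    return counts
```

For example, two subsystems with weights 3.0 and 1.0 sharing four processes end up with three and one process respectively, while a max_procs cap diverts the extras elsewhere.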

If the number of processes is less than the number of subsystems, then each subsystem, one at a time, starting with the one with the highest proc_weight, is allocated to the least-loaded process. An exception will be raised if any subsystem in this case has a min_procs value greater than one.
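This greedy placement can also be sketched with the standard library. Again, this is only a model of the described behavior, not OpenMDAO's source; the `assign_to_procs` function and its return format are invented for illustration.

```python
def assign_to_procs(weights, nprocs, min_procs=None):
    """Sketch of placement when nprocs < len(weights).

    weights: proc_weight of each subsystem.
    Returns placement[i] = index of the process subsystem i runs on.
    """
    min_procs = min_procs or [1] * len(weights)
    if any(mp > 1 for mp in min_procs):
        raise RuntimeError("min_procs > 1 is not allowed when there are "
                           "fewer processes than subsystems")
    loads = [0.0] * nprocs
    placement = [None] * len(weights)
    # Place subsystems heaviest-first onto the least-loaded process.
    for i in sorted(range(len(weights)), key=lambda i: -weights[i]):
        proc = loads.index(min(loads))
        placement[i] = proc
        loads[proc] += weights[i]
    return placement
```

With weights [3.0, 2.0, 1.0] and two processes, the heaviest subsystem gets one process to itself while the other two share the second, balancing the total weight per process.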