Parallel Derivative Computation

Computing derivatives with respect to multiple variables in parallel can result in significant performance increases over computing them in serial, but this should only be done for models where the sets of variables that are dependent on the variables of interest have little or no overlap between them.

For example, a very simple and clear-cut case where parallel derivative computation would be recommended is the following model, where only one constraint is dependent on each design variable. Con1.y only depends on Indep1.x and Con2.y only depends on Indep2.x. Assuming the same sizes for the design variables and constraints, parallel derivatives would work equally well with this model in fwd or rev mode.

[Figure: An obvious case where parallel derivatives are appropriate.]
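
The rule of thumb above (little or no overlap between the sets of variables that depend on each variable of interest) can be checked mechanically once the dependencies are written down. The following is a minimal sketch in plain Python, not an OpenMDAO API: the `depends_on` mapping and the `overlap` helper are hypothetical names introduced for illustration, and the dependency data simply restates the simple example described above.

```python
from itertools import combinations

# Hypothetical dependency map for the simple model described above:
# each design variable maps to the set of constraints that depend on it.
depends_on = {
    'Indep1.x': {'Con1.y'},
    'Indep2.x': {'Con2.y'},
}


def overlap(dep_map):
    """Return {(var_a, var_b): shared dependents} for every overlapping pair."""
    shared = {}
    for a, b in combinations(sorted(dep_map), 2):
        common = dep_map[a] & dep_map[b]
        if common:
            shared[(a, b)] = common
    return shared


# An empty result means the dependent sets are disjoint, which is the
# clear-cut case for parallel derivative computation.
print(overlap(depends_on))  # prints {}
```

A non-empty result does not rule out parallel derivatives; it only signals that some work will be shared or duplicated between the variables of interest.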

Often, more realistic models do not demonstrate such complete independence between the variables of interest, but even in some of these cases, parallel derivative computation can still be of significant benefit. For example, suppose we have a model that performs some preliminary calculations that don’t take very long to run and feeds those results into multiple components, such as CFD components, that do take a long time to run.

[Figure: A less obvious case where parallel derivatives are appropriate.]

In the model above, both of our constraints, Con1.y and Con2.y, depend on our design variable Indep1.x. Let’s also assume that the size of our design variable equals the combined size of our constraints, and that Con1 and Con2 take much longer to run than Comp1. If we solve for our derivatives using adjoint (rev) mode and group Con1.y and Con2.y by giving them the same parallel_deriv_color, the derivatives of Con1.y wrt Indep1.x and of Con2.y wrt Indep1.x are computed concurrently. This requires duplicating the Comp1 derivative computation in each process, but that cost is negligible because Comp1 is fast compared to Con1 and Con2.
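
To make the grouping concrete, the sketch below counts how many rounds of linear solves rev mode needs with and without a shared color. It is plain Python and does not call OpenMDAO: `rev_solve_rounds` is a hypothetical helper, the constraint sizes of 2 are taken from the model definition in this section, and the idea that row i of every constraint in a color group is solved in the same round is a simplified picture rather than a description of OpenMDAO's internals.

```python
def rev_solve_rounds(sizes, colors=None):
    """Group rev-mode solves into rounds.

    sizes: dict mapping constraint name -> number of rows.
    colors: optional dict mapping constraint name -> color; constraints
        that share a color have matching rows solved in the same round.
    Returns a list of rounds, each a list of (name, row) pairs.
    """
    colors = colors or {}
    rounds = []
    groups = {}  # color -> list of constraint names
    for name, nrows in sizes.items():
        color = colors.get(name)
        if color is None:
            # Uncolored: every row needs its own round.
            rounds.extend([(name, i)] for i in range(nrows))
        else:
            groups.setdefault(color, []).append(name)
    for names in groups.values():
        longest = max(sizes[n] for n in names)
        for i in range(longest):
            rounds.append([(n, i) for n in names if i < sizes[n]])
    return rounds


sizes = {'Con1.y': 2, 'Con2.y': 2}
serial = rev_solve_rounds(sizes)
colored = rev_solve_rounds(sizes, {'Con1.y': 'parcon', 'Con2.y': 'parcon'})

print(len(serial), len(colored))    # prints 4 2
print(len(serial) / len(colored))   # prints 2.0
```

If each round costs roughly one slow-component delay (an assumed cost model), the colored rev-mode pass needs half as many rounds as the four solves that fwd mode requires, one per entry of Indep1.x.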

The code below defines the model described above:

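import numpy as np

from openmdao.api import Group, IndepVarComp, ParallelGroup, LinearBlockGS

# SlowComp (and SumComp, see below) are assumed to be helper components
# defined alongside this class in the test module.


class PartialDependGroup(Group):
    def setup(self):
        size = 4

        Indep1 = self.add_subsystem('Indep1', IndepVarComp('x', np.arange(size, dtype=float)+1.0))
        # Assumed definitions, inferred from the names used in the rest of
        # this listing ('Comp1', 'ParallelGroup1', pargroup).
        Comp1 = self.add_subsystem('Comp1', SumComp(size))
        pargroup = self.add_subsystem('ParallelGroup1', ParallelGroup())

        self.linear_solver = LinearBlockGS()
        self.linear_solver.options['iprint'] = -1
        pargroup.linear_solver = LinearBlockGS()
        pargroup.linear_solver.options['iprint'] = -1

        # Con1 and Con2 sleep for `delay` seconds to mimic expensive components.
        delay = .1
        Con1 = pargroup.add_subsystem('Con1', SlowComp(delay=delay, size=2, mult=2.0))
        Con2 = pargroup.add_subsystem('Con2', SlowComp(delay=delay, size=2, mult=-3.0))

        self.connect('Indep1.x', 'Comp1.x')
        self.connect('Comp1.y', 'ParallelGroup1.Con1.x')
        self.connect('Comp1.y', 'ParallelGroup1.Con2.x')

        color = 'parcon'
        # Assumed completion: give both constraints the same
        # parallel_deriv_color, as described in the text above.
        # The bounds are placeholders.
        self.add_design_var('Indep1.x')
        self.add_constraint('ParallelGroup1.Con1.y', lower=0.0, parallel_deriv_color=color)
        self.add_constraint('ParallelGroup1.Con2.y', upper=0.0, parallel_deriv_color=color)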


The script below, run on two MPI processes, shows that rev mode with parallel derivatives is roughly twice as fast as fwd mode when our ‘slow’ components have a delay of 0.1 seconds. Without parallel derivatives, the fwd and rev speeds are roughly equivalent.

import time

import numpy as np

from openmdao.api import Problem, PETScVector
from openmdao.core.tests.test_parallel_derivatives import PartialDependGroup

size = 4

of = ['ParallelGroup1.Con1.y', 'ParallelGroup1.Con2.y']
wrt = ['Indep1.x']

# run first in fwd mode
p = Problem(model=PartialDependGroup())
p.setup(vector_class=PETScVector, check=False, mode='fwd')
p.run_model()
elapsed_fwd = time.time()
J = p.compute_totals(of, wrt, return_format='dict')
elapsed_fwd = time.time() - elapsed_fwd

print(J['ParallelGroup1.Con1.y']['Indep1.x'][0])

(rank 0) [ 2.  2.  2.  2.]
(rank 1) [ 2.  2.  2.  2.]

print(J['ParallelGroup1.Con2.y']['Indep1.x'][0])

(rank 0) [-3. -3. -3. -3.]
(rank 1) [-3. -3. -3. -3.]

# now run in rev mode and compare times for deriv calculation
p = Problem(model=PartialDependGroup())
p.setup(vector_class=PETScVector, check=False, mode='rev')
p.run_model()
elapsed_rev = time.time()
J = p.compute_totals(of, wrt, return_format='dict')
elapsed_rev = time.time() - elapsed_rev

print(J['ParallelGroup1.Con1.y']['Indep1.x'][0])

(rank 0) [ 2.  2.  2.  2.]
(rank 1) [ 2.  2.  2.  2.]

print(J['ParallelGroup1.Con2.y']['Indep1.x'][0])

(rank 0) [-3. -3. -3. -3.]
(rank 1) [-3. -3. -3. -3.]

# make sure that rev mode is faster than fwd mode
print(elapsed_fwd / elapsed_rev)

(rank 0) 1.8965326709256212
(rank 1) 1.896473860687437
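
The Jacobian rows printed above are consistent with a model in which Comp1 sums the entries of Indep1.x and each constraint component multiplies that sum by its `mult` value. That behavior is an assumption, since the component definitions are not shown here. The sketch below uses hypothetical stand-in functions in plain Python, with central finite differences in place of OpenMDAO's derivative machinery, to check that such a model yields rows of 2 and -3.

```python
def constraint(x, mult):
    """Assumed stand-in: scale the sum of the design variable by mult."""
    return mult * sum(x)


def fd_row(func, x, step=1e-6):
    """Central finite-difference derivative of a scalar func wrt each x[j]."""
    row = []
    for j in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[j] += step
        lo[j] -= step
        row.append((func(hi) - func(lo)) / (2.0 * step))
    return row


x0 = [1.0, 2.0, 3.0, 4.0]  # same values as np.arange(4, dtype=float) + 1.0
row_con1 = fd_row(lambda x: constraint(x, 2.0), x0)
row_con2 = fd_row(lambda x: constraint(x, -3.0), x0)

# Each entry should be close to the corresponding mult value.
print([round(v, 6) for v in row_con1])
print([round(v, 6) for v in row_con2])
```

Because the assumed model is linear, the finite-difference rows match the analytic derivatives up to rounding error, independent of the step size chosen.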