# Parallel Derivative Computation

Computing derivatives with respect to multiple variables in parallel can yield significant speedups over computing them serially, but this should only be done for models where the sets of variables that depend on the variables of interest have little or no overlap with one another.

For example, a simple and clear-cut case where parallel derivative
computation is recommended is a model in which only one constraint
depends on each design variable: *Con1.y* depends only on *Indep1.x*, and
*Con2.y* depends only on *Indep2.x*. Assuming the design variables
and constraints have the same sizes, parallel derivatives would work equally well
with this model in fwd or rev mode.
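
To see why the two modes break even here, recall that fwd mode requires one linear solve per design-variable entry, while rev (adjoint) mode requires one per constraint entry. A small back-of-envelope sketch (plain Python, not OpenMDAO; the sizes are illustrative assumptions) makes the count explicit:

```
# Hypothetical sizes for the fully independent model:
# Indep1.x drives only Con1.y, and Indep2.x drives only Con2.y.
size_dv = {'Indep1.x': 4, 'Indep2.x': 4}   # design variable sizes (assumed)
size_con = {'Con1.y': 4, 'Con2.y': 4}      # constraint sizes (assumed)

# fwd mode: one linear solve per design-variable entry.
fwd_solves = sum(size_dv.values())

# rev mode: one linear solve per constraint entry.
rev_solves = sum(size_con.values())

# Equal sizes mean equal solve counts, which is why parallel
# derivatives perform equally well in either mode for this model.
print(fwd_solves, rev_solves)  # -> 8 8
```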

Often, more realistic models do not exhibit such complete independence between the variables of interest, but even in some of those cases, parallel derivative computation can still be of significant benefit. For example, suppose we have a model where some preliminary calculations that don't take very long to run feed their results into multiple components, for example CFD components, that do take a long time to run.

In the model above, both of our constraints, *Con1.y* and *Con2.y*, are dependent
on our design variable *Indep1.x*. Let’s assume here also that the size of our
design variable is the same as the combined size of our constraints, and that
*Con1* and *Con2* take much longer to run than *Comp1*.
If we solve for our derivatives using adjoint (rev) mode and we group *Con1.y* and
*Con2.y* by specifying that they have the same *parallel_deriv_color*, we will
compute derivatives for *Con1* and *Con2* concurrently while solving for
the derivatives of *Con1.y wrt Indep1.x* and *Con2.y wrt Indep1.x*. This will
require that the *Comp1* derivative computation is
duplicated in each process, but we don’t care since it’s fast compared
to *Con1* and *Con2*.
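
The trade-off can be sketched with a simple wall-clock cost model (plain Python; the per-component times are illustrative assumptions, not measurements):

```
# Assumed per-solve costs: Comp1 is cheap, Con1 and Con2 are slow.
t_comp1 = 0.01
t_con1 = 1.0
t_con2 = 1.0

# Serial rev mode: the adjoint solves for Con1.y and Con2.y run one
# after the other, each passing back through Comp1.
serial = (t_comp1 + t_con1) + (t_comp1 + t_con2)

# Parallel rev mode with a shared parallel_deriv_color: the two adjoint
# solves run concurrently, and Comp1's derivative work is duplicated
# in each process -- cheap compared to Con1 and Con2.
parallel = max(t_comp1 + t_con1, t_comp1 + t_con2)

print(round(serial / parallel, 2))  # -> 2.0
```

As Comp1's cost shrinks relative to Con1 and Con2, the speedup approaches the number of constraints solved concurrently.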

The code below defines the model described above:

```
import numpy as np
from openmdao.api import Group, ParallelGroup, IndepVarComp, LinearBlockGS
# SumComp and SlowComp are simple test components defined alongside
# PartialDependGroup in openmdao.core.tests.test_parallel_derivatives
from openmdao.core.tests.test_parallel_derivatives import SumComp, SlowComp


class PartialDependGroup(Group):
    def setup(self):
        size = 4

        Indep1 = self.add_subsystem('Indep1',
                                    IndepVarComp('x', np.arange(size, dtype=float) + 1.0))
        Comp1 = self.add_subsystem('Comp1', SumComp(size))
        pargroup = self.add_subsystem('ParallelGroup1', ParallelGroup())

        self.linear_solver = LinearBlockGS()
        self.linear_solver.options['iprint'] = -1
        pargroup.linear_solver = LinearBlockGS()
        pargroup.linear_solver.options['iprint'] = -1

        delay = .1
        Con1 = pargroup.add_subsystem('Con1', SlowComp(delay=delay, size=2, mult=2.0))
        Con2 = pargroup.add_subsystem('Con2', SlowComp(delay=delay, size=2, mult=-3.0))

        self.connect('Indep1.x', 'Comp1.x')
        self.connect('Comp1.y', 'ParallelGroup1.Con1.x')
        self.connect('Comp1.y', 'ParallelGroup1.Con2.x')

        # giving both constraints the same parallel_deriv_color lets their
        # rev mode derivative solves run concurrently
        color = 'parcon'

        self.add_design_var('Indep1.x')
        self.add_constraint('ParallelGroup1.Con1.y', lower=0.0, parallel_deriv_color=color)
        self.add_constraint('ParallelGroup1.Con2.y', upper=0.0, parallel_deriv_color=color)
```

And here we see that rev mode with parallel derivatives is roughly twice as fast as fwd mode when our 'slow' components have a delay of 0.1 seconds. Without parallel derivatives, the fwd and rev speeds are roughly equivalent.

```
import time

import numpy as np
from openmdao.api import Problem, PETScVector
from openmdao.core.tests.test_parallel_derivatives import PartialDependGroup

size = 4

of = ['ParallelGroup1.Con1.y', 'ParallelGroup1.Con2.y']
wrt = ['Indep1.x']

# run first in fwd mode
p = Problem(model=PartialDependGroup())
p.setup(vector_class=PETScVector, check=False, mode='fwd')
p.run_model()

start = time.time()
J = p.compute_totals(of, wrt, return_format='dict')
elapsed_fwd = time.time() - start

print(J['ParallelGroup1.Con1.y']['Indep1.x'][0])
```

```
(rank 0) [ 2.  2.  2.  2.]
(rank 1) [ 2.  2.  2.  2.]
```

```
print(J['ParallelGroup1.Con2.y']['Indep1.x'][0])
```

```
(rank 0) [-3. -3. -3. -3.]
(rank 1) [-3. -3. -3. -3.]
```

```
# now run in rev mode and compare times for deriv calculation
p = Problem(model=PartialDependGroup())
p.setup(vector_class=PETScVector, check=False, mode='rev')
p.run_model()

start = time.time()
J = p.compute_totals(of, wrt, return_format='dict')
elapsed_rev = time.time() - start

print(J['ParallelGroup1.Con1.y']['Indep1.x'][0])
```

```
(rank 0) [ 2.  2.  2.  2.]
(rank 1) [ 2.  2.  2.  2.]
```

```
print(J['ParallelGroup1.Con2.y']['Indep1.x'][0])
```

```
(rank 0) [-3. -3. -3. -3.]
(rank 1) [-3. -3. -3. -3.]
```

```
# ratio of fwd to rev derivative computation time; > 1 means rev was faster
print(elapsed_fwd / elapsed_rev)
```

```
(rank 0) 1.8965326709256212
(rank 1) 1.896473860687437
```
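
The ~1.9x ratio comes from running the two slow adjoint solves concurrently rather than back to back. That effect can be reproduced outside OpenMDAO with a small thread-based sketch, where `time.sleep` stands in for a slow component's derivative solve (the delay value is an illustrative assumption):

```
import time
from concurrent.futures import ThreadPoolExecutor

def slow_solve(delay=0.1):
    # Stand-in for one slow adjoint solve (e.g. for Con1.y or Con2.y).
    time.sleep(delay)

# Serial: solve for each constraint one after the other.
start = time.time()
slow_solve()
slow_solve()
elapsed_serial = time.time() - start

# Concurrent: both solves in flight at once (sleep releases the GIL,
# so threads suffice for this illustration; OpenMDAO uses MPI processes).
start = time.time()
with ThreadPoolExecutor(max_workers=2) as ex:
    futures = [ex.submit(slow_solve), ex.submit(slow_solve)]
    for f in futures:
        f.result()
elapsed_concurrent = time.time() - start

print(elapsed_serial / elapsed_concurrent)  # roughly 2 on an unloaded machine
```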