# Speeding up Derivative Approximations with Parallel Finite Difference and Parallel Complex Step¶

If you have multiple processors available to you, it’s possible to speed up the calculation of approximated partials or totals by computing multiple columns of the approximated Jacobian matrix simultaneously across multiple processes. Setting up parallel finite difference or parallel complex step is identical to setting up for serial execution, except for the addition of a single __init__ arg that sets num_par_fd on the System(s) where you want to compute the approximate derivatives. The num_par_fd arg specifies the number of approximated Jacobian columns that will be computed in parallel, assuming that enough processes are provided when the problem script is executed.

As a simple example, lets use parallel finite difference to compute the partial and total derivatives using a simple model containing a component that computes its outputs by multiplying its inputs by a constant matrix. The partial jacobian for that component will be identical to that constant matrix. The component we’ll use is shown below:

%%px

import time
import openmdao.api as om

class MatMultComp(om.ExplicitComponent):
def __init__(self, mat, approx_method='exact', sleep_time=0.1, **kwargs):
super().__init__(**kwargs)
self.mat = mat
self.approx_method = approx_method
self.sleep_time = sleep_time

def setup(self):
self.num_computes = 0

def setup_partials(self):
self.declare_partials('*', '*', method=self.approx_method)

def compute(self, inputs, outputs):
outputs['y'] = self.mat.dot(inputs['x'])
self.num_computes += 1
time.sleep(self.sleep_time)


Our model will also contain an IndepVarComp that we’ll connect to our MatMultComp, and we’ll compute total derivatives of our MatMultComp outputs with respect to our IndepVarComp output.

More details about setting up for partial derivative approximation can be found here and an explanation of total derivative approximation can be found here.

## Allocating processes¶

In both cases we’ll be running our Python script under MPI using 3 processes. We need 3 processes because our model requires 1 process and we’ll be using a num_par_fd value of 3, so the number of processes we need is 3 * 1 = 3. In general, When we set num_par_fd, it acts as a multiplier on the number of processes needed for any given system when running under MPI. For example, if a given system requires N processes, then that same system when using parallel finite difference will require N * num_par_fd processes. This is because we’re duplicating the given system num_par_fd times and using those duplicate systems to solve for different columns of our approximated Jacobian in parallel.

As a reminder, to run our script and give it 3 processes, we would do the following:

    mpirun -n 3 python my_problem.py


## Examples¶

First, let’s compute the partial derivatives across our MatMultComp. We’ll use a matrix with 6 columns and a num_par_fd value on our MatMultComp component of 4, meaning that each of our 4 processes should compute 1 or 2 columns of the partial Jacobian.

%%px

mat = np.arange(30, dtype=float).reshape(5, 6)

p = om.Problem()
model = p.model
comp = model.add_subsystem('comp', MatMultComp(mat, approx_method='fd', num_par_fd=4))

model.set_input_defaults('comp.x', val=np.ones(mat.shape))
p.setup(mode='fwd')
p.run_model()

pre_count = comp.num_computes

# calling compute_totals will result in the computation of partials for comp
p.compute_totals(of=['comp.y'], wrt=['comp.x'])

# get the partial jacobian matrix
J = comp._jacobian['y', 'x']

post_count =  comp.num_computes

# how many computes were used in this proc to compute the partial jacobian?
# Each proc should be doing 2 computes.
jac_count = post_count - pre_count

print(jac_count)

[stdout:0] 2
[stdout:1] 2
[stdout:2] 1
[stdout:3] 1

%%px

# J and mat should be the same
print(np.linalg.norm(J - mat))

[stdout:0] 3.1735129369127036e-08
[stdout:1] 3.1735129369127036e-08
[stdout:2] 3.1735129369127036e-08
[stdout:3] 3.1735129369127036e-08


Next, let’s compute the total derivatives of our MatMulComp outputs with respect to its input. Again, we’ll be using a num_par_fd value of 2 and a matrix having 6 columns, so each process should compute either 1 or 2 columns of the total Jacobian. This time, however, we set the num_par_fd on our model instead of on our MatMultComp.

%%px

mat = np.arange(30, dtype=float).reshape(5, 6)

p = om.Problem(model=om.Group(num_par_fd=4))
model = p.model
model.approx_totals(method='fd')

model.set_input_defaults('comp.x', val=np.ones(mat.shape))
p.setup(mode='fwd')
p.run_model()

pre_count = comp.num_computes

J = p.compute_totals(of=['comp.y'], wrt=['comp.x'], return_format='array')

post_count =  comp.num_computes

# how many computes were used in this proc to compute the total jacobian?
# Each proc should be doing 2 computes.
jac_count = post_count - pre_count

print(jac_count)

[stdout:0] 2
[stdout:1] 2
[stdout:2] 1
[stdout:3] 1

%%px

# J and mat should be the same
print(np.linalg.norm(J - mat))

[stdout:0] 3.337203127736024e-08
[stdout:1] 3.337203127736024e-08
[stdout:2] 3.337203127736024e-08
[stdout:3] 3.337203127736024e-08