How Total Derivatives are Computed

This document describes in detail how OpenMDAO solves for total derivatives. Total derivatives are primarily needed for gradient-based optimization methods. While it is possible to use finite differences to approximate total derivatives, for larger multidisciplinary models this approach is notoriously inaccurate. Using OpenMDAO’s total derivative features can significantly improve the efficiency of your gradient-based optimization.

Note

Total derivatives are also useful for other applications, such as gradient-enhanced surrogate modeling and dimensionality reduction for active subspaces.

The goal of this document is to help you understand how the underlying algorithms work, and when they are appropriate to apply to your model. It is designed to be read in the order it’s presented, with later sections assuming understanding of earlier ones.

Terminology

Before diving into how OpenMDAO solves for total derivatives, it is important that we define a pair of key terms. Within the context of an OpenMDAO model we recognize two types of derivatives:

  • Partial Derivative: Derivatives of the outputs or residuals of a single component with respect to that component’s inputs.
  • Total Derivative: Derivative of an objective or constraint with respect to design variables.

Partial derivatives are either provided by the user, or they can be computed numerically using finite-difference or complex-step. Although partial derivatives are an important subject, this document is focused on the computation of total derivatives via the solution of a linear system. Since we are focused on total derivatives, we will assume that the partial derivatives for any given component are known a priori, and not address them further in this part of the Theory Manual.
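As a brief illustration of the two numerical options mentioned above, the sketch below compares a forward finite difference with a complex-step approximation for the partial derivative of a single calculation. The function `component_output`, the step sizes, and the evaluation point are assumptions chosen for illustration; this is plain NumPy, not OpenMDAO’s approximation machinery.

```python
import numpy as np


def component_output(x):
    """Hypothetical single-component calculation: y = exp(x) * sin(x)."""
    return np.exp(x) * np.sin(x)


def forward_difference(func, x, step=1e-6):
    # Subtracting two nearly equal numbers limits the achievable accuracy.
    return (func(x + step) - func(x)) / step


def complex_step(func, x, step=1e-30):
    # No subtraction is involved, so the step can be made extremely small.
    return np.imag(func(x + 1j * step)) / step


x0 = 1.5
exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))  # analytic derivative
fd_error = abs(forward_difference(component_output, x0) - exact)
cs_error = abs(complex_step(component_output, x0) - exact)

print(f"finite-difference error: {fd_error:.2e}")
print(f"complex-step error:      {cs_error:.2e}")
```

Complex-step only works when the calculation accepts complex inputs, which is why it is shown here on a function built entirely from NumPy operations.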

Unified Derivatives Equations

A model’s fundamental purpose is to compute an objective or constraint as a function of design variables. To perform those computations, the model moves data through many different calculations (defined in the components) using many intermediate variables. Internally, OpenMDAO doesn’t distinguish between objective/constraint variables, design variables, or intermediate variables; they are all just variables, following the mathematical formulation prescribed by the MAUD architecture, developed by Hwang and Martins. Using that formulation, it is possible to compute the total derivative of any explicit variable with respect to any other explicit variable by solving a linear system of equations, called the Unified Derivatives Equations (UDE):

\[\left[\frac{\partial \mathcal{R}}{\partial o}\right] \left[\frac{do}{dr}\right] = \left[ I \right],\]

or by solving a linear system in the reverse (adjoint) form:

\[\left[\frac{\partial \mathcal{R}}{\partial o}\right]^T \left[\frac{do}{dr}\right]^T = \left[ I \right].\]

Here, \(o\) denotes the vector of all the variables within the model (i.e. every output of every component), \(\mathcal{R}\) denotes the vector of residual functions, \(r\) is the vector of residual values, \(\left[\frac{\partial \mathcal{R}}{\partial o}\right]\) is the Jacobian matrix of all the partial derivatives, and \(\left[\frac{do}{dr}\right]\) is the matrix of total derivatives of \(o\) with respect to \(r\).

It might not seem like derivatives with respect to residual values are inherently useful. However, we can define the residual of explicit functions carefully in order to make use of the UDE to compute meaningful total derivatives. If you have an explicit function, \(f = F(o)\), then you can define an equivalent implicit function as

\[r_f = f - F(o) = 0\]

Then it follows that

\[\left[\frac{do}{dr_f}\right] = \left[\frac{do}{df}\right]\]

Thus, since \(\left[\frac{\partial \mathcal{R}}{\partial o}\right]\) is known because all the components provide their respective partial derivatives, we can solve the UDE linear system to compute the total derivatives we need for optimization. One column of the identity matrix is chosen as the right-hand side, and the solution provides one piece of \(\left[\frac{do}{dr}\right]\).
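As a minimal sketch of this procedure, consider a hypothetical three-variable model (an assumption made for illustration, not an OpenMDAO example) with an independent variable \(x\), an intermediate variable \(y = x^2\), and an objective \(f = 3y + x\). Written as residuals, these become \(r_x = x - x^*\), \(r_y = y - x^2\), and \(r_f = f - (3y + x)\), where \(x^*\) is the value assigned to the design variable. The code assembles the partial derivative Jacobian by hand and uses plain NumPy for the linear solve.

```python
import numpy as np

x = 2.0  # point at which the model is linearized

# Rows are residuals (r_x, r_y, r_f); columns are variables (x, y, f).
# r_x = x - x_given, r_y = y - x**2, r_f = f - (3*y + x)
dR_do = np.array([
    [1.0,      0.0, 0.0],
    [-2.0 * x, 1.0, 0.0],
    [-1.0,    -3.0, 1.0],
])

# Right-hand side: the identity column associated with r_x (the design variable).
rhs = np.zeros(3)
rhs[0] = 1.0

# One forward solve yields one column of do/dr: [dx/dx, dy/dx, df/dx].
column = np.linalg.solve(dR_do, rhs)
dy_dx, df_dx = column[1], column[2]

print(column)
```

Substituting by hand gives \(f = 3x^2 + x\), so \(df/dx = 6x + 1 = 13\) at \(x = 2\), which matches the last entry of the solution vector.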

In forward form, one linear solve is performed per design variable and the solution vector of the UDE gives one column of \(\left[\frac{do}{dr}\right]\). In reverse form, one linear solve is performed per objective/constraint and the solution vector of the UDE gives one column of \(\left[\frac{do}{dr}\right]^T\) (or one row of \(\left[\frac{do}{dr}\right]\)). Selecting between forward and reverse linear solver modes is just a matter of counting how many design variables and constraints you have, and picking whichever form yields the fewest linear solves.
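The counting argument can be sketched as follows for a hypothetical model with three design variables and one objective, \(y = x_1 x_2\) and \(f = y + x_3^2\). The helper names `pick_mode` and `compute_totals`, and the choice to break ties in favor of forward mode, are assumptions for this illustration and are not OpenMDAO’s API or selection logic.

```python
import numpy as np


def pick_mode(num_design_vars, num_responses):
    """Return the form that needs fewer linear solves (ties go to forward)."""
    return "fwd" if num_design_vars <= num_responses else "rev"


def compute_totals(dR_do, of, wrt, mode):
    """Return (d(of)/d(wrt), number of linear solves) for an assembled Jacobian."""
    n = dR_do.shape[0]
    totals = np.zeros((len(of), len(wrt)))
    if mode == "fwd":
        for j, col in enumerate(wrt):      # one solve per design variable
            rhs = np.zeros(n)
            rhs[col] = 1.0
            totals[:, j] = np.linalg.solve(dR_do, rhs)[of]
        return totals, len(wrt)
    for i, row in enumerate(of):           # one solve per objective/constraint
        rhs = np.zeros(n)
        rhs[row] = 1.0
        totals[i, :] = np.linalg.solve(dR_do.T, rhs)[wrt]
    return totals, len(of)


# Variable ordering: o = [x1, x2, x3, y, f], with y = x1*x2 and f = y + x3**2.
x1, x2, x3 = 1.0, 2.0, 3.0
dR_do = np.array([
    [1.0, 0.0, 0.0,        0.0, 0.0],
    [0.0, 1.0, 0.0,        0.0, 0.0],
    [0.0, 0.0, 1.0,        0.0, 0.0],
    [-x2, -x1, 0.0,        1.0, 0.0],
    [0.0, 0.0, -2.0 * x3, -1.0, 1.0],
])
design_vars, responses = [0, 1, 2], [4]

mode = pick_mode(len(design_vars), len(responses))
fwd_totals, fwd_solves = compute_totals(dR_do, responses, design_vars, "fwd")
rev_totals, rev_solves = compute_totals(dR_do, responses, design_vars, "rev")

print(mode, fwd_solves, rev_solves)
print(fwd_totals, rev_totals)
```

Both forms return the same derivatives, \([x_2, x_1, 2x_3]\), but the reverse form needs a single solve here while the forward form needs three.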

Although the forward and reverse forms of the unified derivatives equations are very simple, solving them efficiently for a range of different kinds of models requires careful implementation. In some cases, it is as simple as assembling the partial derivative Jacobian matrix and inverting it. In other cases, a distributed memory matrix-free linear solver is needed. Understanding a bit of the theory will help you to properly leverage the features in OpenMDAO to set up linear solves efficiently.
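The difference between the assembled and matrix-free approaches can be sketched on a hypothetical model with \(y = x^2\) and \(f = 3y + x\). In the matrix-free version, the solver only ever asks for products of the Jacobian with a vector. The simple Richardson iteration used here is an assumption standing in for a real iterative linear solver; it converges for this example only because the model is feed-forward, and it is not how OpenMDAO implements its linear solvers.

```python
import numpy as np

x = 2.0  # linearization point for the hypothetical model y = x**2, f = 3*y + x


def apply_jacobian(v):
    """Matrix-free product [dR/do] @ v, built from each residual's partials."""
    dx, dy, df = v
    return np.array([
        dx,                     # r_x = x - x_given
        dy - 2.0 * x * dx,      # r_y = y - x**2
        df - 3.0 * dy - dx,     # r_f = f - (3*y + x)
    ])


def solve_matrix_free(matvec, rhs, tol=1e-12, max_iter=50):
    """Richardson iteration: add the linear residual to the current guess."""
    sol = np.zeros_like(rhs)
    for iteration in range(1, max_iter + 1):
        lin_resid = rhs - matvec(sol)
        if np.linalg.norm(lin_resid) < tol:
            return sol, iteration - 1   # number of updates actually applied
        sol = sol + lin_resid
    raise RuntimeError("linear solve did not converge")


rhs = np.array([1.0, 0.0, 0.0])  # identity column for the design variable x

# Assembled approach: build the matrix by applying the operator to unit vectors.
dR_do = np.column_stack([apply_jacobian(e) for e in np.eye(3)])
direct = np.linalg.solve(dR_do, rhs)

# Matrix-free approach: only products with the Jacobian are ever needed.
iterative, num_updates = solve_matrix_free(apply_jacobian, rhs)

print(direct, iterative, num_updates)
```

For small models the assembled solve is simpler and cheap. The matrix-free form avoids storing the Jacobian at all, which is what makes it attractive for large or distributed models.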

Setting Up a Model for Efficient Linear Solves

There are a number of different features that you can use to control how the linear solves are performed, and they affect both the speed and accuracy of the linear solution. A deeper understanding of how OpenMDAO solves the unified derivatives equations is useful in understanding when to apply certain features, and may also help you structure your model to make the most effective use of them. The explanation of OpenMDAO’s features for improving linear solver performance is broken up into three sections below:

Advanced Linear Solver Features for Special Cases

There are certain cases where it is possible to further improve linear solver performance via the application of specialized algorithms. In some cases, the application of these algorithms can have an impact on whether you choose the forward or reverse mode for derivative solves. This section details the types of structures within a model that are necessary in order to benefit from these algorithms.