Cart-Pole Optimal Control

Cart-Pole Optimal Control#

This example is authored by Shugo Kaneko and Bernardo Pacini of the MDO Lab. The cart-pole problem is an instructional case described in An introduction to trajectory optimization: How to do your own direct collocation [Kel17], and is adapted to work within Dymos. We consider a pole that can rotate freely attached to a cart, on which we can exert an external force (control input) in the \(x\)-direction.

Our goal is to bring the cart-pole system from an initial state to a terminal state with minimum control efforts. The initial state is the stable stationary point (the cart at a stop with the pole vertically down), and the terminal state is the unstable stationary state (the cart at a stop but with the pole vertically up). Friction force is ignored to simplify the problem.

Cart-pole optimal control from the initial state to the terminal state.

Trajectory Optimization Problem#

We use the following quadratic objective function to approximately minimize the total control effort:

(91)#\[\begin{equation} J = \int_{t_0}^{t_f} F(t)^2 dt ~~ \rightarrow ~ \min \end{equation}\]

where \(F(t)\) is the external force, \(t_0\) is the initial time, and \(t_f\) is the final time.

Dynamics#

The equations of motion of the cart-pole system are given by

(92)#\[\begin{equation} \begin{bmatrix} \ddot{x} \\ \ddot{\theta} \end{bmatrix} = \begin{bmatrix} \cos \theta & \ell \\ m_1 + m_2 & m_2 \ell \cos \theta \end{bmatrix}^{-1} \begin{bmatrix} -g \sin \theta \\ F + m_2 \ell \dot{\theta}^2 \sin \theta \end{bmatrix} \end{equation}\]

where \(x\) is the cart location, \(\theta\) is the pole angle, \(m_1\) is the cart mass, \(m_2\) is the pole mass, and \(\ell\) is the pole length.

Schematic of the cart-pole system.

Now, we need to convert the equations of motion, which are a second-order ODE, to a first-order ODE. To do so, we define our state vector to be \(X = [x, \dot{x}, \theta, \dot{\theta}]^T\). We also add an “energy” state \(e\) and set \(\dot{e} = F^2\) to keep track of the accumulated control input. By setting setting \(e_0 = 0\), the objective function is equal to the final value of the state \(e\):

(93)#\[\begin{equation} J = \int_{t_0}^{t_f} \dot{e} ~dt = e_f \end{equation}\]

To summarize, the ODE for the cart-pole system is given by

(94)#\[\begin{equation} \begin{bmatrix} \dot{x} \\ \dot{\theta} \\ \ddot{x} \\ \ddot{\theta} \\ \dot{e} \end{bmatrix} = f \left( \begin{bmatrix} x \\ \theta \\ \dot{x} \\ \dot{\theta} \\ e \end{bmatrix} \right)= \begin{bmatrix} \dot{x} \\ \dot{\theta} \\ \frac{-m_2 g \sin \theta \cos \theta - (F + m_2 \ell \dot{\theta}^2 \sin \theta)}{m_2 \cos^2 \theta - (m_1 + m_2)} \\ \frac{(m_1 + m_2) g \sin \theta + \cos \theta (F + m_1 \ell \dot{\theta}^2 \sin \theta)}{m_2 \ell \cos^2 \theta - (m_1 + m_2) \ell} \\ F^2 \\ \end{bmatrix} \end{equation}\]

Initial and terminal conditions#

The initial state variables are all zero at \(t_0 = 0\), and the final conditions at time \(t_f\) are

(95)#\[\begin{align} x_f &= d \\ \dot{x}_f &= 0 \\ \theta_f &= \pi \\ \dot{\theta_f} &= 0 \end{align}\]

Parameters#

The fixed parameters are summarized as follows.

Parameter	Value	Units	Description
\(m_1\)	1.0	kg	Cart mass
\(m_2\)	0.3	kg	Pole mass
\(\ell\)	0.5	m	Pole length
\(d\)	2	m	Cart target location
\(t_f\)	2	s	Final time

Implementing the ODE#

We first implement the cart-pole ODE as an ExplicitComponent as follows:

import openmdao.api as om
om.display_source("dymos.examples.cart_pole.cartpole_dynamics")

"""
Cart-pole dynamics (ODE)
"""

import numpy as np
import openmdao.api as om


class CartPoleDynamics(om.ExplicitComponent):
    """
    Computes the time derivatives of states given state variables and control inputs.

    Parameters
    ----------
    m_cart : float
        Mass of cart.
    m_pole : float
        Mass of pole.
    l_pole : float
        Length of pole.
    theta : 1d array
        Angle of pole, 0 for vertical downward, positive counter clockwise.
    theta_dot : 1d array
        Angluar velocity of pole.
    f : 1d array
        x-wise force applied to the cart.

    Returns
    -------
    x_dotdot : 1d array
        Acceleration of cart in x direction.
    theta_dotdot : 1d array
        Angular acceleration of pole.
    e_dot : 1d array
        Rate of "energy" state.
    """

    def initialize(self):
        self.options.declare(
            "num_nodes", default=1, desc="number of nodes to be evaluated (i.e., length of vectors x, theta, etc)"
        )
        self.options.declare("g", default=9.81, desc="gravity constant")

    def setup(self):
        nn = self.options["num_nodes"]

        # --- inputs ---
        # cart-pole parameters
        self.add_input("m_cart", shape=(1,), units="kg", desc="cart mass")
        self.add_input("m_pole", shape=(1,), units="kg", desc="pole mass")
        self.add_input("l_pole", shape=(1,), units="m", desc="pole length")
        # state variables. States x, x_dot, energy have no influence on the outputs, so we don't need them as inputs.
        self.add_input("theta", shape=(nn,), units="rad", desc="pole angle")
        self.add_input("theta_dot", shape=(nn,), units="rad/s", desc="pole angle velocity")
        # control input
        self.add_input("f", shape=(nn,), units="N", desc="force applied to cart in x direction")

        # --- outputs ---
        # rate of states (accelerations)
        self.add_output("x_dotdot", shape=(nn,), units="m/s**2", desc="x acceleration of cart")
        self.add_output("theta_dotdot", shape=(nn,), units="rad/s**2", desc="angular acceleration of pole")
        # also computes force**2, which will be integrated to compute the objective
        self.add_output("e_dot", shape=(nn,), units="N**2", desc="square of force to be integrated")

        # ---  partials ---.
        # Jacobian of outputs w.r.t. state/control inputs is diagonal
        # because each node (corresponds to time discretization) is independent
        self.declare_partials(of=["*"], wrt=["theta", "theta_dot", "f"], method="exact", rows=np.arange(nn), cols=np.arange(nn))

        # partials of outputs w.r.t. cart-pole parameters. We will use complex-step, but still declare the sparsity structure.
        # NOTE: since the cart-pole parameters are fixed during optimization, these partials are not necessary to declare.
        self.declare_partials(of=["*"], wrt=["m_cart", "m_pole", "l_pole"], method="cs", rows=np.arange(nn), cols=np.zeros(nn))
        self.set_check_partial_options(wrt=["m_cart", "m_pole", "l_pole"], method="fd", step=1e-7)

    def compute(self, inputs, outputs):
        g = self.options["g"]
        mc = inputs["m_cart"]
        mp = inputs["m_pole"]
        lpole = inputs["l_pole"]
        theta = inputs["theta"]
        omega = inputs["theta_dot"]
        f = inputs["f"]

        sint = np.sin(theta)
        cost = np.cos(theta)
        det = mp * lpole * cost**2 - lpole * (mc + mp)
        outputs["x_dotdot"] = (-mp * lpole * g * sint * cost - lpole * (f + mp * lpole * omega**2 * sint)) / det
        outputs["theta_dotdot"] = ((mc + mp) * g * sint + cost * (f + mp * lpole * omega**2 * sint)) / det
        outputs["e_dot"] = f**2

    def compute_partials(self, inputs, jacobian):
        g = self.options["g"]
        mc = inputs["m_cart"]
        mp = inputs["m_pole"]
        lpole = inputs["l_pole"]
        theta = inputs["theta"]
        theta_dot = inputs["theta_dot"]
        f = inputs["f"]

        # --- derivatives of x_dotdot ---
        # Collecting Theta Derivative
        low = mp * lpole * np.cos(theta) ** 2 - lpole * mc - lpole * mp
        dhigh = (
            mp * g * lpole * np.sin(theta) ** 2 -
            mp * g * lpole * np.cos(theta) ** 2 -
            lpole**2 * mp * theta_dot**2 * np.cos(theta)
        )
        high = -mp * g * lpole * np.cos(theta) * np.sin(theta) - lpole * f - lpole**2 * mp * theta_dot**2 * np.sin(theta)
        dlow = 2.0 * mp * lpole * np.cos(theta) * (-np.sin(theta))

        jacobian["x_dotdot", "theta"] = (low * dhigh - high * dlow) / low**2
        jacobian["x_dotdot", "theta_dot"] = (
            -2.0 * theta_dot * lpole**2 * mp * np.sin(theta) / (mp * lpole * np.cos(theta) ** 2 - lpole * mc - lpole * mp)
        )
        jacobian["x_dotdot", "f"] = -lpole / (mp * lpole * np.cos(theta) ** 2 - lpole * mc - lpole * mp)

        # --- derivatives of theta_dotdot ---
        # Collecting Theta Derivative
        low = mp * lpole * np.cos(theta) ** 2 - lpole * mc - lpole * mp
        dlow = 2.0 * mp * lpole * np.cos(theta) * (-np.sin(theta))
        high = (mc + mp) * g * np.sin(theta) + f * np.cos(theta) + mp * lpole * theta_dot**2 * np.sin(theta) * np.cos(theta)
        dhigh = (
            (mc + mp) * g * np.cos(theta) -
            f * np.sin(theta) +
            mp * lpole * theta_dot**2 * (np.cos(theta) ** 2 - np.sin(theta) ** 2)
        )

        jacobian["theta_dotdot", "theta"] = (low * dhigh - high * dlow) / low**2
        jacobian["theta_dotdot", "theta_dot"] = (
            2.0 *
            theta_dot *
            mp *
            lpole *
            np.sin(theta) *
            np.cos(theta) /
            (mp * lpole * np.cos(theta) ** 2 - lpole * mc - lpole * mp)
        )
        jacobian["theta_dotdot", "f"] = np.cos(theta) / (mp * lpole * np.cos(theta) ** 2 - lpole * mc - lpole * mp)

        # --- derivatives of e_dot ---
        jacobian["e_dot", "theta"] = 0.0
        jacobian["e_dot", "theta_dot"] = 0.0
        jacobian["e_dot", "f"] = 2.0 * f

Building and running the problem#

The following is a runscript of the cart-pole optimal control problem. First, we instantiate the OpenMDAO problem and set up the Dymos trajectory, phase, and transcription.

"""
Cart-pole optimizatio runscript
"""

import numpy as np
import openmdao.api as om
import dymos as dm
from dymos.examples.plotting import plot_results
from dymos.examples.cart_pole.cartpole_dynamics import CartPoleDynamics

p = om.Problem()

# --- instantiate trajectory and phase, setup transcription ---
traj = dm.Trajectory()
p.model.add_subsystem('traj', traj)
phase = dm.Phase(transcription=dm.GaussLobatto(num_segments=40, order=3, compressed=True, solve_segments=False), ode_class=CartPoleDynamics)
# NOTE: set solve_segments=True to do solver-based shooting
traj.add_phase('phase', phase)

<dymos.phase.phase.Phase at 0x7fd8fe9ed4d0>

Next, we add the state variables, controls, and cart-pole parameters.

# --- set state and control variables ---
phase.set_time_options(fix_initial=True, fix_duration=True, duration_val=2., units='s')
# declare state variables. You can also set lower/upper bounds and scalings here.
phase.add_state('x', fix_initial=True, lower=-2, upper=2, rate_source='x_dot', shape=(1,), ref=1, defect_ref=1, units='m')
phase.add_state('x_dot', fix_initial=True, rate_source='x_dotdot', shape=(1,), ref=1, defect_ref=1, units='m/s')
phase.add_state('theta', fix_initial=True, rate_source='theta_dot', shape=(1,), ref=1, defect_ref=1, units='rad')
phase.add_state('theta_dot', fix_initial=True, rate_source='theta_dotdot', shape=(1,), ref=1, defect_ref=1, units='rad/s')
phase.add_state('energy', fix_initial=True, rate_source='e_dot', shape=(1,), ref=1, defect_ref=1, units='N**2*s')  # integration of force**2. This does not have the energy unit, but I call it "energy" anyway.

# declare control inputs
phase.add_control('f', fix_initial=False, rate_continuity=False, lower=-20, upper=20, shape=(1,), ref=0.01, units='N')

# add cart-pole parameters (set static_target=True because these params are not time-depencent)
phase.add_parameter('m_cart', val=1., units='kg', static_target=True)
phase.add_parameter('m_pole', val=0.3, units='kg', static_target=True)
phase.add_parameter('l_pole', val=0.5, units='m', static_target=True)

We set the terminal conditions as boundary constraints and declare the optimization objective.

# --- set terminal constraint ---
# alternatively, you can impose those by setting `fix_final=True` in phase.add_state()
phase.add_boundary_constraint('x', loc='final', equals=1, ref=1., units='m')  # final horizontal displacement
phase.add_boundary_constraint('theta', loc='final', equals=np.pi, ref=1., units='rad')  # final pole angle
phase.add_boundary_constraint('x_dot', loc='final', equals=0, ref=1., units='m/s')  # 0 velocity at the and
phase.add_boundary_constraint('theta_dot', loc='final', equals=0, ref=0.1, units='rad/s')  # 0 angular velocity at the end
phase.add_boundary_constraint('f', loc='final', equals=0, ref=1., units='N')  # 0 force at the end

# --- set objective function ---
# we minimize the integral of force**2.
phase.add_objective('energy', loc='final', ref=100)

Next, we configure the optimizer and declare the total Jacobian coloring to accelerate the derivative computations. We then call the setup method to setup the OpenMDAO problem.

# --- configure optimizer ---
p.driver = om.pyOptSparseDriver()
p.driver.options["optimizer"] = "IPOPT"
# IPOPT options
p.driver.opt_settings['mu_init'] = 1e-1
p.driver.opt_settings['max_iter'] = 600
p.driver.opt_settings['constr_viol_tol'] = 1e-6
p.driver.opt_settings['compl_inf_tol'] = 1e-6
p.driver.opt_settings['tol'] = 1e-5
p.driver.opt_settings['print_level'] = 0
p.driver.opt_settings['nlp_scaling_method'] = 'gradient-based'
p.driver.opt_settings['alpha_for_y'] = 'safer-min-dual-infeas'
p.driver.opt_settings['mu_strategy'] = 'monotone'
p.driver.opt_settings['bound_mult_init_method'] = 'mu-based'
p.driver.options['print_results'] = False

# declare total derivative coloring to accelerate the UDE linear solves
p.driver.declare_coloring()

p.setup(check=False)

--- Constraint Report [traj] ---
    --- phase ---
        [final]   1.0000e+00 == x [m]
        [final]   3.1416e+00 == theta [rad]
        [final]   0.0000e+00 == x_dot [m/s]
        [final]   0.0000e+00 == theta_dot [rad/s]
        [final]   0.0000e+00 == f [N]

<openmdao.core.problem.Problem at 0x7fd8fe9abf50>

Now we are ready to run optimization. But before that, set the initial optimization variables using set_val methods to help convergence.

# --- set initial guess ---
# The initial condition of cart-pole (i.e., state values at time 0) is set here because we set `fix_initial=True` when declaring the states.
phase.set_time_val(initial=0.0)  # set initial time to 0.
phase.set_state_val('x', vals=[0, 1, 1], time_vals=[0, 1, 2], units='m')
phase.set_state_val('x_dot', vals=[0, 0.1, 0], time_vals=[0, 1, 2], units='m/s')
phase.set_state_val('theta', vals=[0, np.pi/2, np.pi], time_vals=[0, 1, 2], units='rad')
phase.set_state_val('theta_dot', vals=[0, 1, 0], time_vals=[0, 1, 2], units='rad/s')
phase.set_state_val('energy', vals=[0, 30, 60], time_vals=[0, 1, 2])
phase.set_control_val('f', vals=[3, -1, 0], time_vals=[0, 1, 2], units='N')

# --- run optimization ---
dm.run_problem(p, run_driver=True, simulate=True, simulate_kwargs={'method' : 'Radau', 'times_per_seg' : 10})
# NOTE: with Simulate=True, dymos will call scipy.integrate.solve_ivp and simulate the trajectory using the optimized control inputs.

Model viewer data has already been recorded for Driver.

Model viewer data has already been recorded for Driver.

Full total jacobian for problem 'problem' was computed 3 times, taking 0.5698703309999473 seconds.
Total jacobian shape: (206, 281) 


Jacobian shape: (206, 281)  (2.82% nonzero)
FWD solves: 10   REV solves: 0
Total colors vs. total size: 10 vs 281  (96.44% improvement)

Sparsity computed using tolerance: 1e-25
Time to compute sparsity:   0.5699 sec
Time to compute coloring:   0.0880 sec
Memory to compute coloring:   0.2500 MB
Coloring created on: 2024-08-15 12:49:33

INFO: checking out_of_order

INFO: checking system

INFO: checking solvers

INFO: checking dup_inputs

INFO: checking missing_recorders

WARNING: The Problem has no recorder of any kind attached

INFO: checking unserializable_options

INFO: checking comp_has_no_outputs

INFO: checking auto_ivc_warnings

Simulating trajectory traj
Model viewer data has already been recorded for Driver.
INFO: checking out_of_order

INFO: checking system

INFO: checking solvers

INFO: checking dup_inputs

INFO: checking missing_recorders

WARNING: The Problem has no recorder of any kind attached

INFO: checking unserializable_options

INFO: checking comp_has_no_outputs

INFO: checking auto_ivc_warnings

/usr/share/miniconda/envs/test/lib/python3.11/site-packages/openmdao/core/group.py:1101: DerivativesWarning:Constraints or objectives [ode_eval.control_interp.control_rates:f_rate2] cannot be impacted by the design variables of the problem because no partials were defined for them in their parent component(s).

Done simulating trajectory traj

Problem: problem
Driver:  pyOptSparseDriver
  success     : True
  iterations  : 37
  runtime     : 2.0638E+00 s
  model_evals : 37
  model_time  : 3.4594E-02 s
  deriv_evals : 36
  deriv_time  : 4.3745E-01 s
  exit_status : SUCCESS

After running optimization and simulation, the results can be plotted using the plot_results function of Dymos.

# --- get results and plot ---
# objective value
obj = p.get_val('traj.phase.states:energy', units='N**2*s')[-1]
print('objective value:', obj)

# get optimization solution and simulation (time integration) results
sol = om.CaseReader('dymos_solution.db').get_case('final')
sim = om.CaseReader('dymos_simulation.db').get_case('final')

# plot time histories of x, x_dot, theta, theta_dot
plot_results([('traj.phase.timeseries.time', 'traj.phase.timeseries.x', 'time (s)', 'x (m)'),
              ('traj.phase.timeseries.time', 'traj.phase.timeseries.x_dot', 'time (s)', 'vx (m/s)'),
              ('traj.phase.timeseries.time', 'traj.phase.timeseries.theta', 'time (s)', 'theta (rad)'),
              ('traj.phase.timeseries.time', 'traj.phase.timeseries.theta_dot', 'time (s)', 'theta_dot (rad/s)'),
              ('traj.phase.timeseries.time', 'traj.phase.timeseries.f', 'time (s)', 'control (N)')],
             title='Cart-Pole Problem', p_sol=sol, p_sim=sim)

# uncomment the following lines to show the cart-pole animation
### x = sol.get_val('traj.phase.timeseries.x', units='m')
### theta = sol.get_val('traj.phase.timeseries.theta', units='rad')
### force = sol.get_val('traj.phase.timeseries.f', units='N')
### npts = len(x)

### from dymos.examples.cart_pole.animate_cartpole import animate_cartpole
### animate_cartpole(x.reshape(npts), theta.reshape(npts), force.reshape(npts), interval=20, force_scaler=0.02)

objective value: [58.83924782]

(<Figure size 1000x800 with 5 Axes>,
 array([<Axes: xlabel='time (s)', ylabel='x (m)'>,
        <Axes: xlabel='time (s)', ylabel='vx (m/s)'>,
        <Axes: xlabel='time (s)', ylabel='theta (rad)'>,
        <Axes: xlabel='time (s)', ylabel='theta_dot (rad/s)'>,
        <Axes: xlabel='time (s)', ylabel='control (N)'>], dtype=object))

The optimized cart-pole motion should look like the following:

Cart-pole optimized trajectory.

References#

[Kel17]

Matthew Kelly. An introduction to trajectory optimization: How to do your own direct collocation. SIAM Review, 59(4):849–904, 2017. doi:10.1137/16M1062569.