Warning: This is a very advanced topic for dealing with parallel compute resources in OpenMDAO. If you have not gone through at least our basic tutorial and our optimization tutorial, then you should become more familiar with the foundations of OpenMDAO before delving into this section.
OpenMDAO uses the concept of resource allocators to decouple the specification of a simulation from the computational resources it requires. This makes simulations more portable and also allows for the currently “best” resource to be used. The ResourceAllocationManager (RAM) manages the selection of a server from one or more registered allocators. ExternalCode, CaseIteratorDriver, and DOEdriver are able to use this facility.
During ExternalCode execution, if the instance has an empty resources dictionary, then the external code is run locally and started directly by the ExternalCode instance. If, however, the resources dictionary is not empty, then it is used to allocate a server process which can support the resource request. This allocation process is performed by the RAM. Once a suitable server is found, the ExternalCode instance will send any input files to the server, invoke execute_command() on the server, and then retrieve any output files.
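A minimal sketch of the resources dictionary that would trigger server allocation for an ExternalCode instance. The keys follow the allocation table below; the ExternalCode import path and attribute names assume the OpenMDAO 0.x API and appear only in comments.

```python
# A non-empty resources dict causes the RAM to allocate a server
# instead of running the external code locally.
resources = {
    'min_cpus': 4,            # need at least 4 cores
    'python_version': '2.7',  # server must provide Python 2.7
}

# In a real model (not executed here):
#
#   from openmdao.lib.components.api import ExternalCode
#   code = ExternalCode()
#   code.command = ['mysolver', 'input.dat']  # hypothetical command
#   code.resources = resources  # non-empty -> RAM allocates a server
```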
During CaseIteratorDriver or DOEdriver execution, a resource allocation is performed for each case to be evaluated (unless sequential execution is specified). Once the server is allocated, the sub-model egg is loaded into it, input variables are set, the model is run, and outputs are retrieved. The resource allocator normally just looks for a compatible server based on the sub-model's Python requirements, but you can supply additional resource information via the extra_resources attribute. The RAM method max_request() can be useful for generating the extra_resources value when the sub-model contains resource descriptions (for example, when it wraps a parallel CFD code). In some circumstances, particularly when submitting from a Windows client to a Linux server (or vice versa), there may be spurious Python incompatibilities. You can try forcing a submission by setting the ignore_egg_requirements attribute to True.
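A sketch of supplying extra_resources for a driver whose sub-model wraps a parallel code. The dictionary keys come from the tables below; the driver attribute names assume the OpenMDAO 0.x API and appear only in comments.

```python
# Extra resource information applied to every case's allocation
# request, beyond the sub-model's basic Python requirements.
extra = {
    'min_cpus': 8,    # the wrapped parallel code needs at least 8 cores
    'max_cpus': 16,   # but cannot use more than 16
}

# In a real model (not executed here):
#
#   driver.extra_resources = extra
#   # Work around spurious cross-platform egg checks if needed:
#   driver.ignore_egg_requirements = True
```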
OpenMDAO provides several resource allocators.
Since some types of allocated servers are capable of submitting jobs to queuing systems, a resource description is a dictionary that can include both allocation and queuing information. Allocation keys are used to find suitable servers while queuing keys are used to describe the job to be submitted.
| Allocation Key | Value | Description |
|---|---|---|
| allocator | string | Name of allocator to use |
| localhost | bool | Must be/must not be on the local host |
| exclude | list | Hostnames to exclude |
| required_distributions | list | List of pkg_resources.Distribution or package requirement strings |
| orphan_modules | list | List of "orphan" module names |
| python_version | string | Python version required (e.g., "2.7") |
| python_platform | string | Python platform required (e.g., "linux-x86_64") |
| min_cpus | int | Minimum number of CPUs/cores required |
| max_cpus | int | Maximum number of CPUs/cores that can be used |
| min_phys_memory | int | Minimum amount of physical memory required (KB) |
Values for required_distributions and orphan_modules are typically taken from the return value of component.save_to_egg(). The value for python_platform is typically taken from the return value of distutils.util.get_platform(). The min_phys_memory key is also used as a queuing key, as are min_cpus and max_cpus for parallel applications. The latter two are analogous to the DRMAA (Distributed Resource Management Application API) minSlots and maxSlots attributes, with the intent that a "cpu" can execute an MPI process (a DRMAA "slot" is opaque and can be interpreted differently by different systems).
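The platform value can be filled in programmatically. The document names distutils.util.get_platform(); the example below uses sysconfig.get_platform(), which returns the same string and still works on Python 3.12+, where distutils has been removed.

```python
import sysconfig

# Allocation keys that describe the submitting client's platform
# and a parallel application's needs. The memory figure is an
# illustrative value, expressed in KB as the table specifies.
desc = {
    'python_platform': sysconfig.get_platform(),  # e.g. 'linux-x86_64'
    'min_phys_memory': 4 * 1024 * 1024,           # 4 GB, in KB
    'min_cpus': 8,                                # also used as a queuing key
}
print(desc['python_platform'])
```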
Most of the queuing keys are derived from the DRMAA standard JobTemplate:
| Queuing Key | Value | Description |
|---|---|---|
| remote_command | string | Command to execute (just the command, no arguments) |
| args | list | Arguments for the command |
| submit_as_hold | bool | Submit job to start in HOLD state |
| rerunnable | bool | Job is rerunnable (default False) |
| job_environment | dict | Any additional environment variables needed |
| working_directory | string | Directory to execute in (use with care) |
| job_category | string | Type of job, useful for parallel codes |
| email | list | List of email addresses to notify |
| email_on_started | bool | Notify when job starts |
| email_on_terminated | bool | Notify when job terminates |
| job_name | string | Name for the submitted job |
| input_path | string | Path for stdin |
| output_path | string | Path for stdout |
| error_path | string | Path for stderr |
| join_files | bool | If True, stderr is joined with stdout |
| reservation_id | string | ID of a reservation (obtained externally) |
| queue_name | string | Name of queue to use |
| priority | int | Queuing priority |
| start_time | datetime | Timestamp for when to start the job |
| deadline_time | datetime | Timestamp for when the job must be complete |
| resource_limits | dict | Job resource limits (see below) |
| accounting_id | string | ID used for job accounting |
| native_specification | list | Queuing-system-specific options |
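A sketch of a complete resource description mixing allocation keys (used to select a server) with queuing keys (used to describe the submitted job). The command, arguments, and queue name are hypothetical.

```python
# Allocation keys find a suitable server; queuing keys describe
# the job that server should submit.
resources = {
    # Allocation keys.
    'localhost': False,       # must not run on the local host
    'min_cpus': 16,           # parallel job needs 16 cores
    # Queuing keys.
    'remote_command': 'mysolver',            # hypothetical command
    'args': ['-case', 'run01'],
    'job_category': 'MPI',                   # any MPI environment
    'queue_name': 'debug',                   # hypothetical queue name
    'resource_limits': {'wallclock_time': 3600},  # seconds
}
```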
Using native_specification is discouraged since that makes the submitting application less portable. However, at times its use is necessary in order to access specific features of a queuing system.
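When a queuing-system feature has no portable key, it can be passed through native_specification. The `-l place=scatter` flag below is a PBS-style example chosen purely for illustration; real values depend on the target queuing system, which is exactly why this key hurts portability.

```python
# Options in native_specification are handed to the queuing system
# verbatim, bypassing the portable keys above.
resources = {
    'remote_command': 'mysolver',                    # hypothetical command
    'native_specification': ['-l', 'place=scatter'], # PBS-style, illustrative
}
```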
DRMAA-derived job categories:

| Category | Environment |
|---|---|
| MPI | Any MPI environment |
| GridMPI | A GridMPI environment |
| LAM-MPI | A LAM/MPI environment |
| MPICH1 | An MPICH version 1 environment |
| MPICH2 | An MPICH version 2 environment |
| OpenMPI | An Open MPI environment |
| PVM | A PVM environment |
| OpenMP | An OpenMP environment |
| OpenCL | An OpenCL environment |
| Java | A Java environment |
DRMAA-derived resource limits:

| Name | Type |
|---|---|
| core_file_size | Soft |
| data_seg_size | Soft |
| file_size | Soft |
| open_files | Soft |
| stack_size | Soft |
| virtual_memory | Soft |
| cpu_time | Hard |
| wallclock_time | Hard |
Soft limits do not affect scheduling decisions, while hard limits may be used for scheduling. Time limits (cpu_time and wallclock_time) are specified in seconds.
The HOME_DIRECTORY and WORKING_DIRECTORY constants in openmdao.main.resource may be used as placeholders in path specifications. They are translated at the server.
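A sketch of using those placeholders in path specifications. The real HOME_DIRECTORY and WORKING_DIRECTORY constants live in openmdao.main.resource; the string values below are stand-ins for illustration only, not the actual constant values.

```python
# Stand-in placeholders; the real constants are translated to
# concrete paths on the server, not on the submitting client.
HOME_DIRECTORY = '$HOME_DIRECTORY'        # stand-in, not the real value
WORKING_DIRECTORY = '$WORKING_DIRECTORY'  # stand-in, not the real value

resources = {
    'output_path': HOME_DIRECTORY + '/run01/job.out',  # hypothetical path
    'working_directory': WORKING_DIRECTORY,
}
```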
Not all resource allocators support all the features listed above. Consult the allocator documentation to see what is supported and to find out how the features are translated to the system the allocator interfaces with.