GPU-enabled MPAS-A: Lagged Radiation

To make more effective use of the full computational hardware available on a node, and to reduce GPU memory requirements, the RRTMG radiation schemes are always run on the CPUs of a node, while the rest of the physics and the dynamics are run on the GPUs of a node.

During the first model timestep of a simulation, the following take place:

The model state on the CPUs is updated
The radiation schemes are run to produce tendencies due to radiation
Those tendencies are transferred to the GPUs for use in the first dynamics time step

Thereafter, at the radiation calling interval specified in the namelist.atmosphere file, the current model state is transferred to the MPI tasks running the RRTMG schemes, and radiation tendencies computed from the lagged model state are transferred to the MPI tasks running the dynamics. In this way, the dynamics running on the GPUs apply radiation tendencies that were computed from a model state valid one radiation calling interval in the past.

A timeline of the model execution is illustrated in the figure below.

Overlap of radiation executing on CPUs with other physics and dynamics executing on GPUs. $\phi_m$ represents the model state after $m$ dynamics steps, and $\dot{\phi}_m^R$ represents the tendencies due to radiation computed from $\phi_m$. The ratio of the radiation interval to the dynamics time step is $n$.
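The lag described above can be sketched as a small index calculation. This is only one reading of the timeline, not the MPAS-A implementation: it assumes that the tendencies received at a radiation call were computed from the state sent at the previous call, and that every step before the second call uses the tendencies computed from the initial state. The function name `radiation_source_step` is hypothetical.

```python
def radiation_source_step(m: int, n: int) -> int:
    """Return the dynamics step whose model state produced the radiation
    tendencies applied at dynamics step m (hypothetical helper).

    n is the number of dynamics steps per radiation calling interval.
    Assumption: tendencies received at step k*n were computed from the
    state sent at step (k-1)*n; the first interval uses the initial state.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    if n <= 0:
        raise ValueError("n must be positive")
    k = m // n                   # index of the current radiation interval
    return max(k - 1, 0) * n     # state one interval back, floored at step 0


if __name__ == "__main__":
    n = 4
    for m in range(0, 13):
        print(f"step {m:2d} uses radiation tendencies from step "
              f"{radiation_source_step(m, n)}")
```

Under these assumptions, the tendencies applied at a radiation call are exactly one calling interval old, and they age by up to one further interval before the next call replaces them.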

The figure also illustrates an important point regarding model throughput: the time spent by the CPUs to run the RRTMG schemes should ideally match the time spent by the GPUs to run the rest of the model during each radiation calling interval. If the two differ, whichever side finishes first sits idle until the other is ready to exchange data.
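The balance condition can be illustrated with a deliberately simplified cost model. The sketch below assumes that the CPU and GPU work within one radiation calling interval overlaps perfectly and that transfer costs are negligible; neither assumption comes from the MPAS-A documentation, and the function name and timing values are hypothetical.

```python
def interval_wall_time(t_rad_cpu: float, t_step_gpu: float, n: int):
    """Estimate wall time and idle time for one radiation calling interval.

    t_rad_cpu  : seconds the CPU ranks need for one RRTMG call (assumed)
    t_step_gpu : seconds the GPU ranks need for one dynamics step (assumed)
    n          : dynamics steps per radiation calling interval

    Returns (wall_time, cpu_idle, gpu_idle) in seconds.
    """
    t_gpu = n * t_step_gpu            # GPU work in one interval
    wall = max(t_rad_cpu, t_gpu)      # the slower side sets the pace
    return wall, wall - t_rad_cpu, wall - t_gpu


if __name__ == "__main__":
    wall, cpu_idle, gpu_idle = interval_wall_time(12.0, 1.0, 10)
    print(f"wall={wall} s, CPU idle={cpu_idle} s, GPU idle={gpu_idle} s")
```

In this model, any nonzero idle time on either side suggests that the interval or the rank counts could be adjusted.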

Several parameters may be used to balance the radiation computation with the computation in the rest of the model:

The radiation calling interval, specified in the namelist.atmosphere file with the config_radtlw_interval and config_radtsw_interval options
The number of MPI ranks assigned to run the RRTMG schemes on CPUs
The number of MPI ranks assigned to run the non-radiation physics and dynamics on GPUs
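When adjusting the radiation calling interval, it can help to check how many dynamics steps fall within one interval (the ratio n in the figure caption). The sketch below assumes the interval is written as an 'HH:MM:SS' string and that the dynamics time step is known in seconds; the function name is hypothetical, and other interval formats are not handled.

```python
def steps_per_radiation_call(interval: str, dt_seconds: float) -> int:
    """Return the number of dynamics steps per radiation calling interval.

    interval   : radiation interval as 'HH:MM:SS' (assumed format)
    dt_seconds : dynamics time step in seconds
    Raises ValueError if the interval is not a whole multiple of the step.
    """
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")
    hours, minutes, seconds = (int(part) for part in interval.split(":"))
    total = hours * 3600 + minutes * 60 + seconds
    ratio = total / dt_seconds
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("radiation interval is not a multiple of the time step")
    return round(ratio)


if __name__ == "__main__":
    # Example values only: a 30-minute interval with a 180 s time step.
    print(steps_per_radiation_call("00:30:00", 180.0))
```

A larger n means fewer radiation calls per simulated hour, which reduces CPU work per interval but increases the age of the tendencies applied by the dynamics.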

See Running GPU-enabled MPAS-A for additional detail on specifying the number of MPI ranks that will run on CPUs and on GPUs.