Running very large problems on very large numbers of processors
The distributed-memory parallel version of MM5 is intended to be used for
large problems running on large numbers of processors. Some of its
configuration options, including upper limits on problem size and processor
count, are specified at compile time. This page describes how to modify
these limits.
Hard limits
In the currently distributed version of MM5 there are several hard-coded
upper limits on domain size and on the number of processors that can be
used. These are:
Maximum number of processors = 128 (version 3.3), 256 (version 3.4; in some places it is 128)
Maximum number of model domains = 10 (version 3.3), 5 (version 3.4), 6 (version 3.7)
Maximum value for MJX = 512 (all versions)
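As a quick sanity check before building, the limits listed above can be
encoded in a small table and compared against a planned run. The following
Python sketch is not part of MM5: the name check_limits and the table layout
are illustrative assumptions, and only the values stated on this page are
included (a limit the page does not give for a version is skipped).

```python
# Hard limits as listed on this page; None means the page gives no value.
HARD_LIMITS = {
    "3.3": {"processors": 128, "domains": 10, "mjx": 512},
    # Version 3.4 is listed as 256 processors, but in some places it is 128.
    "3.4": {"processors": 256, "domains": 5, "mjx": 512},
    "3.7": {"processors": None, "domains": 6, "mjx": 512},
}


def check_limits(version, processors, domains, mjx):
    """Return the names of the limits that the planned run would exceed."""
    limits = HARD_LIMITS[version]
    planned = {"processors": processors, "domains": domains, "mjx": mjx}
    exceeded = []
    for name, value in planned.items():
        limit = limits[name]
        if limit is not None and value > limit:
            exceeded.append(name)
    return exceeded


if __name__ == "__main__":
    # Hypothetical plan: 200 processors, 3 domains, MJX = 600 on version 3.3.
    print(check_limits("3.3", processors=200, domains=3, mjx=600))
```

An empty list means the planned run fits within the distributed limits and
no rebuild with modified limits is needed.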
To modify the maximum number of processors:
- Edit MPP/RSL/RSL/makefile and change the definition of the make macro
  MAX_PROC
- Edit MPP/RSL/parallel_src/lb_alg.c and change the definition of the cpp
  macro MAXPROC_MAKE
- Edit MPP/RSL/parallel_src/kill_model.c and change the definition of the
  cpp macro MAXPROC_MAKE
- Type 'make uninstall' in the top-level directory
- Type 'make mpp' in the top-level directory
To modify the maximum number of domains, follow the steps shown above but
modify MAX_DOMAINS in the RSL makefile and MAXDOM_MAKE in the lb_alg.c
and kill_model.c files.
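Because each limit is defined in three files, it can help to list every
occurrence of the relevant macros before editing. The Python sketch below
does only that; it changes nothing. The file paths and macro names are those
given above, while the helper names find_macro and report are illustrative
and not part of the MM5 distribution. It assumes it is run from the top-level
directory.

```python
import re
from pathlib import Path

# Files and macros named in the steps above, relative to the top-level directory.
EDIT_SITES = {
    "MPP/RSL/RSL/makefile": ["MAX_PROC", "MAX_DOMAINS"],
    "MPP/RSL/parallel_src/lb_alg.c": ["MAXPROC_MAKE", "MAXDOM_MAKE"],
    "MPP/RSL/parallel_src/kill_model.c": ["MAXPROC_MAKE", "MAXDOM_MAKE"],
}


def find_macro(text, macro):
    """Return (line_number, line) pairs where macro appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(macro) + r"\b")
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def report(top_dir="."):
    """Print every occurrence of the relevant macros in the files to edit."""
    for relative_path, macros in EDIT_SITES.items():
        path = Path(top_dir) / relative_path
        if not path.is_file():
            print(f"{relative_path}: not found")
            continue
        text = path.read_text(errors="replace")
        for macro in macros:
            for number, line in find_macro(text, macro):
                print(f"{relative_path}:{number}: {line}")


if __name__ == "__main__":
    report()
```

After editing the reported lines, rebuild with 'make uninstall' followed by
'make mpp' as described in the steps above.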
To modify the maximum allowed value of MJX, edit the file MPP/RSL/LMexp.m4
(LMvpp.m4 on Fujitsu and NEC) and change the definition of rsl_JJX_x on
the first line of the file. In this case you are defining the value of an
M4 macro.
Memory scaling
The memory scaling factors are specified at compile time by changing the
values of PROCMIN_EW and PROCMIN_NS in the configure.user file. For a detailed
discussion of these parameters, please see Helpdesk note 1999 04 25.
Hint: a model compiled for the exact number of processors it will run on
has been found to run more efficiently than one compiled for a
minimum number.
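To follow the hint, one needs values of PROCMIN_NS and PROCMIN_EW that match
the processor count. The Python sketch below lists the candidate pairs under
one loudly stated assumption: that the two parameters are the minimum numbers
of processors in the north-south and east-west directions, so that "the exact
number of processors" means PROCMIN_NS * PROCMIN_EW equals the processor
count. That reading is an assumption made here; the helpdesk note remains the
authoritative description. The function name is illustrative.

```python
def exact_decompositions(num_procs):
    """Return all (ns, ew) pairs of positive integers with ns * ew == num_procs.

    Assumption: ns stands for PROCMIN_NS and ew for PROCMIN_EW, and an
    "exact" build is one whose product equals the processor count.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be a positive integer")
    return [
        (ns, num_procs // ns)
        for ns in range(1, num_procs + 1)
        if num_procs % ns == 0
    ]


if __name__ == "__main__":
    # Candidate (PROCMIN_NS, PROCMIN_EW) pairs for a 16-processor run.
    for ns, ew in exact_decompositions(16):
        print(f"PROCMIN_NS = {ns}, PROCMIN_EW = {ew}")
```

Which of the listed pairs performs best depends on the domain shape and the
machine, so the choice among them is left to experiment.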
MPI buffer limits during model output
Very large problems may cause deadlocks during model output if the buffers of
the underlying message-passing layer (usually MPI) fill up. If this occurs,
edit the file MPP/RSL/RSL/makefile.arch and add -DRSL_SYNCIO to the CFLAGS
option. This will cause each process to wait for a short "go ahead" message
from process 0 (zero) before sending its output.
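The "go ahead" idea can be illustrated with a small simulation. The Python
sketch below is not RSL code and does not use MPI: threads stand in for
processes, and a queue that holds a single item stands in for a limited
message buffer. All names are illustrative. Because a sender transmits only
after process 0 has asked for its data, at most one output message is ever in
flight and the small buffer cannot fill.

```python
import queue
import threading


def run_synchronized_output(num_procs):
    """Simulate process 0 collecting output from one sender at a time."""
    # One inbox per process for the "go ahead" message (index 0 is unused).
    go_ahead = [queue.Queue() for _ in range(num_procs)]
    # A one-slot queue stands in for a limited message buffer.
    to_root = queue.Queue(maxsize=1)

    def worker(rank):
        go_ahead[rank].get()  # block until process 0 says "go ahead"
        to_root.put((rank, f"output from {rank}"))

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(1, num_procs)
    ]
    for thread in threads:
        thread.start()

    received = []
    for rank in range(1, num_procs):
        go_ahead[rank].put("go")        # permit exactly one sender
        received.append(to_root.get())  # drain its output before the next one
    for thread in threads:
        thread.join()
    return received


if __name__ == "__main__":
    for rank, data in run_synchronized_output(4):
        print(rank, data)
```

The cost of this scheme is that output is serialized through process 0,
which trades some output speed for freedom from buffer exhaustion.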
-Rotang, July 05, 2000
-Modified April 27, 2005, change max number of model domains to 6 for v3.7