A serious error was detected and reported by MM5 MPP User Yiwen Xu at Argonne National Laboratory. To my knowledge, it has only manifested itself on the IBM SP2 at ANL in the form of a segmentation violation and model abort during initialization. However, the nature of the bug is such that it may manifest itself unpredictably on other platforms and it requires immediate correction.
This page contains additional details about the problem and a patch file that may be downloaded and installed to correct the problem. Note that the error also appears in the previous release, MM5v3.1, but the patch file here may only be applied to MM5v3.2 (subsequent releases of the model will contain the fix without the need for a patch). If you are using MM5v3.1 and wish to avoid upgrading to MM5v3.2 at this time, please see the section below providing information about how to install the patch manually.
Update, Oct. 2, 1999:
Please note: the MM5 V3.2 distribution file MM5.TAR.gz on the mesouser ftp download site has been updated with this patch. Instead of downloading and installing the patch, you can get the fixed version of the code by downloading and installing this new distribution file.
The patch file may be downloaded from:
http://www2.mmm.ucar.edu/mm5/mpp/mpp_patch_991001.tar.gz
You may, instead, download the file from:
ftp://ftp.mcs.anl.gov/chammp/mpp_patch_991001.tar.gz
(Note that you cannot list the chammp directory on this server.)
In your top-level MM5v3.2 source directory (the directory which contains the file configure.user, the Makefile, and the subdirectories containing model source code), execute the following commands:
gunzip -c mpp_patch_991001.tar.gz | tar xvf -
make mpclean
make mpp
The files affected by this patch are:
domain/boundary/bdyin.F
domain/initial/init.F
domain/initial/param.F
domain/io/rdinit.F
domain/io/rdter.F
fdda/grid/in4dgd.F
The error occurs only in the MPP version of the code, in calls to the routine DM_BCAST_REALS. Where the second argument of such a call uses the integer parameter NUMINT, it should use NUMREAL instead. For example, in domain/initial/init.F, the incorrect code is:
CALL DM_BCAST_REALS(JBHR,NUMINT*NUMPROGS) INIT.657
This should read, instead:
CALL DM_BCAST_REALS(JBHR,NUMREAL*NUMPROGS) INIT.657
The parameter NUMINT is defined to be considerably larger than NUMREAL, so the incorrect use of NUMINT results in memory being overwritten beyond the end of the JBHR buffer.
To apply the fix manually to the code, edit each of the files listed above and modify instances of calls to DM_BCAST_REALS that use NUMINT in the second argument to use NUMREAL instead. DM_BCAST_REALS may appear as either upper or lower case in the code. Note that there are instances of calls to DM_BCAST_REALS that do not use expressions involving NUMINT for their second argument. These should be left as-is.
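When applying the fix by hand, it may help to list the candidate lines first. The following is only a sketch: find_numint_bcasts is a hypothetical helper name and is not part of the MM5 distribution. It assumes each call fits on a single source line, so a call whose NUMINT argument sits on a continuation line would be missed, and every reported line should still be checked by eye before it is edited.

```shell
#!/bin/sh
# find_numint_bcasts is a hypothetical helper, not part of MM5.
# It prints file:line:text for every line that calls DM_BCAST_REALS
# (upper or lower case) and mentions NUMINT later on the same line.
find_numint_bcasts() {
    # -i: ignore case; -n: show line numbers.
    # The extra /dev/null argument makes grep print the file name
    # even when only one source file is given.
    grep -i -n 'dm_bcast_reals.*numint' "$@" /dev/null
}
```

After loading the function into the shell, run it from the top-level source directory on the six files listed above, for example: find_numint_bcasts domain/initial/init.F domain/io/rdinit.F. Once every reported call has been changed to use NUMREAL, running it again should print nothing.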
---
- Rotang
October 1, 1999