Quilt Fix for Very Large Datasets

Bug reported by Gerardo Cisneros (SGI) and James Abeles (IBM).
Fix provided by Gerardo Cisneros.


Description

The model hangs when writing very large restart files, such as those generated by the 2.5 km CONUS benchmark, if asynchronous I/O is enabled (nio_tasks_per_group set greater than zero in namelist.input). The output server tasks watch for a negative send size from the compute clients to detect the end of the model run. Writing a very large dataset, such as the 2.5 km CONUS restart, causes a 32-bit signed integer rollover in these messages; the servers interpret the resulting negative buffer size as the shutdown signal and exit prematurely. The fix provided by Gerardo Cisneros promotes the relevant variables to 64-bit integers.

To apply the fix

Note: this patch is intended for WRFV2.1.1 only.

Download the tar file wrf211bigquilt.tar.

Untar it in the WRFV2/frame directory:

  tar xvf wrf211bigquilt.tar

The tar file contains:
  -rw-r--r-- 22370/20  129418 Dec 29 13:48 2005 module_io_quilt.F
  -rw-r--r-- 22370/20   25912 Dec 29 07:59 2005 module_quilt_outbuf_ops.F
  -rw-r--r-- 22370/20    6819 Dec 29 13:26 2005 pack_utils.c
These will overwrite the corresponding files in the frame directory.

Clean and recompile the code.

Additional notes

2005 12 31: Be careful with this fix. It was developed on a little-endian platform, and additional testing on big-endian platforms has revealed problems. Watch this space for additional information.

Note that the issue this fix addresses arises only with very large datasets (such as the 2.5 km CONUS benchmark) *and* only when quilt servers are enabled (nio_tasks_per_group > 0) in namelist.input. For computation-only benchmarking (that is, when I/O is not being benchmarked), asynchronous I/O is not necessary.


Posted 30 December 2005, John Michalakes. Thanks, Gerardo Cisneros and James Abeles.