Quilt Fix for Very Large Datasets
Bug reported by Gerardo Cisneros (SGI) and James Abeles (IBM).
Fix provided by Gerardo Cisneros.
Description
The model hangs when writing very large restart files, such as those generated by the 2.5 km CONUS benchmark, if asynchronous I/O is enabled (nio_tasks_per_group set greater than zero in namelist.input). The output server tasks watch for a negative send size from the compute clients to detect the end of the model run. Writing a very large dataset such as the 2.5 km CONUS restart causes a 32-bit signed integer rollover in these messages, so the servers receive what appears to be a negative buffer size and shut down prematurely. The fix provided by Gerardo Cisneros promotes the relevant variables to 64-bit integers.
To apply the fix
Note, this patch is intended for WRFV2.1.1 only.
Download the tar file wrf211bigquilt.tar.
Untar using
tar xvf wrf211bigquilt.tar
in the WRFV2/frame directory. The files in the tar file:
-rw-r--r-- 22370/20 129418 Dec 29 13:48 2005 module_io_quilt.F
-rw-r--r-- 22370/20 25912 Dec 29 07:59 2005 module_quilt_outbuf_ops.F
-rw-r--r-- 22370/20 6819 Dec 29 13:26 2005 pack_utils.c
will overwrite the corresponding files in the frame directory.
Clean and recompile the code.
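The clean-and-rebuild step can be sketched as follows, assuming the standard WRFV2 build scripts and an already-working configuration; the compile target (em_real here) depends on your case:

```shell
cd WRFV2
./clean -a           # remove all previously built objects and executables
./configure          # re-select build options (clean -a removes configure.wrf)
./compile em_real    # rebuild; substitute your own compile target as needed
```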
Additional notes
2005 12 31. Please use this fix with care. It was developed on a little-endian platform, and additional testing on big-endian platforms indicates problems. Watch this space for additional information.
Please note, the issue this fix addresses applies only to very large datasets (such as the 2.5 km CONUS benchmark) *and* only if you are specifying quilt servers (nio_tasks_per_group > 0) in namelist.input. For computation-only benchmarking (that is, not benchmarking the I/O), asynchronous I/O is not necessary.
Posted 30 December 2005, John Michalakes. Thanks, Gerardo Cisneros and James Abeles.