IBM Benchmark Settings

This page contains the LSF batch settings used for the benchmarks submitted by Jim Edwards of IBM on 7/15/2009. Many of these settings make a difference for performance but may not be familiar to WRF users. They are provided here to help users make the most of bluefire.ucar.edu and perhaps other IBM Power6 systems.

The settings appear as part of the LSF batch scripts that were run for the benchmarks. Certain user or charging information has been removed. Otherwise, these scripts should "just work" on bluefire.

Some of the scripts are MPI-only (strictly distributed-memory parallel); others are hybrid MPI/OpenMP (distributed-memory tasks, each of which runs more than one OpenMP thread).
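At the level of the job script, the distinction shows up mainly in the thread count given to each task. A minimal sketch of the two modes, assuming mpirun.lsf as the LSF-aware MPI launcher (neither fragment is taken from the submitted scripts):

    # MPI-only: every thread of execution is its own MPI task
    export OMP_NUM_THREADS=1
    mpirun.lsf ./wrf.exe

    # Hybrid MPI/OpenMP: each MPI task runs several OpenMP threads
    export OMP_NUM_THREADS=2
    mpirun.lsf ./wrf.exe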

The scripts used for the various core counts are provided below. All of the scripts are set up to use Simultaneous Multithreading (SMT), which means that there are twice as many threads of execution as physical processor cores. That is, each node of bluefire.ucar.edu, which has 32 physical processor cores, is given 64 threads of execution to run. That can be 64 single-threaded MPI tasks, 32 2-way-threaded MPI tasks, 8 8-way-threaded MPI tasks, and so on.

This layout is controlled in the LSF scripts with #BSUB -R "span[ptile=n]" (the number of MPI tasks per node) and, for hybrid jobs, the setting of OMP_NUM_THREADS. In some cases, noted below, variables in the WRF namelist.input file are also modified (numtiles, nproc_x, and nproc_y). Several of the scripts also use a separate file of settings named wrfenv.
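As an illustration of the per-node packing, the three layouts just mentioned would be requested with resource strings like the following (the ptile values follow from the arithmetic above; a real script contains only one such line):

    # 64 single-threaded MPI tasks per node:
    #BSUB -R "span[ptile=64]"

    # 32 2-way-threaded MPI tasks per node (with OMP_NUM_THREADS=2):
    #BSUB -R "span[ptile=32]"

    # 8 8-way-threaded MPI tasks per node (with OMP_NUM_THREADS=8):
    #BSUB -R "span[ptile=8]"

For reference, the three namelist variables live in the &domains record of namelist.input. The values below are illustrative only, not taken from the benchmark namelists; they describe a 128-task run with 2 OpenMP threads per task. nproc_x times nproc_y must equal the number of MPI tasks, and numtiles is commonly set to the OpenMP thread count so that each thread works on one tile:

    &domains
      nproc_x  = 8,
      nproc_y  = 16,
      numtiles = 2,
    /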

Scripts

128 cores (128 single-threaded MPI tasks spanning 2 32-core nodes), uses namelist.input.256

256 cores (128 two-way threaded MPI tasks spanning 4 32-core nodes), uses namelist.input.suga and wrfenv (a sketch of a script along these lines appears after this list)

512 cores (128 four-way threaded MPI tasks spanning 8 32-core nodes), uses namelist.input.suga.4 and wrfenv

1024 cores (256 four-way threaded MPI tasks spanning 16 32-core nodes), uses namelist.input.256a and wrfenv
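To make the structure concrete, here is a minimal sketch of what the 256-core hybrid script might look like. This is not one of the submitted scripts: the job name, queue, wall-clock limit, and the use of mpirun.lsf as the launcher are assumptions; only the task count, ptile, OMP_NUM_THREADS, namelist file, and wrfenv usage are taken from the description above.

    #!/bin/sh
    # 256 cores: 128 MPI tasks x 2 OpenMP threads on 4 32-core SMT nodes
    #BSUB -J wrf_bench
    #BSUB -q regular
    #BSUB -W 1:00
    #BSUB -n 128
    #BSUB -R "span[ptile=32]"
    #BSUB -o wrf.%J.out
    #BSUB -e wrf.%J.err
    # (job name, queue, and wall-clock limit above are placeholders)

    . ./wrfenv                 # separate file of environment settings
    export OMP_NUM_THREADS=2   # two OpenMP threads per MPI task

    cp namelist.input.suga namelist.input
    mpirun.lsf ./wrf.exe       # LSF-aware MPI launcher (assumed)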

----------------------------------------------

created July 15, 2009
michalak@ucar.edu