12 km CONUS Benchmark

The domain has 12 km horizontal resolution on a 425 by 300 grid with 35 vertical levels and a 72-second time step. The benchmark is restarted from October 25, 2001 00Z and run for 3 hours. The time per time step is then averaged over all but the first time step, which is discarded. The floating-point rate is obtained by dividing the operation count for an average time step, 30.1 billion floating-point operations, by the average time per time step. Return to the WRF V3 Benchmarks Page for additional information. The operation count was determined on a Cray X1E by Peter Johnsen, Cray Inc., November 2008.
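The timing procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the benchmark's actual measurement scripts: the helper names `expected_steps` and `gflops_from_step_times` and the list of per-step wall-clock times are hypothetical. A 3-hour run at 72 seconds per step is 10800 / 72 = 150 steps; the first step is dropped, the rest are averaged, and the 30.1 billion operations per step are divided by that average.

```python
from statistics import mean

# Values stated for the 12 km CONUS benchmark.
FLOPS_PER_STEP = 30.1e9   # floating-point operations per average time step
TIME_STEP_S = 72          # model time step, seconds
RUN_LENGTH_S = 3 * 3600   # 3-hour run, seconds


def expected_steps() -> int:
    """Number of model time steps in the 3-hour run."""
    return RUN_LENGTH_S // TIME_STEP_S


def gflops_from_step_times(step_times_s: list[float]) -> float:
    """Hypothetical helper: Gflop/s from wall-clock seconds per time step.

    The first step is discarded (presumably because it carries one-time
    start-up cost); the remaining step times are averaged.
    """
    if len(step_times_s) < 2:
        raise ValueError("need at least two time steps")
    avg_step_s = mean(step_times_s[1:])
    # Rate = operations per step / seconds per step, reported in Gflop/s.
    return FLOPS_PER_STEP / avg_step_s / 1e9


if __name__ == "__main__":
    # Made-up timings for illustration only: a slow first step, then 0.5 s per step.
    times = [5.0] + [0.5] * (expected_steps() - 1)
    print(expected_steps())               # 150
    print(gflops_from_step_times(times))  # 60.2
```

Note that the slow first step has no effect on the result, which is the point of discarding it.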

System Descriptions

SGI (Various Best)

Results submitted by Gerardo Cisneros, SGI Corp., Summer 2008.

The graph above represents the best time submitted at each node count, taken across all options and systems submitted. For system descriptions and greater detail, see the separate SGI submission.

Endeavor Cluster (Intel)

Results submitted by Roman S. Dubtsov, Intel Corp., November 2008.

1. Name of system (product name, hostname, institution): Endeavor, Intel

2. Model version of WRF: 3.0.1

3. Operating system and version: RHEL 4U4

4. Compiler and version: Intel Fortran Compiler 10.1.015

5. Processor manufacturer, type, and speed: Intel Xeon E5462, 2.8 GHz, 12 MB L2 cache

6. Cores per socket and sockets per node: 4/2

7. Main memory per core: 16 GB FB-DIMM

8. Interconnect type, product name, topology: DDR InfiniBand, fat tree

9. Other information: Intel Xeon 5400 chipset, 1600 MT/s FSB

AMD (+NVIDIA) Cluster (qp.ncsa.uiuc.edu, NCSA)

Results submitted by John Michalakes, NCAR/U. Colorado, January 2008.

1. Name of system (product name, hostname, institution): Linux Cluster, qp.ncsa.uiuc.edu, NCSA at U. Illinois

2. Model version of WRF: 3.0

3. Operating system and version: Linux

4. Compiler and version: Intel Fortran Compiler, CUDA 1.0

5. Processor manufacturer, type, and speed: AMD Opteron dual-core, 2.4 GHz + NVIDIA Quadro FX 5600

6. Cores per socket and sockets per node: 2/2 (CPUs) + 4 GPUs per node (1 per CPU core)

7. Main memory per core: n.a.

8. Interconnect type, product name, topology: InfiniBand

9. Other information:

a. One plot shows performance on the Opteron cores only.

b. A second plot shows performance with the WSM5 microphysics offloaded to the GPUs.

c. Ran 4 MPI tasks per node (1 task per CPU-GPU pair).
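The one-task-per-CPU-GPU-pair layout in item c can be illustrated with a small Python sketch. It assumes, hypothetically, that MPI ranks are placed on nodes in consecutive blocks of four, so each rank's local index on its node selects its GPU. The function `gpu_for_rank` is illustrative only and is not taken from the WRF GPU code.

```python
TASKS_PER_NODE = 4  # 4 MPI tasks per node, one per CPU core
GPUS_PER_NODE = 4   # 4 GPUs per node, one per CPU core


def gpu_for_rank(rank: int) -> tuple[int, int]:
    """Return (node index, GPU index) for an MPI rank, assuming block placement."""
    node, local_rank = divmod(rank, TASKS_PER_NODE)
    # One task per CPU-GPU pair: the local rank doubles as the GPU device index.
    return node, local_rank % GPUS_PER_NODE


if __name__ == "__main__":
    for r in range(8):  # ranks covering two nodes
        print(r, gpu_for_rank(r))
```

In a CUDA program, each task would then pass its device index to `cudaSetDevice` before launching kernels, so no two tasks on a node share a GPU.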

IBM Power6 (bluefire.ucar.edu)

Results submitted by John Michalakes, NCAR, November 2008.

1. Name of system (product name, hostname, institution): IBM SP Power 6, bluefire.ucar.edu, NCAR

2. Model version of WRF: 3.0.1

3. Operating system and version: AIX

4. Compiler and version: IBM XL Fortran (XLF)

5. Processor manufacturer, type, and speed: IBM Power6, 4.7 GHz, 4 MB L2 cache per core, 32 MB L3 cache per 2 cores

6. Cores per socket and sockets per node: 2/16

7. Main memory per core: 4 or 2 GB

8. Interconnect type, product name, topology: InfiniBand

9. Other information:

a. Additional system specs at: http://www.cisl.ucar.edu/computers/bluefire

b. Code compiled with -O3 optimization.

c. Not run in dedicated time (other users were on the system).

d. Ran with Simultaneous Multithreading (SMT): 64 MPI tasks per 32-core node.
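With SMT, each 32-core Power6 node ran two MPI tasks per core. The Python sketch below computes the total task count for a given number of nodes and the largest patch each task would own if the 425 by 300 horizontal grid were split over a near-square processor grid. The layout rule in `near_square_grid` is a hypothetical stand-in, not WRF's own decomposition routine; WRF chooses its processor layout itself, and that layout may differ.

```python
import math

NX, NY = 425, 300     # horizontal grid points of the 12 km CONUS domain
CORES_PER_NODE = 32   # Power6 node: 2 cores/socket x 16 sockets
TASKS_PER_CORE = 2    # SMT: 64 MPI tasks per 32-core node


def near_square_grid(ntasks: int) -> tuple[int, int]:
    """Hypothetical layout: factor ntasks into (px, py), px * py == ntasks, as square as possible.

    The larger factor goes to x because the domain is wider (425) than tall (300).
    """
    small = math.isqrt(ntasks)
    while ntasks % small != 0:
        small -= 1
    return ntasks // small, small


def largest_patch(nodes: int) -> tuple[int, int, int]:
    """Return (MPI tasks, max patch width, max patch height) for a node count."""
    ntasks = nodes * CORES_PER_NODE * TASKS_PER_CORE
    px, py = near_square_grid(ntasks)
    # Ceiling division gives the largest patch when the grid does not divide evenly.
    return ntasks, -(-NX // px), -(-NY // py)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, largest_patch(n))
```

Under these assumptions, one node gives 64 tasks on an 8 by 8 layout, with patches of at most 54 by 38 grid points.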

Created Nov. 9, 2008. John Michalakes, NCAR. michalak@ucar.edu