MM5 Version 2 Timing Results
The MM5 Version 2 timing results were obtained for the following
model configuration:
Benchmark case: 1979 SESAME squall line simulation
Domain size: coarse mesh: 90 km, 25x28x23, 270 second time step
             fine mesh: 30 km, 34x37x23, 90 second time step
             NESTI = 8, NESTJ = 9
Starting time: 1200 UTC April 10, 1979
Forecast length: 720 min (160 coarse domain time steps, 480 fine mesh time steps)
Memory required: 16 Mb
Number of CPUs: 1
Model options: Non-hydrostatic dynamics (NHYDRO=1)
               Grell cumulus scheme (ICUPA=3)
               Dudhia microphysics (IMPHYS=4)
               Dudhia atmospheric radiation (FRAD=2)
               Blackadar PBL (IBLTYP=2,ISOIL=0)
               IVTADV=IVQADV=0 (in mm5.deck)
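The step counts quoted for the forecast length follow directly from the forecast length and the two time steps. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only and are not part of MM5.

```python
# Values copied from the benchmark configuration above.
FORECAST_MINUTES = 720
COARSE_DT_S = 270  # coarse mesh time step, in seconds
FINE_DT_S = 90     # fine mesh time step, in seconds


def steps(forecast_minutes: int, dt_seconds: int) -> int:
    """Number of time steps needed to cover the forecast length."""
    total_seconds = forecast_minutes * 60
    if total_seconds % dt_seconds:
        raise ValueError("forecast length is not a whole number of time steps")
    return total_seconds // dt_seconds


coarse_steps = steps(FORECAST_MINUTES, COARSE_DT_S)
fine_steps = steps(FORECAST_MINUTES, FINE_DT_S)
print(coarse_steps, fine_steps)
```

The fine mesh takes three steps for every coarse mesh step, matching the 3:1 ratio between the 90 km and 30 km grid spacings.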
Download the MM5 program tar file from the
ftp site. The benchmark input data for this model configuration
is available from NCAR's ftp
site, and the configure.user and mmlif files for
the benchmark case are also available from the same site:
benchmark_config.tar.gz.
Machine | CPU Time (hh:mm:ss) | Megaflops (floating ops/CPU sec)
Cray YMP | 0:17:56.07 | 86.25
Cray J90 | 0:34:17.50 | 45.11
SGI 2200 400 MHz (*)(7.3 Compiler + IPA) | 0:04:48.76 | 321.41
SGI Origin2000 Pre-MR 300 MHz (*)(7.3-Beta Compiler + IPA) | 0:06:42.26 | 230.72
SGI Origin2000 250 MHz/4MB cache (*)(7.3-Beta Compiler and IPA) | 0:08:27.59 | 182.84
SGI Origin2000 225 MHz/2MB cache (*)(7.3-Beta Compiler and IPA) | 0:09:43.66 | 159.01
SGI Origin 2000 195MHz/4 Mb cache (*) (use 7.1 compiler and Interprocedural Analysis option) | 0:12:37.09 | 122.59
SGI Origin 200 180MHz/1 Mb cache (*) (use 7.2 compiler) | 0:14:33.16 | 106.29
SGI Octane 195MHz/1 Mb cache (*) | 0:13:50.16 | 111.80
SGI O2 R10000 (*) | 0:22:30.16 | 68.74
SGI O2 R5000 (*) | 0:39:06.44 | 39.55
SGI R10000 195 MHz/2 Mb cache (*) | 0:13:41.73 | 115.62
SGI R8000 | 0:29:04.26 | 51.44
SGI R5000 (compiled with O2 on 6.2 OS/Fortran compiler 7.0) | 1:32:22.62 | 16.74
SGI R4400 | 3:05:40.76 | 08.33
SGI R4000 | 4:14:31.96 | 07.95
Compaq XP1000 EV6 @667 MHz | 0:03:24.9 | 452.95
DEC GS140 @525 MHz (+) | 0:04:51.8 | 318.06
DEC DS20 @500 MHz (+) | 0:04:52.7 | 317.09
DEC XP1000 EV6 Workstation @500 MHz/4 Mb cache (compiled with f77 5.1) (c) | 0:05:25.0 | 285.57
DEC Alpha DPW Personal Workstation @767 MHz/2 Mb cache (compiled with f77) (++) | 0:06:31.7 | 236.94
DEC AlphaServer 8400 5/625 @612.5 MHz/4 Mb cache (compiled with kf77) (+) | 0:07:56.5 | 194.78
DEC AlphaStation 500/500MHz | 0:11:19.01 | 136.68
DEC AlphaServer 8400/440MHz/4 Mb cache (+) | 0:12:17.2 | 125.90
DEC AlphaServer 4100/400MHz/4 Mb cache (+) | 0:13:16.5 | 116.52
DEC AlphaServer 4100/300MHz | 0:18:30.55 | 83.53
DEC ALPHA 500/xxxMHz | 0:33:16.50 | 48.01
DEC ALPHA 2100 | 0:54:33.40 | 30.32
SUN ULTRA 10 (333 MHz/2Mb cache) (a) | 0:16:20.05 | 94.70
SUN Enterprise 4500 (300 MHz/4Mb cache) (b) | 0:15:27.00 | 100.11
SUN ULTRA 10 (300 MHz/.5Mb cache) (b) | 0:18:53.00 | 81.92
SUN SPARC ULTRA 1 ($) | 1:04:20.6 | 24.04
SUN Fujitsu HALstation 300 | 1:30:07.09 | 17.17
SUN SPARC 20 | 2:29:37.30 | 10.34
IBM AIX 200 MHz Power 3 | 0:10:59.3 | 144.77
IBM AIX | 2:58:59.87 | 08.64
HP SPP-UX | 0:23:12.90 | 66.63
HP 735 | 1:13:41.95 | 20.95
Intel Pentium II @650MHz, (compiled with pgf77 for Linux) | 0:10:48.59 | 143.10
Intel Pentium II @300MHz, 512KB cache (compiled with pgf77 for Linux) | 0:23:44.39 | 65.15
Intel Pentium II @266MHz, 512KB cache (compiled with pgf77 1.6-4 for Linux) (#) | 0:26:20.61 | 58.72
Intel Pentium Pro @200MHz, 256KB cache (compiled with pgf77 1.6-4 for Linux) (#) | 0:36:38.75 | 42.21
Gateway Pentium II @ 400 MHz, 512KB cache (compiled with pgf77 on Linux) (##) | 0:15:33.98 | 99.37
DEC Alpha PC (NT 4.0 Workstation) @ 533 MHz, 2 MB cache (compiled with Visual Digital Fortran 5.0, with MKS Toolkit) ($$) | 0:11:34.52 | 133.63
(*) numbers provided by Wesley Jones of SGI.
(+) numbers provided by Dick Foster of DEC.
(++) number provided by Don Dossa of DEC.
($) numbers provided by Lei Shi of SeaSpace.
(#) number provided by George Lai of NASA.
(##) number provided by David Bright of NWS, Tucson.
($$) number provided by Mariusz Pagowski of University of Toronto.
(a) number provided by Bob Hart of Penn State University.
(b) number provided by Harry Edmon of University of Washington.
(c) number provided by Greg Chesney of iMSC Corporation.
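The two numeric columns in the timing table are related: multiplying a row's CPU time in seconds by its Megaflops figure gives the implied total operation count for the run. The sketch below does this for a handful of rows copied from the table and also computes the speedup relative to the Cray YMP. For the rows sampled here the product comes out at roughly 92,810 million floating-point operations. That constant is an inference from the tabulated numbers, not a figure stated in the source, and not every row in the table matches it exactly. The helper name `cpu_seconds` is illustrative.

```python
def cpu_seconds(hms: str) -> float:
    """Convert an 'h:mm:ss.ss' CPU time string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# A few (machine, CPU time, Megaflops) rows copied from the table above.
rows = [
    ("Cray YMP", "0:17:56.07", 86.25),
    ("Cray J90", "0:34:17.50", 45.11),
    ("SGI 2200 400 MHz", "0:04:48.76", 321.41),
    ("Compaq XP1000 EV6 @667 MHz", "0:03:24.9", 452.95),
    ("SUN SPARC 20", "2:29:37.30", 10.34),
]

baseline = cpu_seconds(rows[0][1])  # Cray YMP used as the reference machine

for machine, hms, mflops in rows:
    secs = cpu_seconds(hms)
    implied_mflop = secs * mflops   # millions of floating-point operations
    speedup = baseline / secs       # > 1 means faster than the Cray YMP
    print(f"{machine:28s} {secs:9.2f} s  {implied_mflop:9.0f} Mflop  {speedup:5.2f}x")
```

Because the operation count is effectively fixed by the benchmark case, the Megaflops column carries the same ranking information as the CPU time column.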
More on timing ..
MM5 tests performed on architectures available at NCAR for the
SESAME case and the World
Series Rainout case are reported. The SESAME case is fairly
small and shows the relative performance of the various ev56 machines,
while the rainout case is 70 times larger and shows comparisons to other
machines. Also of note are the compiler times reported in the World
Series Rainout case.
For more information on the MM5 Version 2 release, please read the
Release Notes from the MM5 Version 2 page.
The benchmark timing for MM5 V3 is about 4.8% faster than that
of V2.
For benchmarks of MPP MM5, please see
http://www2.mmm.ucar.edu/mm5/mpp/cowbench/.