Troubleshooting Exercise
This exercise is intentionally designed to include common errors. Your task is to identify and correct them. This practice should prove valuable if you encounter similar issues in the future.
Note
Try to solve the problems yourself. If you get stuck, click the “Solution” link for the fix.
Do not edit the namelist files before you start. Simply run each program to reveal its errors, and work through them one at a time. After correcting each issue, re-run the executable. Continue until the program runs without errors.
Move to the appropriate directory:
cd /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise
A listing (ls) in this directory shows a wrf and a wps directory. Move to the wps directory:
cd wps
Run Geogrid
Run geogrid:
./geogrid.exe
Error 1
ERROR: For nest 2, (e_we-s_we+1) must be one greater than an integer multiple of the parent_grid_ratio of 3.
Hint: This is related to domain 2’s dimensions. Solution
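The rule in this message is plain arithmetic, so you can test a candidate nest dimension before re-running geogrid. The sketch below assumes the nest starts at index 1 (s_we = 1), so that (e_we-s_we+1) equals e_we; check_nest_dim is a hypothetical helper name, not part of WPS.

```shell
# Sketch: check the nest-dimension rule from the geogrid error message.
# Assumption: the nest starts at index 1, so (e_we - s_we + 1) = e_we.
# $1 = nest end index (e_we or e_sn), $2 = parent_grid_ratio
check_nest_dim() {
    # Valid when the dimension is one greater than a multiple of the ratio
    if [ $(( ($1 - 1) % $2 )) -eq 0 ]; then
        echo "ok"
    else
        echo "invalid"
    fi
}

check_nest_dim 94 3   # 93 is a multiple of 3, prints "ok"
check_nest_dim 95 3   # 94 is not a multiple of 3, prints "invalid"
```

The same check applies to the south-north dimension (e_sn).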
Error 2
ERROR: Could not open /mmm/users/wrfhelp/WPS_GEOG/topo_gmted2010_30s/index
Hint: Where are the static data located? Solution
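One way to investigate is to read the path that geogrid was given and test whether it exists. This is a minimal sketch, assuming the path is set with geog_data_path in namelist.wps using single quotes; geog_path and check_geog_path are hypothetical helper names.

```shell
# Sketch: print the geog_data_path value from a WPS namelist.
# Assumes the usual single-quoted form: geog_data_path = '/some/path'
geog_path() {
    sed -n "s/^[[:space:]]*geog_data_path[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1"
}

# Report whether the directory named in the namelist exists.
check_geog_path() {
    p=$(geog_path "$1")
    if [ -d "$p" ]; then
        echo "found: $p"
    else
        echo "missing: $p"
    fi
}

# usage: check_geog_path namelist.wps
```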
Both errors should now be fixed. Run geogrid once more to confirm that it completes successfully, then move on to ungrib.
Run Ungrib
The case dates for this exercise are 2021-03-14_00:00:00 to 2021-03-14_12:00:00.
Run ungrib:
./ungrib.exe
Error 1
ERROR: edition_num: unable to open GRIBFILE.AAA
Hint: How does ungrib know which meteorological data to use? Solution
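ungrib reads its input through files named GRIBFILE.AAA, GRIBFILE.AAB, and so on in the wps directory. The sketch below only counts how many of these are present; count_gribfiles is a hypothetical helper name.

```shell
# Sketch: count the GRIBFILE.* entries in a directory ($1).
count_gribfiles() {
    n=0
    for f in "$1"/GRIBFILE.*; do
        # -L is true for symbolic links, -e for any existing file.
        # If nothing matches, the unexpanded pattern fails both tests.
        if [ -L "$f" ] || [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# usage: count_gribfiles .
```

If the count is 0, the links have not been created yet. WPS ships a link_grib.csh script for this purpose, run as ./link_grib.csh followed by the path to the GRIB files.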
Error 2
***** ERROR in Subroutine PARSE_TABLE:
Problem opening file Vtable.
File ''Vtable'' does not exist.
ERROR: ***** Stopping in Subroutine PARSE_TABLE
Hint: Is there a Vtable available for ungrib to use? Solution
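ungrib expects a file or link named exactly Vtable in the directory where it runs. A quick check, written as a sketch with a hypothetical helper name (check_vtable):

```shell
# Sketch: check that a file or link named Vtable resolves in directory $1.
# [ -e ] follows symbolic links, so a dangling link counts as missing.
check_vtable() {
    if [ -e "$1/Vtable" ]; then
        echo "Vtable present"
    else
        echo "Vtable missing or dangling"
    fi
}

# usage: check_vtable .
```

If it is missing, link the table that matches your input data from ungrib/Variable_Tables, for example ln -sf ungrib/Variable_Tables/Vtable.GFS Vtable. Vtable.GFS is an assumption here; this page does not state which data source the exercise uses.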
Error 3
ERROR: Data not found: 2007-09-15_00:00:00.0000
Hint: Look in the directory /glade/campaign/mmm/wmr/wrf_tutorial/input_data/co_blizzard to see at which times the data are available. Solution
Now run ungrib successfully.
Run Metgrid
Run metgrid:
./metgrid.exe
Error 1
WARNING: Couldn't open file FILE:2021-03-14_00 for input.
ERROR: The mandatory field TT was not found in any input data.
Hint: What is the input to metgrid? And what was the output from ungrib? Solution
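The file name in the warning starts with the prefix that metgrid is looking for, so it helps to compare that with the prefix ungrib wrote. The sketch below assumes these are set as prefix (in &ungrib) and fg_name (in &metgrid) in namelist.wps, each with a single quoted value; nml_value and compare_prefix are hypothetical helper names.

```shell
# Sketch: print the first single-quoted value of key $2 in namelist file $1.
nml_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1"
}

# Compare what ungrib writes (prefix) with what metgrid reads (fg_name).
compare_prefix() {
    ungrib_prefix=$(nml_value "$1" prefix)
    metgrid_fg=$(nml_value "$1" fg_name)
    if [ "$ungrib_prefix" = "$metgrid_fg" ]; then
        echo "match: $ungrib_prefix"
    else
        echo "mismatch: ungrib writes '$ungrib_prefix', metgrid reads '$metgrid_fg'"
    fi
}

# usage: compare_prefix namelist.wps
```

Only the first value on the line is read, so an fg_name with several entries would need a closer look by hand.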
Error 2
Processing domain 2 of 2
GETH_IDTS: Month of NDATE = 0
GETH_IDTS: Month of ODATE = 0
GETH_IDTS: Day of NDATE = 0
GETH_IDTS: Day of ODATE = 0
ERROR: Screwy NDATE: 0000-00-00_00:00:00
Hint: What were domain 2’s dates set to? Solution
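To see the dates domain by domain, you can list each quoted entry of start_date or end_date with its domain number. This is a sketch that assumes all values for a key sit on one line of namelist.wps; list_dates is a hypothetical helper name.

```shell
# Sketch: list the quoted entries of key $2 in namelist file $1,
# one per line, labelled d01, d02, ... in the order they appear.
list_dates() {
    grep -E "^[[:space:]]*$2[[:space:]]*=" "$1" \
        | grep -o "'[^']*'" \
        | tr -d "'" \
        | awk '{ printf "d%02d %s\n", NR, $0 }'
}

# usage: list_dates namelist.wps start_date
#        list_dates namelist.wps end_date
```

A missing d02 line, or a d02 date that differs unexpectedly from d01, points at the entry to correct.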
Success! Let’s move on to the WRF model.
Run real.exe
Move to the appropriate directory:
cd /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise/wrf/test/em_real
Run real.exe by issuing qsub runreal.sh. Errors will appear in the rsl.error.0000 file after the job runs. Use this command to look at the end of the file:
tail rsl.error.0000
(To see more, issue cat rsl.error.0000 or open the full file.) Fix the errors one at a time until real.exe runs successfully.
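When the file is long, it can be quicker to pull out only the lines that mention a failure. The sketch below uses grep with line numbers; show_errors is a hypothetical helper name, and the pattern is simply the two keywords that appear in the messages on this page.

```shell
# Sketch: print the lines of file $1 that contain FATAL or ERROR,
# each prefixed with its line number.
show_errors() {
    grep -n -E "FATAL|ERROR" "$1"
}

# usage: show_errors rsl.error.0000
```

The line numbers tell you where to look in the full file for the surrounding context.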
Error 1
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 406
error opening met_em.d01.2000-01-24_12:00:00.nc for input; bad date in namelist or file not in directory
-------------------------------------------
Error 2
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 406
error opening met_em.d01.2021-03-14_00:00:00.nc for input; bad date in namelist or file not in directory
-------------------------------------------
Oops, didn't we just fix this one? Look carefully at the file it is complaining about. Hint: met_em.d01.2021-03-14_00:00:00.nc
Solution
Error 3
Note
You will probably need to open the rsl.out.0000 file to view the entire error message. Using the “tail” command will not show the full message.
----------------- ERROR -------------------
namelist : num_metgrid_soil_levels = 2
input files : NUM_METGRID_SOIL_LEVELS = 4 (from met_em files).
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and global attribute NUM_METGRID_SOIL_LEVELS
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_we = 74
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file WEST-EAST_GRID_DIMENSION = 75
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_sn = 61
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file SOUTH-NORTH_GRID_DIMENSION = 70
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist num_metgrid_levels = 27
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file BOTTOM-TOP_GRID_DIMENSION = 34
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
NOTE: 4 namelist vs input data inconsistencies found.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 1301
NOTE: Please check and reset these options
-------------------------------------------
Hint: How many soil levels are coming from the met_em files? Also check the dimension sizes set in the WPS namelist and met_em files. Solution
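The values on the "input file" side of this message are stored as global attributes in the met_em files, so you can read them directly from the file header. The sketch below filters a header down to the attributes named in the error; it assumes the ncdump utility is available on your PATH, and met_em_dims is a hypothetical helper name.

```shell
# Sketch: keep only the header lines for the attributes named in the
# error message. Reads a NetCDF header (as printed by ncdump -h) on stdin.
met_em_dims() {
    grep -E "GRID_DIMENSION|NUM_METGRID_SOIL_LEVELS"
}

# usage (assumes ncdump is installed):
#   ncdump -h met_em.d01.2021-03-14_00:00:00.nc | met_em_dims
```

Compare the printed values with e_we, e_sn, num_metgrid_levels, and num_metgrid_soil_levels in namelist.input.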
Run wrf.exe
Run wrf.exe:
qsub runwrf.sh
Again, the errors will appear in the rsl.error.0000 file. Fix the errors one at a time until wrf.exe runs successfully:
Note
You will probably need to open the rsl.out.0000 file to view the entire error message. Using the “tail” command will not show the full message.
Error 1
For domain 1 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
e_we = 75, nproc_x = 10, with cell width in x-direction = 7
e_sn = 70, nproc_y = 10, with cell width in y-direction = 7
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
For domain 2 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
e_we = 94, nproc_x = 10, with cell width in x-direction = 9
e_sn = 115, nproc_y = 10, with cell width in y-direction = 11
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 2815
NOTE: 1 namelist settings are wrong. Please check and reset these options
-------------------------------------------
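The cell widths in this message match an integer division of the domain dimension by the number of processors in that direction (75 / 10 = 7, 94 / 10 = 9, 115 / 10 = 11). The sketch below reproduces that arithmetic against the 10-cell minimum quoted in the message. It is an approximation based on the log, not WRF's actual decomposition code, and check_patch and max_nproc are hypothetical helper names.

```shell
# Sketch: patch width along one direction, using integer division as the
# log values suggest. $1 = e_we or e_sn, $2 = nproc_x or nproc_y.
check_patch() {
    w=$(( $1 / $2 ))
    if [ "$w" -ge 10 ]; then
        echo "ok ($w cells)"
    else
        echo "too small ($w cells)"
    fi
}

# Largest processor count along one direction that keeps 10 cells per patch.
max_nproc() {
    echo $(( $1 / 10 ))
}

check_patch 75 10    # domain 1, x-direction: prints "too small (7 cells)"
check_patch 115 10   # domain 2, y-direction: prints "ok (11 cells)"
max_nproc 75         # prints 7
max_nproc 70         # prints 7
```

Under these assumptions, domain 1 is the limiting domain, at roughly 7 x 7 processors.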
Error 2
D01: Time step = 200.0000 (s)
D01: Grid Distance = 30.00000 (km)
D01: Grid Distance Ratio dt/dx = 6.666667 (s/km)
D01: Ratio Including Maximum Map Factor = 6.696797 (s/km)
D01: NML defined reasonable_time_step_ratio = 6.000000
The time step is probably too large for this grid distance, reduce it.
If you are sure of your settings, set reasonable_time_step_ratio in namelist.input > 6.696797
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 341
--- ERROR: Time step too large
-------------------------------------------
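The ratio in this message is the time step in seconds divided by the grid distance in kilometers, optionally scaled by the maximum map factor. The sketch below reproduces that calculation so you can try candidate time steps; dt_ratio is a hypothetical helper name, and the map factor argument defaults to 1.

```shell
# Sketch: time step ratio dt/dx in s/km.
# $1 = time step (s), $2 = grid distance (km), $3 = map factor (optional)
dt_ratio() {
    awk -v dt="$1" -v dx="$2" -v mf="${3:-1}" \
        'BEGIN { printf "%.6f\n", dt / dx * mf }'
}

dt_ratio 200 30   # prints 6.666667, the dt/dx value in the log
dt_ratio 150 30   # prints 5.000000
```

Judging by the log, the value compared against reasonable_time_step_ratio is the one that includes the maximum map factor (roughly 1.0045 here, from the two ratios shown). A time step of exactly 6 x 30 = 180 s would therefore still come out slightly above 6, so leave some margin.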
For more on the processor-count error (Error 1), see Choosing an Appropriate Number of Processors.
wrf.exe should now run successfully. You do not need to wait for it to finish before moving on to another exercise.
Return to the Practice Exercise home page to run another exercise.