Troubleshooting Exercise


This exercise is intentionally designed to include common errors. Your task is to identify and correct them. This practice should prove valuable if you encounter similar issues in the future.


Note

  • Try to solve the problems yourself. If you get stuck, click the “Solution” link for that error to see the fix.

  • Do not edit the namelist files before your first run. Simply run each program to reveal the errors, then work through them one at a time. After correcting each issue, re-run the executable, and repeat until the program runs without errors.



Move to the appropriate directory:

cd /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise

A listing (ls) in this directory shows a wrf and a wps directory. Move to the wps directory:

cd wps




Run Geogrid

Run geogrid:

./geogrid.exe



Error 1

ERROR: For nest 2, (e_we-s_we+1) must be one greater than an integer multiple of the parent_grid_ratio of 3.

Hint: This is related to domain 2’s dimensions. Solution
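
The rule in the error message can be checked with shell arithmetic before re-running geogrid. The sketch below is only an illustration: valid_nest_dim is a hypothetical helper, not part of WPS, and the sample values are illustrative. It uses the fact that (e_we-s_we+1) is one greater than a multiple of the ratio exactly when (e_we-s_we) is divisible by the ratio.

```shell
# Hypothetical helper (not part of WPS): checks the nest-dimension rule
# from the geogrid error message.
# (e - s + 1) must equal n * ratio + 1, i.e. (e - s) must be divisible by ratio.
valid_nest_dim() {
  s=$1; e=$2; ratio=$3
  if [ $(( (e - s) % ratio )) -eq 0 ]; then
    echo "valid"
  else
    echo "invalid"
  fi
}

valid_nest_dim 1 94 3   # prints "valid"   (93 = 31 * 3)
valid_nest_dim 1 95 3   # prints "invalid" (94 is not a multiple of 3)
```

The same check applies to the south-north dimension (e_sn and s_sn).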




Error 2

ERROR: Could not open /mmm/users/wrfhelp/WPS_GEOG/topo_gmted2010_30s/index

Hint: Where are the static data located? Solution
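
One way to investigate is to print the path that geogrid was told to use and test whether it exists on this system. This is a minimal sketch: geog_path is a hypothetical helper, and it assumes the path is set through the geog_data_path variable in namelist.wps as a single-quoted string.

```shell
# Hypothetical helper: prints the geog_data_path value from a WPS namelist.
# Assumes the value is written as a single-quoted string on one line.
geog_path() {
  sed -n "s/^[[:space:]]*geog_data_path[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1"
}

# Report whether the configured directory exists on this system.
if [ -f namelist.wps ]; then
  p=$(geog_path namelist.wps)
  if [ -d "$p" ]; then echo "found: $p"; else echo "missing: $p"; fi
fi
```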



Everything should now be fixed. Run geogrid one more time to confirm that it completes successfully, then move on to ungrib.





Run Ungrib

The case dates for this exercise are 2021-03-14_00:00:00 to 2021-03-14_12:00:00.


Run ungrib:

./ungrib.exe



Error 1

ERROR: edition_num: unable to open GRIBFILE.AAA

Hint: How does ungrib know which meteorological data to use? Solution
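
A quick way to see what ungrib has to work with is to count the GRIBFILE.* entries in the wps directory. The sketch below uses a hypothetical helper, count_grib_links. The path in the closing comment is a placeholder, not the location of the exercise data.

```shell
# Hypothetical helper: counts GRIBFILE.* entries in a directory that
# resolve to real files.
count_grib_links() {
  n=0
  for f in "$1"/GRIBFILE.*; do
    # -e follows symbolic links, so dangling links are not counted.
    if [ -e "$f" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

count_grib_links .   # a count of 0 means ungrib has nothing to read

# If the count is 0, the link_grib.csh script that ships with WPS can
# create the links (placeholder path shown):
#   ./link_grib.csh /path/to/grib/files/*
```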




Error 2

***** ERROR in Subroutine PARSE_TABLE:
Problem opening file Vtable.
File ''Vtable'' does not exist.
ERROR: ***** Stopping in Subroutine PARSE_TABLE

Hint: Is there a Vtable available for ungrib to use? Solution
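
The check below distinguishes a missing Vtable from a link that points nowhere. It is a sketch only: check_vtable is a hypothetical helper, and the Vtable.GFS name in the closing comment assumes GFS input, which this exercise does not state.

```shell
# Hypothetical helper: reports whether a readable Vtable exists in a directory.
check_vtable() {
  if [ -r "$1/Vtable" ]; then
    echo "ok"
  elif [ -L "$1/Vtable" ]; then
    echo "dangling link"   # the link exists but its target does not
  else
    echo "missing"
  fi
}

check_vtable .

# WPS ships variable tables under ungrib/Variable_Tables/. Assuming GFS
# input (an assumption; use the table that matches your data):
#   ln -sf ungrib/Variable_Tables/Vtable.GFS Vtable
```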




Error 3

ERROR: Data not found: 2007-09-15_00:00:00.0000

Hint: Look in the directory /glade/campaign/mmm/wmr/wrf_tutorial/input_data/co_blizzard to see at which times the data are available. Solution
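
To compare the requested dates with the available data, it can help to print the date lines from the namelist next to a listing of the data directory. A minimal sketch, with show_dates as a hypothetical helper:

```shell
# Hypothetical helper: prints the start_date and end_date lines of a WPS namelist.
show_dates() {
  grep -E '^[[:space:]]*(start_date|end_date)[[:space:]]*=' "$1"
}

if [ -f namelist.wps ]; then show_dates namelist.wps; fi

# Then compare against the files that are actually available:
#   ls /glade/campaign/mmm/wmr/wrf_tutorial/input_data/co_blizzard
```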



Run ungrib again. It should now complete successfully.



Run Metgrid

Run metgrid:

./metgrid.exe



Error 1

WARNING: Couldn't open file FILE:2021-03-14_00 for input.
ERROR: The mandatory field TT was not found in any input data.

Hint: What is the input to metgrid? And what was the output from ungrib? Solution
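
Metgrid looks for intermediate files whose names begin with the fg_name value, while ungrib writes files that begin with its prefix value. The sketch below prints both so they can be compared. nml_value is a hypothetical helper, and it assumes each value is a single-quoted string on one line of namelist.wps.

```shell
# Hypothetical helper: prints the single-quoted value of a namelist variable.
# Usage: nml_value VARIABLE FILE
nml_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$2"
}

if [ -f namelist.wps ]; then
  echo "ungrib prefix:   $(nml_value prefix namelist.wps)"
  echo "metgrid fg_name: $(nml_value fg_name namelist.wps)"
fi
```

If the two values differ, or the files with that prefix are not in the current directory, metgrid cannot find its input.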




Error 2

Processing domain 2 of 2
GETH_IDTS: Month of NDATE = 0
GETH_IDTS: Month of ODATE = 0
GETH_IDTS: Day of NDATE = 0
GETH_IDTS: Day of ODATE = 0
ERROR: Screwy NDATE: 0000-00-00_00:00:00

Hint: What were domain 2’s dates set to? Solution
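
A date of all zeros for domain 2 suggests that one of the date variables may not have a value for every domain. The sketch below counts the comma-separated values on a namelist line so the count can be compared with max_dom. count_values is a hypothetical helper and assumes each variable sits on a single line.

```shell
# Hypothetical helper: counts the comma-separated values given for a
# namelist variable. Usage: count_values VARIABLE FILE
count_values() {
  var=$1; file=$2
  grep -E "^[[:space:]]*$var[[:space:]]*=" "$file" |
    sed 's/^[^=]*=//' |        # drop everything up to the equals sign
    tr ',' '\n' |              # one value per line
    grep -c '[^[:space:]]'     # count the non-empty lines
}

if [ -f namelist.wps ]; then
  echo "start_date values: $(count_values start_date namelist.wps)"
  echo "end_date values:   $(count_values end_date namelist.wps)"
fi
```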



Success - let’s move on to the WRF model





Run real.exe

Move to the appropriate directory:

cd /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise/wrf/test/em_real

Run real.exe by issuing qsub runreal.sh. After the job finishes, any errors appear in the rsl.error.0000 file. Use this command to look at the end of the file:

tail rsl.error.0000

To see more, issue cat rsl.error.0000 or open the file in an editor. Fix the errors one at a time until real.exe runs successfully:




Error 1

-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:     406
error opening met_em.d01.2000-01-24_12:00:00.nc for input; bad date in namelist or file not in directory
-------------------------------------------

Solution




Error 2

-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:     406
error opening met_em.d01.2021-03-14_00:00:00.nc for input; bad date in namelist or file not in directory
-------------------------------------------

Oops, didn't we just fix this one? Look carefully at the file it is complaining about. Hint: met_em.d01.2021-03-14_00:00:00.nc
Solution




Error 3

Note

You will probably need to open the rsl.out.0000 file to view the entire error message. Using the “tail” command will not show the full message.
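
When tail cuts the message off, one option is to print the file from the first line that mentions ERROR or FATAL through to the end. A minimal sketch, with show_from_first_error as a hypothetical helper:

```shell
# Hypothetical helper: prints a log file from its first ERROR or FATAL
# line to the end of the file.
show_from_first_error() {
  n=$(grep -n -E 'ERROR|FATAL' "$1" | head -n 1 | cut -d: -f1)
  if [ -n "$n" ]; then
    sed -n "${n},\$p" "$1"
  else
    echo "no ERROR or FATAL lines in $1"
  fi
}

if [ -f rsl.out.0000 ]; then show_from_first_error rsl.out.0000; fi
```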


----------------- ERROR -------------------
namelist : num_metgrid_soil_levels = 2
input files : NUM_METGRID_SOIL_LEVELS = 4 (from met_em files).
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and global attribute NUM_METGRID_SOIL_LEVELS
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_we = 74
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file WEST-EAST_GRID_DIMENSION = 75
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_sn = 61
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file SOUTH-NORTH_GRID_DIMENSION = 70
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist num_metgrid_levels = 27
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file BOTTOM-TOP_GRID_DIMENSION = 34
d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
NOTE: 4 namelist vs input data inconsistencies found.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:    1301
NOTE: Please check and reset these options
-------------------------------------------

Hint: How many soil levels are coming from the met_em files? Also check the dimension sizes set in the WPS namelist and met_em files. Solution





Run wrf.exe

Run wrf.exe:

qsub runwrf.sh


Again, the errors will appear in the rsl.error.0000 file. Fix the errors one at a time until wrf.exe runs successfully:


Note

You will probably need to open the rsl.out.0000 file to view the entire error message. Using the “tail” command will not show the full message.




Error 1

For domain            1 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
e_we =    75, nproc_x =   10, with cell width in x-direction =    7
e_sn =    70, nproc_y =   10, with cell width in y-direction =    7
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
For domain            2 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
e_we =    94, nproc_x =   10, with cell width in x-direction =    9
e_sn =   115, nproc_y =   10, with cell width in y-direction =   11
--- ERROR: Reduce the MPI rank count, or redistribute the tasks.
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:    2815
NOTE:       1 namelist settings are wrong. Please check and reset these options
-------------------------------------------

Solution
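
The numbers in the message follow from integer division: 75 grid cells split across 10 tasks gives patches 7 cells wide, below the minimum of 10. The sketch below inverts that to estimate the largest task count per direction. max_tasks is a hypothetical helper, and the calculation only mirrors the arithmetic shown in the log, not WRF's full decomposition logic.

```shell
# Hypothetical helper: largest number of MPI tasks along one direction that
# keeps each patch at least min_patch grid cells wide (integer division,
# matching the "cell width" values in the log). min_patch defaults to 10.
max_tasks() {
  dim=$1; min_patch=${2:-10}
  echo $(( dim / min_patch ))
}

# Domain sizes taken from the error message above.
echo "d01: $(max_tasks 75) x $(max_tasks 70)"    # prints "d01: 7 x 7"
echo "d02: $(max_tasks 94) x $(max_tasks 115)"   # prints "d02: 9 x 11"
```

The smallest domain sets the limit, so under these assumptions the run should use no more than 7 x 7 = 49 tasks.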




Error 2

D01: Time step                              =    200.0000      (s)
D01: Grid Distance                          =    30.00000      (km)
D01: Grid Distance Ratio dt/dx              =    6.666667      (s/km)
D01: Ratio Including Maximum Map Factor     =    6.696797      (s/km)
D01: NML defined reasonable_time_step_ratio =    6.000000
The time step is probably too large for this grid distance, reduce it.
If you are sure of your settings, set reasonable_time_step_ratio in namelist.input >    6.696797
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:     341
--- ERROR: Time step too large
-------------------------------------------

Solution
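
Because the ratio in the message is proportional to the time step, the rejected value can be scaled down to find an upper bound that should pass the check. This is a sketch under that assumption: max_time_step is a hypothetical helper, and the inputs are the values printed in the log above.

```shell
# Hypothetical helper: scales a rejected time step so that the logged
# "Ratio Including Maximum Map Factor" would just meet the allowed limit.
# The ratio is proportional to the time step, so new_dt = dt * limit / ratio.
# Usage: max_time_step DT RATIO LIMIT
max_time_step() {
  awk -v dt="$1" -v ratio="$2" -v limit="$3" \
    'BEGIN { print int(dt * limit / ratio) }'
}

# Values from the error message: dt = 200 s, ratio = 6.696797, limit = 6.
max_time_step 200 6.696797 6   # prints 179
```

Note that the bound is slightly below 6 x 30 = 180 s because the maximum map factor is a little greater than 1.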


For Error 1, see Choosing an Appropriate Number of Processors for additional details.



wrf.exe should now run successfully. You do not need to wait for it to complete before moving on to another exercise.






Return to the Practice Exercise home page to run another exercise.