Solutions for the Troubleshooting Exercise

Geogrid Error 1

In namelist.wps, see the following:

parent_grid_ratio =   1,   3,
e_we              =  75, 93,
e_sn              =  70,  115,

The parent_grid_ratio is a 3:1 ratio, while e_we and e_sn for domain 02 are 93 and 115.

To test if domain 02 has the correct dimensions, the following equation must result in a whole number: \((dimension - 1) \div parent\_grid\_ratio\)

e_we: \((93-1) \div 3 = 30.667\) (not a whole number)
e_sn: \((115-1) \div 3 = 38\) (a whole number)

Solution: Set e_we=94 for domain 02
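The whole-number test above can be sketched in a few lines of Python. The function names are made up for this illustration and are not part of WPS; the second helper simply searches upward for the next value that passes the test.

```python
def nest_dimension_ok(dimension, parent_grid_ratio):
    """True when (dimension - 1) divides evenly by parent_grid_ratio."""
    return (dimension - 1) % parent_grid_ratio == 0


def next_valid_dimension(dimension, parent_grid_ratio):
    """Smallest value >= dimension that passes the test (hypothetical helper)."""
    remainder = (dimension - 1) % parent_grid_ratio
    if remainder == 0:
        return dimension
    return dimension + (parent_grid_ratio - remainder)


print(nest_dimension_ok(93, 3))     # False: (93 - 1) / 3 is not a whole number
print(nest_dimension_ok(115, 3))    # True:  (115 - 1) / 3 = 38
print(next_valid_dimension(93, 3))  # 94
```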

Geogrid Error 2

On these machines the static data for geogrid resides in /glade/work/wrfhelp/WPS_GEOG.

Solution: Set the following so that geogrid can read the static data:

geog_data_path = '/glade/work/wrfhelp/WPS_GEOG'

Ungrib Error 1

For ungrib to process data, you need to link in the files you want to use. In our case the data are located in /glade/campaign/mmm/wmr/wrf_tutorial/input_data/co_blizzard.

Solution: Link in the files using the following command.

./link_grib.csh /glade/campaign/mmm/wmr/wrf_tutorial/input_data/co_blizzard/fnl_2021031

Ungrib Error 2

The correct Vtable must be linked in so that ungrib will know how/which data to extract from the files you are processing. We are using GFS-FNL data.

Solution: Link in the GFS Vtable.

ln -sf ungrib/Variable_Tables/Vtable.GFS Vtable

Ungrib Error 3

The GFS data we are using cover 2021-03-14_00 to 2021-03-14_12, but the namelist has start and end dates of 2007-09-15_00:00:00 and 2007-09-17_00:00:00.

Solution: Change the namelist start and end dates to 2021-03-14_00:00:00 and 2021-03-14_12:00:00.

Metgrid Error 1

When running ungrib, we created files with output names starting with FNL:

&ungrib
prefix = 'FNL',

But we used FILE as the starting name when running metgrid:

fg_name = 'FILE'

Solution: Make sure the metgrid name matches the prefix used during ungrib. Set:

fg_name = 'FNL'

Note

Metgrid gave you only a warning about the name, but an error because it could not find data. The code was deliberately written this way to add flexibility, so that you can use multiple sources of data.
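A quick way to see the mismatch is to check which files in the working directory begin with the fg_name value followed by a colon, since ungrib names its intermediate files PREFIX:YYYY-MM-DD_HH. The sketch below is only an illustration: the function name is hypothetical and the file list is a made-up directory listing, not output from the tutorial case.

```python
def matching_intermediate_files(fg_name, filenames):
    """Return the names that look like ungrib output written with this prefix."""
    return [name for name in filenames if name.startswith(fg_name + ":")]


# Made-up directory listing, for illustration only.
files = ["FNL:2021-03-14_00", "FNL:2021-03-14_12", "namelist.wps"]

print(matching_intermediate_files("FILE", files))  # [] : nothing for metgrid to read
print(matching_intermediate_files("FNL", files))   # the two FNL:... files
```

In a real working directory, the list could come from os.listdir(".") instead of the hard-coded example.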

Metgrid Error 2

Metgrid requires a date entry for all domains, but we only have information for domain 1.

Solution: Add date information for domain 2. Note that domain 2 only requires data at the initial time, so the following setting will be sufficient:

start_date = '2021-03-14_00:00:00', '2021-03-14_00:00:00',
end_date   = '2021-03-14_12:00:00', '2021-03-14_00:00:00',

Note

Why did we not need both times for ungrib? Ungrib is not linked to a specific domain. It only reformats data into a common format, which metgrid later places on the WRF domains. Ungrib therefore needs only a common start and end date.

Real Error 1

Our case data are not for 2000-01-24_12, so if real is looking for this date, there must be an error in the dates in the namelist.

Solution: Fix the start and end dates for both domains.

start_year      = 2021, 2021,
start_month     = 03,   03,
start_day       = 14,   14,
start_hour      = 00,   00,
end_year        = 2021, 2021,
end_month       = 03,   03,
end_day         = 14,   14,
end_hour        = 12,   12,
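The namelist.input entries above carry the same dates as the namelist.wps strings, split into separate fields. As a minimal sketch of that mapping, the hypothetical helper below (not part of WRF or WPS) parses a WPS-style date string with Python's standard datetime module and returns the corresponding fields.

```python
from datetime import datetime


def wps_date_to_wrf_fields(date_string, prefix):
    """Split a 'YYYY-MM-DD_HH:MM:SS' string into year/month/day/hour fields.

    prefix is 'start' or 'end' and is used to build the field names.
    """
    moment = datetime.strptime(date_string, "%Y-%m-%d_%H:%M:%S")
    return {
        f"{prefix}_year": moment.year,
        f"{prefix}_month": moment.month,
        f"{prefix}_day": moment.day,
        f"{prefix}_hour": moment.hour,
    }


print(wps_date_to_wrf_fields("2021-03-14_00:00:00", "start"))
print(wps_date_to_wrf_fields("2021-03-14_12:00:00", "end"))
```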

Real Error 2

real.exe uses met_em* data as input, so for real to run successfully, we need to link the met_em* data files from the wps directory to our current working directory:

Solution:

ln -sf ../../../wps/met_em.d0* .

Real Error 3

We need to edit the namelist.input file to specify the exact number of soil levels the input data have. To find the number of input soil levels, issue ncdump -h met_em.d01.2021-03-14_00:00:00.nc | more and look for the variables num_st_layers and num_sm_layers. You will see they are set to 4.

Solution: Correct this information in the namelist.

num_metgrid_soil_levels  = 4

Additionally, look back at the WPS namelist and note that we set the number of grid points for domain 1 to 75x70, but the WRF namelist is set to 74x61, so change the settings for e_we and e_sn in the namelist. Also note that num_metgrid_levels in the met_em files is 34 (it is listed in the same ncdump -h output), so make sure to set that correctly.

e_we = 75
e_sn = 70
num_metgrid_levels = 34

While you’re at it, go ahead and make sure these settings for domain 02 are correct, or you will get another error when you try to run again.

e_we                = 75,    94,
e_sn                = 70,   115,

Note

The parameter values in the input files are always the correct values to use. The namelist values must be set to match the input file values.

WRF Error 1

The maximum number of processors allowed is determined by the number of grid spaces in the domain (i.e., e_we and e_sn). During simulation, the domain is divided into tiles, where the number of tiles is determined by the total number of processors you use (one tile per processor). If there are too few grid spaces within each tile (fewer than 10x10), you will get this error message.

Note

This message was implemented in more recent versions of the model. In older versions, the model would likely just crash, and it would have been up to you to determine that the cause was related to the number of processors.

Decomposition is based on the two closest factors of the total number of processors. When choosing the appropriate number of processors, consider the smallest of the domains. In this case, that is \(75 \times 70\). Therefore we need to use a number of processors where:

in the x-direction \((e\_we) \div (x-tiles) \geq 10\)
in the y-direction \((e\_sn) \div (y-tiles) \geq 10\)

For this simulation, modify the runwrf.sh script so that it reads:

#PBS -l select=1:ncpus=49:mpiprocs=49

The closest factors of 49 are 7 and 7. Since \(75 \div 7 > 10\) and \(70 \div 7 = 10\), this value is okay. It’s also nice that 49 is a square number, meaning the decomposition will balance nicely.
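The check described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the model's own decomposition code: the function names are hypothetical, the threshold of 10 is taken from the text above, and the sketch assumes the smaller factor is applied in the x-direction and the larger in the y-direction.

```python
import math

MIN_POINTS_PER_TILE = 10  # threshold quoted in the text above


def closest_factors(nprocs):
    """Return the factor pair (a, b) of nprocs with a <= b and a as close to b as possible."""
    for a in range(math.isqrt(nprocs), 0, -1):
        if nprocs % a == 0:
            return a, nprocs // a


def decomposition_ok(e_we, e_sn, nprocs):
    """Apply the two inequalities from the text to the smallest domain."""
    # Assumption of this sketch: smaller factor in x, larger factor in y.
    x_tiles, y_tiles = closest_factors(nprocs)
    return (e_we / x_tiles >= MIN_POINTS_PER_TILE
            and e_sn / y_tiles >= MIN_POINTS_PER_TILE)


print(closest_factors(49))           # (7, 7)
print(decomposition_ok(75, 70, 49))  # True
print(decomposition_ok(75, 70, 64))  # False: 8 x 8 tiles leave fewer than 10 points each
```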

WRF Error 2

The time_step namelist parameter (in seconds) should be no larger than \(6 \times DX\), where DX is the grid spacing in km.

Solution:

Set time_step = 180, which is \(6 \times 30km\).
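As a worked example of the rule, the sketch below computes the largest time_step the \(6 \times DX\) guideline allows. The function name is hypothetical, and it assumes the grid spacing is supplied in metres, as dx is written in namelist.input, so it converts to km first.

```python
def max_time_step(dx_m):
    """Largest time_step in seconds under the 6 x DX rule (DX converted from metres to km)."""
    return int(6 * dx_m / 1000)


print(max_time_step(30000))  # 180, matching the solution above
```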

Note

The model now stops and provides this error message when this is the case, but in earlier versions of WRF, there was no stop. If you later use an earlier version of code, and the model stops, try searching for “cfl” in all of the rsl* files. If you find cfl errors, it likely means your time_step needs to be reduced. This error can sometimes occur even when your time_step is within the boundaries of the \(6 \times DX\) rule. If it does, see Segmentation Faults - Helpful Information for suggestions.