Troubleshooting Exercise

This case has been deliberately set up to contain some common errors. Your task is to find and correct them, which should help if you run into the same errors in the future.


Notes

  • To make this exercise work correctly, you must start with the specific code built for this case. The code is in /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise, and inside that directory are wps and wrf directories, which is where you will work.

  • Try to solve the problems yourself, but if you get stuck, there are hints to help you out. Just hover your mouse over them to see a pop-up box appear.


  • **Click on the links to jump to the desired section.**

    1. Run Geogrid
    2. Run Ungrib
    3. Run Metgrid
    4. Run Real
    5. Run WRF



    Run Geogrid

    1. Make sure you are in the correct wps directory.
      cd /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise/wps
      Do not make any edits to the namelist.wps file. Simply start running the code.
       
       
    2. Run geogrid.

      Since multiple errors are built in, let's work through them one by one.
      After you fix each problem, run geogrid.exe again to see whether more errors remain.
       
      1. ERROR: For nest 2, (e_we-s_we+1) must be one greater than an integer multiple of the parent_grid_ratio of 3.
        Hint: This has to do with domain 2's dimensions.
        See if you can correct this on your own, or find the solution here (hover over it).
         
      2. ERROR: Could not open /mmm/users/wrfhelp/WPS_GEOG/topo_gmted2010_30s/index
        Hint: Where are your static data located?
        Solution.
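
        The nest-dimension rule quoted in the first error can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, s_we is assumed to be 1, and the value 95 is a hypothetical bad setting. The values 94 and 115 are the domain 2 sizes that appear later in the wrf.exe log on this page.

```python
def nest_dim_ok(e, ratio, s=1):
    """True when (e - s + 1) is one greater than an integer multiple of ratio."""
    return (e - s) % ratio == 0


def nearest_valid(e, ratio, s=1):
    """Closest valid end index at or below e, and at or above e."""
    below = e - ((e - s) % ratio)
    above = below if below == e else below + ratio
    return below, above


# Hypothetical checks with parent_grid_ratio = 3.
for e in (94, 95, 115):
    print(e, nest_dim_ok(e, 3), nearest_valid(e, 3))
```

        Any value for which nest_dim_ok returns False would trigger the geogrid error; nearest_valid suggests the two closest sizes that satisfy the rule.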


        All fixed! Let's move on to ungrib.





    Run Ungrib

    For this exercise the case dates are 2021-03-14_00:00:00 to 2021-03-14_12:00:00.

    Let's run ungrib:


    You should see the following problems. See if you can figure them out before viewing the solution:

    1. ERROR: edition_num: unable to open GRIBFILE.AAA
      Hint: How does ungrib know which data to use?
      Solution


    2. ***** ERROR in Subroutine PARSE_TABLE:
      Problem opening file Vtable.
      File ''Vtable'' does not exist.
      ERROR: ***** Stopping in Subroutine PARSE_TABLE

      Hint: Is there a Vtable available for ungrib to use?
      Solution


    3. ERROR: Data not found: 2007-09-15_00:00:00.0000
      Hint: Look in the directory /glade/campaign/mmm/wmr/wrf_tutorial/input_data/co_blizzard to see at which times the data are available.
      Solution
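
    One way to catch date problems before running ungrib is to pull the start_date and end_date entries out of the namelist and compare them with the case window given above. The following is a minimal sketch, not part of WPS: the helper names and the sample namelist text are hypothetical, and the regular expression assumes each entry sits on a single line.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}"


def namelist_dates(text, key):
    """Return the per-domain date strings assigned to `key` in namelist text."""
    match = re.search(rf"^\s*{re.escape(key)}\s*=\s*(.+)$", text, flags=re.MULTILINE)
    if match is None:
        return []
    return re.findall(DATE, match.group(1))


def outside_window(dates, first, last):
    """Dates outside [first, last]; fixed-width strings compare chronologically."""
    return [d for d in dates if not (first <= d <= last)]


# Hypothetical namelist.wps fragment, for illustration only.
sample = """&share
 start_date = '2007-09-15_00:00:00','2021-03-14_00:00:00',
 end_date   = '2021-03-14_12:00:00','2021-03-14_12:00:00',
/"""

starts = namelist_dates(sample, "start_date")
print(outside_window(starts, "2021-03-14_00:00:00", "2021-03-14_12:00:00"))
```

    To try it on the real file, read namelist.wps into a string and pass that string in place of sample.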

       





    Run Metgrid

    Now let's run metgrid:

    You should see the following problems. Fix them one at a time until metgrid runs successfully:

    1. WARNING: Couldn't open file FILE:2021-03-14_00 for input.
      ERROR: The mandatory field TT was not found in any input data.

      Hint: What is the input to metgrid? And what was the output from ungrib?
      Solution
       
    2. Processing domain 2 of 2
      GETH_IDTS: Month of NDATE = 0
      GETH_IDTS: Month of ODATE = 0
      GETH_IDTS: Day of NDATE = 0
      GETH_IDTS: Day of ODATE = 0
      ERROR: Screwy NDATE: 0000-00-00_00:00:00

      Hint: What were domain 2's dates set to?
      Solution
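
    Metgrid looks for ungrib's intermediate files by name, so it can help to list the names it will expect and compare them with what is in the directory. The sketch below builds names in the PREFIX:YYYY-MM-DD_HH form seen in the warning above. The 6-hour interval (21600 seconds) is an assumption, so use the interval_seconds value from your own namelist.wps; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

FMT = "%Y-%m-%d_%H:%M:%S"


def expected_intermediate_files(prefix, start, end, interval_seconds):
    """Names of the intermediate files expected between start and end, inclusive."""
    t = datetime.strptime(start, FMT)
    stop = datetime.strptime(end, FMT)
    names = []
    while t <= stop:
        names.append(f"{prefix}:{t.strftime('%Y-%m-%d_%H')}")
        t += timedelta(seconds=interval_seconds)
    return names


def missing_files(names, directory="."):
    """Subset of names that do not exist in directory."""
    return [n for n in names if not (Path(directory) / n).exists()]


# Assumed interval of 21600 s; replace with the value from namelist.wps.
names = expected_intermediate_files(
    "FILE", "2021-03-14_00:00:00", "2021-03-14_12:00:00", 21600
)
print(names)
```

    Calling missing_files(names) from the wps directory shows which of the expected files are absent. If all of them are missing, compare the prefix metgrid is looking for with the prefix ungrib actually wrote.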

    Success! Let's move on to the WRF model.
     






    Run real.exe

    1. Make sure you are in the /glade/derecho/scratch/$USER/practice_exercises/troubleshooting_exercise/wrf/test/em_real directory.

    2. Start by running real.exe: issue qsub runreal.sh.

    3. You should see the following problems in your "rsl.error.0000" file. Use this command to take a look at the end of the file:
      tail rsl.error.0000
      Fix the errors one at a time until real runs successfully:




      1. -------------- FATAL CALLED ---------------
        FATAL CALLED FROM FILE: LINE: 406
        error opening met_em.d01.2000-01-24_12:00:00.nc for input; bad date in namelist or file not in directory
        -------------------------------------------

        Hint: 'met_em.d01.2000-01-24_12:00:00.nc'?
        Solution
         
      2. -------------- FATAL CALLED ---------------
        FATAL CALLED FROM FILE: LINE: 406
        error opening met_em.d01.2021-03-14_00:00:00.nc for input; bad date in namelist or file not in directory
        -------------------------------------------

        Hint: Oops - thought we just fixed this one? But look carefully at the file it is complaining about.
        Solution
         
      3. NOTE: You will probably need to open the rsl.out.0000 file to view the entire error message. Using the "tail" command will not show the full message.

        ----------------- ERROR -------------------
        namelist : num_metgrid_soil_levels = 2
        input files : NUM_METGRID_SOIL_LEVELS = 4 (from met_em files).
        d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and global attribute NUM_METGRID_SOIL_LEVELS
        d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_we = 74
        d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file WEST-EAST_GRID_DIMENSION = 75
        d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
        d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_sn = 61
        d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file SOUTH-NORTH_GRID_DIMENSION = 70
        d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
        d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist num_metgrid_levels = 27
        d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file BOTTOM-TOP_GRID_DIMENSION = 34
        d01 2021-03-14_00:00:00 ---- ERROR: Mismatch between namelist and input file dimensions
        NOTE: 4 namelist vs input data inconsistencies found.
        -------------- FATAL CALLED ---------------
        FATAL CALLED FROM FILE: LINE: 1301
        NOTE: Please check and reset these options
        -------------------------------------------

        Hint: How many soil levels are coming from the met_em files? Also check the dimension sizes set in the WPS namelist and met_em files.
        Solution
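
        Because the full message is long, it can be convenient to extract just the mismatched pairs from the log. The sketch below is a hypothetical helper, not part of WRF: it pairs each "SIZE MISMATCH: namelist ..." line with the "input file ..." line that follows it, using the line format shown in the third error above. It does not cover the num_metgrid_soil_levels message, which is formatted differently.

```python
import re

PAIR = re.compile(r"SIZE MISMATCH:\s+(namelist|input file)\s+(\S+)\s*=\s*(\d+)")


def size_mismatches(log_text):
    """Return (namelist_name, namelist_value, file_name, file_value) tuples."""
    pairs = []
    pending = None
    for line in log_text.splitlines():
        m = PAIR.search(line)
        if not m:
            continue
        source, name, value = m.group(1), m.group(2), int(m.group(3))
        if source == "namelist":
            pending = (name, value)
        elif pending is not None:
            pairs.append((pending[0], pending[1], name, value))
            pending = None
    return pairs


# Lines copied from the error message above.
log = """d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_we = 74
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file WEST-EAST_GRID_DIMENSION = 75
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist e_sn = 61
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file SOUTH-NORTH_GRID_DIMENSION = 70
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: namelist num_metgrid_levels = 27
d01 2021-03-14_00:00:00 input_wrf.F: SIZE MISMATCH: input file BOTTOM-TOP_GRID_DIMENSION = 34"""

for nml_name, nml_value, file_name, file_value in size_mismatches(log):
    print(f"{nml_name} = {nml_value} in namelist, {file_name} = {file_value} in met_em")
```

        To run it on a real log, read rsl.out.0000 into a string and pass it to size_mismatches.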









        Run wrf.exe

        1. Simply run wrf.exe.
          qsub runwrf.sh

        2. You should see the following problems in your "rsl.error.0000" file. Use this command to take a look at the end of the file:
          tail rsl.error.0000
          Fix the errors one at a time until wrf runs successfully:

          1. NOTE: You will probably need to open the rsl.out.0000 file to view the entire error message. Using the "tail" command will not show the full message.

            Max map factor in domain 1 = 1.00. Scale the dt in the model accordingly.
            D01: Time step = 200.0000 (s)
            D01: Grid Distance = 30.00000 (km)
            D01: Grid Distance Ratio dt/dx = 6.666667 (s/km)
            D01: Ratio Including Maximum Map Factor = 6.696797 (s/km)
            D01: NML defined reasonable_time_step_ratio = 6.000000
            The time step is probably too large for this grid distance, reduce it.
            If you are sure of your settings, set reasonable_time_step_ratio in namelist.input > 6.696797
            -------------- FATAL CALLED ---------------
            FATAL CALLED FROM FILE: LINE: 341
            --- ERROR: Time step too large
            -------------------------------------------


            Solution


          2. For domain 1 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
            Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
            e_we = 75, nproc_x = 8, with cell width in x-direction = 9
            e_sn = 70, nproc_y = 16, with cell width in y-direction = 4
            --- ERROR: Reduce the MPI rank count, or redistribute the tasks.
            For domain 2 , the domain size is too small for this many processors, or the decomposition aspect ratio is poor.
            Minimum decomposed computational patch size, either x-dir or y-dir, is 10 grid cells.
            e_we = 94, nproc_x = 8, with cell width in x-direction = 11
            e_sn = 115, nproc_y = 16, with cell width in y-direction = 7
            --- ERROR: Reduce the MPI rank count, or redistribute the tasks.
            -------------- FATAL CALLED ---------------
            FATAL CALLED FROM FILE: LINE: 2815
            NOTE: 1 namelist settings are wrong. Please check and reset these options
            -------------------------------------------


            Solution

            See Choosing an Appropriate Number of Processors for additional details about this concept.
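
            Both wrf.exe errors come down to simple arithmetic, which the sketch below reproduces from the numbers in the two log excerpts above. The function names are hypothetical. The integer division in patch_widths is an assumption that happens to match the widths printed in the log, and WRF's own decomposition logic may differ in detail. The map factor is back-calculated from the two ratios in the log, since the log prints it rounded to 1.00.

```python
def max_time_step(dx_km, max_map_factor=1.0, ratio_limit=6.0):
    """Largest dt (s) whose dt/dx ratio, scaled by the map factor, stays within ratio_limit."""
    return ratio_limit * dx_km / max_map_factor


def patch_widths(e_we, e_sn, nproc_x, nproc_y):
    """Approximate patch width in x and y for a given task layout."""
    return e_we // nproc_x, e_sn // nproc_y


def max_tasks(domains, min_patch=10):
    """Largest nproc_x and nproc_y that keep every patch at least min_patch cells wide."""
    nx = min(e_we // min_patch for e_we, _ in domains)
    ny = min(e_sn // min_patch for _, e_sn in domains)
    return nx, ny


# Time step: dx = 30 km and a ratio limit of 6.0, as in the first log.
map_factor = 6.696797 / (200.0 / 30.0)  # back-calculated from the logged ratios
print(max_time_step(30.0))              # ignoring the map factor
print(max_time_step(30.0, map_factor))  # including the map factor

# Decomposition: domain sizes and the 8 x 16 layout from the second log.
print(patch_widths(75, 70, 8, 16))
print(patch_widths(94, 115, 8, 16))
print(max_tasks([(75, 70), (94, 115)]))
```

            The smaller domain sets the limit on the task count, so a layout that works for domain 1 should also work for domain 2 in this case.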


        That's all! Both real and wrf should now run successfully! You can wait for wrf to run, or you can move on to another exercise now. Since you already know how to run wrf, it's not necessary that you actually run it all the way through.

        WRF Tutorial Exercises



        Continue to More Exercises

        If you plan to attempt more exercises right now, you can access the case studies menu by clicking here.