Monday, April 19, 1999 3:29 PM
Email from T3E user:
Subject: MPI version of MM5 on large domain

I am trying to run the large domain using the latest version of MM5 and
MPP.tar. The jobs seem to just hang and continue to use up CPU time; the
rsl_output files are not created until the core file appears, yet all the
timesteps seem to have completed and a fort.41 was generated. I am
following configure.user and the instructions for running the MPP version
on the T3E given at:

   http://www.mmm.ucar.edu/mm5/mpp/basic_info.html

I have run this case before, but that was prior to using this latest
version of MM5 with release 12.

Totalview gives the following traceback information from the core file:

   MM5
   OUTPUT
   OUTTAP          (MM5:811)
   RSL_WRITE       (OUTTAP:1677)
   MPI_Send        (RSL_WRITE:752)
   post_big_send   (MPI_Send:414)
   abort           (post_big_send:4536)
   raise           (abort:127)
   _lwp_kill       (raise:30)

--- This user solved the problem themselves, as noted in a later email. ---

I remember now that we have already identified the fix for this problem.

> MPI on the T3E uses underlying shmem calls, and in some applications
> the sending processors can exceed the default fixed-sized pool (i.e.,
> senders are generating messages much faster than receivers can take
> them). The MPI_SM_POOL environment variable allows you to bump up the
> size of the fixed-sized pools.
>
> Example in csh:
>
>    setenv MPI_SM_POOL 16000
>
> This creates a pool of 16000 bytes.
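
For context, a minimal sketch of how this fix might be applied in a T3E
batch job is shown below. It assumes the MPP executable is named mm5.mpp,
that it is launched with mpprun on 64 PEs, and that the job shell is csh;
the executable name, PE count, and script layout are illustrative and not
taken from the original email.

   #!/bin/csh
   # Enlarge the fixed-size buffer pool used by MPI's underlying shmem
   # transport on the T3E, so fast senders do not overrun it.
   # (16000 bytes is the value from the quoted reply; tune as needed.)
   setenv MPI_SM_POOL 16000

   # Launch the MPP executable (name and PE count are illustrative).
   mpprun -n 64 ./mm5.mpp

The appropriate pool size depends on the case; 16000 bytes is simply the
value given in the quoted reply above.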