MPICH/P4 environment on Linux PC cluster for large domain



From: Norm Henry [Norm.Henry@met.co.nz]
Sent: Friday, March 23, 2001 12:01 AM
Subject: RE: MM5 MPP problem

MM5 MPP Users:

This note describes a setup problem we encountered running MM5 MPP on a 
Linux PC cluster. Our domain is sufficiently large that the default memory 
allocation for shared-memory communication is too small, which leads to 
error messages about p4_shmalloc returning NULL. As per section 3.1.5 of 
the MPICH user guide, the memory allocation can be increased using the 
$P4_GLOBMEMSIZE environment variable.

In our original configuration we set $P4_GLOBMEMSIZE globally in 
/etc/profile, which normally works fine. However, if the model is executed 
with the -nolocal option it fails with the p4_shmalloc error. With 
-nolocal, no process runs on the launching machine, so process 0 is started 
on a remote machine through a remote shell. That shell is not a login 
shell, and /etc/profile is only read by login shells, so the variable is 
never set.

The solution is to define $P4_GLOBMEMSIZE in the .bashrc file in the user's 
home directory on the remote machine, which bash reads for the non-login 
shell started there (and which login shells usually pick up as well, via 
~/.bash_profile). For tcsh users the variable should be defined in either 
.tcshrc or .cshrc.
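
As a minimal sketch, a line like the one below could be added to ~/.bashrc 
on each remote machine. The value shown (64 MB, given in bytes as we 
understand the MPICH documentation) is only an assumed example, not the 
figure from our setup; choose a size that suits your domain and the memory 
available on the nodes.

```shell
# ~/.bashrc on the remote machine.
# 67108864 bytes (64 MB) is an assumed example value, not a recommendation.
export P4_GLOBMEMSIZE=67108864
```

For tcsh, the equivalent line in .tcshrc or .cshrc would be 
"setenv P4_GLOBMEMSIZE 67108864".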

Furthermore, for scheduled jobs that cron runs locally (i.e. without the 
-nolocal option), $P4_GLOBMEMSIZE needs to be defined either in 
/etc/crontab or in the user's own crontab file, since cron reads neither 
/etc/profile nor ~/.bashrc.
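
A user crontab along the following lines is one way to do this. It is only 
a sketch: the schedule, the run directory, the process count and the 
executable name are hypothetical placeholders, and the memory size is again 
an assumed example value. In /etc/crontab an extra user-name field goes 
between the schedule and the command.

```
# Environment settings placed before the job entries apply to those jobs.
# 67108864 bytes (64 MB) is an assumed example value.
P4_GLOBMEMSIZE=67108864

# Hypothetical entry: start the model at 02:00 every day.
0 2 * * * cd /path/to/run && mpirun -np 4 ./mm5.mpp
```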

Thanks to John Michalakes for his assistance with this.

Norm Henry
National Weather Services
Meteorological Service of New Zealand