Sun/Solaris compilation: workaround for 'sort' bug

Gentle users,

This entry describes a workaround for a build failure reported by a user compiling the MPP version of MM5 on a Solaris distributed-memory system. The user's original message to the help desk, which gives more detail about the problem, appears after the reply.

-Rotang, April 25, 2001


Reply, with workaround:


Date: Wed, 25 Apr 2001 16:06:11 -0600 (MDT)
From: John Michalakes 
To: Michael.Walters@afit.edu
Cc: michalak@ucar.edu
Subject: RE: MPP on Sun

Michael,

It turns out that the seg-fault is coming from the Solaris version of
the 'sort' command in the FLIC script.  First the fix, then a bug
report you can hand off to Sun if you want.

1) Fix

In the file MPP/FLIC/FLIC/flic.csh, add the line indicated by
the comment with the word 'kludge' in it:

if ( $s1_nocomments != yes ) then
  $FGREP TCOMMENT $TMP/flic_scanned.$$ |\
    $AWK -F: '{print $4}' > $TMP/flic_cnum.$$
  $FGREP TCOMMENT $TMP/flic_scanned.$$ |\
    $SED 's/^.*TCOMMENT://' > $TMP/flic_coms2.$$
  $PASTE $TMP/flic_cnum.$$ $TMP/flic_coms2.$$ > $TMP/flic_coms.$$
  $HARDRM $TMP/flic_cnum.$$ $TMP/flic_coms2.$$
# kludge for Solaris sort command that segfaults if TMP/flic_coms is zero length
  echo " " >> $TMP/flic_coms.$$
#
  $SORT -nm +0 -1 $TMP/flic_dat.$$ $TMP/flic_coms.$$ | $CUT -f2- | \
       $REASSEMBLE $TMP/bbb.$$ | $SED 's/CFLICBYE //'
else
  $CUT -f2- $TMP/flic_dat.$$ | $SED 's/CFLICBYE //'
endif

After you make this change, you will need to 'make uninstall' and
then 'make mpp' again, so that FLIC is completely rebuilt.

2) Bug report for Sun

This is a log of a terminal session on your system that
demonstrates the bug in sort:

------------------------------------------------------
% cat > flic_dat
   1          SUBROUTINE KFBMDATA
#include
#include
      FLIC_RUN_DECL
   2          COMMON /VAPPRS/ALIQ,BLIQ,CLIQ,DLIQ,AICE,BICE,CICE,DICE,XLS0,XLS1
   3          DATA ALIQ,BLIQ,CLIQ,DLIQ/613.3,17.502,4780.8,32.19/
   4          DATA AICE,BICE,CICE,DICE/613.2,22.452,6133.0,0.61/
   5          DATA XLS0,XLS1/2.905E6,259.532/
CFLIC END DECLARATIONS
   6          RETURN 
   7          END 

% touch flic_coms
% ls -l flic_dat flic_coms
-rw-rw-r--   1 jmichala staff          0 Apr 25 18:03 flic_coms
-rw-rw-r--   1 jmichala staff        414 Apr 25 18:03 flic_dat
% sort -nm +0 -1 flic_dat flic_coms
Segmentation fault
% echo " " >> flic_coms
% sort -nm +0 -1 flic_dat flic_coms
 
   1          SUBROUTINE KFBMDATA
#include
#include
      FLIC_RUN_DECL
   2          COMMON /VAPPRS/ALIQ,BLIQ,CLIQ,DLIQ,AICE,BICE,CICE,DICE,XLS0,XLS1
   3          DATA ALIQ,BLIQ,CLIQ,DLIQ/613.3,17.502,4780.8,32.19/
   4          DATA AICE,BICE,CICE,DICE/613.2,22.452,6133.0,0.61/
   5          DATA XLS0,XLS1/2.905E6,259.532/
CFLIC END DECLARATIONS
   6          RETURN 
   7          END 

% 
------------------------------------------------------
The gist of this is that if the second file in the sort/merge
is zero length, sort seg-faults.

-John

 ----------------------------------------------------------------------
 John Michalakes,  michalak@ucar.edu,  http://www.mcs.anl.gov/~michalak
 ----------------------------------------------------------------------
 MCS Division                | MMM Division
 Argonne National Laboratory | National Center for Atmospheric Research
                             | 3450 Mitchell Lane, Boulder, CO 80301
                             | 303-497-8199
 ----------------------------------------------------------------------
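
The reply above works around the crash by appending a blank line so that the second input to sort is never zero length. Another way to avoid the same condition, sketched here in Bourne shell rather than the csh used by flic.csh, is to drop empty inputs before calling sort. The function name safe_merge is hypothetical and is not part of FLIC, and the sketch uses -k1,1, the POSIX spelling of the older "+0 -1" key. It has not been tested inside the FLIC pipeline, so treat it as an illustration of the idea rather than a replacement for the kludge.

```shell
# Hypothetical helper (not part of FLIC): merge pre-sorted files on a
# numeric first field, skipping any input that is missing or zero length
# so that sort is never handed an empty file.
safe_merge() {
  n=$#
  for f in "$@"; do
    # -s is true only when the file exists and its size is greater than zero;
    # surviving names are appended after the original arguments
    [ -s "$f" ] && set -- "$@" "$f"
  done
  # Discard the original arguments, keeping only the non-empty files
  shift "$n"
  if [ "$#" -eq 0 ]; then
    return 0    # nothing to merge, so produce no output
  fi
  # -n numeric compare, -m merge already-sorted inputs, -k1,1 key on field 1
  sort -n -m -k1,1 "$@"
}
```

Called as "safe_merge flic_dat flic_coms", this prints the contents of flic_dat unchanged when flic_coms is empty, and the merged stream otherwise.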


Original message to help desk

Hello,

I am having trouble compiling the MPP version of MM5V3 under Sun MPI on
a cluster of Sparc workstations.  I am using the sunmpi portion of the
configure.user file in section 7 without modification.  Specifically,
several object files are not built properly so the link fails.  The
compiler errors are:

ld: fatal: file fkill_model.o: cannot open file: No such file or directory
ld: fatal: file kfbmdata.o: cannot open file: No such file or directory
ld: fatal: file mparrcopy.o: cannot open file: No such file or directory
ld: fatal: file savread.o: cannot open file: No such file or directory
ld: fatal: file write_flag.o: cannot open file: No such file or directory
ld: fatal: File processing errors. No output written to mm5.mpp
*** Error code 1 (ignored)

Each of the associated files (like fkill_model.f) contains only a
single line with a "Segmentation Fault (core dumped)" error, so it
appears to me that the FLIC processing of these files is not proceeding
properly.  The FLIC processing and compilation of all the other files
appears to proceed without problem.

I am using Sun Workshop 6 Fortran and C compilers.  I have also tried
using the gnu version of make without success.

If anybody has any ideas about what is causing this problem, I would
appreciate hearing from them.  Thanks

Mike Walters
Department of Engineering Physics
Air Force Institute of Technology
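
The symptom described in the message above, generated .f files that contain a crash message instead of Fortran source, can be checked for before the link step. The sketch below is a hypothetical helper; its name and the choice of directory to scan are assumptions, since the message does not say where the FLIC output files are written.

```shell
# Hypothetical helper: list the .f files in a directory (default: the
# current one) whose contents include a segmentation fault message.
list_crashed_outputs() {
  dir=${1:-.}
  # -l prints only the names of matching files; -i ignores case so both
  # "Segmentation fault" and "Segmentation Fault" are caught.
  # Errors (for example, no .f files present) are discarded.
  grep -li 'segmentation fault' "$dir"/*.f 2>/dev/null
}
```

Any file it lists was most likely produced by a failed FLIC run and should be regenerated after the workaround is applied.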