A Note on Radar Averaging Techniques for the Lightning Launch Commit Criterion (LLCC)
Airborne Field Mill (ABFM) Program


Francis J. Merceret/NASA/KSC/YA-D

4 March 2003

 

Introduction:  At a joint meeting of the ABFM science team and the Lightning Advisory Panel (LAP) in November 2002, the attendees reached a consensus that the most promising proxy for electric fields aloft was a suitable composite radar parameter.  The primary candidates for this parameter were the average of radar reflectivity or the peak reflectivity in a box surrounding the position of interest.  The exact size of the box and the height of its top and bottom were to be determined.  The group discussed three methods for computing average radar reflectivity in the box: a straight average of all dBZ values, a truncated average including only dBZ values above a threshold (taken here as 0 dBZ), and an average of the Z values converted back to dBZ.

This note compares the properties of the three proposed averaging methods with each other and with those of the peak value.

 

Methodology:  As originally proposed, this investigation would rely entirely on a Monte Carlo simulation using a pseudo-Gaussian random number generator to produce each “run” of 2000 samples to be averaged by the three techniques and from which the peak value would be taken.  The sample size was based on information from the Principal Investigator, Dr. Jim Dye, that the most likely box sizes for use in LLCC analysis would contain about that many radar samples.  The Monte Carlo simulation was conducted, and the details are presented below.  I soon realized, however, that an analytical treatment was also possible.

 

If a Gaussian model is assumed, it is possible to derive analytical results for the three averages.  The straight average should return the mean value supplied to the random number generator within the sampling error for the process.  The truncated average can be computed using the “Left-Truncated Normal” distribution (Johnson and Thomopoulos, 2002) that will be described below.  The average of Z values can be computed based on the fact that if dBZ is normally distributed, then Z is lognormally distributed with a known relation between the lognormal parameters and the Gaussian parameters. This also will be discussed below.

 

The Monte Carlo simulation produced 100 runs of 2000 samples per run for each assigned pair of Gaussian parameters m and s, where m and s are respectively the mean and standard deviation of the distribution of dBZ values.  For each run, the peak value and the averages from each of the three methods were computed.  The standard deviation of the data was also computed as a check on the pseudo-Gaussian random number generator.  For each pair of m and s, the averages over the 100 runs of the peak, of each of the three averaging methods, and of the standard deviation were computed.  These 100-run averages are the basis for this analysis.
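For concreteness, a minimal sketch of one such run is shown below (Python with NumPy; the variable names and the 0 dBZ cutoff for the truncated average are illustrative, not the original code):

    import numpy as np

    rng = np.random.default_rng()   # pseudo-Gaussian random number generator

    def run_statistics(m, s, n_samples=2000):
        # One "run": n_samples of dBZ drawn from N(m, s), reduced four ways.
        dbz = rng.normal(m, s, n_samples)
        straight = dbz.mean()                                    # straight average of dBZ
        truncated = dbz[dbz > 0.0].mean()                        # truncated average (dBZ > 0 only)
        z_avg = 10.0 * np.log10(np.mean(10.0 ** (dbz / 10.0)))   # average of Z, expressed in dBZ
        peak = dbz.max()                                         # peak value in the box
        return straight, truncated, z_avg, peak, dbz.std()

    # 100 runs for one (m, s) pair; the column means give one cell of each table below.
    stats = np.array([run_statistics(15.0, 8.0) for _ in range(100)])
    print(stats.mean(axis=0))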

 

The concept of the truncated Gaussian is simple enough.  A Gaussian probability density function is given by

 

f(z) = (2π)^(-1/2) exp(-z²/2),    -∞ ≤ z ≤ ∞

 

 

where z = (x-m)/s is the normalized standard variable.  The “left-truncated” version is simply

 

f(z) = (2π)^(-1/2) exp(-z²/2),    kL ≤ z ≤ ∞

f(z) = 0,    -∞ ≤ z < kL.

 

Johnson and Thomopoulos (2002) create a new variable t = z − kL and compute its mean, mt, using an intermediate variable H(kL), the area under f(z) from kL upward.  The result is

 

mt = H(kL)^(-1) [f(kL) − kL·H(kL)].

 

This can be converted back to x = dBZ by converting t to z and then z to x using the equations above.
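As a minimal numerical sketch of that conversion (Python with SciPy; the function name and the 0 dBZ cutoff are illustrative assumptions):

    from scipy.stats import norm

    def truncated_mean_dbz(m, s, cutoff=0.0):
        # Mean of dBZ given dBZ >= cutoff when dBZ ~ N(m, s) (left-truncated normal).
        kL = (cutoff - m) / s               # cutoff expressed in standardized units
        H = norm.sf(kL)                     # H(kL): area under f(z) from kL upward
        mt = (norm.pdf(kL) - kL * H) / H    # mean of t = z - kL (Johnson and Thomopoulos)
        return m + s * (mt + kL)            # back to z, then to x = dBZ

    # For example, truncated_mean_dbz(15, 20) gives a value in the low 20s of dBZ,
    # in line with the simulated truncated average for that case in Table 3.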

 

The lognormal distribution is well known. If x is normally distributed, then y = exp(x) is lognormally distributed with parameters M and S such that M is the mean of x and S is the standard deviation of x.  In that case, the mean of y is given by

 

my = exp(M + S²/2)         (Aitchison and Brown, 1966).

 

Since dBZ = 10 Log10(Z), Z = 10^(dBZ/10), which will be lognormal if dBZ is Gaussian.

The lognormal parameters are given by M = b·m and S = b·s, where m and s are the mean and standard deviation of dBZ and b = ln(10)/10 = 0.23026.  One word of caution is necessary here:  lognormal distributions are highly skewed and critically dependent on large sample sizes for convergence to the theoretical values of the moments.  Computed moments may differ significantly from measured moments if extreme values are not realized in the sample (Smith and Merceret, 2000).
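For reference, combining these relations gives a closed form for the dBZ-equivalent of the mean of Z under the Gaussian assumption: 10 Log10(my) = (M + S²/2)/b = m + (b/2)s² ≈ m + 0.1151 s².  This is the theoretical value against which the sample Z-averages presented in Table 4 below may be compared, subject to the convergence caveat just noted.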

 

The values selected for m and s were as follows:

 

m = 5, 15, 25 and 40

s = 3, 8 and 20.

 

All twelve combinations were run.

 

Results – Straight Average:  The straight average faithfully reproduced the mean of the sample.  The standard deviations also agreed with the input parameter, thus verifying the correct functioning of the pseudo-Gaussian random number generator.  The details are presented in Tables 1 and 2.

 

Mean\Sigma      3        8       20
 5            4.99     4.98     5.03
15           15.00    15.01    15.01
25           25.00    25.00    25.02
40           40.00    39.99    40.07
Average dBZ

Table 1.  Mean of means for 100 runs of 2000 samples of the “straight average”.

 

Mean\Sigma      3        8       20
 5            2.99     8.02    19.99
15            3.00     8.01    20.05
25            3.00     8.02    19.98
40            3.01     8.02    19.94
Sigma dBZ

Table 2.  Mean of sigmas for 100 runs of 2000 samples of the “straight average”.

 

Results – Truncated Average:  The truncated average showed a positive bias, as expected.  Eliminating data asymmetrically from a Gaussian process necessarily introduces bias: by selectively discarding the smaller values, the computed mean becomes larger.  For large means and small standard deviations, the effect is small.  As the mean approaches the cutoff value to within about one standard deviation, the bias becomes large.  The results are shown in Table 3.

 

Mean\Sigma      3        8       20
 5            5.30     8.61    18.01
15           15.00    15.56    22.87
25           25.00    25.02    29.13
40           40.00    39.99    41.14
Average dBZ>0 dBZ

Table 3.  Mean of means for 100 runs of 2000 samples of the “truncated average”.

 

I performed the analysis using the formulas of Johnson and Thomopoulos (2002) for m = 15 and s = 20.  The computed value for the mean was 23.26, which compares well with the realized value of 22.87.  This confirms that the simulation is accurately modeling the process.  These calculations are a bit awkward, and no others were performed since the simulation results are sufficient for our purposes.

 

The key result is that when the cutoff comes within about one standard deviation of the actual mean, the bias in the calculated mean grows rapidly.

 

Results – Average of Z:  The sense of the science team has been that the average of Z values is dominated by the peaks and is not really representative of the lower values.  The simulation confirms this intuition.  The results of the average of Z are presented in Table 4.  The comparison with the peak results will be presented in the discussion section.

 

 

 

Mean\Sigma      3        8       20
 5            6.01    12.23    42.66
15           16.04    22.21    52.77
25           26.04    32.21    62.41
40           41.03    47.23    77.44
Average Z dBZ

Table 4.  Mean of means expressed as dBZ for 100 runs of 2000 samples of the average of Z.

 

To confirm these computations, I selected a run from the 100 in the set and computed the expected lognormal mean from the sample m and s of 39.873 and 19.586 respectively.  This was one run from the set with m and s  = 40 and 20.  The calculated lognormal mean was 84.0 dBZ whereas the measured mean was 74.6 dBZ, a significant difference.  The peak value for this realization was 101.5 dBZ, which is below the average for this set of runs.  Given the propensity mentioned above for lognormal distributions to be very sensitive to sample size and to extreme values, I selected another run from the same set having a peak value of 115.6 dBZ, which is above average.  The calculated lognormal average based on the observed m and s of 40.121 and 20.033 was 86.3 dBZ. The measured value was 85.1 dBZ.  This confirms that the simulation is working correctly.
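For reference, the closed form noted earlier reproduces these calculated values: 39.873 + 0.1151(19.586)² ≈ 84.0 dBZ and 40.121 + 0.1151(20.033)² ≈ 86.3 dBZ.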

 

Results – Peak Value:  The results of the peak value method are presented in Table 5.

 

Mean\Sigma      3        8       20
 5            15.0     31.6     70.9
15            24.9     41.4     81.3
25            34.9     51.5     90.8
40            49.9     66.6    105.7
Max dBZ

Table 5.  Mean of maximum dBZ values for 100 runs of 2000 samples.

 

Discussion:  These results suggest several things.  First, as noted above, the truncated average can be seriously biased for mean values within about one standard deviation of the threshold or less.  The difference between the two methods is shown in Table 6.

 

Mean\Sigma      3        8       20
 5            0.31     3.63    12.98
15            0.00     0.55     7.86
25            0.00     0.02     4.11
40            0.00     0.00     1.07
(Average dBZ>0 dBZ)-(Average dBZ)

Table 6.  The difference between the means of means for 100 runs of 2000 samples of the thresholding method and those of the straight average method.

 

Note that the bias is small when the actual mean exceeds the threshold by more than about two standard deviations (2s).

 

Second, the difference between the average peak value and either of the non-thresholded averaging methods depends only on the standard deviation as shown in Tables 7 and 8.
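This behavior is expected: for a sample of roughly 2000 independent Gaussian values, the expected maximum exceeds the mean by a fixed multiple of the standard deviation, approximately 3.3s for n = 2000, which is consistent with the ratios in Table 7.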

 

Mean\Sigma      3        8       20
 5            10.0     26.6     65.9
15             9.9     26.4     66.3
25             9.9     26.5     65.8
40             9.9     26.6     65.6
(Max dBZ)-(Average dBZ)

Table 7.  The difference between the means of the maximums for 100 runs of 2000 samples and the means of the means of the straight average method.

 

Mean\Sigma      3        8       20
 5            8.99    19.37    28.24
15            8.86    19.19    28.53
25            8.86    19.29    28.39
40            8.87    19.37    28.26
(Max dBZ)-(Average Z dBZ)

Table 8.  The difference between the means of the maximums for 100 runs of 2000 samples and the means of the means of the Z average method.

 

It follows that the same must also be true of the difference between the Z-average method and the straight average method, as shown in Table 9.

 

Mean\Sigma      3        8       20
 5             1.0      7.3     37.6
15             1.0      7.2     37.8
25             1.0      7.2     37.4
40             1.0      7.2     37.4
(Average Z dBZ)-(Average dBZ)

Table 9.  The differences between the means of the means of the Z-average and straight average methods.

 

Since the standard deviation of real data may vary widely from cloud to cloud, these constant offsets between the various methods should not be used in selecting a candidate methodology.  These relations hold only in the aggregate, not for any single run.  They also depend on the distribution being Gaussian, which the real world hardly ever is when non-linear processes like cloud formation and electrification are afoot.

 

The peak value is probably too sensitive to the whims of sampling to make a good indicator for operational decisions.  Although it is consistently related to the input distribution in the aggregate over a large number of runs, the individual cases examined in the verification of the lognormal computations showed peak values differing by more than 10 dBZ for samples drawn from the same population.  To a considerable extent, the Z-average process shares this disadvantage.

 

The truncated average seems to have no real advantages over any of the other methods, and it has the serious disadvantage of being a biased estimator of the process mean.

 

Thus, the outcome of this study suggests that, of the candidates considered here, the best methodology for generating a radar box parameter is a straight average of dBZ values that includes all points down to the noise level.  The open question is what to do with “empty” cells in the box.

 

 

References:

 

Aitchison, J. and J.C. Brown (1966): The Lognormal Distribution, 1st Ed., Cambridge Univ. Press, Cambridge, England, p.8.

 

Johnson, A.C. and N.T. Thomopoulos (2002):  Use of the Left-Truncated Normal Distribution for Improving Achieved Service Levels, Proceedings of the 2002 Annual Meeting of the Decision Sciences Institute, pp. 2033 – 2041.

 

Smith, B. and F.J. Merceret (2000): The Lognormal Distribution, College Mathematics Journal, 31(4), 259-261.