A.2 PROBABILITY DISTRIBUTIONS (PD-1)

A series of Monte Carlo simulations was run using probability distributions with different CVs. Table A-2 summarizes the results by distribution variability (CV = 1, 4, or 7), sampling design (m = 30 or 100; 2, 3, 5, or 7 replicates), and UCL method (Student’s-t and Chebyshev), using UCL coverage and RPD as the performance metrics.

Table A-2. Summary of simulation results using lognormal distributions

95UCL >= true mean [Overestimate of Mean]   95UCL < true mean [Underestimate of Mean]
Statistics Chebyshev UCL Student’s-t UCL   Chebyshev UCL Student’s-t UCL
2 Reps 3 Reps 5 Reps 7 Reps 2 Reps 3 Reps 5 Reps 7 Reps   2 Reps 3 Reps 5 Reps 7 Reps 2 Reps 3 Reps 5 Reps 7 Reps
m=30, CV=1   m=30, CV=1
count of simulations 4,571 4,835 4,956 4,981 4,693 4,664 4,689 4,678   429 165 44 19 307 336 311 322
UCL coverage 91% 97% 99% 100% 94% 93% 94% 94%   91% 97% 99% 100% 94% 93% 94% 94%
mean RPD 27% 22% 18% 16% 37% 16% 10% 8%   -4% -3% -2% -1% -4% -3% -2% -2%
5th %ile RPD 3% 4% 5% 5% 4% 2% 1% 1%   -11% -7% -7% -3% -11% -8% -6% -5%
50th %ile RPD 22% 21% 17% 15% 31% 14% 9% 7%   -4% -2% -1% -1% -4% -2% -2% -1%
95th %ile RPD 65% 48% 34% 28% 91% 34% 20% 15%   0% 0% 0% 0% 0% 0% 0% 0%
  m=30, CV=4   m=30, CV=4
count of simulations 4,346 4,690 4,852 4,909 4,519 4,430 4,333 4,351   654 310 148 91 481 570 667 649
UCL coverage 87% 94% 97% 98% 90% 89% 87% 87%   87% 94% 97% 98% 90% 89% 87% 87%
mean RPD 93% 80% 63% 55% 129% 57% 36% 28%   -13% -10% -7% -6% -13% -10% -8% -6%
5th %ile RPD 6% 8% 9% 10% 9% 4% 3% 2%   -30% -23% -18% -15% -30% -25% -19% -17%
50th %ile RPD 65% 59% 50% 44% 90% 41% 27% 22%   -12% -8% -6% -5% -11% -8% -6% -5%
95th %ile RPD 272% 214% 155% 129% 374% 157% 92% 73%   -1% -1% 0% 0% -1% -1% -1% -1%
  m=30, CV=7   m=30, CV=7
count of simulations 4,171 4,532 4,740 4,820 4,414 4,187 4,101 4,137   829 468 260 180 586 813 899 863
UCL coverage 83% 91% 95% 96% 88% 84% 82% 83%   83% 91% 95% 96% 88% 84% 82% 83%
mean RPD 140% 117% 94% 83% 189% 86% 55% 45%   -18% -13% -10% -8% -18% -14% -11% -9%
5th %ile RPD 8% 8% 9% 11% 11% 5% 4% 3%   -39% -31% -24% -21% -41% -32% -28% -23%
50th %ile RPD 82% 73% 65% 59% 111% 54% 36% 30%   -16% -11% -8% -6% -16% -12% -9% -8%
95th %ile RPD 457% 358% 271% 227% 609% 272% 164% 133%   -2% -1% 0% -1% -1% -1% -1% -1%
  m=100, CV=1   m=100, CV=1
count of simulations 4,604 4,827 4,946 4,979 4,720 4,690 4,687 4,669   396 173 54 21 280 310 313 331
UCL coverage 92% 97% 99% 100% 94% 94% 94% 93%   92% 97% 99% 100% 94% 94% 94% 93%
mean RPD 27% 23% 18% 16% 38% 16% 10% 8%   -4% -3% -2% -1% -5% -3% -2% -2%
5th %ile RPD 3% 5% 5% 5% 4% 3% 2% 1%   -12% -8% -5% -3% -12% -9% -6% -5%
50th %ile RPD 22% 21% 18% 15% 32% 14% 9% 7%   -3% -2% -2% -1% -4% -3% -2% -1%
95th %ile RPD 66% 49% 34% 28% 93% 35% 20% 15%   0% 0% 0% 0% 0% 0% 0% 0%
  m=100, CV=4   m=100, CV=4
count of simulations 4,358 4,674 4,858 4,926 4,547 4,435 4,375 4,395   642 326 142 74 453 565 625 605
UCL coverage 87% 93% 97% 99% 91% 89% 88% 88%   87% 93% 97% 99% 91% 89% 88% 88%
mean RPD 95% 79% 64% 55% 130% 57% 36% 28%   -13% -10% -6% -6% -13% -10% -7% -6%
5th %ile RPD 6% 9% 10% 10% 9% 5% 3% 2%   -30% -23% -18% -18% -31% -25% -18% -16%
50th %ile RPD 65% 59% 51% 45% 89% 41% 27% 22%   -11% -8% -5% -5% -12% -9% -6% -5%
95th %ile RPD 280% 211% 157% 129% 380% 155% 95% 73%   -1% -1% 0% 0% -1% -1% 0% 0%
  m=100, CV=7   m=100, CV=7
count of simulations 4,115 4,509 4,739 4,839 4,362 4,186 4,092 4,119   885 491 261 161 638 814 908 881
UCL coverage 82% 90% 95% 97% 87% 84% 82% 82%   82% 90% 95% 97% 87% 84% 82% 82%
mean RPD 135% 114% 93% 80% 183% 84% 54% 43%   -17% -13% -9% -8% -17% -14% -11% -9%
5th %ile RPD 7% 8% 9% 9% 10% 5% 3% 3%   -39% -29% -22% -20% -38% -31% -26% -23%
50th %ile RPD 82% 74% 64% 58% 111% 53% 36% 30%   -15% -11% -8% -6% -15% -12% -9% -8%
95th %ile RPD 417% 321% 251% 210% 557% 240% 156% 122%   -1% -1% -1% -1% -1% -1% -1% -1%
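
The kind of simulation summarized in Table A-2 can be illustrated with a minimal sketch. It is not the code used to produce the table and is not expected to reproduce its values. The following are assumptions made only for illustration: m is taken to be the number of increments per ISM sample; each replicate result is the arithmetic mean of its m increments; 5,000 trials are run per scenario (the over- and underestimate counts in each table column sum to 5,000); the true mean is set to 1.0; and RPD is computed as 100 × (UCL − true mean) / true mean. The function names are hypothetical.

```python
import math
import random
import statistics

# One-sided 95% Student's-t critical values for df = r - 1 (standard table values).
T_CRIT_95 = {2: 6.314, 3: 2.920, 5: 2.132, 7: 1.943}
# Chebyshev multiplier for a one-sided 95% UCL: sqrt(1/alpha - 1) with alpha = 0.05.
CHEBYSHEV_K_95 = math.sqrt(1 / 0.05 - 1)


def lognormal_params(mean, cv):
    """Convert an arithmetic mean and CV to the log-scale (mu, sigma)."""
    sigma2 = math.log(1 + cv ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def simulate(m, r, cv, true_mean=1.0, n_trials=5000, seed=1):
    """Monte Carlo sketch: r replicate ISM results, each the mean of m increments."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(true_mean, cv)
    ucls = {"student_t": [], "chebyshev": []}
    for _ in range(n_trials):
        # Each increment is an independent draw from the same lognormal population.
        reps = [
            statistics.fmean([rng.lognormvariate(mu, sigma) for _ in range(m)])
            for _ in range(r)
        ]
        xbar = statistics.fmean(reps)
        se = statistics.stdev(reps) / math.sqrt(r)  # standard error across replicates
        ucls["student_t"].append(xbar + T_CRIT_95[r] * se)
        ucls["chebyshev"].append(xbar + CHEBYSHEV_K_95 * se)
    summary = {}
    for name, values in ucls.items():
        # RPD (assumed definition) for trials in which the UCL is at or above the true mean.
        over = [100 * (u - true_mean) / true_mean for u in values if u >= true_mean]
        summary[name] = {
            "coverage": len(over) / n_trials,
            "mean_rpd_over": statistics.fmean(over) if over else float("nan"),
        }
    return summary
```

For example, `simulate(m=30, r=3, cv=1.0)` returns the coverage and mean RPD of overestimates for both UCL methods in one scenario. Because the Chebyshev multiplier (about 4.36) exceeds the Student's-t critical value for 3 or more replicates, the Chebyshev UCL is never lower than the Student's-t UCL in those designs; with 2 replicates the ordering reverses.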

Each scenario can be thought of as a special case of the simulations with maps (M-1, M-2, M-3) presented later in this appendix. With sampling from probability distributions, each increment is an independent, random sample obtained from the same defined distribution (i.e., identically distributed), which is analogous to using simple random sampling for increment collection if applied to a real site. The assumption is that the overall distribution throughout the DU is homogeneous and can be described by a single population. It is important to note that, while this approach is useful for conveying important concepts about ISM, sampling from a probability distribution is an oversimplification for the following reasons:

  1. There is no attempt to quantify the relative contributions of different sources of heterogeneity or errors introduced in both the field and laboratory. The variance is viewed as a "lumping" term that represents the variability in concentrations in soil if the site were divided into samples of some mass. In practice, the expected error in the estimate of the mean depends, in part, on the mass of soil collected with each increment (see discussion of Gy sampling principles in the glossary). Therefore, it is convenient to think of the population as having a fixed mean concentration but a variance contingent on the sample mass. The simulations with defined distributions do not explore the effect of sample mass (see discussion of sample support in the glossary) on performance metrics. Instead, it is assumed that the specified variance simply reflects the collective sources of heterogeneity.

  2. The defined populations used in the simulations are not described as representing a DU of a specific size. At many sites, it is common for concentrations to exhibit spatial patterns, including subareas of elevated concentrations and overlapping sources (i.e., mixtures). This may be true even for very small DUs where concentrations from samples collected within a 1-foot radius differ by more than an order of magnitude. Most of the simulations do not explicitly model these conditions but instead presume that the overall population for the DU can be approximated by a lognormal distribution, regardless of any spatial arrangement of the contaminant mass.

  3. Only lognormal probability distributions are defined. Alternative positively skewed probability distributions were not explored. In general, because lognormal distributions give greater weight to results in the upper tail than alternative choices (e.g., gamma or Weibull distribution), the standard error for the mean and the corresponding UCLs tend to be higher than those of comparable distributions with the same population mean and variance.
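
The upper-tail point in item 3 can be illustrated with a small sketch that matches a lognormal and a gamma distribution to the same mean and CV by the method of moments and then compares an empirical upper quantile. This is an illustration only, not part of the simulations described above. The function names, the CV = 1 setting, the 99.9th percentile, and the number of draws are all assumptions, and the ordering of the two quantiles can depend on the CV and on the percentile chosen.

```python
import math
import random


def lognormal_params(mean, cv):
    """Log-scale (mu, sigma) giving the requested arithmetic mean and CV."""
    sigma2 = math.log(1 + cv ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def gamma_params(mean, cv):
    """Gamma (shape, scale) giving the requested mean and CV."""
    shape = 1 / cv ** 2
    scale = mean * cv ** 2
    return shape, scale


def upper_quantile(draws, p):
    """Simple empirical quantile: the order statistic at position p * n."""
    ordered = sorted(draws)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


def compare_tails(mean=1.0, cv=1.0, n=200_000, p=0.999, seed=1):
    """Return (lognormal quantile, gamma quantile) for moment-matched populations."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, cv)
    shape, scale = gamma_params(mean, cv)
    logn = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    gam = [rng.gammavariate(shape, scale) for _ in range(n)]
    return upper_quantile(logn, p), upper_quantile(gam, p)
```

Calling `compare_tails()` with the default settings returns the two empirical 99.9th percentiles; for CV = 1 the lognormal value is expected to be the larger of the two, consistent with its heavier upper tail in that case.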