# 4.3.4.1 Sample size (number of increments and replicates)

One option for reducing errors in sampling designs is to increase the sample size. For ISM, sample size can pertain to the mass per increment (i.e., sample support), number of increments (n), and number of replicates (r). Assuming a uniform mass per increment, several observations were made regarding the effects of increasing n and r on estimates of the mean (also see Appendix A, Table A-1):

- Increasing n directly reduces the standard deviation of the replicates. Specifically, the central limit theorem suggests that the SD of the replicates (which is a measure of the standard error of the mean) is inversely proportional to the square root of n. For example, all other things being equal, if the SD of replicates is 4.0 with n = 30, doubling the increments to n = 60 would reduce the SD by a factor of the square root of 2 (about 1.414), to approximately 2.8.
- Increasing r does not reduce the standard deviation of the replicates, although it does improve the estimate of the SD by reducing the variability in that estimate. Increasing r does reduce the standard error of the grand mean, which is inversely proportional to the square root of r.
- The overall reduction in the standard error for the (grand) mean is a function of the total mass collected and spatial area represented (i.e., increments × replicates), and this observation applies to parameter estimation with discrete sampling as well. Increasing the number of increments and/or replicates reduces the variability in ISM estimates of the mean.
- Increasing the number of increments (n) or sample mass reduces the potential for errors in terms of both frequency and magnitude of underestimation of the mean.
- For nonnormal distributions, increasing r above 3 provides marginal return in terms of improving coverage of a UCL when the Chebyshev calculation method is used; however, increasing r does not improve coverage of the Student's-t UCL.
- Increasing r reduces (i.e., improves) the RPD, meaning it will produce estimates of the 95% UCL closer to the DU mean. Therefore, increasing r may be an important sampling strategy when errors of either underestimation or overestimation of the mean can have significant consequences. The difference between Chebyshev and Student's-t UCLs can sometimes lead to different decisions for a DU. While the Chebyshev method typically provides greater coverage, it also tends to have higher RPDs. Project teams must balance both properties of UCLs when deciding which method(s) to use.
- Simulations produced varying results in terms of improvement in coverage by increasing the number of increments. In some simulations, increasing the number of increments produced little or no observable difference. In others, increasing increments twofold or more from typical increment numbers used in ISM resulted in marginal improvement. As with increasing replicates, increasing the number of increments decreases (i.e., improves) the RPD. The improvement in RPD performance is marginal when the underlying CV is small.
- Simulations showed that coverage provided by the two UCL calculation methods depends upon the degree of variance (or dispersion) of the contaminant distribution within the DU. A variety of statistics provide a measure of dispersion including the CV (i.e., SD normalized by the mean) and the geometric SD (specific to lognormal distributions). Table 4-4 summarizes findings grouped by CV (and GSD). Note that in this case, the CV reflects the SD of the increments divided by the mean and not the SD of the replicates divided by the mean. In practice, individual increments are typically not retained for analysis, so there may be no direct measure of the CV. If there is no site knowledge available to support an assumption about the degree of dispersion (i.e., low, medium, high) of increments, then the Chebyshev UCL may be the preferred calculation method because it is more likely to achieve the desired coverage than the Student's-t UCL. The CV (or SD) of the replicates is not a useful metric for determining which UCL method provides sufficient coverage.
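
The square-root scaling and the two UCL calculation methods discussed in the list above can be illustrated with a minimal Python sketch. It assumes the commonly used forms of the 95% UCL computed from r replicate results: mean + t × SD/√r for Student's-t, and mean + √(1/α − 1) × SD/√r with α = 0.05 for Chebyshev. The function names `scaled_sd` and `ucl95`, the small lookup table of t critical values, and the replicate values used in the usage note are illustrative assumptions, not part of this guidance.

```python
import math
import statistics

# One-sided 95% Student's-t critical values keyed by degrees of freedom (r - 1),
# copied from standard t tables (assumption: r between 2 and 6 replicates).
T_95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015}


def scaled_sd(sd, n_old, n_new):
    """Expected SD of replicates after changing increments from n_old to n_new,
    assuming the SD is inversely proportional to the square root of n."""
    return sd * math.sqrt(n_old / n_new)


def ucl95(replicates, method="t"):
    """95% UCL of the mean from ISM replicate results (hypothetical helper)."""
    r = len(replicates)
    mean = statistics.mean(replicates)
    # Standard error of the grand mean: sample SD of replicates / sqrt(r)
    se = statistics.stdev(replicates) / math.sqrt(r)
    if method == "t":
        k = T_95[r - 1]
    elif method == "chebyshev":
        k = math.sqrt(1 / 0.05 - 1)  # about 4.36 for 95% confidence
    else:
        raise ValueError("method must be 't' or 'chebyshev'")
    return mean + k * se
```

For example, `scaled_sd(4.0, 30, 60)` reproduces the drop from 4.0 to roughly 2.8 described above. For three hypothetical replicate results such as `[10.0, 12.0, 14.0]`, `ucl95(..., "chebyshev")` returns a higher value than `ucl95(..., "t")`, consistent with the Chebyshev method's greater coverage and higher RPD.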

| UCL Method | Dispersion Among Individual Increments | | |
|---|---|---|---|
| | Low (CV < 1.5 or GSD < 3) | Medium (1.5 < CV < 3 or 3 < GSD < 4.5) | High (CV > 3 or GSD > 4.5) |
| Student's-t | Yes | No | No |
| Chebyshev | Yes | Yes | Maybe |

Coefficient of variation (CV) = standard deviation (SD)/mean.
Geometric standard deviation (GSD) = exp[sqrt(ln(CV² + 1))] for lognormal distributions.
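
As a quick check on the CV and GSD thresholds in the table, the lognormal relationship in the footnote can be sketched in Python. The function names `gsd_from_cv` and `cv_from_gsd` are hypothetical, and the inverse function is simply the footnote formula solved for CV; both apply only under the assumption of a lognormal distribution.

```python
import math


def gsd_from_cv(cv):
    """GSD of a lognormal distribution with coefficient of variation cv."""
    return math.exp(math.sqrt(math.log(cv ** 2 + 1)))


def cv_from_gsd(gsd):
    """Inverse of gsd_from_cv: CV of a lognormal distribution with the given GSD."""
    return math.sqrt(math.exp(math.log(gsd) ** 2) - 1)
```

Evaluating `gsd_from_cv(1.5)` and `gsd_from_cv(3)` gives values close to 3 and 4.5, respectively, which is consistent with the paired CV and GSD cutoffs used for the low, medium, and high dispersion categories.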