4.3.2 Simulation Study Approach

Simulation studies were conducted to evaluate how several aspects of ISM design affect performance:
  • number of increments and replicates
  • sampling pattern
  • heterogeneity and variability of concentrations

A computer-based simulation is a numerical experiment in which a hypothetical site is sampled many times. The key utility of simulation is that the contaminant distribution is specified in advance, so the population parameters are known. This is in contrast to actual sampling with ISM, in which the potential bias in results or the coverage of the 95% UCL cannot be quantified because the true mean is not known. Simulation is therefore a convenient tool for evaluating the performance of alternative ISM approaches by comparison to the true mean. Furthermore, a variety of incremental sampling and statistical methods can be applied simultaneously to exactly the same scenario, which facilitates a comparison of sampling strategies.

Each simulation followed this general five-step process:

  1. Define the population. This may be a probability distribution (e.g., lognormal or Gamma), a mixture of probability distributions, or a 2-D map of the concentration surface of a DU. For some scenarios, CH and DH may be explicitly defined, while for others the assumption is that the population variance represents the combination of both elements of heterogeneity and other sources of error.
  2. Define an ISM sampling strategy. This step identifies the size and placement of the DU, the number of increments, the sampling method (e.g., systematic random, random sampling within grid, simple random sampling; see Section for more description), and the number of replicates.
  3. Implement a Monte Carlo analysis (MCA). Using MCA (described below), repeat the same ISM sampling strategy many times (e.g., 2000 iterations or more).
  4. Calculate statistics. For each iteration of MCA, calculate the DU statistics, including the grand mean (i.e., the mean of the replicate samples), the RSD of the replicate samples, the bias in the mean (i.e., estimated mean minus population mean), and the 95% UCL using both the Student’s t and Chebyshev methods.
  5. Evaluate performance metrics. In this step, the statistics from all iterations are used to evaluate performance metrics, including coverage of the 95% UCL, magnitude of UCL error, bias of the means, and RSD.
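
The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for this document: the population is a single lognormal distribution, increments are drawn by simple random sampling, and each replicate is taken to be the arithmetic mean of its increments (no subsampling or analytical error). The function name simulate_ism, its parameters, and the true mean of 100 are hypothetical. The Student's t critical values are hard-coded for 2 to 5 replicates, and the Chebyshev UCL is assumed to take the common form of mean plus sqrt(1/alpha - 1) times the standard error.

```python
import math
import random
import statistics

# One-sided 95% Student's t critical values for df = 1..4 (2 to 5 replicates).
T_CRIT_95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132}
# Chebyshev multiplier for a 95% UCL: sqrt(1/alpha - 1) with alpha = 0.05.
CHEBYSHEV_K_95 = math.sqrt(1 / 0.05 - 1)


def simulate_ism(true_mean=100.0, cv=1.0, n_increments=30, n_replicates=3,
                 n_iterations=2000, seed=1):
    """Hypothetical helper: Monte Carlo evaluation of one ISM design."""
    rng = random.Random(seed)

    # Step 1: define the population (lognormal with known mean and CV).
    sigma = math.sqrt(math.log(1 + cv ** 2))
    mu = math.log(true_mean) - sigma ** 2 / 2
    t_crit = T_CRIT_95[n_replicates - 1]

    biases, rsds = [], []
    t_hits = 0
    cheb_hits = 0
    for _ in range(n_iterations):  # Step 3: repeat the design many times.
        # Step 2: each replicate is the mean of n_increments random increments.
        replicates = [
            statistics.fmean([rng.lognormvariate(mu, sigma)
                              for _ in range(n_increments)])
            for _ in range(n_replicates)
        ]

        # Step 4: DU statistics for this iteration.
        grand_mean = statistics.fmean(replicates)
        sd = statistics.stdev(replicates)
        se = sd / math.sqrt(n_replicates)
        t_ucl = grand_mean + t_crit * se
        cheb_ucl = grand_mean + CHEBYSHEV_K_95 * se

        biases.append(grand_mean - true_mean)
        rsds.append(sd / grand_mean)
        t_hits += int(t_ucl >= true_mean)
        cheb_hits += int(cheb_ucl >= true_mean)

    # Step 5: performance metrics summarized across all iterations.
    return {
        "mean_bias": statistics.fmean(biases),
        "mean_rsd": statistics.fmean(rsds),
        "t_ucl_coverage": t_hits / n_iterations,
        "chebyshev_ucl_coverage": cheb_hits / n_iterations,
    }


if __name__ == "__main__":
    print(simulate_ism())
```

Coverage here is the fraction of iterations in which the UCL equals or exceeds the known population mean; a 95% UCL that performs as intended should show coverage near 0.95.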

Using simulation, we can evaluate a variety of statistical properties of ISM and determine whether factors that can be controlled in the sampling design (e.g., number of increments, number of replicates, DU size, and use of multiple SUs) can be adjusted to achieve the sampling objectives. Furthermore, by running MCA simulations on a variety of scenarios, we can develop an understanding of how the alternative ISM sampling strategies perform under different conditions. For example, 30 increments and 3 replicates may be sufficient to obtain a reliable 95% UCL for a DU that is described well by a single probability distribution with relatively low DH, whereas greater numbers of samples may be needed for a DU with multiple overlapping contamination sources and relatively high CH and DH. Pitard (1993) highlights the value of summarizing such relationships with sampling nomographs, which are the "best available tool to quickly estimate sampling errors, effectively design optimum sampling procedures, and find reasonable economical compromises."
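
One way to explore how controllable design factors interact with site conditions is to rerun the same Monte Carlo experiment over a grid of scenarios. The sketch below tabulates the coverage of the Student's t 95% UCL for several increment counts and CVs. It assumes a Gamma population, simple random sampling, 3 replicates, and no subsampling or analytical error. The function names, the true mean of 100, and the grid values are illustrative choices, not the scenarios run for this document.

```python
import math
import random
import statistics

N_REPLICATES = 3
T_CRIT_95_DF2 = 2.920  # one-sided 95% Student's t critical value for df = 2


def t_ucl_coverage(cv, n_increments, true_mean=100.0, n_iterations=1000,
                   seed=1):
    """Hypothetical helper: fraction of iterations with t UCL >= true mean."""
    rng = random.Random(seed)
    # Gamma population: mean = shape * scale and CV = 1 / sqrt(shape).
    shape = 1 / cv ** 2
    scale = true_mean * cv ** 2

    hits = 0
    for _ in range(n_iterations):
        # Each replicate is the mean of n_increments simple random increments.
        replicates = [
            statistics.fmean([rng.gammavariate(shape, scale)
                              for _ in range(n_increments)])
            for _ in range(N_REPLICATES)
        ]
        se = statistics.stdev(replicates) / math.sqrt(N_REPLICATES)
        ucl = statistics.fmean(replicates) + T_CRIT_95_DF2 * se
        hits += int(ucl >= true_mean)
    return hits / n_iterations


def coverage_grid(cvs=(0.7, 2.0, 6.0), increments=(15, 30, 100)):
    """Coverage for every (CV, number of increments) combination."""
    return {(cv, n): t_ucl_coverage(cv, n) for cv in cvs for n in increments}


if __name__ == "__main__":
    for (cv, n), coverage in coverage_grid().items():
        print(f"CV={cv:<4} increments={n:<4} t-UCL coverage={coverage:.3f}")
```

Under these assumptions, coverage is expected to fall as the CV rises and to recover as increments are added, which is the kind of relationship a sampling nomograph summarizes.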

Simulations determine the performance of ISM only under the specific conditions that are simulated; therefore, the results cannot be expected to apply to all sites. Table 4-3 summarizes the range of conditions that were investigated for this document.

Table 4-3. Summary of scenarios investigated with simulations

Condition                         Levels
--------------------------------  ------------------------------------------------
Increments                        15–100
Replicates                        2–5
Sampling method                   Simple random sampling, random within grid, and systematic random
Sampling pattern                  Entire DU and subdivided DU
Range of symmetry and dispersion  Normal data and multiple skewed data sets (lognormal and Gamma) with CV ranging from 0.7 to 6
DU variability                    Homogeneous and multiple levels of heterogeneity
DU spatial patterns               Ranged from evenly distributed to localized elevated regions of differing sizes
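
The three sampling methods in Table 4-3 differ only in how increment locations are placed within the DU. The following is a minimal sketch, assuming a square DU scaled to the unit square and a k × k grid (so k² increments per sample). The function names and the hotspot concentration surface are hypothetical and were chosen only to illustrate the comparison.

```python
import random
import statistics

# Hypothetical DU: background of 10 with one elevated region of 200 covering
# 4% of the area, so the true mean is 0.96 * 10 + 0.04 * 200 = 17.6.
TRUE_MEAN = 17.6


def concentration(x, y):
    """Hypothetical 2-D concentration surface on the unit square."""
    in_hotspot = 0.4 <= x < 0.6 and 0.4 <= y < 0.6
    return 200.0 if in_hotspot else 10.0


def simple_random(k, rng):
    """k*k increments placed uniformly at random over the whole DU."""
    return [(rng.random(), rng.random()) for _ in range(k * k)]


def random_within_grid(k, rng):
    """One increment at an independent random location in each grid cell."""
    return [((i + rng.random()) / k, (j + rng.random()) / k)
            for i in range(k) for j in range(k)]


def systematic_random(k, rng):
    """One random offset, repeated at the same position in every grid cell."""
    dx, dy = rng.random(), rng.random()
    return [((i + dx) / k, (j + dy) / k)
            for i in range(k) for j in range(k)]


def summarize(method, k=6, n_iterations=2000, seed=1):
    """Mean and standard deviation of simulated single-sample ISM results."""
    rng = random.Random(seed)
    results = [
        statistics.fmean([concentration(x, y) for x, y in method(k, rng)])
        for _ in range(n_iterations)
    ]
    return statistics.fmean(results), statistics.stdev(results)


if __name__ == "__main__":
    for method in (simple_random, random_within_grid, systematic_random):
        mean, sd = summarize(method)
        print(f"{method.__name__:<20} mean={mean:6.2f} sd={sd:5.2f}")
```

Comparing the mean of the simulated results with TRUE_MEAN indicates bias, and comparing the standard deviations indicates how precisely each placement method estimates the mean for this particular spatial pattern.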

A comprehensive review of the performance of discrete sampling methods for 95% UCL calculations already exists (USEPA 2010b), so discrete sampling was not evaluated here.