2.1 Introduction

At the most basic level, most environmental investigations exist to support decisions about volumes of media that may contain contaminants at concentrations above some level of concern. The concentration of contaminants must be measured to determine whether remediation or other action is required. Such decisions are often based on an estimate of the mean concentration of contaminants within the identified volume of media. Risk management decisions based on contaminant concentration estimates often involve large volumes of soil at individual sites. Taken together, the soil management actions carried out across the nation each year have enormous public health and economic consequences.

Because it is impractical to collect and analyze the entire volume of soil for which decisions must be made, samples are collected and the results used to represent that entire volume of soil. The industry of environmental investigation, regulation, and laboratory analysis has, to a large extent, developed around the practice of using discrete samples to meet all decision goals, including estimating mean contaminant concentrations. There are many reasons why a mean concentration may be of interest for decision-making purposes, as discussed in Section 3 and Hyperlink 1.

Relying on an estimate of the mean contaminant concentration in a volume of soil using a small number of discrete samples can lead to costly decision errors.

Estimates of the mean may be based on arithmetic or geometric means of discrete sampling data or on upper confidence limits (UCLs). Since the costs of sample analysis can be high, the number of discrete samples collected is often driven down by project budgetary constraints. Collective experience, statistical simulation, empirical data, and sampling theory indicate that in many situations estimates of mean contaminant concentrations in soil made from small numbers of discrete samples are unlikely to be accurate or precise, and are, therefore, more likely to result in decision errors. These decision errors can go both ways. An erroneous decision of “clean” can lead to unacceptable exposure to contaminants. On the other hand, an erroneous decision of “dirty” can lead to a waste of resources “cleaning up” soil unnecessarily.
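The link between small sample counts and decision errors can be illustrated with a minimal simulation sketch. It is not taken from this guidance: the lognormal concentration distribution, its parameters, the action level, and the function name `false_clean_rate` are all assumptions chosen for illustration only.

```python
import math
import random
import statistics

# Hypothetical site: discrete-sample concentrations follow a lognormal
# distribution (an assumption for illustration, not a property of any real site).
MU, SIGMA = 0.0, 1.5
TRUE_MEAN = math.exp(MU + SIGMA**2 / 2)  # analytical mean of this lognormal
ACTION_LEVEL = 2.5                       # hypothetical threshold below the true mean


def false_clean_rate(n_samples, n_trials=2000, seed=1):
    """Fraction of simulated sampling events whose sample mean falls below
    ACTION_LEVEL even though the true mean exceeds it (an erroneous "clean")."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        # One sampling event: n_samples discrete results, averaged.
        results = [rng.lognormvariate(MU, SIGMA) for _ in range(n_samples)]
        if statistics.fmean(results) < ACTION_LEVEL:
            errors += 1
    return errors / n_trials


if __name__ == "__main__":
    print(f"true mean = {TRUE_MEAN:.2f}, action level = {ACTION_LEVEL}")
    for n in (5, 50):
        rate = false_clean_rate(n)
        print(f"n = {n:2d} discrete samples: erroneous 'clean' rate = {rate:.2f}")
```

Under these assumptions, the erroneous "clean" rate should be noticeably higher with 5 samples than with 50, because a small sample from a skewed population often misses the few high-concentration results that drive the true mean. An erroneous "dirty" rate could be explored the same way by placing the hypothetical action level above the true mean.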

By its very nature, soil is a highly heterogeneous solid with many components. Sampling soil for the purpose of obtaining an estimate of the mean contaminant concentration is highly susceptible to sampling errors from a variety of sources. One goal of a sampling design should be to minimize the errors that can occur in each step of the sampling and analytical process. Historically, the focus has been on controlling errors associated with the analytical part of the process. A great deal of effort is invested in ensuring good data by requiring strict adherence to analytical methodologies and laboratory QA/QC procedures. But all this attention addresses only the tail end of the process. There are many more steps to the data quality chain that require attention for the output to be good data. According to USEPA’s soil screening guidance (USEPA 1996b),

Data users often look at a concentration obtained from a laboratory as being “the concentration” in the soil, without realizing that the number generated by the laboratory is the end point of an entire process, extending from design of the sampling, through collecting, handling, processing, analysis, quality evaluation, and reporting.

Steps usually overlooked when evaluating data quality include sampling design, sample collection techniques, sample processing, and field and laboratory subsampling. However, there is a growing body of evidence that the predominant source of error in the “entire process” to which USEPA refers is sampling error, which occurs because contaminant concentrations in soil are highly heterogeneous. Heterogeneity makes representative sampling difficult. Sampling errors are manifested as variability (i.e., imprecision observed as large differences in results between replicate samples) and/or bias in the data set (i.e., results significantly above or below the true concentrations). Data variability is easily measured and can be used to evaluate the effects of sampling error on data quality. If concentrations are close to a decision threshold and sampling errors are not controlled, data variability can lead to highly uncertain estimates of mean concentrations, which in turn lead to considerable uncertainty about whether the mean is above or below a decision threshold. Hyperlink 2 provides an example illustrating the importance of considering data variability in decision making. Poorly thought-out sampling procedures produce misleading data that can cause decision errors, as illustrated in Figure 2-1, no matter how good the analytical step is.


Figure 2-1. Heterogeneous nature of contaminants in soils may lead to decision errors.
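The way replicate variability creates uncertainty near a decision threshold can be sketched with a short calculation. The replicate results, the threshold, and the function name `summarize_replicates` below are hypothetical, and the Student's t UCL is only one of several possible UCL methods; it assumes approximately normally distributed results, which may not hold for real soil data.

```python
import math
import statistics

# One-sided 95% Student's t critical values, keyed by degrees of freedom (n - 1).
T_95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015}


def summarize_replicates(results, threshold):
    """Summarize replicate results (2 to 6 values) against a decision threshold."""
    n = len(results)
    mean = statistics.fmean(results)
    sd = statistics.stdev(results)            # sample standard deviation
    rsd_percent = 100.0 * sd / mean           # relative standard deviation
    ucl95 = mean + T_95[n - 1] * sd / math.sqrt(n)
    return {
        "mean": mean,
        "rsd_percent": rsd_percent,
        "ucl95": ucl95,
        "mean_below_threshold": mean < threshold,
        "ucl_below_threshold": ucl95 < threshold,
    }


if __name__ == "__main__":
    # Hypothetical replicate results (mg/kg) and hypothetical threshold (mg/kg).
    summary = summarize_replicates([80.0, 95.0, 140.0], threshold=150.0)
    for name, value in summary.items():
        print(f"{name}: {value}")
```

With these illustrative numbers the mean falls below the threshold while the 95% UCL falls above it, so the data alone do not support a confident decision either way. Tighter agreement among replicates would narrow the gap between the mean and the UCL.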

This document focuses on how to obtain an unbiased and precise estimate of the mean concentration, including UCLs, in heterogeneous bulk volumes of soil with a relatively small number of laboratory analyses using a process called incremental sampling methodology (ISM). ISM is a suite of planning, sampling, sample preparation, and subsampling techniques that address heterogeneous soil contamination and thereby control sampling errors that may otherwise lead to incorrect decisions.

By controlling sampling error throughout the entire sampling and analysis process, ISM can provide a precise and unbiased estimate of the mean using a relatively small number of laboratory analyses.
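The idea that precision can improve without adding laboratory analyses can be illustrated with a highly idealized sketch. It is not the procedure defined in this document: it assumes a lognormal concentration distribution, perfect sample processing and subsampling, and no analytical error, and the choice of 30 increments per sample and the function name `spread_of_estimates` are hypothetical.

```python
import random
import statistics


def spread_of_estimates(increments_per_sample, n_analyses=3, n_trials=2000,
                        mu=0.0, sigma=1.5, seed=7):
    """Standard deviation, across simulated sampling events, of the mean
    estimated from n_analyses laboratory results. Each laboratory result is
    idealized as the average concentration of its combined increments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        lab_results = [
            statistics.fmean(
                [rng.lognormvariate(mu, sigma) for _ in range(increments_per_sample)]
            )
            for _ in range(n_analyses)
        ]
        estimates.append(statistics.fmean(lab_results))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Same number of laboratory analyses (3) in both cases.
    print(f"1 increment per sample:   spread = {spread_of_estimates(1):.2f}")
    print(f"30 increments per sample: spread = {spread_of_estimates(30):.2f}")
```

In this simplified model the estimate built from multi-increment samples should vary far less from one sampling event to the next, even though the number of laboratory analyses is unchanged. Real performance depends on the field and laboratory steps that this sketch deliberately ignores.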

The sampling theory of Pierre Gy and his procedures for sampling bulk particulate materials have been used and validated for many years in the mining industry. However, only in the last few years has the environmental industry at large become familiar with this set of methods. ISM is based on many of the principles of Pierre Gy's sampling theory and is intended to address the problem of making decisions about highly heterogeneous bulk volumes of particulate material (e.g., soil) based on estimates of the mean derived from a relatively small number of samples of that material. Note, however, that many of the principles discussed in this section are also applicable to collecting and processing discrete samples. More attention to Gy's theory and to the management of heterogeneity could reduce sampling error and improve data quality for discrete samples as well.