The effect of subsample mass on data variability

Figure 2-9 presents the results of a study performed by the U.S. Department of Energy (DOE) in the mid-1970s. A large (~4 kg) soil field sample was milled to < 10-mesh (2 mm) particle size. Twenty replicate aliquots at each of several masses were taken from the prepared sample and analyzed. Despite this homogenization, particles as large as 2 mm remained, so particle size effects and heterogeneity persisted. Concentrations in this figure are in nanocuries per gram (nCi/g), and the vertical line at 2 nCi/g approximately represents the true concentration of the 4 kg sample. This experiment demonstrated the relationship between analytical sample mass, data variability, and potential decision errors. The results show that smaller analytical subsamples, such as the ≤ 1 g mass commonly used for metals analysis, produce more variable data than larger analytical subsamples.

Figure 2-9. Smaller analytical masses contribute to high data variability.
Source: Data from an experimental study on radioactively contaminated soil (Gilbert and Doctor 1985).

The data variability caused by heterogeneity affects the statistical distribution of the data, as seen in the three curves in the diagram. Data from smaller subsample masses form more lognormal-like statistical distributions. For example, notice how the right side of the 1 g sample mass curve (blue curve) is pulled out or "skewed" to the right much more than the left side. Because it is easy for small subsamples to miss contaminated particles, many small subsamples have low concentrations. However, sometimes more contaminated particles wind up in a small sample, causing a high concentration data result that is nonrepresentative of the parent material and producing the right-skewed "tail" of a lognormal distribution.
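The link between subsample mass and skew can be made explicit with a deliberately simple model. The model is an assumption introduced here for illustration, not the analysis used in the DOE study: suppose the contamination resides in discrete particles of equal activity a (nCi) that occur at an average of λ particles per gram, and that the number N of such particles captured in a subsample of mass m (g) is Poisson distributed. The symbols a, λ, N, and C are hypothetical names for this sketch.

```latex
\begin{align*}
N &\sim \mathrm{Poisson}(\lambda m), \qquad C = \frac{aN}{m} \\
\mathrm{E}[C] &= a\lambda \\
\mathrm{Var}[C] &= \frac{a^{2}\lambda}{m} \\
\mathrm{CV}[C] &= \frac{\sqrt{\mathrm{Var}[C]}}{\mathrm{E}[C]} = \frac{1}{\sqrt{\lambda m}} \\
\mathrm{skewness}[C] &= \frac{1}{\sqrt{\lambda m}}
\end{align*}
```

Under these assumptions the mean concentration does not depend on m, while both the relative variability and the right skew shrink in proportion to 1/√m. A 100-fold increase in subsample mass (for example, from 1 g to 100 g) would therefore reduce both by a factor of 10. Real soils will not follow this idealized model exactly, but it reproduces the qualitative pattern described above.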

Some skewing is still present in the 10 g subsamples (green curve). For the 100 g subsamples, the skewing is essentially gone and the distribution is approximately normal (red curve).

Figure 2-9 also illustrates how sampling error and the data variability it causes can lead to decision errors. For the sake of illustration, assume that 3 nCi/g is an action level. Because of the skewed distribution, some individual results from the 1 g data set will exceed the action level, even though the mean and most of the results are below it. For the 10 g subsamples, fewer results will exceed the action level. For 100 g subsamples of the same soil, no results would exceed it. In short, the smaller the analytical sample mass, the more likely that some data results will exceed an action level. Whether or not a volume of soil is considered compliant with an action level can depend on how big the analytical subsample is.

Recall that these effects are apparent even after the parent sample was milled and sieved to 2 mm. This level of sample preparation goes beyond what is typical for most environmental analyses.
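The dependence of action-level exceedances on subsample mass can be explored with a small simulation. The following is a minimal sketch under stated assumptions, not a reconstruction of the DOE data: it assumes contamination is carried by discrete particles of equal activity whose count in a subsample follows a Poisson distribution. The particle density (2 per gram) and activity (1 nCi per particle) are hypothetical values chosen only so that the mean concentration is 2 nCi/g; the 3 nCi/g action level is the illustrative value from the text.

```python
import random
import statistics

# Hypothetical model parameters (assumptions, not values from the study).
HOT_PARTICLES_PER_GRAM = 2.0  # mean number of contaminated particles per gram
ACTIVITY_PER_PARTICLE = 1.0   # activity of each contaminated particle, nCi
ACTION_LEVEL = 3.0            # illustrative action level from the text, nCi/g


def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate(mass_g, n_replicates, rng):
    """Simulated concentrations (nCi/g) for replicate subsamples of one mass."""
    lam = HOT_PARTICLES_PER_GRAM * mass_g
    return [
        poisson(rng, lam) * ACTIVITY_PER_PARTICLE / mass_g
        for _ in range(n_replicates)
    ]


def summarize(concs):
    """Return (mean, standard deviation, fraction above the action level)."""
    mean = statistics.fmean(concs)
    sd = statistics.stdev(concs)
    frac_above = sum(c > ACTION_LEVEL for c in concs) / len(concs)
    return mean, sd, frac_above


def run(seed=0, n_replicates=2000):
    """Summarize simulated results for 1 g, 10 g, and 100 g subsamples."""
    rng = random.Random(seed)
    return {m: summarize(simulate(m, n_replicates, rng)) for m in (1, 10, 100)}


if __name__ == "__main__":
    for mass, (mean, sd, frac) in run().items():
        print(f"{mass:>3} g: mean={mean:.2f}  sd={sd:.2f}  "
              f"fraction above action level={frac:.3f}")
```

In this sketch every subsample mass has the same expected concentration, yet the spread and the share of results above the action level both fall as the mass increases. A large number of replicates is used so the trend is stable; with only 20 replicates per mass, as in the study, the same pattern would be expected but with more run-to-run noise.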

In summary, Figure 2-9 illustrates clearly how matrix heterogeneity and particle size effects manifest as data variability and nonnormal, skewed statistical data distributions. These effects increase the possibility of decision errors.