14. Select Gy Sampling Theory Equations

According to Gy’s sampling theory, the overall estimation error (OE) is the sum of the total sampling error (TE) and the analytical error (AE):

OE = TE + AE

For each stage in the sampling and analytical process, TE is composed of a sampling error (SE) and a preparation error (PE), thus:

TE = SE + PE

The different sampling and preparation stages can be thought of as an initial sampling event followed by subsequent subsampling events. For a typical environmental analytical process, there are two sampling and preparation stages: one in the field and one in the laboratory.

If there is subsampling in the field, or if more than one subsampling stage occurs in the laboratory, then each stage contributes a total sampling error component to the overall estimation error.

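As described in Section 2.5, Gy recognizes seven basic sampling errors that comprise the total sampling error: the fundamental error (FE), the grouping and segregation error (GSE), the long-range heterogeneity fluctuation error (CE2), the periodic heterogeneity fluctuation error (CE3), the increment delimitation error (DE), the increment extraction error (EE), and the preparation error (PE).

Thus: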

TE = FE + GSE + CE2 + CE3 + DE + EE + PE

Note: It is actually the variances of the errors that are additive in the above equations, rather than the errors themselves. The total sampling error is a measure of how well one has controlled the various errors described above.

The errors are all derived mathematically by Gy and can be found in Sampling for Analytical Purposes (Gy 1998) and Pierre Gy’s Sampling Theory and Sampling Practice: Heterogeneity, Sampling Correctness, and Statistical Process Control (Pitard 1993). To simplify this document for the average practitioner and regulator, the mathematical derivations have been omitted.

The overall estimation error can be estimated by collecting replicate samples. The variability among field replicates, expressed as a coefficient of variation (CV), is an estimate of the OE. Analyzing laboratory replicates provides an estimate of the error contributed by the stages that occur in the laboratory. Following the formula above, the TE can then be estimated by subtracting the analytical error (AE) from the OE; as noted above, it is the variances, not the errors themselves, that are subtracted.
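A minimal sketch of the replicate-based estimate follows. It assumes that the CV of independent field replicates estimates the OE, that the CV of repeat analyses of the same prepared material estimates the AE, and that relative variances (squared CVs) are additive. The replicate values and the helper name `cv` are hypothetical, not data or methods from this document.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate results in mg/kg (illustrative only)
field_replicates = [92.0, 118.0, 104.0]       # separate field samples -> OE
analytical_replicates = [101.0, 99.0, 103.0]  # repeat analyses -> AE (assumed)

cv_oe = cv(field_replicates)
cv_ae = cv(analytical_replicates)

# Variances add (CV_OE^2 = CV_TE^2 + CV_AE^2), so subtract squared CVs,
# clamping at zero in case replicate noise makes AE appear larger than OE.
cv_te = max(cv_oe**2 - cv_ae**2, 0.0) ** 0.5

print(f"OE ~ {cv_oe:.1%}, AE ~ {cv_ae:.1%}, TE ~ {cv_te:.1%}")
```

Subtracting the CVs themselves, rather than their squares, would understate the TE whenever both errors are non-zero.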

To correctly collect samples, as defined by Gy, all these errors must be addressed. In practice, the focus is usually on FE and GSE; however, the other errors can be important if correct sampling procedures are not used. The FE can be minimized by collecting sufficient mass of sample, and the GSE and SE can be minimized by collecting numerous increments.

The effects of sample mass and particle size are shown in the following equation for variance of the FE:

$$ s_{FE}^2 = \frac{c\,\beta\,f\,g\,d^3}{M_s} $$

where

Ms = mass of the sample (g)
c = constitution parameter (g/cm³)
β = dimensionless liberation factor
f = dimensionless shape factor
g = dimensionless size range factor
d = diameter (cm) of the mesh opening that retains no more than 5% of the sample
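The formula can be written as a small Python function. This is a sketch assuming consistent units (c in g/cm³, d in cm, Ms in g), so the variance is dimensionless; the function returns its square root, s_FE, as a relative standard deviation. The function name and the sample inputs are illustrative assumptions.

```python
import math


def fe_relative_sd(c, beta, f, g, d_cm, mass_g):
    """Relative standard deviation of the fundamental error (FE).

    Evaluates s_FE^2 = c * beta * f * g * d^3 / Ms and returns s_FE.
    Units assumed: c in g/cm^3, d in cm, Ms in g.
    """
    variance = c * beta * f * g * d_cm**3 / mass_g
    return math.sqrt(variance)


# Illustrative inputs: the variance is inversely proportional to mass,
# so quadrupling the mass halves s_FE.
s_small = fe_relative_sd(16000, 1.0, 0.5, 0.25, 0.2, 100)
s_large = fe_relative_sd(16000, 1.0, 0.5, 0.25, 0.2, 400)
print(f"{s_small:.1%} at 100 g, {s_large:.1%} at 400 g")
```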

It is apparent from this relationship that the mass of sample necessary to minimize the FE is primarily controlled by the largest particle size of the population being sampled since this term is raised to the third power. The other factors can be thought of as constants since they do not have great variability in their values.
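Rearranging the equation for Ms shows the cubic effect of particle size directly. The sketch below assumes the parameter values used in the worked example later in this section (c = 16,000 g/cm³, β = 1, f = 0.5, g = 0.25) and a hypothetical 15% target for s_FE; the function name is illustrative.

```python
def required_mass_g(target_rsd, c, beta, f, g, d_cm):
    """Mass Ms (g) at which s_FE equals target_rsd.

    Rearranged from s_FE^2 = c*beta*f*g*d^3 / Ms.
    """
    return c * beta * f * g * d_cm**3 / target_rsd**2


# Assumed inputs; 0.15 is a hypothetical target relative SD.
masses = {d: required_mass_g(0.15, 16000, 1.0, 0.5, 0.25, d)
          for d in (0.2, 0.1, 0.05)}
for d, m in masses.items():
    print(f"d = {d} cm -> Ms = {m:,.0f} g")
```

Each halving of d lowers the required mass by a factor of 8 (2³), which is why reducing the particle size has such a large effect on the FE.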

The constitution parameter, c, depends on the amount of the analyte of interest, a, in the lot and on the mean density of the lot. If the amount of a in the lot is small (aL ≪ 1), c can be approximated as c ≈ δM/aL, where δM is the mean density of the lot and aL is the decimal fraction of a in the lot.
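As a sketch, the approximation can be coded directly. The inputs (1.6 g/cm³ and 100 mg/kg) are the example values used later in this section; the function names are hypothetical. Because c grows as the analyte fraction shrinks, lower concentrations require larger sample masses for the same FE.

```python
def constitution_parameter(mean_density_g_cm3, analyte_fraction):
    """Approximate constitution parameter c = deltaM / aL (g/cm^3).

    Only appropriate when the analyte fraction aL is much less than 1.
    """
    if not 0.0 < analyte_fraction < 1.0:
        raise ValueError("analyte_fraction must be a decimal fraction in (0, 1)")
    return mean_density_g_cm3 / analyte_fraction


def mg_per_kg_to_fraction(mg_per_kg):
    """1 mg/kg (ppm) is 1e-6 as a decimal mass fraction."""
    return mg_per_kg / 1_000_000


c = constitution_parameter(1.6, mg_per_kg_to_fraction(100))
print(f"c = {c:,.0f} g/cm^3")
```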

The number of increments is a more complex derivation and is related to the magnitude of heterogeneity present at the site. If the total sampling error is high and the FE has been appropriately minimized, then it may be that the GSE is not being properly controlled.

One can compare the FE inherent in the usual practice of collecting discrete samples to the FE associated with using an ISM approach.

Typically, during discrete sampling the amount of soil collected in the field is enough to fill the required sample container for the specific analyte. For metals analysis and most organic analyses, the amount of soil in a 4-ounce container is adequate. In the laboratory, a 1 g aliquot is taken for metals analysis, while a 30 g aliquot is taken for most organic analyses. Thus, there are two sampling stages: the first is the field sampling, and the second is the subsampling in the laboratory.

Using the following values for the parameters in the equation for the variance of the fundamental error, δM = 1.6 g/cm³ (a typical density for soil), β = 1, f = 0.5, g = 0.25 (for unsieved soils), and d = 0.2 cm (based on the definition of soil as particles smaller than 2 mm), one can solve for the variance of the FE. The mass of a discrete soil sample collected in a 4-ounce container would be about 180 g. Using an example concentration for the analyte of interest (action level) of 100 ppm (mg/kg) gives aL = 1 × 10⁻⁴. Thus, FE(field), expressed as a relative standard deviation (the square root of the variance), is about 30%.
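The field-stage arithmetic can be reproduced in a short sketch that uses only the parameter values stated above (the 180 g mass is the approximate figure given for a 4-ounce container). It shows that the 30% figure is the square root of the computed variance.

```python
import math

# Parameter values from the worked example in the text
DENSITY_G_CM3 = 1.6          # deltaM, mean density of soil
BETA, F, G = 1.0, 0.5, 0.25  # liberation, shape, size range factors
D_CM = 0.2                   # top particle size of unsieved soil
A_L = 100 / 1_000_000        # 100 mg/kg as a decimal fraction
MASS_G = 180.0               # soil in a 4-ounce container (approximate)

c = DENSITY_G_CM3 / A_L      # constitution parameter, g/cm^3
variance_field = c * BETA * F * G * D_CM**3 / MASS_G
s_fe_field = math.sqrt(variance_field)
print(f"variance = {variance_field:.4f}, FE(field) = {s_fe_field:.0%}")
```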

Applying the same equation and values to the laboratory subsampling stage for metals, with a 1 g subsample, gives FE(lab) = 400%. The overall variance of the FE is the sum of the variances of the FE for each sampling stage: $s^2_{FE(\text{total})} = s^2_{FE(\text{field})} + s^2_{FE(\text{lab})}$. Thus, FE(total) = √(0.30² + 4.00²) ≈ 4.01, or 401%.
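A sketch of the two-stage calculation for metals, assuming the example parameter values above as defaults; the helper names `stage_rsd` and `combined_rsd` are hypothetical. Because stage variances add, the relative standard deviations combine as a root sum of squares.

```python
import math


def stage_rsd(c, d_cm, mass_g, beta=1.0, f=0.5, g=0.25):
    """s_FE for one sampling stage; defaults are the example values."""
    return math.sqrt(c * beta * f * g * d_cm**3 / mass_g)


def combined_rsd(*stage_rsds):
    """Stage variances add, so relative SDs combine as a root sum of squares."""
    return math.sqrt(sum(s**2 for s in stage_rsds))


c = 1.6 / 1e-4                       # deltaM / aL for 100 mg/kg
field = stage_rsd(c, 0.2, 180.0)     # discrete sample in a 4-ounce container
lab_metals = stage_rsd(c, 0.2, 1.0)  # unground 1 g metals aliquot
total = combined_rsd(field, lab_metals)
print(f"FE(field) {field:.0%}, FE(lab) {lab_metals:.0%}, FE(total) {total:.0%}")
```

With a root sum of squares the larger term dominates: the 400% laboratory error swamps the 30% field error.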

Applying the same calculations for organic analysis, where the mass of the laboratory subsample is 30 g, gives FE(lab) = 73% and FE(total) = 79%.
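The organic case changes only the laboratory subsample mass. The sketch below keeps the example parameter values, folding c·β·f·g into a single assumed constant `K`; `math.hypot` serves as a convenient root sum of squares for the two stages.

```python
import math

K = (1.6 / 1e-4) * 1.0 * 0.5 * 0.25  # c * beta * f * g from the example

field = math.sqrt(K * 0.2**3 / 180.0)        # discrete field sample
lab_organics = math.sqrt(K * 0.2**3 / 30.0)  # unground 30 g aliquot
total = math.hypot(field, lab_organics)      # root sum of squares
print(f"FE(lab) {lab_organics:.0%}, FE(total) {total:.0%}")
```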

In comparison, for ISM a 2 kg field sample is typically collected. Applying the same assumptions as above, one obtains a FE(field) of 9%.
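A sketch comparing the two field masses under the same assumed parameter values. Since s_FE scales with 1/√Ms, moving from 180 g to 2 kg reduces the field FE by a factor of about √(2000/180) ≈ 3.3.

```python
import math

K = (1.6 / 1e-4) * 1.0 * 0.5 * 0.25  # c * beta * f * g from the example
D_CM = 0.2                            # unsieved soil

discrete_field = math.sqrt(K * D_CM**3 / 180.0)
ism_field = math.sqrt(K * D_CM**3 / 2000.0)
print(f"discrete {discrete_field:.0%} vs ISM {ism_field:.0%}")
print(f"reduction factor {discrete_field / ism_field:.1f}")
```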

If the sample is ground in the laboratory to 60 mesh (d = 0.0251 cm) and a 1 g subsample is taken for metals, then FE(lab) = 18% and FE(total) = 20%. If the sample is instead ground to 100 mesh (d = 0.0152 cm), then FE(lab) = 8% and FE(total) = 12%.
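A sketch of the grinding scenarios for metals. As in the text, only d changes for the laboratory stage; every other parameter keeps the example values, and the field stage is the unground 2 kg ISM sample.

```python
import math

K = (1.6 / 1e-4) * 1.0 * 0.5 * 0.25         # c * beta * f * g from the example
ism_field = math.sqrt(K * 0.2**3 / 2000.0)  # 2 kg unground ISM sample

results = {}
for label, d_cm in (("60 mesh", 0.0251), ("100 mesh", 0.0152)):
    lab = math.sqrt(K * d_cm**3 / 1.0)      # 1 g metals subsample after grinding
    total = math.hypot(ism_field, lab)      # combine stages in quadrature
    results[label] = (lab, total)
    print(f"{label}: FE(lab) {lab:.0%}, FE(total) {total:.0%}")
```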

For organic analysis (30 g subsample), if the sample is not ground, then FE(lab) = 73% and FE(total) = 74%. If the sample is ground to 60 mesh, then FE(lab) = 3% and FE(total) = 10%.
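The same sketch applied to the 30 g organics subsample, again assuming the example parameter values and an unground 2 kg ISM field sample.

```python
import math

K = (1.6 / 1e-4) * 1.0 * 0.5 * 0.25         # c * beta * f * g from the example
ism_field = math.sqrt(K * 0.2**3 / 2000.0)  # 2 kg unground ISM sample

results = {}
for label, d_cm in (("not ground", 0.2), ("60 mesh", 0.0251)):
    lab = math.sqrt(K * d_cm**3 / 30.0)     # 30 g organics subsample
    total = math.hypot(ism_field, lab)      # combine stages in quadrature
    results[label] = (lab, total)
    print(f"{label}: FE(lab) {lab:.0%}, FE(total) {total:.0%}")
```

Without grinding, the laboratory stage still dominates the total; grinding to 60 mesh leaves the field stage as the larger contributor.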

One can conclude from this analysis that the conventional practice of taking discrete samples results in a large FE, as is evident from the differences often observed between the results of field duplicates and laboratory duplicates. By contrast, ISM techniques address the factors that lead to a large FE (small sample mass and large particle size) and ultimately result in less data variability.