7.2.4 Decision Mechanism 4: Comparison to Background

Background data from an appropriate reference area are used to evaluate site data for many environmental projects. With discrete sampling, comparisons between site and background data are generally done in one of two ways: point-by-point comparison of site data to an upper bound of background conditions (e.g., UTL) or distributional comparison using hypothesis tests to determine whether the differences in the central tendency (i.e., mean or median) or upper tails are statistically significant. USEPA guidance on hypothesis testing (e.g., USEPA 2002c, 2007, 2009) was developed with discrete sampling in mind and includes the following elements:

  • Set the null hypothesis to state that the central tendency (e.g., mean) of the site distribution is greater than that of the background distribution. This places the “burden of proof” on the data to show that site concentrations are not greater than background, and it is considered the more conservative (health-protective) approach.
  • Nonparametric procedures such as the Wilcoxon rank sum (WRS) test relax the assumption of normality but not the assumption of equal variance. Consequently, a test outcome may be driven more by differences in variance (spread) than by differences in central tendency. For this reason, statistical tests should be accompanied by exploratory graphical analysis (e.g., histograms) to support the overall conclusion regarding background/site comparisons.
  • Welch’s test (also called Satterthwaite’s t or the unequal-variance t-test) is a modified Student’s t-test that attempts to correct for unequal variances, although it still assumes normality. Simulations suggest that its results are robust to moderate deviations from normality (i.e., moderate asymmetry).
  • Both central tendency and upper tail tests should be evaluated to determine whether background and site concentrations are significantly different; a difference in either may indicate a significant difference from background. Upper tail tests are emphasized because they show whether subareas of the DU are elevated compared to background.
  • Decision errors and determinations of statistical significance are closely tied to sample size and distribution shape, as well as to the specified significance level (e.g., α = 0.01, 0.05, or 0.10). When sample sizes are small for either data set, a formal statistical test may not be appropriate. For example, using WRS with n = 4 in both data sets and α = 0.01, one can never identify a significant difference between the two populations, because the smallest one-sided p-value the exact test can produce is 1/70 ≈ 0.014. This holds no matter what the sample concentrations are, even if all four site measurements are larger than all four background measurements. At α = 0.01, WRS therefore requires at least n = 5 in a group; otherwise, a higher (less protective) significance level (e.g., α = 0.05 or 0.10) must be used.
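The sample-size limitation in the last bullet can be checked directly. The following is a minimal sketch, assuming the exact (permutation) form of the WRS test with no tied values: under the null hypothesis every assignment of ranks to the site group is equally likely, so the smallest one-sided p-value any data set can produce is 1/C(N, n). The function name `min_one_sided_p` is illustrative only.

```python
from math import comb


def min_one_sided_p(n_site: int, n_background: int) -> float:
    """Smallest one-sided p-value attainable by the exact WRS test (no ties).

    Under the null hypothesis, each of the C(N, n_site) ways of assigning
    ranks to the site group is equally likely, so the most extreme rank
    arrangement has probability 1 / C(N, n_site).
    """
    return 1 / comb(n_site + n_background, n_site)


alpha = 0.01
for n_site, n_background in [(4, 4), (5, 5), (3, 5)]:
    p_min = min_one_sided_p(n_site, n_background)
    verdict = "possible" if p_min <= alpha else "impossible"
    print(f"n_site={n_site}, n_background={n_background}: "
          f"min p = {p_min:.4f}; rejection at alpha={alpha} is {verdict}")
```

For n = 4 per group the floor is 1/70 ≈ 0.0143, which exceeds α = 0.01; for n = 5 per group it drops to 1/252 ≈ 0.0040. The (3, 5) case matches the replicate counts in the example below and gives 1/56 ≈ 0.0179.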

As discussed in Section, ISM results are not suitable for point-by-point comparison with UTLs generated from discrete sample background data because ISM and discrete data sets have fundamentally different characteristics. If background and site data are both generated using ISM, central tendencies (e.g., medians) can be compared using hypothesis tests, but the statistical power to detect differences will be low because most ISM data sets contain only a few replicates. Similarly, at least n = 8 observations per group are desirable before using hypothesis tests (e.g., the quantile test) to compare upper tails. However, hypothesis tests are not the only tool for determining whether site and background distributions differ in important ways; simple graphical analysis can provide useful information and serve as a semiquantitative means of comparison.

Decision Mechanism 4 example

Continuing with the example presented in Decision Mechanisms 1–3, five ISM replicate samples are collected from a reference area unaffected by site contamination for comparison with the site data. The reported benzo(a)pyrene concentrations in the reference area ISM samples are 0.05, 0.10, 0.12, 0.20, and 0.40 mg/kg, with a sample mean and SD of 0.17 and 0.14 mg/kg, respectively. By comparison, the site ISM replicate results are 0.12, 0.16, and 0.26 mg/kg, with a mean and SD of 0.18 and 0.07 mg/kg, respectively. Thus, the sample means are nearly identical, but the SD in the reference area is about twice that at the site.
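The summary statistics above can be reproduced with Python's standard `statistics` module. This sketch assumes the SDs are sample standard deviations; `statistics.stdev` uses the n - 1 denominator, which matches the reported values.

```python
import statistics

reference = [0.05, 0.10, 0.12, 0.20, 0.40]  # mg/kg, reference-area ISM replicates
site = [0.12, 0.16, 0.26]                   # mg/kg, site ISM replicates

for label, data in (("reference", reference), ("site", site)):
    # statistics.stdev is the sample SD (n - 1 denominator)
    print(f"{label}: n = {len(data)}, mean = {statistics.mean(data):.2f}, "
          f"SD = {statistics.stdev(data):.2f} mg/kg")

ratio = statistics.stdev(reference) / statistics.stdev(site)
print(f"SD ratio (reference / site) = {ratio:.2f}")
```

The unrounded SD ratio is about 1.9, which the text rounds to a factor of 2.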

Figure 7-1 provides a graphical comparison of the two ISM data sets using side-by-side dot plots. For context, the benzo(a)pyrene action level of 0.21 mg/kg is also shown on Figure 7-1. Presented this way, the data clearly show that concentrations in the reference area are more variable, and that this difference may be partly explained by the difference in sample sizes (five reference replicates versus three site replicates). If more ISM replicates had been collected at the site, more extreme high and low concentrations might also have been observed.

Figure 7-1. Dot plot comparison of background (reference area) and site ISM results.
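Where plotting software is not at hand, a rough text version of a side-by-side dot plot can serve the same exploratory purpose. The sketch below uses a hypothetical helper, `text_dot_plot`, which is not from any library. It places each ISM result on a horizontal scale of about 0.01 mg/kg per character and marks the 0.21 mg/kg action level with `|`. It is a dependency-free illustration, not a reproduction of Figure 7-1.

```python
def text_dot_plot(groups, action_level, lo=0.0, hi=0.45, width=46):
    """Return one text row per group: 'o' marks a result, '|' the action level.

    Values are mapped linearly from [lo, hi] onto character columns 0..width-1.
    Coincident values within a row would overwrite each other (none do here).
    """
    def column(x):
        return round((x - lo) / (hi - lo) * (width - 1))

    rows = []
    for label, values in groups.items():
        row = [" "] * width
        row[column(action_level)] = "|"  # action level marker drawn first
        for v in values:
            row[column(v)] = "o"         # each ISM result
        rows.append(f"{label:>10} {''.join(row)}")
    return "\n".join(rows)


reference = [0.05, 0.10, 0.12, 0.20, 0.40]  # mg/kg
site = [0.12, 0.16, 0.26]                   # mg/kg
print(text_dot_plot({"reference": reference, "site": site}, action_level=0.21))
```

Even in this crude form, the wider spread of the reference row and the overlap of the two ranges around the action level are visible at a glance.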

Because the sample sizes are too small to evaluate goodness of fit (GOF) to a normal distribution, hypothesis testing may provide a secondary line of evidence, subject to the limitations discussed above. For this example, a nonparametric WRS test (α = 0.05) was applied. With a one-sided null hypothesis that the site median is less than or equal to the background median, one would not reject the null hypothesis (p = 0.33) and would therefore conclude that the site distribution is comparable to background. By contrast, with a one-sided null hypothesis that the site median is greater than or equal to the background median, which is consistent with USEPA guidance (e.g., USEPA 2002c, 2007, 2010a), one would again fail to reject the null hypothesis (p = 0.77) and would therefore, by default, conclude that the site distribution is elevated with respect to background. The conclusion thus depends on which form of the hypothesis test is selected. Because the latter form places the burden of proof on the data to demonstrate that the distributions are comparable, the small sample sizes typical of ISM data sets very often yield a conclusion that site > background, even when the ranges overlap, as in this example. Statistical significance should therefore be interpreted with caution.
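The two p-values quoted above are consistent with a normal approximation to the WRS (Mann-Whitney) statistic that uses mid-ranks for the tied value (0.12 mg/kg appears in both data sets), a tie-corrected variance, and a continuity correction. The sketch below assumes that approach; an exact permutation test would give somewhat different p-values, and the function names are illustrative. In practice a vetted implementation such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wrs_one_sided(site, background):
    """Normal-approximation WRS test with tie and continuity corrections.

    Returns (U, p_greater, p_less):
      p_greater tests H0: site <= background against H1: site > background
      p_less    tests H0: site >= background against H1: site < background
    """
    n, m = len(site), len(background)
    pooled = list(site) + list(background)
    ranks = midranks(pooled)
    u = sum(ranks[:n]) - n * (n + 1) / 2  # Mann-Whitney U for the site group
    mean_u = n * m / 2
    big_n = n + m
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_counts.values())
    sd_u = math.sqrt(n * m / 12 * ((big_n + 1) - ties / (big_n * (big_n - 1))))
    p_greater = 1.0 - normal_cdf((u - mean_u - 0.5) / sd_u)
    p_less = normal_cdf((u - mean_u + 0.5) / sd_u)
    return u, p_greater, p_less


site = [0.12, 0.16, 0.26]                   # mg/kg
reference = [0.05, 0.10, 0.12, 0.20, 0.40]  # mg/kg
u, p_greater, p_less = wrs_one_sided(site, reference)
print(f"U = {u}, p (H0: site <= background) = {p_greater:.2f}, "
      f"p (H0: site >= background) = {p_less:.2f}")
```

Both one-sided tests fail to reject at α = 0.05, which illustrates the point made above: with three and five replicates, the conclusion is set by which hypothesis is placed in the null rather than by the data.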