# 4.1.2 Spatial Scale, Mixtures, and Autocorrelation

It is important to recognize that the extent of heterogeneity can vary depending on how DUs are defined. In fact, one way to manage the difficulty of estimating the mean when greater heterogeneity is present is to designate DUs based on anticipated concentrations, defining DUs in such a way as to minimize the concentration variability within each. Other approaches for creating DUs, such as designating DUs according to anticipated exposure patterns (i.e., to correspond with exposure units), could result in greater heterogeneity within the DUs but may be appropriate for risk assessment.
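The effect of DU designation on within-DU variability can be illustrated with a minimal sketch. The concentrations and the two area names below are hypothetical values chosen for illustration, not site data; the sketch simply compares the standard deviation of one DU spanning both areas with the standard deviations obtained when each area is its own DU.

```python
import statistics

# Hypothetical discrete results (mg/kg); illustrative values only, not site data.
suspected_source = [120.0, 150.0, 135.0, 160.0, 140.0]
surrounding_area = [12.0, 9.0, 15.0, 11.0, 13.0]

# Option 1: a single DU that spans both areas.
single_du = suspected_source + surrounding_area
sd_single = statistics.stdev(single_du)

# Option 2: two DUs designated according to anticipated concentrations.
sd_source = statistics.stdev(suspected_source)
sd_surrounding = statistics.stdev(surrounding_area)

print(f"Single DU, both areas:  SD = {sd_single:.1f} mg/kg")
print(f"DU 1, suspected source: SD = {sd_source:.1f} mg/kg")
print(f"DU 2, surrounding area: SD = {sd_surrounding:.1f} mg/kg")
```

In this sketch, each of the two concentration-based DUs is less variable than the single combined DU. A DU drawn to match an exposure unit could resemble the combined case and would then need a design that tolerates the larger variability.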

Sampling designs, including designations of sampling units and decision units, may need to accommodate multiple contaminants with different spatial patterns. Heterogeneity may differ among the contaminants being characterized within the same DU. Different sources or release mechanisms, as well as different transport mechanisms, can lead to differing degrees of heterogeneity among chemicals that need to be addressed through a single sampling plan. This fact can complicate decisions regarding the appropriate sampling approach. In general, the sampling strategy must be designed to accommodate the contaminant expected to have the greatest heterogeneity so that good estimates of the mean can be obtained for all contaminants of interest.
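One way to identify which contaminant should govern the design is to compare a unitless measure of variability across contaminants. The sketch below assumes that preliminary results are available for each contaminant and uses the coefficient of variation (CV) as the measure of heterogeneity. The contaminant names, concentrations, and the choice of CV are illustrative assumptions, not a prescribed procedure.

```python
import statistics

# Hypothetical preliminary results (mg/kg) for three contaminants in one DU.
# All values are illustrative assumptions, not site data.
pilot_results = {
    "lead": [210.0, 45.0, 980.0, 60.0, 130.0],
    "arsenic": [8.0, 11.0, 9.0, 12.0, 10.0],
    "benzo(a)pyrene": [0.4, 0.9, 0.2, 1.5, 0.6],
}

def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

cv_by_contaminant = {name: cv(values) for name, values in pilot_results.items()}

# The contaminant with the largest CV is treated as governing the design.
governing = max(cv_by_contaminant, key=cv_by_contaminant.get)

for name, value in cv_by_contaminant.items():
    print(f"{name}: CV = {value:.2f}")
print(f"Design for: {governing}")
```

CV is used here rather than the standard deviation because the contaminants are present at very different concentration scales, and a scale-free measure allows them to be compared directly.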

Yet another potential complicating factor is spatial relationships. For most sites, contaminants in soil exhibit some degree of positive spatial autocorrelation, meaning that the variance in concentration decreases as the distance between sample locations decreases. It is well established that strong autocorrelation can reduce the effective statistical sample size of a data set, and thus increase the number of samples needed to achieve acceptable decision errors, because each sample provides some redundant information (Cressie 1993). In statistical terms, this redundancy violates the assumption that observations are independent. ISM confidence intervals generated from sampling of a site with high spatial autocorrelation can be too narrow, resulting in a higher frequency of decision errors.

Spatial autocorrelation may also introduce bias in estimates of the mean and variance (and corresponding calculations of confidence intervals), depending on the sampling protocol. For both discrete and ISM approaches, random sampling strategies yield unbiased parameter estimates even when a contaminant exhibits high spatial autocorrelation, whereas sampling that is targeted toward areas of suspected high or low concentrations can introduce redundancies that result in inaccurate calculations of confidence intervals and inaccurate estimation of decision errors. For targeted (nonrandom) sampling, the direction of the bias is generally toward overestimation of the mean, since suspected source areas may be intentionally oversampled relative to the rest of the site. Nonrandom sampling of sites where contaminants exhibit positive spatial autocorrelation is an issue that applies to discrete as well as ISM sampling. With discrete sampling, spatial weighting methods are sometimes used to reduce the sampling bias. For ISM, spatial weighting methods do not apply, since no information is retained from the individual increments collected throughout the DU. Nevertheless, since most ISM sampling protocols incorporate some variation of random sampling and a relatively large number of increments (i.e., n ≥ 30), spatial autocorrelation is unlikely to impact the statistical performance metrics of ISM (Section 4.3). See Appendix A.3 for an example and additional discussion of this factor.
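The contrast between random and targeted increment placement can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the example in Appendix A.3. It assumes a hypothetical one-dimensional transect with a uniform background and a single smooth hot spot, so that neighboring cells have similar concentrations. It treats an ISM result as the arithmetic mean of its increments, which idealizes equal-mass increments and error-free processing. The targeted design is an extreme case in which every increment falls inside a window around the suspected source. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics

def make_transect(n_cells=1000, background=10.0, peak=200.0, center=300, width=50.0):
    """Hypothetical 1-D transect: uniform background plus one smooth hot spot.

    Neighboring cells have similar concentrations (positive spatial
    autocorrelation). All parameter values are illustrative assumptions.
    """
    return [
        background + peak * math.exp(-(((x - center) / width) ** 2))
        for x in range(n_cells)
    ]

def ism_result_random(field, n_increments, rng):
    """Idealized ISM result: mean of increments at simple random locations."""
    cells = rng.sample(range(len(field)), n_increments)
    return statistics.fmean(field[i] for i in cells)

def ism_result_targeted(field, n_increments, rng, half_window=50):
    """Idealized ISM result with all increments confined to the suspected source."""
    hot = max(range(len(field)), key=field.__getitem__)
    lo = max(0, hot - half_window)
    hi = min(len(field), hot + half_window + 1)
    cells = rng.sample(range(lo, hi), n_increments)
    return statistics.fmean(field[i] for i in cells)

rng = random.Random(42)  # fixed seed so the run is repeatable
field = make_transect()
true_mean = statistics.fmean(field)

n_increments, n_replicates = 30, 2000
random_estimates = [
    ism_result_random(field, n_increments, rng) for _ in range(n_replicates)
]
targeted_estimates = [
    ism_result_targeted(field, n_increments, rng) for _ in range(n_replicates)
]

avg_random = statistics.fmean(random_estimates)
avg_targeted = statistics.fmean(targeted_estimates)

print(f"True DU mean:                  {true_mean:.1f}")
print(f"Average of random estimates:   {avg_random:.1f}")
print(f"Average of targeted estimates: {avg_targeted:.1f}")
```

Under these assumptions, the random design averages close to the true DU mean across replicates, while the targeted design overestimates it in every replicate, because every cell in the sampled window has a concentration above the DU mean. Individual random replicates still scatter around the true mean; the simulation addresses bias only, not the width of confidence intervals.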