# 4.4.1 Combining DUs

On occasion, it may be desirable to combine information from multiple DUs into a single, larger area. This typically arises in two situations:

• A site has areas with different conceptual models in terms of expected contamination. For example, an area that we would like to define as an exposure unit might contain a stream channel, a meadow, and a rocky outcropping. Each of those areas might be investigated as a separate DU for site characterization but then combined to define a single exposure unit.
• For ecological and human health risk assessment, we might need to consider a variety of sizes of DUs to accommodate multiple receptor scenarios. For example, if the area of a pocket mouse habitat is a quarter that of a muskrat, which is an eighth of that of an eagle, then we might need to sample in DUs of a size defined for pocket mice, but then combine DUs for the receptors with larger home ranges.
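The nesting of receptor scales in the second bullet can be made concrete with a little arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it assumes equal-sized, non-overlapping DUs that tile each home range exactly, which real sites rarely allow.

```python
from fractions import Fraction

# Relative home-range areas taken from the example in the text:
# pocket mouse = 1/4 of muskrat, muskrat = 1/8 of eagle.
mouse_per_muskrat = Fraction(1, 4)
muskrat_per_eagle = Fraction(1, 8)

# Number of pocket-mouse-sized DUs needed to cover each larger home range
# (assumes equal-sized, non-overlapping DUs, which is a simplification).
dus_per_muskrat_range = 1 / mouse_per_muskrat
dus_per_eagle_range = 1 / (mouse_per_muskrat * muskrat_per_eagle)

print(dus_per_muskrat_range, dus_per_eagle_range)
```

Under these assumptions, results from 4 pocket-mouse DUs would be combined for the muskrat and results from 32 for the eagle.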

When these considerations are incorporated in the initial planning stages, they can be addressed by using a stratified sampling design. Within each stratum, it may be appropriate to use ISM, but then one encounters the challenge of combining the ISM data from the strata into the larger DU. The same issue arises when ISM data are collected from multiple DUs and later combined to estimate the mean in a single, larger DU. Whether preplanned or not, the same treatment of the data is appropriate.

When there are multiple samples in each stratum, the overall mean of the larger DU can be estimated using the following formulae. Let $n_i$ represent the number of samples from region $i$, $\bar{x}_i$ the mean of the ISM samples from region $i$, $s_i$ the SD of the replicate ISM samples from region $i$, and $w_i$ the weight, i.e., the relative size associated with region $i$. The relative size is the fraction of the larger DU that is made up of region $i$, so the $w_i$ sum to 1. Note that if all strata are of the same size, the $w_i$ are equal, and these equations simplify to the more common calculation methods for the mean and standard deviation. The weighted mean is thus:

$$\bar{x}_w = \sum_i w_i \bar{x}_i$$

The standard error associated with the weighted mean is:

$$SE(\bar{x}_w) = \sqrt{\sum_i \frac{w_i^2 s_i^2}{n_i}}$$

which has degrees of freedom approximated by the Welch-Satterthwaite approximation (Cochran, 1977):

$$df = \frac{\left(\sum_i \frac{w_i^2 s_i^2}{n_i}\right)^2}{\sum_i \frac{\left(w_i^2 s_i^2 / n_i\right)^2}{n_i - 1}}$$
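As a minimal sketch, the three quantities can be computed as follows. The sketch assumes the standard stratified-sampling forms: weighted mean equal to the sum of $w_i \bar{x}_i$, standard error equal to the square root of the sum of $w_i^2 s_i^2 / n_i$, and the usual Welch-Satterthwaite degrees of freedom. The function name `combine_strata` and the test numbers are hypothetical and do not come from the guidance.

```python
import math

def combine_strata(means, sds, ns, weights):
    """Hypothetical helper: weighted mean, SE, and Welch-Satterthwaite df."""
    if not math.isclose(sum(weights), 1.0, rel_tol=1e-9):
        raise ValueError("weights must sum to 1")
    weighted_mean = sum(w * m for w, m in zip(weights, means))
    # Variance contribution of each stratum: w_i^2 * s_i^2 / n_i
    terms = [w**2 * s**2 / n for w, s, n in zip(weights, sds, ns)]
    se = math.sqrt(sum(terms))
    # Welch-Satterthwaite approximation; undefined if every SD is zero.
    df = sum(terms) ** 2 / sum(t**2 / (n - 1) for t, n in zip(terms, ns))
    return weighted_mean, se, df

# Sanity check with a single stratum (illustrative numbers, not from the text):
# the result should reduce to the familiar s / sqrt(n) with n - 1 df.
mean1, se1, df1 = combine_strata([10.0], [3.0], [4], [1.0])
print(mean1, se1, df1)
```

With one stratum and a weight of 1, the function returns the ordinary mean, the ordinary standard error, and $n - 1$ degrees of freedom, which is a quick check that the weighted forms are consistent with the unweighted ones.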

Table 4-5 provides a numerical example of this calculation, in which data from two DUs are combined to derive a 95% UCL for a larger DU. In this example, an elementary school is divided into two DUs representing different play areas: DU1 is the kindergarten playground, and DU2 is the playground for older children. A maintenance worker has contact with both DUs, and a separate DU is constructed to reflect exposure of this worker.

Assume the concentrations of replicate results in DU1 and DU2 are as follows, based on n = 30 increments per replicate:

Table 4-5. Summary statistics used to combine DUs

| Playground area | Area (acres) | Replicates | Mean | SD^a | Student’s-t 95% UCL | Chebyshev 95% UCL |
|---|---|---|---|---|---|---|
| DU1 (kindergarten) | 0.25 | 25, 100, 140 | 88.3 | 58.4 | 187 | 235 |
| DU2 (older child) | 0.50 | 5, 25, 305 | 111.7 | 167.7 | 394 | 534 |
| Equal weight | 0.75 | 25, 100, 140, 5, 25, 305 | 100 | 113 | 193 | 301 |

a SD = standard deviation.
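The entries in Table 4-5 can be reproduced from the replicate values. The sketch below assumes the one-sided Student's-t UCL (mean plus t times SD/√r) and a Chebyshev UCL with multiplier √(1/α − 1); the critical t values are hard-coded from standard tables, and the helper name `summarize` is hypothetical.

```python
import math
import statistics

# Replicate results from Table 4-5.
du1 = [25, 100, 140]
du2 = [5, 25, 305]
pooled = du1 + du2  # "equal weight" row: all six replicates treated alike

# One-sided 95% Student's t critical values from standard tables,
# keyed by degrees of freedom (r - 1).
T_95 = {2: 2.920, 5: 2.015}
# Chebyshev multiplier sqrt(1/alpha - 1) for alpha = 0.05 (assumed form).
K_CHEB = math.sqrt(1 / 0.05 - 1)

def summarize(replicates):
    """Return mean, sample SD, Student's-t UCL, and Chebyshev UCL."""
    r = len(replicates)
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample SD (r - 1 denominator)
    se = sd / math.sqrt(r)
    return mean, sd, mean + T_95[r - 1] * se, mean + K_CHEB * se

for name, reps in [("DU1", du1), ("DU2", du2), ("Equal weight", pooled)]:
    mean, sd, ucl_t, ucl_cheb = summarize(reps)
    print(f"{name}: mean={mean:.1f} SD={sd:.1f} "
          f"t-UCL={ucl_t:.0f} Chebyshev-UCL={ucl_cheb:.0f}")
```

Note that the "Equal weight" row pools all six replicates into one sample, so its SD reflects both within-DU and between-DU variability.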

The 95% UCLs for each DU are given for both the Student’s-t and Chebyshev methods. Section 4.3.4 provides a discussion of different performance metrics for the UCL that can be used to determine which UCL method may be more likely to achieve the study objectives. Because the true mean for each DU is unknown, the RPD between the UCL and the true mean cannot be calculated. Figure 4-5 provides examples of the 5th, 50th, and 95th percentile RPDs of UCLs calculated with r = 3 replicates for lognormal distributions with CVs of 1 and 4 when the UCL exceeds the true mean.

Recall that the CV in this context refers to the dispersion of the underlying distribution (e.g., distributions given by individual increments), not the distribution of means given by the ISM results. The mean of the ISM replicates can be assumed to approximate the mean of the underlying distribution, and the SD of the replicates can be assumed to approximate the standard error of the mean of the underlying distribution: $SE = SD/\sqrt{n}$. We can rearrange to solve for SD: $SD = SE \times \sqrt{n}$. So for n = 30, we can estimate the SD of the underlying distribution by multiplying the SD of the ISM results by $\sqrt{30} \approx 5.5$. Therefore, the following are estimates of the SD and corresponding CV of the underlying distributions for each DU and the combination of DUs:

• CV of DU1 = SD/mean = (58.4 × 5.5)/88.3 = 3.6
• CV of DU2 = SD/mean = (167.7 × 5.5)/111.7 = 8.3
• CV of DU1 + DU2 (equally weighted) = SD/mean = (113 × 5.5)/100 = 6.2
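The three CV estimates above follow from one line of arithmetic each. The sketch below is only an illustration; it rounds √30 to 5.5 as the text does so that the results match, and the dictionary names are hypothetical.

```python
import math

n_increments = 30
# The text rounds sqrt(30) = 5.477... to 5.5; do the same so results match.
scale = round(math.sqrt(n_increments), 1)

# (mean, SD of ISM replicates) as reported in Table 4-5.
summary = {
    "DU1": (88.3, 58.4),
    "DU2": (111.7, 167.7),
    "DU1 + DU2 (equal weight)": (100.0, 113.0),
}

# Estimated CV of the underlying distribution: (SD of replicates * sqrt(n)) / mean
cv = {name: round(sd * scale / mean, 1) for name, (mean, sd) in summary.items()}
print(cv)
```

Using the unrounded value of √30 changes the DU2 estimate slightly (to about 8.2), which does not affect the conclusion that the CVs are high.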

For r = 3 replicates and CV = 4, Figure 4-5 suggests that the median RPD for both UCL methods is 90% and the 95th percentile is about 200% for Chebyshev and 150% for Student’s-t. The magnitude of the RPDs is expected to be even more pronounced for CV = 8.

As summarized in Table 4-4, the coverage of the UCLs also depends on the CV of the underlying distribution. Both DUs appear to have high CVs (i.e., >3), and the Student’s-t UCL is not expected to yield a coverage close to 95%, even if the number of replicates were increased. Therefore, the Chebyshev UCL is expected to yield more reliable results (based on coverage).

If it is assumed that, on average, a maintenance worker spends equal time in DU1 and DU2, then the replicates from each DU can be weighted equally, yielding the results shown in the third row of Table 4-5. Alternatively, it may be assumed that a maintenance worker’s exposure is proportional to the respective areas of each DU, and the equations presented earlier in this section can be used to generate summary statistics for the combined area (0.75 acres). The weighting factors applied to each DU should sum to 1.0, which is achieved by dividing each area by the sum of the two areas:

• w1 = 0.25/0.75 = 0.33
• w2 = 0.50/0.75 = 0.67

Weighted mean =

Standard error for mean (SE) =

Degrees of freedom (df) =

Student’s-t 95% UCL =

Chebyshev 95% UCL =
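The five quantities listed above can be sketched numerically from the Table 4-5 summary statistics. This is an illustration under stated assumptions rather than the guidance's own worked solution: it assumes the standard stratified forms (weighted mean as the sum of w × mean, SE as the square root of the sum of w² s²/n, Welch-Satterthwaite df), a Chebyshev multiplier of √(1/α − 1), and, for the Student's-t UCL, that df is rounded down to 2 so that the tabulated value t = 2.920 applies. All variable names are hypothetical.

```python
import math

# Summary statistics from Table 4-5 (three replicates per DU).
areas = {"DU1": 0.25, "DU2": 0.50}  # acres
means = {"DU1": 88.3, "DU2": 111.7}
sds = {"DU1": 58.4, "DU2": 167.7}
n = {"DU1": 3, "DU2": 3}

# Area-proportional weights; they sum to 1.0 by construction.
total_area = sum(areas.values())
w = {du: areas[du] / total_area for du in areas}

weighted_mean = sum(w[du] * means[du] for du in areas)

# Per-DU variance contributions w^2 * s^2 / n.
terms = {du: w[du] ** 2 * sds[du] ** 2 / n[du] for du in areas}
se = math.sqrt(sum(terms.values()))

# Welch-Satterthwaite degrees of freedom.
df = sum(terms.values()) ** 2 / sum(
    terms[du] ** 2 / (n[du] - 1) for du in areas
)

# Assumption: df is rounded down to 2, so t(0.95, 2) = 2.920 from tables.
t_crit = 2.920
ucl_t = weighted_mean + t_crit * se
ucl_cheb = weighted_mean + math.sqrt(1 / 0.05 - 1) * se

print(f"weighted mean={weighted_mean:.1f} SE={se:.1f} df={df:.2f}")
print(f"Student's-t UCL={ucl_t:.0f} Chebyshev UCL={ucl_cheb:.0f}")
```

Under these assumptions the weighted mean is about 103.9, the SE about 65.5, and df about 2.1. The Student's-t result depends on how the fractional df is handled, so it is best read as approximate.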