This is not the current EPA website. To navigate to the current EPA website, please go to www.epa.gov. This website is historical material reflecting the EPA website as it existed on January 19, 2021. This website is no longer updated and links to external websites and some internal pages may not work.

CADDIS Volume 4

Using R to Compute Weighted Average Inferences

How to Compute Weighted Average Inferences

To compute a weighted average inference, we first compute central tendencies for all taxa using the regional EMAP-West data (site.species), and then use those central tendencies to assess test sites in a data set collected from western Oregon (site.species.or). Before beginning, make sure that you have downloaded both the EMAP-West and Oregon data (see Download Scripts and Sample Data in the Helpful Links box) and merged the environmental and biological data.
 

Next, identify and save the names of taxa that are found in both data sets.

# Compare taxa names in the tolerance value and assessment data.
# Convert all taxa names to capital letters so they compare consistently.
# (Column subsetting in R is case-sensitive, so the data frames used
# later must carry the same capitalization.)
names.tv <- toupper(names(site.species)[-1])
names.assess <- toupper(names(site.species.or)[-1])

# Find taxa names that occur in both data sets
names.match <- intersect(names.tv, names.assess)

print("Taxa in both databases")
print(sort(names.match))
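
The matching step can be tried on two small made-up tables before running it on the full data. The sketch below uses base R's intersect(), which returns the elements common to two vectors. The data frames toy.tv and toy.assess and their contents are hypothetical and stand in for site.species and site.species.or.

```r
# Hypothetical toy data: first column is a site ID, the rest are taxa.
toy.tv <- data.frame(id = 1:3, Baetis = c(2, 0, 5), Drunella = c(1, 1, 0),
                     Epeorus = c(0, 3, 4))
toy.assess <- data.frame(id = 1:2, BAETIS = c(4, 1), epeorus = c(0, 2),
                         Zapada = c(3, 0))

# Standardize capitalization, then keep names present in both data sets.
toy.names.tv <- toupper(names(toy.tv)[-1])
toy.names.assess <- toupper(names(toy.assess)[-1])
toy.match <- sort(intersect(toy.names.tv, toy.names.assess))

print(toy.match)
```

Only BAETIS and EPEORUS appear in both toy tables, so those are the two names returned. Without the toupper() step, none of the names would match, because the capitalization differs between the two tables.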
    


To apply assessment tools, we need to compute central tendencies for as many taxa as possible. To do this, expand the list of taxa to include all taxa that occur in at least 20 sites in the EMAP-West data set. (The 20-site minimum is imposed to avoid overfitting a model to a rare taxon.)


# Get names of all taxa in the data set
taxa.names.init <- names(site.species)[-1]

# Compute the number of sites at which each taxon occurs
getocc <- function(x) sum(x > 0)
numocc <- apply(site.species[, taxa.names.init], 2, getocc)

# Save the names of all taxa that occur in at least 20 sites
taxa.names <- taxa.names.init[numocc >= 20]
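
The occurrence filter can be checked on a small made-up table. This is a minimal sketch, assuming a hypothetical data frame toy.species laid out like site.species (site ID in the first column, taxon abundances in the rest). It uses colSums() on a logical matrix as a compact alternative to apply(), and a threshold of 2 sites purely for illustration.

```r
# Hypothetical toy abundance table: site ID plus three taxa.
toy.species <- data.frame(id = 1:4,
                          BAETIS = c(2, 0, 5, 1),
                          DRUNELLA = c(0, 0, 3, 0),
                          EPEORUS = c(1, 4, 0, 2))
toy.taxa <- names(toy.species)[-1]

# Count the sites where each taxon is present (abundance > 0).
toy.numocc <- colSums(toy.species[, toy.taxa] > 0)

# Keep taxa found at a minimum number of sites
# (2 here for illustration; the tutorial uses 20).
min.sites <- 2
toy.keep <- toy.taxa[toy.numocc >= min.sites]
print(toy.keep)
```

DRUNELLA occurs at only one toy site and is dropped, while BAETIS and EPEORUS each occur at three sites and are kept.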
    
Now, recompute central tendencies for the expanded list of taxa (taxa.names) by running the central tendencies script again (see Central Tendencies in the Helpful Links box). Make sure you run the script for all taxon names identified above. Depending on the number of taxa selected, this may take some time.
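
For orientation, a weighted average for a taxon is commonly computed as the abundance-weighted mean of an environmental variable across the sites where the taxon was sampled. The sketch below is not the Central Tendencies script; it is a minimal illustration of that formula, and the names toy.abn, toy.temp, and toy.WA and all values are hypothetical.

```r
# Hypothetical toy data: abundances of two taxa at four sites,
# and the temperature measured at each site.
toy.abn <- cbind(BAETIS = c(2, 0, 5, 1),
                 EPEORUS = c(1, 4, 0, 3))
toy.temp <- c(10, 12, 18, 14)

# Abundance-weighted mean temperature for each taxon:
# sum(abundance * temperature) / sum(abundance).
# toy.temp is recycled down each column, so each row (site)
# is multiplied by its own temperature.
toy.WA <- colSums(toy.abn * toy.temp) / colSums(toy.abn)
print(toy.WA)
```

The result is a numeric vector named by taxon, which is the form needed when tolerance values are later looked up by taxon name.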
 

Continuous tolerance values (e.g., weighted averages or optima) can be classified into tolerance categories, but they are best used with a mean tolerance value metric, which is the abundance-weighted mean of the tolerance values of the taxa observed at a site. The following script assumes that weighted averages have been computed for all taxa listed in names.match and stored in WA, a vector named by taxon. Other tolerance values can be substituted into the third line of code as desired.

# Only select taxa for which tolerance values
# have been computed.
mat1 <- as.matrix(dfmerge.or[, names.match])

# First get total abundance at each site
tot.abn <- rowSums(mat1)

# Use matrix multiplication to compute the sum of all
# observed tolerance values, and then divide by total
# abundance to get the mean tolerance value.
# WA must be a vector named by taxon so that it can be
# indexed with names.match.
mean.tv <- as.vector(mat1 %*% WA[names.match]) / tot.abn

plot(dfmerge.or$temp, mean.tv, xlab = "Temperature",
     ylab = "Mean tolerance value")
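
The mean tolerance value calculation can be verified by hand on a small made-up example. This is a minimal sketch, assuming a hypothetical abundance matrix toy.mat (sites in rows, taxa in columns) and a hypothetical named vector of tolerance values toy.WA; none of these values come from the EMAP-West or Oregon data.

```r
# Hypothetical toy inputs: abundances at two test sites and
# previously computed tolerance values for the same taxa.
toy.mat <- rbind(c(3, 1), c(0, 4))
colnames(toy.mat) <- c("BAETIS", "EPEORUS")
toy.WA <- c(BAETIS = 15.5, EPEORUS = 12.5)

# Mean tolerance value at each site:
# sum(abundance * tolerance value) / total abundance
toy.tot <- rowSums(toy.mat)
toy.mean.tv <- as.vector(toy.mat %*% toy.WA[colnames(toy.mat)]) / toy.tot
print(toy.mean.tv)
```

For the first toy site this gives (3 * 15.5 + 1 * 12.5) / 4 = 14.75, and for the second 4 * 12.5 / 4 = 12.5. Two edge cases are worth checking in real data: a site whose total abundance is zero yields NaN, and any taxon without a tolerance value introduces NA into the result for every site.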
