An official website of the United States government.

This is not the current EPA website. To navigate to the current EPA website, please go to www.epa.gov. This website is historical material reflecting the EPA website as it existed on January 19, 2021. This website is no longer updated and links to external websites and some internal pages may not work. More information »

CADDIS Volume 4

Using R to Compute the Area Under the ROC Curve

How to Compute the Area Under the ROC Curve

The area under the receiver operating characteristic (ROC) curve provides an indication of how accurately a model classifies sites into sites with the taxon present and sites with the taxon absent (see Assessing Model Fit page in the Helpful Links box).
 

The area under the ROC curve for each model can be computed by first imagining a pair of sites where the species of interest is present at one site but absent at the other. We would expect that the probabilities of occurrence predicted by the regression model would be greater at the site where the species is present than at the site where the species is absent. The area under the ROC is equivalent to the proportion of all such pairwise comparisons in which this condition is satisfied. The following script performs this computation.

# Define storage vector for ROC
roc <- rep(NA, times = length(taxa.names))

for (i in 1:length(taxa.names)) {
  # Compute mean predicted probability of occurrence
  predout <- predict(modlist.glm, type = "response")

  # Generate logical vector corresponding to presence/absence
  resp <- dfmerge[, taxa.names[i]] > 0

  # Divide predicted probabilities into sites where
  # species is present ("x") and sites where the species is
  # absent ("y").
  x <- predout[resp]
  y <- predout[! resp]

  # Now perform all pairwise comparisons of x vs. y
  # and store results in a matrix
  rocmat <- matrix(NA, nrow = length(x), ncol = length(y))
  for (j in 1:length(x)) {
    rocmat[j,] <- as.numeric(x[j] > y)
  }

  # Summarize all comparisons to compute area under ROC
  roc[i] <- sum(rocmat)/(length(x)*length(y))
}
names(roc)<- taxa.names
print(roc)

Top of Page