
This is not the current EPA website. This website is historical material reflecting the EPA website as it existed on January 19, 2021. It is no longer updated, and links to external websites and some internal pages may not work.

CADDIS Volume 4

Using Taxon-Environment Relationships: Compute Biological Inferences

Compute Biological Inferences

Biological inferences based on Operational Taxonomic Units (OTUs) are computed using a maximum likelihood approach. More information and other approaches can be found on the Computing Inferences tab.
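
To make the idea concrete, the following is a minimal sketch of maximum likelihood inference for one site and one environmental variable. It is not the mlsolve implementation. It assumes, purely for illustration, that each taxon's probability of occurrence follows a logistic regression that is quadratic in the environmental variable; the names coef.demo, loglik, and infer.site, the three taxa, and all coefficient values are hypothetical.

```r
# Hypothetical taxon-environment coefficients (one row per OTU).
# Assumed model: logit(p) = b0 + b1 * x + b2 * x^2,
# where x is an environmental variable such as stream temperature.
coef.demo <- data.frame(
  otu = c("TAXON.A", "TAXON.B", "TAXON.C"),
  b0  = c(-20.0, -12.5, -45.0),
  b1  = c(  4.0,   1.5,   4.5),
  b2  = c( -0.2, -0.05, -0.1125),
  stringsAsFactors = FALSE
)

# Log-likelihood of an observed presence/absence vector at value x.
# plogis(eta, log.p = TRUE) is log(p); plogis(-eta, log.p = TRUE) is log(1 - p).
loglik <- function(x, present, coefs) {
  eta <- coefs$b0 + coefs$b1 * x + coefs$b2 * x^2
  sum(ifelse(present,
             plogis(eta, log.p = TRUE),
             plogis(-eta, log.p = TRUE)))
}

# Infer x for one site by maximizing the log-likelihood over an assumed range.
infer.site <- function(present, coefs, range = c(0, 30)) {
  fit <- optimize(loglik, interval = range, present = present,
                  coefs = coefs, maximum = TRUE)
  fit$maximum
}

# A site where only TAXON.A (most likely near x = 10) was observed.
present.cold <- c(TRUE, FALSE, FALSE)
# A site where only TAXON.C (most likely near x = 20) was observed.
present.warm <- c(FALSE, FALSE, TRUE)

inferred.cold <- infer.site(present.cold, coef.demo)
inferred.warm <- infer.site(present.warm, coef.demo)
print(c(cold = inferred.cold, warm = inferred.warm))
```

In this toy setup the inferred value lands near the optimum of the taxon that was observed and is nudged away from the optima of the taxa that were absent. A one-dimensional search is used here only to keep the sketch short; inferring several environmental variables at once would require a multivariate optimizer.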

Instructions for computing inferences with existing OTUs:

  1. Check that you have loaded the biological inference library and a taxon-environment coefficient file.

    To load the biological inference library, type at the R prompt:


    A file starting with coef (e.g., coef.west.wt) should have been loaded when you set up your workspace (see Step 6 on the Download Files page).

    You should also have run get.otu and saved the results in a data frame (e.g., bcnt.otu).
  2. Run the R script that generates a site-OTU matrix. Type at the R prompt:

    ss <- makess(bcnt.otu)

    This command runs the R script makess and stores the resulting site-OTU matrix in the data frame ss. The only input to the script is bcnt.otu, a benthic count file with OTUs assigned (i.e., the output from get.otu).
  3. Run the R script that computes inferences.

    Type at the R prompt:

    inferences <- mlsolve(ss, coef.west.wt)

    The script mlsolve solves the maximum likelihood problem, returning the most probable environmental conditions at each site given the composition of the biological assemblage observed there and the taxon-environment relationships. The script requires two inputs: the site-OTU matrix generated in the previous step (e.g., ss) and the taxon-environment coefficient file (e.g., coef.west.wt).

    Inferences for each sample in the data set are stored in the data frame inferences.
  4. Interpret inference results.

    The inference data frame has a column with the site identifier, one column for each inferred environmental variable, and a column labeled "Inconsistent". Where Inconsistent is TRUE, the solution algorithm did not converge to a single solution. These sites typically have too few taxa to infer environmental conditions confidently, so their inferences should be used with caution.

    Inferences provide an estimate of the environmental conditions at a site. This estimate is comparable to a direct environmental measurement and can be analyzed in a similar way to help inform a causal analysis. In particular, environmental inferences must be compared with inferences at reference sites to establish whether or not the conditions at the site have departed from natural expectations.
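
As a hedged illustration of the interpretation step, the sketch below works on a small, made-up inference table rather than on real mlsolve output. The column names SITE.ID, temp, and sed, the site labels, the list of reference sites, and all values are assumptions for the example; only the Inconsistent column comes from the description above.

```r
# Hypothetical inference table: a site identifier, two inferred
# environmental variables, and the Inconsistent flag.
inferences.demo <- data.frame(
  SITE.ID      = c("R1", "R2", "R3", "T1", "T2"),
  temp         = c(11.2, 12.0, 11.6, 17.9, 12.1),
  sed          = c(8, 10, 9, 35, 40),
  Inconsistent = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical list of reference sites.
ref.sites <- c("R1", "R2", "R3")

# Set aside sites where the solution did not converge to a single solution.
flagged <- inferences.demo[inferences.demo$Inconsistent, ]
usable  <- inferences.demo[!inferences.demo$Inconsistent, ]

# Split the usable sites into reference and test groups.
is.ref <- usable$SITE.ID %in% ref.sites
ref    <- usable[is.ref, ]
test   <- usable[!is.ref, ]

# Express each test-site inference as a departure from the reference mean.
# A plain difference is used for illustration only; a real analysis would
# choose a comparison suited to the distribution of reference inferences.
vars      <- c("temp", "sed")
ref.mean  <- colMeans(ref[, vars])
departure <- sweep(as.matrix(test[, vars]), 2, ref.mean)
rownames(departure) <- test$SITE.ID
print(departure)
```

Keeping the flagged sites in a separate data frame, rather than deleting them, leaves them available for review while excluding them from the reference comparison.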
