# Using R for Non-Parametric Regression

## How to Fit Non-Parametric Regressions

Non-parametric regressions (see *Non-Parametric Regression* page, Equation 8) can be computed with a set of commands similar to those of parametric regressions (see the *Parametric Regressions *page in the Helpful Links Box). In this case, generalized additive models (GAM) are used to fit nonparametric curves to the data.

First, install the GAM library into R. Type at the R prompt:

install.packages("gam")

You will then need to select a mirror site from the provided list, and the package should install automatically.

Next, make sure that you have loaded the sample biological and environmental data and merged them into a single data frame called dfmerge (see the *Download Scripts and Sample Data* page in the Helpful Links Box).

Also make sure that you have selected the taxa for which you wish to calculate environmental limits and saved them in the vector taxa.names (see the description on the *Central Tendencies* page in the Helpful Links Box).

# Load GAM library

library(gam)

modlist.gam <- as.list(rep(NA, times = length(taxa.names)))

for (i in 1:length(taxa.names)) {

# Create a logical vector is true if taxon is

# present and false if taxon is absent.

resp <- dfmerge[, taxa.names[i]] > 0

# Fit the regression model, specifying two degrees of freedom

# to the curve fit.

modlist.gam <- gam(resp ~ s(temp, df = 2), data = dfmerge,

family = "binomial")

}

To plot model results (similar to those shown in Figure 7 in the

*Non-Parametric Regression*page) run the following script.

# Specify 3 plots per page

par(mfrow = c(1,3), pty = "s")

for (i in 1:length(taxa.names)) {

# Compute mean predicted probability of occurrence

# and standard errors about this predicted probability.

predres <- predict(modlist.gam, type= "link", se.fit = T)

# Compute approximate upper and lower 90% confidence limits

up.bound.link <- predres$fit + 1.65*predres$se.fit

low.bound.link <- predres$fit - 1.65*predres$se.fit

mean.resp.link <- predres$fit

# Convert from logit transformed values to probability.

up.bound <- exp(up.bound.link)/(1+exp(up.bound.link))

low.bound <- exp(low.bound.link)/(1+exp(low.bound.link))

mean.resp <- exp(mean.resp.link)/(1+exp(mean.resp.link))

# Sort the environmental variable.

iord <- order(dfmerge$temp)

# Define bins to summarize observational data as

# probabilities of occurrence

nbin <- 20

# Define bin boundaries so each bin has approximately the same

# number of observations.

cutp <- quantile(dfmerge$temp,

probs = seq(from = 0, to = 1, length = 20))

# Compute the midpoint of each bin

cutm <- 0.5*(cutp[-1] + cutp[-nbin])

# Assign a factor to each bin

cutf <- cut(dfmerge$temp, cutp, include.lowest = T)

# Compute the mean of the presence/absence data within each bin.

vals <- tapply(dfmerge[, taxa.names[i]] > 0, cutf, mean)

# Plot binned observational data as symbols.

plot(cutm, vals, xlab = "Temperature",

ylab = "Probability of occurrence", ylim = c(0,1),

main = taxa.names[i])

# Plot mean fit as a solid line

lines(dfmerge$temp[iord], mean.resp[iord])

# Plot confidence limits as dotted lines.

lines(dfmerge$temp[iord], up.bound[iord], lty = "dotted")

lines(dfmerge$temp[iord], low.bound[iord], lty = "dotted")

}