TITLE: Making abundance matrices
DATE: 2020-10-31
AUTHOR: John L. Godlee
====================================================================


There are lots of R packages to generate species by site abundance
matrices from a long-format dataframe of records. For example,
labdsv::matrify() takes a three-column dataframe like this:

  Site   Species            Abundance
  ------ ---------------- -----------
  A      Quercus robur             10
  B      Quercus robur              2
  B      Betula pendula            30
  ...    ...                      ...

This method relies on already having the data summarised, but what if
each row was a record, as would be the case if you had raw tree diameter
measurements, rather than merely a count of abundance:

  Site   Species             DBH
  ------ ---------------- ------
  A      Quercus robur      15.6
  A      Quercus robur       5.4
  A      Betula pendula     11.0
  ...    ...                 ...

It wouldn't be hard to turn this into a summary table with some dplyr:

    count(dat, Site, Species)

Additionally, what if individuals vary according to sampling effort, for
example if stems less than 10 cm DBH were only measured in a 20x10 m
subplot within a larger 20x50 m plot:

  Site   Species             DBH   FPC
  ------ ---------------- ------ -----
  A      Quercus robur      15.6     1
  A      Quercus robur       5.4   0.2
  A      Betula pendula     11.0     1
  ...    ...                 ...   ...

Here FPC is the proportion of the plot area in which a stem of that size
was sampled, so 200 m^2 / 1000 m^2 = 0.2 for stems under 10 cm DBH.

Or if the measure of abundance isn't individual presence, but the canopy
cover of the individual:

  Site   Species             DBH   Cover
  ------ ---------------- ------ -------
  A      Quercus robur      15.6    2.53
  A      Quercus robur       5.4    1.01
  A      Betula pendula     11.0    2.40
  ...    ...                 ...     ...

Then it becomes much harder to create one of these matrices. Wouldn't it
be nice to have a base R function to create species by site abundance
matrices, which can deal with sampling effort, alternative measures of
abundance, and unsummarised data?

    #' Generate a species by site abundance matrix
    #'
    #' @param x dataframe of individual records
    #' @param site_id column name string of site IDs
    #' @param species_id column name string of species names
    #' @param fpc optional column name string of sampling weights of each record,
    #'     between 0 and 1
    #' @param abundance optional column name string with an alternative abundance
    #'     measure such as biomass, canopy cover, body length
    #'
    #' @return dataframe of species abundances (columns) per site (rows)
    #'
    #' @examples
    #' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3),
    #'   species_id = sample(c("a", "b", "c", "d"), 9, replace = TRUE),
    #'   fpc = rep(c(0.5, 0.6, 1), each = 3),
    #'   abundance = 1:9)
    #' abMat(x, "site_id", "species_id")
    #' abMat(x, "site_id", "species_id", "fpc")
    #' abMat(x, "site_id", "species_id", "fpc", "abundance")
    #'
    #' @export
    abMat <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
      # If no fpc or abundance column is given, every record gets a value of 1
      if (is.null(fpc)) {
        x$fpc <- 1
      } else {
        x$fpc <- x[[fpc]]
      }
      if (is.null(abundance)) {
        x$abundance <- 1
      } else {
        x$abundance <- x[[abundance]]
      }

      # Get all species and sites, in order of first appearance
      species <- unique(x[[species_id]])
      sites <- unique(x[[site_id]])

      # Create empty site (rows) by species (columns) matrix
      comm <- matrix(0, nrow = length(sites), ncol = length(species))

      # Fill matrix: sum abundance / fpc over the records in each cell
      for (i in seq_along(sites)) {
        for (j in seq_along(species)) {
          abu <- x[x[[site_id]] == sites[i] & x[[species_id]] == species[j],
            c("fpc", "abundance")]
          comm[i, j] <- sum(abu$abundance / abu$fpc, na.rm = TRUE)
        }
      }

      # Make tidy with names
      comm <- data.frame(comm)
      names(comm) <- species
      row.names(comm) <- sites

      return(comm)
    }
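
As a cross-check on the nested loop above, the same weighted sums can probably be had from base R's xtabs(), which sums a left-hand-side variable within each combination of factors. This is only a sketch of an alternative, not the function from this post: it uses just the three rows shown in the FPC table (the elided rows are left out rather than guessed), and the names `dat`, `weight`, `comm_tab` and `comm` are made up for illustration.

```r
# The three example rows shown in the FPC table; elided rows are not guessed
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1)
)

# Each record counts as 1 / FPC individuals, so the 5.4 cm stem measured in
# the 20x10 m subplot (FPC = 0.2) stands in for 5 stems at the plot scale
dat$weight <- 1 / dat$FPC

# xtabs() sums `weight` within each Site x Species combination
comm_tab <- xtabs(weight ~ Site + Species, data = dat)

# Convert the contingency table to a dataframe with sites as rows
comm <- as.data.frame.matrix(comm_tab)
comm
```

For site A this gives 6 Quercus robur (1 + 1 / 0.2) and 1 Betula pendula. One difference from the loop version is that xtabs() orders sites and species alphabetically (by factor level) rather than by first appearance. Swapping `1 / dat$FPC` for `dat$Cover / dat$FPC` would be the equivalent of passing an abundance column.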