Finite mixture of regression modeling for high-dimensional count and biomass data in ecology

Understanding how species distributions respond to environmental
gradients is a key question in ecology, and one that benefits from a multi-species approach.
Multi-species data are often high dimensional, in that the number of species sampled
is large relative to the number of sites, and are commonly recorded as
presence–absence, counts of individuals, or biomass of each species. In this paper,
we propose a novel approach to the analysis of multi-species data when the goal is
to understand how each species responds to its environment. We use a finite mixture
of regression models, grouping species into “Archetypes” according to their environmental
response, thereby significantly reducing the dimension of the regression
model. Previous research introduced such Species Archetype Models (SAMs), but only
for binary assemblage data. Here, we extend this basic framework with three key
innovations: (1) the method is expanded to handle count and biomass data, (2) we
propose grouping on the slope coefficients only, whilst the intercept terms and nuisance
parameters remain species-specific, and (3) we develop model diagnostic tools
for SAMs. By grouping on environmental responses only, the model allows for interspecies
variation in terms of overall prevalence and abundance. The application of our
expanded SAM framework is illustrated on marine survey data and through simulation.
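
As a rough sketch of the structure described above, a mixture of regressions that groups on slopes only could be written as follows. All notation here is assumed for illustration and is not taken from the source: y_{ij} is the count or biomass of species j at site i, x_i is the vector of environmental covariates, K is the number of archetypes with mixing proportions \pi_k, \alpha_j is a species-specific intercept, \beta_k is the slope vector shared within archetype k, \phi_j is a species-specific nuisance parameter (a dispersion parameter, say), g is a link function, and f is whichever density suits the data type.

```latex
% Assumed notation (not from the source): species j = 1..S, sites i = 1..n,
% archetypes k = 1..K. The choice of f and g is left unspecified.
\begin{align}
  % Mean of species j at site i, given membership of archetype k:
  % the intercept is species-specific, the slopes are archetype-specific.
  g(\mu_{ijk}) &= \alpha_j + x_i^{\top}\beta_k, \\
  % Mixture likelihood: each species contributes a K-component mixture.
  L(\pi,\alpha,\beta,\phi) &= \prod_{j=1}^{S}\sum_{k=1}^{K}\pi_k
      \prod_{i=1}^{n} f\!\left(y_{ij};\,\mu_{ijk},\,\phi_j\right),
  \qquad \sum_{k=1}^{K}\pi_k = 1, \\
  % Posterior probability that species j belongs to archetype k.
  \tau_{jk} &= \frac{\pi_k \prod_{i=1}^{n} f\!\left(y_{ij};\,\mu_{ijk},\,\phi_j\right)}
      {\sum_{l=1}^{K}\pi_l \prod_{i=1}^{n} f\!\left(y_{ij};\,\mu_{ijl},\,\phi_j\right)}.
\end{align}
```

Under this sketch, with p covariates the slope parameters number K p rather than the S p needed when every species has its own slopes, which is where the dimension reduction comes from when K is much smaller than S. The intercepts \alpha_j remain free, so species in the same archetype can still differ in overall prevalence or abundance.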
