Cooperative Fish and Wildlife Research Units Program: New York
Education, Research and Technical Assistance for Managing Our Natural Resources

New York Project

Capture-recapture meets big data: integrating statistical classification with ecological models of species abundance and occurrence

September 2019 - April 2022


Participating Agencies

  • Powell Center

Advances in technologies such as remote cameras, noninvasive genetics, and bioacoustics provide massive quantities of electronic data. Much work has been done on automated (“machine learning”) classification methods, which produce “sample class designations” (e.g., identifications of species or individuals) that are then treated as observed data in ecological models. However, these “data” are actually derived quantities (synthetic data) and are subject to important sources of bias and error. If the derived quantities are used to make ecological determinations without accounting for these biases, the resulting inferences, which inform monitoring, conservation, and management, will be flawed. We propose to develop the concept of coupled classification, in which statistical classification models are linked to ecological models of species abundance or occurrence. In this new framework, classification (e.g., species identification) takes into account the local structure of populations, communities, and landscapes. It does not assume, as all current classification methods do, that where a sample is collected is independent of the class structure of the population. The proposed work addresses a significant bottleneck in using data from new technologies to monitor and assess populations and communities: the lack of formal statistical frameworks that fully propagate uncertainty when automatically linking observed digital monitoring data to ecological objectives of scientific and management concern. This connection between digital data and ecological objectives has yet to be made, except as outlined in our proposal. The work is transformative because it provides a mechanism for directly integrating remotely sensed “big data” with ecological models while accounting for misclassification. A coupled classification system opens the possibility of fully automated data collection and processing.
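
The idea of letting classification depend on local population structure can be illustrated with a minimal sketch. It is not the project's method, only a plain application of Bayes' rule under stated assumptions: a classifier supplies a likelihood for each class given a sample, and an ecological model supplies a site-specific prevalence for each class. The function name, species names, and all numbers below are hypothetical.

```python
def coupled_posterior(likelihoods, local_prevalence):
    """Combine classifier likelihoods with site-specific class prevalence.

    Bayes' rule: P(class | sample, site) is proportional to
    P(sample | class) * P(class | site).
    Both arguments are dicts keyed by class label (hypothetical inputs).
    """
    weighted = {k: likelihoods[k] * local_prevalence[k] for k in likelihoods}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no class has positive support at this site")
    return {k: w / total for k, w in weighted.items()}


# Hypothetical classifier output for one ambiguous camera image.
likelihoods = {"coyote": 0.6, "wolf": 0.4}

# Hypothetical local prevalence, e.g. from an occurrence or abundance model.
site_a = {"coyote": 0.5, "wolf": 0.5}   # both species equally common
site_b = {"coyote": 0.1, "wolf": 0.9}   # wolves dominate locally

post_a = coupled_posterior(likelihoods, site_a)
post_b = coupled_posterior(likelihoods, site_b)
print(post_a)
print(post_b)
```

With equal local prevalence the classifier's own ranking is unchanged, whereas at the wolf-dominated site the same image is most probably a wolf. A classifier that ignores where the sample was collected would return the same designation at both sites.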