Exploring complexity in biodiversity: Unveiling large-scale insights with synthetic data

By Gabriel Muñoz, a PhD student at Concordia University

Life on Earth thrives through its remarkable diversity of species. Different combinations of species across the globe shape unique ecosystems, while interactions among species and with their environment regulate nutrient flow and genetic exchange. Over the past centuries, scientists around the globe have independently collected data on biodiversity, most of which is now stored as museum specimens or as text in scientific publications. Recent efforts have consolidated much of this data into comprehensive, digitally available datasets (Figure 1). Globally aggregated data allow researchers to uncover large-scale biodiversity patterns and investigate the drivers behind them.

Figure 1: Global Biodiversity Information Facility (GBIF), a repository of global data on species occurrences. https://www.gbif.org/

Understanding community assembly, the process driving biodiversity patterns, requires more than species counts: it also requires comprehensive knowledge of species traits and species interactions. Despite these efforts in data collection and synthesis, significant knowledge gaps remain. Artificial intelligence offers a powerful tool for generating insights, leveraging the structure and relationships in observed data to fill these gaps.

As part of my PhD thesis, I used machine learning models trained with data collected from the literature and museum records to predict synthetic variables representing species’ multitrophic traits (Figure 2) and cross-trophic interactions. These synthetic variables are numerical variables that capture both observed and unobserved ecological relationships, bridging knowledge gaps and enabling ecological inference at large scales from sparse observations. Specifically, I used global datasets on palms and mammals to train random forest models that predict multitrophic traits. Additionally, I used neural network models to predict species interactions in the Neotropics, generating synthetic datasets at a continental scale. By integrating these modeled data with maps of species’ geographic ranges, I explored differences in functional diversity across trophic levels and constructed probabilistic networks for any given region of the Neotropics. Our results can inform conservation efforts and help us understand the potential consequences of global climate change for the structure of seed dispersal networks. Last year, in Vancouver, I had the privilege of presenting my research at the International Biogeography Society (IBS) meeting, thanks to the support of the QCBS Excellence Award.
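To make the idea of a regional probabilistic network more concrete, here is a minimal sketch in Python. It assumes a model has already produced an interaction probability for each palm and mammal pair, and that range maps have been rasterized into sets of grid-cell IDs. The species names, cell IDs, and probabilities are hypothetical placeholders, not results from the thesis, and treating a species as present whenever its range overlaps any cell of the region is likewise an assumption. The regional network keeps only pairs whose members are both present, and its expected number of links is the sum of their probabilities.

```python
from itertools import product

# Hypothetical inputs: each species' range as a set of grid-cell IDs
palm_ranges = {
    "palm_A": {1, 2, 3},
    "palm_B": {3, 4},
}
mammal_ranges = {
    "mammal_X": {2, 3},
    "mammal_Y": {5, 6},
}

# Hypothetical model output: probability that a given palm and mammal interact
interaction_prob = {
    ("palm_A", "mammal_X"): 0.8,
    ("palm_A", "mammal_Y"): 0.3,
    ("palm_B", "mammal_X"): 0.5,
    ("palm_B", "mammal_Y"): 0.1,
}


def regional_network(region_cells, plants, animals, probs):
    """Return the probabilistic network of species present in a region.

    Assumption: a species counts as present if its range overlaps
    at least one grid cell of the region.
    """
    present_plants = [p for p, cells in plants.items() if cells & region_cells]
    present_animals = [a for a, cells in animals.items() if cells & region_cells]
    # Keep every locally possible pair with its predicted probability (0 if missing)
    return {
        (p, a): probs.get((p, a), 0.0)
        for p, a in product(present_plants, present_animals)
    }


def summarize(network):
    """Expected number of links and expected connectance of a probabilistic network."""
    expected_links = sum(network.values())
    plants = {p for p, _ in network}
    animals = {a for _, a in network}
    possible = len(plants) * len(animals)
    connectance = expected_links / possible if possible else 0.0
    return expected_links, connectance


region = {2, 3, 4}  # hypothetical region made of three grid cells
net = regional_network(region, palm_ranges, mammal_ranges, interaction_prob)
links, conn = summarize(net)
print(round(links, 2), round(conn, 2))  # → 1.3 0.65
```

By linearity of expectation, summing the pairwise probabilities gives the expected number of links in the region. Thresholding the probabilities or sampling binary networks from them would be equally reasonable alternatives, depending on the question being asked.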

Figure 2: Synthetic data representing the trait-matching space of plant-frugivore interactions between Neotropical palms and mammals. (Left) Each point represents a palm (green) or mammal (brown) in the Neotropics. Observed data come from available datasets; we used random forests to impute trait values for species lacking data while preserving the structure of trait variation in the species pool. Species are embedded in a multitrophic trait space, whose trait variation is represented by the vectors in the right panel. Munoz et al., 2023. Unpublished thesis.
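As an illustration of comparing functional diversity across trophic levels, the sketch below computes unweighted functional dispersion (the mean Euclidean distance of species to their group's centroid) for palms and mammals placed in a shared trait space like the one in Figure 2. This is one common functional diversity metric, not necessarily the one used in the thesis, and the species names and coordinates are hypothetical. It also assumes traits have already been projected onto comparable axes, for example standardized ordination scores.

```python
from math import dist
from statistics import fmean

# Hypothetical species coordinates in a shared two-axis trait space,
# tagged with their trophic level
trait_space = {
    "palm_A": ("palm", (0.0, 0.0)),
    "palm_B": ("palm", (2.0, 0.0)),
    "palm_C": ("palm", (0.0, 2.0)),
    "palm_D": ("palm", (2.0, 2.0)),
    "mammal_X": ("mammal", (1.0, 1.0)),
    "mammal_Y": ("mammal", (1.0, 3.0)),
}


def functional_dispersion(points):
    """Unweighted functional dispersion: mean Euclidean distance to the centroid."""
    centroid = tuple(fmean(axis) for axis in zip(*points))
    return fmean(dist(p, centroid) for p in points)


def dispersion_by_level(space):
    """Group species by trophic level and compute dispersion within each group."""
    groups = {}
    for level, coords in space.values():
        groups.setdefault(level, []).append(coords)
    return {level: functional_dispersion(pts) for level, pts in groups.items()}


for level, fdis in dispersion_by_level(trait_space).items():
    print(f"{level}: {fdis:.3f}")
# palm: 1.414
# mammal: 1.000
```

Because both groups share the same axes, their dispersion values can be compared directly; with separate trait sets per trophic level, such a comparison would not be meaningful.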

Leveraging AI and synthetic data, we can embark on an era of ecological discovery, striving for a comprehensive understanding of Earth’s biodiversity and working towards a sustainable future. However, we must not set data collection aside: for now, only a few taxa have enough data to train models at a global scale, and there are large geographical biases in data completeness between the northern and southern hemispheres. Globally inclusive, collaborative efforts among scientists, natural historians, and data experts will pave the way for a deeper understanding of the intricate dynamics that govern all of Earth’s ecosystems.

Figure 3: DALL·E 3 rendering of the following prompt: “a hyperrealistic figure representing humanity leveraging AI and synthetic data to embark on an era of ecological discovery”.

About the author: Gabriel Muñoz is a PhD candidate working in the Community Ecology and Biogeography Lab at Concordia University.

Post date: January 26, 2024
