Unveiling the Intricacies of Bird Evolution with Advanced Bioinformatics
In evolutionary biology, understanding the relationships and historical trajectories of species is paramount. Advanced computational tools and collaborative scientific efforts have recently yielded new insights into the avian world. A study published in Nature examines the complexity of bird evolution and showcases the power of bioinformatics in shedding light on this intricate web of life.
The foundation for this research was laid a decade ago, with a significant article in the journal Science on the bird tree of life. Back then, the emphasis was on the crucial role of algorithms and supercomputers in modern research on evolutionary biology across all types of living beings. Since then, bioinformatics tools have advanced considerably, enabling an international team of researchers to analyze 363 bird species. The analysis was conducted as part of the “Bird 10,000 Genomes Project” (B10K), in which scientists used intergenic regions of genomes and a comprehensive array of computational methods.
The outcome is a well-supported avian phylogenetic tree that nonetheless shows an unexpected degree of discordance, a testament to the complexity of avian evolution. The amount of data needed to resolve these discrepancies reflects the diversity of the species sampled, the phylogenetic methods employed, and the genomic regions chosen for study. A notable contribution to processing this data comes from the Computational Molecular Evolution group (CME) at the Heidelberg Institute for Theoretical Studies (HITS) and their collaborators from the sister group, the Biodiversity Computing Group (BCG) at the Institute of Computer Science (ICS) of the Foundation for Research and Technology Hellas (FORTH) in Heraklion, Greece.
Empowering Evolutionary Biology Research
Josefin Stiller, one of the lead authors of the study, highlighted the significance of the methods used: “The new computational approaches allowed us to reconstruct over 150,000 local phylogenies across the whole genome, each of which provides a small window into the evolutionary history of birds.” This achievement underscores the critical role of software, algorithms, and model development in evolutionary biology research.
Alexandros Stamatakis, the CME group leader and an EU-funded ERA chair at FORTH, shared insights into the tools central to their research. “The ParGenes software, for example, which is very central for the paper, efficiently schedules the inference of a huge number of per-gene phylogenetic trees on distinct input gene datasets on a large compute cluster.” Stamatakis emphasized that, at its core, this is a fundamental computer science problem of efficient job scheduling.
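To give a feel for the kind of job scheduling problem described above, here is a minimal sketch in Python. It is not the ParGenes implementation or its actual algorithm. It shows a classic greedy heuristic (longest processing time first), in which the most expensive jobs are placed first and each job goes to the currently least-loaded worker. The names GeneJob, schedule_jobs and worker_loads are hypothetical, and the cost estimate (number of taxa times number of alignment sites) is an assumed proxy for runtime, not something taken from the study.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneJob:
    """One per-gene tree inference job (hypothetical illustration)."""
    name: str
    n_taxa: int
    n_sites: int

    @property
    def estimated_cost(self) -> int:
        # Assumed runtime proxy: size of the alignment (taxa x sites).
        return self.n_taxa * self.n_sites


def schedule_jobs(jobs, n_workers):
    """Assign jobs to workers with the longest-processing-time-first heuristic.

    Returns a list with one list of jobs per worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # Min-heap of (current load, worker index): the least-loaded worker is on top.
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Place the most expensive jobs first so that small jobs can even out the loads.
    for job in sorted(jobs, key=lambda j: j.estimated_cost, reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + job.estimated_cost, worker))

    return assignment


def worker_loads(assignment):
    """Total estimated cost assigned to each worker."""
    return [sum(job.estimated_cost for job in jobs) for jobs in assignment]


if __name__ == "__main__":
    # Made-up example data: five gene alignments of different lengths.
    demo_jobs = [
        GeneJob(f"gene_{i}", n_taxa=10, n_sites=sites)
        for i, sites in enumerate([100, 80, 60, 40, 20])
    ]
    plan = schedule_jobs(demo_jobs, n_workers=2)
    for worker, jobs in enumerate(plan):
        print(worker, [job.name for job in jobs])
    print("loads:", worker_loads(plan))
```

A real scheduler on a compute cluster also has to decide how many cores each job receives and has to cope with inaccurate runtime estimates; this sketch ignores both and only illustrates the load-balancing idea.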
ParGenes uses RAxML-NG for phylogenetic inference and Modeltest-NG for selecting the best-fit statistical model of evolution. The “NG” stands for Next Generation, signifying a complete overhaul of existing tools to improve maintainability, versatility, and scalability. RAxML-NG scales from laptops to supercomputers, which makes it especially flexible, and in this study it was also used independently to infer trees from entire genomes on a supercomputer.
The Predictive Power of “Pythia”
An innovative addition to this research was the “Pythia” difficulty prediction tool. Developed by Julia Haag, a PhD student in Stamatakis’s group, Pythia uses machine learning to assess the phylogenetic difficulty of a dataset. “It predicts how much signal for a single tree there is in the data,” explains Stamatakis. This tool proved instrumental in providing phylogenetic difficulty scores for the diverse genomic regions of the bird genome analyzed in the study.
A Toolbox for Life Sciences Research
The open-source tools developed by the CME group, including the highly cited RAxML-NG, are empowering researchers across the life sciences. Their versatility was particularly highlighted during the pandemic when RAxML-NG was used to study the evolution of various viral strains. Stamatakis finds great satisfaction in providing a basic toolbox that enables scientists to conduct their research effectively. “I personally find this very gratifying,” he concludes, underscoring the profound impact of these bioinformatics tools on the field of evolutionary biology and beyond.
In the intricate dance of avian evolution, where each species reveals a unique step, bioinformatics tools like those developed by the CME group and their collaborators are indispensable. They not only provide a deeper understanding of the evolutionary history but also pave the way for future discoveries in the vast, uncharted territories of the tree of life.