1. Identifying the genes that contribute to species diversification and adaptation
Horizontal gene transfer (HGT) is an important evolutionary force shaping prokaryotic and eukaryotic genomes. HGT-acquired genes have been reported only sporadically in insects, a lineage containing >50% of animals. We systematically examined HGT in 218 high-quality genomes of diverse insects and found that they acquired 1,410 genes via 741 distinct transfers from non-metazoan (mostly bacterial) donors. These genes exhibit diverse functions, many of them not previously reported. Lepidopterans had the highest average number of HGT-acquired genes (16 genes per species). HGT-acquired genes containing introns exhibited substantially higher expression levels than those lacking introns, suggesting that intron gain was likely involved in the adaptation of HGT-acquired genes (Li et al. 2022, Cell).
2. Dissecting the factors that contribute to phylogenetic incongruence
Phylogenetic trees are essential for understanding how genes, genomes, organisms, and species evolve, yet their reliability and robustness are still challenged across different methods and studies, even when genome-scale data are used. In investigating these challenges, we identified the following causes of incongruence:
1) Tiny subsets of very large data matrices can exert a disproportionate influence and drive the resolution of contentious branches, as we showed by quantifying the distribution of phylogenetic signal among genes and sites in phylogenomic data (Shen et al. 2017, Nat. Ecol. Evol.).
2) Low phylogenetic informativeness (e.g., a low percentage of parsimony-informative sites and low branch support values), processor type, and multithreading can contribute to a considerable fraction of irreproducible single-gene ML trees, even when two replicates are run with exactly the same settings, including sequence alignment, program, substitution model, random starting seed number, number of tree searches, and log-likelihood epsilon value (Shen et al. 2018, Nature Comm.).
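The two quantities mentioned in the list above can be illustrated with a minimal Python sketch. It rests on assumptions not spelled out in the text: phylogenetic signal is taken to be the difference in log-likelihood between two competing topologies, summed over the sites of a gene, and a site counts as parsimony-informative when at least two character states each occur in at least two sequences. The function names and the treatment of "-" and "?" as missing data are hypothetical choices for illustration, not the pipeline used in the cited studies.

```python
from collections import Counter

# Assumption: only gap and unknown characters are treated as missing data.
MISSING = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in MISSING)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def percent_parsimony_informative(alignment):
    """Percentage of parsimony-informative columns in an aligned set of sequences."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be non-empty and of equal length")
    informative = sum(is_parsimony_informative(col) for col in zip(*alignment))
    return 100.0 * informative / length


def gene_wise_signal(site_lnl_t1, site_lnl_t2):
    """Sum of site-wise log-likelihood differences (topology T1 minus T2).

    Positive values favor T1, negative values favor T2 (illustrative
    definition; the site-wise values are assumed to come from an ML program).
    """
    if len(site_lnl_t1) != len(site_lnl_t2):
        raise ValueError("site-wise score lists must have equal length")
    return sum(a - b for a, b in zip(site_lnl_t1, site_lnl_t2))


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "AAGA", "AATT"]  # made-up sequences
    print(percent_parsimony_informative(toy_alignment))
    print(gene_wise_signal([-10.0, -12.0], [-10.5, -12.5]))
```

Ranking genes by the absolute value of this per-gene difference is one simple way to see whether a handful of genes dominates the support for a contentious branch.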
3. Inferring species phylogeny with genome-scale data
With advances in genomic tools and affordable high-throughput sequencing, phylogenomic analyses based on data matrices comprising hundreds to thousands of genes from dozens of taxa have made rapid progress in resolving the tree of life, for example the placement of lungfishes (Liang et al. 2013, Mol. Biol. Evol.) and the phylogenomics of Ascomycota (Shen et al. 2020, Science Advances). More recently, I have been involved in the Y1000+ Project (http://y1000plus.org), which focuses on sequencing the genomes of the entire yeast subphylum Saccharomycotina (>1,000 species) and understanding the evolution of their biodiversity (Shen et al. 2018, Cell).