Telomere-to-telomere assemblies of cattle and sheep Y-chromosomes uncover divergent structure and gene content – Nature Communications
This study presents complete, gapless telomere-to-telomere (T2T) assemblies of the cattle and sheep Y-chromosomes. Using Illumina short reads together with Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long reads, the team resolved the structural architecture of these sex chromosomes. The analysis reveals new features in previously inaccessible regions and shows where the two chromosomes resemble each other and where they differ.
The assemblies are a resource for ruminant biology and for broader mammalian studies. With T2T Y-chromosomes in hand, long-standing questions about the structure and evolution of Y-chromosomes in the Bovidae family can now be addressed more thoroughly.
Complete and Haplotype-Phased Assemblies
The T2T Y-chromosome assemblies were derived from draft haplotype-phased whole-genome assemblies of F1 individuals at 120 days (cattle) and 100 days (sheep) of gestation. The individuals came from a Wagyu×Charolais cattle cross and a Churro×Friesian sheep cross, respectively. The assemblies combined Oxford Nanopore (ONT) ultra-long reads with PacBio HiFi reads to resolve the highly repetitive regions, including telomeres and centromeres. For cattle, approximately 626.4 Gbp of raw ONT reads and 328.6 Gbp of raw PacBio HiFi reads were used. The sheep assemblies used 600.8 Gbp of raw ONT reads and 258.1 Gbp of raw PacBio HiFi reads.
Metrics and Quality Validation
The paternal haplotype assemblies were highly contiguous, with contig N50 values of 96.68 Mb for cattle and 108.17 Mb for sheep. Completeness evaluations with Merqury indicated high k-mer survival rates, and BUSCO analysis found 98.11% to 99.78% of the genes complete. Visualization with Bandage showed that each Y-chromosome was assembled into a single contig. Merqury QV scores of 62.38 for cattle and 59.95 for sheep indicate high consensus accuracy.
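To make the two headline metrics concrete, here is a minimal Python sketch of how a contig N50 is computed and how a Phred-scaled QV maps to a per-base error probability. The contig lengths are hypothetical toy values, not data from the study, and the helper names are illustrative; this is not the pipeline the authors ran.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to an error probability,
    using QV = -10 * log10(error probability)."""
    return 10 ** (-qv / 10)


# Hypothetical toy contig lengths (assumption, not from the study).
toy_contigs = [100, 80, 50, 30, 20, 10]
print("toy N50:", n50(toy_contigs))

# QV scores quoted in the text.
for species, qv in [("cattle", 62.38), ("sheep", 59.95)]:
    print(species, "per-base error probability:", qv_to_error_rate(qv))
```

Under this conversion, a QV near 60 corresponds to roughly one consensus error per million bases, and the cattle score of 62.38 to somewhat fewer than that.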
Structural Features
The T2T assemblies revealed several novel features. Telomere sequences were located at the distal ends of both chromosomes, characterized by the 6-mer sequence CCCTAA and its reverse complement. The cattle Y-chromosome carried 2,447 copies of the repeat on the p-arm and 3,387 on the q-arm, while the sheep Y-chromosome carried 3,211 copies on the p-arm and 2,962 on the q-arm.
The two Y-chromosomes differed markedly in total length: 59.4 Mb for cattle and 25.9 Mb for sheep. The sheep centromere, at 120.76 kb, was far smaller than the cattle centromere at 2.52 Mb. About half of each chromosome was occupied by repetitive DNA, with LINE elements the most prevalent class. The sheep Y-chromosome carried more of the other classes of repetitive elements than the cattle Y-chromosome.
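Counting telomeric repeats like those reported above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' method: the function names, the `window` parameter, and the example sequence are all assumptions. Both the motif and its reverse complement are counted, because the text does not say which orientation sits at which end.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def telomere_repeat_counts(seq, motif="CCCTAA", window=10_000):
    """Count copies of the motif and its reverse complement in the
    first and last `window` bases of a sequence.

    `window` is a hypothetical cutoff chosen for illustration."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    motifs = {motif.upper(), reverse_complement(motif)}
    start, end = seq[:window], seq[-window:]
    return {
        "start": sum(start.count(m) for m in motifs),
        "end": sum(end.count(m) for m in motifs),
    }


# Toy sequence (assumption): five repeats, a spacer, four reverse-complement repeats.
toy = "CCCTAA" * 5 + "ACGTACGTAC" + "TTAGGG" * 4
print(telomere_repeat_counts(toy, window=30))
```

On a real assembly, the sequence would come from the chromosome FASTA record and the window would need to be large enough to span the whole telomeric array.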
Pseudoautosomal Region (PAR) and Male-Specific Y (MSY)
The PAR on the sheep Y-chromosome (7,018,329 bp) was slightly longer than that on the cattle Y-chromosome (6,822,380 bp). The MSY region, which contains gene-rich euchromatin and gene-poor heterochromatin, was also investigated. The cattle Y-chromosome carried more protein-coding genes (352) than the sheep Y-chromosome (109), but sheep had more pseudogenes (150) than cattle (79).
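Although the two PARs are similar in absolute length, they occupy very different shares of their chromosomes. The short Python sketch below works this out from the figures quoted in the text. The chromosome lengths are given there only to 0.1 Mb, so the derived fractions are approximate, and the dictionary layout is purely illustrative.

```python
# Figures quoted in the text; chromosome lengths are rounded there,
# so every derived fraction is approximate.
Y_STATS = {
    "cattle": {"length_bp": 59.4e6, "centromere_bp": 2.52e6, "par_bp": 6_822_380},
    "sheep": {"length_bp": 25.9e6, "centromere_bp": 120.76e3, "par_bp": 7_018_329},
}


def fraction(part_bp, whole_bp):
    """Share of the chromosome occupied by a region."""
    return part_bp / whole_bp


for species, stats in Y_STATS.items():
    par = fraction(stats["par_bp"], stats["length_bp"])
    cen = fraction(stats["centromere_bp"], stats["length_bp"])
    print(f"{species}: PAR {par:.1%} of Y, centromere {cen:.2%} of Y")

par_diff = Y_STATS["sheep"]["par_bp"] - Y_STATS["cattle"]["par_bp"]
print(f"PAR length difference: {par_diff:,} bp")
```

By this arithmetic the PAR covers roughly 11% of the cattle Y-chromosome but about 27% of the sheep Y-chromosome, while the two PARs differ in length by under 200 kb.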
All previously identified mammalian PAR genes were found on both assemblies, confirming their evolutionary conservation. Specific genes such as PLCXD1 and GPR143 were pinpointed within the PAR of both chromosomes.
Ampliconic Genes
A detailed analysis of ampliconic genes, which are crucial for spermatogenesis and fertility, revealed large disparities. Cattle had about four times as many protein-coding ampliconic genes (187) as sheep (46). Although sheep carried more pseudogenes, ampliconic genes such as TSPY diverged sharply in copy number between the two species. These genes were typically tandemly arrayed in cattle but not in sheep, suggesting different evolutionary trajectories.
Centromeric Analysis
The centromere structure varied significantly between cattle and sheep. The cattle centromere featured a 73 bp monomeric satellite repeat unit arranged into a higher-order repeat (HOR), a novel sequence not observed in other studies. By contrast, the sheep centromere was composed of a complex structure of two ruminant-specific transposable elements (TEs), BOV-A2 and BovB, interspersed with spacer sequences.
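To illustrate what a monomeric repeat unit means in practice, the following Python sketch estimates the period of a tandem array by finding the smallest shift at which the sequence largely matches itself. It is a toy self-similarity check on synthetic data, not the annotation method used in the study. The random 73 bp monomer, the 1% substitution rate, and the 0.9 threshold are all assumptions chosen for the demonstration.

```python
import random


def match_fraction(seq, shift):
    """Fraction of positions i where seq[i] equals seq[i + shift]."""
    pairs = len(seq) - shift
    if pairs <= 0:
        raise ValueError("shift must be smaller than the sequence length")
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + shift])
    return matches / pairs


def estimate_period(seq, max_period, threshold=0.9):
    """Return the smallest shift whose self-match fraction reaches the
    threshold, or None if no shift up to max_period qualifies."""
    for shift in range(1, min(max_period, len(seq) - 1) + 1):
        if match_fraction(seq, shift) >= threshold:
            return shift
    return None


# Synthetic array (assumption): a random 73 bp monomer repeated 20 times,
# with about 1% of positions redrawn at random to mimic sequence divergence.
rng = random.Random(0)
monomer = "".join(rng.choice("ACGT") for _ in range(73))
bases = list(monomer * 20)
for i in range(len(bases)):
    if rng.random() < 0.01:
        bases[i] = rng.choice("ACGT")
toy_array = "".join(bases)

print("estimated monomer period:", estimate_period(toy_array, max_period=200))
```

Real satellite arrays contain insertions and deletions as well as substitutions, so dedicated tandem-repeat tools are a better fit for actual centromere annotation. The sketch only shows the underlying idea of periodic self-similarity.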
The methylation patterns and CENP-A signals provided additional epigenetic support for the annotations of the centromeres. Despite attempts to trace the origins of these repeat units, no conclusive results were found, suggesting that these sequences are novel and unique to these species.
Comparison with Publicly Available Assemblies
The T2T assemblies were benchmarked against existing cattle and sheep Y-chromosome sequences available on NCBI. Significant gaps and discrepancies were identified, particularly in highly repetitive regions, highlighting the improvements offered by the T2T assemblies. The T2T assemblies account for over 15 Mb of sequence missing from the previously available cattle Y-chromosome assembly.
Conclusion
This study underscores the complexity of Y-chromosomes and their importance for understanding ruminant biology. The high-quality, gapless T2T assemblies of the cattle and sheep Y-chromosomes provide a new reference for future biological and evolutionary studies. The differences and unique features uncovered in these assemblies open the way to deeper insight into chromosome evolution and function in the Bovidae family.
As this field advances, it will be essential to continue refining these assemblies and exploring their functional implications, particularly in relation to fertility and genetic diversity among ruminants and other mammals.