Chromosome-level Genome Assembly of Cryptosporidium parvum by Long-read Sequencing of Ten Oocysts: Scientific Data
Cryptosporidium parvum, a zoonotic parasite that affects the intestines, poses a significant threat to both human and animal health. A notable challenge in studying this parasite has been the difficulty of obtaining substantial numbers of oocysts for genome sequencing, given the limitations of in vitro culture methods. To overcome this obstacle, researchers have employed a novel approach: whole-genome amplification of merely 10 oocysts, followed by long-read sequencing. This strategy has yielded a high-quality genome assembly of the C. parvum IIdA19G1 subtype, which was isolated from a pre-weaning calf experiencing diarrhea.
The resultant assembled genome measures 9.13 megabases (Mb) and consists of eight chromosomes. Impressively, six of them are capped with telomeric sequences at one or both ends. The study predicts a total of 3,915 protein-coding genes, demonstrating a high level of completeness with 98.2% single-copy BUSCO genes. To date, this represents the first chromosome-level genome assembly of C. parvum, achieved through the combined use of whole-genome amplification of 10 oocysts and long-read sequencing techniques. This accomplishment not only enhances our understanding of the genetic architecture of this zoonotic intestinal parasite but also provides invaluable resources for comparative and evolutionary genomic studies within the Cryptosporidium clade.
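The claim that a chromosome is "capped" with telomeric sequence can be checked by looking for tandem copies of a telomeric repeat near each end of the assembled sequence. The following is a minimal sketch of that idea, not the method used in the study. The function names, the window size, the minimum repeat count, and the toy sequences are all hypothetical, and the motif TTTAGG is passed in as an assumption because the text above does not state which repeat was searched for.

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_telomere(end_seq: str, motif: str, min_repeats: int = 3) -> bool:
    """True if end_seq holds min_repeats tandem copies of motif on either strand."""
    end_seq = end_seq.upper()
    motif = motif.upper()
    forward = motif * min_repeats
    reverse = revcomp(motif) * min_repeats
    return forward in end_seq or reverse in end_seq


def telomere_status(chromosomes, motif, window=200):
    """Map each chromosome name to a (left_capped, right_capped) pair."""
    return {
        name: (has_telomere(seq[:window], motif),
               has_telomere(seq[-window:], motif))
        for name, seq in chromosomes.items()
    }


def count_capped(status) -> int:
    """Count chromosomes with a telomeric repeat at one or both ends."""
    return sum(1 for left, right in status.values() if left or right)


# Toy sequences for illustration only; these are not real C. parvum data.
toy = {
    "chrA": "CCTAAA" * 4 + "ACGT" * 50 + "TTTAGG" * 4,  # capped at both ends
    "chrB": "ACGT" * 60,                                # no telomeric repeat
    "chrC": "ACGT" * 50 + "TTTAGG" * 5,                 # capped at one end
}
status = telomere_status(toy, motif="TTTAGG", window=40)
capped = count_capped(status)
```

Applied to a real assembly, the same count would be compared against the six of eight chromosomes reported above. A production check would also need to tolerate imperfect repeats, which this exact-match sketch does not.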
Cryptosporidium species are parasitic apicomplexans that cause moderate-to-severe diarrhea in both humans and animals. Given the lack of effective medications and vaccines, managing cryptosporidiosis relies heavily on preventing infection, underscoring the need for innovative interventions. These parasites have been detected in 155 mammalian species, including primates, and at least 44 distinct Cryptosporidium species have been identified. Among these, Cryptosporidium parvum, Cryptosporidium ubiquitum, and Cryptosporidium muris exhibit particularly broad host ranges, resulting in zoonotic infections.
Whole-genome sequencing (WGS) and comparative genomic analysis have provided crucial insights into the genetic basis of host range variation among Cryptosporidium species, as well as the host adaptation processes within each species. The advent of next-generation sequencing (NGS) technologies has brought WGS to the forefront of Cryptosporidium characterization efforts. To date, 15 species have undergone genome sequencing, including C. parvum, Cryptosporidium hominis, C. ubiquitum, Cryptosporidium meleagridis, and others. A significant proportion of the sequenced genomes belong to the zoonotic C. parvum, yet only two have been fully annotated.
The initial comprehensive genome assembly, for C. parvum Iowa II, was made public in 2004. It was produced by random shotgun sequencing and comprised 9.1 Mb of DNA sequence distributed across eight chromosomes. Previous studies have estimated the genetic divergence between C. parvum and C. hominis at approximately 3-5% at the DNA level.
A major hurdle in Cryptosporidium genomics research has been the limited availability of sufficiently purified oocysts for NGS analysis, partly due to the absence of an efficient in vitro culture system. Previous genomic analyses have often relied on oocysts sourced from infected laboratory animals. Pioneering work by Troell et al. demonstrated the feasibility of obtaining high-quality genetic data from single-celled eukaryotes, utilizing single-oocyst genome sequencing followed by comprehensive comparative genome analysis.
The research discussed here addresses these limitations by generating a reference genome for C. parvum from long-read sequencing data produced on the Oxford Nanopore Technologies (ONT) and PacBio high-fidelity (HiFi) platforms, with short-read data used for error correction. The assembled genome, measuring 9.13 Mb, displays a high level of completeness, with 98.2% single-copy BUSCO genes and 3,915 predicted protein-coding genes, 93.6% of which were functionally annotated.
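The reported figures can be combined into a few derived quantities that make the assembly easier to picture. The sketch below uses only the numbers stated above (9.13 Mb, eight chromosomes, 3,915 genes, 93.6% annotated). The variable names are illustrative, and the derived values are back-of-the-envelope arithmetic, not results reported by the study.

```python
# Figures stated in the text above.
genome_size_mb = 9.13        # assembled genome size in megabases
n_chromosomes = 8            # chromosomes in the assembly
n_genes = 3915               # predicted protein-coding genes
annotated_fraction = 0.936   # share of genes with a functional annotation

# Derived quantities (illustrative arithmetic only).
annotated_genes = round(n_genes * annotated_fraction)
unannotated_genes = n_genes - annotated_genes
genes_per_mb = n_genes / genome_size_mb
bp_per_gene = genome_size_mb * 1_000_000 / n_genes
mean_chromosome_mb = genome_size_mb / n_chromosomes

print(f"annotated genes:      ~{annotated_genes}")
print(f"unannotated genes:    ~{unannotated_genes}")
print(f"gene density:         {genes_per_mb:.1f} genes per Mb")
print(f"genome span per gene: {bp_per_gene:.0f} bp")
print(f"mean chromosome size: {mean_chromosome_mb:.2f} Mb")
```

On these numbers, roughly 3,664 genes carry a functional annotation, and the genome averages about one gene every 2.3 kb, which reflects how compact this parasite's genome is.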
This study represents a significant achievement: a high-quality, chromosome-level genome assembly of a Cryptosporidium species generated by amplifying merely ten oocysts and applying long-read sequencing technology. This approach could serve as an effective model for genome sequencing projects on other elusive or uncultivable pathogens.