The Emerging Role of Tandem Repeats in Complex Traits
Tandem repeats (TRs), sequences that consist of adjacent, repeated segments of DNA, have emerged as key players in our understanding of genetic variation and its impact on complex traits. They fall into two main classes: short tandem repeats (STRs), with repeat units 1-6 base pairs (bp) long, and variable number tandem repeats (VNTRs), with units at least 7 bp long. TRs occupy millions of loci across the human genome and mutate at high rates, which generates substantial genetic diversity.
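The unit-length definitions above can be illustrated with a small Python sketch. The function names and the toy sequence are hypothetical and chosen only for illustration; real TR annotation tools also handle imperfect repeats and flanking sequence, which this sketch ignores.

```python
def classify_tandem_repeat(unit: str) -> str:
    """Classify a repeat by unit length: 1-6 bp is an STR, 7 bp or more is a VNTR."""
    if not unit:
        raise ValueError("repeat unit must be at least 1 bp long")
    return "STR" if len(unit) <= 6 else "VNTR"


def copy_number(sequence: str, unit: str) -> int:
    """Count perfect, consecutive copies of `unit` starting at the first base of `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be at least 1 bp long")
    count = 0
    position = 0
    # Step through the sequence one unit at a time while the unit keeps matching.
    while sequence.startswith(unit, position):
        count += 1
        position += len(unit)
    return count


if __name__ == "__main__":
    # Toy allele: a CAG repeat followed by non-repeat sequence (illustrative only).
    print(classify_tandem_repeat("CAG"), copy_number("CAGCAGCAGTT", "CAG"))
```

Two alleles at the same locus differ in the value returned by `copy_number`, and that copy number is the quantity most association analyses of TRs work with.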
Traditionally, the study of TRs has been largely confined to their role in monogenic disorders, diseases caused by variation in a single gene, where rare repeat expansions are the causal mutation. Recently, however, attention has shifted towards the influence of TRs on polygenic, or complex, traits: characteristics that are shaped by multiple genes and are common in the general population. Emerging research supports the possibility that TRs, because they are so highly polymorphic, are integral to understanding a wide range of complex traits.
This work positions TRs as potential causal variants for numerous traits, in contrast to their past exclusion from genome-wide association studies (GWAS) and sequencing analyses. Those studies focused predominantly on single nucleotide polymorphisms (SNPs) and overlooked the variability that TRs offer. Because TRs are highly multi-allelic and mutate rapidly, it is increasingly apparent that they could explain genetic influences on traits that SNPs cannot fully account for.
The integration of TRs into genetic studies has faced significant technical challenges, primarily due to difficulties in accurately genotyping them. Factors such as increased sequencing error rates at TR loci, the length of many TRs exceeding that of typical sequencing reads, and inconsistencies in repeat annotations have all posed obstacles. Moreover, the genotyping arrays used in many GWAS are not designed to directly genotype TRs.
However, advances in bioinformatics and sequencing technologies have begun to overcome these barriers. It is now feasible to genotype most TRs with high accuracy using short-read sequencing technologies, and for those that remain elusive, long-read sequencing is paving the way for novel discoveries. Remarkably, it is also possible to impute TR genotypes with a high degree of accuracy from SNP array data, opening up the potential to study TRs in a broader range of cohorts.
Despite these advances, testing TRs for association with phenotypes adds another layer of complexity and requires dedicated analytical frameworks. Unlike bi-allelic SNPs, which can be coded with a simple allele count, TRs are highly multi-allelic, and the analysis must account for this. Initial studies have tested linear associations between repeat copy number and phenotype, as well as associations with expansions beyond certain thresholds. While these methods have yielded important insights, alternative strategies that accommodate non-linear relationships between TR length and complex traits could uncover additional associations.
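The two approaches mentioned above, a linear test on repeat copy number and a test on expansions beyond a threshold, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular study: genotypes are assumed to be diploid pairs of copy numbers, the summed length of both alleles is used as the dosage, the genotypes, phenotype values, and threshold are invented toy data, and covariates and significance testing are omitted.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def length_dosage(genotypes):
    """Linear coding (assumed here): sum of the two alleles' repeat copy numbers."""
    return [a + b for a, b in genotypes]


def expansion_carrier(genotypes, threshold):
    """Threshold coding: 1 if the longer allele exceeds `threshold` copies, else 0."""
    return [1 if max(a, b) > threshold else 0 for a, b in genotypes]


if __name__ == "__main__":
    # Toy data: (allele 1, allele 2) copy numbers and a quantitative phenotype.
    genotypes = [(10, 10), (10, 12), (12, 14), (11, 30), (10, 35), (12, 12)]
    phenotype = [1.0, 1.1, 1.3, 2.0, 2.3, 1.2]

    # Effect per additional repeat copy under the linear model.
    slope_linear, _ = ols_fit(length_dosage(genotypes), phenotype)
    # With a 0/1 predictor, the slope is the difference in mean phenotype
    # between expansion carriers and non-carriers.
    slope_carrier, _ = ols_fit(expansion_carrier(genotypes, 20), phenotype)
    print(slope_linear, slope_carrier)
```

Both codings collapse a multi-allelic genotype into a single number, which is why neither captures a non-linear relationship between repeat length and phenotype, for example an effect that appears only within an intermediate length range.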
As bioinformatic methods and sequencing technologies continue to evolve, the role of TRs in genetic studies is set to expand dramatically. This burgeoning field promises not only to redefine our understanding of genetic variation but also to illuminate the genetic underpinnings of complex traits in unprecedented ways.