TY - JOUR
T1 - Accessing the Variability of Multicopy Genes in Complex Genomes using Unassembled Next-Generation Sequencing Reads
T2 - The Case of Trypanosoma cruzi Multigene Families
AU - Reis-Cunha, João Luís
AU - Coqueiro-Dos-Santos, Anderson
AU - Pimenta-Carvalho, Samuel Alexandre
AU - Marques, Larissa Pinheiro
AU - Rodrigues-Luiz, Gabriela F.
AU - Baptista, Rodrigo P.
AU - de Almeida, Laila Viana
AU - Medeiros Honorato, Nathan Ravi
AU - Lobo, Francisco Pereira
AU - Fraga, Vanessa Gomes
AU - da Cunha Galvão, Lucia Maria
AU - Bueno, Lilian Lacerda
AU - Fujiwara, Ricardo Toshio
AU - Cardoso, Mariana Santos
AU - Cerqueira, Gustavo Coutinho
AU - Bartholomeu, Daniella C.
N1 - Funding Information:
This work was supported by the Brazilian Federal Agency for Support of Graduate Education (CAPES), Brazilian Council for Scientific and Technological Development (CNPq), Minas Gerais State Agency for Research and Development (FAPEMIG), National Institute for Science and Technology in Vaccines (INCTV) and Pró-reitoria de Pesquisa, Universidade Federal de Minas Gerais. D.C.B., L.L.B., and R.T.F. are CNPq research fellows.
A.C.-d.-S., S.A.P.C., L.P.M., L.A.V., and M.S.C. received scholarships from CNPq, and N.R.M.H. received a scholarship from CAPES. We also thank Michele Matos for her technical support.
Publisher Copyright:
Copyright © 2022 Reis-Cunha et al.
PY - 2022/12
Y1 - 2022/12
AB - Repetitive elements cause assembly fragmentation in complex eukaryotic genomes, limiting the study of their variability. The genome of Trypanosoma cruzi, the parasite that causes Chagas disease, has a high repetitive content, including multigene families. Although many T. cruzi multigene families encode surface proteins that play pivotal roles in host-parasite interactions, their variability is currently underestimated, as their high repetitive content results in collapsed gene variants. To estimate sequence variability and copy number variation of multigene families, we developed a read-based approach that is independent of gene-specific read mapping and de novo assembly. This methodology was used to estimate the copy number and variability of MASP, TcMUC, and trans-sialidase (TS), the three largest T. cruzi multigene families, in 36 strains, including members of all six parasite discrete typing units (DTUs). We found that these three families present a specific pattern of variability and copy number among the distinct parasite DTUs. Inter-DTU hybrid strains presented a higher variability of these families, suggesting that maintaining a larger content of their members could be advantageous. In addition, in a chronic murine model and chronic Chagasic human patients, the immune response was focused on TS antigens, suggesting that targeting TS conserved sequences could be a potential avenue to improve diagnosis and vaccine design against Chagas disease. Finally, the proposed approach can be applied to study multicopy genes in any organism, opening new avenues to access sequence variability in complex genomes. IMPORTANCE Sequences that have several copies in a genome, such as multicopy-gene families, mobile elements, and microsatellites, are among the most challenging genomic segments to study. They are frequently underestimated in genome assemblies, hampering the correct assessment of these important players in genome evolution and adaptation. Here, we developed a new methodology to estimate variability and copy numbers of repetitive genomic regions and employed it to characterize the T. cruzi multigene families MASP, TcMUC, and trans-sialidase (TS), which are important virulence factors in this parasite. We showed that multigene families vary in sequence and content among the parasite's lineages, whereas hybrid strains have a higher sequence variability that could be advantageous to the parasite's survivability. By identifying conserved sequences within multigene families, we showed that the mammalian host immune response toward these multigene families is usually focused on the TS multigene family. These TS conserved and immunogenic peptides can be explored in future work as diagnostic targets or vaccine candidates for Chagas disease. Finally, this methodology can be easily applied to any organism of interest, which will aid in our understanding of complex genomic regions.
KW - MASP
KW - T. cruzi
KW - antigenicity
KW - complex genomes
KW - copy number variation
KW - mucins
KW - multicopy genes
KW - transsialidases
KW - variability
UR - http://www.scopus.com/inward/record.url?scp=85144497253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144497253&partnerID=8YFLogxK
U2 - 10.1128/mbio.02319-22
DO - 10.1128/mbio.02319-22
M3 - Article
C2 - 36264102
SN - 2161-2129
VL - 13
SP - e0231922
JO - mBio
JF - mBio
IS - 6
ER -