The Open Science Revolution in Complex Disease Genetics: An Integrated Pipeline from FASTQ to GWAS and Functional Pleiotropy

Kaira Cristina Peralis Tomaz¹, Felipe Ciamponi², Rafaela Pacheco¹, Jennifer Santos¹, Mariana Cavalheiro³, Fabio Patroni¹,⁴, Julio Vancini Bernardi¹,⁵, Murilo Meneghetti¹, Lorena⁶, Alexandre Rossi Paschoal⁶, Marcelo Mendes Brandão¹*

1 – Integrative and System Biology Laboratory (LaBIS), Universidade Estadual de Campinas (Unicamp), Campinas, Brazil; 2 – Suzano S.A. (FuturaGene – Biotech Division), Itapetininga, Brazil; 3 – Genomics for Climate Change Research Center, Universidade Estadual de Campinas, Campinas, SP, Brazil; 4 – Brazilian Centre for Research in Energy and Materials (CNPEM), Ministry for Science, Technology, and Innovations (MCTI), Campinas, Brazil; 5 – Laboratory of Enzymology and Molecular Biology of Microorganisms (LEBIMO), Universidade Estadual de Campinas, Campinas, SP, Brazil; 6 – Department of Computer Science, The Federal University of Technology – Paraná (UTFPR), Cornélio Procópio, Brazil

* Corresponding author

Polygenic diseases, such as Alzheimer's disease (AD) and type 2 diabetes (T2D), present complex challenges in medical genetics due to their non-Mendelian inheritance patterns, which involve multiple alleles and environmental factors. The global incidence of AD is increasing, and diabetes is contributing to a growing healthcare burden. Recently, it has been noted that cognitive dysfunction is a significant comorbidity of diabetes, indicating a potential link between AD and diabetes and suggesting possible genetic connections. Advances in identifying specific genetic variants and understanding their interactions are paving the way for personalised medicine and thereby enhancing treatment effectiveness.
Bioinformatics analysis of genomic data offers valuable insights into the genetic foundations of AD and diabetes, facilitating the development of targeted interventions. The genome-wide association study (GWAS) is an essential and well-established bioinformatics approach for analysing genetic associations with phenotypes. However, despite the abundance of publicly available data, bioinformatics analysis remains a bottleneck in biological studies.
Despite advances in genomics, bioinformatics workflows remain fragmented, limiting translational insights. Here, we present a fully open-source pipeline that streamlines polygenic disease analysis from raw sequencing data (FASTQ) to functional annotation. The pipeline addresses key challenges in reproducibility and scalability by combining established methods for variant discovery, GWAS, pleiotropy analysis, and regulatory annotation.
We also demonstrate how standardised, ethically curated datasets enable robust analyses of shared genetic mechanisms by leveraging the NIH’s Database of Genotypes and Phenotypes (dbGaP), a foundational open-science repository for genotype-phenotype studies. dbGaP’s dual-access model (open metadata vs. controlled individual-level data) allowed us to harmonise diverse cohorts while adhering to ethical guidelines, exemplifying how open data infrastructures can accelerate discoveries in comorbidities like AD-T2D.
Our pipeline's modular design enables researchers to bypass costly data generation phases and focus on hypothesis-driven exploration, democratising access to high-impact genomics. By aligning with open-science principles, this work mirrors transformative initiatives like the Human Genome Project, where shared data and tools spurred global collaboration. The integration of dbGaP datasets highlights the untapped potential of public repositories to fuel large-scale, reproducible studies, particularly in under-resourced settings. The pipeline advances genetic research and underscores the need for open-source, community-focused science in genomics, encouraging collaboration across fields and revealing the connections that underlie complex diseases.
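To make the GWAS stage mentioned above more concrete, the following is a minimal sketch of a single-variant allelic association test, assuming simple case/control allele counts. It is not the pipeline's implementation: the function name, the variant identifiers, and all counts are hypothetical, and real GWAS software typically fits regression models with covariates rather than a plain 2x2 test.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected == 0:
                # Monomorphic variant or empty group: no test is possible.
                return 0.0, 1.0
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts (case_alt, case_ref, ctrl_alt, ctrl_ref); not study data.
variants = {
    "rs_hypothetical_1": (300, 700, 200, 800),
    "rs_hypothetical_2": (250, 750, 250, 750),
}

results = {}
for name, counts in variants.items():
    chi2, p = allelic_chi_square(*counts)
    results[name] = (chi2, p, p < GENOME_WIDE_ALPHA)
    print(f"{name}: chi2={chi2:.3f}, p={p:.3g}, genome-wide significant={p < GENOME_WIDE_ALPHA}")
```

The strict threshold reflects the large number of variants tested in a genome-wide scan; a variant can show a clear difference in allele frequency and still fall short of it.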

Supplementary material

All supplementary material will be available at http://redu.unicamp.br

DeNSAS: De Novo Sequence Annotation System and the Critical Role of Functional Annotation in Modern Scientific Discovery

Marcelo Mendes Brandão1
1 Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil

This article examines the fundamental importance of functional annotation in contemporary scientific research within the context of increasingly affordable molecular sequencing technologies and improved information accessibility. I explore how functional annotation serves as the critical bridge between raw sequence data and meaningful biological insights, driving progress across multiple scientific disciplines and applications. The integration of comprehensive functional annotation approaches is revealed as essential for maximizing the value of expanding molecular datasets and accelerating scientific discovery in the post-genomic era. In this context I present DeNSAS (De Novo Sequence Annotation System), a comprehensive, reference-free pipeline designed for functional annotation of transcripts, proteins, and genes in genome assemblies that lack reference annotation. Developed in-house, this automated system integrates multiple public databases and computational tools to provide accurate functional characterization of sequences without requiring a reference genome.

 

Pathogenic bacteria in the aquatic environment surrounding pig farms and cities: a microbiome analysis from Paraná and Santa Catarina, Brazil

Micael Siegert Schimmunechᵃ, Carolina Deuttner Neumann Barrosoᵃ, Anderson Ferreira Da Cunhaᵇ, Marcelo Mendes Brandãoᶜ, Daniel Cruzᵈ, Karina Ishidaᵈ, Marcelo Beltrão Molentoᵃ,*
a Laboratory of Veterinary Clinical Parasitology, Department of Veterinary Medicine, Federal University of Paraná, Curitiba, PR, Brazil.
b Laboratory of Biochemistry and Applied Genetics, Department of Genetics and Evolution, Federal University of São Carlos, São Carlos, Brazil.
c Laboratory of Integrative and Systemic Biology, Center of Molecular Biology and Genetic Engineering, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil.
d World Animal Protection. São Paulo, SP, Brazil.

Pig farming has a significant environmental impact and generates substantial waste. Conventional wastewater treatment on farms often fails to eliminate pathogens, posing transmission risks. This study examined the microbiome of water samples (n=10) from pig farm environments (n=6) and urban areas (n=4) in Brazil. Pig farm samples were from Itambaracá, Paraná, and Chapecó, Santa Catarina, while urban samples were from Curitiba, Paraná, and Joinville, Santa Catarina. Pig farm samples included two each from before (UP) and after (DOWN) the pig farms, and two from a city river (CITY) near pig farms. Urban samples were taken from river sources (n=2) and near metropolitan perimeters (n=2). Proteobacteria dominated 90% of samples, with only one sample (P6) showing Firmicutes as predominant. Two sample-processing techniques used before DNA extraction were compared: a commercial kit and membrane vacuum filtration. The commercial kit detected E. coli in all samples and P. aeruginosa in 60%. The vacuum filtration technique found Burkholderiales (64%), Burkholderiaceae (81%), and Acidovorax (40%) to be predominant. The commercial kit showed higher Proteobacteria prevalence (>82%) and 100% Gammaproteobacteria. E. coli O157:H7 and Shigella flexneri 2a str. 301 were detected in all commercial kit samples and one vacuum filtration sample. Urban river source samples exhibited high bacterial diversity (615 genera, 736 species in P1; 640 genera, 802 species in P5) with a predominance of Alphaproteobacteria (52%). Samples from the rivers' final courses had reduced diversity (270 genera, 386 species in P2; 25 genera, 24 species in P6) and Gammaproteobacteria predominance. The comparison between the points before (UP) and after (DOWN) the pig farms did not reveal a noticeable influence of pig farming on the microbial composition of the samples.
Our results highlight the presence of pathogenic bacteria in the aquatic environment surrounding pig farms and emphasize the potential threat to human, animal, and environmental health.
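
Statements such as "Proteobacteria dominated 90% of samples" rest on a simple per-sample calculation. The sketch below illustrates one way to derive the dominant phylum per sample and the share of samples each phylum dominates, assuming a table of read counts per phylum. The sample names and counts are hypothetical and are not the study's data.

```python
from collections import Counter

# Hypothetical phylum-level read counts per sample; not the study's data.
samples = {
    "S1": {"Proteobacteria": 820, "Firmicutes": 90, "Bacteroidota": 90},
    "S2": {"Proteobacteria": 300, "Firmicutes": 600, "Bacteroidota": 100},
    "S3": {"Proteobacteria": 700, "Firmicutes": 200, "Bacteroidota": 100},
}


def relative_abundance(counts):
    """Convert raw read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def dominant_taxon(counts):
    """Return the most abundant taxon and its relative abundance."""
    rel = relative_abundance(counts)
    taxon = max(rel, key=rel.get)
    return taxon, rel[taxon]


# Dominant phylum in each sample.
dominant = {sample_id: dominant_taxon(c) for sample_id, c in samples.items()}

# Fraction of samples in which each phylum is the dominant one.
tally = Counter(taxon for taxon, _ in dominant.values())
share = {taxon: n / len(samples) for taxon, n in tally.items()}

for sample_id, (taxon, fraction) in dominant.items():
    print(f"{sample_id}: {taxon} ({fraction:.0%})")
print(share)
```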

 

Differential gene expression toward species of Aristolochia impairing the performance of the Troidini butterfly Battus polydamas

Karina L. Silva-Brandão¹, Julia Cabral Teresa², Clécio Fernando Klitzke, Marcelo M. Brandão³, José Roberto Trigo²

1 Leibniz Institute for the Analysis of Biodiversity Change, Museum of Nature Hamburg. Martin-Luther-King-Platz 3, 20421 Hamburg, Germany.
2 Departamento de Biologia Animal, Instituto de Biologia, Universidade Estadual de Campinas. Rua Monteiro Lobato 255, Campinas, SP, Brazil.
3 Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas. Av. Cândido Rondon, 400, Campinas, SP, Brazil.

The neotropical swallowtail butterfly Battus polydamas is a specialist on Aristolochia (Aristolochiaceae). These plants are rich in natural products such as terpenoids, lignans, β-phenylethylamines (βPEA), aporphine and isoquinoline alkaloids, as well as aristolochic acids (AAs). Larvae of B. polydamas sequester some of these compounds, such as AAs, and transfer them to adults through the pupae. AAs are considered defensive compounds against natural enemies; however, the amount of AA in the larval diet affects performance, which may indicate a cost of feeding on AA-containing leaves. In the present study, we evaluated the performance of B. polydamas larvae fed from the 1st instar through pupation on two host plants with different chemical composition, A. ringens (which has several diterpenes) and A. gigantea (which has acyclic monoterpenoids and sesquiterpenoids, but no diterpenoids or AAs). Differential gene expression in response to the different larval host plants was evaluated in three biological replicates of gut and fat body tissues of six 5th instar larvae. Larval survival differed significantly between the two host plants, being significantly higher on A. gigantea than on A. ringens (GLM binomial, likelihood ratio test, df = 1, χ2 = 76.082, P < 0.001). On A. gigantea, 55% of the larvae persisted until pupation, while none of the larvae feeding on A. ringens survived. A total of 807 unique contigs identified by their molecular function were upregulated in the gut of larvae fed on A. ringens, while 298 were downregulated. Downregulated contigs include genes encoding ribosomal proteins, superoxide dismutase, P450s, UGTs, glutathione S-transferase and many proteases. Upregulated contigs comprise genes encoding ribosomal proteins, protein farnesyltransferase, phosphomevalonate kinase, dolichyl-phosphate-mannose-protein mannosyltransferase 4 and O-glucosyltransferase (possibly involved in AA metabolization).
As expected, larvae of B. polydamas were strongly influenced by host plants exhibiting different concentrations of AAs, with higher concentrations leading to worse larval performance on key fitness components, such as life cycle performance attributes and larval survival. We suggest that there is a threshold of AA concentration in the host plant that larvae can tolerate, and above such a threshold the impact of plant secondary chemicals is no longer beneficial for the larvae, but negative, disrupting their detoxification mechanism.
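
The survival comparison reported above uses a binomial GLM with a likelihood ratio test on 1 degree of freedom. For a single two-level factor (host plant), that test reduces to comparing a pooled survival probability against one probability per group, which the sketch below computes directly. The group sizes are hypothetical, since the abstract gives only percentages, so the output is not expected to reproduce the reported χ2 = 76.082.

```python
import math


def binomial_lrt(survived_a, n_a, survived_b, n_b):
    """Likelihood ratio test (1 df) for equal survival probability in two groups."""

    def log_lik(k, n, p):
        # Binomial log-likelihood without the constant binomial coefficient.
        # Zero-count terms are skipped (limit of x * log(x) as x -> 0 is 0).
        ll = 0.0
        if k > 0:
            ll += k * math.log(p)
        if n - k > 0:
            ll += (n - k) * math.log(1 - p)
        return ll

    # Null model: one survival probability shared by both host plants.
    p_pooled = (survived_a + survived_b) / (n_a + n_b)
    ll_null = log_lik(survived_a, n_a, p_pooled) + log_lik(survived_b, n_b, p_pooled)

    # Alternative model: a separate survival probability per host plant.
    ll_alt = (log_lik(survived_a, n_a, survived_a / n_a)
              + log_lik(survived_b, n_b, survived_b / n_b))

    chi2 = max(2 * (ll_alt - ll_null), 0.0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical group sizes (60 larvae per host plant); only the 55% and 0%
# survival proportions come from the abstract.
chi2, p_value = binomial_lrt(survived_a=33, n_a=60, survived_b=0, n_b=60)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3g}")
```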

Permanent link - https://url.bioinfoguy.net/battus2024

Data availability

Please cite the data repository (https://doi.org/10.25824/redu/A3GVHV) if you use any of the data, scripts, or information provided in these files.

Comparison of bacterial diversity in wet- and dry-aged beef using traditional microbiology and next generation sequencing

Luiz Gustavo de Matosᵃ, Anderson Clayton da Silva Abreuᵃ, Vanessa Pereira Perez Alonsoᵉ, Juliano Leonel Gonçalvesᵃ, Maristela da Silva do Nascimentoᵇ, Sérgio Bertelli Pflanzer Jrᵇ, Jonatã Henrique Rezende-de-Souzaᵇ, Chiara Giniᶜ, Natália Faraj Muradᵈ, Marcelo Mendes Brandãoᵈ, Nathália Cristina Cirone Silvaᵃ

a Department of Food Science and Nutrition, School of Food Engineering (FEA), Universidade Estadual de Campinas (UNICAMP), 13083-862, Campinas, São Paulo, Brazil
b Department of Food Engineering and Technology, School of Food Engineering (FEA), Universidade Estadual de Campinas (UNICAMP), 13083-862, Campinas, São Paulo, Brazil
c Department of Veterinary Medicine, Università degli Studi di Milano, Lodi, Lombardia, Italy
d Laboratory of Systemic and Integrative Biology, Center of Molecular Biology and Genetic Engineering, Universidade Estadual de Campinas, Campinas, São Paulo, Brazil
e Independent researcher, Prague, Czechia

https://doi.org/10.1016/j.microb.2024.100035

Scientific interest in dry-aged meat has increased during the last few years. Unlike wet-aging, where meat is vacuum packed, the dry-aging process happens without packaging or protection, which may change the bacterial diversity of the meat and, in turn, its sensory characteristics. Different methods are used to identify the microbiota of meat. The most used ones are traditional techniques and next generation sequencing (NGS), which is widely used to identify bacteria present in diverse types of food. The aim of this study was to evaluate the bacterial diversity of dry-aged and wet-aged beef by traditional microbiological tests and NGS, to compare the bacterial diversity given by those methodologies, and to compare their specificity. Beef strip loins (n = 6) were collected directly from the slaughterhouse and transported to the laboratory. Samples were dry- or wet-aged for 20 and 34 days. Before and after aging, samples were analyzed by traditional microbiological analysis and NGS. Traditional microbiological tests showed a greater increase in total bacterial count in the wet-aged samples from day 0 to days 20 and 34, with psychrotrophic bacteria showing the greatest increase. In the dry-aged samples there was a decrease in the total bacterial count, with only molds and yeasts showing significant growth during aging. No E. coli growth was observed for any treatment. In the metagenomic analysis, eleven main bacterial genera were detected in the meat microbiota with a relative abundance higher than 2%, and the seven most abundant were Carnobacterium (47.9%), Pseudomonas (22.2%), Lactobacillus (5.4%), Romboutsia (2.8%), Leuconostoc (2.5%), Candidatus Nitrosotalea (2.4%) and Akkermansia (2.3%). Alpha diversity showed higher richness in the non-aged samples, whereas the wet-aged samples and the samples aged for 34 days showed the lowest richness.
In addition, beta diversity showed that the samples were highly similar across aging times but clustered differently by aging type. Dry-aged beef showed a higher presence of Pseudomonas sp., a group of microorganisms that grows under a wide range of conditions, whereas the wet-aged samples, owing to their controlled anaerobic environment, showed a higher presence of Carnobacterium. Traditional microbiology remains an important tool in food safety: because the microorganisms present in food are already well described, it clearly identified the main groups of bacteria and allows researchers and producers to check for them with the appropriate methodology. NGS reveals more groups, but it is still an expensive tool when the number of samples is considered. Although the two approaches yielded different data, both were effective at differentiating the microbiota of the beef samples, each within its own specificity.
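
The alpha diversity comparison above can be illustrated with two common measures, observed richness and the Shannon index. The abstract does not state which index was used beyond richness, so the Shannon index here is an assumption, and the genus-level read counts are hypothetical rather than taken from the study.

```python
import math


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon diversity index (natural log) from raw read counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Hypothetical genus-level read counts for two samples; not the study's data.
non_aged = [40, 30, 20, 5, 3, 2]
wet_aged = [90, 8, 2, 0, 0, 0]

for label, counts in (("non-aged", non_aged), ("wet-aged", wet_aged)):
    print(f"{label}: richness={richness(counts)}, shannon={shannon(counts):.3f}")
```

Richness counts only presence, while the Shannon index also reflects evenness, so a sample dominated by a single genus scores low on both in this toy example.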