Categories
Nevin Manimala Statistics

Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function

Bioinformatics. 2023 Jun 30;39(Supplement_1):i318-i325. doi: 10.1093/bioinformatics/btad208.

ABSTRACT

MOTIVATION: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of the proteins is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods to accurately predict protein function to fill the gap. Even though many methods have been developed to use protein sequences as input to predict function, far fewer methods leverage protein structures in protein function prediction because there was a lack of accurate protein structures for most proteins until recently.

RESULTS: We developed TransFun, a method using a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective methods to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions and sequence similarity-based predictions can further increase prediction accuracy.

AVAILABILITY AND IMPLEMENTATION: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.

PMID:37387145 | DOI:10.1093/bioinformatics/btad208

Deep statistical modelling of nanopore sequencing translocation times reveals latent non-B DNA structures

Bioinformatics. 2023 Jun 30;39(Supplement_1):i242-i251. doi: 10.1093/bioinformatics/btad220.

ABSTRACT

MOTIVATION: Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures.

RESULTS: We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable.

AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.

PMID:37387144 | DOI:10.1093/bioinformatics/btad220

Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes

Bioinformatics. 2023 Jun 30;39(Supplement_1):i260-i269. doi: 10.1093/bioinformatics/btad233.

ABSTRACT

MOTIVATION: Huge datasets containing whole-genome sequences of bacterial strains are now commonplace and represent a rich and important resource for modern genomic epidemiology and metagenomics. To make efficient use of these datasets, indexing data structures that are both scalable and provide rapid query throughput are paramount.

RESULTS: Here, we present Themisto, a scalable colored k-mer index designed for large collections of microbial reference genomes, that works for both short and long read data. Themisto indexes 179 thousand Salmonella enterica genomes in 9 h. The resulting index takes 142 gigabytes. In comparison, the best competing tools Metagraph and Bifrost were only able to index 11 000 genomes in the same time. In pseudoalignment, these other tools were either an order of magnitude slower than Themisto, or used an order of magnitude more memory. Themisto also offers superior pseudoalignment quality, achieving a higher recall than previous methods on Nanopore read sets.
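The general idea of a colored k-mer index and pseudoalignment can be illustrated with a toy sketch. This is not Themisto's actual data structure or algorithm: the function names (`build_index`, `pseudoalign`), the plain dictionary index, and the rule of intersecting color sets while skipping k-mers absent from the index are all assumptions made for illustration, and the sequences are made up.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genomes, k):
    """Map each k-mer to its color set (the genomes that contain it)."""
    index = defaultdict(set)
    for color, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(color)
    return dict(index)


def pseudoalign(read, index, k):
    """Intersect the color sets of the read's k-mers found in the index.

    Assumed simplification: k-mers missing from the index are skipped
    rather than forcing an empty result.
    """
    result = None
    for km in kmers(read, k):
        colors = index.get(km)
        if colors is None:
            continue
        result = set(colors) if result is None else result & colors
    return result or set()


# Hypothetical toy genomes, keyed by color.
genomes = {"g1": "ACGTACGTGA", "g2": "ACGTTTGACA", "g3": "TTGACAGGTA"}
index = build_index(genomes, k=4)
hits = pseudoalign("TTGACA", index, k=4)  # read compatible with g2 and g3
```

A production index replaces the dictionary with compressed structures so that memory stays manageable at the scale of hundreds of thousands of genomes; the sketch only shows the query semantics.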

AVAILABILITY AND IMPLEMENTATION: Themisto is available and documented as a C++ package at https://github.com/algbio/themisto under the GPLv2 license.

PMID:37387143 | DOI:10.1093/bioinformatics/btad233

SpatialSort: a Bayesian model for clustering and cell population annotation of spatial proteomics data

Bioinformatics. 2023 Jun 30;39(Supplement_1):i131-i139. doi: 10.1093/bioinformatics/btad242.

ABSTRACT

MOTIVATION: Recent advances in spatial proteomics technologies have enabled the profiling of dozens of proteins in thousands of single cells in situ. This has created the opportunity to move beyond quantifying the composition of cell types in tissue, and instead probe the spatial relationships between cells. However, most current methods for clustering data from these assays only consider the expression values of cells and ignore the spatial context. Furthermore, existing approaches do not account for prior information about the expected cell populations in a sample.

RESULTS: To address these shortcomings, we developed SpatialSort, a spatially aware Bayesian clustering approach that allows for the incorporation of prior biological knowledge. Our method is able to account for the affinities of cells of different types to neighbour in space, and by incorporating prior information about expected cell populations, it is able to simultaneously improve clustering accuracy and perform automated annotation of clusters. Using synthetic and real data, we show that by using spatial and prior information SpatialSort improves clustering accuracy. We also demonstrate how SpatialSort can perform label transfer between spatial and nonspatial modalities through the analysis of a real-world diffuse large B-cell lymphoma dataset.

AVAILABILITY AND IMPLEMENTATION: Source code is available on GitHub at: https://github.com/Roth-Lab/SpatialSort.

PMID:37387130 | DOI:10.1093/bioinformatics/btad242

Genome-wide scans for selective sweeps using convolutional neural networks

Bioinformatics. 2023 Jun 30;39(Supplement_1):i194-i203. doi: 10.1093/bioinformatics/btad265.

ABSTRACT

MOTIVATION: Recent methods for selective sweep detection cast the problem as a classification task and use summary statistics as features to capture region characteristics that are indicative of a selective sweep, thereby being sensitive to confounding factors. Furthermore, they are not designed to perform whole-genome scans or to estimate the extent of the genomic region that was affected by positive selection; both are required for identifying candidate genes and the time and strength of selection.

RESULTS: We present ASDEC (https://github.com/pephco/ASDEC), a neural-network-based framework that can scan whole genomes for selective sweeps. ASDEC achieves similar classification performance to other convolutional neural network-based classifiers that rely on summary statistics, but it is trained 10× faster and classifies genomic regions 5× faster by inferring region characteristics from the raw sequence data directly. Deploying ASDEC for genomic scans achieved up to 15.2× higher sensitivity, 19.4× higher success rates, and 4× higher detection accuracy than state-of-the-art methods. We used ASDEC to scan human chromosome 1 of the Yoruba population (1000Genomes project), identifying nine known candidate genes.

PMID:37387128 | DOI:10.1093/bioinformatics/btad265

Radiogenomics in NF2-Associated Schwannomatosis (Neurofibromatosis Type II): Exploratory Data Analysis

Stud Health Technol Inform. 2023 Jun 29;305:588-591. doi: 10.3233/SHTI230565.

ABSTRACT

Our pilot study aimed at exploratory radiogenomic data analysis in patients with NF2-associated schwannomatosis (formerly neurofibromatosis type II) to assess the potential of image biomarkers in this pathology. Fifty-three unrelated patients (37 women [69.8%]; mean age 30.2 ± 11.2 years) were enrolled in the study. First-order, gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), and geometry-based statistics were calculated (3718 features per region of interest). We demonstrated imaging patterns and statistically significant differences in radiomic features potentially related to the genotype and clinical phenotype of the disease. However, the clinical utility of these patterns should be further evaluated. The study was supported by the Russian Science Foundation grant 21-15-00262.
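As a rough illustration of one feature family named above, the following sketch computes a gray-level co-occurrence matrix and its contrast for a tiny image. It is not the study's feature-extraction pipeline: the function names, the single horizontal offset, the non-symmetric counting, and the 4-level toy image are all assumptions chosen to keep the example small.

```python
def glcm(image, levels, dr=0, dc=1):
    """Count co-occurrences of gray levels at offset (dr, dc).

    counts[i][j] is the number of pixel pairs where the first pixel has
    level i and the pixel at the given offset has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over all level pairs."""
    total = sum(sum(row) for row in counts)
    n = len(counts)
    weighted = sum(counts[i][j] * (i - j) ** 2
                   for i in range(n) for j in range(n))
    return weighted / total


# Hypothetical 4x4 image with gray levels 0..3 (not study data).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(toy_image, levels=4)
toy_contrast = contrast(counts)
```

Radiomics toolkits typically aggregate such matrices over several offsets and directions and derive many more statistics from them, which is how feature counts in the thousands per region of interest arise.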

PMID:37387099 | DOI:10.3233/SHTI230565

Predicting In-Hospital Mortality During the COVID-19 Pandemic in Patients with Heart Failure: A Single-Center Exploratory Study

Stud Health Technol Inform. 2023 Jun 29;305:487-490. doi: 10.3233/SHTI230539.

ABSTRACT

The aim of this study was to investigate whether exposure to the pandemic was associated with increased in-hospital mortality in patients with heart failure. We collected data from patients hospitalized between 2019 and 2020 and assessed the likelihood of in-hospital death. Although the positive association between exposure to the COVID period and increased in-hospital mortality was not statistically significant, this may underscore other factors that may influence mortality. Our study was designed to contribute to a better understanding of the impact of the pandemic on in-hospital mortality and to identify potential areas for intervention in patient care.

PMID:37387073 | DOI:10.3233/SHTI230539

COVID-19 in Eye Surgery: The Case of a University Hospital

Stud Health Technol Inform. 2023 Jun 29;305:479-482. doi: 10.3233/SHTI230537.

ABSTRACT

The coronavirus epidemic quickly became a global health threat. The ophthalmology department, like all other departments, has adopted resource management and personnel adjustment measures. The aim of this work was to describe the impact of COVID-19 on the Ophthalmology Department of the University Hospital “Federico II” of Naples. Logistic regression was used to compare the pandemic period with the previous period by analyzing patient features. The analysis showed a decrease in the number of accesses and a reduction in the length of stay (LOS); the statistically dependent variables were LOS, discharge procedure, and admission procedure.

PMID:37387071 | DOI:10.3233/SHTI230537

Information Messages Related to Mental Health Status Among Caregivers in Rural of Thailand

Stud Health Technol Inform. 2023 Jun 29;305:475-476. doi: 10.3233/SHTI230535.

ABSTRACT

This cross-sectional study aimed to explore mental health status and the relationship between socioeconomic background and mean scores of mental health variables among caregivers (CGs) in Maha Sarakham province, Northeast Thailand. A total of 402 CGs were recruited from 32 sub-districts in 13 districts and interviewed using an interview form. Data were analysed using descriptive statistics and the Chi-square test for the relationship between socioeconomic background and the level of caregivers' mental health status. The results showed that 99.77% were female, with an average age of 49.89 ± 8.14 years (range 23-75); they spent an average of 3 days per week looking after the elderly and had 1-4 years of work experience (mean = 3.27 ± 1.66 years). Over 59% had an income lower than 150 USD. The gender of the CG was significantly associated with mental health status (MHS) (p = 0.003). Although the other variables were not statistically significant, all variables indicated a poor level of mental health status. Therefore, stakeholders involved with CGs should be concerned with reducing their burnout, regardless of compensation, and with developing the potential of family caregivers or young carers to help the elderly in the community.
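For readers unfamiliar with the test mentioned above, here is a minimal sketch of a Pearson Chi-square test of independence on a 2x2 table. The counts are hypothetical and are not the study's data; the function name is made up, no continuity correction is applied, and the p-value formula used is valid only for one degree of freedom (a 2x2 table).

```python
import math


def chi_square_2x2(table):
    """Pearson Chi-square statistic and p-value for a 2x2 table.

    Expected counts come from the row and column totals. The p-value uses
    the identity P(X > x) = erfc(sqrt(x / 2)) for a Chi-square variable
    with 1 degree of freedom, so this function must not be used for
    larger tables.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are two groups, columns are good/poor status.
toy_table = [[30, 10], [20, 40]]
stat, p_value = chi_square_2x2(toy_table)
```

With very small expected counts in any cell, an exact test is usually preferred over the Chi-square approximation.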

PMID:37387069 | DOI:10.3233/SHTI230535

Stroke Patients’ Management During COVID-19 Pandemic: Results from the Sun4Patients Web-Based Registry

Stud Health Technol Inform. 2023 Jun 29;305:464-468. doi: 10.3233/SHTI230532.

ABSTRACT

The Covid-19 pandemic has influenced stroke care in different ways. Recent reports demonstrated a sharp decline in acute stroke admissions worldwide. Even for patients presenting to dedicated healthcare services, management at the acute phase may be sub-optimal. On the other hand, Greece has been praised for the early initiation of restriction measures, which were associated with a ‘milder’ surge of SARS-CoV-2 infection.

METHODS: Data were derived from a prospective cohort multicenter registry. The study population consisted of first-ever acute stroke patients, hemorrhagic or ischemic, admitted within 48 hours of symptom onset to seven national healthcare system (NHS) and University hospitals in Greece. Two time periods were considered, defined as the “before Covid-19” (15/12/2019-15/02/2020) and “during Covid-19” (16/02/2020-15/04/2020) eras. Acute stroke admission characteristics were compared statistically between the two periods.

RESULTS: This exploratory analysis of 112 consecutive patients showed a reduction of acute stroke admissions by 40% during the Covid-19 period. No significant differences were observed regarding stroke severity, risk factor profile, and baseline characteristics between patients admitted before and during the Covid-19 pandemic period. There was a significant delay from symptom onset to CT scan during the Covid-19 era compared with the period before the pandemic reached Greece (p=0.03).

CONCLUSIONS: The rate of acute stroke admissions was reduced by 40% during the Covid-19 pandemic. Further research is needed to clarify whether the reduction in stroke volume is actual or not and to identify the reasons underlying the paradox.

PMID:37387066 | DOI:10.3233/SHTI230532