Categories
Nevin Manimala Statistics

Improved biomarker discovery through a plot twist in transcriptomic data analysis

BMC Biol. 2022 Sep 24;20(1):208. doi: 10.1186/s12915-022-01398-w.

ABSTRACT

BACKGROUND: Transcriptomic analysis is crucial for understanding the functional elements of the genome, with the classic method consisting of screening transcriptomics datasets for differentially expressed genes (DEGs). Additionally, since 2005, weighted gene co-expression network analysis (WGCNA) has emerged as a powerful method to explore relationships between genes. However, an approach combining both methods, i.e., filtering the transcriptome dataset by DEGs or other criteria, followed by WGCNA (DEGs + WGCNA), has become common. This is of concern because such an approach can affect the resulting underlying architecture of the network under analysis and lead to wrong conclusions. Here, we explore a plot twist to transcriptome data analysis: applying WGCNA to exploit entire datasets without affecting the topology of the network, followed by the strength and relative simplicity of DEG analysis (WGCNA + DEGs). We tested WGCNA + DEGs against DEGs + WGCNA on publicly available transcriptomics data in one of the most transcriptomically complex tissues and delicate processes: vertebrate gonads undergoing sex differentiation. We further validate the general applicability of our approach through analysis of datasets from three distinct model systems: European sea bass, mouse, and human.
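
The concern about network topology can be illustrated with a toy calculation. The sketch below is not WGCNA itself (an R package) and is not the authors' pipeline; it assumes an unsigned soft-threshold adjacency of the form |cor|^beta, with beta = 6 chosen only for illustration, and uses made-up expression values for four hypothetical genes. It shows that a gene's connectivity depends on which other genes are still in the dataset when the network is built.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def connectivity(expr: Dict[str, List[float]], beta: int = 6) -> Dict[str, float]:
    """Sum of soft-thresholded adjacencies |cor|^beta to all other genes."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k


# Hypothetical expression values across six samples (not from the study).
full = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # perfectly co-expressed with A
    "C": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # perfectly anti-correlated with A
    "D": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],     # loosely related to A
}
# Pretend only A and D passed a DEG filter before the network was built.
filtered = {g: full[g] for g in ("A", "D")}

k_full = connectivity(full)
k_filtered = connectivity(filtered)
```

In this toy case gene A loses its two strongest neighbours when the filter is applied first, so `k_filtered["A"]` is smaller than `k_full["A"]` by exactly the adjacency contributed by B and C.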

RESULTS: In all cases, WGCNA + DEGs clearly outperformed DEGs + WGCNA. First, the network model fit and node connectivity measures and other network statistics improved. The gene lists filtered by each method were different, the number of modules associated with the trait of interest and key genes retained increased, and GO terms of biological processes provided a more nuanced representation of the biological question under consideration. Lastly, WGCNA + DEGs facilitated biomarker discovery.

CONCLUSIONS: We propose that building a co-expression network from an entire dataset, and only thereafter filtering by DEGs, should be the method to use in transcriptomic studies, regardless of biological system, species, or question being considered.

PMID:36153614 | DOI:10.1186/s12915-022-01398-w

Spatial and temporal parasite dynamics: microhabitat preferences and infection progression of two co-infecting gyrodactylids

Parasit Vectors. 2022 Sep 24;15(1):336. doi: 10.1186/s13071-022-05471-9.

ABSTRACT

BACKGROUND: Mathematical modelling of host-parasite systems has seen tremendous developments and broad applications in theoretical and applied ecology. The current study focuses on the infection dynamics of a gyrodactylid-fish system. Previous experimental studies have explored the infrapopulation dynamics of co-infecting ectoparasites, Gyrodactylus turnbulli and G. bullatarudis, on their fish host, Poecilia reticulata, but questions remain about parasite microhabitat preferences, host survival and parasite virulence over time. Here, we use more advanced statistics and a sophisticated mathematical model to investigate these questions based on empirical data to add to our understanding of this gyrodactylid-fish system.

METHODS: A rank-based multivariate Kruskal-Wallis test coupled with its post-hoc tests and graphical summaries were used to investigate the spatial and temporal parasite distribution of different gyrodactylid strains across different host populations. By adapting a multi-state Markov model that extends the standard survival models, we improved previous estimates of survival probabilities. Finally, we quantified parasite virulence of three different strains as a function of host mortality and recovery across different fish stocks and sexes.
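
As a rough illustration of what a multi-state Markov model adds to a standard survival model, the sketch below assumes the simplest possible case: one transient state (infected) and two absorbing states (recovered, dead) with constant transition intensities. The rates are hypothetical and the structure is an assumption; the study's fitted model may have more states and covariates.

```python
import math
from typing import Tuple


def sojourn_and_fates(q_recover: float, q_death: float) -> Tuple[float, float, float]:
    """Mean time spent infected and the probabilities of each eventual fate."""
    total = q_recover + q_death
    mean_sojourn = 1.0 / total          # expected days in the infected state
    p_recover = q_recover / total       # probability the host eventually recovers
    p_death = q_death / total           # probability the host eventually dies
    return mean_sojourn, p_recover, p_death


def state_probabilities(q_recover: float, q_death: float, t: float) -> Tuple[float, float, float]:
    """Probability of being infected, recovered, or dead at time t (days)."""
    total = q_recover + q_death
    p_infected = math.exp(-total * t)
    p_recovered = (q_recover / total) * (1.0 - p_infected)
    p_dead = (q_death / total) * (1.0 - p_infected)
    return p_infected, p_recovered, p_dead


# Hypothetical daily intensities (not estimates from the study).
mean_days, p_rec, p_die = sojourn_and_fates(q_recover=0.06, q_death=0.04)
probs_day7 = state_probabilities(0.06, 0.04, t=7.0)
```

Unlike a two-state survival model, this formulation yields the time spent infected and the competing chances of recovery and death from the same set of intensities.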

RESULTS: We confirmed that the captive-bred G. turnbulli and wild G. bullatarudis strains preferred the caudal and rostral regions respectively across different fish stocks; however, the wild G. turnbulli strain changed microhabitat preference over time, indicating microhabitat preference of gyrodactylids is host and time dependent. The average time of host infection before recovery or death was between 6 and 14 days. For this gyrodactylid-fish system, a longer period of host infection led to a higher chance of host recovery. Parasite-related mortalities are host, sex and time dependent, whereas fish size is confirmed to be the key determinant of host recovery.

CONCLUSION: From existing empirical data, we provided new insights into the gyrodactylid-fish system. This study could inform the modelling of other host-parasite interactions where the entire infection history of the host is of interest by adapting multi-state Markov models. Such models are under-utilised in parasitological studies and could be expanded to estimate relevant epidemiological traits concerning parasite virulence and host survival.

PMID:36153606 | DOI:10.1186/s13071-022-05471-9

Glut1 deficiency syndrome throughout life: clinical phenotypes, intelligence, life achievements and quality of life in familial cases

Orphanet J Rare Dis. 2022 Sep 24;17(1):365. doi: 10.1186/s13023-022-02513-4.

ABSTRACT

BACKGROUND: Glut1 deficiency syndrome (Glut1-DS) is a rare metabolic encephalopathy. Familial forms are poorly investigated, and no previous studies have explored aspects of Glut1-DS over the course of life: clinical pictures, intelligence, life achievements, and quality of life in adulthood. Clinical, biochemical and genetic data in a cohort of familial Glut1-DS cases were collected from medical records. Intelligence was assessed using Raven’s Standard Progressive Matrices and Raven’s Colored Progressive Matrices in adults and children, respectively. An ad hoc interview focusing on life achievements and the World Health Organization Quality of Life Questionnaire were administered to adult subjects.

RESULTS: The clinical picture in adults was characterized by paroxysmal exercise-induced dyskinesia (PED) (80%), fatigue (60%), low intelligence (60%), epilepsy (50%), and migraine (50%). However, 20% of the adults had higher-than-average intelligence. Quality of Life (QoL) seemed unrelated to the presence of PED or fatigue in adulthood. An association of potential clinical relevance, albeit not statistically significant, was found between intelligence and QoL. The phenotype of familial Glut1-DS in children was characterized by epilepsy (83.3%), intellectual disability (50%), and PED (33%).

CONCLUSION: The phenotype of familial Glut1-DS shows age-related differences: epilepsy predominates in childhood; PED and fatigue, followed by epilepsy and migraine, characterize the condition in adulthood. Some adults with familial Glut1-DS may lead regular and fulfilling lives, enjoying the same QoL as unaffected individuals. The disorder tends to worsen from generation to generation, with new and more severe symptoms arising within the same family. Epigenetic studies might be useful to assess the phenotypic variability in Glut1-DS.

PMID:36153584 | DOI:10.1186/s13023-022-02513-4

Machine Learning Algorithms for understanding the determinants of under-five Mortality

BioData Min. 2022 Sep 24;15(1):20. doi: 10.1186/s13040-022-00308-8.

ABSTRACT

BACKGROUND: Under-five mortality is a matter of serious concern for child health as well as the social development of any country. The paper aimed to find the accuracy of machine learning models in predicting under-five mortality and identify the most significant factors associated with under-five mortality.

METHOD: The data were taken from the National Family Health Survey (NFHS-IV) of Uttar Pradesh. First, we used multivariate logistic regression due to its capability for identifying the important factors, then we used machine learning techniques such as decision tree, random forest, Naïve Bayes, K-nearest neighbor (KNN), logistic regression, support vector machine (SVM), neural network, and ridge classifier. Each model’s performance was checked by a confusion matrix, accuracy, precision, recall, F1 score, Cohen’s Kappa, and area under the receiver operating characteristic curve (AUROC). Information gain rank was used to find the important factors for under-five mortality. Data analysis was performed using STATA-16.0, Python 3.3, and IBM SPSS Statistics for Windows, Version 27.0.
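
Most of the evaluation measures listed above follow directly from the four cells of a binary confusion matrix. The sketch below is a minimal stdlib-only illustration with hypothetical counts (not the study's data); it is meant to show how high accuracy can coexist with modest precision when the outcome is rare.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    kappa: float


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> BinaryMetrics:
    """Compute common metrics from the cells of a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1, kappa)


# Hypothetical counts for a rare outcome: 50 true cases among 1000 children.
m = binary_metrics(tp=40, fp=60, fn=10, tn=890)
```

With these made-up counts the accuracy is 0.93 while precision is only 0.40, which is why the abstract's authors report several measures rather than accuracy alone.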

RESULT: The neural network was the best predictive model for under-five mortality when compared with the other models, with accuracy of 95.29% to 95.96%, recall of 71.51% to 81.03%, precision of 36.64% to 51.83%, F1 score of 50.46% to 62.68%, Cohen’s Kappa of 0.48 to 0.60, AUROC of 93.51% to 96.22%, and precision-recall curve of 99.52% to 99.73%. Logistic regression also performed well in predicting under-five mortality, with accuracy of 94% to 95%, AUROC of 93.4% to 94.8%, and precision-recall curve of 99.5% to 99.6%. The number of living children, survival time, wealth index, child size at birth, birth in the last five years, the total number of children ever born, mother’s education level, and birth order were identified as important factors influencing under-five mortality.

CONCLUSION: The neural network was a better predictive model than the other machine learning models for under-five mortality, but logistic regression also showed good results. These models may be helpful for the analysis of high-dimensional data in health research.

PMID:36153553 | DOI:10.1186/s13040-022-00308-8

Which patients bring the most costs for hospital? A study on the cost determinants among COVID-19 patients in Iran

Cost Eff Resour Alloc. 2022 Sep 24;20(1):52. doi: 10.1186/s12962-022-00386-9.

ABSTRACT

BACKGROUND: Accurate information on the cost determinants in COVID-19 patients could provide policymakers with a valuable planning tool for dealing with future COVID-19 crises, especially in health systems with limited resources.

OBJECTIVES: This study aimed to determine the factors affecting direct medical cost of COVID-19 patients in Hamadan, the west of Iran.

METHODS: This study considered 909 confirmed COVID-19 patients with a positive real-time reverse-transcriptase polymerase-chain-reaction test who were hospitalized from 1 March to 31 January 2021 in Farshchian (Sina) hospital in Hamadan, Iran. A checklist was used to assess the relationship of demographic characteristics, clinical presentation, medical laboratory findings and the length of hospitalization to the direct hospitalization costs in two groups of patients (hospitalization ≤ 9 days and > 9 days). Statistical analysis was performed using the chi-square test, the median test and a multivariable quantile regression model at the 0.05 significance level with Stata 14.
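
Quantile (here, median) regression is a natural choice for skewed cost data because it minimizes the so-called pinball loss rather than squared error. The sketch below is only an intercept-only illustration with hypothetical costs, not the multivariable model fitted in the study: it checks that the value minimizing the pinball loss at tau = 0.5 is the median, which is far less sensitive to one very expensive stay than the mean.

```python
from typing import List


def pinball_loss(y: List[float], prediction: float, tau: float = 0.5) -> float:
    """Average pinball (quantile) loss of a constant prediction at quantile tau."""
    total = 0.0
    for value in y:
        residual = value - prediction
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(y)


# Hypothetical hospitalization costs in dollars, with one extreme stay.
costs = [20.0, 95.0, 130.0, 140.0, 2400.0]

# The loss is piecewise linear, so a minimizer can be found among the data points.
best = min(costs, key=lambda c: pinball_loss(costs, c))
mean_cost = sum(costs) / len(costs)
```

Here `best` is the sample median (130.0), whereas `mean_cost` is pulled to 557.0 by the single outlier.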

RESULTS: The median cost of hospitalization was 134.48 dollars overall (range: 19.19-2397.54), and 95.87 dollars (range: 19.19-856.63) and 507.30 dollars (range: 68.94-2397.54) in patients hospitalized for ≤ 9 days and > 9 days, respectively. The adjusted estimates showed that, in patients hospitalized for 9 days or fewer, history of cardiovascular disease, pulmonary wheezing, SpO2 lower than 90%, positive CRP, LDH higher than 942 U/L, Na lower than 136 mEq/L, lymphocytes lower than 20% and ICU experience had a significant positive relationship with the median cost. Moreover, in patients hospitalized for more than 9 days, history of cardiovascular disease and ICU experience had a statistically significant positive association with the median hospitalization cost, whereas age older than 60 years and WBC lower than 4.5 mg/dL had a statistically significant negative association.

CONCLUSION: As the length of hospital stay, which can be associated with the severity of the disease, increases, health systems become more vulnerable in terms of resource utilization, which in turn can challenge their responsiveness and readiness to meet the specialized treatment needs of individuals.

PMID:36153533 | DOI:10.1186/s12962-022-00386-9

Genomic prediction with whole-genome sequence data in intensely selected pig lines

Genet Sel Evol. 2022 Sep 24;54(1):65. doi: 10.1186/s12711-022-00756-0.

ABSTRACT

BACKGROUND: Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage.

METHODS: We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests.

RESULTS: The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected.

CONCLUSIONS: Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.

PMID:36153511 | DOI:10.1186/s12711-022-00756-0

The effect of amine-free initiator system and polymerization type on long-term color stability of resin cements: an in-vitro study

BMC Oral Health. 2022 Sep 24;22(1):426. doi: 10.1186/s12903-022-02456-z.

ABSTRACT

BACKGROUND: This in vitro study evaluated the effect of amine-free initiator system and polymerization type on long-term color change of amine-free light-cure and dual-cure resin cements.

METHODS: Sixty disk-shaped specimens (10 × 1 mm) were prepared from six different amine-free resin cements: NX3 Nexus light-cure (LC) and dual-cure (DC), Variolink Veneer (LC) and Variolink II (DC), RelyX Veneer (LC) and RelyX Ultimate (DC). A feldspathic porcelain specimen (12 × 14 × 0.8 mm) was obtained from a CAD/CAM block (Cerec Blocks; Sirona Dental Systems GmbH, Bensheim, Germany) for color testing. The feldspathic specimen was placed on the resin cement disk and all measurements were performed without cementation. A spectrophotometer was used for color measurements. Specimens were subjected to thermal aging (5 °C and 55 °C; 5000 and 20,000 cycles). Specific color coordinate differences (ΔL, Δa, and Δb) and the total color differences (ΔE00) were calculated after immersion in distilled water for different periods. Normality of data distribution was tested by using the Kolmogorov-Smirnov test. Data were statistically analyzed in a repeated-measures model, using multivariate tests and Tukey’s multiple comparison tests at a significance level of p < 0.05.

RESULTS: ΔE00 values of resin cements were significantly influenced by cycle periods (p < 0.05). The highest long-term ΔE00 values were obtained for NX3 (DC) (3.49 ± 0.87) and the lowest for NX3 (LC) (1.41 ± 0.81). The NX3 (LC), Variolink (DC), and RelyX (LC) resin cements showed clinically acceptable color change after long-term aging (ΔE00 < 1.8).

CONCLUSION: Light-cure resin cements should be preferred for long-term color stability of full ceramic restorations.

PMID:36153495 | DOI:10.1186/s12903-022-02456-z

Sensitivity analyses for data missing at random versus missing not at random using latent growth modelling: a practical guide for randomised controlled trials

BMC Med Res Methodol. 2022 Sep 24;22(1):250. doi: 10.1186/s12874-022-01727-1.

ABSTRACT

BACKGROUND: Missing data are ubiquitous in randomised controlled trials. Although sensitivity analyses for different missing data mechanisms (missing at random vs. missing not at random) are widely recommended, they are rarely conducted in practice. The aim of the present study was to demonstrate sensitivity analyses for different assumptions regarding the missing data mechanism for randomised controlled trials using latent growth modelling (LGM).

METHODS: Data from a randomised controlled brief alcohol intervention trial were used. The sample included 1646 adults (56% female; mean age = 31.0 years) from the general population who had received up to three individualized alcohol feedback letters or assessment-only. Follow-up interviews were conducted after 12 and 36 months via telephone. The main outcome for the analysis was change in alcohol use over time. A three-step LGM approach was used. First, evidence about the process that generated the missing data was accumulated by analysing the extent of missing values in both study conditions, missing data patterns, and baseline variables that predicted participation in the two follow-up assessments using logistic regression. Second, growth models were calculated to analyse intervention effects over time. These models assumed that data were missing at random and applied full-information maximum likelihood estimation. Third, the findings were safeguarded by incorporating model components to account for the possibility that data were missing not at random. For that purpose, Diggle-Kenward selection, Wu-Carroll shared parameter and pattern mixture models were implemented.
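
The logic of the third step can be conveyed with a much simpler stand-in than the latent growth models used in the study. The sketch below assumes a delta-adjusted pattern-mixture calculation on a single follow-up outcome: participants lost to follow-up are assumed to differ from observed participants in the same arm by a fixed amount `delta`, and the treatment effect is recomputed over a range of `delta` values. All numbers and names are hypothetical.

```python
from typing import Sequence


def delta_adjusted_mean(observed: Sequence[float], n_missing: int, delta: float) -> float:
    """Arm mean if the unobserved participants average (observed mean + delta)."""
    n_obs = len(observed)
    mean_obs = sum(observed) / n_obs
    mean_missing = mean_obs + delta
    return (n_obs * mean_obs + n_missing * mean_missing) / (n_obs + n_missing)


# Hypothetical drinks per week at follow-up; more dropout in the intervention arm.
control = [10.0, 12.0, 14.0]        # 1 further participant missing
intervention = [9.0, 11.0, 13.0]    # 3 further participants missing


def effect(delta: float) -> float:
    """Intervention minus control mean under a given departure from ignorability."""
    return (delta_adjusted_mean(intervention, 3, delta)
            - delta_adjusted_mean(control, 1, delta))


# delta = 0 reproduces the answer obtained when missingness is ignorable.
effects = {delta: effect(delta) for delta in (0.0, 2.0, 4.0)}
```

In this toy example the apparent benefit of one drink per week at `delta = 0` vanishes at `delta = 4`, which is the kind of tipping point a sensitivity analysis is meant to expose.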

RESULTS: Although the true data generating process remained unknown, the evidence was unequivocal: both the intervention and control group reduced their alcohol use over time, but no significant group differences emerged. There was no clear evidence for intervention efficacy, either in the growth models that assumed the data to be missing at random or in those that assumed the data to be missing not at random.

CONCLUSION: The illustrated approach allows the assessment of how sensitive conclusions about the efficacy of an intervention are to different assumptions regarding the missing data mechanism. For researchers familiar with LGM, it is a valuable statistical supplement to safeguard their findings against the possibility of nonignorable missingness.

TRIAL REGISTRATION: The PRINT trial was prospectively registered at the German Clinical Trials Register (DRKS00014274, date of registration: 12th March 2018).

PMID:36153489 | DOI:10.1186/s12874-022-01727-1

Quality of life of cutaneous leishmaniasis suspected patients in the Ecuadorian Pacific and Amazon regions: a cross sectional study

BMC Infect Dis. 2022 Sep 24;22(1):748. doi: 10.1186/s12879-022-07733-4.

ABSTRACT

BACKGROUND: Yearly, up to 1 million patients worldwide suffer from cutaneous leishmaniasis (CL). In Ecuador, CL affects an estimated 5000 patients annually. CL leads to reduced Health Related Quality of Life (HRQL) as a result of stigma in the Asian and Mediterranean contexts, but research is lacking for Ecuador. The objective of this study was to explore the influence of CL suspected lesions on the quality of life of patients in the Pacific and Amazon regions.

METHODS: Patients for this study were included in the Amazonian Napo, Pastaza, and Morona Santiago provinces and the Pacific region of the Pichincha province. Participating centers offered free of charge CL treatment. All patients suspected of CL and referred for a cutaneous smear slide microscopy examination were eligible. This study applied the Skindex-29 questionnaire, a generic tool to measure HRQL in patients with skin diseases. All statistical analysis was done with SPSS Statistics version 28.

RESULTS: The Skindex-29 questionnaire was completed adequately by 279 patients who were included in this study. All patient groups from the Amazon scored significantly (P < 0.01) higher (indicating worse HRQL) on all the dimensions of the Skindex-29 questionnaire than Mestizo patients from the Pacific region. The percentage of patients with health seeking delay of less than a month was significantly (P < 0.01) lower in the Amazon region (38%) than in the Pacific (66%).

CONCLUSIONS: The present study revealed that the influence of suspected CL lesions on the HRQL of patients in the Ecuadorian Amazon and Pacific depends on the geographic region more than on patient characteristics such as gender, age, number of lesions, lesion type, location of lesions, health seeking delay, or posterior confirmation of the Leishmania parasite. The health seeking delay in the Amazon might result from a lack of health infrastructure or related stigma. Together, the impaired HRQL and prolonged health seeking delay in the Amazon lead to prolonged suffering and a worse health outcome. Determinants of health seeking delay should be clarified in future studies and CL case finding must be improved. Moreover, HRQL analysis in other CL endemic regions could improve local health management.

PMID:36153487 | DOI:10.1186/s12879-022-07733-4

Effect of thoracic radiotherapy dose on the prognosis of advanced lung adenocarcinoma harboring EGFR mutations

BMC Cancer. 2022 Sep 24;22(1):1012. doi: 10.1186/s12885-022-10095-4.

ABSTRACT

BACKGROUND: The aim of this study was to investigate the effects of different thoracic radiotherapy doses on overall survival (OS) and the incidence of radiation pneumonia, which may provide a basis for optimizing the comprehensive treatment of patients with advanced EGFR-mutant lung adenocarcinoma.

METHODS: Data from 111 patients with EGFR-mutant lung adenocarcinoma who received thoracic radiotherapy were included in this retrospective study. Overall survival (OS) was the primary endpoint of the study. The Kaplan-Meier method was used for the comparison of OS. The Cox proportional-hazard model was used for the univariate and multivariate analyses to determine the prognostic factors related to the disease.
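
For readers unfamiliar with the Kaplan-Meier method, the following stdlib-only sketch shows how a survival curve and a median survival time are obtained from follow-up times with censoring. The follow-up data are hypothetical, and the function names are illustrative rather than taken from any package.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is 0.5 or lower; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months (1 = died, 0 = censored).
months = [6, 12, 12, 20, 29, 34, 51, 60]
died = [1, 1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(months, died)
median_os = median_survival(curve)
```

Curves computed this way for each dose group are what a log-rank test then compares.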

RESULTS: The median OS (mOS) of patients who received radiotherapy doses of less than 50 Gy, 50-60 Gy (including 50 Gy), and 60 Gy or more was 29.1 months, 34.4 months, and 51.0 months, respectively (log-rank P = 0.011). Although the trend suggested more pneumonia cases with increasing radiation doses, the difference was not statistically significant (χ2 = 1.331; P = 0.514). The multivariate analysis showed that the thoracic radiotherapy dose scheme was independently associated with improved OS (adjusted hazard ratio [HR], 0.606; 95% CI, 0.382 to 0.961; P = 0.033).

CONCLUSIONS: For patients with advanced EGFR-mutant lung adenocarcinoma, a radical thoracic radiotherapy dose (≥ 60 Gy) could significantly prolong OS over the whole course of management.

PMID:36153486 | DOI:10.1186/s12885-022-10095-4