Categories
Nevin Manimala Statistics

SWALO: scaffolding with assembly likelihood optimization

Nucleic Acids Res. 2021 Aug 20:gkab717. doi: 10.1093/nar/gkab717. Online ahead of print.

ABSTRACT

Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or a similar number of correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.

PMID:34417615 | DOI:10.1093/nar/gkab717


Risk factors for incomplete thrombosis in false lumen in sub-acute type B aortic dissection post-TEVAR

Heart Vessels. 2021 Aug 20. doi: 10.1007/s00380-021-01926-8. Online ahead of print.

ABSTRACT

There is scarce information about the risk factors for incomplete false lumen thrombosis (FLT) among type B aortic dissection (AD) patients, particularly in the sub-acute phase following thoracic endovascular aortic repair (TEVAR). We enrolled consecutive sub-acute type B AD patients at Xinqiao Hospital (Chongqing, China) from May 2010 to December 2019. Patients with severe heart failure, cancer, and myocardial infarction were excluded. The postoperative computed tomography angiography (CTA) data were extracted from the most recent follow-up aortic CTA. Multivariate logistic regressions were applied to identify the association between FLT and clinical or imaging factors. Fifty-five subjects were enrolled in our study. Twelve participants showed complete FLT, and 2 of these died during the follow-up, while 8 patients died in the incomplete FLT group. In the multivariate analysis, maximum abdominal aorta diameter (OR 1.20, 95% CI 1.016-1.417, p = 0.032) and the number of branches arising from the false lumen (FL) (OR 15.062, 95% CI 1.681-134.982, p = 0.015) were significantly associated with incomplete FLT. The C-statistic was 0.873 (95% CI 0.773-0.972) for the model. The FL diameter (p < 0.001) was significantly smaller following TEVAR, while the true lumen diameter (p < 0.001) and maximum abdominal aorta diameter (p = 0.011) were larger after the aortic repair. Complete FLT post-TEVAR was present in 21.8% of sub-acute type B AD patients. Maximum abdominal aorta diameter and the number of branches arising from the FL were independent risk factors for incomplete FLT. The true lumen diameter, maximum abdominal aorta diameter, and FL diameter changed notably following TEVAR.

PMID:34417627 | DOI:10.1007/s00380-021-01926-8


A non-invasive left ventricular pressure-strain loop study on myocardial work in primary aldosteronism

Hypertens Res. 2021 Aug 20. doi: 10.1038/s41440-021-00725-y. Online ahead of print.

ABSTRACT

We investigated the myocardial work derived from left ventricular pressure-strain loop in patients with primary aldosteronism or primary hypertension. We enrolled 50 patients with primary aldosteronism, 50 age- and sex-matched patients with primary hypertension, and 25 normotensive control subjects. We performed transthoracic echocardiography and speckle-tracking echocardiography-based left ventricular pressure-strain loop analysis to evaluate cardiac structure and function. Patients with primary aldosteronism and those with primary hypertension had similar clinic and ambulatory blood pressures, except that the former had a significantly (P = 0.03) higher nighttime systolic blood pressure. All subjects had normal left ventricular ejection fraction (66.4 ± 4.7%). Patients with primary aldosteronism had a greater left ventricular mass index than those with primary hypertension and the normal controls (111.0 ± 21.6 g/m² versus 95.7 ± 17.7 and 77.9 ± 13.5 g/m², respectively, P < 0.001). The global myocardial work index (GWI, 2336 ± 333, 2366 ± 288, and 2292 ± 249 mmHg%, respectively), and global constructive work (GCW, 2494 ± 325, 2524 ± 301, and 2391 ± 193 mmHg%, respectively), were comparable in the three groups (P ≥ 0.18). However, the global work efficiency (GWE) differed significantly (P < 0.001), being lowest in primary aldosteronism (91.1 ± 2.7%), intermediate in primary hypertension (93.5 ± 2.5%) and highest in controls (95.3 ± 1.5%). The opposite was true for the global wasted work (GWW) (205.6 ± 74.6, 142.0 ± 56.4 and 99.4 ± 33.7 mmHg%, respectively, P < 0.001). GWE was significantly correlated with the logarithmically transformed plasma concentration and the urinary excretion of aldosterone in patients with primary aldosteronism or primary hypertension (r = -0.43 for both, P < 0.001). The associations remained statistically significant (P ≤ 0.04) after further adjustment for several factors, including left ventricular mass index and clinic or nighttime blood pressure. In conclusion, GWE decreased and GWW increased in primary hypertension and further in primary aldosteronism, probably because of the adrenal aldosterone hypersecretion and the left ventricular mass index increase, while GWI and GCW were similar, indicating that similar and normalized total myocardial work might be a compensation in hypertension at the expense of work efficiency.

PMID:34417559 | DOI:10.1038/s41440-021-00725-y


Exposure Assessment Techniques Applied to the Highly Censored Deepwater Horizon Gulf Oil Spill Personal Measurements

Ann Work Expo Health. 2021 Aug 21:wxab060. doi: 10.1093/annweh/wxab060. Online ahead of print.

ABSTRACT

The GuLF Long-term Follow-up Study (GuLF STUDY) is investigating potential adverse health effects among workers involved in the Deepwater Horizon (DWH) oil spill response and cleanup (OSRC). Over 93% of the 160 000 personal air measurements taken on OSRC workers were below the limit of detection (LOD), as reported by the analytic labs. At this high level of censoring, our ability to develop exposure estimates was limited. The primary objective here was to reduce the number of measurements below the labs’ reported LODs to reflect the analytic methods’ true LODs, thereby facilitating the use of a relatively unbiased and precise Bayesian method to develop exposure estimates for study exposure groups (EGs). The estimates informed a job-exposure matrix to characterize exposure of study participants. A second objective was to develop descriptive statistics for relevant EGs that did not meet the Bayesian criteria of sample size ≥5 and censoring ≤80% to achieve the aforementioned level of bias and precision. One of the analytic labs recalculated the measurements using the analytic method’s LOD; the second lab provided raw analytical data, allowing us to recalculate the data values that fell between the originally reported LOD and the analytical method’s LOD. We established rules for developing Bayesian estimates for EGs with >80% censoring. The remaining EGs were 100% censored. An order-based statistical method (OBSM) was developed to estimate exposures that considered the number of measurements, geometric standard deviation, and average LOD of the censored samples for N ≥ 20. For N < 20, a value of ½ of the LOD was substituted. Recalculation of the measurements lowered overall censoring from 93.2 to 60.5% and of the THC measurements, from 83.1 to 11.2%. A total of 71% of the EGs met the ≤15% relative bias and <65% imprecision goal. Another 15% had censoring >80% but enough non-censored measurements to apply Bayesian methods. We used the OBSM for 3% of the estimates and the simple substitution method for 11%. The methods presented here substantially reduced the degree of censoring in the dataset and increased the number of EGs meeting our Bayesian method’s desired performance goal. The OBSM allowed for a systematic and consistent approach impacting only the lowest of the exposure estimates. This approach should be considered when dealing with highly censored datasets.

PMID:34417597 | DOI:10.1093/annweh/wxab060
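
The censoring criteria and the ½ LOD substitution described in the abstract above can be illustrated with a minimal sketch. The data layout (a list of `(value, lod)` pairs with `None` marking a censored value) and all function names are hypothetical; the Bayesian estimation and the OBSM themselves are not specified in the abstract and are not reproduced here.

```python
# Hypothetical layout: each measurement is (value, lod); value is None when censored.

def censoring_percent(measurements):
    """Percentage of measurements reported below the limit of detection."""
    censored = sum(1 for value, _ in measurements if value is None)
    return 100.0 * censored / len(measurements)


def meets_bayesian_criteria(measurements):
    """Criteria stated in the abstract: sample size >= 5 and censoring <= 80%."""
    return len(measurements) >= 5 and censoring_percent(measurements) <= 80.0


def substitute_half_lod(measurements):
    """Replace each censored value with half of its LOD (simple substitution)."""
    return [lod / 2.0 if value is None else value for value, lod in measurements]


# Illustrative numbers only, not taken from the study.
mixed_group = [(None, 0.2), (None, 0.2), (0.5, 0.2), (None, 0.4), (0.3, 0.2)]
fully_censored_group = [(None, 0.2), (None, 0.4), (None, 0.2)]

print(censoring_percent(mixed_group))             # share of censored values
print(meets_bayesian_criteria(mixed_group))       # eligible for the Bayesian route
print(substitute_half_lod(fully_censored_group))  # fallback for small, 100% censored groups
```

In the study, substitution was reserved for fully censored groups with N < 20; the sketch does not enforce that routing.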


Detection of fusion gene transcripts in the blood samples of prostate cancer patients

Sci Rep. 2021 Aug 20;11(1):16995. doi: 10.1038/s41598-021-96528-9.

ABSTRACT

Prostate cancer remains one of the most lethal cancers for men in the United States. The study aims to detect fusion transcripts in the blood samples of prostate cancer patients. We analyzed nine fusion transcripts including MAN2A1-FER, SLC45A2-AMACR, TRMT11-GRIK2, CCNH-C5orf30, mTOR-TP53BP1, KDM4B-AC011523.2, TMEM135-CCDC67, LRRC59-FLJ60017 and Pten-NOLC1 in the blood samples from 147 prostate cancer patients and 14 healthy individuals, using Taqman RT-PCR and Sanger sequencing. Similar analyses were also performed on 25 matched prostate cancer samples for matched-sample evaluation. Eighty-two percent of blood samples from the prostate cancer patients were positive for the MAN2A1-FER transcript, while 41.5% and 38.8% of blood samples from the prostate cancer patients were positive for SLC45A2-AMACR and Pten-NOLC1, respectively. CCNH-C5orf30 and mTOR-TP53BP1 had low detection rates, positive in only 5.4% and 4% of the blood samples from the prostate cancer patients, respectively. Only 2 blood samples were positive for the KDM4B-AC011523.2 transcript. Overall, 89.8% of patients were positive for at least one fusion transcript in their blood samples. The statistical analysis showed varied sensitivity of fusion transcript detection in the blood based on the types of fusions. In contrast, the blood samples from all healthy individuals were negative for the fusion transcripts. Detection of fusion transcripts in the blood samples of prostate cancer patients may be a fast and cost-effective way to detect prostate cancer.

PMID:34417538 | DOI:10.1038/s41598-021-96528-9


The neurological level of spinal cord injury and cardiovascular risk factors: a systematic review and meta-analysis

Spinal Cord. 2021 Aug 20. doi: 10.1038/s41393-021-00678-6. Online ahead of print.

ABSTRACT

STUDY DESIGN: Systematic review and meta-analysis.

OBJECTIVE: To determine the difference in cardiovascular risk factors (blood pressure, lipid profile, and markers of glucose metabolism and inflammation) according to the neurological level of spinal cord injury (SCI).

METHODS: We searched 5 electronic databases from inception until July 4, 2020. Data were extracted by two independent reviewers using a pre-defined data collection form. The pooled effect estimate was computed using random-effects models, and heterogeneity was assessed using the I² statistic and the chi-squared test (CRD42020166162).

RESULTS: We screened 4863 abstracts, of which 47 studies with 3878 participants (3280 males, 526 females, 72 sex unknown) were included in the meta-analysis. Compared to paraplegia, individuals with tetraplegia had lower systolic and diastolic blood pressure (unadjusted weighted mean difference, -14.5 mmHg, 95% CI -19.2, -9.9; -7.0 mmHg 95% CI -9.2, -4.8, respectively), lower triglycerides (-10.9 mg/dL, 95% CI -19.7, -2.1), total cholesterol (-9.9 mg/dL, 95% CI -14.5, -5.4), high-density lipoprotein (-1.7 mg/dL, 95% CI -3.3, -0.2) and low-density lipoprotein (-5.8 mg/dL, 95% CI -9.0, -2.5). Comparing individuals with high- vs. low-thoracic SCI, persons with higher injury had lower systolic and diastolic blood pressure (-10.3 mmHg, 95% CI -13.4, -7.1; -5.3 mmHg 95% CI -7.5, -3.2, respectively), while no differences were found for low-density lipoprotein, serum glucose, insulin, and inflammation markers. High heterogeneity was partially explained by age, prevalent cardiovascular diseases and medication use, body mass index, sample size, and quality of studies.

CONCLUSION: In SCI individuals, the level of injury may be an additional non-modifiable cardiovascular risk factor. Future well-designed longitudinal studies with sufficient follow-up and providing sex-stratified analyses should confirm our findings and explore the role of SCI level in cardiovascular health and overall prognosis and survival.

PMID:34417550 | DOI:10.1038/s41393-021-00678-6
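
The pooling step named in METHODS above (random-effects models with the I² statistic) can be sketched as follows. The abstract does not say which random-effects estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is common, and the function name and example inputs are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Random-effects pooling (DerSimonian-Laird, assumed) with Cochran's Q and I^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # % of variation due to heterogeneity
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Illustrative per-study mean differences (mmHg) and their variances; not study data.
result = pool_random_effects([0.0, 10.0], [1.0, 1.0])
print(result)
```

The 95% interval uses a normal approximation; other interval methods exist and may have been used in the review.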


4210 Da and 1866 Da polypeptides as potential biomarkers of liver disease progression in hepatitis B virus patients

Sci Rep. 2021 Aug 20;11(1):16982. doi: 10.1038/s41598-021-96581-4.

ABSTRACT

HBV infection is recognized as a serious global health problem, and hepatitis B virus infection is a complicated chronic disease leading to liver cirrhosis (LC) and hepatocellular carcinoma (HCC). New biochemical serum markers could be used to advance the diagnosis and prognosis of HBV-associated liver diseases during the progression of chronic hepatitis B (CHB) into cirrhosis and HCC. We determined whether the 4210 Da and 1866 Da polypeptides are serum metabolite biomarkers of hepatopathy with hepatitis B virus. A total of 570 subjects were divided into five groups: healthy controls, those with natural clearance, and patients with CHB, LC, and HCC. The 1866 Da and 4210 Da polypeptides were measured by Clin-ToF II MALDI-TOF-MS. There were significant differences in 4210 Da and 1866 Da levels among the five groups (P < 0.001). For the differential diagnosis of CHB from normal liver, the areas under the receiver operating characteristic (ROC) curve of 4210 Da and 1866 Da and their combination via logistic regression were 0.961, 0.849 and 0.967, respectively. For the differential diagnosis of LC from CHB, the areas under the ROC curve were 0.695, 0.841 and 0.826, respectively. For the differential diagnosis of HCC from CHB, the areas under the ROC curve were 0.744, 0.710 and 0.761, respectively. For the differential diagnosis of HCC from LC, the areas under the ROC curve of 4210 Da and 1866 Da were 0.580 and 0.654, respectively. In AFP-negative HCC patients, the positive rates of 1866 Da were 45.5% and 69.0%, and those of 4210 Da were 60.6% and 58.6%, in the HCC vs. CHB and HCC vs. LC comparisons, respectively. The 4210 Da and 1866 Da polypeptide levels were positively correlated with HBV DNA levels (P < 0.001, r = 0.269; P < 0.001, r = 0.285). The 4210 Da and 1866 Da polypeptides had good diagnostic value for the occurrence and progression of HBV-related chronic hepatitis, liver cirrhosis and hepatocellular carcinoma and could serve to accurately guide treatment management and predict clinical outcomes.

PMID:34417517 | DOI:10.1038/s41598-021-96581-4
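
The areas under the ROC curve reported above can be understood through a short sketch. It uses the rank-based (Mann-Whitney) identity: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the example values are hypothetical, and the sketch assumes that higher marker levels indicate disease, which the abstract does not state.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Illustrative marker intensities only, not study data.
cases = [5.1, 6.3, 4.8, 7.0]
controls = [3.9, 4.8, 4.1]
print(roc_auc(cases, controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why the values near 0.58 to 0.65 for HCC vs. LC indicate weak discrimination compared with 0.96 for CHB vs. normal liver.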


Author Correction: Incorporating false negative tests in epidemiological models for SARS-CoV-2 transmission and reconciling with seroprevalence estimates

Sci Rep. 2021 Aug 20;11(1):17221. doi: 10.1038/s41598-021-96603-1.

NO ABSTRACT

PMID:34417536 | DOI:10.1038/s41598-021-96603-1


Stability of diagnostic rate in a cohort of 38,813 colorectal polyp specimens and implications for histomorphology and statistical process control

Sci Rep. 2021 Aug 20;11(1):16942. doi: 10.1038/s41598-021-95862-2.

ABSTRACT

This work sought to quantify pathologists’ diagnostic bias over time in their evaluation of colorectal polyps to assess how this may impact the utility of statistical process control (SPC). All colorectal polyp specimens (CRPS) for 2011-2017 in a region were categorized using a validated free text string matching algorithm. Pathologist diagnostic rates (PDRs) for high grade dysplasia (HGD), tubular adenoma (TA_ad), villous morphology (TVA + VA), sessile serrated adenoma (SSA) and hyperplastic polyp (HP) were assessed (1) for each pathologist in yearly intervals with control charts (CCs), and (2) with a generalized linear model (GLM). The study included 64,115 CRPS. Fifteen pathologists each interpreted > 150 CRPS/year in all years and together diagnosed 38,813. The number of pathologists (of 15) with zero or one (p < 0.05) outlier in seven years, compared to their overall PDR, was 13, 9, 9, 5 and 9 for HGD, TVA + VA, TA_ad, HP and SSA respectively. The GLM confirmed, for the subset where pathologists/endoscopists saw > 600 CRPS each (total 52,760 CRPS), that pathologist, endoscopist, anatomical location and year were all strongly correlated (all p < 0.0001) with the diagnosis. The moderate PDR stability over time supports the hypothesis that diagnostic rates are amenable to calibration via SPC and outcome data.

PMID:34417490 | DOI:10.1038/s41598-021-95862-2
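
The yearly control-chart check described above can be sketched as a proportion chart: each year's diagnostic rate is compared with limits centred on the pathologist's overall rate. The abstract does not specify the chart type or limits; the binomial normal-approximation limits with z = 1.96 (chosen to mirror the stated p < 0.05) are an assumption, and the function names and counts are hypothetical.

```python
import math


def p_chart_limits(p_bar, n, z=1.96):
    """Control limits for a proportion, given the centre line p_bar and sample size n."""
    half_width = z * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


def flag_outlier_years(yearly_counts, z=1.96):
    """yearly_counts: list of (year, diagnoses, specimens) for one pathologist."""
    total_diagnoses = sum(d for _, d, _ in yearly_counts)
    total_specimens = sum(n for _, _, n in yearly_counts)
    p_bar = total_diagnoses / total_specimens      # overall diagnostic rate
    flagged = []
    for year, diagnoses, specimens in yearly_counts:
        lower, upper = p_chart_limits(p_bar, specimens, z)
        if not lower <= diagnoses / specimens <= upper:
            flagged.append(year)
    return p_bar, flagged


# Illustrative counts only, not study data.
counts = [(2011, 20, 200), (2012, 20, 200), (2013, 20, 200), (2014, 40, 200)]
overall_rate, outlier_years = flag_outlier_years(counts)
print(overall_rate, outlier_years)
```

Limits widen as the yearly specimen count falls, which is one reason the study restricted the analysis to pathologists with more than 150 specimens per year.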


Modelling menstrual cycle length in athletes using state-space models

Sci Rep. 2021 Aug 20;11(1):16972. doi: 10.1038/s41598-021-95960-1.

ABSTRACT

The ability to predict an individual’s menstrual cycle length to a high degree of precision could help female athletes to track their period and tailor their training and nutrition correspondingly. Such individualisation is possible and necessary, given the known inter-individual variation in cycle length. To achieve this, a hybrid predictive model was built using data on 16,524 cycles collected from a sample of 2125 women (mean age 34.38 years, range 18.00-47.10, number of menstrual cycles ranging from 4 to 53). A mixed-effect state-space model was fitted to capture the within-subject temporal correlation, incorporating a Bayesian approach for process forecasting to predict the duration (in days) of the next menstrual cycle. The modelling procedure was split into three steps: (1) a time trend component using a random walk with an overdispersion parameter, (2) an autocorrelation component using an autoregressive moving-average model, and (3) a linear predictor to account for covariates (e.g. injury, stomach cramps, training intensity). The inclusion of an overdispersion parameter suggested that [Formula: see text] [Formula: see text] of cycles in the sample were overdispersed. The random walk standard deviation for a non-overdispersed cycle is [Formula: see text] [1.00, 1.09] days, while for an overdispersed cycle the menstrual cycle variance increases by 4.78 [4.57, 5.00] days. To assess the performance and prediction accuracy of the model, each woman’s last observation was used as test data. The root mean square error (RMSE), concordance correlation coefficient and Pearson correlation coefficient (r) between the observed and predicted values were calculated. The model had an RMSE of 1.6412 days, a precision of 0.7361 and overall accuracy of 0.9871. In conclusion, the hybrid model presented here is a helpful approach for predicting menstrual cycle length, which in turn can be used to support female athlete wellness.

PMID:34417493 | DOI:10.1038/s41598-021-95960-1
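
The evaluation metrics named in the abstract above (RMSE, Pearson correlation and the concordance correlation coefficient) can be computed as in the sketch below. It follows Lin's formulation, in which the concordance coefficient is the product of a precision term (Pearson r) and an accuracy term (a bias correction factor); reading the abstract's "precision" and "accuracy" in that sense is an assumption. The function name and example cycle lengths are hypothetical.

```python
import math


def agreement_metrics(observed, predicted):
    """RMSE, Pearson r, Lin's concordance coefficient and its accuracy factor."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    var_obs = sum((x - mean_obs) ** 2 for x in observed) / n
    var_pred = sum((y - mean_pred) ** 2 for y in predicted) / n
    cov = sum((x - mean_obs) * (y - mean_pred)
              for x, y in zip(observed, predicted)) / n

    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(observed, predicted)) / n)
    pearson_r = cov / math.sqrt(var_obs * var_pred)              # precision
    ccc = 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)
    accuracy = ccc / pearson_r                                   # bias correction factor
    return {"rmse": rmse, "pearson_r": pearson_r, "ccc": ccc, "accuracy": accuracy}


# Illustrative cycle lengths in days, not study data.
observed_days = [28.0, 30.0, 26.0, 29.0]
predicted_days = [30.0, 32.0, 28.0, 31.0]   # every prediction is 2 days too long
metrics = agreement_metrics(observed_days, predicted_days)
print(metrics)
```

A constant offset leaves Pearson r at 1 but lowers the concordance coefficient, which is why both quantities are reported when judging prediction quality.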