Categories
Nevin Manimala Statistics

Preparation and Curation of Omics Data for Genome-Wide Association Studies

Methods Mol Biol. 2022;2481:127-150. doi: 10.1007/978-1-0716-2237-7_8.

ABSTRACT

With the development of large-scale molecular phenotyping platforms, genome-wide association studies (GWAS) have expanded considerably: they are no longer limited to the analysis of classical agronomic traits, such as yield or flowering time, but also embrace the dissection of the genetic basis of molecular traits. Data generated by omics platforms, however, pose technical and statistical challenges to the classical methodology and assumptions of an association study. Although genotyping data are subject to strict filtering procedures, and several advanced statistical approaches are now available to adjust for population structure, less attention has been devoted to the preparation of omics data prior to GWAS. In the present chapter, we briefly present the methods to acquire profiling data from transcripts, proteins, and small molecules, and discuss the tools and possibilities to clean, normalize, and remove unwanted variation from large datasets of molecular phenotypic traits prior to their use in GWAS.

PMID:35641762 | DOI:10.1007/978-1-0716-2237-7_8

Development, Preparation, and Curation of High-Throughput Phenotypic Data for Genome-Wide Association Studies: A Sample Pipeline in R

Methods Mol Biol. 2022;2481:105-125. doi: 10.1007/978-1-0716-2237-7_7.

ABSTRACT

Genome-wide association studies (GWAS) have benefited from advances in sequencing methods for the generation of high-density genomic data. By bridging genotype to phenotype, several genes have been associated with traits of agricultural interest. Despite this, there is still a gap between genotyping and phenotyping due to the large difference in throughput between the two disciplines. Although cutting-edge phenomics technologies are available to the community, their costs are still prohibitive at the small lab level. Semiautomated methods of investigation provide a valid alternative for generating large-scale phenotyping data that characterize different plant organs in depth. Beyond automation, phenomics data management is another major constraint to consider; while bioinformatics pipelines are well established for releasing high-quality genomic data, less effort has been devoted to phenotyping information. This chapter provides a guide for generating large-scale data related to the size and shape of fruits, leaves, seeds, and roots, and for the downstream curation and preparation of clean datasets through outlier removal and primary statistical analysis. The steps to be carried out in the R environment are shown for gathering the appropriate input information for GWAS while limiting possible sources of bias.

PMID:35641761 | DOI:10.1007/978-1-0716-2237-7_7
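
The outlier-removal step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which rule the chapter applies, and the chapter itself works in R; the Python example below assumes Tukey's interquartile-range (IQR) fences purely for illustration, and the function name `remove_outliers` and the multiplier `k` are hypothetical.

```python
import statistics


def remove_outliers(values, k=1.5):
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule is used here only as an example; the
    source abstract does not specify the outlier criterion.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical fruit-length measurements with one implausible record
measurements = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cleaned = remove_outliers(measurements)
```

In practice the fences would typically be computed per genotype or per trial rather than over the pooled data, so that genuine genotypic differences are not discarded as outliers.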

Preparation and Curation of Multiyear, Multilocation, Multitrait Datasets

Methods Mol Biol. 2022;2481:83-104. doi: 10.1007/978-1-0716-2237-7_6.

ABSTRACT

Genome-wide association studies (GWAS) are a powerful approach to dissect genotype-phenotype associations and identify causative regions. However, this power is highly influenced by the accuracy of the phenotypic data. To obtain accurate phenotypic values, phenotyping should be carried out in multienvironment trials (METs). To avoid technical errors, sufficient time must be spent on exploring, understanding, curating, and adjusting the phenotypic data in each trial before combining them using an appropriate linear mixed model (LMM). The LMM is chosen to minimize, as far as possible, any effect that can lead to misestimation of the phenotypic values. The purpose of this chapter is to explain a series of important steps to explore and analyze data from METs used to characterize an association panel. Two datasets are used to illustrate two different scenarios.

PMID:35641760 | DOI:10.1007/978-1-0716-2237-7_6

Interpretation of Manhattan Plots and Other Outputs of Genome-Wide Association Studies

Methods Mol Biol. 2022;2481:63-80. doi: 10.1007/978-1-0716-2237-7_5.

ABSTRACT

With increasing marker density, estimation of recombination rate between a marker and a causal mutation using linkage analysis becomes less important. Instead, linkage disequilibrium (LD) becomes the major indicator for gene mapping through genome-wide association studies (GWAS). In addition to the linkage between the marker and the causal mutation, many other factors may contribute to the LD, including population structure and cryptic relationships among individuals. As statistical methods and software evolve to improve statistical power and computing speed in GWAS, the corresponding outputs must also evolve to facilitate the interpretation of input data, the analytical process, and final association results. In this chapter, our descriptions focus on (1) considerations in creating a Manhattan plot displaying the strength of LD and locations of markers across a genome; (2) criteria for genome-wide significance threshold and the different appearance of Manhattan plots in single-locus and multiple-locus models; (3) exploration of population structure and kinship among individuals; (4) quantile-quantile (QQ) plot; (5) LD decay across the genome and LD between the associated markers and their neighbors; (6) exploration of individual and marker information on Manhattan and QQ plots via interactive visualization using HTML. The ultimate objective of this chapter is to help users to connect input data to GWAS outputs to balance power and false positives, and connect GWAS outputs to the selection of candidate genes using LD extent.

PMID:35641759 | DOI:10.1007/978-1-0716-2237-7_5
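
Two of the outputs listed in the abstract above, the genome-wide significance threshold and the QQ plot, rest on simple arithmetic that a short sketch can make concrete. The abstract does not state which threshold criterion the chapter recommends; the Python example below assumes a Bonferroni correction and the common (i - 0.5) / n plotting positions, and the function names are hypothetical.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.05):
    """Return the per-marker p-value cutoff and its -log10 value.

    Assumption: Bonferroni is used here as one common criterion;
    it is conservative when markers are in strong LD.
    """
    cutoff = alpha / n_markers
    return cutoff, -math.log10(cutoff)


def qq_points(p_values):
    """Pair expected and observed -log10(p) values for a QQ plot."""
    observed = sorted(p_values)
    n = len(observed)
    # Expected i-th smallest p-value under the null hypothesis
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o))
            for e, o in zip(expected, observed)]


# Hypothetical panel genotyped at 100,000 markers
cutoff, neg_log10_cutoff = bonferroni_threshold(100_000)
points = qq_points([0.30, 0.001, 0.75, 0.04])
```

The `-log10` cutoff is the height of the horizontal line usually drawn on a Manhattan plot, and points that rise above the diagonal in the QQ plot indicate p-values smaller than expected under the null.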

Genome-Wide Association Study Statistical Models: A Review

Methods Mol Biol. 2022;2481:43-62. doi: 10.1007/978-1-0716-2237-7_4.

ABSTRACT

Statistical models are at the core of the genome-wide association study (GWAS). In this chapter, we provide an overview of single- and multilocus statistical models, Bayesian methods, and machine learning approaches for association studies in plants. These models are discussed in terms of their basic methodology, the cofactors they adjust for, statistical power, and computational efficiency. New statistical models and machine learning algorithms are both showing improved performance in detecting missed signals and rare mutations and in prioritizing causal genetic variants; nevertheless, further optimization and validation studies are required to maximize the power of GWAS.

PMID:35641758 | DOI:10.1007/978-1-0716-2237-7_4

Designing a Genome-Wide Association Study: Main Steps and Critical Decisions

Methods Mol Biol. 2022;2481:3-12. doi: 10.1007/978-1-0716-2237-7_1.

ABSTRACT

In this introductory chapter, we seek to provide the reader with a high-level overview of what goes into designing a genome-wide association study (GWAS) in the context of crop plants. After introducing some general concepts regarding GWAS, we divide the contents of this overview into four main sections that reflect the key components of a GWAS: assembly and phenotyping of an association panel, genotyping, association analysis and candidate gene identification. These sections largely reflect the structure of the chapters which follow later in the book, and which provide detailed discussions of these various steps. In each section, in addition to providing external references from the literature, we also often refer the reader to the appropriate chapters in this book in which they can further explore a topic. We close by summarizing some of the key questions that a prospective user of GWAS should answer prior to undertaking this type of experiment.

PMID:35641755 | DOI:10.1007/978-1-0716-2237-7_1

Preparation and Curation of Phenotypic Datasets

Methods Mol Biol. 2022;2481:13-27. doi: 10.1007/978-1-0716-2237-7_2.

ABSTRACT

Based on case studies, in this chapter we discuss the extent to which the number and identity of quantitative trait loci (QTL) identified from genome-wide association studies (GWAS) are affected by curation and analysis of phenotypic data. The chapter demonstrates through examples the impact of (1) cleaning of outliers, and of (2) the choice of statistical method for estimating genotypic mean values of phenotypic inputs in GWAS. Omitting outlier cleaning resulted in the highest number of dubious QTL, especially at loci with highly unbalanced allelic frequencies. A trade-off was identified between the risk of false positives and the risk of missing interesting, yet rare alleles. The choice of the statistical method to estimate genotypic mean values also affected the output of GWAS analysis, with reduced QTL overlap between methods. Using mixed models that capture spatial trends, among other features, increased the narrow-sense heritability of traits, the number of identified QTL and the overall power of GWAS analysis. Outlier cleaning and the choice of robust statistical models for estimating genotypic mean values should be included in GWAS pipelines to decrease both false positive and false negative rates of QTL detection.

PMID:35641756 | DOI:10.1007/978-1-0716-2237-7_2

Predictive Ability of Multiple Mini-Interviews in Admissions on Programmatic Academic Achievement: A Systematic Review

J Allied Health. 2022 Summer;51(2):154-159.

ABSTRACT

BACKGROUND: Multiple mini-interviews (MMI) are emerging as the preferred interview format for admittance to health professions training programs.

OBJECTIVE: Evaluate the evidence regarding MMI as a predictor of programmatic academic achievement in graduate health professions training programs.

METHODS: Using the PRISMA method, two literature searches on PubMed and Google Scholar of publications from 2004-2019 were completed, identifying 7 unique references pertinent to the review’s objective. Head-to-head comparative analysis was completed between articles that had similar outcomes in addition to cumulative analysis of all included studies.

RESULTS: Of the 7 articles included in this systematic review, all had at least one statistically significant correlation between MMI used for admissions and programmatic academic achievement in graduate health professions training programs. Outcomes assessed were highly variable and included specific assessments, cumulative program GPA, and clinical performance. Studies from four unique health professions–dentistry, medicine, physician associate, and pharmacy–were included in the review.

CONCLUSIONS: Evidence indicates that MMI are an effective, valid, and reliable admissions interview format to identify future programmatic academic achievement of graduate health professions training students in specific scenarios. A head-to-head analysis comparing MMI with more traditional interview formats, particularly on an entire pool of qualified applicants, would be beneficial to assess for superiority.

PMID:35640296

Measuring Medical Documentation Accuracy: A Novel Approach Offering Reproducible Objective Measurements Across Educational Levels and Documentation Style

J Allied Health. 2022 Summer;51(2):149-153.

ABSTRACT

Medical documentation is an important component of healthcare delivery and represents a significant portion of physician assistant educational efforts. Assessing proficiency in documentation is challenging, often relying on the subjective evaluation of documents. This study presents a novel approach using video-recorded provider-patient interactions and an interactive scoring rubric. Sixty-five participants, representing three cohorts, completed medical documentation with each document receiving accuracy scores from three independent scorers. Using this approach, the consistency of two versions of a provider-patient encounter with the same chief complaint but variable details was demonstrated (p = 0.239). Inter-scorer reliability using this method was maintained across participants with an average variation of 1.21 points on a 54-point scale between three independent scorers. Applying this method to evaluate educational preparedness, a statistically significant increase in accuracy was identified in cohorts approaching the completion of didactic education (p < 0.05). The method presented here provides a reliable platform to assess medical documentation accuracy across educational levels and documentation styles.

PMID:35640295

The Anatomy Glove Learning System for Hand Anatomy and Function: Use of Embodied Learning in Occupational Therapy Education

J Allied Health. 2022 Summer;51(2):143-148.

ABSTRACT

Occupational therapy students must build a solid foundation in hand anatomy to prepare for practice that includes interventions for people with hand injuries or impairments. To engage students in effective active learning, the Anatomy Glove Learning System (AGLS) was used in two entry-level occupational therapy programs. This cloth glove with imprinted bones was worn while students followed an online video series to draw the muscles and tendons on the hand and understand hand physiology. Research exploring student gains in hand anatomy knowledge and confidence at the two universities (n=199) over a 2-year period found statistically significant improvements in 12 of 15 items and in total scores of an anatomy quiz taken as a pre-test and post-test (t(198)=13.77, p<0.001, Cohen’s d = 1.142). In addition, statistically significant differences in student confidence related to hand anatomy (t(110)=24.47, p<0.01) and student reports of positive experiences were identified after using the AGLS. This active learning system used a form of embodied learning to help prepare occupational therapy students for entry-level practice addressing the needs of clients with hand impairments. Future research focused on the student experience may yield additional insights into the full benefits of the AGLS and similar active learning strategies regarding hand anatomy and physiology.

PMID:35640294