Categories
Nevin Manimala Statistics

Identification and Validation of Candidate Genes from Genome-Wide Association Studies

Methods Mol Biol. 2022;2481:249-272. doi: 10.1007/978-1-0716-2237-7_15.

ABSTRACT

Exploiting the statistical associations emerging from a GWAS experiment to identify and validate candidate genes can be difficult and time consuming. To bridge the gap between the identification of candidate genes and the functional validation of their effect on trait performance, the variants underlying GWAS-associated regions must be prioritized. In parallel, recent advances in genomics and statistical methods, achieved notably in human genetics, are being adopted in plant breeding to study the genetic architecture of traits and sustain genetic gains. In this chapter, we aim to provide both the theoretical and practical aspects of three main options: (1) MetaGWAS analysis, (2) statistical fine mapping, and (3) integration of functional data toward the identification and validation of candidate genes from a GWAS experiment.

PMID:35641769 | DOI:10.1007/978-1-0716-2237-7_15

Performing Genome-Wide Association Studies Using rMVP

Methods Mol Biol. 2022;2481:219-245. doi: 10.1007/978-1-0716-2237-7_14.

ABSTRACT

Genome-wide association study (GWAS), a powerful tool for detecting the relationship between traits of interest and high-density markers, has provided unprecedented insights into the genetic basis of quantitative variation for complex traits. With the development of high-throughput sequencing technology, both sample sizes and marker numbers are increasing rapidly, which makes computation more challenging than ever. Therefore, processing big data efficiently with limited computing resources in a reasonable time, and using state-of-the-art statistical models to reduce false positive and false negative rates, have long been hot topics in the GWAS domain. In this chapter, we describe how to perform GWAS using the R package rMVP, covering data preparation, evaluation of population structure, association tests with different models, and high-quality visualization of GWAS results.

PMID:35641768 | DOI:10.1007/978-1-0716-2237-7_14

Performing Genome-Wide Association Studies with Multiple Models Using GAPIT

Methods Mol Biol. 2022;2481:199-217. doi: 10.1007/978-1-0716-2237-7_13.

ABSTRACT

Genome-wide association study (GWAS) is based on the linkage disequilibrium (LD) between phenotypes and genetic markers covering the whole genome. Besides the genetic linkage between the genetic markers and the causal mutations, many other factors contribute to the LD, including selection and nonrandom mating, which form population structure. Many methods have been developed, each accompanied by corresponding software, such as the multiple loci mixed model (MLMM). Some software packages implement multiple methods to reduce the learning curve. One of them is the Genomic Association and Prediction Integrated Tool (GAPIT), which implements eight models including General Linear Model (GLM), Mixed Linear Model (MLM), Compressed MLM, MLMM, SUPER (Settlement of mixed linear models Under Progressively Exclusive Relationship), FarmCPU (Fixed and random model Circulating Probability Unification), and BLINK (Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway). Besides the availability of multiple models, GAPIT provides comprehensive functions for data quality control, data visualization, and publication-quality graphic outputs, such as Manhattan plots in rectangular and circular formats, quantile-quantile (QQ) plots, principal component plots, scatter plots of minor allele frequency against GWAS signals, and plots of LD between associated markers and their adjacent markers. GAPIT developers and users have established a community through the GAPIT forum (https://groups.google.com/g/gapit-forum), with over 600 members, for asking questions, making comments, and sharing experiences. In this chapter, we detail the GAPIT functions, input data frame, output files, and example code for each GWAS model. We also interpret the parameters, functional algorithms, and modules of the GAPIT implementation.

PMID:35641767 | DOI:10.1007/978-1-0716-2237-7_13
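
As an illustration of the LD between an associated marker and its neighbors mentioned in the abstract above, here is a minimal Python sketch. It assumes unphased genotypes coded as allele dosages (0, 1, 2) and uses the squared Pearson correlation of dosages, a common approximation of r². The function name `ld_r2` and the toy genotype vectors are hypothetical; this is not GAPIT's implementation.

```python
def ld_r2(geno_a, geno_b):
    """Approximate LD (r^2) between two markers from allele dosages.

    Assumption: genotypes are coded 0/1/2 and missing calls are None.
    Individuals missing at either marker are dropped pairwise.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals with both genotypes")

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)

    if var_a == 0 or var_b == 0:
        # A monomorphic marker carries no LD information.
        return float("nan")
    return (cov * cov) / (var_a * var_b)


# Hypothetical toy genotypes for three markers in four individuals.
marker_1 = [0, 2, 0, 2]
marker_2 = [2, 0, 2, 0]   # same pattern as marker_1, opposite allele coding
marker_3 = [0, 0, 2, 2]   # independent of marker_1 in this toy sample

r2_linked = ld_r2(marker_1, marker_2)
r2_unlinked = ld_r2(marker_1, marker_3)
```

Because r² is a squared correlation, it does not depend on which allele is counted, which is why the recoded marker still shows complete LD in this toy example.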

Data Integration, Imputation, and Meta-analysis for Genome-Wide Association Studies

Methods Mol Biol. 2022;2481:173-183. doi: 10.1007/978-1-0716-2237-7_11.

ABSTRACT

Growing genomic and phenotypic datasets require different groups around the world to collaborate and integrate these valuable resources to maximize their benefit and increase reference population sizes for genomic prediction and genome-wide association studies (GWAS). However, different studies use different genotyping techniques, which requires a synchronizing step for the genotyped variants, called “imputation”, before the datasets can be combined. Optimally, different GWAS datasets can be analysed within a meta-analysis, which uses summary statistics instead of the actual data. This chapter describes the general principles of genotype imputation and meta-GWAS analysis, with a description of the study designs and command lines required for such analyses.

PMID:35641765 | DOI:10.1007/978-1-0716-2237-7_11
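
To make concrete how a meta-analysis can work from summary statistics alone, as the abstract above describes, the following Python sketch pools per-study effect estimates for one marker with fixed-effect inverse-variance weighting. This is one widely used scheme, chosen here as an assumption; the abstract does not say which method the chapter uses. The function name and the example effect sizes are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis for a single marker.

    betas: per-study effect estimates (same allele, same trait scale)
    ses:   per-study standard errors of those estimates
    Returns pooled effect, pooled standard error, z score, two-sided p-value.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")

    weights = [1.0 / se ** 2 for se in ses]          # precision of each study
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return beta, se, z, p


# Hypothetical summary statistics for one marker from three studies.
pooled_beta, pooled_se, z_score, p_value = inverse_variance_meta(
    betas=[0.30, 0.20, 0.25],
    ses=[0.10, 0.10, 0.10],
)
```

The sketch presumes that effect alleles have already been harmonized across studies, which in practice is what the imputation and synchronization step is meant to enable.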

Preparation and Curation of Omics Data for Genome-Wide Association Studies

Methods Mol Biol. 2022;2481:127-150. doi: 10.1007/978-1-0716-2237-7_8.

ABSTRACT

With the development of large-scale molecular phenotyping platforms, genome-wide association studies have expanded considerably: they are no longer limited to the analysis of classical agronomic traits, such as yield or flowering time, but also embrace the dissection of the genetic basis of molecular traits. Data generated by omics platforms, however, pose technical and statistical challenges to the classical methodology and assumptions of an association study. Although genotyping data are subject to strict filtering procedures, and several advanced statistical approaches are now available to adjust for population structure, less attention has been devoted to the preparation of omics data prior to GWAS. In the present chapter, we briefly present the methods to acquire profiling data from transcripts, proteins, and small molecules, and discuss the tools and possibilities to clean, normalize, and remove unwanted variation from large datasets of molecular phenotypic traits prior to their use in GWAS.

PMID:35641762 | DOI:10.1007/978-1-0716-2237-7_8

Development, Preparation, and Curation of High-Throughput Phenotypic Data for Genome-Wide Association Studies: A Sample Pipeline in R

Methods Mol Biol. 2022;2481:105-125. doi: 10.1007/978-1-0716-2237-7_7.

ABSTRACT

Genome-wide association studies (GWAS) have benefited from advances in sequencing methods for the generation of high-density genomic data. By bridging genotype to phenotype, several genes have been associated with traits of agricultural interest. Despite this, there is still a gap between genotyping and phenotyping due to the large difference in throughput between the two disciplines. Although cutting-edge phenomics technologies are available to the community, their costs are still prohibitive at the small lab level. Semiautomated methods of investigation provide a valid alternative for generating large-scale phenotyping data that allow in-depth investigation of the characteristics of different plant organs. Beyond automation, phenomics data management is another major constraint to consider; while bioinformatics pipelines are well established for releasing high-quality genomic data, less effort has been devoted to phenotypic information. This chapter provides a guide for generating large-scale data related to the size and shape of fruits, leaves, seeds, and roots, and for the downstream curation and preparation of clean datasets through removal of outliers and primary statistical analysis. The steps to be carried out in the R environment are shown for gathering the appropriate input information to use in GWAS while avoiding possible bias.

PMID:35641761 | DOI:10.1007/978-1-0716-2237-7_7
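
The outlier-removal step mentioned in the abstract above can be sketched as follows. The chapter itself works in R and the abstract does not name a specific rule, so this Python example assumes Tukey's fences (values more than 1.5 interquartile ranges outside the quartiles are flagged). The function name `tukey_filter` and the fruit-length values are hypothetical.

```python
from statistics import quantiles


def tukey_filter(values, k=1.5):
    """Split trait measurements into kept values and flagged outliers.

    Assumption: Tukey's fences with multiplier k; this is an illustrative
    choice, not necessarily the rule used in the chapter.
    """
    q1, _, q3 = quantiles(values, n=4)     # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical fruit lengths (cm) for one genotype; 12.0 looks like a
# measurement or data-entry error.
fruit_lengths = [5.1, 5.3, 4.9, 5.0, 5.2, 5.4, 4.8, 12.0]
kept, flagged = tukey_filter(fruit_lengths)
```

Flagged values are returned rather than silently dropped so that they can be inspected before a decision is made to exclude them from the GWAS input.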

Preparation and Curation of Multiyear, Multilocation, Multitrait Datasets

Methods Mol Biol. 2022;2481:83-104. doi: 10.1007/978-1-0716-2237-7_6.

ABSTRACT

Genome-wide association studies (GWAS) are a powerful approach to dissect genotype-phenotype associations and identify causative regions. However, this power is highly influenced by the accuracy of the phenotypic data. To obtain accurate phenotypic values, phenotyping should be carried out through multienvironment trials (METs). To avoid technical errors, sufficient time must be spent exploring, understanding, curating, and adjusting the phenotypic data in each trial before combining them using an appropriate linear mixed model (LMM). The LMM is chosen to minimize, as far as possible, any effect that can lead to misestimation of the phenotypic values. The purpose of this chapter is to explain a series of important steps to explore and analyze data from METs used to characterize an association panel. Two datasets are used to illustrate two different scenarios.

PMID:35641760 | DOI:10.1007/978-1-0716-2237-7_6

Interpretation of Manhattan Plots and Other Outputs of Genome-Wide Association Studies

Methods Mol Biol. 2022;2481:63-80. doi: 10.1007/978-1-0716-2237-7_5.

ABSTRACT

With increasing marker density, estimation of recombination rate between a marker and a causal mutation using linkage analysis becomes less important. Instead, linkage disequilibrium (LD) becomes the major indicator for gene mapping through genome-wide association studies (GWAS). In addition to the linkage between the marker and the causal mutation, many other factors may contribute to the LD, including population structure and cryptic relationships among individuals. As statistical methods and software evolve to improve statistical power and computing speed in GWAS, the corresponding outputs must also evolve to facilitate the interpretation of input data, the analytical process, and final association results. In this chapter, our descriptions focus on (1) considerations in creating a Manhattan plot displaying the strength of LD and locations of markers across a genome; (2) criteria for genome-wide significance threshold and the different appearance of Manhattan plots in single-locus and multiple-locus models; (3) exploration of population structure and kinship among individuals; (4) quantile-quantile (QQ) plot; (5) LD decay across the genome and LD between the associated markers and their neighbors; (6) exploration of individual and marker information on Manhattan and QQ plots via interactive visualization using HTML. The ultimate objective of this chapter is to help users to connect input data to GWAS outputs to balance power and false positives, and connect GWAS outputs to the selection of candidate genes using LD extent.

PMID:35641759 | DOI:10.1007/978-1-0716-2237-7_5
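
Two of the quantities discussed in the abstract above, the genome-wide significance threshold and the inflation visible on a QQ plot, can be sketched in a few lines of Python. The example assumes a Bonferroni correction for the threshold and the genomic inflation factor (lambda) as a one-number summary of the QQ plot; both are common conventions, not necessarily the criteria used in the chapter. All function names and the simulated p-values are hypothetical.

```python
import math
from statistics import NormalDist, median


def bonferroni_threshold(n_markers, alpha=0.05):
    """Genome-wide p-value threshold under a Bonferroni correction."""
    return alpha / n_markers


def neg_log10(p_values):
    """Transform p-values to the -log10 scale used on Manhattan plots."""
    return [-math.log10(p) for p in p_values]


def genomic_inflation(p_values):
    """Genomic inflation factor: median observed / median expected chi-square.

    Each p-value (0 < p <= 1) is converted to a 1-df chi-square statistic
    via the normal quantile function.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2   # median of chi-square, 1 df
    return median(chi2) / expected_median


# Hypothetical p-values that follow the null (uniform) distribution exactly.
n = 1000
null_p_values = [(i - 0.5) / n for i in range(1, n + 1)]

threshold = bonferroni_threshold(n)
manhattan_y = neg_log10(null_p_values)
lam = genomic_inflation(null_p_values)
```

A lambda close to 1 suggests that population structure and kinship are adequately controlled, whereas values well above 1 point to inflated test statistics and a higher risk of false positives.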

Genome-Wide Association Study Statistical Models: A Review

Methods Mol Biol. 2022;2481:43-62. doi: 10.1007/978-1-0716-2237-7_4.

ABSTRACT

Statistical models are at the core of the genome-wide association study (GWAS). In this chapter, we provide an overview of single- and multilocus statistical models and of Bayesian and machine learning approaches for association studies in plants. These models are discussed in terms of their basic methodology, adjustment for cofactors, statistical power, and computational efficiency. New statistical models and machine learning algorithms both show improved performance in detecting missed signals and rare mutations and in prioritizing causal genetic variants; nevertheless, further optimization and validation studies are required to maximize the power of GWAS.

PMID:35641758 | DOI:10.1007/978-1-0716-2237-7_4

Designing a Genome-Wide Association Study: Main Steps and Critical Decisions

Methods Mol Biol. 2022;2481:3-12. doi: 10.1007/978-1-0716-2237-7_1.

ABSTRACT

In this introductory chapter, we seek to provide the reader with a high-level overview of what goes into designing a genome-wide association study (GWAS) in the context of crop plants. After introducing some general concepts regarding GWAS, we divide the contents of this overview into four main sections that reflect the key components of a GWAS: assembly and phenotyping of an association panel, genotyping, association analysis and candidate gene identification. These sections largely reflect the structure of the chapters which follow later in the book, and which provide detailed discussions of these various steps. In each section, in addition to providing external references from the literature, we also often refer the reader to the appropriate chapters in this book in which they can further explore a topic. We close by summarizing some of the key questions that a prospective user of GWAS should answer prior to undertaking this type of experiment.

PMID:35641755 | DOI:10.1007/978-1-0716-2237-7_1