Machine learning approaches to identify genetic markers for goat climatic adaptation

3 Biotech. 2026 Jul;16(7):289. doi: 10.1007/s13205-026-04921-w. Epub 2026 Jun 27.

ABSTRACT

Whole-genome SNP data from 109 goats, including 55 indigenous Indian goats and 54 exotic goats, were analyzed to identify ancestry informative markers (AIMs) associated with climatic adaptation. After quality filtering, 41,254,122 SNPs were retained, and 10,000 SNPs were selected using the FST, informativeness for assignment (In), and delta statistics. A consensus approach identified 4,040 AIMs. Admixture analysis (K = 2) retained 43 individuals (17 cold-adapted and 26 hot-adapted) with ≥80% ancestry. Principal component analysis using the 4,040 AIMs clearly separated the climatic groups, with PC1 explaining 55.59% of the variance. Machine learning approaches refined the AIMs for climatic classification. Random Forest identified 140 SNPs and achieved 100% test accuracy. Support Vector Machine selected 1,474 SNPs, Logistic Regression identified 241 SNPs, and k-Nearest Neighbors selected 147 SNPs, all achieving 100% test accuracy. XGBoost selected 150 SNPs and achieved 89% test accuracy. After removing duplicates across models, 1,728 SNPs were retained for functional annotation. Validation using 706 shared markers in 54 exotic goats confirmed clear separation of hot- and cold-adapted populations. Variant annotation identified three key protein-coding genes: MSH5, PAPSS2, and SINHCAF. Missense variants in MSH5 (Ile581Val) and PAPSS2 (Ile225Val/Ile216Val) exhibited contrasting allele frequencies between climatic groups, while SINHCAF showed a synonymous variant (Ser153=) with distinct allele distribution patterns. Protein structural analysis revealed high-quality models, with >90% of residues in favoured Ramachandran regions. The MSH5 Ile581Val variant showed a stabilizing effect (ΔΔG = +0.09 kcal/mol), whereas the PAPSS2 variants (Ile225Val and Ile216Val) exhibited destabilizing effects (ΔΔG = −0.72 and −1.24 kcal/mol) and moderate evolutionary conservation. Structural predictions further indicated that these variants were surface-exposed and potentially functionally relevant.
These results highlight functionally important variants associated with climatic adaptation and demonstrate the effectiveness of integrating AIMs with machine learning to identify reduced marker sets. These findings further support the development of targeted, cost-effective SNP panels for climatic adaptation assessment and marker-assisted selection in goats.
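
As a rough illustration of two of the marker-ranking statistics named in the abstract, the sketch below computes delta (the absolute allele-frequency difference) and a simple two-population FST for a single biallelic SNP. The abstract does not say which FST estimator or software was used, so the heterozygosity-based formula with equal population weights is an assumption here, and the function names and example frequencies are hypothetical.

```python
def delta(p_hot: float, p_cold: float) -> float:
    """Absolute difference in reference-allele frequency between two groups."""
    return abs(p_hot - p_cold)


def fst_two_pop(p_hot: float, p_cold: float) -> float:
    """Heterozygosity-based FST = (H_T - H_S) / H_T for one biallelic SNP.

    Assumes equal weighting of the two populations (an illustrative choice,
    not necessarily the estimator used in the study).
    """
    p_bar = (p_hot + p_cold) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)            # expected heterozygosity, pooled
    h_within = (2.0 * p_hot * (1.0 - p_hot)
                + 2.0 * p_cold * (1.0 - p_cold)) / 2.0  # mean within-group value
    if h_total == 0.0:
        return 0.0  # monomorphic SNP: no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies (hot-adapted, cold-adapted) for three SNPs.
example_snps = {
    "snp_a": (0.95, 0.10),
    "snp_b": (0.50, 0.45),
    "snp_c": (1.00, 0.00),
}

# Rank SNPs from most to least differentiated by FST.
ranked = sorted(example_snps,
                key=lambda s: fst_two_pop(*example_snps[s]),
                reverse=True)

for snp in ranked:
    p_h, p_c = example_snps[snp]
    print(snp, round(delta(p_h, p_c), 3), round(fst_two_pop(p_h, p_c), 3))
```

A SNP fixed for opposite alleles in the two groups scores 1 on both statistics, while a SNP with identical frequencies scores 0, which is why markers ranked highly by these measures are natural AIM candidates.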

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-026-04921-w.

PMID:42371591 | PMC:PMC13310212 | DOI:10.1007/s13205-026-04921-w

By Nevin Manimala
