Elucidating genetic diversity within wild forms of modern crops is essential for understanding domestication and the potential uses of wild germplasm. Gossypium hirsutum is a predominant source of natural plant fibers and the most widely cultivated cotton species. Wild forms of G. hirsutum are challenging to distinguish from feral derivatives, and truly wild populations are uncommon. Here we characterize a population from Mound Key Archaeological State Park, Florida, using genome-wide SNPs extracted from 25 individuals across three sites. Our results reveal that this population is genetically dissimilar from other known wild, landrace, and domesticated cottons, and likely represents a pocket of previously unrecognized wild genetic diversity. The unexpected level of divergence between the Mound Key population and other wild cotton populations suggests that the species may harbor other remnant and genetically distinct populations that are geographically scattered in suitable habitats throughout the Caribbean. Our work thus has broader conservation genetic implications and suggests that further exploration of natural diversity in this species is warranted.
Discovery of genetic diversity left behind by the bottleneck of cotton domestication
This needs some work to clean up the code and put things in a logical order! Weixuan is on it 😝
- All 65 AD1 samples' raw reads were trimmed with Trimmomatic.
- Trimmed reads were processed with sentieon-dnaseq: mapping to the reference genome, GVCF calling, and then VCF calling.
- Biallelic SNPs were used to estimate population genetic groups with PLINK (PCA) and LEA.
- A rooted NJ tree was built with PLINK and ape from biallelic SNPs, with two additional AD4 samples included as outgroups.
- Pi, Dxy, and Fst were calculated with pixy for the 65 AD1 samples across the four groups and MK cotton.
- VCFtools was used to calculate He and Fis.
- PopLDdecay was used to estimate LD decay in all five groups.
- The novel SNPs were tabulated with bcftools and awk.
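The trimming step can be illustrated with a simplified sliding-window trimmer in the spirit of Trimmomatic's SLIDINGWINDOW and MINLEN steps. This is a sketch, not Trimmomatic itself: the window size, quality threshold, and minimum length are illustrative defaults rather than the settings used for these samples, qualities are assumed to be Phred+33, and the read is cut at the start of the first failing window, which may not match Trimmomatic's exact cut point.

```python
from __future__ import annotations

from typing import Optional, Tuple


def phred33_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(
    seq: str,
    qual: str,
    window: int = 4,
    required_q: float = 20,
    min_len: int = 36,
) -> Optional[Tuple[str, str]]:
    """Cut the read at the first window whose mean quality is below required_q.

    Returns the trimmed (seq, qual) pair, or None if the read ends up shorter
    than min_len (similar in effect to a MINLEN filter).
    """
    scores = phred33_scores(qual)
    cut = len(seq)
    # Scan windows from the 5' end and stop at the first low-quality one.
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < required_q:
            cut = start
            break
    seq, qual = seq[:cut], qual[:cut]
    return (seq, qual) if len(seq) >= min_len else None


# The last four bases have quality 2 ('#'); the rest have quality 40 ('I').
print(sliding_window_trim("ACGTACGTAC", "IIIIII####", min_len=3))  # ('ACGTA', 'IIIII')
```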
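Restricting the call set to biallelic SNPs and converting genotypes to alternate-allele dosages (0, 1, 2) usually precedes PCA or ancestry estimation. The sketch below works on raw VCF text lines and the function names are hypothetical; in practice this filtering is more likely done with bcftools (for example `bcftools view -m2 -M2 -v snps`) than by hand.

```python
from __future__ import annotations

from typing import Optional

NUCLEOTIDES = {"A", "C", "G", "T"}


def is_biallelic_snp(vcf_line: str) -> bool:
    """True if REF and ALT are each a single nucleotide (exactly one ALT allele)."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    return ref in NUCLEOTIDES and alt in NUCLEOTIDES and ref != alt


def gt_to_dosage(sample_field: str) -> Optional[int]:
    """Convert a sample's GT subfield (e.g. 0/1 or 1|1) to an ALT-allele dosage."""
    gt = sample_field.split(":")[0].replace("|", "/")
    alleles = gt.split("/")
    if "." in alleles:
        return None  # missing genotype
    return sum(int(a) for a in alleles)


def dosages(vcf_line: str) -> list[Optional[int]]:
    """Dosages for every sample column (column 10 onward) of a VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    return [gt_to_dosage(f) for f in fields[9:]]


record = "A01\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1|1:8\t./.:0"
if is_biallelic_snp(record):
    print(dosages(record))  # [1, 2, None]
```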
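PCA on biallelic SNPs can be sketched as an eigen-decomposition of a standardized sample-by-sample relationship matrix, which is similar in spirit to, though not identical with, what PLINK's `--pca` computes. This minimal version uses only the standard library and power iteration to obtain PC1. It assumes complete dosages with no missing data and at least one polymorphic SNP, and the helper names are hypothetical.

```python
import math
import random


def standardize(dosages):
    """Center and scale each SNP (column) by its expected binomial SD.

    dosages: list of samples, each a list of ALT-allele dosages (0, 1, 2).
    """
    n = len(dosages)
    columns = []
    for j in range(len(dosages[0])):
        col = [row[j] for row in dosages]
        p = sum(col) / (2 * n)  # ALT allele frequency
        sd = math.sqrt(2 * p * (1 - p))
        if sd == 0:
            continue  # monomorphic SNPs carry no information
        columns.append([(x - 2 * p) / sd for x in col])
    return [[col[i] for col in columns] for i in range(n)]


def relationship_matrix(z):
    """Sample-by-sample matrix Z Z^T / M for M standardized SNPs."""
    m = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / m for zj in z] for zi in z]


def pc1(dosages, iterations=200, seed=0):
    """Leading eigenvector of the relationship matrix via power iteration."""
    grm = relationship_matrix(standardize(dosages))
    rng = random.Random(seed)
    # Random start: a vector of ones would be mapped to zero, because
    # centering makes every row of the matrix sum to zero.
    v = [rng.uniform(-1, 1) for _ in grm]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in grm]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v  # the overall sign of a PC is arbitrary


# Two clearly separated groups of three samples each.
samples = [[0, 0, 0]] * 3 + [[2, 2, 2]] * 3
scores = pc1(samples)
print([round(x, 3) for x in scores])
```

The two groups land on opposite sides of zero along PC1; which side is positive depends on the random start.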
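The NJ tree step rests on the Saitou-Nei neighbor-joining algorithm. The sketch below implements the standard algorithm on a pairwise distance matrix (for example, one derived from SNP genotypes) and returns an unrooted Newick string. Rooting on the AD4 outgroups would be a separate step, for example with ape's `root()`. Tie-breaking and number formatting here are arbitrary choices, and branch lengths can come out negative on non-additive data.

```python
def neighbor_joining(labels, dist):
    """Saitou-Nei neighbor joining; returns an unrooted Newick string.

    labels: taxon names; dist: symmetric matrix (list of lists) of distances.
    Requires at least three taxa.
    """
    labels = list(labels)
    d = [list(row) for row in dist]
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in d]
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined taxa to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [
            f"({labels[i]}:{li:.4g},{labels[j]}:{lj:.4g})"
        ]
    # Join the last three nodes at a single central node.
    a, b, c = labels
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


names = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```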
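pixy computes π and Dxy as total pairwise differences divided by total pairwise comparisons, with invariant sites included, which keeps missing data from biasing the estimates. The sketch below shows that ratio on per-site allele counts. For Fst it uses a Hudson-style ratio, 1 - mean(within-population π) / Dxy; pixy's Fst estimator is configurable, so this is an illustrative stand-in and may not match pixy's output. The population names and counts are made up.

```python
from __future__ import annotations

from typing import Sequence, Tuple

# Per-site allele counts (REF count, ALT count) for one population,
# invariant sites included.
Counts = Sequence[Tuple[int, int]]


def pi_within(sites: Counts) -> float:
    """Differences over comparisons among all allele pairs within a population."""
    diffs = comps = 0
    for ref, alt in sites:
        n = ref + alt
        diffs += ref * alt  # pairs carrying different alleles
        comps += n * (n - 1) // 2  # all pairs of sampled alleles
    return diffs / comps


def dxy(sites1: Counts, sites2: Counts) -> float:
    """Differences over comparisons for allele pairs drawn one from each population."""
    diffs = comps = 0
    for (r1, a1), (r2, a2) in zip(sites1, sites2):
        diffs += r1 * a2 + a1 * r2
        comps += (r1 + a1) * (r2 + a2)
    return diffs / comps


def hudson_fst(sites1: Counts, sites2: Counts) -> float:
    """Hudson-style Fst: 1 - mean within-population pi / Dxy."""
    within = (pi_within(sites1) + pi_within(sites2)) / 2
    return 1 - within / dxy(sites1, sites2)


# Two diploid individuals (four alleles) per population, three sites.
pop_a = [(4, 0), (4, 0), (2, 2)]
pop_b = [(4, 0), (0, 4), (2, 2)]
print(pi_within(pop_a), dxy(pop_a, pop_b), hudson_fst(pop_a, pop_b))
```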
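He and Fis can be illustrated from genotype dosages: expected heterozygosity at a site is 2pq, and a per-individual F compares observed with expected homozygous site counts, F = (O_hom - E_hom) / (N - E_hom), which is the form of the F column that VCFtools `--het` reports. This sketch omits any small-sample corrections VCFtools may apply, takes allele frequencies as given, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence


def expected_het(site_genotypes: Sequence[Optional[int]]) -> float:
    """Expected heterozygosity 2pq at one site (dosages 0/1/2, None = missing)."""
    called = [g for g in site_genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return 2 * p * (1 - p)


def individual_f(
    genotypes: Sequence[Optional[int]], alt_freqs: Sequence[float]
) -> float:
    """Method-of-moments F from observed vs expected homozygous site counts."""
    o_hom = e_hom = 0.0
    n_sites = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing genotypes
        n_sites += 1
        if g != 1:
            o_hom += 1
        e_hom += 1 - 2 * p * (1 - p)
    return (o_hom - e_hom) / (n_sites - e_hom)


freqs = [0.5, 0.5, 0.5, 0.5]
print(individual_f([0, 2, 0, 2], freqs))  # 1.0: homozygous at every site
print(individual_f([1, 1, 1, 1], freqs))  # -1.0: heterozygous at every site
```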
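LD decay is typically summarized by averaging r² over SNP pairs within distance bins, which is what PopLDdecay reports. In this sketch r² is the squared correlation of genotype dosages, a common approximation for unphased data; that choice, the distance cutoff, and the bin size are assumptions, and PopLDdecay's own estimator and binning may differ. Positions are assumed sorted and genotypes complete.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation of two SNPs' genotype dosages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined when a SNP is monomorphic
    return sxy * sxy / (sxx * syy)


def ld_decay(positions, genotypes, max_dist, bin_size):
    """Mean r^2 per distance bin for SNP pairs at most max_dist apart.

    positions must be sorted; genotypes[i] holds SNP i's dosages per sample.
    Returns {bin start (bp): mean r^2}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # positions are sorted, so later SNPs are farther away
            r2 = r_squared(genotypes[i], genotypes[j])
            if r2 is None:
                continue
            sums[dist // bin_size] += r2
            counts[dist // bin_size] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(counts)}


positions = [100, 150, 1200]
genotypes = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 0, 1, 0]]
print(ld_decay(positions, genotypes, max_dist=2000, bin_size=500))
```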
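The README does not define "novel SNPs". One plausible reading, assumed here, is sites where the MK group carries an allele that is absent from every other group. The sketch below flags such sites from per-group allele counts and tallies them by chromosome, roughly what a bcftools query piped into awk might produce. The input layout and the group name `G1` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def private_allele_sites(site_counts, focal="MK"):
    """Sites where the focal group carries an allele absent from all other groups.

    site_counts maps (chrom, pos) -> {group: (ref_count, alt_count)}.
    """
    hits = []
    for site, groups in sorted(site_counts.items()):
        others = [c for g, c in groups.items() if g != focal]
        for allele in (0, 1):  # 0 = REF, 1 = ALT
            if groups[focal][allele] > 0 and all(c[allele] == 0 for c in others):
                hits.append(site)
                break
    return hits


def tabulate_by_chrom(sites):
    """Count flagged sites per chromosome."""
    return Counter(chrom for chrom, _ in sites)


counts = {
    ("A01", 100): {"MK": (2, 2), "G1": (4, 0)},  # ALT seen only in MK
    ("A01", 200): {"MK": (2, 2), "G1": (3, 1)},  # ALT shared with G1
    ("D01", 50): {"MK": (0, 4), "G1": (4, 0)},  # MK fixed for ALT
}
hits = private_allele_sites(counts)
print(hits, tabulate_by_chrom(hits))
```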