Identification of Ancestry Informative Markers in the Domestic Horse

Felicetti, Michela; Silvestrelli, Maurizio
2012

Abstract

Ancestry informative markers (AIMs), loci that effectively capture variation within the population(s) of interest, can facilitate candidate gene and fine-structure association studies by allowing efficient control of population stratification. AIMs for the domestic horse were identified from 43,106 autosomal SNPs using a cohort of 807 horses from 35 breeds. Velicer’s minimum average partial test was used together with principal component analysis to identify 33 significant principal components (PCs). The relationship between individual SNPs and PCs was assessed by the squared correlation (R²) between each SNP and each PC; the statistical significance of individual SNPs was then determined from a null distribution of R² values created by permuting genotypes within each locus. AIMs were selected from the set of SNPs by forward, step-wise, multivariate linear regression on the 33 significant PCs, selecting loci that capture the largest proportion of variation across the PCs while avoiding the inclusion of loci in linkage disequilibrium. An overall R² value was calculated for each set of AIMs by dividing the variance explained by those AIMs by the total variance across the 33 PCs. A set of 800 SNPs accounts for all variation across the 33 PCs (R² = 1), and subsets of 150, 300, and 500 SNPs account for 83%, 93%, and 98% of the variation, respectively. We demonstrated that these AIMs capture a significantly greater proportion of variance than randomly selected subsets of SNPs. The ability of AIMs to recapitulate admixture and clustering analyses in STRUCTURE, and their ability to correct for population stratification in association analysis, are being assessed.
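
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the pipeline it describes, run on simulated genotypes rather than the horse data. Several points are assumptions made for illustration: the function names (`simulate_genotypes`, `top_pcs`, `snp_pc_r2`, `permutation_null`, `forward_select`) are hypothetical, the number of PCs `k` is passed in by hand rather than chosen with Velicer's minimum average partial test, and linkage disequilibrium is handled only implicitly, by residualizing the remaining SNPs on those already selected, which may differ from the authors' procedure.

```python
import numpy as np

def simulate_genotypes(n_per_pop=30, n_snps=200, seed=0):
    """Toy 0/1/2 genotypes for two populations (illustrative data only)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(2, n_snps))  # one allele-frequency row per population
    pops = [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs]
    return np.vstack(pops)

def top_pcs(genotypes, k):
    """Scores of the first k principal components (samples x k)."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def snp_pc_r2(genotypes, pcs):
    """Squared Pearson correlation between every SNP and every PC (SNPs x PCs)."""
    x = genotypes - genotypes.mean(axis=0)
    y = pcs - pcs.mean(axis=0)
    x_norm = np.linalg.norm(x, axis=0)
    y_norm = np.linalg.norm(y, axis=0)
    x_norm[x_norm == 0] = np.inf  # monomorphic SNPs get R^2 = 0
    return ((x.T @ y) / np.outer(x_norm, y_norm)) ** 2

def permutation_null(genotypes, pcs, n_perm, seed=0):
    """Null R^2 values obtained by shuffling genotypes independently within each locus."""
    rng = np.random.default_rng(seed)
    null = np.empty((n_perm,) + (genotypes.shape[1], pcs.shape[1]))
    for i in range(n_perm):
        order = rng.random(genotypes.shape).argsort(axis=0)  # one random order per column
        shuffled = np.take_along_axis(genotypes, order, axis=0)
        null[i] = snp_pc_r2(shuffled, pcs)
    return null

def forward_select(genotypes, pcs, n_select, tol=1e-8):
    """Greedy forward selection of SNPs that jointly explain the PC scores.

    Returns the chosen SNP indices and the overall R^2 after each step,
    i.e. variance of the PCs explained by the chosen SNPs / total PC variance.
    """
    x = genotypes - genotypes.mean(axis=0)
    y = pcs - pcs.mean(axis=0)
    ss0 = np.sum(x ** 2, axis=0)
    total = np.sum(y ** 2)
    chosen, curve = [], []
    for _ in range(n_select):
        ss = np.sum(x ** 2, axis=0)
        usable = ss > tol * ss0  # skips selected or fully redundant SNPs
        raw = np.sum((x.T @ y) ** 2, axis=1)
        gain = np.where(usable, raw / np.where(usable, ss, 1.0), -np.inf)
        best = int(np.argmax(gain))
        if not np.isfinite(gain[best]):
            break  # nothing left that adds information
        q = x[:, best] / np.sqrt(ss[best])
        y = y - np.outer(q, q @ y)  # remove what the new SNP explains
        x = x - np.outer(q, q @ x)  # residualize the remaining SNPs on it
        chosen.append(best)
        curve.append(1.0 - np.sum(y ** 2) / total)
    return chosen, curve
```

Calling `forward_select(g, top_pcs(g, k), n)` on a genotype matrix `g` returns the selected SNP indices and a cumulative R² curve analogous to the 83%, 93%, and 98% figures quoted above; the values obtained on simulated data are of course not comparable to the reported ones.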
Use this identifier to cite or link to this document: https://hdl.handle.net/11391/764499