
Bayesian Gaussian Mixture Models for High Density Genotyping Arrays
Chiara Sabatti, Department of Statistics, UCLA
Kenneth Lange, Department of Statistics, UCLA
ABSTRACT: Affymetrix’s SNP (single nucleotide polymorphism) genotyping chips have increased the
scope and decreased the cost of gene mapping studies. Because each SNP is queried by multiple
DNA probes, the chips present interesting challenges in genotype calling. Traditional
clustering methods distinguish the three genotypes of a SNP fairly well given a large enough
sample of unrelated individuals or a training sample of known genotypes. The present paper
describes our attempt to improve genotype calling by constructing Gaussian penetrance
models with empirically derived priors. The priors stabilize parameter estimation and borrow
information collectively gathered on tens of thousands of SNPs. When data from related family
members are available, Gaussian penetrance models capture the correlations in signals between
relatives. With these advantages in mind, we apply the models to Affymetrix probe intensity
data on 10,000 SNPs gathered on 63 genotyped individuals spread over eight pedigrees. We
integrate the genotype calling model with pedigree analysis and examine a sequence of symmetry
hypotheses involving the correlated probe signals. The symmetry hypotheses raise novel
mathematical issues of parameterization. Using the Bayesian information criterion (BIC), we
select the best combination of symmetry assumptions. Compared with the genotype calls produced
by Affymetrix’s software, our approach substantially reduces the number of no-calls and
quantifies the level of confidence in all calls. Once pedigree analysis software can accommodate
soft penetrances, we can expect to see more reliable association and linkage studies with less
wasted genotyping data.
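
As a rough illustration of the ideas in the abstract, the sketch below shows how a Gaussian penetrance model can turn a probe signal into posterior genotype probabilities, issue a no-call when confidence is low, and score competing models with BIC. It is not the authors' implementation. The one-dimensional signal, the Hardy-Weinberg prior, the 0.95 threshold, and all function names and parameter values are assumptions made for this example; the paper itself works with multiple correlated probe signals, empirically derived priors, and pedigree data.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hardy_weinberg_prior(freq_a):
    """Genotype prior (AA, AB, BB) under Hardy-Weinberg equilibrium (an assumption here)."""
    freq_b = 1.0 - freq_a
    return (freq_a ** 2, 2.0 * freq_a * freq_b, freq_b ** 2)


def genotype_posterior(signal, means, sds, prior):
    """Posterior genotype probabilities given one summarized probe signal.

    Each genotype has its own Gaussian penetrance (mean, sd); Bayes' rule
    combines these likelihoods with the genotype prior.
    """
    weights = [p * normal_pdf(signal, m, s) for p, m, s in zip(prior, means, sds)]
    total = sum(weights)
    return [w / total for w in weights]


def call_genotype(posterior, threshold=0.95):
    """Return (genotype, confidence); genotype is None for a no-call."""
    best = max(range(len(posterior)), key=posterior.__getitem__)
    if posterior[best] >= threshold:
        return GENOTYPES[best], posterior[best]
    return None, posterior[best]


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion in the form -2 log L + k log n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical cluster parameters for a single SNP.
    means = (-1.0, 0.0, 1.0)
    sds = (0.2, 0.2, 0.2)
    prior = hardy_weinberg_prior(0.5)

    for signal in (-1.0, -0.5):
        posterior = genotype_posterior(signal, means, sds, prior)
        print(signal, call_genotype(posterior))
```

Keeping the full posterior vector, rather than only the hard call, is what the abstract refers to as a soft penetrance: downstream pedigree analysis could in principle weight each genotype by its probability instead of discarding ambiguous signals as no-calls.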
SUGGESTED CITATION: Chiara Sabatti and Kenneth Lange,
"Bayesian Gaussian Mixture Models for High Density Genotyping Arrays"
(April 1, 2005).
Department of Statistics, UCLA.
Department of Statistics Papers.
Paper 2005040102.
http://repositories.cdlib.org/uclastat/papers/2005040102