eScholarship Repository, California Digital Library

Department of Statistics, UCLA
University of California, Los Angeles

Bayesian Gaussian Mixture Models for High Density Genotyping Arrays
Chiara Sabatti, Department of Statistics, UCLA
Kenneth Lange, Department of Statistics, UCLA

April 1, 2005 (PDF file, 4.0 MB)

ABSTRACT:
Affymetrix’s SNP (single nucleotide polymorphism) genotyping chips have increased the scope and decreased the cost of gene mapping studies. Because each SNP is queried by multiple DNA probes, the chips present interesting challenges in genotype calling. Traditional clustering methods distinguish the three genotypes of a SNP fairly well given a large enough sample of unrelated individuals or a training sample of known genotypes. The present paper describes our attempt to improve genotype calling by constructing Gaussian penetrance models with empirically derived priors. The priors stabilize parameter estimation and borrow information collectively gathered on tens of thousands of SNPs. When data from related family members are available, Gaussian penetrance models capture the correlations in signals between relatives. With these advantages in mind, we apply the models to Affymetrix probe intensity data on 10,000 SNPs gathered on 63 genotyped individuals spread over eight pedigrees. We integrate the genotype calling model with pedigree analysis and examine a sequence of symmetry hypotheses involving the correlated probe signals. The symmetry hypotheses raise novel mathematical issues of parameterization. Using the BIC criterion, we select the best combination of symmetry assumptions. Compared to the genotype calling results obtained from Affymetrix’s software, we are able to reduce the number of no-calls substantially and quantify the level of confidence in all calls. Once pedigree analysis software can accommodate soft penetrances, we can expect to see more reliable association and linkage studies with less wasted genotyping data.
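ILLUSTRATION:
The idea of a soft genotype call with a quantified confidence can be sketched in a few lines of Python. This is a deliberately simplified, one-dimensional sketch and not the authors' model: it assumes a single summary intensity per individual, one Gaussian per genotype, Hardy-Weinberg prior genotype frequencies, and no pedigree information. The names `genotype_posteriors`, `call_genotype`, `PARAMS`, and the 0.95 threshold are hypothetical and chosen only for illustration.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def genotype_posteriors(x, params, allele_freq):
    """Posterior probability of each genotype given one intensity x.

    params maps genotype -> (mean, sd) of its Gaussian (hypothetical values).
    Priors assume Hardy-Weinberg proportions for allele frequency p of A.
    """
    p = allele_freq
    if not 0.0 < p < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    priors = {"AA": p * p, "AB": 2.0 * p * (1.0 - p), "BB": (1.0 - p) ** 2}
    # Work in log space so extreme intensities do not underflow to 0/0.
    log_w = {g: math.log(priors[g]) + log_normal_pdf(x, *params[g])
             for g in priors}
    top = max(log_w.values())
    weights = {g: math.exp(lw - top) for g, lw in log_w.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def call_genotype(posteriors, threshold=0.95):
    """Return the most probable genotype, or None (a no-call) if its
    posterior probability falls below the threshold."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else None


# Hypothetical cluster parameters: genotype -> (mean, sd).
PARAMS = {"AA": (1.0, 0.2), "AB": (0.0, 0.2), "BB": (-1.0, 0.2)}

if __name__ == "__main__":
    for x in (0.9, 0.5):
        post = genotype_posteriors(x, PARAMS, allele_freq=0.5)
        print(x, call_genotype(post), post)
```

Under these assumed parameters, an intensity close to one cluster mean yields a confident call, while an intensity midway between two clusters yields a no-call whose posterior probabilities could still be passed on as soft penetrances rather than discarded.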

SUGGESTED CITATION:
Chiara Sabatti and Kenneth Lange, "Bayesian Gaussian Mixture Models for High Density Genotyping Arrays" (April 1, 2005). Department of Statistics, UCLA. Department of Statistics Papers. Paper 2005040102.
http://repositories.cdlib.org/uclastat/papers/2005040102

 