-
LPPH - A Linear-Time Algorithm for Perfect Phylogeny Haplotyping
LPPH is a linear-time program that takes genotypes as input and determines whether they can be resolved into haplotypes that fit a tree model (i.e., a perfect phylogeny, or coalescent).
In population-genetic terms, LPPH determines whether a set of SNP genotypes can be explained by haplotype pairs that could have evolved on a coalescent under the no-recombination, infinite-sites model. It thus does for SNP genotype data what the three-gamete or four-gamete test (for a rooted or an unrooted tree, respectively) does for haplotype data.
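For comparison, the gamete tests mentioned above can be sketched in a few lines of Python. This is only an illustration of the tests on already-phased haplotype data, not the LPPH algorithm itself, which works on unphased genotypes. The function names and the encoding (each haplotype a string of '0' and '1', with '0' taken as the ancestral state in the rooted case) are assumptions made for this sketch.

```python
from itertools import combinations


def four_gamete_test(haplotypes):
    """Unrooted case: pass unless some pair of sites shows all of 00, 01, 10, 11."""
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site combinations ("gametes") seen at sites i, j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


def three_gamete_test(haplotypes):
    """Rooted case (assumes '0' is ancestral): fail if 01, 10 and 11 all occur."""
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    forbidden = {("0", "1"), ("1", "0"), ("1", "1")}
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if forbidden <= gametes:
            return False
    return True
```

Both functions compare every pair of sites, so they take time quadratic in the number of sites; they are meant to show what is being tested, not how to test it efficiently.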
-
CCHI - Clark Consensus Haplotype Inference
Program CCHI implements the algorithm in the following paper:
Analysis and Exploration of the Use of Rule-Based Algorithms and Consensus Methods for the Inferral of Haplotypes.
Steven Hecht Orzack, Daniel Gusfield, Jeffrey Olson, Steven Nesbitt, Lakshman Subrahmanyan, and Vincent P. Stanton, Jr.
Genetics, Vol. 165, 915-928, October 2003.
Abstract of the paper:
The difficulty of experimental determination of haplotypes from phase-unknown genotypes has stimulated the development of nonexperimental inferral methods. One well-known approach for a group of unrelated individuals involves using the trivially deducible haplotypes (those found in individuals with zero or one heterozygous sites) and a set of rules to infer the haplotypes underlying ambiguous genotypes (those with two or more heterozygous sites). Neither the manner in which this "rule-based" approach should be implemented nor the accuracy of this approach has been adequately assessed. We implemented eight variations of this approach that differed in how a reference list of haplotypes was derived and in the rules for the analysis of ambiguous genotypes. We assessed the accuracy of these variations by comparing predicted and experimentally determined haplotypes involving nine polymorphic sites in the human apolipoprotein E (APOE) locus. The eight variations resulted in substantial differences in the average number of correctly inferred haplotype pairs. More than one set of inferred haplotype pairs was found for each of the variations we analyzed, implying that the rule-based approach is not sufficient by itself for haplotype inferral, despite its appealing simplicity. Accordingly, we explored consensus methods in which multiple inferrals for a given ambiguous genotype are combined to generate a single inferral; we show that the set of these "consensus" inferrals for all ambiguous genotypes is more accurate than the typical single set of inferrals chosen at random. We also use a consensus prediction to divide ambiguous genotypes into those whose algorithmic inferral is certain or almost certain and those whose less certain inferral makes molecular inferral preferable.
This work was supported by NSF grant IIS 0513910.
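The starting point of the rule-based approach described in the abstract above can be sketched as follows. This is a minimal illustration, not the CCHI implementation: the paper evaluates eight variations whose details are not given here. The function names and the genotype encoding ('0' and '1' for homozygous sites, '2' for a heterozygous site) are assumptions made for this sketch.

```python
def known_haplotypes(genotypes):
    """Collect haplotypes trivially deducible from genotypes with 0 or 1 heterozygous sites."""
    known = set()
    for g in genotypes:
        het = [i for i, site in enumerate(g) if site == "2"]
        if len(het) == 0:
            # Fully homozygous: both haplotypes equal the genotype string.
            known.add(g)
        elif len(het) == 1:
            # One heterozygous site: the two haplotypes differ only at that site.
            i = het[0]
            known.add(g[:i] + "0" + g[i + 1:])
            known.add(g[:i] + "1" + g[i + 1:])
    return known


def complement(genotype, haplotype):
    """Return the partner haplotype if `haplotype` is compatible with `genotype`, else None."""
    partner = []
    for g, h in zip(genotype, haplotype):
        if g == "2":
            # Heterozygous site: the partner carries the other allele.
            partner.append("1" if h == "0" else "0")
        elif g == h:
            partner.append(g)
        else:
            # Homozygous site that disagrees with the haplotype: not compatible.
            return None
    return "".join(partner)
```

An ambiguous genotype (two or more heterozygous sites) can then be resolved by finding a known haplotype for which `complement` returns a partner. Which known haplotype is tried first generally affects the result, which is why more than one set of inferred pairs can arise from the same data.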