-
Yun Song, Zhihong Ding, Dan Gusfield, Charles Langley, and Yufeng Wu. Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover Recombination in the Derivation of SNP Sequences in Populations. In Proceedings of RECOMB 2006, pages 231-245, 2006. ( paper ) ( program )
Meiotic recombination is a fundamental biological event and one of the principal evolutionary forces responsible for shaping genetic variation within species. In addition to its fundamental role, recombination is central to several critical applied problems. The most important example is ``association mapping" in populations, which is widely hoped to help
find genes that influence genetic diseases. Hence, a great deal of recent attention has focused on problems of inferring the historical derivation of sequences in populations when both
mutations and recombinations have occurred. In the algorithms literature, most of that recent work has been directed to single-crossover recombination. However, gene-conversion is an
important, and more common, form of (two-crossover) recombination which has been much less investigated in the algorithms literature. In this paper we explicitly incorporate gene-conversion into discrete methods to study historical recombination. We are concerned with algorithms for identifying and locating the extent of historical crossing-over and gene-conversion (along with single-nucleotide mutation), and problems of constructing full putative
histories of those events. The novel technical issues concern the incorporation of gene-conversion into recently developed discrete methods that compute lower and upper-bound information on the amount of needed recombination without gene-conversion. We first examine the most natural extension of the lower bound methods, showing that the extension can be computed efficiently, but that this extension can only yield weak lower bounds. We then develop additional ideas that lead to higher lower bounds, and show how to solve, via integer-linear programming, a more biologically realistic version of the lower bound problem. We also show how to compute effective upper-bounds on the number of needed single-crossovers and gene-conversions, along with explicit networks showing a putative history of mutations, single-crossovers and gene-conversions. We validate the significance of these methods by showing that they can be effectively used to distinguish simulation-derived sequences generated without gene-conversion from sequences that were generated with gene-conversion. We apply the methods to recently studied sequences of Arabidopsis thaliana, identifying many more regions in the sequences than were previously identified, where gene-conversion may have played a significant role.
-
Zhihong Ding, Vladimir Filkov, and Dan Gusfield. A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem. In Proceedings of RECOMB 2005, pages 585-600, 2005. ( paper ) ( program )
The full journal version of this paper has appeared in Journal of
Computational Biology, 13(2):522-553, 2006.
Since the introduction of the Perfect Phylogeny Haplotyping (PPH) Problem in Recomb 2002, the problem of finding a linear-time (deterministic, worst-case)
solution for it has remained open, despite broad interest in the PPH problem and a series of
papers on various aspects of it. In this paper we solve the open problem, giving a practical,
deterministic linear-time algorithm based on a simple data-structure and simple operations on
it. The method is straightforward to program and has been fully implemented. Simulations show
that it is much faster in practice than prior nonlinear methods. The value of a linear-time
solution to the PPH problem is partly conceptual and partly for use in the inner-loop of
algorithms for more complex problems, where the PPH problem must be solved repeatedly.
|