Coalescence package for Mesquite
July 2001, version 0.98
This package of modules and library classes provides a few basic calculations, simulations and visualizations concerning gene tree coalescence in population genetics. It is a beta-test version. Much more can be done, and there are opportunities for other programmers to contribute coalescence calculations to Mesquite.
I plan to post the source code for this package soon under the LGPL license.
Just as MacClade was designed to provide tools to ask "what if the phylogenetic history had been like this", Mesquite is designed to extend such questions to other realms such as population genetics. Using, for instance, the insert node and the lineage width tools in the Tree Window, one can construct a population history with various expansions and contractions, and explore its consequences for the gene trees contained within it via simulations. The best way to understand what the package can do is to look at the example files (see below).
The coalescence package currently includes:
- Simple neutral coalescence simulations. Coalescent gene trees are simulated either within a single population, or within a tree of populations/species. These simulate the gene tree itself (topology and branch lengths); to simulate nucleotide sequence data, you'll need to simulate character evolution on these gene trees using the stochastic character evolution package.
- Reconstruction of the history of a gene tree within a species tree so as to minimize deep coalescences, and a tree drawing routine for species trees that draws their gene trees coalescing within them.
- Counting of Slatkin & Maddison's "s" for the discord between a gene tree and the different populations from which the genes were sampled.
- Counting of "deep coalescences" in the style of Maddison (1997, Syst. Biol.).
- Various other modules that could be used outside of the context of coalescence, but which otherwise aren't part of the basic package of Mesquite modules (e.g., insert node, tree depth).
Thus, the modules are mostly focused on the topology of the gene tree and its relationship with a species tree. As shown in the examples, the package can be used to test some basic hypotheses in population genetics, especially hypotheses of population histories.
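To make the "s" statistic mentioned above concrete, here is a minimal sketch in Python. It is not the code of the SlatkinMaddisonS module. It assumes the usual reading of Slatkin & Maddison's s as the minimum number of migration events, obtained by treating the population of origin as an unordered character and counting parsimony steps on the gene tree with Fitch's algorithm. The nested-tuple tree representation, the function name slatkin_maddison_s and the dictionary population_of are hypothetical, and the sketch handles only fully dichotomous, rooted gene trees.

```python
def slatkin_maddison_s(gene_tree, population_of):
    """Count Fitch parsimony steps of the 'population' character on a gene tree.

    gene_tree: nested 2-tuples of gene names (strings), e.g. (("a1", "a2"), "b1").
    population_of: dict mapping each gene name to its population (hypothetical
    stand-in for an Association of genes into populations).
    """
    def visit(node):
        # A terminal gene carries the population it was sampled from.
        if isinstance(node, str):
            return {population_of[node]}, 0
        left, right = node  # binary trees only in this sketch
        left_set, left_steps = visit(left)
        right_set, right_steps = visit(right)
        shared = left_set & right_set
        if shared:
            # The descendants agree on at least one state: no extra step.
            return shared, left_steps + right_steps
        # No shared state: one change (migration event) is required here.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(gene_tree)[1]


population_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
concordant = (("a1", "a2"), ("b1", "b2"))   # genes group by population
discordant = (("a1", "b1"), ("a2", "b2"))   # genes interdigitate
print(slatkin_maddison_s(concordant, population_of))
print(slatkin_maddison_s(discordant, population_of))
```

In this toy case the concordant tree needs one step and the interdigitated tree needs two; larger values of s indicate more discord between the gene tree and the division into populations.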
Effective population size is for the genes themselves, as if the organisms were haploid. Branch lengths of a population or species tree are treated as measured in units of generations. When a gene tree is drawn within a species tree, its branches are drawn green except if polytomies were automatically resolved to optimize fit to the species tree, in which case the resolved branches are drawn in magenta.
The modules included are:
- NeutralCoalescence ("Neutral Coalescence") -- Simulates neutral coalescence. Employed by CoalescentTrees and ContainedCoalescence in order to simulate gene trees.
- CoalescentTrees ("Coalescent Trees") -- Supplies trees simulated by a coalescent process. The effective population size is a parameter that can be adjusted.
- ContainedCoalescence ("Contained Coalescence within Current Tree") -- Supplies trees simulated by a coalescent process within the branches of a species tree obtained from a Tree Window or other tree context.
- ContainedCoalescenceMult ("Contained Coalescence in Species Trees") -- not yet ready.
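As an illustration of what simulating a gene tree within a single population involves, the following Python sketch uses the standard continuous-time (Kingman) approximation of the neutral coalescent. It is an assumption that this resembles what NeutralCoalescence does; the module may, for example, work generation by generation. Following the convention stated above, the effective population size n_e counts gene copies (as if haploid) and times are in generations, so k lineages coalesce at rate k(k-1)/(2 n_e) per generation. The function name simulate_coalescent and its parameters are hypothetical.

```python
import random


def simulate_coalescent(gene_names, n_e, seed=None):
    """Simulate one neutral coalescent gene tree in a single population.

    Returns (newick, depth): a Newick string with branch lengths in
    generations, and the time to the most recent common ancestor.
    n_e is the effective number of gene copies (haploid convention).
    """
    rng = random.Random(seed)
    # Each lineage is (newick text so far, height of its top node).
    lineages = [(name, 0.0) for name in gene_names]
    time = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Total coalescence rate: k*(k-1)/2 pairs, each at rate 1/n_e.
        rate = k * (k - 1) / (2.0 * n_e)
        time += rng.expovariate(rate)
        # Pick the two lineages that coalesce, uniformly at random.
        i, j = rng.sample(range(k), 2)
        (a, a_height), (b, b_height) = lineages[i], lineages[j]
        merged = (
            f"({a}:{time - a_height:.3f},{b}:{time - b_height:.3f})",
            time,
        )
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    newick, depth = lineages[0]
    return newick + ";", depth


newick, depth = simulate_coalescent(["g1", "g2", "g3", "g4"], n_e=1000, seed=1)
print(newick)
print(depth)
```

Only topology and branch lengths are produced here, as with the simulations described above; sequence data would have to be evolved on the resulting tree in a separate step.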
Calculations with gene trees:
- RecCoalescenceHistory ("Reconstruct Deep Coalescence") -- Reconstructs the fit of a gene tree into a species tree so as to minimize deep coalescences in the sense of W. Maddison (1997, Syst. Biol.). Also counts the deep coalescence cost of such a fit. Options are (1) to treat the gene tree as rooted or unrooted and (2) to allow polytomies in the gene tree to resolve automatically to minimize deep coalescences further.
- DeepCoalescencesG ("Deep Coalescences (gene tree)") -- Counts the cost in deep coalescences to fit a gene tree in a species tree; treats this as a value for the gene tree.
- DeepCoalescencesSp ("Deep Coalescences (species tree)") -- Counts the cost in deep coalescences to fit a gene tree in a species tree; treats this as a value for the species tree.
- SlatkinMaddisonS ("s of Slatkin & Maddison") -- Counts the s value of Slatkin and Maddison (s is a measure of the discordance between a gene tree and a division into populations). Requires an available Association of genes into populations.
- TreeDepth ("Tree Depth") -- Determines the depth of the tree, measured as the largest sum of branch lengths from the root to any terminal node.
- DeepCoalMultLoci ("Deep Coalescences Multiple Loci") -- not yet ready.
- ClosestCoalescence ("Closest coalescence between taxa") -- not yet ready.
- aCoalescencePkgIntro ("Coalescence Package Introduction") -- Introduces the coalescence package.
- LineageWidth ("Adjust lineage width") -- Allows the user to adjust the widths (e.g., effective population sizes) of branches of a population or species tree. This is not merely a graphical widening, but attaches a width parameter to the branches of the tree.
- InsertNode ("Insert Node") -- Inserts a node along a branch of a tree. This creates a node with only a single descendant. It can be used to break a branch into pieces, each of which is assigned its own effective population size (lineage width) and duration of time in generations (branch length).
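The deep coalescence cost used by RecCoalescenceHistory, DeepCoalescencesG and DeepCoalescencesSp can be sketched as follows. This is an illustrative Python version under stated assumptions, not the modules' code: the gene tree is treated as rooted, polytomies are not resolved automatically, each gene lineage is assumed to coalesce as recently as the species tree allows, and the cost is the sum over species tree branches of the number of gene lineages leaving the ancestral end of the branch, minus one. The nested-tuple trees, the function names and the dictionary species_of (standing in for a TaxaAssociation) are hypothetical.

```python
def leaf_set(tree):
    """Set of terminal names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(species_tree):
    """Leaf sets of every non-root node, one per species tree branch."""
    clades = []

    def visit(node, is_root):
        if not is_root:
            clades.append(leaf_set(node))
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(species_tree, True)
    return clades


def gene_branches(gene_tree, species_of):
    """For each gene tree branch, the species found below its lower and upper node."""
    branches = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([species_of[node]])
        child_sets = [visit(child) for child in node]
        here = frozenset().union(*child_sets)
        branches.extend((child_set, here) for child_set in child_sets)
        return here

    visit(gene_tree)
    return branches


def deep_coalescences(gene_tree, species_tree, species_of):
    """Count extra gene lineages summed over all species tree branches."""
    branches = gene_branches(gene_tree, species_of)
    total = 0
    for clade in species_clades(species_tree):
        # A gene branch leaves the top of this species branch if its lower
        # node fits inside the clade but its upper node does not.
        crossing = sum(
            1 for child, parent in branches
            if child <= clade and not parent <= clade
        )
        total += max(crossing - 1, 0)
    return total


species_tree = (("A", "B"), "C")
species_of = {"a": "A", "b": "B", "c": "C"}
print(deep_coalescences((("a", "b"), "c"), species_tree, species_of))
print(deep_coalescences((("a", "c"), "b"), species_tree, species_of))
```

In this example the gene tree that matches the species tree costs nothing, while the gene tree grouping a with c forces two lineages to pass through the branch ancestral to A and B, for a cost of one deep coalescence.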
The coalescence package depends on modules and libraries from the more general Taxa Associations package. The Taxa Associations package is not yet a standalone package, and is included in service of the coalescence routines. These are the included modules from the Taxa Associations package:
Management and Utilities:
- ManageAssociations ("Manage TaxaAssociation blocks") -- Reads and writes TaxaAssociation blocks to NEXUS files, and supervises their editing and manipulation by users.
- StoredAssociations ("Stored Taxa Associations") -- Supplies stored TaxaAssociations to calculations that need information on which taxa from one taxa block (e.g., genes) are associated with which taxa in another block (e.g., species).
- ManageDistributionBlock ("Read DISTRIBUTION blocks") -- Reads DISTRIBUTION blocks (e.g., used by Rod Page's GeneTree) in NEXUS files. When the file is subsequently written, the information is currently saved as separate TAXA, TREES and TaxaAssociation blocks.
Graphics and analysis:
- ContainedAssociates ("Contained Associates") -- Draws a species tree with broad branches, within which the reconstructed gene trees are drawn.
What remains to be done
Improvements to the existing modules remain to be made, including making them more efficient and checking their calculations. The package has little documentation at the moment.
Many other calculations taking a gene tree perspective could be done, including those that have nucleotide sequence evolution occurring along the branches of the gene tree. This would allow direct comparison against observed sequences, without being forced to reconstruct a gene tree from the sequences before comparison with Mesquite's results. The solution to this will come from the stochastic character evolution package, which can be combined with the coalescence package to generate nucleotide sequences evolved on coalescent trees.
There are three folders (directories) whose contents need to be in the correct place for Mesquite to be able to use them. These three directories are called (1) "coalesce", (2) "assoc" and (3) "treecomp". Find where you have Mesquite installed on your hard disk. These three directories should be in the "mesquite" directory within "Mesquite Folder".
There is a series of example data files in the directory "coalescence_examples". The files are self-explanatory; begin with the file whose name begins with "00".
Since this package of modules is a beta-test version, it SHOULD NOT BE USED FOR PUBLISHED OR PRESENTED RESULTS. That is, it is not yet citable.
© Copyright 2001 W. Maddison