Why Mesquite was made

We give two answers, the practical and the poetic, and a comment on the relationship between MacClade and Mesquite.


The practical answer

Mesquite represents a new approach to computing for evolutionary biology. In recent years there has been a proliferation of computer programs for phylogenetic analysis, each designed for some particular analysis (e.g., see Felsenstein's compilation of programs). As these often involve unique file formats and user interfaces, it is difficult for users to move from one to another. Users tend to become constrained to a few familiar analyses, since any given program can't do everything, and each program has costs in learning. As a programmer one would like to respond by making a program that does everything, but there are now too many analyses available or conceivable for a single programmer or programming team to keep up. We have felt these constraints with MacClade: some users perform particular analyses in MacClade not because they are the most appropriate analyses for their questions, but simply because they are available in a familiar program. We would like to add more flexibility to MacClade, but in a monolithic program this can be difficult to do, and even if it were easy, there are more proposed methods than we could maintain in MacClade.

Hence, our goal was to design a general system for phylogenetic computing to which different programmers could contribute modules. Bringing different analytical tools into a common system increases possible analyses more than additively; as we will explain, the effects are closer to multiplicative. In the end, the system has grown beyond being strictly phylogenetic, including capabilities for calculations involving characteristics of many organisms (e.g. population genetics and morphometrics) that need not involve phylogeny.

A second advantage of Mesquite is that it has a graphical user interface and yet will operate, more or less without modification, under different operating systems (being written in Java).

Modularity and flexibility

"Modularity" in computer programming might follow different models. It could follow the "Mr. Potato Head" model, in which there is a central program to which different peripheral calculations can be attached in specific places. This allows useful, but limited, flexibility. Or, modularity could follow the "Lego" model, in which building blocks are attached to other building blocks, and so on indefinitely. This allows nearly unlimited flexibility. Mesquite's modularity is something of a hybrid between these: there is a (small) central starting point to which modules attach, but from there modules can be attached to modules attached to modules, indefinitely, leading to great flexibility in the analyses that can be constructed.

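The hybrid model can be pictured as a tree of modules hanging from a small core. The Java sketch below is a hypothetical illustration only (the `ModuleNode` class and its `attach` method are invented here and are not Mesquite's API): every module, including the core, can have further modules attached, to any depth. The module names in the example chain are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class ModularityDemo {
    public static void main(String[] args) {
        ModuleNode core = buildExample();
        System.out.print(core.describe());
        System.out.println("modules: " + core.count() + ", depth: " + core.depth());
    }

    /** Builds an illustrative chain: core -> tree window -> legend -> calculation -> data source. */
    static ModuleNode buildExample() {
        ModuleNode core = new ModuleNode("Core");
        ModuleNode treeWindow = core.attach(new ModuleNode("Tree Window"));
        treeWindow.attach(new ModuleNode("Tree Source"));
        ModuleNode legend = treeWindow.attach(new ModuleNode("Tree Legend"));
        ModuleNode treelength = legend.attach(new ModuleNode("Treelength"));
        treelength.attach(new ModuleNode("Character Source"));
        return core;
    }
}

/** Hypothetical module node: any module may have further modules attached beneath it. */
class ModuleNode {
    private final String name;
    private final List<ModuleNode> attached = new ArrayList<>();

    ModuleNode(String name) { this.name = name; }

    /** Attaches a child module and returns it, so chains can be built step by step. */
    ModuleNode attach(ModuleNode child) {
        attached.add(child);
        return child;
    }

    /** Total number of modules in this subtree, including this one. */
    int count() {
        int n = 1;
        for (ModuleNode m : attached) n += m.count();
        return n;
    }

    /** Number of modules on the longest chain starting here. */
    int depth() {
        int max = 0;
        for (ModuleNode m : attached) max = Math.max(max, m.depth());
        return max + 1;
    }

    /** One line per module, indented two spaces per level of attachment. */
    String describe() {
        StringBuilder sb = new StringBuilder();
        describe(sb, 0);
        return sb.toString();
    }

    private void describe(StringBuilder sb, int level) {
        sb.append("  ".repeat(level)).append(name).append('\n');
        for (ModuleNode m : attached) m.describe(sb, level + 1);
    }
}
```

Running `main` prints the module tree indented by depth, followed by `modules: 6, depth: 5`. In this sketch the core only knows about what is attached directly to it; everything deeper is handled by the modules themselves.
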
To give an idea of the flexibility, consider the calculation of the parsimony score of a tree, the treelength. A treelength calculating module takes a tree as input and returns its length. Such a module belongs to the general class of modules that return a number when passed a tree. Other modules belonging to this class ("NumberForTree") could return the likelihood of the tree, or a measure of the asymmetry of the tree's branching, or a measure of the tree's discordance with a containing species tree. A Tree Legend module can be written (and has been) that displays the treelength in a legend in the tree window, but the Legend module is designed so that the user can choose to display any other number for the tree, such as its likelihood, asymmetry, or discordance. If a programmer creates a new module to calculate a number for a tree, such as the longest branch-length path from root to tip, and a user installs the module, then the longest path measurement would automatically become another option for the tree legend.
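A minimal Java sketch of this pattern follows. `NumberForTreeSketch`, `TreeLegend` and the other names are hypothetical, not Mesquite's actual classes, and the tree is a bare node structure. To keep it short, treelength and likelihood are left out; a tip count and the Colless index (a standard measure of tree asymmetry) stand in for them. "Installing" a module is modeled simply as adding it to a map that the legend reads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TreeLegendDemo {
    public static void main(String[] args) {
        Node tree = exampleTree();
        Map<String, NumberForTreeSketch> installed = defaultModules();
        TreeLegend legend = new TreeLegend(installed);
        System.out.println(legend.show("Colless asymmetry", tree));
        // "Installing" a new module only adds it to the registry;
        // TreeLegend itself is unchanged but now offers the new option.
        installed.put("Longest root-to-tip path", new LongestPath());
        System.out.println(legend.options());
        System.out.println(legend.show("Longest root-to-tip path", tree));
    }

    /** The tree ((A:1,B:2):1,(C:1,(D:1,E:3):2):0.5); built by hand. */
    static Node exampleTree() {
        Node ab = new Node(1.0, Node.tip(1.0), Node.tip(2.0));
        Node de = new Node(2.0, Node.tip(1.0), Node.tip(3.0));
        Node cde = new Node(0.5, Node.tip(1.0), de);
        return new Node(0.0, ab, cde);
    }

    static Map<String, NumberForTreeSketch> defaultModules() {
        Map<String, NumberForTreeSketch> m = new LinkedHashMap<>();
        m.put("Number of tips", new TipCount());
        m.put("Colless asymmetry", new CollessAsymmetry());
        return m;
    }
}

/** Minimal rooted tree: a node with the length of the branch below it and any number of children. */
class Node {
    final double branchLength;
    final List<Node> children = new ArrayList<>();
    Node(double branchLength, Node... kids) {
        this.branchLength = branchLength;
        Collections.addAll(children, kids);
    }
    static Node tip(double branchLength) { return new Node(branchLength); }
    boolean isTip() { return children.isEmpty(); }
    int tipCount() {
        if (isTip()) return 1;
        int n = 0;
        for (Node c : children) n += c.tipCount();
        return n;
    }
}

/** Hypothetical stand-in for the NumberForTree class of modules: a tree goes in, a number comes out. */
interface NumberForTreeSketch {
    double calculate(Node tree);
}

class TipCount implements NumberForTreeSketch {
    public double calculate(Node tree) { return tree.tipCount(); }
}

/** Colless index: sum over bifurcating nodes of |tips(left) - tips(right)|. */
class CollessAsymmetry implements NumberForTreeSketch {
    public double calculate(Node n) {
        double sum = 0;
        for (Node c : n.children) sum += calculate(c);
        if (n.children.size() == 2)
            sum += Math.abs(n.children.get(0).tipCount() - n.children.get(1).tipCount());
        return sum;
    }
}

/** Longest root-to-tip path, summing branch lengths (the root's own branch is excluded). */
class LongestPath implements NumberForTreeSketch {
    public double calculate(Node n) {
        double best = 0;
        for (Node c : n.children) best = Math.max(best, c.branchLength + calculate(c));
        return best;
    }
}

/** The legend depends only on the interface, so every installed module is a valid choice. */
class TreeLegend {
    private final Map<String, NumberForTreeSketch> modules;
    TreeLegend(Map<String, NumberForTreeSketch> modules) { this.modules = modules; }
    List<String> options() { return new ArrayList<>(modules.keySet()); }
    String show(String choice, Node tree) { return choice + ": " + modules.get(choice).calculate(tree); }
}
```

Run as is, it prints `Colless asymmetry: 2.0`, then `[Number of tips, Colless asymmetry, Longest root-to-tip path]`, then `Longest root-to-tip path: 5.5`.
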

The Tree Legend is not the only place where analyses could use numbers for trees. A charting module could display the numbers calculated for a whole series of trees, or a tree search module could use the numbers to find a tree with minimum or maximum values for the number. When such modules are made, they can automatically have access to whatever NumberForTree modules are available. Thus, the chart could show treelength, or likelihood, or asymmetry, or discordance, or longest path. Likewise, the tree search module could seek to optimize any of those. If a programmer makes a new module to analyze numbers for trees, then suddenly all existing NumberForTree modules have a new context in which they can be analyzed. If a new NumberForTree module is made, it will appear as a new option under each of the modules making use of NumberForTree. Hence the number of alternative analyses rises as the product of the numbers of modules of the different interacting types.
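To see why the count is multiplicative, the hypothetical Java sketch below defines two kinds of module, numbers for trees and analyses that consume them, which meet only through an interface. Trees are represented as Newick strings purely for brevity, the two number modules are toy measures, and `MinimumFinder` just picks the lowest-scoring tree from a fixed list rather than performing a real tree search. None of these names are Mesquite's API.

```java
import java.util.ArrayList;
import java.util.List;

class CombinationDemo {
    public static void main(String[] args) {
        List<String> trees = List.of("((A,B),(C,D));", "(A,(B,(C,D)));", "(A,B,C);");
        List<NumberForTreeSketch> numbers = List.of(new TipCount(), new NestingDepth());
        List<TreeNumberAnalysis> analyses = List.of(new Chart(), new MinimumFinder());
        // Every analysis works with every number: no pairwise glue code is needed.
        for (TreeNumberAnalysis a : analyses)
            for (NumberForTreeSketch n : numbers)
                System.out.println(a.run(n, trees));
        System.out.println("combinations: " + analyses.size() * numbers.size());
    }
}

/** Hypothetical stand-in for the NumberForTree class of modules; trees are Newick strings here. */
interface NumberForTreeSketch {
    String name();
    double calculate(String newick);
}

/** Hypothetical class of modules that analyze numbers for trees, whichever number they are given. */
interface TreeNumberAnalysis {
    String run(NumberForTreeSketch number, List<String> trees);
}

/** Tips = commas + 1 (assumes no commas inside quoted taxon labels). */
class TipCount implements NumberForTreeSketch {
    public String name() { return "Number of tips"; }
    public double calculate(String newick) {
        return newick.chars().filter(ch -> ch == ',').count() + 1;
    }
}

/** Maximum depth of parenthesis nesting: a crude toy measure of tree shape. */
class NestingDepth implements NumberForTreeSketch {
    public String name() { return "Nesting depth"; }
    public double calculate(String newick) {
        int depth = 0, max = 0;
        for (char ch : newick.toCharArray()) {
            if (ch == '(') max = Math.max(max, ++depth);
            else if (ch == ')') depth--;
        }
        return max;
    }
}

/** Computes the number for each tree in a series, as a chart module might plot them. */
class Chart implements TreeNumberAnalysis {
    public String run(NumberForTreeSketch number, List<String> trees) {
        List<Double> values = new ArrayList<>();
        for (String t : trees) values.add(number.calculate(t));
        return "Chart of " + number.name() + ": " + values;
    }
}

/** Picks the candidate with the smallest value: exhaustive selection, not a heuristic search. */
class MinimumFinder implements TreeNumberAnalysis {
    public String run(NumberForTreeSketch number, List<String> trees) {
        String best = trees.get(0);
        for (String t : trees)
            if (number.calculate(t) < number.calculate(best)) best = t;
        return "Minimum " + number.name() + ": " + best;
    }
}
```

It prints a chart and a minimum for each of the two numbers, four lines in all, then `combinations: 4`. Adding a third class to `numbers` would give each analysis one more option, six combinations in total, without touching `Chart` or `MinimumFinder`.
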

Of course, the trees used have to come from somewhere. One module might supply the trees stored in a file, another might simulate trees using a simple Markovian model of speciation and extinction, and another might simulate trees as gene trees coalescing within a species tree. Characters likewise might come from a stored matrix, might be simulated by a stochastic model of evolution, or might represent reshufflings of existing characters. This means that any calculation using trees or characters can either be done on observed data and reconstructed trees, or can derive null distributions under stochastic models. The calculations don't have to do anything special to achieve this flexibility; they simply let the user choose the sources of trees and characters.
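A small Java sketch of this arrangement, under stated simplifying assumptions: the tree is fixed (six taxa), characters have two states, and the calculation is Fitch parsimony (the number of steps one character needs on the tree). `CharacterSource` and its three implementations (a stored character, a reshuffling of it, and a toy simulation in which each branch flips the state with probability p) are hypothetical, not Mesquite classes, and the simulation is not meant as any particular published model.

```java
import java.util.Arrays;
import java.util.Random;

class NullDistributionDemo {
    public static void main(String[] args) {
        Node tree = exampleTree();
        int[] observed = {0, 0, 1, 1, 1, 1};
        Random rng = new Random(42);
        // Three interchangeable sources; the calculation never asks which one it was given.
        CharacterSource stored = r -> observed.clone();
        CharacterSource reshuffled = new Reshuffle(observed);
        CharacterSource simulated = new SimulateOnTree(tree, 0.2, 6);
        System.out.println("observed steps: " + FitchSteps.count(tree, stored.next(rng)));
        System.out.println("reshuffled (0..3 steps): " + Arrays.toString(histogram(tree, reshuffled, 1000, 6, rng)));
        System.out.println("simulated  (0..3 steps): " + Arrays.toString(histogram(tree, simulated, 1000, 6, rng)));
    }

    /** The fixed example tree ((t0,t1),(t2,(t3,(t4,t5)))). */
    static Node exampleTree() {
        return new Node(new Node(Node.tip(0), Node.tip(1)),
                new Node(Node.tip(2), new Node(Node.tip(3), new Node(Node.tip(4), Node.tip(5)))));
    }

    /** Tallies how many replicate characters need 0, 1, 2, ... steps on the tree. */
    static int[] histogram(Node tree, CharacterSource source, int replicates, int nTaxa, Random rng) {
        // A two-state character never needs more steps than its rarer state has tips (<= nTaxa / 2).
        int[] counts = new int[nTaxa / 2 + 1];
        for (int i = 0; i < replicates; i++) counts[FitchSteps.count(tree, source.next(rng))]++;
        return counts;
    }
}

/** Binary rooted tree; tips carry a taxon index, internal nodes carry -1. */
class Node {
    final int taxon;
    final Node left, right;
    Node(Node left, Node right) { this(-1, left, right); }
    private Node(int taxon, Node left, Node right) { this.taxon = taxon; this.left = left; this.right = right; }
    static Node tip(int taxon) { return new Node(taxon, null, null); }
    boolean isTip() { return taxon >= 0; }
}

/** Hypothetical interface: anything that can supply one character (one state per taxon). */
interface CharacterSource {
    int[] next(Random rng);
}

/** Fitch parsimony: minimum number of changes for one unordered character on a binary tree. */
class FitchSteps {
    static int count(Node tree, int[] states) {
        int[] steps = {0};
        stateSet(tree, states, steps);
        return steps[0];
    }

    /** Returns a node's Fitch state set as a bitmask (bit s set means state s is in the set). */
    private static int stateSet(Node n, int[] states, int[] steps) {
        if (n.isTip()) return 1 << states[n.taxon];
        int a = stateSet(n.left, states, steps);
        int b = stateSet(n.right, states, steps);
        if ((a & b) != 0) return a & b;
        steps[0]++;  // disjoint child sets: take the union and count one change
        return a | b;
    }
}

/** Supplies random permutations of an existing character's states among the taxa. */
class Reshuffle implements CharacterSource {
    private final int[] original;
    Reshuffle(int[] original) { this.original = original.clone(); }
    public int[] next(Random rng) {
        int[] c = original.clone();
        for (int i = c.length - 1; i > 0; i--) {  // Fisher-Yates shuffle
            int j = rng.nextInt(i + 1);
            int tmp = c[i]; c[i] = c[j]; c[j] = tmp;
        }
        return c;
    }
}

/** Toy two-state model: root in state 0, and each branch flips the state with probability p. */
class SimulateOnTree implements CharacterSource {
    private final Node tree;
    private final double p;
    private final int nTaxa;
    SimulateOnTree(Node tree, double p, int nTaxa) { this.tree = tree; this.p = p; this.nTaxa = nTaxa; }
    public int[] next(Random rng) {
        int[] c = new int[nTaxa];
        evolve(tree.left, 0, c, rng);
        evolve(tree.right, 0, c, rng);
        return c;
    }
    private void evolve(Node n, int parentState, int[] c, Random rng) {
        int s = rng.nextDouble() < p ? 1 - parentState : parentState;
        if (n.isTip()) { c[n.taxon] = s; return; }
        evolve(n.left, s, c, rng);
        evolve(n.right, s, c, rng);
    }
}
```

The observed character needs 1 step on this tree. The two tallies depend on the random seed, so no particular counts are claimed here; the point is that `FitchSteps.count` is called the same way whichever source supplied the character, and comparing the observed value with the reshuffled or simulated tallies gives a rough null-distribution test.
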

(For more details about modularity, see How Mesquite works)

A community of programmers

Our hope is that the building-block style of the Mesquite system will encourage programmers to write modules for their own favorite analyses. Another attraction of the Mesquite system is that many of the details of reading and writing files, the user interface, and graphical display are already taken care of, so the programmer need worry only about a single calculation. The system is built in Java and is therefore platform-independent. It is also possible for programmers to build modules in languages other than Java, if they would rather write in C, C++, or some other language.

We have attempted to design the system so that a programmer's efforts can be recognized as an independent, citable contribution. Modules or suites of modules can have their own names and manuals, and can be distributed and cited separately; they simply run within the Mesquite system.

We expect to make the code of the core libraries of Mesquite available, and to have the core libraries and at least the basic modules freely downloadable from the web. It is likely we will move to some style of open source (though perhaps with restrictions on independent redistribution of modified code; it is not clear to us how appropriate such redistribution would be for scientific software of this sort).


The poetic answer

The goals of Mesquite are these:

To change the economics of imagination in evolutionary biology. There are three ways we envision Mesquite stimulating imaginative ideas and their successful spread:

To continue to promote a phylogenetic perspective in evolutionary biology. The last few decades have brought the realization that it is important to view organismal diversity and evolution in the light of phylogeny. Maddison and Pérez (2000) have characterized this revolution as parallel to, and as fundamental as, the revolution in cosmology from a Newtonian to an Einsteinian view of space. As they note, we might say that phylogeny has curved the space of biological diversity, distorting the distribution of traits among the organisms we see around us. However, Maddison and Pérez argue that it would be even more appropriate to say that the curvature of diversity-time is such that a straight line between two species is not the horizontal path through a multidimensional character space from one extant phenotype or genotype to another; rather, the straight line in diversity-time follows down the phylogeny from one species to the common ancestor, and up the branches to the second species. If this path seems curved, it is merely because our perspective is twisted by the narrow time slice in which we live. MacClade and Mesquite are both designed to provide a corrective lens, helping us to see organisms and their traits as naturally falling within this curved space along the phylogeny. Mesquite's modularity allows this perspective to be extended to fields such as morphometrics, in which a phylogenetic perspective has relatively recently begun to suffuse the field.

* W. Maddison and T. Pérez, 2000. Biodiversidad y lecciones de la historia. In: Enfoques contemporáneos para el estudio de la biodiversidad [Hernández, H.M., A. García Aldrete, F. Álvarez and M. Ulloa, editors]. Instituto de Biología, UNAM, Mexico. Pp. 201-220.


Which to use, Mesquite or MacClade?

Version 4 of MacClade was recently released (October 2000). The reader might wonder why we have been working on two different programming efforts, and whether they are intended for different uses. Although Mesquite's extensibility means that it could eventually take on all of the functions of MacClade, it will not do so in the near future. Some calculations and functions of MacClade's tree window might not be available in Mesquite for a while, including particular charts (e.g., Changes and Stasis), equivocal cycling, some of the parsimony options (ordered, stratigraphic, Dollo), and some options for tree output (e.g., saving trees as graphics files or copying them to the clipboard). The most significant advances of MacClade 4 over MacClade 3 are in the data editor, where editing of molecular sequences is much more sophisticated, with tools for manual sequence alignment and on-the-fly translation to amino acids. MacClade's data editor might maintain important advantages over Mesquite's for a while.

In addition, for many of its functions MacClade will remain faster and easier to use than Mesquite. The speed advantage is due primarily to its being compiled to native code rather than written in Java. MacClade, being a non-extensible program written for a single operating system, has its components more tightly integrated than Mesquite's modules can be, and its user interface is tailored for Mac OS. This means that users may find MacClade easier and simpler to use than Mesquite. While we have worked hard to make Mesquite easy to understand and use, its modular nature means it is unlikely to be as simple for the user as MacClade.

Thus, MacClade will continue to be used and useful, even though Mesquite is based on a newer architecture. MacClade has its strengths, and Mesquite will have different strengths. We are using MacClade 4 with our own data (when we get time to work on our own data...), and expect to continue using it indefinitely.


© W. Maddison & D. Maddison 2001