Character Models

(updated August 2005)

A model of character evolution could be anything from the specification of a single parameter (e.g. a rate of change) to something more complex, whether in the parsimony realm (an indication of the cost for a transition from one state to another) or the likelihood realm (a transition probability or rate matrix). In Mesquite, such models are represented by objects of the class CharacterModel. Such models can be used in calculations in various manners. Subclasses of the class CharacterModel include ParsimonyModels and ProbabilityModels.

We recommend you look to curator modules for examples of how models are managed.

To add a new model to Mesquite, you will have to add the model class and a curator module (see Brownian and its model). This would be enough for simulations. For likelihood you may have to do more (see Mk1 as an example).

Storage of Models

Storage in project: Models are stored in each project (MesquiteProject) by calling their addToFile method (inherited from FileElement) and passing the project as the second parameter. This in turn calls the project's addFileElement method, which then places the character model object in the charModels listable vector of the project. Information about the character models stored in the project can be obtained from the getCharacterModels, getNumModels, getCharacterModelNames, getCharacterModel and getDefaultModel methods of MesquiteProject.

Storage in saved files: Typically, specifications for the models with fixed parameters (e.g., the predefined ones) will not be saved in NEXUS files, but the specifications for models with parameters that need specification (e.g., cost matrices; stochastic models with rate parameters) will be saved in NEXUS files. For this reason some of the modules that manage special types of models are called "curators" (such modules are subclasses of CharModelCurator), because they must create instances of such models, provide the user with some way to change their parameters, and handle saving a specification of the parameters to the NEXUS file.

Parsimony models

Parsimony models are the equivalent of the "transformation types" of MacClade. Each model represents an assumption that specifies how many steps there are for a change from one state to another, and it may impose constraints on changes. For categorical characters, some of the basic models are that character states are unordered (one step for any change of state), ordered (number of steps between states i and j = |i-j|), and irreversible (ordered, with the restriction that the state cannot decrease). For continuous characters, the two basic models are linear (cost of a change from x to y is |x-y|) and squared change (cost of a change from x to y is (x-y)^2). These models are "predefined" in the sense that objects to represent them are automatically created and added to the project by the standard modules that manage categorical and continuous characters.
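The step counts described above can be written out directly. The following is a minimal sketch only: the class ParsimonyCostSketch and its methods are hypothetical names invented for illustration and are not part of Mesquite, and the use of a FORBIDDEN constant to signal a disallowed change is an assumption of this sketch.

```java
// Illustrative only: ParsimonyCostSketch is a hypothetical helper, not a Mesquite class.
final class ParsimonyCostSketch {
    /** Assumed marker for a change that the model does not allow. */
    static final int FORBIDDEN = Integer.MAX_VALUE;

    /** Unordered: any change of state costs one step. */
    static int unordered(int from, int to) {
        return from == to ? 0 : 1;
    }

    /** Ordered: the number of steps is |i - j|. */
    static int ordered(int from, int to) {
        return Math.abs(from - to);
    }

    /** Irreversible: ordered, but the state may not decrease. */
    static int irreversible(int from, int to) {
        return to < from ? FORBIDDEN : to - from;
    }

    /** Linear parsimony for continuous characters: |x - y|. */
    static double linear(double x, double y) {
        return Math.abs(x - y);
    }

    /** Squared-change parsimony for continuous characters: (x - y)^2. */
    static double squaredChange(double x, double y) {
        double d = x - y;
        return d * d;
    }
}
```

For example, a change from state 0 to state 3 costs 1 step under the unordered model and 3 steps under the ordered model, while a change from 3 to 0 is disallowed under the irreversible model.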

Each of these models is represented in Mesquite by objects of different subclasses of ParsimonyModel. Thus, for each MesquiteProject, there is an object representing the unordered model, another representing the ordered model, and so on. Currently the predefined subclasses of these models are defined in the modules that manage particular data types (e.g., ManageCategoricalChars and ManageContChars). For instance, ManageCategoricalChars defines UnorderedModel, OrderedModel, and IrreversibleModel classes as subclasses of CategParsimonyModel. Once a project is established, the module adds an instance of each of these models to the project (the project maintains a record of all the defined character models). Likewise, ManageContChars creates objects of class ContParsimonyModel to represent squared-change parsimony and linear parsimony. These models are then available to calculation modules.

The objects representing these predefined parsimony models do almost nothing (unlike the case of ProbabilityModels, which may do much more). The predefined ParsimonyModels do little more than hold a record of their names (e.g., "Unordered"), the token used for them in NEXUS files (e.g., "unord"), and the type of data to which they apply (e.g., CategoricalState.class). All of the burden of calculations under these models is placed on the modules doing the calculations. The models exist as objects merely to serve as references (e.g., a data matrix can store for each character a reference to which parsimony model currently applies) and to aid in the compilation of available models for menus and lists.

In addition to these predefined parsimony models, modules (especially the CharModelCurators) may define their own classes of parsimony models. For instance, the StepMatrixCurator module can read and write cost matrices from and to NEXUS files, and also presents the user with a window in which to create new cost matrices or edit existing ones. When a new cost matrix object is created by reading a NEXUS file or at the user's request, the curator adds it to the project so it is available to other modules. The CostMatrix subclass of CategParsimonyModel is defined in the categorical library and is thus available to modules doing calculations.

Modules coordinating parsimony calculations currently check which model is applied to a particular character, then seek a calculation module that is capable of dealing with that subclass of model. The appropriate method of the calculation module is then called and the model object passed to it.
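The lookup described above might be pictured as follows. This is a hedged sketch of the general pattern only: every type here (ModelSketch, StepsCalculatorSketch, ParsimonyCoordinatorSketch and the rest) is hypothetical and stands in for Mesquite's real model and module classes, whose actual hiring mechanism is not shown.

```java
// Hypothetical types for illustration; none of these are Mesquite classes.
interface ModelSketch {
    String name();
}

final class UnorderedSketch implements ModelSketch {
    public String name() { return "Unordered"; }
}

final class OrderedSketch implements ModelSketch {
    public String name() { return "Ordered"; }
}

/** Stand-in for a calculation module that understands some model subclasses. */
interface StepsCalculatorSketch {
    boolean canHandle(ModelSketch model);
    int steps(ModelSketch model, int from, int to);
}

final class UnorderedCalculatorSketch implements StepsCalculatorSketch {
    public boolean canHandle(ModelSketch model) { return model instanceof UnorderedSketch; }
    public int steps(ModelSketch model, int from, int to) { return from == to ? 0 : 1; }
}

final class OrderedCalculatorSketch implements StepsCalculatorSketch {
    public boolean canHandle(ModelSketch model) { return model instanceof OrderedSketch; }
    public int steps(ModelSketch model, int from, int to) { return Math.abs(from - to); }
}

/** Stand-in for a coordinating module: finds a calculator that can handle the model. */
final class ParsimonyCoordinatorSketch {
    private final StepsCalculatorSketch[] calculators;

    ParsimonyCoordinatorSketch(StepsCalculatorSketch... calculators) {
        this.calculators = calculators;
    }

    int steps(ModelSketch model, int from, int to) {
        for (StepsCalculatorSketch calculator : calculators) {
            if (calculator.canHandle(model)) {
                // The model object itself is passed to the calculator.
                return calculator.steps(model, from, to);
            }
        }
        throw new IllegalArgumentException("No calculator for model " + model.name());
    }
}
```

The point of the sketch is that the model object carries no calculation logic of its own; it only identifies which calculator applies.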

Probability models

Probability models are used both for likelihood calculations and for stochastic simulations of evolution. They are subclasses of ProbabilityModel, and include subclasses of ProbabilityCategCharModel for categorical characters, and ProbabilityContCharModel for continuous-valued characters. At present, there are none built into the libraries, but Jukes-Cantor (for categorical characters) and Brownian (for continuous-valued characters) are defined in modules and are treated as defaults. Each character in a matrix is assigned a ProbabilityModel, either explicitly or by default.

For simulations of character evolution, the model's evolveState(CharacterState beginState, CharacterState endState, Tree tree, int node) method can be used to generate a stochastically evolved state at the end of the branch, given the state at the beginning of the branch. The beginState CharacterState must be filled with the value of the state at the beginning of the branch (and must be of the appropriate data type for the model, e.g. categorical or continuous), and the endState object must be instantiated and of the appropriate type (in order to receive the resulting state). Subclasses of ProbabilityModel for particular data types have variants of the evolveState method which are tuned to their data type and which may be used for greater speed (i.e., for categorical characters they would take an int as a parameter for the beginning state, and return an int as the endState).
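To make the idea of a data-type-specific evolveState concrete, here is a minimal sketch for a continuous character under Brownian motion. It is not Mesquite's Brownian model: the class BrownianSketch is hypothetical, and it takes a branch length directly instead of a Tree and node. It assumes the standard Brownian motion result that the change along a branch is normally distributed with mean 0 and variance rate × branch length.

```java
// Hypothetical sketch; not Mesquite's Brownian model or its evolveState signature.
final class BrownianSketch {
    private final double rate;            // variance accumulated per unit branch length
    private final java.util.Random rng;   // seeded so that simulations are repeatable

    BrownianSketch(double rate, long seed) {
        if (rate < 0) {
            throw new IllegalArgumentException("rate must be non-negative");
        }
        this.rate = rate;
        this.rng = new java.util.Random(seed);
    }

    /** Returns a simulated state at the end of a branch, given the state at its beginning. */
    double evolveState(double beginState, double branchLength) {
        if (branchLength < 0) {
            throw new IllegalArgumentException("branchLength must be non-negative");
        }
        double standardDeviation = Math.sqrt(rate * branchLength);
        return beginState + standardDeviation * rng.nextGaussian();
    }
}
```

A simulation over a whole tree would call such a method once per branch, passing the simulated state of the ancestral node as beginState.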

For likelihood calculations, there are not yet general (independent of data type) methods for returning probabilities, because for some data types probabilities would be appropriate, and for others, probability densities. In ProbabilityCategCharModel, the method transitionProbability(int beginState, int endState, Tree tree, int node) returns the probability of a change from beginState to endState along the branch of the tree associated with node.
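As an illustration of what such a transition probability looks like, the sketch below computes it for an equal-rates model with k states (Jukes-Cantor is the four-state case). It is a standalone example, not Mesquite code: the class EqualRatesSketch is hypothetical, it takes a branch length in place of a Tree and node, and it assumes that rate means the instantaneous rate of change from a state to each other state. Under that assumption, the probability of ending in the same state is 1/k + ((k-1)/k)·exp(-k·rate·t), and the probability of ending in any particular different state is 1/k - (1/k)·exp(-k·rate·t).

```java
// Hypothetical sketch; not Mesquite's ProbabilityCategCharModel.
final class EqualRatesSketch {
    private final int numStates;
    private final double rate;  // assumed: instantaneous rate of change to each other state

    EqualRatesSketch(int numStates, double rate) {
        if (numStates < 2 || rate < 0) {
            throw new IllegalArgumentException("need at least 2 states and a non-negative rate");
        }
        this.numStates = numStates;
        this.rate = rate;
    }

    /** Probability of ending in endState after branchLength, having started in beginState. */
    double transitionProbability(int beginState, int endState, double branchLength) {
        double k = numStates;
        double decay = Math.exp(-k * rate * branchLength);
        if (beginState == endState) {
            return 1.0 / k + ((k - 1.0) / k) * decay;
        }
        return 1.0 / k - decay / k;
    }
}
```

On a branch of length zero the probability of staying in the same state is 1, and on very long branches every entry approaches 1/k.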

Some probability-based calculations may use an existing, fully specified model currently stored in the project. This might be done for simulations, for example. Such models might simply be chosen by using the getCharacterModel and related methods of MesquiteProject. The static method CharacterModel.chooseExistingCharacterModel(MesquiteModule m, Class modelClass, String explanation) allows the user to select an appropriate CharacterModel. Alternatively, such an existing model may be chosen because it is currently assigned to a particular character. A character's currently assigned model can be determined by hiring the ProbModelSource module "CurrentProbModels". A calculation might use a character's currently assigned model if it is calculating likelihood with fixed rates. If you want to use a model that is fully specified and constant, you may need to create and edit the model before using it.

Other probability-based calculations might instead take a new instance of a class of model (e.g., JukesCantorModel) and perhaps even adjust or estimate its parameters during the course of calculations. Currently, to do this, one can call the makeNewModel method of any available CharModelCurator module. While one could hire such a module to obtain a model, there should be already-instantiated curator modules that are employees of the module ManageCharModels. One can obtain a list of these curators via CharacterModel.findCurators(MesquiteModule m, Class modelClass).

The static method CharacterModel.chooseNewCharacterModel(MesquiteModule m, Class modelClass, String explanation) returns a new model, allowing the user to select the type. The modelClass passed is the general subclass desired (e.g., ProbabilityCategCharModel); the method will take care of finding what curators are available for specific subtypes.

Currently (May 2000), likelihood calculations use the fixed models currently applied to the characters, and the current branch lengths of the tree. Alternative calculations should be developed that allow the branch lengths of the tree to be estimated in the process, and that allow the parameters of the model to be estimated in the process. For the former, it would make sense for the calculating module to maintain its own copy of the branch lengths that it adjusts, so as not to adjust any assigned branch lengths that belong to the tree itself. For the latter, some sort of estimateParameters(CharacterDistribution observedStates, Tree tree) needs to be made a standard part of ProbabilityModels.

Model Sets

To each character in a matrix, a character model can be assigned. A set of assignments for all the characters of a matrix is a ModelSet. ModelSets are (more or less) the equivalent of NEXUS's TYPESET. A whole series of alternative ModelSets can be associated with a matrix (i.e., with a CharacterData), but only one of them is treated as indicating the "current" assignment of models to characters. ModelSets can be associated with CharacterData by virtue of CharacterData being a subclass of AssociableWithSpecs.

Subclasses of ModelSet include ParsimonyModelSet, which assigns ParsimonyModels to characters and is the equivalent of a TYPESET, and ProbabilityModelSet which assigns ProbabilityModels to characters. In the NEXUS file format, TYPESET is currently used for ParsimonyModelSet and PROBMODELSET for ProbabilityModelSet.
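The relationship between a matrix, its alternative model sets, and the single "current" set can be sketched as a small data structure. This is an assumption-laden illustration only: ModelSetSketch and MatrixSketch are hypothetical classes, models are represented by plain strings, and nothing here reflects the actual ModelSet, CharacterData or AssociableWithSpecs implementations. The model names used in the usage note are illustrative strings as well.

```java
// Hypothetical sketch; not Mesquite's ModelSet or CharacterData classes.
final class ModelSetSketch {
    final String name;
    private final String[] modelForCharacter;  // one model name per character

    ModelSetSketch(String name, int numChars, String defaultModel) {
        this.name = name;
        this.modelForCharacter = new String[numChars];
        java.util.Arrays.fill(modelForCharacter, defaultModel);
    }

    void assign(int character, String model) {
        modelForCharacter[character] = model;
    }

    String modelOf(int character) {
        return modelForCharacter[character];
    }
}

/** Stand-in for a matrix that holds several model sets, one of which is current. */
final class MatrixSketch {
    private final java.util.List<ModelSetSketch> modelSets = new java.util.ArrayList<>();
    private ModelSetSketch current;

    void addModelSet(ModelSetSketch set, boolean makeCurrent) {
        modelSets.add(set);
        if (makeCurrent || current == null) {
            current = set;
        }
    }

    void setCurrent(ModelSetSketch set) {
        if (!modelSets.contains(set)) {
            throw new IllegalArgumentException("model set is not associated with this matrix");
        }
        current = set;
    }

    /** The model that currently applies to a character, taken from the current set. */
    String currentModelOf(int character) {
        if (current == null) {
            throw new IllegalStateException("no model set has been added");
        }
        return current.modelOf(character);
    }

    int numModelSets() {
        return modelSets.size();
    }
}
```

For instance, a matrix could hold one set in which every character is "unord" and another in which character 1 is "ord"; switching the current set changes which model a calculation sees for character 1 without discarding the alternative.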


© W. Maddison & D. Maddison 2005