We present SequenceMatrix, software that is designed to facilitate the assembly and analysis of multi-gene datasets. Genes are concatenated by dragging and dropping FASTA, NEXUS, or TNT files with aligned sequences into the program window. A multi-gene dataset is concatenated and displayed in a spreadsheet; each sequence is represented by a cell that provides information on sequence length, number of indels, the number of ambiguous bases (“Ns”), and the availability of codon information. Alternatively, GenBank numbers for the sequences can be displayed and exported. Matrices with hundreds of genes and taxa can be concatenated within minutes and exported in TNT, NEXUS, or PHYLIP formats, preserving both character set and codon information for TNT and NEXUS files. SequenceMatrix also creates taxon sets listing taxa with a minimum number of characters or gene fragments, which helps assess preliminary datasets. Entire taxa, whole gene fragments, or individual sequences for a particular gene and species can be excluded from export. Data matrices can be re-split into their component genes and the gene fragments can be exported as individual gene files. SequenceMatrix also includes two tools that help to identify sequences that may have been compromised through laboratory contamination or data management error. One tool lists identical or near-identical sequences within genes, while the other compares the pairwise distance pattern of one gene against the pattern for all remaining genes combined. SequenceMatrix is Java-based and compatible with the Microsoft Windows, Apple MacOS X and Linux operating systems.

Modern phylogenetic analyses typically infer relationships using multi-gene datasets. Many software packages are capable of concatenating individual character and gene files into such sets (e.g. Maddison and Maddison, 2001, 2009; Jones and Blaxter, 2006; Roure et al., 2007; Goloboff et al., 2008; Smith and Dunn, 2008), but the concatenation tools are generally not particularly user-friendly, often do not preserve character set or codon position information, have limitations on the number of partitions that can be concatenated, and/or make it difficult for the user to check for concatenation errors. This has led to the undesirable situation that many multi-locus datasets are only assembled toward the end of a project, although evaluating preliminary datasets is important for exploring their phylogenetic signal, assessing the effects of missing data, monitoring the progress of a project and/or identifying sequences that may have been compromised through laboratory contamination.

We introduce the concatenation tool SequenceMatrix, which has the following desirable properties. (i) Concatenation is fast and intuitive, so that even very large datasets with several hundred taxa and genes can be assembled quickly. The graphical user interface allows users to check whether the concatenation captured all input data. (ii) Information from the individual gene files, such as character set ranges and codon positions, is preserved and exported with the concatenated matrix. (iii) The program generates taxon and character sets according to the user’s specifications. (iv) Users can exclude individual sequences from the export so that their effects on phylogenetic signal can be studied.
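
The concatenation idea described here can be illustrated with a minimal Python sketch. It is not SequenceMatrix's own code (the program itself is Java-based), and the function names `concatenate` and `taxa_with_min_genes`, the dictionary layout, and the use of `?` for missing data are all assumptions made for illustration. The sketch joins aligned genes into one matrix, records each gene's character set range (1-based and inclusive, as in NEXUS `charset` statements), pads taxa that lack a gene, and builds a taxon set of taxa with a minimum number of gene fragments.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: taxon name -> aligned sequence for one gene.
Alignment = Dict[str, str]


def concatenate(
    genes: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Concatenate aligned genes and record each gene's column range.

    Returns (matrix, charsets), where charsets maps a gene name to its
    1-based inclusive (start, end) columns in the concatenated matrix.
    """
    # Union of all taxa across genes, in a stable order.
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    charsets: Dict[str, Tuple[int, int]] = {}
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        charsets[gene] = (start, start + length - 1)
        for taxon in taxa:
            # Taxa without this gene are padded with the missing-data symbol.
            parts[taxon].append(aln.get(taxon, missing * length))
        start += length
    matrix = {taxon: "".join(chunks) for taxon, chunks in parts.items()}
    return matrix, charsets


def taxa_with_min_genes(genes: Dict[str, Alignment], minimum: int) -> List[str]:
    """List taxa that have sequences for at least `minimum` genes."""
    counts: Dict[str, int] = {}
    for aln in genes.values():
        for taxon in aln:
            counts[taxon] = counts.get(taxon, 0) + 1
    return sorted(taxon for taxon, n in counts.items() if n >= minimum)
```

With two toy genes, for example `{"COI": {"Aus_bus": "ATGC", "Cus_dus": "ATGA"}, "28S": {"Aus_bus": "GGC"}}`, the taxon lacking 28S receives three missing-data characters, and the recorded ranges can be written out as character sets alongside the matrix.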