Skip to main content

plotting - How to work with phylogenetic trees?


A phylogenetic tree can be represented as a comma-bracket string by the Newick format (see the Newick Standard, or this site for syntax details; here is an online tree visualizer). A simple Newick string example is resolved as follows:


"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"

enter image description here


Note, that a (b1, b2, ...)node represents a tree rooted at node, with branches in the parenthesis; any node can have a lable l and a distance d to its parent in the form l:d; multiple trees can be separated by a ;. The format is rather flexible, as is apparent from the wiki. Both the label and the distance can be omitted, hence ((,),,); is a valid tree.



  • How can one parse a Newick tree to a Mathematica-usable object? (The other way around was already covered.)

  • How to convert a tree to a Graph object?

  • How to work with trees? E.g. how to measure distances of leafs/nodes?


  • How to display a tree in standard cladogram format (orthogonal edges, correct branching coordinates, vertex and edge labels, etc.)?

  • What is the relationship between a phylogenetic tree, a TreeGraph, a Dendrogram, and a DendrogramPlot of the "HierarchicalClustering`" package?



Answer



Download the Phylogenetics` package here: Phylogenetics on GitHub (direct link to latest release: source zip), available on PackageData.net here. Documentation: notebook / pdf / dark themed pdf (for the eyestrained).




Version 1.1.0. is out (2017 01. 31.)



  • added: CommonAncestor, DivergenceTime, AncestorIndex, TreeDistanceMatrix, TreeTop, $PhylogeneticsVersionNumber.

  • fixed: Usage message corrections.


  • feature: Straight Cladogram edges, branching according to angle.

  • modified: LeafQ and NodeQ now can be applied to a single argument representing a node association.

  • renamed: SubnodeIndex to DescendantIndex.

  • modified: Code was made compatible down to Mathematica 10.0. No compatibility before that due to heavy usage of associations.




I had to deal with some phylogenetic trees in papers that did not publish exact data for all nodes (e.g. see this paper). I ended up coding a full package to handle Newick and other phylogenetic trees. Hope it could save some time for others.


The Phylogenetics` package can:



  • parse a Newick tree string to a Mathematica-usable Tree object and write a Tree object to a Newick string;


  • parse a Cluster object of the HierarchicalClustering` package to a Tree and convert a Tree to Cluster representation;

  • test properties of Tree objects;

  • extract vertices, edges, internal nodes, leafs, sub- and supertrees of Tree objects;

  • directly calculate various properties of Tree objects like paths and distances between vertices;

  • convert a Tree object to Graph, and a tree Graph to Tree; any tree graph (TreeGraph, ClusteringTree, etc.) for which TreeGraphQ returns True can be converted to Tree;

  • convert a Tree object or a tree Graph object to Cladogram graph,

  • provide some example trees in Newick and Cluster format.


The Tree representation follows closely the Newick format, but is a full representation where nothing is omitted. A tree is technically a directed, acyclic graph with an explicit metric defined along the branching dimension (instead of between leaf nodes as with e.g. Dendrogram). Each vertex is represented as an association of basic properties minimally required to describe a tree:




  • "Name" specifies the unique name of the vertex, just like in Graph; can be any expression.

  • "Label" specifies the label of the vertex that might appear when the tree is displayed as a graph; can be any expression.

  • "BranchLength" specifies the numerical distance of the vertex to its parent vertex.

  • "Type" specifies the type of the vertex, either as "Node" for internal nodes or "Leaf" for terminal nodes.


Nodes are formatted by default to appear simplified as grey boxes not to increase clutter; the tree structure is best visible in this form.


The package lives on GitHub (link in first line above), and I will try to maintain and develop it as my time allows. Full documentation and examples are in the Phylogenetics documentation.nb notebook. Please report issues and bugs here. Collaborations are very welcome!




Examples


Load the package and convert a Newick string to a Tree.



Needs["Phylogenetics`"];

tree = NewickToTree["(A:0.1,B:0.2,((G:.1,H:0.2,I:0.3)C:0.3,D:0.4):0.5)F;"]

Mathematica graphics


TreeToGraph[tree, ImageSize -> Small]

Mathematica graphics


There are many example trees provided through PhylogeneticData. Cladogram offers visualization specific for cladistics.


tree = NewickToTree[PhylogeneticData["ExampleNewick"]];

Cladogram[tree, EdgeLabels -> Automatic, Frame -> True,
EdgeLabelStyle -> Orange, LayerSizeFunction -> (1 # &),
ImageSize -> Medium, VertexSize -> 0.01]

Mathematica graphics


Note, that unnamed nodes are automatically assigned a unique index to avoid name collision. Vertices, edges, leafs, internal nodes, subtrees and supertrees can be extracted easily:


tree = NewickToTree["(A:0.1,B:0.2,((G:.1,H:0.2,I:0.3)C:0.3,D:0.4)E:0.5)F;"]
VertexList[tree]
LeafList[tree]
Subtree[tree, "C"]

Supertree[tree, {"H", "D"}]

Mathematica graphics


Cladogram[tree, GraphHighlight -> Subtree[tree, #], PlotLabel -> #, 
VertexLabels -> "Name"] & /@ {"F", "E", "C", "A"}

Mathematica graphics


Graphs can be converted directly to cladograms:


graphs = {KaryTree[10, ImageSize -> Tiny], 
CompleteKaryTree[3, 3, ImageSize -> Tiny],

StarGraph[6, ImageSize -> Tiny],
TreeGraph[{1 -> 2, 1 -> 3, 1 -> 4}, ImageSize -> Tiny]}
Cladogram[#, Orientation -> Top, ImageSize -> Tiny] & /@ graphs

Mathematica graphics


Hierarchical clusters can be also converted, with similarity values indicating branching distance.


cluster = PhylogeneticData["ExampleCluster1"]
tree = ClusterToTree[cluster]

Mathematica graphics



{TreeToGraph[tree, 
GraphLayout -> {"LayeredEmbedding", Orientation -> Top}],
Cladogram[tree, Orientation -> Top, ImageSize -> Small,
ImagePadding -> 10],
DendrogramPlot[cluster, Orientation -> Top, LeafLabels -> (# &)]}

Mathematica graphics


Extract paths and measure distances:


tree = NewickToTree["(A:0.1,B:0.2,((G:.1,H:0.2,I:0.3)C:0.3,D:0.4)E:0.5)F;"];
list = {{"I", "F"}, {"F", "I"}, {"H", "H"},{"H", "B"}};

path = TreePath[tree, #] & /@ list;
dist = TreeDistance[tree, #] & /@ list;

MapThread[
Cladogram[tree, GraphHighlight -> #1, ImageSize -> Small,
PlotLabel -> Row@{#1, ": ", #2}, VertexLabels -> "Name"] &, {path,
dist}]

Mathematica graphics


A serious phylogenetic tree (from Parfrey et al. 2011):



Cladogram[NewickToTree[PhylogeneticData["NewickEukarya"]], 
ImageSize -> 500, ImagePadding -> {{1, 100}, {1, 1}},
LayerSizeFunction -> (1/5 # &), VertexSize -> None,
VertexLabelStyle -> Directive[5, Italic], ImageSize -> Medium]

Mathematica graphics


Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...