Skip to main content

plotting - How to work with phylogenetic trees?


A phylogenetic tree can be represented as a comma-bracket string by the Newick format (see the Newick Standard, or this site for syntax details; here is an online tree visualizer). A simple Newick string example is resolved as follows:


"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"

enter image description here


Note, that a (b1, b2, ...)node represents a tree rooted at node, with branches in the parenthesis; any node can have a lable l and a distance d to its parent in the form l:d; multiple trees can be separated by a ;. The format is rather flexible, as is apparent from the wiki. Both the label and the distance can be omitted, hence ((,),,); is a valid tree.



  • How can one parse a Newick tree to a Mathematica-usable object? (The other way around was already covered.)

  • How to convert a tree to a Graph object?

  • How to work with trees? E.g. how to measure distances of leafs/nodes?


  • How to display a tree in standard cladogram format (orthogonal edges, correct branching coordinates, vertex and edge labels, etc.)?

  • What is the relationship between a phylogenetic tree, a TreeGraph, a Dendrogram, and a DendrogramPlot of the "HierarchicalClustering`" package?



Answer



Download the Phylogenetics` package here: Phylogenetics on GitHub (direct link to latest release: source zip), available on PackageData.net here. Documentation: notebook / pdf / dark themed pdf (for the eyestrained).




Version 1.1.0. is out (2017 01. 31.)



  • added: CommonAncestor, DivergenceTime, AncestorIndex, TreeDistanceMatrix, TreeTop, $PhylogeneticsVersionNumber.

  • fixed: Usage message corrections.


  • feature: Straight Cladogram edges, branching according to angle.

  • modified: LeafQ and NodeQ now can be applied to a single argument representing a node association.

  • renamed: SubnodeIndex to DescendantIndex.

  • modified: Code was made compatible down to Mathematica 10.0. No compatibility before that due to heavy usage of associations.




I had to deal with some phylogenetic trees in papers that did not publish exact data for all nodes (e.g. see this paper). I ended up coding a full package to handle Newick and other phylogenetic trees. Hope it could save some time for others.


The Phylogenetics` package can:



  • parse a Newick tree string to a Mathematica-usable Tree object and write a Tree object to a Newick string;


  • parse a Cluster object of the HierarchicalClustering` package to a Tree and convert a Tree to Cluster representation;

  • test properties of Tree objects;

  • extract vertices, edges, internal nodes, leafs, sub- and supertrees of Tree objects;

  • directly calculate various properties of Tree objects like paths and distances between vertices;

  • convert a Tree object to Graph, and a tree Graph to Tree; any tree graph (TreeGraph, ClusteringTree, etc.) for which TreeGraphQ returns True can be converted to Tree;

  • convert a Tree object or a tree Graph object to Cladogram graph,

  • provide some example trees in Newick and Cluster format.


The Tree representation follows closely the Newick format, but is a full representation where nothing is omitted. A tree is technically a directed, acyclic graph with an explicit metric defined along the branching dimension (instead of between leaf nodes as with e.g. Dendrogram). Each vertex is represented as an association of basic properties minimally required to describe a tree:




  • "Name" specifies the unique name of the vertex, just like in Graph; can be any expression.

  • "Label" specifies the label of the vertex that might appear when the tree is displayed as a graph; can be any expression.

  • "BranchLength" specifies the numerical distance of the vertex to its parent vertex.

  • "Type" specifies the type of the vertex, either as "Node" for internal nodes or "Leaf" for terminal nodes.


Nodes are formatted by default to appear simplified as grey boxes not to increase clutter; the tree structure is best visible in this form.


The package lives on GitHub (link in first line above), and I will try to maintain and develop it as my time allows. Full documentation and examples are in the Phylogenetics documentation.nb notebook. Please report issues and bugs here. Collaborations are very welcome!




Examples


Load the package and convert a Newick string to a Tree.



Needs["Phylogenetics`"];

tree = NewickToTree["(A:0.1,B:0.2,((G:.1,H:0.2,I:0.3)C:0.3,D:0.4):0.5)F;"]

Mathematica graphics


TreeToGraph[tree, ImageSize -> Small]

Mathematica graphics


There are many example trees provided through PhylogeneticData. Cladogram offers visualization specific for cladistics.


tree = NewickToTree[PhylogeneticData["ExampleNewick"]];

Cladogram[tree, EdgeLabels -> Automatic, Frame -> True,
EdgeLabelStyle -> Orange, LayerSizeFunction -> (1 # &),
ImageSize -> Medium, VertexSize -> 0.01]

Mathematica graphics


Note, that unnamed nodes are automatically assigned a unique index to avoid name collision. Vertices, edges, leafs, internal nodes, subtrees and supertrees can be extracted easily:


tree = NewickToTree["(A:0.1,B:0.2,((G:.1,H:0.2,I:0.3)C:0.3,D:0.4)E:0.5)F;"]
VertexList[tree]
LeafList[tree]
Subtree[tree, "C"]

Supertree[tree, {"H", "D"}]

Mathematica graphics


Cladogram[tree, GraphHighlight -> Subtree[tree, #], PlotLabel -> #, 
VertexLabels -> "Name"] & /@ {"F", "E", "C", "A"}

Mathematica graphics


Graphs can be converted directly to cladograms:


graphs = {KaryTree[10, ImageSize -> Tiny], 
CompleteKaryTree[3, 3, ImageSize -> Tiny],

StarGraph[6, ImageSize -> Tiny],
TreeGraph[{1 -> 2, 1 -> 3, 1 -> 4}, ImageSize -> Tiny]}
Cladogram[#, Orientation -> Top, ImageSize -> Tiny] & /@ graphs

Mathematica graphics


Hierarchical clusters can be also converted, with similarity values indicating branching distance.


cluster = PhylogeneticData["ExampleCluster1"]
tree = ClusterToTree[cluster]

Mathematica graphics



{TreeToGraph[tree, 
GraphLayout -> {"LayeredEmbedding", Orientation -> Top}],
Cladogram[tree, Orientation -> Top, ImageSize -> Small,
ImagePadding -> 10],
DendrogramPlot[cluster, Orientation -> Top, LeafLabels -> (# &)]}

Mathematica graphics


Extract paths and measure distances:


tree = NewickToTree["(A:0.1,B:0.2,((G:.1,H:0.2,I:0.3)C:0.3,D:0.4)E:0.5)F;"];
list = {{"I", "F"}, {"F", "I"}, {"H", "H"},{"H", "B"}};

path = TreePath[tree, #] & /@ list;
dist = TreeDistance[tree, #] & /@ list;

MapThread[
Cladogram[tree, GraphHighlight -> #1, ImageSize -> Small,
PlotLabel -> Row@{#1, ": ", #2}, VertexLabels -> "Name"] &, {path,
dist}]

Mathematica graphics


A serious phylogenetic tree (from Parfrey et al. 2011):



Cladogram[NewickToTree[PhylogeneticData["NewickEukarya"]], 
ImageSize -> 500, ImagePadding -> {{1, 100}, {1, 1}},
LayerSizeFunction -> (1/5 # &), VertexSize -> None,
VertexLabelStyle -> Directive[5, Italic], ImageSize -> Medium]

Mathematica graphics


Comments

Popular posts from this blog

plotting - Filling between two spheres in SphericalPlot3D

Manipulate[ SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, Mesh -> None, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], {n, 0, 1}] I cant' seem to be able to make a filling between two spheres. I've already tried the obvious Filling -> {1 -> {2}} but Mathematica doesn't seem to like that option. Is there any easy way around this or ... Answer There is no built-in filling in SphericalPlot3D . One option is to use ParametricPlot3D to draw the surfaces between the two shells: Manipulate[ Show[SphericalPlot3D[{1, 2 - n}, {θ, 0, Pi}, {ϕ, 0, 1.5 Pi}, PlotPoints -> 15, PlotRange -> {-2.2, 2.2}], ParametricPlot3D[{ r {Sin[t] Cos[1.5 Pi], Sin[t] Sin[1.5 Pi], Cos[t]}, r {Sin[t] Cos[0 Pi], Sin[t] Sin[0 Pi], Cos[t]}}, {r, 1, 2 - n}, {t, 0, Pi}, PlotStyle -> Yellow, Mesh -> {2, 15}]], {n, 0, 1}]

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Adding a thick curve to a regionplot

Suppose we have the following simple RegionPlot: f[x_] := 1 - x^2 g[x_] := 1 - 0.5 x^2 RegionPlot[{y < f[x], f[x] < y < g[x], y > g[x]}, {x, 0, 2}, {y, 0, 2}] Now I'm trying to change the curve defined by $y=g[x]$ into a thick black curve, while leaving all other boundaries in the plot unchanged. I've tried adding the region $y=g[x]$ and playing with the plotstyle, which didn't work, and I've tried BoundaryStyle, which changed all the boundaries in the plot. Now I'm kinda out of ideas... Any help would be appreciated! Answer With f[x_] := 1 - x^2 g[x_] := 1 - 0.5 x^2 You can use Epilog to add the thick line: RegionPlot[{y < f[x], f[x] < y < g[x], y > g[x]}, {x, 0, 2}, {y, 0, 2}, PlotPoints -> 50, Epilog -> (Plot[g[x], {x, 0, 2}, PlotStyle -> {Black, Thick}][[1]]), PlotStyle -> {Directive[Yellow, Opacity[0.4]], Directive[Pink, Opacity[0.4]],