
June 2018

Phylogenetics in the microbiome with SuchLinkedTrees

Article available as a Jupyter Notebook: SuchTree_2.ipynb

In the last article, we saw how to use SuchTree to probe the topology of very large trees. In this article, we're going to look at the other component of the package, SuchLinkedTrees.

If you are interested in studying how two groups of organisms interact (or, rather, have interacted over evolutionary time), you will find yourself with two trees of distinct groups of taxa that are linked by a matrix of interaction observations. This is sometimes called a 'dueling trees' problem.

[Figure: dueling trees]

If the trees happen to have the same number of taxa, and the interaction matrix happens to be a unit (identity) matrix, then you can compute the distance matrix for each of your trees and use the Mantel test to compare them. However, this is a pretty special case. Hommola et al. describe a method that extends the Mantel test beyond this special case.

This is implemented in scikit-bio as hommola_cospeciation. Unfortunately, the version in scikit-bio does not scale to very large trees, and does not expose the computed distances for analysis. This is where SuchLinkedTrees can help.
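To make the idea concrete, here is a minimal sketch of a link-based permutation test in plain numpy. It is not the scikit-bio code and not how SuchLinkedTrees does it; the function names (`linked_distances`, `link_correlation_test`) and the one-sided p-value convention are my own assumptions for illustration. For every pair of observed links, it collects the distance between the two hosts and the distance between the two guests, correlates the two lists, and then shuffles the taxon labels to build a null distribution.

```python
import numpy as np

def linked_distances(host_dist, guest_dist, links):
    """Paired distances for every pair of observed links.

    links[i, j] is nonzero when host i was observed with guest j.
    (Hypothetical helper, for illustration only.)
    """
    hosts, guests = np.nonzero(links)        # one entry per observed link
    a, b = np.triu_indices(len(hosts), k=1)  # every unordered pair of links
    return host_dist[hosts[a], hosts[b]], guest_dist[guests[a], guests[b]]

def link_correlation_test(host_dist, guest_dist, links, permutations=999, seed=0):
    """Pearson r over linked distances, with a label-permutation p-value."""
    rng = np.random.default_rng(seed)
    x, y = linked_distances(host_dist, guest_dist, links)
    r_obs = np.corrcoef(x, y)[0, 1]
    hits = 0
    for _ in range(permutations):
        # Shuffle the taxon labels on each distance matrix independently.
        hp = rng.permutation(host_dist.shape[0])
        gp = rng.permutation(guest_dist.shape[0])
        xp, yp = linked_distances(host_dist[np.ix_(hp, hp)],
                                  guest_dist[np.ix_(gp, gp)],
                                  links)
        if np.corrcoef(xp, yp)[0, 1] >= r_obs:
            hits += 1
    # One-sided p-value; the +1 counts the observed value itself.
    return r_obs, (hits + 1) / (permutations + 1)
```

Notice that the list of linked distances grows with the square of the number of links, and the whole thing is recomputed for every permutation. That is fine for small trees, and it gives a feel for why scaling becomes a problem on large ones.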

Analysis of giant phylogenetic trees with SuchTree

Article available as a Jupyter Notebook: SuchTree_1.ipynb

There are a lot of packages for working with and manipulating phylogenetic trees using python. Rather than compete with packages like dendropy and ete3 on the basis of features, SuchTree does one thing well -- its memory usage and algorithmic complexity scale linearly with the number of taxa in your tree. If you need to work with very large trees, this is very helpful.
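As a toy illustration of what linear scaling can look like, here is a tree stored as two flat lists with one slot per node. This is only a sketch under my own assumptions, not SuchTree's actual internals or API; the node layout and the `distance` function are hypothetical. Memory grows with the number of nodes, and the distance between two leaves only needs a walk up toward the root.

```python
# Toy tree ((A:1,B:2):0.5,C:3); stored as flat lists, one slot per node.
# Node ids: 0 = root, 1 = A, 2 = B, 3 = C, 4 = the (A,B) internal node.
parent = [-1, 4, 4, 0, 0]            # parent[i] is the parent of node i (-1 for root)
length = [0.0, 1.0, 2.0, 3.0, 0.5]   # length[i] is the branch from node i to its parent
leaf_ids = {"A": 1, "B": 2, "C": 3}

def distance(a, b):
    """Patristic distance between nodes a and b (hypothetical helper)."""
    # Record the distance from a to each of its ancestors.
    seen = {}
    node, d = a, 0.0
    while node != -1:
        seen[node] = d
        d += length[node]
        node = parent[node]
    # Walk up from b until we reach one of a's ancestors.
    node, d = b, 0.0
    while node not in seen:
        d += length[node]
        node = parent[node]
    return seen[node] + d

print(distance(leaf_ids["A"], leaf_ids["B"]))
print(distance(leaf_ids["A"], leaf_ids["C"]))
```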

Let's have a look at some useful things you can do with trees using the SuchTree class. First, let's get our modules loaded.

To run this notebook on an Ubuntu host, you will need the following system packages installed for ete3 to work :

  • python3-pyqt4.qtopengl
  • python3-pyqt5.qtopengl

and python packages :

  • SuchTree
  • pandas
  • cython
  • scipy
  • numpy
  • matplotlib
  • seaborn
  • fastcluster
  • dendropy
  • ete3

and obviously you'll want jupyter installed so you can run the notebook server. The Internet is full of opinions about how to set up your python environment. You should find one that works for you, but this guide is as good as any to get you started.
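If you want a quick check of which of the Python packages listed above are visible to your interpreter, something like the following sketch works with only the standard library. The `missing_packages` helper is hypothetical, and the import names are my assumption of how each package is imported (for example, cython is imported as `Cython`).

```python
from importlib.util import find_spec

# Import names for the Python packages listed above (assumed; note that
# the import name for cython is "Cython").
REQUIRED = ["SuchTree", "pandas", "Cython", "scipy", "numpy",
            "matplotlib", "seaborn", "fastcluster", "dendropy", "ete3"]

def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only checks that each package can be located; it does not check versions or whether the system packages that ete3 needs are present.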

I'm going to start off by loading the required packages and suppressing some warnings that should be fixed in the next stable release of seaborn.
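Suppressing warnings can be done with the standard `warnings` module. The sketch below is an assumption about which warnings to silence, not necessarily what the notebook does: it ignores `FutureWarning` everywhere and leaves other categories alone. The `quiet_known_warnings` name is hypothetical.

```python
import warnings

def quiet_known_warnings():
    """Ignore FutureWarning; other warning categories are left untouched.

    (Assumption: the noisy warnings are FutureWarnings. Adjust the
    category if your installed versions emit something else.)
    """
    warnings.filterwarnings("ignore", category=FutureWarning)

# Call this before importing the plotting packages.
quiet_known_warnings()
```

Filtering by category rather than ignoring everything means that unrelated warnings still show up, which is usually what you want in an analysis notebook.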

I'm going to assume that you are running this notebook out of a local copy of the SuchTree repository for any local file paths.