Friday, 1 February 2013

KEGGWatch, part III

In which I finally get around to sharing some code, and give some examples of downloading and modifying KEGG pathway maps.

As you may have seen in parts I and II, I've been thinking about writing some Python code to grab, parse, modify, and visualise KGML files.

tl;dr - I wrote it, and it's up here (https://github.com/widdowquinn/KGML) in rough form. Have fun, and let me know if you've got any problems or suggestions.

Structure

The module has four main files: KGML_parser.py, KGML_pathway.py, KGML_scrape.py and KGML_vis.py. There's also a unit test file, test_KGML.py, with example files in the KEGG subdirectory.

KGML_pathway.py contains classes that collectively represent a KGML pathway map. The model follows the KGML specification quite closely, having a 'root' Pathway object that contains Entry, Reaction and Relation objects. These are organised just like KGML's hierarchy, which makes it nice and easy to use ElementTree to recombine elements (possibly after modification or trimming) into valid KGML for output. There's a certain amount of cross-referencing between reactions, relations and other entries to maintain self-consistency, and quite a few property decorations so that we can handle 'bounding boxes' for graphics elements, composite features, and have a more sensible internal representation for element property values, but it's all fairly straightforward.

KGML_parser.py provides a parser that returns a Pathway object. We only expect one pathway map per KGML file, so the read() function throws an error if it finds more. ElementTree is used to parse the KGML itself.
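
As a rough illustration of the one-pathway-per-file check, here is a minimal sketch using only ElementTree. It is not the code in KGML_parser.py: iter_pathways and read_single_pathway are hypothetical names, the KGML fragment is hand-written, and the function returns a bare ElementTree element rather than a Pathway object.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def iter_pathways(handle):
    """Yield every <pathway> element found in an XML handle."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "pathway":
            yield elem


def read_single_pathway(handle):
    """Return the only <pathway> element, or raise ValueError."""
    pathways = list(iter_pathways(handle))
    if len(pathways) != 1:
        raise ValueError("Expected one pathway, found %d" % len(pathways))
    return pathways[0]


# Tiny hand-written KGML fragment, for illustration only
EXAMPLE_KGML = """<pathway name="path:ko00010" org="ko" number="00010"
         title="Glycolysis / Gluconeogenesis">
  <entry id="1" name="ko:K00844" type="ortholog"/>
</pathway>"""

pathway = read_single_pathway(StringIO(EXAMPLE_KGML))
print(pathway.get("title"), len(pathway.findall("entry")))
```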

KGML_vis.py mainly provides a KGMLCanvas object that is a Reportlab Canvas-based representation of the pathway map. The idea is to be as simple as possible for basic use, so that you instantiate a KGMLCanvas with a Pathway, provide some formatting options, and call the draw() method. Since we may want to write out KGML that maintains modifications we make to the pathway and its representation, all changes to the representation of the pathway are made through the Pathway object, directly. Those changes can then be saved by writing the KGML returned by the Pathway.get_kgml() method to a file.
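
The modify-then-save idea can be sketched with ElementTree alone. This is only a stand-in for the Pathway object (the fragment is hand-written and recolour is a hypothetical helper), but it follows the same pattern: change attributes on the in-memory model, then serialise the result back to KGML.

```python
import xml.etree.ElementTree as ET

# Hand-written KGML fragment, for illustration only
KGML = """<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="1" name="ko:K00844" type="ortholog">
    <graphics name="K00844" type="rectangle" bgcolor="#BFBFFF"
              x="100" y="50" width="46" height="17"/>
  </entry>
</pathway>"""


def recolour(root, entry_type, colour):
    """Set bgcolor on the graphics of every entry of the given type."""
    for entry in root.findall("entry"):
        if entry.get("type") == entry_type:
            for graphics in entry.findall("graphics"):
                graphics.set("bgcolor", colour)


root = ET.fromstring(KGML)
recolour(root, "ortholog", "#FF0000")
# Serialise the modified tree back to a KGML string
modified_kgml = ET.tostring(root, encoding="unicode")
print(modified_kgml)
```

With the module itself, the string to write out would instead come from Pathway.get_kgml().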

KGML_scrape.py provides helper functions to grab KGML from the KEGG site in raw form, as a stream/handle, as a Pathway object, or to write it to a file. There are also a couple of handy lists of the metabolic and non-metabolic pathway IDs (as at January 2013).
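
For a sense of what such a helper does under the hood, here is a minimal sketch that fetches KGML through the KEGG REST service. This is an assumption for illustration, not necessarily how KGML_scrape.py retrieves its data; kgml_url and fetch_kgml are hypothetical names.

```python
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"


def kgml_url(pathway_id):
    """Build the KEGG REST URL for a pathway's KGML."""
    return "%s/get/%s/kgml" % (KEGG_REST, pathway_id)


def fetch_kgml(pathway_id):
    """Download KGML for a pathway and return it as text (needs network)."""
    with urlopen(kgml_url(pathway_id)) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; call fetch_kgml() to download the data
print(kgml_url("ddc00190"))
```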

Examples

The simplest useful operation is probably just downloading a given KEGG pathway map to a local KGML file.  For this, you can use one of the utility functions for grabbing data from KEGG, found in KGML_scrape.py:

from KGML_scrape import retrieve_kgml_to_file
retrieve_kgml_to_file('ddc00190', 'ddc00190.kgml')

This two-liner grabs the ddc00190 pathway map, and writes it to ddc00190.kgml. From there we can treat it like any other KGML file in any pipeline we like.

Alternatively, if we want to deal directly with KGML in our code, and don't want to write an intermediate file, we can use KGML_scrape's functions to obtain the KGML as a string, a handle, or a Pathway object, as we can see from this IPython session:

In [1]: from KGML_scrape import *
In [2]: ex1 = retrieve_kgml('eco00010')
In [3]: ex1[:100]
Out[3]: '<?xml version="1.0"?>\n<!DOCTYPE pathway SYSTEM "http://www.genome.jp/kegg/xml/KGML_v0.7.1_.dtd">\n<!-'
In [4]: ex2 = retrieve_kgml_stream('ype02040')
In [5]: type(ex2)
Out[5]: instance
In [6]: ex2.readline()
Out[6]: '<?xml version="1.0"?>\n'
In [7]: ex3 = retrieve_KEGG_pathway('ara01120')
In [8]: ex3
Out[8]: <KGML_Pathway.Pathway at 0x10f5e7bd0>
In [9]: print ex3
Pathway: Microbial metabolism in diverse environments
KEGG ID: path:ara01120
Image file: http://www.genome.jp/kegg/pathway/ara/ara01120.png
Organism: ara
Entries: 1662
Entry types:
ortholog: 447
gene: 291
compound: 884
map: 39
which is convenient for interactive use.
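
A summary like the one printed above can also be approximated from the raw KGML string. The sketch below counts entry types with ElementTree and a Counter; the fragment is hand-written and entry_type_counts is a hypothetical helper, not part of the module.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written KGML fragment, for illustration only
KGML = """<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="1" name="ko:K00844" type="ortholog"/>
  <entry id="2" name="cpd:C00022" type="compound"/>
  <entry id="3" name="cpd:C00024" type="compound"/>
  <entry id="4" name="path:ko00020" type="map"/>
</pathway>"""


def entry_type_counts(kgml_text):
    """Count the <entry> elements in a KGML string, grouped by type."""
    root = ET.fromstring(kgml_text)
    return Counter(entry.get("type") for entry in root.iter("entry"))


counts = entry_type_counts(KGML)
for entry_type, number in counts.most_common():
    print("%s: %d" % (entry_type, number))
```

The string returned by retrieve_kgml() could be passed to the same function in place of the fragment.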

Example 1
To see the different forms of representation for one of the 'large' (ko01100, ko01110 and ko01120) maps, we can use this example code:

import KGML_parser
from KGML_scrape import retrieve_KEGG_pathway
from KGML_vis import KGMLCanvas
# Get the ko01110 map from KEGG, and write it out to file, visualised as
# the .png, and as the elements from the KGML file
pathway = retrieve_KEGG_pathway('ko01110')
kgml_map = KGMLCanvas(pathway, show_maps=True)
# Default settings are for the KGML elements only
kgml_map.draw('ex1_kgml_render.pdf')
# We need to use the image map, and turn off the KGML elements, to see
# only the .png base map. We could have set these values on canvas
# instantiation
kgml_map.import_imagemap = True
kgml_map.show_maps = False
kgml_map.show_orthologs = False
kgml_map.draw_relations = False
kgml_map.show_compounds = False
kgml_map.show_genes = False
kgml_map.draw('ex1_png_render.pdf')
# And rendering elements as an overlay
kgml_map.show_compounds = True
kgml_map.show_genes = True
kgml_map.show_orthologs = True
kgml_map.draw('ex1_overlay_render.pdf')

Here, the (near-)default rendering option is to show only the KGML entries with graphics elements. This renders at full-size, and mutes the colouring of any compounds that don't take part in any reaction for which there is a connecting ortholog.
KGML element-only rendering of ko01100


We can also render only the KEGG-drawn .png map, which I prefer for the formatting of the map elements that indicate where the other more specific KEGG pathway maps connect to this large metabolic overview.
KEGG-drawn .png-only rendering of ko01100


Finally, we render a hybrid, which retains the KEGG-drawn .png, but overlays the KGML information (which we can also modify).
Hybrid KEGG-drawn .png with KGML element overlay for ko01100


Example 2
For the next example we look at a similar rendering for a non-metabolic pathway, for which we need the KEGG-drawn .png to make sense of the KGML. I'm going for some blatant self-promotion and using Biopython's ColorSpiral utility (more on that, here).

import KGML_parser
from KGML_scrape import retrieve_KEGG_pathway
from KGML_vis import KGMLCanvas
from Bio.Graphics.ColorSpiral import ColorSpiral
# Get the ko03070 map from KEGG, and write it out to file, visualised as
# the .png, and as the elements from the KGML file
pathway = retrieve_KEGG_pathway('ko03070')
kgml_map = KGMLCanvas(pathway, show_maps=True)
# Let's use some arbitrary colours for the orthologs
cs = ColorSpiral(a=2, b=0.2, v_init=0.85, v_final=0.5,
                 jitter=0.03)
# Loop over the orthologs in the pathway, and change the
# background colour
orthologs = [e for e in pathway.orthologs]
for o, c in zip(orthologs,
                cs.get_colors(len(orthologs))):
    for g in o.graphics:
        g.bgcolor = c
# Default settings are for the KGML elements only
kgml_map.draw('ex2_kgml_render.pdf')
# We need to use the image map, and turn off the KGML elements, to see
# only the .png base map. We could have set these values on canvas
# instantiation
kgml_map.import_imagemap = True
kgml_map.show_maps = False
kgml_map.show_orthologs = False
kgml_map.draw_relations = False
kgml_map.show_compounds = False
kgml_map.show_genes = False
kgml_map.draw('ex2_png_render.pdf')
# And rendering elements as an overlay
kgml_map.show_compounds = True
kgml_map.show_genes = True
kgml_map.show_orthologs = True
kgml_map.draw('ex2_overlay_render.pdf')

Just rendering the KGML elements shows exactly what is present, and can be modified:

KGML-only rendering of ko03070


The KEGGsketch .png looks gorgeous:

KEGG image map .png rendering of ko03070

And overlaying our data takes advantage of this image for context, but lets us add our own information:
Hybrid rendering of ko03070
which would be useful, for example, for indicating expression/transcription levels, or sequence similarity in heatmap form, or for showing presence/absence information.
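
As a sketch of the heatmap idea, the helper below maps numeric values onto a white-to-red hex colour scale. Everything here is illustrative: value_to_hex is a hypothetical function and the expression values are made up.

```python
def value_to_hex(value, vmin, vmax):
    """Map a number onto a white-to-red hex colour, clamping to [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)
    fraction = (clamped - vmin) / float(vmax - vmin)
    fade = int(round(255 * (1.0 - fraction)))  # green and blue channels
    return "#FF%02X%02X" % (fade, fade)


# Hypothetical expression values keyed by KO identifier (made-up numbers)
expression = {"ko:K03070": 0.0, "ko:K03071": 2.5, "ko:K03072": 5.0}
colours = {ko: value_to_hex(v, 0.0, 5.0) for ko, v in expression.items()}
print(colours)
```

In the loop from Example 2, the resulting strings could be assigned to g.bgcolor in place of the ColorSpiral colours, assuming bgcolor accepts hex strings in the same way that fgcolor does in Example 3.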

Example 3
Now for something a little more complicated. Let's try to enhance the visibility of a set of pathways. The ko01100 pathway map should contain glycolysis and the TCA cycle, so we'll try to show the routes through these processes as thicker lines than usual:

import KGML_parser
from KGML_scrape import retrieve_KEGG_pathway
from KGML_vis import KGMLCanvas
# Get list of pathway elements to enhance
glyc_path = retrieve_KEGG_pathway('ko00010')
tca_path = retrieve_KEGG_pathway('ko00020')
enhance_list = []
for pathway in (glyc_path, tca_path):
    for e in pathway.entries.values():
        enhance_list.extend(e.name.split())
enhance_list = set(enhance_list)
# Get the pathway we want to render, and make all the lines
# that are also in glycolysis or TCA pathways thicker
met_pathway = retrieve_KEGG_pathway('ko01100')
mod_list = [e for e in met_pathway.entries.values() if
            len(set(e.name.split()).intersection(enhance_list))]
for e in mod_list:
    for g in e.graphics:
        g.width = 10
kgml_map = KGMLCanvas(met_pathway, show_maps=True)
kgml_map.draw('ex3_thick.pdf')
# Thin out any lines that aren't in the glycolysis/TCA pathways
mod_list = [e for e in met_pathway.entries.values() if
            not len(set(e.name.split()).intersection(enhance_list))
            and e.type != 'map']
for e in mod_list:
    for g in e.graphics:
        g.width = .4
kgml_map.draw('ex3_thin.pdf')
# Or turn them grey, maybe:
for e in mod_list:
    for g in e.graphics:
        g.fgcolor = '#CCCCCC'
kgml_map.draw('ex3_grey.pdf')

which renders like this:
ko01100 with selected elements thickened
and, if we reduce the visibility of the other pathway components by thinning the lines, we get:
ko01100 with unselected elements thinned
Going even further, we can take all the non-matching components to grey:
ko01100 with unselected elements rendered to grey
Which I can imagine being useful for indicating, say, steady-state fluxes or elementary modes, amongst other things.
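
To make the flux idea concrete, here is a minimal sketch that scales flux magnitudes linearly onto the range of line widths used in Example 3 (0.4 to 10). The function flux_to_width is hypothetical and the flux values are made up.

```python
def flux_to_width(flux, max_flux, min_width=0.4, max_width=10.0):
    """Linearly scale an absolute flux onto a line width."""
    if max_flux <= 0:
        raise ValueError("max_flux must be positive")
    fraction = min(abs(flux) / float(max_flux), 1.0)
    return min_width + (max_width - min_width) * fraction


# Hypothetical fluxes keyed by KO identifier (made-up numbers)
fluxes = {"ko:K00844": 8.0, "ko:K01810": 4.0, "ko:K00850": 0.0}
largest = max(abs(f) for f in fluxes.values())
widths = {ko: flux_to_width(f, largest) for ko, f in fluxes.items()}
print(widths)
```

Each width could then be assigned to g.width for the graphics of the matching entry, as in the loops of Example 3.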

What next?

Well, the code's now in a repository on GitHub: https://github.com/widdowquinn/KGML, and I hope that Biopython might take it up (in slightly tidier form) shortly. In the meantime, I hope you find it useful.

2 comments:

  1. Hi,

    Thank you for this post. It does exactly what I am looking to do, but I note that it has now been deprecated and incorporated into Biopython. Is the usage of the Biopython KGML module identical to the above? Is there a tutorial or examples for using KGML within Biopython available anywhere?

    Thanks,
    Chris

    1. Thanks for your kind comments, Chris - I'm so pleased you find it useful.

      The main thing that has changed for Biopython is how you get data from KEGG - there's a much nicer interface that uses the REST API. The object model remains the same, though there are some colour/color spelling changes for consistency with the rest of Biopython.

      There's a short IPython notebook introduction at https://nbviewer.jupyter.org/github/widdowquinn/notebooks/blob/master/Biopython_KGML_intro.ipynb - we really ought to do more documentation, but if you fancy the challenge, any contributions in that area are very much appreciated ;)

      L.
