Friday, 1 February 2013

KEGGWatch, part III

In which I finally get around to sharing some code, and give some examples of downloading and modifying KEGG pathway maps.

As you may have seen in parts I and II, I've been thinking about writing some Python code to grab, parse, modify, and visualise KGML files.

tl;dr - I wrote it, and it's up here (https://github.com/widdowquinn/KGML) in rough form. Have fun, and let me know if you've got any problems or suggestions.

Structure

The module has four main files: KGML_parser.py, KGML_pathway.py, KGML_scrape.py and KGML_vis.py. There's also a unit test file, test_KGML.py, with example files in the KEGG subdirectory.

KGML_pathway.py contains classes that collectively represent a KGML pathway map. The model follows the KGML specification quite closely, with a 'root' Pathway object that contains Entry, Reaction and Relation objects. These are organised just like KGML's hierarchy, which makes it nice and easy to use ElementTree to recombine elements (possibly after modification or trimming) into valid KGML for output. There's a certain amount of cross-referencing between reactions, relations and other entries to maintain self-consistency, and quite a few property decorators so that we can handle 'bounding boxes' for graphics elements, composite features, and a more sensible internal representation for element property values. It's all fairly straightforward, though.
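To give a feel for that structure, here's a minimal, hypothetical sketch of the same idea using only the standard library's dataclasses and ElementTree. The class and method names echo the post, but this is not the module's code: it leaves out relations and most KGML attributes, and the `bounds` property and the cross-reference check in `add_reaction()` are my own simplified assumptions about how such features might look.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Graphics:
    # KGML <graphics> records the centre (x, y) of an element plus its
    # width and height, in pixel coordinates of the KEGG-drawn image.
    x: int
    y: int
    width: int
    height: int

    @property
    def bounds(self):
        """Bounding box as ((xmin, ymin), (xmax, ymax))."""
        half_w, half_h = self.width / 2, self.height / 2
        return ((self.x - half_w, self.y - half_h),
                (self.x + half_w, self.y + half_h))

    def to_element(self):
        return ET.Element("graphics", x=str(self.x), y=str(self.y),
                          width=str(self.width), height=str(self.height))


@dataclass
class Entry:
    id: int
    name: str    # e.g. "ko:K00844" or "cpd:C00031"
    type: str    # e.g. "ortholog", "compound", "map"
    graphics: list = field(default_factory=list)

    def to_element(self):
        elem = ET.Element("entry", id=str(self.id), name=self.name, type=self.type)
        for g in self.graphics:
            elem.append(g.to_element())
        return elem


@dataclass
class Reaction:
    id: int      # KGML reuses the entry id of the catalysing gene/ortholog
    name: str    # e.g. "rn:R00299"
    type: str = "irreversible"
    substrates: list = field(default_factory=list)  # compound entry ids
    products: list = field(default_factory=list)    # compound entry ids


@dataclass
class Pathway:
    name: str
    entries: dict = field(default_factory=dict)    # entry id -> Entry
    reactions: dict = field(default_factory=dict)  # reaction id -> Reaction

    def add_entry(self, entry):
        self.entries[entry.id] = entry

    def add_reaction(self, reaction):
        # Cross-reference check: every id a reaction uses must be a known entry.
        for ref in [reaction.id, *reaction.substrates, *reaction.products]:
            if ref not in self.entries:
                raise ValueError(f"reaction refers to unknown entry {ref}")
        self.reactions[reaction.id] = reaction

    def get_kgml(self):
        """Rebuild the KGML element hierarchy and return it as a string."""
        root = ET.Element("pathway", name=self.name)
        for entry in self.entries.values():
            root.append(entry.to_element())
        for rxn in self.reactions.values():
            r_elem = ET.SubElement(root, "reaction", id=str(rxn.id),
                                   name=rxn.name, type=rxn.type)
            for tag, ids in (("substrate", rxn.substrates), ("product", rxn.products)):
                for cid in ids:
                    ET.SubElement(r_elem, tag, id=str(cid), name=self.entries[cid].name)
        return ET.tostring(root, encoding="unicode")
```

Because the output is rebuilt by walking the object tree, anything you delete or modify in the objects is reflected in the KGML you write out.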

KGML_parser.py provides a parser that returns a Pathway object. We only expect one pathway map per KGML file, so the read() function raises an error if it finds more. ElementTree is used to parse the KGML itself.
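The one-pathway-per-file rule can be enforced with a generator plus a `read()` that checks for exactly one result. The sketch below does that with `ElementTree.iterparse()`; it yields raw `<pathway>` elements rather than Pathway objects, and the error messages are my own assumptions, not the module's.

```python
import io
import xml.etree.ElementTree as ET


def parse(handle):
    """Yield each complete <pathway> element found in an XML stream."""
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "pathway":
            yield elem


def read(handle):
    """Return the single pathway in handle; raise if there are none or several."""
    pathways = parse(handle)
    first = next(pathways, None)
    if first is None:
        raise ValueError("No pathway found in handle")
    if next(pathways, None) is not None:
        raise ValueError("More than one pathway found in handle")
    return first


# Usage: any binary file-like object will do, e.g. an open .kgml file.
example = io.BytesIO(b'<pathway name="path:ko00010">'
                     b'<entry id="1" name="cpd:C00031" type="compound"/></pathway>')
pathway = read(example)
print(pathway.get("name"), len(pathway.findall("entry")))
```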

KGML_vis.py mainly provides a KGMLCanvas object, a ReportLab Canvas-based representation of the pathway map. The idea is to keep basic use as simple as possible: you instantiate a KGMLCanvas with a Pathway, provide some formatting options, and call the draw() method. Since we may want to write out KGML that preserves the modifications we make to the pathway and its representation, all changes to the representation of the pathway are made directly through the Pathway object. Those changes can then be saved by writing the KGML returned by the Pathway.get_kgml() method to a file.
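One detail any ReportLab-based renderer has to deal with: KGML graphics give an element's centre and size in the KEGG image's pixel coordinates (origin top-left, y increasing downwards), whereas a ReportLab canvas puts its origin at the bottom-left by default, and `Canvas.rect()` takes the lower-left corner. Here's a minimal sketch of the conversion, assuming the page is exactly as tall as the KEGG image. I'm not claiming KGMLCanvas does it this way; flipping the whole canvas with `translate()` and `scale()` is an equally valid approach.

```python
def kgml_box_to_reportlab(x, y, width, height, page_height):
    """Convert a KGML box (centre x, y; origin top-left, y down) to the
    (left, bottom, width, height) arguments of ReportLab's Canvas.rect(),
    whose default origin is bottom-left with y up."""
    left = x - width / 2
    bottom = page_height - (y + height / 2)
    return left, bottom, width, height


def kgml_coords_to_reportlab(coords, page_height):
    """Convert a KGML line 'coords' string ('x1,y1,x2,y2,...') into a list of
    (x, y) points with the y axis flipped for ReportLab."""
    values = [float(v) for v in coords.split(",")]
    return [(values[i], page_height - values[i + 1]) for i in range(0, len(values), 2)]


# Usage, assuming `canvas` is a reportlab.pdfgen.canvas.Canvas and the page
# is as tall as the KEGG image:
# canvas.rect(*kgml_box_to_reportlab(100, 50, 46, 17, page_height=1000))
```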

KGML_scrape.py provides helper functions to grab KGML from the KEGG site in raw form, as a stream/handle, as a Pathway object, or to write it to a file. There are also a couple of handy lists of the metabolic and non-metabolic pathway IDs (as at January 2013).

Examples

The simplest useful operation is probably just downloading a given KEGG pathway map to a local KGML file. For this, you can use one of the utility functions for grabbing data from KEGG, found in KGML_scrape.py:

This two-liner grabs the ddc00190 pathway map, and writes it to ddc00190.kgml. From there we can treat it like any other KGML file in any pipeline we like.
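As a hedged stand-in for the KGML_scrape helper, this standard-library sketch does the same job against the KEGG REST API (the interface Biopython now uses, as mentioned in the comments below), assuming the `https://rest.kegg.jp/get/<map id>/kgml` endpoint. `kgml_url()` and `download_kgml()` are hypothetical names, not the module's functions.

```python
import urllib.request

KEGG_KGML_URL = "https://rest.kegg.jp/get/{}/kgml"


def kgml_url(map_id):
    """KEGG REST URL for a pathway map's KGML, e.g. for 'ddc00190'."""
    return KEGG_KGML_URL.format(map_id)


def download_kgml(map_id, filename):
    """Fetch a pathway map's KGML from KEGG and write it, unchanged, to filename."""
    with urllib.request.urlopen(kgml_url(map_id)) as response:
        data = response.read()
    with open(filename, "wb") as outfile:
        outfile.write(data)
    return filename


# Usage (needs network access):
# download_kgml("ddc00190", "ddc00190.kgml")
```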

Alternatively, if we want to deal directly with KGML in our code, and don't want to write an intermediate file, we can use KGML_scrape's functions to obtain the KGML as a handle, a string, or a KGMLPathway object, as we can see from the iPython session:

which is convenient for interactive use.
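The same stand-in approach covers the string, handle and parsed forms. In this sketch (all function names hypothetical), the raw bytes are fetched from the KEGG REST API and then returned decoded, wrapped in `io.StringIO`, or parsed with ElementTree into a `<pathway>` element rather than a Pathway object. The `opener` parameter is my own addition, so the functions can be exercised offline with a fake response.

```python
import io
import urllib.request
import xml.etree.ElementTree as ET

KGML_URL = "https://rest.kegg.jp/get/{}/kgml"


def kgml_bytes(map_id, opener=urllib.request.urlopen):
    """Fetch the raw KGML bytes for a pathway map such as 'ddc00190'."""
    with opener(KGML_URL.format(map_id)) as response:
        return response.read()


def kgml_string(map_id, opener=urllib.request.urlopen):
    """KGML as a text string."""
    return kgml_bytes(map_id, opener).decode("utf-8")


def kgml_handle(map_id, opener=urllib.request.urlopen):
    """KGML as a file-like handle, for code that expects to call .read()."""
    return io.StringIO(kgml_string(map_id, opener))


def kgml_root(map_id, opener=urllib.request.urlopen):
    """KGML parsed into an ElementTree <pathway> element."""
    return ET.fromstring(kgml_bytes(map_id, opener))
```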

Example 1
To see the different forms of representation for one of the 'large' (ko01100, ko01110 and ko01120) maps, we can use this example code:

Here, the (near-)default rendering option is to show only the KGML entries with graphics elements. This renders at full-size, and mutes the colouring of any compounds that don't take part in any reaction for which there is a connecting ortholog.
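The muting rule can be expressed quite compactly if we lean on the KGML convention that a `<reaction>` element's id is the entry id of the gene or ortholog that catalyses it. This sketch is my reading of the rule, not the module's implementation, and the 0.3 opacity for muted compounds is an arbitrary placeholder.

```python
import xml.etree.ElementTree as ET


def reactant_compounds(root):
    """Entry ids of compounds that are a substrate or product of a reaction
    catalysed by an ortholog entry (KGML reuses that entry's id as the
    reaction id)."""
    orthologs = {e.get("id") for e in root.iter("entry") if e.get("type") == "ortholog"}
    keep = set()
    for rxn in root.iter("reaction"):
        if rxn.get("id") in orthologs:
            for part in rxn.findall("substrate") + rxn.findall("product"):
                keep.add(part.get("id"))
    return keep


def compound_alphas(root, muted=0.3):
    """Map each compound entry id to an opacity: full for reactants,
    `muted` (an arbitrary placeholder value) for everything else."""
    keep = reactant_compounds(root)
    return {e.get("id"): (1.0 if e.get("id") in keep else muted)
            for e in root.iter("entry") if e.get("type") == "compound"}
```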
KGML element-only rendering of ko01100


We can also render only the KEGG-drawn .png map, which I prefer for the formatting of the map elements that indicate where the other more specific KEGG pathway maps connect to this large metabolic overview.
KEGG-drawn .png-only rendering of ko01100


Finally, we render a hybrid, which retains the KEGG-drawn .png, but overlays the KGML information (which we can also modify).
Hybrid KEGG-drawn .png with KGML element overlay for ko01100


Example 2
For the next example we look at a similar rendering for a non-metabolic pathway, for which we need the KEGG-drawn .png to make sense of the KGML. I'm going for some blatant self-promotion and using Biopython's ColorSpiral utility (more on that here).
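Biopython's ColorSpiral picks colours along a spiral through HSV space, so that successive colours are easy to tell apart. The stdlib sketch below is a simplified stand-in for that idea (hue stepping evenly round the colour wheel while brightness falls linearly), not ColorSpiral's actual algorithm or API; its function name, parameters and defaults are my own. It returns colours in the '#RRGGBB' form used by the fgcolor and bgcolor attributes of KGML graphics elements.

```python
import colorsys


def spiral_colours(k, v_start=1.0, v_end=0.5, saturation=0.8):
    """Return k '#RRGGBB' colours: hues evenly spaced round the colour wheel
    while brightness falls linearly from v_start to v_end, tracing a spiral
    through HSV space."""
    colours = []
    for i in range(k):
        frac = i / max(k - 1, 1)          # 0 for the first colour, 1 for the last
        hue = i / k                       # evenly spaced, never repeats hue 0
        value = v_start + (v_end - v_start) * frac
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colours.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours
```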

Just rendering the KGML elements shows exactly what is present, and can be modified:

KGML-only rendering of ko03070


The KEGG-drawn .png looks gorgeous:

KEGG image map .png rendering of ko03070

And overlaying our data takes advantage of this image for context, but lets us add our own information:
Hybrid rendering of ko03070
which would be useful, for example, for indicating expression/transcription levels, or sequence similarity in heatmap form, or for showing presence/absence information.
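As a sketch of the heatmap idea, assuming a dict of per-identifier values such as expression levels (your own data in practice), we can interpolate between two colours and write the result into the bgcolor of each matching entry's graphics elements. Entries whose name lists several identifiers take the highest value, which is an arbitrary design choice, and both function names are hypothetical. Writing into the KGML, rather than straight onto the canvas, keeps the change savable, in keeping with the approach described above.

```python
import xml.etree.ElementTree as ET


def heat_colour(value, vmin, vmax, low=(255, 255, 255), high=(255, 0, 0)):
    """Linearly interpolate between two RGB colours; return '#RRGGBB'."""
    span = (vmax - vmin) or 1.0
    frac = min(max((value - vmin) / span, 0.0), 1.0)   # clamp to [0, 1]
    rgb = [round(lo + (hi - lo) * frac) for lo, hi in zip(low, high)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def paint_expression(root, levels):
    """Set bgcolor on the graphics of every entry whose (space-separated)
    name includes an identifier in levels, a dict of identifier -> value."""
    vmin, vmax = min(levels.values()), max(levels.values())
    for entry in root.iter("entry"):
        hits = [levels[n] for n in entry.get("name", "").split() if n in levels]
        if hits:
            colour = heat_colour(max(hits), vmin, vmax)
            for graphics in entry.findall("graphics"):
                graphics.set("bgcolor", colour)
    return root
```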

Example 3
Now for something a little more complicated. Let's try to enhance the visibility of a set of pathways. The ko01100 pathway map should contain glycolysis and the TCA cycle, so we'll try to show the routes through these processes as thicker lines than usual:

which renders like this:
ko01100 with selected elements thickened
and, if we reduce the visibility of the other pathway components by thinning the lines, we get:
ko01100 with unselected elements thinned
Going even further, we can take all the non-matching components to grey:
ko01100 with unselected elements rendered to grey
I can imagine this being useful for indicating, say, steady-state fluxes or elementary modes, amongst other things.
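Here's a hedged sketch of how the thick/thin/grey styling might be applied to the ElementTree form of ko01100's KGML, by editing the width and fgcolor of line graphics on ortholog entries. It assumes you've already collected the set of selected reaction names (for instance, from the reaction attributes of ortholog entries in the ko00010 glycolysis and ko00020 TCA cycle maps); the function name, default widths and grey are arbitrary choices of mine.

```python
import xml.etree.ElementTree as ET


def highlight_reactions(root, selected, thick=10, thin=1, grey="#CCCCCC"):
    """Thicken the line graphics of ortholog entries that catalyse any
    reaction in `selected` (e.g. {"rn:R00299"}); thin and grey out the rest."""
    for entry in root.iter("entry"):
        if entry.get("type") != "ortholog":
            continue
        reactions = set(entry.get("reaction", "").split())
        for graphics in entry.findall("graphics"):
            if graphics.get("type") != "line":
                continue
            if reactions & selected:
                graphics.set("width", str(thick))
            else:
                graphics.set("width", str(thin))
                graphics.set("fgcolor", grey)
    return root
```

Splitting the styling into "thicken selected" and "thin and grey the rest" in one pass mirrors the three renderings above; dropping the fgcolor line gives the thinned-only version.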

What next?

Well, the code's now in a repository on GitHub: https://github.com/widdowquinn/KGML, and I hope that Biopython might take it up (in slightly tidier form) shortly. In the meantime, I hope you find it useful.

2 comments:

  1. Hi,

    Thank you for this post; it does exactly what I am looking to do, but I note that it has now been deprecated and incorporated into Biopython. Is the usage of the Biopython KGML module identical to the above? Is there a tutorial or examples for using KGML within Biopython available anywhere?

    Thanks,
    Chris

    1. Thanks for your kind comments, Chris - I'm so pleased you find it useful.

      The main thing that has changed for Biopython is how you get data from KEGG - there's a much nicer interface that uses the REST API. The object model remains the same, though there are some colour/color spelling changes for consistency with the rest of Biopython.

      There's a short iPython notebook introduction at https://nbviewer.jupyter.org/github/widdowquinn/notebooks/blob/master/Biopython_KGML_intro.ipynb - we really ought to do more documentation, but if you fancy the challenge, any contributions in that area are very much appreciated ;)

      L.
