As you may have seen in parts I and II, I've been thinking about writing some Python code to grab, parse, modify, and visualise KGML files.
tl;dr - I wrote it, and it's up here (https://github.com/widdowquinn/KGML) in rough form. Have fun, and let me know if you've got any problems or suggestions.
Structure
The module has four main files: KGML_parser.py, KGML_pathway.py, KGML_scrape.py and KGML_vis.py. There's also a unit test file, test_KGML.py, with example files in the KEGG subdirectory.
KGML_pathway.py contains classes that collectively represent a KGML pathway map. The model follows the KGML specification quite closely, with a 'root' Pathway object that contains Entry, Reaction and Relation objects. These are organised just like KGML's hierarchy, which makes it nice and easy to use ElementTree to recombine elements (possibly after modification or trimming) into valid KGML for output. There's a certain amount of cross-referencing between reactions, relations and other entries to maintain self-consistency, and quite a few property decorators so that we can handle 'bounding boxes' for graphics elements and composite features, and keep a more sensible internal representation for element property values, but it's all fairly straightforward.
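As a rough illustration of that layout (not the module's actual code), here is a minimal sketch of a Pathway that holds Entry objects and rebuilds KGML-like XML with ElementTree. The class bodies, the element property and this version of get_kgml() are assumptions made for the example, and the bounds property assumes that a graphics element's x and y mark its centre.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Graphics:
    """Hypothetical stand-in for a KGML <graphics> element."""
    x: float
    y: float
    width: float
    height: float

    @property
    def bounds(self):
        # Assumes (x, y) is the centre of the element; returns the
        # bounding box as ((xmin, ymin), (xmax, ymax))
        return ((self.x - self.width / 2, self.y - self.height / 2),
                (self.x + self.width / 2, self.y + self.height / 2))


@dataclass
class Entry:
    """Hypothetical stand-in for a KGML <entry> element."""
    id: int
    name: str
    type: str
    graphics: list = field(default_factory=list)

    @property
    def element(self):
        # Rebuild this entry, and its graphics children, as XML
        entry = ET.Element("entry", {"id": str(self.id),
                                     "name": self.name,
                                     "type": self.type})
        for g in self.graphics:
            ET.SubElement(entry, "graphics",
                          {"x": str(g.x), "y": str(g.y),
                           "width": str(g.width),
                           "height": str(g.height)})
        return entry


@dataclass
class Pathway:
    """Hypothetical 'root' object holding entries keyed by ID."""
    name: str
    org: str
    entries: dict = field(default_factory=dict)

    def add_entry(self, entry):
        self.entries[entry.id] = entry

    def get_kgml(self):
        # Recombine the (possibly modified) entries into one XML string
        root = ET.Element("pathway", {"name": self.name, "org": self.org})
        for entry_id in sorted(self.entries):
            root.append(self.entries[entry_id].element)
        return ET.tostring(root, encoding="unicode")


pathway = Pathway(name="path:ko00010", org="ko")
pathway.add_entry(Entry(1, "ko:K00844", "ortholog",
                        [Graphics(100, 50, 46, 17)]))
kgml = pathway.get_kgml()
```

Because each object knows how to rebuild its own element, any change made to an entry or its graphics shows up in the output without a separate serialisation step.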
KGML_parser.py provides a parser that returns a Pathway object. We only expect one pathway map per KGML file, so the read() function throws an error if it finds more. ElementTree is used to parse the KGML itself.
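The one-pathway-per-file check can be sketched with ElementTree alone. This is an illustration under stated assumptions rather than the module's parser: read_single_pathway() and the tiny EXAMPLE_KGML document are made up for the example, and the function returns a plain ElementTree element instead of a Pathway object.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny, hand-written KGML-like document used only for this example
EXAMPLE_KGML = """<?xml version="1.0"?>
<pathway name="path:ko00010" org="ko" number="00010"
         title="Glycolysis / Gluconeogenesis">
  <entry id="1" name="ko:K00844" type="ortholog"/>
  <entry id="2" name="cpd:C00031" type="compound"/>
  <entry id="3" name="cpd:C00668" type="compound"/>
</pathway>
"""


def read_single_pathway(handle):
    """Return the single <pathway> element found in a KGML handle."""
    root = ET.parse(handle).getroot()
    # iter() includes the root element itself when its tag matches
    pathways = list(root.iter("pathway"))
    if len(pathways) != 1:
        raise ValueError("Expected exactly one pathway, found %d"
                         % len(pathways))
    return pathways[0]


pathway = read_single_pathway(io.StringIO(EXAMPLE_KGML))
# Count entries by type, much like a pathway summary would
type_counts = Counter(entry.get("type") for entry in pathway.iter("entry"))
```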
KGML_vis.py mainly provides a KGMLCanvas object that is a Reportlab Canvas-based representation of the pathway map. The idea is to be as simple as possible for basic use, so that you instantiate a KGMLCanvas with a Pathway, provide some formatting options, and call the draw() method. Since we may want to write out KGML that maintains modifications we make to the pathway and its representation, all changes to the representation of the pathway are made through the Pathway object, directly. Those changes can then be saved by writing the KGML returned by the Pathway.get_kgml() method to a file.
KGML_scrape.py provides helper functions to grab KGML from the KEGG site in raw form, as a stream/handle, as a Pathway object, or to write it to a file. There are also a couple of handy lists of the metabolic and non-metabolic pathway IDs (as at January 2013).
Examples
The simplest useful operation is probably just downloading a given KEGG pathway map to a local KGML file. For this, you can use one of the utility functions for grabbing data from KEGG, found in KGML_scrape.py:
from KGML_scrape import retrieve_kgml_to_file
retrieve_kgml_to_file('ddc00190', 'ddc00190.kgml')
Alternatively, if we want to deal directly with KGML in our code, and don't want to write an intermediate file, we can use KGML_scrape's functions to obtain the KGML as a string, a handle, or a Pathway object, as shown in this iPython session:
In [1]: from KGML_scrape import *
In [2]: ex1 = retrieve_kgml('eco00010')
In [3]: ex1[:100]
Out[3]: '<?xml version="1.0"?>\n<!DOCTYPE pathway SYSTEM "http://www.genome.jp/kegg/xml/KGML_v0.7.1_.dtd">\n<!-'
In [4]: ex2 = retrieve_kgml_stream('ype02040')
In [5]: type(ex2)
Out[5]: instance
In [6]: ex2.readline()
Out[6]: '<?xml version="1.0"?>\n'
In [7]: ex3 = retrieve_KEGG_pathway('ara01120')
In [8]: ex3
Out[8]: <KGML_Pathway.Pathway at 0x10f5e7bd0>
In [9]: print ex3
Pathway: Microbial metabolism in diverse environments
KEGG ID: path:ara01120
Image file: http://www.genome.jp/kegg/pathway/ara/ara01120.png
Organism: ara
Entries: 1662
Entry types:
    ortholog: 447
    gene: 291
    compound: 884
    map: 39
Example 1
To see the different forms of representation for one of the 'large' (ko01100, ko01110 and ko01120) maps, we can use this example code:
import KGML_parser
from KGML_scrape import retrieve_KEGG_pathway
from KGML_vis import KGMLCanvas

# Get the ko01110 map from KEGG, and write it out to file, visualised as
# the .png, and as the elements from the KGML file
pathway = retrieve_KEGG_pathway('ko01110')
kgml_map = KGMLCanvas(pathway, show_maps=True)

# Default settings are for the KGML elements only
kgml_map.draw('ex1_kgml_render.pdf')

# We need to use the image map, and turn off the KGML elements, to see
# only the .png base map. We could have set these values on canvas
# instantiation
kgml_map.import_imagemap = True
kgml_map.show_maps = False
kgml_map.show_orthologs = False
kgml_map.draw_relations = False
kgml_map.show_compounds = False
kgml_map.show_genes = False
kgml_map.draw('ex1_png_render.pdf')

# And rendering elements as an overlay
kgml_map.show_compounds = True
kgml_map.show_genes = True
kgml_map.show_orthologs = True
kgml_map.draw('ex1_overlay_render.pdf')
Figure: KGML element-only rendering of ko01100
We can also render only the KEGG-drawn .png map, which I prefer for the formatting of the map elements that indicate where the other more specific KEGG pathway maps connect to this large metabolic overview.
Figure: KEGG-drawn .png-only rendering of ko01100
Finally, we render a hybrid, which retains the KEGG-drawn .png, but overlays the KGML information (which we can also modify).
Figure: Hybrid KEGG-drawn .png with KGML element overlay for ko01100
Example 2
For the next example we look at a similar rendering for a non-metabolic pathway, for which we need the KEGG-drawn .png to make sense of the KGML. I'm going for some blatant self-promotion and using Biopython's ColorSpiral utility (more on that, here).
import KGML_parser
from KGML_scrape import retrieve_KEGG_pathway
from KGML_vis import KGMLCanvas
from Bio.Graphics.ColorSpiral import ColorSpiral

# Get the ko03070 map from KEGG, and write it out to file, visualised as
# the .png, and as the elements from the KGML file
pathway = retrieve_KEGG_pathway('ko03070')
kgml_map = KGMLCanvas(pathway, show_maps=True)

# Let's use some arbitrary colours for the orthologs
cs = ColorSpiral(a=2, b=0.2, v_init=0.85, v_final=0.5,
                 jitter=0.03)

# Loop over the orthologs in the pathway, and change the
# background colour
orthologs = [e for e in pathway.orthologs]
for o, c in zip(orthologs,
                cs.get_colors(len(orthologs))):
    for g in o.graphics:
        g.bgcolor = c

# Default settings are for the KGML elements only
kgml_map.draw('ex2_kgml_render.pdf')

# We need to use the image map, and turn off the KGML elements, to see
# only the .png base map. We could have set these values on canvas
# instantiation
kgml_map.import_imagemap = True
kgml_map.show_maps = False
kgml_map.show_orthologs = False
kgml_map.draw_relations = False
kgml_map.show_compounds = False
kgml_map.show_genes = False
kgml_map.draw('ex2_png_render.pdf')

# And rendering elements as an overlay
kgml_map.show_compounds = True
kgml_map.show_genes = True
kgml_map.show_orthologs = True
kgml_map.draw('ex2_overlay_render.pdf')
Figure: KGML-only rendering of ko03070
Figure: KEGG image map .png rendering of ko03070
And overlaying our data takes advantage of this image for context, but lets us add our own information:
Figure: Hybrid rendering of ko03070
Example 3
Now for something a little more complicated. Let's try to enhance the visibility of a set of pathways. The ko01100 pathway map should contain glycolysis and the TCA cycle, so we'll try to show the routes through these processes as thicker lines than usual:
import KGML_parser
from KGML_scrape import retrieve_KEGG_pathway
from KGML_vis import KGMLCanvas

# Get list of pathway elements to enhance
glyc_path = retrieve_KEGG_pathway('ko00010')
tca_path = retrieve_KEGG_pathway('ko00020')
enhance_list = []
for pathway in (glyc_path, tca_path):
    for e in pathway.entries.values():
        enhance_list.extend(e.name.split())
enhance_list = set(enhance_list)

# Get the pathway we want to render, and make all the lines
# that are also in glycolysis or TCA pathways thicker
met_pathway = retrieve_KEGG_pathway('ko01100')
mod_list = [e for e in met_pathway.entries.values() if
            len(set(e.name.split()).intersection(enhance_list))]
for e in mod_list:
    for g in e.graphics:
        g.width = 10
kgml_map = KGMLCanvas(met_pathway, show_maps=True)
kgml_map.draw('ex3_thick.pdf')

# Thin out any lines that aren't in the glycolysis/TCA pathways
mod_list = [e for e in met_pathway.entries.values() if
            not len(set(e.name.split()).intersection(enhance_list))
            and e.type != 'map']
for e in mod_list:
    for g in e.graphics:
        g.width = .4
kgml_map.draw('ex3_thin.pdf')

# Or turn them grey, maybe:
for e in mod_list:
    for g in e.graphics:
        g.fgcolor = '#CCCCCC'
kgml_map.draw('ex3_grey.pdf')
Figure: ko01100 with selected elements thickened
Figure: ko01100 with unselected elements thinned
Figure: ko01100 with unselected elements rendered to grey
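The selection logic in Example 3 relies on each entry's name being a space-separated list of identifiers, so an entry is "in" glycolysis or the TCA cycle if it shares at least one identifier with them. Here's a small standalone sketch of that set intersection, with no KEGG download needed. The Entry namedtuple, the shared_names() helper and the toy entry lists are stand-ins invented for the example, and ko:K99999 is a made-up identifier.

```python
from collections import namedtuple

# Hypothetical stand-in for a pathway entry: only name and type matter here
Entry = namedtuple("Entry", ["name", "type"])


def shared_names(entry, wanted):
    """Return the identifiers in entry.name that also appear in wanted."""
    return set(entry.name.split()) & wanted


# Toy 'pathways' (illustrative identifiers, not real pathway contents)
glycolysis = [Entry("ko:K00844 ko:K12407", "ortholog")]
tca = [Entry("ko:K01647", "ortholog")]
overview = [
    Entry("ko:K00844", "ortholog"),
    Entry("ko:K01647 ko:K05942", "ortholog"),
    Entry("ko:K99999", "ortholog"),   # made-up ID, in neither pathway
    Entry("path:ko00010", "map"),     # map entries are never thinned
]

# Collect every identifier used in the pathways we want to highlight
enhance = set()
for pathway in (glycolysis, tca):
    for e in pathway:
        enhance.update(e.name.split())

# Entries sharing at least one identifier get thickened; the rest,
# apart from map entries, get thinned or greyed out
thicken = [e for e in overview if shared_names(e, enhance)]
thin = [e for e in overview
        if not shared_names(e, enhance) and e.type != "map"]
```

Note that an entry with several identifiers is selected if any one of them matches, which is why the second overview entry ends up in the thickened set.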
Hi,
Thank you for this post. It does exactly what I am looking to do, but I note that it has now been deprecated and incorporated into Biopython. Is the usage of the Biopython KGML module identical to the above? Is there a tutorial or examples for using KGML within Biopython available anywhere?
Thanks,
Chris
Thanks for your kind comments, Chris - I'm so pleased you find it useful.
The main thing that has changed for Biopython is how you get data from KEGG - there's a much nicer interface that uses the REST API. The object model remains the same, though there are some colour/color spelling changes for consistency with the rest of Biopython.
There's a short iPython notebook introduction at https://nbviewer.jupyter.org/github/widdowquinn/notebooks/blob/master/Biopython_KGML_intro.ipynb - we really ought to do more documentation, but if you fancy the challenge, any contributions in that area are very much appreciated ;)
L.