KEGG - the Kyoto Encyclopaedia of Genes and Genomes - has been around for almost two decades now. A prescient and visionary repository of metabolic pathways and other biochemical data, it has been a go-to resource for me for nearly 15 years. Unfortunately, and symptomatic of much of academia, after 16 years of free, open access, FTP access was transferred to a subscription model in mid-2011 (in case you're interested, the yearly subscription for a single user is just shy of my allocated yearly consumables allowance; I don't have a subscription). However, KEGG still presents a wide range of useful web services (intentionally limited, to avoid abuse) that give access to the data in various ways, including running functional annotation via KAAS, and generating pathway maps that represent the combined presence of enzymes corresponding to a reaction - a kind of pan-metabolome - across a range of organisms.
|KEGG map 01100 (metabolic pathways) for Dickeya, produced by KEGG.
So, there are a couple of issues (not including my needy spoilt-child demands). First, the KEGG web interface is never going to do exactly what I want, nor is it going to do so for every KEGG map at once whenever I run another comparative genomic analysis - it doesn't exist to cater for my whims. Second, if I want to generate these images locally, the raw data from the FTP site is $2000 away. Ideally, what I want is a local (maybe programmatic) package that can grab KEGG maps and the associated data on-the-fly, and render publication-quality images with arbitrary data overlays representing the many forms and sources of data I haven't even thought of, yet. That's not a lot to ask, surely…
There are lots of good KEGG visualisation tools out there, including Cytoscape's KGML plug-in kgmlreader, KGML-Ed, and KEGGtranslator (but not limited to these: for example, there is the gorgeous but, in terms of analysing your own data, rather limited iPath; the webservice MicroarrayDB, which is nearly there, but not as flexible as I'd like; and KEGG-anim which, again, is lovely but not what I'm after). They're all good at what they do, if not entirely consistent with each other, or with KEGG. kgmlreader in particular is nice for the ability to mouseover detailed information about a pathway, and for being able to exploit the power of Cytoscape to edit and render beautiful images. But they're not quite what I'm after. For example, kgmlreader's representation of the large-scale metabolic pathways isn't as aesthetically pleasing as I'd like (yeah, I know, "get over it"): you have to zoom in quite far, losing context, before you even see any labels; more importantly, all connections are straight lines, so you lose all that lovely layout information that's in the original image.
|KEGG map ko01110, rendered in Cytoscape with kgmlreader plug-in
|KEGG map ko00020, rendered with KEGGtranslator
|KEGG map ko00020, rendered with KGML-Ed
Cytoscape/kgmlreader renders ko00020 as:
|KEGG map ko00020, rendered in Cytoscape with kgmlreader plug-in
|KEGG layout of ko00020
That shouldn't be a barrier to reproduction (architectural/mapping software has been translating hand-drawings into transferable data for years), but KEGG provides the pathway information as the KGML dialect of XML (spec here) and, for these pathways, doesn't provide sufficient information to reconstruct their pretty layout. For ko00020, the graphics elements - which carry rendering information such as location, shape and colour - are restricted to entry elements of type ortholog, gene, compound and map: these are the circles, rectangles, and rounded rectangles in each figure. The connections between those entries must be inferred from the reaction and relation elements in the KGML file, which don't contain any graphical information. This leads, sensibly, to the 'every connection is a straight line' philosophy in these three rendering packages. However, the elegant manual solutions that the KEGG team found to the problem of laying out complex networks are not available to these packages, so there are multiple line-crossings, and much potential for confusion.
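To make that concrete, here is a minimal Python sketch of why a renderer ends up with straight lines. The KGML-style fragment is hand-written for illustration: the identifiers and coordinates are placeholders, not copied from the real ko00020 file, and the function names are my own. It also assumes the x and y attributes of a graphics element give the centre of the shape.

```python
import xml.etree.ElementTree as ET

# Hand-written KGML-style fragment: identifiers and coordinates are
# placeholders for illustration, not taken from the real ko00020 file.
KGML_SAMPLE = """
<pathway name="path:ko00020" org="ko" number="00020">
  <entry id="1" name="ko:K00001" type="ortholog">
    <graphics name="K00001" type="rectangle" x="100" y="200" width="46" height="17"/>
  </entry>
  <entry id="2" name="ko:K00002" type="ortholog">
    <graphics name="K00002" type="rectangle" x="300" y="200" width="46" height="17"/>
  </entry>
  <entry id="3" name="cpd:C00074" type="compound">
    <graphics name="C00074" type="circle" x="200" y="200" width="8" height="8"/>
  </entry>
  <entry id="4" name="path:ko00010" type="map">
    <graphics name="Glycolysis" type="roundrectangle" x="300" y="60" width="120" height="34"/>
  </entry>
  <relation entry1="1" entry2="2" type="ECrel">
    <subtype name="compound" value="3"/>
  </relation>
  <relation entry1="2" entry2="4" type="maplink">
    <subtype name="compound" value="3"/>
  </relation>
</pathway>
""".strip()


def entry_centres(root):
    """Map each entry id to the (x, y) position of its graphics element."""
    centres = {}
    for entry in root.findall("entry"):
        graphics = entry.find("graphics")
        if graphics is not None and graphics.get("x") and graphics.get("y"):
            centres[entry.get("id")] = (float(graphics.get("x")),
                                        float(graphics.get("y")))
    return centres


def straight_line_edges(root):
    """Infer one straight segment per relation from its two entries.

    Relations carry no coordinates of their own, so the only positions
    available are those of the entries they connect.
    """
    centres = entry_centres(root)
    edges = []
    for relation in root.findall("relation"):
        start, end = relation.get("entry1"), relation.get("entry2")
        if start in centres and end in centres:
            edges.append((relation.get("type"), centres[start], centres[end]))
    return edges


root = ET.fromstring(KGML_SAMPLE)
edges = straight_line_edges(root)
for relation_type, start, end in edges:
    print(relation_type, start, end)
```

The only geometry on offer is the pair of entry positions, so any curve or routing around other nodes has to come from somewhere other than the KGML file.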
Also, having been through this process myself now, I can see issues with the renderings. For example, Cytoscape/kgmlreader and KGML-Ed have correctly rendered the distinction between maplink (dotted line) and ECrel (continuous line) relation elements, where KEGGtranslator has not. Something has also gone wrong with C00074 (phosphoenolpyruvate), and its connection to glycolysis, and with C00068 (ThPP): both (amongst other elements) have disappeared in the KEGGtranslator rendering.
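The maplink/ECrel distinction is cheap to preserve once the relation type is carried through to the drawing step. As a rough sketch (the function names, the dash pattern, and the choice to draw every non-maplink relation type as a continuous line are all my own assumptions, not anything these packages or KEGG prescribe), straight-line edges could be written out as SVG like this:

```python
def svg_line(relation_type, start, end):
    """Return an SVG <line> for one relation; maplink is drawn dotted.

    Assumption: every other relation type (ECrel and the rest) is drawn
    as a continuous line.
    """
    dash = ' stroke-dasharray="4,4"' if relation_type == "maplink" else ""
    (x1, y1), (x2, y2) = start, end
    return (f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="black"{dash}/>')


def svg_document(edges, width=400, height=300):
    """Wrap the edge lines in a minimal standalone SVG document."""
    body = "\n".join("  " + svg_line(*edge) for edge in edges)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n{body}\n</svg>')


# Illustrative edges as (relation type, start point, end point);
# the coordinates are invented.
example_edges = [
    ("ECrel", (100, 200), (300, 200)),
    ("maplink", (300, 200), (300, 60)),
]
svg = svg_document(example_edges)
print(svg)
```

Saving the printed output to a .svg file and opening it in a browser should show one continuous and one dotted line.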
Now, KEGG obviously uses software to render these maps. This is, as far as I can tell, KegSketch - described at various points on the web as 'in-house software'. I infer from references to it that each map is manually drawn. I have, so far, been unable to obtain either a copy of the software or the co-ordinates for the connecting lines - though I've not emailed and asked directly, yet.
The point about retaining the very elegant KEGG representations, while having some programmatic or other editorial control over data presentation, is made well when considering any of the pathway maps that contain figurative drawings, such as ko02040: flagellar assembly.
|KEGG layout of ko02040
|KEGGtranslator layout of ko02040
|Cytoscape/kgmlreader layout of ko02040
|KGML-Ed layout of ko02040
|Prototype Python KGML library output