Armchair Biology
A science blog, by a scientist, mostly about computational biology and plant science, but a little rambly in places.
By Armchair Biologist (http://www.blogger.com/profile/16528412260692313555)
Feed updated 2024-03-14

What's The Points?
Published 2014-07-27; updated 2015-03-20
In which I talk about points for wins, losses, and draws in rugby league, and how framing of quantitative data is important.
This week saw the announcement of major changes to the way that the top levels of rugby league in the UK would operate. The sport will be moving from two, essentially separate divisions (Super League and Championship) in 2014 to have a partially combined competition in

The Baserate Fallacy, revisited
Published 2014-07-11; updated 2014-07-11
In which I share some iPython code for an interactive demo to calculate and visualise the probability that a positive test implies a positive result. You can get it here, and preview it here. Then I get entangled in a topic outwith my expertise.
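The calculation behind that summary can be sketched in a few lines. This is not the interactive demo from the post (the feed doesn't include it), just a minimal illustration of Bayes' theorem; the function name, parameter names, and the example numbers are all my own assumptions.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    All three arguments are probabilities in [0, 1]. The names are
    illustrative and are not taken from the original post.
    """
    for value in (base_rate, sensitivity, false_positive_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    true_positives = sensitivity * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    total_positives = true_positives + false_positives
    if total_positives == 0.0:
        raise ValueError("the test never returns a positive result")
    return true_positives / total_positives


# Made-up numbers: 1% of items truly positive, a test that catches 95%
# of them, and a 5% false positive rate.
ppv = positive_predictive_value(
    base_rate=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(f"P(condition | positive test) = {ppv:.3f}")
```

With a low base rate, most positive calls are false positives even for a fairly accurate test, which is the fallacy in a nutshell.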
I was very pleased to be asked to teach on the recent EMBL Plant and Pathogen Genomics training course - a four-day introduction to bioinformatics for (

ANI are you okay? Are you okay ANI?
Published 2013-11-10; updated 2013-11-10
In which I describe Average Nucleotide Identity (ANI), which we can use to pigeonhole bacteria into conceptual boxes labelled as 'species'. And I share more code.
Defining bacterial species
Bacterial species status (or any species classification) can be a tricky thing to pin down. The kind of definition that is easy to use and intuitively familiar to most people for animals and plants (

A Nice New Paradox Redux: Paredux
Published 2013-03-10; updated 2013-03-10
In which I admit to a mistake.
It looks like I was a little too smug about my own pedantry in my last post about the Tuesday Boy paradox: I wasn't pedantic enough. I fell prey to not thinking about the question clearly or deeply enough, and so I got the answer wrong (though see caveat below) - my apologies if I misled you - it was a genuine mistake, and thanks to JeffJo for pointing out my

A Nice New Paradox
Published 2013-02-13; updated 2013-03-10
In which I work through a popular statistical puzzle/paradox (with potential implications for interpretation of large data studies). With example code.
I like mathematical puzzles, but I don't like mathematical puzzles. Especially paradoxes. The problem I have is that I tend intuitively to think of the wrong answer - just like pretty much everyone else. This reminds me to be cautious whenever I

Surely this has been done already...
Published 2013-02-08; updated 2013-02-10
In which it turns out that if it was done already, it was hiding somewhere. Also, I share a script that retrieves the corresponding nucleotide coding sequences from NCBI, given only a set of protein sequences.
I often have very specific little problems that seem, at least initially, like they're so simple that they should already have been solved and wrapped up in a nice library with

KEGGWatch, part III
Published 2013-02-01; updated 2013-02-01
In which I finally get around to sharing some code, and give some examples of downloading and modifying KEGG pathway maps.
As you may have seen in parts I and II, I've been thinking about writing some Python code to grab, parse, modify, and visualise KGML files.
tl;dr - I wrote it, and it's up here (https://github.com/widdowquinn/KGML) in rough form. Have fun, and let me know if you've got

KEGGWatch, part II
Published 2013-02-01; updated 2013-02-01
In which I don't quite get around to writing a KGML parser and visualisation module (all very Tristram Shandy, this!), with a view to submitting to Biopython. This post describes some of the rationale and design choices - tune into part III for code and examples of use.
So, if you've followed on from part I, you'll know that I've been looking at packages to visualise KEGG pathway maps. What may

KEGGWatch, part I
Published 2013-01-21; updated 2013-01-23
In which I attempt to visualise metabolic maps for comparative genomics, and lead up to making a contribution to Biopython.
KEGG - the Kyoto Encyclopaedia of Genes and Genomes - has been around for almost two decades, now. A prescient and visionary repository of metabolic pathways and other biochemical data, it has been a go-to resource for me for nearly 15 years.

The Colours, Man! The Colours!
Published 2012-09-23; updated 2013-08-22
In which I take a short diversion into colour theory, and share some code to automate colour selection for class data.
Being a computational biologist, I often have to deal with large datasets and, in order to get rapid insight into the structure of that data, visualisation is a key tool. What I'm usually trying to achieve is to turn complex data into a static image that is easier for my mind to

23 And Me and Me: Part 3
Published 2012-07-25; updated 2012-09-15
In which I get a nice surprise.
23andMe advertise that it takes 2-3 weeks for results to become available, after sending your saliva sample. I must have caught them in a slow period, because my results have started to come through only a week after they received my spit tube.
Clearly, some analyses take more compute or result preparation time than others, as (most of) my ancestry results have

On Reciprocal Best BLAST Hits
Published 2012-07-19; updated 2013-08-22
In which I narrowly avoid a rant. Reciprocal best BLAST hits can improve the quality of your searching, and are a good way to find candidate orthologues. There's evidence and everything.
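The reciprocal step itself is simple to sketch. This assumes the BLAST searches have already been run in both directions and boiled down to one best hit per query; the dictionaries, the sequence identifiers, and the function name below are hypothetical, and this is not code from the post.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where each sequence is the other's best hit.

    best_a_to_b maps each query in genome A to its best hit in genome B;
    best_b_to_a is the same for the reverse search. Both are assumed to
    have been extracted from BLAST output beforehand.
    """
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Hypothetical best-hit tables for two small genomes.
a_vs_b = {"A1": "B1", "A2": "B2", "A3": "B2"}
b_vs_a = {"B1": "A1", "B2": "A3", "B3": "A2"}

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)  # [('A1', 'B1'), ('A3', 'B2')]
```

Here A2's best hit is B2, but B2's best hit is A3, so the A2/B2 pair is dropped; only pairs that agree in both directions survive as candidate orthologues.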
I assume we're all interested in similar things.
When we've got a genome with lots of coding sequences in it, we're curious. We want to find out what the most similar known sequences to our nice new coding

23 And Me And Me: Part 2
Published 2012-07-18; updated 2012-09-15
In which the most difficult part of getting yourself genotyped turns out to be dealing with shopkeepers, and I impress the eight-year-old me.
Following on from part 1...
The spit-tube arrived while I was on holiday, couriered by DHL, and was kindly held for me by a neighbour. The package was expectation-deflatingly small (now where have I heard that, before?) and contained a brightly-coloured

23 And Me And Me: Part 1
Published 2012-07-07; updated 2013-08-22
In which I cave in to curiosity, and get myself genotyped.
A little knowledge is a dangerous thing, so I must be positively treacherous to myself as far as genomics is concerned. I've spent much of the last 16 years or so looking at gene and gene product sequences in exchange for a salary. Having participated in sequencing the odd organism or thirty, and with a fairly sound background in

Dead fish, and multiple-test correction
Published 2012-07-01; updated 2012-07-01
In which a salmon is resurrected, but not enough to really be significant. Why finding 20 positive results when your P-value threshold suggests you should only see 10 isn't necessarily anything to be excited about. And an introduction to Bonferroni and Benjamini-Hochberg multiple test correction.
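For the curious, both corrections fit in a short sketch. This is a plain-Python illustration of the standard procedures, not code from the post; the function names and the example P-values are made up for the purpose.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject test i when p_i <= alpha / m, where m is the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure, controlling the false discovery rate.

    Sort the P-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the k smallest P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Made-up P-values from ten hypothetical tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print("uncorrected:", sum(p <= 0.05 for p in pvals))
print("Bonferroni:", sum(bonferroni(pvals)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(pvals)))
```

Bonferroni controls the chance of any false positive at all, so it is the stricter of the two; Benjamini-Hochberg instead limits the expected proportion of false positives among the results you call significant.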
This fish isn't dead. It's just resting.
A while ago, some mischievous

The Base Rate Fallacy in Effector-Finding
Published 2012-06-23; updated 2012-06-23
In which an oft-overlooked bit of genome-mining statistics is considered, and your enjoyment of a holiday could depend heavily on other people's hygiene.
Last week I had the pleasure of giving a presentation about mining pathogen genomes for effector proteins, and helping to train a group of young plant pathology researchers from around the world, as part of an EMBO training course (

What is this 'effector' thing, anyway?
Published 2012-04-22; updated 2012-04-22
In which I opine about the definition of (plant) pathogen effectors.
I took part in a Twitter discussion between plant pathologists recently, attempting to reach some kind of agreement on what an 'effector' is. It's hard to make some points coherently in a 140-character limit, but in the end it seemed that everyone went away thinking that an understanding had been reached. There were some

Not a facelift, just a bit of Botox
Published 2012-04-22; updated 2012-04-22
In which I decide that even I can't face reading blogposts in the old template style, so make a change.
Is this better? I hope so.

Jambiguity
Published 2012-03-16; updated 2012-03-16
In which a blog post is brought to you by the letter 'J', but should probably have been 'X'-rated.
I've been spending much of the last year and a bit sequencing and annotating around 25 bacterial genomes. I may write more about that, if I can keep up this cripplingly ferocious rate of blog posting. Recently, I hit a problem when annotating some of the conceptual translations from

A Pile of Sheets
Published 2011-09-08; updated 2011-09-08
In which a niggle with Excel files is easily crushed into nothingness. If you have a Mac.
Earlier today, I spotted this tweet from Neil Saunders (@neilfws), bemoaning the state in which we often receive data from our experimental colleagues:
Dear every biologist. The colours in your spreadsheet don't show up when I export it to CSV. Thanks.
Very true, and well said, sir!
I appreciate

New plots for old kinetics
Published 2011-01-16; updated 2011-01-17
In which basic enzyme kinetics are revisited, systematically incorrect lab statistics are bemoaned, and a little-known elegant 40 year old solution to estimating enzyme kinetic parameters is explored.
Michaelis-Menten enzyme kinetics. The backbone of everyone's first approximation to real kinetics. Basic biochemistry. Everyone knows everything there is to know about

Begin at the beginning
Published 2011-01-10; updated 2011-01-16
In which a blog is begun, and an armchair biologist wonders whether, in the infinite void of the internet, anyone can be bothered to hear you scream.
So hello, and welcome. I'll be your armchair biologist for this blog, in which I will share my opinions on science news and other, hopefully interesting, things. You should expect a focus on computational biology and plant science.
I