BED files into R

Sometimes BED files drive me nuts. They’re great for their simplicity in specifying a genomic feature, but can be difficult to work with across programs because only the first three columns are required (chromosome, start, end) and there are several ways to specify any number of remaining columns. Well, maybe not any number, but looking at the following names, tell me if clarity jumps from the page: BED, BED6, BED8, BED12, BED6+4, BED15, any others? Even the spec at UCSC doesn’t explain the conventions behinds those names.

Working with BED files in R is typically fairly easy. The rtracklayer library contains import/export functions for reading or creating bed files. For instance, reading a bed file into a GRanges object is as simple as:

library(rtracklayer)
foo <- import("myPeaks.bed")

However, when those BED files are produced by programs like MACS2 that muck around with some of the fields, then things break, and you have to start figuring out all those unwritten rules about BED files. Typical UCSC BED format has 12 columns. MACS2 produces a narrowPeak file described as BED6+4. The R help for the import() function states the problem explicitly, and offers a solution: …many tools and organizations have extended BED with additional columns. These are not officially valid BED files…To import one of the multitude of BEDX+Y formats, such as those used to distribute ENCODE data through UCSC (narrowPeaks, etc), specify the ‘extraCols’ argument to indicate the expected names and classes of the special columns.

Now the problem is simply that we don’t know which of the columns are standard and which are “special”, and none of the columns of a BED file are “named”, leaving this bit of help a little confusing. But we can make some good guesses, and it turns out they work.

The first 6 fields of the MACS2 narrowPeak file have the properties expected by import() and conform mostly to the UCSC standard (the 5th column is supposed to be an integer between 0 and 1000, but for MACS2 it is simply the -log10 transformed peak Q-Value multiplied by 10 and converted to an integer). But the remaining 4 fields DO NOT match what import() expects, and thus have to be explicitly specified. Thus to get our MACS2 results into R we can do something like the following:

# declare names and types for the extra BED columns
extraCols_narrowPeak <- c(FoldChange="numeric", pVal="numeric",
qVal="numeric", summit="integer")

# import our peak data
peaks <- import.bed("my_peaks.narrowPeak", extraCols=extraCols_narrowPeak)

The result is a GRanges object called “peaks” with meta columns named based on what you gave to extraCols.

biohacking 2008

HOPE is a unique experience. Not a colorful crowd, literally. Cargo shorts and black shirts, as if everyone knows the uniform. But while you’re there, you feel a sense of connection with those around you unlike any other conference I’ve attended. The commonality goes beyond interest in a single subject matter and has to do with plain raw curiosity and freedom. Including the freedom to just be yourself without having to worry so much about social graces or comfort. Showers and sleep are completely dispensable. The talks go from 9 AM to midnight. I’ve sat down at 9 AM and not moved until 6 PM, to slip outside and grab a slice of pizza and a cup of coffee, only to get back ASAP and sit in talks until 12, before ending up at Harry’s or Sally’s down the street with various members of the 2600 crowd, and then wandering back to the hotel mezzanine to ride a Segway or pick some locks in the lock picking village. People wander around at all hours. The sidewalk in front of the hotel penn is busy around the clock. It’s surreal to step outside at 3 AM and hang with a crowd of like-minded people, all taking a break, while people are shuffling along the sidewalk and the street vendors sell sticks of grilled chicken.
This particular conference was jam packed with adventure and odd convergences for me. Adam Savage was one of the keynote speakers. I used to work down the street from the Alameda air base where so many episodes of Mythbusters were filmed, and having left the bay area just as the show was getting started, watching it always made me feel like I was home from afar. Kevin Mitnick was also a featured speaker. I grew up reading about him, read several books in which he is one of the main characters, followed his travails along with the 2600 crew while he was in in jail , and because of that, decided to omit making a connection during my talk between the discoverer of a cool protein often used in biohacking and the person who led to his capture. I didn’t want to take the glow off the protein as I didn’t think the crowd would be too enamored. I was interviewed for a documentary on hacking. It will be interesting to see if it gets off the ground. I had a lot of great conversations with people, including Bernie S,  whose voices I had only heard on the radio while I worked in the lab at Berkeley. It’s an odd thing when you find yourself face to face with people you feel some familiarity with, because you hear their voices or follow their lives or work, yet you only see them in a bar for a couple of nights every two years.
On the last night after the closing ceremonies, a large portion of the crowd sticks around to stack chairs, disassemble the large lighting structures, wrap and package up all the audio visual gear. Cord wrangling can be quite a skill. Kudos to those who have a hex wrench built into their keychains. After everything was cleaned up, I found myself wandering through the streets with Bunnie and the rest of the crew looking for a place to eat. We walked West and settled at a Chinese place. Emanuel sat across from me. As the waitress was taking drink orders, most everyone ordered an “OB”. I was in the mood for a beer, but I didn’t want to break the cycle, and was intrigued at what a Jedi sounding drink might be. I was happy and somewhat giddy when the waitress placed an Oriental Beer in front of me. Years of listening to him on the radio and reading his editorials might have led me to believe that Emmanuel speaks eloquently all the time about interesting issues on a range of topics. But after days of sleep deprivation, and only a few days of chance encounters in bars or the HOPE mezzanine, there was more silence between us than I had hoped for. Eventually more and more people started showing up, and introductions were made, but I was among people who knew each other much more than anyone knew me. If nothing else, at least the turquoise blue badge chord gave me speaker cred.
My talk was on Saturday afternoon in the small room with the portable wall separating it from the large room. I think I was competing with Rambom. He gives the same talk every time. Three hours of all the ways he can find information about anyone because of the way people voluntarily give up information on themselves. My talk was really well received. I actually ran out of time and had to skip several slides at the end. Nonetheless, I was hoping I could do a sufficient job explaining that biology is a lot like executing programs encoded in a sequence of information, and via synthetic biology we can make new programs.
For now, at least, I know I can look forward to the next hope.

Just some writing

For the last 10 years this site has been a workhorse of science for me. Hosting genomic databases, scripts for helping to process bioinformatic data, hosting my pdf library in which I could share my collection of papers on any given topic simply by sending someone a URL.

But I decided to move away from my own (poorly?) written software for creating dynamic content and install wordpress. It’s functional, easy, and there are people working on it! Hopefully I’ll find time every now and then to fill it with something.