# Russell's Blog

## Arduino in the lab, example 9,345,234

Posted by Russell on June 10, 2013 at 6:23 p.m.
Last week, my friend Emily at Pivot Bio was debugging some crazy protocol she's working on, and wondering whether her incubators were maintaining the correct temperature. So, we decided to find out.

For my own project, I happen to have a bunch of Arduino hardware. So, I threw together a quick solution based on Adafruit's thermistor tutorial (I have a bunch of thermistors sitting around). Here is Emily's setup, complete with masking tape junctions and protoboarded electronics:

A few hours later, and she's got a pretty good answer to one of science's oldest and most important questions: "Is this thing even working? WTF, man?" Here's the data, collected at 1 Hz with a one-minute moving average:

I'm not sure what those wobbles around hour two are from. Hopefully someone just opened the door or something. Arduinos really are the Swiss Army Knife of the laboratory!
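Incidentally, the smoothing step is easy to reproduce. Here is a minimal sketch of a trailing moving average over a 1 Hz sample stream (a hypothetical post-processing function; the actual logging and plotting code isn't shown in this post):

```python
def moving_average(samples, window=60):
    """Trailing moving average: with 1 Hz samples, window=60 gives
    a one-minute average like the one used for the temperature plot."""
    averaged = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1) : i + 1]  # last `window` readings
        averaged.append(sum(chunk) / len(chunk))
    return averaged
```

Early values average over however many readings have arrived until the window fills; after that, each point is the mean of the previous sixty seconds.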

## Why doesn't your lab have a 3D printer yet?

Posted by Russell on November 20, 2012 at 6:30 a.m.
Electrophoresis setups are like Tupperware. You can never find the right lid when you need it, and someone always seems to be borrowing the doohickey you need.

Here in the Eisen Lab, it turns out we've been using Marc Facciotti's electrophoresis stuff for years. He keeps his stuff organized, and, well... that's not been our strong suit lately. John, our lab manager, has been gently but inexorably herding us towards a semblance of respectability in our lab behavior. As part of this, he decided that it was time for us to get our electrophoresis stuff straightened out. So, he ordered a bunch of nice gel combs from one of our suppliers. They cost $51 each (see the "12 tooth double-sided comb", catalog number 669-B2-12, for the exact one pictured below). We bought six of them with different sizes and spacing, for a total exceeding $300.

While I appreciate that companies need to make money, this is a ridiculous price for a lousy little scrap of plastic. $300 for a couple of gel combs is cartel pricing, not market pricing. Fortunately, we happen to have a very nice 3D printer. It is very good at making little scraps of plastic. So, I busted out the calipers and tossed together some models of gel combs in OpenSCAD. A few minutes of printing later, and the $51 gel combs are heading back to the store.

Here's the code for the six-well 1.5 mm by 9 mm comb:

```
f = 0.01;  // small offset so cuts cleanly overlap the surfaces they trim

difference() {
    difference() {
        union() {
            cube( [ 80, 27, 3 ] );                                     // main plate
            translate( [ 5.25, 14.3, f ] ) cube( [ 68, 9.3, 7.25 ] );  // thick spine for rigidity
        }
        // slots cut along the bottom edge separate the six teeth
        for ( i = [ 0:5 ] ) {
            translate( [ 17.1 + i * 11.0, -f, -f ] ) cube( [ 1.75, 12, 5 ] );
        }
    }
    // square off both ends of the tooth region, then thin the teeth
    union() {
        translate( [ -f,   -f, -f  ] ) cube( [ 7,  12, 7 ] );
        translate( [ 73+f, -f, -f  ] ) cube( [ 7,  12, 7 ] );
        translate( [ 0,    -f, 1.6 ] ) cube( [ 80, 12, 8 ] );  // removes everything above z = 1.6 in the tooth region
    }
}
```

Pretty easy to grasp, even if you've never seen SCAD before.

So, how much did this cost?

I ordered this plastic from ProtoParadigm at $42 for a kilogram. That's about four pennies a gram. Each of these gel combs costs about 21 cents to print. That's 1/243rd the price. The 3D printer cost €1,194.00 ($1524.62), which is less than the laptop I use for most of my work. The savings on just these gel combs have recouped 18% of the cost of the printer.
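The arithmetic is easy to check (a quick sanity check using the numbers above; the roughly 5 g of plastic per comb is a figure implied by the quoted costs, not something I weighed):

```python
price_per_kg = 42.00                    # ProtoParadigm filament, $/kg
price_per_gram = price_per_kg / 1000.0  # "about four pennies a gram"
comb_cost = 0.21                        # plastic per printed comb, $
catalog_price = 51.00                   # commercial comb, $

print(round(catalog_price / comb_cost))  # -> 243, i.e. 1/243rd the price
print(comb_cost / price_per_gram)        # -> about 5 g of plastic per comb
```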

It's also important that I was able to make some minor improvements to the design. The printed combs fit into the gel mold a bit better than the "official" ones. I also made separate combs for the 1.0mm and 1.5mm versions, and the labels are easier to read. If I wanted, tiny tweaks to my SCAD file would let me make all sorts of fun combinations of thicknesses and widths that aren't available from the manufacturer. So, these gel combs are not only 1/243rd the price, but they are also better.

If you read the media hype about 3D printing, you will undoubtedly encounter a lot of fantastical-sounding speculation about how consumers will someday be able to print living goldfish, or computers, or bicycles. Maybe so. Maybe not. However, right now, you can print basic lab supplies and save a pile of money.

Buy your lab manager a little FDM printer and hook them up with some basic CAD training. Yes, the printer will probably mostly get used to make bottle openers and Tardis cookie cutters. So what? Your paper-printer, if you will excuse the retronym, mostly gets used for non-essential stuff too. I'd wager that for every important document printed in your lab, a hundred sheets have gone to Far Side cartoons and humorous notices taped up in the bathroom. It's a negligible expense compared to the benefits of having a machine that spits out documents when you really need them, and the social value of those Far Side cartoons probably sums to a net positive anyway.

Conclusion: If you have a lab, and you don't have a 3D printer, you are wasting your money. Seriously.

In the time it took to write this post, I printed $150 worth of gel combs, and it cost less than a cup of coffee.

Updates: Here is the tweet I originally posted about this article, before the URL for it vanishes into Twitter's memory hole. Here's an encouraging post from the Genome Web blog, and a nice article by Tim Dean at Australian Life Scientist. My article here seems to have spawned a thread on BioStar. Also, it made Ed Yong's Missing Links for November 24 over at Discover, and Megan Treacy did a really spiffy article over at Treehugger.

Many people have asked, and so I decided to see how well these kinds of 3D printed parts do in the autoclave. I tried it out with a couple of bad prints, and they seemed to hold up just fine after one or two cycles. Very thin parts did warp a bit, though, so I recommend printing parts you plan to autoclave nice and solid. Here is a before and after of a single-wall part (less than half a millimeter thick). I was expecting a puddle.

Update 2: Check out Lauren Wolf's awesome article in Chemical & Engineering News, featuring the infamous gel combs, among other things!

## Journals are the problem and the solution

Posted by Russell on November 15, 2012 at 8:46 a.m.
Titus wrote an interesting post yesterday about addressing some of what I'll call structural problems in scientific research:

> This is one of a bunch of posts on what I'm calling 'w4s' -- using the Web, and principles of the Web, to improve science. The others are:
>
> - The awesomeness we're experiencing, which provides some examples of current awesomeness in this area.
> - The challenges ahead, which covers some of the reasons why academia isn't moving very fast in this area.
> - Strategizing for the future, which talks about technical strategies and approaches for helping change things.
> - Tech wanted!, which gives some specific enabling technologies that I think are fairly easy to implement.
He goes on to throw some well-aimed brickbats at the system of publishing and grant reviewing, and how that plays out for researchers who actually do things other than crank out traditional research papers:

> As an increasing amount of effort is put towards generating data sets and correlating across data sets, funding agencies are certainly trying to figure out how to reward such effort. The NSF is now explicitly allowing software and databases in the personnel BioSketches, for example, which is a great advance. Surely this is driving change?
>
> The obstacle, unfortunately, may be the peer reviewer system. Most grants and papers are peer reviewed, and "peers" in this case include lots of professors that venerate PDFs and two-significant-digit Impact Factors. Moreover, many reviewers value theory over practice -- Fernando Perez has repeatedly ranted to me about his experience on serving on review panels for capacity-building cyberinfrastructure grants where most of the reviewers pay no attention whatsoever to the plans for software or data release, and even poo-poo those that have explicit plans. And if a grant gets trashed by the reviewers, it's very hard for the program manager to override that. The same thing occurs with software, where openness and replicability don't figure into the review much.
>
> So there's a big problem in getting grants and papers if you're distracting yourself by trying to be useful in addition to addressing novelty, impact, etc. The career implications are that if you're stupid enough to make useful software and spend your time releasing useful data rather than writing papers, you can expect to be sidelined academically -- either because you won't get job offers, or because you won't get grants when you do have a job.

I propose a simple solution: Start better journals.

The fundamental unit of exchange in the academic reputation economy is the publication. At the moment, it is very hard to publish things that are not "normal" research papers.
PLOS ONE has helped by shifting the review criteria towards correctness and leaving judgement of novelty to the community. However, PLOS ONE is still an awkward place to publish a piece of software, for example.

Tool builders are expected to produce three publications for every one research result. First they have to design, assemble, test and publish the tool. Then they have to write a paper about the tool. Then they have to document the tool. After that, they are expected to release updates, patches and improvements. The only one of these items that "counts" in the academic sense is the one that generates a DOI number -- the paper. This is usually the least valuable of all the things produced. Honestly, how many of you BLAST users have actually read the BLAST paper?

What is needed are journals that let you publish things that aren't strictly papers. It seems perfectly reasonable that a panel of peers should be able to review software on my GitHub account and bestow a DOI number on a tag they feel meets the criteria set by the editorial board that badgered them into participating in the review. It seems perfectly reasonable that when I cut a major update of the software, I might submit it for review. This would be extremely wonderful, as it would lead to reviews and critiques of the code itself, which almost never happens when one submits an application note.

The same should apply to tool documentation. Science suffers a great deal because documentation is missing, poor, or out of date. The only way to fix that is to let tool builders have their tool documentation reviewed and published. Perhaps some journals might review code only, others documentation only, and others code and documentation together. The same goes for updates. If you want academics to do something, you have to provide rewards in the currency of academia.

It is also important not to overlook things beyond software. What about people building methods and protocols?
There are already journals that publish methods papers, but a lot more could be done to make these publications more useful. JoVE, the Journal of Visualized Experiments, is a pretty interesting step in that direction, but much, much more needs to be done. Most scientists have no experience whatsoever in video production, narration, editing, or any of the other skills needed to make a really good JoVE publication.

There is also another big gap in instrumentation and hardware. Every really productive laboratory has at least one person who builds things. For example, at the UC Davis Genome Center, perhaps a third of all the science we publish involves some gizmo built by our in-house machinist, Steve Lucero. Steve is definitely what I would regard as a practicing researcher, and has been (ahem) instrumental in a number of important publications. Nevertheless, he was invited to be an author on his first publication only this year. By a graduate student. I don't think this is particularly malicious on the part of the laboratories that use his creations in their work. The problem is that there isn't really a good venue for publishing these things.

Booting up new kinds of journals is a long-term endeavor. PLOS ONE is still struggling for acceptability in many circles. Nevertheless, getting researchers to actually look at each other's code would yield rewards in the short term while we wait for the resulting DOIs to be appreciated by the broader scientific community.

## Trouble with SoftSerial on the Arduino Leonardo

Posted by Russell on May 25, 2012 at 12:05 p.m.
While I was wandering around at Maker Faire last weekend, I heard someone say, "Woah, is this the Leonardo?" And lo, there was a handful of Arduino Leonardo boards lined up on a shelf for sale. I instantly grabbed one, and bundled it home to play with it. The Leonardo is Arduino's latest board, announced last September. It uses the ATmega32U4 chip, which has onboard USB.
This has two important implications: first, the Leonardo costs less than the Uno, and second, it will be able to operate in any USB mode. That means people can make Human Interface Devices (HID), like mice and keyboards and printers, with Arduino, and present themselves to the host using the standard USB interfaces for those devices. That means you can build things that don't need to talk via serial, and instead use the host's built-in drivers for mice and printers and whatnot. This is a big step forward for Open Hardware.

Anyway, I'm developing a little remote environmental data logger to use for part of my dissertation project, and I thought I'd see if I could use the Leonardo board in my design. I'm using the Arduino board to talk to an Atlas Scientific pH stamp, which communicates by serial. It works fine on the Uno with SoftwareSerial (formerly known as NewSoftSerial, until it was beamed up into the Arduino Core mothership). Unfortunately, it didn't go so well on the Leo. The board can send commands to the pH stamp, but doesn't receive anything. I swapped in an FTDI for the pH stamp, and confirmed that the Leonardo is indeed sending data, but it didn't seem to be able to receive any characters I sent back. I tried moving the rx line to each of the digital pins, and had no luck. Here is my test program:

```
#include <SoftwareSerial.h>

#define rxPin 2
#define txPin 3

SoftwareSerial mySerial( rxPin, txPin );

byte i;
byte startup = 0;

void setup() {
    mySerial.begin( 38400 );
    Serial.begin( 9600 );
}

void loop() {
    if( startup == 0 ) {              // begin startup
        for( i = 1; i <= 2; i++ ) {
            delay( 1000 );
            mySerial.print( "l0\r" ); // turn the LED off
            delay( 1000 );
            mySerial.print( "l1\r" ); // turn the LED on
        }
        startup = 1;                  // don't re-enter
    }                                 // end startup
    Serial.println( "taking reading..." );
    mySerial.print( "r\r" );
    delay( 1000 );
    Serial.println( mySerial.available() );
}
```

On the Uno, I see the number increasing as the read buffer fills up:

```
taking reading...
0
taking reading...
0
taking reading...
7
taking reading...
16
taking reading...
16
```

On the Leo, it seems that nothing ever gets added to the read buffer, no matter how many characters I send over from the FTDI or which pins I used for the rx line:

```
taking reading...
0
taking reading...
0
taking reading...
0
taking reading...
0
taking reading...
0
taking reading...
```

I really wanted to see if I was crazy here, but I'm one of the first people among the General Public to get their hands on a Leonardo board. So, I started talking with Ken Jordan on #arduino on Freenode (he goes by Xark), who has a similar board, the Atmega32u4 Breakout+. It's based on the same chip as the Leonardo, but it has different pinouts and a different bootloader. He flashed the Leonardo bootloader onto his board, and worked out the following pin mapping:

```
Arduino 1.0.1      Adafruit         ATMEL
digitalWrite pin   atmega32u4+ pin  AVR pin function(s)
----------------   ---------------  -------------------
D0                 D2               PD2 (#INT2/RXD1)
D1                 D3               PD3 (#INT3/TXD1)
D2                 D1               PD1 (#INT1/SDA)
D3#                D0               PD0 (#INT0/OC0B)
D4/A6              D4               PD4 (ICP1/ADC8)
D5#                C6               PC6 (OC3A/#OC4A)
D6#/A7             D7               PD7 (T0/OC4D/ADC10)
D7                 E6 (LED)         PE6 (INT6/AIN0)
D8/A8              B4               PB4 (PCINT4/ADC11)
D9#/A9             B5               PB5 (OC1A/PCINT5/#OC4B/ADC12)
D10#/A10           B6               PB6 (OC1B/PCINT6/OC4B/ADC13)
D11#               B7               PB7 (OC0A/OC1C/PCINT7/#RTS)
D12/A11            D6               PD6 (T1/#OC4D/ADC9)
D13# (LED)         C7               PC7 (ICP3/CLK0/OC4A)
D14 (MISO)         B3               PB3 (PDO/MISO/PCINT3)
D15 (SCK)          B1               PB1 (SCLK/PCINT1)
D16 (MOSI)         B2               PB2 (PDI/MOSI/PCINT2)
D17 (RXLED)        B0               PB0 (SS/PCINT0)
D18/A0             F7               PF7 (ADC7/TDI)
D19/A1             F6               PF6 (ADC6/TDO)
D20/A2             F5               PF5 (ADC5/TMS)
D21/A3             F4               PF4 (ADC4/TCK)
D22/A4             F1               PF1 (ADC1)
D23/A5             F0               PF0 (ADC0)
-  (TXLED)         D5               PD5 (XCK1/#CTS)
-  (HWB)           -  (HWB)         PE2 (#HWB)
```

This was derived from the ATmega32U4-Arduino Pin Mapping and ATMEL's datasheet for the ATmega32U4 chip. Once that was worked out, he flashed my test program onto his board, and also found that SoftwareSerial could transmit fine, but couldn't receive anything.
Ken rummaged around a little more, and had this to say:

> The SoftSerial seems to use PCINT0-3 so there seems to me a minor problem in Leo-land in that only PCINT0 appears to be supported (and it is on "funky" output for RXLED). Hopefully I am just misunderstanding something (but it may be the interrupt remap table is incorrect for Leo).

Then he disappeared for a little while, and came back with:

> I have confirmed my suspicion. When I disassemble SoftSerial.cpp.o I can see that only __vector_9 is compiled (i.e., one of 4 #ifdefs for PCINT0-3) and the interrupt vector 10 is PCINT0 (0 is reset vector so offset by one makes sense). So, unless you hook serial to RXLED pin of CPU I don't believe it will work with the current libs. Also I believe the Leo page is just wrong when it says pins 2 & 3 support pin change interrupts (I think this was copied from Uno but it is incorrect; the only (exposed) pins that support PCINT are D8, D9, D10 and D11 according to the ATMEL datasheet, and these are PCINT 4-7, not the ones in the interrupt mapping table, AFAICT).

I believe this is where I can stop worrying that I'd be wasting the time of the core Arduino developers, and say quod erat demonstrandum: it's a bug in SoftwareSerial. Hopefully they can update the Arduino IDE before the board hits wider distribution.

Update: So, it turns out that this is a known limitation of the Leonardo. David Mellis looked into it, and left this comment:

> You're right that the Leonardo only has one pin change interrupt, meaning that the software serial receive doesn't work on every pin. You should, however, be able to use pins 8 to 11 (inclusive) as receive pins for software serial. Additionally, the SPI pins (MISO, SCK, MOSI) available on the ICSP header and addressable from the Arduino software as pins 14, 15, and 16 should work.

He is, of course, correct. I'm not sure why my testing didn't work on pins 8-11, but they do indeed work fine.
Unfortunately, this means that the Leonardo is not compatible with a number of cool shields. The Arduino SoftSerial Library Reference documentation has been updated with a more detailed list of limitations.

## Blogging my candidacy exam

Posted by Russell on March 04, 2012 at 4:02 p.m.
(This is cross-posted from a guest article I wrote on Jonathan's Blog last week. I thought it would be cool to have it on my own blog too.)

Because this seems to be my default mode of organizing my thoughts when it comes to research, I've decided to write my dissertation proposal as a blog post. This way, when I'm standing in front of my committee on Thursday, I can simply fall back on one of my more annoying habits: talking at length about something I wrote on my blog. Or, since he has graciously lent me his megaphone for the occasion, I can talk at length about something I wrote on Jonathan's blog.

### Introduction: Seeking a microbial travelogue

Last summer, I had a lucky chance to travel to Kamchatka with Frank Robb and Albert Colman. It was a learning experience of epic proportions. Nevertheless, I came home with a puzzling question. As I continued to ponder it, the question went from puzzling to vexing to maddening, and eventually became an unhealthy obsession. In other words, a dissertation project. In the following paragraphs, I'm going to try to explain why this question is so interesting, and what I'm going to do to try to answer it.

About a million years ago (the mid-Pleistocene), one of Kamchatka's many volcanoes erupted and collapsed into its magma chamber to form Uzon Caldera. The caldera floor is now a spectacular thermal field, and one of the most beautiful spots on the planet. I regularly read through Igor Shpilenok's LiveJournal, where he posts incredible photographs of Uzon and the nature reserve that encompasses it. It's well worth bookmarking, even if you can't read Russian. The thermal fields are covered in hot springs of many different sizes.
Here's one of my favorites:

Each one of these is about the size of a bowl of soup. In some places the springs are so numerous that it is difficult to avoid stepping in them. You can tell just by looking at these three springs that the chemistry varies considerably; I'm given to understand that the different colors are due to the dominant oxidation species of sulfur, and the one on the far left was about thirty degrees hotter than the other two. All three of them are almost certainly colonized by fascinating microbes.

The experienced microbiologists on the expedition set about the business of pursuing questions like Who is there? and What are they doing? I was there to collect a few samples for metagenomic sequencing, and so my own work was completed on the first day. I spent the rest of my time there thinking about the microbes that live in these beautiful hot springs, and wondering How did they get there?

Extremophiles are practically made to order for this question. The study of extremophile biology has been a bonanza for both applied and basic science. Extremophiles live differently, and their adaptations have taught us a lot about how evolution works, about the history of life on Earth, about biochemistry, and all sorts of other interesting things. However, their very peculiarity poses an interesting problem. Imagine you would freeze to death at 80° Celsius. How does the world look to you? Pretty inhospitable; a few little ponds of warmth dotted across vast deserts of freezing death. Clearly, dispersal plays an essential role in the survival and evolution of these organisms, yet we know almost nothing about how they do it.

The model of microbial dispersal that has reigned supreme in microbiology since it was first proposed in 1934 is Lourens Baas Becking's "alles is overal: maar het milieu selecteert" (everything is everywhere, but the environment selects).
This is a profound idea; it asserts that microbial dispersal is effectively infinite, and that differences in the composition of microbial communities are due to selection alone. The phenomenon of sites that seem identical but have different communities is explained as a failure to understand and measure their selective properties well enough. This model has been a powerful tool for microbiology, and much of what we know about cellular metabolism has been learned by the careful tinkering with selective growth media that it exhorts one to conduct.

Nevertheless, the Baas Becking model just doesn't seem reasonable. Microbes do not disperse among the continents by quantum teleportation; they must face barriers and obstacles, some perhaps insurmountable, as well as conduits and highways. Even with their rapid growth and vast numbers, this landscape of barriers and conduits must influence their spread around the world.

Ecologists have known for a very long time that these barriers and conduits are crucial evolutionary mechanisms. Evolution can be seen as an interaction of two processes: mutation and selection. The nature of the interaction is determined by the structure of the population in which they occur. This structure is shaped by biological processes such as sexual mechanisms and recombination, which are in turn determined chiefly by the population's distribution in space and its migration through that space.

As any sports fan knows, the structure of a tournament can be more important than the outcome of any particular game, or even the rules of the game. This is true for life, too. From one generation to the next, genes are shuffled and reshuffled through the population, and the way the population is compartmentalized sets the broad outlines of this process. A monolithic population -- one in which all players are in the same compartment -- evolves differently than a fragmented population, even if mutation, recombination and selection pressures are identical.
And so, if we want to understand the evolution of microbes, we need to know something about this structure. Baas Becking's hypothesis is a statement about the nature of this structure; specifically, that the structure is monolithic. If true, it means that the only difference between an Erlenmeyer flask and the entire planet is the number of unique niches. The difference in size would be irrelevant. This is a pretty strange thing to claim.

And yet, the Baas Becking model has proved surprisingly difficult to knock down. For as long as microbiologists have been systematically classifying microbes, whenever they've found similar environments, they've found basically the same microbes. Baas Becking proposed his hypothesis in an environment of overwhelming evidence. However, as molecular techniques have allowed researchers to probe deeper into the life and times of microbes (and every other living thing), some cracks have started to show.

Rachel Whitaker and Thane Papke have challenged the Baas Becking model by looking at the biogeography of thermophilic microbes (such as Sulfolobus islandicus and Oscillatoria amphigranulata), first by 16S rRNA phylogenetics and later using high-resolution, multi-locus methods. Both Rachel's work and Papke's work, as well as many studies of disease evolution, very clearly show that when you look within a microbial species, the populations do not appear quite so cosmopolitan. While Sulfolobus islandicus is found in hot springs all over the world, the evolutionary distance between each pair of its isolates is strongly correlated with the geographic distance between their sources. So, these microbes are indeed getting around the planet, but if we look at their DNA, we see that they are not getting around so quickly.

However, Baas Becking has an answer for this: "...but the environment selects." What if the variation is due to selection acting at a finer scale?
It's well established that species sorting effects play a major role in determining the composition of microbial communities at the species level. There is no particular reason to believe that this effect does not apply at smaller phylogenetic scales. The work with Sulfolobus islandicus attempts to control for this by choosing isolates from hot springs with similar physical and chemical properties, but unfortunately there is no such thing as a pair of identical hot springs. Just walk the boardwalks in Yellowstone, and you'll see what I mean. The differences among the sites from which these microbes were isolated can always be offered as an alternative explanation to dispersal. Even if you crank those differences down to nearly zero, one can always suggest that perhaps there is a difference that we don't know about that happened to be important.

This is why the Baas Becking hypothesis is so hard to refute: one must simultaneously establish that there is a non-uniform phylogeographic distribution, and that this non-uniformity is not due to selection-driven effects such as species sorting or local adaptive selection. To do this, we need a methodology that allows us to simultaneously measure phylogeography and selection.

There are a variety of ways of measuring selection. Jonathan's Evolution textbook has a whole chapter about it. I'll go into a bit more detail in Aim 3, but for now, I'd just like to draw attention to the fact that the effect of selection does not typically fall uniformly across a genome. This non-uniformity tends to leave a characteristic signature in the nucleotide composition of a population. Selective sweeps and bottlenecks, for example, are usually identified by examining how a population's nucleotide diversity varies over its genome. For certain measures of selection (e.g., linkage disequilibrium) one can design a set of marker genes that could be used to assay the relative effect of selection among populations.
This could then extend the single-species, multi-locus phylogenetic methods that have already been used to measure the biogeography of microbes to include information about selection. This could, in principle, allow one to simultaneously refute "everything is everywhere..." and "...but the environment selects." However, designing and testing all those markers, ordering all those primers and doing all those PCR reactions would be a drag. If selection turned out to work a little differently than initially imagined, the data would be useless.

But, these are microbes, after all. If I've learned anything from Jonathan, it's that there is very little to be gained by avoiding sequencing. We're getting better and better at sequencing new genomes, but it is not a trivial undertaking. However, re-sequencing genomes is becoming routine enough that it's replacing microarray analysis for many applications. The most difficult part of re-sequencing an isolate is growing the isolate. Fortunately, re-sequencing is particularly well suited for culture-independent approaches. As long as we have complete genomes for the organisms we're interested in, we can build metagenomes from environmental samples using our favorite second-generation sequencing platform. Then we simply map the reads to the reference genomes. The workflow is a bit like ChIP-seq, except without culturing anything and without the ChIP. We go directly from the environmental sample to sequencing to read-mapping. Maybe we can call it Eco-seq? That sounds catchy.

Not only is the whole-genome approach better, but with the right tools, it is easier and cheaper than multi-locus methods, and it allows one to include many species simultaneously. The data will do beautifully for phylogeography, and have the added benefit that we can recapitulate the multi-locus methodology by throwing away data, rather than collecting more.
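To make the read-mapping idea concrete, here is a toy sketch of the tallying step (purely illustrative: the function and genome names are mine, and a real pipeline would use a proper short-read aligner rather than exact substring matching):

```python
def tally_reads(reads, genomes):
    """Count, for each reference genome, how many metagenomic reads
    contain an exact match -- a toy stand-in for real read mapping."""
    counts = {name: 0 for name in genomes}
    for read in reads:
        for name, sequence in genomes.items():
            if read in sequence:
                counts[name] += 1
    return counts

# Hypothetical example: two reference genomes, three environmental reads.
genomes = {"halophile_A": "ACGTACGTAA", "halophile_B": "TTTTCCGGTT"}
reads = ["ACGTA", "TTCCG", "GGGGG"]
print(tally_reads(reads, genomes))
```

In practice this step would be done with an actual aligner (BWA, Bowtie, or similar) against the complete halophile genomes, with per-genome read coverage standing in for the multi-locus markers.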
To implement this, I have divided my project into three main steps:

- Aim 1: Develop a biogeographical sampling strategy to optimize representation of a natural microbial community
- Aim 2: Develop and apply techniques for broad metagenomic sampling, metadata collection and data processing
- Aim 3: Test the dispersal hypothesis using a phylogeographic model with controls for local selection

But, before I get into the implementation, I should pause for a moment and make sure I've stated my hypothesis perfectly clearly: I think that dispersal plays a major role in establishing the composition of microbial communities. The Baas Becking hypothesis doesn't deny that dispersal happens; in fact, it asserts that dispersal is infinite, but that it is selection, not dispersal, that ultimately determines which microbes are found in any particular place. If I find instead that dispersal itself plays a major role in determining community composition, then the world is a very different place to be a microbe.

### Aim 1: Develop a biogeographical sampling strategy to optimize the representation of a complete natural community

While I would love to keep visiting places like Kamchatka and Yellowstone, I've decided to study the biogeography of halophiles, specifically in California and neighboring states. Firstly, because I can drive and hike to most of the places where they grow. Secondly, because the places where halophiles like to grow tend to be much easier to get permission to sample from. Some of them are industrial waste sites; no worry about disturbing fragile habitats. Thirdly, because our lab has been heavily involved in sequencing halophile genomes, which are a necessary component of my approach. There is also a fourth reason, but I'm saving it for the Epilogue.

As I have written about before, the US Geological Survey has built a massive catalog of hydrological features across the Western United States.
It's as complete a list of the substantial, persistent halophile habitats as one could possibly wish for. It has almost two thousand possible sites in California, Nevada and Oregon alone :

USGS survey sites. UC Davis is marked with a red star.

The database is complete enough that we can get a pretty good sense of what the distribution of sites looks like within this region just by looking at the map. The sites are basically coincident with mountain ranges. Even though they aren't depicted, the Coastal Range, the Sierras, the Cascades and the Rockies all stand out. This isn't surprising; salt lakes require some sort of constraining topography, or the natural drainage would simply carry the salt into the ocean. Interestingly, hot springs are also usually found in mountains (some of these sites are indeed hot springs), but that has less to do with the mountains themselves than with the processes that built them. To put it more pithily, you find salt lakes where there are mountains, but you find mountains where there are hot springs.

This database obviously contains too many sites to visit. It took Dr. Mariner's team forty years to gather all of this information. I need to choose from among these sites. But which ones? Is there a way to know if I'm making good selections? Does it even matter?

As it turns out, it does matter. When we talk about dispersal in the context of biogeography, we are making a statement about the way organisms get from place to place. Usually, we expect to see a distance-decay relationship, because we expect that more distant places are harder to get to, and thus the rates of dispersal across longer distances should be lower. I need to be reasonably confident that I will see the same distance-decay relationship within the sub-sample that I would have seen for every site in the database. This doesn't necessarily mean that the microbes will obey this relationship, but if they do, I need data that would support the measurement.
There is a pretty straightforward way of doing this. If we take every pair of sites in the database, calculate the Great Circle distance between them, and then sort these distances, we get a spectrum of pairwise distances. Here's what that looks like for the sites in my chunk of the USGS database :

The spectrum of pairwise distances among all sites in the USGS database (solid black), among randomly placed sites over the same geographic area (dashed black), and among a random sub-sample of 360 sites from the database (solid red).

I've plotted three spectra here. The dashed black line is what you'd get if the sites had been randomly distributed over the same geographic area, and the solid black line is the spectrum of the actual pairwise distances. As you can see, the distribution is highly non-random, but we already knew this just by glancing at the map. The red line is the spectrum of a random sub-sample of 360 sites from the database (I chose 360 because that is about how many samples I could collect in five one-week road trips). This sub-sample matches the spectrum of the database pretty well, but not perfectly. It's easy to generate candidate sub-samples, and they can be scored by how closely their spectra match the database's. I'd like to minimize the amount of time it takes me to finish my dissertation, which I expect will be somewhat related to the number of samples I collect. There is a cute little optimization problem there.

Although I've outlined the field work, laboratory work and analysis as separate steps, these things will actually take place simultaneously. After I return from the field with the first batch of samples, I will process and submit them for sequencing before going on the next collection trip. I can dispatch the analysis pipeline from pretty much anywhere (even with my mobile phone). That's why I've set aside sample selection and collection as a separate aim.
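The spectrum calculation itself is simple enough to sketch. Here's a minimal version using the haversine formula for Great Circle distances; the site coordinates and the quantile-based scoring function are made-up illustrations, not the actual data or scoring I use :

```python
import math
import random

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two (lat, lon) points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def distance_spectrum(sites):
    """Sorted list of pairwise Great Circle distances among (lat, lon) sites."""
    return sorted(
        great_circle_km(*a, *b)
        for i, a in enumerate(sites)
        for b in sites[i + 1:]
    )

def spectrum_mismatch(full, sub, points=100):
    """Score a candidate sub-sample by comparing quantiles of the two
    spectra; lower scores mean the sub-sample matches the database better."""
    def quantile(xs, q):
        return xs[min(int(q * len(xs)), len(xs) - 1)]
    qs = [i / (points - 1) for i in range(points)]
    return sum(abs(quantile(full, q) - quantile(sub, q)) for q in qs) / points

# Toy example: fifty fake sites in roughly the right part of the world,
# and one random candidate sub-sample of twenty of them.
random.seed(0)
sites = [(35 + random.random() * 10, -122 + random.random() * 8) for _ in range(50)]
full = distance_spectrum(sites)
sub = distance_spectrum(random.sample(sites, 20))
print(round(spectrum_mismatch(full, sub), 1))
```

Generating many random candidates and keeping the lowest-scoring one is the brute-force version of the "cute little optimization problem" above.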
The sample selection process determines where to start, how to proceed, and when I'm done.

### Aim 2 : Develop and apply techniques for broad metagenomic sampling, metadata collection and data processing

In order to build all these genomes, I need to solve some technical problems. Building this many metagenomes is a pretty new thing, and so some of the tools I need did not exist in a form (or at a cost) that is useful to me. So, I've developed or adapted some new tools to bring the effort, cost and time for large-scale comparative metagenomics into the realm of a dissertation project. There are four technical challenges :

• Quickly collect a large number of samples and transport them to the laboratory without degradation.
• Build several hundred sequencing libraries.
• Collect high-quality metadata describing the sites.
• Assemble thousands of re-sequenced genomes.

To solve each of these problems, I've applied exactly the same principle : Simplify and parallelize. I can't claim credit for the idea here, because I was raised on it. Literally.

Sample collection protocol

When I first joined Jonathan's lab, Jenna Morgan (if you're looking for her newer papers, make sure to add "Lang," as she's since gotten married) was testing how well metagenomic sequencing actually represents the target environment. In her paper, now out in PLoS ONE, one of the key findings is that mechanical disruption is essential. I learned during my trip to Kamchatka that getting samples back to the lab without degradation is very hard, and it really would be best to do the DNA extraction immediately. Unfortunately, another lesson I learned in Kamchatka is that it is surprisingly difficult to do molecular biology in the woods. One of the ways I helped out while I was there was to kill mosquitoes trying to bite our lab technician so she wouldn't have to swat them with her gloved hands. It's not easy to do this without making an aerosol of bug guts and blood over the open spin columns.
So, I was very excited when I went to ASM last year and encountered a cool idea from Zymo Research. Basically, it's a battery-operated bead mill, and a combined stabilization and cell lysis buffer. This solves the transportation problem and the bead-beating problem, without the need to do any fiddly pipetting and centrifuging in the field. Also, it looks cool. Unfortunately, the nylon screw threads on the sample processor tend to get gummed up with dirt, so I've designed my own attachment that uses a quick-release style fitting instead of a screw top. It's called the Smash-o-Tron 3000, and you can download it on Thingiverse.

Sequencing library construction

The next technical problem is actually building the sequencing libraries. Potentially, there could be a lot of them, especially if I do replicates. If I were to collect three biological replicates from every site on the map, I would have to create about six thousand metagenomes. I will not be collecting anywhere close to six thousand samples, but I thought it was an interesting technical problem. So I solved it. Well, actually, I added some mechanization to a solution Epicentre (now part of Illumina) marketed, and that my lab-mates Aaron Darling and Qingyi Zhang have refined into a dirt-cheap multiplexed sequencing solution.

The standard technique for building Illumina sequencing libraries involves mechanically shearing the source DNA, ligating barcode sequences and sequencing adapters to the fragments, mixing them all together, and then doing size selection and cleanup. The first two steps of this process are fairly tedious and expensive. As it turns out, Tn5 transposase can be used to fragment the DNA and ligate the barcodes and adapters in one easy digest. Qingyi is now growing huge quantities of the stuff. The trouble is that DNA extraction yields an unpredictable amount of DNA, and the activity of Tn5 is sensitive to the concentration of target DNA.
So, before you can start the Tn5 digest, you have to dilute the raw DNA to the right concentration and aliquot the correct amount for the reaction. This isn't a big deal if you have a dozen samples. If you have thousands, the dilutions become the rate-limiting step. If I'm the one doing the dilutions, it becomes a show-stopper at around a hundred samples. I'm just not that good at pipetting. (Seriously.)

The usual way of dealing with this problem is to use a liquid handling robot. Unfortunately, liquid handling robots are stupendously expensive. Even at their considerable expense, many of them are shockingly slow. To efficiently process a large number of samples, we need to be able to treat every sample exactly the same. This way, we can bang through the whole protocol with a multichannel pipetter.

It occurred to me that many companies sell DNA extraction kits that use spin columns embedded in 96-well plates, and we have a swinging bucket centrifuge with a rotor that accommodates four plates at a time. So, the DNA extraction step is easy to parallelize. The Tn5 digests work just fine in 96-well plates. We happen to have (well, actually Marc's lab has) a fluorometer that handles 96-well plates. Once the DNA extraction is finished, I can use a multichannel pipetter to make aliquots from the raw DNA, and measure the DNA yield for each sample in parallel. So far, so good.

Now, to dilute the raw DNA to the right concentration for the Tn5 digest, I need to put an equal volume of raw DNA into differing amounts of water. This violates the principle of treating every sample the same, which means I can't use a multichannel pipetter to get the job done.
That is, unless I have a 96-well plate that looks like this :

Programmatically generated dilution plate CAD model

I wrote a piece of software that takes a table of concentration measurements from the fluorometer, and designs a 96-well plate with wells of the correct volume to dilute each sample to the right concentration for the Tn5 digest. If I make one of these plates for each batch of 96 samples, I can use a multichannel pipetter throughout. Of course, unless you are Kevin Flynn, you can't actually pipette liquids into a 3D computer model and achieve the desired effect. To convert the model from bits into atoms, I ordered a 3D printer kit from Ultimaker. (I love working in this lab!)

The Ultimaker kit

After three days of intense and highly entertaining fiddling around, I managed to get the kit assembled. A few more days of experimentation yielded my first successful prints (a couple of whistles). A few days after that, I was making my first attempts to build my calibrated-volume dilution plates.

Dawei Lin and his daughter waiting for their whistle (thing 1046) to finish printing.

Learning about 3D printing has been an adventure, but I've got the basics down, and I'm now printing plates with surprisingly good quality. I've had some help from the Ultimaker community on this, particularly from Florian Horsch. Much to my embarrassment, the first (very lousy) prototype of my calibrated-volume dilution plate ended up on AggieTV. Fortunately, the glare from the window made it look much more awesome than it actually was.

The upshot is that if I needed to make ten or twenty thousand metagenomes, I could do it. I can print twelve 96-well dilution plates overnight. Working at a leisurely pace, these would allow me to make 1152 metagenome libraries in about two afternoons' worth of work. I'm pretty excited about this idea, and there are a lot of different directions one could take it.
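The arithmetic behind the plate design is just C1V1 = C2V2 dilution. Here's a minimal sketch of how the well volumes might be computed from a fluorometer table; the target concentration, fixed transfer volume, and sample readings below are made-up assumptions for illustration, not the actual numbers from my protocol :

```python
def well_volume_ul(measured_ng_per_ul, target_ng_per_ul, transfer_ul):
    """Total well volume so that pipetting `transfer_ul` of raw DNA into a
    well pre-filled with water yields `target_ng_per_ul` (C1*V1 = C2*V2)."""
    if measured_ng_per_ul < target_ng_per_ul:
        # Sample is already too dilute; no amount of water helps.
        raise ValueError("sample below target concentration")
    return transfer_ul * measured_ng_per_ul / target_ng_per_ul

def plate_volumes(measurements, target_ng_per_ul=2.5, transfer_ul=5.0):
    """Map each well ID to (water to pre-fill, total volume after transfer)."""
    plate = {}
    for well, conc in measurements.items():
        total = well_volume_ul(conc, target_ng_per_ul, transfer_ul)
        plate[well] = (total - transfer_ul, total)
    return plate

# Made-up fluorometer readings (ng/uL) for three wells of a 96-well plate.
readings = {"A1": 25.0, "A2": 10.0, "A3": 5.0}
for well, (water, total) in plate_volumes(readings).items():
    print(f"{well}: pre-fill {water:.1f} uL water, final volume {total:.1f} uL")
```

The trick of the printed plate is that each well's physical capacity is the computed total volume, so every sample gets the identical multichannel transfer and the plate geometry does the per-sample math.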
The College of Engineering here at UC Davis is letting me teach a class this quarter that I've decided to call "Robotics for Laboratory Applications," where we'll be exploring ways to apply this technology to molecular biology, genomics and ecology. Eight really bright UC Davis undergraduates have signed up (along with the director of the Genome Center's Bioinformatics Core), and I'm very excited to see what they'll do!

Environmental metadata collection

To help me sanity-check the selection measurement, I decided that I wanted to have detailed measurements of environmental differences among sample sites. Water chemistry, temperature, weather, and the variability of these are known to select for or against various species of microbes. The USGS database has extremely detailed measurements of all of these things, all the way down to the isotopic level. However, I still need to take my own measurements to confirm that the site hasn't changed since it was visited by the USGS team, and to get some idea of what the variability of these parameters might be. It would also be nice if I could retrieve the data remotely, and not have to make return trips to every site.

Unfortunately, commercial environmental data loggers are extraordinarily expensive. The ones that can be left in the field for a few months to log data cost even more. The ones that can transmit the data wirelessly are so expensive that I'd only be able to afford a handful if I blew an entire R01 grant on them. This bothers me on a moral level. The key components are a few probes, a little lithium polymer battery, a solar panel the size of your hand, and a cell phone. You can buy them separately for maybe fifty bucks, plus the probes. Buying them as an integrated environmental data monitoring solution costs tens of thousands of dollars per unit. A nice one, with weather monitoring, backup batteries and a good enclosure, could cost a hundred thousand dollars.
You can make whatever apology you like on behalf of the industry, but the fact is that massive overcharging for simple electronics is preventing science from getting done. So, I ordered a couple of Arduino boards and made my own.

My prototype Arduino-based environmental data logger. This version has a pH probe, Flash storage, and a Bluetooth interface.

The idea is to walk into the field with a data logger and a stick. Then I will find a suitable rock. Then I will pound the stick into the mud with the rock. Then I will strap the data logger to the stick, and leave it there while I go about the business of collecting samples. To keep it safe from the elements, the electronics will be entombed in a protective wad of silicone elastomer with a little solar panel and a battery. The bill of materials for one of these data loggers is about $200, and so I won't feel too bad about simply leaving them there to collect data. If the site has cell phone service, I will add a GSM modem to the data logger (I like the LinkSprite SM5100B with SparkFun's GSM shield), and transmit the data to my server at UC Davis through an SMS gateway. Then I don't have to go back to the site to collect the data. This could easily save $200 worth of gasoline. I'll put a pre-paid return shipping label on each of them so that they can find their way home someday. I'm eagerly looking forward to decades of calls from Jonathan complaining about my old grimy data loggers showing up in his mail.

From the water, the data logger can measure pH, dissolved oxygen, oxidation/reduction potential, conductivity (from which salinity can be calculated), and temperature. I may also add a small weather station to record air temperature, precipitation, wind speed and direction, and solar radiation. I doubt that all of these parameters will be useful, but the additional instrumentation is not very expensive.

Assembling the genomes

The final technical hurdle is assembling genomes from the metagenomic data.
If I have 360 sites and 100 reference genomes, I'm going to have to assemble 36,000 genomes. Happily, I am really re-sequencing them, which is much, much easier than de novo sequencing. Nevertheless, 36,000 is still a lot of genomes. For each metagenome, I must :

• Remove adapter contamination with TagDust
• Trim reads for quality, discard low-quality reads
• Remove PCR duplicates
• Map reads to references with bwa, bowtie, SHRiMP, or whatever

This yields a BAM file for each metagenome, each representing an alignment of reads to each scaffold of each reference genome. All of the reference genomes can be placed into a single FASTA file with a consistent naming scheme for distinguishing among scaffolds belonging to different organisms. A hundred-odd archaeal reference genomes is about 200-400 megabases, or an order of magnitude smaller than the human genome. Using the Burrows-Wheeler Aligner on a reasonably modern computer, this takes just a few minutes for each metagenome.

I'm impatient, though, and so I applied for (and received) an AWS in Education grant. Then I wrote a script that parcels each metagenome off to a virtual machine image, and then unleashes all of them simultaneously on Amazon.com's thundering herd of rental computers. Once they finish their alignment, each virtual machine stores the BAM file in my Dropbox account and shuts down. The going rate for an EC2 Extra Large instance is $0.68 per hour.
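A sketch of how the per-metagenome steps above might be wired together. This only builds the command strings; the file names, adapter library, and tool flags are illustrative placeholders (and I've written the mapping step with `bwa mem`, which is newer than the `bwa aln` workflow of this era), so check the versions you actually have installed before running anything like this for real. A driver script would hand each string to `subprocess.run(cmd, shell=True)`, or bake them into a cloud machine image :

```python
def mapping_commands(sample, reads, reference, threads=4):
    """Build the shell pipeline for one metagenome: adapter removal,
    quality trimming, mapping to the combined reference FASTA, and
    PCR duplicate removal. Flags are illustrative, not a tested recipe."""
    cleaned = f"{sample}.clean.fastq"
    trimmed = f"{sample}.trim.fastq"
    bam = f"{sample}.sorted.bam"
    return [
        # 1. Remove adapter contamination with TagDust
        f"tagdust -o {cleaned} adapters.fa {reads}",
        # 2. Trim reads for quality, discard low-quality reads (FASTX-Toolkit)
        f"fastq_quality_trimmer -t 20 -l 30 -i {cleaned} -o {trimmed}",
        # 3. Map to the combined reference FASTA, convert to a sorted BAM
        f"bwa mem -t {threads} {reference} {trimmed} | samtools sort -o {bam} -",
        # 4. Remove PCR duplicates
        f"samtools rmdup {bam} {sample}.dedup.bam",
    ]

# One command list per metagenome; each list is independent of the others,
# which is what makes the whole job embarrassingly parallel across instances.
for cmd in mapping_commands("site042", "site042.fastq", "halophile_refs.fasta"):
    print(cmd)
```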

This approach could be used for any re-sequencing project, including ChIP-seq, RNA-seq, SNP analysis, and many others.

### Aim 3 : Test the dispersal hypothesis using a phylogeographic model with controls for local selection

In order to test my hypothesis, I need to model the dispersal of organisms among the sites. However, in order to do a proper job of this, I need to make sure I'm not conflating dispersal and selective effects in the data used to initialize the model. There are three steps :
• Identify genomic regions that have recently been under selection
• Build genome trees with those regions masked out
• Model dispersal among the sites
In all three cases, there are a large number of methods to choose from.

One way of detecting the effects of selection is Tajima's D. This statistic measures deviation from the neutral model by comparing two estimators of the neutral genetic variation, one based on the nucleotide diversity and one based on the number of polymorphic sites. Neutral theory predicts that the two estimators are equal, and so genomic regions in which these two estimators are not equal are evolving in a way that is not predicted by the neutral model (i.e., they are under some kind of selection). One can do this calculation on a sliding window to measure Tajima's D at each coordinate of the genome of each organism. As it turns out, this exact approach was used by David Begun's lab to study the distribution of selection across the Drosophila genome.
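As a concrete sketch, here is Tajima's D computed from a small alignment using the textbook formulas (this is my own minimal implementation of the standard definition, not the Begun lab's code; the toy alignment at the bottom is invented) :

```python
import math

def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences.
    D < 0 suggests an excess of rare variants; D > 0 an excess of
    intermediate-frequency variants; D near 0 is consistent with neutrality."""
    n = len(sequences)
    length = len(sequences[0])
    # pi: average number of pairwise differences (nucleotide diversity)
    pairs = n * (n - 1) // 2
    diffs = sum(
        sum(a != b for a, b in zip(sequences[i], sequences[j]))
        for i in range(n) for j in range(i + 1, n)
    )
    pi = diffs / pairs
    # S: number of segregating (polymorphic) sites
    s = sum(len({seq[k] for seq in sequences}) > 1 for k in range(length))
    if s == 0:
        return 0.0
    # Watterson's estimator and the variance terms from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignment with an excess of singleton mutations: D comes out negative.
aln = ["AAAA", "CAAA", "ACAA", "AACA"]
print(round(tajimas_d(aln), 3))
```

Running this in a sliding window along each reference genome gives a D value per coordinate, which is what the masking step below needs.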

I will delete the regions of the genomes that deviate significantly (say, by more than one standard deviation) from neutral. Then I'll make whole-genome alignments, and build a phylogenetic tree for each organism. This tree would contain only characters that (at least insofar as you believe Tajima's D and Fay and Wu's H) are evolving neutrally, and are not under selection.
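The masking step can be sketched like this: given per-window Tajima's D values for a sequence, blank out the windows whose D deviates by more than one standard deviation from the mean before building the alignment. The window size and the use of 'N' as the mask character are my assumptions for illustration, not settled details of the project :

```python
import statistics

def mask_selected_windows(sequence, window_d, window_size, n_sd=1.0):
    """Replace windows whose Tajima's D is more than `n_sd` standard
    deviations from the mean with 'N's, leaving putatively neutral
    regions intact. window_d[i] is D for sequence[i*window_size:(i+1)*window_size]."""
    mean = statistics.mean(window_d)
    sd = statistics.stdev(window_d)
    out = []
    for i, d in enumerate(window_d):
        chunk = sequence[i * window_size:(i + 1) * window_size]
        out.append("N" * len(chunk) if abs(d - mean) > n_sd * sd else chunk)
    # Keep any tail not covered by a window as-is.
    out.append(sequence[len(window_d) * window_size:])
    return "".join(out)

seq = "ACGTACGTACGTACGT"
d_values = [0.1, -2.5, 0.0, 0.2]  # the second window is a strong outlier
print(mask_selected_windows(seq, d_values, window_size=4))
```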

A phylogenetic tree represents evolutionary events that have taken place over time. In order to infer the dispersal of the represented organisms, I would need to model where those events took place. Again, there are a variety of methods for doing this, but my personal favorite is probably the approach used by Isabel Sanmartín for modeling dispersal of invertebrates among the Canary Islands. I don't know if this is necessarily the best method, but I like the idea that the DNA model and the dispersal model use the same mathematics, and are computed together. Basically, they allowed each taxon to evolve its own DNA model, but constrained by the requirement that they share a common dispersal model. Then they did Markov Chain Monte Carlo (MCMC) sampling of the posterior distributions of island model parameters (using MrBayes 4.0).

According to Wikipedia, the most respected and widely consulted authority on this and every topic, the General Time Reversible model is the most generalized model describing the rates at which one nucleotide replaces another. If we want to know the rate at which a thymine turns into a guanine, we look at element (2,3) of this matrix :
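For reference, the GTR rate matrix can be written as follows (with the states ordered A, T, G, C, so that the T→G entry sits in row 2, column 3 as described; the diagonal entries are chosen so that each row sums to zero) :

```latex
Q =
\begin{pmatrix}
\cdot        & r_{AT}\pi_T & r_{AG}\pi_G & r_{AC}\pi_C \\
r_{AT}\pi_A  & \cdot       & r_{TG}\pi_G & r_{TC}\pi_C \\
r_{AG}\pi_A  & r_{TG}\pi_T & \cdot       & r_{GC}\pi_C \\
r_{AC}\pi_A  & r_{TC}\pi_T & r_{GC}\pi_G & \cdot
\end{pmatrix}
```

Time reversibility is what makes the exchangeability parameters symmetric : the same r_TG appears in both the T→G and G→T entries.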

π_G is the stationary state frequency for guanine, and r_TG is the exchangeability rate from T to G. However, if we think of this a little differently, as Sanmartín suggests in her paper, we can use the GTR model for the dispersal of species among sites (or islands). If we want to know the rate at which a species migrates from island B to island C, we look in cell (2,3) of a very similar matrix :

Here, π_C is the relative carrying capacity of island C, and r_BC is the relative dispersal rate from island B to island C. Thus, the total dispersal from island i to island j is

d_ij = N π_i r_ij π_j m

where N is the total number of species in the system, and m is the group-specific dispersal rate. This might look something like this :
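As a numeric sketch of that calculation, here are three hypothetical islands with made-up carrying capacities and relative dispersal rates (all the values below are invented for illustration) :

```python
def dispersal_matrix(pi, r, n_species, m):
    """Total dispersal d_ij = N * pi_i * r_ij * pi_j * m for every pair of
    islands. `pi` holds relative carrying capacities, and `r` the symmetric
    relative dispersal rates (r[i][j] == r[j][i], as in the GTR analogy)."""
    k = len(pi)
    return [
        [n_species * pi[i] * r[i][j] * pi[j] * m if i != j else 0.0
         for j in range(k)]
        for i in range(k)
    ]

# Three hypothetical islands A, B, C.
pi = [0.5, 0.3, 0.2]              # relative carrying capacities (sum to 1)
r = [[0.0, 1.0, 0.4],
     [1.0, 0.0, 0.7],
     [0.4, 0.7, 0.0]]             # symmetric relative dispersal rates
d = dispersal_matrix(pi, r, n_species=100, m=0.1)
print(round(d[1][2], 3))          # total dispersal between islands B and C
```

Note that because r is symmetric, d_ij = d_ji here; what differs between islands is how much each one sends and receives overall, via its carrying capacity π.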

One nifty thing I discovered about MrBayes is that it can link against the BEAGLE library, which can accelerate these calculations using GPU clusters. Suspiciously, Aaron Darling is one of the authors. If you were looking for evidence that the Eisen Lab is a den of Bayesians, this would be it.

This brings us, at last, back to the hypothesis and Baas Becking. Here we have a phylogeographic model of dispersal among sites within a metacommunity, with the effects of selection removed. If the model predicts well-supported finite rates of dispersal within the metacommunity, my hypothesis is sustained. If not, then Baas Becking's 78 year reign continues.

### Epilogue : Lourens Baas Becking, the man versus the strawman

Lourens Baas Becking

Microbiologists have been taking potshots at the Baas Becking hypothesis for a decade or two now, and I am no exception. I'm certainly hoping that the study I've outlined here will be the fatal blow.

However, it's important to recognize that we've been a bit unfair to Baas Becking himself. The hypothesis that carries his name is a model, and Baas Becking himself fully understood that dispersal must play an important role in community formation. He understood perfectly well that "alles is overal: maar het milieu selecteert" was not literally true; it is only mostly true, and then only in the context of the observational methodology available at the time. In 1934, in the same book where he proposed his eponymous hypothesis, he observed that there are some habitats that were ideally suited for one microbe or another, and yet these microbes were not present. He offered the following explanation: "There thus are rare and less rare microbes. Perhaps there are very rare microbes, i.e., microbes whose possibility of dispersion is limited for whatever reason."

Useful models are never "true" in the usual sense of the word. Models like the Baas Becking hypothesis divide the world into distinct intellectual habitats; one in which the model holds, and one in which it doesn't. At the shore between the two habitats, there is an intellectual littoral zone; a place where the model gives way, and something else rises up. As any naturalist knows, most of the action happens at interfaces; land and sea, sea and air, sea and mud, forest and prairie. The principle applies just as well to the landscape of ideas. The limits of a model, especially one as sweeping as Baas Becking's, provide a lot of cozy little tidal ponds for graduate students to scuttle around in.

By the way, guess where Lourens Baas Becking first developed his hypothesis? He was here in California, studying the halophiles of the local salt lakes. In fact, the very ones I will be studying.

## The moral imperative for Open Science

Posted by Russell on February 09, 2012 at 2:41 a.m.
So, there is this bill in Congress called the Research Works Act. If enacted, it would prohibit open access mandates by federal granting agencies. It would end the policy that NIH-funded research be deposited in PubMed Central, and it would prevent other agencies from establishing such policies. It is a bad idea for a lot of reasons. If you want an elegant dissection of the reasons why the Research Works Act is a bad idea, I suggest Michael Eisen's many posts on the topic.

There is now an effort to boycott Elsevier, the authors and primary proponents of the Research Works Act. I signed the pledge, though unfortunately my name doesn't carry much weight. Last week, my advisor got a little worked up about it, and suggested that scientists should perhaps ignore papers published by Elsevier, and then changed his mind about it after some cogent arguments were raised.

It won't serve the progress of science to ignore new discoveries because we don't like the journals they were published in. However, I do not believe that this point, cogent as it is, is enough to carry the day. He wasn't suggesting that we ignore the discovery, but rather that we ignore the publication. We can, and should, treat publications in closed access journals as illegitimate claims to the scientific record.

I was a little puzzled at first that Jonathan didn't make this point. After all, Michael and Jonathan are often called Open Access stormtroopers. But then I remembered that, despite their passion, storming and trooping are not really in their natures. They lack the necessary highhanded arrogance for those activities. As a physicist who jumped into biology late in my education, it's often been pointed out to me that there is nothing so noxiously arrogant as a physicist moonlighting as a biologist. I try very hard not to be "that guy," but I still get the occasional eye roll. So, just for this occasion, I'm going to uncork a little physicist's swagger by framing the need for Open Access scientific publication as a moral absolute.

Whenever something is unclear, the physicist in me always looks at two things. First, I look at the asymptotic behavior (the extremes), and then I look at simplified models that match the asymptotic behavior. Then I check to see if I've got the right model by looking at how it scales and generalizes to the full problem.

On the continuum of open to closed access publishing, the asymptotic behavior in the closed-access direction is simply not publishing a finding at all. In this case, the moral reasoning is simple (that is not to say universally agreed upon, but simple nevertheless). Take the invention of Calculus, for example. Leibniz, not Newton, was the one who did the work, took the risks, and invested time and effort to bring Calculus to the world. In my opinion, this is what matters in terms of establishing precedence. The more one examines Newton's behavior, the more one wishes to credit Leibniz. I am willing to take the plunge and assert that the precise chronology is irrelevant to the question of apportioning credit. What is relevant is the work of pedagogy.

Publishing in a closed-access journal is secret-keeping. It keeps the information confined among a certain group of people. Combined with copyright, it is a life-destroying secret to anyone who shares it without permission. I think it would be fair to weight the amount of credit according to the extent to which the discovery was shared.

Now I'm going to resort to another irritating behavior typical of physicists; I shall reduce a complicated, nuanced situation to a Gedankenexperiment preserving only the essential features, and then extrapolate the results back to the real world.

Suppose you are making soup for dinner, and you discover that a quart of bleach has somehow spilled into the soup. You immediately tell your family that the soup is poisoned. Nobody eats the soup. Dinner is ruined, but everyone is safe. This is what a normal person would do.

Now let's look at the other extreme. Suppose you noticed soup was poisoned, but you kept it to yourself. You watch your family eat the soup, and they get sick. Only a very, very bad person would do this.

We have a sort of emotional model that lets us make a moral judgement about the behavior of the discoverer in this situation. It's easy to make a moral judgement about the extremes, which is why we looked at them. Now, let's look at the in-between situation.

Suppose instead of announcing the discovery, you tell your son that you know something very important. You demand that he give you something precious in order for you to tell him what it is. Then, after taking away one of his favorite toys, you whisper your discovery about the soup into his ear. Then you tell him he mustn't tell anyone, or he will be in very big trouble. So much trouble that you will take all his toys away and never speak to him again. Then you let your spouse and daughter eat the soup, and they get sick.

The in-between behavior is more unethical than the "bad" extreme! Yes, it's better that one fewer person is hurt, but that is a statement about the outcome, not about the behavior of the discoverer. Selecting the in-between behavior is just as callous, but adds cruelty and selfishness.

If you behaved this way, it is clear you would not deserve full credit for your discovery. At most, you could claim one third of the possible credit for your discovery because you only shared the knowledge with one third of the people who stood to be affected by it. Most people would give you much less credit than that. We have many pungent words for people who behave like this which I shall not enumerate.

This scenario is exactly equivalent to publishing in a closed access journal. An author cannot excuse themselves by drawing a distinction between the practices of the journal and their own practices and wishes; by choosing a journal, the author chooses that journal's behavior. There are hundreds of journals with a rich spectrum of behaviors ranging from upstanding and public-spirited to cynical and predatory. Through your choice of journal, you must own that journal's behavior.

Science is a universal human enterprise. When an important discovery is made, it eventually touches the lives of every single human being. If you keep a discovery secret from anyone, you are behaving reprehensibly. It is only fair that your credit and reputation should suffer as a consequence. It is only fair for other scientists to question the legitimacy of your claim to the credit.

The simple and appropriate punishment for keeping secrets is to always treat the first openly available paper as the actual record of discovery. The fact that someone else may have discovered it first, and kept it secret, is a technicality of interest only to historians. This is what science has always done. I am merely suggesting that we regard secrecy on a sliding scale, and to take into account the role that science plays in the world.

What scientists, especially American scientists, need to begin doing is to take a more pragmatic view of what constitutes a secret. What fraction of human beings on Earth have access to Elsevier's catalog? One in a thousand? One in a hundred thousand? One in a million?

The problem isn't the cost. It's the behavior. If you told both of your children about the poisoned soup, but not your spouse, you'd still be an asshole. If you decreased the number of toys that needed to be sacrificed to have access to the discovery, you'd still be an asshole. If you relaxed the punishment for sharing the secret discovery, you'd still be an asshole. If you shared only a summary of your discovery (e.g., 'some of the food in this house has been poisoned'), you'd still be an asshole.

There is one, and only one ethical way to handle a discovery, and that is to share it freely.

## These are not the microbes you are looking for

Posted by Russell on October 04, 2011 at 6:45 p.m.
A few months ago, I tweeted, "I've been working in a microbiology lab for two years, and just realized we don't actually have a microscope. Huh."

Jack Gilbert and some other people proceeded to give me grief for what I intended as an interesting observation about the current state of the art in microbiology. So, I decided to remedy the situation. Evidently, we do have a microscope, I just didn't know where it was.

Here are some cool things I found by randomly poking around in some of my samples from Borax Lake. The first thing I found is probably some kind of diatom from the sediment of the little hot spring just north of Borax Lake. I'm not looking for diatoms, but it looks really, really cool.

Here they are at 100x magnification.

This is somewhat less cool-looking, but is probably what I'm actually looking for. In the little bubble of water surrounding the granule in the center, there were a couple little rods hopping around. No clue what they are, but they're there, doing what they do.

## 3D printing update

Posted by Russell on October 04, 2011 at 12:19 a.m.
I've been working a bit on the software that generates my 96-well dilution plate. I have a new version that cuts the plastic use by about 80% and print time by about the same. Also, it now prints with the wells upside-down on the build platform, which should help cut down contamination during the printing process.

Things to do :

• I'm going to try cutting the plastic use even more by adding a skirt around the plate (like a normal titer plate), and adjusting the outer height of each well.
• Add a fill-line to each well.
• Raise the well edges a little more, and add drain-holes between wells to prevent spillage between wells and to make filling easier.
• Add embossed row and column labels.
• Add an embossed text area for user notations (e.g., for which sample group is this plate calibrated).
Hmm. I might pull this off yet.

Also, if you are interested in this stuff, the UC Davis Biomedical Engineering Department made me the instructor of a variable-unit class (graded P/NP) called "Research internship in robotics for the laboratory" for Winter 2012. Sign up for BIM192, sec 2 (the CRN is 24791).

## Borax Lake : Sample collection and processing

Posted by Russell on October 01, 2011 at 1:57 a.m.
I've admired Rosie Redfield's lab-notebook-as-blog from the moment I started reading it, and I've been looking for an excuse to ~~steal~~ ~~pilfer~~ ~~abscond with~~ adapt her idea. However, most of the work I've been doing up to this point has been computational, and try as I might, I can't make myself keep a lab notebook for programming. That's what things like GitHub are for. I've started actually doing some of the laboratory and field work for my thesis project, and so I finally have an excuse to do some open lab and field notebook blogging.

Rosie might be amused that I'm starting off with some work I'm doing at an arsenic-heavy lake, although Borax Lake is known more for boron than arsenic. If I bugger up my assays, I just hope she'll get on my case in the comments before I submit anything for publication.

I will write more about this as I go along, but my goal for my thesis project is to try to get an idea about the modality of microbial migration. Specifically, I want to know if microbial taxa, when they colonize a new environment, arrive individually or as an existing consortium. I hope to find out by reconstructing population structures from metagenomic samples from widely dispersed but ecologically similar environments.

I learned about Borax Lake from Robert Mariner of the US Geological Survey, who was kind enough to respond to my emails and patiently discuss his survey results over the course of several lengthy telephone calls. He also volunteered a lot of useful information, such as which sites have rattlesnakes and where they are likely to be found, and helped enormously in the search for and selection of sampling sites. Without this help, I probably would have had to give up on this project as I originally imagined it.

Borax Lake was one of many thousands of sites across the American West surveyed over a 40-year-long USGS project led by Ivan Barnes and Robert Mariner to study the chemistry and isotopic composition of mineral springs. Extensive analysis of Borax Lake water was conducted in August of 1972 by John Rapp, and again in July of 1991.

Dr. Mariner pointed out that Borax Lake is administered by the Nature Conservancy, and a little bit of Googling and emailing got me in touch with Jay Kerby, the Southeast Oregon Project Manager for the Nature Conservancy. Jay was very helpful, and walked me through the process of obtaining sampling permits for my project.

Before I talk about Borax Lake, I need to say that it is absolutely essential that you obtain explicit, written permission before collecting samples. As scientists, we've got to get this right if we want to avoid situations like this. The fact that some researchers did not (for whatever reason) obtain permission to use the cells they used to make important discoveries, or did not cooperate in good faith with the originators of those cell lines, has made it much more difficult for me to do my own research. Kary Mullis, if you're reading this, thanks for PCR (really), but...

Anyway, I'm not sure if Borax Lake itself is going to be a good candidate for my project (its chemistry is highly unusual), but it is surrounded by ephemeral pools of brine that may be good analogs to coastal salt ponds. You could think of this as island biogeography, but inverted; I'm looking for islands of ocean isolated by oceans of land.

The lake itself has a very peculiar mineralized ledge a few inches above the shore. The water has been precipitating an extremely hard material for a very long time. I tried to collect a small amount of it to examine in the lab, and discovered that it is as hard as concrete. Even with the aid of a hammer, I couldn't dislodge any small pieces. I didn't want to damage the ledge itself by taking a larger piece, so I left without any samples of the precipitated material. Borax Lake is sitting atop a thirty foot high pedestal of this stuff.

The Nature Conservancy has been working on some plans to make the site more accessible, but I don't imagine it will get many visitors. It is way off the beaten path. Next time I visit, I'm going to bring a truck. The lake is off of a very lonely state road, up several miles on unpaved, unmarked fire roads, followed by a few miles of ATV tracks. A horse would probably be the ideal way of getting there, but my trusty little Toyota still managed.

This is one of the hot springs just north of Borax Lake. The first record I have of it is from May 1957 by D.E. White of the USGS. It was visited again in June 1973 by Robert Mariner, and again in September 1976 by Robert Mariner and Bill Evans. The next visit was in July 1991 by Robert Mariner. I measured a surface temperature of 65°C. To my surprise, I saw a couple of Borax Lake chub swimming around near the cooler (but not much cooler) periphery.

I took four kinds of samples : Unprocessed water samples in 500ml bottles, unprocessed sediment samples in 50ml conical tubes, processed water samples for environmental DNA in Sterivex filters, and processed sediment samples for environmental DNA using Zymo's Xpedition Soil/Fecal miniprep kits. I divided the unprocessed samples between the freezer and the 37° room, and I'll save my notes on the filtered water samples for another article.

One of the unusual things about the Xpedition miniprep kit is that the first spin column is not a DNA binding column; it's more like a crap-catcher. So, you are supposed to keep the flow-through, not discard it as you would with a DNA binding column. John got a little ahead of himself, and discarded the flow-through from four columns before he realized the protocol was different from, well, just about all of the other DNA extraction mini-preps on the market. Fortunately, I collected many extra samples. Also, when I split the work between John and myself, I split up the samples into evens and odds, so that neither of us would be working on all of one group of replicates.

This led to an important lesson : Do not discard the lysis tubes after you've removed the supernatant. It occurred to me that the wreckage of beads, muck and buffer at the bottom of the spent tubes was probably full of DNA, so I added 500 μL of molecular-grade water, vortexed them, and put them back into the centrifuge at 10,000g for a minute, and spun the supernatant through the orange-capped columns. Two of the four yielded plenty of DNA. I'd probably have gotten more if I'd used lysis buffer instead of water, and the bead-beater instead of the vortexer.

I'm still not totally sure what rationale to apply for the last step. The Xpedition miniprep lets you elute the DNA with anywhere from 10 to 100 μL of buffer. If you use less elution buffer, you get less total DNA, but the DNA you get will be at higher concentration. Elute with more, and you get more DNA, but at lower concentration. The actual amount of DNA can vary over four orders of magnitude, and so guessing right is very helpful. But... impossible. I decided to elute in 50 μL, and that seems to have worked OK for my purposes.
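To make the trade-off concrete, here is a toy model of the elution step: recovery rises with elution volume but saturates, so concentration falls as volume grows. The saturating form and the half-saturation constant are invented for illustration; they are not measured properties of the Xpedition columns.

```python
# Toy model of the elution trade-off: more buffer recovers more total
# DNA (with diminishing returns), but at lower concentration.
# The half-saturation constant k_ul is a made-up illustration.

def elution(total_ng, volume_ul, k_ul=25.0):
    """Return (recovered ng, concentration ng/uL) under a toy
    saturating-recovery model: recovered = total * V / (V + K)."""
    recovered = total_ng * volume_ul / (volume_ul + k_ul)
    return recovered, recovered / volume_ul

# Hypothetical column carrying 1000 ng, eluted at 10, 50, and 100 uL.
for v in (10, 50, 100):
    ng, conc = elution(1000.0, v)
    print(f"{v:>3} uL -> {ng:6.0f} ng recovered at {conc:5.1f} ng/uL")
```

Under this model, a 50 μL elution sits in the middle of the curve, which matches the intuition of hedging against a yield you can't predict in advance.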

I then measured the DNA concentration in a Qubit fluorometer with Invitrogen's Quant-iT high sensitivity assay for dsDNA. Because this requires me to go one-by-one, it is not how I would like to quantify my samples in the future. But, for thirty-seven samples, it was easy enough.

| Sample | μg/mL | Source | Description |
|---|---|---|---|
| 1 | 0.723 | Don Edwards Wildlife Refuge | Salt crystals from site A23 |
| 2 | 0.582 | Don Edwards Wildlife Refuge | Salt crystals from site A23 |
| 3 | 0.531 | Don Edwards Wildlife Refuge | Salt crystals from site A23 |
| 4 | 0.824 | Don Edwards Wildlife Refuge | Salt crystals from site A23 |
| 5 | 0.209 | Don Edwards Wildlife Refuge | Salt crystals from site A23 |
| 6 | 27.8 | Don Edwards Wildlife Refuge | Mat community from site A23 |
| 7 | - | Don Edwards Wildlife Refuge | Mat community from site A23 (field processing failed) |
| 8 | 2.57 | Borax Lake | Sediment (poor collection) |
| 9 | 44.9 | Borax Lake | Sediment |
| 10 | 44.9 | Borax Lake | Sediment |
| 11 | 25.1 | Borax Lake | Sediment |
| 12 | 47.1 | Borax Lake | Sediment |
| 13 | 48.2 | Borax Lake | Sediment |
| 14 | 41.2 | Borax Lake | Sediment |
| 15 | 41.2 | Borax Lake | Sediment |
| 16 | 40.9 | Borax Lake | Sediment |
| 17 | 26.1 | Borax Lake | Sediment |
| 18 | 31.6 | Borax Lake | Sediment |
| 19 | - | Borax Lake | Sediment (lost during extraction) |
| 20 | 81.6 | Borax Lake | Sediment |
| 21 | 35.6 | Borax Lake | Sediment |
| 22 | 35.9 | Borax Lake | Sediment |
| 23 | - | Borax Lake | Mat community from hot spring |
| 24 | - | Borax Lake | Mat community from hot spring |
| 25 | 3.13 | Borax Lake | Mat community from hot spring |
| 26 | 44.0 | Borax Lake | Mat community from hot spring |
| 27 | - | Borax Lake | Mat community from hot spring (salvaged sample) |
| 28 | - | Borax Lake | Mat community from hot spring |
| 29 | 2.76 | Borax Lake | Mat community from hot spring (salvaged sample) |
| 30 | - | Borax Lake | Mat community from hot spring |
| 31 | - | Borax Lake | Mat community from hot spring (salvaged sample) |
| 32 | - | Borax Lake | Mat community from hot spring |
| 33 | 0.72 | Borax Lake | Mineralized mat community from hot spring |
| 34 | 22.8 | Borax Lake | Mineralized mat community from hot spring |
| 35 | 100 | Borax Lake | Mineralized mat community from hot spring |
| 36 | 29.2 | Borax Lake | Mineralized mat community from hot spring |
| 37 | 39.2 | Borax Lake | Mineralized mat community from hot spring (salvaged sample) |

For the 19 samples that had DNA concentrations above about 20 μg/mL, I ran a gel to check the size distribution. It looks like the Zymo miniprep performed about as well as they claimed; most of the fragments seem to be between 5 and 10 kilobases, with a fair amount of DNA in fragments larger than 10 kilobases.

I only need about a picogram of input DNA for each transposase tagmentation library, and I only need fragments bigger than about 3 kilobases. So, this process exceeds my absurdly modest requirements by a lot.

I should mention that Anna-Louise Reysenbach graciously lent me a pH probe to use in the field after mine turned out to be dead as a doornail. Issac Wagner, a postdoc in her lab, spent a couple of hours helping me get their field probe calibrated with my meter. Unfortunately, their probe turned out to be in only somewhat better condition than mine, and Anna-Louise asked that I leave it in Portland rather than risk taking bad data with it. I drove directly from Borax Lake to the UC Davis Genome Center in about seven hours, and immediately took the chemical measurements on our benchtop pH meter. It didn't work out, but I still greatly appreciate the help from Anna-Louise and Issac! (Also, thanks goes to my little sister Anna, who has to take Portland's MAX over to Portland State to return the ailing pH probe.)

## Spoon to Bench: A field DNA processing gadget review

Posted by Russell on September 30, 2011 at 1:24 a.m.
In my previous article, I outlined my plans for sequencing a very large number of metagenomes. Assuming that works, there is also the problem of actually getting the samples in the first place. Aaron Darling likes to begin the story of metagenomics by saying, "It all begins with a spoon..."

So, how do you get the microbes from the spoon to the laboratory?

One of the things I learned from my experience in Kamchatka was just how tricky collecting samples in the field really is. From lining up permissions and paperwork, to dealing with cantankerous Customs officials, to avoiding getting mauled by bears, the trip from the spoon to the bench is fraught with difficulties. If you mess it up, you either don't get to do any science or you'll end up doing science on spoiled samples.

And then there is the DNA extraction. My lab mate Jenna published a paper last year where she created synthetic communities from cultured cells, and then examined how closely metagenomic sequencing reproduced that community. She found that the community representation was heavily skewed, and that the DNA extraction methodology was critically important. Because it was very difficult to know how well the extraction process was going to work on hot spring sediment, Albert Colman's group basically brought every DNA extraction kit they could lay hands on to Kamchatka. Also, they brought a whole lab with them; a 900-watt BioSpec bead beater (that almost killed our generator), a centrifuge, mini-fuge, a brace of pipetters, gloves, tips, tubes, tube racks, and a lab technician to run the show (see my Uzon Day Four post to see a little of that; also, most of the heavy crates in the photos).

Albert, Bo and Sarah really did an excellent job pulling all of this together, but it was hard. Watching them (and helping them where I could) got me to think very carefully about how I want to conduct my field research. One thing is for sure; as much as I respect our BioSpec bead beater, I am not going to carry it into the field. Period. In fact, if I can possibly manage it, I am going to restrict my supplies and equipment to what I can carry in a daypack.

I'm still working on how I will do water sampling, but I think I might have found a solution to sediment sampling at the ASM meeting in New Orleans. Zymo Research just came out with a line of field DNA extraction kits that are intended specifically for field collection. The idea is pretty straight-forward; they combined a DNA stabilization buffer with a cell lysis buffer, and made a portable, battery-operated bead beater to go with it.

It's super cool, but I hemmed and hawed for a few months after ASM. I was a little suspicious of my own judgement; the system includes a cool gadget, and so of course I wanted it. I spent a month reading protocols and tinkering around before I finally decided that if the system works the way Zymo claims, it's just about the best thing for my purposes. What clinched it was re-reading Jenna's paper, which clearly shows the importance of thorough cell disruption.

So, I finally decided that I had to give it a try, and that's what this article is about. If you like, you can think of it as a parody of the tedious gadget reviews on Gizmodo and Engadget, with maybe a dollop or two of Anandtech's penchant for brain-liquefying detail.

I guess this wouldn't be proper gadget review unless I started with a meticulous series of photos documenting the unboxing. So, uh, here are the boxes.

The big one contains the sample processor, and the two smaller ones contain 50 DNA extraction mini-preps each. I'm going to leave the mini-prep kits sealed for now, since I'm going to use them for my field work. Zymo provides two DNA extraction mini-kits with the sample processor, so I'm going to use those to test out the system.

Underneath the documentation (directions are for suckers) and the mini-kits, there is the sample processor, a charging station, a 12 volt lithium ion battery pack, and an international power adapter. They also provide some little disks, which I think are for using with conical tubes (they recommend using skirted tubes, since conical tubes can shatter), and a couple of pairs of earplugs.

The earplugs turned out to be... prescient.

The sample processor itself is a modified Craftsman Hammerhead Auto Hammer. Upside? I can buy extra batteries from Sears! Downside? Seeing the $71.99 price tag from Sears really makes Zymo's $900 price tag hurt. Our super-powerful bench-top BioSpec bead beater is only about twice that.

When I asked, Zymo said that they've actually modified some of the internals of the Craftsman tool, but this might have just been to discourage me from traipsing off to the hardware store to buy some PVC pipe fittings and a hacksaw. Experience tells me, though, that I could easily fritter away $800 worth of time replicating their engineering. OK, $700. It's a really nice international power adapter.

I was a little disappointed to note that the Craftsman part is made in China. Not that I have anything against things being made in China, but I was under the impression that Craftsman was an American brand. It's a little like discovering that a jar of authentic-seeming salsa is made in New Jersey, or something. I'm sure they make perfectly good salsa in New Jersey. Nevertheless, I have a deep-seated belief that salsa should be made in a Southwestern state by grandmothers who each know five hundred thousand unique salsa recipes, and Craftsman tools should be made in Pennsylvania or West Virginia by guys who wear blue overalls and carry their lunches in pails.

OK, so maybe I do have something against everything being manufactured in China. While using the sample processor in the lab, it suddenly made a very loud click that I hadn't heard before. When I looked carefully, I noticed that there was a piece of metal debris caught in the motor vent. It seems to be made out of aluminum (it's not ferromagnetic). My guess is that this is debris from the manufacturing process, not a broken part of the device. I shook out two other smaller pieces, but lost them before I could photograph them. It looks like the three pieces are part of a square. Most likely this is the remains of an improperly handled punch-out, like a metal version of a paper chad. As you can see, it got kicked around inside the motor housing until it was ejected into the vent. I think Craftsman (or their subcontractor) should get the blame for this, rather than Zymo.

Here is the soil/fecal mini-kit. Each prep uses three sets of spin columns. The bead bashing tubes, as they are labeled, are in the upper right, along with two tubes of lysis/stabilization buffer and a tube of elution buffer.

The protocol says to add the sample first, then add 750 μl of lysis/stabilization buffer, and then bead-beat. But... then you would have to bring a p1000 and tips along with you. No thanks. The sample tubes and the beads had better be chemically stable, or they'd wreck everything. So, I aliquoted the buffer into the bead tubes before leaving the lab, and left the p1000 behind. Zymo includes some very fancy spin columns with this kit; they have their own caps, and little nubs on the flowthrough channels that you need to snap off before you use the columns. I've not encountered anything quite like these.

The final step of the kit includes these green-capped columns that are pre-filled with buffer. I wasn't expecting any liquid to be in them, and so of course I spilled the first one on my foot. Don't do that.

So, I took a little miniature field expedition to the exotic environs of the Putah Creek Riparian Reserve to try this out. It didn't take long to find a place that promised to have plenty of microbes.

Here's a soil sample before processing.

I processed some of these samples for 45 seconds (the directions recommend a minimum of 30 seconds). Usually it seems to work fine, but occasionally the tube explodes and splatters mud and buffer all over the inside of the lysis chamber.

The exploding tube problem appears to be caused by grit preventing the threads from closing correctly. In other words, it was my fault. Be extra careful to get the dirt actually inside the tube. Here's what it's supposed to look like.

After processing, the samples are noticeably warm. If you are going to process for much longer than 45 seconds, I suggest you stop and let the sample cool for a few minutes before continuing.

Here are the yields I measured for the mini-kit preps (minus the tube that exploded), eluted into 100 μL of buffer.

| Source | Yield |
|---|---|
| Potted plant | 80.6 μg/mL |
| River muck | 0.669 μg/mL |
| River muck | 1.13 μg/mL |
| River muck | 0.595 μg/mL |
I messed up the extraction protocol a little bit (and I used too much elution buffer at the end), but still got enough DNA to work with. Not too shabby for a first try.

I decided I had to throw these samples and DNA away because I don't actually have permission to use samples collected on UC Davis's campus. That's also why I'm not showing a gel.

## How to sequence 10,000 metagenomes with a 3D printer

Posted by Russell on September 19, 2011 at 1:15 a.m.
For my thesis project, one of the things I would like to do is sequence many different samples, perhaps on the order of several hundred or thousand. It's easy enough to build sequencing libraries these days, at least, with Illumina, anyway. Obviously, doing a couple of hundred lanes of Illumina sequencing would be ridiculous (not even Jonathan Eisen is that nice to his graduate students), and so I'll be using several barcoded samples pooled into each lane. The barcoding chemistry itself was fairly tedious, until people started doing transposon-based library construction.

A transposon is a little piece of DNA that copies itself around inside the genome of an organism, via an enzyme called transposase. Here's what the genetic element looks like :

Transposase binds the element at the inverted repeats on either end, and coils it into a loop. Then it cuts the DNA at the inverted repeats, and the complex floats away. It leaves complementary overhanging ends in the chromosome, which are usually repaired by DNA polymerase and DNA ligase (DNA gets broken surprisingly frequently in the normal workaday life of a cell; that's why DNA repair mechanisms are so important). When it's complexed to DNA, transposase grabs the DNA like this :

The transposase we're using (Tn5) is a homodimer; the two subunits are in dark and light blue. The inverted repeats (red) are bound to the complex at the interfaces between the subunits. The pink loop is the DNA that gets cut and pasted.

The complex then floats around in the cell until the transposase recognizes an integration site somewhere else in the genome. It then cleaves the DNA and inserts the payload into the break. DNA ligase then comes along and fixes the backbones. You can see why this kind of transposon is also called a cut-and-paste transposon.

The reason these are interesting for library construction is that you can prepare a transposon complex where the loop of payload DNA is broken. When the transposon integrates, it pastes in a gap. If you add a lot of transposons that aren't too choosy about their binding sites, they will chop up your target DNA. Fragmentation is one of the steps needed for sequencing library construction. What's nice about transposons is that when you use them to chop up your target DNA, they leave the two halves of their payload stuck onto the ends.

If you stuck your sequencing adapters on there, the fragmentation process also includes adapter ligation. If you added barcodes along with the sequencing adapters, the reaction combines almost all of the library construction into a single digest. Epicentre whimsically named this process "tagmentation." Get it?

However, there's still a fly in this ointment. The distribution of transposon insertions is a function of the relative concentrations of charged transposon complexes to target DNA, and DNA extraction, even from seemingly identical samples, can have highly variable yields. So, it's very important to control the input concentrations and reaction volumes during the digest. This is fairly easy if you're only making a dozen or so libraries, but what if you want to make ten thousand of them?

Measuring DNA concentrations of lots of samples is relatively easy, and there are lots of ways of doing it. We have a plate reader that can do this by fluorescence on titer plates with 1536 wells, or we could (ab)use the qPCR machine to give us DNA concentrations on 384 well titer plates. There are other ways, too.

However you quantify the DNA concentrations, you have to dilute each sample to the desired concentration before you can start the tagmentation process. If you get the concentrations wrong, the library comes out funny.
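The dilution itself is just C₁V₁ = C₂V₂ arithmetic. Here is a minimal sketch; the target concentration and aliquot size are illustrative numbers, not values from the actual protocol.

```python
# C1 * V1 = C2 * V2: how much diluent brings a fixed aliquot of stock
# DNA down to the target concentration for the tagmentation digest?
# Target concentration and aliquot size below are illustrative.

def diluent_volume(stock_ngul, target_ngul, dna_aliquot_ul):
    """Volume (uL) of water/buffer to add to `dna_aliquot_ul` of stock
    so the final concentration equals `target_ngul`."""
    if stock_ngul < target_ngul:
        raise ValueError("stock is already below the target concentration")
    final_volume = stock_ngul * dna_aliquot_ul / target_ngul  # V2 = C1*V1/C2
    return final_volume - dna_aliquot_ul

# A sediment sample measured at 44.9 ng/uL, diluted to a hypothetical
# 2.5 ng/uL target using a 5 uL aliquot.
print(diluent_volume(44.9, 2.5, 5.0))
```

Each sample needs a different diluent volume, which is exactly the per-well customization the printed plates provide.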

A few dozen library constructions call for hours of tedious work at the bench. I've gotten better at wetlab stuff since my first rotation, and the transposon-based library construction helps a lot, but staking my Ph.D. on reliably powering through lots of molecular biology would be a bad idea. Some people might not blink an eye at this, but as soon as I find myself repeating something four or five times, my computer science upbringing starts whispering *there has got to be a better way* in my ear. And lo, there is indeed a better way.

Hundreds or thousands of library constructions would call for a robotic liquid handling machine. I spent some time researching these things, and I'm not impressed. The hardware is nice, but programming the protocols involves wading into a morass of crumbling, poorly maintained closed source software, expensive vendor support contracts, and a lot of debugging and down-time. Oh, and they're terrifyingly expensive, and can be kind of dangerous.

Dispensing water into titer plates doesn't seem like a very challenging robotics application, so I thought about building my own robot. It would probably be about the same amount of work as ordering, programming and debugging one of the commercial robots, and it would be more fun.

But, robots are just such a mainframe-ish solution. If there is one thing my dad taught me, it's that a lot of little machines working in concert will beat the stuffing out of a single big machine. The trick is figuring out how to organize and coordinate lots of little machines. The key to this problem is to do lots and lots of little reactions in parallel; the coordination requires lots of precise dilutions simultaneously. Getting this part right would crack the whole thing wide open, allowing you to easily do more reactions than you probably even want.

So. I'm going to make my own custom microtiter plates, just for the dilution. This satisfies the coordination criteria, and allows me to treat a plate-load of reactions identically. If each well has the right volume for the dilution, I can just fill all the wells up to the top, pipette in the same volume of raw DNA with a multichannel pipetter, let the DNA mix a little, and all the wells will be at equal concentration. Then I pipette that into the tagmentation reaction, and I'm done. With a good multichannel pipetter, I can do 384 reactions about as easily as I could do one.

All that's necessary is a 3D printer, and the ability to procedurally generate CAD/CAM files from the measured DNA concentrations. As it happens, this is really easy, thanks to a little Python library called SolidPython :
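SolidPython renders Python objects into OpenSCAD source; since OpenSCAD source is just text, the idea can be sketched without the library. This dependency-free sketch emits OpenSCAD directly, sizing each cylindrical well's depth to hold that sample's diluent volume. The well radius is an assumption for illustration; the 9 mm pitch is the standard 96-well spacing.

```python
import math
import random

# Sketch of the plate-generation idea: one cylindrical cavity per well,
# with each well's depth chosen so it holds exactly the diluent volume
# that sample needs. Emits OpenSCAD text directly (SolidPython does the
# same thing from Python objects). Radius is an assumed value.

WELL_RADIUS_MM = 3.0
PITCH_MM = 9.0  # standard 96-well spacing

def well_depth(volume_ul, radius_mm=WELL_RADIUS_MM):
    """Depth (mm) of a cylindrical well holding `volume_ul` (1 uL = 1 mm^3)."""
    return volume_ul / (math.pi * radius_mm ** 2)

def plate_scad(volumes):
    """`volumes` maps (row, col) -> uL; returns OpenSCAD source for a
    solid plate body with the wells subtracted from its top face."""
    thickness = max(well_depth(v) for v in volumes.values()) + 2.0
    wells = "\n".join(
        f"  translate([{(c + 0.5) * PITCH_MM}, {(r + 0.5) * PITCH_MM}, "
        f"{thickness - well_depth(v):.3f}]) "
        f"cylinder(h={well_depth(v) + 0.01:.3f}, r={WELL_RADIUS_MM});"
        for (r, c), v in sorted(volumes.items()))
    return (f"difference() {{\n"
            f"  cube([{12 * PITCH_MM}, {8 * PITCH_MM}, {thickness:.3f}]);\n"
            f"{wells}\n}}\n")

# A 96-well plate with randomly chosen volumes per well.
random.seed(0)
vols = {(r, c): random.uniform(20.0, 150.0)
        for r in range(8) for c in range(12)}
print(plate_scad(vols).splitlines()[0])
```

The resulting `.scad` file goes straight into OpenSCAD, which exports the STL that the printer's toolchain consumes.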

These are the wells of a 96-well plate with randomly chosen volumes for each well.

One of the things I'm worried about is contamination. 3D printers are not really designed for making sterile parts. So, what I've done here is design a mold, and I'm going to cast the plate itself in PDMS silicone elastomer. PDMS is easy to cast, and it has the nice property of being extremely durable once it's set. And, even better, when exposed to UV, the surface depolymerizes and turns into, essentially, ordinary glass. I can autoclave the heck out of it, blast it with UV, and indulge in all manner of molecular paranoia.

If I can figure out a way to reliably sterilize thermoplastic, I'll skip the business with the PDMS casting, and simply print microtiter plates directly, like this :

By the way, I used the dimensions of a Corning round-bottom 96 well microplate. You can download the model from my account on Thingiverse.

So, I ordered a personal 3D printer. It looks like the hottest Open Source personal 3D printer right now, and the only one with a build volume larger than a titer plate, is the Ultimaker. I'd have really liked to have gone with MakerBot Industries' Thing-o-Matic, but the build volume is just a skosh too small. Come on, guys! Just a few more millimeters? Please?

Unfortunately, the Ultimaker has a four to six week lead time, so I have to wait for a while before ours arrives. At the suggestion of Ian Holmes, I headed off to Noisebridge, a hackerspace in San Francisco's Mission District where they have a couple of 3D printers available for people to use. The machines are Cupcake CNCs, MakerBot's first kit. The ones at Noisebridge are... well, let's just say they are well-loved. The one I used had to be re-calibrated before it would go. 3D printers are pretty straightforward machines when it comes down to it, so it only took me a couple of minutes of poking around at it to figure out how to make the right adjustments. Then, it worked like a charm!

As you can see, I was a bit conservative about the design, since I wasn't sure how good the print quality would be (especially after my cack-handed ministrations).

I'm experimenting with PDMS casting now, but I'm going to try some tests to see how thoroughly I can clean thermoplastic with UV. I'd really like to just order up a nice 384 well plate, and get right to it!

Anyway, I need to thank (or perhaps blame) Aaron Darling for getting me interested in transposon-based library construction, and for pointing out their significance to me.

## New Equipment Thursday

Posted by Russell on August 25, 2011 at 7:40 p.m.
My vacuum desiccator arrived today, and so naturally I put it to productive use. You know. For science.

Haw! This thing is cool.

## Sneak Pique

Posted by Russell on July 13, 2011 at 3:41 a.m.
I'm about to release a new piece of Open Source software; it's a fast, fully automated and very accurate analysis package for doing ChIP-seq with bacteria and archaea. I'm doing my best to avoid the sundry annoyances of bioinformatics software; it uses standard, widely used file formats; it generates error messages that might actually help the user figure out what's wrong; and I've designed the internals to be easily hackable.

I have two problems, though.

Most of the people who will be interested in using this software are microbiologists and systems biologists, not computer people. At the moment, the software is a python package that depends on scipy, numpy and pysam. I used setuptools, wrote tests with nose, and hosted it on github. If I were going to distribute it to experienced Linux users, it's basically done. However, installing these dependencies on Windows and MacOS is a showstopper for most people -- scipy in particular. So, how would you suggest distributing it to MacOS and Windows users?
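For the Linux side, the packaging described above boils down to a short `setup.py`. This is a minimal sketch of that kind of file; the package name, version, module path, and entry point are placeholders, not the project's real layout.

```python
# Minimal setuptools configuration of the kind described above.
# Package name, version, and the pique.cli:main entry point are
# placeholder assumptions, not the real project's layout.
from setuptools import setup, find_packages

setup(
    name="pique",
    version="0.1",
    packages=find_packages(),
    install_requires=["numpy", "scipy", "pysam"],
    entry_points={
        # Gives users a `pique` command on their PATH after install.
        "console_scripts": ["pique = pique.cli:main"],
    },
)
```

The catch is exactly the one described: `install_requires` makes `easy_install`/`pip` try to build scipy from source on machines without compilers, which is where Windows and MacOS users get stuck.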

The second problem is... it's kind of ugly. I wrote the GUI in Tk, which is not particularly great in the looks department. Should I bother creating native toolkit GUIs? Would that make people significantly more comfortable? Or would it be a waste of time?

## PLoS TWO : A new open access journal

Posted by Russell on June 14, 2011 at 9 p.m.
People surely suffer worse injustices than the denial of instantaneous access to the latest scientific research. Nevertheless, this particular habit lacks a feature common to most unjust practices; that one party benefits at another's expense. The restriction of access to basic research hurts absolutely everyone. It's throwing sand in the gears of the engine of progress. We can debate about whether it is more apt to call it a spoonful or a truckload, but the damage is clear.

One might suppose that publishers benefit from the system of closed-access journals, but this cynical view exaggerates the importance of the relatively paltry sums of cash involved and disparages the value of scientific and technological progress. Publishers cannot cure their cancers by wallowing in hoarded journal papers, but posting the hoard on the internet stands a fighting chance.

So, I'm exceedingly glad that Nature Publishing Group has elected to step into the ring on the side of that fighting chance by launching Scientific Reports. Many have noted that in many important respects, Scientific Reports is a clone of PLoS ONE.

It has not escaped our notice that as recently as 2008, Nature's Declan Butler was sneering at PLoS for the practice of "bulk, cheap publishing of lower quality papers." I'm sure the powers that be at Nature were expecting quite a few wry smiles at news of the launch.

Nevertheless, it's one thing to do the right thing when you're filled with righteous light. It's another thing altogether when it makes you look a bit silly. I'm as guilty as anyone for poking fun at Nature; as soon as I heard the news back in January, I gleefully registered PLoSTWO.org, and have spent the last couple of months plotting elaborate satire.

Now that they've launched, I must admit that the inaugural papers look pretty interesting. You'd better get your jokes in while they're still funny. This is not a case of Nature versus PLoS. In this fight, it's Nature and PLoS in one corner of the ring, and ignorance in the other. It is good news for everyone that Nature has learned from its comrade-in-arms how to throw a better uppercut.

Update July 1, 2011 : While they evidently appreciated the humor, PLoS regretfully asked me to take down PLoSTWO.org. Instead of just letting it go dark, I'm going to (voluntarily!) transfer it over to them once they think of something to do with it. For now, it'll point at the PLoS ONE About page.

## Questions of microbial ecology

Posted by Russell on April 27, 2011 at 10:50 p.m.
When the first environmental sequencing projects were conducted, the genetic breadth present within an environmental sample so far outstripped the available sequencing capacity at the time that it was only possible to obtain a tiny slice of the genetic material present. This gave researchers two choices; either target a particular gene, or go fishing. Both approaches have been extremely fruitful. Targeted studies of ribosomal RNA led to the discovery of the archaea, among other important accomplishments. The "fishing" approach (which has a shorter history) has also led to exciting discoveries. If you do a literature search for your favorite enzyme with the word "novel," it's quite likely that most of the recent publications will involve some kind of metagenomic survey.

As the cost of sequencing continues to plummet, a third approach to environmental sequencing has suddenly become possible: Exhaustive sequencing. It should be possible not only to survey the entire genomes of the organisms present (although assembling them is another story), but also to survey the population-level variability of the organisms present. This is a rather unprecedented development. Microbial communities have suddenly gone from being among the most challenging ecologies to study, with only a handful of observable characters, to offering a spectacularly detailed quantitative picture.

Here is an example from one of my datasets :

This is a small region in the genome of Roseiflexus castenholzii. I have mapped reads from an environmental sample to the reference genome, yielding an average coverage of about 190x. If you look closely at the column in the middle (position 12519 in the genome, in case you care), you can see some clear evidence of a single nucleotide polymorphism in this population of this organism.

As it happens, this coordinate falls in what appears to be an intergenic region, between a phospholipid/glycerol acyltransferase gene on the forward strand to the left and a glycosyl transferase gene on the reverse strand to the right. The two versions appear with roughly equal frequency in the data. For this organism, I've found single nucleotide polymorphisms at thousands of sites. There are also insertions and deletions, and probably rearrangements.
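The calculation behind a call like this is simple enough to sketch. Here is a toy, self-contained version of the idea (all the names and the toy reads below are mine, invented for illustration -- my actual pipeline works against real mapped reads): count the bases that aligned reads place at a reference column, and flag the site if the minor allele is common enough.

```python
from collections import Counter

def column_alleles(reads, column):
    """Count the bases that aligned reads place at a reference column.

    `reads` is a list of (start, sequence) pairs, with `start` giving the
    0-based reference coordinate of the read's first base.
    """
    counts = Counter()
    for start, seq in reads:
        offset = column - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts

def is_polymorphic(counts, min_freq=0.1):
    """Call a site polymorphic if the second most common base exceeds min_freq."""
    total = sum(counts.values())
    ranked = counts.most_common()
    return len(ranked) > 1 and ranked[1][1] / total >= min_freq

# Toy pileup: five reads overlapping reference column 7, split between G and A.
reads = [
    (0, "ACGTACGG"),   # places G at column 7
    (4, "ACGGAC"),     # places G at column 7
    (5, "CGGACT"),     # places G at column 7
    (2, "GTACGAGA"),   # places A at column 7
    (6, "GACTAA"),     # places A at column 7
]
counts = column_alleles(reads, 7)   # Counter({'G': 3, 'A': 2})
```

A real SNP caller would of course also weigh base qualities and mapping qualities before believing a column like this.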

In this ecosystem, I'm able to get between 50x and 300x coverage for almost every taxon present. This should make it possible to see variants that make up only a percent or two of their respective taxon's population. With data like this, it should be possible to do some really beautiful ecology!

For example, suppose one wanted to see if a community obeys the island biogeography model. One could measure the theory's three parameters, immigration, emigration and extinction, by comparing the arrivals and disappearances of variants between the "mainland" and the "island" over time. The ability to examine variants within taxa should make these measurements very sensitive. Additionally, because these are genomic characters, it should be possible to control for the effects of selection (to some extent) by leveraging our knowledge of their genomic context. The 12519th nucleotide of the R. castenholzii genome is perhaps a good example of a character that is unlikely to be under selection because it happens to sit downstream from both flanking genes.1

So, here is my question to you : What ecological model or process would you be most excited to see studied in this way?

1 Well, actually I haven't looked at this site in detail, so I'm not sure if one would or wouldn't reasonably expect it to be under selection. My hunch is that it is less likely to be under stringent selection than most other sites. I'm basing this hunch on eyeballing the distance of this locus from where I think RNA polymerase would be ejected on either side, and that both transcripts terminate into its neighborhood. My point is that it should be possible to have some idea of how selection might operate on a particular locus based on its genomic context. One should take this with the usual grain of salt that accompanies inferences drawn solely from models. A better example would be a polymorphism among synonymous codons, but I wasn't able to find one in a hurry.

## Bioengineering side project

Posted by Russell on February 22, 2011 at 5:10 p.m.
I've been working on a little bioengineering side project, and I just finished putting together a working version of the firmware. It'll probably take some refinement, but I've managed to get the microcontroller to do what I need it to do -- measure visible light irradiance over a wide range of intensities.

This is the light intensity in microwatts per square centimeter measured at about a 0.3 second resolution. I haven't done any of the actual bio- part of the bioengineering, so for the moment the light curve is the beginnings of a sunset at Mishka's cafe.

## A sequencer of our own

Posted by Russell on January 27, 2011 at 4:12 p.m.
We just finished running our new GS Jr. gene sequencer for the first time. It produced 115,698 shotgun reads of our E. coli. Here is the read length histogram :

And the GC content histogram :

This was our first time going through the shotgun library protocol, which is pretty involved. For example, we're going to have to be more careful next time when we load the picotiter plate. We got a few bubbles trapped in there. It's kind of funny how obvious the bubbles are in the raw fluorescence images (this is an A, around cycle 200) :

I've uploaded the FASTA file and the qual file, in case you want to try to assemble your own E. coli genome.
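If you grab the FASTA file, the histograms are easy to recompute yourself. Here's a minimal sketch (the function names are my own throwaways, and a tiny inline FASTA stands in for the real 115,698-read file) of tallying read lengths and GC content:

```python
from collections import Counter

def parse_fasta(text):
    """Yield (header, sequence) records from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Stand-in for the real reads; swap in open('reads.fa').read() for actual data.
fasta = """>read1
ACGTACGTGG
>read2
ATATAT
>read3
GGGCCC"""

lengths = Counter(len(seq) for _, seq in parse_fasta(fasta))
gcs = [gc_fraction(seq) for _, seq in parse_fasta(fasta)]
```

Binning `gcs` into percent bins gives the GC histogram; `lengths` is the read-length histogram directly.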

## Needed : Django hacker

Posted by Russell on December 23, 2010 at 6:41 p.m.
So, back in 2008, I ported Vort.org from Rails to Django. It was about one evening of hacking, and since then I haven't touched the code, nor have I upgraded to Django 1.0. All I've done is patch security holes. In the meantime, a whole new comment framework was introduced, the one I'm using has been obsoleted, and a whole bunch of other changes have happened.

I've been much too busy to maintain or upgrade the code, and it wasn't very good to begin with. So, I'd like to hire someone who actually has some professional experience with Django to institute a do-over. I can handle importing all my articles and comments myself, even if the DB model is a little different. If you can do some nice graphic design, I'll happily pay extra for that.

As it happens, our lab is looking to hire a full-time web developer for a very awesome new project. If I'm impressed with the way you handle this simple little project, you can consider it a job interview for the full time position in our lab. And honestly, how often do you get paid for a job interview?

Email me your resume and a portfolio if you are interested.

## Engadget thinks everything is an iPod dock

Posted by Russell on December 23, 2010 at 5:43 p.m.
Oh Engadget, how trite art thou? Leave it to them to get excited about the iPod dock on a machine that will be the least expensive and fastest DNA sequencer yet built. The iPod dock is one of several options the Ion Torrent machine has for attaching external mass storage. Why is it there? Because if you're going to pay fifty grand for a sequencer, it's just good business to spring for the two dollars' worth of parts if it lets potential customers use whatever mass storage options they have available.

Harf. Leave it to Engadget to write an article that is actually less interesting than the press release.

## Fun with de Bruijn graphs

Posted by Russell on October 29, 2010 at 4:34 a.m.
One of the projects I'm working on right now involves searching for better approaches to assembling short-read metagenomic data. Many of the popular short read assembly algorithms rely on a mathematical object called a de Bruijn graph. I wanted to play around with these things without having to rummage around in the guts of a real assembler. Real assemblers have to be designed with speed and memory conservation in mind -- or, at least they ought to be. So, I decided to write my own. My implementation is written in pure Python, so it's probably not going to win any points for speed (I may add some optimization later). However, it is pretty useful if all you want is to tinker around with de Bruijn graphs.

Anyway, here is the de Bruijn graph for the sequence gggctagcgtttaagttcga projected into 4-mer space :

This is the de Bruijn graph in 32-mer space for a longer sequence (it happens to be a 16S rRNA sequence for a newly discovered, soon-to-be-announced species of Archaea).

It looks like a big scribble because it's folded up to fit into the viewing box. Topologically, it's actually just two long strands; one for the forward sequence, and one for its reverse complement. There are only four termini, and if you follow them around the scribble, you won't find any branching.
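For the curious, the core of the construction fits in a few lines. This is a stripped-down sketch of the idea, not my actual implementation -- forward strand only, whereas the plots above also include the reverse complement:

```python
def debruijn(seq, k):
    """Build a de Bruijn graph of `seq` in k-mer space.

    Nodes are the k-mers of the sequence; a directed edge joins each k-mer
    to the next one (the two overlap by k-1 bases).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = set(kmers)
    edges = set(zip(kmers, kmers[1:]))
    return nodes, edges

# The 20-base example sequence from this post, projected into 4-mer space.
nodes, edges = debruijn("gggctagcgtttaagttcga", 4)
```

For a sequence with no repeated k-mers, like this one, the graph is a single unbranched path, which is why the pictures above are just long strands.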

## raygun : a simple NCBI BLAST wrapper for Python

Posted by Russell on October 25, 2010 at 11:42 a.m.
Things have been a little quiet on Vort.org for the last couple of weeks, but a lot of frantic activity has been going on behind the peaceful lack of blog updates. When I returned from Kamchatka, Jonathan had a little present for me -- he took the DNA from the 2005 Uzon field season for Arkashin and Zavarzin hotsprings, and ran a whole lane of paired-end sequencing on one of our Illumina machines. Charlie made some really beautiful libraries, and the data is really, really excellent. For the last couple of weeks, I've been trying to do justice to it.

I'm not quite ready to talk about what I've been finding, but I thought I would take a moment to share a little tool I wrote along the way. It's made my life a lot easier, and maybe other people could get some use out of it.

It's called raygun, a very simple Python interface for running local NCBI BLAST queries. You initialize a RayGun object with a FASTA file containing your target sequences, and then you can query it with strings or other FASTA files. It parses the BLAST output into a list of dictionary objects, so that you can get right to work.

It doesn't take a lot of scripting chops to do this without an interface, of course, and there are other Python tools for running BLAST queries. The advantage of raygun over either the DIY approach or the BioPython approach is that raygun is extremely simple to use. I wanted something that would basically be point-and-shoot :

import raygun
import cleverness

rg = raygun.RayGun( 'ZOMG_DNA_OMG_OMG.fa' )

hits = rg.blastfile( 'very_clever_query.fa' )

results = []
for hit in hits :
    results.append( cleverness.good_idea( hit[ 'subject' ] ) )

cleverness.output_phd_thesis( results )

Unfortunately, you must furnish your own implementation of the cleverness module.

I designed raygun with interactive use in mind, particularly with ipython (by the way, if you do a lot of work in python and you're not using ipython, you're being silly). The code is available on github.

## My talk at the Thermophiles Workshop

Posted by Russell on August 25, 2010 at 2:34 p.m.
When I registered for the workshop, the organizers asked me to give a 20 minute talk on my topic of research. I only just finished my coursework in May, so I didn't have a great deal of work to present (at least not in microbiology...).

So, I submitted an abstract titled, "Classification of environmental sequence data using multiple sources of inference." This project is a collaboration with Andrey Kislyuk, who has just graduated from Georgia Tech, supervised by Joshua Weitz. It's a pretty cool project, but Andrey has moved on to Pacific Biosciences, so things haven't moved as quickly as I would have liked.

After the first day of talks, I started to get pretty nervous; I thought I would have some downtime during the field expedition to work on my slides. Downtime when Frank Robb, Albert Colman and Anna Perevalova are around? Ha! If I'd met them before walking off the airplane in Petropavlovsk, I would have known how ridiculous an idea that was.

To make matters worse, the organizers had to shift the schedule forward by a day because weather delayed the excursion to Uzon (which I was not planning to join, since I'd just spent a week there). Thus, I found myself in the position of giving an unfinished talk about an unfinished project. Worse, I was going to stand up and talk about probability theory and Bayesian priors to a roomful of people who ride submarines into underwater volcanoes and discover whole new branches of Earthly life. Worse still, I had to follow Frank Robb's talk about isolating and sequencing organisms that grow on syngas, which he had to cut short because there was just too much awesome for one talk to hold.

To my surprise, I managed to finish the slides during lunch and the coffee break. Also to my surprise, I got a lot of really great questions, and lots of people seemed weirdly excited about the idea of using more than one mathematical technique for sifting through metagenomic data.

I've recently started working on one such analysis (a different project altogether), and I'm gaining an appreciation for just how difficult it is. Perhaps the interest in my talk has more to do with the fact that people in the field really, really want better tools, and there's a lot of enthusiasm for anything that looks halfway promising.

Also, I have to give a big thumbs up to the Russians (and other folks) who gave their talks in English. I once had to give a brief talk on physics in Japanese, and it was one of the most difficult, stressful experiences of my life. It was only five minutes, and I was aided by the fact that Japanese borrows many technical and scientific terms from English. It's not really fair that English is the de facto international language, but I'm really, really glad it is.

## Thermophiles workshop overview

Posted by Russell on August 23, 2010 at 12:27 p.m.
After landing at the airport, we crammed our equipment and ourselves into a taxi-van, and returned to the little apartment we stayed in before leaving for Uzon. There's a washing machine there, so we ran as many loads of laundry as possible.

Back at the apartment in Petropavlovsk, we tried (and mostly failed) to get the smell of hydrogen sulphide off of us.

The next day, we piled into another taxi-van and rode to the Flamingo Hotel, where the workshop will start tomorrow.

Update : Below is a summary of my favorite talks at the workshop that I wrote on the flight back to California.

There have been a number of really exciting talks here at the workshop, and I can't summarize all of them. So, here are a few talks that have kept me thinking.

### Sergey Varfolomeev : The youngest natural oil on Earth

Carbon-14 dating indicates that Uzon contains petroleum-like oil that is less than 50 years old. Very similar compounds were obtained by low-temperature pyrolysis of cyanobacteria and microalgae isolated in the vicinity of the hydrocarbon sample sites.

### Albert Colman : Chemistry and geobiology of life in hot carbon monoxide

One of the key events in the establishment of our existing ecology was the development of an oxygen-rich atmosphere. This process occurred in several stages, and one of the key stages marked the end of the Archean eon. Archean ecosystems are thought to have included oxygen-producing organisms, but during the Archean eon there were enough free reducing compounds in the atmosphere, ocean and soil to consume all the oxygen they produced. The Archean eon ended when these chemical oxygen sinks were finally overwhelmed, and oxygen started to build up in the atmosphere. In order to understand how and why we have an oxygen-rich atmosphere, it is important to understand how the Earth's atmosphere worked during this period.

Albert and his group are studying the role of carbon monoxide in the Archean atmosphere. There are a variety of organisms that exist today (particularly in volcanic environments like Uzon) that grow on carbon monoxide, and for this reason, the biosphere is usually treated as a sink for carbon monoxide. However, there are also organisms that produce carbon monoxide as a waste product, and so the coupling of atmospheric carbon monoxide to the biosphere in Archean climate models needs to treat the biosphere as a source and a sink to properly capture the dynamics.

I find all of this to be fascinating. It's very important that we get a handle on this stuff; mankind has been conducting a huge, uncontrolled experiment with the Earth's atmosphere since around 1820. Learning about other such "experiments" in Earth's history (in the Archean, by microbes rather than humans) is pretty important.

### Evengy Nikolaev : Mass spectrometry

I had no idea there were so many kinds of mass spectrometers! I guess that's what I get for my background in theoretical physics. My inclination is to write

$f_c=\frac{Bq}{2\pi m}$

and call it a day. Mass spectrometry, to me at least, has always meant this :

Schematic of a basic mass spectrometer.

If you stick some ions in a constant magnetic field, their orbital frequencies will depend only on their mass and charge. So, you just aim your beam of ions through a magnet, and all your ions will segregate out like colors in a rainbow. Done. High school physics, right? Wrong!
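To put a number on that formula: here is the whole "high school physics" version of mass spectrometry in a few lines of Python (constant values are the standard CODATA figures; the function name is my own).

```python
import math

# Physical constants (CODATA values).
PROTON_MASS = 1.67262192e-27         # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def cyclotron_frequency(charge, mass, field):
    """Orbital frequency f_c = qB / (2 pi m) of an ion in a magnetic field."""
    return charge * field / (2 * math.pi * mass)

# A proton in a 1 tesla field orbits at about 15.25 MHz; heavier ions
# orbit proportionally slower, which is what separates the "rainbow."
f_c = cyclotron_frequency(ELEMENTARY_CHARGE, PROTON_MASS, 1.0)
```

Because the frequency depends only on the charge-to-mass ratio and the field, measuring it pins down m/q directly.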

Evengy's talk was like looking up a recipe for pancakes and discovering that there are breakfast, lunch, tea, and dinner pancakes; that they can be made from fifty different grains and pulses; and that there are pancake recipes suitable for every occasion ranging from a quick bite while driving to work in the morning to the main course of a king's coronation. That's a lot of mass spectrometry!

### Juergen Wiegel : Interspecies heterogeneity and biogeography of Thermoanaerobacter uzonensis

I'm really interested in biogeography generally, and so I was waiting for this talk. The Baas-Becking hypothesis that "everything is everywhere, but the environment selects" has been one of the key ideas in microbiology. As gene sequencing has gotten more powerful, it has been possible to test this hypothesis with increasing confidence. Juergen presented some findings that take another step toward disproving the hypothesis and establishing the importance of locality in evolution.

Basically, his group at the University of Georgia obtained 16S small subunit rRNA sequences from Thermoanaerobacter uzonensis isolates collected in different spots in Kamchatka. The collection sites ranged from a few meters apart to about 300 kilometers. It was found that divergence among the sequences correlated positively with geographic distance.

The environment does indeed select, but the Baas-Becking hypothesis only holds for fuzzy definitions of "everything" and "everywhere."
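As a toy illustration of the kind of measurement involved (the isolate data below is entirely made up by me, not from Juergen's study), you can compute pairwise sequence divergence and correlate it with geographic distance:

```python
import math

def divergence(a, b):
    """Fraction of mismatched positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical isolates: (km from a reference site, aligned marker sequence).
isolates = [
    (0.01,  "ACGTACGTAC"),   # the reference isolate
    (0.5,   "ACGTACGTAC"),
    (30.0,  "ACGTTCGTAC"),
    (300.0, "ACGTTCGAAC"),
]
_, ref_seq = isolates[0]
distances = [d for d, _ in isolates[1:]]
divergences = [divergence(ref_seq, s) for _, s in isolates[1:]]
r = pearson_r(distances, divergences)   # strongly positive for this toy data
```

A real analysis would use all pairwise comparisons and a proper alignment, but the signal being tested is exactly this: does r come out significantly positive?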

### Anna Perevalova : Novel thermophilic archaea of order Fervidicoccales - diversity, distribution and metabolism

I had been bugging Anna during the field expedition to tell me more about Fervidococcus fontis, which she discovered. F. fontis grows between 55C and 85C, which is an unusually wide range. The genome has recently been sequenced, and she presented some of the preliminary results from the annotation.

I still find it mysterious how one sets out to find new species (in this case, a new genus). Anna works in Elizaveta Bonch-Osmolovskaya's lab at the Winogradsky Institute of Microbiology, where they used a technique I'd never heard of called Denaturing Gradient Gel Electrophoresis and a myriad of selective media cultures to coax this organism out of the woodwork. Pretty hard-core, if you ask me.

### Sergey Gavrilov : Electrochemical potential and microbial community composition of bioelectrochemical systems employed in situ in hotsprings of Uzon Caldera

This is a pretty awesome idea. Microbial fuel cells exploit the fact that cellular metabolism requires the transport of electrons outside the cell to deposit on acceptor substances, and couple this process to an electrical circuit. Sergey discussed a modification of this idea called a sediment microbial fuel cell; instead of growing his microbes in the lab, he carried his cathode and anode out into the field and stuck them into a sedimentary formation in the environment.

The awesome part of this study is that Sergey isn't just looking for high power output. He's using the fuel cell to select for current-producing organisms from a diverse community, and then studying those organisms. After letting his circuits run for ten days, he found biofilms growing on the electrodes that had very different community structure from the controls (same setup, but with an open circuit). It's basically an enrichment culture that enriches for microbes that like to make electricity.

### David Bernick : New discoveries in the hyperthermophilic genus Pyrobaculum enabled by deep RNA and genome sequencing

It's interesting to see how much fine structure can be found when an organism is sequenced deeply enough to capture it. David's team is using massive Illumina sequencing to do something like the Hubble Deep Field for an archaeal genome and its small RNA. They also sequenced a new member of the genus, P. oguniense, and discovered therein a new virus and a number of cool virus-related genomic features in the host.

### Frank Robb : Lessons learned from sequencing carboxydotrophic bacteria and the race to discover hyperthermophilic cellulases

Frank was the only person at the workshop to give two talks, and they were both pretty cool. The first talk summarized results presented in a paper amusingly titled ‘That which does not kill us only makes us stronger’: the role of carbon monoxide in thermophilic microbial consortia. This work covered a lot of ground, including some compelling evidence for archaea-to-bacteria lateral gene transfer of chaperonins, as well as results showing rapid accumulation of frameshift mutations when C. hydrogenoformans is grown under syngas, which allow it to grow rapidly by fixing the carbon monoxide. Syngas is also known as wood gas, a simple intermediate for converting a variety of biomass feedstocks into usable fuel. If one wanted to obtain pure hydrogen gas from syngas, an organism that can eat the carbon monoxide could be handy.

The second talk presented some really interesting work in which a consortium of one cultured and two novel archaea was isolated from a thermal spring in Nevada that was able to grow on filter paper at 90C. A cellulase capable of degrading crystalline cellulose into reducing sugars at 100C was isolated, and the genes responsible were cloned and expressed in E. coli.

This is also pretty exciting for the biofuels people. One of the problems with moderate-temperature cellulases is that it's impossible to keep a huge vat of wet, ground up plants sterile. As soon as cellulase activity starts putting simple sugars into solution, something will start to eat the sugars. However, if you conduct the process at pasteurization temperatures, then you just have to worry about contamination by hyperthermophiles. So, as long as you keep people like Frank Robb and Karl Stetter from dropping their used lab equipment into your processing vat, you should get a nice yield of sugars from the cellulose without having it all eaten up by pesky yeasts and suchlike.

## Uzon, Day Seven

Posted by Russell on August 22, 2010 at 8 a.m.
This post is for August 12th, 2010

Sarah and Albert managed to finish the DNA extractions last night, much to everyone's relief. Early that morning, we were visited by another bear, which we caught on video this time.

As we started packing up our gear, we got word that the helicopter would be arriving to pick us up around mid-morning, rather than mid-afternoon as we had expected. A furious scramble to pack everything up began, and Frank set off into the field alone to retrieve enrichment cultures that we hadn't collected yesterday.

I tried my best to stay out of the way, since I had already mostly packed up the day before (no great accomplishment -- I didn't bring much in the first place). All of my samples were already safely stowed with Albert's. In retrospect, I should have gone with Frank to help him, but he vanished almost the instant we heard the thwak-thwak of the helicopter coming over the caldera wall.

Bo He carrying equipment to the helicopter flight from Uzon.

Just as we finished packing the helicopter, Frank came charging over the ridge from Central Thermal Field carrying all the enrichment samples he could find.

Packing the helicopter.

Amazingly, we only left a few things behind. A small digital camera, my toothbrush, and a few enrichment samples that sank too deep into one of the springs. Later on, Albert was able to retrieve the samples and the camera by joining the workshop excursion. He did not retrieve my toothbrush.

Karymsky volcano erupted again on our flight back. A most majestic farewell.

On our flight back, Karymsky volcano erupted again, just as we flew past. It was a majestic farewell indeed.

I really, really regretted having to leave Uzon. It was a privilege and an honor to have gone, and to have gone with such company. In the next weeks and months I will have to work very hard; perhaps a big enough scientific payoff might justify a return trip. I certainly hope so!

## Uzon, Day Six

Posted by Russell on August 20, 2010 at 4:44 p.m.
This post is for August 11th, 2010

Russia is working hard to rein in the chaos that followed the end of the Soviet Union, and Kronotsky National Biosphere Park is no exception. Restrictions on hunting and fishing that were once widely ignored or impossible to implement are now being enforced. The rules are not exactly settled, but it is clear that the park administration is serious about protecting the wild state of the preserve. This is a Very Good Thing.

In 2005, Frank joined an expedition to Uzon led by Juergen Wiegel; this was before the research station was built, and so they flew in several large tents packed in crates. The crates could be unfolded to form a platform for the tents. When they broke camp, they left the crates behind. If the park administration is going to be serious about protecting the natural state of the caldera, Frank and Albert thought it would be a good idea to do our part too. So, we spent the morning breaking down the crates at the 2004 camp. We then hauled the disassembled crates to the research station (new since 2004), and arranged them in neat stacks. The rangers will find some use for the wood now that it's within easy reach, I'm sure.

When we arrived, the crates from the old camp were piled up in the middle of the camp. I'm not sure exactly how long the crates were splayed over the ground at the old site (they were designed to form a platform for the tents) before they were piled up there, but I find it interesting that the footprint of the old camp is still clearly visible. The plants are still in the process of recolonizing the space. There can be no more explicit evidence that Uzon's ecology is indeed fragile. The lush meadows I wrote about yesterday would probably take decades or centuries to form if they had to start over from scratch. I'm sorry I don't have any pictures; one cannot be both a good photographer and a diligent manual laborer at the same time.

Alex thinks he has pulled a fast one on me. Anna is not amused by any of this. Not even Frank's hat.

After lunch, Frank and I set out together to collect some samples from Burlyaschy and K4 Well.

Collecting a sample from Burlyaschy (Boiling Spring). It's about 90C where my feet are, and it's deeper than my ankles. It's a good thing I'm wearing thigh waders and three pairs of socks!

While Frank was working on his own samples, I waded a few meters into Burlyaschy Spring to fill a liter bottle with water. The water is about 90C there, and boiling vigorously only three or four meters beyond. I was wearing three layers of insulated gloves, and three pairs of socks under my waders, but the heat was almost unbearable. You really don't want to fall down in this thing!

Filtering a liter of water from Burlyaschy with a Sterivex filter and a 60ml syringe. The bottle was almost too hot to handle, even with insulated gloves. If there's anything alive in the planktonic community, it's definitely a hyperthermophile!

After (carefully) returning to what passes for dry land in the thermal field, I decanted the liter bottle into a 60ml syringe with a LuerLok fitting, and attached a Sterivex-HV 0.45 micron filter. I then forced the water through the filter, which started to block up after about 600ml. The last 300ml went through really, really slowly and with a lot of sweat and cursing. It took about 20 repetitions to finish off the bottle.

Decanting spring water collected from K4 Well into a 60ml syringe, to be forced through a Sterivex filter.

After that, we walked over to K4 Well to collect Frank's slides. Frank is planning to use them for electron microscopy, so he had to fix them before storing them, which took a long time. This gave me time to process two liters of water and steam spewing from the rupture on the K4 wellhead and shove them through two more Sterivex filters.

We walked back to the station, and I fixed my filters in ethanol and D-PBS buffer.

This was to be our last full day in Uzon, so I packed most of my things before going to bed. Albert and Sarah stayed up all night finishing the DNA extractions.

## Uzon, Day Five

Posted by Russell on August 20, 2010 at 3:36 p.m.
This post is for August 10th, 2010

The weather is absolutely beautiful today; sunny with a few puffy, fast-moving clouds, about 60F with gusts of cool wind.

After breakfast, Frank, Alex, Anna and I hiked to Orange Field. Most of the hike was over open country without trails; we had the GPS coordinates, but no route. We passed through a few stands of birch and pine. The prospect of encountering a bear in enclosed areas makes entering these clumps of trees an unattractive course of action, one could say. Encountering the occasional bear seems to be unavoidable in Uzon, so we stuck to open country and burned up some calories circling around the trees. The August sun could have made this torture back in Davis, but at almost 55 degrees north with patches of snow lurking in the shady spots of the caldera, it wasn't so bad.

A meadow abutting the caldera wall on the hike to Orange Field springs.

It's astonishing how much plant diversity there is here. What look like fields from a distance are really dense mixtures of dozens (hundreds?) of species of plant, crowded together in a tangled riot. When I put my face near the ground, it looks like a tropical rain forest, only ten inches high.

We are here to study microbes, but it's very difficult not to wonder about this hardy community of plants. How do they survive the winter? Why does one kind of plant cluster in one place and not another? For what do they compete, and how do they do it? Do any of them cooperate? How do the seeds disperse? What pollinates the flowers?

I am puzzled that there seem to be so few pollinators in Uzon. I found a few insects that looked like bees, but I'm not familiar enough with entomology to rule out the possibility that they could be bee-like flies, or possibly wasps. In any event, there were not very many of them. The only insect I found visiting a flower today was a thing that looked like an earwig, and it was probably there because it took a wrong turn somewhere. The millions of flowers in Uzon seem to go mostly unvisited.

Panorama from the ridge overlooking Orange Field springs.

Anya was here in Uzon in 2005, though a few weeks later in the year. In her pictures from that expedition, the whole caldera looks like it's been set afire as the hardwood brush gets ready to drop its leaves.

The other thing that puzzles me is how few birds there are. The caldera is bursting with blueberries and mosquitoes, and yet I've seen only one swallow and heard not a single songbird. Meadows in California with a tenth the productivity (i.e., insects, fruit and seeds) are usually crammed with swallows, starlings (introduced, of course), jays, finches and songbirds. In Uzon, there are only a few white, long-winged birds with V-shaped tails that fly low and fast above the streams. They look a bit like a quarter-scale seagull, but re-engineered for speed and extreme distance-flying. They have bodies built like marathon runners, so I suppose Uzon must be a quick stop on a long journey for them. I've only seen two or three of these on a given day so far.

The lack of birds, especially songbirds, and the lack of pollinators are probably related. The winter in Kamchatka is too harsh for most birds to overwinter, so most birds found here would be migratory. Insects are an ideal diet for long-distance migratory birds, and they need lots of them to build up enough fat reserves for their world-crossing journeys. Maybe our timing is off, and we've just missed the migratory birds, or maybe they will arrive later when the blueberries are riper, or when their favorite species of insect reaches its crescendo. Or, perhaps the birds that used to come here are gone, their migratory route destroyed by a parking lot in a faraway place.

A carnivorous plant waits for the arrival of small, unlucky insects on the bank of Orange Field springs.

Kronotsky National Biosphere Preserve is for Russia what Yellowstone, Yosemite and the Grand Canyon are to America. It is perhaps the single most beloved natural site in this vast country, and the people who have studied and explored it are heroes in Russia (they should be heroes worldwide). Tatiana Ustinova, who discovered the Valley of the Geysers, could be the John Muir of Russia. I'm sure that someone has studied the songbirds in Uzon, or lack thereof, just as the songbirds of Yosemite have been meticulously studied. However, Ustinova only discovered the Valley of the Geysers in 1941, whereas Yosemite and Hetch Hetchy were well known to the world more than a hundred years prior.

The problem, I think, is the disconnect between the scientific literatures of different countries. Before I left for Kamchatka, I looked for books like John Muir Laws' beautifully illustrated field guide to the plants, fungi and animals of the Sierras, but I could not find anything. My questions have probably been asked and answered, but only in Russian, and probably only in journals far from the beaten path.

I hope that this will change.

Debris left over from Karpov's old house (I think), which was heated by geothermal power. I'm holding the auger used to drill the well.

At Orange Field, Alex and Anna collected several samples for their colleagues at the Russian Academy of Sciences. We spent about two hours roaming around and waving Anna's GPS at the sky, trying to pinpoint which spring was which. This is an uncertain proposition in a place like Uzon, which is subject to the vicissitudes of snow, snowmelt erosion, the dynamic processes of volcanism, and curious bears that like to dig holes.

Team Russia, for the win!

When we got back, we ran the generator for a while so Sarah could do her DNA extractions. I used the opportunity to work on metagenomic analysis for Arkashin and Zavarzin a bit, organize photos, assemble some panoramas, and edit the last couple of days of blog entries. I also got in some really excellent procrastination on finishing my talk. I squished one hundred and sixteen mosquitoes and three biting black flies.

Anna made Borscht for us again, and it was, if anything, even more delicious than her previous Borscht. Same ingredients, same pot, same stove, same sour cream. I am puzzled, but that seems to be my lot in life.

## Uzon, Day Four

Posted by Russell on August 19, 2010 at 4:16 a.m.
This post is for August 9th, 2010

It was very cold, wet and windy this morning, and I had a rough time getting started. A double shot of espresso helped, but it took a brisk hike to Burlyaschy to collect my first samples of the expedition to actually wake up.

Collecting sediment samples from the outflow of Burlyaschy (Boiling Spring). This is my first field sample since starting grad school. Neat!

Frank and I thought it might be interesting to try to sample from the center of the spring near the heat source, so we tied a 50ml tube and a rock to a rope and dragged it across the bottom of the pool a few times. It didn't work, unfortunately, so we're going to try using a long tube and a hand pump tomorrow.

Our improvised sampling gadget. It didn't work, unfortunately. The bottom of Burlyaschy evidently doesn't have any sediment.

Our efforts were interrupted by a bear, a real one this time, that wandered out from behind a hummock about fifty feet away. We dropped what we were doing and circled to the bank of Burlyaschy opposite the bear. In principle, we could sidle in one direction or the other to keep the spring between us and the bear. A bear can easily outrun a human in a straight line, but on a turn, particularly in boiling mud, we have a better chance. If it tried to cross the pool, we would quickly end up with a few thousand gallons of bear-and-microbe soup.

A bear interrupted our work at Burlyaschy.

Happily, the bear showed little interest in us, and wandered off. It doesn't make for great photography, but I've decided that the preferred view of a bear is the posterior as it walks away.

Fortunately, he showed very little interest in us.

We returned to the station without incident and had some lunch. Alex and Anna went off to collect some samples for their colleagues in Moscow, while Frank and Bo packed up Bo's computer, a huge APC power supply, and his scanning voltammetry apparatus, and lumbered off to Red White and Green. Right now, Sarah and I are upstairs with the Russian expedition to use the lab bench for DNA extractions.

Sarah working on DNA extractions.

Later on, the weather cleared up to reveal an extraordinary afternoon. I was persuaded to go to the so-called Bath Pool with the Russians. I'm not sure if I am any cleaner as a result, but the experience was... interesting.

Anna and Alex returning from Central Thermal Field.

## Uzon, Day Three

Posted by Russell on August 19, 2010 at 4:05 a.m.
This post is for August 8th, 2010.

Anna at Central Thermal Field. As the ranking Russian in our group, she is our chief scientist for this field expedition.

We awoke to heavy fog and rain this morning, and it was very cold. I went with Alex, Anna and Frank on a long hike to a group of petroleum-bearing springs. Along the way, we stopped at Boiling Spring (Burlyaschy in Russian), which really is boiling. We measured 96C near the edge, and it's about the size of a backyard swimming pool!

Boiling Spring.

Frank suggested on the walk back a few hours later that Boiling Spring might be an interesting metagenomic target; it's surrounded by extremely acidic formations -- we measured pH of 0.8 at one of them -- and yet Boiling Spring itself is at pH 7. It's likely to be relatively isolated from the surrounding environments. Because Uzon is much nearer to sea level than Yellowstone (650 meters, according to my phone), it's actually possible to find water at nearly 100C at the surface here. This suggests that it could be a good place to look for high temperature chemoautotrophs. Boiling Spring is also near an area known to be rich in petroleum sediments, so there could be high-temperature hydrocarbon utilizers too.

A petroleum-rich spring.

We then proceeded on to what Frank calls "the oil fields," where Alex, Frank and Anna took some more samples. There is a talk scheduled later at the Thermophiles Workshop by S.D. Varfolomeev called "The youngest oil on earth (Uzon, Kamchatka)," presenting evidence that there is petroleum at Uzon that is less than 50 years old!

Given the name "Oil Fields," I was expecting it to resemble the La Brea Tar Pits in Los Angeles. I spent a lot of time at the Page Museum when I was young, so many of my formative experiences involved mammoths and smilodons and lakes of bubbling tar. I caught a few whiffs of that smell, but it was mostly the usual hotspring rotten-eggs.

We passed the ranger station on the way back, around three o'clock in the afternoon.

Around three o'clock, the rain finally let up enough for me to crawl out of my cheap yellow poncho. We ate a little bread and cheese we brought with us (and a chocolate bar, of course), and started hiking back toward the station. Along the way, we stopped to check on Frank's slides at K4 Well, and then went on to Red White and Green. Frank and Alex left some enrichment cultures to incubate at Red White and Green and at another nearby spring with a very high temperature.

Alex and Anna wanted to keep working in the area, and so Frank and I hiked back to the station.

There was a tetrahedron of milk we had opened for breakfast coffee, so I used it up to make an onion, garlic and dill frittata for the two of us, and we talked some more about what might be living in the outflow from Boiling Spring.

Alex and Anna eventually got back, and Anna made some scrambled eggs, and the ranger (Evgenij) joined us for lunch.

We spent the afternoon struggling to charge the UPS for Bo's scanning voltammetry gear. Balky generators and rain make a poor mix.

While that was going on, Bo, Albert and Sarah went to Burlyaschy (Boiling Spring). Albert spotted a mother bear with a cub nearby and moving toward them, so he readied one of our flare torches to scare them away. Before igniting the torch, Albert tried shouting a bit, and took a few steps toward the bears. The bears suddenly revealed themselves to be bushes in the fog, rattling in the wind.

## Uzon, Day Two

Posted by Russell on August 16, 2010 at 3:41 p.m.
This post is for August 7th, 2010.

It was cold and cloudy today, which is actually a blessing. We have to walk around in thigh-high rubber boots and stand around boiling pots of sulfurous water, and the mosquitoes are murderous.

I think I've got the hang of making a decent espresso in the field, at least with this incredibly delicious water. I found an adapter in an outdoor store in Petropavlovsk that mates the valve socket for my camping stove to cheap cans of cooking gas available practically everywhere. Unfortunately, the cans are completely unstable with any sort of pot or pan sitting on the stove, so I braced the can with some bricks from an old cook fire.

Some espresso on a cold, wet morning.

I packed light, which means that tomorrow I'll be on my last pair of clean pants, and will have to do laundry.

This morning we visited Arkashin Spring, which is the other sampling site for the metagenomic data I've been analyzing. It's loaded with realgar (arsenic sulphide), and so it's expected to be full of species resistant to the various forms of arsenic (arsenide and arsenate), and possibly of arsenic-respiring organisms. Alex and Sarah took some samples of the sediment.

Arkashin Spring, one of my metagenomic targets.

We also spent a lot of time looking at a nearby site called K4 well, which is the remains of an old exploratory well drilled sixteen meters into Central Thermal Field. As you can see, the steel has been pretty much destroyed by corrosive hydrogen sulphide gas. The interesting thing about K4 is that the outflow starts out as a mix of steam and boiling water at about 100C, and cools off to about 40C over a space of about three meters. As a result, the organisms that live in each temperature band between 100C and 40C are organized into stripes following the contours of the isotherms.

K4 Well, a possible site for investigating spatial organization of microbes.

Frank and Albert inserted some microscope slides into the flow (if you leave them there for a while, the microbial mat will incorporate the slide, which you can then remove to study). I'm very interested in studying this sort of spatial organization, and so Frank gave me a slide to insert transecting three of these bands. Glass is a good conductor of heat, and so I'm not very confident that it will work.

On the way back to the station, we came across a very interesting pool that Albert thought would be perfect for Bo to try out his electrochemical instruments. Bo didn't come with us for the morning trip because he was still polishing, plating and testing the electrodes for his setup. There obviously isn't any cell phone reception out here in Uzon, but my little Android phone still makes a really great field GPS. I marked the coordinates for the pool as "Red White and Green" (sadly, there was no blue).

For lunch, we had buckwheat with tomato sauce, green peas and tofu (the carnivores added their canned mystery meat). It tasted great, but the buckwheat didn't agree with me at all. I took some antacid tablets, and then passed out for an hour. I woke up a bit overwhelmed by the taste of buckwheat and hydrogen sulfide (for some reason, whenever I smell hydrogen sulphide, I seem to keep smelling and tasting it for a long time afterward). Bo still had to work on his electrodes for a while longer, and so I sat around with a cup of tea and waited for the afternoon trip.

Bo packed up his electrodes, data acquisition system, laptop and portable power supply into a huge backpack/duffel, and I guided everyone back to Red White and Green. The Android phone worked great as a field GPS.
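Guiding people back to a marked waypoint comes down to comparing coordinates, and the standard haversine formula gives the great-circle distance between two points. Here is a minimal sketch; the coordinates in the example are made up for illustration, not our actual waypoints:

```python
# Great-circle distance between two lat/lon waypoints via the haversine
# formula -- handy for checking which logged waypoint is closest to a
# known spring. The example coordinates below are hypothetical.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Distance in meters between two points given in decimal degrees."""
    r = 6371000.0  # mean Earth radius, meters
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Two hypothetical waypoints roughly a kilometer apart.
d = haversine_m(54.500, 160.000, 54.509, 160.000)
print(round(d))  # roughly 1000 m
```

For distances on the scale of a caldera (a few kilometers at most), treating the Earth as a sphere is more than accurate enough.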

Bo getting his first scanning voltammetry data.

This is some of Bo's data from the field. The peaks and dips represent changes in current detected passing from one electrode to the other (in the presence of a reference) as the voltage was swept from zero to -2V and back. The cathodes are made of gold wire plated with mercury film (sort of like old-fashioned dental fillings), and the anode is elemental platinum. Scanning voltammetry is also known as cyclic voltammetry; the trace on the bottom is the return signal as the voltage swept back to zero. In the order they appear in the scan, Bo's first guesses as to the identity of each dissolved compound are as follows: thiosulphate, hydrogen sulphide, iron sulphide, hydrogen peroxide, iron (or maybe manganese) (II)+, and, on the return scan, acid volatile sulphide (AVS).

The software Bo uses to drive the probes is a little crusty, so I decided to help him out with a little Python/Matplotlib awesomeness.

One of Bo's voltammetry scans; the annotations are based on Bo and Albert's experience with the technique and their best judgment while in the field; this is not their "final" conclusion about the water chemistry. As the science goes, think of this as somewhere between raw ingredients and the finished product, like a bowl of cake batter.
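To give a flavor of the parsing side of that work, here is a minimal sketch: read a sweep exported as plain text and flag local current maxima as candidate peaks. The two-column format and the peak test are assumptions for the sketch, not the instrument's actual output format:

```python
# Sketch of a voltammetry helper: parse "voltage current" pairs from a
# plain-text export and flag strict local current maxima as candidate
# peaks. File format and peak test are assumptions, not Bo's real setup.

def parse_scan(text):
    """Parse whitespace-separated 'voltage current' lines into two lists."""
    volts, amps = [], []
    for line in text.strip().splitlines():
        v, i = line.split()
        volts.append(float(v))
        amps.append(float(i))
    return volts, amps

def peak_indices(amps):
    """Indices where the current is a strict local maximum."""
    return [k for k in range(1, len(amps) - 1)
            if amps[k - 1] < amps[k] > amps[k + 1]]

raw = """\
 0.0  0.10
-0.5  0.90
-1.0  0.30
-1.5  1.20
-2.0  0.20
"""
volts, amps = parse_scan(raw)
print(peak_indices(amps))  # [1, 3]
```

The actual plotting was done with Matplotlib; annotations like the compound labels in the figure would be overlaid at peaks found this way.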

When we got back, I cooked dinner: pasta with corn, onions, Lithuanian-style cheese and some Georgian spice mix. The carnivores added a mysterious can of meat with a picture of a cow on it.

I went outside this evening to send a twitter update with the Iridium phone, and I thought I was safe from the swarming mosquitoes in my bug suit and thick socks. When I say "swarm," I really mean it. As I stood on the boardwalk, it sounded exactly like a 2010 World Cup game, complete with vuvuzelas. I miscalculated badly, and I got twenty-nine bites on my feet -- through my hiking socks -- in the three minutes I was standing still. I didn't notice until my feet started burning, like the way your mouth burns when you eat a chili pepper. I ran inside and dunked my feet in a bucket of near-freezing stream water until the burning stopped. Then soap and more freezing water, topical astringent, and three antihistamine pills. I have a little bit of swelling, but hopefully not enough to stop me from getting out tomorrow.

To my delight, the entomologist staying here decided this was a great evening to take some samples of her own. She fired up the generator and put a huge flood light on the upstairs portico. Then she used sweep nets to capture bucketloads of mosquitoes, which she preserved in formaldehyde (or something of the sort). It warmed my heart to see that.

Before heading to bed, I figured out how to bathe with three liters of water. The pump is broken, and so if you want water, you have to lug it from the stream, and if you want hot water, you have to use the tea kettle.

## Uzon, Day One

Posted by Russell on August 15, 2010 at 5:01 p.m.
The day began with a breakfast of rehydrated oatmeal, instant coffee (my espresso had not yet been located amongst the luggage), and yogurt. Just as we were drifting off to unpack and organize, a juvenile bear showed up on our doorstep, and snuffled around a bit near our outhouse and the little bridge over the stream.

A bear visiting the research station on our first day in Uzon. That boardwalk is the path to our outhouse; this was shot with a short lens from the kitchen porch.

This is why we carry signal flares to the toilet, and only go in groups.

After watching the bear wander off to forage for blueberries (which are everywhere), I sat down on the little bridge and made myself a cup of espresso from the stream water. The Russian team upstairs tells me that when they've tested it, it came back almost as clean as the molecular-grade water they brought with them. It was the best damn espresso I've ever had.

Espresso.

Preparations for sampling and measurements proceeded in fits and starts through the morning as Albert, Frank and Anna hammered out a plan for each day of the expedition. While that was going on, Bo, Sarah and I continued unpacking and organizing the gear.

I attempted to shave at the stream, but this did not go as well as the espresso. Hot water is important for shaving, and I didn't make enough of it. The stream is only seven degrees Celsius, which I discovered is utterly unsuitable for shaving.

Shaving did not go so well.

Around ten o'clock, the ranger took us on a tour of the thermal fields. Frank and Albert have been here several times of course, but the fields are never quite the same year-to-year. In 2008, for example, a geyser popped up near the ranger station; Uzon is not known to have geysers.

Zavarzin, one of my metagenomic targets. Alex is measuring the temperature, and we worked around some enrichment cultures set up by another research team.

We stopped by Zavarzin Spring along the way, which was particularly interesting for me. For the last few months, I've been analyzing some metagenomic data taken from Zavarzin a few years ago as part of the Tree of Life project. Until today, Zavarzin was just a FASTA file containing about ten thousand Sanger reads, like so :

>ZAVAK94TR 6000 12000 9000 21 953
GTAGCTGTAAGGGCGGGGAGGGCTCACCTGGTCCCGGCCTTCGACGGCGGCCCCAATCCG
GCCAGCGCCCAGGCCCTCACCGAAGTCGAAGCCTACGCCTTTTCCTGTTCCGATTTCCGG
AAACTGATAGGGGAGTTCCCCCGGATTGCCGGCAATATCCTGGCCGATTTTGCCGCCAAA
TTGCGCCTGCTGGTAGGGCTGGTGGAGGACCTCTCCTTCCGTACGGTGGAGGCGCGTCTG
GCCCGTTTCCTCCTGAGCCGGGATGTGGCCGTGCCCGGCCGGCGCTGGACCCAGGAGGAG
ATGGCCGCCCACCTGGGCACGGTGCGCGAGGTGGTCGGCCGAGTGCTTCGGGCCTGGCGT
GAGGAGGGTCTGATTCGCCAGGAACGCGGCCGCATCGTCATCCTGGACCGGGCCGCGCTG
GAGAAGAAGGCTCAAATCTGACATCATTCGTGCCAGGACGAGTTATGCAAAGATGTCAGG
AAAAAGGACTTTTTGACAAAGAGAGGGGAATATGCTACATTGTCAGCCCCGGAGGGCCGG
CCCGCATGGACCAACCGCATCCGGGTGACCCGAAAGGCAGAAACGTTCGGGCAGGCTGAT
GATGGACACGTTCCGCGCCATCCGTCGGGTCCTCTGGATCACGATGGGGCTCAACCTTCT
GGCTATGGCGGCCAAACTGGGCGTGGGCTACCTCACCGGCTCCCTCAGCCTGGTCGCCGA
CGGCTTCGATTCGGCCTTTGACGGTGCCTCCAACGTGGTGGGGCTGGTGGGGATTTATCT
GGCCGCCCGACCGGCCGACGAAGGCCACCCCTACGGCCACCGCAAGGCCGAAACCCTCAC
CGCCCTGGGCGTCTCCGCCCTCCTCTTCCTGACGACCTGGGAACTGGTGAAGAGCGCGGT
CGAGCGCCTGCGCGACCCGACTCGGATACAGGCCGAGGTCACGGTCTGGAGTTTCGGGGC
CCTCGTCCTCAGCATCCTGGTGCACGCGACCGTGGTCTGGTACGAGATGCGGGAGGGCCG
GCGGTTGAGGAGCGATTTCCTGGTGGCCGATGCCCAGCACAC
After so much time working on this data, it was pretty exciting to see the actual site.
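Reading a FASTA file like that takes only a few lines of Python. Here is a minimal sketch (a real pipeline would more likely use something like Biopython's SeqIO, and the record below is a shortened, made-up example):

```python
# Minimal FASTA reader: maps each header line (minus the '>') to its
# concatenated sequence. A sketch; real pipelines use Biopython's SeqIO.

def read_fasta(text):
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

# Shortened, hypothetical record for illustration.
example = """\
>READ1 sample record
GTAGCTGTAA
GGGCGGGGAG
"""
recs = read_fasta(example)
print(recs["READ1 sample record"])  # GTAGCTGTAAGGGCGGGGAG
```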

I came back to the research station with Albert and Bo, and I fixed some lunch for everyone (apples and pears with Nutella, cheese and black bread, olives, some cucumbers sliced with lemon and dill, and the ubiquitous Russian sausage for the meat eaters).

After lunch, everyone except Bo went back to Zavarzin (Bo stayed at the station to work on the electrodes for his instrument). Albert and Sarah took measurements and tried out some home-made core samplers, and Anna and Alex started some enrichment cultures. This was a preliminary trip, so I mostly just tried to stay out of the way. I got some nice photographs of the rather extraordinary microbial mats growing in the smaller springs nearby.

I mentioned in a previous post that volcanic liquids are very diverse; this is the reason it's worth traveling all the way to Kamchatka. Here is a nice example of what I was talking about. These are three springs within about four feet of each other. You can see just by looking at them that they are different. The colors range from clear to white to gray, indicating different redox states (probably of sulfur); the temperature ranges from 91C to 86C to 81C, and the pH from 7 to 5.6 to 6.1.

Three adjacent yet very different springs.

That might not sound particularly dramatic, but recall that when you catch a fever, the shift from 37 degrees to 39 degrees is enough to halt the growth of a wide array of organisms. This is why fever is a response to infection. Microbes often adapt to very particular circumstances, and so a change of a few degrees can shift the ecology dramatically, or replace it altogether. As environments, these three springs are as different from each other as the inside of your mouth and the eyelid of a duck.

We finished up with our poking around at Zavarzin, and came home for a dinner of Borscht prepared by Anna. It was delicious. After dinner, we started setting up our lab space upstairs for DNA extractions. I managed to trip the breaker on the generator several times trying to charge up the UPSs.

Posted by Russell on August 15, 2010 at 4:59 p.m.
My apologies for getting behind on posting my updates from Uzon. After we returned from Uzon, we rested for a day, and then crammed ourselves and our equipment into a van and went to Paratunka for the Biodiversity, Molecular Biology and Biogeochemistry of Thermophiles international workshop, where I was scheduled to give a 20-minute talk.

The speaking docket got shuffled around a lot, and I ended up having to give my talk much earlier than planned. I suppose this is the inevitable downside of procrastination. While I was scrambling to finish it, I didn't have much time for blog updates!

I survived the talk. There were lots and lots of excellent questions, and I have a lot to think about now. Anyway, back to the updates from Uzon.

## Uzon, Day Zero

Posted by Russell on August 12, 2010 at 8:25 a.m.
We were picked up by a mini-bus taxi from our apartment in Petropavlovsk, only to have to turn around to retrieve a forgotten jacket whose owner shall remain nameless. From there, we drove to Kronotsky Nature Preserve, and met up with a bunch of other people including Shpilenok, the director of the preserve, a Russian TV station crew, and some photographers. Also aboard was a group of people that included the granddaughters of Tatiana Ustinova, the woman who discovered the Valley of the Geysers with Anisifor Krupenin in 1941.

The discovery of the valley is an adventure all unto itself -- beginning with a dogsled trip that got off track and ending with the discovery of the first hydrothermal site in Russia. Tatiana, who eventually settled in Vancouver, passed away recently. Her family was aboard our helicopter on a visit in her memory to Geyser Valley. Her valley, one could say.

Frank spent much of our time in Petropavlovsk regaling us with stories of helicopters left over from Russia's war in Afghanistan and held together with bits of string. If our helicopter was that old, it had been lovingly maintained.

Our ride to Uzon touching down at the airfield.

I was expecting the ride itself to be exciting, but there is none of the rush and acceleration of an airplane takeoff; when a helicopter takes off, it gets very, very loud, and rises with all the grace and charm of a freight elevator. The excitement came entirely from the view out the portal, which we could open. Kronotsky Nature Preserve is spectacularly beautiful from any angle; as interesting as it was to see it from the air, I kept wishing we would land so I could get out and have a look around.

The view from the helicopter portal as we entered Kronotsky Nature Preserve.

I lost track of how many volcanoes we passed. The most exciting was Karymsky, which happened to erupt just as I snapped a picture of it!

Karymsky Volcano erupting as we fly nearby.

Actually, I didn't take this picture. There was a photographer sitting next to me using the same portal, and I had asked him to snap a few shots of Karymsky -- which was not erupting at the time -- because he had a better angle from where he was sitting. He snapped one shot of the volcano and gasped, and then dropped my camera in his lap and grabbed his own.

Karymsky Volcano erupting as we fly nearby.

Eruption of Karymsky Volcano continues as we fly over an inland delta.

We touched down in Uzon Caldera a few minutes later, and immediately ran into some confusion over accommodations. There are two buildings in Uzon Caldera: a ranger station, and the research station. The structures are each about the size of a modest single-family home. There was already a team from the Winogradsky Institute staying at the research station (the director, actually), as well as the ranger and an entomologist. Meanwhile, the ranger station is being renovated, and the work crew is staying there.

Our ride continuing on to Geyser Valley. The family of Tatiana Ustinova was aboard.

The helicopter crew had been told that we would be staying at the ranger station for some reason, and so the earlier flight had delivered all of our food and lab equipment to the landing pad nearest the ranger station. The ranger station is about a kilometer away from the research station, and so we had to schlep all thirteen boxes of lab equipment and four heavy boxes of food over to the research station.

Shifting our food and lab equipment from the ranger station to the research station. It was a long and exhausting job.

Once installed at the research station, Sarah, Bo and I organized our gear and luggage, and Frank and Albert -- dead tired, like the rest of us -- went upstairs bearing gifts to make friends with the other research team.

We rehydrated some freeze-dried pasta primavera, to which Sarah and I added tofu. I was too hungry to notice what everyone else ate, but I think sausage was involved. Then we passed out.

## Back from Uzon

Posted by Russell on August 10, 2010 at 8:49 p.m.

Panorama overlooking Orange Fields in Uzon Caldera

We just arrived back in Petropavlovsk after a week in the field. I was very sad to leave Uzon, and it was a privilege and an honor of the highest order to have spent those days there.

The expedition was, I think, a great success. We'll know for sure once we're back at our labs and can use more sophisticated methods to examine our samples. I am very confident, though.

It was a bit touch-and-go right at the end. Our high speed centrifuge crapped out last night, just as Sarah was in the middle of the last big run of DNA extractions. The Russian team brought their own centrifuge, but we couldn't run it on our generator. Much to our relief, Albert was able to magically get the thing working again by holding it at just the right angle. They worked through the night to finish processing the samples; I think Albert must have had his thumb wedged under the centrifuge for the entire run.

I'm sorry I wasn't able to send many Twitter updates toward the end of the expedition. Once I had identified my sampling targets, I suddenly had a lot less free time on my hands (and I didn't have much to begin with). Also, I'm sorry for updating in ALL CAPS. Iridium handsets are essentially 1993 technology. Composing text messages is extremely painful, and the battery only lasts long enough to compose two or three of them. This is a pain when you have to recharge on generator power, and the generator only cranks up for a few hours a night, and even then only to power lab equipment for DNA extractions. Hats off to my dad for relaying the messages!

Right now, I'm sitting in a friendly internet cafe in Petropavlovsk where they've let me use their wireless connection. When we arrived at our crowded little apartment, the hot water was broken, and thus no showers yet. A wide selection of interesting geologic samples are wedged under my fingernails, and I think I have wads of some sort of hardened liquid sulfur caked in my hair. The helicopter arrived ridiculously early, and we just barely got everything aboard. As a result, I'm still wearing my field clothes from yesterday, which are splattered with volcanic mud. I may actually be the worst-smelling person in Petropavlovsk. Perhaps it is fortunate that this internet cafe caters mainly to kids playing StarCraft.

I composed blog entries for each day we were in Uzon, and I'll be posting them as soon as I run them past the rest of the team. I also have almost two thousand photos to sort, tag and upload.

That said, I have a correction for one of my Twitter updates. I wrote :

YERTERDAY ALBERT & TEAM WERE CHASED AWAY FROM A SITE BY A BEAR THAT WAS ACTUALY A BUSH IN THE FOG.
Albert pointed out that they were interrupted for a few minutes, but not actually chased away. He stepped forward and shouted to see if the bear (or bears) would go away, with his signal torch uncapped and ready. The bears were revealed to be bushes as the wind shifted and created a channel in the mist. It's funny, but given how foggy it was that day, it wasn't actually that surprising. We were at the same site the next day, and were surprised by an actual bear. It wandered pretty close to us before we could actually see it (the full story will come with the article for that day).

A bear interrupting important EisenLab work at Boiling Spring.

Update : Albert also says that I'm wrong about having to wedge his thumb under the centrifuge the whole time. It started working again after shaking it around in the air a bit, and placing it just so on the table. He only had his thumb wedged underneath it for a minute or two to check to see if it was overheating.

## Last minute preparations

Posted by Russell on August 04, 2010 at 5:10 a.m.
We're planning to return from Uzon on the 11th of August, so assuming we leave this evening, we'll be there for seven days. We have to plan for the possibility that we might get fogged in up there, so we might be stuck for a few extra days. Hopefully that won't happen, because we've got a workshop to prepare for once we get back. Also, I've picked up a bad habit from Jonathan, and I still have to write my talk for the 17th.

We had an exhausting day yesterday.

First, the cost of the helicopter has gone up since last time they made the trip, so Frank and Albert had to arrange to transfer the difference from America to Petropavlovsk. This turned out to be an agonizing process, and I'm not even sure of all the details. Albert came back to the apartment after the first day of working on it and passed out instantly. Suffice it to say that both of them have extremely patient and resourceful spouses, without whom we would now be stranded in town with no way to get to our research site.

There remains a great deal of confusion and uncertainty about the status of the generator (or generators?) at Uzon, and so we've had to prepare for the worst. I spent the day with Albert and Alex hunting down motor oil, spark plugs, two-stroke oil (in case it's a two-stroke engine), and other small-engine stuff. Supposedly there is a new American-market Honda generator up there, a Soviet-era machine that can still be persuaded to work, and perhaps something else of unknown provenance and status. We were also told that there was no generator at all, sending us scrambling all over town to buy a new generator, but that was evidently a miscommunication. Fortunately we got it straightened out before we actually started laying out Rubles for the first generator we could carry away!

In the summertime, the research station would be a truly ideal place for an off-grid solar array. One of the things I'm going to do while I'm there is to study the structure and write up a proposal for its owners to install one, if they should so desire.

After much looking around, I found that nobody sells regular fuel canisters for backpacking stoves in this part of Russia. However, they do sell adapters that let you plug them into butane refill canisters. The canisters are very cheap, but they are shaped like cans of hairspray: narrow and tall. Not a very stable platform for cooking! I'm going to set up my stove in a bucket, and pack dirt around the fuel canister to keep it stable and upright (and far from anything that might melt or burn). And yes, I'll only use it outside.

Demonstrating the use of mosquito protection gear for Bo -- you can tell I'm not really excited about mosquitoes

I was able to find a SIM card for my MyTouch 3G, which is awesome. Unfortunately, MTS doesn't know how to automatically configure Android phones for GPRS. At least, that's what I could understand from the girl at the MTS store. That conversation was conducted mostly through hand gestures and giggling, and was a testament to the power of technology-related acronyms to puncture language barriers. It's strange to say, "IP for DNS server?" and see the light of understanding spread across a person's face.

We bought more than 15,000 Rubles of food for the trip! Actually, that's pretty reasonable for seven people.

Last of all, there was the food. By the time we all got to the grocery at 7:00 in the evening, we were almost totally spent. Still, we had to shop for another two hours before we had everything we needed (at least, I hope we have everything we need).

This morning a truck from the Institute arrived at the apartment to pick up our food and laboratory equipment. We're not totally sure if we will be riding with it to Uzon, or if it will go on a separate helicopter. So, we had to waterproof everything last night in case it had to spend the day (or evening) on the landing site in the rain. I am glad we had plenty of plastic bags and tape!

Our food and lab equipment getting picked up

With luck, we will catch our helicopter to Uzon this evening.

## Uzon field season team, 2010

Posted by Russell on August 01, 2010 at 9:24 a.m.

Professor Frank T. Robb, University of Maryland

Frank is the co-chair of the workshop, and is leading our expedition. Frank is a regular in Uzon Caldera, and has made several expeditions to the site since 1995.

Frank has been studying thermophiles for around twenty years, including their physiology, genomes, proteins, and ecology.

Professor Albert Colman, University of Chicago

Albert is organizing the expedition this year, and has accompanied Frank (and others) to Uzon several times. Albert was Frank's graduate student back in the day.

Alex Merkel, Winogradsky Institute of Microbiology

Alex graduated from Moscow State University, and is now a Ph.D. candidate at the Winogradsky Institute of Microbiology in Moscow. He is studying the functional diversity of methanogenic genes and culturing methane producing microorganisms.

He is also secretly the lead singer from Coldplay.

Anna Perevalova, Moscow State University

Anna graduated from Moscow State University and obtained her Ph.D. from Winogradsky Institute of Microbiology in Moscow. She is now a postdoctoral researcher at Winogradsky. Her specialty is growing extremely difficult organisms, and she also works with Alex on methanogens.

Sarah Griffis, Caltech and University of Chicago

Sarah is a senior at Caltech, and has been working in Albert's lab in Chicago for the summer doing DNA extractions.

Bo He, University of Chicago

Bo is a graduate student in Albert's lab; he studies the electrochemistry of cellular redox metabolism, particularly as it pertains to metal chemistry. He did his MS at Chapel Hill on the kinetics of iron(III) and hydrogen sulphide in sediment formation. It's nice to have someone with a physical sciences background along for the trip!

Russell Neches, University of California, Davis

And, of course, me.

## Live from Petropavlovsk

Posted by Russell on August 01, 2010 at 8:18 a.m.

Singlehandedly bringing PLoS to new frontiers!

I arrived safely in Petropavlovsk yesterday after a very long layover in Khabarovsk and an even longer layover in Vladivostok. Frank Robb and Alex Merkel met me at the gate, and we wobbled off with our driver to the Volcanology Institute to file my paperwork.

Beer! Where have you been all this time?

After dropping my stuff off at the apartment, Frank took everyone out for pizza. Airport and airline food in Russia leaves a bit to be desired, especially if you are vegetarian and don't speak Russian. Pretty much everything is covered in, stuffed with, or made entirely out of sausages.

I basically hadn't had anything to eat in 24 hours, so I was extremely glad to get my hands on the pizza (I ate almost two). The beer was also extremely welcome.

Fog, cursed fog.

Unfortunately, Petropavlovsk is fogged in with what everyone keeps calling a "cyclone," but I don't think the word is used in the same sense as I'm used to. It seems to be a huge fog bank with drizzle coming in from the ocean. The helicopters that will fly us to Uzon Caldera are flown by sight, so we're grounded in Petropavlovsk until the weather clears.

For now, it's seven scientists crammed into a tiny one-bedroom Soviet era apartment with a dozen laptops, piles of camping gear, and two whole laboratories (one for geochemistry, one for recombinant DNA) stuffed into freight boxes. Time to go exploring...

The door to Petropavlovsk; due for a little maintenance

## Kamchatka for those who've never played Risk

Posted by Russell on July 27, 2010 at 1:30 p.m.

Anyone who's played Risk will probably remember Kamchatka as "That place you can attack Alaska from." Like most of the territories in Risk, Kamchatka of the Hasbro game doesn't exactly match its modern political boundaries :

However, the Risk territory does reflect the range of the Chukotko-Kamchatkan language family, which includes the language spoken by the Koryaks (Kamchatka's indigenous people) :

400,000 people live on the peninsula, and about 13,000 are Koryak (about 3%). For comparison, Alaska has about 686,000 people, of which roughly 100,000 (15%) are native peoples. In terms of population, the Koryaks' situation more closely resembles that of the Ainu of Hokkaido (also about 3% of the population, going by self-identification) than native Alaskans.
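For anyone who wants to check my back-of-the-envelope percentages, here's the arithmetic as a quick Python sketch. The population figures are the rough ones quoted above, not authoritative census data:

```python
# Rough population figures quoted in the post (not census-grade numbers).
kamchatka_total = 400_000
koryak = 13_000
alaska_total = 686_000
alaska_native = 100_000

# Indigenous share of each population, as a percentage.
koryak_share = koryak / kamchatka_total * 100    # about 3.25%
alaska_share = alaska_native / alaska_total * 100  # about 14.6%

print(f"Koryak share of Kamchatka: {koryak_share:.1f}%")
print(f"Native share of Alaska: {alaska_share:.1f}%")
```

So "about 3%" and "about 15%" are both fair roundings of the figures quoted.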

Kamchatka has volcanoes. Lots and lots of volcanoes. It's part of the Ring of Fire, with 160 volcanoes, 29 of which are active. The whole area is seismically active, and there was a decent-size quake off the coast just this Sunday.

Phil Plait at Bad Astronomy posted about this awesome photo of two Kamchatkan volcanoes erupting at the same time. It was captured in February 2010 by NASA's TERRA Earth-observing satellite as it flew over (the TERRA website appears to be down right now - this isn't rocket science, NASA!).

These volcanoes, and the microbes that live in and around them, are the reason why we're traveling around the world to see this place. Wherever magma is close enough to the surface to interact with groundwater, superheated steam can be forced toward the surface. Depending on how much it cools before reaching the surface and the pressure under which it emerges, the liquid can form a variety of hydrothermal features: geysers, fumaroles and springs if the liquid emerges on land, and black smokers and white smokers if it emerges under water.

Along the way, the water dissolves various minerals and gases from the rock, and catalyzes the formation of new minerals and gases. By the time it emerges at the surface, it has become a complex suspension of minerals, gases and liquids, some dissolved, others suspended as a colloid, and others in bubbles and grains. I'm going to stop calling it "water" and call this stuff "volcanic liquid."

The chemistry of the emerging liquid depends on the chemistry, temperature, depth, thickness, packing and order of each layer of rock and soil it transits on the way to the surface, as well as the pressure and temperature of the liquid at each step along its journey.

A thermal pool at Lassen Volcanic National Park

My favorite way to explain how there could be so much variety in volcanic liquids is to think about coffee. It's possible to make several very different kinds of coffee from the same beans. If you grind them very fine, pack them tightly, and force steam through the grounds at high pressure, you get espresso. If you grind them even finer and suspend them in hot water as a colloid, you get Turkish coffee. If you grind them coarsely, suspend them in water, and remove them with a sieve, you get French-style coffee. If you grind them moderately, put them in a filter cone, and pour hot water through them, you get American-style drip coffee. They each taste totally different, despite being made from exactly the same ingredients.

Now, instead of coffee grounds, imagine many layers of rock, each with different chemistry, packing density, and thickness. Rocks, by the way, are pretty complicated things, and can be made out of almost anything. Practically every source of volcanic liquid from around the world has a unique chemical composition.

This variety is one of the reasons microbiologists are so interested in the organisms that live in these liquids. Organisms that live in the Earth's atmosphere, like you and me, have only a few attractive options for how we run our metabolisms. For organisms that live in volcanic liquids, every combination of dissolved and suspended minerals and gases offers its own unique metabolic opportunities. Volcanic structures tend to persist for a long time, and so their denizens have time to evolve very well-adapted strategies for living in these places.

Visiting these volcanic vents is like taking a trip to an alien world, or like visiting Earth when it was a radically different planet. Volcanic zones don't just look alien, they are alien!

An alien habitat at Lassen Volcanic National Park

I will be spending almost two weeks up-close-and-personal with some of these alien habitats, so there will be more to come.

## Science, the practice of

Posted by Russell on July 25, 2010 at 6:40 a.m.

This is the first in a series of articles I plan to write over the next three weeks covering my field expedition to Uzon Caldera and attendance at the 2010 International Workshop on Biodiversity, Molecular Biology and Biogeochemistry of Thermophiles. In this post, I'll outline my plans for the series and explain why I'm writing it.

If you would like to follow along, check in here, or subscribe to my RSS feed. Or if you would like to follow the series and not the rest of my blog, I will be tagging all of the posts in the series with kamchatka. At Uzon Caldera, I will be posting updates to my Twitter feed by satellite phone (you can also subscribe to my Twitter RSS feed.)

Before I leave on Tuesday, I will post articles introducing the natural history of Kamchatka, my plans and preparations for getting there and working there, and maybe a few other things.

I have two broad goals :

• Study the biochemistry, genomics, and physiology of thermophilic organisms in their natural habitat.
• Document and share the experience.
These are two fairly distinct missions. First of all, I'm looking for material for my thesis, particularly a metagenomic target suitable for the technique I'm developing. For the hard science, I will try to confine myself to observations and avoid drawing conclusions. I'll save that for the journals.

The second mission is to bring you along. I've been asked by my thesis advisor to write about, photograph, tweet and film as much of the field expedition and the workshop as possible, and present it as an example of what it's like to actually do science. My goal is to present the company, the food, the work, the travel, the joys, the annoyances, the surprises, the good, the bad, and the ridiculous.

Science remains firmly misunderstood by the public. My personal experience suggests that the public actually understands the products of science -- powerful theories and key facts -- a bit better than polling data suggests. The core of public misunderstanding, I think, rests in how people believe science works as an institution and as a profession.

A couple of years ago, Fermilab invited a group of seventh graders to visit the laboratory to check out the various awesome things they have available for the public to see. Before the visit, the students were asked to write about what they thought scientists were like, and to draw a picture to go along with it. After the visit, they were asked to repeat the exercise. The results were eye-opening. Here is an example I particularly liked, from a girl named Rachel :

Rachel's drawing of a scientist, after the visit

Most of the before pictures feature lab coats filled by older, white men without much hair. Many of the kids mentioned that they thought scientists were "a little bit crazy," and most represented their scientist as some sort of authority figure. The after-visit results are equally interesting; many of the comments seem astonished that scientists have families, and that they enjoy things other than science.

The phrase "regular people" comes up again and again in their after-visit writing. Students are usually pretty good at ignoring phrases that are deliberately emphasized. When you see a bunch of students incorporate exactly the same phrase into a free-form writing assignment, it's usually something that an adult mentioned without anticipating the impact it would have. The concept that scientists could be "regular people" was evidently a bit of a shock.

Obviously this is anecdotal, and it's important not to read too much into it. It is, however, a useful example of the sort of challenges we face if we want society to understand science itself, rather than simply memorizing the things science produces. None of this is original to me. If you want an entertaining treatment of science in the media, check out Christopher Frayling's Mad, Bad and Dangerous?: The Scientist and the Cinema (I apologize for the bizarre question-mark colon thing).

The problem is that scientists do not spend enough time talking with the general public. Only a small minority of scientists take the trouble to arrange their findings in a form digestible by the lay audience, as Darwin did. When they do, it is almost never cutting-edge research that fills the pages. Very few scientists go on television or the radio. The practice today is to bring research to the lay audience only when it is neatly tied up (or, the research community feels that it is, anyway). There are those who do otherwise, but there is a negative stigma to it; scientists who announce their findings with press releases instead of peer-reviewed papers are usually regarded with suspicion.

Scientists have a responsibility to share what they do.

Over the next three weeks, I'm going to put that thought into action.

## I'm going to Kamchatka!

Posted by Russell on July 19, 2010 at 5:47 p.m.
I just got the reservations for my flight to Petropavlovsk-Kamchatsky for the International Workshop on Biodiversity, Molecular Biology and Biogeochemistry of Thermophiles, hosted by Moscow State University and Winogradsky Institute of Microbiology.

I've been working on the analysis of environmental samples from two sites at Uzon Caldera (about 10,000 Sanger reads from each, sequenced at the JGI), and I'm hoping that I'll be able to reprocess the DNA here at the UC Davis Genome Center using some of our high-throughput machines. Licensing and customs restrictions will probably make it impossible to bring my own samples back, but I may be able to entrust them to a colleague with fancier credentials than my own.