Russell's Blog

New. Improved. Stays crunchy in milk.

Updates, continuing

Posted by Russell on August 15, 2010 at 4:59 p.m.
My apologies for getting behind on posting my updates from Uzon. After we returned from Uzon, we rested for a day, and then crammed ourselves and our equipment into a van and went to Peratunka for the Biodiversity, Molecular Biology and Biogeochemistry of Thermophiles international workshop, where I was scheduled to give a 20-minute talk.

The speaking docket got shuffled around a lot, and I ended up having to give my talk much earlier than planned. I suppose this is the inevitable downside of procrastination. While I was scrambling to finish it, I didn't have much time for blog updates!

I survived the talk. There were lots and lots of excellent questions, and I have a lot to think about now. Anyway, back to the updates from Uzon.

Back from Uzon

Posted by Russell on August 10, 2010 at 8:49 p.m.

Panorama overlooking Orange Fields in Uzon Caldera

We just arrived back in Petropavlovsk after a week in the field. I was very sad to leave Uzon, and it was a privilege and an honor of the highest order to have spent those days there.

The expedition was, I think, a great success. We'll know for sure once we're back at our labs and can use more sophisticated methods to examine our samples. I am very confident, though.

It was a bit touch-and-go right at the end. Our high speed centrifuge crapped out last night, just as Sarah was in the middle of the last big run of DNA extractions. The Russian team brought their own centrifuge, but we couldn't run it on our generator. Much to our relief, Albert was able to magically get the thing working again by holding it at just the right angle. They worked through the night to finish processing the samples; I think Albert must have had his thumb wedged under the centrifuge for the entire run.

I'm sorry I wasn't able to send many Twitter updates toward the end of the expedition. Once I had identified my sampling targets, I suddenly had a lot less free time on my hands (and I didn't have much to begin with). Also, I'm sorry for updating in ALL CAPS. Iridium handsets are essentially 1993 technology. Composing text messages is extremely painful, and the battery only lasts long enough to compose two or three of them. This is a pain when you have to recharge on generator power, and the generator only cranks up for a few hours a night, and even then only to power lab equipment for DNA extractions. Hats off to my dad for relaying the messages!

Right now, I'm sitting in a friendly internet cafe in Petropavlovsk where they've let me use their wireless connection. When we arrived at our crowded little apartment, the hot water was broken, and thus no showers yet. A wide selection of interesting geologic samples is wedged under my fingernails, and I think I have wads of some sort of hardened liquid sulfur caked in my hair. The helicopter arrived ridiculously early, and we just barely got everything aboard. As a result, I'm still wearing my field clothes from yesterday, which are splattered with volcanic mud. I may actually be the worst-smelling person in Petropavlovsk. Perhaps it is fortunate that this internet cafe caters mainly to kids playing StarCraft.

I composed blog entries for each day we were in Uzon, and I'll be posting them as soon as I run them past the rest of the team. I also have almost two thousand photos to sort, tag and upload.

That said, I have a correction for one of my Twitter updates. I wrote :

YERTERDAY ALBERT & TEAM WERE CHASED AWAY FROM A SITE BY A BEAR THAT WAS ACTUALY A BUSH IN THE FOG.
Albert pointed out that they were interrupted for a few minutes, but not actually chased away. He stepped forward and shouted to see if the bear (or bears) would go away, with his signal torch uncapped and ready. The bears were revealed to be bushes as the wind shifted and created a channel in the mist. It's funny, but given how foggy it was that day, it wasn't actually that surprising. We were at the same site the next day, and were surprised by an actual bear. It wandered pretty close to us before we could actually see it (the full story will come with the article for that day).


A bear interrupting important EisenLab work at Boiling Spring.

Update : Albert also says that I'm wrong about having to wedge his thumb under the centrifuge the whole time. It started working again after shaking it around in the air a bit, and placing it just so on the table. He only had his thumb wedged underneath it for a minute or two to check to see if it was overheating.

I'm going to Kamchatka!

Posted by Russell on July 19, 2010 at 5:47 p.m.
I just got the reservations for my flight to Petropavlovsk-Kamchatsky for the International Workshop on Biodiversity, Molecular Biology and Biogeochemistry of Thermophiles, hosted by Moscow State University and Winogradsky Institute of Microbiology.

I've been working on the analysis of environmental samples from two sites at Uzon Caldera (about 10,000 Sanger reads from each sequenced at the JGI), and I'm hoping that I'll be able to reprocess the DNA here at the UC Davis Genome Center using some of our high-throughput machines. Licensing and customs restrictions will probably make it impossible to bring my own samples back, but I may be able to entrust them to a colleague with fancier credentials than my own.

Insofar as it will be possible, I will be blogging from Kamchatka and uploading photographs and data, so please ask questions in the comments!

I'll be arriving in Petropavlovsk on the 30th of July, with the help of a generous grant from the Carnegie Institution for Science Deep Carbon Observatory.

Good luck, whoever you are

Posted by Russell on June 08, 2010 at 3:54 p.m.
Last week, I got an urgent call from the National Marrow Donor Program. Somewhere, there is a girl about to go into chemotherapy, and my tissue type matches hers. They needed to run some more detailed tissue typing, and screen for infectious diseases. The NMDP sent a kit to the UC Davis student health center (the new building is right next to my house), and I had my blood drawn this morning.

I hate giving blood. They didn't need very much, but I don't get along very well with steel needles. I count it as a major victory that I didn't barf until I got home.

Now we all wait for the results.

Good luck, whoever you are.

20s

Posted by Russell on May 31, 2010 at 9:17 p.m.
To steal the idea from John Scalzi, here is the last photograph of me in my 20s.

My twenties; better than my teens. Some good times, and some pretty awful times, and on average kind of meh. If the trend holds, my thirties should be in the tolerable to nice range. Hopefully the underlying process is geometric, and not linear or logarithmic.

Hence the awkward sort of half-smile.

Holly Allan-Young

Posted by Russell on May 26, 2010 at 10:57 p.m.

Terry Young: She is gone
Sent: 2:54PM
I will miss her terribly.

What Google knows

Posted by Russell on April 28, 2010 at 11:26 a.m.
After six months of using Google Latitude, I've amassed about 7108 location updates, or about 38 a day. It would probably be a lot more if I hadn't managed on occasion to break the GPS or automatic updating by fiddling with the software.

It's actually quite useful to have this data, especially if it's correlated with some richer information. For example, I've consulted the data to answer questions like, "Where was that awesome sandwich place I ate at last month?" It's also extremely useful to be able to share this data with Google because it allows me to quickly cross-reference location coordinates with Google's database of businesses and addresses. You can also download your complete location history in one giant blob (just ignore the warning that the History map only displays 500 datapoints, and download the KML file). Once you have the KML file, you can do whatever you want with it. For example, I uploaded mine to Indiemapper to map my wanderings for the last six months (Indiemapper is cool, but I quickly found that this dataset is really much too big for a Flash-based web application).
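As a rough illustration of "do whatever you want with it", here is a minimal Python sketch that pulls longitude/latitude pairs out of a KML document using only the standard library. It assumes the points live in ordinary KML 2.2 `<coordinates>` elements; the exact layout of the Latitude export isn't described here, so treat `parse_coordinates` and the `SAMPLE` document as hypothetical stand-ins rather than a description of Google's file.

```python
import xml.etree.ElementTree as ET

# Namespace used by KML 2.2 documents (assumed to apply to the export).
KML_NS = "{http://www.opengis.net/kml/2.2}"


def parse_coordinates(kml_text):
    """Return (longitude, latitude) tuples from every <coordinates> element."""
    root = ET.fromstring(kml_text)
    points = []
    for node in root.iter(KML_NS + "coordinates"):
        # Each whitespace-separated entry is "lon,lat[,altitude]".
        for entry in (node.text or "").split():
            parts = entry.split(",")
            points.append((float(parts[0]), float(parts[1])))
    return points


# A tiny made-up document standing in for the real download.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <LineString>
        <coordinates>
          -121.7405,38.5449,0 -122.2711,37.8044,0
        </coordinates>
      </LineString>
    </Placemark>
  </Document>
</kml>"""

points = parse_coordinates(SAMPLE)
for lon, lat in points:
    print(lat, lon)
```

From a list like `points` it is a short step to counting updates per day or dumping a CSV for whatever mapping tool you prefer.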

Not surprisingly, I spent most of my time in California, mostly in Davis and the Bay Area, with a few trips to Los Angeles via I-5, the Coast Starlight, and the San Joaquin (the density of points along those routes is indicative of the data service along the way).

The national map shows my trip to visit my dad's family in New Jersey and Massachusetts, as well as a layover in Denver that I'd completely forgotten about.

I have somewhat mixed feelings about this dataset. On one hand, it's very useful to have, and sharing it with my friends and with Google is very useful. It's also cool to have this sort of quantitative insight into my recent past so easily accessible. On the other hand, I'm not particularly happy with the idea that Google controls this data. I chose the word controls deliberately. I don't mind that they have the data -- after all, I did give it to them. As far as I know, Google has been a good citizen when it comes to keeping personal location data confidential. The Latitude documentation makes their policy pretty clear :

Privacy

Google Location History is an opt-in feature that you must explicitly enable for the Google Account you use with Google Latitude. Until you opt in to Location History, no Latitude location history beyond your most recently updated location if you aren't hiding is stored for your account. Your location history can only be viewed when you're signed in to your Google Account.

You may delete your location history by individual location, date range, or entire history. Keep in mind that disabling Location History will stop storing your locations from that point forward but will not remove existing history already stored for your Google Account.

...

If I delete my history, does Google keep a copy or can I recover it?

No. When you delete any part of your location history, it is deleted completely and permanently within 24 hours. Neither you nor Google can recover your deleted location history.

So, that's what they'll do with it, and I'm happy with that. What bothers me is this: Who owns this data?

This question leads directly to one of the most scorchingly controversial questions you could ask for, and there are profound legal, social, economic and moral outcomes riding on how we answer it. This isn't just about figuring out what coffee shops I like. If you want to see how high the stakes go, buy one of 23andMe's DNA tests. You're giving them access to perhaps the most personal dataset imaginable. In fairness, 23andMe has a very strong confidentiality policy.

But therein lies the problem -- it's a policy. Ambiguous or fungible confidentiality policies are at the heart of an increasing number of lawsuits and public snarls. For example, there is the case of the blood samples taken from the Havasupai Indians for use in diabetes research that turned up in research on schizophrenia. The tribe felt insulted and misled, and sued Arizona State University (the case was recently settled, the tribe prevailing on practically every item).

You can't mention informed consent and not revisit HeLa, the first immortal human cells known to science. HeLa was cultured from a tissue biopsy from Henrietta Lacks and shared among thousands of researchers -- even sold as a commercial product -- making her and her family some of the most studied humans in medical history. The biopsy, the culturing, the sharing and the research all happened without her knowledge or consent, or the knowledge or consent of her family.

And, of course, there is Facebook -- again. Their new "Instant Personalization" feature amounts to sharing information about personal relationships and cultural tastes with commercial partners on an opt-out basis. Unsurprisingly, people are pissed off.

Some types of data are specifically protected by statute. If you hire a lawyer, the data you share with them is protected by attorney-client privilege, and cannot be disclosed even by court order. Conversations with a psychiatrist are legally confidential under all but a handful of specifically described circumstances. Information you disclose to the Census cannot be used for any purpose other than the Census. Nevertheless, there are many types of data that have essentially no statutory confidentiality requirements, and these types of data are becoming more abundant, more detailed, and more valuable.

While I appreciate Google's promises, I'm disturbed that the only thing protecting my data is the goodwill of a company. While a company might be full of lots of good people, public companies are always punished for altruistic behavior sooner or later. There is always a constituency of assholes among shareholders who believe that the only profitable company is a mean company, and they'll sue to get their way. Managers must be very mindful of this fact as they navigate the ever changing markets, and so altruistic behavior in a public company can never be relied upon.

We cannot rely on thoughtful policies, ethical researchers or altruistic companies to keep our data under our control. The data we generate in the course of our daily lives is too valuable, and the incentives for abuse are overwhelming. I believe we should go back to the original question -- who owns this data? -- and answer it. The only justifiable answer is that the person described by the data owns the data, and may dictate the terms under which the data may be used.

People who want the data -- advertisers, researchers, statisticians, public servants -- fear that relinquishing their claim on this data will mean that they will lose it. I strongly disagree. I believe that people will share more freely if they know they can change their mind, and that the law will back them up.

Update

The EFF put together a very sad timeline of Facebook's privacy policies as they've evolved from 2005 to now. They conclude, depressingly :
Viewed together, the successive policies tell a clear story. Facebook originally earned its core base of users by offering them simple and powerful controls over their personal information. As Facebook grew larger and became more important, it could have chosen to maintain or improve those controls. Instead, it's slowly but surely helped itself — and its advertising and business partners — to more and more of its users' information, while limiting the users' options to control their own information.

A desirable extinction

Posted by Russell on March 25, 2010 at 3:19 p.m.
Some weeks ago, Buzz (my cat) escaped out my front door while I was carrying my bicycle into the apartment. For ten or twenty minutes, he romped through the ivy and bushes around my apartment while I followed him around rattling a bag of cat treats. Eventually, he let me pick him up and take him back inside. Naturally, he picked up a few fleas. Naturally, they have multiplied.

Oddly, the fleas don't seem to like Neil very much, nor do they like me. It's just poor Buzz that's beset by the nasty little critters.

Figure 1: A flea.


As it happens, I've been thinking about endogenous metrics for estimating the sampling quality of an environmental shotgun sequencing dataset, and Buzz's little problem presented an opportunity to play with a simplified problem. So, I have decided to make Buzz, or rather his fleas, into a small experiment in ecology. I am going to try to see if I can drive them into extinction.

Now, this is normally what a pet owner does when they discover their pet has contracted some sort of annoying parasite, but I decided to take a more quantitative approach.

Figure 2: A cat.


It's simple enough to count fleas on a cat, if the cat is willing to cooperate. Buzz loves the flea comb, and will gleefully hop onto the coffee table and wait to be combed if you show it to him. So, in the interest of science, I convinced my roommate to count the number of passes I made with the flea comb and how many fleas I captured (posterity will remember your efforts, Mehdi). Using his tally, I plotted the cumulative number of passes versus the cumulative number of fleas.

Figure 3: Fleas captured

As expected, it became somewhat more difficult to capture the next flea as more fleas were captured, suggesting a depletion curve. The value of the asymptote should be the actual number of fleas on Buzz at the time, and reaching that number would imply local extinction for the fleas. Of course, there are probably other fleas lurking about that would recolonize Buzz. In principle, if I were to repeat the exercise frequently enough, Buzz would become a sink for fleas, and their migration to his fur would gradually deplete them from the environment.

There are a couple of different ways to model the impact of the combing on the flea population, with various advantages and disadvantages. All we really want to do here is to estimate the value of the asymptote, and so a simple model is probably sufficient. I showed this data to my friend Sharon Shewmake, an economics graduate student. Sharon, after editorializing on the endeavor ("Ew."), suggested this very simple model.

Assume that Buzz is not going to sit still long enough for the fleas to reproduce or for more fleas to migrate to his fur, and that the fleas already on his fur are going to stay put unless captured. Thus, there is a fixed initial population which only changes as a result of capturing fleas. Next, we assume that any given flea is equally likely to be captured on a single pass of the comb. So, the expectation value for the number of fleas captured on a single pass is the product of the current population and the probability of capturing a flea.

where N is the population of fleas and p is the probability of any particular flea being captured on a single pass. One could tart this up a bit by modeling it as a stochastic process and executing a bunch of Monte Carlo trials until the outcomes converge, but that seems like overkill for a simple single variable problem like this. We will put up with the intellectual inconvenience of capturing fractional fleas.
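For the curious, the "overkill" stochastic version could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the function names are made up, each remaining flea is caught independently with probability p on every pass, and the values 40 and 0.011 are placeholders chosen to sit near the fit reported later in this post. The deterministic fractional-flea recurrence is included for comparison.

```python
import random


def simulate(n0, p, passes, rng):
    """One Monte Carlo trial; returns cumulative fleas captured after each pass."""
    remaining = n0
    caught = 0
    cumulative = []
    for _ in range(passes):
        # Each remaining flea is caught independently with probability p.
        hits = sum(1 for _ in range(remaining) if rng.random() < p)
        remaining -= hits
        caught += hits
        cumulative.append(caught)
    return cumulative


def expected_curve(n0, p, passes):
    """Deterministic counterpart: the comb removes p * N (fractional) fleas per pass."""
    remaining = float(n0)
    cumulative = []
    for _ in range(passes):
        remaining -= p * remaining
        cumulative.append(n0 - remaining)
    return cumulative


# Illustrative values only (not data from the combing session).
N0, P, PASSES, TRIALS = 40, 0.011, 173, 500

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [simulate(N0, P, PASSES, rng) for _ in range(TRIALS)]
mean_final = sum(t[-1] for t in trials) / TRIALS
expected_final = expected_curve(N0, P, PASSES)[-1]

print("mean captured over trials:", round(mean_final, 2))
print("expected (fractional fleas):", round(expected_final, 2))
```

With enough trials the average should settle near the deterministic curve, which is the argument for skipping the Monte Carlo step and tolerating fractional fleas.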

This is a little easier to see if we let N represent the number of fleas remaining on the cat, rather than the number of fleas captured.

If we stretch our credulity far enough to imagine this as a continuous function, we can express it as a differential equation.

Sorry if this bothers you. Not only are we extracting fractional fleas, but we are now modeling the combing process as a sort of flea-killing-combine continuously mowing its way through the fur. This is a model, so you shouldn't be surprised to find massless rope and spherical cows. Anyway, it has a nice easy solution.

Well, what the heck. This is a decaying function, so let's pluck a minus sign out of the exponential factor, and maybe tack on a scale factor for the initial population.

While we're at it, why don't we go back to letting the function stand for the number of fleas captured, rather than the fleas on the cat.
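Written out in one place, the chain of steps in this section looks something like the following. This is a reconstruction from the prose, not the typeset equations themselves; the symbols x (number of passes with the comb), N_0 (initial flea population) and C (cumulative fleas captured) are assumed names, and the minus sign is written explicitly from the start.

```latex
\begin{align*}
\mathbb{E}[\text{fleas captured on one pass}] &= pN
  && \text{expected catch per pass} \\
N_{k+1} &= N_k - pN_k
  && \text{fleas remaining after pass } k+1 \\
\frac{dN}{dx} &= -pN
  && \text{continuous approximation} \\
N(x) &= N_0\, e^{-px}
  && \text{fleas remaining after } x \text{ passes} \\
C(x) &= N_0 - N(x) = N_0\left(1 - e^{-px}\right)
  && \text{fleas captured after } x \text{ passes}
\end{align*}
```

The last line is the saturating curve whose asymptote, N_0, is the quantity of interest.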

This gives us a nice function to use for a nonlinear regression. A little help from scipy, and we find that the initial population is estimated at 39.7 fleas, and the decay factor is 0.011.

Figure 4: Flea population

I captured 34 fleas, so that means I missed about five or six. In order to be reasonably confident that I'd captured all 39 fleas, I would have had to continue for about 400 passes with the comb, instead of 173. Buzz is a patient cat, but he started to lose interest around 120 passes, and had to be fetched back onto the coffee table a few times during the last 50 passes. My guess is that 400 passes would require some kind of sedative. On the other hand, he does seem to like Guinness, so there may be something to that.
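The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the fitted curve has the form N0 * (1 - exp(-p * x)), plugs in the two fitted values quoted in this post, and uses made-up function names.

```python
import math

N0 = 39.7  # fitted initial population quoted in the post
P = 0.011  # fitted decay factor quoted in the post


def captured(passes, n0=N0, p=P):
    """Fleas the model expects to have been captured after `passes` passes."""
    return n0 * (1.0 - math.exp(-p * passes))


def passes_needed(fraction, p=P):
    """Passes required for the model to capture `fraction` of the population."""
    return -math.log(1.0 - fraction) / p


print(round(captured(173), 1))     # 33.8, close to the 34 actually caught
print(round(captured(400), 1))     # 39.2
print(round(passes_needed(0.99)))  # 419
```

So under these assumptions 173 passes accounts for about 34 of the estimated 39.7 fleas, and getting within one percent of the asymptote takes a little over 400 passes.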

Science has been served. I'm going to the pet store to buy some flea collars.

Espresso

Posted by Russell on March 13, 2010 at 3:56 p.m.
A few weeks ago, my dad sent me this really nice espresso machine to cheer me up. Actually, he sent it to me because it was his birthday. He's a pretty awesome dad that way -- I only sent him a book.

I'm still getting the hang of getting a decent pull of espresso out of it. I've found that my burr grinder doesn't quite go fine enough for espresso, so I'm going to have to take it apart and see if I can adjust the grinding wheels so they're closer together. Anyway, here is my latest effort :

Facepaw

Posted by Russell on February 10, 2010 at 3:42 a.m.
Spent the day writing a piece of code I already wrote six months ago. Not sure how I managed to forget. The new code wasn't very good, so I threw it away. Day down the tube.

Even asleep, Neil seems to understand.

Loss

Posted by Russell on February 03, 2010 at 11:48 p.m.

I normally don't talk a lot about my personal life on my blog, and except for the occasional announcement, I'd like to keep it that way. People's little triumphs and tragedies are mainly interesting to those directly involved, and are at best kind of boring to everyone else. A lot of my friends and family do read this blog, but by and large most of you are strangers or acquaintances. I try to respect that.

Those of you who are close to me know that I'm going through a sad time in my life right now. Those of you who work or study with me have probably noticed that I've not been my usual cheerful self. In deference to the many people who aren't here to read about that, and the fact that I can barely think about it (never mind write about it), I'm not going to discuss what's happened on my blog.

The one thing that has helped has been hearing about all the cool things that other people are doing. So, even though I'm not exactly Mr. Social right now, please don't take that as a sign that I want to be left alone.

On the contrary. Now would be a great time to tell me about whatever is on your mind, especially if it's cool.

To those of you who've been kind enough to treat me like a normal person over the last two weeks despite my melancholy behavior, I owe you guys. Really.