Anyway, here is the de Bruijn graph for the sequence gggctagcgtttaagttcga projected into 4-mer space :
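If you want to rebuild that little graph yourself, here is a minimal sketch. It assumes one common convention, where every k-mer is a node and an edge joins each k-mer to the one that follows it in the sequence. The function name `de_bruijn_edges` is my own invention for illustration, and it only walks the forward strand.

```python
from collections import defaultdict


def de_bruijn_edges(sequence, k):
    """Map each k-mer to the set of k-mers that directly follow it.

    Assumed convention: nodes are k-mers, and an edge joins two k-mers
    that sit at consecutive positions in the sequence (so they overlap
    by k - 1 characters).
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    graph = defaultdict(set)
    for left, right in zip(kmers, kmers[1:]):
        graph[left].add(right)
    return dict(graph)


graph = de_bruijn_edges("gggctagcgtttaagttcga", 4)

# A node with more than one successor would be a branch point.
max_out_degree = max(len(successors) for successors in graph.values())
```

For this sequence every 4-mer is distinct, so the forward strand is a single unbranched path and `max_out_degree` comes out as 1.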
This is the de Bruijn graph in 32-mer space for a longer sequence (it happens to be a 16S rRNA sequence for a newly discovered, soon-to-be-announced species of Archaea).
It looks like a big scribble because it's folded up to fit into the viewing box. Topologically, it's actually just two long strands: one for the forward sequence, and one for its reverse complement. There are only four termini, and if you follow them around the scribble, you won't find any branching.
I'm not quite ready to talk about what I've been finding, but I thought I would take a moment to share a little tool I wrote along the way. It's made my life a lot easier, and maybe other people could get some use out of it.
It's called raygun, a very simple Python interface for running local NCBI BLAST queries. You initialize a RayGun object with a FASTA file containing your target sequences, and then you can query it with strings or other FASTA files. It parses the BLAST output into a list of dictionary objects, so that you can get right to work.
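To give a feel for what a wrapper like this has to do, here is a rough sketch of the do-it-yourself version. This is not raygun's code. It assumes the BLAST+ `blastn` program is on your path, and it uses its tabular output (`-outfmt 6`), whose twelve default columns are listed in `FIELDS`. The function names and the dictionary keys are my own illustrative choices (only `'subject'` is borrowed from the way hits are indexed in the raygun example), so don't read them as raygun's actual field names.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
# The key names are illustrative, not raygun's.
FIELDS = [
    "query", "subject", "identity", "length", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "evalue", "bitscore",
]
INT_FIELDS = ["length", "mismatches", "gap_opens",
              "q_start", "q_end", "s_start", "s_end"]
FLOAT_FIELDS = ["identity", "evalue", "bitscore"]


def parse_tabular(text):
    """Turn tab-separated BLAST output into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def blast_file(query_fasta, subject_fasta):
    """Hypothetical helper: run blastn on two FASTA files and parse the hits."""
    result = subprocess.run(
        ["blastn", "-query", query_fasta, "-subject", subject_fasta,
         "-outfmt", "6"],
        capture_output=True, text=True, check=True,
    )
    return parse_tabular(result.stdout)
```

Keeping the parser separate from the subprocess call means you can try it on a saved BLAST report without having BLAST installed.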
It doesn't take a lot of scripting chops to do this without an interface, of course, and there are other Python tools for running BLAST queries. The advantage of raygun over either the DIY approach or the BioPython approach is that raygun is extremely simple to use. I wanted something that would basically be point-and-shoot :
```python
import raygun
import cleverness

# Build a RayGun object from the FASTA file of target sequences.
rg = raygun.RayGun( 'ZOMG_DNA_OMG_OMG.fa' )

# Query it with another FASTA file; each hit is a dictionary.
hits = rg.blastfile( 'very_clever_query.fa' )

results = []
for hit in hits :
    results.append( cleverness.good_idea( hit[ 'subject' ] ) )

cleverness.output_phd_thesis( results )
```

Unfortunately, you must furnish your own implementation of the cleverness module.
I designed raygun with interactive use in mind, particularly with ipython (by the way, if you do a lot of work in Python and you're not using ipython, you're being silly). The code is available on GitHub.