Russell's Blog

New. Improved. Stays crunchy in milk.

A plague of duplicates

Posted by Russell on April 19, 2008 at 7:47 a.m.
I have a bad habit. I often open my IMAP mailbox directly on the mail host with mutt. This inevitably leads to the occasional complicated mess when I butterfinger something. For example, a few days ago I accidentally moved all of the read messages to a separate mailbox. I merged the messages back in, but not before my desktop, which was patiently monitoring my IMAP mailbox on the other side of town, decided to synchronize a few thousand messages with the IMAP server. This resulted in lots and lots of duplicate messages.

The trouble was that the duplicate messages had different X-IDs, so their MD5 hashes were different. After fiddling around with formail for a few minutes, I got impatient and banged out this fun little Python hack:

import email, imaplib, getpass

M = imaplib.IMAP4_SSL( '**********' )

typ, data = M.login( getpass.getuser(), getpass.getpass() )
if typ != 'OK' :
    raise Exception( 'Login failed.' )

typ, data = M.select()
if typ != 'OK' :
    raise Exception( 'Selection failed.' )

typ, data = M.search( None, 'ALL' )
if typ != 'OK' :
    raise Exception( 'Could not get message IDs.' )

# Collect (Message-ID, sequence number) pairs for every message.
id_list = data[0].split()
mids = []
for id in id_list :
    typ, data = M.fetch( id, '(RFC822)' )
    if typ != 'OK' :
        raise Exception( 'Could not fetch message ' + id.decode() )
    mail = email.message_from_bytes( data[0][1] )
    mID = mail.get( 'message-id' )
    print( mID )
    # Skip messages with no Message-ID; they cannot be compared safely.
    if mID is not None :
        mids.append( (mID, id) )

# After sorting, copies of the same message sit next to each other.
mids.sort()

dupes = []
for i in range(len(mids) - 1) :
    if mids[i][0] == mids[i+1][0] :
        dupes.append( mids[i+1] )

print( 'Found %d duplicate messages.' % len(dupes) )

for m in dupes :
    typ, data = M.store( m[1], "+FLAGS", '(\\Deleted)')

print( 'Marked %d for deletion.' % len(dupes) )

typ, data = M.expunge()

# expunge() returns [None] when nothing was removed.
print( 'Expunged %d messages.' % len([d for d in data if d]) )
Duplicates begone!
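
To see why hashing whole messages misses this kind of duplicate while comparing Message-IDs catches it, here is a minimal sketch. The two raw messages, the `X-ID` header name and its values, and the helper names `md5_of` and `message_id_of` are all made up for illustration; real messages will carry whatever headers the server actually adds.

```python
import email
import hashlib

# Hypothetical raw messages: identical apart from the X-ID header.
TEMPLATE = (
    "Message-ID: <1234@example.org>\r\n"
    "X-ID: %s\r\n"
    "Subject: hello\r\n"
    "\r\n"
    "Same body in both copies.\r\n"
)
copy_a = (TEMPLATE % "aaa111").encode("ascii")
copy_b = (TEMPLATE % "bbb222").encode("ascii")


def md5_of(raw):
    """Hash the complete raw message, headers included."""
    return hashlib.md5(raw).hexdigest()


def message_id_of(raw):
    """Parse the raw bytes and return only the Message-ID header."""
    return email.message_from_bytes(raw).get("message-id")


# One differing header is enough to change the hash of the whole message...
hashes_match = md5_of(copy_a) == md5_of(copy_b)
# ...but the Message-ID is the same in both copies.
ids_match = message_id_of(copy_a) == message_id_of(copy_b)

print(hashes_match, ids_match)
```

The hash covers every byte of the message, so any per-copy header defeats it, whereas the Message-ID is set once by the sender and survives the copy.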

It's a little annoying that imaplib doesn't have a friendly wrapper function for marking messages for deletion, but M.store( m[1], "+FLAGS", '(\\Deleted)') does the job just fine.
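
If the raw call bothers you, a tiny wrapper is easy enough to write. This is only a sketch: `mark_deleted` is a hypothetical helper, not part of imaplib, and `FakeConnection` is a stand-in that mimics the return shape of `store()` so the example runs without a mail server.

```python
def mark_deleted(conn, msg_num):
    """Flag one message as \\Deleted; return True if the server said OK.

    `conn` is anything with an imaplib-style store() method.
    """
    typ, data = conn.store(msg_num, '+FLAGS', '(\\Deleted)')
    return typ == 'OK'


class FakeConnection:
    """Stand-in for imaplib.IMAP4 so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def store(self, message_set, command, flags):
        # Record the call and answer the way a happy server would.
        self.calls.append((message_set, command, flags))
        return 'OK', [b'1 (FLAGS (\\Deleted))']


conn = FakeConnection()
print(mark_deleted(conn, b'1'))
```

With a real connection you would pass the `imaplib.IMAP4_SSL` object instead of the fake one, and still call `expunge()` afterwards to actually remove the flagged messages.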

Vort.org now running on Django

Posted by Russell on March 27, 2008 at 12:53 a.m.
For the last couple of years, I've been running Typo, a Ruby on Rails blogging tool. It was nice, but there were a few persistent problems:
  • It was sloooooow. Nothing I did seemed to get it to run faster, even with carefully tuned caching.
  • It was unstable. Typo would run happily for months, and then mysteriously explode. This usually happened while I was traveling, or busy with something more important.
  • It was difficult to fix. Usually, when Typo went down, it took a few days of research and pestering people to figure out why.
  • The database migrations between versions were awful. You'll notice that the first year of posts don't have any tags. They were deleted by a bad migration. I have backups, but merging them back in is nightmarish.
A lot of the problems I experienced were with older versions of Typo. It has definitely gotten better over the years. But there is one big reason I'm abandoning Typo: I don't want to code in Ruby. I do most of my coding in Python. I like Python. It seems silly not to use the knowledge I have from doing computational physics in Python.

I have used Blogmaker for most of the main elements on my site, but with a fair bit of hacking to make it do more of what I want. I also wrote a Typo-to-Django import utility, if anyone is interested. The URLs are slightly different, so I'm going to watch the 404s for a few days.

Tasty Python Snacks

Posted by Russell on November 06, 2006 at 2:51 p.m.
Languages like Python are great for tasks that require manipulating complicated data. The main complaint about such languages is that they aren't as fast as compiled languages. For scientific computing (and a lot of other things), that's a show-stopper. However...
from weave import inline
a = 25
code = \
	"""
	int i = a;
	while( i > 1 ) {
		printf("a number: %d\\n",i);
		i = i / 2;
	}
	return_val = i;
	"""
print( inline(code, ['a']) )
===output===

a number: 25
a number: 12
a number: 6
a number: 3
1
My mind just spins thinking of all the horrible, horrible things I can do with that.
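
For comparison, here is a rough pure-Python sketch of what the inline C loop does. The function name `halve_until_one` and the `seen` list are my own additions for illustration; only the loop itself mirrors the C snippet.

```python
def halve_until_one(a):
    """Mirror the C loop: report i, halve it, stop once i <= 1."""
    i = a
    seen = []
    while i > 1:
        print("a number: %d" % i)
        seen.append(i)
        i = i // 2  # integer division, like C's int / int for positive values
    return i, seen


result, seen = halve_until_one(25)
print(result)
```

The C version earns its keep only when the loop body is hot enough for the interpreter overhead to matter; for a handful of iterations like this, the two are interchangeable.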