Russell's Blog

New. Improved. Stays crunchy in milk.

A plague of duplicates

Posted by Russell on April 19, 2008 at 7:47 a.m.
I have a bad habit. I often open my IMAP mailbox directly on the mail host with mutt. This inevitably causes occasional complicated messes when I butterfinger something. For example, a few days ago I accidentally moved all of the read messages to a separate mailbox. I merged the messages back in, but not before my desktop, which was patiently monitoring my IMAP mailbox on the other side of town, decided to synchronize a few thousand messages with the IMAP server. This resulted in lots and lots of duplicate messages.

The trouble was, the duplicate messages had different X-IDs, so their MD5 hashes would be different. After fiddling around with formail for a few minutes, I got impatient and banged out this fun little Python hack, which matches messages by their Message-ID header instead :

import email, imaplib, getpass

# The server name is masked here; substitute your own IMAP host.
M = imaplib.IMAP4_SSL( '**********' )

typ, data = M.login( getpass.getuser(), getpass.getpass() )
if typ != 'OK' :
    raise Exception( 'Login failed.' )

typ, data = M.select()
if typ != 'OK' :
    raise Exception( 'Selection failed.' )

typ, data = M.search( None, 'ALL' )
if typ != 'OK' :
    raise Exception( 'Could not get message IDs.' )

# Collect a (Message-ID, message number) pair for every message.
id_list = data[0].split()
mids = []
for id in id_list :
    typ, data = M.fetch( id, '(RFC822)' )
    if typ != 'OK' :
        raise Exception( 'Could not fetch message ' + id.decode() )
    mail = email.message_from_bytes( data[0][1] )
    mID = mail.get( 'message-id' )
    print( mID )
    # Skip messages with no Message-ID; they can't be matched safely.
    if mID is not None :
        mids.append( (mID, id) )

# After sorting, copies of the same message sit next to each other.
mids.sort()

# Keep the first message of each run and collect the rest.
dupes = []
for i in range( len(mids) - 1 ) :
    if mids[i][0] == mids[i+1][0] :
        dupes.append( mids[i+1] )

print( 'Found %d duplicate messages.' % len(dupes) )

for m in dupes :
    typ, data = M.store( m[1], "+FLAGS", '(\\Deleted)')

print( 'Marked %d for deletion.' % len(dupes) )

typ, data = M.expunge()

# expunge() returns [None] when nothing was removed.
print( 'Expunged %d messages.' % len( [d for d in data if d] ) )
Duplicates begone!
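
To see why hashing whole messages was a dead end, here is a small offline sketch. The three sample messages and their X-ID values are made up for illustration, and md5_of, message_id and find_duplicates are hypothetical helpers, not part of the script above. It uses a set where the script sorts, but the idea of keying on Message-ID is the same.

```python
import email
import hashlib

# Made-up sample data: COPY_A and COPY_B are the same message,
# differing only in the (assumed) X-ID header.
COPY_A = b"Message-ID: <abc@example.org>\r\nX-ID: 1001\r\nSubject: hi\r\n\r\nHello\r\n"
COPY_B = b"Message-ID: <abc@example.org>\r\nX-ID: 2002\r\nSubject: hi\r\n\r\nHello\r\n"
OTHER = b"Message-ID: <xyz@example.org>\r\nX-ID: 3003\r\nSubject: yo\r\n\r\nBye\r\n"

def md5_of(raw):
    """MD5 of the raw message bytes, headers included."""
    return hashlib.md5(raw).hexdigest()

def message_id(raw):
    """The Message-ID header of a raw message, or None if it has none."""
    return email.message_from_bytes(raw).get('message-id')

def find_duplicates(messages):
    """Take (number, raw_bytes) pairs; return the numbers of later copies."""
    seen = set()
    dupes = []
    for num, raw in messages:
        mid = message_id(raw)
        if mid is None:
            continue  # nothing to match on, so leave the message alone
        if mid in seen:
            dupes.append(num)
        else:
            seen.add(mid)
    return dupes

# The hashes disagree even though the two copies are the same mail.
print(md5_of(COPY_A) == md5_of(COPY_B))
# Matching on Message-ID flags only the second copy.
print(find_duplicates([(b'1', COPY_A), (b'2', OTHER), (b'3', COPY_B)]))
```

One changed header is enough to change the hash, while the Message-ID survives the round trip untouched.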

It's a little annoying that imaplib doesn't have a friendly wrapper function for marking messages for deletion, but M.store( m[1], "+FLAGS", '(\\Deleted)') does the job just fine.
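
If you want the friendly wrapper anyway, it only takes a few lines. This is a rough sketch: mark_deleted is a name I made up, not something imaplib provides, and conn is assumed to be a logged-in connection with a mailbox already selected.

```python
import imaplib

def mark_deleted(conn, msg_num):
    """Flag one message as deleted; it is removed on the next expunge().

    Hypothetical helper: conn is assumed to be an imaplib.IMAP4 (or
    IMAP4_SSL) object, msg_num a message number as returned by search().
    """
    typ, data = conn.store(msg_num, '+FLAGS', '(\\Deleted)')
    if typ != 'OK':
        raise imaplib.IMAP4.error(
            'Could not mark message %r for deletion.' % (msg_num,))
    return data
```

With that in place, the loop in the script would read mark_deleted( M, m[1] ) instead of the bare store call.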

Vort.org now running on Django

Posted by Russell on March 27, 2008 at 12:53 a.m.
For the last couple of years, I've been running Typo, a Ruby on Rails blogging tool. It was nice, but there were a few persistent problems I encountered :
  • It was sloooooow. Nothing I did seemed to get it to run faster, even with carefully tuned caching.
  • It was unstable. Typo would run happily for months, and then mysteriously explode. This usually happened while I was traveling, or busy with something more important.
  • It was difficult to fix. Usually, when Typo went down, it took a few days of research and pestering people to figure out why.
  • The database migrations between versions were awful. You'll notice that the first year of posts don't have any tags. They were deleted by a bad migration. I have backups, but merging them back in is nightmarish.
A lot of the problems I experienced are with older versions of Typo. They've definitely gotten better over the years. But there is one big reason I'm abandoning Typo: I don't want to code in Ruby. I do most of my coding in Python. I like Python. It seems silly not to use the knowledge I have from doing computational physics in Python.

I have used Blogmaker for most of the main elements on my site, but with a fair bit of hacking to make it do more of what I want. I also wrote a Typo-to-Django import utility, if anyone is interested. The URLs are slightly different, so I'm going to watch the 404s for a few days.