Andrew vs SPAM Part III

Today I received an email that was quite clever. The spammer, knowing that spam filters are very good at checking text, created a multipart message with HTML and GIF images. With images as the payload, the message gets through, and I’d be willing to bet a high percentage of users have their mail clients set up to display HTML emails rather than the plaintext alternatives. Thus the chances of the reader seeing the images, and the text they contain, are greatly increased. Then again, since this message didn’t have a plaintext portion, a smart email client might display the HTML anyway, just to show the user something. I haven’t tested this theory.
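For the curious, the shape of such a message (HTML plus images, no plaintext part) is easy to spot with Python's standard `email` package. This is only a rough sketch: the function names `looks_like_image_spam` and `build_sample` and the sample addresses are made up for illustration, and plenty of legitimate mail has the same shape, so treat it as one signal rather than a verdict.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def looks_like_image_spam(raw: bytes) -> bool:
    """Return True for a message with HTML and images but no plaintext part."""
    msg = message_from_bytes(raw, policy=default)
    # walk() visits the container parts and every leaf part of the message.
    types = [part.get_content_type() for part in msg.walk()]
    has_plain = "text/plain" in types
    has_html = "text/html" in types
    has_image = any(t.startswith("image/") for t in types)
    return has_html and has_image and not has_plain


def build_sample() -> bytes:
    """Build a hypothetical HTML-plus-GIF message to exercise the check."""
    msg = EmailMessage()
    msg["Subject"] = "hello"
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg.set_content('<html><body><img src="cid:part1"></body></html>', subtype="html")
    # Placeholder bytes, not a real image; only the MIME type matters here.
    msg.add_related(b"GIF89a", maintype="image", subtype="gif", cid="<part1>")
    return msg.as_bytes()


if __name__ == "__main__":
    print(looks_like_image_spam(build_sample()))
```

An ordinary plaintext message fails the check, since it has neither an HTML part nor an image part.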

However, this creates a challenge for those of us who are into OCR and image processing. Can we assemble the separated images, find all the text within them, and run SPAM filtering on that text fast enough to make it a useful operation? Have the SPAMMERS finally won? Will Batman save Robin before The Joker’s laughing gas makes Robin laugh so hard he comes out of the closet?
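To make the question concrete, here is a rough sketch of what such a pipeline might look like in Python. It is full of assumptions: `classify_image_text` is a made-up name, `ocr` and `is_spam` are placeholders for whatever OCR engine and text filter you already have, "assembling" the images is reduced to joining their text in message order, and the one-second budget is an arbitrary stand-in for "fast enough".

```python
import time
from email import message_from_bytes
from email.policy import default
from typing import Callable, Optional


def classify_image_text(
    raw: bytes,
    ocr: Callable[[bytes], str],
    is_spam: Callable[[str], bool],
    budget_seconds: float = 1.0,
) -> Optional[bool]:
    """OCR every image part and filter the combined text.

    Returns None when the time budget runs out before all images are read.
    """
    start = time.monotonic()
    msg = message_from_bytes(raw, policy=default)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # Give up rather than hold the mail queue hostage to slow OCR.
        if time.monotonic() - start > budget_seconds:
            return None
        # decode=True undoes the base64 transfer encoding and yields raw bytes.
        chunks.append(ocr(part.get_payload(decode=True)))
    # Assumption: text from separate images reads correctly in message order.
    return is_spam(" ".join(chunks))
```

Whether the answer to "fast enough" is yes depends almost entirely on the `ocr` callable you plug in, which is exactly the open question.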

— 2006-06-14