This is part 2 in a series of articles about what powers Beetle behind the scenes. Last time we looked at how Beetle signs up to email campaigns. This time we will look at the next step in the process: downloading our emails.
We use Google Apps to manage the multiple email domains we use when we sign up. All email gets caught by one ‘catchall’ inbox for each domain.
The Fetcher opens IMAP connections to all inboxes and listens for new email. Our legacy systems would process each email as soon as it came in. As email volume ramped up, we found we needed to process email in batches instead. New messages accumulate until the batch size is reached, and then all emails in the batch are downloaded at once.
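The batching idea can be sketched in a few lines of Python. This is only an illustration, not the Fetcher's actual code: the `BatchBuffer` class, its `add` method, and the batch size used below are all hypothetical names and values.

```python
from typing import List, Optional


class BatchBuffer:
    """Collects message IDs and releases them in fixed-size batches.

    Hypothetical helper for illustration; not taken from the Fetcher.
    """

    def __init__(self, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self._pending: List[int] = []

    def add(self, uid: int) -> Optional[List[int]]:
        """Queue one message ID; return a full batch once the size is reached."""
        self._pending.append(uid)
        if len(self._pending) < self.batch_size:
            return None  # keep waiting for more email
        # Hand back the full batch and start a fresh one.
        batch, self._pending = self._pending, []
        return batch
```

A listener would call `add` for every new message it is notified about and start a download only when `add` returns a list instead of `None`.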
Once the raw files are downloaded, we decode each email (both the text and HTML parts) and give it a unique name based on when we received it and its subject line. We discard any attachments.
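As a rough sketch of this step, here is how the decoding and naming might look using Python's standard `email` package. The function names `decode_email` and `unique_name`, and the exact naming scheme (a UTC timestamp followed by a slug of the subject), are assumptions made for the example; the article does not specify the real format.

```python
import re
from datetime import datetime, timezone
from email import message_from_bytes, policy
from typing import Tuple


def decode_email(raw: bytes) -> Tuple[str, str, str]:
    """Return (subject, text, html) for a raw message, ignoring attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    # get_body() skips attachment parts, so they are dropped implicitly.
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    text = text_part.get_content() if text_part is not None else ""
    html = html_part.get_content() if html_part is not None else ""
    subject = str(msg["subject"]) if msg["subject"] is not None else ""
    return subject, text, html


def unique_name(received_at: datetime, subject: str) -> str:
    """Build a file name from the receive time and subject (assumed scheme)."""
    stamp = received_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Reduce the subject to lowercase letters and digits joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")
    return f"{stamp}-{slug or 'no-subject'}"
```

Note that a timestamp with one-second resolution plus the subject is only unique if two emails with the same subject never arrive in the same second; a real scheme would likely need a finer timestamp or an extra identifier.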
The downloaded files are then uploaded to Amazon S3 and the details are saved to our MySQL database. At this stage we leave the content of the emails completely untouched.
Just as they are downloaded in batches, the emails are also marked as read in batches. When our volume started to increase, we ran into problems marking large numbers of emails as read in quick succession.
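One way to mark messages as read in batches over IMAP is to send a single `STORE` command per group of message UIDs rather than one command per message. The sketch below assumes a connection object that behaves like Python's `imaplib.IMAP4` (only its `uid` method is used); the function names and the default batch size are hypothetical, and the article does not say which IMAP library the Fetcher actually uses.

```python
from typing import Iterator, List, Sequence


def chunked(uids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` UIDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(uids), size):
        yield list(uids[start:start + size])


def mark_read_in_batches(conn, uids: Sequence[int], batch_size: int = 100) -> int:
    """Set the \\Seen flag on all UIDs, one STORE command per batch.

    `conn` is assumed to be an imaplib.IMAP4-style connection with a
    mailbox already selected. Returns the number of commands sent.
    """
    commands = 0
    for batch in chunked(uids, batch_size):
        uid_set = ",".join(str(uid) for uid in batch)
        conn.uid("STORE", uid_set, "+FLAGS", "(\\Seen)")
        commands += 1
    return commands
```

With a batch size of 100, marking 1,000 messages as read takes 10 commands instead of 1,000, which keeps the number of round trips to the mail server small.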
The Fetcher is a relatively simple application, and it handles one of the simpler stages that emails must go through before being visible on the site.
In the next article we’ll look at the most complicated application in the process: The Localiser.