Pulling Mail Out of Gmail And Retaining Your Labels

If you are fed up with Gmail and want to pull all your mail, here is how you do it. This technique was used on over 30 mail accounts so I’m sure it will work for you.

The problem of exporting your mail from Gmail is not a trivial one. There is plenty written about it: the posting by Opera Software’s lead QA for Opera Mail on Gmail’s Buggy IMAP Implementation, Matt Cutts’ posting on How to back up your Gmail on Linux in four easy steps, LifeHacker’s posting on Back up Gmail on Linux with Getmail, and Wired’s wiki entry on Make a Local Backup Of Your Gmail Account. Even so, there seems to be no single definitive source on how to pull your mail and retain your labels.

So here is what I’ve done to solve this problem:

  1. Use getmail – this has been the best archiver I’ve run across. There are other applications – isync, OfflineIMAP, Fetchmail, etc. – that probably do a decent job, but getmail is still the best in my view. There are other hacks too – use a desktop mail client to sync the Gmail IMAP directory, then convert emlx to maildir; same for Thunderbird and mbox; etc. – but we wanted something a little more straightforward – Occam’s razor, right?
  2. Install getmail – On my dev machine, I used macports (port install python25; port install getmail) to install the latest getmail which had dependencies on Python 2.5. After this was done, I set up the getmailrc config file and fired off an attempt using SimpleIMAPSSLRetriever… which failed due to a lack of SSL in the newly installed Python. I had to go back and install Readline (port install py25-readline), then install SSL for Python (port install py25-socket-ssl).
  3. Patch Python – There is a malloc bug in imaplib when fetching large documents using SSL. So open up the imaplib source file from your Python lib dir (in my case /opt/local/lib/python2.5/) and replace:
    data =

    with:

    data =, 16384))

    so that each read asks for at most 16384 bytes (16 KB) at a time.

  4. Configure getmail – Now that most of the fun is taken care of, we need to set up a configuration file for getmail (~/.getmail/getmailrc) and create the proper local destination. First the getmailrc file:
    [retriever]
    type = SimpleIMAPSSLRetriever
    server =
    mailboxes = ("[Gmail]/Starred",)
    username =
    password = xxx

    [destination]
    type = Maildir
    path = ~/Maildir/

    [options]
    verbose = 2
    message_log = ~/.getmail/gmail.log

    First of all, we are using IMAP to retrieve mail as POP has a limit of 99 documents per access and that would take forever.

    Second, we are using the Maildir format for the destination so we need to make sure the target directories have been created (~/Maildir/cur, ~/Maildir/new, ~/Maildir/tmp).

    Third, we need to specify a mailbox or mailboxes to download or the INBOX will be the default.

    Fourth, we need a trailing comma on the list of mailboxes. This looks like a parsing quirk in getmail, but the mailboxes option has to be a Python-style tuple, and with a single mailbox the trailing comma is what makes it a tuple rather than a plain parenthesized string.

    Fifth, we need to know the syntax of Gmail’s internal IMAP structure to pull down discrete folders. Non-label folders (Starred, Sent Mail, Drafts, etc.) are accessed with “[Gmail]/Starred” (as in the above config) and labels are accessed directly. For example, the label “Important Project” would have this in the config:

    mailboxes = ("Important Project",)
  5. Download your Gmail – For every folder/label I had within Gmail, I downloaded to a separate folder so I could import into dovecot IMAP without hassle. This entailed changing the mailboxes option in getmailrc, running getmail, renaming Maildir to label/directory name, rinsing, repeating.
  6. Retain Times – Because maildir uses the modification time of every file to determine the sent date, all emails pulled by the above method will basically lose their sense of time. The below PHP script will restore the modification times:
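<?php
/* VARS ***********************************************************/
// SITE_DIR must already be defined as the directory that holds Maildir/ (with a trailing slash).
$box = '';  // name of the downloaded label directory
$stem = SITE_DIR.'Maildir/'.$box.'/new/';

$dir_contents = scandir($stem);
foreach($dir_contents as $item) {
  if(!ListFind('.,..,.DS_Store',$item)) {
    $file = $stem.$item;
    $content = file_get_contents($file);
    // Prepend a newline so a Date header on the very first line still matches.
    $date = extractText("\n".$content,"\nDate: ","\n");
    if($date === false) { continue; }   // no Date header found
    $utime = strtotime($date);
    if($utime === false) { continue; }  // Date header could not be parsed
    $converted = date('YmdHi.s',$utime);
    shell_exec('touch -mt '.$converted.' '.escapeshellarg($file));
  }
}

// Return the text between the first $start and the next $end after it.
function extractText($content,$start,$end) {
  if(stripos($content,$start)===false) { return false; }
  $startpoint = stripos($content,$start)+strlen($start);
  $endpoint = stripos($content,$end,$startpoint);
  if($endpoint === false) { $endpoint = strlen($content); }
  $length = $endpoint - $startpoint;
  return trim(substr($content,$startpoint,$length));
}

// ListFind was not part of the original listing; this is a minimal stand-in
// that returns true when $inItem is one of the delimited entries in $inList.
function ListFind($inList, $inItem, $inDelim = ',') {
  return in_array($inItem, _listFuncs_PrepListAsArray($inList, $inDelim), true);
}

// Remove the entry at 1-based position $inPosition from a delimited list.
function ListDeleteAt($inList, $inPosition, $inDelim = ',') {
  $aryList = _listFuncs_PrepListAsArray($inList, $inDelim);
  array_splice($aryList, $inPosition-1, 1);
  $outList = join($inDelim, $aryList);
  return $outList;
}

// Split a delimited list into an array, ignoring leading and trailing delimiters.
function _listFuncs_PrepListAsArray($inList, $inDelim) {
  $inList = trim($inList);
  $inList = preg_replace('/^' . preg_quote($inDelim, '/') . '+/', '', $inList);
  $inList = preg_replace('/' . preg_quote($inDelim, '/') . '+$/', '', $inList);
  $outArray = preg_split('/' . preg_quote($inDelim, '/') . '+/', $inList);
  if(sizeof($outArray) == 1 && $outArray[0] == '') {
    $outArray = array();
  }
  return $outArray;
}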
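
The patch in step 3 is about limiting how much data is requested from the SSL socket in one call. The code fragments in that step are incomplete, so the following is only a sketch of the general idea and not the imaplib source: keep reading until the requested number of bytes has arrived, but never ask for more than 16384 bytes at once. The function name `read_exact` and the `CHUNK_LIMIT` constant are hypothetical names chosen for illustration.

```python
import io

CHUNK_LIMIT = 16384  # the per-read cap mentioned in step 3


def read_exact(stream, size, chunk_limit=CHUNK_LIMIT):
    """Read up to `size` bytes from `stream`, at most `chunk_limit` per call.

    `stream` is any object with a read(n) method that returns bytes.
    Stops early if the stream reaches end-of-file.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        # Never request more than chunk_limit bytes in a single read.
        data = stream.read(min(remaining, chunk_limit))
        if not data:  # end of stream before `size` bytes arrived
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # An in-memory stream stands in for the SSL connection here.
    fake_socket = io.BytesIO(b"x" * 50000)
    print(len(read_exact(fake_socket, 40000)))
```

Reading in bounded pieces keeps each underlying allocation small, which is the point of the 16384 figure in the patch.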

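Steps 4 and 5 mean editing getmailrc and recreating the Maildir directories once per label. Here is a minimal Python sketch of automating that, assuming one rc file and one Maildir per label. The helper names (`safe_dirname`, `build_getmailrc`, `prepare_label`), the rc file naming, and the placeholder server and username in the usage section are all hypothetical; the `[retriever]`, `[destination]` and `[options]` section names follow getmail's rc file format.

```python
import os


def safe_dirname(label):
    """Turn a mailbox name such as "[Gmail]/Starred" into a directory name."""
    return label.replace("[Gmail]/", "").replace("/", "_").replace(" ", "_")


def build_getmailrc(label, server, username, password, maildir, log_path):
    """Return getmailrc text that fetches one mailbox into its own Maildir."""
    return "\n".join([
        "[retriever]",
        "type = SimpleIMAPSSLRetriever",
        "server = " + server,
        # One-element tuple: the trailing comma is required.
        # Assumes the label itself contains no double quote.
        'mailboxes = ("{}",)'.format(label),
        "username = " + username,
        "password = " + password,
        "",
        "[destination]",
        "type = Maildir",
        "path = " + maildir,
        "",
        "[options]",
        "verbose = 2",
        "message_log = " + log_path,
        "",
    ])


def prepare_label(base_dir, rc_dir, label, server, username, password):
    """Create cur/new/tmp for the label and write a matching rc file."""
    name = safe_dirname(label)
    maildir = os.path.join(base_dir, name) + "/"
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    rc_path = os.path.join(rc_dir, "getmailrc-" + name)
    with open(rc_path, "w") as fh:
        fh.write(build_getmailrc(label, server, username, password,
                                 maildir, os.path.join(rc_dir, name + ".log")))
    os.chmod(rc_path, 0o600)  # the file contains a password
    return rc_path


if __name__ == "__main__":
    # Placeholder values; substitute your own server and account details.
    for label in ("[Gmail]/Starred", "Important Project"):
        print(prepare_label(os.path.expanduser("~/Maildirs"),
                            os.path.expanduser("~/.getmail"),
                            label, "imap.example.invalid",
                            "user@example.invalid", "xxx"))
```

Each generated file can then be passed to getmail with its `--rcfile` option, one run per label, which replaces the rename-and-repeat loop in step 5. Note that in Python `("Important Project")` without the comma is just a string, which is why the template always writes the comma.
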
photo: chris ivarson

This is a reprint of a post I originally made elsewhere. I felt it was relevant to the current Gmail posts, so I am reprinting it with slight modifications.

4 thoughts on “Pulling Mail Out of Gmail And Retaining Your Labels”

  1. I was using the imap2maildir script on github, which worked for my old gmail accounts, but failed on an account for a domain I owned which I used Google as the mail provider for. I didn't really use labels, so I just needed to extract All Mail and your getmail example did the trick.

    I never really liked Gmail's broken IMAP implementation, broken auto-add contacts and it's 2013 and we still don't have partial word search. That plus the NSA spying crap made me realize I should just really host my own e-mail server again.

  2. xpress/Windows Mail could also be used to take a copy of the mail – or Mozilla Thunderbird will do the same for an MBOX file. All of those options involve installing yet another bloated app though, and there’s something to be said for a few simple commands running in the background and watching some text scroll by on a Terminal window every 10 minutes.
