
TIL (Today I Learned): Importing a Google Takeout MBOX into free Gmail via IMAP + App Password

· gmail · python · migration

When migrating from Google Workspace to free Gmail + Cloudflare Email Routing, the email history comes out of Google Takeout as an .mbox file.

Thunderbird (with ImportExportTools NG) hangs on large volumes (~37k emails). GYB (Got Your Back) requires OAuth verification, which personal accounts cannot pass.

The simplest solution: Python's imaplib + a Gmail App Password.

#!/usr/bin/env python3
import mailbox
import imaplib
import email
import time

MBOX_PATH = "./All_mail_Including_Spam_and_Trash.mbox"
IMAP_HOST = "imap.gmail.com"
EMAIL = "<seu email aqui>"
APP_PASSWORD = "xxxx xxxx xxxx xxxx"  # App Password from PHASE 6

mbox = mailbox.mbox(MBOX_PATH)
total = len(mbox)
print(f"Total: {total} emails")

imap = imaplib.IMAP4_SSL(IMAP_HOST)
imap.login(EMAIL, APP_PASSWORD)
imap.select('"[Gmail]/All Mail"')

# Fetch the Message-IDs that already exist in Gmail
print("Buscando Message-IDs existentes no Gmail...")
_, data = imap.search(None, 'ALL')
existing_ids = set()
if data[0]:
    nums = data[0].split()
    print(f"Emails existentes: {len(nums)}")
    for i in range(0, len(nums), 500):
        batch = b','.join(nums[i:i+500])
        _, msgs = imap.fetch(batch, '(BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])')
        for item in msgs:
            if isinstance(item, tuple):
                mid = email.message_from_bytes(item[1]).get('Message-ID', '').strip()
                if mid:
                    existing_ids.add(mid)
    print(f"Message-IDs indexados: {len(existing_ids)}")

skipped = 0
uploaded = 0
errors = []
start_time = time.time()
last_report = start_time

for i, msg in enumerate(mbox):
    mid = msg.get('Message-ID', '').strip()
    if mid and mid in existing_ids:
        skipped += 1
    else:
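        try:
            # imaplib returns a "NO" status without raising, so check it explicitly
            typ, resp = imap.append('"[Gmail]/All Mail"', None, None, msg.as_bytes())
            if typ != 'OK':
                raise RuntimeError(f"APPEND returned {typ}: {resp}")
            uploaded += 1
        except Exception as e:
            errors.append((i, str(e)))
            print(f"ERRO #{i}: {e}")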

    now = time.time()
    if now - last_report >= 300:  # 5 minutes
        elapsed = now - start_time
        processed = i + 1
        remaining = total - processed
        rate = processed / elapsed * 60  # emails/min
        eta_min = remaining / (processed / elapsed) / 60 if processed > 0 else 0
        print(f"[{int(elapsed/60)}min] {processed}/{total} ({remaining} restam) | "
              f"Uploaded: {uploaded} | Skipped: {skipped} | Erros: {len(errors)} | "
              f"~{rate:.0f}/min | ETA: ~{eta_min:.0f}min")
        last_report = now

elapsed = time.time() - start_time
print(f"\nConcluído em {int(elapsed/60)}min. Uploaded: {uploaded} | Skipped: {skipped} | Erros: {len(errors)}")
imap.logout()  # close the IMAP session cleanly
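
The script passes None as the date_time argument of imap.append, which leaves the internal date up to the server (by the IMAP spec, the time of the upload). If the original dates matter for IMAP-side sorting, one option is to derive that argument from each message's Date header. A minimal sketch, assuming a hypothetical helper named internal_date_for; whether Gmail's own interface orders by this value or by the Date header was not verified here:

```python
import email
import imaplib
from email.utils import parsedate_to_datetime


def internal_date_for(msg):
    """Return an IMAP INTERNALDATE string for msg, or None if Date is unusable.

    internal_date_for is a hypothetical helper, not part of the original script.
    """
    raw = msg.get("Date")
    if not raw:
        return None
    try:
        dt = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return None  # malformed Date header: let the server pick the date
    if dt.tzinfo is None:
        return None  # Time2Internaldate only accepts timezone-aware datetimes
    return imaplib.Time2Internaldate(dt)


sample = email.message_from_string(
    "Date: Sun, 15 Mar 2020 10:20:30 +0000\nSubject: hi\n\nbody\n"
)
print(internal_date_for(sample))
```

In the upload loop, the result would take the place of the second None: imap.append('"[Gmail]/All Mail"', None, internal_date_for(msg), msg.as_bytes()).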

To avoid duplicates on re-runs, the script indexes the Message-IDs already in Gmail before the loop and skips any message whose ID is already there.
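
The skip logic can be exercised offline, without touching Gmail. A minimal sketch, assuming a hypothetical helper split_new that takes any iterable of parsed messages and the set of already-indexed Message-IDs. Like the script, it treats messages without a Message-ID as new, so those would be uploaded again on every run:

```python
import email


def split_new(messages, existing_ids):
    """Partition messages into (to_upload, skipped) by Message-ID.

    split_new is a hypothetical helper name used for illustration only.
    """
    to_upload, skipped = [], []
    for msg in messages:
        mid = (msg.get("Message-ID") or "").strip()
        if mid and mid in existing_ids:
            skipped.append(msg)
        else:
            to_upload.append(msg)  # includes messages with no Message-ID
    return to_upload, skipped


raw = [
    "Message-ID: <a@example.com>\nSubject: already there\n\nx\n",
    "Message-ID: <b@example.com>\nSubject: new\n\ny\n",
    "Subject: no id\n\nz\n",
]
msgs = [email.message_from_string(r) for r in raw]
to_upload, skipped = split_new(msgs, {"<a@example.com>"})
print(len(to_upload), len(skipped))  # prints: 2 1
```

Because the index is a set, messages with a missing or repeated Message-ID collapse into fewer entries, which is one plausible reason a run can report fewer indexed IDs than existing emails.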

Prerequisite: 2FA enabled on the Gmail account and an App Password generated (Security → App passwords). No OAuth client, GCP project, or external tool is needed. After that, just sit back and enjoy:

while true
    ./mbox-import.py
end
Total: 37573 emails
Buscando Message-IDs existentes no Gmail...
Emails existentes: 32405
Message-IDs indexados: 32373
[5min] 32183/37573 (5390 restam) | Uploaded: 71 | Skipped: 32112 | Erros: 0 | ~6417/min | ETA: ~1min
[10min] 32275/37573 (5298 restam) | Uploaded: 163 | Skipped: 32112 | Erros: 0 | ~3220/min | ETA: ~2min
[15min] 32373/37573 (5200 restam) | Uploaded: 261 | Skipped: 32112 | Erros: 0 | ~2149/min | ETA: ~2min
[20min] 32476/37573 (5097 restam) | Uploaded: 364 | Skipped: 32112 | Erros: 0 | ~1615/min | ETA: ~3min
[25min] 32577/37573 (4996 restam) | Uploaded: 465 | Skipped: 32112 | Erros: 0 | ~1297/min | ETA: ~4min
[30min] 32681/37573 (4892 restam) | Uploaded: 569 | Skipped: 32112 | Erros: 0 | ~1085/min | ETA: ~5min
[35min] 32781/37573 (4792 restam) | Uploaded: 669 | Skipped: 32112 | Erros: 0 | ~933/min | ETA: ~5min
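
The ETA column in this log is optimistic because the script divides by the overall processing rate, which includes the 32112 messages that were skipped almost instantly. A rough recalculation from the last line, counting uploads only (a sketch; upload_eta_minutes is a hypothetical helper, and it assumes every remaining message still needs uploading):

```python
def upload_eta_minutes(uploaded, remaining, elapsed_min):
    """Estimate minutes left from the upload rate alone, ignoring skipped messages."""
    if uploaded == 0:
        return None  # no uploads yet, so there is no rate to extrapolate from
    rate_per_min = uploaded / elapsed_min
    return remaining / rate_per_min


# Figures from the [35min] line: 669 uploaded, 4792 remaining.
eta = upload_eta_minutes(uploaded=669, remaining=4792, elapsed_min=35)
print(f"~{eta:.0f} min (~{eta / 60:.1f} h)")  # prints: ~251 min (~4.2 h)
```

At roughly 19 uploads per minute, the remaining messages would take on the order of four hours rather than the ~5min the log reports.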