Saturday, April 26, 2008

Movement at the station


So I have finally decided to sign up for Google Apps. Previously I had my DNS provider forwarding my mail to my Gmail account, a temporary measure as my mail server was down. But now I see no point in having my own mail server, especially when Google Apps is free and has more redundancy (7 mail servers!) than I could ever offer.
I also had an issue with my DNS provider, who had implemented a draconian and overzealous spam filter that checks that the reverse DNS matches the sender domain, sheesh. Other than this debacle they have been pretty good.
The problem was that I had about 18 months of email that needed to be moved across. Sure, I could just forward from Google Apps to Gmail, but what's the point of that? So, how to move it across? I found a nice guide here, but it needed some tweaking to run on my home server, as I didn't have enough email (or money) to justify running it on Amazon's compute cloud. Below is how I did it on my new Ubuntu Hardy Heron server.

As I was SSH'd into my server, I ran screen first so the sync would keep running if the connection dropped:
#screen
#apt-get install imapsync make unzip lynx
#cpan
cpan> install Date::Manip
#vim run-imapsync.sh


Now type (or copy and paste) in the script below:

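#!/bin/bash
# Run imapsync up to 10 times, pausing between runs, so that mail missed
# on one pass is picked up on a later one.
COUNTER=0
while [ "$COUNTER" -lt 10 ]; do
    echo "The counter is $COUNTER"
    # host1/user1 is the old Gmail account, host2/user2 the new Google Apps
    # account; the passwords are read from ./passfile1 and ./passfile2.
    imapsync --host1 imap.gmail.com \
        --port1 993 --user1 morganstorey@gmail.com \
        --passfile1 ./passfile1 --ssl1 \
        --host2 imap.gmail.com \
        --port2 993 --user2 me@morganstorey.com \
        --passfile2 ./passfile2 --ssl2 \
        --syncinternaldates --split1 100 --split2 100 \
        --authmech1 LOGIN --authmech2 LOGIN \
        --regexmess 's/Delivered-To: morganstorey\@gmail.com/Delivered-To: me\@morganstorey.com/g' \
        --regexmess 's///g' \
        --regexmess 's/Subject:(\s*)\n/Subject: (no--subject)$1\n/g' \
        --regexmess 's/Subject: ([Rr][Ee]):(\s*)\n/Subject: $1: (no--subject)$2\n/g'
    COUNTER=$((COUNTER + 1))
    # Wait 10 minutes between passes so the Google IMAP servers are not hammered.
    sleep 10m
done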


Save it (Escape, then :wq), make it executable, and run it:

chmod 740 run-imapsync.sh
./run-imapsync.sh
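
The script reads each account's password from ./passfile1 and ./passfile2, so both files need to exist before the first run. A minimal sketch of creating them, assuming they live in the same directory as the script; write_passfile is a hypothetical helper name and the passwords shown are placeholders:

```shell
# write_passfile PATH: save the password read from stdin into PATH,
# readable and writable by the owner only.
# (Hypothetical helper for illustration, not part of imapsync.)
write_passfile() {
    # The subshell keeps the restrictive umask from leaking into the caller.
    ( umask 077 && cat > "$1" ) && chmod 600 "$1"
}

# Placeholder passwords; substitute the real ones.
printf '%s\n' 'old-gmail-password' | write_passfile ./passfile1
printf '%s\n' 'new-apps-password' | write_passfile ./passfile2
```

Typing a password on the command line leaves it in the shell history, so creating the two files in vim and then running chmod 600 on them is a reasonable alternative.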


A bit of explanation on the changes I made: I removed --maxage 1 from the script, as that was only pulling down emails one day old or newer. I also added the loop so that the process repeats to make sure all mail is fetched, and I added the sleep because, as the original article put it, you don't want to hammer the Google IMAP servers or they will block you.
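
The fixed ten-pass loop can also be written so that it stops as soon as a pass succeeds. A sketch of that variation, assuming imapsync exits with status 0 when a run completes without errors (worth checking for the installed version); retry and sync-once.sh are hypothetical names, the latter standing for a script that contains only the imapsync command:

```shell
# retry MAX DELAY CMD...: run CMD until it exits 0, at most MAX times,
# sleeping DELAY between failed attempts. Returns 1 if every attempt failed.
# (Hypothetical helper for illustration.)
retry() {
    local max=$1 delay=$2 attempt=1
    shift 2
    while [ "$attempt" -le "$max" ]; do
        echo "Attempt $attempt of $max"
        if "$@"; then
            return 0
        fi
        # Only pause if another attempt is still to come.
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not run here): up to 10 passes, 10 minutes apart.
# retry 10 10m ./sync-once.sh
```

Unlike the original loop, this does not repeat a successful sync, so it suits the case where one clean pass is enough; the original's repeated passes are the safer choice if a pass can report success while still missing mail.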

Then just export the calendar to ICS and import it into the Google Apps one, export the contacts to CSV and import them into the new Google Apps account, and, just in case anyone is still emailing your Gmail address directly, set it to forward to your new Google Apps address, and you're done.

I also intend eventually to use some other method to back up Fiona's and my Google mail at regular intervals, not because I don't trust Google (hey, I am posting this through Blogger, and get most of my news through Google News or my Google RSS reader), but because I like having backups.

This is all even more humorous when I tell you the book I am reading at the moment: The Google Story, by David A. Vise. Ryan from work loaned it to me, and there is now a time limit on reading it as he has given his notice.

Well, sorry about the long and technical post. I am starting to do them a bit more now, but we'll see; I am sure this blog will get back to its usual unusual ramblings.

Peace out all, especially the bright cookies at the Googleplex worldwide.

11 comments:

Kunal Jain said...

Hi, I read your post about migrating mail from Gmail to Google Apps, but this script is not working for me. Could you please help? I got the following errors:
Unknown option: ssl1
Unknown option: ssl2
Unknown option: split1
Unknown option: split2
Unknown option: authmech1
Unknown option: authmech2
Unknown option: regexmess
Unknown option: regexmess
Unknown option: regexmess
Unknown option: regexmess


Thanks in advance
kunal jain

Morgan said...

Hi Kunal,
It looks like you may have a different version of imapsync, or something else in your path called imapsync.
Try
#which imapsync
you should get: /usr/bin/imapsync

If that is right then try
#md5sum /usr/bin/imapsync
is it: 503566b5bc73c36f38b9aeb05e6cf344

If that is all the same then something very odd is happening. If it isn't the same md5sum, go through it again, making sure you do the apt-get install imapsync make unzip lynx
Also, is the script you made called run-imapsync.sh, the same as mine? If it is just called imapsync, it will call itself when you run it and won't have options for ssl1 etc.

kunal jain said...

Ya Morgan, I got it.
Earlier I was using imapsync version 1.99, which doesn't have options like ssl1 and authmech. Then I used imapsync 1.2x, as this version has all the options, and the script is working fine now.

Thanks a lot for your help.

Fabian said...

Hi, thanks for your nice howto. I had to try it and synced all my GMail account to my GApps account.
All mail was there but most of the attachments were f***** up. Do you have any solution for this?

Morgan said...

If you follow the guide to the letter, the attachments should be fine(tm)... I just checked back on a few of mine and they seem fine. Admittedly, I had to run the sync a couple of times. What has happened to the attachments?

Fabian said...

Somehow some bytes are missing at the end of those mails which leads to f***** up attachments.

My temporary solution now is to search for all mails in my GMail account with an attachment, attach a label "hasAttachment" to them, and then copy them over with Thunderbird.

Not all of them were broken, but about 60-80%.

Version info:
Here is a [linux] system (Linux hermes 2.6.16.27-061216a #1 SMP Sat Dec 16 13:15:27 CET 2006 x86_64)
with perl 5.8.8
Mail::IMAPClient version is 2.2.9
$Id: imapsync,v 1.252 2008/05/08 02:30:17 gilles Exp gilles $

Perhaps it's the "split"-Parameter?

Morgan said...

It could come down to internet speed; I have ADSL2+ running at 16 Mbps/1 Mbps. It could be timing out on the attachments if they are too large.
What speed internet connection is that Linux box behind?
The original guide was written by a guy who did it using the Amazon Elastic Compute Cloud (EC2), as their bandwidth is simply incredible. I didn't do this as I don't have much mail and I am cheap. If you have a lot of mail or a slow link, this is the best way to go; see his guide in the link in the post.

Kon said...

I'm experiencing truncated messages too.
I tried with & without the split option also.
When the process is running I notice these error messages, which lead me to believe it's the program version or Perl libraries at fault:

+ Copying msg #12:3563 to folder Account Login Info
Mail::IMAPClient=HASH(0x9744dd8)::message_string: expected 3563 bytes but received 3643 at /usr/bin/imapsync line 2473

The message copies, but the salutation is missing.

Using
imapsync 1.252
Mail::IMAPClient 3.15
Date::Manip 5.54 Though no reason to suspect a time/date helper package just yet


Interestingly the change log for the next version of imapsync 1.255 says:

Changes: Some IMAP servers return a message size that is not equal to the real message size. Now imapsync accepts this silly behavior and leaves the message as is


Will try the upgrade/downgrade/sidegrade of each of these and see how it goes.

Morgan Storey said...

Good luck Kon. That is the one annoying thing about a lot of OSS packages: stuff breaks between versions. It doesn't tend to happen in the closed source world, but at a loss of agility. Can we say 6 years between OS releases?

Kon said...

Ok, so ended up going to the latest version of imapsync 1.267 followed by downgrading Mail::IMAPClient to 2.2.9 (recommended version in FAQ)

This now allowed the messages to copy across. It would print a message advising that the message should be xxx bytes but received yyy bytes, but thankfully yyy was greater than xxx and thus I had all my attachments.

The next problem I have is that my Ubuntu terminal automatically closes itself. screen helps get around this, but in my imapsync logs I see that the process is sometimes also being killed as well. Thankfully the while loop kicks it in again.

I have no idea what is causing this kill but it is annoying since imapsync has to start from scratch and copy all the messages again - not a very efficient use of bandwidth - and makes the while loop somewhat redundant.

The next step is to find out how to make imapsync not copy messages it thinks are there - I think the size difference fools it into thinking it has to resize them, but need to further try out to be sure.

Kon said...

Just to follow up: if you want imapsync to copy only new messages and ignore the others, use the flags --skipsize and --useheader 'Message-ID' (I can't remember the exact syntax, but it is in the imapsync FAQ).
