Hacker News

Tarsnap.

Tarsnap lets you say: tarsnap -c -f backup01302009 mysql_dir/

And you can just adjust the date each day. It gives you the luxury of a full dump (any time you want to restore, just reference backup01302009), but it only actually stores the deltas, making sure not to duplicate data that's already in backup01292009, backup01282009, and so on. Tarsnap stores the data on S3, so it's replicated across multiple data centers.
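A minimal cron-able sketch of that pattern; the directory and archive naming here are assumptions, and the tarsnap call is left commented so the naming logic can be checked on its own:

```shell
# Build today's archive name in the same MMDDYYYY style as above.
DATE=$(date +%m%d%Y)              # e.g. 01302009
ARCHIVE="backup$DATE"

# The actual backup (commented out in this sketch):
# tarsnap -c -f "$ARCHIVE" mysql_dir/

echo "$ARCHIVE"
```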

It costs a little more than S3 at 30 cents per GB, but it's metered out so that if you only use 1MB of storage, you'll only be charged 0.03 cents for that storage. You could try creating your own way of doing incrementals, but I doubt you'd get it as efficient as Colin (the math genius behind Tarsnap) and so I doubt you'd get it cheaper. Plus, this way you don't have to deal with it.

And remember, it's hard to fill up a database.* As the Django Book notes: "LJWorld.com's database - including over half a million newspaper articles dating back to 1989 - is under 2GB." So, if they were using Tarsnap, they might be storing 5 or 10GB tops at a whopping $1.50-$3 per month plus whatever the transfer of their deltas was for the month. Oh, and tarsnap compresses the data too. So, maybe they'd be paying $1 or something lower.
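Back-of-envelope check of those figures (storage only, ignoring transfer and compression), at tarsnap's 30 cents per GB-month on a 5-10GB database:

```shell
CENTS_PER_GB=30
LOW_CENTS=$(( 5 * CENTS_PER_GB ))    # 150 cents = $1.50/month
HIGH_CENTS=$(( 10 * CENTS_PER_GB ))  # 300 cents = $3.00/month
echo "$LOW_CENTS $HIGH_CENTS"
```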

* Clearly, if you hit it big time, you might not want to continue paying for tarsnap. However, if you become the next big thing, you can hire someone to deal with it for you.



This doesn't work. You can't just copy the files and expect them to be in a sane or consistent state.

You either need to a) use InnoDB hotbackup or b) use a slave, stop the slave, run the backup, and restart the slave to catch up.

At delicious we used B, plus a hot spare master, plus many slaves.

Additionally, every time a user modified his account, it would go on the queue for individual backup; the account itself (and alone) would be snapped to a file (Perl Storable, IIRC), which only got regenerated when the account changed, so we weren't re-dumping users that were inactive. A little bit of history allowed us to respond to things like "oh my god all my bookmarks are gone" and various other issues (which were usually due to API-based idiocy of some sort or another.)


Using a slave isn't foolproof either. If someone runs a malicious command, it gets replicated, and could be backed up before it's caught.


I didn't say that. Read what I wrote.

You use the slave so you can shut down the database and get a consistent file snapshot. Then you do an offline backup.
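A dry-run sketch of that procedure: quiesce the slave, take a file-level copy, then let it catch up. The hostname and paths are assumptions, and the steps are echoed rather than executed so the ordering is easy to review:

```shell
SLAVE=slave.db.example.com
STAMP=$(date +%F)
# Each step, in order; pipe this plan into sh only once you've vetted it.
PLAN=$(printf '%s\n' \
  "mysql -h $SLAVE -e 'STOP SLAVE'" \
  "mysqladmin -h $SLAVE shutdown" \
  "tar -czf /backups/mysql-$STAMP.tar.gz /var/lib/mysql" \
  "mysqld_safe --datadir=/var/lib/mysql &" \
  "mysql -h $SLAVE -e 'START SLAVE'")
echo "$PLAN"
```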


Yeah, it's true. I was a little simplistic. I usually use A, but I'm not dealing with the amount of data that delicious is.


Whenever Tarsnap is mentioned, I have to mention Duplicity which does the same thing, but is Free Software.

I use this for my personal backups, as well as backups of our work svn (fsfs) and git repositories. I use it against S3, and have found it incredibly reliable.

As a bonus, it encrypts everything but still does incremental backups. It's a really nice piece of software, and you don't have to pay anyone to use it.


...Duplicity which does the same thing...

Duplicity is not the same thing as tarsnap. Duplicity uses a full plus incrementals model compared to tarsnap's snapshot model, so with duplicity you're either going to be stuck paying to store extra versions you don't want or be stuck paying for multiple full backups. Moreover, tarsnap is considerably more secure than duplicity.

Before I started working on tarsnap, I considered using duplicity; but it simply didn't measure up.


How is tarsnap considerably more secure?


Some problems with duplicity off the top of my head -- I'm sure there are others (there always are):

1. Duplicity uses GnuPG. GnuPG has a long history of security flaws, up to and including arbitrary code execution. Yes, these specific bugs have been fixed; but the poor history doesn't inspire much confidence.

2. Duplicity uses librsync, which follows rsync's lead by making rather dubious use of hashes. In his thesis, Tridge touts the fact that 'a failed transfer is equivalent to "cracking" MD4' as a reason to trust rsync; but now that we know how weak MD4 is, it's possible to create files which rsync -- and thus Duplicity -- will never manage to back up properly.

3. When you try to restore a backup, the storage system you're using can give you your most recent backup... or it can decide to give you any previous backup you stored. Duplicity won't notice.

4. If you try to use the --sign-key option without also using the --encrypt-key option, duplicity will silently ignore --sign-key, leaving your archives unsigned. Based on comments in the duplicity source code, this seems to be intentional... but this doesn't seem to be documented anywhere, and it seems to me that this is an incredibly dumb thing to do.
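To illustrate point 4 (flag behavior as described above — verify against your duplicity version; the key IDs and target URL are placeholders, and the commands are echoed, not executed):

```shell
PLAN=$(printf '%s\n' \
  "duplicity --sign-key DEADBEEF /home/me s3+http://bucket/path" \
  "duplicity --encrypt-key CAFEBABE --sign-key DEADBEEF /home/me s3+http://bucket/path")
# First form: --sign-key is silently dropped, archives end up unsigned.
# Second form: archives are encrypted and signed.
echo "$PLAN"
```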


EBS does deltas too. Is anyone else using it? I like the ability to mount a volume or clone a volume almost instantly and mount it on another machine.


EBS does deltas, but there are a few caveats. The most important being that you need to be using EC2. For many, $72/mo plus bandwidth might be a bit much for what they're doing if it can work on a 512MB Xen instance for under $40 with a few hundred gigs of transfer included.

Beyond that, drive snapshots aren't the easiest things to do. I know that RightScale tells their customers to freeze the drive so that no changes can occur until the backup is complete. With S3 performance around 20 MBytes/sec, backing up 1GB takes around a minute. That's not bad, and since you're only doing deltas it's unlikely you'll have a huge amount to back up at any given time, but it isn't exactly good either. With file-level backup, you can do a mysqldump and then just back up that file. Eh, maybe I'm just preferring the devil I know in this situation.
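A sketch of that file-level approach: dump first, then back up the dump. The database name and paths are assumptions; --single-transaction gives a consistent InnoDB dump without holding locks (it doesn't help with MyISAM tables). Commands are echoed, not executed:

```shell
STAMP=$(date +%m%d%Y)
DUMP="/backups/mydb-$STAMP.sql.gz"
PLAN=$(printf '%s\n' \
  "mysqldump --single-transaction mydb | gzip > $DUMP" \
  "tarsnap -c -f backup$STAMP $DUMP")
echo "$PLAN"
```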

It's a little more complex to set up (doing file-level backups), but if you're going the volume route, you need to make sure you don't leave the drive in an inconsistent state.

All that said, EBS is awesome. If it fits what you're looking for, then go for it!


This is not totally accurate. EBS snapshots are basically instantaneous; it's just the copy to S3 that takes time, and Amazon performs that in the background. We use XFS on our EBS volumes (running MySQL 5 with InnoDB) and have a little Perl script (http://ec2-snapshot-xfs-mysql.notlong.com/) that does FLUSH TABLES WITH READ LOCK -> xfs_freeze -> snapshot -> xfs_freeze -u -> UNLOCK TABLES. The whole process takes a fraction of a second, and it also logs where in the binlog the snapshot was made (handy since we create new slaves from snapshots, and it reduces how much data we shuttle around).
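That sequence has to run inside one held mysql session, since the read lock is released the moment the session that took it closes; the mysql client's SYSTEM command lets the shell steps run without dropping the lock. A sketch (volume ID and mount point are placeholders; printed here rather than executed):

```shell
# The full lock/freeze/snapshot/thaw/unlock sequence as a single session.
SEQ=$(cat <<'EOF'
mysql <<'SQL'
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;                 -- record the binlog position
SYSTEM xfs_freeze -f /vol
SYSTEM ec2-create-snapshot vol-12345678
SYSTEM xfs_freeze -u /vol
UNLOCK TABLES;
SQL
EOF
)
echo "$SEQ"
```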

We snapshot a slave every 10 minutes and the master once a night (just in case something totally weird happens to the slave and the sync isn't right). This is a multi-gig DB and we've had no problems.

Here is a link to a full tutorial about running MySQL on EC2 with EBS: http://developer.amazonwebservices.com/connect/entry.jspa?ex...

I wanted to also point out that a live slave is NOT a backup scheme. If someone hacks your database and runs DROP ALL FROM PRODUCTION_DATABASE you've now got a perfect copy of nothing.



