
It's always important to remember the difference between a syncing service and a backup service. A syncing service sometimes feels like a backup, because you can use it to recover files if a local device is destroyed or lost. HOWEVER, any service capable of syncing files is equally capable of destroying them.

It's important to have an automated one-way backup system that you can manually restore from. Something like Tarsnap [1] looks like a really good possibility (I haven't used it myself, but it seems solid)

[1] http://www.tarsnap.com/gettingstarted.html



This distinction is really without a difference. Dropbox consistently advertises themselves as both a syncing service as well as a backup service. Their tag line is even "Dropbox - Secure backup, sync and sharing made easy," with secure backup being ahead of sync and sharing. [1]

"Even if your computer has a meltdown, your stuff is always safe in Dropbox and can be restored in a snap.

In fact, if you're using the Dropbox desktop application, your files are backed up several times. The primary copy on your computer's hard drive is synced online and that copy is then backed up again for safety (emphasis mine). If you are using Dropbox to sync files between multiple computers, your files are backed up on those computers as well. If that isn't enough, Dropbox also keeps backups of all of your deleted and changed files too.

...

It's hard to imagine a scenario where Dropbox could lose your files. Hypothetically, let's say a nuclear bomb blows up the data centers where your files are saved. Even then, your files are still safe and sound on your computer and any other computers linked to your Dropbox account."

Clearly, though, Dropbox appears to have lost files (at least if we take it on faith that "these are bugs in Dropbox's syncing logic"), despite the fact that I see no mushroom cloud nearby.

[1] https://www.dropbox.com/help/122/en


>Even then, your files are still safe and sound on your computer and any other computers linked to your Dropbox account."

Unless, as happened here, Dropbox erased the file and synced a blank version across all your computers. Then they're safe and sound on any computer linked to your Dropbox account that hasn't been connected to the internet since the file got corrupted.

If you catch it in the 30 day window when they keep old versions you're fine, but there are files in my Dropbox that I don't use every 30 days.


As you said, "Dropbox erased the file and synced a blank version across all your computers." I agree. As such, I'm not sure why individuals are so quick to blame the end user just yet.

On the other hand, I don't want to immediately blame Dropbox just yet either. If you back up garbage (say, because you have disk corruption), then you can't blame Dropbox for backing up exactly what you told it to.

And Dropbox does offer a premium Packrat service if you want file history indefinitely. Perhaps the user can be blamed for assuming that he/she would only need 30 days of history, but this is really contingent on who caused the corruption to happen in the first place -- and that's unknown at the moment. [1]

[1] https://www.dropbox.com/help/113/en


Agreed on not being sure that it's Dropbox's fault. But even if the user killed the file, you've got the same problem.

A person running Time Machine (or similar incremental backup system) is a lot safer from this sort of problem than a free tier Dropbox user. Free Dropbox is better than nothing, but people can't keep assuming their files are safe because "they're in the cloud" and get synced to a few places.


"This distinction is really without a difference." "In fact, if you're using the Dropbox desktop application, your files are backed up several times." "It's hard to imagine a scenario where Dropbox could lose your files..."

Please stop. Right there. Stop believing some vendor marketing blindly (even if you do come to another conclusion later), stop reassuring other people who do so and stop calling a sync a backup. There is a very important distinction: sync has mechanisms in place that are capable of touching the files in your backup. At least in Dropbox's case these mechanisms are not completely separated from the initial backup mechanism of each version. I had two almost catastrophic data losses with Dropbox before I was able to make that distinction. I've come to the conclusion that IT professionals should never ever treat a sync system as a backup system, and if you still think so please don't spread that advice to others.


Also, never ever treat any system as a backup if you haven't tested that restore actually works. Do that, and the likelihood of losing your files gets a lot smaller.
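
A restore test doesn't have to be elaborate. As a rough sketch (not tied to any particular backup product): restore into a scratch directory, then compare SHA-256 checksums of every file against the originals. The function names and the two directory arguments here are made up for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_restore(original_root, restored_root):
    """Return (missing, changed): files absent from the restore, and files
    whose restored content differs from the original."""
    orig = tree_digests(original_root)
    rest = tree_digests(restored_root)
    missing = sorted(set(orig) - set(rest))
    changed = sorted(p for p in orig if p in rest and orig[p] != rest[p])
    return missing, changed
```

If both lists come back empty, the restore matched the originals at the moment you ran it. It says nothing about files that changed between the backup and the test, so run it against a quiet directory.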


Agreed, and it helps to have 2 backups of all your data using different techniques/products. I run a local backup to cheap disk, network backup to remote location, and I suppose my remotely hosted svn is my third backup of most assets I really care about.

Most of us here are developers. How many of our products are perfect? Always assume something will fail in a new and interesting way in the future. I'm not letting Dropbox off the hook - their product shouldn't do this - but you'll be happier if you treat backups with the same level of redundancy and planning as the rest of your infrastructure.


Remote backup is indeed very important. Imagine a fire in the server room. Where was the backup tape? Lying on top of the server? Oops...


Yes - sync is not a backup, but services such as Dropbox are designed to be a backup too (with their revision feature).

So I think the underlying problem here is that any backup/syncing system might have a bug (like this one) or there might be operator or user error (deleting your revision history is just a couple of clicks away). The Recovery-Oriented Computing website has a lot of good papers on this topic [1].

This is very similar to problems with outages on Amazon EC2 - yes, Amazon's cloud is great, but in order to make your service highly available you do need to have a standby system on some other cloud (for example, we run on Rackspace but our standbys are on Amazon).

One approach to protect yourself against problems like this is to replicate/sync all your files from one cloud storage service (your primary one) to some other cloud service (GDrive, SugarSync, Box, etc.). So should Dropbox have a bug, you still have everything in the other cloud service, including all revisions.

Services like cloudHQ [2] (that is my baby) can replicate and sync all your files from Dropbox to, for example, GDrive. And of course cloudHQ has options like "two-way" sync, "don't replicate deletion", "backup" (weekly incrementals are kept in folders - so you will be fine even if the "revisions" feature fails), etc.

[1] http://roc.cs.berkeley.edu/

[2] http://cloudHQ.net


This is one of the reasons that SpiderOak (an encrypted sync AND backup service) keeps historical versions of everything forever, until customers explicitly choose to remove them.

For services that purge old versions and deleted files at 30 days, you lose if you don't notice a problem promptly. You can't be expected to be watchful over gigabytes of data; that's the whole point of a backup service.


Just for the record (no big issue but I am accustomed to founders/employees disclosing their affiliation)

  "I'm Alan Fairless, a co-founder at SpiderOak" [1]
[1] http://news.ycombinator.org/user?id=rarrrrrr


Dropbox gives you the choice: you can pay and have the versions stored forever, or accept the lower level of service with 30 days for free. I am sure you have calculated the economics of keeping the extra copies for all non-paying customers; it is possible it would be less sustainable.

As an aside, it seems Dropbox is bidding on the [SpiderOak] keyword on Adwords; you should probably at least outbid them on your brand terms to reduce confusion/misdirection for potential customers.


As an aside, it seems Dropbox is bidding on the [SpiderOak] keyword on Adword

So they are. Is it legally acceptable to bid on competitor trademarks? I thought that was regarded as being over the line these days - anyone know for sure?

If you look at the adwords link, it's clear that Dropbox is bidding on "competitor keywords" as a class.


Depending on the country there are different rules about this[1]. There are different rules for trademarks in Ad copy and in the keyword.

In terms of a trademark in your keyword, this is only unacceptable in: Australia, Brazil, China, Hong Kong, Macau, New Zealand, North Korea, South Korea, or Taiwan, and only after the trademark holder files a complaint.

[1] http://support.google.com/adwordspolicy/bin/answer.py?hl=en&...


So in general, Google's position is that (for the US, UK and EU at least) if the ad text doesn't contain the trademark text in question, it's ok if the keyword used to match on does, except for the countries you listed.

Thanks for the link.


To be fair, a good sync service with history is supposed to be a backup at the same time. I'm using git rather than dropbox but if I push something, it's there forever. So, I'd have to think that Dropbox being able to completely destroy files is a bug, not a feature. Maybe a more experienced dropbox user can correct me?


The difference is that a backup service shouldn't ever be able to corrupt your primary copy.

Bugs happen. If a bug happens on the syncing service and it trashes your 'back-up'/history, then syncs and trashes your primary copy, you're toast.

Sync'ing != Backup. They are for different problems and have different restrictions/pitfalls.


The thing is, Dropbox treats their version as the primary copy. For instance, I added several directories to Dropbox, and decided that I didn't want it syncing a few of them. I de-selected the directories under "Selective Sync" and it removed them from my computer - even though my computer was the original source.


Yes, sync != backup [1].

I do agree with you, but I can tell you that selling a backup service is harder than you think. Also, as pointed out by paper [2], human error accounts for ~50% of all system failures. And the worst thing is that the majority of users who accidentally delete data don't even notice the loss until the data is needed, and they don't recollect doing anything wrong.

What I found interesting is that people (i.e., small business owners) are scared of losing a credit card (even though you can call the bank, cancel the lost card and get a new one - an inconvenience but not a big deal), but they will not back up critical company documents and data (even though if they lose them the company will pretty much be closed - there is no "bank" to go to and get the data back).

[1] http://blog.cloudhq.net/post/33844549768/the-difference-in-d... [2] http://roc.cs.berkeley.edu/talks/pdf/HP.pdf


Not sure it is harder than I think. Other than that, I have nothing else to disagree with. :)


Why would a good sync service with history be a backup? They don't store a history indefinitely. Of course destroying files and history is a bug, but something that instantly replicates changes to those files from machine-to-machine shouldn't be considered a reliable way to recover them weeks or months later.


Dropbox does have "Packrat unlimited undo history" for $4/month. I wonder if he could have recovered these files if he had this option.


Does git push+pull not count as a "good sync service"?


Tarsnap is amazing for Unix systems. Flat out, one of the coolest services I've ever used, and one of the cheapest. I'm backing up my VPS with it, and I can't recommend it enough.

I have a friend that strongly recommends CrashPlan, but I haven't tried it out yet on my Mac. I'm curious to though.


I use CrashPlan both on Mac and on Windows. On the Mac it works very well. On Windows it is okay, though it needs some hand holding -- sometimes the service gets stuck and needs to be restarted manually. Overall I find that it is better than Mozy (a competing service).


If the service tends to hang, you might need to assign it more than the 512MB RAM it defaults to.

The setting is controlled thru the CrashPlanService.ini file.


Seconded. I switched from Mozy and would not go back.


I love Crashplan -- especially because with essentially the same client, you can do backups to a public service, a hosted business service (multiple machines, centrally managed keys, etc.) or an enterprise hosted-by-yourself service.


Doesn't Tarsnap store your data on their S3 account, rather than your own? If so, how do you get your data back from Amazon if Tarsnap vanished tomorrow?


Tarsnap isn't going to vanish tomorrow. It's steadily profitable so I don't need to worry about "runway".

Even if I get hit by lightning tomorrow, the service runs perfectly fine on its own for months at a time, so you'd have plenty of time to get your data back.


How would people know if anything had happened to you and that they should begin to recover data - is there a dead man switch or notification procedure in place?

p.s. Please avoid golf courses this weekend!


  >> is there a dead man switch or notification procedure in place?
The absence of weekly HN posts.


is there a dead man switch or notification procedure in place?

There are people who should send out that notification if needed, yes.

p.s. Please avoid golf courses this weekend!

Don't worry, I don't play golf. :-)


It would be worth formalizing your bus plan. I specifically chose a mainstream backup service over Tarsnap because you're running a one man shop.


Thanks for the feedback -- yes, this is something I plan on doing (amidst all the other tasks I'm juggling...).

I very commonly hear why people are using Tarsnap, and from time to time I hear why people are no longer using Tarsnap, but I very rarely hear why people never started using Tarsnap, so I really appreciate you taking the time to comment.


Please do! When I explained your service to a friend, this was the only "except for that ...", the only tar-snag, if you will.


You can say that about any service--what if Crashplan's data center caught fire tomorrow? Tarsnap might be run by a single guy (I think?) but saying S3 is "more" or "less" reliable than any other private company isn't a great comparison. In any case a massive company like Amazon is the most likely to be reliable in these cases, I imagine.


That's why I cobbled together a poor man's cloud RAID :-) http://news.ycombinator.com/item?id=4689238

My earlier point was that I think data is stored on Colin Percival's S3 account (he is the creator of Tarsnap) and therefore you might lose access to the data (if he couldn't pay the bills or got hit by a bus) even though S3 itself is fine.


I use Arq to backup files on my MacBook - it saves them to my S3 bucket, so even if everyone working on Arq dies (I sincerely hope they don't!) and their servers all explode, my backups are still intact on S3 - which is less likely to go down permanently and lose all my files than Tarsnap (not that that is particularly likely either!).


There are definitely tiers of reliability. An external drive is easily lost or damaged. A backup service can take individual drive/server losses but that might be the limit. S3 can lose an entire data center - the only real risk comes from software bugs.


This isn't really a concern for me, but I'd probably email Colin and ask for it. The tarsnap program is open-source, presumably if he stopped hosting it the server-side component could be replaced. The client does all the work.


What if Colin is dead? It's a backup system for the truly paranoid so this is a legit question.


The truly paranoid use multiple backups. The encryption ensures that without the keys, nobody else could make use of the data anyway.


Tarsnap is really good, I use it from some of my servers to do backup.

Maybe I could plug my new app here as well, tidy.io[1] lets you archive or backup your files directly to and from your Dropbox. Feedback is always appreciated!

[1]: https://www.tidy.io/


I've been a very happy Backblaze customer for over a year now.

http://www.backblaze.com/


If Backblaze backs up my Dropbox folder and this Dropbox bug affects the files that are getting backed up, and I don't catch the 0 byte files within the 4 week Backblaze revision window, won't I still lose my files?


Crashplan hangs on to deleted files for quite some time. I just used it last weekend to recover some stuff I'd deleted about a year ago and realized I wanted back. I was pleasantly surprised to find them still in the system.

It has a lot of options for retaining old versions, too: http://support.crashplan.com/doku.php/reference/version_rete...


Yep. And your wife will divorce you.


Agreed. Early in my Dropbox days I lost some files due to my account going over the 2GB limit. Since then, I've treated Dropbox as a syncing server, not a backup service.

My current scheme is to make rolling snapshots of my Dropbox folder backed up to a local RAID array which backs up to a separate RAID array nightly. More info: http://aaronparecki.com/2010/190/article/1/how-to-back-up-dr...

Hackernews link for that post if you're in to that sort of thing: http://news.ycombinator.com/item?id=4704667
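
For anyone who wants the general shape of a rolling snapshot without reading the linked post, here is a rough Python sketch. It is not the script from that article, just a minimal version of the idea: copy the folder into a new labeled directory each run and prune the oldest copies. `take_snapshot`, `keep` and `label` are names invented here.

```python
import os
import shutil
import time


def take_snapshot(source, snapshot_root, keep=7, label=None):
    """Copy `source` into a new directory under `snapshot_root`, then delete
    the oldest snapshots so that at most `keep` remain.

    `snapshot_root` must live outside `source`. Labels default to a timestamp,
    so sorting snapshot names also sorts them by age.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    label = label or time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(snapshot_root, exist_ok=True)
    dest = os.path.join(snapshot_root, label)
    # copytree refuses to write into an existing directory, so an earlier
    # snapshot is never overwritten in place.
    shutil.copytree(source, dest)
    snapshots = sorted(
        d for d in os.listdir(snapshot_root)
        if os.path.isdir(os.path.join(snapshot_root, d))
    )
    for old in snapshots[:-keep]:
        shutil.rmtree(os.path.join(snapshot_root, old))
    return dest
```

Full copies are wasteful on a large folder; hard-linking unchanged files is the usual refinement. The point is only that a file which gets zeroed today still exists intact in yesterday's snapshot, for as long as `keep` allows.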


Good idea.

You could also add a bit to the script explicitly looking for zero byte files and alert yourself to their formation.
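
Something along these lines would do as a starting point. It is a hedged sketch in Python rather than a patch to the script in the linked post; the function name and the command-line handling are my own.

```python
import os
import sys


def find_zero_byte_files(root):
    """Return a sorted list of regular files under `root` that are 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and os.path.getsize(path) == 0:
                    empty.append(path)
            except OSError:
                # File vanished or is unreadable mid-scan; skip it.
                continue
    return sorted(empty)


if __name__ == "__main__":
    # Usage: python find_empty.py /path/to/Dropbox
    hits = find_zero_byte_files(sys.argv[1])
    for path in hits:
        print(path)
    # Non-zero exit status lets cron or a wrapper script raise the alert.
    sys.exit(1 if hits else 0)
```

Some files are legitimately empty (placeholder and lock files, for example), so in practice you would compare against the previous run's list and alert only on new entries.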


There really is no difference between syncing and backup. Think of the backup/restore cycle as a sync and it will be clear. Backup is one way sync, restore completes the bidirectionality of sync.

Sync doesn't necessarily have the capability to destroy files; rsync has a switch to delete files that are locally deleted. However, Dropbox is supposed to be rsync + rcs, so this kind of problem is supposedly easy to fix by simply reverting to a good previous version.


HOWEVER, any service capable of syncing files is equally capable of destroying them.

That's simply not true (I've built sync systems that are incapable of destroying files).

IMO, syncing designs that do have irrevocable overwrites are inherently brittle. (I don't know if Dropbox is built that way, but AFAIK, iCloud (and MobileMe before it) is -- and it sucks.)
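
To make "revocable overwrite" concrete, here is a toy sketch in Python. It is purely illustrative and not how Dropbox, iCloud or any system mentioned in this thread works. The rule is that an incoming change never replaces a file in place: the old version is first moved into a history directory. `apply_incoming` and `history_dir` are invented names.

```python
import os
import shutil


def apply_incoming(path, new_bytes, history_dir):
    """Write `new_bytes` to `path`, first moving any existing file into
    `history_dir` under the next free numbered name (name.0, name.1, ...).

    Toy limitation: history is keyed by basename only, so two files with the
    same name in different folders would share one history sequence.
    """
    os.makedirs(history_dir, exist_ok=True)
    if os.path.exists(path):
        base = os.path.basename(path)
        n = 0
        while os.path.exists(os.path.join(history_dir, f"{base}.{n}")):
            n += 1
        shutil.move(path, os.path.join(history_dir, f"{base}.{n}"))
    with open(path, "wb") as f:
        f.write(new_bytes)
```

With this rule, a buggy sync that pushes a 0-byte version still leaves the previous content sitting in `history_dir`. The protection only holds as long as nothing else is allowed to prune that directory.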


Dropbox is revocable, the issue is for how long. In addition to online versions, the client holds on to old versions for three days. It used to be longer but the cache could get unreasonably large.


"Permanently Lose Your Files" -> "I am able to restore each file from Dropbox's version history"

It's also important to remember the meaning of the word "permanently". While this is a UX disaster, the title is misleading.


...except the author permanently lost files that were 0-byted before some sort of internal Dropbox process wiped the version history.


Tarsnap isn't the best choice in most cases, it's just by a HN user. But backblaze, carbonite, crashplan, spideroak are more practical/windows available options.


Tarsnap is great and secure. Not just any HN user: Colin Percival, security officer for FreeBSD and a Putnam fellow. It's a very practical and well priced option.


No, it's only for UNIX-like OSs and is command line only, making it impractical or putting it out of reach of many users, even those on HN. Putting it forward as the drop-in replacement for Dropbox for backup is simply inaccurate. The services I listed are more applicable. And Percival's stature is irrelevant here.


You're the one who brought up that it's "just by a HN user", as if that was the most salient aspect of his life. And I don't think the top-level put Tarsnap forward as a drop-in replacement for Dropbox anyway.



