
diego, thanks so much for raft! i'm a student in the brown class you shout out, and i can testify as to its relative simplicity and the clarity with which you guys communicate the ins and outs.

i have a question for you, though. why is raft not concerned with byzantine failure? the focus on byzantine fault tolerance from the paxos family of algos (and a lot of the literature/educational material on distributed consensus) makes me feel like it's important, but your approach suggests it perhaps isn't. do you think this focus is a side-effect of the ubiquity of paxos, which is disproportionately concerned with this due to its roots in academia?



It's a good question, and I don't really know where the community as a whole sits on Byzantine vs non-Byzantine. A few thoughts:

Byzantine is more complex, and most people in industry aren't doing it: there are a lot of Byzantine papers out there but few real-world implementations. I think Byzantine is important for uses where the nodes really can't be trusted for security reasons, and maybe there are easier fault-tolerance payoffs elsewhere when the entire system is within one trust domain such as a single company.

Byzantine consensus is slower and requires more servers.
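
To put rough numbers on "more servers": the standard bounds are 2f+1 servers to tolerate f crash (non-Byzantine) failures, as in Raft, and 3f+1 servers to tolerate f Byzantine failures, as in PBFT. Here is a minimal sketch of that arithmetic; the function names are made up for illustration and don't come from any library.

```python
def crash_cluster_size(f: int) -> int:
    """Minimum servers to tolerate f crash failures (Raft/Paxos style): 2f + 1."""
    return 2 * f + 1


def byzantine_cluster_size(f: int) -> int:
    """Minimum servers to tolerate f Byzantine failures (PBFT style): 3f + 1."""
    return 3 * f + 1


def crash_quorum(f: int) -> int:
    """A majority of 2f + 1 servers, i.e. f + 1."""
    return f + 1


def byzantine_quorum(f: int) -> int:
    """PBFT-style quorum out of 3f + 1 servers, i.e. 2f + 1."""
    return 2 * f + 1


if __name__ == "__main__":
    # Compare cluster and quorum sizes for a few fault budgets.
    for f in range(1, 4):
        print(
            f"f={f}: crash needs {crash_cluster_size(f)} servers "
            f"(quorum {crash_quorum(f)}), Byzantine needs "
            f"{byzantine_cluster_size(f)} servers (quorum {byzantine_quorum(f)})"
        )
```

So tolerating one faulty server takes 3 servers in the crash model but 4 in the Byzantine model, and tolerating two takes 5 versus 7. The larger quorums are part of why it tends to be slower, too.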

If you don't have independent implementations running on each of your servers, the same software bug could still take out your entire cluster. You get some benefit if the hardware fails independently, but you don't get protection from correlated software outages. Maybe the difficulty in characterizing which faults a particular deployment can handle makes it harder to sell to management.

With Raft, we were just trying to solve non-Byzantine consensus in a way people could understand, and we think it's still a useful thing to study even if your ultimate goal is Byzantine consensus. You might be interested in Tangaroa, from the CS244b class at Stanford, where Christopher Copeland and Hongxia Zhong did some work towards a Byzantine version of Raft [1][2], and in Heidi Howard's blog post on it [3]. But really, Castro and Liskov's PBFT is a must-read here [4].

[1] http://www.scs.stanford.edu/14au-cs244b/labs/projects/copela...

[2] https://github.com/chrisnc/tangaroa

[3] http://hh360.user.srcf.net/blog/2015/04/conservative-electio...

[4] http://pmg.csail.mit.edu/papers/osdi99.pdf



