- Run transactions to completion, single-threaded, in timestamp order [will use multiple cores, with each having a single thread]
- Only data changed within a single invocation of a stored procedure is in a transaction; transactions can't span multiple rounds of communication with a client.
- You are also discouraged from doing SUM operations because they would take a long time and block other transactions.
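The execution model in those bullets can be sketched in a few lines. This is a toy Python simulation, not VoltDB's actual API: each partition is one logical thread that drains queued stored procedures in timestamp order, so no locks are needed. The `deposit`/`withdraw` procedures are hypothetical.

```python
import heapq

class Partition:
    """Toy sketch of one partition: a single logical thread runs
    transactions to completion in timestamp order, so no row locks
    are needed within the partition."""

    def __init__(self):
        self.data = {}    # this partition's slice of the tables
        self.queue = []   # (timestamp, stored_procedure) pairs

    def submit(self, timestamp, procedure):
        heapq.heappush(self.queue, (timestamp, procedure))

    def run_to_completion(self):
        results = []
        while self.queue:
            ts, proc = heapq.heappop(self.queue)
            # Each stored procedure runs start-to-finish with exclusive
            # access to the partition's data; that single invocation is
            # the whole transaction.
            results.append(proc(self.data))
        return results

# Hypothetical stored procedures touching only this partition's data.
def deposit(data):
    data["balance"] = data.get("balance", 0) + 100
    return data["balance"]

def withdraw(data):
    data["balance"] = data.get("balance", 0) - 30
    return data["balance"]

p = Partition()
p.submit(2, withdraw)
p.submit(1, deposit)
print(p.run_to_completion())  # timestamp order: deposit runs first -> [100, 70]
```

Note that a transaction here cannot pause to talk to a client mid-flight; it either has everything it needs when it starts, or it doesn't run.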
I don't see how this is different from a NoSQL database. You cut a number of features (some critical to certain applications) from a relational database and get a bastardized version of a major database. It's a NoSQL database that uses a subset of the SQL standard and an enforced schema!
I find these discussions extremely annoying. I'm currently using MongoDB for my application because:
1. I don't need join support
2. I prefer my current schema to be denormalized.
3. Documents store better than rows for my data.
I could have used a relational database just fine. My data will fit with a little nudging.
The point is, you use the technology that best fits your problem. My current problem fits well into MongoDB but it could be solved less nicely with a different database.
All VoltDB is is another option if you have corners you can cut from the normal relational database model.
FYI: Relational databases don't require transactions of any kind. SQL92 does require transactions; however, it does not require support for multiple rounds of communication with the client.
Suggesting there is no difference between a Key Value store and an SQL database capable of ad hoc queries, Transactions, and Joins is ridiculous.
> Suggesting there is no difference between a Key Value store and an SQL database capable of ad hoc queries, Transactions, and Joins is ridiculous.
My intent was to claim that VoltDB is a feature subset of a relational database and is overhyped along the same lines as many NoSQL servers. It's just a different subset of features than the key/value store servers.
My point boils down to this: VoltDB is merely another option if you have corners you can cut from the normal relational database model.
I probably could have been clearer in stating that. Bringing MongoDB into the discussion muddied things.
I get where you are coming from; however, if your application is already using SQL, then the transition to a more limited but faster database is a lot simpler than going to NoSQL.
Also, one of the simple hacks to increase speed is to have a smaller working-set database on a separate system to handle more recent items. Because it's under a much higher load, the "live" database tends to have a really simple usage pattern; due to its smaller size, it also tends to fit into RAM. And the "live" DB tends to have different optimizations (e.g. fewer indexes, because you have more writes). Based on this, a "stripped down" but still SQL database seems like a perfect fit.
PS: It's a lot like Memcached, something that can speed up your application with minimal development time is worth a lot.
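The "small live DB in front of a bigger archive" hack above can be sketched in a few lines. This is a minimal Python sketch with dicts standing in for the two databases; the eviction policy (oldest-first by insertion order) is an assumption for illustration:

```python
class TieredStore:
    """Toy sketch of the live-DB/archive split: recent rows live in a
    small hot store that fits in RAM; misses fall through to the
    bigger archive."""

    def __init__(self, hot_capacity=2):
        self.hot = {}        # stands in for the small in-RAM "live" DB
        self.archive = {}    # stands in for the full-history DB
        self.hot_capacity = hot_capacity

    def write(self, key, value):
        self.hot[key] = value
        if len(self.hot) > self.hot_capacity:
            # Evict the oldest entry (dict insertion order) to the archive.
            old_key = next(iter(self.hot))
            self.archive[old_key] = self.hot.pop(old_key)

    def read(self, key):
        if key in self.hot:
            return self.hot[key]       # fast path: recent, in RAM
        return self.archive.get(key)   # slow path: full history

store = TieredStore(hot_capacity=2)
for i in range(4):
    store.write(i, f"item-{i}")
print(sorted(store.hot))   # [2, 3] -- only recent items stay "live"
print(store.read(0))       # item-0, served from the archive
```

A real deployment would of course need to route writes and handle items that move back into the working set, but the shape of the win is the same: the hot path touches a small, simple, RAM-resident store.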
For many of my use cases, the distributed transaction control is the key feature of any database product. The reason there are so many products out there with simple transaction models is that it is not that hard to write one.
It is really hard to do the distributed transaction thing [1] in a horizontally scalable fashion.
The problem I have with this approach is that I don't know from the start what my needs are.
For example, if I'm building a web app in my free time ... I'm starting with MySQL and an ORM client like Django's, and I'm doing data modelling that fits these tools.
Then if I wanted to switch to MongoDB, should the need arise, I'd end up throwing away a lot of code / rewriting lots of logic. This means wasted hours, and my free time for working on such stuff is very limited.
In lots of scenarios it's easier for me to just do sharding where I need it on top of MySQL.
And that's what RDBMSs are good for ... they come with lots of crap you don't need, but that you may eventually need, which makes them fit most problems, unless you have extreme scalability needs, and even then you've got workarounds to turn to.
How do you know if / where you'll have scalability problems? And how do you know what to choose when you're doing exploratory programming?
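As a concrete instance of the "sharding on top of MySQL" workaround mentioned above, here's a minimal sketch of stable hash-based routing. The shard names are hypothetical; in practice each would be a connection to a separate MySQL instance:

```python
import hashlib

# Hypothetical shard identifiers; in practice, connection strings to
# separate MySQL instances.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"]

def shard_for(user_id):
    """Stable hash so the same user always lands on the same shard.
    md5 is used here only for its deterministic, well-spread digest."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Keeping all of a user's rows on one shard means per-user queries
# (the common case for a web app) never cross shards.
assert shard_for(42) == shard_for(42)
print(shard_for(42), shard_for(43))
```

The catch, and the reason people reach for it only "where I need it", is that cross-shard joins and re-sharding after adding a node become your problem rather than the database's.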
> How do you know if / where you'll have scalability problems?
You won't.
Just code it for the best model for your data. 95% of the time a RDBMS will work just fine for your data. If you have scaling issues down the road, chances are you'll be working on this full-time and will have the energy to devote to properly thinking about scaling. You aren't Google or Facebook (yet). Until you are, just work with what best fits your data or is fastest to code.
Well, the one-thread-per-region thing is an interesting way of sidestepping locks. So I'll give them that.
But it's basically memcached with an SQL parser and some hand-wavy stuff about how, since it's replicated, the data never needs to make it to disk. That one doesn't sit well with me for some reason, even if it's theoretically true.
> Hey Mike, I'll get in contact with you shortly once I pull some logs. We've been analyzing the data and it seems that where we lose data coincides with some system reboots, we had a few problems with replication a few weeks ago, and had it disabled, so that's likely why we are seeing loss.
RAM is cheap, and 32+GB is a lot of data, so a wide range of real-world workloads can fit the database in memory. (You can get a PowerEdge R710 with 96GB for less than $6,066 today; if it's 20x faster than a traditional database, you will save money.)
The problem is that MySQL etc. are not designed for this, so while they become a lot faster in memory, they are still relatively slow. Also, RAM is only getting larger, so once you make the transition to a RAM-based DB you are unlikely to need to transition back.
PS: I suspect most real world workloads revolve around small datasets that connect to big blobs of data. Think Users/Projects linking to Pictures, Documents, Video, etc. You don't edit documents in your database, but you do want to track version information, users, permissions, timestamps, approval, etc. It's often a good idea to mix a Key Value style data store with a front end SQL database.
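That mixed pattern looks roughly like this. A minimal sketch, using sqlite3 for the SQL side and a dict standing in for the key/value blob store; the table shape and key format are assumptions for illustration:

```python
import sqlite3

# Stand-in for a key/value blob store (S3, Riak, a filesystem, ...).
blob_store = {}

# The SQL side holds only the row-sized metadata: owner, version, and
# a pointer (key) into the blob store.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE documents (
    id INTEGER PRIMARY KEY, owner TEXT, version INTEGER, blob_key TEXT)""")

def save_document(doc_id, owner, version, content):
    key = f"doc:{doc_id}:v{version}"
    blob_store[key] = content  # the big payload goes to the KV side
    conn.execute(
        "INSERT INTO documents (id, owner, version, blob_key) VALUES (?, ?, ?, ?)",
        (doc_id, owner, version, key))

def load_document(doc_id):
    owner, version, key = conn.execute(
        "SELECT owner, version, blob_key FROM documents WHERE id = ?",
        (doc_id,)).fetchone()
    return owner, version, blob_store[key]

save_document(1, "alice", 3, b"...many megabytes of video...")
print(load_document(1))  # ('alice', 3, b'...many megabytes of video...')
```

The SQL database stays small (and RAM-resident) because it never holds the blobs, while the blob store never has to answer queries about versions or permissions.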
While VoltDB's persistence story is perhaps bleeding edge (active replication with rolling snapshots as a double backup), the MySQL MEMORY storage engine has none.
Even if that's fair, I only get two points from this article (and their product page):
* A system which holds all data in memory is faster than a system which uses memory and disks. True, but not really news.
* If you leave out features your implementation can be faster (e.g. they only support transactions which span a single stored procedure). Again ... no news here.
* A system that is designed for mostly disk will run faster when run in memory, but won't be as fast as one designed from the outset for in-memory operation.
* Leaving out features is harder than adding features, because you have to judiciously decide which ones to leave out. A large bushel o' features can have many different subsets, and only some of these subsets are worthwhile.
I like this point. At VoltDB, we like to think we've removed a different set of features than many other scalable solutions, leaving a product that is useful for a set of problems that might not have a great option right now.
That said, VoltDB's architecture was built for in-memory and horizontal scaling from day 1. Based on hstore (http://hstore.cs.brown.edu/), it's not simply a removal of what's in existing systems.
I think this could be a worthwhile comparison if tested on a platform where the in-memory store is as safe as traditional hard-disk storage, say a machine with some form of static (non-volatile) RAM for main memory.
Thus the question would be, does the performance advantage of VoltDB's approach outweigh the penalty of static RAM performance?
I should have been more specific. I'm familiar with all sorts of non-volatile memory (flash, etc.) but I'm not aware of any modern computer that uses it as main memory (that is, directly addressable RAM as opposed to connected via a storage bus).
I think the last thing I used that would qualify as such was the Apple Newton.
The article links to a presentation by one of the authors of VoltDB. On slide 17 he makes the crucial point that overhead (latching, locking, recovery, and buffer pool) makes up 88% of the work that a typical RDBMS must do. Useful work (doing your SQL) accounts for only 12%. Attempting to gain speed by optimizing the useful work (sorting, joining, indexes) is thus pointless. Doing the database in memory (getting rid of the buffer pool) is a good way to gain speed, but it can only hope to be about two times faster (because the other 3 types of overhead remain). The only way to get really fast is to address all four types of overhead, which is what VoltDB attempts to do. Point taken. However, the author needs to work on his pie-chart skills. The 12% useful work on slide 17 is displayed as a tiny sliver (more like 3%) of the pie. This hinted to me that the whole presentation is a bit of an exaggeration.
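The arithmetic behind that argument is just Amdahl's law applied to overhead removal. A back-of-envelope sketch, assuming for illustration that the 88% overhead splits evenly across the four categories (the real split varies; a larger buffer-pool share is what would yield the roughly 2x ceiling mentioned above):

```python
def speedup(fraction_removed):
    """Amdahl-style speedup when a fraction of total work is eliminated."""
    return 1.0 / (1.0 - fraction_removed)

useful = 0.12                  # slide 17: useful work
overhead = 1.0 - useful        # latching + locking + recovery + buffer pool
per_category = overhead / 4    # assumption: four comparable shares

print(round(speedup(per_category), 2))  # drop only the buffer pool: 1.28
print(round(speedup(overhead), 2))      # drop all four: 8.33
```

Whatever the exact per-category shares, the shape of the result is the same: removing any single overhead buys only a modest constant factor, while removing all four buys nearly an order of magnitude, which is the argument for rearchitecting rather than optimizing.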
The pie chart is based on this paper: http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.... The pie chart numbers have been updated to reflect the right charts in the paper, but the graphic hasn't. We'll try to get that fixed for future presentations.
The point is very real. Stavros, Dan, Sam and Mike took the Shore RDBMS and stuck it on top of memory. Then they removed parts of it and measured the performance difference. Logging, buffer management, and concurrency management make up about 93% of the instructions run and 88% of the cycles for TPC-C "new order". There was no single area that dominated this overhead, so removing one piece doesn't make enough of a difference. Results on Oracle, SQL Server and DB2 might be different, but it's hard to imagine they'd be dramatically different.
Because VoltDB was built without these sources of overhead, it's usually quite a bit faster than systems that take legacy RDBMSs and back them with memory.
VoltDB has limitations (especially in 1.0), but if your workload fits, it will go as fast as you need it to.
kdb+ is a column store (analytic database) and does not partition horizontally. The two things they have in common are that they are both in memory (and kdb+ not necessarily) and support SQL as a query language (kdb+ only supports SQL-like stuff). kdb+ does not emphasize stored procedures as a means of bringing multiple queries and arbitrary logic to the data. Nor does it need to, since it doesn't partition.
The whole way the thing's written. I hope the blog author is the VoltDB guy's uncle or publicist or something, because otherwise that is one heck of a man crush to hype this up that much.
Not taking anything away from VoltDB - haven't seen it, will try to see it, and encourage people to pursue stuff like that in general. But the article read like football announcers talking about Brett Favre.
I guarantee he's not related in any way. Did you read the whole thing? There are quite a few criticisms of its operational model, the limitations of SQL, the limitations of using stored procedures only. What did you consider to be excessive sucking?
Definitely. If you don't grab the reader you may as well not write. And it wasn't googly-eyed as much as it was laying out what they claim, to see how it held up later.