I Can't Wait for NoSQL to Die

rufo · on March 27, 2010

NoSQL will never die, but it will eventually get marginalized, like how Rails was marginalized by NoSQL.

Is it just me, or does this statement make absolutely no sense whatsoever?

mechanical_fish · on March 27, 2010

It doesn't make sense because it's overstated. Let me try to state his opinion differently:

NoSQL, as it exists today, is a tasty souffle composed of more air than substance. That's how the hype cycle works:

http://en.wikipedia.org/wiki/Hype_cycle

NoSQL is somewhere on that initial "peak of inflated expectations". (I don't think it's hit the top yet, but it sure is soaring.) Rails passed the peak many months ago and is somewhere to the right of the Trough of Disillusionment. [1] (I don't think it's plateaued yet: Rails still isn't quite done being invented.)

Once the hot air leaks out in a year or two, NoSQL databases will still exist, and will in fact be better understood and better built than ever, but they will no longer be a trending buzzword. That's the day that the author is devoutly wishing for.

(I, personally, find the hype cycle to be kind of fun to watch, and educational too, so I'm not as bothered by it as he is.)

---

[1] Though your mileage may vary. The world isn't perfectly connected, so there isn't just one hype cycle. It's fun watching (e.g.) Facebook sweep through the world of my parents. They get really excited by "new" technologies about two to four years after the folks on HN have moved on from them.

alextgordon · on March 27, 2010

I don't know if this is what he was trying to say, but it makes a little more sense if you change it to

NoSQL's hype will never die, but it will eventually get marginalized, like how Rails' hype was marginalized by NoSQL's hype.

wmf · on March 27, 2010

If Rails == ActiveRecord == SQL, then NoSQL implies !Rails. (This isn't true of course, but we're already dealing with oversimplifications here.)

Or maybe he's just saying that irrational NoSQL hype has replaced irrational Rails hype.

gaius · on March 27, 2010

The Rails message is "you don't need a DBA, you don't need to know SQL, just have your developers do the default install of MySQL and we'll do the rest". That is also the NoSQL message.

xal · on March 27, 2010

Speaking as a member of the rails core team, if you heard someone say that then he made it up.

Rails until 3.0 was heavily invested in SQL simply because it's what everyone used. However, it was swimming against the stream in that Rails declared DBs to be something which is incrementally developed by the software through migrations instead of set up by DBAs per change tickets. This has been enormously successful; there is probably not a single web framework left that pretends we still live in a DBA-dominated world. Part of the shrapnel of this decision is that Rails did away with triggers, DB constraints and stored procedures, but this is simply because most high volume sites don't use these things anyways because they get very hard to scale later on.
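
A rough sketch of the migration idea, in Python with sqlite3 rather than Rails. The `MIGRATIONS` list, the `schema_version` table and the `migrate` helper are made up for illustration and are not how Rails itself stores things:

```python
import sqlite3

# Hypothetical migration list: (version, SQL). In Rails these would be
# Ruby migration classes; plain SQL strings are used here for brevity.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn):
    """Apply every migration newer than the version recorded in the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    applied = []
    for version, sql in MIGRATIONS:
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies versions 1 and 2
second_run = migrate(conn)  # nothing left to apply
```

The point is only that the schema lives in version-controlled code and the database records how far along it is, so the same script can be run safely on any copy of the database.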

Rails past 3.0 will work natively with any data store that you can imagine. It ships something called ActiveModel, which is a tiny interface that you can implement on top of Mongo, Cassandra, Redis. ActiveRecord is just the SQL incarnation of this interface.

Very high quality libraries based on ActiveModel already exist. Have a look at Cassandra Object as an example.
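
To make the "tiny interface, many backends" idea concrete, here is a minimal sketch in Python. It is not the ActiveModel API: `Store`, `DictStore`, `SqliteStore` and `rename_user` are hypothetical names, and the dict-backed class merely stands in for a key-value store.

```python
import json
import sqlite3
from typing import Optional, Protocol


class Store(Protocol):
    """Hypothetical minimal persistence interface (not the ActiveModel API)."""

    def save(self, key: str, attrs: dict) -> None: ...
    def find(self, key: str) -> Optional[dict]: ...


class DictStore:
    """Stand-in for a key-value backend."""

    def __init__(self) -> None:
        self._data: dict = {}

    def save(self, key: str, attrs: dict) -> None:
        self._data[key] = dict(attrs)

    def find(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class SqliteStore:
    """The SQL incarnation of the same interface."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE records (key TEXT PRIMARY KEY, body TEXT)")

    def save(self, key: str, attrs: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?)", (key, json.dumps(attrs))
        )

    def find(self, key: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM records WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def rename_user(store: Store, key: str, new_name: str) -> dict:
    """Application code that depends only on the interface, not the backend."""
    user = store.find(key) or {}
    user["name"] = new_name
    store.save(key, user)
    return user
```

`rename_user` runs unchanged against either class, which is the property being claimed for the framework, not for any particular application.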

gaius · on March 27, 2010

> this is simply because most high volume sites don't use these things anyways because they get very hard to scale later on.

If you start with the assumption that "you don't need a DBA" then that's probably true.

Just to give you an idea of my background, I work on a system using a commercial RDBMS that "scales" to thousands of commits/sec and tens of terabytes of data. We expect to take it to tens of thousands of commits (we already do that many reads!) and hundreds of teras with no major structural changes. One thing you learn in this game is that database agnosticism is a wild goose chase. To really scale, you need to intelligently choose a technology and use its features to the fullest and just accept that you will be "locked in". We couldn't port to another RDBMS if we tried because certain things, like our chosen database's locking strategy for example, are baked into the way we do things. It's not a matter of SQL syntax. We'd be starting again from scratch; we'd need new algorithms. But we can do things, we take things for granted, that most of the Internet peanut gallery takes to be impossible, because they start from the assumption that abstracting the database actually helps anything.

etaque · on March 27, 2010

I think that the message wasn't "app database-agnosticism", but "framework database-agnosticism".

nkassis · on March 27, 2010

That's what I got from it too. The framework being this flexible should help a DBA fully use the database to his advantage. The project I work on has had to do some non-standard things with Rails, and its flexibility has made it much easier. In our view Rails and its defaults are a starting point providing the basic infrastructure to get the project going, not an end-all-be-all solution. I see the same thing in NoSQL solutions: they try to give the developers and DBA as much flexibility as possible. They don't really seem to offer any default. RDBMS is basically the sensible defaults without the flexibility. Maybe the in-between can be found.

efuquen · on March 28, 2010

Is it just me, or did most of that article make no sense whatsoever?

papachito · on March 27, 2010

It's not just you.

teilo · on March 27, 2010

There is a rash of these articles popping up. While there are usually valid points in urging people to avoid the hype of the "next big thing", a lot of these guys seem to be bitching that they might have to learn a new skill set.

I have been in this business long enough to remember Clipper and FoxPro devs bitching about SQL when it was on the rise. This sounds about the same to me.

mpk · on March 28, 2010

These articles are just link-bait with no content. The 'NoSQL' buzzword is as distasteful to me as, say, 'cloud', but that doesn't mean that articles discussing them don't have value.

The flurry of 'NoSQL' articles often cover the different approaches to data stores, their implementation, their interfaces, their management, performance, scalability, etc. That interests me, but doesn't mean I'm going to go to work on Monday and kill all the non-NoSQL dbs we have running.

Hate-against-hype articles are waaaaay overrated.

paulgb · on March 27, 2010

Ted's point may be valid for BigTable-like databases. (I'm not saying it is, but I don't know enough about those to say so.) Those are designed for scalability and if you don't need the scalability you probably should use a RDBMS instead.

But there are other advantages of SQL-less databases that don't deal with scale. I deployed my first MongoDB app a couple weeks ago. Even though it was a small (~1 developer month) project, and neither myself nor the other developer had used MongoDB before, I still think we finished faster than if we'd used MySQL. Just like Cassandra is a premature optimization if you just need an RDBMS, an RDBMS is a premature optimization if you only need an object store.

japherwocky · on March 27, 2010

I got downvoted awfully last night for trying to say this, but I'll say it again to back you up:

Developing with mongodb is _lightning fast_ and holds up very very well. If you end up with problems, switch to SQL later!

I don't think a lot of these people hating on the nosql projects have actually tried building something with them.

simonw · on March 27, 2010

I dunno... I find the combination of the Django ORM and South (in particular the --auto flag for auto-creating migrations) is incredibly productive. With the ORM, I can conjure up a query that answers pretty much any question I might have of my data. I've experimented with MongoDB (and a bunch with Redis) and I find I'm much more likely to end up with a query that I can't resolve without having to do a bunch of extra work.

Most of this is probably having expertise with the tools, but I find that for rapid prototyping the ability to run relational queries is really important, especially since query performance during the prototyping phase isn't really an issue.

paulgb · on March 27, 2010

I've found that relational ORMs force me to write convoluted and weird code for all but the simplest of joins. Maybe django's ORM is better; I've mostly used SQLAlchemy and a few others.

rbanffy · on March 28, 2010

You know that you can use SQL with Django, if you have an insanely convoluted join.

codexon · on March 27, 2010

Switching back and forth between Mongo and SQL is not without technical cost. The syntax and semantics are quite different and translating complex queries to map/reduce is also a problem.

mikebo · on March 27, 2010

I agree that developing with MongoDB is really really nice. We'll see how it holds up under more load, but for prototyping data models I have not used a better stack than mongodb + mongomapper

richcollins · on March 27, 2010

Why is it faster?

paulgb · on March 27, 2010

It doesn't force you to fit non-tabular (e.g. hierarchical) data into a table structure.

Also, no schema means the data structure is more malleable. With the right ORM, this fits in nicely with polymorphism: I can store objects with some common features in the same collection, but when I retrieve them from the database I get different types of objects which inherit the same base object. mongoengine is one ORM that does this.
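
Roughly what that looks like, sketched in plain Python with a list of dicts standing in for the collection. This is not mongoengine's API; the `Shape` classes and the `_type` tag are assumptions made up for the example:

```python
class Shape:
    """Base class; subclasses register themselves by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Shape.registry[cls.__name__] = cls

    def __init__(self, **fields):
        self.fields = fields

    def to_document(self):
        # Store the class name alongside the fields, as a type tag.
        return {"_type": type(self).__name__, **self.fields}

    @staticmethod
    def from_document(doc):
        # Use the type tag to pick which subclass to rebuild.
        fields = {k: v for k, v in doc.items() if k != "_type"}
        return Shape.registry[doc["_type"]](**fields)


class Circle(Shape):
    def area(self):
        return 3.14159 * self.fields["radius"] ** 2


class Rect(Shape):
    def area(self):
        return self.fields["width"] * self.fields["height"]


# One schemaless "collection" holding documents with different fields.
collection = [
    Circle(radius=1).to_document(),
    Rect(width=2, height=3).to_document(),
]
loaded = [Shape.from_document(doc) for doc in collection]
```

Each document carries different fields, yet everything that comes back is a `Shape` and can be treated uniformly.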

heresy · on March 28, 2010

Nice. But I have a difficult time seeing how storing "malleable" data like that, opaque to the storage engine, is going to be performant for querying.

Must be nice to have requirements that never change once you've decided on a data representation...

paulgb · on March 28, 2010

The database is still aware of the fields, so MongoDB can build indices on certain fields if you wish. Admittedly I haven't deployed Mongo in an environment that really tested its performance, but we've been serving about 20k pageviews per day with no issues. Granted, this was a fairly basic application.

As for changing requirements, mongo handled those well too.

It's certainly not a silver bullet, but when I just need a basic object store the query performance trade-off is worth it.

richcollins · on March 28, 2010

Does MongoDB manage indexes for you?

japherwocky · on March 27, 2010

You don't even have to think about schemas or SQL; you can focus completely on whatever your language of choice is.

rythie · on March 27, 2010

"Did you know that Cassandra requires a restart when you change the column family definition? Yeah, the MySQL developers actually had to think out how ALTER TABLE works, but according to Cassandra, that's a hard problem that has very little business value. Right."

Really? Did the MySQL people think about it? Because it takes ages to do an ALTER. Even when you are doing something like dropping an index it can lock up for hours, during which no one can do any inserts. In contrast restarting a service is no big deal.

newhouseb · on March 28, 2010

> In contrast restarting a service is no big deal.

Yikes, restarting a service that's so essential to everything else in web infrastructure is certainly a big deal. Where I work (large 30+ million users/month site), we have batches that do all sorts of processing and DB crashes (basically equivalent to a restart) can be a major pain because it can be difficult to figure out exactly what failed and when and how to best recover. You might say, "oh just re-schedule all the batches to allow downtime", but once you have 30 developers with a hundred or so batches, that can be damn near impossible to orchestrate.

A simple stateless web-app can probably tolerate a DB restart, but Cassandra was built to scale - not to host a to-do app.

ALTER takes ages to do because of the ACID constraints of MySQL. If you want, you can sacrifice the ACID constraints by just cloning the table with the proper modifications and then dropping the table, but I'm venturing into DBA-land for which I am in no way qualified to profess knowledge.

pradocchia · on March 28, 2010

> ALTER takes ages to do because of the ACID constraints of MySQL.

No, I don't think it has much to do with ACID. Rather, they made a simple single implementation of ALTER TABLE that copies the whole table out on any change whatsoever. Add a column? Recreate the table. Add a table comment? Recreate the table. Drop an index? Recreate the table.

They could have identified cases where in principle the table & metadata could be modified in-place, but that would be a lot harder than a simple copy. It would probably necessitate changes to the legacy architecture, which in turn would require a host of other changes.
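
For anyone who hasn't seen the copy-everything approach spelled out, here is a sketch using Python's sqlite3 as a stand-in. It illustrates the strategy described above, not MySQL's actual code, and the `posts` table and `alter_by_copy` helper are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)", [("a",), ("b",), ("c",)])


def alter_by_copy(conn):
    """Add a 'body' column by rebuilding the table and copying every row."""
    conn.execute(
        "CREATE TABLE posts_new (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    # The cost of this statement grows with table size, however small the change.
    conn.execute("INSERT INTO posts_new (id, title) SELECT id, title FROM posts")
    conn.execute("DROP TABLE posts")
    conn.execute("ALTER TABLE posts_new RENAME TO posts")
    conn.commit()


alter_by_copy(conn)
rows = conn.execute("SELECT id, title, body FROM posts ORDER BY id").fetchall()
```

With three rows this is instant; the complaint is that the same row-by-row copy happens on a table with hundreds of millions of rows.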

spudlyo · on March 28, 2010

No kidding, came here to say the same thing. ALTER TABLE operations in MySQL essentially copy the entire table, even if you're changing something trivial like a table comment. It's a huge pain in the ass. Restarting the service is a freakin' cakewalk in comparison.

pradocchia · on March 28, 2010

It occurs to me that MySQL started off as a thin SQL wrapper on a NoSQL database: here, have a SELECT and WHERE, but you'd best not JOIN, and forget about transactions or referential integrity.

Then, over time, they tacked on a few more relational features, but they had yet to solve the hard problems of relational databases.

Meanwhile, the people who were originally drawn to MySQL as a dumb-and-quick datastore got frustrated with this line of development and christened the NoSQL movement. It's not so much a departure from relational databases (they were never really there), but a return to MySQL basics, w/out the SQL.

mark_l_watson · on March 27, 2010

I can't believe some of what is said by people on both sides of the NoSQL arguments. Discounting use of RDF data stores, almost all of my recent work involves PostgreSQL and MongoDB. I think that it is blatantly obvious which to use in specific circumstances. I have not had to do this yet, but using Datamapper.setup, you can integrate the use of both in the same application by storing some model data in a relational database and some in MongoDB, as it makes sense to do so.

mark_l_watson · on March 27, 2010

Here is an article on mixing Datamapper + MongoDB + MySQL: http://lunarlogicpolska.com/blog/2010/02/15/mysql-and-mongod...

jbyers · on March 27, 2010

As is to be expected from this author, this is definitely on the flame-bait side of things. I submit it because I believe there is an important point here: for the vast majority of startups, going with a relatively unproven "NoSQL" database is a premature optimization and an unneeded technical risk. I disagree with the author that these databases are a flash in the pan, but their over-application is.

psadauskas · on March 27, 2010

I dunno, I see the opposite: RDBMS's are a premature optimization. In my experience, it's /much/ easier to hack together a quick webapp in MongoDB, because you don't have to worry about relations, migrating schema, etc. Sure, it might be slower than Postgres on a billion-row table, but wait until you have a million rows before you shackle yourself to the relational constraints.

bilbo0s · on March 27, 2010

Absolutely right.

He has it backwards. You use NoSQL to 'get shit done'. When you have a billion rows, then worry about schemas. By that point you will have a much better idea, a - what said schema should look like, b - what the architecture of the Postgres, or mysql, or Oracle should look like, and c - how much money you will have to solve the problem.

prodigal_erik · on March 27, 2010

I once worked in a Notes shop. Notes has no schema for documents and nothing to enforce migrating data from older documents to the current format. After a few years of customers manipulating data with various versions of the code, they had documents in such bizarre combinations of states that it was no longer possible for anyone on our dev team to inspect them and say which behavior would be right for the workflow.

Schemaless data should only be a summary of data properly maintained elsewhere, which you can regenerate at need. If your authoritative data has no schema, it will decay to garbage.

bilbo0s · on March 27, 2010

That's because you waited years to address the problem. Not only that, you also rewrote the code, as I advise. But you did not take the opportunity to address the structural data issues you were having, contrary to my advice.

My strategy is to rewrite the code, if needed, but with an eye towards addressing structural data issues. After a few months use of a web app you have a good idea of any surprising usage patterns that may appear. Readjust at that point when you are 'talking with data'.

This advice is for small startups of the HN variety, where 'customers' are a lot more important than 'authoritative' data stores initially. NoSQL systems are useful tools for mitigating the danger of doing too much engineering upfront. Many tech entrepreneurs fall victim to doing too much upfront engineering in the hopes of their data store not 'decaying to garbage', only to find that no one wants to use their product. NoSQL makes it easy to go back and migrate off the data you want to store 'on the move', once you have a better idea of how much of it there is and how it is used.

prodigal_erik · on March 28, 2010

If you are not doing data migration, after n revisions to the data management code, each record can be in any of 2^n states, depending on which code revisions did or did not modify it. How many revisions can you make before your code can no longer handle some of your older data? I'd say days' worth, not months, because you're trying to iterate a lot faster than we did. And the odds of a complete rewrite understanding all your old data are even worse.
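
To put numbers on the 2^n claim, a small sketch (the `possible_states` helper is just for illustration): each revision either touched a given record or did not, so the combinations double with every revision.

```python
from itertools import product


def possible_states(n_revisions):
    """Each revision either modified a record or did not: 2**n combinations."""
    return list(product((False, True), repeat=n_revisions))


states = possible_states(3)                      # 8 combinations for 3 revisions
counts = {n: 2 ** n for n in (1, 5, 10, 20)}     # growth as revisions pile up
```

Ten revisions already allow 1024 distinct histories per record, and twenty allow over a million, which is why the window is measured in days rather than months.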

If you are doing data migration, you necessarily have an old and new schema in your head. At that point you're just refusing to write it down and let the tools tell you whether the code agrees with you.

papachito · on March 27, 2010

I don't really see the point of starting with NoSQL and then going for Oracle. NoSQL DBs are real tools you know, not toys you throw away after your app gets traction. It's actually the contrary that has happened so far.

bilbo0s · on March 27, 2010

Maybe I should have been clear. You only go to SQL RDBMS if you need it. Which, in the vast majority of cases you will not. Further, you only go to SQL where you need it.

For instance, right now someone is developing a street car racing game for Facebook. The XBox kind, not the FarmVille kind. At any rate, one of the features is, obviously, playback. Now keeping all of those physics updates in an SQL database is pointless. And figuring out a schema for that data would have only gotten in the way of them getting that out the door. Throw the physics messages in a queue and write them to Cassandra. If you have even 10000 MAUs, you will easily generate billions of rows. It's just not data that really needs to go into mysql.

I don't think Cassandra is a toy. I think you should 'get it out the door' with everything in Cassandra, and then slowly, move the business stuff off. User names, what cars they bought for instance. Stuff that is not read often. But at first, get it out the door. Don't stop to figure out a perfectly normalized schema with balanced indexes.

lmz · on March 28, 2010

Isn't Cassandra slightly more complicated than MySQL? Sure adding another column is trivial, but the access patterns & indexing need to be determined first.

bilbo0s · on March 28, 2010

Not if you use AKKA!

duncanwilcox · on March 27, 2010

NoSQL might be hype. Let's get specific. Cassandra eliminates the SQL database single point of failure and hard-to-replace masters via a loose-sync, "eventually consistent" protocol.

Is there some startup offering a web service that doesn't need that?

And have you ever tried to deploy an SQL database capable of syncing across sites thousands of miles apart?

Eventually consistent is quite a different model than ACID. If you accept that, and accept that you can't rely on networks to always be up, you'll live comfortably and cost effectively.

rbanffy · on March 28, 2010

I wonder how long it will take for the simple "if you need ACID, go SQL, if you don't, you'll be fine with NoSQL" truth to sink in.

NoSQL databases have been in use since before I was born. Is anybody doing airline reservations on DB2?

derefr · on March 28, 2010

No one has ever explained this to me: why are we partitioning this space? Why can't a single database management system:

* have individual tables, indices and views that are either relational or document-oriented, or graph- or object-based while we're at it, on a case-by-case basis,

* manage them all in a single, well-known distributed pool,

* and present a unified API to access all of them (e.g. a Structured Query Language of some sort)

* that allows tables of disjoint types to be joined in queries, with appropriate warnings when it creates non-optimized query plans?

In other words, why can't I say that my reports table should use the "relation" backend, while my messages table should use the "document" backend, and be done with it?

It's as if, when you went to a car dealership, they asked you whether you wanted to see the "cars with cigarette lighters" or "cars with automatic windows" section. Why can't my car do both?

vog · on March 28, 2010

> have individual tables, indices and views that are either relational or document-oriented, or graph- or object-based while we're at it, on a case-by-case basis

This is already the case. Nowadays, almost all relational databases (except, of course, MySQL) support XML columns. PostgreSQL supports them in a rudimentary way, and DB2 and MSSQL even have special storage strategies and index structures for XML, i.e. for generic tree structures, data-oriented as well as document-oriented ones.

Also, abstract data types ("encapsulation", the base of OO) are implemented in these databases, too (except, of course, in MySQL), as well as other OO features such as table inheritance and some kinds of polymorphism.

derefr · on March 28, 2010

I'd love to see a comparison between using these XML engines for queries and NoSQL, then. I'm betting they'd be competitive at least to the point that, if you already had one of the supporting DBMSes set up, there would be little point in training your DBA on NoSQL as well.

megaman821 · on March 28, 2010

I thought most of this already exists, just not on the free databases.

Having XML or JSON columns that can have their interior fields indexed would replicate what document databases do.

Also, wouldn't having a master-master relational database with a bunch of materialized views replicate what people are using Cassandra for? Is it just because the free databases don't have materialized view support?

codexon · on March 27, 2010

I recorded a list of issues that may prevent you from using Cassandra as a general purpose website storage right now.

http://www.codexon.com/posts/is-cassandra-ready-yet

It appears as though Ted's complaint about needing to restart Cassandra to modify ColumnFamilies (tables) is nearly obsolete. A patch for the last remaining subtask has been submitted.

physcab · on March 27, 2010

I can't wait for these types of articles to die.

jgerman · on March 27, 2010

I'm getting tired of both sides of this argument; I'll be happy when the whole back and forth dies :). Rarely do you see a balanced opinion. Sometimes it's people that are fanatical about the new-ish NoSQL idea. Other times, like this, it's someone so stuck in their ways they think that everything but what they like is a fad and nothing will ever change.

One of the key things I look for when I interview developers is that they can recognize the right tool for the job. Potentials that get married to a technology or language are shown the door pretty quickly.

Also, as others have pointed out, this particular article seems to not quite understand the decisions involved, to the point of getting some things backwards.

strlen · on March 27, 2010

AdWords implemented on top of MySQL? Perhaps the CRM portion of AdWords (i.e., where the advertisers submit their ads and publishers view their balances) is -- it's fairly easy to partition by functionality and doesn't have extremely tight latency bounds. This isn't where real time auctions (what really distinguishes AdWords from what came before) happen.

You can be sure, however, that the data used for real time ad auctions is extracted out of MySQL and into a highly customized data store (likely, a pure in memory one). It's all about using the right tool for the job. You can also be sure that you'll never see a paper on that data store, as that's their competitive edge. If you could duplicate it with off the shelf components (whether MySQL or Cassandra), Google would be toast.

Likewise, I am sure Amazon uses Oracle for their billing system and catalog submission interface, but they use specialized systems for search, shopping cart and recommendations.

For a business app that only needs to scale to the amount of paying customers (i.e., advertisers, account managers and customer support) and has no real time constraints -- but on the other hand involves complex and frequently changing business logic (e.g., where altering tables may be required) an RDBMS is the right tool for the job.

Where latency matters, data grows much faster than Moore's law (in relation to main memory size), Amdahl's law starts to matter in regards to computation (the computation workload needs to be partitioned to take advantage of parallelism), and traditional caching strategies simply don't work, something else is. That situation is becoming more and more common across web companies. You can also be sure that places like Walmart and the like employ plenty of non-relational technologies (my personal bet is that they're likely using Coherence or Terracotta): usually, however, these are expensive and are built/configured by field engineers to be custom-tailored for their workloads. When you employ a world-class engineering team, "build" starts to make more sense than "buy" when you're solving a very specific and constrained problem (e.g., a fault-tolerant shopping cart system).

You don't need to be of Google's size to be at that stage. Talking about scalability and performance without taking the workloads into account (e.g., "Google Facebook or Amazon" as if e-commerce, search and social networking were compatible) is also an anti-pattern: I am sure engineers at Google would laugh when you compare Facebook's scale to theirs; likewise Facebook's engineers would laugh when you compare the real time aggregation that happens on their site to what happens at Amazon; Amazon's engineers would likely tell you holiday season pager duty horror stories that would scare Facebook or Google engineers.

derefr · on March 28, 2010

> a highly customized data store (likely, a pure in memory one)

That's when we stop calling it a data store, and start calling it a data structure. Data stores are where data goes when it's not part of the working set. With that definition, it's perfectly sensible for AdWords to use MySQL as its data store.

strlen · on March 28, 2010

(Edit: this is a longer reply than I intended, no longer really intended as a direct reply to the parent; this is more a reflection on systems architecture of data-intensive applications).

That's a good point, but a pure in memory data structure is:

a) Not persistent to disk at all. Judging from my own experience with similar low-latency systems used in ad serving (where we called these "data servers") and other similar systems, the data is likely to be persisted to local disk and the deltas replayed to it from a MySQL db to avoid long restart times.

b) Lives within the ad server process. This is likely not true, as the ad server process will need to compose a "working set" for particular ad auctions from multiple data sources (bid price for each ad, keywords, budget/delivery/campaign specifications for each ad, keyword relevance of ad/ad campaign). Each of these data sources is likely represented by a different data structure (red-black tree for one, hash table for another, trie for yet another, graphs, B-Trees, etc...), has very different characteristics in terms of cache-locality, rate of change, size, density and comes from multiple places (some from RDBMS, others from Map/Reduce)

(Interesting side note: earlier I also wanted to say that the data structures are usually not partitioned in how they're stored, nor is computation done on them partitioned. However, with the age of parallel computing this is simply not true: there are now parallel data structures and algorithms).

One compromise is perhaps we can call these systems "data servers" or "data structure servers" (afaik Redis does the latter). MySQL (or any other RDBMS) merely feeds these systems through some form of message oriented middleware. In this case RDBMS (and this is an over simplification which doesn't cover all the corner cases) is merely acting as tape: changes are played forward and not randomly accessed. RDBMS that is the source of truth for ad-serving is never queried real time and can easily be taken down for maintenance while ad serving continues. It doesn't even need to be highly available (if advertisers can't submit ads it would certainly be a huge and costly outage, but much less costly than if users see ads!).
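
A toy version of the "RDBMS as tape" arrangement, to make the replay idea concrete. Everything here is hypothetical: `CHANGE_LOG` stands in for ordered rows pulled from the database via the middleware, and `ServingIndex` for the serving-side structure.

```python
# Hypothetical change log, standing in for rows read in order from the RDBMS.
# Each entry is (sequence number, operation, ad id, bid in cents).
CHANGE_LOG = [
    (1, "upsert", "ad-1", 50),
    (2, "upsert", "ad-2", 75),
    (3, "upsert", "ad-1", 60),
    (4, "delete", "ad-2", None),
]


class ServingIndex:
    """In-memory structure that is only ever updated by replaying the log."""

    def __init__(self):
        self.bids = {}
        self.last_applied = 0

    def replay(self, log):
        # Changes are played forward in order; the log is never queried randomly.
        for seq, op, ad_id, bid in log:
            if seq <= self.last_applied:
                continue  # already applied, e.g. after restoring a local snapshot
            if op == "upsert":
                self.bids[ad_id] = bid
            elif op == "delete":
                self.bids.pop(ad_id, None)
            self.last_applied = seq


index = ServingIndex()
index.replay(CHANGE_LOG[:2])  # serving copy lags behind the source of truth
stale = dict(index.bids)
index.replay(CHANGE_LOG)      # catching up applies only entries 3 and 4
```

Between the two `replay` calls the serving copy is stale but still usable, and the source database could be offline the whole time without affecting reads.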

Note, such a system is also necessarily eventually consistent (in the truest meaning of the word: customer receives an SLA which corresponds with a point where the serving component is consistent with the DBMS).

There still needs to be an efficient OLAP component to back the CRM/ERP functionality of this system, for which an RDBMS is still a good bet (combined with an off-line system e.g., Map/Reduce for more complex reporting and optimization). However, had an end-to-end ad-serving system been written from scratch now, would the RDBMS component serve as the primary source of truth (rather than just as the backend for the publisher/advertiser/support UI component)?

In addition, this ("write to RDBMS, serve from elsewhere") design is also very specific: writes to the "ad submission database" are rare and don't always require high availability. Consistency (between the RDBMS and serving component) can be much more eventual than would be in a Dynamo based system (where the weak "can't read my writes" eventual consistency is only a failure condition).

Now suppose you also want highly available, low-latency writes (even if not at the same frequency as reads) and you'd want to be able to read-your-writes in normal situations. This makes the "write to RDBMS, serve from something else" (effectively what popular memcache+MySQL deployments are) scenario more brittle. You now have much harder questions to answer (do I want a system that's always in a consistent state e.g., to avoid having to do quorum reads/writes? am I okay with eventual consistency as a failure scenario? etc...) but with many workloads this becomes a necessity.
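
On the quorum question: the usual rule of thumb in Dynamo-style systems is that with N replicas, writes acknowledged by W of them and reads consulting R of them, a read is guaranteed to touch at least one replica holding the latest write whenever R + W > N. A brute-force check of that overlap property, with a hypothetical helper name:

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For three replicas, `quorums_always_overlap(3, 2, 2)` holds while `quorums_always_overlap(3, 1, 1)` does not, which is the trade between read-your-writes and cheaper, lower-latency operations.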

Despite speaking at NoSQL events, I am not a big fan of the NoSQL name. Not only do these systems not intend to completely displace SQL-based RDBMS systems (and, as with the ad server example, can exist side-by-side with them), they also provide functionality that can't be provided by RDBMS systems (and not just due to scalability concerns).

wanderr · on March 27, 2010

I myself am not actually sold on the noSQL movement, at least not on the idea of ditching SQL entirely. It has its place, but may not be the best solution for every problem.

That said, on the author's complaint about having to restart Cassandra when doing the equivalent of an alter table: lately every time we do an alter table in MySQL (which takes hours on large tables, during which time you can do nothing with them), when the alter finally finishes, MySQL mysteriously crashes. MySQL may have given more thought to the problem, but their solution obviously has problems too.

wrath · on March 27, 2010

Macs are better than PCs, C# is better than Java, Unix is better than Windows, RDBMSs are better than NoSQL databases...

Why can't people just use the best tool for the job and move on...

From a personal standpoint, we've switched from MySQL to Google AppEngine (and BigTable). Although I find there are some major drawbacks (e.g. joining tables), not having to worry about database servers and scalability is a major advantage. That said, if MySQL becomes the best tool for a particular feature, then let it be...

tlrobinson · on March 27, 2010

The scalability aspect of "NoSQL" is interesting, but I think possibly the more interesting part is the wide diversity of data models (key-value, schema-less tables, document databases, etc.).

True, some of these models are more restrictive than traditional RDBMSs to provide scalability, but I think some of them will often be useful even if scalability isn't initially a concern.

In fact the term "NoSQL" itself is more relevant to the data models than the scalability.

shin_lao · on March 27, 2010

The author makes some very valid points, but I will retort that RDBMS are overused as well.

Sometimes you just want to store data on the disk in a safe and language agnostic way.

You don't care about relations.

In that case, many "NoSQL" engines are really great.

aheilbut · on March 27, 2010

"Sometimes you just want to store data on the disk in a safe and language agnostic way."

You mean, kind of like a file?

paulgb · on March 27, 2010

Well, since you need a safe way, you'll need a locking mechanism, too. And since it needs to be language agnostic, you can't just dump the internal representation of your object to disk, so you need some sort of serialization.

At that point, it's probably easier to go with an already existent object or document store.
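For a sense of what "locking plus serialization" means in practice, here is a rough sketch of the do-it-yourself route. It assumes a POSIX system (it uses `fcntl.flock`, which is not available on Windows) and JSON as the language-agnostic format; the helper names `save` and `load` are invented for this example, and the locks are advisory, so they only help if every writer and reader goes through the same helpers.

```python
import fcntl
import json
import os


def save(path, obj):
    # Open in append mode so the file is created if missing, but is
    # not truncated before we actually hold the lock.
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # exclusive lock for writers
        f.seek(0)
        f.truncate()                    # now it is safe to discard old content
        json.dump(obj, f)               # language-agnostic serialization
        f.flush()
        os.fsync(f.fileno())            # push the bytes to disk
    # The lock is released when the file is closed.


def load(path):
    with open(path, "r") as f:
        fcntl.flock(f, fcntl.LOCK_SH)   # shared lock for readers
        return json.load(f)
```

Even this small version already has to make decisions about platform, durability and lock semantics, which is roughly the point being made above.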

mpk · on March 28, 2010

Ladies and gentlemen, we have a winner!

If you start using the filesystem as a datastore that requires concurrent access you open up a whole new can of worms. You need a locking mechanism - which you'll probably implement using (wrapped) native syscalls. Not only does that break cross-platform operation, you'll also have to work on and fix (but find first, of course) bugs in the locking implementation. As you spend more and more time on this and your app starts growing, you'll find yourself spending more and more time working with the limitations of the filesystem you're using (file size limits, directory size limits, access times for files in large directories). You can hack your way around all that but then you have to face other critical tasks. Say .. backup and restore procedures. Can you do partial backup/restore operations? No? Well, get ready to write code for that too. And you preferably want to be able to do those live. Remember those locking issues you solved when you started down this road to hell? Yeah, they're back with a vengeance now.

How about a full restore? Maybe you should have implemented a replay-able log system to get that full restore up to speed with the state of the db since the time of the last backup.

Or maybe this isn't exactly the right point at which to re-invent the wheel :)
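To illustrate the replay-able log idea, here is a minimal sketch, assuming the "database" is just a dict, the backup is a snapshot of that dict, and every change since the backup was appended to a log file as one JSON object per line. The operation format and the function names `append_op` and `replay` are made up for this example.

```python
import json


def append_op(log_path, op):
    # Append one operation per line so the log can be replayed in order.
    with open(log_path, "a") as log:
        log.write(json.dumps(op) + "\n")


def replay(snapshot, log_path):
    # Start from the last backup and re-apply every logged operation.
    state = dict(snapshot)
    with open(log_path) as log:
        for line in log:
            op = json.loads(line)
            if op["op"] == "set":
                state[op["key"]] = op["value"]
            elif op["op"] == "delete":
                state.pop(op["key"], None)
    return state
```

A real implementation would also have to deal with torn writes at the end of the log, log truncation after each backup, and doing all of this while the store is live.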

aaronblohowiak · on March 27, 2010

XML is nice for this because it supports multiple schema versions, validation, and has support in just about every language. My chief complaint with JSON as a cross-language serialization/interchange format is the lack of a great way to validate your format. Most of the JSON schema definitions I've seen require your schema to follow a certain convention, which seems backwards and wrong to me.

Files can be locked if your OS supports it.

derefr · on March 28, 2010

Perhaps developers wouldn't be going nearly as crazy for NoSQL if Apple's CoreData had a platform-agnostic FOSS equivalent.

aaronblohowiak · on March 27, 2010

The filesystem is the original document store. Unfortunately, most filesystems really suffer when you put $LOTS of files in the same folder, so you end up implementing a nested folder structure, and that complicates your code. Whether this is more complex than running an entire "object" store depends on the application. Also, the filesystem's addressing may not be granular enough (and so will waste a lot of space) if you are going to store < $BLOCK_SIZE files.
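One common way to build that nested folder structure is to hash the key and use the first few hex characters as directory names. This is only a sketch of that approach; the function name `sharded_path`, the choice of SHA-1 and the two-levels-of-two-characters layout are assumptions picked for illustration.

```python
import hashlib
from pathlib import Path


def sharded_path(root, key, levels=2, width=2):
    # Hash the key so files spread evenly across directories.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Use successive slices of the digest as nested directory names,
    # e.g. root/ab/cd/abcd... for levels=2, width=2.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, digest)


if __name__ == "__main__":
    print(sharded_path("/data/store", "user:42"))
```

With two levels of two hex characters each, files are spread over 256 * 256 = 65,536 leaf directories, at the cost of no longer being able to find a document by browsing for its name.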

hassy · on March 27, 2010

You should check out vertexdb. It's designed to be used just like a filesystem, but it fixes the shortcomings of filesystems when they are used as dbs.

http://github.com/stevedekorte/vertexdb

In production use at http://stylous.com

rbanffy · on March 28, 2010

> but I will retort that RDBMS are overused as well.

Oh boy... How many times I've had to explain to my clients that their sites would be just fine using ZODB instead of MySQL...

Never thought of it that way. Hey! I've been using NoSQL databases since 2001!

Actually, I wrote a lot of Dataflex 2 code, so make it 1987 or so.

dacort · on March 27, 2010

+1 for the Batman rollerblader. But that's about it. Use what works, some apps do justify nosql off the bat. Many don't.

aaronbrethorst · on March 27, 2010

He's a fixture of Seattle's big Solstice parade: http://www.flickr.com/photos/daffodilious/3646480480/

Keep digging through that photoset and you'll probably see pix of the naked bikers, too.

VBprogrammer · on March 28, 2010

I found the comparison with 'Real Businesses' particularly funny, given that Wal-Mart has 2.1 million employees worldwide and Twitter has 75 million users... their scaling requirements differ by a factor of about 35.

rm-rf · on March 28, 2010

I'm pretty sure that Walmart's databases track way more interesting things than tweets.

I'd assume that they know where every pallet of product is located anywhere in the world and what's in it, where every truck is and what's in it, the referrer for every click on their web site, details on every purchase at every store, location of every incoming product from every supplier.....

And then I'd assume that they take all that data and de-normalize it, move it to a whole different series of databases, poke it into star schemas and warehouse it for further analysis - 'cause y'know - red wieners might be more popular than blue wieners next xmas, and they'd better anticipate that by at least 6 months, 'cause the wiener factories have to re-tool.

Real businesses have real data, not tweets.

gsk · on March 29, 2010

Listen to rm-rf. Way back when (2003-2004), I was involved in Walmart's supply chain management software. I recall a few million rows to be optimized every day (before noon, so the hundreds of trucks plying the road can move things around efficiently).

jimbokun · on March 28, 2010

Walmart also has a few products they need to track.

pradocchia · on March 28, 2010

...and just a few points-of-sale.

The Wal-Mart data warehouse operation is legendary in the industry. As of 2008, they were running Teradata, w/ 2.5 PB of data:

http://www.dbms2.com/2008/10/15/teradatas-petabyte-power-pla...

By comparison, Facebook had ~2.5 PB in Hadoop/Hive in early 2009:

http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/

code_duck · on March 28, 2010

You're comparing two completely different businesses on disparate metrics.

How many 'users' does WalMart have worldwide? I'd say at least 500 million on whom they keep a purchase record. Then there's products, credit card numbers, suppliers, etc.

VBprogrammer · on March 28, 2010

But in terms of their 'real business', what matters is not the number of customers they have but the number of employees, since only their employees will be using their systems. The purchase records, products and credit card numbers are closer to Tweets than users.

code_duck · on March 29, 2010

I don't think the number of people doing data entry or accessing a system matters nearly as much as how much data has to be tracked. WalMart's data needs exceed those of Twitter, and they exceed the needs of Facebook. I don't know if they meet or exceed Google's, but I'd imagine they're up there.

abalashov · on March 28, 2010

Thank goodness - someone finally said it.