Hacker News

I'd like to know how its model of transaction isolation works given that reads and writes are claimed to be independent.

It seems as though a 'transaction' is defined as an atomic set of updates, but doesn't involve reads.



> I'd like to know how its model of transaction isolation works given that reads and writes are claimed to be independent.

Any MVCC-style model allows full concurrency between readers and writers. The bigger problem is managing concurrency between conflicting writers in what amounts to a distributed database system. None of the material on Datomic's website explains how they intend to tackle that issue, which seems especially tricky with their model of distributed peers. All they say is that the Transactor is responsible for globally linearizing transactions and that this is better than existing models. However, if there is a genuine conflict, the loose coupling among peers seems to make the problem much worse than existing models, not better.
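To illustrate the first point, here is a minimal sketch of how MVCC-style reads avoid blocking writers: every write appends a new version tagged with a transaction id, and a reader pins a snapshot id. All names and structures here are illustrative, not Datomic's actual API.

```python
# Minimal MVCC sketch (illustrative only, not Datomic's implementation):
# writes append versions; readers pin a snapshot and never block writers.
import itertools

class MVCCStore:
    def __init__(self):
        self._versions = {}               # key -> list of (tx_id, value)
        self._tx_counter = itertools.count(1)
        self._latest_tx = 0

    def write(self, key, value):
        tx = next(self._tx_counter)
        self._versions.setdefault(key, []).append((tx, value))
        self._latest_tx = tx
        return tx

    def snapshot(self):
        return self._latest_tx            # readers pin this point in time

    def read(self, key, as_of):
        # newest version at or before the snapshot
        candidates = [(tx, v) for tx, v in self._versions.get(key, [])
                      if tx <= as_of]
        return max(candidates)[1] if candidates else None

store = MVCCStore()
store.write("user/1:name", "alice")
snap = store.snapshot()
store.write("user/1:name", "bob")         # a write after the snapshot
assert store.read("user/1:name", snap) == "alice"            # stable read
assert store.read("user/1:name", store.snapshot()) == "bob"  # latest read
```

Note this says nothing about conflicting writers, which is exactly the gap the comment above points out: the hard part is what happens between two writers, not between a reader and a writer.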

I'd love to know more details.


The FAQ says that writes favour consistency over availability, so I guess that means synchronous calls to the transactor.


Some kind of compare-and-set! operator which occurs at the transactor perhaps.

Update:

1. you can do synchronous transactions.

http://datomic.com/docs/javadoc/datomic/Connection.html#tran...

2. transactions can include data functions.

"The database can be extended with data functions that expand into other data functions, or eventually bottom out as assertions and retractions. A set of assertions/retractions/functions, represented as data structures, is sent to the transactor as a transaction, and either succeeds or fails all together, as one would expect."
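A rough sketch of what a compare-and-set-style transactor could look like, under the assumption that each transaction is a list of assertions/retractions plus an optional precondition, and commits or aborts as a whole. This is a guess at the shape of the mechanism, not Datomic's actual code.

```python
# Hypothetical transactor sketch: writes are serialized through one
# object, and each transaction applies all-or-nothing, optionally
# guarded by a compare-and-set precondition.
class Transactor:
    def __init__(self):
        self.facts = {}   # (entity, attribute) -> value

    def transact(self, ops, expect=None):
        # compare step: abort the whole transaction on any mismatch
        if expect:
            for (e, a), v in expect.items():
                if self.facts.get((e, a)) != v:
                    return False
        # set step: apply every op together
        for op, e, a, v in ops:
            if op == "assert":
                self.facts[(e, a)] = v
            elif op == "retract":
                self.facts.pop((e, a), None)
        return True

t = Transactor()
assert t.transact([("assert", "acct/1", "balance", 100)])
# succeeds: the balance is still what we expected
assert t.transact([("assert", "acct/1", "balance", 90)],
                  expect={("acct/1", "balance"): 100})
# fails: a concurrent writer already changed the balance
assert not t.transact([("assert", "acct/1", "balance", 80)],
                      expect={("acct/1", "balance"): 100})
```

Because a single transactor applies transactions one at a time, the compare step sees a globally consistent view, which is presumably how the linearization claim is meant to work.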


When data is immutable, append-only, and tagged by timestamp, there is no conflict. Rather, there are facts on the same entity that are asserted at different times. In this case where changes come in at two times (which are subject to all the raciness of the real world that exists regardless), one will win.


> In this case where changes come in at two times (which are subject to all the raciness of the real world that exists regardless), one will win.

That describes many methods of optimistic concurrency control, but it doesn't answer my question of how this is supposed to work in practice with high write contention, the higher latency of a distributed peer model, the long-running transactions the video mentions (or maybe that remark only applied to long-running queries), etc. My point being, if the distributed transaction problem were easily solved by sprinkling on optimistic multi-versioning concurrency control, it would have been solved a long time ago. There must be some special sauce they're not mentioning.


From the FAQ:

    Thus, Datomic is well suited for applications that require write consistency and read scalability.
Seems like they're not focusing on high-write situations.


Correct, it's not write-scalable in the same way it is read-scalable. The transactor is a bottleneck for writes.

However, that doesn't mean it has slow writes - it should still do writes at least on a par with any traditional transactional database, and probably a good deal faster since it's append-only.


I'm more concerned with what happens when the transactor goes down, or gets silently partitioned from some of the clients. I assume reads will continue to work but all writes will break?

I'd also like to know more about how the app-side caching works. If I've got a terabyte of User records and want to query for all users of a certain type, does a terabyte of data get sent over the wire, cached, and queried locally? Only the fields I ask for? Something else?


1. You're correct; however, the architecture does allow you to run a hot backup for fast failover.

2. The database is oriented around 'datoms', each of which is an entity/attribute/value/time tuple. Each of these has its own (hierarchical) indexes, so you only end up pulling the index segments you need to fulfill a given query. You'd only pull 1TB if your query actually encompassed all the data you had.
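As a toy illustration of the segment point, assuming segments are grouped by attribute (the real index structure is more elaborate): a query for users of a certain type only fetches the segment holding that attribute, not every datom in the database.

```python
# Toy sketch of attribute-keyed index segments (illustrative names):
# a peer fetches only the segments a query touches, not the whole DB.
from collections import defaultdict

segments = defaultdict(list)     # attribute -> segment of (entity, value)

def store_datom(e, a, v):
    segments[a].append((e, v))

def query_users_by_type(user_type):
    fetched = segments["user/type"]   # only this segment crosses the wire
    return [e for e, v in fetched if v == user_type]

store_datom("user/1", "user/type", "admin")
store_datom("user/2", "user/type", "guest")
store_datom("user/1", "user/bio", "a very large blob ...")  # never fetched
assert query_users_by_type("admin") == ["user/1"]
```

So in the terabyte scenario above, the wire cost would be proportional to the index segments the query touches, not to the total data size.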


I noticed that in the FAQ, too. Since reads are relatively easy to scale (a simple master/slave setup), I wonder how to scale Datomic on the write side.



