Bloom distributed programming language released

3dFlatLander · on April 8, 2011

I think I'm a distributed computing fanboy. I'd like to pull a few EC2 (or GPU) instances and see how long brute-forcing different encryption/hashing schemes would take, how big a prime number one could compute in a few hours, do the same for pi/e/Euler's, or let some genetic algorithms generate a huge wad of data. No idea why or to what end; it just seems like a cool thing to do with spare time.

jerf · on April 8, 2011

You mean in general or with this project? This project appears to be a Ruby DSL with aspirations to be its own language someday, but while it's a Ruby DSL it's not going to be a great language for performance exploration. Ruby is slow. Writing in a faster language(/implementation) and running it on a single computer is like running a Ruby cluster of ~30 machines, for raw performance work.

(Ruby being slow isn't a criticism. "Some of my best friends are slow languages." But it's not where I'd start any sort of clustered/cloud computing project.)

nathanmarz · on April 8, 2011

What these guys are doing with Bloom/Bud is searching for dramatically better abstractions for building distributed systems. Getting the performance right should and will come later.

I'm a big believer in the mantra "First make it possible. Then make it beautiful. Then make it fast." In the distributed systems community, there's a lot of experience with "making it possible": Hadoop, Dynamo, etc. Bloom/Bud is attempting to figure out the "make it beautiful" part by leveraging what we already understand about the problem domain of building distributed systems.

Worrying about the kinds of constant-factor performance things you mentioned at this stage would be premature optimization. I commend them for building this system in a language that allows them to iterate fast and experiment. I'm sure in the future they'll look at using technology like JRuby to improve the performance of the project.

I think what they're doing is very interesting and potentially groundbreaking -- I can't wait to see where this project goes.

jerf · on April 8, 2011

I get where you are coming from, and it's a good plan, as long as the plan is to eventually fully detach from Ruby. Being even two or three times as fast as Ruby, which seems to be an optimistic interpretation of JRuby's performance, is still starting from a terrible position in so many ways.

I don't get the idea that some people seem to have that performance doesn't matter for distributed systems, when the truth is the exact opposite. On desktops and even cell phones, we see a great deal of sloppiness around performance, because it doesn't really matter that much. For small servers or small clusters, we still say throw more hardware at it and just hack some stuff together for clustering. But when you're serious about distributed systems is also when you are counting every one of something; maybe disk hits, maybe CPU cycles, maybe bytes of RAM, but there is something you are obsessing over. And maybe you're obsessing over more than one of these at once, all with an intensity that would credit an Atari 2600 programmer. (Facebook apparently published the specs for their machines today. Tell me they aren't too concerned about performance.) I'm not sure leaving performance for later is a good idea; they may well iterate their way into a cool abstraction that will never perform. Designing a distributed system abstraction without worrying about performance strikes me as about as sensible as designing a new 3D framework without worrying about performance... not necessarily a fatal flaw, but I sure hope you have a good plan.

jacques_chester · on April 8, 2011

> I don't get the idea that some people seem to have that performance doesn't matter for distributed systems, when the truth is the exact opposite.

I think that investment in performance follows a curve.

A bowl, actually. And that this interest is based on the cost of optimisation vs the payoff.

      -                                              -
      --                                            --
      ---                                          ---
      ----                                        ----
      ------                                    ------
      ---------                              ---------
      ---------------                 ----------------
      ------------------------------------------------
    <-- embedded ... SME web/desktop ... data-centre -->

Assume that an optimisation costs $X of programmer time and pays back $Y.

When your cost of production is very large, $Y > $X. That's what you see for embedded systems with millions of units shipped and for data-centre computing with tens of thousands of units installed. The cost of one programmer optimising is well worth it.

But for the sunny plain of mediocrity in the middle, the cost of extra hardware ($Y) will be less than the cost of the programmer time ($X).
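
As a rough worked example of the $X vs. $Y argument, here is a minimal Python sketch. Every figure in it is made up for illustration, and the per-unit model (payoff = saving per unit times number of units) is an assumption of the sketch, not something stated above.

```python
def optimisation_pays_off(programmer_cost, saving_per_unit, units):
    """Return (payoff, worthwhile) for a single optimisation.

    programmer_cost: $X, the one-off cost of the programmer's time.
    saving_per_unit: hypothetical saving per unit shipped or installed.
    units:           number of units the saving applies to.
    """
    payoff = saving_per_unit * units  # this is $Y
    return payoff, payoff > programmer_cost


# Hypothetical numbers, chosen only to show the shape of the bowl.
scenarios = {
    "embedded": (50_000, 1, 1_000_000),
    "SME web/desktop": (50_000, 200, 20),
    "data-centre": (50_000, 200, 20_000),
}

for name, (x, saving, units) in scenarios.items():
    y, worthwhile = optimisation_pays_off(x, saving, units)
    print(f"{name}: X=${x:,} Y=${y:,} worthwhile={worthwhile}")
```

With these invented inputs, only the middle scenario fails to repay the programmer's time, which is the dip in the bowl.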

Here endeth the extemporising.

nathanmarz · on April 8, 2011

Perhaps I was unclear. Performance is critical. They just shouldn't worry about the kinds of performance issues that can be optimized later. These include things like language choice and other micro-optimizations like zero-copy, etc.

Getting the abstraction right is making something that can perform well once it's optimized. That is, you worry about the high level performance properties -- things like how it will perform asymptotically -- when figuring out the abstraction.
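
To make the asymptotic-versus-constant-factor distinction concrete, here is a rough Python sketch comparing abstract operation counts. The 30x factor echoes the "~30 machines" figure from earlier in the thread; the two cost models (n squared versus n log n) are hypothetical and chosen only for illustration.

```python
import math


def fast_language_quadratic(n, speedup=30):
    """Abstract cost of an O(n^2) design in a language `speedup` times faster."""
    return n * n / speedup


def slow_language_linearithmic(n):
    """Abstract cost of an O(n log n) design in the slow language."""
    return n * math.log2(n)


for n in (10, 1_000, 1_000_000):
    quad = fast_language_quadratic(n)
    lin = slow_language_linearithmic(n)
    winner = "fast language, O(n^2)" if quad < lin else "slow language, O(n log n)"
    print(f"n={n:>9,}: {quad:,.0f} vs {lin:,.0f} -> {winner}")
```

Under these assumptions the constant factor only wins for small inputs; once n grows, the design with the better asymptotic behaviour is ahead even in the slower language.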

For this project especially, the really difficult part is figuring out what those abstractions look like. Even having to do a complete rewrite later on in a faster language isn't that big of a deal cost-wise.

neilc · on April 8, 2011

> This project appears to be a Ruby DSL with aspirations to be its own language someday

Well, we actually started with our own language a few years ago (a Datalog variant), and have been exploring embeddings into traditional languages as a way to make our ideas more appealing to mainstream programmers.

> while it's a Ruby DSL it's not going to be a great language for performance exploration

Absolutely: we've been focusing on pleasant syntax, sensible semantics, tools, and analysis techniques. Performance has definitely not been a goal, but certainly it will be important in the future. Bloom rules can be compiled into a dataflow graph (think: DB query plan, albeit recursive), and executing dataflow graphs efficiently is not difficult -- e.g., we can certainly envision generating query plans from Bloom rule sets and then running those query plans with a C runtime.
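
As an illustration of what a "recursive query plan" can look like, here is a minimal Python sketch that evaluates a Datalog-style reachability rule to a fixpoint using semi-naive evaluation. It is not Bloom or Bud code and makes no claim about how the Bud runtime works; the `link` and `path` relation names and the `transitive_closure` helper are invented for this example.

```python
def transitive_closure(link):
    """Evaluate, to a fixpoint, the rules
         path(x, y) :- link(x, y)
         path(x, z) :- link(x, y), path(y, z)
    using semi-naive evaluation: each round joins `link` only against
    the tuples that were new in the previous round."""
    path = set(link)
    delta = set(link)
    while delta:
        # Join step: link(x, y) joined with delta(y, z) on the shared y.
        derived = {(x, z) for (x, y) in link for (y2, z) in delta if y == y2}
        delta = derived - path  # keep only genuinely new tuples
        path |= delta
    return path


link = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(link)))
```

The loop is the recursive part of the plan: the join is an ordinary relational operator, and the plan simply feeds its own new output back in until nothing new is derived.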

BTW, nothing about Bloom is Ruby-specific: we picked Ruby because it let us prototype our ideas quickly. Any language with decent support for collections would be suitable; for example, Scala would be a good candidate for a subsequent version of the language.

JulianMorrison · on April 8, 2011

Have you looked at Lua/LuaJIT?

neilc · on April 8, 2011

I'm aware of the language, but we haven't looked at it in any depth. If you think it would make for a good host language for Bloom, I'd be curious to hear your reasoning.

3dFlatLander · on April 8, 2011

In general. I've got zero experience doing distributed computing, hence my fanboy label. Just felt like sharing a thought. :) However, any starting points you could provide on being able to accomplish such things would be awesome. I'm especially interested in being able to use cloud services like EC2 for it.

pumpmylemma · on April 8, 2011

See the wealth of background material: http://boom.cs.berkeley.edu/papers.html

gwern · on April 8, 2011

FYI: the name seems to stem from the BOOM project, and have nothing to do with 'Bloom filters'.

joe_hellerstein · on April 8, 2011

Leopold Bloom, not Burton Bloom.

mx2323 · on April 8, 2011

I feel like this is an interesting step in the right direction

but... what happens if I want to read, process and write a log message? That's three different bloom blocks that require two partial orderings.

Instead of totally unordering everything, I'd rather have the ability to declaratively order functions for a request, where a function is a typical sequenced set of operations.

Their examples aren't particularly helpful; it looks like most of these bloom blocks are single lines...

joe_hellerstein · on April 11, 2011

See the sandbox at https://github.com/bloom-lang/bud-sandbox/ for some more involved examples (including a GFS clone).

CoffeeDregs · on April 8, 2011

I like it! umm... but what the hell is it? I STFA (skimmed-the-f*ing-article) and don't know what's going on here. Quick! What does this mean?