
sambigeara · 2026-05-13T13:06:34

Author here. Happy to answer any questions, vague or intricate! I love bitmap indexes and can talk about them all day.

sambigeara · 2026-05-06T15:40:09

Thanks. Definitely crossed my mind, but it's in the "distant future" bucket, for now at least.

mitchsayre · 2026-05-06T18:02:38

Do you think Pollen is applicable to distributed AI inference? I think it could work for realtime voice agents running directly on mobile hardware.

There are speech-to-speech LLMs that are big and do pure audio in, audio out. But you can also build voice agents from multiple smaller models in a cascade: ASR for transcription, an LLM for the response text, TTS for speech, plus interrupt detection. If you try to load ASR, LLM, and TTS models that actually do a good job onto the same mobile device all at once, you can't get it to run in realtime. But if you run them in a distributed setup, where each device has only one model loaded and streams its output to the device handling the next task, you might achieve realtime performance while using stronger models for each task.
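The cascade described above maps onto a chain of streaming stages. Here is a minimal Go sketch of that shape, in which channels stand in for the network links between devices and three string-transforming stubs stand in for the ASR, LLM and TTS models. The stage names and stubs are hypothetical, and none of this is Pollen's API.

```go
package main

import (
	"fmt"
	"strings"
)

// stage applies f to every item from in and streams the results onward.
// In a distributed deployment each stage would run on its own device and
// the channel would be a network stream between devices.
func stage(in <-chan string, f func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for item := range in {
			out <- f(item)
		}
	}()
	return out
}

// Hypothetical stand-ins for the three models.
func asr(audio string) string { return strings.TrimSpace(audio) } // "transcribe"
func llm(text string) string  { return "reply to: " + text }      // "respond"
func tts(reply string) string { return strings.ToUpper(reply) }   // "synthesize"

// runCascade streams chunks through ASR -> LLM -> TTS, preserving order.
func runCascade(chunks []string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, c := range chunks {
			src <- c
		}
	}()
	var out []string
	for speech := range stage(stage(stage(src, asr), llm), tts) {
		out = append(out, speech)
	}
	return out
}

func main() {
	for _, s := range runCascade([]string{" hello ", " what time is it "}) {
		fmt.Println(s)
	}
	// REPLY TO: HELLO
	// REPLY TO: WHAT TIME IS IT
}
```

Each chunk's latency is still the sum of the stage times plus the network hops, but the stages overlap, so throughput is bounded by the slowest stage rather than by the total.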

Does this sound possible, or am I misunderstanding how Pollen works?

sambigeara · 2026-05-07T10:16:38

From a conceptual, workload-deployment perspective, I'd say yes: this is largely what I'm trying to achieve with Pollen. In fact, I'd go so far as to say it would be the recommended way of deploying workloads. Pollen's placement model works better with one function per seed than with a single module containing multiple, disparate functions, because you get a natural balancing of compute: heavy functions scale more aggressively, light functions less so.
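To make the "natural balancing" concrete, here is a toy comparison in Go under a made-up scaling rule (replicas proportional to load, clamped to a range). The rule, the capacity figure and the load numbers are all assumptions for illustration, not Pollen's actual placement policy.

```go
package main

import (
	"fmt"
	"math"
)

// replicasFor is a hypothetical scaling rule, not Pollen's: run enough
// replicas to cover the observed load at a fixed per-replica capacity,
// clamped to [minN, maxN].
func replicasFor(load, perReplica float64, minN, maxN int) int {
	n := int(math.Ceil(load / perReplica))
	if n < minN {
		n = minN
	}
	if n > maxN {
		n = maxN
	}
	return n
}

func main() {
	const capacity = 100 // requests/s one replica can absorb (made-up figure)

	// One function per seed: each scales on its own load.
	heavy := replicasFor(950, capacity, 1, 16) // 10 replicas
	light := replicasFor(20, capacity, 1, 16)  // 1 replica
	fmt.Println("separate seeds:", heavy, "+", light, "=", heavy+light, "function instances")

	// Both functions bundled into one module: the module scales on the
	// combined load, so the light function is copied onto every replica too.
	bundled := replicasFor(950+20, capacity, 1, 16) // 10 replicas
	fmt.Println("bundled module:", bundled, "replicas x 2 functions =", bundled*2, "function instances")
}
```

Under these assumptions the split deployment runs 11 function instances, while the bundled module runs 20, half of which are idle copies of the light function.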

I wonder if the limiting factor would be _which_ models can actually be compiled into a reasonably sized WASM module (I'm not familiar with this area right now; are you aware of efforts in this space?). If there are genuinely effective WASM models that fit into reasonably sized modules, then it would fit nicely.

All this with the previously acknowledged limitation that it's not yet on mobile (but perhaps a number of edge Pollen nodes could act as ingresses into the cluster in the interim).

I'm super interested to hear how you might employ it, though, if you do start experimenting, and to learn where it's useful and where it falls short. Please feel free to hit me up on GitHub or by email (in my profile)!

mitchsayre · 2026-05-07T15:05:00

Just emailed you but I'll reply here as well in case anyone comes across this thread and finds it useful later.

- TTS: I am actively working on this at Wfloat and just released a 30M-parameter model with 20 voices, emotion, and intensity control that can run even on legacy 2017 phones.
- ASR: I think this is in a relatively good spot; the current models small enough to fit on-device just make more transcription errors.
- LLM: For sure the main bottleneck. I know a bunch of people are working on this one. The problem with LLMs is just that they have to be so big to actually know how to do anything.

sambigeara · 2026-05-04T08:36:13

Thanks! It's just Wazero's default config[1] right now, so it implements (and is constrained to) those capabilities: WASI p1 is supported, WASI p2 isn't (Wazero has yet to implement it). Yes to SIMD, no to GC and tail calls (I think), etc. The full set of capabilities can be inferred by digging around in the code linked below.

Good suggestion on listing capabilities, will add a note.

[1]https://github.com/wazero/wazero/blob/2bbd517b7633bf6a126305...

sambigeara · 2026-05-03T07:17:33

Wow! This is seriously cool. And it's certainly not bad form; there is a level of convergence here, and it's always interesting to see what else is being built in the ecosystem.

I'd agree that Pollen's current capability-enforcement story is limited. I'm not sure what direction I'll be heading in for that, but I was erring on the side of "bring your own enforcement" as a design pattern (ultimately, people deploy their own decision engines as first-class seeds in the cluster). Naturally, that enforcement is weaker than the (fascinating) pattern you've landed on.
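To illustrate what "bring your own enforcement" might look like, here is a hypothetical Go sketch: a `DecisionEngine` interface that a gateway consults before invoking a seed. `Request`, `DecisionEngine`, `AllowList` and `Invoke` are invented names, not Pollen APIs; per the comment above, in Pollen the engine itself would be deployed as a seed, whereas here it runs in-process for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// Request describes a call into a seed (hypothetical shape).
type Request struct {
	Caller string // identity of the calling node or seed
	Seed   string // seed being invoked
}

// DecisionEngine is a hypothetical policy hook supplied by the operator.
type DecisionEngine interface {
	Allow(req Request) bool
}

// AllowList is a deliberately trivial engine: caller -> seeds it may invoke.
type AllowList map[string]map[string]bool

// Allow reports whether the caller is listed for the seed; unknown callers are denied.
func (a AllowList) Allow(req Request) bool {
	return a[req.Caller][req.Seed]
}

// ErrDenied is returned when the engine rejects a request.
var ErrDenied = errors.New("denied by decision engine")

// Invoke asks the engine first and runs the call only if it is allowed.
// Enforcement holds only if every call path goes through Invoke.
func Invoke(engine DecisionEngine, req Request, call func() string) (string, error) {
	if !engine.Allow(req) {
		return "", ErrDenied
	}
	return call(), nil
}

func main() {
	engine := AllowList{"billing": {"invoice": true}}
	ok := func() string { return "ok" }

	out, err := Invoke(engine, Request{Caller: "billing", Seed: "invoice"}, ok)
	fmt.Println(out, err) // ok <nil>

	_, err = Invoke(engine, Request{Caller: "guest", Seed: "invoice"}, ok)
	fmt.Println(err) // denied by decision engine
}
```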

> and there’s a tiny Clojure-inspired Lisp (“Glia”) that doubles as an LLM-facing or human-facing shell.

This is a _lovely_ abstraction. How does it work? Does the LLM emit Glia directly, or is there a translation layer between natural language and the interpreter?

> It's a les polished compared to what Sam has shipped, but moving fast, and this post has jolted me into sharing a bit before I had planned!

I'm _far_ from polished. I suspect you're underselling your own position here; it looks like you have something very compelling. And apologies for the jolt! Certainly happy to compare notes. I think I've added my email to my profile.

sambigeara · 2026-05-03T06:20:09

Hypothetically, yes! If your workloads are bounded and can compile to WASM, break them into logical units which would benefit from individual scalability, and `pln seed` them into the cluster. Ingress can be from any node. Any workload that doesn't suit the WASM seeds can be `pln serve`d on dedicated hosts.

You could also set up a dev cluster (or environment) where all devs run a local instance. You can iterate on services quickly and get ngrok-like functionality by exposing a local dev server with `pln serve 8080 test_server` for your colleagues to consume with `pln connect test_server`, and so on.

A wackier idea I've not been able to get out of my head, which might become possible as the access story solidifies: imagine a customer accessing a controlled subset of your company's offering through a delegated node running in their own infrastructure, whose access you can grant and revoke at any time.

ivere27 · 2026-05-03T09:45:33

Good to hear that. There are plenty of idle CPUs in a company, so we could use them safely with sandboxed WASM.

sambigeara · 2026-05-03T06:12:44

Thank you. Me too!

sambigeara · 2026-05-02T18:35:59

So, the moment a partition occurs, nodes within each resulting partition treat the peers they can still see as the full view of the world. There is _no_ concept of a split-brain scenario.

ANY decision around network topology or workload placement is a deterministic calculation run by every node individually. If all nodes see the same subset of peers as their entire "cluster", they'll naturally converge on the same view of what the cluster should look like. If the calculated output determines that Node A should claim Seed B and Node A doesn't have it, it requests it from a peer that does.

As soon as the partition recovers, nodes see the returning peers re-enter the candidate set, and those peers are included in future routing and placement decisions.
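One well-known way to get a placement that every node computes independently and still agrees on is rendezvous (highest-random-weight) hashing. Pollen's actual placement calculation isn't described here, so treat this as a stand-in: a Go sketch in which `Place`, the SHA-256 scoring and the node and seed names are illustrative assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// score gives every (seed, node) pair a pseudo-random but deterministic weight.
func score(seed, node string) uint64 {
	sum := sha256.Sum256([]byte(seed + "\x00" + node))
	return binary.BigEndian.Uint64(sum[:8])
}

// Place returns the k highest-scoring nodes for seed among the peers this
// node can currently see. Any node running it over the same peer set gets
// the same answer, regardless of the order the peers are listed in.
func Place(seed string, peers []string, k int) []string {
	ranked := append([]string(nil), peers...) // don't reorder the caller's slice
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(seed, ranked[i]), score(seed, ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i] < ranked[j] // deterministic tie-break
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	all := []string{"node-a", "node-b", "node-c", "node-d", "node-e"}
	fmt.Println("full cluster:", Place("seed-b", all, 2))
	// During a partition, each side computes placement over what it can see.
	fmt.Println("partition 1:", Place("seed-b", all[:3], 2))
	fmt.Println("partition 2:", Place("seed-b", all[3:], 2))
	// After recovery the full candidate set is back, and so is the original answer.
	fmt.Println("recovered:", Place("seed-b", all, 2))
}
```

A useful property of this scheme is that losing a node that holds none of a seed's replicas leaves that seed's placement unchanged, so only seeds whose replicas sat on the missing peers get reassigned when a partition forms or heals.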

The main tradeoff to understand here is that you're at the mercy of the random (best-effort, redundant) placement of a seed. If the cluster stores, say, 2 replicas of a seed and a resulting partition happens to contain neither of those two nodes, then the seed will be unavailable until the partition recovers. You can work around this with "smart" initial placements (one near, one close, for example), but you're still at the mercy of random partition events. An additional factor is, of course, getting very unlucky with dropped gossip events, which would also affect the rate of convergence across the cluster.
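To put a rough number on that risk: if a seed's r replicas sit on r distinct nodes chosen uniformly at random out of n, a partition containing k of those nodes holds none of them in C(n-k, r) of the C(n, r) possible placements, so the probability is C(n-k, r)/C(n, r). The uniform-random assumption is mine; the Go sketch below ignores "smart" placement and dropped gossip entirely.

```go
package main

import "fmt"

// binom returns the binomial coefficient C(n, r) as a float64 (0 if r > n).
func binom(n, r int) float64 {
	if r < 0 || r > n {
		return 0
	}
	c := 1.0
	for i := 1; i <= r; i++ {
		c = c * float64(n-r+i) / float64(i)
	}
	return c
}

// pUnavailable is the probability that a partition containing k of the n
// nodes holds none of a seed's r replicas, assuming the replicas sit on r
// distinct nodes chosen uniformly at random (requires r <= n).
func pUnavailable(n, k, r int) float64 {
	return binom(n-k, r) / binom(n, r)
}

func main() {
	// A 10-node cluster split down the middle.
	for r := 1; r <= 3; r++ {
		fmt.Printf("replicas=%d: %.3f\n", r, pUnavailable(10, 5, r))
	}
	// replicas=1: 0.500
	// replicas=2: 0.222
	// replicas=3: 0.083
}
```

Each extra replica cuts the odds noticeably, but under random placement the risk never reaches zero for a partition smaller than n - r + 1 nodes.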

sambigeara · 2026-05-02T17:52:56

Fair comment, and one I'm hearing in a lot of places. I'll work on adding some concrete examples.

sambigeara · 2026-05-02T17:36:28

Well, I have a lot to thank you for. The single-binary, heterogeneous story would have fallen flat on its face if it wasn't for the brill work you lot are doing, so, thanks!

sambigeara · 2026-05-02T17:18:09

Honestly, not really. It started as an experiment in local-first, convergent state (I have a long-standing fascination with this: https://news.ycombinator.com/item?id=27606604, https://news.ycombinator.com/item?id=42444856) and then continued to grow.

I do absolutely despise the complexity of administering modern distributed systems, hence my attempt to make Pollen as ergonomic and (as much as I hate to use this term) batteries-included as possible.

I've not come across either of those projects, oddly. I have a tendency to avoid looking for similar projects during the development of my own, lest I get despondent and run out of steam. Both sound cool, though. I'd say WASM was a natural workload "type" that fit nicely into what I was trying to achieve with Pollen, rather than a driving factor, if you know what I mean.