
Yup, from where I see it, the only things holding LLMs back from generating API calls on the fly in a voice-chat scenario are latency (and, to a lesser degree, malformed output).


Yea, the latency is absolutely killing a lot of this. Alexa first-party APIs are of course tuned and reside in the same datacenter, so it's fast, but a west-coast US LLM trying to control a Philips Hue will discover its calls are crossing the Atlantic, which probably rivals the LLM itself for how slow it can be.

> and to a lesser degree malformed output

What's cool is that this isn't a huge issue. Most LLM stacks now have "grammar" controls: instead of sampling freely, the model selects the highest-probability next token that still conforms to the grammar. This dramatically helps with producing well-formed JSON (or XML, or...) output.
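To make the idea concrete, here's a toy sketch of that selection step (assumptions: a mock probability table standing in for the model's next-token distribution, and a precomputed set of grammar-legal tokens; real implementations, e.g. llama.cpp's GBNF grammars or Outlines, compute the legal set from a grammar automaton at each step):

```python
def constrained_pick(probs: dict[str, float], allowed: set[str]) -> str:
    """Pick the highest-probability token the grammar still allows.

    probs   -- mock next-token distribution from the model
    allowed -- tokens that keep the output inside the grammar
    """
    legal = {tok: p for tok, p in probs.items() if tok in allowed}
    if not legal:
        raise ValueError("grammar admits no candidate token")
    return max(legal, key=legal.get)

# Unconstrained argmax would pick "Sure", starting chatty prose;
# a JSON grammar only admits "{" or "[" as the first token.
probs = {"Sure": 0.6, "{": 0.3, "[": 0.1}
print(constrained_pick(probs, allowed={"{", "["}))  # -> {
```

The key point: the grammar never changes the model's scores, it just masks out illegal continuations, so output is syntactically valid by construction.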


Disagree. The extra latency of adding an LLM to a voice pipeline is not that much compared to doing voice via the cloud in the first place. Improved accuracy and handling of natural-language queries would be worth it relative to the barely-working "assistants" that people only ever use to set timers, and they can't even handle that correctly half the time.



