I understand that. The problem is that in many scenarios users would want to see transcripts of what they said alongside the model output. Like if I have a chat with a model about choosing a place to move to, I would probably want to review it later. And when I review it, I will see: me: /audio record/ AI: 200-300m. There's no easy way to see at a glance what the AI's answer was about.
You can just run Whisper on the conversations as a background job that populates text versions of all the user inputs, so it doesn't interfere with the real-time latency.
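A minimal sketch of that background job: a worker thread drains a queue of recorded user turns and fills in their transcripts, so the chat loop never blocks on transcription. The `transcribe` function here is a stub standing in for the actual Whisper call (with the `openai-whisper` package it would be roughly `whisper.load_model("base").transcribe(path)["text"]`); the file names are made up for illustration.

```python
import queue
import threading

def transcribe(audio_path):
    # Stub: a real system would invoke Whisper here, e.g.
    # whisper.load_model("base").transcribe(audio_path)["text"]
    return f"[transcript of {audio_path}]"

def worker(jobs, transcripts):
    # Runs off the main thread, so real-time chat latency is unaffected.
    while True:
        audio_path = jobs.get()
        if audio_path is None:  # sentinel: shut down the worker
            jobs.task_done()
            break
        transcripts[audio_path] = transcribe(audio_path)
        jobs.task_done()

jobs = queue.Queue()
transcripts = {}
t = threading.Thread(target=worker, args=(jobs, transcripts), daemon=True)
t.start()

# The chat loop just enqueues each recorded user turn and moves on.
for clip in ["turn_001.wav", "turn_002.wav"]:
    jobs.put(clip)

jobs.put(None)  # stop the worker after the queued turns
jobs.join()     # wait only when the transcripts are actually needed
t.join()
```

When the user later reviews the conversation, the UI can show `transcripts[clip]` next to each audio bubble instead of a bare "/audio record/" placeholder.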