Hacker News | rcarmo's comments

Not sure why I should use this instead of the baked-in OS dictation features (which I use almost daily--just double-tap the globe key, and you're there). What's the advantage?

I haven't used this one, but WisprFlow is vastly better than the built-in functionality on macOS. Apple is way behind even startups, even for fundamental AI functionality like transcribing speech.

WisprFlow has a lot of good recommendations behind it, but the fact that they used Delve for SOC2 compliance gives me major pause.

The fact that a company could slurp up all of your data and then use Delve for their SOC2 is a great reason to use local models.

I use the baked-in Apple transcription and haven't had any issues. But what I do is usually pretty simple.

What makes the others vastly better?


I’ve rarely had macOS dictation produce a sentence I didn’t have to edit.

Output from Whisper models I barely bother checking anymore.


- Way more accurate, especially with technical jargon. Try saying JSON as part of a sentence to macOS dictation and see what comes out.

- macOS dictation mutes other sounds while it's running. This is a deal-breaker for me.


This is fun. I just wish I could add more skills; the UX is too dumbed down, but knowing there is a run_js tool, there is a lot that can be done here.

I’ve found tmux does a lot more than you’d expect, and it’s a trivial suggestion that most models can act on without any real prompting.

tmux is a human-oriented terminal multiplexer; it cannot natively control host terminals like iTerm2 or Windows Terminal via an API the way TermHub does, nor can it provide hack-free native terminal automation for AI agents.

tmux can read pane content, but only as a buffer of unstructured text with unstable parsing, while TermHub delivers real-time, structured, AI-native terminal output.


Well, LLMs don’t know that. I have Codex happily running gdb inside tmux to debug stack traces, and it works fine.
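For what it's worth, driving tmux from a script (or an agent) takes only a couple of commands; this is a minimal sketch, and the session name "agent" is an arbitrary placeholder:

```shell
# Start a detached session an agent can type into.
tmux new-session -d -s agent
# Send a command, exactly as if a human had typed it.
tmux send-keys -t agent 'echo hello' Enter
sleep 1                             # give the shell a moment to run it
# Read back the pane's text buffer (the "unstructured text" in question).
tmux capture-pane -t agent -p
tmux kill-session -t agent          # clean up
```

The output is indeed plain text rather than structured events, but for an LLM that reads terminals the way humans do, that is usually enough.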

This must be specific to Common Lisp. I’ve had no significant issues with Fennel and Chez Scheme, although to be fair it was on existing projects and they are not languages I would start a project with today.

TinyGo was instrumental in getting https://github.com/rcarmo/go-rdp to work. It generates very tight, pretty high-performing WASM, and that allowed me to push all the RDP decoding to the browser side while making sure I had a sane test suite. Heartily recommended.
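A TinyGo WASM build for the browser is a one-liner plus a JS shim; the package path and output names below are hypothetical, not go-rdp's actual layout:

```shell
# Compile a Go package to a browser-loadable WASM module
# (paths here are placeholders, not go-rdp's real ones).
tinygo build -o decoder.wasm -target wasm -opt=2 ./cmd/decoder

# TinyGo ships the JS glue needed to instantiate the module in a page.
cp "$(tinygo env TINYGOROOT)/targets/wasm_exec.js" ./web/
```

The resulting modules are typically far smaller than the standard Go toolchain's WASM output, which matters when the decoder ships to every browser session.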

Nice. My take on this is https://github.com/rcarmo/piclaw and https://github.com/rcarmo/webterm, since I prefer to run my agents away from my desktop but still have a nice UX. I have been thinking of packaging them with Electrobun, though.

Codex needs none of this :)

I've had Calibre running someplace and mailing me news every weekend for around... 15 years?

I keep waiting for Amazon to break mail-to-kindle, but fortunately that hasn't happened yet. Gmail, though... breaks every three months or so.
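The whole setup boils down to a cron job; this is a sketch, and the recipe name, SMTP relay, and addresses are all placeholders:

```shell
# Weekly news-to-Kindle sketch (run from cron, e.g. Saturday mornings).
# Fetch a builtin Calibre news recipe and convert it to an ebook.
ebook-convert "The Economist.recipe" /tmp/news.epub

# Mail it to the Kindle's send-to-kindle address via an SMTP relay.
calibre-smtp -a /tmp/news.epub -s "Weekly news" \
  --relay smtp.example.com --port 587 \
  --username me --password "$SMTP_PASS" \
  me@example.com me_123@kindle.com "News attached"
```

Both `ebook-convert` and `calibre-smtp` ship with Calibre, so a headless box with a crontab is all this really needs.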


TL;DR: I wish they'd just align with Blender on UX, TBH.

I wish they settled on a nicer UX with less visual clutter. I use Blender, and it is a _massively_ more complex application in every regard, yet its right-aligned panel and progressive exposure of toolbars feel infinitely more polished than FreeCAD's clunky panel (which is often rendered with huge, oversized fields and buttons) and its legendary five stacked toolbars.

It feels like that satirical Gillette ad, and it is much harder to use and navigate, especially since quite a few UX options need to be turned on in Preferences just to make it usable...


The day I found out there’s a dropdown menu that turns Blender into a multi-track video editor on par with Vegas, if not Final Cut... Blender hides its complexity well.

I built https://github.com/rcarmo/umcp to be tiny _and_ fast, but this has some nice twists on the theme. Will investigate for sure (even if it seems like a much larger dependency).

