Since you are using LLMs to create the transcriptions, I wonder whether you've measured the difference in precision between the chosen model, Gemini 2.5 Flash-Lite, and newer/larger models such as Gemini 3.5 Flash, Gemini 3.1 Flash-Lite or GPT5.5.

I've read the README in the feat-api branch and, as I understand it, you've already determined that the false negatives are not a model failure, but I'm not sure I follow why. I haven't spent much time on it, though; I'm just curious to hear your reasoning.

This is a really cool project, by the way! In my opinion, this is exactly where LLMs shine: producing the equivalent of hundreds of hours of manual human labor much faster and more cheaply, for work that no one would otherwise ever bother to do!