> I watch tons of bike races and highlights on youtube TV and then almost all my ads are for cars, generic laundry detergent, and obvious scam crap products, anything but something bike related
If I had to guess, niche products for niche interests have small ad budgets, but the random detergent ad buyer is happy to bid on anyone's eyeballs. You can't target ad buys that don't exist!
On the other hand, before I bought YT premium I was regularly getting ads for Chevron gas in Spanish (which I don't speak), and would be unsurprised if YT ad enshittification drove premium sales.
Every six months or so, someone at work does a hackathon project to automate the outage analysis work SREs would likely perform. And every one of them I've seen has been underwhelming and wrong.
There's like three reasons for this disconnect.
1. The agents aren't expert at your proprietary code. They can read logs and traces and make educated guesses, but there's no world model of your code in there.
2. The people building these apps are unqualified to review the output. I used to mock narcissists evaluating ChatGPT quality by asking it for their own biography, but they're at least using a domain they are an expert in. Your average MLE has no profound truths about kubernetes or the app. At best, they're using some toy "known broken" app to demonstrate under what are basically ideal conditions, but part of the holdout set should be new outages in your app.
3. SREs themselves are not so great at causal analysis. Many junior SREs take the "it worked last time" approach, but this embeds a presumption that whatever went wrong "last time" hasn't since been fixed in code. Your typical senior SRE takes a "what changed?" approach, which is depressingly effective (as it indicates most outages are caused by coworkers). At the highest echelons, I've seen research papers examining metastability and Granger causality networks, but I'm pretty sure nobody in SRE or these RCA agents can explain what they mean.
> The key insight: individual session failures look random. But when you cluster the hypotheses, failure patterns emerge.
My own insight is mostly Bayesian. Typical applications have redundancy of some kind, and you can extract useful signals by separating "good" from "bad". A simple Bayesian score of (100+bad)/(100+good) does a relatively good job of filtering out the "oh, that error log always happens" signals. There's also likely a path using ClickHouse-level data and Bayesian causal networks, but the problem is that traditional Bayesian networks are hand-crafted by humans.
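That smoothed ratio is easy to sketch. A minimal version in Python — the pattern names and per-pattern counts here are made up for illustration, not from any real system:

```python
from collections import Counter

def bayes_score(bad_count, good_count, prior=100):
    """Smoothed ratio from the comment: a pattern that fires equally often
    in good and bad sessions scores ~1.0; the prior damps rare patterns."""
    return (prior + bad_count) / (prior + good_count)

# Toy counts of how often each log pattern appears in bad vs. good sessions.
bad = Counter({"conn_reset": 400, "cache_miss": 50, "deprecation_warn": 500})
good = Counter({"conn_reset": 20, "cache_miss": 45, "deprecation_warn": 480})

scores = {p: bayes_score(bad[p], good[p]) for p in bad}
ranked = sorted(scores, key=scores.get, reverse=True)
# "deprecation_warn" fires everywhere, so it scores near 1.0 and drops out;
# "conn_reset" is heavily skewed toward bad sessions and floats to the top.
```

The prior of 100 means a pattern seen once in a bad session and never in a good one scores 101/100, not infinity — which is the whole trick for suppressing noisy always-on log lines.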
So yea, you can ask an LLM for 100 guesses and do some kind of k-means clustering on them, but you can probably do a better job doing dimensional analysis first and passing that on to the agent.
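For the record, the "cluster the guesses" step is just Lloyd's algorithm over embedded hypotheses. A toy sketch where 2-D points stand in for sentence embeddings of the LLM's guesses — the blob data and the farthest-first initialization are my own simplifications, not anyone's production pipeline:

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=10):
    """Minimal Lloyd's algorithm; in practice you'd embed each LLM
    hypothesis with a sentence encoder and cluster those vectors."""
    # Farthest-first initialization keeps the toy deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [tuple(sum(xs) / len(pts) for xs in zip(*pts)) if pts else centers[i]
                   for i, pts in enumerate(clusters)]
    return centers, clusters

# Two blobs standing in for, say, "db timeout" vs. "bad deploy" hypothesis families.
blob_a = [(0.1 * i, 0.0) for i in range(5)]
blob_b = [(5.0 + 0.1 * i, 5.0) for i in range(5)]
centers, clusters = kmeans(blob_a + blob_b, k=2)
```

The cluster sizes then tell you which hypothesis family the model keeps returning to, which is the signal the parent quote is after.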
Great points, but I think there's a domain confusion here. You're describing infra/code RCA. Kelet does AI agent quality RCA — the agent returns a 200 OK but gives the wrong answer.
The signal space is different. We're working with structured LLM traces + explicit quality signals (thumbs down, edits, eval scores), not distributed system logs. Much more tractable.
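For concreteness, here's a guess at what one such trace could look like as a record — the field names and the "bad session" rule are hypothetical illustrations, not Kelet's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SessionTrace:
    """One agent session, as structured data rather than distributed logs."""
    user_query: str
    agent_steps: list          # tool calls, retrievals, model turns
    final_answer: str
    thumbs_down: bool = False
    user_edit: Optional[str] = None   # how the user rewrote the answer
    eval_score: Optional[float] = None

def is_bad(t: SessionTrace) -> bool:
    # The "200 OK but wrong answer" case: any explicit negative quality signal.
    return t.thumbs_down or t.user_edit is not None or (
        t.eval_score is not None and t.eval_score < 0.5)
```

With explicit per-session labels like these, the good/bad split the parent comment describes is a field lookup rather than log forensics — which is the tractability claim above.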
Your Bayesian point actually resonates — separating good from bad sessions and looking for structural differences is close to what we do. But the hypotheses aren't "100 LLM guesses + k-means." Each one is grounded in actual session data: what the user asked, what the agent did, what came back, and what the signal was.
Curious about the dimensional analysis point — are you thinking about reducing the feature space before hypothesis generation?
Wouldn’t be the first time people assumed it was porn just from the cover. Chaos;Head/Noah was mistakenly banned from Steam for this reason until there was an outcry.
About half of the DVDs and Blu-rays I get from the library skip at some point on my PS5. They're usually not visibly scratched, though usually the scratches that matter are on the top, not the bottom.
I'm sorry Japan, but this is not how cats work. Cats' "fight or flight" response is to run and climb a tree. They prefer to be up high, not down low in some cabinet. Feels safer for naps.